Compute Express Link: cache-coherent memory expansion over the PCIe physical layer
CXL (Compute Express Link) is an open industry standard for a high-speed, low-latency interconnect between CPUs and devices. It runs over the PCIe physical layer but layers cache-coherence and memory-semantic protocols on top of it.
┌─────────────────────────────────────────────────────────────────┐
│ CXL Protocol Layers │
├─────────────────────────────────────────────────────────────────┤
│ CXL.io │ CXL.cache │ CXL.mem │
│ (PCIe TLP) │ (Coherence) │ (Memory Semantics) │
│ Config, MMIO │ Device cache │ Memory access │
│ DMA, Interrupt │ CPU snoop │ Load/Store │
├─────────────────────────────────────────────────────────────────┤
│ PCIe Physical Layer │
│ (Same electrical, encoding, LTSSM) │
└─────────────────────────────────────────────────────────────────┘
| Type | Protocols | Description | Examples |
|---|---|---|---|
| Type 1 | CXL.io + CXL.cache | Caching accelerator | Smart NIC, FPGA |
| Type 2 | CXL.io + CXL.cache + CXL.mem | Accelerator with memory | GPU, AI accelerator |
| Type 3 | CXL.io + CXL.mem | Memory expander | Memory buffer, PMem |
Type 1 (Caching Accelerator):
┌───────────────────────────────────────┐
│ Device with coherent cache │
│ ┌─────────────┐ ┌─────────────┐ │
│ │ Compute │ │ Cache │ │
│ │ Logic │ │ (coherent) │ │
│ └─────────────┘ └─────────────┘ │
│ CXL.io + CXL.cache │
└───────────────────────────────────────┘
Type 2 (Accelerator with Memory):
┌───────────────────────────────────────┐
│ Device with memory │
│ ┌─────────────┐ ┌─────────────┐ │
│ │ Compute │ │ Memory │ │
│ │ (GPU/AI) │ │ (HBM) │ │
│ └─────────────┘ └─────────────┘ │
│ CXL.io + CXL.cache + CXL.mem │
└───────────────────────────────────────┘
Type 3 (Memory Expander):
┌───────────────────────────────────────┐
│ Memory-only device │
│ ┌───────────────────────────────┐ │
│ │ DDR5 / HBM / PMem │ │
│ │ (expandable memory) │ │
│ └───────────────────────────────┘ │
│ CXL.io + CXL.mem │
└───────────────────────────────────────┘
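The three device types reduce to which protocol sets a device runs; a minimal sketch of that mapping (function names are illustrative, not from the spec):

```python
# Illustrative mapping of CXL device types to the protocols they run.
CXL_PROTOCOLS = {
    "Type 1": {"CXL.io", "CXL.cache"},             # caching accelerator (e.g. smart NIC)
    "Type 2": {"CXL.io", "CXL.cache", "CXL.mem"},  # accelerator with its own memory
    "Type 3": {"CXL.io", "CXL.mem"},               # memory expander
}

def can_host_access_device_memory(dev_type: str) -> bool:
    """Host load/store to device-attached memory requires CXL.mem."""
    return "CXL.mem" in CXL_PROTOCOLS[dev_type]

def can_device_cache_host_memory(dev_type: str) -> bool:
    """Coherent caching of host memory requires CXL.cache."""
    return "CXL.cache" in CXL_PROTOCOLS[dev_type]
```

Note the symmetry: a Type 1 device caches host memory but exposes none of its own, a Type 3 device exposes memory but caches nothing, and a Type 2 device does both.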
CXL.io is essentially the PCIe transaction layer carried over the CXL link; it handles enumeration, configuration, MMIO, DMA, and interrupts:
| Aspect | PCIe | CXL.io |
|---|---|---|
| Physical Layer | PCIe PHY | Same PCIe PHY |
| Transaction Layer | PCIe TLP | PCIe TLP (identical) |
| Flit Mode | Optional (6.0+) | Required |
| Link Training | PCIe LTSSM | PCIe LTSSM + CXL negotiation |
CXL.cache enables devices to cache host memory coherently. The device can request cache lines from host memory, and the host can snoop device caches.
Host CPU CXL Device
┌─────────────┐ ┌─────────────┐
│ Cache       │─────── Snoop ───────────►│ Cache       │
│ (coherent)  │◄────── Response ─────────│ (coherent)  │
└─────────────┘ └─────────────┘
│ │
│ CXL.cache Messages: │
│ - D2H Request (device→host) │
│ - D2H Response │
│ - H2D Request (host→device) │
│ - H2D Response │
│ │
┌─────▼─────┐ ┌──────▼──────┐
│ Host │ │ Device │
│ Memory │ │ Memory │
└───────────┘ │ (if any) │
└─────────────┘
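The snoop flow above can be illustrated with a toy host that snoops the device cache before serving a read, absorbing any dirty write-back; a heavily simplified sketch (real CXL.cache has many opcodes and a full coherence state machine, none of which is modeled here; all names are illustrative):

```python
# Toy coherence model: host memory plus one device cache that may hold
# dirty lines. An H2D snoop forces a write-back before the host serves
# the line, so the host never reads stale data.

class DeviceCache:
    def __init__(self):
        self.lines = {}                        # addr -> (data, dirty)

    def handle_snoop(self, addr):
        """H2D Request (snoop): return dirty data and downgrade the line."""
        if addr in self.lines:
            data, dirty = self.lines[addr]
            self.lines[addr] = (data, False)   # line is now clean
            return data if dirty else None
        return None

class Host:
    def __init__(self, device):
        self.memory = {}
        self.device = device

    def read(self, addr):
        """Host read: snoop the device, absorb any write-back, then serve."""
        wb = self.device.handle_snoop(addr)    # snoop + D2H response
        if wb is not None:
            self.memory[addr] = wb             # write-back updates memory
        return self.memory.get(addr, 0)
```

The invariant the sketch demonstrates is the point of CXL.cache: a host read always observes the device's latest write, because the snoop happens before memory is consulted.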
CXL.mem provides memory semantics, allowing the host CPU to access device-attached memory with load/store operations as if it were local memory.
CPU Load Instruction:
CPU Core ─── Load 0xCXL_ADDR ──► Memory Controller
│
Is address in CXL range?
│
Yes
│
▼
┌─── CXL Root Port ───┐
│ M2S Req (Read) │
│ ───────────────────►
│ │
│ S2M Data │
│ ◄───────────────────
└─────────────────────┘
│
▼
┌─── CXL Memory ──────┐
│ Device │
│ (Type 3) │
└─────────────────────┘
| Direction | Message Type | Description |
|---|---|---|
| M2S (Host→Device) | MemRd | Memory read request |
| M2S | MemWr | Memory write request |
| S2M (Device→Host) | MemData | Memory read data response |
| S2M | Cmp | Completion (write ack) |
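From a Type 3 device's side, this protocol is plain request/response handling: M2S requests in, S2M responses out, no coherence logic. A minimal sketch using the four message types from the table (message names follow the table; flit framing and metadata fields are omitted):

```python
# Minimal CXL.mem responder for a Type 3 (memory-only) device.

class Type3Device:
    def __init__(self, size):
        self.mem = bytearray(size)      # device-attached memory (e.g. DDR5)

    def handle_m2s(self, msg):
        kind, addr = msg["type"], msg["addr"]
        if kind == "MemRd":
            # cache-line (64B) read -> S2M MemData response
            data = bytes(self.mem[addr:addr + 64])
            return {"type": "MemData", "data": data}
        if kind == "MemWr":
            # cache-line write -> S2M Cmp (completion/ack)
            self.mem[addr:addr + 64] = msg["data"]
            return {"type": "Cmp"}
        raise ValueError(f"unknown M2S message: {kind}")
```

This simplicity is why Type 3 devices led CXL adoption: the device needs no compute and no coherent cache, only a memory controller behind a CXL.mem front end.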
Link Training:
1. Standard PCIe LTSSM (Detect → Polling → Config)
2. CXL alternate-protocol negotiation (modified TS1/TS2 ordered sets)
3. Determine device type (PCIe-only or CXL)
4. If CXL: Enable CXL.io, optionally CXL.cache/mem
5. Enter L0 with appropriate protocols active
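The outcome of steps 2–4 amounts to intersecting what both sides advertise: the link comes up with the protocols both support, falling back to plain PCIe otherwise. A toy sketch of that decision (the real exchange happens in training sets during LTSSM, not as a function call):

```python
def negotiate_cxl(host_caps: set, dev_caps: set) -> set:
    """Return the protocols active after link training.
    CXL.io is mandatory on any CXL link; if either side lacks it,
    the link falls back to plain PCIe (modeled as an empty set)."""
    common = host_caps & dev_caps
    if "CXL.io" not in common:
        return set()                    # PCIe-only fallback
    return common
```

For example, a host supporting all three protocols trained against a memory expander yields {CXL.io, CXL.mem}, i.e. a Type 3 link.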
┌─────────────────────────────────────────────────────────────────┐
│ CXL Memory Pool │
│ │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
│ │ CXL Mem 1 │ │ CXL Mem 2 │ │ CXL Mem 3 │ │
│ │ (512GB) │ │ (512GB) │ │ (1TB) │ │
│ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ │
│ │ │ │ │
│ └───────────────┼───────────────┘ │
│ │ │
│ CXL Switch │
│ │ │
└─────────────────────────┼──────────────────────────────────────┘
│
┌───────────────────┼───────────────────┐
│ │ │
┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐
│ Server 1 │ │ Server 2 │ │ Server 3 │
│ (256GB │ │ (256GB │ │ (256GB │
│ local) │ │ local) │ │ local) │
└───────────┘ └───────────┘ └───────────┘
Each server sees: Local DDR + Portion of CXL Pool
CXL enables memory disaggregation, allowing flexible memory allocation across servers and reducing stranded memory capacity.
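Operationally, pooling reduces to an allocator that hands out slices of shared capacity on demand instead of over-provisioning every server; a toy sketch of that bookkeeping (the class and units are illustrative, not a real fabric-manager API):

```python
class CxlMemoryPool:
    """Toy pool allocator: servers borrow capacity in GB and return it,
    so capacity never sits stranded in a box that doesn't need it."""

    def __init__(self, capacity_gb):
        self.free_gb = capacity_gb
        self.grants = {}                 # server -> GB currently allocated

    def allocate(self, server, gb):
        if gb > self.free_gb:
            raise MemoryError(f"pool exhausted: {gb} GB requested, "
                              f"{self.free_gb} GB free")
        self.free_gb -= gb
        self.grants[server] = self.grants.get(server, 0) + gb

    def release(self, server, gb):
        assert self.grants.get(server, 0) >= gb, "releasing more than granted"
        self.grants[server] -= gb
        self.free_gb += gb
```

With the 2 TB pool from the diagram, a memory-hungry job on one server can borrow 512 GB for its lifetime and return it, rather than that capacity being permanently soldered into one box.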
| Use Case | PCIe | CXL |
|---|---|---|
| NVMe Storage | Ideal | Overkill |
| Network Card | Good | Type 1 for coherent NIC |
| GPU | Traditional | Type 2 for unified memory |
| Memory Expansion | Not possible | Type 3 (primary use) |
| AI Accelerator | Works | Type 2 preferred |