Compute Express Link: cache-coherent memory expansion over the PCIe physical layer
CXL (Compute Express Link) is an open industry standard for a high-speed, low-latency interconnect between CPUs and devices. It runs over the PCIe physical layer but layers cache-coherence and memory-semantic protocols on top of it.
┌─────────────────────────────────────────────────────────────────┐
│ CXL Protocol Layers │
├─────────────────────────────────────────────────────────────────┤
│ CXL.io │ CXL.cache │ CXL.mem │
│ (PCIe TLP) │ (Coherence) │ (Memory Semantics) │
│ Config, MMIO │ Device cache │ Memory access │
│ DMA, Interrupt │ CPU snoop │ Load/Store │
├─────────────────────────────────────────────────────────────────┤
│ PCIe Physical Layer │
│ (Same electrical, encoding, LTSSM) │
└─────────────────────────────────────────────────────────────────┘
| Type | Protocols | Description | Examples |
|---|---|---|---|
| Type 1 | CXL.io + CXL.cache | Caching accelerator | Smart NIC, FPGA |
| Type 2 | CXL.io + CXL.cache + CXL.mem | Accelerator with memory | GPU, AI accelerator |
| Type 3 | CXL.io + CXL.mem | Memory expander | Memory buffer, PMem |
Type 1 (Caching Accelerator):
┌───────────────────────────────────────┐
│ Device with coherent cache │
│ ┌─────────────┐ ┌─────────────┐ │
│ │ Compute │ │ Cache │ │
│ │ Logic │ │ (coherent) │ │
│ └─────────────┘ └─────────────┘ │
│ CXL.io + CXL.cache │
└───────────────────────────────────────┘
Type 2 (Accelerator with Memory):
┌───────────────────────────────────────┐
│ Device with memory │
│ ┌─────────────┐ ┌─────────────┐ │
│ │ Compute │ │ Memory │ │
│ │ (GPU/AI) │ │ (HBM) │ │
│ └─────────────┘ └─────────────┘ │
│ CXL.io + CXL.cache + CXL.mem │
└───────────────────────────────────────┘
Type 3 (Memory Expander):
┌───────────────────────────────────────┐
│ Memory-only device │
│ ┌───────────────────────────────┐ │
│ │ DDR5 / HBM / PMem │ │
│ │ (expandable memory) │ │
│ └───────────────────────────────┘ │
│ CXL.io + CXL.mem │
└───────────────────────────────────────┘
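The three device types reduce to which protocol sets a device runs; a minimal sketch of that mapping (function names are illustrative, not from the spec):

```python
# Illustrative mapping of CXL device types to the protocols they run.
CXL_PROTOCOLS = {
    "Type 1": {"CXL.io", "CXL.cache"},             # caching accelerator (e.g. smart NIC)
    "Type 2": {"CXL.io", "CXL.cache", "CXL.mem"},  # accelerator with its own memory
    "Type 3": {"CXL.io", "CXL.mem"},               # memory expander
}

def can_host_access_device_memory(dev_type: str) -> bool:
    """Host load/store to device-attached memory requires CXL.mem."""
    return "CXL.mem" in CXL_PROTOCOLS[dev_type]

def can_device_cache_host_memory(dev_type: str) -> bool:
    """Coherent caching of host memory requires CXL.cache."""
    return "CXL.cache" in CXL_PROTOCOLS[dev_type]
```

Note the symmetry: a Type 1 device caches host memory but exposes none of its own, a Type 3 device exposes memory but caches nothing, and a Type 2 device does both.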
CXL.io is essentially the PCIe transaction layer carried over the CXL link; it handles enumeration, configuration, MMIO, DMA, and interrupts:
| Aspect | PCIe | CXL.io |
|---|---|---|
| Physical Layer | PCIe PHY | Same PCIe PHY |
| Transaction Layer | PCIe TLP | PCIe TLP (identical) |
| Flit Mode | Optional (6.0+) | Required |
| Link Training | PCIe LTSSM | PCIe LTSSM + CXL negotiation |
CXL.cache enables devices to cache host memory coherently. The device can request cache lines from host memory, and the host can snoop device caches.
Host CPU CXL Device
┌─────────────┐ ┌─────────────┐
│ Cache       │─────── Snoop ───────────►│ Cache       │
│ (coherent)  │◄────── Response ─────────│ (coherent)  │
└─────────────┘ └─────────────┘
│ │
│ CXL.cache Messages: │
│ - D2H Request (device→host) │
│ - D2H Response │
│ - H2D Request (host→device) │
│ - H2D Response │
│ │
┌─────▼─────┐ ┌──────▼──────┐
│ Host │ │ Device │
│ Memory │ │ Memory │
└───────────┘ │ (if any) │
└─────────────┘
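The snoop flow above can be illustrated with a toy host that snoops the device cache before serving a read, absorbing any dirty write-back; a heavily simplified sketch (real CXL.cache has many opcodes and a full coherence state machine, none of which is modeled here; all names are illustrative):

```python
# Toy coherence model: host memory plus one device cache that may hold
# dirty lines. An H2D snoop forces a write-back before the host serves
# the line, so the host never reads stale data.

class DeviceCache:
    def __init__(self):
        self.lines = {}                        # addr -> (data, dirty)

    def handle_snoop(self, addr):
        """H2D Request (snoop): return dirty data and downgrade the line."""
        if addr in self.lines:
            data, dirty = self.lines[addr]
            self.lines[addr] = (data, False)   # line is now clean
            return data if dirty else None
        return None

class Host:
    def __init__(self, device):
        self.memory = {}
        self.device = device

    def read(self, addr):
        """Host read: snoop the device, absorb any write-back, then serve."""
        wb = self.device.handle_snoop(addr)    # snoop + D2H response
        if wb is not None:
            self.memory[addr] = wb             # write-back updates memory
        return self.memory.get(addr, 0)
```

The invariant the sketch demonstrates is the point of CXL.cache: a host read always observes the device's latest write, because the snoop happens before memory is consulted.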
CXL.mem provides memory semantics, allowing the host CPU to access device-attached memory with load/store operations as if it were local memory.
CPU Load Instruction:
CPU Core ─── Load 0xCXL_ADDR ──► Memory Controller
│
Is address in CXL range?
│
Yes
│
▼
┌─── CXL Root Port ───┐
│ M2S Req (Read) │
│ ───────────────────►
│ │
│ S2M Data │
│ ◄───────────────────
└─────────────────────┘
│
▼
┌─── CXL Memory ──────┐
│ Device │
│ (Type 3) │
└─────────────────────┘
| Direction | Message Type | Description |
|---|---|---|
| M2S (Host→Device) | MemRd | Memory read request |
| M2S | MemWr | Memory write request |
| S2M (Device→Host) | MemData | Memory read data response |
| S2M | Cmp | Completion (write ack) |
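From a Type 3 device's side, this protocol is plain request/response handling: M2S requests in, S2M responses out, no coherence logic. A minimal sketch using the four message types from the table (message names follow the table; flit framing and metadata fields are omitted):

```python
# Minimal CXL.mem responder for a Type 3 (memory-only) device.

class Type3Device:
    def __init__(self, size):
        self.mem = bytearray(size)      # device-attached memory (e.g. DDR5)

    def handle_m2s(self, msg):
        kind, addr = msg["type"], msg["addr"]
        if kind == "MemRd":
            # cache-line (64B) read -> S2M MemData response
            data = bytes(self.mem[addr:addr + 64])
            return {"type": "MemData", "data": data}
        if kind == "MemWr":
            # cache-line write -> S2M Cmp (completion/ack)
            self.mem[addr:addr + 64] = msg["data"]
            return {"type": "Cmp"}
        raise ValueError(f"unknown M2S message: {kind}")
```

This simplicity is why Type 3 devices led CXL adoption: the device needs no compute and no coherent cache, only a memory controller behind a CXL.mem front end.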
Link Training:
1. Standard PCIe LTSSM (Detect → Polling → Config)
2. CXL alternate-protocol negotiation (modified TS1/TS2 ordered sets)
3. Determine device type (PCIe-only or CXL)
4. If CXL: Enable CXL.io, optionally CXL.cache/mem
5. Enter L0 with appropriate protocols active
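The outcome of steps 2–4 amounts to intersecting what both sides advertise: the link comes up with the protocols both support, falling back to plain PCIe otherwise. A toy sketch of that decision (the real exchange happens in training sets during LTSSM, not as a function call):

```python
def negotiate_cxl(host_caps: set, dev_caps: set) -> set:
    """Return the protocols active after link training.
    CXL.io is mandatory on any CXL link; if either side lacks it,
    the link falls back to plain PCIe (modeled as an empty set)."""
    common = host_caps & dev_caps
    if "CXL.io" not in common:
        return set()                    # PCIe-only fallback
    return common
```

For example, a host supporting all three protocols trained against a memory expander yields {CXL.io, CXL.mem}, i.e. a Type 3 link.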
┌─────────────────────────────────────────────────────────────────┐
│ CXL Memory Pool │
│ │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
│ │ CXL Mem 1 │ │ CXL Mem 2 │ │ CXL Mem 3 │ │
│ │ (512GB) │ │ (512GB) │ │ (1TB) │ │
│ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ │
│ │ │ │ │
│ └───────────────┼───────────────┘ │
│ │ │
│ CXL Switch │
│ │ │
└─────────────────────────┼──────────────────────────────────────┘
│
┌───────────────────┼───────────────────┐
│ │ │
┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐
│ Server 1 │ │ Server 2 │ │ Server 3 │
│ (256GB │ │ (256GB │ │ (256GB │
│ local) │ │ local) │ │ local) │
└───────────┘ └───────────┘ └───────────┘
Each server sees: Local DDR + Portion of CXL Pool
CXL enables memory disaggregation, allowing flexible memory allocation across servers and reducing stranded memory capacity.
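Operationally, pooling reduces to an allocator that hands out slices of shared capacity on demand instead of over-provisioning every server; a toy sketch of that bookkeeping (the class and units are illustrative, not a real fabric-manager API):

```python
class CxlMemoryPool:
    """Toy pool allocator: servers borrow capacity in GB and return it,
    so capacity never sits stranded in a box that doesn't need it."""

    def __init__(self, capacity_gb):
        self.free_gb = capacity_gb
        self.grants = {}                 # server -> GB currently allocated

    def allocate(self, server, gb):
        if gb > self.free_gb:
            raise MemoryError(f"pool exhausted: {gb} GB requested, "
                              f"{self.free_gb} GB free")
        self.free_gb -= gb
        self.grants[server] = self.grants.get(server, 0) + gb

    def release(self, server, gb):
        assert self.grants.get(server, 0) >= gb, "releasing more than granted"
        self.grants[server] -= gb
        self.free_gb += gb
```

With the 2 TB pool from the diagram, a memory-hungry job on one server can borrow 512 GB for its lifetime and return it, rather than that capacity being permanently soldered into one box.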
| Use Case | PCIe | CXL |
|---|---|---|
| NVMe Storage | Ideal | Overkill |
| Network Card | Good | Type 1 for coherent NIC |
| GPU | Traditional | Type 2 for unified memory |
| Memory Expansion | Not possible | Type 3 (primary use) |
| AI Accelerator | Works | Type 2 preferred |