MEMORY EXPANSION

CXL Integration with PCIe

Compute Express Link: cache coherent memory expansion over PCIe physical layer

1. CXL Overview

What is Compute Express Link?

CXL (Compute Express Link) is an open standard for high-speed, low-latency interconnect between CPUs and devices. It runs over the PCIe physical layer but adds cache coherence and memory protocols.

CXL Protocol Stack

    ┌─────────────────────────────────────────────────────────────────┐
    │                    CXL Protocol Layers                         │
    ├─────────────────────────────────────────────────────────────────┤
    │  CXL.io         │  CXL.cache       │  CXL.mem                  │
    │  (PCIe TLP)     │  (Coherence)     │  (Memory Semantics)       │
    │  Config, MMIO   │  Device cache    │  Memory access            │
    │  DMA, Interrupt │  CPU snoop       │  Load/Store               │
    ├─────────────────────────────────────────────────────────────────┤
    │                    PCIe Physical Layer                         │
    │           (Same electrical, encoding, LTSSM)                   │
    └─────────────────────────────────────────────────────────────────┘

2. CXL Device Types

    Type     Protocols                      Description              Examples
    Type 1   CXL.io + CXL.cache             Caching accelerator      Smart NIC, FPGA
    Type 2   CXL.io + CXL.cache + CXL.mem   Accelerator with memory  GPU, AI accelerator
    Type 3   CXL.io + CXL.mem               Memory expander          Memory buffer, PMem

Device Type Architecture

    Type 1 (Caching Accelerator):
    ┌───────────────────────────────────────┐
    │    Device with coherent cache         │
    │   ┌─────────────┐ ┌─────────────┐    │
    │   │  Compute    │ │   Cache     │    │
    │   │  Logic      │ │ (coherent)  │    │
    │   └─────────────┘ └─────────────┘    │
    │              CXL.io + CXL.cache       │
    └───────────────────────────────────────┘
    
    Type 2 (Accelerator with Memory):
    ┌───────────────────────────────────────┐
    │    Device with memory                 │
    │   ┌─────────────┐ ┌─────────────┐    │
    │   │  Compute    │ │   Memory    │    │
    │   │  (GPU/AI)   │ │   (HBM)     │    │
    │   └─────────────┘ └─────────────┘    │
    │        CXL.io + CXL.cache + CXL.mem  │
    └───────────────────────────────────────┘
    
    Type 3 (Memory Expander):
    ┌───────────────────────────────────────┐
    │    Memory-only device                 │
    │   ┌───────────────────────────────┐  │
    │   │        DDR5 / HBM / PMem      │  │
    │   │        (expandable memory)    │  │
    │   └───────────────────────────────┘  │
    │              CXL.io + CXL.mem        │
    └───────────────────────────────────────┘
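The type-to-protocol mapping above can be captured as a small lookup. This is an illustrative sketch (the dictionary and helper names are not from any real CXL API):

```python
# Sketch: CXL device types and the protocols each implements.
# The mapping follows the device-type table above; names are illustrative.
DEVICE_TYPES = {
    1: {"cxl.io", "cxl.cache"},              # caching accelerator (Smart NIC, FPGA)
    2: {"cxl.io", "cxl.cache", "cxl.mem"},   # accelerator with memory (GPU, AI)
    3: {"cxl.io", "cxl.mem"},                # memory expander (buffer, PMem)
}

def protocols_for(device_type: int) -> set[str]:
    """Return the protocol set a given CXL device type runs."""
    return DEVICE_TYPES[device_type]

def can_cache_host_memory(device_type: int) -> bool:
    """Only types implementing CXL.cache may coherently cache host memory."""
    return "cxl.cache" in DEVICE_TYPES[device_type]
```

Note that CXL.io is common to all three types, since every CXL device still needs PCIe-style enumeration and configuration.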

3. CXL.io (PCIe Compatible)

CXL.io is essentially the PCIe transaction layer protocol (TLPs) carried over the CXL link, handling enumeration, configuration, MMIO, DMA, and interrupts:

PCIe vs CXL.io

    Aspect             PCIe              CXL.io
    Physical Layer     PCIe PHY          Same PCIe PHY
    Transaction Layer  PCIe TLP          PCIe TLP (identical)
    Flit Mode          Optional (6.0+)   Required
    Link Training      PCIe LTSSM        PCIe LTSSM + CXL negotiation

4. CXL.cache (Coherence)

What is CXL.cache?

CXL.cache enables devices to cache host memory coherently. The device can request cache lines from host memory, and the host can snoop device caches.

Coherence Protocol

    Host CPU                                    CXL Device
    ┌─────────────┐                          ┌─────────────┐
    │    Cache    │◄────── Snoop ────────────│    Cache    │
    │  (coherent) │─────── Response ────────►│  (coherent) │
    └─────────────┘                          └─────────────┘
          │                                        │
          │       CXL.cache Messages:              │
          │       - D2H Request (device→host)      │
          │       - D2H Response                   │
          │       - H2D Request (host→device)      │
          │       - H2D Response                   │
          │                                        │
    ┌─────▼─────┐                          ┌──────▼──────┐
    │   Host    │                          │   Device    │
    │  Memory   │                          │   Memory    │
    └───────────┘                          │  (if any)   │
                                           └─────────────┘
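The snoop flow above can be sketched as a toy MESI-style device cache. The states and method names are simplified illustrations, not the spec's actual message encoding:

```python
# Sketch: a device-side cache line under CXL.cache-style coherence.
# "D2H" = device-to-host request, "H2D" = host-to-device snoop (simplified).

class DeviceCacheLine:
    def __init__(self):
        self.state = "I"          # I(nvalid), S(hared), M(odified) - MESI-like
        self.data = None

    def d2h_read(self, host_memory, addr):
        """D2H Request: device asks the host for a coherent copy of a line."""
        self.data = host_memory[addr]
        self.state = "S"          # host grants a shared copy
        return self.data

    def h2d_snoop_invalidate(self, host_memory, addr):
        """H2D Request: host snoop; the device must give up its copy."""
        if self.state == "M":     # dirty line: write modified data back first
            host_memory[addr] = self.data
        self.state = "I"
        self.data = None

host_mem = {0x1000: 42}
line = DeviceCacheLine()
line.d2h_read(host_mem, 0x1000)              # device caches host memory
line.h2d_snoop_invalidate(host_mem, 0x1000)  # host reclaims the line
```

The key point the sketch shows: because the device participates in the coherence protocol, the host can always reclaim a line (forcing a writeback if it is dirty) before letting any other agent observe that address.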

5. CXL.mem (Memory Protocol)

What is CXL.mem?

CXL.mem provides memory semantics, allowing the host CPU to access device-attached memory with load/store operations as if it were local memory.

Memory Access Flow

    CPU Load Instruction:
    
    CPU Core ─── Load 0xCXL_ADDR ──► Memory Controller
                                            │
                                    Is address in CXL range?
                                            │
                                           Yes
                                            │
                                            ▼
                               ┌─── CXL Root Port ───┐
                               │  M2S Req (Read)     │
                               │  ───────────────────►
                               │                     │
                               │  S2M Data           │
                               │  ◄───────────────────
                               └─────────────────────┘
                                            │
                                            ▼
                               ┌─── CXL Memory ──────┐
                               │    Device           │
                               │   (Type 3)          │
                               └─────────────────────┘

CXL.mem Messages

    Direction           Message Type   Description
    M2S (Host→Device)   MemRd          Memory read request
    M2S (Host→Device)   MemWr          Memory write request
    S2M (Device→Host)   MemData        Memory read data response
    S2M (Device→Host)   Cmp            Completion (write ack)
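The load flow above can be sketched as an address decode that steers an access either to local DRAM or over CXL.mem. The window base, sizes, and function names are made-up illustrations:

```python
# Sketch: host-side routing of a CPU load. If the physical address falls
# inside the CXL window, the access becomes an M2S MemRd answered by an
# S2M MemData from a Type 3 device. All constants are illustrative.

CXL_BASE = 0x1_0000_0000       # illustrative start of the CXL memory window
CXL_SIZE = 0x1_0000_0000       # illustrative 4 GiB of device-attached memory

local_dram = {0x1000: 7}
cxl_device_mem = {0x0: 99}     # device memory, indexed by device-local offset

def cxl_mem_read(offset):
    """Models the M2S MemRd -> S2M MemData round trip to the device."""
    return cxl_device_mem[offset]

def load(addr):
    """CPU load: decode the address, then go to DRAM or the CXL root port."""
    if CXL_BASE <= addr < CXL_BASE + CXL_SIZE:
        return cxl_mem_read(addr - CXL_BASE)   # forwarded over CXL.mem
    return local_dram[addr]                    # ordinary local access
```

Software sees one flat physical address space either way; only latency differs between the two paths.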

6. CXL over PCIe 7.0

Shared Physical Layer

CXL reuses the PCIe physical layer unchanged: the same electrical signaling, encoding, and LTSSM carry CXL traffic, so a CXL link trains like a PCIe link and can fall back to plain PCIe if the partner does not negotiate CXL.

Protocol Negotiation

    Link Training:
    
    1. Standard PCIe LTSSM (Detect → Polling → Config)
    2. CXL capability exchange (in TS1/TS2)
    3. Determine device type (PCIe-only or CXL)
    4. If CXL: Enable CXL.io, optionally CXL.cache/mem
    5. Enter L0 with appropriate protocols active
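The negotiation steps above can be sketched as a capability intersection. The capability strings and function are illustrative, not the actual TS1/TS2 bit encoding:

```python
# Sketch: outcome of CXL protocol negotiation during link training.
# A partner that never advertises CXL simply trains as a plain PCIe link.
def negotiate(host_caps, device_caps):
    """Return the set of protocols active after training (illustrative)."""
    if "cxl" not in device_caps:
        return {"pcie"}                    # step 3: PCIe-only device
    active = {"cxl.io"}                    # step 4: CXL.io is always enabled
    for proto in ("cxl.cache", "cxl.mem"):
        if proto in host_caps and proto in device_caps:
            active.add(proto)              # enable only mutually supported protocols
    return active
```

For example, a Type 3 memory expander would advertise CXL.mem but not CXL.cache, so the link enters L0 with only CXL.io and CXL.mem active.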

7. System Architecture

CXL Memory Pooling

    ┌─────────────────────────────────────────────────────────────────┐
    │                     CXL Memory Pool                            │
    │                                                                 │
    │   ┌───────────┐   ┌───────────┐   ┌───────────┐               │
    │   │ CXL Mem 1 │   │ CXL Mem 2 │   │ CXL Mem 3 │               │
    │   │  (512GB)  │   │  (512GB)  │   │  (1TB)    │               │
    │   └─────┬─────┘   └─────┬─────┘   └─────┬─────┘               │
    │         │               │               │                      │
    │         └───────────────┼───────────────┘                      │
    │                         │                                      │
    │                   CXL Switch                                   │
    │                         │                                      │
    └─────────────────────────┼──────────────────────────────────────┘
                              │
          ┌───────────────────┼───────────────────┐
          │                   │                   │
    ┌─────▼─────┐       ┌─────▼─────┐       ┌─────▼─────┐
    │  Server 1 │       │  Server 2 │       │  Server 3 │
    │  (256GB   │       │  (256GB   │       │  (256GB   │
    │   local)  │       │   local)  │       │   local)  │
    └───────────┘       └───────────┘       └───────────┘
    
    Each server sees: Local DDR + Portion of CXL Pool

CXL Benefits

CXL enables memory disaggregation, allowing flexible memory allocation across servers and reducing stranded memory capacity.
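The pooling idea can be sketched as a simple allocator behind the CXL switch. Capacities mirror the diagram above; the class and its logic are illustrative:

```python
# Sketch: a CXL-switch-style memory pool granting capacity to servers.
# Pool sizes match the diagram above; the allocator logic is illustrative.
class MemoryPool:
    def __init__(self, capacity_gb):
        self.free_gb = capacity_gb
        self.grants = {}               # server -> GB currently allocated

    def allocate(self, server, gb):
        """Grant pooled memory on demand instead of overprovisioning servers."""
        if gb > self.free_gb:
            raise MemoryError(f"pool exhausted: {self.free_gb} GB left")
        self.free_gb -= gb
        self.grants[server] = self.grants.get(server, 0) + gb

    def release(self, server):
        """Return a server's grant to the pool for reuse elsewhere."""
        self.free_gb += self.grants.pop(server, 0)

pool = MemoryPool(512 + 512 + 1024)    # CXL Mem 1 + 2 + 3 from the diagram
pool.allocate("server1", 768)          # server1 bursts past its 256 GB local DDR
pool.release("server1")                # capacity returns to the pool, not stranded
```

This is the stranded-capacity argument in miniature: without a pool, server1 would need 1 TB of local DRAM sized for its peak, idle most of the time.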

8. PCIe vs CXL Decision

    Use Case          PCIe          CXL
    NVMe Storage      Ideal         Overkill
    Network Card      Good          Type 1 for coherent NIC
    GPU               Traditional   Type 2 for unified memory
    Memory Expansion  Not possible  Type 3 (primary use)
    AI Accelerator    Works         Type 2 preferred
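
The decision table above can be encoded as a lookup for quick reference. The keys and recommendation strings are illustrative summaries, not normative rules:

```python
# Sketch: the PCIe-vs-CXL decision table as a lookup (illustrative).
RECOMMENDATION = {
    "nvme_storage":     "PCIe (CXL is overkill)",
    "network_card":     "PCIe, or CXL Type 1 for a coherent NIC",
    "gpu":              "PCIe traditionally; CXL Type 2 for unified memory",
    "memory_expansion": "CXL Type 3 (not possible with plain PCIe)",
    "ai_accelerator":   "CXL Type 2 preferred",
}

def recommend(use_case: str) -> str:
    """Return the table's recommendation for a given use case."""
    return RECOMMENDATION[use_case]
```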