PCIe 6.0+ / CXL

Deferrable Memory (DfMRd)

Non-blocking memory reads for high-latency memory access with latency hiding

1. What is Deferrable Memory?

What is Deferrable Memory Read?

Deferrable Memory Read (DfMRd) is a PCIe 6.0+ transaction type that allows a completer to acknowledge a read request immediately and deliver the actual data later. This enables non-blocking access to high-latency memory.

Standard MRd vs DfMRd

    Standard Memory Read:
    
    Requester                           Completer
        │                                   │
        │ ─── MRd (blocking) ──────────────►│
        │                                   │ Wait for data
        │     (Requester blocked)           │ (high latency)
        │                                   │
        │ ◄── CplD (data) ──────────────────│
        │                                   │
    Total latency experienced by requester
    
    Deferrable Memory Read:
    
    Requester                           Completer
        │                                   │
        │ ─── DfMRd ───────────────────────►│
        │                                   │
        │ ◄── DfCpl (Deferred) ─────────────│  Immediate!
        │                                   │
        │ (Requester continues other work)  │  Fetch data
        │                                   │  (high latency)
        │ ◄── DfCplD (actual data) ─────────│
        │                                   │
    Requester not blocked during data fetch

2. Why Deferrable Memory?

Why defer memory reads?

Emerging memory technologies (CXL memory, persistent memory, far memory) have higher latency than local DRAM. Deferrable memory enables efficient access by hiding this latency.

Memory Latency Comparison

| Memory Type           | Typical Latency |
|-----------------------|-----------------|
| Local DDR5 DRAM       | ~80-100 ns      |
| CXL Memory (Type 2/3) | ~150-300 ns     |
| Persistent Memory     | ~200-400 ns     |
| Remote NUMA Node      | ~150-200 ns     |
| Far Memory Pool       | ~500-1000+ ns   |

Use Cases

3. DfMRd Transaction Flow

Deferred Completion Protocol

    Step 1: DfMRd Request
    ┌───────────┐                           ┌───────────────┐
    │ Requester │ ─── DfMRd ───────────────►│   Completer   │
    │           │     Tag=5, Addr=X         │ (CXL Device)  │
    └───────────┘                           └───────────────┘
    
    Step 2: Deferred Completion (immediate)
    ┌───────────┐                           ┌───────────────┐
    │ Requester │ ◄── DfCpl ────────────────│   Completer   │
    │           │     Tag=5, Status=Deferred│               │
    └───────────┘                           └───────────────┘
    
    Requester continues other operations...
    Completer fetches data from memory...
    
    Step 3: Data Completion (later)
    ┌───────────┐                           ┌───────────────┐
    │ Requester │ ◄── DfCplD ───────────────│   Completer   │
    │           │     Tag=5, Data           │               │
    └───────────┘                           └───────────────┘
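The three steps above can be read as a small per-request state machine on the requester side. The following is a minimal sketch under that reading, not a definitive implementation: the `DeferredRead` class, its `State` names, and the address `0x1000` are hypothetical and only mirror the DfMRd, DfCpl, and DfCplD sequence shown in the diagram.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class State(Enum):
    SENT = auto()      # Step 1 done: DfMRd issued, nothing received yet
    DEFERRED = auto()  # Step 2 done: DfCpl received, data still pending
    DONE = auto()      # Step 3 done: DfCplD received, data available


@dataclass
class DeferredRead:
    """Hypothetical requester-side record for one DfMRd transaction."""
    tag: int
    addr: int
    state: State = State.SENT
    data: Optional[bytes] = None

    def on_dfcpl(self) -> None:
        # A deferred completion is only valid right after the request.
        if self.state is not State.SENT:
            raise RuntimeError(f"unexpected DfCpl in state {self.state.name}")
        self.state = State.DEFERRED

    def on_dfcpld(self, data: bytes) -> None:
        # The data completion must follow a deferred completion.
        if self.state is not State.DEFERRED:
            raise RuntimeError(f"unexpected DfCplD in state {self.state.name}")
        self.data = data
        self.state = State.DONE


read = DeferredRead(tag=5, addr=0x1000)  # Step 1: DfMRd, Tag=5 (address is illustrative)
read.on_dfcpl()                          # Step 2: DfCpl, Status=Deferred
read.on_dfcpld(b"\x00" * 64)             # Step 3: DfCplD carries the data
print(read.state.name, len(read.data))
```

Between `on_dfcpl()` and `on_dfcpld()` the requester is free to do other work; the record only tracks which completion is expected next.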

TLP Types

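| TLP    | Direction             | Description                       |
|--------|-----------------------|-----------------------------------|
| DfMRd  | Requester → Completer | Deferrable Memory Read request    |
| DfCpl  | Completer → Requester | Deferred completion (no data yet) |
| DfCplD | Completer → Requester | Deferred completion with data     |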

4. Completion Handling

Completion Status Options

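| Status           | Meaning                      | Action            |
|------------------|------------------------------|-------------------|
| Immediate CplD   | Data available immediately   | Normal completion |
| Deferred (DfCpl) | Request accepted, data later | Wait for DfCplD   |
| UR               | Unsupported Request          | Use standard MRd  |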
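As a rough illustration of the table, a requester could map the first response it receives to its next step. This is a sketch only: the `Response` enum, the `ACTIONS` mapping, and the action strings are hypothetical names chosen for this example, not identifiers from any specification or driver API.

```python
from enum import Enum


class Response(Enum):
    CPLD = "CplD"    # data available immediately
    DFCPL = "DfCpl"  # request accepted, data arrives later
    UR = "UR"        # Unsupported Request


# Next step for the requester, one entry per row of the status table.
ACTIONS = {
    Response.CPLD: "complete",          # normal completion
    Response.DFCPL: "wait_for_DfCplD",  # keep the request outstanding
    Response.UR: "retry_with_MRd",      # fall back to a standard read
}


def next_action(response: Response) -> str:
    """Return the requester's next step for the first response received."""
    return ACTIONS[response]


print(next_action(Response.DFCPL))
```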

Tag Management

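The same Tag is used for the initial DfMRd request and for every completion that answers it (DfCpl, then DfCplD), so the requester must keep the Tag allocated until the DfCplD arrives.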
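One way to picture this is a table of outstanding requests keyed by Tag. The sketch below assumes a small fixed pool of tags; `TagTable` and its method names are hypothetical, and real hardware tracks tags in its own way.

```python
from collections import deque


class TagTable:
    """Hypothetical table of outstanding DfMRd requests, keyed by Tag."""

    def __init__(self, num_tags: int) -> None:
        self._free = deque(range(num_tags))
        self._pending = {}  # tag -> address of the outstanding read

    def issue(self, addr: int) -> int:
        # Allocate a Tag for a new DfMRd.
        if not self._free:
            raise RuntimeError("no free tags; wait for a DfCplD")
        tag = self._free.popleft()
        self._pending[tag] = addr
        return tag

    def on_dfcpl(self, tag: int) -> None:
        # Deferred completion: the Tag stays allocated until data arrives.
        if tag not in self._pending:
            raise KeyError(f"DfCpl for unknown tag {tag}")

    def on_dfcpld(self, tag: int) -> int:
        # Data completion: look up the request by Tag, then free the Tag.
        addr = self._pending.pop(tag)
        self._free.append(tag)
        return addr

    def outstanding(self) -> int:
        return len(self._pending)


table = TagTable(num_tags=4)
a = table.issue(0x1000)
b = table.issue(0x2000)
table.on_dfcpl(a)
table.on_dfcpl(b)
addr_b = table.on_dfcpld(b)  # data for the second request arrives first
addr_a = table.on_dfcpld(a)
```

Because lookups are by Tag, the order in which the two DfCplD completions arrive does not matter.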

5. Credit Considerations

Flow Control

    DfMRd uses Non-Posted credits (like standard MRd)
    DfCpl uses Completion credits (1 credit, no data)
    DfCplD uses Completion credits (based on data size)
    
    Credit flow:
    
    1. DfMRd sent:
       Requester: -1 NP credit
       
    2. DfCpl received:
       Requester: holds tag (waiting for data)
       
    3. DfCplD received:
       Requester: releases tag

Important

Deferred completions may arrive out of order relative to other transactions. The requester must match each completion to its outstanding request by Tag, not by arrival order.
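The credit accounting described in this section can be sketched as follows. Two assumptions are made here and are not stated in the text above: each completion consumes one completion header credit, and one data credit covers 16 bytes, as in standard PCIe flow control. The names `completion_credits` and `NonPostedCredits` are hypothetical, and credit return through flow-control updates is not modeled.

```python
import math

# Assumption: one data credit per 16 bytes (4 DW), as in standard PCIe flow control.
DATA_CREDIT_BYTES = 16


def completion_credits(tlp: str, payload_bytes: int = 0):
    """Return (header_credits, data_credits) consumed by a deferred completion."""
    if tlp == "DfCpl":
        return (1, 0)  # no data yet, so no data credits
    if tlp == "DfCplD":
        return (1, math.ceil(payload_bytes / DATA_CREDIT_BYTES))
    raise ValueError(f"not a deferred completion: {tlp}")


class NonPostedCredits:
    """Hypothetical requester-side view of available Non-Posted credits."""

    def __init__(self, available: int) -> None:
        self.available = available

    def send_dfmrd(self) -> None:
        # Each DfMRd consumes one Non-Posted credit, like a standard MRd.
        if self.available < 1:
            raise RuntimeError("DfMRd must wait for Non-Posted credits")
        self.available -= 1


np_credits = NonPostedCredits(available=2)
np_credits.send_dfmrd()
print(np_credits.available)
print(completion_credits("DfCpl"))
print(completion_credits("DfCplD", payload_bytes=64))
```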

6. System Integration

CXL Type 3 Device Example

    ┌─────────────────────────────────────────────────────────────┐
    │                        Host CPU                             │
    │                    ┌────────────┐                           │
    │                    │   Cache    │                           │
    │                    └─────┬──────┘                           │
    │                          │                                  │
    │                    ┌─────▼──────┐                           │
    │                    │ Memory     │                           │
    │                    │ Controller │                           │
    │                    └─────┬──────┘                           │
    └──────────────────────────┼──────────────────────────────────┘
                               │ PCIe/CXL
                               │
    ┌──────────────────────────▼──────────────────────────────────┐
    │                   CXL Memory Expander                       │
    │  ┌───────────┐    ┌───────────┐    ┌───────────────────┐   │
    │  │ CXL.io    │    │ CXL.cache │    │    CXL.mem        │   │
    │  │ (Config)  │    │ (Future)  │    │  (DfMRd target)   │   │
    │  └───────────┘    └───────────┘    └─────────┬─────────┘   │
    │                                              │              │
    │                                    ┌─────────▼─────────┐   │
    │                                    │   DDR5/HBM/etc    │   │
    │                                    │   (Memory Media)  │   │
    │                                    └───────────────────┘   │
    └─────────────────────────────────────────────────────────────┘

7. Performance Benefits

Latency Hiding

    Without DfMRd (blocking):
    
    |─ MRd ─|─────────── wait ───────────|─ CplD ─|─ next op ─|
    
    Total time = request + wait + next_op
    
    With DfMRd (non-blocking):
    
    |─DfMRd─|─DfCpl─|─── other work ───|─DfCplD─|
                    |────────────────────|
                    Parallel with memory access
    
    Effective time = max(other_work, memory_access)
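The two formulas above can be compared with a short calculation. The function names and the nanosecond values below are illustrative assumptions, not measurements; the 300 ns wait is taken from the upper end of the CXL range quoted earlier.

```python
def blocking_time(request_ns: float, wait_ns: float, next_op_ns: float) -> float:
    """Total time = request + wait + next_op (work starts only after the data arrives)."""
    return request_ns + wait_ns + next_op_ns


def deferred_time(other_work_ns: float, memory_access_ns: float) -> float:
    """Effective time = max(other_work, memory_access) (work overlaps the fetch)."""
    return max(other_work_ns, memory_access_ns)


# Illustrative values only: 10 ns to issue, 300 ns memory access, 200 ns of other work.
blocking = blocking_time(request_ns=10, wait_ns=300, next_op_ns=200)
deferred = deferred_time(other_work_ns=200, memory_access_ns=300)
print(blocking, deferred)
```

With these numbers the blocking case takes 510 ns and the deferred case 300 ns. The benefit shrinks when there is little independent work to overlap with the memory access.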

Throughput Improvement

8. Capability Structure

Deferrable Memory Extended Capability

Devices that support deferrable memory advertise it through an extended capability: