Deferrable Memory Read (DfMRd) Complete Guide

1. What is Deferrable Memory?

What is Deferrable Memory Read?

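Deferrable Memory Read (DfMRd) is a transaction type that allows a completer to acknowledge a read request immediately and deliver the data later. The requester can keep working instead of stalling while high-latency memory is accessed. For comparison, the PCIe 6.0 Base Specification defines Deferrable Memory Write (DMWr), a non-posted write that a completer may decline with a retry status when it cannot accept it.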

Standard MRd vs DfMRd

    Standard Memory Read:
    
    Requester                           Completer
        │                                   │
        │ ─── MRd (blocking) ──────────────►│
        │                                   │ Wait for data
        │     (Requester blocked)           │ (high latency)
        │                                   │
        │ ◄── CplD (data) ──────────────────│
        │                                   │
    Total latency experienced by requester
    
    Deferrable Memory Read:
    
    Requester                           Completer
        │                                   │
        │ ─── DfMRd ───────────────────────►│
        │                                   │
        │ ◄── DfCpl (Deferred) ─────────────│  Immediate!
        │                                   │
        │ (Requester continues other work)  │  Fetch data
        │                                   │  (high latency)
        │ ◄── DfCplD (actual data) ─────────│
        │                                   │
    Requester not blocked during data fetch

2. Why Deferrable Memory?

Why defer memory reads?

Emerging memory technologies (CXL-attached memory, persistent memory, far memory) have higher access latency than local DRAM. Deferrable reads let a requester overlap that latency with other work instead of stalling on it.

Memory Latency Comparison

Memory Type	Typical Latency
Local DDR5 DRAM	~80-100 ns
CXL Memory (Type 2/3)	~150-300 ns
Persistent Memory	~200-400 ns
Remote NUMA Node	~150-200 ns
Far Memory Pool	~500-1000+ ns

Use Cases

CXL Memory Expanders: Type 3 memory devices
Memory Pooling: Shared memory resources
Persistent Memory: Storage-class memory
Tiered Memory: Hot/cold memory tiering

3. DfMRd Transaction Flow

Deferred Completion Protocol

    Step 1: DfMRd Request
    ┌───────────┐                           ┌───────────────┐
    │ Requester │ ─── DfMRd ───────────────►│   Completer   │
    │           │     Tag=5, Addr=X         │ (CXL Device)  │
    └───────────┘                           └───────────────┘
    
    Step 2: Deferred Completion (immediate)
    ┌───────────┐                           ┌───────────────┐
    │ Requester │ ◄── DfCpl ────────────────│   Completer   │
    │           │     Tag=5, Status=Deferred│               │
    └───────────┘                           └───────────────┘
    
    Requester continues other operations...
    Completer fetches data from memory...
    
    Step 3: Data Completion (later)
    ┌───────────┐                           ┌───────────────┐
    │ Requester │ ◄── DfCplD ───────────────│   Completer   │
    │           │     Tag=5, Data           │               │
    └───────────┘                           └───────────────┘

TLP Types

TLP	Direction	Description
DfMRd	Requester → Completer	Deferrable Memory Read request
DfCpl	Completer → Requester	Deferred completion (no data yet)
DfCplD	Completer → Requester	Deferred completion with data
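The three-step exchange can be modeled in a few lines of Python. This is a toy sketch, not a hardware model: the `Tlp` and `DeferringCompleter` names are hypothetical, memory is a plain dictionary, and this completer defers every request rather than choosing between an immediate CplD and a DfCpl.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum


class TlpType(Enum):
    DFMRD = "DfMRd"    # requester -> completer: deferrable read request
    DFCPL = "DfCpl"    # completer -> requester: deferred, no data yet
    DFCPLD = "DfCplD"  # completer -> requester: deferred completion with data


@dataclass
class Tlp:
    kind: TlpType
    tag: int
    addr: int = 0
    data: bytes = b""


@dataclass
class DeferringCompleter:
    """Hypothetical completer that defers every read it receives."""
    memory: dict[int, bytes]
    pending: deque = field(default_factory=deque)

    def receive(self, req: Tlp) -> Tlp:
        # Step 2: acknowledge at once with a DfCpl and queue the slow fetch.
        if req.kind is not TlpType.DFMRD:
            raise ValueError(f"expected DfMRd, got {req.kind.value}")
        self.pending.append(req)
        return Tlp(TlpType.DFCPL, req.tag)

    def finish_one(self) -> Tlp:
        # Step 3: media access finished; answer with a DfCplD carrying the same Tag.
        req = self.pending.popleft()
        return Tlp(TlpType.DFCPLD, req.tag, req.addr, self.memory[req.addr])


completer = DeferringCompleter(memory={0x1000: b"\xde\xad\xbe\xef"})
ack = completer.receive(Tlp(TlpType.DFMRD, tag=5, addr=0x1000))
# ... the requester is free to do other work here ...
done = completer.finish_one()
print(ack.kind.value, ack.tag)                     # DfCpl 5
print(done.kind.value, done.tag, done.data.hex())  # DfCplD 5 deadbeef
```

The FIFO `pending` queue is only a convenience; nothing in the flow requires a completer to finish deferred reads in arrival order.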

4. Completion Handling

Completion Status Options

Status	Meaning	Action
Immediate CplD	Data available immediately	Normal completion
Deferred (DfCpl)	Request accepted, data later	Wait for DfCplD
UR	Unsupported Request	Use standard MRd

Tag Management

The same Tag is used for the initial request and all related completions:

Requester assigns Tag to DfMRd
DfCpl references same Tag
DfCplD uses same Tag for matching
Tag released only after DfCplD received
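These tag rules map onto a small per-tag state machine. In this sketch the `TagTable` class and its three states are hypothetical; the point it illustrates is that a DfCpl moves a tag to DEFERRED rather than freeing it, so the tag cannot be reused until its DfCplD arrives.

```python
from enum import Enum, auto


class TagState(Enum):
    FREE = auto()
    ISSUED = auto()    # DfMRd sent, no response yet
    DEFERRED = auto()  # DfCpl received, waiting for DfCplD


class TagTable:
    """Hypothetical requester-side tag tracker for DfMRd."""

    def __init__(self, num_tags: int) -> None:
        self.state = [TagState.FREE] * num_tags

    def allocate(self) -> int:
        # Pick the lowest free tag for a new DfMRd.
        for tag, st in enumerate(self.state):
            if st is TagState.FREE:
                self.state[tag] = TagState.ISSUED
                return tag
        raise RuntimeError("no free tags: wait for a DfCplD")

    def on_dfcpl(self, tag: int) -> None:
        if self.state[tag] is not TagState.ISSUED:
            raise ValueError(f"unexpected DfCpl for tag {tag}")
        self.state[tag] = TagState.DEFERRED  # tag stays allocated

    def on_dfcpld(self, tag: int) -> None:
        if self.state[tag] is not TagState.DEFERRED:
            raise ValueError(f"unexpected DfCplD for tag {tag}")
        self.state[tag] = TagState.FREE  # only now may the tag be reused


tags = TagTable(num_tags=2)
t = tags.allocate()     # tag 0
tags.on_dfcpl(t)
u = tags.allocate()     # tag 1, because tag 0 is still held
tags.on_dfcpld(t)
print(tags.allocate())  # 0: reusable after its DfCplD
```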

5. Credit Considerations

Flow Control

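    DfMRd uses Non-Posted credits (1 NPH credit, no NPD, like a standard MRd)
    DfCpl uses Completion credits (1 CplH credit, no CplD credits)
    DfCplD uses Completion credits (1 CplH credit + 1 CplD credit per 16 bytes of data)
    
    Credit flow:
    
    1. DfMRd sent:
       Requester: consumes 1 NPH credit advertised by the completer
       Completer: returns the credit (UpdateFC) once it drains the request
       
    2. DfCpl received:
       Requester: holds tag (waiting for data)
       
    3. DfCplD received:
       Requester: releases tag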
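The per-TLP cost can be computed directly, assuming the mapping listed above (DfMRd to Non-Posted credits, DfCpl and DfCplD to Completion credits) and the standard PCIe data-credit unit of 16 bytes. `CreditCost` and `credit_cost` are hypothetical names.

```python
from dataclasses import dataclass

DATA_CREDIT_BYTES = 16  # one PCIe data credit = 4 DW = 16 bytes


@dataclass(frozen=True)
class CreditCost:
    pool: str    # "NP" (Non-Posted) or "Cpl" (Completion)
    header: int  # header credits (NPH or CplH)
    data: int    # data credits (NPD or CplD)


def credit_cost(tlp: str, payload_bytes: int = 0) -> CreditCost:
    """Credits a TLP consumes at its receiver, per the mapping above."""
    if tlp == "DfMRd":
        return CreditCost("NP", 1, 0)   # a read request carries no payload
    if tlp == "DfCpl":
        return CreditCost("Cpl", 1, 0)  # deferred status only, no data yet
    if tlp == "DfCplD":
        # Round the payload up to whole 16-byte data credits.
        data = (payload_bytes + DATA_CREDIT_BYTES - 1) // DATA_CREDIT_BYTES
        return CreditCost("Cpl", 1, data)
    raise ValueError(f"unknown TLP type: {tlp}")


# One 64-byte deferred read, end to end.
for tlp, size in [("DfMRd", 0), ("DfCpl", 0), ("DfCplD", 64)]:
    print(tlp, credit_cost(tlp, size))
```

A 64-byte read therefore costs 1 NPH credit on the way out and 2 CplH plus 4 CplD credits on the way back, one CplH more than a standard MRd answered by a single CplD.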

Important

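Deferred completions may arrive out of order relative to other transactions, including other DfCplDs. The requester must match each completion to its outstanding request by Tag (together with its Requester ID), never by arrival order.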
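A minimal sketch of tag-based matching, assuming a single requester so that the Tag alone identifies a request (a real requester matches on Requester ID plus Tag). The `match_completions` helper is hypothetical.

```python
def match_completions(
    issued: dict[int, int], arrivals: list[tuple[int, bytes]]
) -> tuple[dict[int, bytes], dict[int, int]]:
    """Match DfCplD arrivals to outstanding DfMRd requests by Tag.

    issued:   tag -> address for every outstanding DfMRd
    arrivals: (tag, data) pairs in whatever order the DfCplDs arrive
    Returns (address -> data for completed reads, tags still outstanding).
    """
    outstanding = dict(issued)  # copy so the caller's table is untouched
    completed: dict[int, bytes] = {}
    for tag, data in arrivals:
        if tag not in outstanding:
            raise ValueError(f"DfCplD for unknown or already-completed tag {tag}")
        completed[outstanding.pop(tag)] = data
    return completed, outstanding


issued = {3: 0x1000, 4: 0x2000, 5: 0x3000}
# The DfCplD for tag 5 overtakes the ones for tags 3 and 4.
done, waiting = match_completions(issued, [(5, b"C"), (3, b"A")])
print(sorted(done.items()), waiting)  # [(4096, b'A'), (12288, b'C')] {4: 8192}
```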

6. System Integration

CXL Type 3 Device Example

    ┌─────────────────────────────────────────────────────────────┐
    │                        Host CPU                             │
    │                    ┌────────────┐                           │
    │                    │   Cache    │                           │
    │                    └─────┬──────┘                           │
    │                          │                                  │
    │                    ┌─────▼──────┐                           │
    │                    │ Memory     │                           │
    │                    │ Controller │                           │
    │                    └─────┬──────┘                           │
    └──────────────────────────┼──────────────────────────────────┘
                               │ PCIe/CXL
                               │
    ┌──────────────────────────▼──────────────────────────────────┐
    │                   CXL Memory Expander                       │
    │  ┌───────────┐    ┌───────────┐    ┌───────────────────┐    │
    │  │ CXL.io    │    │ CXL.cache │    │    CXL.mem        │    │
    │  │ (Config)  │    │ (unused)  │    │  (DfMRd target)   │    │
    │  └───────────┘    └───────────┘    └─────────┬─────────┘    │
    │                                              │              │
    │                                    ┌─────────▼─────────┐    │
    │                                    │   DDR5/HBM/etc    │    │
    │                                    │   (Memory Media)  │    │
    │                                    └───────────────────┘    │
    └─────────────────────────────────────────────────────────────┘

7. Performance Benefits

Latency Hiding

    Without DfMRd (blocking):
    
    |─ MRd ─|─────────── wait ───────────|─ CplD ─|─ next op ─|
    
    Total time = request + wait + completion + next_op
    
    With DfMRd (non-blocking):
    
    |─DfMRd─|─DfCpl─|─── other work ───|─DfCplD─|
                    |──────────────────|
                    Parallel with memory access
    
    Effective time ≈ request + max(other_work, memory_access) + completion
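The comparison can be made concrete with a small calculation. The 20 ns per-TLP transfer time is an assumption chosen for illustration; the 500 ns media access is the low end of the Far Memory Pool row in the latency table. The DfCpl is assumed to travel while the media access is already under way, so it adds no time.

```python
def blocking_ns(request: int, memory_access: int, completion: int, other_work: int) -> int:
    """Standard MRd: wait out the access, receive CplD, then do the other work."""
    return request + memory_access + completion + other_work


def deferred_ns(request: int, memory_access: int, completion: int, other_work: int) -> int:
    """DfMRd: the other work overlaps the media access; DfCpl time is ignored."""
    return request + max(other_work, memory_access) + completion


# Illustrative values only (see the assumptions above).
args = dict(request=20, memory_access=500, completion=20, other_work=400)
print(blocking_ns(**args), deferred_ns(**args))  # 940 540
```

When the other work is shorter than the access, the saving equals the other work (400 ns here); once it exceeds the access time, the saving is capped at the access time itself.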

Throughput Improvement

More outstanding requests, since the requester keeps issuing reads instead of stalling
Better utilization of link bandwidth
Reduced head-of-line blocking
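Why more outstanding requests matter follows from Little's law: to keep a link busy, the number of reads in flight must equal the delivered bandwidth times the latency, divided by the read size. In this sketch the function name is hypothetical, bandwidth is in decimal GB/s, and the 32 GB/s rate and 64-byte read size are assumptions for illustration.

```python
import math


def reads_in_flight(bandwidth_gb_per_s: float, latency_ns: float, read_bytes: int) -> int:
    """Little's law: requests in flight = request rate x time each one takes."""
    # 1 GB/s (decimal) sustained for 1 ns moves exactly 1 byte.
    bytes_in_flight = bandwidth_gb_per_s * latency_ns
    return math.ceil(bytes_in_flight / read_bytes)


for latency in (100, 500):  # local DDR5 vs. far memory pool, from the latency table
    print(latency, "ns ->", reads_in_flight(32, latency, 64), "reads in flight")
```

Moving from 100 ns to 500 ns raises the requirement from 50 to 250 concurrent 64-byte reads for the same bandwidth, which is the concurrency a non-stalling requester has to supply.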

8. Capability Structure

Deferrable Memory Extended Capability

Devices that support deferrable memory reads advertise this through an extended capability that reports:

DfMRd support indication
Maximum outstanding DfMRd count
Maximum deferral time
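A requester could turn these three values into a simple issue policy. In this sketch `DeferrableMemCaps` is a hypothetical decoded view (field names and units are assumptions, not a register layout), and the 1.5x timeout margin is an arbitrary choice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeferrableMemCaps:
    """Hypothetical decoded capability values; not a real register layout."""
    dfmrd_supported: bool
    max_outstanding_dfmrd: int
    max_deferral_ns: int


def plan_reads(
    caps: DeferrableMemCaps, wanted_in_flight: int, margin: float = 1.5
) -> tuple[bool, int, Optional[int]]:
    """Return (use_dfmrd, in_flight_limit, dfcpld_timeout_ns)."""
    if not caps.dfmrd_supported:
        # Fall back to standard MRd; its normal completion timeout applies.
        return (False, wanted_in_flight, None)
    # Never exceed the device's advertised outstanding-DfMRd limit.
    limit = min(wanted_in_flight, caps.max_outstanding_dfmrd)
    # Wait somewhat past the advertised worst-case deferral before
    # treating a missing DfCplD as lost.
    timeout = int(caps.max_deferral_ns * margin)
    return (True, limit, timeout)


caps = DeferrableMemCaps(dfmrd_supported=True, max_outstanding_dfmrd=64, max_deferral_ns=2000)
print(plan_reads(caps, wanted_in_flight=250))
```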