FLIT MODE OPTIMIZATION

OHC (Optimized Header Compression)

Header compression for improved bandwidth efficiency in PCIe 6.0/7.0 Flit Mode

1. What is OHC?

What is Optimized Header Compression?

OHC (Optimized Header Compression) is a feature in PCIe 6.0/7.0 Flit Mode that compresses TLP headers to reduce overhead, improving effective bandwidth utilization. It removes redundant or predictable header fields.

OHC vs Traditional Headers

Header Type                 Size       Mode
3DW Header (32-bit addr)    12 bytes   Legacy/Flit
4DW Header (64-bit addr)    16 bytes   Legacy/Flit
OHC-A (2DW)                 8 bytes    Flit Mode Only
OHC-B (1DW)                 4 bytes    Flit Mode Only
OHC-C (0.5DW)               2 bytes    Flit Mode Only

2. Why OHC?

Why compress headers?

TLP headers consume a significant share of link bandwidth, especially for small transfers. OHC recovers most of that overhead, improving effective throughput by 10-30% for typical workloads dominated by small TLPs.

Bandwidth Impact Example

    64-byte payload with 16-byte header:
    Overhead = 16 / (16 + 64) = 20%
    
    64-byte payload with 4-byte OHC header:
    Overhead = 4 / (4 + 64) = 5.9%
    
    Overhead reduction: 20% - 5.9% = ~14 percentage points
    Payload throughput gain: 80 / 68 - 1 = ~17.6%
    
    For a 4KB NVMe transfer carried as 64 TLPs of 64B each:
    Traditional: 16B header × 64 TLPs = 1024B overhead
    OHC-B:       4B header × 64 TLPs = 256B overhead
    Savings: 768 bytes per 4KB transfer = 18.75% of the payload size
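
The arithmetic above can be reproduced with a short Python sketch. It counts only header and payload bytes and ignores any other per-flit cost, which is a simplifying assumption; the function names are illustrative and not taken from any specification. It also separates two figures that are easy to conflate: the drop in header overhead (in percentage points) and the relative gain in payload throughput.

```python
def header_overhead(header_bytes: int, payload_bytes: int) -> float:
    """Fraction of the header+payload bytes that is spent on the header."""
    return header_bytes / (header_bytes + payload_bytes)


def throughput_gain(old_header: int, new_header: int, payload_bytes: int) -> float:
    """Relative increase in payload throughput when the header shrinks.

    The same payload is carried in fewer total bytes, so the gain is the
    ratio of old total size to new total size, minus one.
    """
    old_total = old_header + payload_bytes
    new_total = new_header + payload_bytes
    return old_total / new_total - 1.0


def header_bytes_saved(old_header: int, new_header: int, tlp_count: int) -> int:
    """Header bytes saved across a transfer that is split into tlp_count TLPs."""
    return (old_header - new_header) * tlp_count


if __name__ == "__main__":
    print(f"4DW header overhead:   {header_overhead(16, 64):.1%}")     # 20.0%
    print(f"4-byte OHC overhead:   {header_overhead(4, 64):.1%}")      # 5.9%
    print(f"payload throughput:   +{throughput_gain(16, 4, 64):.1%}")  # +17.6%
    print(f"saved per 4KB:         {header_bytes_saved(16, 4, 64)} B") # 768 B
```

For a 64-byte payload the header overhead falls by about 14 percentage points, while payload throughput rises by about 17.6%, because the same data now occupies 68 bytes instead of 80.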

3. OHC Header Formats

OHC-A Format (2 DW / 8 bytes)

    DW0:
    ┌────────────────────────────────────────────────────────────────┐
    │  OH Type  │  TC  │ Attr │  TH  │  Len[9:0]  │   Tag[9:0]      │
    │   (3b)    │ (3b) │ (3b) │ (1b) │   (10b)    │    (10b)        │
    └────────────────────────────────────────────────────────────────┘
    
    DW1:
    ┌────────────────────────────────────────────────────────────────┐
    │           Requester ID (16b)          │   Tag[13:10]  │  Rsvd │
    │                                       │     (4b)      │       │
    └────────────────────────────────────────────────────────────────┘
    
    Usage: When full Requester ID and 14-bit Tag needed
    Supports: Memory Read/Write, Completions
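
As a rough illustration of how the DW0 layout above could be packed into a 32-bit word, here is a Python sketch. It rests on two assumptions that the diagram does not state: fields are placed most-significant-bit first in the order drawn, and the two bits left over (the listed widths add up to 30) sit at the low end of the word as zero padding. The field names are hypothetical.

```python
DW_BITS = 32

# OHC-A DW0 fields in the order drawn, left to right: (name, width in bits).
DW0_FIELDS = (
    ("oh_type", 3),
    ("tc", 3),
    ("attr", 3),
    ("th", 1),
    ("length", 10),   # Len[9:0]
    ("tag_lo", 10),   # Tag[9:0]
)


def pack_dw0(**values: int) -> int:
    """Pack the DW0 fields MSB-first into one 32-bit word."""
    word = 0
    used = 0
    for name, width in DW0_FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
        used += width
    # Assumption: bits not covered by a listed field are low-order zero padding.
    return word << (DW_BITS - used)


def unpack_dw0(word: int) -> dict:
    """Inverse of pack_dw0: split a 32-bit word back into named fields."""
    fields = {}
    shift = DW_BITS
    for name, width in DW0_FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields
```

Driving both functions from one table of field widths keeps packing and unpacking in step if the layout is adjusted.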

OHC-B Format (1 DW / 4 bytes)

    DW0:
    ┌────────────────────────────────────────────────────────────────┐
    │ OH Type │  TC  │ TH │   Len[9:0]   │      Tag[13:0]           │
    │  (3b)   │ (3b) │(1b)│    (10b)     │       (14b)              │
    └────────────────────────────────────────────────────────────────┘
    
    Usage: When Requester ID can be derived from context
    Supports: Completions (implied Completer ID)

OHC-C Format (0.5 DW / 2 bytes)

    ┌────────────────────────────────────────────────────────────────┐
    │ OH Type │  Len[3:0]  │          Tag[9:0]                      │
    │  (3b)   │    (4b)    │           (10b)                        │
    └────────────────────────────────────────────────────────────────┘
    
    Usage: Very short completions (≤ 64 bytes)
    Maximum payload: 16 DW (64 bytes)
    Supports: CplD with constrained parameters

OH Type Encoding

OH Type    Value    Description
OHC-A1     000      Memory Read, 64-bit address
OHC-A2     001      Memory Write, 64-bit address
OHC-A3     010      Completion with Data
OHC-B      100      Reduced Completion
OHC-C      110      Minimal Completion
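
A receiver would need to map the 3-bit OH Type to a format before it can tell how long the header is. The sketch below expresses the encoding table as a Python enum. Rejecting the encodings that the table does not list (011, 101, 111) is an assumption of this sketch, and the names `OHType`, `HEADER_BYTES`, and `decode_oh_type` are illustrative only.

```python
from enum import IntEnum


class OHType(IntEnum):
    OHC_A1 = 0b000  # Memory Read, 64-bit address
    OHC_A2 = 0b001  # Memory Write, 64-bit address
    OHC_A3 = 0b010  # Completion with Data
    OHC_B = 0b100   # Reduced Completion
    OHC_C = 0b110   # Minimal Completion


# Header size in bytes for each OH Type, from the format sizes given earlier.
HEADER_BYTES = {
    OHType.OHC_A1: 8,
    OHType.OHC_A2: 8,
    OHType.OHC_A3: 8,
    OHType.OHC_B: 4,
    OHType.OHC_C: 2,
}


def decode_oh_type(bits: int) -> OHType:
    """Map a 3-bit OH Type value to its enum member.

    Assumption: encodings absent from the table are treated as errors.
    """
    try:
        return OHType(bits)
    except ValueError:
        raise ValueError(f"unassigned OH Type encoding {bits:03b}") from None
```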

4. When OHC Can Be Used

OHC Eligibility Criteria

TLP Types Supporting OHC

TLP Type                    OHC-A    OHC-B    OHC-C
Memory Read (MRd)           Yes      No       No
Memory Write (MWr)          Yes      No       No
Completion (Cpl)            Yes      No       No
Completion w/Data (CplD)    Yes      Yes      Yes
Config/IO                   No       No       No
Messages                    No       No       No
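
One way to apply the eligibility table is a lookup that returns the smallest format a given TLP qualifies for. The Python sketch below combines the table with the usage notes from the format descriptions. Several conditions are assumptions of this sketch rather than stated rules: that OHC-C, like OHC-B, needs the Requester ID to be implied by context (it has no Requester ID field), and that its tag must fit in the 10 bits shown. The function and parameter names are hypothetical.

```python
# Formats each TLP type may use, taken from the eligibility table.
ELIGIBLE = {
    "MRd": ("OHC-A",),
    "MWr": ("OHC-A",),
    "Cpl": ("OHC-A",),
    "CplD": ("OHC-A", "OHC-B", "OHC-C"),
    "Cfg": (),
    "IO": (),
    "Msg": (),
}

HEADER_BYTES = {"OHC-A": 8, "OHC-B": 4, "OHC-C": 2}


def select_format(tlp_type, payload_bytes=0, tag=0, requester_id_implied=False):
    """Return the smallest usable OHC format, or None to fall back to a full header."""
    candidates = []
    for fmt in ELIGIBLE.get(tlp_type, ()):
        if fmt == "OHC-B" and not requester_id_implied:
            continue  # OHC-B carries no Requester ID
        if fmt == "OHC-C" and not (
            requester_id_implied          # assumption: same condition as OHC-B
            and payload_bytes <= 64       # "very short completions"
            and tag < (1 << 10)           # assumption: only Tag[9:0] is carried
        ):
            continue
        candidates.append(fmt)
    return min(candidates, key=HEADER_BYTES.__getitem__, default=None)
```

For example, `select_format("CplD", payload_bytes=64, tag=5, requester_id_implied=True)` yields `"OHC-C"`, while the same completion with a 128-byte payload yields `"OHC-B"`.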

5. Address Compression

Address Delta Encoding

OHC can compress addresses by encoding only the difference from a base address, when the address falls within a predictable range.

    Base Address (maintained in context): 0x0000_0001_0000_0000
    
    TLP 1: Address 0x0000_0001_0000_0100
           Delta = 0x100 (fits in 12 bits)
           Send: 12-bit delta instead of 64-bit address
    
    TLP 2: Address 0x0000_0001_0000_0200
           Delta = 0x200 (fits in 12 bits)
           Send: 12-bit delta
    
    Savings: 52 bits per TLP with sequential access patterns
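
The delta scheme above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the delta is unsigned and 12 bits wide as in the example, the base is fixed and already shared by both sides, and an address outside the window falls back to being sent in full. The function names and the `("delta", ...)` / `("full", ...)` tagging are hypothetical.

```python
DELTA_BITS = 12  # delta width used in the example above


def encode_address(address: int, base: int):
    """Return ("delta", d) when address lies in the window above base, else ("full", address)."""
    delta = address - base
    if 0 <= delta < (1 << DELTA_BITS):
        return ("delta", delta)
    return ("full", address)  # assumption: out-of-window addresses are sent uncompressed


def decode_address(kind: str, value: int, base: int) -> int:
    """Recover the full address on the receiving side using the shared base."""
    return base + value if kind == "delta" else value


if __name__ == "__main__":
    base = 0x0000_0001_0000_0000
    for addr in (0x0000_0001_0000_0100, 0x0000_0001_0000_0200, 0x0000_0002_0000_0000):
        kind, value = encode_address(addr, base)
        assert decode_address(kind, value, base) == addr
        print(f"{addr:#018x} -> {kind} {value:#x}")
```

The decoder only produces the right address if both sides hold the same base, which is why the compression context has to stay synchronized.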

6. System-Level Requirements

OHC Requirements

  1. Both link partners MUST support Flit Mode
  2. Link MUST be operating at 64 GT/s or higher
  3. OHC capability MUST be advertised and enabled
  4. Compression context MUST be synchronized
  5. Switch/RC MUST preserve OHC format through fabric

Capability Negotiation

7. Implementation Considerations

Context Management

Error Handling

8. Performance Analysis

Efficiency Gains by Workload

Workload                Avg TLP Size    OHC Gain
NVMe 4KB Random Read    64-256 B        15-20%
GPU Texture Fetch       64-128 B        18-25%
Network (100GbE)        64-1500 B       5-15%
Large Sequential        4096 B          < 5%

Key Insight

OHC provides the greatest benefit for workloads with many small TLPs (NVMe, GPU), where header overhead is proportionally larger. Large sequential transfers already have high efficiency.