OHC (Optimized Header Compression) Complete Guide

1. What is OHC?

What is Optimized Header Compression?

OHC (Optimized Header Compression) is a feature in PCIe 6.0/7.0 Flit Mode that compresses TLP headers to reduce overhead, improving effective bandwidth utilization. It removes redundant or predictable header fields.

OHC vs Traditional Headers

Header Type	Size	Mode
3DW Header (32-bit addr)	12 bytes	Legacy/Flit
4DW Header (64-bit addr)	16 bytes	Legacy/Flit
OHC-A (2DW)	8 bytes	Flit Mode Only
OHC-B (1DW)	4 bytes	Flit Mode Only
OHC-C (0.5DW)	2 bytes	Flit Mode Only

2. Why OHC?

Why compress headers?

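TLP headers consume a significant portion of link bandwidth, especially for small transfers. OHC recovers much of this overhead. The gain is largest for workloads dominated by small TLPs (roughly 15-25% in the estimates in Section 8) and falls below 5% for large sequential transfers.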

Bandwidth Impact Example

    64-byte payload with 16-byte header:
    Overhead = 16 / (16 + 64) = 20%
    
    64-byte payload with 4-byte OHC header:
    Overhead = 4 / (4 + 64) = 5.9%
    
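    Overhead reduction: 20% -> 5.9% (about 14 percentage points)
    Payload throughput gain: (16 + 64) / (4 + 64) = 80 / 68 ≈ 1.18x (~17.6%)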
    
    For a 4KB NVMe transfer split into 64B TLPs (4096 / 64 = 64 TLPs):
    Traditional: 16B header × 64 TLPs = 1024B overhead
    OHC-B:       4B header × 64 TLPs = 256B overhead
    Savings: 768 bytes per 4KB transfer
             = 18.75% of the 4096B payload
             = 15% of the 5120B sent with full headers
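
The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch that counts only header and payload bytes (no flit framing, CRC, or FEC), and the function names are illustrative, not taken from any tool or specification.

```python
def header_overhead(header_bytes, payload_bytes):
    """Fraction of header+payload bytes that is header."""
    return header_bytes / (header_bytes + payload_bytes)


def throughput_gain(old_header, new_header, payload_bytes):
    """Relative payload throughput gain when the header shrinks."""
    return (old_header + payload_bytes) / (new_header + payload_bytes) - 1.0


def total_header_bytes(transfer_bytes, tlp_payload, header_bytes):
    """Header bytes needed to move transfer_bytes in TLPs of tlp_payload bytes."""
    tlps = -(-transfer_bytes // tlp_payload)  # ceiling division
    return tlps * header_bytes


full = header_overhead(16, 64)     # 0.20
ohc_b = header_overhead(4, 64)     # 4 / 68, about 0.059
gain = throughput_gain(16, 4, 64)  # 80 / 68 - 1, about 0.176
saved = total_header_bytes(4096, 64, 16) - total_header_bytes(4096, 64, 4)  # 768

print(f"full={full:.1%} ohc_b={ohc_b:.1%} gain={gain:.1%} saved={saved}B")
```

Changing the payload argument shows how quickly the gain shrinks as TLPs grow, which is the trend summarized in Section 8.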

3. OHC Header Formats

OHC-A Format (2 DW / 8 bytes)

    DW0:
    ┌────────────────────────────────────────────────────────────────┐
    │  OH Type  │  TC  │ Attr │  TH  │  Len[9:0]  │   Tag[9:0]      │
    │   (3b)    │ (3b) │ (3b) │ (1b) │   (10b)    │    (10b)        │
    └────────────────────────────────────────────────────────────────┘
    
    DW1:
    ┌────────────────────────────────────────────────────────────────┐
    │           Requester ID (16b)          │   Tag[13:10]  │  Rsvd │
    │                                       │     (4b)      │       │
    └────────────────────────────────────────────────────────────────┘
    
    Usage: When full Requester ID and 14-bit Tag needed
    Supports: Memory Read/Write, Completions

OHC-B Format (1 DW / 4 bytes)

    DW0:
    ┌────────────────────────────────────────────────────────────────┐
    │ OH Type │  TC  │ TH │   Len[9:0]   │      Tag[13:0]           │
    │  (3b)   │ (3b) │(1b)│    (10b)     │       (14b)              │
    └────────────────────────────────────────────────────────────────┘
    
    Usage: When Requester ID can be derived from context
    Supports: Completions with Data (CplD), with implied Completer ID
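
As an illustration of how the OHC-B fields could be packed into one DW, here is a hedged Python sketch. The field order follows the diagram, but the exact bit positions are an assumption: the drawn widths total 31 bits, so this sketch packs the fields from the most significant bit down and leaves the lowest bit as zero. The function and field names are hypothetical.

```python
# Field order and widths as drawn in the OHC-B diagram (most significant first).
FIELDS = [("oh_type", 3), ("tc", 3), ("th", 1), ("length", 10), ("tag", 14)]

# Assumption: whatever is left of the 32-bit DW sits in the low-order bits.
SPARE_BITS = 32 - sum(width for _, width in FIELDS)


def pack_ohc_b(**values):
    """Pack the named fields into a 32-bit integer, MSB-first."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word << SPARE_BITS


def unpack_ohc_b(word):
    """Inverse of pack_ohc_b: return a dict of field values."""
    word >>= SPARE_BITS
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


header = pack_ohc_b(oh_type=0b100, tc=0, th=0, length=16, tag=0x155)
decoded = unpack_ohc_b(header)
```

Range-checking each field at pack time catches values such as a tag wider than 14 bits before they corrupt a neighbouring field.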

OHC-C Format (0.5 DW / 2 bytes)

    ┌────────────────────────────────────────────────────────────────┐
    │ OH Type │  Len[3:0]  │          Tag[9:0]                      │
    │  (3b)   │    (4b)    │           (10b)                        │
    └────────────────────────────────────────────────────────────────┘
    
    Usage: Very short completions (≤ 64 bytes)
    Maximum payload: 16 DW (64 bytes)
    Supports: CplD with constrained parameters

OH Type Encoding

OH Type	Value	Description
OHC-A1	000	Memory Read, 64-bit address
OHC-A2	001	Memory Write, 64-bit address
OHC-A3	010	Completion with Data
OHC-B	100	Reduced Completion
OHC-C	110	Minimal Completion
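
The encoding table maps naturally onto a lookup structure. The sketch below is one possible representation in Python; the header sizes come from the table in Section 1, the names `OH_TYPES` and `decode_oh_type` are illustrative, and codes not listed in the table are simply reported as unknown because this guide does not define them.

```python
# OH Type value -> (name, header size in bytes, description), as listed above.
OH_TYPES = {
    0b000: ("OHC-A1", 8, "Memory Read, 64-bit address"),
    0b001: ("OHC-A2", 8, "Memory Write, 64-bit address"),
    0b010: ("OHC-A3", 8, "Completion with Data"),
    0b100: ("OHC-B", 4, "Reduced Completion"),
    0b110: ("OHC-C", 2, "Minimal Completion"),
}


def decode_oh_type(value):
    """Return (name, header_bytes, description), or None for an unlisted code."""
    if not 0 <= value <= 0b111:
        raise ValueError("OH Type is a 3-bit field")
    return OH_TYPES.get(value)


for code in range(8):
    entry = decode_oh_type(code)
    label = entry[0] if entry else "not listed"
    print(f"{code:03b}: {label}")
```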

4. When OHC Can Be Used

OHC Eligibility Criteria

Flit Mode: OHC only works in Flit Mode (PCIe 6.0+)
No TLP Prefix: TLPs with prefixes use full headers
Standard Attributes: Some attribute combinations excluded
Known Context: Both ends must maintain compression context

TLP Types Supporting OHC

TLP Type	OHC-A	OHC-B	OHC-C
Memory Read (MRd)	Yes	No	No
Memory Write (MWr)	Yes	No	No
Completion (Cpl)	Yes	No	No
Completion w/Data (CplD)	Yes	Yes	Yes
Config/IO	No	No	No
Messages	No	No	No
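
The eligibility criteria and the support table can be combined into a small selection routine. The following Python sketch is a simplification under stated assumptions: it models only Flit Mode, TLP prefixes, the support table, and the 64-byte OHC-C payload limit from Section 3. Attribute restrictions and compression context are not modelled, and all names are hypothetical.

```python
# Support table from above; keys are shorthand labels chosen for this sketch.
SUPPORT = {
    "MRd": ("OHC-A",),
    "MWr": ("OHC-A",),
    "Cpl": ("OHC-A",),
    "CplD": ("OHC-A", "OHC-B", "OHC-C"),
    "Cfg/IO": (),
    "Msg": (),
}
HEADER_BYTES = {"OHC-A": 8, "OHC-B": 4, "OHC-C": 2}
OHC_C_MAX_PAYLOAD = 64  # bytes, from the OHC-C format description


def candidate_formats(tlp_type, flit_mode, has_prefix, payload_bytes):
    """Return usable OHC formats, smallest header first; [] means full header."""
    if not flit_mode or has_prefix:
        return []
    formats = [
        fmt for fmt in SUPPORT.get(tlp_type, ())
        if fmt != "OHC-C" or payload_bytes <= OHC_C_MAX_PAYLOAD
    ]
    return sorted(formats, key=HEADER_BYTES.__getitem__)


print(candidate_formats("CplD", True, False, 64))
print(candidate_formats("MWr", True, True, 64))
```

An empty list signals that the TLP should be sent with a traditional 3DW or 4DW header.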

5. Address Compression

Address Delta Encoding

OHC can compress addresses by encoding only the difference from a base address, when the address falls within a predictable range.

    Base Address (maintained in context): 0x0000_0001_0000_0000
    
    TLP 1: Address 0x0000_0001_0000_0100
           Delta = 0x100 (fits in 12 bits)
           Send: 12-bit delta instead of 64-bit address
    
    TLP 2: Address 0x0000_0001_0000_0200
           Delta = 0x200 (fits in 12 bits)
           Send: 12-bit delta
    
    Savings: 52 bits per TLP with sequential access patterns
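
The delta scheme in this example can be sketched in Python as follows. The sketch assumes an unsigned 12-bit delta measured upward from a fixed base, and a fallback to the full address when the delta does not fit. How and when the base is updated is not described here, so it is left out. The function names are illustrative.

```python
DELTA_BITS = 12  # delta width used in the example above


def encode_address(addr, base):
    """Return ("delta", d) if addr lies within the window above base, else ("full", addr)."""
    delta = addr - base
    if 0 <= delta < (1 << DELTA_BITS):
        return ("delta", delta)
    return ("full", addr)


def decode_address(kind, value, base):
    """Reconstruct the 64-bit address on the receiving side."""
    return base + value if kind == "delta" else value


base = 0x0000_0001_0000_0000
for addr in (0x0000_0001_0000_0100, 0x0000_0001_0000_0200, 0x0000_0001_0000_1000):
    kind, value = encode_address(addr, base)
    assert decode_address(kind, value, base) == addr
    print(f"{addr:#018x} -> {kind} {value:#x}")
```

The third address is exactly 0x1000 above the base, one past the largest 12-bit delta, so it falls back to the full address.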

6. System-Level Requirements

OHC Requirements

Both endpoints MUST support Flit Mode
Link MUST be operating at 64 GT/s or higher
OHC capability MUST be advertised and enabled
Compression context MUST be synchronized
Switch/RC MUST preserve OHC format through fabric
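
The requirements above amount to a checklist that must pass before OHC is used. Below is a hedged Python sketch of such a check. `LinkState` is a hypothetical summary structure invented for illustration, not a real register map or driver API.

```python
from dataclasses import dataclass


@dataclass
class LinkState:
    """Hypothetical snapshot of the link properties listed above."""
    local_flit_mode: bool
    remote_flit_mode: bool
    speed_gts: float
    ohc_advertised: bool
    ohc_enabled: bool
    context_synchronized: bool
    fabric_preserves_ohc: bool


def unmet_requirements(link):
    """Return the names of the requirements that are not satisfied."""
    checks = [
        ("Flit Mode on both ends", link.local_flit_mode and link.remote_flit_mode),
        ("link speed >= 64 GT/s", link.speed_gts >= 64),
        ("OHC advertised and enabled", link.ohc_advertised and link.ohc_enabled),
        ("compression context synchronized", link.context_synchronized),
        ("fabric preserves OHC format", link.fabric_preserves_ohc),
    ]
    return [name for name, ok in checks if not ok]


ready = LinkState(True, True, 64.0, True, True, True, True)
degraded = LinkState(True, True, 32.0, True, False, True, True)
print(unmet_requirements(ready))
print(unmet_requirements(degraded))
```

Returning the list of failed checks, rather than a single boolean, makes it easier to report why a link fell back to full headers.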

Capability Negotiation

OHC-A/B/C support indicated in extended capabilities
Negotiated during link training (Flit Mode)
Can be enabled/disabled per Virtual Channel

7. Implementation Considerations

Context Management

Hardware maintains compression context tables
Context invalidated on errors or resets
Fallback to full headers when context unknown

Error Handling

Decompression errors treated as Malformed TLP
Context mismatch causes fallback to full headers
Recovery via context refresh mechanism
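
The context and error-handling rules can be pictured as a small state holder: the context is invalidated on errors, resets, or mismatches, full headers are used while it is invalid, and a refresh makes compression available again. The Python sketch below is a software model of that behaviour only. Class, method, and event names are assumptions, and real implementations keep this state in hardware.

```python
class CompressionContext:
    """Minimal model of one compression context entry."""

    def __init__(self):
        self.valid = False
        self.base_address = None

    def refresh(self, base_address):
        """Context refresh: install a new base and mark the context usable."""
        self.base_address = base_address
        self.valid = True

    def invalidate(self):
        self.valid = False
        self.base_address = None


INVALIDATING_EVENTS = ("error", "reset", "context_mismatch")


def on_event(ctx, event):
    """Invalidate the context on any event listed above."""
    if event in INVALIDATING_EVENTS:
        ctx.invalidate()


def header_kind(ctx, ohc_eligible):
    """Use OHC only for eligible TLPs while the context is valid."""
    return "ohc" if (ohc_eligible and ctx.valid) else "full"


ctx = CompressionContext()
before_refresh = header_kind(ctx, True)   # "full": context unknown
ctx.refresh(0x0000_0001_0000_0000)
after_refresh = header_kind(ctx, True)    # "ohc"
on_event(ctx, "context_mismatch")
after_mismatch = header_kind(ctx, True)   # "full" until the next refresh
```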

8. Performance Analysis

Efficiency Gains by Workload

Workload	Avg TLP Size	OHC Gain
NVMe 4KB Random Read	64-256 B	15-20%
GPU Texture Fetch	64-128 B	18-25%
Network (100GbE)	64-1500 B	5-15%
Large Sequential	4096 B	< 5%

Key Insight

OHC provides the greatest benefit for workloads with many small TLPs (NVMe, GPU), where header overhead is proportionally larger. Large sequential transfers already have high efficiency.