Detailed specifications, algorithms, timing, state machines, and implementation guidance

                ┌─────────────────────────────────────────┐
                │            TX STATE MACHINE             │
                └─────────────────────────────────────────┘
                                     │
                                     ▼
              ┌──────────────────────────────────────────────┐
              │                   TX_IDLE                    │
              │  - Waiting for TLP from Transaction Layer    │
              │  - REPLAY_TIMER may be running               │
              └──────────────────────┬───────────────────────┘
                                     │
             ┌───────────────────────┼───────────────────────┐
             │                       │                       │
             ▼                       ▼                       ▼
    ┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐
    │  TLP Available   │    │   ACK Received   │    │  NAK Received /  │
    │                  │    │                  │    │  Timer Expired   │
    └────────┬─────────┘    └────────┬─────────┘    └────────┬─────────┘
             │                       │                       │
             ▼                       ▼                       ▼
    ┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐
    │  Assign Seq Num  │    │ Update ACKD_SEQ  │    │  Set Replay Ptr  │
    │  NEXT_TX_SEQ++   │    │ Purge Buffer     │    │  REPLAY_NUM++    │
    └────────┬─────────┘    │ entries ≤ ACK'd  │    └────────┬─────────┘
             │              └────────┬─────────┘             │
             ▼                       │                       ▼
    ┌──────────────────┐             │              ┌──────────────────┐
    │  Calculate LCRC  │             │              │    TX_REPLAY     │
    │  Append to TLP   │             │              │ Retransmit from  │
    └────────┬─────────┘             │              │ replay pointer   │
             │                       │              └────────┬─────────┘
             ▼                       │                       │
    ┌──────────────────┐             │                       │
    │ Store in Replay  │             │                       │
    │ Buffer           │             │                       │
    └────────┬─────────┘             │                       │
             │                       │                       │
             ▼                       │                       │
    ┌──────────────────┐             │                       │
    │   Send to PHY    │◄────────────┴───────────────────────┘
    │ Start/Reset Timer│
    └────────┬─────────┘
             │
             └──────────────► Return to TX_IDLE

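
The three branches of the TX flow can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a reference implementation: the class name `DllTransmitter`, the `send_to_phy` callback, and the use of an `OrderedDict` as the replay buffer are all illustrative choices, and LCRC generation and the REPLAY_TIMER itself are left out.

```python
from collections import OrderedDict

SEQ_MOD = 4096  # 12-bit sequence number space


class DllTransmitter:
    """Toy model of the TX flow above (LCRC generation and timers omitted)."""

    def __init__(self, send_to_phy):
        self.next_tx_seq = 0                # NEXT_TX_SEQ
        self.ackd_seq = SEQ_MOD - 1         # ACKD_SEQ: nothing acknowledged yet
        self.replay_num = 0                 # REPLAY_NUM
        self.replay_buffer = OrderedDict()  # seq -> TLP, oldest first
        self.send_to_phy = send_to_phy      # hypothetical callback(seq, tlp)

    def on_tlp_available(self, tlp):
        """Assign a sequence number, store the TLP for replay, then send it."""
        seq = self.next_tx_seq
        self.next_tx_seq = (seq + 1) % SEQ_MOD
        self.replay_buffer[seq] = tlp
        self.send_to_phy(seq, tlp)
        return seq

    def on_ack(self, acked_seq):
        """Purge every buffered TLP up to and including acked_seq."""
        purged = 0
        while self.replay_buffer:
            oldest = next(iter(self.replay_buffer))
            if (acked_seq - oldest) % SEQ_MOD >= SEQ_MOD // 2:
                break  # oldest entry is newer than the acknowledged one
            del self.replay_buffer[oldest]
            purged += 1
        if purged:
            self.ackd_seq = acked_seq
            self.replay_num = 0  # forward progress resets the replay count
        return purged

    def on_replay_trigger(self):
        """NAK received or REPLAY_TIMER expired: resend all unacknowledged TLPs."""
        self.replay_num += 1
        replayed = list(self.replay_buffer)
        for seq in replayed:
            self.send_to_phy(seq, self.replay_buffer[seq])
        return replayed
```

Keeping the buffer ordered by insertion means a replay always resends TLPs in their original sequence order, which is what the receiver's in-order check expects.
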

                                 ┌─────────────────────────────────────────┐
                                 │            RX STATE MACHINE             │
                                 └─────────────────────────────────────────┘
                                                      │
                                                      ▼
                               ┌──────────────────────────────────────────────┐
                               │                   RX_IDLE                    │
                               │  - Waiting for packet from Physical Layer    │
                               │  - ACK_LATENCY_TIMER may be running          │
                               └──────────────────────┬───────────────────────┘
                                                      │
                                                      ▼
                               ┌──────────────────────────────────────────────┐
                               │               Packet Received                │
                               └──────────────────────┬───────────────────────┘
                                                      │
                                                      ▼
                               ┌──────────────────────────────────────────────┐
                               │                  Check LCRC                  │
                               └──────────────────────┬───────────────────────┘
                                                      │
                                      ┌───────────────┴───────────────┐
                                      │ LCRC Valid       LCRC Invalid │
                                      ▼                               ▼
                             ┌──────────────────┐           ┌────────────────────┐
                             │ Extract Seq Num  │           │   Discard Packet   │
                             └────────┬─────────┘           │    Schedule NAK    │
                                      │                     │ (NEXT_RCV_SEQ - 1) │
                                      ▼                     └────────────────────┘
                 ┌────────────────────────────────────────┐
                 │     Compare Seq with NEXT_RCV_SEQ      │
                 └────────────────────┬───────────────────┘
                                      │
              ┌───────────────────────┼───────────────────────┐
              │                       │                       │
              ▼                       ▼                       ▼
       Seq == Expected         Seq < Expected          Seq > Expected
              │                       │                       │
              ▼                       ▼                       ▼
    ┌────────────────────┐  ┌────────────────────┐  ┌────────────────────┐
    │     Accept TLP     │  │     Duplicate      │  │    Gap Detected    │
    │   NEXT_RCV_SEQ++   │  │ Discard (silently) │  │    Schedule NAK    │
    │    Schedule ACK    │  │    Schedule ACK    │  │ (NEXT_RCV_SEQ - 1) │
    │   Forward to TL    │  │                    │  │                    │
    └────────────────────┘  └────────────────────┘  └────────────────────┘

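
The receive-side decision tree reduces to one modular comparison per packet. The sketch below is a minimal illustration, assuming a hypothetical `DllReceiver` class whose `on_packet` method returns a label for the branch taken; the caller is expected to have already checked the LCRC and to pass the result in as `lcrc_ok`.

```python
SEQ_MOD = 4096  # 12-bit sequence number space


class DllReceiver:
    """Toy model of the RX flow above; returns a label for the branch taken."""

    def __init__(self):
        self.next_rcv_seq = 0   # NEXT_RCV_SEQ
        self.delivered = []     # TLPs forwarded to the Transaction Layer

    def on_packet(self, seq, tlp, lcrc_ok):
        if not lcrc_ok:
            return "NAK"        # LCRC invalid: discard and schedule a NAK
        diff = (seq - self.next_rcv_seq) % SEQ_MOD
        if diff == 0:
            # Expected sequence number: accept, advance, schedule an ACK.
            self.delivered.append(tlp)
            self.next_rcv_seq = (self.next_rcv_seq + 1) % SEQ_MOD
            return "ACK"
        if diff >= SEQ_MOD // 2:
            return "DUPLICATE"  # behind the expected number: already received
        return "NAK"            # ahead of the expected number: gap detected
```

Splitting the 4096-value space in half is what lets the receiver tell a replayed duplicate from a gap without tracking anything beyond `next_rcv_seq`.
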

| Parameter | Description | Typical Value | Notes |
|---|---|---|---|
| REPLAY_TIMER | Max time to wait for ACK before replay | ~711 ns to 3.4 μs | Link speed dependent |
| ACK_LATENCY_TIMER | Max delay before sending ACK | < REPLAY_TIMER/4 | Ensures ACK arrives before timeout |
| REPLAY_NUM_ROLLOVER | Max consecutive replays before Recovery | 3 or 4 | Implementation choice |
| UpdateFC_TIMER | Interval for sending UpdateFC DLLPs | ~30 μs | Prevents FC timeout |

REPLAY_TIMER ≥ 3 × (T_tx + T_prop + T_rx_proc + T_ack_tx + T_prop + T_ack_proc)

Where:

    T_tx       = Time to transmit max-size TLP
               = (Max_TLP_Size_bits) / (Link_Speed × Lanes × Encoding_Efficiency)
    T_prop     = Propagation delay (cable + connectors), counted once per direction
               ≈ 5 ns per meter of trace/cable
    T_rx_proc  = Receiver processing latency
               = Implementation dependent (typically 50-200 ns)
    T_ack_tx   = Time to transmit ACK DLLP
               = (6 bytes × 8) / (Link_Speed × Lanes × Encoding_Efficiency)
    T_ack_proc = Transmitter ACK processing latency
               = Implementation dependent

Example at 32 GT/s x16, 128b/130b encoding:

    Effective bandwidth = 32 × 16 × (128/130) = 504.12 Gbps = 63.02 GB/s
    Max TLP (4KB)       = 4096 × 8 bits = 32768 bits
    T_tx                = 32768 bits / 504.12 Gbps ≈ 65 ns
    T_ack_tx            = 48 bits / 504.12 Gbps ≈ 0.1 ns (rounded up to 1 ns below)
    REPLAY_TIMER ≥ 3 × (65 + 10 + 100 + 1 + 10 + 50) ns = 708 ns

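
The same arithmetic can be reproduced programmatically, which makes it easier to try other link widths and speeds. The function and parameter names below are illustrative assumptions; the latency inputs (10 ns propagation, 100 ns receiver processing, 50 ns ACK processing) are simply the figures used in the worked example, not values mandated by any specification.

```python
def effective_bandwidth_bps(gt_per_s, lanes, encoding_efficiency):
    """Usable link bandwidth in bits per second."""
    return gt_per_s * 1e9 * lanes * encoding_efficiency


def replay_timer_ns(max_tlp_bytes, bandwidth_bps, t_prop_ns, t_rx_proc_ns,
                    t_ack_proc_ns, dllp_bytes=6, safety_factor=3):
    """Lower bound for REPLAY_TIMER: safety_factor times one TLP/ACK round trip."""
    t_tx_ns = max_tlp_bytes * 8 / bandwidth_bps * 1e9
    t_ack_tx_ns = dllp_bytes * 8 / bandwidth_bps * 1e9
    round_trip_ns = (t_tx_ns + t_prop_ns + t_rx_proc_ns
                     + t_ack_tx_ns + t_prop_ns + t_ack_proc_ns)
    return safety_factor * round_trip_ns


bw = effective_bandwidth_bps(32, 16, 128 / 130)
timer = replay_timer_ns(4096, bw, t_prop_ns=10, t_rx_proc_ns=100, t_ack_proc_ns=50)
print(f"{bw / 1e9:.2f} Gbps, REPLAY_TIMER >= {timer:.0f} ns")
```

Because the code uses the exact ACK transmit time (about 0.1 ns) instead of the 1 ns used in the hand calculation, its result comes out a few nanoseconds below the rounded figure.
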
The 12-bit sequence number (0-4095) wraps around. To correctly compare sequence numbers across wrap boundaries, modular arithmetic is essential.

Modular Comparison Algorithm:

    function compare_seq(A, B):
        // Returns: -1 if A < B, 0 if A == B, +1 if A > B
        diff = (A - B) mod 4096
        if diff == 0:
            return 0        // Equal
        else if diff <= 2047:
            return +1       // A > B (A is ahead of B)
        else:
            return -1       // A < B (A is behind B)

Example: A = 10, B = 4090

    diff = (10 - 4090) mod 4096 = -4080 mod 4096 = 16
    16 <= 2047, so A > B

Interpretation: Seq 10 is "ahead" of Seq 4090 (after wrap).
This is correct: 4090, 4091, 4092, 4093, 4094, 4095, 0, 1, ..., 10

Outstanding TLP Count:

    Outstanding = (NEXT_TX_SEQ - ACKD_SEQ - 1) mod 4096

Example: NEXT_TX_SEQ = 100, ACKD_SEQ = 90

    Outstanding = (100 - 90 - 1) mod 4096 = 9
    (TLPs 91-99 are outstanding)

Example with wrap: NEXT_TX_SEQ = 5, ACKD_SEQ = 4090

    Outstanding = (5 - 4090 - 1) mod 4096 = -4086 mod 4096 = 10
    (TLPs 4091-4095, 0-4 are outstanding = 10 TLPs)

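
A direct Python transcription of the two formulas is short enough to check the worked examples. Python's `%` operator already returns a non-negative result for a positive modulus, so no extra correction is needed for negative differences; the function name `outstanding` is an illustrative choice.

```python
SEQ_MOD = 4096  # 12-bit sequence number space


def compare_seq(a, b):
    """Return -1 if a is behind b, 0 if equal, +1 if a is ahead of b (mod 4096)."""
    diff = (a - b) % SEQ_MOD
    if diff == 0:
        return 0
    return 1 if diff <= 2047 else -1


def outstanding(next_tx_seq, ackd_seq):
    """Number of transmitted TLPs not yet acknowledged."""
    return (next_tx_seq - ackd_seq - 1) % SEQ_MOD


print(compare_seq(10, 4090))   # 10 is ahead of 4090 after the wrap
print(outstanding(5, 4090))    # TLPs 4091-4095 and 0-4
```
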

DLLP CRC-16 Specification:
═══════════════════════════════════════════════════════════════

    Polynomial:   x^16 + x^12 + x^3 + x + 1
    Coefficients: 0x100B
    Initial:      0xFFFF
    Input:        First 4 bytes of DLLP (Type + Fields)
    Output:       16-bit CRC appended as bytes 4-5

Pseudocode:

    function dllp_crc16(data[0..3]):
        crc = 0xFFFF
        for byte in data[0..3]:
            for bit in 0..7:
                if (crc XOR byte) & 0x0001:
                    crc = (crc >> 1) XOR 0xD008   // Reflected form of 0x100B
                else:
                    crc = crc >> 1
                byte = byte >> 1
        return crc XOR 0xFFFF   // Final complement

Verification:

    The receiver recomputes the CRC over bytes 0-3 and compares it with the
    received CRC field. On a mismatch, the DLLP has an error and is discarded.
    The specification also defines how the 16 result bits map onto bytes 4-5;
    that mapping is not shown here.

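
The bit-serial loop in the pseudocode can be written as a small generic routine. The sketch below keeps the polynomial as a parameter rather than hard-coding it. The `DLLP_POLY` value of 0x100B is the DLLP polynomial as given in the PCIe Base Specification; treat it as an assumption to verify against your spec revision. The helper names are illustrative, and the specification's mapping of CRC result bits onto DLLP bytes 4-5 is not modeled. The engine is sanity-checked against the widely published CRC-16/X-25 check value, which uses the same structure with a different polynomial.

```python
def reflect16(value):
    """Reverse the bit order of a 16-bit value."""
    return int(f"{value:016b}"[::-1], 2)


def crc16_lsb_first(data, poly_reflected, init=0xFFFF, xorout=0xFFFF):
    """Bit-serial CRC-16, processing bit 0 of each byte first."""
    crc = init
    for byte in data:
        for _ in range(8):
            if (crc ^ byte) & 1:
                crc = (crc >> 1) ^ poly_reflected
            else:
                crc >>= 1
            byte >>= 1
    return crc ^ xorout


DLLP_POLY = 0x100B                          # assumed DLLP polynomial
DLLP_POLY_REFLECTED = reflect16(DLLP_POLY)  # 0xD008

# Sanity check of the engine with the CRC-16/X-25 parameters (poly 0x1021).
assert crc16_lsb_first(b"123456789", reflect16(0x1021)) == 0x906E

dllp_body = bytes([0x00, 0x00, 0x00, 0x35])  # hypothetical 4-byte DLLP body
print(hex(crc16_lsb_first(dllp_body, DLLP_POLY_REFLECTED)))
```
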

FC Initialization Sequence Diagram:
═══════════════════════════════════════════════════════════════

    Port A                                         Port B
    │                                                   │
    │──── LTSSM enters L0 ─────────────────────────────►│
    │                                                   │
    │             Phase 1: InitFC1 Exchange             │
    │                                                   │
    │───────── InitFC1-P (VC0, HdrFC, DataFC) ─────────►│
    │───────── InitFC1-NP (VC0, HdrFC, DataFC) ────────►│
    │───────── InitFC1-Cpl (VC0, HdrFC, DataFC) ───────►│
    │                                                   │
    │◄──────── InitFC1-P (VC0, HdrFC, DataFC) ──────────│
    │◄──────── InitFC1-NP (VC0, HdrFC, DataFC) ─────────│
    │◄──────── InitFC1-Cpl (VC0, HdrFC, DataFC) ────────│
    │                                                   │
    │       InitFC1 received for all enabled VCs        │
    │                                                   │
    │             Phase 2: InitFC2 Exchange             │
    │                                                   │
    │───────── InitFC2-P (VC0, HdrFC, DataFC) ─────────►│
    │───────── InitFC2-NP (VC0, HdrFC, DataFC) ────────►│
    │───────── InitFC2-Cpl (VC0, HdrFC, DataFC) ───────►│
    │                                                   │
    │◄──────── InitFC2-P (VC0, HdrFC, DataFC) ──────────│
    │◄──────── InitFC2-NP (VC0, HdrFC, DataFC) ─────────│
    │◄──────── InitFC2-Cpl (VC0, HdrFC, DataFC) ────────│
    │                                                   │
    │           FC Init Complete for all VCs            │
    │                                                   │
    │───────── DLCMSM → DL_Active ─────────────────────►│
    │◄──────── DL_Up Status ────────────────────────────│
    │                                                   │
    │              TLP Exchange Can Begin               │
    │                                                   │

Rules:

- InitFC1 for each VC must be sent at least twice
- InitFC2 sent after receiving InitFC1 for all VCs
- InitFC2 also sent at least twice per VC
- FC Init complete when InitFC2 received for all enabled VCs
- Total time: typically 20-100 μs depending on link speed

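
The two-phase handshake can be tracked with a very small state machine. The sketch below follows only the receive-side rules listed above, for a single VC, as seen by one port. The class name, the state labels `FC_INIT1`, `FC_INIT2`, and `DL_ACTIVE`, and the choice to ignore DLLPs that do not belong to the current phase are simplifying assumptions. The transmit side and credit bookkeeping are left out.

```python
FC_TYPES = frozenset({"P", "NP", "Cpl"})


class FcInitTracker:
    """Tracks FC initialization for one VC, as seen by one port."""

    def __init__(self):
        self.state = "FC_INIT1"
        self._seen = set()  # FC types received so far in the current phase

    def on_dllp(self, kind, fc_type):
        """kind is 'InitFC1' or 'InitFC2'; fc_type is 'P', 'NP' or 'Cpl'."""
        expected = {"FC_INIT1": "InitFC1", "FC_INIT2": "InitFC2"}.get(self.state)
        if kind == expected and fc_type in FC_TYPES:
            self._seen.add(fc_type)
            if self._seen == FC_TYPES:
                # All three credit types seen: move on to the next phase.
                self._seen = set()
                self.state = "FC_INIT2" if self.state == "FC_INIT1" else "DL_ACTIVE"
        return self.state
```

Repeated DLLPs of the same type are harmless here because the tracker only records which types have been seen, which matches the "sent at least twice" rule.
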

256-Byte Flit Structure:
═══════════════════════════════════════════════════════════════

    ┌───────────┬─────────────┬───────────────┬───────────────┐
    │ Bytes 0-1 │ Bytes 2-235 │ Bytes 236-249 │ Bytes 250-255 │
    │ Flit Hdr  │ TLP Payload │ DLP Section   │ FEC + CRC     │
    └───────────┴─────────────┴───────────────┴───────────────┘

Flit Header (2 bytes):

    ┌────────┬────────────┬────────────┬──────────────────────┐
    │ Bit 15 │ Bits 14:12 │ Bits 11:8  │ Bits 7:0             │
    │ TLP    │ Flit Type  │ First Byte │ Flit Sequence Number │
    │ Start  │            │ Offset     │ (8-bit)              │
    └────────┴────────────┴────────────┴──────────────────────┘

Flit Types:

    000 = Payload Flit (contains TLP data)
    001 = NOP.Empty Flit (idle, no TLP data)
    010 = NOP.Debug Flit (debug information)
    011 = Reserved
    1xx = NOP.Vendor Flit

DLP Section (14 bytes):

    ┌─────────────┬───────────────────────────────────────────┐
    │ Bytes 0-1   │ Explicit ACK Sequence Number              │
    │ Bytes 2-3   │ Explicit NAK Sequence Number (or flags)   │
    │ Bytes 4-9   │ Flow Control Credits (P, NP, Cpl for VC0) │
    │ Bytes 10-11 │ Additional FC or flags                    │
    │ Bytes 12-13 │ DLP CRC (16-bit)                          │
    └─────────────┴───────────────────────────────────────────┘

FEC + CRC (6 bytes):

- 6 bytes of Reed-Solomon FEC parity
- Can correct up to 3 symbol errors per Flit
- CRC embedded in last byte for overall integrity


    TX                                          RX
    │                                            │
    │──── TLP(Seq=50) ──────────────────────────►│ LCRC OK, accepted
    │──── TLP(Seq=51) ───X (bit error) ─────────►│ LCRC FAIL, discarded
    │──── TLP(Seq=52) ──────────────────────────►│ Gap detected, discarded
    │                                            │
    │◄───────────────────── Nak(50) ─────────────│ Nak carries NEXT_RCV_SEQ - 1
    │                                            │
    │ Replay from 51:                            │
    │──── TLP(Seq=51) ──────────────────────────►│ OK
    │──── TLP(Seq=52) ──────────────────────────►│ OK (accepted; first copy was discarded)
    │──── TLP(Seq=53) ──────────────────────────►│ OK (new)
    │                                            │
    │◄───────────────────── Ack(53) ─────────────│


    TX                                          RX
    │                                            │
    │──── TLP(Seq=100) ─────────────────────────►│ OK
    │──── TLP(Seq=101) ─────────────────────────►│ OK
    │──── TLP(Seq=102) ─────────────────────────►│ OK
    │                                            │
    │◄─────────── Ack(102) ───X (lost) ──────────│
    │                                            │
    │ REPLAY_TIMER expires                       │
    │ REPLAY_NUM = 1                             │
    │                                            │
    │──── TLP(Seq=100) ─────────────────────────►│ Duplicate, discard
    │──── TLP(Seq=101) ─────────────────────────►│ Duplicate, discard
    │──── TLP(Seq=102) ─────────────────────────►│ Duplicate, discard
    │                                            │
    │◄───────────────────── Ack(102) ────────────│ Re-send ACK
    │                                            │
    │ REPLAY_NUM reset to 0                      │


    TX                                          RX
    │                                            │
    │──── TLP(Seq=200) ───X─────────────────────►│ LCRC FAIL
    │◄───────────────────── Nak(199) ────────────│ Nak carries NEXT_RCV_SEQ - 1
    │ REPLAY_NUM = 1                             │
    │──── TLP(Seq=200) ───X─────────────────────►│ LCRC FAIL
    │◄───────────────────── Nak(199) ────────────│
    │ REPLAY_NUM = 2                             │
    │──── TLP(Seq=200) ───X─────────────────────►│ LCRC FAIL
    │◄───────────────────── Nak(199) ────────────│
    │ REPLAY_NUM = 3                             │
    │──── TLP(Seq=200) ───X─────────────────────►│ LCRC FAIL
    │◄───────────────────── Nak(199) ────────────│
    │ REPLAY_NUM = 4                             │
    │                                            │
    │ REPLAY_NUM > REPLAY_NUM_ROLLOVER (3)       │
    │                                            │
    │  ════════════════════════════════════════  │
    │         Enter LTSSM Recovery State         │
    │          Attempt Link Retraining           │
    │  ════════════════════════════════════════  │
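
The escalation rule in this last scenario is a counter with a threshold. A minimal sketch, assuming a hypothetical `ReplayCounter` helper and the rollover value of 3 used in the diagram:

```python
class ReplayCounter:
    """Counts consecutive replays and flags when link retraining is needed."""

    def __init__(self, rollover=3):
        self.rollover = rollover  # REPLAY_NUM_ROLLOVER
        self.replay_num = 0       # REPLAY_NUM

    def on_replay(self):
        """Count one replay; return True when the link should enter Recovery."""
        self.replay_num += 1
        return self.replay_num > self.rollover

    def on_forward_progress(self):
        """An ACK purged at least one TLP: the link is making progress again."""
        self.replay_num = 0


counter = ReplayCounter()
print([counter.on_replay() for _ in range(4)])  # only the fourth replay escalates
```

Resetting the counter on forward progress is what separates a persistently broken link from one that only sees occasional bit errors.
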