The Physical Layer is the foundation of PCIe communication, responsible for the actual transmission and reception of bits over the physical medium. It handles electrical signaling, encoding/decoding, serialization, link training, and maintaining the physical link.
Important Clarification: What Physical Layer Does NOT Do
Enumeration is NOT part of the Physical Layer. Enumeration is a software/configuration process that uses Configuration Read/Write TLPs through the Transaction Layer to discover devices, assign bus numbers, and configure BARs. The Physical Layer only provides the electrical connection and trained link.
Electrical Idle: Low-power signaling state detection
Receiver Detection: Detect presence of far-end device
3. Encoding Schemes
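┌────────────┬────────────┬───────────────┬────────────┬────────────────┬───────────┐
│ Generation │ Data Rate  │ Encoding      │ Efficiency │ Bandwidth/Lane │ Signaling │
├────────────┼────────────┼───────────────┼────────────┼────────────────┼───────────┤
│ 1.x        │ 2.5 GT/s   │ 8b/10b        │ 80%        │ 250 MB/s       │ NRZ       │
│ 2.x        │ 5.0 GT/s   │ 8b/10b        │ 80%        │ 500 MB/s       │ NRZ       │
│ 3.x        │ 8.0 GT/s   │ 128b/130b     │ 98.5%      │ ~985 MB/s      │ NRZ       │
│ 4.0        │ 16.0 GT/s  │ 128b/130b     │ 98.5%      │ ~1969 MB/s     │ NRZ       │
│ 5.0        │ 32.0 GT/s  │ 128b/130b     │ 98.5%      │ ~3938 MB/s     │ NRZ       │
│ 6.0        │ 64.0 GT/s  │ 1b/1b + Flit  │ ~94%       │ ~7529 MB/s     │ PAM4      │
│ 7.0        │ 128.0 GT/s │ 1b/1b + Flit  │ ~94%       │ ~15059 MB/s    │ PAM4      │
└────────────┴────────────┴───────────────┴────────────┴────────────────┴───────────┘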
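The per-lane bandwidth figures for the 8b/10b and 128b/130b generations follow directly from the data rate and the encoding ratio. The sketch below reproduces that arithmetic; the function name and parameters are illustrative, and it deliberately ignores packet and Flit overhead.

```python
def lane_bandwidth_mb_s(rate_gt_s: float, payload_bits: int, coded_bits: int) -> float:
    """Usable one-direction bandwidth of a single lane in MB/s (10^6 bytes/s)."""
    raw_bits_per_s = rate_gt_s * 1e9                       # one transfer = one bit per lane
    usable_bits_per_s = raw_bits_per_s * payload_bits / coded_bits
    return usable_bits_per_s / 8 / 1e6                     # bits -> bytes -> MB

gen1 = lane_bandwidth_mb_s(2.5, 8, 10)      # 8b/10b at 2.5 GT/s
gen3 = lane_bandwidth_mb_s(8.0, 128, 130)   # 128b/130b at 8.0 GT/s
print(f"Gen1: {gen1:.0f} MB/s, Gen3: {gen3:.1f} MB/s")
```

Multiplying the result by the link width (x4, x16, and so on) gives the raw link bandwidth in one direction.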
8b/10b Encoding (Gen1-2)
// 8b/10b Encoding Overview
Purpose:
- Ensure DC balance (equal number of 1s and 0s over time)
- Provide special control characters (K-codes)
- Enable clock recovery (sufficient transitions)
Mechanism:
- 8 data bits encoded to 10 symbol bits
- 20% overhead (only 80% efficiency)
- Two versions of each symbol: RD+ and RD- (Running Disparity)
- Transmitter alternates to maintain DC balance
Special K-Codes (Control Characters):
┌─────────────┬─────────────┬───────────────────────────────┐
│ Symbol      │ Encoding    │ Purpose                       │
├─────────────┼─────────────┼───────────────────────────────┤
│ K28.5 (COM) │ 001111_1010 │ Comma - byte/symbol alignment │
│ K28.0 (SKP) │ 001111_0100 │ Skip - clock compensation     │
│ K28.1 (FTS) │ 001111_1001 │ Fast Training Sequence        │
│ K28.2 (SDP) │ 001111_0101 │ Start DLLP                    │
│ K28.3 (IDL) │ 001111_0011 │ Idle                          │
│ K28.4 (--)  │ 001111_0010 │ Reserved                      │
│ K28.6 (--)  │ 001111_0110 │ Reserved                      │
│ K28.7 (EIE) │ 001111_1000 │ Electrical Idle Exit          │
│ K27.7 (STP) │ 110110_1000 │ Start TLP                     │
│ K29.7 (END) │ 101110_1000 │ End Good                      │
│ K30.7 (EDB) │ 011110_1000 │ End Bad (nullified TLP)       │
│ K23.7 (PAD) │ 111010_1000 │ Padding                       │
└─────────────┴─────────────┴───────────────────────────────┘
Running Disparity (RD):
RD+ = More 1s than 0s sent recently
RD- = More 0s than 1s sent recently
Symbol chosen to move RD toward neutral
Disparity error = link error indication
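The running-disparity rule can be sketched in a few lines. This is a simplified model, assuming disparity is evaluated over the whole 10-bit code group (real encoders track it per 6-bit and 4-bit sub-block); the function names are illustrative. The code groups used are the ones listed in the K-code table above.

```python
def disparity(code_group: str) -> int:
    """Number of 1s minus number of 0s in a 10-bit code group such as '001111_1010'."""
    bits = code_group.replace("_", "")
    if len(bits) != 10 or set(bits) - {"0", "1"}:
        raise ValueError("expected a 10-bit binary code group")
    return bits.count("1") - bits.count("0")

def next_rd(current_rd: int, code_group: str) -> int:
    """Return the new running disparity (-1 or +1), or raise on a disparity error."""
    d = disparity(code_group)
    if d == 0:
        return current_rd                      # balanced group leaves RD unchanged
    if d not in (-2, 2):
        raise ValueError("invalid code group: disparity must be 0 or +/-2")
    if (d > 0) == (current_rd > 0):
        raise ValueError("disparity error: group pushes RD further from neutral")
    return -current_rd                         # unbalanced group flips RD

rd = next_rd(-1, "001111_1010")   # K28.5 has six 1s, so RD- becomes RD+
print(rd)
```

A receiver that sees the same unbalanced form twice in a row, as in `next_rd(+1, "001111_1010")`, would flag a disparity error.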
128b/130b Encoding (Gen3-5)
// 128b/130b Block Encoding
Structure:
┌────────────┬────────────────────────────────────────────────────┐
│ Sync Header│ Payload (128 bits) │
│ (2 bits) │ 16 bytes of data │
└────────────┴────────────────────────────────────────────────────┘
Sync Header Values:
01 = Data Block (contains data symbols)
10 = Ordered Set Block (contains OS)
00, 11 = Invalid (used for error detection)
Efficiency: 128/130 = 98.46%
No Running Disparity:
- DC balance achieved through scrambling
- LFSR scrambler provides pseudo-random bit distribution
- Sync headers NOT scrambled
Block Alignment:
- Receiver uses sync header pattern to find block boundaries
- 01/10 alternation helps alignment
- EIEOS used for initial alignment
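As a small illustration of the block structure, the sketch below splits a 130-bit block into its sync header and payload and classifies it. It uses the header values exactly as listed above and represents bits as a string in the order written there; both the representation and the function name are assumptions for illustration.

```python
BLOCK_BITS = 130
SYNC_KINDS = {"01": "data", "10": "ordered_set"}   # values as listed in this section

def split_block(block_bits: str) -> tuple[str, str]:
    """Return (kind, payload_bits) for one 130-bit block given as a '0'/'1' string."""
    if len(block_bits) != BLOCK_BITS:
        raise ValueError("a 128b/130b block is exactly 130 bits")
    header, payload = block_bits[:2], block_bits[2:]
    kind = SYNC_KINDS.get(header, "invalid")       # 00 and 11 indicate an error
    return kind, payload

kind, payload = split_block("01" + "0" * 128)
efficiency = len(payload) / BLOCK_BITS
print(kind, f"{efficiency:.2%}")
```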
PAM4 Signaling (Gen6-7)
PAM4 (Pulse Amplitude Modulation, 4-Level)
// PAM4 Voltage Levels
4 voltage levels encode 2 bits per symbol (UI):
Voltage Symbol Gray Code
─────── ────── ─────────
+V_high ───── Level 3 ─── 10
+V_mid ───── Level 2 ─── 11
-V_mid ───── Level 1 ─── 01
-V_low ───── Level 0 ─── 00
Gray Coding:
Adjacent levels differ by only 1 bit
Single-level error → single-bit error (not 2-bit)
Eye Diagram:
NRZ: 1 eye opening
PAM4: 3 eye openings (smaller each)
NRZ eye height ≈ 3× PAM4 eye height
→ ~9.5 dB SNR penalty
Compensation:
- Forward Error Correction (FEC) required
- More sophisticated equalization
- Tighter jitter/noise budgets
Data Rate Calculation:
PAM4: Data Rate (GT/s) = 2 bits/symbol × Symbol Rate (GBaud)
PCIe 6.0: 64 GT/s = 2 × 32 GBaud
PCIe 7.0: 128 GT/s = 2 × 64 GBaud
For a given data rate, the symbol rate is half that of NRZ because each symbol carries two bits
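The Gray-coding property and the eye-height penalty can both be checked with a short sketch. The mapping below assigns 00, 01, 11, 10 to levels 0 through 3 so that adjacent levels differ in exactly one bit; the order of the two bits within each pair and the helper names are assumptions for illustration.

```python
import math

GRAY_TO_LEVEL = {"00": 0, "01": 1, "11": 2, "10": 3}   # adjacent levels differ by 1 bit

def pam4_encode(bits: str) -> list[int]:
    """Map a bit string to PAM4 levels, two bits per symbol."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [GRAY_TO_LEVEL[bits[i:i + 2]] for i in range(0, len(bits), 2)]

def eye_penalty_db(levels: int = 4) -> float:
    """Eye-height penalty relative to NRZ at equal swing: 20*log10(levels - 1)."""
    return 20 * math.log10(levels - 1)

def symbol_rate_gbaud(rate_gt_s: float, bits_per_symbol: int = 2) -> float:
    return rate_gt_s / bits_per_symbol

print(pam4_encode("00011110"), f"{eye_penalty_db():.2f} dB", symbol_rate_gbaud(128))
```

With this mapping, a receiver that mistakes a level for its neighbor corrupts only one of the two bits in that symbol.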
4. Packet Framing
Non-Flit Mode (Gen1-5)
// TLP Framing (8b/10b and 128b/130b)
8b/10b Framing (Gen1-2):
┌─────┬──────────────────────────────────────────┬─────┐
│ STP │                   TLP                    │ END │
│K27.7│ Header + Data + ECRC (DLL adds Seq+LCRC) │K29.7│
└─────┴──────────────────────────────────────────┴─────┘
STP = Start TLP (K27.7)
END = End Good (K29.7) or EDB = End Bad (K30.7)
DLLP Framing (Gen1-2):
┌─────┬──────────────────┬─────┐
│ SDP │ DLLP │ END │
│K28.2│ 4 bytes + CRC16 │K29.7│
└─────┴──────────────────┴─────┘
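128b/130b Framing (Gen3-5):
Data blocks (sync=01) carry framing tokens in place of K-codes:
- STP token (4 symbols): Start of TLP; carries the TLP length, the sequence number, and a frame CRC/parity
- SDP token (2 symbols): Start of DLLP
- EDB token (4 symbols): End Bad, sent after a nullified TLP
- EDS token (4 symbols): End of Data Stream; the next block is an Ordered Set Block
- IDL token (1 symbol): Logical Idle
There is no END token: the length field in the STP token tells the receiver where the TLP ends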
Flit Mode (Gen6-7)
// Flit Mode Framing (256-byte fixed Flits)
┌──────────────────────────────────────────────────────────────────────┐
│                            256-Byte Flit                             │
├──────────────────────────────────┬───────────┬───────────┬───────────┤
│ TLP Payload Area                 │ DLP       │ CRC       │ FEC       │
│ (236 bytes)                      │ (6 bytes) │ (8 bytes) │ (6 bytes) │
└──────────────────────────────────┴───────────┴───────────┴───────────┘
TLP Payload Area:
- Can contain multiple small TLPs
- Can contain partial large TLP (continued in next Flit)
- Flit-mode TLP headers carry optional fields as OHC (Orthogonal Header Content)
DLP (Data Link Payload):
- Flit control information:
- Flit Type (Payload/NOP/IDE)
- Flit Sequence Number (10-bit)
- First TLP Byte Offset
- ACK/NAK information
- Flow control credits
- Replaces separate DLLPs
CRC + FEC:
- 8-byte CRC protects the TLP and DLP bytes
- 6-byte FEC, organized as three interleaved 2-byte groups
- FEC corrects most errors without a retry; errors it cannot correct are caught by the CRC and trigger a Flit replay
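To make the packing behavior concrete, the sketch below fills 236-byte TLP payload areas with a stream of TLPs, letting a large TLP spill into the next Flit. It is a simplified model: it ignores DW alignment, NOP fill rules, and header formats, and the function name is illustrative.

```python
FLIT_TLP_BYTES = 236   # TLP payload area of one 256-byte Flit

def pack_tlps(tlp_sizes: list[int]) -> list[list[tuple[int, int]]]:
    """Return one list per Flit of (tlp_index, bytes_placed) entries."""
    flits: list[list[tuple[int, int]]] = []
    current: list[tuple[int, int]] = []
    room = FLIT_TLP_BYTES
    for index, size in enumerate(tlp_sizes):
        remaining = size
        while remaining:
            take = min(remaining, room)
            current.append((index, take))       # whole TLP or a fragment of it
            remaining -= take
            room -= take
            if room == 0:                       # Flit is full, start the next one
                flits.append(current)
                current, room = [], FLIT_TLP_BYTES
    if current:
        flits.append(current)                   # last Flit is only partly filled
    return flits

layout = pack_tlps([16, 16, 300])
print(layout)
```

Here two 16-byte TLPs share the first Flit with the first 204 bytes of the 300-byte TLP, and the remaining 96 bytes continue in the second Flit.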
5. Scrambling
Why Scrambling?
Spreads signal energy across frequency spectrum (reduces EMI)
Provides DC balance (for 128b/130b mode)
Ensures sufficient bit transitions for CDR
Eliminates problematic repeated patterns
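// LFSR Scrambler Polynomials
Gen1-2 (8b/10b):
Polynomial: G(X) = X^16 + X^5 + X^4 + X^3 + 1
Initial seed: FFFFh
Data characters outside Ordered Sets XORed with LFSR output; K characters are not scrambled
Gen3-5 (128b/130b):
Polynomial: G(X) = X^23 + X^21 + X^16 + X^8 + X^5 + X^2 + 1
Initial seed: a per-lane 23-bit value (eight seeds, selected by lane number modulo 8)
Sync headers (2 bits) NOT scrambled
Data Block payload (128 bits) XORed with LFSR output
Scrambler Reset:
- 8b/10b: COM symbol resets the LFSR to its seed
- 128b/130b: EIEOS resets the LFSR to the lane's seed
- Allows receiver to resynchronize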
Compliance Pattern:
Special unscrambled pattern for testing
Entered via Polling.Compliance state
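The XOR-with-LFSR idea can be sketched as follows. This is a generic Fibonacci LFSR built from the 23-bit polynomial given above; the tap convention, the bit ordering within each byte, and the seed `0x123456` are assumptions chosen for illustration and have not been checked against the specification's reference vectors.

```python
MASK_23 = (1 << 23) - 1
TAPS = (23, 21, 16, 8, 5, 2)   # exponents of G(X), excluding the constant term

class Lfsr23:
    def __init__(self, seed: int):
        if not 0 < seed <= MASK_23:
            raise ValueError("seed must be a non-zero 23-bit value")
        self.state = seed

    def next_bit(self) -> int:
        out = (self.state >> 22) & 1                  # output the top bit
        feedback = 0
        for tap in TAPS:
            feedback ^= (self.state >> (tap - 1)) & 1
        self.state = ((self.state << 1) | feedback) & MASK_23
        return out

    def next_byte(self) -> int:
        value = 0
        for bit_pos in range(8):                      # assumed LSB-first packing
            value |= self.next_bit() << bit_pos
        return value

def scramble(payload: bytes, seed: int) -> bytes:
    """XOR the payload with the LFSR keystream; the same call descrambles."""
    lfsr = Lfsr23(seed)
    return bytes(b ^ lfsr.next_byte() for b in payload)

data = bytes(range(16))                               # one 128-bit block payload
scrambled = scramble(data, 0x123456)
print(scramble(scrambled, 0x123456) == data)
```

Because scrambling is a plain XOR, the receiver recovers the data by running an identical LFSR from the same starting state, which is why both ends must reset the scrambler at the same point in the stream.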
6. Ordered Sets - Complete Specification
Ordered Sets are special symbol sequences used for link training, synchronization, and management. They are NOT TLPs or DLLPs - they are Physical Layer constructs.
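┌─────────────┬─────────────────────────┬────────────────────────────┬──────────────────────────────────────────┬──────────────────────────────────┐
│ Ordered Set │ Full Name               │ Length                     │ Purpose                                  │ LTSSM States                     │
├─────────────┼─────────────────────────┼────────────────────────────┼──────────────────────────────────────────┼──────────────────────────────────┤
│ TS1         │ Training Sequence 1     │ 16 symbols                 │ Initial link training, speed negotiation │ Polling, Recovery, Configuration │
│ TS2         │ Training Sequence 2     │ 16 symbols                 │ Link configured acknowledgment           │ Polling, Recovery, Configuration │
│ SKP         │ Skip                    │ 4 symbols (8b/10b), varies │ Clock compensation                       │ L0                               │
│ EIEOS       │ Electrical Idle Exit OS │ 16 symbols                 │ Exit from Electrical Idle                │ Exit from L0s/L1/L2              │
│ EIOS        │ Electrical Idle OS      │ 4 symbols (8b/10b)         │ Entry to Electrical Idle                 │ Entry to L0s/L1/L2               │
│ FTS         │ Fast Training Sequence  │ 4 symbols (8b/10b)         │ Quick exit from L0s                      │ L0s exit                         │
│ SDS         │ Start of Data Stream    │ 16 symbols (128b/130b)     │ Transition to data mode                  │ End of training                  │
└─────────────┴─────────────────────────┴────────────────────────────┴──────────────────────────────────────────┴──────────────────────────────────┘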
TS1/TS2 Ordered Set Format
// TS1 Ordered Set Structure (16 symbols)
Symbol 0: COM (K28.5) - Comma for alignment
Symbol 1: Link Number - PAD (K23.7) = not yet assigned
Symbol 2: Lane Number - PAD (K23.7) = not yet assigned
Symbol 3: N_FTS - Number of FTS required for L0s exit
Symbol 4: Data Rate ID - Supported/selected data rate
Symbol 5: Training Ctrl - Control bits (Hot Reset, Loopback, etc.)
Symbol 6-9: EQ Ctrl - Equalization settings (Gen3+)
Symbol 10-13: TS1 ID - D10.2 pattern identifies TS1
Symbol 14-15: TS1 ID (8b/10b), or TS1 ID / DC-balance symbols (128b/130b)
// TS2 Ordered Set Structure (16 symbols)
Same as TS1 except:
Symbol 10-13: TS2 ID - D5.2 pattern identifies TS2
Key Fields:
Data Rate ID (Symbol 4):
Bit 0: Reserved
Bit 1: 2.5 GT/s supported (always set)
Bit 2: 5.0 GT/s supported
Bit 3: 8.0 GT/s supported
Bit 4: 16.0 GT/s supported
Bit 5: 32.0 GT/s supported
Bit 6: Autonomous Change / Selectable De-emphasis
Bit 7: speed_change (request to change data rate)
Training Control (Symbol 5):
Bit 0: Hot Reset
Bit 1: Disable Link
Bit 2: Loopback
Bit 3: Disable Scrambling
Bit 4: Compliance Receive
Bits 5-7: Reserved
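Decoding the Training Control symbol is a plain bit-field exercise. The sketch below follows the bit assignments listed above for Symbol 5; the class and function names are illustrative.

```python
from enum import IntFlag

class TrainingControl(IntFlag):
    """Bit assignments of TS1/TS2 Symbol 5 as listed above."""
    HOT_RESET = 1 << 0
    DISABLE_LINK = 1 << 1
    LOOPBACK = 1 << 2
    DISABLE_SCRAMBLING = 1 << 3
    COMPLIANCE_RECEIVE = 1 << 4

def decode_training_control(symbol5: int) -> list[str]:
    """Return the names of all control bits set in the Training Control symbol."""
    if not 0 <= symbol5 <= 0xFF:
        raise ValueError("Training Control is a single 8-bit symbol")
    return [flag.name for flag in TrainingControl if symbol5 & flag]

print(decode_training_control(0b0000_0101))
```

A TS1 with only bit 0 set is what a Downstream Port sends to request a Hot Reset.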
SKP Ordered Set
// SKP (Skip) Ordered Set
Purpose:
Compensate for clock frequency differences between Tx and Rx
Each clock may deviate by up to ±300 ppm from nominal, so the two can differ by up to 600 ppm
8b/10b SKP (Gen1-2):
COM + SKP + SKP + SKP (4 symbols)
Receiver can add or remove SKP symbols as needed
128b/130b SKP (Gen3-5):
Ordered Set Block containing SKP pattern
SKP_OS contains identifier + SKP symbols
Scheduling (8b/10b):
SKP must be sent every 1180-1538 symbol times
~4.7-6.2 μs interval at 2.5 GT/s (4 ns per symbol)
Scheduling (128b/130b):
SKP_OS sent every ~370 blocks
Interval: 370 × 130 bits ÷ data rate
Receiver Behavior:
- Elastic buffer absorbs timing differences
- Add SKP if buffer too empty
- Remove SKP if buffer too full
- SKP removal/addition transparent to upper layers
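The SKP scheduling limits translate into wall-clock intervals once the symbol or block duration is known. The sketch below does that conversion, assuming 10 bits per 8b/10b symbol and 130 bits per 128b/130b block; the function names are illustrative.

```python
def skp_interval_us_8b10b(symbols: int, rate_gt_s: float) -> float:
    """Time in microseconds to send `symbols` 10-bit symbols on one lane."""
    return symbols * 10 / (rate_gt_s * 1e9) * 1e6

def skp_interval_us_128b130b(blocks: int, rate_gt_s: float) -> float:
    """Time in microseconds to send `blocks` 130-bit blocks on one lane."""
    return blocks * 130 / (rate_gt_s * 1e9) * 1e6

low = skp_interval_us_8b10b(1180, 2.5)       # shortest allowed gap at 2.5 GT/s
high = skp_interval_us_8b10b(1538, 2.5)      # longest allowed gap at 2.5 GT/s
gen3 = skp_interval_us_128b130b(370, 8.0)    # ~370 blocks at 8.0 GT/s
print(f"{low:.2f}-{high:.2f} us, {gen3:.2f} us")
```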
Other Ordered Sets
// EIEOS (Electrical Idle Exit Ordered Set)
Pattern: Alternating 00h and FFh bytes (creates a low-frequency pattern: 8 zeros, then 8 ones)
Length: 16 symbols
Used to: Re-establish bit lock after Electrical Idle
Precedes: TS1 when exiting low-power state
// EIOS (Electrical Idle Ordered Set)
8b/10b: COM + IDL + IDL + IDL (4 symbols)
Signals: Transmitter entering Electrical Idle
Followed by: Transmitter drives Electrical Idle level
// FTS (Fast Training Sequence)
8b/10b: COM + FTS + FTS + FTS (4 symbols)
Used for: Quick L0s exit (minimal re-training)
N_FTS field in TS1/TS2 indicates how many needed
// SDS (Start of Data Stream)
Indicates: Training complete, data transmission beginning
128b/130b: Special SDS pattern in OS block
Followed by: First data block
7. LTSSM (Link Training and Status State Machine)
The LTSSM is the heart of Physical Layer operation, controlling link initialization, training, power states, and error recovery.
Detect.Quiet:
- Initial state after reset/power-on
- Transmitter drives Electrical Idle
- Wait for internal timer (12ms timeout)
- Or triggered by wake event
Detect.Active:
- Apply voltage change to Tx output
- Measure impedance/current to detect receiver termination
- Receiver Detected:
- Impedance indicates ~50Ω termination present
- Transition to Polling
- No Receiver:
- Return to Detect.Quiet
- Exponential backoff (up to 1 second)
// Detection thresholds vary by speed/implementation
// Must detect receiver within specific current range
Polling State
Polling.Active:
- Transmit TS1 Ordered Sets continuously
- Receiver attempting bit lock (CDR acquisition)
- Look for valid TS1 or TS2 on receive
- Exit to Polling.Configuration: at least 1024 TS1 sent and 8 consecutive TS1/TS2 received
- Timeout: 24ms
Polling.Configuration:
- Transmit TS2 Ordered Sets
- Exit to Configuration:
- Received 8 consecutive TS2 with:
- Link = PAD (not assigned)
- Lane = PAD (not assigned)
- And at least 16 TS2 sent after receiving the first TS2
- Timeout: 48ms → Detect
Polling.Compliance:
- Special test/debug state
- Transmit Compliance Pattern
- Entered if:
- Enter Compliance bit set in received TS1
- Or compliance mode strapping
- Exit by entering Electrical Idle
Polling.Speed:
- Speed change negotiation
- Exchange TS1/TS2 with speed bits
- Prepare for new speed operation
Configuration State
Configuration.Linkwidth.Start:
Downstream Port (Root/Switch DSP):
- Transmit TS1 with Link# = N (proposed)
- Wait for Upstream to accept
Upstream Port:
- Wait for TS1 with Link# ≠ PAD
- Accept proposed configuration
Configuration.Linkwidth.Accept:
- Negotiate which lanes are active
- Handle lane reversal if needed
- Agree on link width (x1, x2, x4, x8, x16, x32)
Configuration.Lanenum.Wait:
- Receive TS1 with Lane numbers assigned
- Verify lane numbering consistency
Configuration.Lanenum.Accept:
- Accept lane number assignment
- Transmit TS2 with accepted lane numbers
Configuration.Complete:
- Both ends transmit TS2
- After receiving 8 TS2 with:
- Link# match
- Lane# match
- Proceed to Configuration.Idle
Configuration.Idle:
- Transmit Logical Idle (or SDS for 128b/130b)
- Enter L0 (normal operation)
L0 - Normal Operation
L0 State Activities
Normal TLP and DLLP exchange
Data Link Layer reports DL_Up
SKP Ordered Sets sent periodically (clock compensation)
Can transition to power states (L0s, L1, L2)
Errors trigger transition to Recovery
Recovery State
Recovery.RcvrLock:
- Re-acquire bit lock
- Transmit TS1 Ordered Sets
- Exit when TS1/TS2 received or timeout
Recovery.RcvrCfg:
- Re-verify configuration
- Similar to Configuration states
- Shorter timeouts than initial Configuration
Recovery.Equalization: (Gen3+)
- Perform link equalization
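- Phase 0: Downstream Port supplies initial Tx presets; Upstream Port starts transmitting with them
- Phase 1: Both ports transmit with initial presets and exchange FS/LF values
- Phase 2: Upstream Port evaluates its Rx and requests Downstream Tx changes
- Phase 3: Downstream Port evaluates its Rx and requests Upstream Tx changes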
Recovery.Speed:
- Handle speed change
- Entered when Changed_Speed_Recovery bit set
- Change to new speed, then Recovery.RcvrLock
Recovery.Idle:
- Final state before returning to L0
- Verify link is stable
- Transmit Idle then enter L0
Power States (L0s, L1, L2)
L0s (Standby):
- Fast entry/exit (~100ns exit latency)
- Per-lane power savings
- Tx in Electrical Idle
- Exit via FTS Ordered Sets
- ASPM controlled
L1 (Low Power):
- Deeper power savings
- Longer exit latency (~2-32μs)
- Common clock can be stopped
- Sub-states: L1.0, L1.1, L1.2 (L1.2 lowest power)
- Exit via EIEOS → Recovery
L2 (Auxiliary Power):
- Near-off state
- Only wake signaling active
- Main power can be removed
- Exit requires full re-training (Detect)
- Used for system suspend states
Disabled and Hot Reset
Disabled:
- Link intentionally disabled
- Entered via software or TS1 with Disable bit
- Transmitter in Electrical Idle
- Exit to Detect when re-enabled
Hot Reset:
- In-band reset mechanism
- Entered via TS1 with Hot Reset bit
- Causes device reset without power cycle
- Downstream sends TS1 with Hot Reset
- Upstream enters Hot Reset state
- After reset complete → Detect
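The top-level state flow described in this section can be summarized as a transition table. The sketch below is a deliberately simplified model: it covers only the top-level states named above, omits substates, timeouts, and Loopback, and the table and function names are illustrative.

```python
# Allowed top-level LTSSM transitions (simplified; substates and timeouts omitted)
LTSSM_TRANSITIONS = {
    "Detect": {"Polling"},
    "Polling": {"Configuration", "Detect"},
    "Configuration": {"L0", "Detect", "Disabled"},
    "L0": {"Recovery", "L0s", "L1", "L2"},
    "L0s": {"L0", "Recovery"},
    "L1": {"Recovery"},
    "L2": {"Detect"},
    "Recovery": {"L0", "Configuration", "Detect", "Hot Reset", "Disabled"},
    "Hot Reset": {"Detect"},
    "Disabled": {"Detect"},
}

def is_valid_path(path: list[str]) -> bool:
    """True if every consecutive pair of states is an allowed transition."""
    return all(
        nxt in LTSSM_TRANSITIONS.get(cur, set())
        for cur, nxt in zip(path, path[1:])
    )

print(is_valid_path(["Detect", "Polling", "Configuration", "L0"]))
print(is_valid_path(["L0", "Detect"]))
```

The second path is rejected because, in this model, a link in L0 that hits an error goes through Recovery first and only falls back to Detect if Recovery fails.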
8. Link Equalization (Gen3+)
Why Equalization is Needed
At 8 GT/s and above, high-frequency signal loss in the channel (PCB traces, connectors, cables) causes inter-symbol interference (ISI). Equalization compensates for this loss to maintain signal integrity.
// Equalization Components
Transmitter (Tx) Equalization:
3-tap FIR filter:
Output = C(-1) × D(n+1) + C(0) × D(n) + C(+1) × D(n-1)
C(-1): Pre-cursor (affects next bit) - boosts high frequency
C(0): Main cursor (current bit amplitude)
C(+1): Post-cursor (affects previous bit) - compensates ISI
Receiver (Rx) Equalization:
CTLE (Continuous Time Linear Equalizer):
- Analog high-pass filter
- Boosts high-frequency components
- Compensates for channel roll-off
DFE (Decision Feedback Equalizer):
- Digital filter using previous bit decisions
- Cancels post-cursor ISI
- 1-5 taps typical
- Non-linear, doesn't amplify noise
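The 3-tap Tx FIR formula can be applied directly to a symbol sequence. The sketch below does so for NRZ symbols of +1/-1; the coefficient values are arbitrary illustrative numbers, not a PCIe preset, and symbols beyond the ends of the sequence are treated as 0.

```python
def tx_fir(symbols: list[float], c_pre: float, c_main: float, c_post: float) -> list[float]:
    """Output[n] = C(-1)*D[n+1] + C(0)*D[n] + C(+1)*D[n-1]."""
    out = []
    last = len(symbols) - 1
    for n, current in enumerate(symbols):
        nxt = symbols[n + 1] if n < last else 0.0    # D[n+1], pre-cursor input
        prev = symbols[n - 1] if n > 0 else 0.0      # D[n-1], post-cursor input
        out.append(c_pre * nxt + c_main * current + c_post * prev)
    return out

# Illustrative coefficients only: negative pre/post taps around a 0.7 main tap
levels = tx_fir([1, 1, 1, -1, -1, -1], c_pre=-0.1, c_main=0.7, c_post=-0.2)
print([round(v, 2) for v in levels])
```

In the middle of a run of identical bits the three taps partly cancel (amplitude 0.4), while the first bit after a transition is sent at higher amplitude (0.8). That boost of transitions relative to steady-state bits is the high-frequency emphasis that offsets channel loss.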
Equalization Phases
// Gen3+ Equalization Phases (in Recovery.Equalization)
Phase 0:
- Downstream Port sends the initial Tx presets to the Upstream Port (in TS2s, before the speed change)
- Upstream Port begins transmitting at the new rate with those presets
Phase 1:
- Both ports transmit with their initial presets
- Each side confirms the link is good enough to exchange TS1s
- Ports advertise their FS (Full Swing) and LF (Low Frequency) values
Phase 2:
- Upstream Port evaluates eye quality at its receiver
- Sends EQ requests to the Downstream Tx ("Increase pre-cursor" / "Decrease post-cursor" etc.)
- Downstream adjusts Tx EQ settings
Phase 3:
- Downstream Port evaluates eye quality at its receiver
- Sends EQ requests to the Upstream Tx
- Upstream adjusts settings
- Final optimization
Presets (P0-P10):
Pre-defined coefficient combinations
Receiver can request specific preset
Speeds up convergence
┌─────────┬───────────────┬──────────────────┐
│ Preset  │ Preshoot (dB) │ De-emphasis (dB) │
├─────────┼───────────────┼──────────────────┤
│ P0      │ 0.0           │ -6.0             │
│ P1      │ 0.0           │ -3.5             │
│ ...     │ ...           │ ...              │
│ P10     │ 0.0           │ Set by LF        │
└─────────┴───────────────┴──────────────────┘
Preshoot is produced by C(-1) and de-emphasis by C(+1); P10 requests the largest de-emphasis the transmitter's LF value allows
9. Lane Management
Lane Reversal
Lane Reversal Support
PCIe supports lane reversal to simplify PCB routing. If lanes are connected in reverse order (Lane 0 to Lane N-1 swapped), the link can still operate.
Detected during Configuration state
Lane numbers in TS1/TS2 indicate reversal
Logical lane remapping performed internally
Polarity Inversion
Polarity Inversion Support
If D+ and D- signals are swapped on a lane, the receiver can invert the polarity.
Detected by checking TS1/TS2 patterns
Inverted data corrected in receiver
Per-lane polarity inversion
Lane-to-Lane Deskew
// Lane Deskew
Problem:
Different lane lengths cause arrival time differences
Data must be aligned across all lanes for parallel processing
Solution:
- Use symbol lock (COM detection) per lane
- Elastic buffers per lane
- Delay the earlier lanes to match the latest-arriving lane
- SKP insertion/removal for continuous alignment
Max Skew:
Specified per generation, and tighter at higher rates
Receiver skew tolerance: 20 ns at 2.5 GT/s, 8 ns at 5.0 GT/s, 6 ns at 8.0 GT/s
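Deskew amounts to delaying every lane until it lines up with the one that arrives last. The sketch below computes the per-lane delay from measured arrival times of the same alignment marker; the function names, the sample arrival times, and the skew budget passed in are illustrative assumptions.

```python
def deskew_delays(arrival_ns: list[float]) -> list[float]:
    """Delay to add to each lane so all lanes align with the latest arrival."""
    latest = max(arrival_ns)
    return [latest - t for t in arrival_ns]

def within_skew_budget(arrival_ns: list[float], max_skew_ns: float) -> bool:
    """True if the spread between earliest and latest lane fits the budget."""
    return max(arrival_ns) - min(arrival_ns) <= max_skew_ns

arrivals = [0.0, 1.5, 0.5, 3.0]          # marker arrival time per lane, in ns
delays = deskew_delays(arrivals)
print(delays, within_skew_budget(arrivals, max_skew_ns=6.0))
```

The latest lane gets zero added delay, so the deskew buffers only ever need to hold as much data as the maximum permitted skew.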
10. Clock Compensation
// Clock Tolerance and Compensation
Clock Tolerance:
Tx and Rx reference clocks may each deviate by ±300 ppm, so they can differ by up to 600 ppm
Without compensation, buffers would overflow/underflow
Mechanism:
SKP Ordered Sets:
- Periodically transmitted in L0
- Receiver can add or remove SKP symbols
- Absorbs clock frequency difference
Elastic Buffer:
- FIFO between CDR and decoder
- Nominal half-full
- SKP added when buffer too empty
- SKP removed when buffer too full
SKP Scheduling:
8b/10b: Every 1180-1538 symbols
128b/130b: Every ~370 blocks (depends on implementation)
Flit Mode: SKP_OS in Flit stream
Calculation:
Worst-case clock diff = 600 ppm = 0.06%
Drift = 1 symbol every 1 / 0.0006 ≈ 1667 symbols
SKP every 1180-1538 symbols (8b/10b) stays ahead of this drift
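The drift arithmetic can be checked with a short calculation. The sketch below assumes the worst case of one clock at +300 ppm and the other at -300 ppm (600 ppm apart); the function names are illustrative.

```python
def symbols_per_slip(ppm_offset: float) -> float:
    """Number of symbols sent before the two clocks drift apart by one symbol."""
    return 1e6 / ppm_offset

def skp_keeps_up(max_skp_gap_symbols: int, ppm_offset: float) -> bool:
    """True if a SKP opportunity always arrives before a full symbol of drift."""
    return max_skp_gap_symbols < symbols_per_slip(ppm_offset)

worst_case = symbols_per_slip(600)        # +300 ppm against -300 ppm
print(f"{worst_case:.0f} symbols per slip, SKP sufficient: {skp_keeps_up(1538, 600)}")
```

Since the longest permitted gap between SKP Ordered Sets (1538 symbols) is shorter than the roughly 1667 symbols needed to accumulate one symbol of drift, the elastic buffer never has to absorb more than one symbol between adjustments.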
11. Retimers
What is a Retimer?
A Retimer is an active device that fully regenerates the PCIe signal, enabling longer reach for cables and lossy channels. Unlike a repeater (redriver) that only amplifies, a retimer recovers the clock, retimes the data, and retransmits.
// Retimer Architecture
┌─────────┐ ┌─────────────┐ ┌─────────┐
│ Root │◄──────────────►│ Retimer │◄──────────────►│ Device │
│ Complex │ Link Segment │ │ Link Segment │ │
└─────────┘ 1 └─────────────┘ 2 └─────────┘
Retimer Functions:
- Full CDR (Clock Data Recovery)
- Data retiming with local clock
- Equalization (Rx and Tx)
- LTSSM participation
- Protocol-aware (understands Ordered Sets)
Key Points:
- Up to 2 retimers per link (3 segments total)
- Each retimer adds ~5-10ns latency
- Transparent to software/upper layers
- Has own LTSSM (pseudo-port)
- Passes through TLPs/DLLPs without modification
Training:
- Participates in equalization
- Each segment optimized independently
- Upstream and Downstream independently trained
Use Cases:
- CEM cables (server interconnect)
- Long PCB traces (>12 inches)
- Backplanes
- U.2/U.3 storage connections