What is PCI Express?
Explaining Like You're 5
Imagine your computer is a city, and different parts of the computer (like the graphics card, storage, and network card) are buildings. PCI Express is like a super-fast highway system that connects all these buildings, allowing them to send packages (data) to each other really quickly!
PCI Express (Peripheral Component Interconnect Express), often abbreviated as PCIe, is a high-speed serial computer expansion bus standard designed to replace older parallel bus standards like PCI, PCI-X, and AGP.
Key characteristics of PCIe include:
- Point-to-Point Architecture: Unlike shared buses, each device has its own dedicated connection
- Serial Communication: Data is sent bit-by-bit over high-speed lanes
- Scalable Bandwidth: Multiple lanes (x1, x2, x4, x8, x16) can be combined
- Packet-Based Protocol: Data is organized into Transaction Layer Packets (TLPs)
- Full-Duplex Operation: Simultaneous sending and receiving
Why was PCI Express Created?
The Problem with Parallel Buses
Traditional PCI used parallel communication, where many bits traveled simultaneously on separate wires that shared a common clock. Parallel transfer looks faster, but at high clock rates the signals interfere with each other (crosstalk) and arrive at slightly different times (skew), which makes it nearly impossible to keep every bit aligned. PCIe uses serial communication instead: each lane sends one bit at a time at a much higher rate, and the receiver recovers the clock from the data stream itself.
Benefits of PCIe over PCI
| Feature | PCI (Parallel) | PCIe (Serial) |
|---|---|---|
| Architecture | Shared bus | Point-to-point links |
| Bandwidth | 133 MB/s (32-bit, 33 MHz, shared) | Up to 256 GB/s per direction, 512 GB/s total (PCIe 7.0 x16) |
| Scalability | Limited | Highly scalable (lanes) |
| Full Duplex | No | Yes |
| Hot Plug | Limited support | Native support |
The PCIe Link
A PCIe Link is the connection between two components. Each link consists of one or more Lanes.
Lane Configurations
PCIe supports link widths of x1, x2, x4, x8, x12, x16, and x32. Each lane consists of two differential signal pairs, one for transmitting and one for receiving, so data flows in both directions simultaneously (full duplex). A x16 link has 16 lanes and provides 16× the bandwidth of a x1 link at the same data rate.
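When a link has more than one lane, the transmitter spreads consecutive bytes across the lanes in round-robin order (byte striping) and the receiver reassembles them. The sketch below shows only that distribution, assuming no framing, encoding, or lane-to-lane deskew; `stripe` and `unstripe` are hypothetical helper names.

```python
def stripe(data: bytes, lanes: int) -> list[bytes]:
    """Distribute bytes round-robin: byte i goes to lane i % lanes."""
    return [data[lane::lanes] for lane in range(lanes)]


def unstripe(per_lane: list[bytes]) -> bytes:
    """Reassemble by taking one byte from each lane in turn."""
    lanes = len(per_lane)
    out = bytearray(sum(len(chunk) for chunk in per_lane))
    for lane, chunk in enumerate(per_lane):
        out[lane::lanes] = chunk  # lane k fills positions k, k+lanes, ...
    return bytes(out)


packet = bytes(range(10))
x4 = stripe(packet, 4)  # lane 0 carries bytes 0, 4, 8; lane 1 carries 1, 5, 9; ...
assert unstripe(x4) == packet
```

Because each lane carries only every Nth byte, a x16 link moves 16 bytes in the time a x1 link moves one.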
Bandwidth Calculation
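Per-direction link bandwidth can be estimated as data rate (GT/s per lane) × number of lanes × encoding efficiency ÷ 8 bits per byte. The sketch below applies that formula to each generation. It is an approximation: it ignores TLP headers, DLLPs, and flit CRC/FEC overhead, and it treats 1b/1b (PAM4) encoding as 100% efficient.

```python
# Fraction of transmitted bits that carry data for each line encoding.
ENCODING_EFFICIENCY = {
    "8b/10b": 8 / 10,
    "128b/130b": 128 / 130,
    "1b/1b": 1.0,  # PAM4 flit mode; flit CRC/FEC overhead not modeled here
}

# Generation -> (data rate in GT/s per lane, encoding)
GENERATIONS = {
    "1.0": (2.5, "8b/10b"),
    "2.0": (5.0, "8b/10b"),
    "3.0": (8.0, "128b/130b"),
    "4.0": (16.0, "128b/130b"),
    "5.0": (32.0, "128b/130b"),
    "6.0": (64.0, "1b/1b"),
    "7.0": (128.0, "1b/1b"),
}


def bandwidth_gb_per_s(gen: str, lanes: int) -> float:
    """Approximate per-direction bandwidth in GB/s (10**9 bytes per second)."""
    rate_gt, encoding = GENERATIONS[gen]
    data_gbit = rate_gt * lanes * ENCODING_EFFICIENCY[encoding]
    return data_gbit / 8  # 8 bits per byte


for gen in GENERATIONS:
    per_dir = bandwidth_gb_per_s(gen, 16)
    print(f"PCIe {gen} x16: {per_dir:.2f} GB/s per direction, {2 * per_dir:.2f} GB/s total")
```

For x16 links this gives 4.00 GB/s per direction for PCIe 1.0, about 15.75 GB/s for PCIe 3.0, and 256.00 GB/s for PCIe 7.0 (512 GB/s counting both directions). Real throughput is somewhat lower once protocol overhead is included.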
PCI Express Fabric Topology
PCIe uses a tree-like topology where the Root Complex sits at the top (connected to the CPU), and devices branch out below it through switches.
Root Complex
What is it?
The Root Complex (RC) connects the PCIe hierarchy to the CPU and memory subsystem. It's the "top" of the PCIe tree and generates all configuration transactions.
Location
Typically integrated into the CPU or chipset. Modern CPUs have the Root Complex built directly into the processor die.
Functions
- Initiates configuration cycles
- Routes transactions to/from CPU
- Contains Root Ports
- Handles system errors
Endpoints
Endpoints are the final destinations in the PCIe hierarchy - they are the actual devices that perform useful work (GPUs, SSDs, NICs, etc.).
Types of Endpoints
| Type | Description | I/O Requests | Examples |
|---|---|---|---|
| PCI Express Endpoint | Native PCIe device, modern design | Must not generate | Modern GPUs, NVMe SSDs |
| Legacy Endpoint | PCI-compatible device in PCIe | May generate | Legacy adapters |
| RCiEP | Root Complex Integrated Endpoint | Implementation specific | Integrated audio, USB controller |
Switches
A PCIe Switch expands the PCIe fabric by letting several devices share a single upstream link, such as one Root Port. Switches route packets between their ports.
Switch Architecture
A switch has one Upstream Port (facing the Root Complex) and one or more Downstream Ports (facing endpoints or other switches). Internally, it appears as a virtual PCI-to-PCI bridge for each port.
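Memory requests are address-routed: each Downstream Port's virtual bridge is programmed with a memory window (base and limit), and the switch forwards a request to the port whose window contains the target address. The sketch below is simplified, assuming one non-prefetchable memory window per port and ignoring prefetchable windows, Access Control Services, and the ID (bus-number) routing used for completions and configuration requests; the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DownstreamPort:
    name: str
    mem_base: int   # lowest address in this port's memory window
    mem_limit: int  # highest address in the window (inclusive)

    def claims(self, address: int) -> bool:
        return self.mem_base <= address <= self.mem_limit


@dataclass
class Switch:
    downstream: List[DownstreamPort]

    def route_memory_request(self, address: int, ingress: str) -> str:
        """Pick the egress port for a memory request arriving on `ingress`."""
        for port in self.downstream:
            if port.claims(address):
                # Forward down, or peer-to-peer if ingress is another downstream port.
                return port.name
        if ingress == "upstream":
            # Nothing below the switch owns this address.
            return "unsupported-request"
        # A request from below that no downstream window claims goes toward the Root Complex.
        return "upstream"


switch = Switch([
    DownstreamPort("dp0", 0x9000_0000, 0x90FF_FFFF),
    DownstreamPort("dp1", 0x9100_0000, 0x91FF_FFFF),
])
print(switch.route_memory_request(0x9100_1000, ingress="upstream"))  # dp1
print(switch.route_memory_request(0x8000_0000, ingress="dp0"))       # upstream
```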
Bridges
Bridges connect PCIe to other bus types, and the same bridge model is used inside switches:
- PCIe to PCI/PCI-X Bridge: Connects legacy PCI devices
- Virtual PCI Bridge: Internal representation within switches
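Configuration software (firmware or the OS) discovers this hierarchy through configuration requests issued by the Root Complex and numbers the buses depth-first: every bridge (Root Port, switch port, or PCIe-to-PCI bridge) is assigned a primary, secondary, and subordinate bus number. The sketch below models that numbering over an in-memory tree instead of real configuration reads; `Device` and `enumerate_bus` are hypothetical names, and device/function numbers, BAR sizing, and hot-plug bus reservations are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Device:
    """A function in the hierarchy; a bridge has devices on its secondary bus."""
    name: str
    is_bridge: bool = False
    downstream: List["Device"] = field(default_factory=list)
    bus: Optional[int] = None          # bus this device sits on
    primary: Optional[int] = None      # bridges only
    secondary: Optional[int] = None
    subordinate: Optional[int] = None


def enumerate_bus(devices: List[Device], bus: int, next_bus: int) -> int:
    """Number buses depth-first; return the highest bus number at or below `bus`."""
    highest = bus
    for dev in devices:
        dev.bus = bus
        if dev.is_bridge:
            dev.primary = bus
            dev.secondary = next_bus
            # Recurse first so everything behind this bridge gets a contiguous range.
            highest = enumerate_bus(dev.downstream, dev.secondary, dev.secondary + 1)
            dev.subordinate = highest
            next_bus = highest + 1
    return highest


# Example: two Root Ports; one leads to a switch with two Downstream Ports.
nvme, nic, gpu = Device("nvme"), Device("nic"), Device("gpu")
dp0 = Device("switch-dp0", is_bridge=True, downstream=[nvme])
dp1 = Device("switch-dp1", is_bridge=True, downstream=[nic])
usp = Device("switch-usp", is_bridge=True, downstream=[dp0, dp1])
rp_a = Device("root-port-a", is_bridge=True, downstream=[usp])
rp_b = Device("root-port-b", is_bridge=True, downstream=[gpu])

enumerate_bus([rp_a, rp_b], bus=0, next_bus=1)
for d in (rp_a, usp, dp0, dp1, rp_b):
    print(f"{d.name}: primary={d.primary} secondary={d.secondary} subordinate={d.subordinate}")
```

In this example root-port-a ends up owning buses 1 through 4 (the switch's internal bus is 2, and its two Downstream Ports lead to buses 3 and 4), while root-port-b owns bus 5.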
The Three-Layer Architecture
PCIe uses a layered protocol stack similar to networking protocols. Each layer has specific responsibilities and communicates with adjacent layers through defined interfaces.
Transaction Layer
The Transaction Layer is responsible for assembling and disassembling Transaction Layer Packets (TLPs). It defines what data is being transferred and where it's going.
Key Responsibilities:
- TLP Generation: Creates packets for memory reads/writes, I/O, configuration, and messages
- Flow Control: Manages buffer credits to prevent overflow
- Ordering: Enforces the PCI ordering rules; for example, a posted write may not pass an earlier posted write unless Relaxed Ordering permits it
- Virtual Channels: Provides multiple independent traffic streams
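Flow control is credit-based: the receiver advertises how many header and data credits its buffers can hold, and the transmitter may send a TLP only when enough credits remain. The sketch below covers the transmitter-side check for a single credit type (for example Posted), assuming modulo counters with 8-bit header and 12-bit data fields, 16-byte (4 DW) data credits, no "infinite credit" advertisement, and no scaled flow control; `CreditTracker` and its methods are hypothetical names.

```python
HDR_FIELD_BITS = 8          # header credit counters wrap modulo 2**8
DATA_FIELD_BITS = 12        # data credit counters wrap modulo 2**12
BYTES_PER_DATA_CREDIT = 16  # one data credit covers 4 DW of payload


def data_credits_for(payload_bytes: int) -> int:
    return -(-payload_bytes // BYTES_PER_DATA_CREDIT)  # ceiling division


def may_send(limit: int, consumed: int, required: int, bits: int) -> bool:
    """Gating check: enough credits remain, using wrap-around arithmetic."""
    modulus = 1 << bits
    return (limit - (consumed + required)) % modulus <= modulus // 2


class CreditTracker:
    """Transmitter view of one credit type."""

    def __init__(self, hdr_limit: int, data_limit: int) -> None:
        # Initial limits come from the receiver's InitFC DLLPs.
        self.hdr_limit = hdr_limit % (1 << HDR_FIELD_BITS)
        self.data_limit = data_limit % (1 << DATA_FIELD_BITS)
        self.hdr_consumed = 0
        self.data_consumed = 0

    def try_send(self, payload_bytes: int) -> bool:
        need = data_credits_for(payload_bytes)
        if not (may_send(self.hdr_limit, self.hdr_consumed, 1, HDR_FIELD_BITS)
                and may_send(self.data_limit, self.data_consumed, need, DATA_FIELD_BITS)):
            return False  # wait for an UpdateFC DLLP
        self.hdr_consumed = (self.hdr_consumed + 1) % (1 << HDR_FIELD_BITS)
        self.data_consumed = (self.data_consumed + need) % (1 << DATA_FIELD_BITS)
        return True

    def update_fc(self, hdr_limit: int, data_limit: int) -> None:
        # The receiver freed buffer space and advertised new cumulative limits.
        self.hdr_limit = hdr_limit % (1 << HDR_FIELD_BITS)
        self.data_limit = data_limit % (1 << DATA_FIELD_BITS)
```

Comparing cumulative counters modulo the field size, rather than tracking "credits left", lets both sides keep counting indefinitely without ever resetting.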
TLP Types:
- Memory Read/Write
- I/O Read/Write
- Configuration Read/Write
- Messages (interrupts, errors, power management)
- Completions
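Every TLP starts with a header whose first byte encodes Fmt (header size and whether a data payload follows) and Type (the kind of transaction). The sketch below decodes those fields and the 10-bit Length (in DW) from the first header DW for the common types listed above. It assumes the non-Flit-mode header layout (Flit mode, introduced in PCIe 6.0, uses a different header format); the function names are hypothetical.

```python
# Fmt (byte 0, bits 7:5) -> (header size, carries a data payload?)
FMT = {
    0b000: ("3DW", False),
    0b001: ("4DW", False),
    0b010: ("3DW", True),
    0b011: ("4DW", True),
}


def decode_tlp_kind(header: bytes) -> str:
    """Name the TLP from the Fmt and Type fields of header byte 0."""
    fmt = header[0] >> 5
    typ = header[0] & 0x1F
    if fmt not in FMT:
        return "TLP Prefix" if fmt == 0b100 else "Reserved"
    size, has_data = FMT[fmt]
    if typ == 0b00000:
        kind = "MWr" if has_data else "MRd"
    elif typ == 0b00010:
        kind = "IOWr" if has_data else "IORd"
    elif typ in (0b00100, 0b00101):  # Type 0 / Type 1 configuration
        kind = ("CfgWr" if has_data else "CfgRd") + str(typ & 1)
    elif typ == 0b01010:
        kind = "CplD" if has_data else "Cpl"
    elif typ >> 3 == 0b10:  # 10rrr: message, rrr = routing subfield
        kind = "MsgD" if has_data else "Msg"
    else:
        kind = "Other"  # e.g. locked reads and atomics, not decoded here
    return f"{kind} ({size} header)"


def decode_length_dw(header: bytes) -> int:
    """10-bit Length field (bytes 2-3) in DWs; 0 encodes 1024."""
    length = ((header[2] & 0x03) << 8) | header[3]
    return 1024 if length == 0 else length


print(decode_tlp_kind(bytes([0x40, 0x00, 0x00, 0x01])))  # MWr (3DW header)
print(decode_tlp_kind(bytes([0x4A, 0x00, 0x00, 0x01])))  # CplD (3DW header)
```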
Data Link Layer
The Data Link Layer ensures reliable data transfer across the link using acknowledgements, sequence numbers, and CRC error detection.
Key Responsibilities:
- LCRC Generation: Adds Link CRC for error detection
- Sequence Numbers: Tracks packet order
- ACK/NAK Protocol: Confirms successful receipt
- Retry Mechanism: Retransmits corrupted packets
- DLLP Management: Handles Data Link Layer Packets
DLLP Types:
- Ack/Nak for TLP acknowledgement
- Flow Control updates
- Power Management DLLPs
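The ACK/NAK mechanism relies on a transmit-side replay buffer: each TLP receives a 12-bit sequence number, and a copy is kept until the receiver acknowledges it. The sketch below is simplified, assuming in-order acknowledgements and leaving out the LCRC itself, the replay timer, and the replay-count limit that forces the link into retraining; `ReplayBuffer` is a hypothetical name.

```python
SEQ_MODULUS = 1 << 12  # TLP sequence numbers are 12 bits wide


def seq_at_or_before(a: int, b: int) -> bool:
    """True if sequence number a is at or before b, allowing for wrap-around."""
    return (b - a) % SEQ_MODULUS < SEQ_MODULUS // 2


class ReplayBuffer:
    """Transmit-side Data Link Layer retry state."""

    def __init__(self) -> None:
        self.next_seq = 0
        self.unacked: dict[int, bytes] = {}  # insertion order = transmit order

    def send(self, tlp: bytes) -> int:
        seq = self.next_seq
        self.unacked[seq] = tlp  # keep a copy until acknowledged
        self.next_seq = (seq + 1) % SEQ_MODULUS
        return seq

    def _purge_through(self, seq: int) -> None:
        for s in list(self.unacked):
            if not seq_at_or_before(s, seq):
                break
            del self.unacked[s]

    def on_ack(self, seq: int) -> None:
        # An Ack covers this TLP and every earlier one.
        self._purge_through(seq)

    def on_nak(self, seq: int) -> list[bytes]:
        # A Nak acknowledges everything through seq; the rest must be replayed in order.
        self._purge_through(seq)
        return list(self.unacked.values())
```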
Physical Layer
The Physical Layer handles the actual transmission of bits over the physical medium, including encoding, scrambling, and electrical signaling.
Key Responsibilities:
- Encoding: 8b/10b, 128b/130b, or 1b/1b (PAM4) depending on data rate
- Scrambling: Randomizes data for better signal integrity
- LTSSM: Link Training and Status State Machine
- Equalization: Compensates for channel loss at high speeds
- Ordered Sets: Special patterns for link management
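The LTSSM brings a link up after reset and later handles retraining and power-state changes. The sketch below models only its top-level states as a transition table, omitting all substates and the Disabled, Loopback, Hot Reset, and Compliance paths; the event names are illustrative labels, not terms from the specification.

```python
from enum import Enum, auto


class State(Enum):
    DETECT = auto()
    POLLING = auto()
    CONFIGURATION = auto()
    L0 = auto()
    RECOVERY = auto()
    L0S = auto()
    L1 = auto()
    L2 = auto()


# (current state, event) -> next state
TRANSITIONS = {
    (State.DETECT, "receiver_detected"): State.POLLING,
    (State.POLLING, "training_sets_exchanged"): State.CONFIGURATION,
    (State.POLLING, "timeout"): State.DETECT,
    (State.CONFIGURATION, "width_and_lanes_agreed"): State.L0,
    (State.CONFIGURATION, "timeout"): State.DETECT,
    (State.L0, "retrain_or_speed_change"): State.RECOVERY,
    (State.L0, "enter_l0s"): State.L0S,
    (State.L0S, "exit_l0s"): State.L0,
    (State.L0, "enter_l1"): State.L1,
    (State.L1, "exit_l1"): State.RECOVERY,
    (State.L0, "enter_l2"): State.L2,
    (State.L2, "wake"): State.DETECT,
    (State.RECOVERY, "retrained"): State.L0,
    (State.RECOVERY, "reconfigure"): State.CONFIGURATION,
    (State.RECOVERY, "timeout"): State.DETECT,
}


def run(events):
    """Replay a sequence of events from Detect; unknown events leave the state unchanged."""
    state = State.DETECT
    history = [state]
    for event in events:
        state = TRANSITIONS.get((state, event), state)
        history.append(state)
    return history
```

Note that leaving L1 passes through Recovery before returning to L0, while leaving L2 restarts training from Detect.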
Sub-blocks:
- Logical Sub-block: Encoding, framing, scrambling
- Electrical Sub-block: Transmitters, receivers, signaling
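Scrambling XORs the data stream with the output of a linear feedback shift register (LFSR), and the receiver runs an identical LFSR to undo it. At 8b/10b data rates the specification uses the polynomial X^16 + X^5 + X^4 + X^3 + 1 with the register initialized to all ones; 128b/130b rates use a different, longer polynomial. The sketch below is a generic additive scrambler built on that 16-bit polynomial. It assumes a simple bit ordering and omits the specification's handling of control symbols and periodic LFSR resets, so its output will not match real link traffic byte for byte; the function names are hypothetical.

```python
POLY_TAPS = (16, 5, 4, 3)  # G(X) = X^16 + X^5 + X^4 + X^3 + 1
SEED = 0xFFFF              # register starts as all ones


def keystream(nbytes: int, seed: int = SEED) -> bytes:
    """Generate pseudo-random bytes from a 16-bit Fibonacci LFSR."""
    state = seed
    out = bytearray()
    for _ in range(nbytes):
        byte = 0
        for bit in range(8):
            out_bit = (state >> 15) & 1  # emit the top bit
            feedback = 0
            for tap in POLY_TAPS:
                feedback ^= (state >> (tap - 1)) & 1
            state = ((state << 1) | feedback) & 0xFFFF
            byte |= out_bit << bit  # least significant bit first
        out.append(byte)
    return bytes(out)


def scramble(data: bytes, seed: int = SEED) -> bytes:
    """Additive scrambling: XOR with the keystream. Applying it twice restores the data."""
    return bytes(d ^ k for d, k in zip(data, keystream(len(data), seed)))


msg = b"PCI Express TLP payload"
assert scramble(scramble(msg)) == msg  # descrambling is the same operation
```

Because the keystream depends only on the seed, transmitter and receiver stay in sync without exchanging any extra data, and long runs of identical bytes no longer produce repetitive patterns on the wire.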
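PCIe 7.0 Key Features
Several of these features first appeared in PCIe 6.0 and carry forward into PCIe 7.0.
128 GT/s Data Rate
Doubles the data rate of PCIe 6.0 using PAM4 signaling, reaching up to 256 GB/s per direction (512 GB/s bidirectional) on a x16 link.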
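Flit Mode
Introduced in PCIe 6.0, Flit mode carries data in fixed-size 256-byte flits (flow control units) that bundle TLPs with CRC and lightweight Forward Error Correction (FEC), improving efficiency and reliability at PAM4 data rates.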
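The fixed flit size makes overhead easy to estimate. The sketch below assumes the commonly published PCIe 6.0 flit layout of 236 bytes of TLP space, 6 bytes of Data Link Layer payload, 8 bytes of CRC, and 6 bytes of FEC per 256-byte flit; the names are hypothetical, and TLP header bytes inside the 236-byte region are not subtracted, so the result is an upper bound.

```python
# Assumed byte budget of one 256-byte flit.
FLIT_BYTES = {
    "TLP": 236,  # space for TLP headers and payload
    "DLP": 6,    # Data Link Layer payload
    "CRC": 8,
    "FEC": 6,
}
FLIT_SIZE = sum(FLIT_BYTES.values())  # 256


def flit_efficiency() -> float:
    """Fraction of each flit available for TLP bytes."""
    return FLIT_BYTES["TLP"] / FLIT_SIZE


def effective_x16_gb_per_s(rate_gt: float) -> float:
    """Upper bound on per-direction TLP bandwidth for a x16 link in flit mode."""
    raw = rate_gt * 16 / 8  # GB/s before flit overhead
    return raw * flit_efficiency()


print(f"{flit_efficiency():.2%} of each flit carries TLP bytes")
print(f"PCIe 7.0 x16: about {effective_x16_gb_per_s(128):.0f} GB/s per direction")
```

With this layout about 92% of each flit carries TLP bytes, so a PCIe 7.0 x16 link tops out near 236 GB/s of TLP traffic per direction.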
IDE Security
Integrity and Data Encryption (IDE) provides hardware-based encryption and integrity protection for TLPs in transit, supporting confidential computing.
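L0p Power State
A low-power state, introduced in PCIe 6.0, that lets a link turn off some of its lanes to save power while traffic continues on the remaining lanes, then restore full width without interrupting data flow or fully retraining the link.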
Retimer Support
Enhanced support for signal regeneration devices allowing longer reach and improved signal integrity.
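UIO (Unordered I/O)
Defines transactions that are exempt from the traditional PCI ordering rules, so they can complete out of order and improve performance in workloads that do not need strict ordering.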
PCIe Generation Comparison
| Generation | Data Rate (per lane) | Encoding | x16 Bandwidth (per direction, approx.) | Year |
|---|---|---|---|---|
| PCIe 1.0 | 2.5 GT/s | 8b/10b | 4 GB/s | 2003 |
| PCIe 2.0 | 5 GT/s | 8b/10b | 8 GB/s | 2007 |
| PCIe 3.0 | 8 GT/s | 128b/130b | 16 GB/s | 2010 |
| PCIe 4.0 | 16 GT/s | 128b/130b | 32 GB/s | 2017 |
| PCIe 5.0 | 32 GT/s | 128b/130b | 64 GB/s | 2019 |
| PCIe 6.0 | 64 GT/s | 1b/1b (PAM4) | 128 GB/s | 2022 |
| PCIe 7.0 | 128 GT/s | 1b/1b (PAM4) | 256 GB/s | 2025 |