Chapter 1: Introduction & Overview

Understanding PCI Express - From Basics to Architecture

What is PCI Express?

🎯 Explaining Like You're 5

Imagine your computer is a city, and different parts of the computer (like the graphics card, storage, and network card) are buildings. PCI Express is like a super-fast highway system that connects all these buildings, allowing them to send packages (data) to each other really quickly!

PCI Express (Peripheral Component Interconnect Express), often abbreviated as PCIe, is a high-speed serial computer expansion bus standard designed to replace older parallel bus standards like PCI, PCI-X, and AGP.

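Key characteristics of PCIe include serial point-to-point links, scalable link widths built from one or more lanes, full-duplex operation, and native hot-plug support.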

Why was PCI Express Created?

The Problem with Parallel Buses

Traditional PCI used parallel communication, where multiple bits traveled simultaneously on separate wires. While this seems faster, at higher clock rates the signals on neighboring wires interfere with each other (crosstalk), and small differences in delay between wires (skew) make it nearly impossible to keep every bit aligned to a shared clock. The solution was serial communication: sending one bit at a time on each lane, with the clock embedded in the data stream, at much higher speeds.

Benefits of PCIe over PCI

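| Feature | PCI (Parallel) | PCIe (Serial) |
| --- | --- | --- |
| Architecture | Shared bus | Point-to-point links |
| Bandwidth | 133 MB/s (32-bit, 33 MHz; shared by all devices) | Up to 512 GB/s (PCIe 7.0 x16, both directions combined) |
| Scalability | Limited | Highly scalable (add lanes) |
| Full Duplex | No | Yes |
| Hot Plug | Limited support | Native support |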

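A PCIe Link is the point-to-point connection between two components. Each link consists of one or more Lanes, and each lane is made of two differential signal pairs: one for transmitting and one for receiving.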

[Figure: PCIe Link (x2 example) between Device A (Transmitter) and Device B (Receiver), with TX on each lane connected to the partner's RX]

📊 Lane Configurations

PCIe supports different link widths: x1, x2, x4, x8, x12, x16, and x32. Each lane provides full-duplex communication, meaning data can flow in both directions simultaneously. A x16 link has 16 lanes, providing 16× the bandwidth of a x1 link.

Bandwidth Calculation

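$$\text{Bandwidth (GB/s)} = \frac{\text{Data Rate (GT/s)} \times \text{Lanes} \times \text{Encoding Efficiency}}{8}$$

Example for PCIe 7.0 x16, where the efficiency factor 242/256 reflects the 8 CRC bytes and 6 FEC bytes carried in every 256-byte flit:

$$\text{Bandwidth} = \frac{128 \times 16 \times \tfrac{242}{256}}{8} = 242\ \text{GB/s per direction}$$

Total bidirectional ≈ 484 GB/s. The 512 GB/s figure usually quoted for PCIe 7.0 x16 is the raw bidirectional rate (128 × 16 × 2 / 8) before flit overhead.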
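The formula is easy to check in code. The short Python sketch below assumes the efficiency factors used in this chapter (8b/10b for PCIe 1.0 and 2.0, 128b/130b for 3.0 through 5.0, and the 242/256 flit factor for 6.0 and 7.0). The `GENERATIONS` table and `bandwidth_gbps` function are hypothetical names, and the result ignores TLP header overhead, so real throughput is lower.

```python
# (data rate in GT/s, efficiency factor) per generation, as assumed in this chapter.
# For 6.0 and 7.0 the 242/256 factor is flit overhead (CRC + FEC), not a line code.
GENERATIONS = {
    "1.0": (2.5, 8 / 10),
    "2.0": (5.0, 8 / 10),
    "3.0": (8.0, 128 / 130),
    "4.0": (16.0, 128 / 130),
    "5.0": (32.0, 128 / 130),
    "6.0": (64.0, 242 / 256),
    "7.0": (128.0, 242 / 256),
}


def bandwidth_gbps(gen: str, lanes: int) -> float:
    """Approximate one-direction bandwidth in GB/s.

    Bandwidth = Data Rate (GT/s) x Lanes x Efficiency / 8 bits per byte.
    Packet header overhead is ignored.
    """
    rate_gt_s, efficiency = GENERATIONS[gen]
    return rate_gt_s * lanes * efficiency / 8


if __name__ == "__main__":
    for gen in GENERATIONS:
        one_way = bandwidth_gbps(gen, 16)
        print(f"PCIe {gen} x16: {one_way:7.2f} GB/s per direction, "
              f"{2 * one_way:7.2f} GB/s bidirectional")
```

For PCIe 7.0 x16 it reproduces the 242 GB/s per direction from the worked example above.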

PCI Express Fabric Topology

PCIe uses a tree-like topology where the Root Complex sits at the top (connected to the CPU), and devices branch out below it through switches.

[Figure: PCIe fabric topology with the CPU and Root Complex at the top, a Switch, and GPU, NVMe, NIC, and SSD Endpoints below. Legend: Root Complex, Switch, Endpoint]

Root Complex

๐Ÿ›๏ธ What is it?

The Root Complex (RC) connects the PCIe hierarchy to the CPU and memory subsystem. It's the "top" of the PCIe tree and generates all configuration transactions.

📍 Location

Typically integrated into the CPU or chipset. Modern CPUs have the Root Complex built directly into the processor die.

⚙️ Functions

  • Initiates configuration cycles
  • Routes transactions to/from CPU
  • Contains Root Ports
  • Handles system errors
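On most modern systems, the configuration cycles the Root Complex initiates are triggered by software through the Enhanced Configuration Access Mechanism (ECAM), which maps each function's 4 KiB configuration space into a memory window whose base address is reported by firmware (on ACPI systems, in the MCFG table). A minimal sketch of the address calculation, assuming a hypothetical `ecam_address` helper and an example base of 0xE0000000:

```python
def ecam_address(base: int, bus: int, device: int, function: int, offset: int) -> int:
    """Return the memory address of one configuration register under ECAM.

    Layout of the offset from the ECAM base:
      bus -> bits 27:20, device -> bits 19:15,
      function -> bits 14:12, register offset -> bits 11:0 (4 KiB per function).
    """
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus/device/function out of range")
    if not 0 <= offset < 4096:
        raise ValueError("offset must fall within the 4 KiB configuration space")
    return base + (bus << 20) + (device << 15) + (function << 12) + offset


if __name__ == "__main__":
    ECAM_BASE = 0xE0000000  # hypothetical; the real value comes from firmware
    # Register offset 0x10 (BAR0) of bus 1, device 3, function 2.
    print(hex(ecam_address(ECAM_BASE, 1, 3, 2, 0x10)))
```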

Endpoints

Endpoints are the final destinations in the PCIe hierarchy - they are the actual devices that perform useful work (GPUs, SSDs, NICs, etc.).

Types of Endpoints

| Type | Description | I/O Request Generation | Examples |
| --- | --- | --- | --- |
| PCI Express Endpoint | Native PCIe device, modern design | Must not generate | Modern GPUs, NVMe SSDs |
| Legacy Endpoint | PCI-compatible device in PCIe | May generate | Legacy adapters |
| RCiEP | Root Complex Integrated Endpoint | Implementation specific | Integrated audio, USB controller |
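Software can tell these endpoint types apart (along with ports and bridges) by reading the Device/Port Type field, bits 7:4 of the PCI Express Capabilities Register in each function's PCI Express Capability structure. The sketch below decodes that field; the register value would come from a configuration read, and the table and function names are illustrative.

```python
# Device/Port Type encodings (bits 7:4 of the PCI Express Capabilities Register).
DEVICE_PORT_TYPES = {
    0b0000: "PCI Express Endpoint",
    0b0001: "Legacy PCI Express Endpoint",
    0b1001: "Root Complex Integrated Endpoint (RCiEP)",
    0b1010: "Root Complex Event Collector",
    0b0100: "Root Port",
    0b0101: "Switch Upstream Port",
    0b0110: "Switch Downstream Port",
    0b0111: "PCI Express to PCI/PCI-X Bridge",
    0b1000: "PCI/PCI-X to PCI Express Bridge",
}


def device_port_type(pcie_cap_reg: int) -> str:
    """Decode the Device/Port Type field from the 16-bit capabilities register."""
    field = (pcie_cap_reg >> 4) & 0xF
    return DEVICE_PORT_TYPES.get(field, f"Reserved ({field:#06b})")


if __name__ == "__main__":
    # Example register values (made up): capability version 2 in bits 3:0.
    for reg in (0x0002, 0x0012, 0x0092):
        print(f"{reg:#06x}: {device_port_type(reg)}")
```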

Switches

A PCIe Switch expands the PCIe fabric, allowing multiple endpoints to connect through a single upstream port (for example, one Root Port). Switches route packets between their ports.

Switch Architecture

A switch has one Upstream Port (facing the Root Complex) and one or more Downstream Ports (facing endpoints or other switches). Internally, software sees it as a set of virtual PCI-to-PCI bridges, one per port, joined by an internal virtual bus.
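Because each port looks like a PCI-to-PCI bridge, a switch can route a memory request arriving on its upstream port by checking which downstream bridge's memory base/limit window contains the target address; configuration software programs those windows during enumeration. A simplified Python sketch, assuming hypothetical port names and address ranges and ignoring prefetchable windows, peer-to-peer rules, and error handling:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DownstreamPort:
    """A switch downstream port and its virtual bridge's memory window."""
    name: str
    mem_base: int   # first address claimed by this port's bridge
    mem_limit: int  # last address claimed (inclusive)


def route_memory_request(address: int, ports: list[DownstreamPort]) -> Optional[str]:
    """Return the downstream port whose base/limit window contains `address`.

    None means no downstream port claims the request.
    """
    for port in ports:
        if port.mem_base <= address <= port.mem_limit:
            return port.name
    return None


if __name__ == "__main__":
    ports = [
        DownstreamPort("dsp0", 0x9000_0000, 0x9FFF_FFFF),  # e.g. a GPU
        DownstreamPort("dsp1", 0xA000_0000, 0xA00F_FFFF),  # e.g. an NVMe SSD
    ]
    print(route_memory_request(0xA000_1000, ports))  # prints dsp1
```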

Bridges

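Bridges connect PCIe to other bus types, for example a PCI Express to PCI/PCI-X bridge that lets legacy PCI devices attach below a PCIe port, or a PCI/PCI-X to PCI Express bridge for the reverse direction.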

The Three-Layer Architecture

PCIe uses a layered protocol stack similar to networking protocols. Each layer has specific responsibilities and communicates with adjacent layers through defined interfaces.

[Figure: the three-layer stack, showing the Transaction Layer (TLPs, Flow Control, Ordering), the Data Link Layer (DLLPs, ACK/NAK, Retry, CRC), and the Physical Layer (Encoding, LTSSM, Electrical Signaling), with transmit and receive paths]

Transaction Layer

Transaction Layer - The "What" and "Where"

The Transaction Layer is responsible for assembling and disassembling Transaction Layer Packets (TLPs). It defines what data is being transferred and where it's going.

Key Responsibilities:

  • TLP Generation: Creates packets for memory reads/writes, I/O, configuration, and messages
  • Flow Control: Manages buffer credits to prevent overflow
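  • Ordering: Enforces the ordering rules between posted writes, non-posted requests, and completions (for example, a read request must not pass an earlier posted write)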
  • Virtual Channels: Provides multiple independent traffic streams
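Flow control is credit-based: the receiver advertises how many header and data credits its buffers can hold, the transmitter sends a TLP only when enough credits remain, and UpdateFC DLLPs from the receiver raise the limits as buffers drain. The sketch below tracks a single credit type (for example, Posted requests on one Virtual Channel). It assumes one header credit per TLP and one data credit per 16 bytes of payload, and it deliberately omits the fixed-width, wrapping counters the specification uses; the class and method names are hypothetical.

```python
from dataclasses import dataclass

DATA_CREDIT_BYTES = 16  # one data credit covers 4 DW (16 bytes) of payload


def _data_credits(payload_bytes: int) -> int:
    """Data credits needed for a payload (ceiling division)."""
    return -(-payload_bytes // DATA_CREDIT_BYTES)


@dataclass
class CreditTracker:
    """Transmitter-side accounting for one credit type (simplified, no wraparound)."""
    header_limit: int  # cumulative header credits advertised by the receiver
    data_limit: int    # cumulative data credits advertised by the receiver
    header_consumed: int = 0
    data_consumed: int = 0

    def can_send(self, payload_bytes: int) -> bool:
        return (self.header_consumed + 1 <= self.header_limit and
                self.data_consumed + _data_credits(payload_bytes) <= self.data_limit)

    def send(self, payload_bytes: int) -> None:
        if not self.can_send(payload_bytes):
            raise RuntimeError("insufficient credits; the TLP must wait")
        self.header_consumed += 1
        self.data_consumed += _data_credits(payload_bytes)

    def update_fc(self, header_limit: int, data_limit: int) -> None:
        """Apply an UpdateFC DLLP: the receiver freed buffers and raised its limits."""
        self.header_limit = header_limit
        self.data_limit = data_limit
```

A 128-byte write consumes one header credit and eight data credits; once the data credits run out, further TLPs of that type wait until an UpdateFC arrives.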

TLP Types:

  • Memory Read/Write
  • I/O Read/Write
  • Configuration Read/Write
  • Messages (interrupts, errors, power management)
  • Completions
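Before flit mode, each TLP header begins with a Fmt field (bits 7:5 of the first byte), which gives the header size and whether a data payload follows, and a Type field (bits 4:0), which selects the request kind. The sketch below decodes that first byte for the TLP types listed above. It covers only the non-flit header format (flit mode in PCIe 6.0 and 7.0 uses a different header layout), omits types such as locked reads and AtomicOps, and uses hypothetical names.

```python
# Fmt (bits 7:5): (header size in DW, has data payload). 0b100 is a TLP prefix.
FMT = {
    0b000: (3, False),
    0b001: (4, False),
    0b010: (3, True),
    0b011: (4, True),
}

# (Type field, has data) -> TLP name. Type is bits 4:0 of header byte 0.
TYPES = {
    (0b00000, False): "Memory Read (MRd)",
    (0b00000, True): "Memory Write (MWr)",
    (0b00010, False): "I/O Read (IORd)",
    (0b00010, True): "I/O Write (IOWr)",
    (0b00100, False): "Configuration Read Type 0 (CfgRd0)",
    (0b00100, True): "Configuration Write Type 0 (CfgWr0)",
    (0b00101, False): "Configuration Read Type 1 (CfgRd1)",
    (0b00101, True): "Configuration Write Type 1 (CfgWr1)",
    (0b01010, False): "Completion without data (Cpl)",
    (0b01010, True): "Completion with data (CplD)",
}


def decode_tlp_byte0(byte0: int) -> tuple[str, int, bool]:
    """Return (name, header DW count, has data) for a non-flit TLP header byte 0."""
    fmt = (byte0 >> 5) & 0b111
    tlp_type = byte0 & 0b11111
    if fmt not in FMT:
        raise ValueError(f"Fmt {fmt:03b} is a TLP prefix or reserved")
    header_dw, has_data = FMT[fmt]
    if tlp_type >> 3 == 0b10:  # Type 10rrr: Message, rrr is the routing subfield
        name = "Message with data (MsgD)" if has_data else "Message (Msg)"
    else:
        name = TYPES.get((tlp_type, has_data), "Unknown/other")
    return name, header_dw, has_data
```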

Data Link Layer

The Data Link Layer ensures reliable data transfer across the link using acknowledgements, sequence numbers, and CRC error detection.

Key Responsibilities:

  • LCRC Generation: Adds Link CRC for error detection
  • Sequence Numbers: Tracks packet order
  • ACK/NAK Protocol: Confirms successful receipt
  • Retry Mechanism: Retransmits corrupted packets
  • DLLP Management: Handles Data Link Layer Packets
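The transmitter side of the ACK/NAK protocol can be sketched as a replay buffer: each TLP gets the next sequence number (12 bits wide in non-flit mode) and stays buffered until it is acknowledged. An Ack or Nak carries the sequence number of the last TLP received correctly, so both release everything up to that number, and a Nak additionally triggers retransmission of whatever remains. The class below is a simplified illustration with hypothetical names; it leaves out the replay timer, the replay counter, and LCRC generation.

```python
from collections import OrderedDict

SEQ_MODULO = 4096  # 12-bit sequence numbers


class ReplayBuffer:
    """Transmitter-side Data Link Layer retry logic (simplified sketch)."""

    def __init__(self) -> None:
        self.next_seq = 0
        self.pending: "OrderedDict[int, bytes]" = OrderedDict()  # oldest first

    def transmit(self, tlp: bytes) -> int:
        """Assign a sequence number and keep a copy until it is acknowledged."""
        seq = self.next_seq
        self.pending[seq] = tlp
        self.next_seq = (self.next_seq + 1) % SEQ_MODULO
        return seq

    def _release_through(self, seq: int) -> None:
        # Ack/Nak numbers are cumulative: every TLP up to and including `seq`
        # arrived intact. An unknown number (stale or duplicate) releases nothing.
        if seq not in self.pending:
            return
        while True:
            oldest, _ = self.pending.popitem(last=False)
            if oldest == seq:
                break

    def on_ack(self, seq: int) -> None:
        self._release_through(seq)

    def on_nak(self, seq: int) -> list[bytes]:
        """Release TLPs through `seq`, then return the rest for retransmission."""
        self._release_through(seq)
        return list(self.pending.values())
```

If the very first TLP is corrupted, the receiver's Nak names the sequence number just before it (4095 here), so nothing is released and every buffered TLP is replayed.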

DLLP Types:

  • Ack/Nak for TLP acknowledgement
  • Flow Control updates
  • Power Management DLLPs

Physical Layer

Physical Layer - Bits on the Wire

The Physical Layer handles the actual transmission of bits over the physical medium, including encoding, scrambling, and electrical signaling.

Key Responsibilities:

  • Encoding: 8b/10b, 128b/130b, or 1b/1b (PAM4) depending on data rate
  • Scrambling: Randomizes data for better signal integrity
  • LTSSM: Link Training and Status State Machine
  • Equalization: Compensates for channel loss at high speeds
  • Ordered Sets: Special patterns for link management
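Scrambling is typically done by XORing the data with the output of a linear-feedback shift register (LFSR); because the receiver runs an identical LFSR, applying the same operation again recovers the original bytes. The Python sketch below uses the polynomial x^16 + x^5 + x^4 + x^3 + 1 from the 8b/10b-era PCIe scrambler and the FFFFh starting value, but it is illustrative only: the bit ordering is simplified, it does not skip control symbols the way the real scrambler does, and 128b/130b rates use a different, longer polynomial. The function names are hypothetical.

```python
def lfsr_keystream(nbytes: int, seed: int = 0xFFFF) -> bytes:
    """Keystream from a 16-bit Galois LFSR with polynomial x^16 + x^5 + x^4 + x^3 + 1."""
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("seed must be non-zero or the LFSR stays stuck at zero")
    out = bytearray()
    for _ in range(nbytes):
        byte = 0
        for _ in range(8):
            msb = (state >> 15) & 1
            byte = (byte << 1) | msb
            state = (state << 1) & 0xFFFF
            if msb:
                state ^= 0x0039  # feedback into the x^5, x^4, x^3 and 1 taps
        out.append(byte)
    return bytes(out)


def scramble(data: bytes, seed: int = 0xFFFF) -> bytes:
    """Additive scrambling: XOR data with the keystream (also descrambles)."""
    key = lfsr_keystream(len(data), seed)
    return bytes(d ^ k for d, k in zip(data, key))


if __name__ == "__main__":
    zeros = bytes(8)  # a long run of identical bits, bad for clock recovery
    print(scramble(zeros).hex())
```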

Sub-blocks:

  • Logical Sub-block: Encoding, framing, scrambling
  • Electrical Sub-block: Transmitters, receivers, signaling

PCIe 7.0 Key Features

🚀 128 GT/s Data Rate

Doubles the per-lane data rate of PCIe 6.0 (from 64 GT/s to 128 GT/s), reaching up to 512 GB/s of raw bidirectional bandwidth on a x16 link using PAM4 signaling.

📦 Flit Mode

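Fixed 256-byte flits (flow control units), introduced with PCIe 6.0, carry TLPs and Data Link Layer information together with CRC and integrated Forward Error Correction (FEC), improving efficiency and reliability at high speeds.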
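The 242/256 factor used in the bandwidth calculation comes from how each flit's 256 bytes are divided. The sketch below assumes the PCIe 6.0 flit-mode layout of 236 TLP bytes, 6 Data Link Layer payload bytes, 8 CRC bytes, and 6 FEC bytes; the `FlitLayout` class is a hypothetical illustration, not a spec-defined structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlitLayout:
    """Byte budget of one 256-byte flit (assumed PCIe 6.0/7.0 flit-mode layout)."""
    tlp_bytes: int = 236  # space for TLPs
    dlp_bytes: int = 6    # Data Link Layer payload
    crc_bytes: int = 8    # CRC protecting the flit
    fec_bytes: int = 6    # forward error correction bytes

    @property
    def total(self) -> int:
        return self.tlp_bytes + self.dlp_bytes + self.crc_bytes + self.fec_bytes

    def efficiency(self, include_dlp: bool = True) -> float:
        """Fraction of the flit not spent on CRC and FEC (optionally also excluding DLP)."""
        useful = self.tlp_bytes + (self.dlp_bytes if include_dlp else 0)
        return useful / self.total


if __name__ == "__main__":
    flit = FlitLayout()
    print(flit.total, flit.efficiency(), flit.efficiency(include_dlp=False))
```

Counting only the TLP bytes (236/256) gives a slightly lower figure, which is closer to what is left for actual transaction traffic.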

🔐 IDE Security

Integrity and Data Encryption (IDE) provides hardware-based encryption and integrity protection for TLPs in transit, supporting confidential computing.

⚡ L0p Power State

Introduced with PCIe 6.0, L0p is a low-power mode that lets a link shut down some of its lanes while staying in L0, so bandwidth can scale dynamically without full link retraining.

📡 Retimer Support

Enhanced support for signal regeneration devices allowing longer reach and improved signal integrity.

🔄 UIO (Unordered I/O)

Allows out-of-order completion for improved performance in certain workloads.

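| Generation | Data Rate | Encoding | x16 Bandwidth (per direction, approx.) | Year |
| --- | --- | --- | --- | --- |
| PCIe 1.0 | 2.5 GT/s | 8b/10b | 4 GB/s | 2003 |
| PCIe 2.0 | 5 GT/s | 8b/10b | 8 GB/s | 2007 |
| PCIe 3.0 | 8 GT/s | 128b/130b | 16 GB/s | 2010 |
| PCIe 4.0 | 16 GT/s | 128b/130b | 32 GB/s | 2017 |
| PCIe 5.0 | 32 GT/s | 128b/130b | 64 GB/s | 2019 |
| PCIe 6.0 | 64 GT/s | 1b/1b (PAM4, flit mode) | 128 GB/s | 2022 |
| PCIe 7.0 | 128 GT/s | 1b/1b (PAM4, flit mode) | 256 GB/s | 2025 |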