Device discovery, bus assignment, BAR configuration, resource allocation, and topology building
Enumeration is the software process of discovering all PCIe devices in a system, assigning bus numbers, configuring Base Address Registers (BARs), allocating resources (memory, I/O), and building the device tree. It is performed by firmware (BIOS/UEFI), by the operating system, or by both.
Enumeration uses Configuration Read/Write TLPs through the Transaction Layer. The Physical Layer only provides the trained link; it has no involvement in device discovery or configuration.
PCIe Enumeration Process Overview
═══════════════════════════════════════════════════════════════════════════════
┌─────────────────────────────────────────────────────────────────────────┐
│ Step 1: DEVICE DISCOVERY │
│ • Scan all possible Bus/Device/Function combinations │
│ • Read Vendor ID to detect device presence │
│ • Build device tree structure │
└─────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ Step 2: BUS NUMBER ASSIGNMENT │
│ • Assign Primary/Secondary/Subordinate bus numbers to bridges │
│ • Depth-first traversal of topology │
│ • Update subordinate numbers on backtrack │
└─────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ Step 3: BAR SIZING │
│ • Write 0xFFFFFFFF to each BAR │
│ • Read back to determine size and type │
│ • Record resource requirements │
└─────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ Step 4: RESOURCE ALLOCATION │
│ • Allocate memory ranges (MMIO, prefetchable) │
│ • Allocate I/O port ranges │
│ • Configure bridge windows │
│ • Write final addresses to BARs │
└─────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ Step 5: ENABLE DEVICES │
│ • Set Memory Space Enable, I/O Space Enable, Bus Master Enable │
│ • Configure interrupts (MSI/MSI-X) │
│ • Enable device-specific features │
└─────────────────────────────────────────────────────────────────────────┘
Device Discovery Pseudocode
═══════════════════════════════════════════════════════════════════════════════
function enumerate_bus(bus):
    // Scan all 32 possible devices on this bus
    for device in 0..31:
        // Check function 0 first
        vendor_id = config_read(bus, device, 0, VENDOR_ID)
        if vendor_id == 0xFFFF:
            continue    // No device present

        // Device found - configure it
        configure_function(bus, device, 0)

        // Check if multi-function device
        header_type = config_read(bus, device, 0, HEADER_TYPE)
        if (header_type & 0x80):    // Multi-function bit set
            for function in 1..7:
                vendor_id = config_read(bus, device, function, VENDOR_ID)
                if vendor_id != 0xFFFF:
                    configure_function(bus, device, function)

function configure_function(bus, device, function):
    // Read device identification
    vendor_id = config_read(bus, device, function, VENDOR_ID)
    device_id = config_read(bus, device, function, DEVICE_ID)
    class_code = config_read(bus, device, function, CLASS_CODE)
    header_type = config_read(bus, device, function, HEADER_TYPE) & 0x7F

    // Check if this is a bridge
    if header_type == 0x01:
        // Type 1 header = Bridge
        configure_bridge(bus, device, function)
    else:
        // Type 0 header = Endpoint
        configure_endpoint(bus, device, function)
| Vendor ID Value | Meaning | Action |
|---|---|---|
| 0xFFFF | No device present | Skip to next device/function |
| 0x0001 | CRS (Configuration Retry Status) | Retry after delay (device initializing) |
| Valid ID | Device present and ready | Continue enumeration |
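The table above can be read as a small retry loop. The sketch below is illustrative only: `read_vendor_id` is a hypothetical callable standing in for one configuration read of the Vendor ID register, and the retry count and delay are arbitrary assumptions rather than values taken from any specification.

```python
import time

VENDOR_ID_NONE = 0xFFFF  # all ones: nothing responded
VENDOR_ID_CRS = 0x0001   # device answered with CRS, still initializing


def probe_vendor_id(read_vendor_id, retries=10, delay_s=0.0):
    """Return a valid Vendor ID, or None if no usable device answers.

    read_vendor_id is a hypothetical zero-argument callable that performs
    one configuration read of the Vendor ID register.
    """
    for _ in range(retries):
        vendor_id = read_vendor_id()
        if vendor_id == VENDOR_ID_NONE:
            return None           # no device: skip to next device/function
        if vendor_id != VENDOR_ID_CRS:
            return vendor_id      # device present and ready
        time.sleep(delay_s)       # CRS: wait, then retry
    return None                   # still not ready after all retries


# Simulated device that needs two retries before it is ready.
responses = iter([0x0001, 0x0001, 0x8086])
print(hex(probe_vendor_id(lambda: next(responses))))
```

A real implementation would pick the delay and total timeout from the platform's requirements; here they only keep the example short.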
Type 1 Header Bus Number Fields (Offsets 18h-1Bh)
═══════════════════════════════════════════════════════════════════════════════
Offset 18h: Primary Bus Number
    └── Bus number of the port on the upstream side of the bridge
Offset 19h: Secondary Bus Number
    └── Bus number of the bus immediately downstream of the bridge
Offset 1Ah: Subordinate Bus Number
    └── Highest numbered bus downstream of this bridge
Offset 1Bh: Secondary Latency Timer (legacy, typically 0)

Example:

              Root Complex (Bus 0)
                    │
              ┌─────┴─────┐
              │  Bridge   │  Primary=0, Secondary=1, Subordinate=4
              │  00:01.0  │
              └─────┬─────┘
                    │ Bus 1
      ┌─────────────┼─────────────┐
      │             │             │
┌─────┴─────┐ ┌─────┴─────┐ ┌─────┴─────┐
│ Endpoint  │ │  Bridge   │ │  Bridge   │
│  01:00.0  │ │  01:01.0  │ │  01:02.0  │
└───────────┘ │ P=1,S=2,  │ │ P=1,S=3,  │
              │  Sub=2    │ │  Sub=4    │
              └─────┬─────┘ └─────┬─────┘
                    │             │
              ┌─────┴─────┐ ┌─────┴─────┐
              │ Endpoint  │ │  Bridge   │
              │  02:00.0  │ │  03:00.0  │
              └───────────┘ │ P=3,S=4,  │
                            │  Sub=4    │
                            └─────┬─────┘
                                  │
                            ┌─────┴─────┐
                            │ Endpoint  │
                            │  04:00.0  │
                            └───────────┘
Depth-First Bus Number Assignment
═══════════════════════════════════════════════════════════════════════════════
function configure_bridge(bus, device, function):
    // Get next available bus number
    secondary_bus = next_bus_number++

    // Set Primary = current bus
    config_write(bus, device, function, PRIMARY_BUS, bus)

    // Set Secondary = new bus
    config_write(bus, device, function, SECONDARY_BUS, secondary_bus)

    // Temporarily set Subordinate to max (255) to allow access
    config_write(bus, device, function, SUBORDINATE_BUS, 255)

    // Recursively enumerate downstream bus
    enumerate_bus(secondary_bus)

    // Now we know the highest bus number used downstream
    subordinate_bus = next_bus_number - 1

    // Update subordinate to actual value
    config_write(bus, device, function, SUBORDINATE_BUS, subordinate_bus)

Important Rules:
    • Bus 0 is always the Root Complex
    • Bus numbers must be contiguous within a hierarchy
    • Subordinate ≥ Secondary
    • Maximum 256 buses (0-255)
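To make the depth-first numbering concrete, here is a minimal simulation that runs the same algorithm over the example topology from the bus-number diagram. The `Bridge` class and the `"EP"` endpoint marker are hypothetical modelling choices for this sketch; no real configuration accesses are performed.

```python
class Bridge:
    """Hypothetical model of a Type 1 device; children sit on its secondary bus."""

    def __init__(self, children):
        self.children = children
        self.primary = self.secondary = self.subordinate = None


def assign_bus_numbers(devices, bus=0, next_bus=1):
    """Number every bridge found on `bus`; return the next unused bus number."""
    for dev in devices:
        if not isinstance(dev, Bridge):
            continue                      # endpoints consume no bus number
        dev.primary = bus
        dev.secondary = next_bus
        dev.subordinate = 0xFF            # temporary: open the full range
        next_bus = assign_bus_numbers(dev.children, dev.secondary, next_bus + 1)
        dev.subordinate = next_bus - 1    # highest bus actually used downstream
    return next_bus


# Example topology from the diagram ("EP" marks an endpoint).
b4 = Bridge(["EP"])          # 03:00.0
b3 = Bridge([b4])            # 01:02.0
b2 = Bridge(["EP"])          # 01:01.0
b1 = Bridge(["EP", b2, b3])  # 00:01.0
assign_bus_numbers([b1])

for name, b in [("00:01.0", b1), ("01:01.0", b2), ("01:02.0", b3), ("03:00.0", b4)]:
    print(f"{name}: P={b.primary} S={b.secondary} Sub={b.subordinate}")
```

The printed Primary/Secondary/Subordinate values match the ones annotated on the bridges in the diagram.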
| Bit 0 | Bits 2:1 (Memory) | Type | Description |
|---|---|---|---|
| 0 | 00 | Memory, 32-bit | 32-bit address space |
| 0 | 10 | Memory, 64-bit | 64-bit, uses two BAR slots |
| 0 | 01 | Memory (reserved) | Legacy 1MB addressing |
| 1 | - | I/O | I/O port address space |
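The bit layout in this table can be decoded with a few masks. The following is a small sketch; the function name and the returned dictionary shape are illustrative choices, and the prefetchable flag (bit 3 of a memory BAR) is included for completeness.

```python
def decode_bar_type(bar_value):
    """Classify a BAR from its low-order bits."""
    if bar_value & 0x1:                       # bit 0 set: I/O space
        return {"space": "io"}
    widths = {0b00: "mem32", 0b10: "mem64", 0b01: "reserved", 0b11: "reserved"}
    return {
        "space": "memory",
        "width": widths[(bar_value >> 1) & 0x3],   # bits 2:1
        "prefetchable": bool(bar_value & 0x8),     # bit 3
    }


print(decode_bar_type(0xFFF0000C))  # 64-bit prefetchable memory BAR
print(decode_bar_type(0x0000E001))  # I/O BAR
```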
BAR Size Determination Process
═══════════════════════════════════════════════════════════════════════════════
Step 1: Save original BAR value
    original = config_read(bus, dev, func, BAR_N)

Step 2: Write all 1s to BAR
    config_write(bus, dev, func, BAR_N, 0xFFFFFFFF)

Step 3: Read back the value
    sizing_value = config_read(bus, dev, func, BAR_N)

Step 4: Decode size and type
    if sizing_value == 0:
        BAR is not implemented
    else:
        // Clear type bits, invert, add 1
        // (memory BAR: low 4 bits; for an I/O BAR mask with ~0x3 instead)
        size = (~(sizing_value & ~0xF) + 1)

Step 5: Restore original (or leave for later assignment)
    config_write(bus, dev, func, BAR_N, original)

Example - Memory BAR Sizing:
    Write 0xFFFFFFFF to BAR
    Read back:       0xFFF00000
    Mask type bits:  0xFFF00000 & ~0xF = 0xFFF00000
    Invert:          ~0xFFF00000 = 0x000FFFFF
    Add 1:           0x000FFFFF + 1 = 0x00100000 = 1 MB

    Bit 3 (Prefetchable): (original >> 3) & 1
    Bits 2:1 (Type):      (original >> 1) & 3   (00=32bit, 10=64bit)

64-bit BAR Example:
    BAR[N] sizing   = 0xFFC00004  (bits 2:1 = 10 show 64-bit)
    BAR[N+1] sizing = 0xFFFFFFFF  (upper 32 bits)
    Combined, type bits masked: 0xFFFFFFFF_FFC00000
    Size = ~0xFFFFFFFF_FFC00000 + 1 = 0x00400000 = 4 MB
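The mask, invert, add-one arithmetic can be checked with a short function. This is a sketch under stated assumptions: it takes the values read back after writing all 1s (no configuration access is modelled), `sizing_hi` is the readback of BAR[N+1] for a 64-bit memory BAR, and an I/O BAR whose upper 16 bits read as zero is treated as a 16-bit decoder.

```python
def bar_size(sizing_lo, sizing_hi=None):
    """Return the BAR size in bytes from its sizing readback(s); 0 if unimplemented."""
    if sizing_lo & 0x1:                        # I/O BAR: type bits are 1:0
        addr = sizing_lo & ~0x3
        width = 16 if (addr >> 16) == 0 else 32
    else:                                      # memory BAR: type bits are 3:0
        addr = sizing_lo & ~0xF
        width = 32
        if sizing_hi is not None:              # 64-bit BAR spans two slots
            addr |= sizing_hi << 32
            width = 64
    if addr == 0:
        return 0
    # Invert and add 1, truncated to the decoder width.
    return (~addr + 1) & ((1 << width) - 1)


print(hex(bar_size(0xFFF00000)))              # 1 MB 32-bit memory BAR
print(hex(bar_size(0xFFC00004, 0xFFFFFFFF)))  # 4 MB 64-bit memory BAR
print(hex(bar_size(0x0000FF01)))              # 256-byte I/O BAR
```

In practice software usually also clears Memory Space Enable and I/O Space Enable while sizing, so the device does not decode the temporary all-ones address.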
Bridge Resource Windows (Type 1 Header)
═══════════════════════════════════════════════════════════════════════════════
I/O Window (Offsets 1Ch-1Dh, 30h-33h):
┌──────────────────────────────────────────────────────────────────────┐
│ I/O Base (1Ch):            Upper 4 bits of 16-bit I/O base address   │
│ I/O Limit (1Dh):           Upper 4 bits of 16-bit I/O limit address  │
│ I/O Base Upper (30h-31h):  Upper 16 bits (for 32-bit I/O)            │
│ I/O Limit Upper (32h-33h): Upper 16 bits (for 32-bit I/O)            │
│ Granularity: 4KB (12-bit aligned)                                    │
└──────────────────────────────────────────────────────────────────────┘

Memory Window (Offsets 20h-23h):
┌──────────────────────────────────────────────────────────────────────┐
│ Memory Base (20h-21h):  Upper 12 bits of 32-bit base                 │
│ Memory Limit (22h-23h): Upper 12 bits of 32-bit limit                │
│ Granularity: 1MB (20-bit aligned)                                    │
│ Used for: Non-prefetchable MMIO                                      │
└──────────────────────────────────────────────────────────────────────┘

Prefetchable Memory Window (Offsets 24h-2Fh):
┌──────────────────────────────────────────────────────────────────────┐
│ Prefetch Base (24h-25h):        Upper 12 bits of base                │
│ Prefetch Limit (26h-27h):       Upper 12 bits of limit               │
│ Prefetch Base Upper (28h-2Bh):  Upper 32 bits (64-bit capable)       │
│ Prefetch Limit Upper (2Ch-2Fh): Upper 32 bits                        │
│ Granularity: 1MB                                                     │
│ Used for: Prefetchable MMIO (GPU VRAM, etc.)                         │
└──────────────────────────────────────────────────────────────────────┘
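As an illustration of the 1MB granularity, the sketch below packs a non-prefetchable window into the 16-bit Memory Base and Memory Limit register values (address bits 31:20 land in register bits 15:4). The function name and its error handling are assumptions made for this example.

```python
def encode_memory_window(base, limit):
    """Return (Memory Base, Memory Limit) register values for a bridge window.

    base must be 1 MB aligned; limit is the last byte of the window
    (inclusive), so its low 20 bits must all be ones.
    """
    if base & 0xFFFFF:
        raise ValueError("base is not 1 MB aligned")
    if (limit & 0xFFFFF) != 0xFFFFF:
        raise ValueError("limit does not end on a 1 MB boundary")
    if limit < base or limit > 0xFFFFFFFF:
        raise ValueError("window is empty or above 4 GB")
    # Address bits 31:20 go into register bits 15:4; bits 3:0 stay zero.
    return (base >> 16) & 0xFFF0, (limit >> 16) & 0xFFF0


base_reg, limit_reg = encode_memory_window(0xC0000000, 0xCFFFFFFF)
print(hex(base_reg), hex(limit_reg))
```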
Bottom-Up Resource Allocation
═══════════════════════════════════════════════════════════════════════════════
Phase 1: Calculate Requirements (Bottom-Up)
    For each device/bridge (leaf to root):
    • Sum all BAR requirements
    • For bridges: sum all downstream requirements
    • Align to bridge window granularity (1MB for memory)
    • Track prefetchable vs non-prefetchable separately

Phase 2: Assign Addresses (Top-Down)
    Starting from Root Complex with available ranges:
    • Allocate largest requests first (reduce fragmentation)
    • Assign base address to each resource
    • Configure bridge windows to span downstream devices
    • Write addresses to BARs

Example Resource Map:
    System Memory:  0x00000000 - 0xBFFFFFFF (3GB)
    MMIO Region:    0xC0000000 - 0xFFFFFFFF (1GB)

    Bridge 1 Window: 0xC0000000 - 0xCFFFFFFF (256MB)
    ├── GPU BAR0: 0xC0000000 - 0xC7FFFFFF (128MB prefetch)
    ├── GPU BAR2: 0xC8000000 - 0xC800FFFF (64KB non-prefetch)
    └── NIC BAR0: 0xC8010000 - 0xC801FFFF (64KB non-prefetch)

    Bridge 2 Window: 0xD0000000 - 0xD0FFFFFF (16MB)
    └── NVMe BAR0: 0xD0000000 - 0xD0003FFF (16KB)
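A toy version of the "largest first, naturally aligned" placement is sketched below. It is not how any particular firmware or OS allocator works: it handles a single window, ignores the prefetchable split, and the function names are hypothetical.

```python
def align_up(value, alignment):
    """Round value up to a power-of-two alignment."""
    return (value + alignment - 1) & ~(alignment - 1)


def allocate(requests, window_base):
    """Place BARs largest-first; return (assignments, minimal window limit).

    requests maps a label to a power-of-two size in bytes.
    """
    cursor = window_base
    assigned = {}
    for name, size in sorted(requests.items(), key=lambda kv: kv[1], reverse=True):
        cursor = align_up(cursor, size)        # BARs are aligned to their size
        assigned[name] = (cursor, cursor + size - 1)
        cursor += size
    window_limit = align_up(cursor, 1 << 20) - 1   # 1 MB bridge granularity
    return assigned, window_limit


bars = {"GPU BAR0": 128 << 20, "GPU BAR2": 64 << 10, "NIC BAR0": 64 << 10}
assigned, limit = allocate(bars, 0xC0000000)
for name, (start, end) in assigned.items():
    print(f"{name}: {start:#010x} - {end:#010x}")
print(f"window limit: {limit:#010x}")
```

With the Bridge 1 devices from the example map, the BAR addresses come out the same as in the map. The computed limit is the smallest 1MB-aligned window that fits them, whereas the map reserves a larger 256MB window.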
Command Register (Offset 04h) - Enable Device
═══════════════════════════════════════════════════════════════════════════════
Bit │ Name                    │ Description
────┼─────────────────────────┼───────────────────────────────────────────
0   │ I/O Space Enable        │ Respond to I/O BAR accesses
1   │ Memory Space Enable     │ Respond to Memory BAR accesses
2   │ Bus Master Enable       │ Allow device to initiate transactions
3   │ Special Cycles          │ (Legacy, usually 0)
4   │ Memory Write & Inval    │ (Legacy, usually 0)
5   │ VGA Palette Snoop       │ (Legacy, usually 0)
6   │ Parity Error Response   │ Enable parity error reporting
7   │ Reserved                │
8   │ SERR# Enable            │ Enable system error reporting
9   │ Fast B2B Enable         │ (Legacy, usually 0)
10  │ INTx Disable            │ Disable legacy interrupts (use MSI)
11+ │ Reserved                │

Typical Enumeration Sequence:
    1. Initial state: Command = 0x0000 (all disabled)
    2. Configure BARs: (still disabled, safe to program)
           config_write(BAR0, allocated_address)
    3. Enable device:
           config_write(COMMAND, 0x0006)  // Memory + Bus Master
       Or for device with I/O BARs:
           config_write(COMMAND, 0x0007)  // I/O + Memory + Bus Master
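The magic numbers 0x0006 and 0x0007 are easier to audit when built from named bits. The sketch below does that with `enum.IntFlag`; the `enable_value` helper and its parameters are hypothetical, and setting INTx Disable when MSI is in use is shown only as one possible policy.

```python
from enum import IntFlag


class Command(IntFlag):
    """Selected Command register bits (offset 04h)."""
    IO_SPACE = 1 << 0
    MEMORY_SPACE = 1 << 1
    BUS_MASTER = 1 << 2
    PARITY_ERROR_RESPONSE = 1 << 6
    SERR_ENABLE = 1 << 8
    INTX_DISABLE = 1 << 10


def enable_value(has_io_bars=False, use_msi=False):
    """Compose the value written to the Command register when enabling a device."""
    value = Command.MEMORY_SPACE | Command.BUS_MASTER
    if has_io_bars:
        value |= Command.IO_SPACE
    if use_msi:
        value |= Command.INTX_DISABLE   # legacy INTx off once MSI/MSI-X is used
    return int(value)


print(f"{enable_value():#06x}")                  # Memory + Bus Master
print(f"{enable_value(has_io_bars=True):#06x}")  # I/O + Memory + Bus Master
```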
Multi-Function Device Detection
═══════════════════════════════════════════════════════════════════════════════
Header Type Register (Offset 0Eh):
    Bit 7:    Multi-Function Device flag
    Bits 6:0: Header Type (0 = Endpoint, 1 = Bridge, 2 = CardBus)

Enumeration Logic:
    header_type = config_read(bus, dev, 0, 0x0E)
    if (header_type & 0x80):
        // Multi-function: scan functions 0-7
        for func in 0..7:
            if config_read(bus, dev, func, VENDOR_ID) != 0xFFFF:
                configure_function(bus, dev, func)
    else:
        // Single function: only function 0 exists
        configure_function(bus, dev, 0)

Example: Multi-Function Network Adapter
    Bus 3, Device 0:
        Function 0: Ethernet Port 1 (vendor_id = 0x8086)
        Function 1: Ethernet Port 2 (vendor_id = 0x8086)
        Function 2: 0xFFFF (not present)
        Function 3: 0xFFFF (not present)
        ...
        Function 7: 0xFFFF (not present)

ARI (Alternative Routing-ID Interpretation):
    For SR-IOV devices, ARI extends Function field to 8 bits:
        Standard: 5-bit Device + 3-bit Function = 8 functions max
        ARI:      8-bit Function = 256 functions max
    Requires:
    • Device ARI capability
    • Upstream port ARI forwarding enabled
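The difference between the standard and ARI interpretations is only in how the low byte of the 16-bit Routing ID is split. A minimal sketch, with a hypothetical function name:

```python
def decode_rid(rid, ari=False):
    """Split a 16-bit Routing ID into (bus, device, function)."""
    bus = (rid >> 8) & 0xFF
    if ari:
        # ARI: the whole low byte is the function number; device is implicitly 0.
        return bus, 0, rid & 0xFF
    # Standard: 5-bit device number, 3-bit function number.
    return bus, (rid >> 3) & 0x1F, rid & 0x7


print(decode_rid(0x0309))            # standard interpretation
print(decode_rid(0x0309, ari=True))  # same Routing ID under ARI
```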
Hot-Plug Enumeration Sequence
═══════════════════════════════════════════════════════════════════════════════
Device Insertion:

    1. Physical Connection
       │ User inserts card into slot
       │ MRL sensor detects (if present)
       │ Attention Button pressed (if required)
       ▼
    2. Power-Up Sequence
       │ Slot Controller powers up slot
       │ Power Indicator set to ON
       │ Device performs internal initialization
       ▼
    3. Link Training
       │ Physical Layer LTSSM: Detect → Polling → Config → L0
       │ Data Link Layer: FC Init → DL_Active
       │ Hot-Plug Controller receives Data Link Layer State Changed
       ▼
    4. OS Notification
       │ Hot-Plug interrupt generated (MSI or INTx)
       │ OS reads Slot Status register
       │ Presence Detect Changed / DLL State Changed bits set
       ▼
    5. Enumeration
       │ OS performs configuration access to new device
       │ May return CRS initially (device not ready)
       │ Eventually returns valid Vendor ID
       │ OS allocates resources (may require rebalancing)
       │ Device driver loaded
       ▼
    6. Device Ready
       │ Device fully operational

Device Removal:
    1. Attention Button pressed (orderly) or card pulled (surprise)
    2. Data Link Layer goes down (DLL State Changed)
    3. OS notified via interrupt
    4. OS quiesces driver, releases resources
    5. Slot power turned off (orderly removal)
    6. Resources freed for reallocation
SR-IOV VF Enumeration
═══════════════════════════════════════════════════════════════════════════════
PF (Physical Function) Enumeration:
    • Standard enumeration discovers PF as normal endpoint
    • SR-IOV Extended Capability indicates VF support
    • PF BARs configured normally

VF Creation Process:
    1. Read SR-IOV Capability
       ├── TotalVFs: Maximum VFs supported
       ├── InitialVFs: Initial allocation
       ├── VF Offset: First VF Routing ID offset
       └── VF Stride: Routing ID increment between VFs
    2. Configure VF BARs (in SR-IOV capability)
       ├── VF BAR0-5: Size determined like standard BARs
       └── Each VF gets same-sized slice of VF BAR space
    3. Set NumVFs (number of VFs to create)
    4. Set VF Enable = 1
       └── VFs appear at calculated Routing IDs

VF Routing ID Calculation:
    VF_RID[n] = PF_RID + VF_Offset + (n × VF_Stride)    (n = 0 for the first VF)

    Example:
        PF at Bus 5, Device 0, Function 0 (RID = 0x500)
        VF Offset = 0x100
        VF Stride = 0x001

        VF0: 0x500 + 0x100 + 0×1 = 0x600 → Bus 6, Dev 0, Func 0
        VF1: 0x500 + 0x100 + 1×1 = 0x601 → Bus 6, Dev 0, Func 1
        VF2: 0x500 + 0x100 + 2×1 = 0x602 → Bus 6, Dev 0, Func 2
        ...

VF Configuration Space:
    • VFs have a reduced configuration space
    • BAR registers in the VF read as zero; VF addresses are calculated
      per-VF from the VF BARs in the PF's SR-IOV capability
    • VFs keep their own capability list (PCI Express capability, and
      commonly MSI-X), typically shorter than the PF's
    • Vendor ID and Device ID registers in the VF read 0xFFFF; the Vendor ID
      is taken from the PF and the Device ID from the VF Device ID field in
      the SR-IOV capability
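The Routing ID formula is simple enough to check directly. The sketch below reproduces the worked example, indexing VFs from 0 as the formula does; the function names are hypothetical, and the formatted addresses use the standard 5-bit device / 3-bit function split.

```python
def vf_routing_ids(pf_rid, vf_offset, vf_stride, num_vfs):
    """Return the Routing IDs of VF0..VF(num_vfs-1), truncated to 16 bits."""
    return [(pf_rid + vf_offset + n * vf_stride) & 0xFFFF for n in range(num_vfs)]


def format_bdf(rid):
    """Render a Routing ID as bus:device.function."""
    return f"{rid >> 8:02x}:{(rid >> 3) & 0x1F:02x}.{rid & 0x7:x}"


# PF at 05:00.0 with VF Offset 0x100 and VF Stride 1, as in the example.
for n, rid in enumerate(vf_routing_ids(0x0500, 0x100, 0x001, 3)):
    print(f"VF{n}: RID {rid:#06x} -> {format_bdf(rid)}")
```

Because VF Offset here is larger than 0xFF, the VFs land on the next bus number, which is why bus-number assignment has to reserve room behind an SR-IOV capable device.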
Linux PCIe Enumeration Flow
═══════════════════════════════════════════════════════════════════════════════
Boot-time Flow:

    BIOS/UEFI performs initial enumeration
        │
        ▼
    Linux kernel starts
        │
        ▼
    pci_subsys_init()
        │
        ├── pcibios_init()           // Architecture-specific init
        │
        ├── pci_acpi_init()          // Parse MCFG table for ECAM
        │
        └── pci_scan_root_bus()      // Scan from root complex
                │
                ├── pci_scan_child_bus()
                │       │
                │       └── pci_scan_slot()
                │               │
                │               └── pci_scan_single_device()
                │                       │
                │                       └── pci_device_add()
                │
                └── pci_assign_unassigned_resources()
                        │
                        └── pci_bus_assign_resources()

Useful Commands:

    # List all PCI devices
    lspci -v

    # Show tree view with bus numbers
    lspci -tv

    # Show device configuration space
    lspci -xxx -s 00:1f.0

    # Show resource allocation
    cat /proc/iomem
    cat /proc/ioports

    # Rescan PCI bus
    echo 1 > /sys/bus/pci/rescan

    # Remove device
    echo 1 > /sys/bus/pci/devices/0000:03:00.0/remove
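The MCFG/ECAM step in the flow above gives software a memory-mapped view of configuration space, with 4KB per function. As a rough sketch, the address of a register can be computed as below; the base address used in the example is made up, and `start_bus` stands for the first bus number covered by the MCFG entry.

```python
def ecam_address(ecam_base, bus, device, function, offset, start_bus=0):
    """Return the memory address of a configuration register under ECAM."""
    if not (0 <= device < 32 and 0 <= function < 8 and 0 <= offset < 4096):
        raise ValueError("device, function or offset out of range")
    if not (start_bus <= bus <= 255):
        raise ValueError("bus out of range")
    # 1 MB per bus, 32 KB per device, 4 KB per function.
    return ecam_base + ((bus - start_bus) << 20) + (device << 15) + (function << 12) + offset


# Vendor ID register (offset 0) of 00:1f.0, with an assumed base of 0xE0000000.
print(hex(ecam_address(0xE0000000, 0, 0x1F, 0, 0x00)))
```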