Device addressing, BDF routing, and TLP forwarding through PCIe fabric
The Routing ID (RID) is a 16-bit identifier that uniquely identifies every function within a PCIe hierarchy. It consists of the Bus Number, Device Number, and Function Number (BDF).
Standard Routing ID (16 bits):
┌─────────────────┬─────────────────┬─────────────────┐
│ Bits 15:8       │ Bits 7:3        │ Bits 2:0        │
│ Bus Number      │ Device Number   │ Function Number │
│ (8 bits)        │ (5 bits)        │ (3 bits)        │
└─────────────────┴─────────────────┴─────────────────┘
Max values:
- 256 Buses (0-255)
- 32 Devices per bus (0-31)
- 8 Functions per device (0-7)
Total: 256 × 32 × 8 = 65,536 unique functions
Example: Bus 5, Device 0, Function 2 = 05:00.2 = 0x0502
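The bit layout above maps directly onto shifts and masks. The following is a minimal sketch of packing and unpacking a standard Routing ID; the function names (`encode_rid`, `decode_rid`, `format_bdf`) are illustrative and not part of any PCIe library.

```python
def encode_rid(bus: int, device: int, function: int) -> int:
    """Pack bus/device/function into a 16-bit Routing ID."""
    if not (0 <= bus <= 255 and 0 <= device <= 31 and 0 <= function <= 7):
        raise ValueError("BDF field out of range")
    return (bus << 8) | (device << 3) | function


def decode_rid(rid: int) -> tuple[int, int, int]:
    """Split a 16-bit Routing ID into (bus, device, function)."""
    return (rid >> 8) & 0xFF, (rid >> 3) & 0x1F, rid & 0x7


def format_bdf(rid: int) -> str:
    """Render a Routing ID in the usual bus:device.function notation."""
    bus, device, function = decode_rid(rid)
    return f"{bus:02x}:{device:02x}.{function}"


rid = encode_rid(5, 0, 2)
print(hex(rid), format_bdf(rid))  # 0x502 05:00.2
```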
Alternative Routing-ID Interpretation (ARI) extends the Function Number to 8 bits by reusing the 5-bit Device Number field, allowing up to 256 functions per device. It is used primarily for SR-IOV Virtual Functions.
ARI Routing ID (16 bits):
┌─────────────────┬─────────────────┐
│ Bits 15:8       │ Bits 7:0        │
│ Bus Number      │ Function Number │
│ (8 bits)        │ (8 bits)        │
└─────────────────┴─────────────────┘
Benefits:
- 256 Functions per endpoint (vs 8)
- Essential for SR-IOV (many VFs per PF)
- Device Number is implicitly 0
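To see how the same 16 bits read differently under the two interpretations, here is a small sketch that decodes one Routing ID both ways. The helper names are hypothetical and chosen only for this illustration.

```python
def decode_standard_rid(rid: int) -> tuple[int, int, int]:
    """Conventional view: (bus, device, function) from bits 15:8, 7:3, 2:0."""
    return (rid >> 8) & 0xFF, (rid >> 3) & 0x1F, rid & 0x7


def decode_ari_rid(rid: int) -> tuple[int, int]:
    """ARI view: (bus, function) from bits 15:8 and 7:0; device is implied 0."""
    return (rid >> 8) & 0xFF, rid & 0xFF


rid = 0x050A
print(decode_standard_rid(rid))  # (5, 1, 2)
print(decode_ari_rid(rid))       # (5, 10)
```

Whether a given Routing ID should be read in the ARI form depends on whether ARI is enabled for that device, which this sketch does not model.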
| Routing Type | Uses | TLP Types |
|---|---|---|
| Address Routing | Memory/IO address | MRd, MWr, IORd, IOWr |
| ID Routing | Bus/Device/Function | CfgRd, CfgWr, Cpl, Some Msg |
| Implicit Routing | Direction-based | Messages (to RC, broadcast) |
Memory/IO TLP with address routing:
┌─────────────────────────────────────────────────────────────────┐
│ Address Field (32 or 64 bit) │
│ Compared against bridge base/limit windows at each switch port and against BAR ranges at the endpoint │
└─────────────────────────────────────────────────────────────────┘
Switch routing decision:
Incoming TLP ┌─────────────┐
Addr = 0xFE000000 ────────────►│ Switch │
│ │
Check port address ranges: │ Port 0: No │
- Port 0: 0xF0000000-0xF0FFFFFF│ Port 1: Yes │──► Forward to Port 1
- Port 1: 0xFE000000-0xFEFFFFFF│ Port 2: No │
- Port 2: 0xFF000000-0xFFFFFFFF└─────────────┘
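The switch decision in the diagram amounts to a range lookup. The sketch below uses the three port windows from the diagram; the `route_by_address` function and the `PORT_WINDOWS` table are illustrative names, and returning `None` for an unclaimed address is a simplification (a real switch would forward such a TLP toward its upstream port).

```python
from typing import Optional

# Per-port address windows from the diagram (inclusive bounds).
PORT_WINDOWS = {
    0: (0xF0000000, 0xF0FFFFFF),
    1: (0xFE000000, 0xFEFFFFFF),
    2: (0xFF000000, 0xFFFFFFFF),
}


def route_by_address(addr: int, windows: dict = PORT_WINDOWS) -> Optional[int]:
    """Return the downstream port whose window contains addr, or None."""
    for port, (base, limit) in windows.items():
        if base <= addr <= limit:
            return port
    return None  # no downstream window claims this address


print(route_by_address(0xFE000000))  # 1
```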
Configuration/Completion TLP with ID routing:
┌─────────────────────────────────────────────────────────────────┐
│ Destination ID: Bus │ Device │ Function │
│ CfgRd/CfgWr carry the target BDF; completions are routed by the Requester ID │
└─────────────────────────────────────────────────────────────────┘
Switch routing decision:
CfgWr to Bus 5, Dev 0, Func 0
┌─────────────┐
BDF = 05:00.0 ────────────────►│ Switch │
│ (Bus 3) │
Check secondary/subordinate: │ │
- Port 0: Sec=4, Sub=4 │ Port 0: No │
- Port 1: Sec=5, Sub=7 │ Port 1: Yes │──► Forward to Port 1
- Port 2: Sec=8, Sub=8 │ Port 2: No │
└─────────────┘
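ID routing follows the same pattern as address routing, except that the key is the target bus number and the windows are the secondary/subordinate ranges. A minimal sketch using the port ranges from the diagram follows; `route_by_id` and `PORT_BUS_RANGES` are hypothetical names.

```python
from typing import Optional

# (secondary, subordinate) bus range of each downstream port, from the diagram.
PORT_BUS_RANGES = {0: (4, 4), 1: (5, 7), 2: (8, 8)}


def route_by_id(rid: int, ranges: dict = PORT_BUS_RANGES) -> Optional[int]:
    """Return the downstream port whose bus range contains the target bus."""
    target_bus = (rid >> 8) & 0xFF  # only the bus number matters for forwarding
    for port, (sec, sub) in ranges.items():
        if sec <= target_bus <= sub:
            return port
    return None  # target bus is not behind any downstream port


print(route_by_id(0x0500))  # BDF 05:00.0 -> 1
```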
TLP Header DW1 (typical):
┌────────────────────┬───────────┬───────────────┐
│ Bits 31:16         │ Bits 15:8 │ Bits 7:0      │
│ Requester ID (BDF) │ Tag       │ Last/First BE │
└────────────────────┴───────────┴───────────────┘
Purpose:
- Identifies transaction originator
- Used by completer for routing completion back
- Used for ACS/IOMMU security checks
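As an illustration of the DW1 layout above, the following sketch packs and unpacks the fields. It assumes the common memory-request arrangement in which bits 7:4 hold the Last DW byte enables and bits 3:0 the First DW byte enables, and an 8-bit tag; the function names are made up for this example.

```python
def pack_request_dw1(requester_id: int, tag: int, last_be: int, first_be: int) -> int:
    """Build DW1: Requester ID [31:16], Tag [15:8], Last BE [7:4], First BE [3:0]."""
    return (
        ((requester_id & 0xFFFF) << 16)
        | ((tag & 0xFF) << 8)
        | ((last_be & 0xF) << 4)
        | (first_be & 0xF)
    )


def unpack_request_dw1(dw1: int) -> dict:
    """Split DW1 back into its fields."""
    return {
        "requester_id": (dw1 >> 16) & 0xFFFF,
        "tag": (dw1 >> 8) & 0xFF,
        "last_be": (dw1 >> 4) & 0xF,
        "first_be": dw1 & 0xF,
    }


dw1 = pack_request_dw1(requester_id=0x0200, tag=0x1A, last_be=0xF, first_be=0xF)
print(hex(dw1))  # 0x2001aff
```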
Completion Header:
┌────────────────────┬────────┬─────┬───────────────┐
│ Completer ID (BDF) │ Status │ BCM │ Byte Count    │
├────────────────────┼────────┴─────┼───────────────┤
│ Requester ID (BDF) │ Tag          │ Lower Address │
└────────────────────┴──────────────┴───────────────┘
- Completer ID: Who completed the request
- Requester ID: Where to route the completion
- Tag: Match with original request
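The pairing of Requester ID and Tag is what lets a requester match a completion to the request that caused it. The sketch below parses the two completion DWs and looks up the pending request. The bit positions (Status in 15:13, BCM in bit 12, Byte Count in 11:0, Lower Address in 6:0) are an assumption not stated in the diagram, and the `outstanding` table is a hypothetical bookkeeping structure.

```python
def parse_completion(dw1: int, dw2: int) -> dict:
    """Extract completion header fields from DW1 and DW2."""
    return {
        "completer_id": (dw1 >> 16) & 0xFFFF,
        "status": (dw1 >> 13) & 0x7,
        "bcm": (dw1 >> 12) & 0x1,
        "byte_count": dw1 & 0xFFF,
        "requester_id": (dw2 >> 16) & 0xFFFF,
        "tag": (dw2 >> 8) & 0xFF,
        "lower_address": dw2 & 0x7F,
    }


def match_completion(cpl: dict, outstanding: dict):
    """Pop and return the pending request keyed by (requester_id, tag), if any."""
    return outstanding.pop((cpl["requester_id"], cpl["tag"]), None)


# One read pending from requester 02:00.0 with tag 0x1A.
outstanding = {(0x0200, 0x1A): "MRd 0xFE000000"}
cpl = parse_completion(dw1=0x05000004, dw2=0x02001A00)
print(match_completion(cpl, outstanding))  # MRd 0xFE000000
```

A completion that matches no pending entry returns `None` here; how such an unexpected completion is reported is outside the scope of this sketch.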
| Register | Description | Usage |
|---|---|---|
| Primary Bus Number | Bus on the upstream (primary) side of the bridge | Routing upstream |
| Secondary Bus Number | Bus immediately downstream | Routing downstream |
| Subordinate Bus Number | Highest bus behind this bridge | Range for downstream routing |
Bridge (Pri=2, Sec=3, Sub=7)
Routing by the target bus of an incoming ID-routed TLP:
Bus < Pri (e.g., Bus 1):
- If from downstream: route upstream
- If from upstream: error (should not happen in a correctly enumerated hierarchy)
Bus == Pri (Bus 2):
- Target is on the primary bus (upstream side)
- If from downstream: route upstream
Sec <= Bus <= Sub (Bus 3-7):
- Target is downstream of this bridge
- Route to the secondary side (downstream)
Bus > Sub (e.g., Bus 8):
- Target is not behind this bridge
- If from downstream: route upstream (let the parent handle it)
- If from upstream: error, as in the Bus < Pri case
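The case analysis above reduces to one range check plus the arrival direction. The following sketch encodes it for the example bridge; the `Bridge` dataclass and `Route` enum are illustrative. It treats any out-of-range target arriving from upstream as an error, which is an assumption that generalizes the Bus < Pri case.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    UPSTREAM = "upstream"
    DOWNSTREAM = "downstream"
    ERROR = "error"


@dataclass(frozen=True)
class Bridge:
    pri: int  # primary bus (kept for reference; the decision uses sec/sub only)
    sec: int  # secondary bus
    sub: int  # subordinate bus


def bridge_route(bridge: Bridge, target_bus: int, from_upstream: bool) -> Route:
    """Decide where an ID-routed TLP goes based on its target bus."""
    if bridge.sec <= target_bus <= bridge.sub:
        return Route.DOWNSTREAM  # target lives behind this bridge
    # Target is outside Sec..Sub: it belongs somewhere above this bridge.
    if from_upstream:
        return Route.ERROR  # assumption: parent should never send this here
    return Route.UPSTREAM


b = Bridge(pri=2, sec=3, sub=7)
print(bridge_route(b, 5, from_upstream=True))   # Route.DOWNSTREAM
print(bridge_route(b, 8, from_upstream=False))  # Route.UPSTREAM
```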
Root Complex
│
┌────┴────┐
│ Switch │ (Bus 1, Pri=0, Sec=1, Sub=10)
│ USP │
└────┬────┘
┌────┴────────────┬──────────────────┐
│ │ │
DSP 0 DSP 1 DSP 2
(Sec=2,Sub=4) (Sec=5,Sub=7) (Sec=8,Sub=10)
│ │ │
GPU 0 GPU 1 NVMe
(02:00.0) (05:00.0) (08:00.0)
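GPU 0 to GPU 1 P2P:
1. GPU 0 sends MWr to GPU 1's BAR address
2. TLP travels upstream and arrives at DSP 0
3. Switch routes internally (address-based): the address falls in DSP 1's memory window
4. TLP exits DSP 1 (downstream)
5. GPU 1 receives the write
No Root Complex involvement, provided ACS does not redirect peer-to-peer requests upstream.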
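The P2P steps can be traced with a toy model of the switch. The memory windows below are hypothetical (the topology diagram gives only bus numbers), and the sketch ignores ACS, so it should be read as an illustration of the address-based turn-around inside the switch rather than a model of real hardware.

```python
# Hypothetical memory windows behind each downstream port (inclusive bounds).
DSP_WINDOWS = {
    "DSP 0": (0xE0000000, 0xE0FFFFFF),  # GPU 0
    "DSP 1": (0xE1000000, 0xE1FFFFFF),  # GPU 1
    "DSP 2": (0xE2000000, 0xE20FFFFF),  # NVMe
}


def p2p_path(ingress: str, addr: int) -> list[str]:
    """Return the hops a memory write takes after entering the switch at ingress."""
    for port, (base, limit) in DSP_WINDOWS.items():
        if base <= addr <= limit:
            if port == ingress:
                raise ValueError("address falls in the ingress port's own window")
            return [ingress, "switch core", port]  # peer-to-peer turn-around
    # No downstream window matched: forward toward the Root Complex.
    return [ingress, "switch core", "USP", "Root Complex"]


print(p2p_path("DSP 0", 0xE1000000))  # ['DSP 0', 'switch core', 'DSP 1']
```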
| r[2:0] | Type | Behavior |
|---|---|---|
| 000 | To Root Complex | Always routes upstream to RC |
| 001 | By Address | Uses address routing |
| 010 | By ID | Uses ID routing (BDF) |
| 011 | Broadcast | Broadcast from RC to all downstream ports |
| 100 | Local - terminate | Terminates at the receiving port |
| 101 | Gather to RC | Gathered and routed upstream to RC (e.g., PME_TO_Ack) |
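The routing codes in the table can be decoded with a small enum. This sketch assumes that `r[2:0]` occupies the low three bits of the 5-bit TLP Type field and that message TLPs have Type[4:3] equal to `0b10`; the names `MsgRouting` and `decode_msg_routing` are illustrative.

```python
from enum import IntEnum


class MsgRouting(IntEnum):
    TO_ROOT_COMPLEX = 0b000
    BY_ADDRESS = 0b001
    BY_ID = 0b010
    BROADCAST_FROM_RC = 0b011
    LOCAL_TERMINATE = 0b100
    GATHER_TO_RC = 0b101


def decode_msg_routing(tlp_type: int) -> MsgRouting:
    """Return the routing mode encoded in a message TLP's 5-bit Type field."""
    if (tlp_type >> 3) & 0x3 != 0b10:
        raise ValueError("not a message TLP")
    # Codes not listed in the table raise ValueError from the enum lookup.
    return MsgRouting(tlp_type & 0x7)


print(decode_msg_routing(0b10011).name)  # BROADCAST_FROM_RC
```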