VIRTUALIZATION DEEP-DIVE

SR-IOV & Virtualization Complete Guide

Physical Functions, Virtual Functions, ARI, ATS, PRI, PASID, and SIOV

1. What is SR-IOV?

What is Single Root I/O Virtualization?

SR-IOV (Single Root I/O Virtualization) is a PCIe Extended Capability that lets a single physical device expose multiple lightweight Virtual Functions (VFs) alongside its Physical Function (PF). Each VF can be assigned directly to a virtual machine (VM), so the VM's data path bypasses the hypervisor and I/O performance approaches native.

Key Concepts

2. SR-IOV Architecture

    ┌─────────────────────────────────────────────────────────────────┐
    │                           HOST SYSTEM                           │
    │                                                                 │
    │   ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐            │
    │   │  VM 1   │  │  VM 2   │  │  VM 3   │  │  VM 4   │            │
    │   │  VF 0   │  │  VF 1   │  │  VF 2   │  │  VF 3   │            │
    │   └────┬────┘  └────┬────┘  └────┬────┘  └────┬────┘            │
    │        │            │            │            │                 │
    │        └────────────┴─────┬──────┴────────────┘                 │
    │                           │                                     │
    │                     ┌─────┴─────┐                               │
    │                     │   IOMMU   │                               │
    │                     └─────┬─────┘                               │
    │                           │                                     │
    │                    ┌──────┴───────┐                             │
    │                    │ Root Complex │                             │
    │                    └──────┬───────┘                             │
    └───────────────────────────┼─────────────────────────────────────┘
                                │
                    ┌───────────┴───────────┐
                    │   SR-IOV NIC Device   │
                    │                       │
                    │ ┌─────┐  ┌───┬───┬──┐ │
                    │ │ PF  │  │VF0│VF1│..│ │
                    │ │     │  │VF2│VF3│..│ │
                    │ └─────┘  └───┴───┴──┘ │
                    │                       │
                    │  Physical Resources   │
                    │  (Queues, Buffers)    │
                    └───────────────────────┘

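Physical Function (PF)

A full-featured PCIe function that contains the SR-IOV Extended Capability. The PF is enumerated and configured like any other function, is normally owned by the host or hypervisor driver, and is used to enable, configure, and manage its VFs.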

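Virtual Function (VF)

A lightweight function associated with a PF. Each VF has its own Routing ID and its own share of device resources (for example, queues and interrupt vectors), but it shares the underlying hardware with the PF and implements a reduced configuration space; its memory BARs, for instance, are set through the VF BAR registers in the PF's SR-IOV capability. VFs are the units assigned to VMs.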

3. SR-IOV Extended Capability

    Offset    Register                      Description
    00h       Extended Capability Header    Capability ID = 0010h
    04h       SR-IOV Capabilities           VF Migration Capable, ARI Capable Hierarchy Preserved
    08h       SR-IOV Control                VF Enable, VF MSE, ARI Capable Hierarchy
    0Ah       SR-IOV Status                 VF Migration Status
    0Ch       Initial VFs                   Initial VF count
    0Eh       Total VFs                     Maximum number of VFs supported
    10h       Num VFs                       Number of VFs to enable
    12h       Function Dependency Link      Function Number of a related PF
    14h       First VF Offset               RID offset from the PF to the first VF
    16h       VF Stride                     RID increment between consecutive VFs
    18h       (Reserved)
    1Ah       VF Device ID                  Device ID reported by the VFs
    1Ch       Supported Page Sizes          Page sizes the PF supports
    20h       System Page Size              Page size selected by software
    24h-3Bh   VF BAR0-5                     VF Base Address Registers
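To make the layout concrete, the following Python sketch decodes these fields from a raw byte image of the capability (for example, bytes read from the PF's extended configuration space starting at the capability's 00h offset). The `SriovCap` class, the `parse_sriov_cap` function, and the sample values are hypothetical; the offsets are the ones in the table, and multi-byte fields are read little-endian, as PCI configuration space is.

```python
import struct
from dataclasses import dataclass

SRIOV_CAP_ID = 0x0010  # PCIe Extended Capability ID for SR-IOV


@dataclass
class SriovCap:
    cap_version: int
    next_offset: int
    initial_vfs: int
    total_vfs: int
    num_vfs: int
    first_vf_offset: int
    vf_stride: int
    vf_device_id: int
    supported_page_sizes: int
    system_page_size: int
    vf_bars: tuple  # raw 32-bit values of VF BAR0-5


def parse_sriov_cap(raw: bytes) -> SriovCap:
    """Decode the SR-IOV Extended Capability; raw[0] is the byte at offset 00h."""
    if len(raw) < 0x3C:  # VF BAR5 ends at 3Bh
        raise ValueError("buffer too short for an SR-IOV capability")
    (header,) = struct.unpack_from("<I", raw, 0x00)  # config space is little-endian
    cap_id = header & 0xFFFF  # bits 15:0
    if cap_id != SRIOV_CAP_ID:
        raise ValueError(f"not an SR-IOV capability (ID {cap_id:#06x})")
    initial_vfs, total_vfs, num_vfs = struct.unpack_from("<HHH", raw, 0x0C)
    first_vf_offset, vf_stride = struct.unpack_from("<HH", raw, 0x14)
    (vf_device_id,) = struct.unpack_from("<H", raw, 0x1A)
    supported, system = struct.unpack_from("<II", raw, 0x1C)
    vf_bars = struct.unpack_from("<6I", raw, 0x24)
    return SriovCap(
        cap_version=(header >> 16) & 0xF,  # bits 19:16
        next_offset=header >> 20,          # bits 31:20
        initial_vfs=initial_vfs, total_vfs=total_vfs, num_vfs=num_vfs,
        first_vf_offset=first_vf_offset, vf_stride=vf_stride,
        vf_device_id=vf_device_id,
        supported_page_sizes=supported, system_page_size=system,
        vf_bars=vf_bars,
    )


# Hypothetical capability image: TotalVFs = 64, First VF Offset = 0x100, VF Stride = 1
sample = bytearray(0x40)
struct.pack_into("<I", sample, 0x00, (1 << 16) | SRIOV_CAP_ID)
struct.pack_into("<HHH", sample, 0x0C, 64, 64, 0)
struct.pack_into("<HH", sample, 0x14, 0x100, 1)
struct.pack_into("<H", sample, 0x1A, 0x1234)
cap = parse_sriov_cap(bytes(sample))
print(cap.total_vfs, hex(cap.first_vf_offset), cap.vf_stride)  # 64 0x100 1
```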

4. VF Routing ID Calculation

VF RID Formula

    VF_RID = PF_RID + First_VF_Offset + (N × VF_Stride)
    
    Where:
    - PF_RID = Routing ID of the Physical Function, encoded as Bus[15:8] | Device[7:3] | Function[2:0]
    - First_VF_Offset = First VF Offset field of the SR-IOV capability (14h)
    - N = VF index (0, 1, 2, ...)
    - VF_Stride = VF Stride field (16h): RID increment between consecutive VFs
    
    Example:
    PF_RID = 02:00.0 (Bus 2, Device 0, Function 0)
    First_VF_Offset = 0x100 (256)
    VF_Stride = 0x01
    
    VF0_RID = 0x0200 + 0x100 + 0 = 0x0300 = 03:00.0
    VF1_RID = 0x0200 + 0x100 + 1 = 0x0301 = 03:00.1
    VF2_RID = 0x0200 + 0x100 + 2 = 0x0302 = 03:00.2
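The formula and the worked example translate directly into code. This Python sketch (the helper names `vf_rid` and `bdf` are hypothetical) computes VF Routing IDs and prints them in traditional bus:device.function form, using the same 0-based VF index N as the formula above.

```python
def bdf(rid: int) -> str:
    """Format a 16-bit Routing ID as bus:device.function (traditional, non-ARI)."""
    return f"{rid >> 8:02x}:{(rid >> 3) & 0x1F:02x}.{rid & 0x7}"


def vf_rid(pf_rid: int, first_vf_offset: int, vf_stride: int, n: int) -> int:
    """Routing ID of VF n (0-based): PF_RID + First_VF_Offset + n * VF_Stride."""
    rid = pf_rid + first_vf_offset + n * vf_stride
    if rid > 0xFFFF:  # a Routing ID is only 16 bits wide
        raise ValueError("VF Routing ID falls outside the 16-bit RID space")
    return rid


pf_rid = (0x02 << 8) | (0x00 << 3) | 0x0  # 02:00.0 -> 0x0200
for n in range(3):
    rid = vf_rid(pf_rid, first_vf_offset=0x100, vf_stride=0x01, n=n)
    print(f"VF{n}: {rid:#06x} = {bdf(rid)}")
# VF0: 0x0300 = 03:00.0
# VF1: 0x0301 = 03:00.1
# VF2: 0x0302 = 03:00.2
```

With traditional decoding, VF8 (RID 0x0308) would show up as 03:01.0; under ARI, the same RID is simply function 8 on bus 3.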

5. ARI (Alternative Routing-ID Interpretation)

What is ARI?

ARI reinterprets the traditional 5-bit Device and 3-bit Function fields of the Routing ID as a single 8-bit Function Number, allowing up to 256 functions per device (versus 8 without ARI).

ARI vs Traditional

    Mode          Device Bits      Function Bits    Max Functions
    Traditional   5 bits (0-31)    3 bits (0-7)     8 per device
    ARI           0 (ignored)      8 bits (0-255)   256 per device
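The difference lies entirely in how the low byte of the Routing ID is split. This small Python sketch (the function `decode_rid` is hypothetical) decodes the same RID under both interpretations in the table above.

```python
def decode_rid(rid: int, ari: bool) -> tuple:
    """Split a 16-bit Routing ID into (bus, device, function).

    Traditional: Bus[15:8], Device[7:3], Function[2:0].
    ARI:         Bus[15:8], Function[7:0]; the device number is implied to be 0.
    """
    bus = (rid >> 8) & 0xFF
    if ari:
        return bus, 0, rid & 0xFF
    return bus, (rid >> 3) & 0x1F, rid & 0x07


rid = 0x0309
print(decode_rid(rid, ari=False))  # (3, 1, 1): 03:01.1
print(decode_rid(rid, ari=True))   # (3, 0, 9): function 9 on bus 3
```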

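Why ARI for SR-IOV?

Without ARI, a device can expose only eight functions, so a PF and its VFs quickly exhaust the function numbers available on a bus. With ARI, the full 8-bit field addresses functions, so up to 256 PFs and VFs can share one bus number; larger VF counts spill over into additional bus numbers, as in the example above where the VFs start on bus 3.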

6. ATS (Address Translation Services)

What is ATS?

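ATS lets a PCIe device request address translations from the Translation Agent (typically the IOMMU), cache them in an on-device Address Translation Cache (ATC), and then issue DMA requests that carry already-translated addresses, which reduces the translation load on the IOMMU.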

ATS Flow

    Device                    IOMMU                    Memory
       │                        │                        │
       │ 1. Translation Request │                        │
       │ ──────────────────────►│                        │
       │ (Untranslated Address) │                        │
       │                        │                        │
       │ 2. Translation Response│                        │
       │ ◄──────────────────────│                        │
       │  (Translated Address)  │                        │
       │                        │                        │
       │ 3. DMA with AT=Translated                       │
       │ ────────────────────────────────────────────────►
       │    (Translated Address, no IOMMU translation)   │
       │                        │                        │
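The benefit of ATS comes from the device-side cache. The Python sketch below is a deliberately simplified model, not a real driver interface: a hypothetical `TranslationAgent` stands in for the IOMMU, a hypothetical `Device` keeps an ATC, and every translation is assumed to cover one 4 KiB page. It ignores access permissions, PASIDs, and the Smallest Translation Unit, and shows only how a cached translation saves a round trip and how an invalidation forces a new one.

```python
PAGE_SHIFT = 12  # assumption: every translation covers one 4 KiB page
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class TranslationAgent:
    """Toy IOMMU: maps untranslated page numbers to translated page numbers."""

    def __init__(self, page_table: dict):
        self.page_table = page_table
        self.requests = 0

    def translate(self, page: int):
        self.requests += 1                 # one Translation Request/Completion pair
        return self.page_table.get(page)   # None models a no-access completion


class Device:
    """Toy endpoint with an Address Translation Cache (ATC)."""

    def __init__(self, ta: TranslationAgent):
        self.ta = ta
        self.atc = {}

    def dma_address(self, untranslated: int) -> int:
        page, offset = untranslated >> PAGE_SHIFT, untranslated & PAGE_MASK
        if page not in self.atc:           # ATC miss: ask the Translation Agent
            translated = self.ta.translate(page)
            if translated is None:
                raise LookupError(f"no translation for page {page:#x}")
            self.atc[page] = translated
        # The DMA itself would now be sent with AT=Translated
        return (self.atc[page] << PAGE_SHIFT) | offset

    def invalidate(self, page: int) -> None:
        """Handle an Invalidate Request by dropping the cached entry."""
        self.atc.pop(page, None)


ta = TranslationAgent({0x10: 0x8000})
dev = Device(ta)
print(hex(dev.dma_address(0x10123)))  # 0x8000123
print(hex(dev.dma_address(0x10456)))  # 0x8000456 (ATC hit)
print(ta.requests)                    # 1
dev.invalidate(0x10)
dev.dma_address(0x10000)
print(ta.requests)                    # 2
```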

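ATS TLP Types

    TLP                      Direction          Purpose
    Translation Request      Device → TA        Memory Read with AT = 01b; asks for a translation
    Translation Completion   TA → Device        Returns the translated address and access permissions
    Translated Request       Device → Memory    Memory request with AT = 10b; not translated again
    Invalidate Request       TA → Device        Message telling the device to purge ATC entries
    Invalidate Completion    Device → TA        Message confirming the purge

Ordinary (untranslated) memory requests carry AT = 00b.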

7. PRI (Page Request Interface)

What is PRI?

PRI lets a device ask the OS to make a page present when an ATS translation fails because the page is not resident. This enables on-demand paging for device DMA: host memory does not have to be pinned in advance.

PRI Flow

  1. Device requests a translation via ATS
  2. The Translation Completion grants no access (R = W = 0) because the page is not present
  3. Device sends a Page Request message via PRI
  4. OS handles the page fault and makes the page present
  5. Host sends a Page Request Group (PRG) Response message to the device
  6. Device retries the ATS translation, which now succeeds
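The six steps can be modeled as a simple retry loop. In this Python sketch, the hypothetical `Host` class plays both the IOMMU and the OS page-fault handler, and the hypothetical `device_access` function is the device's retry logic. Real PRI traffic is grouped into Page Request Groups and answered with PRG Response codes such as Success, Invalid Request, and Response Failure; this toy host only ever answers Success.

```python
class Host:
    """Toy OS + IOMMU: tracks resident pages and services page requests."""

    def __init__(self, resident: set):
        self.resident = set(resident)
        self.page_requests = 0

    def translation_request(self, page: int) -> bool:
        # Steps 1-2: the completion grants access only if the page is present
        return page in self.resident

    def page_request(self, page: int) -> str:
        # Steps 3-5: the OS faults the page in, then sends a PRG Response
        self.page_requests += 1
        self.resident.add(page)
        return "Success"


def device_access(host: Host, page: int, max_retries: int = 1) -> bool:
    """Device-side PRI loop: translate; on failure, request the page and retry."""
    for _ in range(max_retries + 1):
        if host.translation_request(page):
            return True                    # step 6: translation succeeds
        if host.page_request(page) != "Success":
            return False                   # the OS could not make the page present
    return False


host = Host(resident={1, 2})
print(device_access(host, 2), host.page_requests)  # True 0
print(device_access(host, 7), host.page_requests)  # True 1
```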

8. PASID (Process Address Space ID)

What is PASID?

PASID is a 20-bit identifier, carried in a PASID TLP Prefix, that tags a request with a specific process address space. This lets a device DMA to and from process virtual addresses while keeping each process's accesses isolated.
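Per-process isolation comes from the IOMMU keying its lookup on both the Requester ID and the PASID. In this Python sketch (the `Iommu` class and its methods are hypothetical, and page tables are reduced to dictionaries of page numbers), one function serves two processes that use the same virtual page but land on different physical pages.

```python
PASID_BITS = 20
MAX_PASID = (1 << PASID_BITS) - 1  # 0xFFFFF


class Iommu:
    """Toy IOMMU that selects an address space by (Requester ID, PASID)."""

    def __init__(self):
        self.tables = {}  # (rid, pasid) -> {virtual page: physical page}

    def bind(self, rid: int, pasid: int, page_table: dict) -> None:
        if not 0 <= pasid <= MAX_PASID:
            raise ValueError("PASID must fit in 20 bits")
        self.tables[(rid, pasid)] = page_table

    def translate(self, rid: int, pasid: int, vpage: int) -> int:
        table = self.tables.get((rid, pasid))
        if table is None or vpage not in table:
            raise PermissionError(f"fault: RID {rid:#06x} PASID {pasid} page {vpage:#x}")
        return table[vpage]


iommu = Iommu()
rid = 0x0300                                # one function, e.g. a VF
iommu.bind(rid, 1, {0x400: 0x9A})           # process A
iommu.bind(rid, 2, {0x400: 0x3C})           # process B: same VA, different PA
print(hex(iommu.translate(rid, 1, 0x400)))  # 0x9a
print(hex(iommu.translate(rid, 2, 0x400)))  # 0x3c
```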

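PASID Usage

With Shared Virtual Addressing (SVA), the IOMMU selects a page table using the Requester ID together with the PASID, so one function can work in many address spaces at once (several user processes or several guests) while each stays isolated. The 20-bit field allows up to 2^20 (1,048,576) address spaces per function. ATS Translation Requests and PRI Page Requests can also carry a PASID, so translations and page faults are resolved in the correct address space.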

9. SIOV (Scalable IOV)

What is Scalable IOV?

SIOV (Scalable I/O Virtualization) is a newer IOV architecture that provides finer-grained, more scalable device sharing by using Scalable Device Interfaces (SDIs) identified by PASID, instead of discrete VFs that each need their own Routing ID.

SIOV vs SR-IOV

    Aspect                SR-IOV                               SIOV
    Virtualization unit   VF (discrete PCIe function)          SDI (scalable interface)
    Per-unit BDF          Yes (one RID per VF)                 No (SDIs are identified by PASID)
    Scalability           Limited by RID/BDF space             Much higher (20-bit PASID space)
    Configuration         Own (reduced) config space per VF    No per-SDI config space; emulated by host software
    Migration             Complex                              Simpler (software-based)
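To illustrate the SDI model, this Python sketch carves a device's queue pool into SDIs that all share one Routing ID and differ only in PASID. The `SiovDevice` class is hypothetical, and it assumes that host software (for example, the IOMMU driver) allocates PASIDs and passes them in; real SIOV devices define their own resource granularity and management interfaces.

```python
PASID_LIMIT = 1 << 20  # PASID is a 20-bit value


class SiovDevice:
    """Toy SIOV device: one RID and a queue pool carved into PASID-tagged SDIs."""

    def __init__(self, rid: int, num_queues: int):
        self.rid = rid
        self.free_queues = list(range(num_queues))
        self.sdis = {}  # pasid -> list of queue indices owned by that SDI

    def create_sdi(self, pasid: int, queues: int) -> None:
        if not 0 <= pasid < PASID_LIMIT:
            raise ValueError("PASID must fit in 20 bits")
        if pasid in self.sdis:
            raise ValueError(f"PASID {pasid} already in use")
        if queues > len(self.free_queues):
            raise RuntimeError("not enough free queues")
        self.sdis[pasid] = [self.free_queues.pop(0) for _ in range(queues)]

    def destroy_sdi(self, pasid: int) -> None:
        self.free_queues.extend(self.sdis.pop(pasid))
        self.free_queues.sort()

    def dma_tag(self, pasid: int) -> tuple:
        """Identity the IOMMU sees for this SDI's DMA: the shared RID plus its PASID."""
        if pasid not in self.sdis:
            raise KeyError(pasid)
        return (self.rid, pasid)


dev = SiovDevice(rid=0x0200, num_queues=8)
dev.create_sdi(pasid=1, queues=2)  # e.g. for VM A
dev.create_sdi(pasid=2, queues=4)  # e.g. for VM B
print(dev.dma_tag(1), dev.dma_tag(2), len(dev.free_queues))  # (512, 1) (512, 2) 2
```

Adding a tenant here is a software operation on an existing function, which is why PASID space rather than BDF space bounds SIOV's scalability.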

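SDI (Scalable Device Interface)

An SDI is a set of device resources, such as work queues and their doorbell or MMIO pages, that can be handed to a VM or container. DMA from an SDI is tagged with a PASID instead of a dedicated Routing ID. Host software composes a virtual device from one or more SDIs: fast-path resources are mapped directly into the guest, while slow-path operations such as configuration-space accesses are emulated. (Intel's Scalable IOV specification calls these units Assignable Device Interfaces, or ADIs.)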

10. System Configuration

SR-IOV Enable Sequence

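  1. Enumerate the PF and locate its SR-IOV Extended Capability
  2. Choose a System Page Size from the Supported Page Sizes
  3. Size the VF BARs (per-VF sizes depend on the System Page Size; each aperture is the per-VF size × NumVFs)
  4. Set NumVFs to the desired count (at most TotalVFs)
  5. Set ARI Capable Hierarchy if ARI forwarding is enabled upstream
  6. Set VF Enable, together with VF MSE so that VF memory space is decoded
  7. Wait for the VFs to become ready (at least 100 ms before sending them configuration requests, unless readiness is reported otherwise)
  8. Configure each VF (Bus Master Enable, MSI/MSI-X, driver setup); VF BARs are not programmed per VF
  9. Assign VFs to VMs, with the IOMMU mapping each VF's RID to its VM's address space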
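Steps 2, 4, 5, and 6 are plain register writes to the PF's SR-IOV capability. The Python sketch below applies them to an in-memory image of the capability rather than to real hardware; the function name `enable_sriov` and the sample values are hypothetical, and VF BAR sizing (step 3), the readiness wait (step 7), and everything after it are left out. On Linux, these steps are normally carried out by the kernel and PF driver when a VF count is written to the device's `sriov_numvfs` file in sysfs.

```python
import struct

# Register offsets inside the SR-IOV capability (relative to its 00h)
CTRL, TOTAL_VFS, NUM_VFS = 0x08, 0x0E, 0x10
SUPPORTED_PS, SYSTEM_PS = 0x1C, 0x20
# SR-IOV Control bits
VF_ENABLE, VF_MSE, ARI_CAPABLE_HIERARCHY = 1 << 0, 1 << 3, 1 << 4


def read16(cap, off):
    return struct.unpack_from("<H", cap, off)[0]


def enable_sriov(cap: bytearray, num_vfs: int, page_size: int = 4096,
                 ari: bool = False) -> None:
    """Apply steps 2, 4, 5 and 6 of the enable sequence to a capability image."""
    total_vfs = read16(cap, TOTAL_VFS)
    if not 0 < num_vfs <= total_vfs:
        raise ValueError(f"NumVFs must be between 1 and TotalVFs ({total_vfs})")
    # Step 2: System Page Size is one-hot; bit n selects a page of 2^(n+12) bytes
    if page_size < 4096 or page_size & (page_size - 1):
        raise ValueError("page size must be a power of two of at least 4096")
    bit = page_size.bit_length() - 13
    (supported,) = struct.unpack_from("<I", cap, SUPPORTED_PS)
    if not supported & (1 << bit):
        raise ValueError(f"page size {page_size} is not supported by the PF")
    struct.pack_into("<I", cap, SYSTEM_PS, 1 << bit)
    # Step 4: NumVFs
    struct.pack_into("<H", cap, NUM_VFS, num_vfs)
    # Step 5: ARI Capable Hierarchy, only when ARI forwarding is enabled upstream
    ctrl = read16(cap, CTRL)
    if ari:
        ctrl |= ARI_CAPABLE_HIERARCHY
        struct.pack_into("<H", cap, CTRL, ctrl)
    # Step 6: VF Enable together with VF Memory Space Enable, written last
    struct.pack_into("<H", cap, CTRL, ctrl | VF_ENABLE | VF_MSE)


cap = bytearray(0x40)
struct.pack_into("<H", cap, TOTAL_VFS, 16)
struct.pack_into("<I", cap, SUPPORTED_PS, 0x1)  # hypothetical: 4 KiB pages only
enable_sriov(cap, num_vfs=4, ari=True)
print(hex(read16(cap, CTRL)), read16(cap, NUM_VFS))  # 0x19 4
```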

IOMMU Integration