PCIe 6.0/7.0 NEW FEATURE

UIO (Unordered I/O) Complete Deep-Dive

Comprehensive technical analysis covering What, Why, When, Where, How, System Applications, and all normative Rules

Table of Contents

  1. What is UIO?
  2. Why was UIO introduced?
  3. When to use UIO?
  4. Where does UIO apply in the system?
  5. How does UIO work?
  6. UIO TLP Types
  7. UIO Header Formats
  8. UIO Completion Rules
  9. UIO Ordering Model
  10. UIO Flow Control
  11. UIO and 14-Bit Tags
  12. System-Level Applications
  13. Normative Rules Summary
  14. UIO vs Traditional Posted/Non-Posted

1. What is UIO (Unordered I/O)?

What is UIO?

Unordered I/O (UIO) is a new transaction type introduced in PCIe 6.0 and enhanced in PCIe 7.0 that allows memory read and write operations to complete in any order, regardless of when they were issued. Unlike traditional PCIe transactions where ordering rules (producer-consumer model) must be enforced, UIO transactions explicitly allow out-of-order completion.

Technical Definition

UIO provides a mechanism where:

Key Characteristics

UIO Fundamental Properties

  • Completion Required: Both UIO Reads and UIO Writes require Completions
  • No Ordering Enforcement: Fabric does not enforce producer-consumer ordering
  • Requester Responsibility: The Requester is responsible for any needed ordering
  • Flit Mode Only: UIO transactions are only valid in Flit Mode
  • 14-bit Tags: UIO must use 14-bit Tags for transaction identification
  • Split Transactions: UIO Completions can be split or coalesced

2. Why was UIO Introduced?

Why is UIO needed?

Traditional PCIe ordering rules create significant complexity and performance overhead in the switching fabric. UIO removes this burden by allowing Requesters that don't need strict ordering to explicitly indicate this, enabling more efficient fabric implementations and higher throughput.

Problems with Traditional PCIe Ordering

1. Producer-Consumer Model Complexity

Traditional PCIe enforces strict ordering to support the producer-consumer programming model:

2. Performance Limitations

3. Modern Workload Requirements

UIO Benefits

Traditional PCIe Issues

  • Fabric must track ordering
  • Complex switch implementations
  • Head-of-line blocking
  • Posted writes = no ack
  • Limited parallelism

UIO Solutions

  • No fabric ordering needed
  • Simpler switch logic
  • No ordering dependencies
  • Write Completions confirm delivery
  • Maximum parallelism

3. When to Use UIO?

When should UIO be used?

UIO is appropriate when the Requester either does not require ordering guarantees, or when the Requester's software can explicitly manage ordering through other mechanisms (fences, barriers, completion waiting).

Ideal Use Cases

1. CXL Memory Accesses

2. GPU and Accelerator Traffic

3. High-Performance Storage

When NOT to Use UIO

Avoid UIO When:

  • Producer-consumer ordering is required between writes and reads
  • Legacy software expects traditional PCIe ordering semantics
  • Configuration accesses (UIO is for memory operations only)
  • The fabric does not support end-to-end UIO capability
  • Operating in Non-Flit Mode (UIO requires Flit Mode)

4. Where Does UIO Apply?

Where in the system does UIO apply?

UIO applies to memory transactions in PCIe hierarchies where all components in the path support UIO capability. This requires end-to-end UIO support from Requester to Completer.

System Architecture Requirements

┌─────────────────────────────────────────────────────────────────────┐
│                         ROOT COMPLEX                                │
│  ┌──────────────┐                                                   │
│  │  Root Port   │◄── UIO Capability Required                        │
│  │  (UIO-aware) │                                                   │
│  └──────┬───────┘                                                   │
│         │ Flit Mode Link (64/128 GT/s)                              │
└─────────┼───────────────────────────────────────────────────────────┘
          │
          ▼
   ┌──────────────┐
   │    Switch    │◄── UIO forwarding capability required
   │  (UIO-aware) │    (all ports in the path)
   └──────┬───────┘
          │ Flit Mode Link (64/128 GT/s)
          ▼
   ┌──────────────┐
   │   Endpoint   │◄── UIO Requester or Completer capability
   │  (UIO-aware) │    14-bit Tag support required
   └──────────────┘

   ═══════════════════════════════════════════════════════════════════
   KEY: All components in UIO transaction path MUST support UIO
        All links MUST operate in Flit Mode
        All components MUST support 14-bit Tags for UIO
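The three KEY conditions translate into a simple check that system software could run before enabling UIO on a path. The sketch below is a model only: `PathComponent`, `Link`, and `uio_path_problems` are hypothetical names, and the boolean capability flags stand in for whatever the software actually reads from each device's configuration space.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PathComponent:
    """Hypothetical summary of one Port or Function on the path (not a real OS API)."""
    name: str
    supports_uio: bool
    supports_14bit_tags: bool


@dataclass
class Link:
    """Hypothetical summary of one Link between adjacent components."""
    name: str
    flit_mode: bool


def uio_path_problems(components: List[PathComponent], links: List[Link]) -> List[str]:
    """Return every reason UIO cannot be used on this path; an empty list means it qualifies."""
    problems = []
    for c in components:
        if not c.supports_uio:
            problems.append(f"{c.name}: no UIO support")
        if not c.supports_14bit_tags:
            problems.append(f"{c.name}: no 14-bit Tag support")
    for link in links:
        if not link.flit_mode:
            problems.append(f"{link.name}: not operating in Flit Mode")
    return problems


path = [
    PathComponent("Root Port", True, True),
    PathComponent("Switch Upstream Port", True, True),
    PathComponent("Switch Downstream Port", True, True),
    PathComponent("Endpoint", True, True),
]
links = [Link("Root Port <-> Switch", True), Link("Switch <-> Endpoint", True)]
print(uio_path_problems(path, links))  # → []
```

Returning every problem instead of stopping at the first one makes it easier to report which Port or Link is blocking UIO.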

Topology Constraints

End-to-End Requirement

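UIO requires that every component in the transaction path supports UIO: the Requester, the Completer, and every Root Port and Switch Port that the Requests and Completions pass through.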

Virtual Channel Mapping

UIO uses dedicated Virtual Channels for traffic segregation:

5. How Does UIO Work?

How does UIO transaction flow work?

UIO transactions follow a request-completion model where both reads and writes receive completions. The fabric forwards transactions without enforcing ordering, and completions can arrive in any order.

UIO Memory Write Flow

   Requester                    Completer
      │                            │
      │ 1. UIOMWr Request          │
      │ ──────────────────────────►│
      │   (14-bit Tag, Address,    │
      │    Length, Data)           │
      │                            │
      │                            │ 2. Write data to memory
      │                            │
      │ 3. UIOWrCpl Completion     │
      │ ◄──────────────────────────│
      │   (Tag, Completion Status, │
      │    Length)                 │
      │                            │
      │ 4. Requester matches Tag   │
      │    Mark transaction done   │
      │                            │

   NOTES:
   - Unlike traditional MWr, UIOMWr receives a completion
   - Completion indicates write has reached destination
   - Multiple UIOMWr with same Tag allowed (coalesced completion)
   - Length in completion = DWs completed (for accounting)
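The coalescing note can be sketched from the Completer side. This sketch assumes a Completer that has accepted several UIOMWr Requests and returns one successful UIOWrCpl per Transaction ID, with Length equal to the total DWs written. The record types and `coalesce_write_completions` are hypothetical, and a real Completer could just as well return separate or split Completions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UIOMWr:
    """Hypothetical, simplified UIO Memory Write Request (only the fields used here)."""
    requester_id: int
    tag: int        # 14-bit Tag
    address: int
    length_dw: int  # Length field in DWs; a Zero-Length write still carries Length = 1


@dataclass
class UIOWrCpl:
    """Hypothetical, simplified UIO Write Completion."""
    requester_id: int   # used to route the Completion back to the Requester
    tag: int
    length_dw: int      # DWs accounted for by this Completion
    status: str = "SC"


def coalesce_write_completions(writes: List[UIOMWr]) -> List[UIOWrCpl]:
    """Return one successful UIOWrCpl per Transaction ID covering every DW written."""
    totals = {}  # (Requester ID, Tag) -> total DWs; dicts keep insertion order
    for w in writes:
        key = (w.requester_id, w.tag)
        totals[key] = totals.get(key, 0) + w.length_dw
    return [UIOWrCpl(rid, tag, dw) for (rid, tag), dw in totals.items()]


writes = [
    UIOMWr(requester_id=0x0100, tag=1500, address=0x1000, length_dw=16),
    UIOMWr(requester_id=0x0100, tag=1500, address=0x2000, length_dw=8),  # same Tag: allowed for UIOMWr
    UIOMWr(requester_id=0x0100, tag=1501, address=0x3000, length_dw=1),
]
for cpl in coalesce_write_completions(writes):
    print(cpl)
```

For these three writes, Tag 1500 receives a single Completion with Length 24 and Tag 1501 receives one with Length 1.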

UIO Memory Read Flow

   Requester                    Completer
      │                            │
      │ 1. UIOMRd Request          │
      │ ──────────────────────────►│
      │   (14-bit Tag, Address,    │
      │    Length)                 │
      │                            │
      │                            │ 2. Read data from memory
      │                            │
      │ 3. UIORdCplD Completion    │
      │ ◄──────────────────────────│
      │   (Tag, Lower Address,     │
      │    Length, Data)           │
      │                            │
      │ 4. Match Tag, extract data │
      │                            │

   SPLIT COMPLETION EXAMPLE:
   ┌────────────────────────────────────────┐
   │ Original UIOMRd: 1024 bytes            │
   │ Completion 1: 256 bytes (LA=0x000)     │
   │ Completion 3: 512 bytes (LA=0x200)     │ ← Out of order!
   │ Completion 2: 256 bytes (LA=0x100)     │
   │ Total: 1024 bytes (transaction done)   │
   └────────────────────────────────────────┘
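The split-completion example can be reproduced with a small Requester-side buffer. The sketch treats each LA value as a byte offset from the start of the original 1024-byte request, as the example does, and accepts the pieces in whatever order they arrive. `ReadReassembler` and its methods are hypothetical names.

```python
class ReadReassembler:
    """Collects the data for one UIOMRd as its Completions arrive (illustrative sketch)."""

    def __init__(self, total_bytes: int):
        self.buffer = bytearray(total_bytes)
        self.received = [False] * total_bytes  # per-byte coverage: simple rather than efficient
        self.remaining = total_bytes

    def on_completion(self, offset: int, data: bytes) -> bool:
        """Copy one Completion's data to its offset; return True once every byte has arrived."""
        end = offset + len(data)
        if offset < 0 or end > len(self.buffer):
            raise ValueError("Completion data falls outside the requested range")
        if any(self.received[offset:end]):
            raise ValueError("Completion overlaps data that already arrived")
        self.received[offset:end] = [True] * len(data)
        self.buffer[offset:end] = data
        self.remaining -= len(data)
        return self.remaining == 0


# The split-completion example: 1024 bytes returned as 256 + 512 + 256, out of order.
r = ReadReassembler(1024)
for la, size, fill in [(0x000, 256, 0xAA), (0x200, 512, 0xCC), (0x100, 256, 0xBB)]:
    done = r.on_completion(la, bytes([fill]) * size)
    print(f"LA=0x{la:03x} size={size} done={done}")
```

The first two calls report done=False and the third reports done=True, even though the Completion for LA=0x100 arrived last.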

Transaction Identification

UIO transactions are identified by a Transaction ID consisting of the Requester ID and the 14-bit Tag (Tag[13:0]).

Important: Tag Reuse in UIO

For UIO Memory Writes, multiple Requests with the same Transaction ID are allowed to be outstanding simultaneously. The Completion Length field is used to track how many DWs have been completed for proper accounting.

6. UIO TLP Types

UIO defines specific TLP Types that are only valid in Flit Mode:

TLP Type    Type[7:0]   Description                            Has Payload
UIOMRd      0010 0xxx   UIO Memory Read Request                No
UIOMWr      0110 0xxx   UIO Memory Write Request               Yes
UIORdCpl    0000 1011   UIO Read Completion (no data; error)   No
UIORdCplD   0100 1011   UIO Read Completion with Data          Yes
UIOWrCpl    0000 1010   UIO Write Completion                   No

Type Field Encoding

In Flit Mode, the Type[7:0] field fully identifies the TLP type; there is no separate Fmt field as in Non-Flit Mode.

7. UIO Header Formats

UIOMWr Header (UIO Memory Write Request)

Byte 0-1                   Byte 2-3
Type[7:0] | OHC | TC       Attr | TS | Length[9:0]
Tag[13:0] | EP | Requester ID[15:0]
Reserved
Address[63:2] (or 32-bit)
First/Last BE

UIO Completion Headers

UIOWrCpl and UIORdCpl (Without Data)

Field             Description
Type              0000 1010 (UIOWrCpl) or 0000 1011 (UIORdCpl)
Length[9:0]       Number of DWs represented by this Completion
Completer ID      BDF of the Completer
Tag[13:0]         Matches the Request Tag
Destination BDF   Routes the Completion back to the Requester
CDL[1:0]          CXL-defined field (Reserved for PCIe)

UIORdCplD (Read Completion with Data)

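Includes the fields above (with Type = 0100 1011) plus Lower Address, whose bits [1:0] are Reserved for UIO Completions, and the read data payload.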

8. UIO Completion Rules

General Completion Rules

Mandatory Completion Rules for UIO

  1. Both UIO Reads and UIO Writes MUST receive Completions
  2. Completers MUST return Completions for ALL DWs in a UIO Request (regardless of Completion Status)
  3. UIO Requesters MUST accept UIO Completions in ANY order
  4. Tag field value MUST match the corresponding Request Tag
  5. Byte Enables MUST NOT be considered when determining Length for UIO Completions

Read Completion Boundary (RCB) Rules

Write Completion Rules

Completion Status Handling

When UIO Completions with different statuses are coalesced:

Priority      Status                       Rule
1 (Highest)   UR (Unsupported Request)     If ANY completion has UR → final status is UR
2             CA (Completer Abort)         If no UR, but any CA → final status is CA
3             RRS (Request Retry Status)   If no UR or CA, but any RRS → final status is RRS
4 (Lowest)    SC (Success)                 If ALL completions are SC → final status is SC
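Because the priorities form a strict order, combining statuses reduces to picking the highest-priority status present. A minimal sketch, assuming statuses are represented as the plain strings from the table (`combined_status` is a hypothetical helper):

```python
# Precedence taken from the table above: a lower number wins when statuses are combined.
STATUS_PRIORITY = {"UR": 1, "CA": 2, "RRS": 3, "SC": 4}


def combined_status(statuses):
    """Return the single status reported for a set of coalesced UIO Completions."""
    if not statuses:
        raise ValueError("need at least one Completion status")
    for s in statuses:
        if s not in STATUS_PRIORITY:
            raise ValueError(f"unknown Completion Status: {s}")
    return min(statuses, key=STATUS_PRIORITY.__getitem__)


print(combined_status(["SC", "CA", "SC"]))   # → CA
print(combined_status(["SC", "RRS", "UR"]))  # → UR
print(combined_status(["SC", "SC"]))         # → SC
```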

Transaction Completion Determination

A UIO transaction is considered complete when:

   // UIO Transaction Completion Algorithm
   Total_DW_Requested   = Sum of all Length values in UIOMWr/UIOMRd Requests with the same Transaction ID
   Total_DW_Completed   = Sum of all Length values in Completions received with a matching Tag
   Transaction_Complete = (Total_DW_Completed == Total_DW_Requested)

   // For a Zero-Length UIO Write (Length=1, First BE=0000b),
   // one DW must be considered written.
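The pseudocode maps onto a Requester-side tracker. This is a sketch under stated assumptions: the Transaction ID is modeled as a (Requester ID, Tag) pair, Completions are already routed to the tracker, and the Tag is released as soon as the two sums match. `UIOTransactionTracker` and its methods are hypothetical.

```python
from collections import defaultdict


class UIOTransactionTracker:
    """Requester-side DW accounting per Transaction ID (illustrative sketch)."""

    def __init__(self):
        self.requested = defaultdict(int)  # (Requester ID, Tag) -> DWs requested
        self.completed = defaultdict(int)  # (Requester ID, Tag) -> DWs completed

    def on_request(self, requester_id: int, tag: int, length_dw: int) -> None:
        # A Zero-Length UIO Write is sent with Length = 1, so it adds 1 DW here.
        self.requested[(requester_id, tag)] += length_dw

    def on_completion(self, requester_id: int, tag: int, length_dw: int) -> bool:
        """Record one Completion; return True when its transaction is complete."""
        key = (requester_id, tag)
        if key not in self.requested:
            raise ValueError(f"unexpected Completion for Tag {tag}")
        self.completed[key] += length_dw
        if self.completed[key] > self.requested[key]:
            raise ValueError(f"Tag {tag}: more DWs completed than requested")
        if self.completed[key] == self.requested[key]:
            # Transaction done: forget it so the Tag can be reused.
            del self.requested[key]
            del self.completed[key]
            return True
        return False


t = UIOTransactionTracker()
t.on_request(0x0100, 2048, 16)   # UIOMWr, 16 DW
t.on_request(0x0100, 2048, 1)    # Zero-Length UIOMWr (Length=1, First BE=0000b), same Tag
print(t.on_completion(0x0100, 2048, 8))  # → False (9 of 17 DWs still outstanding)
print(t.on_completion(0x0100, 2048, 9))  # → True
```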

9. UIO Ordering Model

No Ordering Enforcement

Unlike traditional PCIe where the fabric enforces producer-consumer ordering, UIO explicitly removes this requirement:

Traditional PCIe Ordering

  • MRd must see all prior MWr
  • Completions ordered relative to requests
  • Fabric tracks and enforces ordering
  • Complex reordering buffers needed

UIO Model

  • No MRd/MWr ordering enforced
  • Completions can arrive any order
  • Fabric simply forwards
  • Requester manages ordering

Requester Responsibilities

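When using UIO, the Requester takes on the ordering responsibilities that the fabric no longer provides: if one Request depends on an earlier one, the Requester must wait for the earlier Request's Completions (or use a fence or barrier) before issuing it.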

Ordering with Non-UIO Traffic

Mixed Traffic Ordering

UIO transactions and non-UIO transactions are on separate Virtual Channels. There are no ordering relationships enforced between UIO and non-UIO traffic. If ordering between UIO and non-UIO is needed, software must explicitly manage it.
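Because the fabric orders neither UIO traffic nor UIO relative to non-UIO traffic, a producer-consumer handoff has to be built by the Requester, typically by waiting for Write Completions. The toy simulation below (all names hypothetical; it models the idea and is not driver code) issues four data writes that land in random order, waits for all of their UIOWrCpl Completions, and only then sends the flag write, so the flag always lands last.

```python
import random


class ToyUIOFabric:
    """Toy model of an unordered path: in-flight writes land in random order."""

    def __init__(self, seed: int = 1):
        self.rng = random.Random(seed)
        self.in_flight = {}   # Tag -> description of the write
        self.landed = []      # order in which writes reached the Completer

    def send(self, tag: int, what: str) -> None:
        self.in_flight[tag] = what

    def step(self) -> int:
        """Let one in-flight write land; return the Tag carried by its UIOWrCpl."""
        tag = self.rng.choice(sorted(self.in_flight))
        self.landed.append(self.in_flight.pop(tag))
        return tag


def fence(fabric: ToyUIOFabric, outstanding: set) -> None:
    """Requester-managed ordering: block until every outstanding write has completed."""
    while outstanding:
        outstanding.discard(fabric.step())


fabric = ToyUIOFabric()
outstanding = set()
for i, tag in enumerate(range(1024, 1028)):
    fabric.send(tag, f"data[{i}]")
    outstanding.add(tag)
fence(fabric, outstanding)   # wait for all data UIOWrCpls
fabric.send(1028, "flag")    # the consumer-visible flag goes out only now
fence(fabric, {1028})
print(fabric.landed)
```

The seed only decides which order the data writes land in; the flag is last for any seed because it is not sent until the fence returns.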

10. UIO Flow Control

UIO Virtual Channels

UIO uses dedicated Virtual Channels with their own flow control credits:

Credit Types for UIO

Transaction   Credit Type Used               Notes
UIOMRd        NP (Non-Posted)                Standard NP credit for UIO VC
UIOMWr        NP (Non-Posted) or P (Posted)  Can use either; configurable
UIORdCplD     Cpl (Completion)               UIO VC completion credits
UIOWrCpl      Cpl (Completion)               UIO VC completion credits
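A rough per-TLP credit calculation can make the table concrete. The sketch assumes the conventional PCIe flow-control units (one header credit per TLP, one data credit per 4 DW of payload) and the credit types listed above; it does not model Flit Mode credit-block accounting, and `credits_needed` is a hypothetical helper.

```python
DATA_CREDIT_BYTES = 16  # assumed data flow-control unit: 4 DW = 16 bytes


def data_credits(payload_bytes: int) -> int:
    """Round a payload up to whole data credits."""
    return (payload_bytes + DATA_CREDIT_BYTES - 1) // DATA_CREDIT_BYTES


def credits_needed(tlp: str, payload_bytes: int = 0, write_credit_type: str = "P"):
    """Return (credit type, header credits, data credits) consumed by one UIO TLP."""
    if tlp == "UIOMRd":
        return ("NP", 1, 0)
    if tlp == "UIOMWr":
        # Per the table above, UIOMWr may be configured to use P or NP credits.
        if write_credit_type not in ("P", "NP"):
            raise ValueError("UIOMWr uses P or NP credits")
        return (write_credit_type, 1, data_credits(payload_bytes))
    if tlp == "UIORdCplD":
        return ("Cpl", 1, data_credits(payload_bytes))
    if tlp == "UIOWrCpl":
        return ("Cpl", 1, 0)
    raise ValueError(f"not a UIO TLP handled here: {tlp}")


print(credits_needed("UIOMWr", 256))       # → ('P', 1, 16)
print(credits_needed("UIORdCplD", 100))    # → ('Cpl', 1, 7)
print(credits_needed("UIOMWr", 64, "NP"))  # → ('NP', 1, 4)
```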

Flit Mode Credit Blocks

In Flit Mode, flow control uses credit blocks instead of individual credits:

11. UIO and 14-Bit Tags

Mandatory 14-Bit Tags

UIO requires 14-bit Tags for several reasons:

Tag Field Location

Bits      Field        Description
[13:10]   Tag[13:10]   Extended tag bits (Flit Mode)
[9:8]     Tag[9:8]     10-bit extension
[7:0]     Tag[7:0]     Base tag field

UIO Tag Range Recommendation

For UIO with 14-bit Tags, the recommended Tag range is 1024 to 16383. Tag values 0-1023 should be avoided, keeping them out of use for compatibility with 10-bit Tags.
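A Requester following this recommendation could draw Tags from a free list limited to 1024-16383. The allocator below is a hypothetical sketch: it gives each transaction a distinct Tag and returns the Tag to the pool on release. A Requester that uses UIOMWr Tag reuse would simply keep one Tag across several writes.

```python
class UIOTagAllocator:
    """Hands out 14-bit Tags from the recommended UIO range (illustrative sketch)."""

    FIRST_TAG = 1024   # Tags 0-1023 kept out of use, per the recommendation above
    LAST_TAG = 16383   # largest 14-bit value (2**14 - 1)

    def __init__(self):
        # Stored in descending order so that pop() hands out the lowest Tag first.
        self.free = list(range(self.LAST_TAG, self.FIRST_TAG - 1, -1))
        self.in_use = set()

    def allocate(self) -> int:
        if not self.free:
            raise RuntimeError("all 15360 UIO Tags are outstanding")
        tag = self.free.pop()
        self.in_use.add(tag)
        return tag

    def release(self, tag: int) -> None:
        """Call once the transaction using this Tag is complete."""
        if tag not in self.in_use:
            raise ValueError(f"Tag {tag} is not outstanding")
        self.in_use.remove(tag)
        self.free.append(tag)


alloc = UIOTagAllocator()
a, b = alloc.allocate(), alloc.allocate()
print(a, b)  # → 1024 1025
alloc.release(a)
```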

12. System-Level Applications

CXL Memory Pooling

The primary use case for UIO is CXL memory systems:

High-Performance Computing

Storage Systems

Implementation Considerations

System Software Responsibilities

  • Verify UIO capability in all path components
  • Configure UIO VC mappings appropriately
  • Enable 14-bit Tags in all UIO-capable Functions
  • Manage ordering explicitly when needed
  • Handle completion status aggregation in drivers

13. Normative Rules Summary

UIO Capability Rules

  1. UIO MUST only be used in Flit Mode
  2. UIO MUST use 14-bit Tags
  3. End-to-end UIO support MUST exist (Requester to Completer)
  4. If PF/VF supports UIO as Completer, 14-bit Tags MUST be supported

UIO Request Rules

  1. UIOMRd and UIOMWr MUST use UIO TLP Types
  2. Attr[2:0] (IDO, RO, NS) are Reserved in UIO
  3. Multiple UIOMWr with same Tag allowed simultaneously
  4. Zero-Length UIOMWr (Length=1, First BE=0000b) counts as 1 DW

UIO Completion Rules

  1. Completers MUST return Completions for ALL DWs regardless of status
  2. UIO Completions CAN arrive in any order
  3. UIOWrCpl and UIORdCpl CAN be coalesced or split
  4. Byte Enables MUST NOT affect Completion Length calculation
  5. Transaction complete when Sum(Completion Lengths) = Sum(Request Lengths)
  6. EP bit is Reserved for UIOWrCpl and UIORdCpl
  7. Lower Address[1:0] is Reserved for UIO Completions

UIO Ordering Rules

  1. No ordering enforced between UIO transactions
  2. No ordering enforced between UIO and non-UIO traffic
  3. Requester is responsible for any needed ordering
  4. IDE Sync/Fail messages not architected for UIO VC Traffic Classes

14. UIO vs Traditional Posted/Non-Posted

Aspect             Traditional MWr (Posted)  Traditional MRd (Non-Posted)  UIOMWr               UIOMRd
Completion         None                      Required                      Required (UIOWrCpl)  Required (UIORdCplD)
Ordering           Producer-Consumer         Producer-Consumer             None                 None
Acknowledgment     None (fire-and-forget)    Via Completion                Via Completion       Via Completion
Mode               NFM or Flit               NFM or Flit                   Flit Mode Only       Flit Mode Only
Tags               N/A                       8/10/14-bit                   14-bit Required      14-bit Required
Fabric Complexity  High (ordering)           Medium                        Low (no ordering)    Low (no ordering)