Transaction Ordering Rules - Complete Reference

Producer-Consumer Model, Relaxed Ordering, IDO

The Producer-Consumer Ordering Model

PCIe inherits its ordering model from PCI. The model is designed to support the common producer-consumer programming pattern without requiring explicit synchronization in most cases.

Producer-Consumer Pattern

Scenario: CPU writes data to device memory, then writes to a "doorbell" register to signal completion.

  1. CPU issues Memory Write #1 (data payload)
  2. CPU issues Memory Write #2 (doorbell)
  3. Device sees doorbell, reads data

Requirement: Write #2 must not arrive before Write #1. PCIe guarantees this through strict ordering of Posted requests to the same destination.
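The guarantee can be illustrated with a toy model. This is a minimal sketch, not a PCIe implementation: it assumes a single in-order Posted queue with no ordering attributes set, and the names PostedQueue, Device, DATA_ADDR, and DOORBELL_ADDR are hypothetical.

```python
from collections import deque

DATA_ADDR = 0x1000      # hypothetical device buffer address
DOORBELL_ADDR = 0x2000  # hypothetical doorbell register address


class PostedQueue:
    """Toy in-order queue for Posted writes (single TC/VC, RO not set)."""

    def __init__(self):
        self._fifo = deque()

    def post(self, addr, value):
        # Posted writes are fire-and-forget: the sender does not wait.
        self._fifo.append((addr, value))

    def deliver_all(self, device):
        # Strict ordering: writes are delivered in the order they were posted.
        while self._fifo:
            addr, value = self._fifo.popleft()
            device.on_write(addr, value)


class Device:
    """Toy device that consumes the data buffer when the doorbell is rung."""

    def __init__(self):
        self.memory = {}
        self.consumed = None

    def on_write(self, addr, value):
        if addr == DOORBELL_ADDR:
            # The doorbell signals that the data is ready to be read.
            self.consumed = self.memory.get(DATA_ADDR)
        else:
            self.memory[addr] = value


def run_producer_consumer(payload):
    queue, device = PostedQueue(), Device()
    queue.post(DATA_ADDR, payload)  # Write #1: data payload
    queue.post(DOORBELL_ADDR, 1)    # Write #2: doorbell
    queue.deliver_all(device)
    return device.consumed
```

Because the queue never reorders, the device always finds the payload in place when the doorbell write arrives.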

Complete Ordering Matrix

This matrix shows whether the Row transaction may pass the Column transaction within a single queue (same Traffic Class, same Virtual Channel).

Row passes Column →                   Posted Request   Non-Posted Request   Read Completion
                                      (MWr, Msg)       (MRd, CfgRd/Wr)      (CplD)
Posted Request (MWr, Msg)             No*              Yes                  Yes
Non-Posted Request (MRd, CfgRd/Wr)    No               Yes**                Yes
Read Completion (CplD)                No               Yes                  Yes

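* Posted passing Posted: Not allowed by default. A Posted request may pass an earlier Posted request only if its Relaxed Ordering (RO) bit is set, or if its IDO bit is set and the two requests carry different Requester IDs.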

** NP passing NP: Permitted. Non-Posted requests have no ordering requirement relative to each other, regardless of the RO bit.
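The matrix can be written as a lookup table. The following is a simplified sketch that follows the ordering table in the PCIe Base Specification, where a completion must be able to pass Non-Posted requests to avoid deadlock. It covers only the default rules plus the RO exception for Posted passing Posted; IDO and completion attributes are left out. The names DEFAULT_MAY_PASS and may_pass are hypothetical.

```python
POSTED, NON_POSTED, COMPLETION = "Posted", "Non-Posted", "Completion"

# (row, column) -> may the row transaction pass the column transaction
# when no ordering attributes (RO, IDO) are set?
DEFAULT_MAY_PASS = {
    (POSTED, POSTED): False,
    (POSTED, NON_POSTED): True,
    (POSTED, COMPLETION): True,
    (NON_POSTED, POSTED): False,
    (NON_POSTED, NON_POSTED): True,
    (NON_POSTED, COMPLETION): True,
    (COMPLETION, POSTED): False,
    (COMPLETION, NON_POSTED): True,
    (COMPLETION, COMPLETION): True,
}


def may_pass(row, column, relaxed_ordering=False):
    """Return True if a `row` transaction may pass a `column` transaction."""
    if row == POSTED and column == POSTED:
        # A Posted request may pass another Posted request only with RO set.
        return relaxed_ordering
    return DEFAULT_MAY_PASS[(row, column)]
```

For example, may_pass(NON_POSTED, POSTED) is False, which is what keeps a read from overtaking the write it is meant to observe.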

Ordering Attributes Deep Dive

Relaxed Ordering (Attr[1])

A Memory Write request with the RO bit set may pass earlier Memory Writes that are queued ahead of it. This trades strict write ordering for performance.

Relaxed Ordering Example

  T0: MWr to 0x1000 (Data A) - RO=0
  T1: MWr to 0x2000 (Data B) - RO=1
  T2: MWr to 0x1000 (Data C) - RO=0

  // Without RO, arrival order must be: A, B, C
  // With RO on B, permitted orders: A,B,C or B,A,C
  // C cannot pass A or B (C has RO=0)
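The permitted orders in the example above can be enumerated mechanically. This sketch assumes a simplified single-queue model in which a write may arrive ahead of an earlier write only if the later write has RO set; the function name permitted_orders is hypothetical.

```python
from itertools import permutations


def permitted_orders(writes):
    """List the arrival orders allowed for Posted writes in one queue.

    `writes` is a list of (name, ro) tuples in issue order. A later write
    may overtake an earlier one only if the later write has RO set.
    """
    count = len(writes)
    allowed = []
    for order in permutations(range(count)):
        arrival = {index: pos for pos, index in enumerate(order)}
        # Every overtaking pair must be justified by RO on the later write.
        legal = all(
            writes[later][1]
            for earlier in range(count)
            for later in range(earlier + 1, count)
            if arrival[later] < arrival[earlier]
        )
        if legal:
            allowed.append(",".join(writes[index][0] for index in order))
    return allowed


example = [("A", False), ("B", True), ("C", False)]
print(permitted_orders(example))  # ['A,B,C', 'B,A,C']
```

With RO cleared on all three writes, the only order the function returns is the issue order.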

When NOT to Use Relaxed Ordering

  • When write order matters for correctness (producer-consumer)
  • When writing to memory-mapped I/O with side effects
  • When subsequent reads depend on write completion

ID-Based Ordering (Attr[2])

IDO allows transactions from different Requester IDs to pass each other, even within the same VC. This enables independent transaction streams.

IDO Use Cases

  • SR-IOV: Different Virtual Functions can have independent ordering domains
  • Multi-queue NICs: Each queue operates independently
  • NVMe: Separate submission queues don't need strict ordering between them

IDO Example

  // Two VFs sending to host memory
  T0: VF0: MWr to 0x1000 - IDO=1
  T1: VF1: MWr to 0x2000 - IDO=1
  T2: VF0: MWr to 0x3000 - IDO=1

  // With IDO, writes from VF0 and VF1 can pass each other
  // VF0's two writes share a Requester ID, so T0 still arrives before T2
  // Permitted arrival orders: T0,T1,T2 or T0,T2,T1 or T1,T0,T2
  // Software must use barriers if ordering is needed between VFs
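The effect of IDO on the example above can be checked with a small model. This sketch assumes that IDO relaxes ordering only between different Requester IDs, so writes from the same VF stay in issue order relative to each other (RO is not modeled). The names Write, ido_may_pass, and ido_permitted_orders are hypothetical.

```python
from itertools import permutations
from typing import NamedTuple


class Write(NamedTuple):
    name: str
    requester: str  # stands in for the Requester ID
    ido: bool


def ido_may_pass(later, earlier):
    """A later write may pass an earlier one only across Requester IDs."""
    return later.ido and later.requester != earlier.requester


def ido_permitted_orders(writes):
    """Enumerate arrival orders allowed for `writes` given in issue order."""
    count = len(writes)
    allowed = []
    for order in permutations(range(count)):
        arrival = {index: pos for pos, index in enumerate(order)}
        legal = all(
            ido_may_pass(writes[later], writes[earlier])
            for earlier in range(count)
            for later in range(earlier + 1, count)
            if arrival[later] < arrival[earlier]
        )
        if legal:
            allowed.append(",".join(writes[index].name for index in order))
    return allowed


example = [
    Write("T0", "VF0", True),
    Write("T1", "VF1", True),
    Write("T2", "VF0", True),
]
print(ido_permitted_orders(example))  # ['T0,T1,T2', 'T0,T2,T1', 'T1,T0,T2']
```

T1 can land anywhere relative to VF0's writes, but T0 always arrives before T2 because both come from VF0.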

No Snoop (Attr[0])

The No Snoop attribute is a hint that the data does not require hardware cache coherency management. This does NOT affect ordering; it is purely a performance optimization.

NS Value   Behavior                           Use Case
0          Normal snooping (cache coherent)   Default, shared data
1          Skip cache coherency check         Streaming data, device-only memory
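The three attribute bits discussed in this section can be decoded from a 3-bit Attr value as follows. This is a minimal sketch using the bit positions given in the headings above (Attr[2] = IDO, Attr[1] = RO, Attr[0] = No Snoop); it does not parse a real TLP header, and decode_attr is a hypothetical helper name.

```python
def decode_attr(attr):
    """Split a 3-bit Attr value into its three ordering/snoop flags."""
    if not 0 <= attr <= 0b111:
        raise ValueError("Attr is a 3-bit field")
    return {
        "no_snoop": bool(attr & 0b001),           # Attr[0]
        "relaxed_ordering": bool(attr & 0b010),   # Attr[1]
        "id_based_ordering": bool(attr & 0b100),  # Attr[2]
    }


print(decode_attr(0b101))
# {'no_snoop': True, 'relaxed_ordering': False, 'id_based_ordering': True}
```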

Ordering Through Switches

Switches must maintain ordering for transactions passing through. The ordering rules apply at each ingress/egress port pair.

Switch Ordering Requirements

  • Transactions in the same TC that enter the same ingress port and exit the same egress port must maintain relative order per the matrix
  • Transactions from different ingress ports have no ordering relationship
  • Different TCs (mapped to different VCs) have no ordering relationship
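The three requirements above reduce to a single predicate. This sketch assumes each transaction is described only by its ingress port, egress port, and Traffic Class; the names Tlp and ordering_applies are hypothetical.

```python
from typing import NamedTuple


class Tlp(NamedTuple):
    ingress_port: int
    egress_port: int
    traffic_class: int


def ordering_applies(first, second):
    """True if a switch must keep the two transactions ordered per the matrix."""
    return (
        first.ingress_port == second.ingress_port
        and first.egress_port == second.egress_port
        and first.traffic_class == second.traffic_class
    )


a = Tlp(ingress_port=1, egress_port=0, traffic_class=0)
b = Tlp(ingress_port=1, egress_port=0, traffic_class=0)
c = Tlp(ingress_port=2, egress_port=0, traffic_class=0)
print(ordering_applies(a, b), ordering_applies(a, c))  # True False
```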

Completion Ordering

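Completions follow specific rules to prevent deadlock and ensure forward progress: completions with the same Transaction ID must stay in order, completions with different Transaction IDs may pass each other, and a completion must be able to pass Non-Posted requests.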

Practical Implications

DMA Engine Design

  // Correct DMA completion signaling
  1. DMA engine writes data to host memory (MWr)
  2. DMA engine writes completion entry (MWr)
  3. DMA engine sends MSI-X interrupt (MWr)

  // Because Posted requests maintain order:
  // - Data arrives before completion entry
  // - Completion entry arrives before interrupt
  // - CPU sees consistent state when handling interrupt
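The value of this ordering is easiest to see by comparing it with a counterfactual. The sketch below is a toy model, not driver code: host memory is a dictionary, the interrupt handler runs the moment the MSI-X write lands, and the addresses and the deliver function are hypothetical. The reorder flag simulates what could happen if the three writes were not kept in order.

```python
DATA_ADDR = 0x1000      # hypothetical DMA buffer in host memory
CQ_ADDR = 0x2000        # hypothetical completion queue entry
MSIX_ADDR = 0xFEE00000  # hypothetical MSI-X message address


def deliver(writes, reorder=False):
    """Apply Posted writes to host memory and report what the handler saw.

    Returns the (completion entry, data) pair visible to the CPU at the
    moment the interrupt write arrives.
    """
    host_memory = {}
    seen_by_handler = None
    sequence = list(reversed(writes)) if reorder else list(writes)
    for addr, value in sequence:
        if addr == MSIX_ADDR:
            # The interrupt handler runs and inspects host memory.
            seen_by_handler = (host_memory.get(CQ_ADDR), host_memory.get(DATA_ADDR))
        else:
            host_memory[addr] = value
    return seen_by_handler


dma_writes = [(DATA_ADDR, "payload"), (CQ_ADDR, "done"), (MSIX_ADDR, 1)]
print(deliver(dma_writes))                # ('done', 'payload')
print(deliver(dma_writes, reorder=True))  # (None, None)
```

In the ordered case the handler finds both the completion entry and the data; in the reordered case it runs before either has landed.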

Read-After-Write Hazard

RAW Hazard with Peer Devices

Problem: Device A writes to Device B, then reads back.

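The read might return stale data if it is not ordered behind the write, for example when the two travel in different Traffic Classes or over different paths.

Solution: Use a read from Device B as a fence. Issue it in the same Traffic Class as the write and wait for the completion: the read request cannot pass the earlier Posted write, so the completion shows that the write has reached Device B.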
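The fence can be sketched with a toy link model. It assumes the write and the read travel in the same Traffic Class over the same path, so the read request stays behind the Posted write; the function write_then_fenced_read and its parameters are hypothetical.

```python
from collections import deque


def write_then_fenced_read(addr, old_value, new_value):
    """Device A writes to Device B, then reads the same address back.

    Both requests share one in-order link (same TC, same path), so the
    read request cannot overtake the Posted write ahead of it.
    """
    device_b = {addr: old_value}
    link = deque()
    link.append(("MWr", addr, new_value))  # Posted write from Device A
    link.append(("MRd", addr, None))       # Read used as a fence

    completion = None
    while link:
        kind, target, value = link.popleft()
        if kind == "MWr":
            device_b[target] = value
        else:
            # Device B services the read only after the write has landed.
            completion = device_b[target]
    return completion


print(write_then_fenced_read(0x10, "stale", "fresh"))  # fresh
```

Once the completion comes back, Device A knows the earlier write is visible at Device B.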