The Producer-Consumer Ordering Model
PCIe inherits its ordering model from PCI. The model is designed to support the common producer-consumer programming pattern without requiring explicit synchronization in most cases.
Producer-Consumer Pattern
Scenario: CPU writes data to device memory, then writes to a "doorbell" register to signal completion.
- CPU issues Memory Write #1 (data payload)
- CPU issues Memory Write #2 (doorbell)
- Device sees doorbell, reads data
Requirement: Write #2 must not arrive before Write #1. PCIe guarantees this because a Posted request cannot pass an earlier Posted request in the same Traffic Class unless it carries a relaxing attribute (RO or IDO).
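The scenario above can be sketched as a toy model. This is a minimal illustration, not a hardware model: the `Device` class, its two addresses, and `deliver_in_order` are hypothetical names chosen for the example.

```python
from collections import deque


class Device:
    """Toy device with a data buffer and a doorbell register (hypothetical layout)."""
    DATA_ADDR = 0x1000
    DOORBELL_ADDR = 0x2000

    def __init__(self):
        self.data = None
        self.consumed = None

    def receive_write(self, addr, value):
        if addr == self.DATA_ADDR:
            self.data = value
        elif addr == self.DOORBELL_ADDR:
            # Doorbell rings: consume whatever is in the buffer right now.
            self.consumed = self.data


def deliver_in_order(writes, device):
    """Deliver posted writes to the device in exactly the order given."""
    queue = deque(writes)
    while queue:
        addr, value = queue.popleft()
        device.receive_write(addr, value)


# Strict posted ordering: data arrives first, then the doorbell.
dev = Device()
deliver_in_order([(Device.DATA_ADDR, "payload"), (Device.DOORBELL_ADDR, 1)], dev)

# Counter-example: if the doorbell could pass the data write,
# the device would consume an empty buffer.
bad = Device()
deliver_in_order([(Device.DOORBELL_ADDR, 1), (Device.DATA_ADDR, "payload")], bad)

print(dev.consumed, bad.consumed)
```

The second run shows why the ordering guarantee matters: without it, the consumer acts on the doorbell before the data it is supposed to read has landed.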
Complete Ordering Matrix
This matrix shows whether a Row transaction may pass a previously issued Column transaction within a single queue (same Traffic Class, same Virtual Channel).
| Row passes Column → | Posted Request (MWr, Msg) | Non-Posted Request (MRd, CfgRd/Wr) | Read Completion (CplD) |
|---|---|---|---|
| Posted Request (MWr, Msg) | No* | Yes | Yes |
| Non-Posted Request (MRd, CfgRd/Wr) | No | Yes** | Yes |
| Read Completion (CplD) | No | Yes | Yes |
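* Posted passing Posted: Allowed only if the passing request has the Relaxed Ordering (RO) bit set, or has IDO set and a different Requester ID. Without these attributes, posted writes stay in order regardless of address.
** NP passing NP: Permitted but not required. Non-Posted requests have no ordering requirement among themselves, so no attribute bit is needed.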
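The matrix can be encoded as a small lookup table. The sketch below covers only the baseline case with no attribute bits set, and it takes "Completion passes Non-Posted" as permitted, in line with the rule that completions must not be blocked by requests. The names `Kind`, `BASELINE`, and `may_pass` are illustrative.

```python
from enum import Enum


class Kind(Enum):
    POSTED = "Posted"
    NON_POSTED = "Non-Posted"
    COMPLETION = "Completion"


P, NP, CPL = Kind.POSTED, Kind.NON_POSTED, Kind.COMPLETION

# (row, column) -> may a later "row" TLP pass an earlier "column" TLP?
# Baseline only: RO and IDO are assumed clear on every TLP.
BASELINE = {
    (P, P): False,   (P, NP): True,   (P, CPL): True,
    (NP, P): False,  (NP, NP): True,  (NP, CPL): True,
    (CPL, P): False, (CPL, NP): True, (CPL, CPL): True,
}


def may_pass(later, earlier):
    """Return True if `later` is allowed to overtake `earlier` in the same TC."""
    return BASELINE[(later, earlier)]


# Nothing may pass an earlier Posted request in the baseline case.
print([may_pass(k, Kind.POSTED) for k in Kind])
```

Reading down the Posted column gives the property that producer-consumer depends on: every transaction type queues behind earlier posted writes.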
Ordering Attributes Deep Dive
Relaxed Ordering (Attr[1])
When the RO bit is set in a Memory Write request, that write is permitted to pass earlier Posted requests (Memory Writes and Messages) in the same Traffic Class, regardless of address. This trades strict write ordering for performance.
Relaxed Ordering Example
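As a rough illustration, the sketch below enumerates every delivery order that is legal for a short sequence of posted writes, assuming a write may overtake an earlier one only when its own RO bit is set. The function name `legal_orders` and the write names are hypothetical.

```python
from itertools import permutations


def legal_orders(writes):
    """writes: list of (name, ro) tuples in issue order.

    Returns every delivery order in which a write overtakes an earlier
    write only if the overtaking write has its RO bit set.
    """
    result = []
    for perm in permutations(range(len(writes))):
        ok = True
        for pos, a in enumerate(perm):
            for b in perm[pos + 1:]:
                # `a` is delivered before `b`. If `a` was issued after `b`,
                # then `a` passed `b`, which requires RO on `a`.
                if a > b and not writes[a][1]:
                    ok = False
        if ok:
            result.append([writes[i][0] for i in perm])
    return result


# Two data writes (the second with RO set) followed by a doorbell without RO.
orders = legal_orders([("data0", False), ("data1", True), ("doorbell", False)])
for order in orders:
    print(order)
```

In both legal orders the doorbell arrives last, because it does not carry RO. Setting RO on data writes while leaving it clear on the final signalling write is one way to gain reordering freedom without breaking producer-consumer.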
When NOT to Use Relaxed Ordering
- When write order matters for correctness (producer-consumer)
- When writing to memory-mapped I/O with side effects
- When subsequent reads depend on write completion
ID-Based Ordering (Attr[2])
IDO allows transactions from different Requester IDs to pass each other, even within the same VC. This enables independent transaction streams.
IDO Use Cases
- SR-IOV: Different Virtual Functions can have independent ordering domains
- Multi-queue NICs: Each queue operates independently
- NVMe: Separate submission queues don't need strict ordering between them
IDO Example
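A minimal sketch of the passing decision for posted writes, assuming the rule that a later write may pass an earlier one if it has RO set, or if it has IDO set and comes from a different Requester ID. The `PostedWrite` class and the Requester ID values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostedWrite:
    requester_id: int  # illustrative 16-bit bus/device/function value
    ro: bool = False
    ido: bool = False


def may_pass(later, earlier):
    """May `later` overtake `earlier` within the same Traffic Class?"""
    if later.ro:
        return True
    return later.ido and later.requester_id != earlier.requester_id


# Two writes from one Virtual Function and one write from another.
vf0_first = PostedWrite(requester_id=0x0100, ido=True)
vf0_second = PostedWrite(requester_id=0x0100, ido=True)
vf1_write = PostedWrite(requester_id=0x0101, ido=True)

print(may_pass(vf1_write, vf0_first))   # different requester: may pass
print(may_pass(vf0_second, vf0_first))  # same requester: stays ordered
```

Each requester's own stream stays ordered, so producer-consumer still works within a Virtual Function or queue, while unrelated streams no longer block each other.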
No Snoop (Attr[0])
The No Snoop attribute is a hint that the data does not require hardware cache coherency management. This does NOT affect ordering—it's purely a performance optimization.
| NS Value | Behavior | Use Case |
|---|---|---|
| 0 | Normal snooping (cache coherent) | Default, shared data |
| 1 | Skip cache coherency check | Streaming data, device-only memory |
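The three attribute bits described in this section can be decoded from a 3-bit Attr value as follows. This is a small sketch; `decode_attr` and the dictionary keys are names chosen for the example, and the input is assumed to be the Attr[2:0] field already extracted from the TLP header.

```python
def decode_attr(attr):
    """Split a 3-bit Attr[2:0] value into its three flags."""
    if not 0 <= attr <= 0b111:
        raise ValueError("Attr is a 3-bit field")
    return {
        "no_snoop": bool(attr & 0b001),           # Attr[0]
        "relaxed_ordering": bool(attr & 0b010),   # Attr[1]
        "id_based_ordering": bool(attr & 0b100),  # Attr[2]
    }


flags = decode_attr(0b110)
print(flags)
```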
Ordering Through Switches
Switches must maintain ordering for transactions passing through. The ordering rules apply at each ingress/egress port pair.
Switch Ordering Requirements
- Transactions in the same TC that enter the same ingress port and exit the same egress port must maintain relative order per the matrix
- Transactions from different ingress ports have no ordering relationship
- Different TCs (mapped to different VCs) have no ordering relationship
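The three requirements above reduce to a single predicate: the ordering matrix applies to a pair of transactions only when they share ingress port, egress port, and Traffic Class. A minimal sketch, with `Tlp` and `ordering_applies` as hypothetical names:

```python
from typing import NamedTuple


class Tlp(NamedTuple):
    ingress: int  # switch port the TLP entered on
    egress: int   # switch port the TLP leaves on
    tc: int       # Traffic Class, 0-7


def ordering_applies(a, b):
    """True if the switch must apply the ordering matrix to this pair."""
    return a.ingress == b.ingress and a.egress == b.egress and a.tc == b.tc


same_flow = ordering_applies(Tlp(1, 0, 0), Tlp(1, 0, 0))
other_ingress = ordering_applies(Tlp(1, 0, 0), Tlp(2, 0, 0))
other_tc = ordering_applies(Tlp(1, 0, 0), Tlp(1, 0, 3))
print(same_flow, other_ingress, other_tc)
```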
Completion Ordering
Completions follow specific rules to prevent deadlock and ensure forward progress:
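- Completions are never blocked by Non-Posted requests: A completion must be able to pass queued reads and other Non-Posted requests, which ensures forward progress
- Completions can pass completions: No strict ordering between different transactions' completions, but completions for the same request must stay in order
- Completions cannot pass Posted requests (unless RO is set): A read completion pushes earlier writes ahead of it, which prevents reordering that could violate producer-consumer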
Practical Implications
DMA Engine Design
Read-After-Write Hazard
RAW Hazard with Peer Devices
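Problem: Device A writes to Device B, then reads the data back.
The read can return stale data if it is not ordered behind the write, for example when the two travel in different Traffic Classes or take different paths.
Solution: Issue the read to Device B in the same TC as the write and treat it as a fence. A Non-Posted read request cannot pass the earlier Posted write, so its completion returns only after the write has reached Device B.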
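The fence behaviour can be sketched with a toy model of Device B. It assumes that a read travelling on the same ordered path drains all earlier posted writes before it is serviced, while a read on some unordered path does not. The `PeerDevice` class and both read methods are hypothetical.

```python
class PeerDevice:
    """Toy model of Device B with posted writes still in flight."""

    def __init__(self):
        self.mem = {}
        self.in_flight = []  # posted writes that have not landed yet

    def post_write(self, addr, value):
        # Posted: the writer gets no acknowledgement when this lands.
        self.in_flight.append((addr, value))

    def read_same_path(self, addr):
        # The read request cannot pass earlier posted writes, so every
        # queued write lands before the read is serviced.
        for a, v in self.in_flight:
            self.mem[a] = v
        self.in_flight.clear()
        return self.mem.get(addr)

    def read_unordered_path(self, addr):
        # Hypothetical path with no ordering relationship to the writes
        # (for example a different TC): nothing is pushed ahead of the read.
        return self.mem.get(addr)


dev_b = PeerDevice()
dev_b.mem[0x10] = "old"
dev_b.post_write(0x10, "new")

stale = dev_b.read_unordered_path(0x10)
fenced = dev_b.read_same_path(0x10)
print(stale, fenced)
```

Once the fenced read has returned, Device A knows the write has landed, so the same read can also serve as a flush before signalling a third party.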