ERROR HANDLING DEEP-DIVE

DPC (Downstream Port Containment)

Complete technical guide to error containment, trigger conditions, RP PIO errors, and software recovery

1. What is DPC?

What is Downstream Port Containment?

Downstream Port Containment (DPC) is an optional extended capability that provides a mechanism for Downstream Ports (Root Ports and Switch Downstream Ports) to contain uncorrectable errors and enable software to recover from them without system-wide impact.

Key Concepts

DPC Capability Requirements

2. Why Use DPC?

Why is DPC needed?

Without DPC, an uncorrectable error below a port is often handled by resetting or crashing the whole system. DPC enables fine-grained error containment: the affected link is isolated, and software can recover the failed device while the rest of the system keeps running.

Problems DPC Solves

Benefits

3. DPC Trigger Conditions

Trigger Enable Settings

DPC Trigger Enable    Encoding    Trigger Conditions
Disabled              00b         DPC not active
ERR_FATAL Only        01b         Unmasked uncorrectable error OR ERR_FATAL Message
ERR_NONFATAL/FATAL    10b         Unmasked uncorrectable error OR ERR_NONFATAL/ERR_FATAL Message
Reserved              11b         Reserved
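
The encoding above can be expressed as a small lookup. This is a minimal sketch, not driver code: it assumes DPC Trigger Enable is the 2-bit field in bits 1:0 of the DPC Control register (check this against the specification revision you target), and the helper names `decode_trigger_enable` and `triggers_on` are hypothetical.

```python
# Assumption: DPC Trigger Enable occupies bits 1:0 of the DPC Control register.
TRIGGER_ENABLE_MASK = 0x0003

TRIGGER_ENABLE_MODES = {
    0b00: "Disabled",
    0b01: "ERR_FATAL Only",
    0b10: "ERR_NONFATAL/FATAL",
    0b11: "Reserved",
}


def decode_trigger_enable(dpc_control: int) -> str:
    """Return the trigger-enable mode name for a 16-bit DPC Control value."""
    return TRIGGER_ENABLE_MODES[dpc_control & TRIGGER_ENABLE_MASK]


def triggers_on(dpc_control: int, event: str) -> bool:
    """Report whether `event` triggers DPC under the given control value.

    `event` is one of "uncorrectable" (unmasked uncorrectable error),
    "ERR_FATAL" or "ERR_NONFATAL" (received Messages).
    """
    mode = dpc_control & TRIGGER_ENABLE_MASK
    if mode == 0b01:
        return event in ("uncorrectable", "ERR_FATAL")
    if mode == 0b10:
        return event in ("uncorrectable", "ERR_FATAL", "ERR_NONFATAL")
    return False  # Disabled or Reserved: this sketch treats both as inactive
```

For example, `triggers_on(0x0001, "ERR_NONFATAL")` is false because the 01b setting reacts only to uncorrectable errors and ERR_FATAL Messages.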

Additional Triggers

Trigger Reason Encoding

DPC Trigger Reason     Value    Description
Uncorrectable Error    00b      Unmasked uncorrectable error detected
ERR_NONFATAL           01b      Received ERR_NONFATAL Message
ERR_FATAL              10b      Received ERR_FATAL Message
Extension              11b      See Trigger Reason Extension field

Trigger Reason Extension

Extension Value    Description
00b                RP PIO error
01b                DPC Software Trigger
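
The two encodings combine as follows: software reads Trigger Reason first and consults the Extension field only when the reason is 11b. The sketch below illustrates that decoding. The bit positions (Trigger Status in bit 0, Trigger Reason in bits 2:1, RP Busy in bit 4, Trigger Reason Extension in bits 6:5) are assumptions to verify against the specification, and `DpcStatus` and `decode_dpc_status` are hypothetical names.

```python
from typing import NamedTuple

# Assumed DPC Status bit layout (verify against the specification):
#   bit 0     DPC Trigger Status
#   bits 2:1  DPC Trigger Reason
#   bit 4     DPC RP Busy
#   bits 6:5  DPC Trigger Reason Extension
REASONS = {
    0b00: "Unmasked uncorrectable error detected",
    0b01: "Received ERR_NONFATAL Message",
    0b10: "Received ERR_FATAL Message",
}
EXTENSIONS = {
    0b00: "RP PIO error",
    0b01: "DPC Software Trigger",
}


class DpcStatus(NamedTuple):
    triggered: bool
    reason: str
    rp_busy: bool


def decode_dpc_status(status: int) -> DpcStatus:
    """Decode a 16-bit DPC Status value into its main fields."""
    triggered = bool(status & 0x0001)
    reason_code = (status >> 1) & 0x3
    if reason_code == 0b11:
        # Reason 11b defers to the Trigger Reason Extension field.
        extension = (status >> 5) & 0x3
        reason = EXTENSIONS.get(extension, "Reserved")
    else:
        reason = REASONS[reason_code]
    return DpcStatus(triggered, reason, bool(status & 0x0010))
```

The reason string is only meaningful while `triggered` is true; the sketch decodes it unconditionally to stay short.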

4. DPC Operational Flow

           ┌─────────────────────────────────────────────────────────────┐
           │                    NORMAL OPERATION                        │
           │                   (DPC Trigger Status = 0)                 │
           └────────────────────────┬────────────────────────────────────┘
                                    │
                                    │ Trigger Event (Error detected)
                                    ▼
           ┌─────────────────────────────────────────────────────────────┐
           │                    DPC TRIGGERED                            │
           │                   (DPC Trigger Status = 1)                 │
           │                                                             │
           │  Actions:                                                   │
           │  1. Set DPC Trigger Status bit                             │
           │  2. Set DPC Trigger Reason                                 │
           │  3. Capture Error Source ID (if ERR_* Message)             │
           │  4. Direct LTSSM to Disabled state                         │
           │  5. Log RP PIO info (if applicable)                        │
           │  6. Generate interrupt (if enabled)                        │
           │  7. Send ERR_COR (if enabled)                              │
           └────────────────────────┬────────────────────────────────────┘
                                    │
                                    │ (Port is "in DPC")
                                    ▼
           ┌─────────────────────────────────────────────────────────────┐
           │                    LINK DISABLED                            │
           │                                                             │
           │  - LTSSM in Disabled state                                 │
           │  - No TLP transmission/reception                           │
           │  - Incoming requests get UR/CA completion                  │
           │  - Wait for software intervention                          │
           └────────────────────────┬────────────────────────────────────┘
                                    │
                                    │ Software clears DPC Trigger Status
                                    │ (after DPC RP Busy = 0)
                                    ▼
           ┌─────────────────────────────────────────────────────────────┐
           │                    RECOVERY                                 │
           │                                                             │
           │  1. LTSSM exits Disabled state                             │
           │  2. LTSSM enters Detect state                              │
           │  3. Link retrains                                          │
           │  4. Normal operation resumes                               │
           │  5. Software re-enumerates/recovers device                 │
           └─────────────────────────────────────────────────────────────┘
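
The flow above can be mirrored in a toy state model, which may help when writing tests for recovery software. This is only an illustration under simplifying assumptions: the class and method names are hypothetical, hardware timing is not modeled, and raising an exception when DPC RP Busy is Set is a choice of this model (on real hardware the result of such a write is undefined).

```python
from enum import Enum


class PortState(Enum):
    NORMAL = "normal operation"
    IN_DPC = "link disabled (in DPC)"


class DpcPortModel:
    """Toy model of a Downstream Port's DPC behavior (not real hardware)."""

    def __init__(self):
        self.trigger_status = 0
        self.trigger_reason = None
        self.rp_busy = False
        self.log = []

    @property
    def state(self) -> PortState:
        return PortState.IN_DPC if self.trigger_status else PortState.NORMAL

    def trigger(self, reason: str) -> None:
        """Enter DPC: set status and reason, direct the LTSSM to Disabled."""
        if self.trigger_status:
            return  # already in DPC; the first trigger stays recorded
        self.trigger_status = 1
        self.trigger_reason = reason
        self.log.append("LTSSM -> Disabled")

    def clear_trigger_status(self) -> None:
        """Software clears DPC Trigger Status; the link then retrains."""
        if self.rp_busy:
            # Model choice: refuse loudly instead of emulating undefined behavior.
            raise RuntimeError("DPC RP Busy is Set; do not clear Trigger Status")
        if self.trigger_status:
            self.trigger_status = 0
            self.log.append("LTSSM -> Detect")
```

A test can call `trigger("ERR_FATAL")`, check that `state` is `PortState.IN_DPC`, then call `clear_trigger_status()` and confirm that the port is back in `PortState.NORMAL`.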

5. RP PIO (Root Port Programmed I/O) Errors

What are RP PIO Errors?

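Root Port Programmed I/O (RP PIO) errors occur when a Root Port issues a non-posted request on behalf of the CPU and that request ends in an error: an Unsupported Request (UR) Completion, a Completer Abort (CA) Completion, or a Completion Timeout (CTO):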

Error Type    Request Type     Description
Cfg UR Cpl    Configuration    Config request received UR Completion
Cfg CA Cpl    Configuration    Config request received CA Completion
Cfg CTO       Configuration    Config request Completion Timeout
I/O UR Cpl    I/O              I/O request received UR Completion
I/O CA Cpl    I/O              I/O request received CA Completion
I/O CTO       I/O              I/O request Completion Timeout
Mem UR Cpl    Memory           Memory request received UR Completion
Mem CA Cpl    Memory           Memory request received CA Completion
Mem CTO       Memory           Memory request Completion Timeout
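
Each of the nine error types is reported as one bit in the RP PIO Status register. The sketch below names the bits that are set in a status value. The bit positions (0 to 2 for Configuration, 8 to 10 for I/O, 16 to 18 for Memory) are an assumption to check against the specification, and `decode_rp_pio_status` is a hypothetical helper.

```python
from typing import List

# Assumed bit positions in the 32-bit RP PIO Status register.
RP_PIO_ERRORS = {
    0: "Cfg UR Cpl",
    1: "Cfg CA Cpl",
    2: "Cfg CTO",
    8: "I/O UR Cpl",
    9: "I/O CA Cpl",
    10: "I/O CTO",
    16: "Mem UR Cpl",
    17: "Mem CA Cpl",
    18: "Mem CTO",
}


def decode_rp_pio_status(status: int) -> List[str]:
    """Return the names of all RP PIO errors flagged in `status`, lowest bit first."""
    return [name for bit, name in RP_PIO_ERRORS.items() if status & (1 << bit)]
```

With this layout, a value with bits 2 and 18 set decodes to a Configuration and a Memory Completion Timeout.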

RP PIO Registers

6. DPC Extended Capability Structure

Register Layout

Offset    Register                      Description
00h       Extended Capability Header    ID = 001Dh, Version = 1h
04h       DPC Capability                Supported features
06h       DPC Control                   Enable and configuration
08h       DPC Status                    Trigger status and reason
0Ah       DPC Error Source ID           Requester ID from ERR_* Message
0Ch+      RP PIO Registers              Only for Root Ports with RP Extensions
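
To use this layout, software first has to locate the capability in the port's extended configuration space. The sketch below walks the extended capability list in a raw configuration-space dump and reads the four 16-bit DPC registers at the offsets listed above. It assumes the usual extended capability header format (ID in bits 15:0, version in bits 19:16, next pointer in bits 31:20) with the list starting at offset 100h. The function names are hypothetical.

```python
import struct
from typing import Optional

DPC_EXT_CAP_ID = 0x001D
EXT_CAP_START = 0x100  # extended capabilities start here in config space


def find_ext_capability(cfg: bytes, cap_id: int) -> Optional[int]:
    """Return the offset of the extended capability `cap_id`, or None."""
    offset = EXT_CAP_START
    visited = set()  # guards against a malformed, looping list
    while offset and offset not in visited and offset + 4 <= len(cfg):
        visited.add(offset)
        (header,) = struct.unpack_from("<I", cfg, offset)
        if header in (0, 0xFFFFFFFF):
            return None  # empty list or unreadable config space
        if header & 0xFFFF == cap_id:
            return offset
        offset = (header >> 20) & 0xFFC  # next pointer, DWORD aligned
    return None


def read_dpc_registers(cfg: bytes, base: int) -> dict:
    """Read the 16-bit DPC registers at offsets 04h, 06h, 08h and 0Ah."""
    capability, control, status, source_id = struct.unpack_from("<HHHH", cfg, base + 0x04)
    return {
        "capability": capability,
        "control": control,
        "status": status,
        "error_source_id": source_id,
    }
```

On Linux, the input bytes could come from the device's sysfs `config` file; reading past the first 256 bytes typically requires root privileges.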

DPC Capability Register Fields

DPC Control Register Fields

7. Poisoned TLP Egress Blocking

What is Poisoned TLP Blocking?

When enabled, the Downstream Port blocks poisoned (EP=1) TLPs from being transmitted out of its egress, that is, downstream toward the device. This keeps data already known to be corrupted from reaching the device below the port. The blocked TLP is logged as an error, which can trigger DPC.

   WITHOUT POISONED TLP EGRESS BLOCKING:
   ┌────────────┐    Poisoned TLP    ┌─────────┐
   │ Downstream │ ─────────────────► │ Device  │  ← Device receives corrupted data!
   │ Port       │                    └─────────┘
   └────────────┘

   WITH POISONED TLP EGRESS BLOCKING:
   ┌────────────┐    Poisoned TLP    ┌─────────┐
   │ Downstream │ ─────────► X       │ Device  │  ← TLP blocked
   │ Port       │        BLOCKED     └─────────┘
   └────────────┘
         │
         ▼
    Error logged,
    DPC triggered

Blocking Behavior

8. Software Recovery Procedure

Recovery Steps

DPC Software Recovery Sequence

  1. Detect DPC: Read DPC Status, check Trigger Status = 1
  2. Identify Cause: Read Trigger Reason, Error Source ID
  3. Log Error Info: Read RP PIO logs if applicable
  4. Wait for Busy: Poll DPC RP Busy until = 0
  5. Clear Trigger: Write 1 to DPC Trigger Status to clear
  6. Honor Timing: Wait for link retraining (conventional reset timing)
  7. Re-enumerate: Re-discover and configure affected devices
  8. Restore State: Restore device configuration
  9. Resume Operation: Resume normal device operation
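
Steps 1 through 5 of the sequence above can be sketched as follows. This is a hedged illustration, not a reference implementation: `read16` and `write16` stand for whatever configuration-space accessors the platform provides, the status bit positions (Trigger Status in bit 0, RP Busy in bit 4) are assumptions to verify against the specification, and the timeout values are arbitrary. Steps 6 through 9 (link retraining, re-enumeration, state restore) depend on the device and are left to the caller.

```python
import time
from typing import Callable, Optional

DPC_STATUS_OFFSET = 0x08
DPC_ERROR_SOURCE_ID_OFFSET = 0x0A
STATUS_TRIGGER = 0x0001  # assumed bit 0; cleared by writing 1
STATUS_RP_BUSY = 0x0010  # assumed bit 4


def recover_from_dpc(
    read16: Callable[[int], int],
    write16: Callable[[int, int], None],
    dpc_base: int,
    busy_timeout_s: float = 1.0,
    poll_interval_s: float = 0.01,
) -> Optional[dict]:
    """Run steps 1-5; return the captured error info, or None if not in DPC."""
    # Step 1: detect DPC.
    status = read16(dpc_base + DPC_STATUS_OFFSET)
    if not status & STATUS_TRIGGER:
        return None

    # Steps 2-3: capture the cause before clearing anything.
    info = {
        "status": status,
        "error_source_id": read16(dpc_base + DPC_ERROR_SOURCE_ID_OFFSET),
    }

    # Step 4: wait until DPC RP Busy is Clear.
    deadline = time.monotonic() + busy_timeout_s
    while read16(dpc_base + DPC_STATUS_OFFSET) & STATUS_RP_BUSY:
        if time.monotonic() > deadline:
            raise TimeoutError("DPC RP Busy did not clear")
        time.sleep(poll_interval_s)

    # Step 5: write 1 to DPC Trigger Status to release the port.
    write16(dpc_base + DPC_STATUS_OFFSET, STATUS_TRIGGER)
    return info
```

Passing the accessors in as callables keeps the sketch independent of any particular platform interface and makes it easy to exercise against a fake register file.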

Timing Requirements

Important: DPC RP Busy

Software must NOT clear DPC Trigger Status while DPC RP Busy is Set. The Root Port may be completing internal activities. Clearing while busy results in undefined behavior.

9. ERR_COR Signaling for DPC

ERR_COR Message Usage

A port can signal that DPC has been triggered by sending an ERR_COR Message with a specific ERR_COR Subclass (ECS) value:

ERR_COR Subclass Encoding

ECS Value    Name            Use
00b          ECS Legacy      Non-ECS capable Requesters
01b          ECS SIG_SFW     DPC or SFI events
10b          ECS SIG_OS      AER or RP PIO events
11b          ECS Extended    Reserved for future
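
A receiver of ERR_COR Messages might represent this encoding as an enumeration. The sketch below only restates the table; `ErrCorSubclass` and `describe_ecs` are hypothetical names, and where the ECS field sits inside the Message is deliberately not modeled.

```python
from enum import IntEnum


class ErrCorSubclass(IntEnum):
    ECS_LEGACY = 0b00
    ECS_SIG_SFW = 0b01
    ECS_SIG_OS = 0b10
    ECS_EXTENDED = 0b11


ECS_USES = {
    ErrCorSubclass.ECS_LEGACY: "Non-ECS capable Requesters",
    ErrCorSubclass.ECS_SIG_SFW: "DPC or SFI events",
    ErrCorSubclass.ECS_SIG_OS: "AER or RP PIO events",
    ErrCorSubclass.ECS_EXTENDED: "Reserved for future",
}


def describe_ecs(value: int) -> str:
    """Return 'name: use' for a 2-bit ECS value taken from the table above."""
    subclass = ErrCorSubclass(value & 0x3)
    return f"{subclass.name}: {ECS_USES[subclass]}"
```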

10. Flit Mode Considerations

Differences in Flit Mode

Header Log Size Requirements

Mode               RP PIO Header Log Size    Total RP PIO Log Size
Non-Flit Mode      4 DW (16 bytes)           ≥4 DW
Flit Mode (min)    13 DW (52 bytes)          Per Section 6.2.11.3
Flit Mode (max)    19 DW (76 bytes)          Per Section 6.2.11.3

11. System-Level Applications

Use Cases

Integration with AER

DPC works alongside Advanced Error Reporting (AER):