Collector of Streams
Works by Vasyl Klymyk at the Center for Culture and Arts
An exhibition by the noted Ukrainian artist Vasyl Klymyk ran in April at the Hryhorii Synytsia Picture Gallery of the Center for Culture and Arts at Igor Sikorsky Kyiv Polytechnic Institute.
PCIe 7.0: Addressing legacy ordering limitations with UIO

Part 1 of this mini-series on PCIe 7.0 fundamentals explained the baseline ordering rules and the distinction between relaxed ordering and ID-based ordering. Part 2 explains why PCIe 7.0 bandwidth alone isn’t enough and how UIO addresses legacy ordering limitations in this version of the high-speed serial interface specification.
As noted earlier, PCIe 7.0 doubles raw link bandwidth compared to PCIe 6.0, increasing full‑duplex throughput from 256 GB/s to 512 GB/s on an x16 link by raising the signaling rate to 128 GT/s in flit mode. However, raw bandwidth does not directly translate into sustained throughput in AI factories.
Large‑scale training and inference systems generate traffic patterns such as GPU collective operations, sharded parameter broadcasts, gradient reductions, and streaming access to disaggregated accelerator and memory resources. These patterns include many independent data streams that cross the PCIe fabric concurrently and continuously.
The legacy ordering model inherited from earlier PCIe generations, including strict ordering, relaxed ordering, and ID‑based ordering, was designed around a producer-consumer abstraction in which ordering conveys semantic meaning to software. Relaxed ordering and ID-based ordering loosen this model selectively.
Relaxed ordering allows certain transactions to bypass global ordering constraints, while still participating in fabric‑enforced ordering rules. ID-based ordering further scopes ordering guarantees to a requester or execution context, preserving program order within that scope. In both cases, the PCIe fabric requires tracking and enforcement of ordering relationships to ensure correctness.
However, fabric-enforced ordering introduces head-of-the-line blocking, increases buffering pressure, and restricts the ability of switches and endpoints to exploit parallel paths, particularly in the multi-path and non-tree topologies common in modern AI systems. These effects reduce effective link utilization even though physical bandwidth is available, making it difficult for highly parallel AI workloads to keep PCIe 7.0 links continuously busy.
Addressing legacy ordering limitations with UIO
The unordered I/O (UIO) engineering change notice (ECN) was introduced in the PCIe 6.1 specification and included in PCIe 7.0 to address the specific limitation noted above. UIO introduces a wire-level semantic that shifts producer-consumer ordering responsibility from the fabric to the endpoints. The UIO ECN declares that ordering may be irrelevant for certain traffic classes.
For AI factory workloads, where operations such as reductions, parameter streaming, and telemetry are independent or statistically aggregated and never consumed in program order, enforcing any form of ordering (even per‑ID ordering) adds overhead. UIO removes fabric‑enforced ordering, enabling true multi‑path parallelism and reducing buffering requirements.
This allows PCIe fabrics to sustain higher utilization for concurrent AI traffic. Since UIO enables independent transactions from different request originators to bypass one another safely, AI systems can fully exploit PCIe 7.0’s increased bandwidth to support rapidly growing model sizes and highly parallel GPU workloads.
UIO is especially effective at reducing read latency because multiple UIO read completions for a single UIO read request may be returned in any address order. This same flexibility applies to UIO write completions, with the additional capability that write completions for the same transaction ID may be coalesced. Since every UIO request has a corresponding completion, the request originator maintains the ordering of its own transactions. This allows the PCIe fabric to forward traffic along multiple paths without violating semantic correctness.
With its low latency, UIO transforms PCIe fabrics into high-throughput, highly parallel forwarding planes capable of accommodating modern AI workloads. Instead of relying on the fabric to manage per-flow sequencing, UIO shifts ordering control back to the source device that initiates the requests.
How UIO reduces latency and unlocks concurrency in AI applications
UIO’s command set and wire semantics reduce latency and boost performance for AI training and inference in several ways.
First, UIO mandates completions for all UIO requests. This gives GPU endpoints precise end-to-end flow control and prevents posted-write “fire and forget” bursts from clogging switch queues. It also cuts head-of-the-line blocking and shortens tail latency, speeding up requests by allowing different types of requests to bypass each other without applying any ordering rules within the PCIe fabric.
One of the classic head-of-the-line blocking examples in the baseline strict ordering rule is that current read requests are not permitted to bypass previous write requests. UIO eliminates this rule, allowing read and write requests to be processed in parallel and completed in any order, as shown in Figure 1.

Figure 1 UIO read and write requests are processed in parallel at the application layer. Source: Cadence Design Systems
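To make the benefit concrete, here is a toy Python sketch, not a PCIe model; the transaction names and service times are invented. It contrasts short reads stuck behind a large write under the baseline rules with the same stream under UIO:

```python
# Toy model of head-of-the-line blocking: a slow write ahead of short
# reads. Illustrates why removing the "reads may not pass writes" rule
# shortens read latency; all timings are made up for illustration.

from dataclasses import dataclass

@dataclass
class Txn:
    name: str
    kind: str          # "MWR" (posted write) or "MRD" (read)
    service_time: int  # arbitrary time units at the completer

stream = [
    Txn("write0", "MWR", 100),  # large posted write
    Txn("read0", "MRD", 10),
    Txn("read1", "MRD", 10),
]

def finish_times_strict(txns):
    """Baseline rule: an MRD may not pass a previous MWR, so every
    request behind the write waits for it (simple FIFO service)."""
    t, done = 0, {}
    for txn in txns:
        t += txn.service_time
        done[txn.name] = t
    return done

def finish_times_uio(txns):
    """UIO: no fabric-enforced ordering between these requests, so an
    idealized completer can serve them all concurrently."""
    return {txn.name: txn.service_time for txn in txns}

print("strict:", finish_times_strict(stream))  # read1 done at t=120
print("UIO:   ", finish_times_uio(stream))     # read1 done at t=10
```

In the strict case the reads inherit the write’s full service time; with UIO they complete as soon as their own data is ready.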
In addition, UIO read requests reduce latency and buffering by allowing a completer to return read completions out of order. This enables data to be delivered as it becomes available, rather than delaying responses to preserve request or address ordering. It improves overall efficiency by giving the device greater freedom to exploit internal data availability and by minimizing completion queueing and reassembly overhead.
For example, Figure 2 and Figure 3 show the completion patterns for a single 512-byte MRD request for the non-UIO (in-order) and UIO (out-of-order) cases, respectively.

Figure 2 Non-UIO completion responses must be in order for the same MRD request. Source: Cadence Design Systems
For non-UIO, Figure 2 illustrates that completions must arrive in order, starting at byte 0 and ending at byte 511. With UIO, however, the completions can arrive in any order, as shown in Figure 3. The first two completions carry the last two chunks of the MRD request (bytes 256-383 and 384-511) because they are already available in the local cache. After that, the application reads the remaining data from its local memory and sends the last two completions (bytes 0-127 and 128-255).

Figure 3 UIO read and out-of-order completion responses are processed for the same request. Source: Cadence Design Systems
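On the requester side, the bookkeeping this implies is modest. Below is a minimal sketch of offset-based reassembly for the Figure 3 example; real hardware correlates completions with requests via TLP tags and byte counts, which are elided here:

```python
# Reassemble out-of-order UIO read completions for one 512-byte MRD.
# Chunk offsets and arrival order follow Figure 3; tag handling is
# deliberately omitted to keep the sketch short.

def reassemble(completions, total_len):
    buf = bytearray(total_len)
    received = 0
    for offset, chunk in completions:   # chunks may arrive in any order
        buf[offset:offset + len(chunk)] = chunk
        received += len(chunk)
    assert received == total_len, "request not yet fully completed"
    return bytes(buf)

# Arrival order from Figure 3: the cached tail chunks come back first.
arrivals = [
    (256, b"\x02" * 128), (384, b"\x03" * 128),  # bytes 256-383, 384-511
    (0, b"\x00" * 128), (128, b"\x01" * 128),    # bytes 0-127, 128-255
]
data = reassemble(arrivals, 512)
assert len(data) == 512 and data[256] == 0x02
```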
Second, because ordering is enforced at the source rather than at every intermediate hop, packets from unrelated GPU streams can be load-balanced across multiple parallel paths through the PCIe fabric without being serialized by switch-level producer-consumer rules. This increases effective throughput at a given link rate and stabilizes latency under load. In multi-path topologies, system architects often use a non-transparent bridge (NTB) to connect separate systems, enabling cross-system traffic within a larger fabric.
Third, UIO is available only in flit mode. Operating in fixed-size flits with the UIO-specific virtual channels VC3/VC4 (via the streamlined virtual channel capability) isolates UIO traffic from legacy flows, minimizes delays, and improves switch buffer utilization.

Figure 4 A multi-path application example with two PCIe systems interconnected through NTB links. Source: Cadence Design Systems
Figure 4 shows two interconnected PCIe systems (System 0 and System 1), each with GPUs and local PCIe switches connected via multiple NTB links. The upper NTB link can operate with either UIO-enabled or non-UIO-enabled traffic, while the three diagonal and lower links operate with UIO-enabled NTB.
As a result, independent transactions can flow concurrently across switches SW0–SW3. This topology shows how UIO-based NTB paths improve GPU communication by enabling multipath routing, reducing latency, and increasing bandwidth in large-scale AI systems.
PCIe ordering: A traffic light analogy
A helpful way to think about PCIe ordering is traffic control in a city. Strict ordering is like running the entire city through a single traffic light: every vehicle must wait its turn and proceed in sequence. There is no ambiguity, but congestion builds quickly. Relaxed ordering allows certain vehicles to pass through intersections in specific emergency situations, provided it is safe to do so.
While this removes unnecessary traffic jams, it still assumes the traffic system is centrally managed. ID-based ordering further refines this model by assigning each neighborhood its own traffic lights. While cars within the same neighborhood must obey local ordering rules, traffic from different neighborhoods can flow independently. This improves parallelism without sacrificing local correctness.
UIO bypasses traffic light rules entirely. It is akin to routing traffic onto a freeway, where there are no intersections or signals at all, and vehicles move continuously as capacity allows. On a freeway, the infrastructure does not impose sequencing. Instead, the responsibility for safe merging and interpreting arrival order shifts to drivers.
Similarly, with UIO, the PCIe fabric no longer enforces producer‑consumer ordering or completion sequencing. The requester explicitly declares that ordering carries no semantic meaning, allowing the fabric and devices to deliver and complete transactions opportunistically. This maximizes parallelism while minimizing buffering and latency.
These four ordering schemes are a progression rather than a set of alternatives. Strict ordering prioritizes safety and simplicity, while relaxed ordering removes unnecessary global barriers. ID-based ordering preserves correctness within a context while enabling scale, and UIO explicitly abandons ordering when it has no value. This layered model allows PCIe to remain compatible with legacy software while scaling efficiently for modern accelerators, multi‑queue devices, and highly parallel workloads.
Turning PCIe bandwidth into system-level performance
Fully utilizing PCIe 7.0’s 128 GT/s link in today’s AI factories requires more than higher signaling rates. In an environment where thousands of GPUs, accelerators, and memory expanders operate as a single, distributed system, an ordering model that can scale with extreme parallelism is necessary.
Legacy relaxed ordering and ID-based ordering schemes retain implicit ordering constraints that limit their efficiency at PCIe 7.0 speeds, making them increasingly inadequate for AI factories operating at hyperscale.
UIO relaxes fabric‑enforced ordering and enables AI workloads to more effectively utilize multi‑path PCIe fabrics. By shifting ordering decisions to endpoints that already manage synchronization at the runtime and application levels, UIO reduces ordering-related head-of-the-line blocking issues.
Not only does this improve latency under bursty collective traffic, it also supports higher sustained link utilization across dense training and inference clusters. The result: Under AI workloads, PCIe 7.0 can be used more efficiently as a data plane, rather than simply serving as a peak‑bandwidth interconnect.
Vanessa Do is a senior product marketing manager for PCIe IP at Cadence with over 20 years of experience in PCIe design, system validation, and customer engagement. Her background spans PCIe protocol development, FPGA-based customer support, and leading cross‑functional teams to debug complex PCIe issues at the system level.
Editor’s Note
This is Part 2 of the article series about PCIe 7.0 fundamentals. Part 1 explained PCIe’s ordering rules and the distinction between relaxed ordering and ID-based ordering.
Quantifying a power surge: Insufficient supplier-sourced knowledge

Portable power units have both instantaneous-output and run-time limits, of course, but this situation seems a bit ridiculous. Or, then again, maybe not. But how to tell?
Last December, a few hours after the “kickoff” of our high wind-induced multi-day power outage “adventure”, I had the bright (if I do say so myself) idea to try hooking up our portable power stations (plus extended batteries in two of the three cases):


to the refrigerator-plus-freezer combo in the kitchen, along with both its combo fridge-plus-freezer companion and a standalone chest freezer out in the garage. The weather outside, therefore also the temperature in the garage, was chilly, so I wasn’t terribly worried about anything spoiling in either of those latter two units. Then again, I didn’t know how long the outage would last, and I had three supplemental power solutions at my disposal, so…
I started (and ended; keep reading) with the cooling combo in the kitchen, my highest priority for perhaps-obvious comparative ambient temperature reasons. It’s a Samsung model RF217ACBP/XAA; here are a couple of stock photos to start:


I dragged from the downstairs furnace room the EcoFlow DELTA 2-plus-Smart Extra Battery “stack”, enabled the former’s AC inverter outputs, and plugged the combo fridge-plus-freezer in. I heard the compressor start up (accompanied by a DELTA 2 front panel display-reported AC output spike)…try to start up is a more accurate description, because after a second or so, the setup seemingly overloaded and gave up trying. Next up, the DELTA 3 Plus and its Smart Extra Battery sibling. Same underwhelming outcome.
The wind was blowing, the outside light was dimming, and my spouse was understandably getting stressed, so I didn’t waste any more time messing around; I promptly bailed on the idea and focused my attention elsewhere. Since I’d already expended the effort to get both “stacks” upstairs, they ended up finding alternative use in powering table lamps, recharging various battery-powered devices—lanterns, laptops, tablets, smartphones—and the like.
No, I didn’t bother trying to haul upstairs my even heavier SLA battery-based Phase2 Energy unit. And fortunately, save for the spoil-prone contents of our kitchen refrigerator (but not its combo freezer), we didn’t need to toss any food. Still, I was both disappointed and (more than a) bit surprised, because I’d seen reports from other folks who’d successfully powered food-storage equipment (albeit of unknown capacity and for unknown duration) using EcoFlow and other suppliers’ similar systems in circumstances similar to mine.
Published data also would have been helpful
Given my background experiences with other startup-surge hardware, I was pretty sure I knew how the failure had happened, but not specifically why. So, after the electricity started flowing again, I did some research. First off, I realized I hadn’t enabled either EcoFlow base unit’s X-Boost Mode feature, which might have gotten them over the compressor-start initial-surge “hump”. Please take a moment to “enjoy” the following promo video clips:
As I wrote last February, X-Boost “doubles the output AC power (at a reduced voltage tradeoff that not all powered devices are guaranteed to accept, albeit obviously counterbalanced by higher current)”. Could it have helped? Dunno; I’ll have to try it sometime when I get a chance.
But how much surge current, and at what minimum voltage, does the Samsung RF217ACBP/XAA demand on compressor startup? Ay, there’s the rub. You won’t find it in the user manual, or even the service manual, only steady-state power draw specs. The labels on the side:

and rear of the Samsung RF217ACBP/XAA:


weren’t directly helpful either, although they at least revealed the compressor model number (MK162D-L1U SJ1). But my online browsing using that specific search term was equally fruitless.
Cue the hand-waving
What do online resources say in general? Here’s Google AI Mode’s take on the topic:
A refrigerator typically experiences a startup surge current 3–4 times higher than its normal running amperage, lasting only a few seconds. While running at 1–4 amps, it can spike to 15–30 amps during compressor startup. This inrush current is essential to overcome inertia, usually requiring a dedicated 15–20 amp circuit.
I just checked and confirmed that my kitchen refrigerator breaker is 20A. Feel free to contrast that with the “3.9 Maximum Amperes” claim in the above sticker closeup shot. Sigh.
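Here’s the back-of-the-envelope arithmetic as a quick Python sketch; every number is either the sticker value or a placeholder you’d swap for real datasheet figures (the inverter rating shown is an assumption, not a verified EcoFlow spec):

```python
# Does an inverter's surge rating cover a refrigerator compressor
# start? Illustrative arithmetic only; substitute your own nameplate
# and datasheet numbers.

LINE_VOLTAGE = 120.0        # VAC, North America

running_amps = 3.9          # the sticker's "Maximum Amperes" value
surge_multiplier = (3, 4)   # the 3-4x rule of thumb quoted above

running_watts = running_amps * LINE_VOLTAGE
surge_lo, surge_hi = (m * running_watts for m in surge_multiplier)

# Hypothetical inverter surge rating -- replace with your unit's real
# continuous and surge (or X-Boost) specs.
inverter_surge_w = 2400

print(f"running: {running_watts:.0f} W")                          # 468 W
print(f"estimated start surge: {surge_lo:.0f}-{surge_hi:.0f} W")  # 1404-1872 W
print("surge covered?", inverter_surge_w >= surge_hi)             # True
```

Curiously, even the 3-4x rule of thumb applied to the 3.9 A sticker value lands below a hypothetical 2400 W surge rating, which only deepens the mystery and underscores why a published locked-rotor spec would help.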
Ballpark figures are better than nothing, I suppose, albeit still (quite) non-ideal. Am I just overlooking something obvious, or being pedantic, or is the startup surge draw:
- useful information that
- Samsung (at least) isn’t publishing
therefore, compelling consumers to potentially overshoot, buying portable power systems beefier and more expensive than they may actually need (and, apparently, than I bad-pun-intended “currently” own)? Reader thoughts are as-always welcomed in the comments!
My father (the King of Duct Tape) would have been impressed
p.s…while researching this post’s topic online, I came across a mind-blowing (at least to me) somewhat-related Reddit thread that I couldn’t resist sharing: “Fridge kept tripping circuit breaker until I added an extension cord. Why?”. Here’s my stab at the TL;DR summary:
The OP (original poster) eventually determined, in conjunction with his repair tech, that the refrigerator’s defrost heater was failing. But in initially attempting to debug the issue, assuming at first that the outlet wiring might be failing, he used an extension cord (beefy, I hope) to plug the fridge into another outlet, which worked fine. Turns out, the extension cord was still largely coiled and sitting on top of the fridge; the resulting added circuit inductance sufficiently attenuated the high-frequency noise coming from the failing defrost heater that the arc-fault circuit interrupter (AFCI) breaker stopped tripping…temporarily, at least.
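For a rough feel of why a coiled cord could matter here, consider inductive reactance, X_L = 2πfL: negligible at the 60-Hz line frequency, but substantial at the MHz-range frequencies characteristic of the arcing noise an AFCI listens for. The inductance below is purely an illustrative guess:

```python
# Inductive reactance of a hypothetical coiled extension cord.
# L is an assumed value; a loosely coiled cord might plausibly be in
# the tens-of-microhenries range, but measure yours.

import math

L = 20e-6  # henries (assumed)

for f in (60, 1e6, 10e6):        # line frequency vs. arc-noise range
    x_l = 2 * math.pi * f * L    # X_L = 2*pi*f*L, in ohms
    print(f"{f:>10.0f} Hz: X_L = {x_l:8.2f} ohms")
```

At 60 Hz the reactance is milliohms, so the fridge still runs normally, while from 1 to 10 MHz it climbs from roughly 126 ohms to 1.3 kilohms, plausibly enough to mask the arc signature.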
The entire thread is well worth your perusal if you have sufficient spare time and interest!
—Brian Dipert is the associate editor, as well as a contributing editor, at EDN.
Related Content
- Preemptive utilities shutdown oversight: Too much, too little, or just right?
- EcoFlow’s Delta 2: Abundant Stored Energy (and Charging Options) for You
- Portable power station battery capacity extension: Curious coordination
- EcoFlow’s DELTA 3 Plus and Smart Extra Battery: Product line impermanence curiosity
- An assortment of tech-hiccup tales
AOI awarded $20.9m Texas Semiconductor Innovation Fund grant
IQE raises £81m, including £45m from MACOM long-term supply agreements
Transceivers boost in-vehicle audio bandwidth

ADI’s ADAA245x series of A2B 2.0 Automotive Audio Bus transceivers delivers 4× higher bus bandwidth (98.3 Mbps full-duplex) than A2B 1.0 devices. Now in production, the transceivers handle up to 119 upstream and downstream audio channels for advanced automotive audio systems, enabling high-definition audio transport across ECU networks.

The ADAA2457 supports Ethernet data tunneling via an Open Alliance SPI (OASPI) interface. All ADAA245x devices are compatible with existing A2B 1.0 cable and connector infrastructure and enable A2B 1.0 branching via device-specific I2S, I2C, and SPI interfaces. The ADAA2455 operates as a sub-node transceiver, while the ADAA2456 and ADAA2457 can be configured as main or sub-nodes.
According to ADI, the transceivers achieve up to 30% system cost reduction through increased functional integration and reduced external circuitry and component count. They also provide low, deterministic latency of 62 µs and are built for straightforward integration.
Learn more about A2B 2.0 and individual transceivers here.
Rohm shrinks NFC charging for wearables

Rohm’s ML7670/ML7671 wireless charging chipset provides NFC charging for compact wearables such as smart rings and fitness trackers. Operating in the 13.56-MHz band, NFC charging enables antenna miniaturization for ultra-compact devices. Following the 1-W ML7660/ML7661 chipset, the ML7670/ML7671 is optimized for even smaller wearable designs.

The chipset comprises the ML7670 receiver and ML7671 transmitter and supports wireless power transfer up to 250 mW. Peripheral components, including the switching MOSFETs used to power the charging IC, are integrated. Rohm states that the 2.28×2.56×0.48-mm receiver IC reaches 45% power-transfer efficiency at 250-mW output, making it well suited to compact wearable designs.

Rohm says the 45% power-transfer efficiency is enabled by tailored coil matching, rectifier circuitry, and reduced switching losses. Firmware for wireless power delivery is embedded in the IC, eliminating the need for a host MCU and reducing board space.
The NFC Forum WLC 2.0-compliant chipset is in mass production and is used in the Soxai Ring 2.
Rectifiers combine low profile and high current

Vishay has released 16 single and dual FRED Pt ultrafast rectifiers in low-profile DFN6546A packages with wettable flanks. The 200-V devices occupy a 6.5×4.6-mm footprint with a typical height of 0.88 mm. Rated from 6 A to 15 A, they offer a 10% lower profile and 50% higher current than comparable 200-V SMPC (TO-227A) devices.

The rectifiers are designed for high-frequency power conversion and protection in automotive, industrial, and consumer systems, including EV powertrains, ADAS, industrial automation, and telecom equipment. Automotive variants are AEC-Q101 qualified.
For these applications, the rectifiers feature low reverse leakage current and operate over a wide temperature range from −55 °C to +175 °C. A low forward voltage drop of 0.75 V, combined with fast reverse recovery time and low reverse recovery charge, reduces power losses and improves efficiency.
The DFN6546A package’s wettable flanks enable automatic optical inspection (AOI), eliminating the need for X-ray inspection and supporting automated assembly. The devices are MSL 1 qualified per J-STD-020, with a maximum peak reflow temperature of 260 °C.
Samples and production quantities of the single and dual FRED Pt ultrafast rectifiers are available now, with lead times of eight weeks.
Timing module enables vRAN synchronization

Microchip’s MD-990-0011-B M.2 plug-in timing module delivers precise synchronization for data center servers and 5G vRAN. Developed with Intel, it is compatible with Xeon 6 SoC–based server platforms. It leverages Intel’s vRAN architecture for low-latency time synchronization in distributed AI workloads and real-time applications.

Customized for Intel-based reference designs, the device supports automatic source selection and locking across GNSS, Synchronous Ethernet (SyncE), and PTP networks. Its integrated SyncE synthesizer includes two independent digital PLL channels: one for time and one for frequency. Additional components include an OCXO supporting 4 or 8 hours of 1.5-µs holdover, along with a temperature sensor, EEPROM, and a crystal oscillator to help maintain low jitter.
By integrating these components into a single plug-in module, the MD-990-0011-B simplifies server design and reduces complexity. Its modular approach also speeds installation and maintenance, helping minimize downtime.
The MD-990-0011-B is available in production quantities from Microchip and authorized distributors.
MCUs pair high flash capacity with security

The GD32F5HC series of 32-bit general-purpose MCUs from GigaDevice features a 200-MHz Arm Cortex-M33 core with DSP and FPU capabilities. Hardware-based security, ample on-chip memory, and integrated peripherals target both consumer and industrial designs.

On-chip memory includes 2 MB of flash, 320 KB of SRAM, and 32 KB of instruction cache. Multichannel DMA controllers handle complex algorithms, graphics frameworks, and high-speed data flows. QSPI and SPI interfaces enable external PSRAM and flash expansion at up to 45 MHz.
Peripherals include a 12-bit ADC with an integrated temperature sensor and an infrared interface for analog signal processing. Multiple 16-bit and 32-bit timers, along with a real-time clock, enable waveform generation, motor control, and synchronized multi-axis operation.
Security features combine Arm TrustZone with a 2-kbit eFuse for key storage and hardware cryptographic accelerators. Secure boot, storage, debugging, and firmware updates help maintain system integrity across the device lifecycle.
The MCUs operate from a 3.3-V supply and offer four power-saving modes: sleep, deep sleep, standby, and SRAM sleep. Devices are available in BGA64 and QFN56 packages with up to 54 GPIOs.
KPI projects win NRFU 2026 competitions
Early this year, the Scientific Council of the National Research Foundation of Ukraine (NRFU) published the ranked lists of winning projects to be implemented with grant support. Igor Sikorsky Kyiv Polytechnic Institute leads all higher education institutions in the number of projects receiving state funding.
ΔVbe + DMM = Celsius, Kelvin, Fahrenheit, and Rankine thermometer

Combining an accurate temperature sensor with a standard digital multimeter can make an inexpensive, accurate, and useful thermometer.
A recent Design Idea, BJT is accurate sensor for absolute temperature in Kelvin and Rankine, was based on a 1991 application note (PDF) by a legendary guru, the forever remembered Jim Williams. In his article, Williams demonstrated that, when used as ΔVbe sensors, ordinary unselected transistors give temperature readings accurate to a fraction of a degree without calibration:
“…randomly selected 2N3904s and 2N2222s … showed less than 0.4°C spread over 25 devices from various manufacturers.”
Wow the engineering world with your unique design: Design Ideas Submission Guide
As shown in BJT is accurate…, the basic math of ΔVbe can be cooked down to a simple and easy-to-remember (hah!) linear-in-absolute-temperature relationship: ΔVbe/°C = Log10(Current-ratio)/5050. Therefore, if we want any given ΔVbe/°C, the required ratio is just Current-ratio = 10^(5050 × ΔVbe/°C).
For example, for ΔVbe/°C = 100uV, Current-ratio = 10^(5050 * 100uV) = 10^(0.5050) = 3.20. This ratio is implemented in Figure 1’s simple circuit for a 100uV per Kelvin output.

Figure 1 An ordinary BJT Q1 makes an accurate 100uV per unit Kelvin absolute temperature sensor.
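For the skeptical, here’s a quick numeric sanity check of the 5050 constant and the Figure 1 ratio; the only physics input is k/q ≈ 86.17 µV/K:

```python
# Sanity-check the DeltaVbe "5050" constant.
# DeltaVbe = (kT/q) * ln(N)  =>  DeltaVbe/T = (k/q) * ln(10) * log10(N)

import math

k_over_q = 86.17e-6                         # V/K, Boltzmann constant / electron charge
slope_per_decade = k_over_q * math.log(10)  # ~198.4 uV/K per decade of current ratio

print(1 / slope_per_decade)   # ~5040/V, i.e., the rounded 5050 above
print(10 ** (5050 * 100e-6))  # ~3.20, the ratio used in Figure 1
```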
Okay. So. What’s it good for? One plausible application is, as frequent contributor Nick Cornford has shown in several ingenious designs:
- Newer, shinier DMM RTDs—part 1 and part 2
- Dropping a PRTD into a thermistor slot—impossible?
- DIY RTD for a DMM
that the combination of an accurate temperature sensor with a standard digital multimeter can make an inexpensive, accurate, and useful thermometer.
Nick’s favorite sensor is the super-versatile platinum RTD, but as Williams showed, a humble (and super cheap) 2N3904 (or similar) BJT might also fill the bill. That’s assuming that its package-limited −55 to +150°C temperature range is adequate. And that’s also assuming that it gets a little help from its friends, such as Figure 2’s zero-drift op amp that boosts the output span to a DMM-friendly 1mV per unit Celsius, Kelvin, Fahrenheit, and Rankine.

Figure 2 A zero-drift, 5uV max offset A1 rescales 100uV/K by 10x to 1mV/°C and by 18x to 1mV/°F.
Of course, Kelvin and Rankine absolute temperature measurements are absolutely less frequently useful than the common Celsius and Fahrenheit scales…which is where Figure 3 comes in:

Figure 3 Connect the DMM’s plus lead to the appropriate Figure 2 output, and the minus lead to the correct precision 0° offset terminal, to re-zero 273K to 0°C and 460R to 0°F.
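For reference, here are the exact conversions behind those rounded 273K and 460R offsets, as a trivial sketch independent of the circuit:

```python
# The four temperature scales behind the 1 mV-per-degree outputs.
# The circuit's 273K and 460R offsets are rounded; exact values below.

def scales(kelvin: float) -> dict:
    return {
        "K": kelvin,                 # absolute, Celsius-size degrees
        "C": kelvin - 273.15,        # Celsius offset
        "R": kelvin * 1.8,           # absolute, Fahrenheit-size degrees
        "F": kelvin * 1.8 - 459.67,  # Fahrenheit offset
    }

print(scales(298.15))  # {'K': 298.15, 'C': 25.0, 'R': 536.67, 'F': 77.0}
```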
V+ can be anywhere from 3 to 6 volts. Current consumption at 3V is barely more than 1mA, dominated by the Z1 shunt reference, so two AAs will support 2000 hours (nearly three months) of continuous operation. A single CR2032 lithium coin cell will hold up for 10 non-stop days.
Thanks, Nick and Jim!
Stephen Woodward’s relationship with EDN’s DI column goes back quite a long way. Over 200 submissions have been accepted since his first contribution back in 1974, including the best Design Idea of the year for 1974 and 2001.
Related Content
- BJT is accurate sensor for absolute temperature in Kelvin and Rankine
- Newer, shinier DMM RTDs—part 1
- Newer, shinier DMM RTDs—part 2
- Dropping a PRTD into a thermistor slot—impossible?
- DIY RTD for a DMM
III All-Ukrainian Festival of Creative Youth “Creative Hub”
The Educational and Scientific Publishing and Printing Institute of KPI organized the III All-Ukrainian Festival of Creative Youth “Creative Hub”.
OpenLight secures $50m in Series A-1 funding, boosting total raised to $84m
PCIe 7.0 fundamentals: Baseline ordering rules

Adding more compute is no longer enough to maximize AI training and inference performance in today’s AI factories. The real challenge is how efficiently data flows through AI systems, not raw processing power.
Training remains the foundation of AI development, and maximizing throughput across large clusters is critical as models internalize structure, learn statistical relationships, and establish a baseline for downstream workloads. Inference shifts the focus, demanding ultra-low latency and high-reliability token generation. Both of these phases are characterized by exponential scale.
Training trillion-parameter models and performing inference that requires enormous amounts of contextual information places significant pressure on the “plumbing” of modern computing platforms. In this environment, the efficiency, predictability, and speed of data movement across the CPUs, accelerators, memory, and I/Os that compose these AI systems have become the true bottleneck.
Eliminating data bottlenecks is now dependent on optimizing interconnect bandwidth. The interconnects themselves are crucial to system success, with PCI Express (PCIe), specifically PCIe 7.0, being a prime example.
PCIe’s role in multi-GPU scale-up systems
For more than a decade, PCIe has been the backbone of multi-GPU server systems, serving as the universal, extensible interconnect that ties all compute and I/O devices together inside a node, such as GPU-to-DPU/NIC or GPU-to-switch links. Even as newer high-bandwidth GPU-to-GPU fabrics and proprietary accelerator meshes have emerged, systems continue to rely on PCIe for baseline connectivity, system bring-up, and data movement across system components.
Announced in June 2025, PCIe 7.0 doubles link bandwidth to 128 GT/s, delivering up to 512 GB/s of bi-directional throughput per x16 connection. While this increased bandwidth helps alleviate I/O bottlenecks for AI computing, fully utilizing PCIe 7.0 for inference workloads also requires minimizing latency across the fabric.
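As a sanity check on those headline figures, here’s the raw-rate arithmetic; it deliberately ignores flit, FEC, and framing overhead, which trims real-world payload throughput somewhat:

```python
# Raw PCIe 7.0 x16 bandwidth, ignoring flit/FEC/framing overhead.

gt_per_s = 128         # 128 GT/s per lane
lanes = 16
bits_per_transfer = 1  # one bit per lane per transfer

gb_per_s_per_dir = gt_per_s * lanes * bits_per_transfer / 8
print(gb_per_s_per_dir)      # 256 GB/s in each direction
print(2 * gb_per_s_per_dir)  # 512 GB/s bidirectional, as quoted
```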
Multiple inference streams sharing the same PCIe path may cause head-of-the-line blocking due to unnecessary serialization. This results in delays in unrelated traffic, which impacts overall system efficiency.
To maintain low latency and fully utilize PCIe 7.0 bandwidth under parallel workloads, a more flexible ordering model is required.
Baseline PCIe ordering rules: Why serialization exists
First, it’s helpful to understand PCIe’s baseline ordering rules. Most systems using early PCIe generations—from 2.5 GT/s in PCIe 1.0 to 8 GT/s in PCIe 3.0—relied on simple point‑to‑point connections supporting a single application or device context. As a result, the PCIe protocol strictly enforced baseline ordering rules to ensure that the results of memory operations are presented in an order that matches software expectations.
Within a single traffic class, PCIe groups transaction-layer packets (TLPs) into posted, non-posted, and completion categories, each governed by defined ordering constraints. Posted requests are memory writes (MWR) and messages (MSG) that operate without needing a completion, while non-posted requests include memory reads (MRD) and configuration transactions that must receive a completion. To simplify the discussion, the focus here is on the main traffic types: only MRD requests, MWR requests, and read completions (CPL) appear in the ordering rules below.

Table 1 Baseline ordering rules highlight the relationship between the current and previous requests. Source: Cadence Design Systems
Table 1 shows the relationship between the current requests in rows A, B, and C and the previous requests in columns 2, 3, and 4. A “Y/N” entry abbreviates “yes/no,” indicating whether the row’s request or completion may pass the column’s request or completion type.
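The simplified Table 1 can also be encoded as a small lookup; the sketch below reflects only the MWR/MRD/CPL subset discussed here, not the full specification table, with "either" marking the Y/N cells where the implementation may choose:

```python
# Simplified baseline ordering rules (Table 1): may the current TLP
# (first element) pass the previous TLP (second element)?
# "either" = implementation's choice (the Y/N cells).

MAY_PASS = {
    ("MWR", "MWR"): "no",      # A2: preserves producer-consumer order
    ("MWR", "MRD"): "yes",     # A3: must pass to avoid deadlock (Fig. 1)
    ("MWR", "CPL"): "either",  # A4
    ("MRD", "MWR"): "no",      # B2
    ("MRD", "MRD"): "either",  # B3
    ("MRD", "CPL"): "either",  # B4
    ("CPL", "MWR"): "no",      # C2
    ("CPL", "MRD"): "yes",     # C3: must pass to avoid deadlock (Fig. 2)
    ("CPL", "CPL"): "either",  # C4a: different MRDs may pass; C4b:
                               # same-MRD completions must stay in order
}

print(MAY_PASS[("MWR", "MRD")])  # "yes" -- the A3 bypass
```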
To understand the A3 deadlock scenario detailed in Figure 1, assume the root complex (RC) issues an MRD request (1) followed by an MWR request (2) toward the endpoint (EP) device. A deadlock can occur when the RC exhausts the completion credits, and its completion buffer becomes full (3). The RC is unable to accept new completions (4) associated with the outstanding non-posted MRD (1).

Figure 1 In the A3 deadlock, the RC completion queue (CQ) is full, preventing it from returning the completion that would release the MRD blocking the MWR request. Source: Cadence Design Systems
Because strict ordering prevents the newer MWR (2) from bypassing the unresolved MRD (1) until its completion (4) is received, the RC’s transmit request path is also blocked. This prevents the issued MWR (2) from propagating to the EP link (5). This head-of-the-line blocking creates a circular dependency, which stalls internal request queue draining and completion acceptance. Unless the MWR is allowed to bypass the MRD, a deadlock results.
For the C3 deadlock scenario illustrated in Figure 2, assume both the RC and EP issue many non-posted read requests (1), which aggressively fill both the RX and TX request queues (RQ) (2) and prevent them from accepting any new MRD requests (3). Meanwhile, the completions (4) are returned for pending MRD requests (1) in the opposite direction, but they can’t be forwarded to fulfill the previous pending MRD request (1). This is because they arrived behind the new MRD request (3). If the completion is not allowed to bypass the previous MRD requests in the same direction, it will result in a deadlock.

Figure 2 A C3 deadlock occurs if both the RX and TX request queues are full, and the completion is not allowed to pass the previous MRD request. Source: Cadence Design Systems
For A2 (Row A, column 2), B2, and C2, MWR, MRD, and CPL requests cannot pass MWR requests to maintain correctness. These three scenarios are illustrated in Figure 3, Figure 4, and Figure 5, respectively.

Figure 3 In A2, current MWR requests cannot pass previous MWR requests. Source: Cadence Design Systems

Figure 4 In B2, current MRD requests cannot pass previous MWR requests. Source: Cadence Design Systems

Figure 5 In C2, current completion requests cannot pass previous MWR requests. Source: Cadence Design Systems
However, A3 and C3 illustrate that both MWR requests and completions can pass an earlier MRD request to avoid a deadlock. This is shown in Figure 6 and Figure 7.

Figure 6 In A3, MWR requests are allowed to bypass previous MRD requests to avoid a deadlock. Source: Cadence Design Systems

Figure 7 In C3, completions are allowed to bypass previous MRD requests to avoid a deadlock. Source: Cadence Design Systems
For B3, the current MRD request may bypass or be blocked by the previous MRD request. For A4 and B4, the current MWR or MRD request is permitted to pass the previous completion or be blocked by it.
Figure 8 describes both the C4a and C4b scenarios. The yellow and green completions belong to the pending yellow (RD1) and green (RD0) MRD requests, respectively. If current completions belong to different MRD requests, they can pass each other as CPL10 passes CPL01 (C4a scenario). However, if they belong to the same MRD request as in the C4b scenario, they must follow the order and cannot pass previous completions (CPL00 followed by CPL01, and CPL10 followed by CPL11).

Figure 8 In C4, completions belonging to different MRD requests may pass each other, while completions of the same MRD request must remain in order. Source: Cadence Design Systems
Strict ordering: A safe but conservative baseline
PCIe’s default strict ordering rules ensure safe producer-consumer software behavior. Under strict ordering, the system observes transactions issued by a requester in program order: posted writes must complete before the read completions of subsequent pending MRD requests are delivered.
However, this global ordering discipline is conservative. It causes unrelated transactions to wait for one another, even when there is no true data dependency. For instance, this can occur when different functions access data from different memory segments in the host or local memory. As PCIe link speeds increase, this approach becomes a scalability bottleneck because it causes head-of-the-line blocking of unrelated serialized traffic and underutilizes the available bandwidth.
Why relaxed ordering is needed
Relaxed ordering loosens these global constraints. When a transaction is marked as relaxed, it tells the PCIe fabric that this MRD or MWR request does not need to participate in the default system‑wide ordering guarantees. Relaxed ordering improves throughput and reduces latency by enabling certain transactions to be reordered. The key point is that relaxed ordering removes unnecessary ordering barriers between independent operations.
However, it still preserves most of the transactional correctness, such as maintaining the completion order within a single MRD request. This is especially valuable for workloads such as prefetching, polling reads, or accelerator traffic, where software already explicitly manages synchronization. Relaxed ordering addresses the performance loss caused by overly strict global rules. However, it treats ordering as a binary choice: either fully ordered or globally relaxed.
Why ID‑based ordering is also necessary
Relaxed ordering alone is too coarse‑grained for modern devices. High‑performance endpoints, such as GPUs, NICs, and NVMe controllers, generate traffic from many independent sources, including queues, processes, virtual machines, or process address space IDs (PASIDs).
These sources often require ordering within themselves, but not between each other. With ID‑based ordering, PCIe maintains ordering among transactions that share the same requester ID or PASID, while permitting reordering across different IDs. In effect, it scopes ordering guarantees to a logical context rather than imposing them system‑wide.
This allows transactions of the same function or context to maintain the correct program semantics, while the fabric freely parallelizes traffic across independent functions or contexts. Without ID-based ordering, systems would be forced to choose between full serialization for safety or full relaxation with no per‑context guarantees.
Attribute-based ordering: Relaxed and ID-based ordering
Because the PCIe fabric already enforces ordering semantics for both relaxed ordering and ID‑based ordering, system software and device logic influence these behaviors by setting attributes in the TLP headers rather than redefining the rules themselves. Relaxed ordering and ID‑based ordering address different dimensions of the same problem, which is why both are required to meaningfully relax PCIe’s strict ordering rules.
Relaxed ordering removes unnecessary global ordering constraints between different classes of traffic, enabling better scheduling and reducing head-of-the-line blocking. In contrast, ID‑based ordering refines ordering to the level of a requester or context, preserving correctness where the associated software expects it while eliminating artificial dependencies elsewhere.
Together, they allow PCIe to scale with modern parallel workloads. While strict ordering provides a safe default, relaxed ordering removes global bottlenecks, and ID‑based ordering preserves local semantics without sacrificing concurrency. This combination allows PCIe to support today’s accelerators, virtualized I/Os, and high‑throughput devices without breaking the programming models that software relies on.

Table 2 Ordering rules for relaxed ordering and ID-based ordering. Source: Cadence Design Systems
Table 2 specifies the ordering rules for relaxed ordering and ID-based ordering. As in Table 1, a “Y/N” entry abbreviates “yes/no,” indicating whether the row’s request or completion may pass the column’s request or completion type.
The key differences between the relaxed ordering and ID-based ordering rules detailed in Table 2 and the baseline rules shown earlier in Table 1 lie in rows A2, B2, and C2 versus D2, E2, and F2, respectively. Under the baseline rules, current MWR, MRD, or completion TLPs are not allowed to pass previous MWR requests. Relaxed ordering and ID-based ordering, however, allow them to pass previous MWR requests if their request IDs—bus, device, function, and PASID—are different.
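That ID comparison can be sketched as a hypothetical helper; the real checks live in switch and endpoint hardware and also consult the TLP’s relaxed-ordering attribute bit, so only the IDO portion of the paragraph above is modeled:

```python
# May the current TLP pass a previous MWR under ID-based ordering?
# Hypothetical model: passing is allowed only when the TLP carries the
# IDO attribute and the requester identities (bus/device/function,
# plus PASID) differ.

from typing import NamedTuple, Optional

class Tlp(NamedTuple):
    kind: str                 # "MWR", "MRD", or "CPL"
    bus: int
    device: int
    function: int
    pasid: Optional[int] = None
    ido: bool = False         # ID-based ordering attribute bit

def requester_id(t: Tlp):
    return (t.bus, t.device, t.function, t.pasid)

def may_pass_previous_mwr(current: Tlp, previous: Tlp) -> bool:
    assert previous.kind == "MWR"
    if not current.ido:
        return False          # baseline A2/B2/C2: never pass a prior MWR
    # IDO (D2/E2/F2): pass only across different requester contexts
    return requester_id(current) != requester_id(previous)

prev = Tlp("MWR", bus=1, device=0, function=0, pasid=7)
cur = Tlp("MRD", bus=1, device=0, function=1, ido=True)
print(may_pass_previous_mwr(cur, prev))  # True: different function/PASID
```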
Vanessa Do is a senior product marketing manager for PCIe IP at Cadence with over 20 years of experience in PCIe design, system validation, and customer engagement. Her background spans PCIe protocol development, FPGA-based customer support, and leading cross‑functional teams to debug complex PCIe issues at the system level.
Editor’s Note
This is Part 1 of the article series about PCIe 7.0 fundamentals. Part 2 will explain why PCIe 7.0 bandwidth alone isn’t enough while highlighting the importance of addressing legacy ordering limitations with UIO.
Norwest Business Park Internet is DOWN FOR 2 WEEKS!
Farmbot Sensor Enclosures
These are a couple of auxiliary sensor enclosures I made for my Farmbot. The first box handles power distribution and houses the I/O for the RS485 7-in-1 soil sensor and an array of moisture sensors, all through USB-C ports and RJ45: 24V for the 7-in-1, 5V for the extra USB power ports and the MAX485. Not pictured is the MAX485 circuit and the associated wiring. Box two is a relay enclosure to control 120V pumps and lights. All with custom faceplates to identify the ports so nobody mistakes the RS485 port for a standard USB-C. Pretty simple stuff, but it's still fun.
🎥 Igor Sikorsky Kyiv Polytechnic Institute deepens its partnership with Romania
📯 The university hosted a Romanian delegation led by Minister of Economy, Digitalization, Entrepreneurship and Tourism Ambrozie-Irineu Darău, together with Ambassador Extraordinary and Plenipotentiary of Romania to Ukraine Alexandru Victor Micula and the head of the embassy’s political section, Bogdan Pecurar.
Congratulations to Oleksii Novikov, director of the Educational and Scientific Institute of Physics and Technology, on his NAS of Ukraine prize
🔘 Oleksii Novikov, Doctor of Technical Sciences, professor, director of the Educational and Scientific Institute of Physics and Technology of Igor Sikorsky Kyiv Polytechnic Institute, and corresponding member of the NAS of Ukraine, has received the S. O. Lebedev Prize of the NAS of Ukraine.
A visit by Nicolas Tenzer, a French political scientist and analyst of international security and human rights
🇫🇷 Recently, Igor Sikorsky Kyiv Polytechnic Institute…



