EDN Network
The RF-ready GaN-on-silicon with lower parasitic losses

A new technology addresses a key performance barrier limiting the use of GaN-on-silicon semiconductors in mainstream RF applications. According to Scott Bibaud, president and CEO of Atomera, this will change the economics of GaN in RF by unlocking breakthrough RF performance on low-cost silicon substrates.
Gallium nitride (GaN) devices for high-performance RF applications are typically built on silicon carbide (SiC) substrates; while they offer robust performance, they are also costly and difficult to scale. On the other hand, silicon substrates offer a lower-cost, more scalable foundation with the potential to support larger wafer sizes and greater compatibility with standard silicon manufacturing.
However, GaN-on-silicon underperforms in RF applications due to parasitic channel losses that reduce efficiency, especially at high frequencies. Enter Atomera’s Mears Silicon Technology (MST), which claims to reduce these losses while offering robust linearity and lower-cost GaN solutions for 5G and other high-frequency RF devices.
MST—a quantum-engineered thin-film technology—introduces a thin, oxygen-modified layer near the surface of the silicon wafer to create a more favorable platform for GaN growth, making silicon a more viable foundation for high-performance RF devices. This controlled layer modifies the silicon lattice structure and helps block the diffusion of electrical dopants. That, in turn, improves crystal quality at the GaN-silicon interface.

MST can improve various wafer-level reliability measures in nitrided oxide planar devices. Source: Atomera
Incize, which provides characterization and modeling services for RF semiconductors, has performed RF characterization of the first MST-enabled samples. The Belgian company reports a substantial reduction in parasitic interface charge and a significant reduction in RF losses.
“Beyond the small-signal improvements, the large-signal results are particularly compelling,” said Mostafa Emam, founder and CEO of Incize. “Then there is a linearity benefit that extends into the high-power regime, approaching performance levels typically associated with advanced RF SOI technologies.”
In Atomera’s own testing, MST enabled more than a 10x reduction in parasitic channel charge, reducing a key mechanism of RF power loss and supporting improved high-frequency GaN device performance. The test data also shows that MST enables devices to handle significant power while maintaining signal quality—linearity—under stress.
Robert Mears, founder and CTO of Atomera, is quick to add that linearity is a top concern for RF designers. “The new data shows MST GaN-on-silicon achieving both the ultra-low RF losses and linearity metrics of advanced trap-rich RF SOI,” he said. “At the benchmark input power of 30 mW, the linearity is exceptional, 1000x better than the GaN-on-silicon reference wafer.”
Atomera, a semiconductor materials and technology licensing company, is based in Los Gatos, California.
Related Content
- GaN on silicon or SiC?
- A Guide to GaN-on-Silicon
- A brief history of gallium nitride (GaN) semiconductors
- Why RF Technologies Should Consider GaN Over Silicon
- GaN-on-Si Technology Makes Headway in RF Applications
The post The RF-ready GaN-on-silicon with lower parasitic losses appeared first on EDN.
How to design a digital-controlled PFC, Part 4

Editor’s note: This is a multi-part series on how to design a digital-controlled PFC. Previous entries:
- How to design a digital-controlled PFC, Part 1
- How to design a digital-controlled PFC, Part 2
- How to design a digital-controlled PFC, Part 3
High efficiency is a mandatory requirement in some applications, especially in data centers. The recently announced 80 Plus Ruby certification sets the highest efficiency standard for data center power-supply units (PSUs), as shown in Table 1. The new efficiency requirement is not only higher than 80 Plus Titanium at each load condition, but also requires 90% efficiency at a 5% load, which has never been specified before.
| 80 Plus test type | 230V internal redundant | | | | |
|---|---|---|---|---|---|
| Percentage of rated load | 5% | 10% | 20% | 50% | 100% |
| 80 Plus Titanium | | 90% | 94% | 96% | 91% |
| 80 Plus Ruby | 90% | 91% | 95% | 96.5% | 92% |
Table 1 “Ruby” is the most recent and most stringent of the 80 Plus certification levels
With totem-pole bridgeless power factor correction (PFC) offering the best efficiency among all PFC topologies, digital control can further push the efficiency capabilities of this topology to new levels. In the fourth and final installment of this series, I will first introduce several digital methods to improve efficiency and then discuss some special PFC requirements including re-rush current control, electrical metering (e-metering) and PFC with a baby boost converter.
Dynamic dead time to achieve ZVS for synchronous switch
Theoretically, the PFC synchronous switch can operate with zero voltage switching (ZVS), but there must be a proper dead time between when the boost switch turns off and the synchronous switch turns on. As illustrated in Figure 1, assuming a positive cycle, when boost switch Q2 turns off, the inductor current (IL) starts to charge the output capacitance (COSS) of Q2 and discharge the output capacitance COSS of Q1, and the switch-node voltage rises.
If Q1 turns on before the switch-node voltage rises to the output voltage (VOUT), this is hard switching, and the switching losses are high. If Q1 turns on too late after the switch-node voltage rises to VOUT, the current will conduct in the third quadrant of Q1 with diode-like behavior. Since the gallium nitride field-effect transistor used for Q1 has a higher VSD drop compared to a silicon metal-oxide semiconductor field-effect transistor body diode, this induces a higher third-quadrant conduction loss.

Figure 1 This equivalent circuit describes a PFC synchronous switch during dead time. (Source: Texas Instruments)
Ideally, Q1 should turn on at the exact moment when the switch-node voltage rises to VOUT. Given the IL, VOUT and COSS of Q1 and Q2, the following equation calculates the time to charge the switch node from 0 to VOUT:
You can use firmware to dynamically adjust the dead time calculated from the equation to maintain ZVS for the synchronous switch.
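As a minimal sketch of this calculation, assume the inductor current stays roughly constant during the dead time and that both COSS values can be treated as fixed, charge-equivalent capacitances. Charge balance then gives t = (COSS,Q1 + COSS,Q2) × VOUT / IL. This form, the function name, and the clamp limits are illustrative assumptions rather than the exact expression from the original article.

```python
def zvs_dead_time(i_l, v_out, coss_q1, coss_q2, t_min=20e-9, t_max=500e-9):
    """Estimate the dead time (seconds) for the switch node to swing 0 -> v_out.

    Assumes a constant inductor current i_l (amperes) during the dead time and
    fixed output capacitances coss_q1, coss_q2 (farads). t_min and t_max are
    hypothetical clamp limits that a real design would set from its gate
    driver and timer resolution.
    """
    if i_l <= 0.0:
        # No current available to move the switch node: fall back to the maximum.
        return t_max
    t = (coss_q1 + coss_q2) * v_out / i_l
    # Clamp to the range the PWM hardware is assumed to support.
    return min(max(t, t_min), t_max)


# Example: 2 A inductor current, 400 V output, 150 pF per switch.
dead_time = zvs_dead_time(2.0, 400.0, 150e-12, 150e-12)
```

In firmware, the same expression would typically be evaluated from the sampled inductor current each control period, or read from a lookup table if the division is too costly.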
CCM_TCM multimode control
A totem-pole bridgeless PFC can operate in either continuous conduction mode (CCM) or triangular current mode (TCM); each has its advantages and disadvantages. Table 2 provides a high-level comparison between the two modes.
| | CCM operation | TCM operation |
|---|---|---|
| Pros | | |
| Cons | | |
Table 2 Continuous conduction mode (CCM) and triangular current mode (TCM) options both have pros and cons for totem-pole power factor correction (PFC) operation purposes.
Ideally, the totem-pole bridgeless PFC could operate in multimode, as shown in Figure 2. At heavy loads or at the peak of an AC half cycle, the desired PFC input current is high and the PFC operates in CCM. When the load reduces or around the AC zero-crossing area where the desired PFC input current is low, the PFC switches to TCM and operates with ZVS.
Compared to pure CCM, this multimode operation has better efficiency at light loads because of ZVS. Compared to pure TCM, because the inductor current ripple is much lower, there is no need to use multiphase interleaved operation; therefore, this multimode operation significantly reduces the size and system costs. By combining the advantages of both CCM and TCM, this multimode operation can meet both high-efficiency and high-power-density requirements.
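One way to picture the mode decision described above is a comparison of the desired input current against a threshold, with hysteresis to avoid chattering between modes. The sketch below is only an illustration: the threshold values, the function names, and the use of the current reference as the deciding quantity are assumptions, and Reference 1 describes the actual control method.

```python
import math

CCM, TCM = "CCM", "TCM"


def select_mode(i_ref, prev_mode, enter_ccm=4.0, enter_tcm=3.0):
    """Pick the operating mode from the current reference i_ref (amperes).

    enter_ccm and enter_tcm are hypothetical thresholds; the gap between
    them provides hysteresis.
    """
    if prev_mode == TCM and i_ref >= enter_ccm:
        return CCM
    if prev_mode == CCM and i_ref <= enter_tcm:
        return TCM
    return prev_mode


def modes_over_half_cycle(i_peak, n=20):
    """Walk a sinusoidal current reference across one AC half cycle."""
    mode = TCM  # start at the zero crossing, where the current is low
    modes = []
    for k in range(n + 1):
        i_ref = i_peak * math.sin(math.pi * k / n)
        mode = select_mode(i_ref, mode)
        modes.append(mode)
    return modes
```

With a 6 A peak reference this yields TCM near both zero crossings and CCM around the peak; with a 2 A peak the converter stays in TCM for the whole half cycle.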

Figure 2 CCM_TCM multimode operation can meet both high-efficiency and high-power-density requirements. (Source: Texas Instruments)
Reference 1 provides details about this control method and its implementation. Figure 3 compares the efficiency (tested on the same board) between this CCM_TCM multimode control method and traditional CCM control, with efficiency improving by as much as 2%.
Figure 3 CCM_TCM multimode control delivers efficiency improvements versus traditional CCM control in both low line (a) and high line (b) environments. (Source: Texas Instruments)
Special burst mode – AC cycle skipping
Burst mode is widely used to improve efficiency at light loads. Unlike traditional pulse-width modulation (PWM) pulse-skipping burst mode, where you skip PWM pulses randomly, here I would like to introduce a special burst mode: AC cycle skipping, in which you skip one or more AC cycles at light loads.
In other words, you would turn the PFC off for one or more AC cycles and turn the PFC back on for the next AC cycle. The turn-on and turn-off instants occur at the AC zero crossing such that the whole AC cycle is skipped. Since the PFC turns on and off when the inductor current equals zero, there is less stress and electromagnetic interference.
The number of AC cycles to skip is inversely proportional to the load; the lighter the load, the more AC cycles skipped. Figure 4 shows the skipping of one and two AC cycles, respectively. Channel 1 is the AC voltage, and channel 4 is the AC current.
Figure 4 Shown here is AC cycle skipping at light loads: one cycle (a) and two cycles (b). (Source: Texas Instruments)
Once the PFC turns off, the switching losses, driving losses and reverse-recovery losses all drop to zero, and the power losses are just the PFC standby power.
When turning off the PFC to skip AC cycles, both the current loop and voltage loop need to be frozen; otherwise, the integrators in those loops will build up to generate a big PWM pulse when the PFC turns back on, causing a large current spike.
Determining whether the PFC enters a light load requires the load information. Normally there is no current sensor at the PFC output; therefore, it’s not possible to directly measure the output load. However, because the PFC voltage-loop output is proportional to the load, you can use the voltage-loop output as a rough indicator to determine whether the PFC is operating with a light load.
If you must precisely skip an appropriate number of AC cycles to maintain VOUT ripple within a specified range, you will need accurate load information, which you can obtain through an integrated e-meter function that I will discuss after the next section.
A big concern with AC cycle skipping is the VOUT drop during a load transient. Assuming that a load step-up occurs when the PFC is off, VOUT may drop too much.
To address this issue, you can compare VOUT to a predefined threshold through a comparator. Once VOUT is below this threshold, the PFC will immediately exit burst mode, disable AC cycle skipping, and return to normal operation. The PFC will handle the transient response as if there is no such special burst mode.
AC cycle skipping can also help reduce total harmonic distortion (THD) at light loads. Reference 2 compares THD with and without this method.
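The pieces described in this section (skip decisions at the AC zero crossing, frozen control loops while the PFC is off, and an immediate exit when VOUT falls below a threshold) can be sketched as a small state holder. All names, the normalized load estimate, and the numeric defaults below are hypothetical; the policy for re-arming burst mode after a transient is left out of the sketch.

```python
class AcCycleSkipper:
    """Toy model of AC cycle skipping; not production firmware."""

    def __init__(self, light_load_level=0.1, max_skip=4, v_out_min=370.0):
        self.light_load_level = light_load_level  # normalized voltage-loop output
        self.max_skip = max_skip                  # hypothetical upper bound
        self.v_out_min = v_out_min                # hypothetical exit threshold (V)
        self.skip_left = 0
        self.burst_enabled = True
        self.pfc_on = True
        self.loops_frozen = False

    def cycles_to_skip(self, load_est):
        """Lighter load means more skipped cycles (inverse proportionality)."""
        if load_est >= self.light_load_level:
            return 0
        if load_est <= 0.0:
            return self.max_skip
        return min(self.max_skip, int(self.light_load_level / load_est))

    def _set_pfc(self, on):
        self.pfc_on = on
        # Freeze both loop integrators while the PFC is off so that they
        # cannot wind up and produce a large PWM pulse at turn-on.
        self.loops_frozen = not on

    def on_zero_crossing(self, load_est):
        """Call once per AC cycle, at the zero crossing that starts it."""
        if self.burst_enabled and self.skip_left > 0:
            self.skip_left -= 1
            self._set_pfc(False)
        else:
            self.skip_left = self.cycles_to_skip(load_est) if self.burst_enabled else 0
            self._set_pfc(True)

    def on_vout_sample(self, v_out):
        """Comparator-style check: leave burst mode as soon as VOUT sags."""
        if v_out < self.v_out_min:
            self.burst_enabled = False
            self.skip_left = 0
            self._set_pfc(True)
```

Using the voltage-loop output as `load_est` follows the rough-indicator approach described above; an e-meter reading could be substituted where a precise skip count is needed.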
Re-rush current limit
The AC input voltage could suddenly drop out when the PFC is operating normally. Since the load is still applied, the PFC VOUT could drop to a lower value. Then, when the AC voltage returns, if the AC input voltage is higher than VOUT, there will be an inrush current. This current is called the re-rush current.
Previously, the re-rush current was unspecified and there was no special control action for this event; designs relied solely on the power-stage components’ ability to handle re-rush current. Test results show that re-rush current can jump more than 10 times higher than the PFC-rated maximum input current. Such a high re-rush current can either damage the power supply or reduce its lifetime.
The recently released Modular Hardware System - Common Redundant Power Supply (M-CRPS) specification requires limiting re-rush current when the input voltage resumes after an input brownout or blackout event on the power supply used in a data center. As shown in Figure 5, the root-mean-square (RMS) value of re-rush current should not exceed 5 times the maximum PSU rating over one-half cycle of input frequency, or 3.5 times the maximum PSU rating over one cycle of input frequency. In addition, the input current of the PSU should settle to a value less than or equal to two times the maximum PSU rating within two cycles of the input frequency after applying the AC input.

Figure 5 The Modular Hardware System - Common Redundant Power Supply (M-CRPS) specification documents limits on both re-rush current and timing. (Source: Texas Instruments)
Reference 3 provides a firmware-based solution to handle this re-rush current so that when the AC voltage comes back from dropout, both the re-rush current (when VIN > VOUT) and the non-re-rush current (when VIN < VOUT) are well controlled – not exceeding the M-CRPS limit specification, but high enough to rapidly boost VOUT.
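For bench validation, the three M-CRPS limits quoted above can be checked against a captured input-current waveform. The following is a sketch under stated assumptions: the capture starts at the instant AC returns, it is uniformly sampled with a known number of samples per line cycle, and "settled" is interpreted as the RMS of every complete cycle after the second one staying at or below twice the rating. The function names are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square value of a list of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def check_rerush(i_samples, samples_per_cycle, i_rated_max):
    """Compare a current capture against the re-rush limits quoted in the text.

    i_samples must begin when the AC input is re-applied and cover at least
    one full line cycle. i_rated_max is the maximum PSU input rating (A RMS).
    """
    spc = samples_per_cycle
    if len(i_samples) < spc:
        raise ValueError("need at least one full AC cycle of samples")
    # Complete cycles that start two cycles after AC is re-applied.
    later_cycles = [i_samples[k:k + spc]
                    for k in range(2 * spc, len(i_samples) - spc + 1, spc)]
    return {
        "half_cycle_ok": rms(i_samples[:spc // 2]) <= 5.0 * i_rated_max,
        "one_cycle_ok": rms(i_samples[:spc]) <= 3.5 * i_rated_max,
        # Trivially True when the capture is shorter than three cycles.
        "settled_ok": all(rms(c) <= 2.0 * i_rated_max for c in later_cycles),
    }
```

A capture of at least three line cycles is needed for the settling check to be meaningful.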
E-metering
Power supplies in data centers are required to measure the input power in real time and report the measurement to the host; this is called e-metering. The M-CRPS specification requires an input power measurement error within ±1% when the load is >125W, within ±1.25W when the load is between 50W and 125W, and within ±5W when the load is <50W. To achieve such high measurement accuracy, the e-meter function is traditionally implemented through a dedicated metering device, as shown in Figure 6a.
Figure 6 These circuit diagrams show a traditional e-meter and PFC control (a), as well as combining an e-meter with PFC control (b). (Source: Texas Instruments)
A current shunt placed on the PFC input side senses the input current, while a voltage divider (not shown in Figure 6a) across the AC line and AC neutral senses the input voltage. A dedicated metering device receives this current and voltage information and calculates the input power and input RMS current information, sending the results to the host.
With a digital controller, since analog-to-digital converters (ADCs) of the microcontroller (MCU) are measuring both the input voltage and input current, it becomes possible to integrate the e-meter function into PFC control code. Figure 6b shows this e-meter configuration.
A current shunt senses the input current and an isolated delta-sigma modulator (the AMC1306 from Texas Instruments) measures the voltage drop across the current shunt. The delta-sigma modulator output is sent to the PFC controller MCU. The current information will be used for both e-metering and PFC current-loop control. A voltage divider senses the input voltage, which is then measured by the MCU’s ADC directly, just as in traditional PFC control. Reference 4 has more details about e-meter implementation and calculation.
Integrating e-meter functionality into PFC control code eliminates the need for a dedicated metering device, not only reducing system costs, but also simplifying printed circuit board layout and expediting the design process.
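At its core, the metering arithmetic is an average of the instantaneous voltage-current product plus two RMS calculations. The sketch below assumes simultaneous, already-scaled voltage and current samples that span an integer number of AC cycles; the function names are hypothetical, and the treatment of loads at exactly 50W and 125W is an assumption. Reference 4 covers the actual implementation and calibration.

```python
import math


def input_power_metrics(v_samples, i_samples):
    """Compute real power, RMS values and power factor from paired samples."""
    n = len(v_samples)
    if n == 0 or n != len(i_samples):
        raise ValueError("need equal, non-empty sample lists")
    p_real = sum(v * i for v, i in zip(v_samples, i_samples)) / n
    v_rms = math.sqrt(sum(v * v for v in v_samples) / n)
    i_rms = math.sqrt(sum(i * i for i in i_samples) / n)
    apparent = v_rms * i_rms
    return {
        "p_real_w": p_real,
        "v_rms": v_rms,
        "i_rms": i_rms,
        "power_factor": p_real / apparent if apparent > 0.0 else 0.0,
    }


def allowed_error_w(p_in_w):
    """Error band (watts) from the M-CRPS figures quoted in the text."""
    if p_in_w > 125.0:
        return 0.01 * p_in_w
    if p_in_w >= 50.0:
        return 1.25
    return 5.0
```

In a controller, the sums would normally be accumulated sample by sample inside the control interrupt and divided once per AC cycle, rather than computed from stored lists.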
PFC with a baby boost converter
In server applications, a bulk capacitor (CBULK in Figure 7) is required to hold PSU output in regulation for more than 10ms after AC dropout. To accomplish this, a 3kW server PSU would need a total capacitance of over 1.3mF, which would consume at least 30% of the overall space. To improve power density, you must reduce the bulk capacitance.
Adding a baby boost converter between PFC and DC/DC, as shown in Figure 7 and described in Reference 5, can achieve high power density. The baby boost converter is a compact boost converter that only operates during AC dropout events.

Figure 7 A PFC with a baby boost converter can achieve high power density. (Source: Texas Instruments)
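The link between hold-up time and bulk capacitance follows from the energy stored in the capacitor: P × t = ½ × C × (Vstart² − Vmin²). The sketch below applies that relation; the start and minimum voltages in the example are illustrative assumptions, not values from the article, and losses are ignored unless an efficiency is supplied.

```python
def holdup_capacitance(p_out_w, t_hold_s, v_start, v_min, efficiency=1.0):
    """Bulk capacitance (farads) needed to deliver p_out_w for t_hold_s.

    v_start is the bulk voltage when AC drops out and v_min is the lowest
    bulk voltage from which the downstream stage can still regulate.
    """
    if v_start <= v_min:
        raise ValueError("v_start must exceed v_min")
    energy_j = p_out_w * t_hold_s / efficiency
    return 2.0 * energy_j / (v_start ** 2 - v_min ** 2)


# 3 kW for 10 ms, assuming the bulk voltage may fall from 390 V to 325 V.
c_without_boost = holdup_capacitance(3000.0, 10e-3, 390.0, 325.0)
# If a downstream boost stage lets the bulk voltage fall to 200 V instead.
c_with_boost = holdup_capacitance(3000.0, 10e-3, 390.0, 200.0)
```

With these assumed voltages, the first case lands in the neighborhood of the 1.3mF figure quoted above, and allowing the bulk capacitor to discharge more deeply cuts the requirement by more than half, which is the motivation for the baby boost stage.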
Figure 8 is a flow chart of baby boost converter operation. During normal operation, the baby boost converter is off and bypassed by a BYPASS FET Q4. When AC line dropout occurs and VBULK drops to a certain level, Q4 turns off, and the baby boost converter turns on to allow VBB to maintain its nominal value. If AC power returns, VBULK will rise; once VBULK rises to a certain level, the MCU turns off the baby boost converter, turns on BYPASS FET Q4, and the PFC resumes normal operation.

Figure 8 This flow chart outlines the various stages of baby boost converter operation.
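The sequence in the flow chart amounts to a two-state machine with separate entry and exit thresholds on VBULK. The following sketch captures only that decision logic; the threshold values and class name are hypothetical, and a real design would add timing, fault handling and soft-start.

```python
class BabyBoostSupervisor:
    """Decide when the baby boost runs and when bypass FET Q4 conducts."""

    def __init__(self, v_enter=340.0, v_exit=370.0):
        self.v_enter = v_enter  # hypothetical VBULK level that starts the boost
        self.v_exit = v_exit    # hypothetical VBULK level that restores bypass
        self.boost_on = False
        self.bypass_on = True   # normal operation: Q4 on, boost off

    def update(self, v_bulk):
        """Process one VBULK sample and return (boost_on, bypass_on)."""
        if not self.boost_on and v_bulk < self.v_enter:
            # AC dropout: open Q4 first, then let the boost hold up VBB.
            self.bypass_on = False
            self.boost_on = True
        elif self.boost_on and v_bulk > self.v_exit:
            # AC has returned: stop the boost and close Q4 again.
            self.boost_on = False
            self.bypass_on = True
        return self.boost_on, self.bypass_on
```

Keeping the exit threshold above the entry threshold prevents the supervisor from toggling repeatedly while VBULK recovers.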
I hope that the information imparted in this series enables you to design your own digital-controlled PFC and meet ever-more-strict specifications. You will find that digital control is so flexible that it is possible to implement advanced control algorithms that would be difficult to implement with analog control. A digital-controlled power supply also offers impressive performance.
References
- Sun, Bosheng. “A novel CCM-TCM multimode control method for totem-pole bridgeless PFC.” Texas Instruments Analog Design Journal article, literature No. SLYT877, 1Q 2026.
- Sun, Bosheng. “AC cycle skipping improves PFC light-load efficiency.” Texas Instruments Analog Design Journal article, literature No. SLYT585, 3Q 2014.
- Sun, Bosheng. “How to limit PFC re-rush current.” Texas Instruments Analog Design Journal article, literature No. SLYT865, 1Q 2025.
- Sun, Bosheng. “A low-cost and high-accuracy e-meter solution.” EDN, Aug. 26, 2024.
- Yu, Sheng-Yang, Benjamin Genereaux, and LiehChung Yin. “Improve power density with a baby boost converter in a PFC circuit.” Texas Instruments Analog Design Journal article, literature No. SLYT830, 2Q 2022.
Related Content
- How to design a digital-controlled PFC, Part 1
- How to design a digital-controlled PFC, Part 2
- How to design a digital-controlled PFC, Part 3
- A low-cost and high-accuracy e-meter solution
The post How to design a digital-controlled PFC, Part 4 appeared first on EDN.
MLPerf and the rise of latency-aware LLM benchmarking

Any discussion of modern AI system performance must include MLCommons and its MLPerf benchmark suite, which has become the industry’s de facto standard for measuring machine learning performance. Since its debut in 2018, MLPerf has provided a neutral, peer-reviewed framework for comparing hardware and software platforms across a broad range of AI workloads.
The original MLPerf benchmarks reflected the dominant AI workloads of the late 2010s. Early inference tests focused on models such as image classification with ResNet-50, natural language processing with Bidirectional Encoder Representations from Transformers (BERT), object detection with RetinaNet, and recommendation with Deep Learning Recommendation Model (DLRM).
These workloads were important and representative at the time, but they shared one characteristic: they were highly parallel and relatively easy to map onto GPU architectures.
For several years, benchmark results reinforced a simple narrative. Each new generation of accelerators delivered higher throughput, lower latency, and better energy efficiency. Because the workloads aligned well with GPU strengths, the benchmark curves rose steadily and predictably.
The generative AI shockwave: Rewriting the rules of MLPerf
Autoregressive LLMs introduced a fundamentally different inference pattern. Prompt processing remained highly parallel, but token generation became sequential and memory bound. Suddenly, raw TeraFLOPS no longer told the whole story.
MLPerf began incorporating this new reality in stages. Inference v4.0 introduced the first LLM benchmark, based on Meta’s Llama 2 70B. This benchmark measured token throughput and provided the industry with its first standardized method for comparing LLM inference systems.
MLPerf Inference v5.0, released in 2025, significantly expanded the generative AI focus. It added Llama 3.1 405B Instruct, a 405-billion-parameter model with a 128,000-token context window. The benchmark also introduced an interactive variant of Llama 2 70B that imposed strict limits on Time to First Token (TTFT) and Time Per Output Token (TPOT), two metrics that directly capture user experience in conversational applications.
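To make the two metrics concrete, here is a small sketch that derives them from per-token timestamps. It uses a common definition (TTFT as the delay to the first token, TPOT as the average gap between subsequent tokens); the function names are hypothetical and the limits passed to the check are placeholders, not the official MLPerf thresholds.

```python
def ttft_tpot(request_time, token_times):
    """Return (TTFT, TPOT) in the same time unit as the inputs.

    request_time is when the prompt was submitted; token_times lists the
    arrival time of each output token in order.
    """
    if not token_times:
        raise ValueError("at least one output token is required")
    ttft = token_times[0] - request_time
    n = len(token_times)
    # Average spacing of the tokens after the first one.
    tpot = (token_times[-1] - token_times[0]) / (n - 1) if n > 1 else 0.0
    return ttft, tpot


def meets_limits(ttft, tpot, ttft_limit, tpot_limit):
    """True when a request satisfies both (placeholder) latency limits."""
    return ttft <= ttft_limit and tpot <= tpot_limit


# Tokens arriving at 0.5 s, 0.75 s and 1.0 s after a request at t = 0.
ttft, tpot = ttft_tpot(0.0, [0.5, 0.75, 1.0])
```

TTFT is dominated by prefill and any queuing delay, while TPOT reflects the decode loop, which is why the two are constrained separately.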
These additions were pivotal because they exposed the core weakness of GPU-based inference systems. When unconstrained by latency, GPUs could buffer requests, create large batches, and deliver excellent throughput. Under interactive latency limits, batching opportunities shrank, hardware utilization dropped, and throughput fell sharply.
In other words, MLPerf began measuring not just how fast a system could run under ideal conditions, but also how responsive it remained under realistic conditions.
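A toy model helps show why a per-token latency limit cuts throughput. Assume, purely for illustration, that one decode step takes a fixed overhead plus a small increment per sequence in the batch; every number and name below is hypothetical and not drawn from any published result.

```python
def step_time_ms(batch, fixed_ms=20.0, per_seq_ms=1.0):
    """Toy decode-step latency: fixed overhead plus a per-sequence cost."""
    return fixed_ms + per_seq_ms * batch


def throughput_tok_s(batch, fixed_ms=20.0, per_seq_ms=1.0):
    """Tokens per second when each step emits one token per sequence."""
    return batch * 1000.0 / step_time_ms(batch, fixed_ms, per_seq_ms)


def max_batch_under_limit(tpot_limit_ms, fixed_ms=20.0, per_seq_ms=1.0):
    """Largest batch whose step time still meets the per-token limit."""
    if tpot_limit_ms < fixed_ms + per_seq_ms:
        return 0
    return int((tpot_limit_ms - fixed_ms) / per_seq_ms)


unconstrained = throughput_tok_s(256)          # large offline-style batch
capped_batch = max_batch_under_limit(40.0)     # batch allowed by a 40 ms limit
constrained = throughput_tok_s(capped_batch)
```

In this model the latency cap shrinks the usable batch from 256 to 20 and throughput drops accordingly, mirroring the qualitative behavior described above; real systems have far more complex latency curves.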
Inference disaggregation: Optimization of resources
This evolution reached another milestone in MLPerf Inference v5.1 and the emerging v6.x era. The benchmark suite broadened its focus to include increasingly sophisticated workloads, including reasoning models such as DeepSeek-R1 and more demanding long-context applications. At the same time, submissions began showcasing system-level optimizations such as inference disaggregation, where prompt processing and decoding are assigned to different accelerator pools.
Disaggregation has become one of the most consequential developments in modern inference benchmarking.
Historically, MLPerf treated each benchmark run as a single system under test, leaving vendors free to optimize their hardware and software stacks as they saw fit. As long as submissions complied with accuracy and latency requirements, any architectural technique was fair game.
This openness allowed participants to introduce increasingly sophisticated serving strategies. One of the most effective has been the separation of prefill and generation across distinct groups of accelerators. The prefill cluster handles the compute-intensive prompt processing stage, while the generation cluster focuses exclusively on token decoding.
In controlled benchmark scenarios, where prompt lengths and output lengths are known in advance, disaggregation can produce dramatic gains. By eliminating interference between the two phases, systems reduce preemption and improve latency-sensitive throughput.
Yet this raises an important question. Does the benchmark still measure accelerator capability, or is it increasingly measuring system orchestration? The answer is both.
Modern AI performance depends on the interaction between processor, memory hierarchy, interconnect fabric, runtime software, and serving algorithms. MLPerf has evolved accordingly. It now rewards system-level innovation rather than isolated chip performance.
That shift is entirely appropriate, but it also means benchmark results must be interpreted carefully.
A disaggregated configuration optimized for long document summarization may perform brilliantly in MLPerf while delivering more modest benefits in production environments where workloads vary continuously. Real-world deployments must cope with unpredictable prompt lengths, bursty traffic, and rapidly changing ratios of prefill to generation demand.
Consequently, MLPerf increasingly measures a system’s ability to align resources with a known workload profile. This is a valuable metric, but it’s not synonymous with universal real-world performance.
Illustrative comparison: MLPerf 5.x versus MLPerf 6.x
The table below illustrates how benchmark methodology evolved as MLPerf shifted from throughput-oriented LLM tests to more latency-sensitive and system-aware workloads. The numbers are representative rather than exact, but they reflect the broad trends seen in published results and vendor disclosures.

Publicly discussed MLPerf inference results based on Llama 3.1 405B LLM run on a leading-edge GPU-based processor in three scenarios (offline, server mode, and interactive mode) highlight MLPerf’s evolution. Source: Author
From chip benchmark to system benchmark
The history of MLPerf mirrors the evolution of AI itself.
The early benchmark suites focused on relatively static workloads that aligned naturally with the strengths of GPU architectures. Tasks such as image recognition, recommendation systems, and conventional deep learning inference relied heavily on dense matrix operations and large-scale parallelism, allowing GPUs to demonstrate exceptional throughput and scalability. In that era, benchmark leadership was closely associated with raw compute capability, memory bandwidth, and increasingly larger accelerator configurations.
The rise of generative AI fundamentally changed that equation.
As autoregressive LLMs became the dominant workload, MLPerf evolved accordingly, introducing larger models, longer context windows, interactive server scenarios, and increasingly strict latency constraints. These additions exposed a critical reality: while GPUs remain extraordinarily efficient during the highly parallel prefill phase, they are far less efficient during token generation, where inference becomes sequential, memory-bound, and heavily dependent on latency-sensitive execution.
This shift transformed the meaning of benchmark performance.
Modern MLPerf results no longer measure the capabilities of an isolated accelerator alone. Instead, they measure the effectiveness of an entire inference architecture.
Disaggregation, scheduling policies, key-value (KV) cache management, streaming pipelines, runtime orchestration, and workload balancing have become just as important as the underlying silicon itself. In many cases, the benchmark winner is no longer the system with the most compute power, but the one that most effectively adapts a fundamentally sequential workload to hardware originally designed for massively parallel graphics and HPC computation.
As a result, benchmark interpretation has become significantly more nuanced. The headline numbers increasingly reflect how intelligently the system orchestrates resources across racks of accelerators, separates prefill from generation, minimizes preemption, and maintains throughput under realistic latency constraints. MLPerf has evolved from a pure hardware benchmark into a broader measure of system architecture and software orchestration.
At the same time, this evolution reveals something even more profound. The latest MLPerf 6.x requirements implicitly highlight the growing limitations of conventional GPU architectures for real-time LLM inference. The industry has reached a point where increasingly sophisticated scheduling mechanisms and disaggregated serving infrastructures are being used to compensate for a deeper architectural mismatch between autoregressive inference and massively parallel processors.
In many respects, the benchmark itself is beginning to suggest the next major transition in AI infrastructure design.
Rather than continuing to optimize architectures originally developed for graphics rendering and parallel numerical computing, the future may require entirely new inference-centric architectures built specifically for the unique characteristics of LLM generation. Such architectures would need to deliver high utilization and low latency even with very small batch sizes (potentially down to a single user request) while minimizing data movement, reducing memory bottlenecks, and supporting continuous token generation without relying on increasingly complex orchestration layers to hide inefficiencies.
In that sense, MLPerf has become more than a benchmark suite. It is now a window into the architectural tensions shaping the future of AI computing, revealing both the extraordinary adaptability of modern accelerator systems and the growing need for a fundamentally new class of inference hardware designed from the ground up for the realities of autoregressive AI.
Lauro Rizzatti is a business development executive with Vsora, a technology company offering semiconductor solutions that redefine design performance. He is a noted chip design verification consultant and industry expert on hardware emulation.
Editor’s Note
This is Part 2 of the mini-series that examines how LLM inference forced changes to MLPerf benchmarking. In Part 1, contributor Lauro Rizzatti analyzes LLM inference across its two processing phases, prefill versus generation, and highlights how this workflow exposes structural inefficiencies in GPU-based accelerators.
Related Content
- Strategies to Dominate the AI Accelerator Market
- A closer look at LLM’s hyper growth and AI parameter explosion
- The role of AI processor architecture in power consumption efficiency
- AI GPU computing delivers data-center performance on the factory floor
- The truth about AI inference costs: Why cost-per-token isn’t what it seems
The post MLPerf and the rise of latency-aware LLM benchmarking appeared first on EDN.
Memory card interfaces keep pace with the internal bus evolution race: Part 1

Clock speeds get faster. Per-cycle (and per-clock edge) address and data dollops get larger. And protocols get more efficient. But here we’re talking about external, not internal, buses.
Back in 2023, I devoted two blog posts’ worth of content to comparing various memory card technologies, products and speed bin options, initially in March (identifying a fake card in the process) and more in-depth in July. Since then, I’ve come across numerous examples of both evolutionary and revolutionary successors to the devices discussed in that two-part series, not to mention those covered in even more distant-past writeups (themed, for example, around the cameras, digital audio recorders and other devices that leverage such storage).
I’ve had this follow-up piece in my to-do list for a while now, and I’ve finally decided to actualize my longstanding aspiration before the dust pile accumulating on this specific list entry gets any deeper. Not every technology to be discussed in the paragraphs to follow will likely achieve high-volume market success, mind you, with any sooner-or-later failures not necessarily the result of implementation shortcomings, either. Note, for example, that today’s (and past) industry supply constraints encourage manufacturers to “double down” on maximizing the output and profitability of existing approaches, versus devoting scarce capacity to dubious bets.
That said, win or lose there’s usually an interesting story behind each approach. Without further ado…and with the upfront qualifier that I’ll be intentionally delaying any discussion of USB-interface memory devices until later, since their connector locations compel them to be fully external to the system, either sticking straight out of it or cable-tethered to it…and that for related reasons, I won’t be covering eMMC and other fully internal formats, either…and lastly, that I’ll be skipping over legacy formats that were proprietary and/or otherwise non-impactful…
Historical precedents
A short writeup, “History Repeating” at Virginia Tech’s website, begins as follows:
Variations on the repeating-history theme appear alongside debates about attribution. Irish statesman Edmund Burke is often misquoted as having said, “Those who don’t know history are destined to repeat it.” Spanish philosopher George Santayana is credited with the aphorism, “Those who cannot remember the past are condemned to repeat it,” while British statesman Winston Churchill wrote, “Those that fail to learn from history are doomed to repeat it.”
Long-time readers may recall that I’ve referenced variants of this same quote theme in several past writeups, consistently with a negative connotation involving the downsides of ignorance to the past. That said, excessive dependence on history lessons can also be problematic, resulting in evolutionary, overly constraining baby-steps that suppress alternative more revolutionary strides, which may lead to failure but may also dramatically leap beyond traditional approaches.
I’ll leave you to decide for yourselves what to conclude from this first case study, admittedly too personal to likely allow me to be completely arms-length about it! Embedded within the tuple (card identifier) data structures reported by Intel’s Series 2 flash memory cards were the initials of the small team of developers, myself among them, who designed their ASIC (30 years ago…yikes!). I subsequently led the technical marketing launch of the 28F008SA 8 Mbit flash memories inside those same cards, followed by the definition, development and introduction of 16 and 32 Mbit component successors and cards based on them, all in the early-to-mid-1990s.
Products such as these, representing the industry’s first removable and high capacity (for the era, at least) memory cards, added these tuple structures and other enhancements in order to deliver full Personal Computer Memory Card International Association (PCMCIA, later known as PC Card) compatibility, in contrast to Series 1 precursors which were more elementary multi-component arrays along with address decode and chip select logic. Intel’s and others’ similar products were specifically referred to as linear flash memory PC Cards, both to differentiate them from other PCMCIA card types—modems, ISDN and SCSI, for example, and living on (at least to a degree) with CableCARDs—and from alternative ATA-interface flash memory cards.

The key difference between the two memory card types centered on where the flash media management intelligence was located: in the card itself for ATA flash PC Cards, thereby presenting a standardized hardware and software interface to the system regardless of what (and whose) media was inside, versus in the system, implemented as software and/or dedicated hardware, for the linear flash PC card approach. Proponents of the latter scheme touted its claimed reduced media bill-of-materials cost, not to mention the potential ability to direct-execute code out of it (acting as a big parallel-interface chip), but it was inherently relevant for only NOR (vs NAND) memory suppliers, along with being a “heavier lift” for system developers. For these and other reasons, the ATA approach eventually won out in the marketplace.
Miniaturization
That said, Intel and several of its NOR flash memory partner/competitors had also taken a stab at miniaturizing the linear flash PC Card with the creatively named (ha!) Miniature Card format:

Other flash memory suppliers countered with the ultimately much more popular CompactFlash card, now maintained by the aptly named CompactFlash Association (CFA), whose hardware interface was similarly PCMCIA-derived albeit instead (as with the ATA flash PC Card precursor) focused on the IDE/ATA (and later, UDMA) command set:

Amid this “where is the media management intelligence best located” debate, two other notable contending approaches of the same timeframe also bear mentioning. The first, SmartMedia, was championed by Toshiba (as well as, later, by its primary competitor, Samsung):

SmartMedia was essentially a single (although a few variants embedded multiple) NAND flash memory die embedded within a thin plastic membrane, plus a multi-contact metallic interface that wirebond-direct-connected to the die with no intervening media controller intelligence.
Conceptually sounds like linear flash PC Cards and their derivatives, doesn’t it? Yes…and no. For one thing, SmartMedia was much smaller than either Miniature Card or CompactFlash. For another, it was based on NAND flash memory, which was more HDD-like in its core attributes (notably erase block size and speed) than NOR, simplifying system-side media management development. And then there was the fact that Toshiba wasn’t just a semiconductor supplier; its various systems divisions were potential SmartMedia implementers, and the company also did a good job of cultivating business from other Japanese and broader Asian systems manufacturers.
Finally, near the end of the last century (in 1997, to be exact), Sandisk and systems partners Siemens and Nokia unveiled the MultiMediaCard (MMC), which ultimately came in multiple dimension options, as well as in both standard and clock-boosted performance variants:

MMC is best known today in its aforementioned non-removable eMMC form, which itself is being slowly supplanted by the embedded variant of the MIPI- and SCSI-based Universal Flash Storage (UFS) (an organization whose own removable-version standard ironically has conversely been underwhelmingly adopted by the industry). Today’s generational successor to MMC is the Secure Digital (SD) card, originally referred to as SecureMMC:

which built on the MMC foundation with “enhancements including a digital rights management (DRM) feature, a more durable physical casing, and a mechanical write-protect switch.” The SD standard’s successive iterations have expanded the available clock speed, protocol and electrical contact count options in a backwards-compatible fashion to keep pace with flash memory performance gains, such as in this high-end V90 card from OWC:

The microSD Card derivative tackled substantive dimensional decreases with notable success; here’s one alongside the SmartMedia card I showed you earlier:

One interesting newer SD (and microSD) card specification variation that I became aware of recently when shopping for storage media for a couple of new Raspberry Pi cards is the Application Performance Class. Quoting from Kingston Technology documentation:
A new classification has been presented with the introduction of Android’s Adopted Storage Device feature. The App Performance Class assures minimum random and sequential performance speeds to meet both run and store execution time requirements under given conditions. It does this simultaneously while providing storage for pictures, videos, music, files and other important data. Basically, they’re ideal for use in smartphones and mobile gaming devices that run applications at random read and write speeds while also being used for storage.
There are two ratings for the App Performance Class which are known as A1 and A2. A1 has a minimum random read of 1500 IOPS and a minimum random write of 500 IOPS while A2 has a minimum random read of 4000 IOPS and a minimum random write of 2000 IOPS. Both A1 and A2 have a minimum sustained write speed of 10MB/s. The App Performance Class is something to consider [editor: for example] when planning on installing Android apps on a microSD card.
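The thresholds quoted above lend themselves to a quick check against benchmark results. The following is a minimal sketch, assuming you already have measured random-read IOPS, random-write IOPS, and sustained write speed for a card; the function name and the sample figures are hypothetical, and actual certification depends on the specified test conditions, not just these three numbers.

```python
from typing import Optional

# Minimums quoted in the Kingston text:
# (random read IOPS, random write IOPS, sustained write MB/s)
APP_CLASS_MINIMUMS = {
    "A2": (4000, 2000, 10.0),
    "A1": (1500, 500, 10.0),
}


def app_performance_class(read_iops: float, write_iops: float,
                          sustained_write_mb_s: float) -> Optional[str]:
    """Return the highest class whose three minimums are all met, else None."""
    for name in ("A2", "A1"):  # check the stricter class first
        min_read, min_write, min_seq = APP_CLASS_MINIMUMS[name]
        if (read_iops >= min_read and write_iops >= min_write
                and sustained_write_mb_s >= min_seq):
            return name
    return None


# Hypothetical measurements, for illustration only
print(app_performance_class(4200, 2100, 12.0))  # meets the A2 minimums
print(app_performance_class(2000, 600, 10.0))   # meets only the A1 minimums
print(app_performance_class(4000, 400, 30.0))   # random write too slow for either
```

Note that a card with fast random reads still fails both classes if its random write rate falls short, which is why all three figures are checked together.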
And, by the way, unlike the SmartMedia competitor of the day, both MMC and successor SD Cards notably also embed (despite their smaller sizes) media management intelligence that simplifies and standardizes the system implementation. Moore’s Law strikes again, eh?
Hang tight; I’ll be right back
Believe it or not, I originally envisioned this being, and wrote it as, a single unified blog post. However, as I thought of more (and more…and more…) things to include, the wordcount grew (and grew…and grew…), transforming it into something resembling a small book (I exaggerate, but you get my drift). Having passed through 1,500 words at the beginning of this paragraph, I’m instead going to pause for now, intending (God willing) to share the other half of this now-two-part series with you next week. Until then, please share in the comments your thoughts on what I’ve covered so far!
—Brian Dipert is the associate editor, as well as a contributing editor, at EDN.
Related Content
- Memory cards: Specifications and (more) deceptions
- SD card speeds: question your assumptions
- AI boom and the politics of HBM memory chips
The post Memory card interfaces keep pace with the internal bus evolution race: Part 1 appeared first on EDN.
The hidden bottleneck in LLM inference and the impact on MLPerf benchmarking

Recent frontier LLM inference benchmarks have highlighted a recurring pattern. GPU-based systems deliver outstanding throughput when latency is not a concern, but their performance drops sharply once real-time response requirements are imposed.
This behavior is sometimes attributed to software inefficiencies or suboptimal system tuning. In reality, the root cause lies much deeper. It reflects a fundamental mismatch between how GPUs are architected and how autoregressive inference works.
LLM inference: Prefill versus generation
To understand this limitation, it is useful to examine the two distinct phases of LLM inference: prefill and generation.
During the prefill phase, the model processes the entire input prompt in one pass. The prompt is tokenized, embedded, and propagated through every layer of the transformer network. At each layer, the model computes the attention relationships among all tokens and builds the key-value (KV) cache, which stores the intermediate data needed for subsequent token generation.
This stage maps extremely well onto GPU hardware. GPUs were designed to execute thousands of identical operations in parallel. In the prefill phase, the model performs massive matrix multiplications over large tensors, exactly the type of workload for which GPUs excel. When all tokens are available upfront, the calculations can be distributed across tens of thousands of cores, resulting in very high arithmetic utilization.
The generation phase is fundamentally different.
Once the KV cache has been created, the model begins producing output tokens one at a time. Each token depends on all tokens that came before it. This sequential dependency means that, regardless of how much hardware is available, the model cannot generate the next token until the current one has been completed.
For every generated token, the model must read the parameters for every layer, consult the KV cache, compute the next token probabilities, and then repeat the autoregressive process. The amount of computation per token is relatively modest, but the amount of data movement remains substantial.
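To get a feel for how much cached data is involved, the KV cache footprint can be estimated from the model shape. This is a rough sketch, not a description of any particular serving stack: the function name and the example configuration (32 layers, 8 key-value heads, head dimension 128, 16-bit values) are assumptions chosen for illustration.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1,
                   bytes_per_value: int = 2) -> int:
    """Estimate KV cache size in bytes.

    The leading factor of 2 accounts for storing one key vector and one
    value vector per token, per layer, per key-value head.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


# Hypothetical model shape with an 8,192-token context and 16-bit values
size = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=8192)
print(size / 2**30, "GiB")  # 1.0 GiB for this configuration
```

The estimate grows linearly with both context length and the number of concurrent conversations, so the cache that must be consulted for each new token can rival the model weights in size on long or heavily batched workloads.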
Two faces of GPU architecture: Why modern GPUs struggle with real-time latency constraints
This is where the GPU architecture begins to work against the workload.
GPUs achieve peak efficiency when they execute large, highly parallel workloads with regular memory access patterns. Token generation offers neither. The workload is small, inherently sequential, and dominated by repeated memory accesses rather than dense arithmetic. Many of the GPU’s compute units remain idle while the device waits for data to arrive from high-bandwidth memory.
In other words, generation is not compute-bound; it’s memory-bound.
The distinction is crucial. In a compute-bound workload, adding more arithmetic units improves performance. In a memory-bound workload, performance is limited by how quickly data can be moved to the processors. Once memory bandwidth becomes the bottleneck, additional compute resources provide diminishing returns.
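A back-of-the-envelope bound makes the point concrete. The sketch below assumes that every generated token requires reading all model weights plus the KV cache from memory once, and that nothing else limits speed; the 70-billion-parameter model and 3.5 TB/s bandwidth figures are hypothetical.

```python
def min_seconds_per_token(weight_bytes: float, kv_cache_bytes: float,
                          mem_bandwidth_bytes_per_s: float) -> float:
    """Lower bound on per-token latency when data movement is the limit.

    Peak arithmetic throughput does not appear in this expression, which
    is what being memory-bound means: more compute does not help.
    """
    return (weight_bytes + kv_cache_bytes) / mem_bandwidth_bytes_per_s


# Hypothetical: 70e9 parameters at 2 bytes each, 1 GB of KV cache, 3.5 TB/s
latency = min_seconds_per_token(140e9, 1e9, 3.5e12)
print(f"{latency * 1e3:.1f} ms per token, at most {1 / latency:.0f} tokens/s")
```

Under these assumptions a single token stream cannot exceed roughly 25 tokens per second, regardless of how many arithmetic units the device has.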
This explains why GPUs can appear extraordinarily efficient when throughput is measured without latency constraints. In that scenario, inference servers are free to buffer requests and combine them into large batches. Batching allows the system to process many token streams simultaneously, effectively transforming numerous small sequential tasks into a larger parallel workload that better matches the GPU’s strengths.
The role of batch size in GPU utilization
At first glance, batching in AI inference may appear straightforward. It is not. Unlike image inference, where every sample in a batch completes simultaneously, LLM inference involves many conversations progressing independently and asynchronously. Some requests finish quickly, others may continue for hundreds or even thousands of decoding iterations, and new requests may arrive continuously while older conversations are still active.
The workload therefore becomes highly dynamic and irregular. Specifically, the generation of each request ends only when the model produces a special “end-of-sequence” token indicating that the response is complete.
This characteristic fundamentally changes the nature of inference scheduling.
This is where continuous batching becomes essential. Continuous batching is the runtime orchestration algorithm responsible for managing the simultaneous execution of multiple conversations across the same accelerator resources. Instead of treating inference as a sequence of isolated batches, the scheduler continuously inserts, removes, pauses, and resumes requests as tokens are generated.
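The core idea can be illustrated with a toy simulation. This is a deliberately simplified sketch, not the algorithm of any specific inference server: it models only insertion and removal (no pausing, resuming, or preemption), a fixed token count per request stands in for the end-of-sequence token, and all names are hypothetical.

```python
from collections import deque


def continuous_batching(requests, max_batch):
    """Simulate iteration-level scheduling.

    requests: list of (request_id, arrival_step, tokens_to_generate),
              with tokens_to_generate >= 1.
    Returns {request_id: decode step at which its last token is produced}.
    """
    pending = deque(sorted(requests, key=lambda r: r[1]))
    active = {}    # request_id -> tokens still to generate
    finished = {}
    step = 0
    while pending or active:
        # Fill any free batch slots with requests that have already arrived
        while pending and pending[0][1] <= step and len(active) < max_batch:
            request_id, _, tokens = pending.popleft()
            active[request_id] = tokens
        # One decode iteration: every active request emits one token
        for request_id in list(active):
            active[request_id] -= 1
            if active[request_id] == 0:
                finished[request_id] = step
                del active[request_id]  # slot is reusable on the next step
        step += 1
    return finished


done = continuous_batching([("a", 0, 3), ("b", 0, 1), ("c", 0, 2)], max_batch=2)
print(done)  # {'b': 0, 'a': 2, 'c': 2}
```

In this example request "c" takes over the slot vacated by "b" after the first iteration and finishes at step 2. With a static batch of two, it would have had to wait for "a" to complete before starting.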
The objective is to maximize hardware utilization while minimizing user-visible latency. As batch sizes increase, hardware utilization rises and throughput improves dramatically. However, batching comes at the cost of response time.
When users expect low latency, the system cannot afford to delay requests while waiting to accumulate a large batch. Each request must be processed almost immediately. As batch sizes shrink, the GPU loses the parallelism needed to keep its compute resources busy. Utilization falls, and throughput drops accordingly.
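The relationship between batch size and utilization can be sketched with a simple step-time model. It assumes the weights are read once per decode step and shared by the whole batch, ignores KV cache traffic and scheduling overhead, and uses hypothetical round numbers for the model and the accelerator.

```python
def decode_step(batch: int, weight_bytes: float, flops_per_token: float,
                mem_bandwidth: float, peak_flops: float):
    """Return (compute utilization, tokens per second) for one decode step."""
    memory_time = weight_bytes / mem_bandwidth          # independent of batch
    compute_time = batch * flops_per_token / peak_flops  # grows with batch
    step_time = max(memory_time, compute_time)           # slower side dominates
    return compute_time / step_time, batch / step_time


# Hypothetical: 100 GB of weights, 100 GFLOP per token, 1 TB/s, 1 PFLOP/s
for batch in (1, 100, 2000):
    utilization, throughput = decode_step(batch, 100e9, 100e9, 1e12, 1e15)
    print(f"batch={batch:5d}  utilization={utilization:6.1%}  "
          f"throughput={throughput:8.0f} tokens/s")
```

In this model, throughput rises almost linearly with batch size until the compute time catches up with the memory time, while the latency of each step never drops below the time needed to stream the weights. A latency budget that forces small batches therefore leaves most of the arithmetic capacity idle.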
This is the central architectural limitation of GPUs in LLM inference.
The issue becomes even more pronounced when the same accelerator must handle both prefill and generation. Prefill is a large, compute-intensive task, while generation consists of many smaller, latency-sensitive operations. When new prompts arrive, the system may need to interrupt ongoing token generation to perform prompt processing. These context switches, often referred to as preemption, increase latency and reduce efficiency further.
Inference disaggregation: A clever shortcut to mitigate GPU inefficiencies
To mitigate this problem, system designers have begun disaggregating inference. Instead of assigning both phases to the same accelerator pool, they dedicate one group of GPUs to prefill and another to generation. The prefill GPUs build the KV cache and transfer it to the generation GPUs, which decode tokens independently.
This separation eliminates interference between the two phases and allows each group of GPUs to operate more efficiently. Prompt processing can proceed continuously without disrupting active token generation, and generation can continue without interruption.
In controlled benchmark environments, where prompt lengths, output lengths, and request patterns are known in advance, this approach can deliver substantial improvements.
Yet the underlying limitation of GPU architectures remains.
Inference disaggregation: Does it scale in real-world applications?
The generation phase is still sequential and memory bound. No amount of software optimization can eliminate the need to read model weights and cached data for each token. The disaggregated approach simply reduces scheduling inefficiencies and isolates the phases so that GPU resources are used more effectively.
Whether this strategy can scale efficiently in real-world applications depends on workload predictability.
Real-world AI services process a highly variable mix of requests. Some consist of long prompts and short responses. Others involve short prompts and long outputs. Demand can shift rapidly over time, changing the ideal ratio between prefill and generation resources.
Adapting to these changes requires dynamically reallocating accelerators. That process is not instantaneous. Devices must be initialized, model parameters loaded, and serving infrastructure synchronized. If traffic patterns are highly volatile, the overhead of reconfiguration can offset much of the benefit.
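How strongly the ideal split depends on the request mix can be seen from a simple proportional estimate. The sketch assumes each accelerator sustains a fixed prefill rate and a fixed decode rate in tokens per second; both rates and the request shapes are hypothetical, and the estimate ignores KV cache transfer and reconfiguration costs.

```python
def prefill_fraction(prompt_tokens: float, output_tokens: float,
                     prefill_tokens_per_s: float,
                     decode_tokens_per_s: float) -> float:
    """Fraction of the accelerator pool that should serve prefill.

    Each phase is sized in proportion to the accelerator time an
    average request spends in it.
    """
    prefill_time = prompt_tokens / prefill_tokens_per_s
    decode_time = output_tokens / decode_tokens_per_s
    return prefill_time / (prefill_time + decode_time)


# Hypothetical per-accelerator rates: 10,000 tokens/s prefill, 1,000 tokens/s decode
print(prefill_fraction(8000, 100, 10_000, 1_000))  # long prompt, short reply
print(prefill_fraction(100, 2000, 10_000, 1_000))  # short prompt, long reply
```

With these assumed rates, the first mix calls for roughly 89% of the pool on prefill and the second for under 1%, which is why a fixed partition tuned for one traffic pattern can be badly mismatched to another.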
The broader lesson is that GPU performance in LLM inference is governed by more than raw TeraFLOPS.
The prefill phase showcases the strengths of GPUs, leveraging dense matrix operations and massive parallelism. The generation phase exposes their weaknesses, forcing highly parallel processors to execute a fundamentally sequential, memory-dominated workload.
As a result, the impressive throughput numbers often reported in unconstrained benchmarks can be misleading. They reflect idealized conditions in which batching hides architectural inefficiencies. Once latency constraints are introduced, those inefficiencies become visible.
The challenge for the industry is not simply to build larger GPUs, but to develop architectures and system designs better aligned with the realities of autoregressive inference.
Until then, the most significant limitation in real-time LLM serving will remain the same: generation is a sequential, memory-bound process running on hardware originally optimized for massively parallel computation.
Lauro Rizzatti is a business development executive with VSORA, a technology company offering semiconductor solutions that redefine design performance. He is a noted chip design verification consultant and industry expert on hardware emulation.
Editor’s Note
In a two-part series, contributor Lauro Rizzatti examines how LLM inference forced changes to MLPerf benchmarking. He will illustrate the evolution of the MLPerf benchmark and detail how generative AI forced a radical shift in AI hardware evaluation in the upcoming Part 2.
Related Content
- Strategies to Dominate the AI Accelerator Market
- A closer look at LLM’s hyper growth and AI parameter explosion
- The role of AI processor architecture in power consumption efficiency
- AI GPU computing delivers data-center performance on the factory floor
- The truth about AI inference costs: Why cost-per-token isn’t what it seems
The post The hidden bottleneck in LLM inference and the impact on MLPerf benchmarking appeared first on EDN.
Build 2026: Accumulating evidence of Microsoft’s AI independence

Abundant use of the AI acronym is increasingly evident at various industry events. Strip away the hype layer and look deeper, however, and interesting trends still emerge into view.
This is my third straight year covering Microsoft’s developer-focused conference, following up on the 2024 and 2025 show editions. And interestingly (at least to me), the event timing, both in an absolute sense and relative to other notable industry trade shows, has shifted each year.
- 2024’s Build took place on May 21-23, the week after Google’s I/O developer event (May 14-16) and several weeks before Computex (June 4-7)
- Last year, all three conferences took place on the same week
- And this year, the Google I/O and Microsoft Build cadence returned to separate-weeks spacing, two weeks apart this time. Conversely, Build and Computex were still in the same-week slot.
Why the upfront focus on this seeming nuance? Well, for one thing, Computex conversely is a consumer-tailored show. That’s why, for example, Microsoft and NVIDIA co-announced one new computer (information on which I’ll share shortly) at Computex, while introducing another with a different form factor but the exact same processing subsystem at Build. Plus, in emphasizing a point that is likely already obvious to at least some of you, any chronological spacing between two companies’ events enables the latter to fine-tune its announcements and their messaging to react to the former…and the more spacing the better from a reaction-robustness standpoint.
Speaking of announcements, let’s get to them, shall we? Microsoft CEO Satya Nadella and his various lieutenants, along with a couple of special guests, covered a lot of ground in the 2.5-hour kickoff keynote, the video of which I’ve embedded below. I’ll hit what I thought were the highlights in the following paragraphs.
AI inference-accelerating hardware
About those computers I just mentioned…stop me if you’ve heard this before. Microsoft and a partner roll out new Windows-on-Arm computer platforms, both mobile and mini-desktop in shape, and intended for both consumers and developers. Two years ago, that partner was Qualcomm, the SoCs were the Snapdragon X Elite and Plus, and the consumer mobile systems were the Surface Laptop and Pro (also accompanied by ones from other OEMs, in a nod to Microsoft’s broader Windows-on-Arm aspirations). The developer mini-desktop was the Snapdragon Dev Kit for Windows, which never made it to production: Qualcomm “indefinitely paused” it only a few months later:

This outcome was more than a bit of a surprise to me, albeit not a complete surprise, as I’d been hearing for some time of both chronic hardware and software issues with the platform. That said, I already owned (and still use) its two Qualcomm application processor-based, developer-tailored predecessors, the Qualcomm-branded ECS LIVA Mini Box QC710:
and Microsoft’s “Project Volterra” (officially: Windows Dev Kit 2023) system:

so the Snapdragon Dev Kit for Windows was unsurprisingly on my wish list, too.
Hopefully NVIDIA will have better luck, although the situation still feels somewhat embryonic. Consumer mobile system(s) first: launched at Computex and coming “this fall” at an as-yet-unannounced price is the Microsoft Surface Laptop Ultra, based on NVIDIA’s RTX Spark SoC:
While you might not immediately recognize the processor from its new marketing moniker, you’ve heard about it (from me, to be precise) before. It was previously known as the N1 and N1X, as well as the GB10, and it’s the outcome of a co-development project with MediaTek, who contributed the up-to-20-core CPU constellation and reportedly also took lead on full-chip integration, including the NVLink interconnect to the up-to-6,144 core GPU cluster.

The SoC’s development has been lengthy and troubled, if longstanding and widespread rumors are to be believed, and industry analyst skepticism persists. It first appeared in a Linux-based system, the DGX Spark (rebranded from its initial name, Project DIGITS), last October:

And now, NVIDIA has determined that the RTX Spark is finally ready for Windows-based laptops (and not just from Microsoft itself, just as was the case two years before with Qualcomm). But not now. “This fall”. At a price to be announced later, but likely stratospheric, if only because of the “up to 128GB of unified memory”, which is currently pricey due to industry constraints. And what about the developer mini-desktop system, the Surface RTX Spark Dev Box, unveiled at Build?
There’s…umm…a waitlist. Microsoft CEO Satya Nadella invited the Build attendees to join him on it. None of which inspires much in the way of confidence. Maybe one or both systems will be available for sale in time to end up on this November’s edition of my yearly “Holiday shopping guide for engineers”, but at this point, I’d be (pleasantly, mind you) surprised.
If you’re once again feeling déjà vu, by the way, it’s because Microsoft and NVIDIA have been here before. The initial attempt at bringing a Windows-on-Arm system to market, the Surface with Windows RT, was based on an NVIDIA Tegra SoC. I personally owned one and ended up tearing it apart after it eventually died. The hardware was first-rate for the time, although a dearth of native software in conjunction with woeful x86 code emulation support doomed it.
That was 2012. Jump forward again to the other, earlier-mentioned déjà vu moment, Qualcomm’s announced partnership with Microsoft in 2024, and I feel compelled to point out that by no means is it seemingly deceased (or even on life support, for that matter). I recently acquired a gently used Microsoft Surface Pro 11 based on Qualcomm’s Snapdragon X Plus to replace my long-in-the-tooth Surface Pro X. The SP11 has 16 GBytes of RAM and a 1 TByte SSD and runs solely on its integrated battery all day with ease, even when emulating x86. Microsoft systems based on second-generation Snapdragon X2 Elite (and presumably also Plus) SoCs are seemingly coming soon. And on a similar note, Microsoft’s still churning out branded systems based on x86 CPUs, too, with the most recent updates less than a month ago.
Agentic-centric O/Ss
One particularly memorable quote from Satya Nadella in the keynote was the following:
“There’s a real platform shift. We’re moving from building operating systems, devices for apps, to agents.”
Indicative of this forecasted shift is Project Solara, explained by means of a conversation between Nadella and Qualcomm President and CEO Cristiano Amon:
along with an Android-derived proof-of-concept demonstration showing agent-based interactions with (and between) a smart speaker with a screen, mobile devices, and intelligent ID cards. Google also spoke a great deal about agentic AI at its I/O developer conference two weeks ago; instead of repeating myself again, I’ll refer you to my coverage of that event for the background info if you need it.
Speaking of agents, Microsoft also announced Execution Containers, which keep agents from accessing unintended, critical regions of other agents and applications, the underlying operating system and system hardware. And for when you want to communicate with them, OpenClaw founder Peter Steinberger showed up on stage by means of introducing Scout, an OpenClaw AI Assistant gateway. If you’re thinking it sounds at least something like Gemini Spark, which Google announced two weeks back, you’re not off-base. Remember my comments at the beginning of this piece about competing-event timing and ordering and effects on later-event messaging?
Homegrown models
Last but not least, let’s touch on an event topic that prompted the “AI Independence” title of this piece. In late April, OpenAI and Microsoft “redefined” their business relationship, in the process fundamentally freeing both companies from the various exclusivity arrangements that had previously defined (and arguably dominated) it. While a “divorce” would be overstating the result, a “softer” term such as “conscious uncoupling” wouldn’t be far off.
One tangible outcome of this redefinition was clearly evident this week, as Mustafa Suleyman, head of Microsoft AI, unveiled seven new homegrown AI models with capabilities spanning image, voice and transcription functions and claimed performance matching if not exceeding that of Google, OpenAI and other competitors’ models, both open- and closed-source. I was particularly interested in Suleyman’s declaration regarding MAI-Thinking-1, the flagship reasoning model, that:
“We trained it from the ground up on clean data, without distillation from third-party models.”
And with that, I’ll wrap up for today. As always, I welcome your thoughts in the comments on the topics I’ve covered here, as well as any others that might have caught your eye—Microsoft’s ongoing research work on quantum computing, for example, including the development of Majorana 2, the sequel to last year’s premier quantum computing chip from the company.
Next Monday, Tim Cook and his CEO successor John Ternus (I’m assuming) will hit the stage to kick off Apple’s yearly Worldwide Developers Conference (WWDC), completing the yearly big-tech-company developer conference triumvirate. I’ll see you back here then, if not before!
—Brian Dipert is the associate editor, as well as a contributing editor, at EDN.
Related Content
- Google I/O 2026: Agentic AI gets serious
- Microsoft Build 2025: Arm (and AI, of course) thrive
- Microsoft’s Build 2024: Silicon and associated systems come to the fore
- A holiday shopping guide for engineers: 2025 edition
The post Build 2026: Accumulating evidence of Microsoft’s AI independence appeared first on EDN.
Agilex 9 FPGAs power COTS VPX boards

Altera has partnered with Mercury Systems and VadaTech to expand its Agilex 9 FPGA ecosystem with COTS VPX boards for mission-critical defense platforms. These solutions integrate Agilex 9 medium-band Direct RF FPGAs into VPX architectures, including SOSA-aligned OpenVPX, to help defense customers accelerate time-to-market, reduce SWaP, and enable flexible software-defined RF capabilities.

The Agilex 9 FPGAs combine RF data converters, FPGA fabric, and high-speed transceivers into a unified, programmable architecture, enabling real-time processing of large volumes of RF data at the edge. This integration supports distributed, multi-domain operations that require rapid decision-making and adaptation to changing mission requirements. The devices deliver the bandwidth, performance, and I/O needed for demanding embedded applications such as adaptive radar, cognitive electronic warfare, and secure, software-defined communications.
Mercury Systems’ DRF5660 boards and VadaTech’s VPX540 boards with Agilex 9 Direct RF AGRM027 FPGAs are available for order today.
The post Agilex 9 FPGAs power COTS VPX boards appeared first on EDN.
Value DSCs streamline embedded control

Digital signal controllers (DSCs) in Microchip’s dsPIC33CK Value Line provide real-time control for cost-sensitive designs. Starting at $0.51 each, they offer consistent pricing regardless of order size. The 16-bit controllers deliver 100-MHz deterministic processing, high-resolution PWM, and a 12-bit ADC supporting motor control, precision sensing and control, and touch/HMI applications.

A balanced set of peripherals helps reduce external component count, PCB footprint, and overall BOM cost. With flash memory ranging from 32 KB to 256 KB and compatibility across the dsPIC33CK family, the Value Line DSCs enable scalability and migration to future designs. The devices integrate a 12-bit ADC capable of up to 2 Msamples/s, four PWM pairs with resolution down to 2 ns, and on-chip analog comparators with a 12-bit DAC. Communication interfaces include CAN FD, LIN, SENT, UART, SPI, and I2C.
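The quoted PWM and ADC figures translate into design numbers with a little arithmetic. The sketch below uses only the 2-ns PWM resolution, 12-bit ADC width, and 2-Msamples/s rate stated above; the 100-kHz switching frequency and 3.3-V reference are assumed example values, not device specifications.

```python
import math


def pwm_steps(pwm_frequency_hz: float, time_resolution_s: float = 2e-9) -> float:
    """Number of distinct duty-cycle steps available in one PWM period."""
    return 1.0 / (pwm_frequency_hz * time_resolution_s)


def pwm_effective_bits(pwm_frequency_hz: float,
                       time_resolution_s: float = 2e-9) -> float:
    """Duty-cycle resolution expressed in bits."""
    return math.log2(pwm_steps(pwm_frequency_hz, time_resolution_s))


def adc_lsb_volts(vref_volts: float, bits: int = 12) -> float:
    """Voltage represented by one ADC code."""
    return vref_volts / (2 ** bits)


# Assumed example operating point: 100 kHz PWM, 3.3 V ADC reference
print(pwm_steps(100e3))           # about 5000 steps per period
print(pwm_effective_bits(100e3))  # a little over 12 bits
print(adc_lsb_volts(3.3) * 1e3)   # about 0.8 mV per code
print(1 / 2e6 * 1e9)              # 500 ns per sample at 2 Msamples/s
```

At the assumed 100-kHz switching frequency, the duty-cycle resolution is roughly matched to the 12-bit ADC; raising the switching frequency reduces the number of available steps proportionally.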
To accelerate evaluation and development, Microchip offers the dsPIC33CK Value Line Curiosity Nano evaluation kit with an onboard debugger. The evaluation platform supports the Curiosity Nano base for Click Boards and a touch adapter board for touch applications. A motor control DIM is also available for rapid prototyping of motor control designs.
Value Line DSCs are available directly from Microchip, its sales representatives, or authorized distributors.
The post Value DSCs streamline embedded control appeared first on EDN.
RF tool captures reusable design workflows

Keysight’s RF Circuit Simulation Professional software now enables engineers to document their design workflow on an executable whiteboard. The software replicates design decisions while capturing simulations, optimizations, decision trees, and parameters derived from prior analyses. Each step generates editable Python code that can be saved, shared, replayed for design reviews, and redeployed across the Keysight Advanced Design System (ADS), Cadence Virtuoso, and Synopsys Custom Compiler environments with full design data traceability.

Design teams often face workflow inefficiencies, simulation bottlenecks, and knowledge-transfer challenges. Engineers can build workflows visually on an executable whiteboard while the software automatically generates corresponding Python scripts. The platform executes simulations, optimizations, and design decisions in sequence, with support for decision-based loops and parameter settings.
Each workflow becomes a repeatable methodology that can be shared across teams, reused, and driven by AI. Captured workflows help preserve RF design expertise while creating structured design data that can support future AI-driven automation and training. Design review and tapeout tasks that previously required manual configuration now execute automatically.
RF Circuit Simulation Professional
The post RF tool captures reusable design workflows appeared first on EDN.
Buck controller streamlines in-vehicle USB charging

Diodes’ APK43070Q synchronous buck controller integrates a USB Type-C PD 3.1 source controller, simplifying automotive single- and multi-port charging designs. Operating from a 4-V to 36-V input, it enables USB Type-C charging up to 140 W. The device supports USB extended power range (EPR) and adjustable voltage supply (AVS) up to 28 V, along with standard power range (SPR) and programmable power supply (PPS) up to 21 V.
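The headline numbers can be sanity-checked with basic power arithmetic. This sketch uses the 140-W and 28-V figures from the text together with the textbook ideal buck relationship (duty cycle equals output voltage divided by input voltage); it assumes a lossless converter in continuous conduction and does not model the APK43070Q’s actual control scheme.

```python
def required_current_a(power_w: float, vbus_v: float) -> float:
    """Output current needed to deliver a given power at a given VBUS."""
    return power_w / vbus_v


def ideal_buck_duty(vin_v: float, vout_v: float) -> float:
    """Ideal (lossless) buck duty cycle; a buck cannot step voltage up."""
    if vout_v > vin_v:
        raise ValueError("a buck converter needs vin >= vout")
    return vout_v / vin_v


print(required_current_a(140, 28))  # 5.0 A at the 28-V EPR level
print(ideal_buck_duty(36, 28))      # about 0.78 from a 36-V input
print(ideal_buck_duty(36, 5))       # about 0.14 for a 5-V output
```

Delivering the full 140 W therefore means 5 A at 28 V, and the input rail has to sit at or above the requested output voltage for a buck stage to regulate it.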

The constant-frequency controller features integrated drivers, optimized dead time, and elevated gate drive voltage for efficient mid- to high-power charging using external N-channel MOSFETs. This allows flexible MOSFET selection to balance thermal performance and power loss. A VIN DC pass-through mode further improves converter performance by enabling the high-side MOSFET to act as the VBUS switch, eliminating the need for an additional output switch.
An I2C interface with a controller/target addressing scheme enables power sharing across up to eight USB Type-C ports via resistor selection without an external MCU. The APK43070Q also includes overvoltage, overcurrent, undervoltage, and thermal protection.
The APK43070Q is priced at $0.80 each in 1000-unit quantities.
The post Buck controller streamlines in-vehicle USB charging appeared first on EDN.
Low-noise USB scopes deliver 16-bit resolution

Pico Technology has launched the PicoScope 5000E series of USB-C oscilloscopes for analog, digital, and mixed-signal debugging. The four-channel scopes provide true 16-bit resolution with bandwidths to 200 MHz, sample rates to 2.5 Gsamples/s, and up to 1 GS of memory. PicoScope 5000E Plus models also offer a switchable 8-bit high-speed mode that raises bandwidth to 500 MHz, sample rates to 5 Gsamples/s, and memory to 2 GS.

With an ultra-low-noise front end, the oscilloscopes achieve a noise floor below 22 µV RMS and total harmonic distortion better than -73 dB. The resulting dynamic range helps reveal small-amplitude components, ripple, distortion, and other anomalies that lower-resolution or noisier instruments can miss.

The compact, portable scopes connect to a host computer through a SuperSpeed USB 3.0 Type-C interface. For debug and validation, Pico 7 software provides more than 40 serial protocol decoders, advanced math channels, automated measurements including power analysis, multi-capture analysis, and measurement and mask limit testing. The Pico SDK supports custom application development using C, C#, C++, Python, MATLAB, and LabVIEW.
The PicoScope 5000E series is available in four-channel and 4+16-channel mixed-signal oscilloscope variants, with bandwidth options from 60 MHz to 500 MHz depending on model and operating mode. Units are sold through authorized distributors worldwide and directly from Pico Technology.
The post Low-noise USB scopes deliver 16-bit resolution appeared first on EDN.
Triply simply sequence supply voltages

This circuit design for power supply on/off sequencing uses Schmitt triggers for triple-positive-rail timing purposes.
Recent design ideas have explored the utility of timed power supply ON/OFF sequencing and provided circuit designs to implement it. Figure 1 shows a simple topology using Schmitt triggers for timing the turn ON and OFF of triple positive supply rails. Here’s how it works.
Wow the engineering world with your unique design: Design Ideas Submission Guide

Figure 1 This significantly simple supply sequencing scheme leverages Schmitt triggers.
Switching action begins with SPDT S1 in the OFF position, which holds the C1 and C2 timing caps discharged. The latter holds U1 pin 1 at 15 V and therefore its pin 2 and the NFET Q2’s gate at zero, forcing the 5Vout rail OFF.
Meanwhile, C1’s discharged state holds U1’s pins 3 and 5 low so pins 4 and 6 sit high. The former holds enhancement mode PFET Q1 and the 15Vout rail OFF, while the latter does the same for level shifter Q3, PFET Q4, and the 24Vout rail.
Therefore no power flows to the connected loads. Yet, at least. Figure 2’s left side graphs the sequence of events initiated by actuating S1.

Figure 2 This plot shows power sequence timing when S1 is flipped ON and later flopped OFF.
C2 connects to ground through R3, quickly charging it to the Schmitt trigger low-going threshold in about R3C2 = 1 ms. This inverts U1 pin 2 to 15 V, placing a net forward bias of 15 – 5 = 10 V on NFET Q2, turning it and the 5Vout rail ON. Thus they will remain as long as S1 stays ON.
Meanwhile, reset of C1 has been released, allowing it to begin charging through R1 + R3. The first event occurs at the end of T1, when U1 pin 3 reaches the ~9 V Schmitt threshold. Since the timeout duration is proportional to C1, any desired interval can be chosen with an appropriate RC product. U1 pin 4 then snaps low, PFET Q1 turns ON and 15Vout goes active.
Of course C1 continues to charge, so at T2 U1 pin 5 also reaches its triggering threshold. Then its pin 6 snaps low, turning ON Q3, Q4 and 24Vout. The ratio R4 = 10 R5/(15 – 0.7) was chosen to apply an adequate and safe ~10V drive to Q4’s gate, independently of 24Vin. The S1 flip ON sequence is now complete.
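The timing intervals follow from ordinary RC charging. As a rough sketch, assuming the timing capacitor charges exponentially from 0 V toward the 15 V rail and the Schmitt trigger switches at about 9 V, the delay can be estimated as shown here. The resistor and capacitor values are hypothetical placeholders, since the schematic values are not reproduced in the text.

```python
import math

def rc_delay(r_ohms, c_farads, v_threshold, v_supply=15.0, v_start=0.0):
    """Time for an RC node to charge from v_start to v_threshold toward v_supply."""
    if not (v_start <= v_threshold < v_supply):
        raise ValueError("threshold must lie between start voltage and supply")
    ratio = (v_supply - v_start) / (v_supply - v_threshold)
    return r_ohms * c_farads * math.log(ratio)

# Hypothetical values for illustration only (not taken from Figure 1).
R_TOTAL = 1_000_000   # assumed R1 + R3, in ohms
C1 = 1e-6             # assumed timing capacitor, in farads

t1 = rc_delay(R_TOTAL, C1, v_threshold=9.0)
print(f"T1 ~ {t1:.3f} s ({t1 / (R_TOTAL * C1):.2f} time constants)")
```

With a ~9 V threshold on a 15 V rail, the delay comes out a little under one RC time constant, so scaling either R or C scales the interval proportionally.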
The right side of Figure 2 shows what happens when S1 subsequently flops OFF. First, C1 is promptly discharged through R3, turning OFF Q1, Q3, Q4 and thereby 15Vout and 24Vout, putting them and whatever they power to sleep. Meanwhile C2 begins ramping up, taking T3 to get to U1’s threshold. When it completes the trip, pin 2 goes low, turning Q2 and 5Vout OFF.
Turnoff sequencing is therefore complete. Nighty night.
Details of the design include D1 and D2. Their purpose is to make the sequencer’s response to losing and regaining of the input rail voltage orderly, and to do it regardless of whether S1 is ON or OFF. If S1 is OFF, then all output rails remain low and (a safe) nothing occurs when the supply voltages return. If it’s ON, then a normally timed (and therefore safe) power-up sequence is executed.
Note that the MOSFETs should be chosen for adequate voltage and current handling capacities. Because Q1 has 15 V of gate drive and Q2 and Q4 get 10 V, none need be sensitive logic-level types.
Okay. But what if you also need to sequence a negative supply rail? Figure 3 shows how.

Figure 3 This power switching circuit works with a negative rail.
When the U1 inverter’s input rises above the Schmitt trigger voltage, its output snaps low, causing the 2N3906 to pass Ic = (+15Vin – 0.6)/15k = 0.96 mA. This develops 10.6 V across the 11k resistor, independent of –Vin, saturating the NFET. If symmetrical polarity rails (e.g., ±15 V) are needed, Figure 3 can be added to Figure 1 to provide the negative side with no other modifications required.
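The gate-drive arithmetic for the negative-rail switch can be checked with a few lines. This is only a sketch of the calculation quoted above. The function and parameter names are illustrative, and base current is ignored.

```python
def negative_rail_gate_drive(v_pos=15.0, v_be=0.6, r_set=15e3, r_gate=11e3):
    """Return (collector current in A, gate drive in V) for the 2N3906 current source.

    r_set is the 15k resistor that sets the current; r_gate is the 11k resistor
    across which the gate drive develops. Base current is neglected.
    """
    i_c = (v_pos - v_be) / r_set
    return i_c, i_c * r_gate

i_c, v_gs = negative_rail_gate_drive()
print(f"Ic = {i_c * 1e3:.2f} mA, gate drive = {v_gs:.2f} V")
```

Because the drive depends only on the positive rail and the two resistors, the result does not change with the magnitude of the negative rail.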
Stephen Woodward‘s relationship with EDN’s DI column goes back quite a long way. Over 200 submissions have been accepted since his first contribution back in 1974. They have included best Design Idea of the year in 1974 and 2001.
Related Content
- Silly simple supply sequencing
- Single switch controls sequential operation of multiple power supplies
- Short push, long push for sequential operation of multiple power supplies
The post Triply simply sequence supply voltages appeared first on EDN.
Ruggedized connectors: Not necessarily big or bulky

Ruggedized connectors are usually associated with military/aerospace, industrial, and some medical applications, but there are consumer ones as well, in special circumstances. Of course, the phrase “ruggedized connector” invokes different requirements in different circumstances.
In brief, it’s the ability of the connector to endure and consistently function to specifications despite extreme mechanical, environmental, and thermal stresses. These stresses differ depending on the operating conditions but often have overlap as well. For example:
- Connectors in land-based military systems must handle severe vibration, dirt accumulation (dust, sand, grit), and cold and heat extremes.
- Seaborne interconnects must withstand prolonged exposure to corrosive saltwater; deep-sea ones must also withstand crushing pressure.
- Aerospace applications must tolerate repeated take-offs, landings, and in-flight vibrations in addition to wide temperature ranges.
- Space applications have more extreme temperature swings, vacuum exposure and outgassing, and intense mechanical stress during launch and re-entry.
- Industrial applications often need to function despite vibration, shock, dirt, grease, abuse, and even neglect.
- Some consumer-facing applications such as vending machines, commercial washers/dryers, arcade games, and elevators/escalators also need ruggedized attributes; it’s a surprisingly long list here.
Meeting these requirements involves an understanding of multiple factors, including:
- Vibration: connectors in military vehicles or fighter jets are tested to resist forces up to 20 g.
- Shock: a high-impact force during rapid acceleration or deceleration is distinct from vibration. It can be as high as 50 g for standard connectors and 100 g for nano and micro designs.
- Temperature extremes: ground-based systems may see temperatures ranging from -65°C to +125°C while space systems can go as high as 200°C.
- Sealing and ingress protection: connectors may need to be protected against exposure to moisture, dust, and contaminants to ensure long-term operation using sealing solutions such as O-rings, gaskets, and grommets.
- Corrosion: it’s caused by exposure to moisture and salt spray, leading to oxidation.
Deciding on a ruggedized connector requires attention to two broad design issues: the body or shell, and the electrical contacts.
For the body or shell, vendors and users consider what it’s made of, how it mates, retention and locking, and more. For this reason, rugged connectors are often associated with a relatively bulky form factor, locking rings, and similar features; but this is not necessarily the case.
For the contacts, ruggedized connectors also have sophisticated, specially designed and fabricated contacts that use suitable base metals and are clad with advanced plating to withstand and maintain contact despite the challenges. The contact pairs are often based on a multipoint design with two or four mating surfaces for redundancy, rather than a single mating point.
Start with a classic
One widely used choice for a ruggedized connection is the classic D-subminiature connector. If you think that the classic 9-pin D-subminiature connector (often called DB-9) and the rest of the broader family of D-subminiature connectors have largely disappeared due to the fading away of the “ancient” RS-232 interface—along with the rise of various versions of USB and Ethernet connectors—that’s not the case at all.
The D-sub form factor has been in use since the 1950s and still offers many advantages. It’s fully shielded against EMI/RFI and provides a sealed or nearly sealed enclosure. It’s also mechanically rugged, and its mating halves can be locked to each other with small jackscrews or other arrangements. This class of connectors is still widely used due to its flexibility, integrity, track record, and wide variety of models and versions, including in mil/aero and space-related designs.
This connector is offered in six basic standard-size bodies, but that is only part of its versatility. It also offers flexibility in its electrical contact positions and types.
In addition to offering connector shells with the same contact type at all positions, “Combo-D” D-subs such as those from Amphenol Positronic provide a mix of independent signal and power contacts within the connector shell (Figure 1). A single D-sub can support multiple signal contacts, power contacts, and more in a variety of mix-and-match arrangements. There are available contacts for signal, power, shielded, high voltage, thermocouple, and even fiber-optic applications.


Figure 1 The Combo-D subminiature connector style supports many signal- and power-path combinations (upper); these combinations are available in standardized, named shell sizes and contact arrangements (lower). Source: Amphenol Positronic
Among the material options for the shell are:
- Thermoplastic polymers offer excellent mechanical strength, thermal resistance, and chemical stability. These materials effectively absorb vibration and shock in a low-weight structure.
- Composite materials such as fiberglass-reinforced polymers and carbon fiber composites provide excellent strength-to-weight ratios. They can be engineered to maximize specific properties such as tensile strength, impact resistance, or thermal stability.
- Metal enclosures of stainless steel and aluminum alloys are preferred materials for connector housing in the high-shock, high-vibration, and high-EMI environments of aerospace and defense applications.
The virtues of the sub-D shell—or any ruggedized housing—are an important part of the connector story, but they are only half of the ruggedness reality as the electrical contacts and their attributes are also critical. Over the years, there have been many innovations in contact technology with respect to materials, design, and electrical and mechanical performance.
For example, Amphenol Positronic uses its patented PosiBand contact technology (U.S. Patent 7,115,002) in one of its D-sub families. This contact has a unique approach to provide enhanced performance, where its external pressure-element design fully separates the mechanical action from the electrical action of the connection (Figure 2).

Figure 2 The PosiBand uses a patented design to separate the mechanical action and the electrical action of the connection. Source: Amphenol Positronic
The pressure element performs the mechanical action by applying a force pressing the male pin against the inner female cavity, achieving electrical connection along a long line of direct contact. Among its many subtle but important attributes is the spring clip within the PosiBand; it’s a small but critical part of the assembly and a key contributor to its vibration/shock performance (Figure 3).


Figure 3 The PosiBand spring clip provides a normal force across the contact area and so maximizes the electrical mating-surface contact area. Source: Amphenol Positronic
This spring-tempered beryllium copper alloy provides a normal force on the male contact, contributing to a rugged and reliable contact pairing. At the same time, it offers a lower average insertion force while meeting or exceeding performance requirements.
Consumer connectors get a little more rugged, too
The recent European initiative mandating use of USB-C for many classes of consumer end products is a major factor driving the use of this connector. Due to the wide availability of USB-C connected functions and peripherals, it seems logical that the connector and associated standard would be worth considering for medical, industrial, and other non-consumer appliances.
But there’s a problem with USB-C connectors: they are not rugged or sealed against intrusion, yet many may end up in applications beyond low-end consumer products, where those attributes matter.
Addressing this concern, Same Sky has introduced the UJ family of waterproof USB receptacles with IPX5, IPX6, IPX7, IPX8, IP66, IP67, and IP68 ratings, making them well-suited for applications where moisture and environmental contaminants are a concern (Figure 4). If you are not familiar with Same Sky, it was known as CUI Devices until it changed its name in September 2024.

Figure 4 These USB Type C connectors from Same Sky (formerly CUI Devices) feature water/dust intrusion-resistant O-rings to meet multiple IP ratings. Source: Same Sky
The five models are compatible with reflow soldering due to their UV-glued O-rings. This simplifies the PCB assembly process, as there is no need for a separate wave-soldering step (as is often the case with connectors and other larger components).
The five IP-rated USB Type C receptacles conform to a variety of USB standards, from USB 2.0 up to USB 4.0 Gen 3×2, with data-transfer speeds up to 40 Gbps as well as power delivery up to 240 W at 48 V and 5 A. The family also includes power-only models that remove the data-transfer pins to create a more cost-effective solution for designs where charging or power is the sole needed function.
If you are looking for a ruggedized connector, you have these and many other options. The first challenge is defining what you mean by “ruggedized” in your application, beyond the number and type of contacts, and then picking which available connectors meet those criteria.
Maybe AI can help make the selection?
Bill Schweber is a degreed senior EE who has written three textbooks, hundreds of technical articles, opinion columns, and product features. Prior to becoming an author and editor, he spent his entire hands-on career on the analog side by working on power supplies, sensors, signal conditioning, and wired and wireless communication links. His work experience includes many years at Analog Devices in applications and marketing.
Related Content
- Consumer connectors get ruggedized
- Meeting the ‘Rugged Design’ Challenge
- USB-C and Power Delivery: Too much of a good thing?
The post Ruggedized connectors: Not necessarily big or bulky appeared first on EDN.
One-shot surge protection

The moral of this story: any promises of protection and safety should be double-checked for validity.
Surge (over-voltage) protection is a common function, but implementing it robustly is not a simple task. Recently, I stumbled upon a device called an “Extension Lead With Surge Protection”. I already owned several other gadgets from the same manufacturer, and they were by and large OK. “Why not?” was my thought, so I bought the new gadget as well.
My initial testing involved connecting the device to AC outlets both with and without ground. The gadget correctly identified both of these configurations, which was good, but I wasn’t yet done.
The gadget also promised protection from surges up to 2000 Volts (the normal AC voltage is 220V here where I live). This protection was its main merit, so of course I had to check this feature as well. I did so with the simple circuit shown in Figure 1.

Figure 1 This simple circuit supports confirmation of valid (or not) surge voltage protection.
The circuit produces a DC voltage of roughly double the input AC amplitude, approximately 600V in this case. The DC output shouldn’t be considered a shortcoming! The values of resistors R1 and R2, and diode Z1 (a 200V Zener diode in this case) need to be recalculated if your AC outlet voltage isn’t 220V.
The circuit also includes an LED, which will illuminate only when this doubled voltage really is present on the gadget’s output. The LED should be bright enough at a current of ~1 mA or less. Should I warn you here to beware of high voltage; to be cautious and not to connect any inappropriate load to the circuit? That said, I’ll continue the story.
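For readers on a different mains voltage, the expected test voltage can be estimated before recalculating R1, R2, and Z1. The sketch below assumes an ideal, unloaded two-diode voltage doubler, which is an assumption about the topology rather than something stated in the text. The function name and the optional diode-drop parameter are illustrative.

```python
import math

def doubler_output(v_rms, diode_drop=0.0):
    """Ideal no-load DC output of a voltage doubler fed from a sine wave of v_rms."""
    v_peak = v_rms * math.sqrt(2)
    return 2 * (v_peak - diode_drop)

for mains in (220, 230, 120):
    print(f"{mains} V RMS -> about {doubler_output(mains):.0f} V DC")
```

For a 220 V outlet this gives a little over 600 V, in line with the reading reported in the story.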
I connected an AC/DC voltmeter to the output of the “Extension Lead with Surge Protection”, while its input was connected to the output of the circuit. The voltmeter showed 600+ volts! The gadget was simply translating its input to the output without any high voltage detection, far from protection.
To figure out what had gone wrong, I had to dismantle the gadget, which was not a simple task, as it turned out. The screws required a very specific bit in order for them to be unscrewed. At this point, I prepared to see something interesting inside, and indeed there was!
The circuit within had several transistors to detect unconnected ground, which I’d already confirmed worked. It also had two varistor/thermal switch pairs, in thermal contact. Unfortunately, these thermal switches were only single-tasked! Being one-shot fuses, they could protect the load only once, leaving it permanently disconnected afterwards. “One-shot Surge Protection” would have been more accurate.
It seems that the designers realized this fault too late, so they instead connected the output of the gadget directly to its input, bypassing and completely disabling any surge protection in the process! My disappointing purchase had transformed into an interesting project, letting me re-enable the gadget’s protection, albeit on a one-time-only basis.
—Peter Demchenko studied math at the University of Vilnius and has worked in software development.
Related Content
- Four key surge protection methods for RF designs
- Top 10 circuit-protection devices
- Gas discharge tubes (GDTs): From sparks to circuit protection
The post One-shot surge protection appeared first on EDN.
The firmware-hardware handshake in a silicon governance system

Design-time closure is no longer the end of system convergence.
In modern AI silicon—encompassing chiplet-based platforms, high-bandwidth memory systems, and advanced heterogeneous packages—the realized system continues to change after release. Workloads shift. Voltage and thermal conditions move dynamically. Network-on-chip (NoC) traffic patterns vary. Memory pressure changes. SerDes links retrain. Aging accumulates. Package and board environments influence behavior over time.
A system may pass design signoff, validation, and qualification, yet still encounter runtime states that were not fully represented during design-time closure. This does not mean the original design was wrong. It means the system in operation has entered a lifecycle regime where hardware state, firmware response, and evidence maturity must remain synchronized.
This is where the firmware–hardware handshake becomes important. Hardware senses the condition; firmware executes bounded actions; and governed evidence determines whether the action is valid.
The handshake is not an uncontrolled autonomous loop. It’s a disciplined runtime structure that connects hardware telemetry, firmware policy, causality interpretation, bounded action envelopes, rollback limits, and lifecycle evidence.
In this viewpoint, firmware is not the intelligence. Firmware is the bounded execution layer. The intelligence is in the governed interpretation of evidence: whether a signal is mature enough, synchronized enough, causally grounded enough, and safe enough to support action.
From observability to action
In complex AI silicon, observability is expanding rapidly. NoC counters, voltage monitors, thermal sensors, ECC logs, accelerator stall indicators, memory-controller events, SerDes retraining records, clock-domain telemetry, firmware traces, and package-level sensors can all provide valuable runtime information.

Here is how the firmware–hardware handshake layer works in governed runtime convergence. Source: Author
Hardware telemetry is captured, normalized into evidence, checked for admissibility, evaluated for causality, and passed through bounded firmware policy before any runtime action is executed and recorded as lifecycle evidence. But telemetry alone does not create authority.
An NoC latency spike may correlate with workload congestion, but it may also reflect a localized thermal hotspot, voltage droop, memory backpressure, firmware scheduling behavior, or package-level power delivery instability. A SerDes retraining event may indicate channel degradation, but it may also be triggered by temperature drift, reference-clock behavior, board-level noise, connector variation, or power integrity disturbance.
The runtime system therefore faces a difficult question: When should firmware act?
If firmware acts too slowly, the system may lose performance, reliability, or availability. If firmware acts too aggressively, it may create instability, hide root cause, or trigger unnecessary throttling, rollback, or degraded operation. If firmware acts on weak evidence, it may correct the wrong problem.
This is why runtime telemetry must mature into governed evidence before it’s used to drive consequential action.
Hardware as sensing layer
Hardware provides the first layer of runtime awareness.
Examples include NoC latency, congestion, retry, and utilization counters; voltage droop sensors and current monitors; thermal sensors and hotspot indicators; memory-controller stalls and ECC events; SerDes equalization, retraining, and link-margin information; accelerator utilization and stall counters; clock, reset, and power-state telemetry; and package, board, and system-level sensor data.
These signals provide visibility into how the system behaves under real workload and environmental conditions.
However, hardware signals are not self-explanatory. They must be interpreted in context. A voltage droop event means something different during peak AI workload than during idle transition. A thermal hotspot means something different if it is stable, spreading, oscillating, or correlated with a specific workload pattern. An NoC stall means something different if it aligns with memory saturation, power throttling, package temperature, or firmware scheduling.
The key point is simple: Hardware can sense state, but it does not automatically explain state. And that explanatory layer requires causality, evidence maturity, synchronization, and decision context.
Firmware as bounded execution layer
Firmware is the natural runtime bridge between hardware state and system response. Depending on the platform, firmware may be able to adjust voltage and frequency states, throttle selected regions, retrain high-speed links, reduce lane rate or link width, isolate a tile or accelerator block, migrate workload away from a stressed region, change scheduling policy, request diagnostic capture, enter deterministic degraded mode, or trigger service and validation escalation.
These actions are powerful because they allow the system to respond before a condition becomes a failure. But that power also creates risk.
Firmware should not become an unconstrained autonomous agent. A firmware action can affect performance, lifetime, reliability, customer experience, safety margin, and debug visibility. If firmware changes the operating state without traceable evidence, the system may appear to recover while the underlying cause remains unresolved.
One of the risks of adaptive firmware is that it can unintentionally hide the physical root cause. A system may appear stable because a link retrained, a frequency state changed, a workload migrated, or a region was throttled. But if the intervention is not tied to a normalized evidence record, the original cause may disappear from view. In advanced systems, repeated compensation can become a failure mode of its own.
The purpose of the firmware–hardware handshake is therefore not only to act, but to preserve the evidence trail behind the action. In other words, the correct role of firmware is not unlimited control. The correct role is bounded execution.
Firmware should execute only within approved policy limits, with clear evidence requirements, confidence thresholds, rollback rules, and auditability.
The handshake model
The firmware–hardware handshake can be described as a governed runtime sequence:
Hardware state → contextual capture → normalized evidence → admissibility check → causality assessment → firmware policy → bounded action → updated evidence → lifecycle record
Each step prevents runtime telemetry from becoming uncontrolled action.
First, the hardware signal must be captured with context: timestamp, workload class, physical location, power state, thermal state, firmware version, configuration state, and system region. Second, the signal must be normalized into an evidence object. A raw sensor reading or counter value is not enough. It must be linked to the specific system condition it describes.
Third, the evidence must be checked for admissibility. Is the timestamp valid? Is the firmware version known? Is the sensor calibrated? Is the workload context synchronized? Is the signal consistent with voltage, thermal, memory, package, and board evidence? Is the proposed cause physically plausible?
Fourth, firmware action must remain inside a bounded envelope. The system may allow a defined frequency reduction, limited link retraining, controlled workload migration, or temporary degraded mode. But if evidence confidence is low or the action exceeds policy authority, escalation is required.
Finally, the outcome must be recorded. Did the action stabilize the system? Did the same condition recur? Did the event indicate a one-time workload excursion, a design margin issue, a package-related sensitivity, or an aging trend?
This is how runtime action becomes lifecycle evidence.
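The capture, normalization, and admissibility steps can be pictured as a small data structure plus a checklist. The sketch below is purely illustrative. The `Evidence` fields, the `admissibility_failures` helper, and the sample values are hypothetical names chosen for this example, not part of any real firmware API.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Evidence:
    """Hypothetical normalized evidence object; all field names are illustrative."""
    signal: str
    value: float
    timestamp_s: Optional[float]
    workload_class: Optional[str]
    region: str
    firmware_version: Optional[str]
    sensor_calibrated: bool = False
    corroborating: dict = field(default_factory=dict)  # synchronized context readings

def admissibility_failures(ev, known_firmware, required_context=("voltage", "thermal")):
    """Return the reasons this evidence is NOT admissible (empty list means admissible)."""
    failures = []
    if ev.timestamp_s is None or ev.timestamp_s < 0:
        failures.append("invalid timestamp")
    if ev.firmware_version not in known_firmware:
        failures.append("unknown firmware version")
    if not ev.sensor_calibrated:
        failures.append("sensor not calibrated")
    if ev.workload_class is None:
        failures.append("workload context missing")
    for name in required_context:
        if name not in ev.corroborating:
            failures.append(f"no synchronized {name} evidence")
    return failures

ev = Evidence("noc_latency_us", 4.2, 1012.5, "inference", "noc_region_3",
              "fw-1.4.0", sensor_calibrated=True, corroborating={"thermal": 71.0})
print(admissibility_failures(ev, known_firmware={"fw-1.4.0"}))
```

Returning the list of failed checks, rather than a bare true/false, keeps the reason for blocking an action available for the lifecycle record.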
Bounded action envelopes
The bounded action envelope is the core safety mechanism. It defines what firmware may do, under what evidence conditions, and with what limits. For example, a firmware policy may allow temporary throttling if thermal evidence is mature, localized, and correlated with workload.
It may allow link retraining if signal-margin evidence crosses a defined threshold. It may allow workload migration if a tile shows repeated voltage-droop sensitivity under known conditions. It may allow deterministic degraded mode if full performance cannot be preserved without violating reliability boundaries.
But the same policy may block action when evidence is incomplete. If an NoC latency spike occurs without synchronized voltage, thermal, workload, and memory context, firmware should not automatically classify the NoC as the root cause.
If a link repeatedly retrains after thermal cycling, firmware should not hide the event indefinitely by retraining silently. If a voltage-droop event becomes recurrent under a specific package lot, board lot, workload class, or thermal condition, the system should escalate the event instead of silently compensating through repeated firmware action.
Bounded action does not mean passive behavior; it means disciplined behavior. The system can respond, but it must respond within governed limits.
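One way to picture a bounded action envelope is as a small policy record with an explicit act-or-escalate decision. The following is a minimal sketch under assumed names and thresholds. `Envelope`, `decide`, and the numeric limits are hypothetical and chosen only to illustrate the idea.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Envelope:
    """Hypothetical policy limits for one firmware action."""
    max_freq_reduction_pct: float  # largest throttle firmware may apply on its own
    min_confidence: float          # evidence confidence required before acting
    max_repeats: int               # silent repeats allowed before escalation

def decide(requested_pct, confidence, prior_repeats, env):
    """Return ('act', pct) inside the envelope, or ('escalate', reason) otherwise."""
    if confidence < env.min_confidence:
        return ("escalate", "evidence confidence below policy threshold")
    if prior_repeats >= env.max_repeats:
        return ("escalate", "recurring condition; stop silent compensation")
    if requested_pct > env.max_freq_reduction_pct:
        return ("escalate", "requested action exceeds policy authority")
    return ("act", requested_pct)

# Illustrative limits only.
THERMAL_THROTTLE = Envelope(max_freq_reduction_pct=15.0, min_confidence=0.8, max_repeats=3)

print(decide(10.0, 0.9, 0, THERMAL_THROTTLE))  # inside the envelope
print(decide(10.0, 0.5, 0, THERMAL_THROTTLE))  # weak evidence
print(decide(10.0, 0.9, 3, THERMAL_THROTTLE))  # recurring event
```

The recurrence check reflects the point about repeated retraining or throttling: after a bounded number of silent corrections, the event is surfaced instead of compensated again.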
Extending convergence into runtime
The handshake extends governed convergence beyond design-time. At design-time, engineers close the system against modeled requirements, simulated margins, validation data, and qualification evidence. At runtime, the system encounters real workload, real aging, real environment, and real variation.
The firmware–hardware handshake allows convergence to continue operationally. Several runtime concepts become useful here.
- A boot-time realization baseline can capture the initial measured system state at startup. This provides a reference for later drift.
- A corridor stability index can summarize the health of a specific governed path, such as an NoC region, power domain, HBM interface, SerDes path, or package-to-board corridor.
- A global convergence epoch can ensure that telemetry from multiple runtime sources is compared within a valid synchronization window.
- Realization fatigue tracking can monitor accumulated stress, repeated throttling, retraining frequency, thermal exposure, voltage events, or degradation patterns.
- A deterministic degraded mode can preserve safe operation when full performance is no longer evidence-supported.
These concepts are not meant to add vocabulary for its own sake. They define how runtime signals can be organized into a governed system state rather than scattered logs.
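Two of these concepts, the global convergence epoch and the boot-time realization baseline, lend themselves to a short illustration. The sketch below uses hypothetical function names, signal names, and numbers. It shows one plausible reading of the concepts, not a defined implementation.

```python
def within_epoch(timestamps_s, window_s):
    """True if all telemetry timestamps fall inside one synchronization window."""
    ts = list(timestamps_s)
    return max(ts) - min(ts) <= window_s

def drift_from_baseline(baseline, current):
    """Per-signal relative drift versus the boot-time baseline."""
    return {k: (current[k] - baseline[k]) / baseline[k]
            for k in baseline if k in current}

# Illustrative numbers only.
boot = {"vdd_core_v": 0.750, "link_margin_mv": 40.0}
now = {"vdd_core_v": 0.735, "link_margin_mv": 34.0}
samples = {"noc": 100.002, "thermal": 100.004, "pdn": 100.051}

# The PDN sample lags the others, so a 10 ms window rejects the comparison.
print(within_epoch(samples.values(), window_s=0.010))
print(drift_from_baseline(boot, now))
```

Checking the window before comparing signals keeps stale readings from being treated as simultaneous evidence.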
Why this matters for AI silicon
AI workloads are especially relevant because they stress systems dynamically and unevenly.
A training or inference workload may create localized NoC congestion, memory pressure, power spikes, or thermal concentration. The system may remain within global specifications while a local region experiences repeated stress. A package or board condition may interact with workload behavior in ways that were not fully visible during nominal validation.
In such systems, the firmware–hardware handshake becomes a reliability and performance tool. It allows the platform to distinguish between transient workload variation, recurring physical sensitivity, firmware scheduling artifacts, marginal power delivery behavior, thermal containment issues, aging-related degradation, validation escapes, and package or board interaction.
The goal is not to blame the NoC, firmware, package, power delivery network (PDN), memory, board, or workload too early. The goal is to preserve causality until the evidence is mature enough to support a decision.
Relationship to fleet learning
Runtime evidence becomes even more valuable when it’s aggregated across systems, products, lots, platforms, and field conditions. This is where fleet learning enters the picture.
Fleet learning becomes valuable when repeated runtime patterns appear across systems, lots, boards, packages, workloads, or field environments. A recurring SerDes retraining signature after thermal exposure may indicate a package, board, connector, or policy sensitivity.
A workload-specific droop pattern across a defined power domain may inform future PDN design or validation coverage. A degradation signature that appears after a thermal-cycle threshold may reshape future qualification assumptions.
But these patterns should not automatically rewrite firmware policy. Field data should not autonomously change system behavior, alter operating limits, or modify release criteria. Fleet learning recommends, and a bounded gate authority approves. This preserves the difference between learning and governing.
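The separation between recommending and approving can be made explicit in code. The following is a hedged sketch. `recommend`, `apply_policy_change`, the event tuples, and the signature strings are all hypothetical names used only for illustration.

```python
def recommend(fleet_events, min_units=3):
    """Return signatures seen on at least min_units distinct units (recommendations only)."""
    units = {}
    for unit_id, signature in fleet_events:
        units.setdefault(signature, set()).add(unit_id)
    return sorted(sig for sig, seen in units.items() if len(seen) >= min_units)

def apply_policy_change(policy, signature, new_limit, approved_by=None):
    """Return an updated copy of policy only if a gate authority approved the change."""
    if approved_by is None:
        raise PermissionError("fleet recommendation requires gate approval")
    updated = dict(policy)
    updated[signature] = new_limit
    return updated

events = [("u1", "serdes_retrain_after_thermal"),
          ("u2", "serdes_retrain_after_thermal"),
          ("u3", "serdes_retrain_after_thermal"),
          ("u1", "droop_domain_b")]

print(recommend(events))
```

Here the learning path can only produce a list of candidates. Changing the policy is a separate call that fails unless an approver is named.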
Physical state and bounded action handshake
The firmware–hardware handshake is becoming a necessary part of advanced system realization.
As AI silicon, chiplets, HBM platforms, high-speed interconnects, and advanced packages become more dynamic, design-time closure alone cannot cover every runtime state. Hardware must sense. Firmware must respond. But the response must remain bounded by evidence maturity, causality, synchronization, rollback limits, and lifecycle governance.
So, the future system will not be defined only by better telemetry or more autonomous firmware; it will also be defined by a disciplined handshake between physical state and bounded action.
In SEGA-AI terms:
- Observability provides signals
- Admissibility qualifies evidence
- Bounded firmware action preserves convergence
- Fleet learning refines the next lifecycle decision
The system does not remain trustworthy because it can sense everything. It remains trustworthy when it knows which signals are mature enough to act on.
Dr. Moh Kolbehdari is senior director of IC/packaging at Socionext US.
Editor’s Note
This is Part 2 of the article series about a silicon governance framework for AI silicon. Part 1 described why data movement alone cannot explain system behavior in modern AI chip designs.
Related Content
- UVM Reactive agents verify with a handshake
- Development tool evolution – hardware/firmware
- Hardware Root of Trust Essential for AI Chip Integrity
- What you need to know about firmware security in chips
- Hardware Verification: What AI Gets Right When It Generates Your Testbench
The post The firmware-hardware handshake in a silicon governance system appeared first on EDN.
HIL platform automates tests to validate hardware behavior

A new hardware-in-the-loop (HIL) testing framework claims to make automated, hardware-validated testing accessible to every team by offering engineering resources previously available only at large enterprises. This new testing framework—called BootLoop Test—unifies bench, continuous integration (CI), and end-of-line validation on a single platform.
Though HIL testing is one of the most valuable practices in the hardware world, it’s mostly adopted without any rigorous testing infrastructure. That’s because building a hardened HIL framework requires dedicated test engineers, months of custom development, and specialized skills that most firmware teams don’t have.
Consequently, many companies either forgo testing entirely or rely on ad hoc scripts and manual validation processes. That, in turn, slows development cycles, lets errors slip through, and makes release processes fragile.
BootLoop, a startup that provides an AI platform for firmware and embedded development, addresses this problem by offering a complete HIL platform that spans the entire embedded product lifecycle. As a result, a hardware company can go from zero testing infrastructure to a fully automated pipeline in days.
“Most hardware companies know they need more rigorous firmware testing,” said Noah Pacik-Nelson, CEO of BootLoop. “They just don’t have the time or the tools. We built BootLoop Test, so they don’t have to choose between shipping quickly and shipping robust code.”

The HIL test platform helps teams to create a fully automated pipeline in days. Source: BootLoop
BootLoop’s agent ingests PCB design files and component datasheets to automatically generate tests that validate real hardware behavior down to the register level. The agent connects to serial monitors, debuggers, and test equipment to iterate until the code runs clean. So, test teams can go from zero testing infrastructure to a full CI pipeline on real hardware in hours by using a single-command install.
BootLoop—a Y Combinator company founded by SpaceX and MIT Media Lab engineers—covers the entire embedded development lifecycle, including development, testing, and debugging. The company was founded in 2025 and is based in San Francisco.
Related Content
- HIL simulation boosts automotive design efficiency
- Firmware Testing Still Falls Short of What We Need
- Hardware-in-the-loop or digital twin: Use one, both, or neither?
- Accelerate Automotive Dev Time: Fill Hardware-in-the-Loop Gaps
- Hardware-in-the-loop testing for electric vehicle drive applications
The post HIL platform automates tests to validate hardware behavior appeared first on EDN.
TP-Link’s Tapo P105: A Kasa EP10 clone, or evolutionarily derived?

Two devices. Same manufacturer. Similar cosmetics. (Near-)identical dimensions. Different branding. What about the insides?
After taking a month’s break from the TP-Link smart plug family teardown cadence, I’m back for more. This time, we’ll be looking inside the Tapo P105, one member of a four-pack, to be exact.

Back in early December, I’d noted that it was a “seeming Tapo equivalent to the Kasa EP10”, which I’d subsequently dissected for early March publication, and indeed there are many similarities between them:
- The Kasa EP10 has published dimensions of 2.36 × 1.50 × 1.21 in (60 × 38 × 33 mm), while those of the Tapo P105 are near-identical in imperial units and identical in metric: 2.4 × 1.5 × 1.3 in (60 × 38 × 33 mm)
- They both support switching load currents of up to 15 A
- And they both support Amazon (Alexa), Google (Assistant and Gemini) and Samsung (SmartThings) smart device protocols, in addition to company-proprietary schemes.
The last bit of that last bullet, however, is indicative of a minor-at-least deviation between them. The earlier device was the Kasa EP10; this one’s the Tapo P105. Once again requoting my early December piece, appropriately titled “Tapo or Kasa: Which TP-Link ecosystem best suits ya?”:
“Kasa” was TP-Link’s original smart home device brand, predominantly marketed and sold in North America. The company, for reasons that remain unclear to me and others, subsequently, in parallel, rolled out another product line branded as “Tapo” across the rest of the world. Even today, if you visit the “smart plugs” product page on TP-Link’s website, you’ll see a mix of Kasa- and Tapo-branded products. The same goes for wall switches, light bulbs, cameras, and other TP-Link smart home devices. And historically, you needed to have both mobile apps installed to fully control a mixed-brand setup in your home.
Fortunately, TP-Link has made some notable improvements of late, from which I’m reading between the lines and deducing that a full transition to Tapo is the ultimate intended outcome. As I tested and confirmed for myself just a couple of days ago, it’s now possible to manage both legacy Kasa and newer Tapo devices using the same Tapo app; they also leverage a common TP-Link user account…They all remain visible to Alexa, too, and there’s a separate Tapo skill that can also be set up…along with, as with Kasa, support for other services.
A perusal of the outside cosmetics also reveals some differences. The Kasa EP10’s status LED is integrated within the left-side-located multi-function on/off, pairing and reset switch:

whereas the Tapo P105’s status LED is in the top-left corner of the front panel, with the left-side switch now non-illuminated:

The illumination locational variance between the two devices presumably results in at least some internal-layout deviance between them, but what about the building-block components themselves? Reiterating what I’ve asked before in similar teardown comparison projects, how different (if at all) are these two product generations from a hardware standpoint, versus TP-Link relying solely on software-only differentiation schemes? Let’s find out.
I’ll start with a conceptual internal view to whet your appetite:

As mentioned previously, today’s patient was sourced from a four-pack that I’d acquired during a 2025 Thanksgiving-week Amazon Warehouse-now-Renewed promotion for $18.06 ($25.80 minus 30%). I’ll start with some outer box shots, as usual accompanied by a 0.75″ (19.1 mm) diameter U.S. penny for size comparison purposes.


The “US/1.26” bit in the upper right corner of the product label in the following photo, based on my past experiences with TP-Link gear, is suggestive of hardware v1.26 inside the box. I’ve mentioned before both the company’s tendency for hardware-iteration profusion and the inter-version compatibility problems that can result from it. That said, the Tapo P105 product page on TP-Link’s website lists only hardware versions v1 and v1.2 (but not v1.26) for both the one- and four-pack bundle variants. Dive into the product support page, on the other hand, and four to-date hardware versions are listed there (none of them v1.2, ironically):
- v1
- v1.26 (mine)
- v1.60, and
- v1.80
So…

Onward…



Time to dive inside…

The first things I found were a piece of protective foam, a slip of quick-start literature (PDF), and a small sheet of clear plastic.

What I subsequently realized was that the latter was normally folded in thirds and wrapped around two of the smart plugs. Its sibling was still in place, thereby tipping me off that (at least) one of the two lower devices in the box was removed (and presumably tried out) pre-return by the original purchaser.

I went with the one in the lower left corner as my dissection victim. Front:

Left side (and upside-down, I subsequently realized):

Back (note the screw head; hold that thought):

Right side (once again upside-down, too):

Top:

And last but not least, the most informative of the lot, the bottom (the penny’s temporarily taped in place from underneath, in case you were wondering):

There’s that US/1.26 notation again, along with the always useful FCC ID (2AXJ4P105):
Remember that screw head I noted earlier? Buh-bye:


I’ve taken apart a few of these devices’ cases by now, so I’ve figured out how to do so without maiming myself like I did the first time (and yes, I realize I’ve just jinxed myself by writing this):



Mission accomplished.

And now for the perspectives you all care about:
The switch, as noted before, is still on the left side:
but whereas with the Kasa EP10, it had been mounted to the same mini-PCB that contained the system SoC:
it’s now standalone, with the mini-PCB lodged in one corner, as already suggested by the earlier-shown conceptual teardown image and presumably to improve wireless connectivity:
The SoC itself is also evolved, from the Realtek RTL87210 to the same dual-core RTL8720 (PDF) found in the Kasa EP25, whose teardown was published in late March.
Note once again the presence of an antenna connector on the module, not used in this particular system implementation.
A relay merry-go-round
Once again on the right side is the blue-colored relay:
this time a Churod A16-V-105DA2F (PDF):
Top and bottom side perspectives follow, for your “edumacation” purposes:
And alas, as with its TP-Link-developed predecessors, I was unable to share with you any perspectives of the PCB backside, although as you might be able to tell from the glimpses in the following shots, there’s not much there to share anyway.
As usual, the FCC certification documentation provides additional visual insights.
And that’s “all” I’ve got for you today! Next up in the TP-Link smart plug dissection series, again as I initially alluded to back in December, I plan to tear down the Tapo P125, which builds on the Tapo P105 foundation with Apple HomeKit (now Apple Home) “smart” support. It’s akin to the earlier Kasa EP10-to-EP25 transition, albeit absent added energy monitoring features this time. Until then, and as always, I welcome your thoughts in the comments!
—Brian Dipert is the associate editor, as well as a contributing editor, at EDN.
Related Content
- Tapo or Kasa: Which TP-Link ecosystem best suits ya?
- TP-Link’s Kasa EP10: If at first it doesn’t connect, buy, buy again
- TP-Link’s Kasa HS103: A smart plug with solid network connectivity
- TP-Link’s Kasa EP25: Energy monitoring for a hoped-for utility bill nose-dive
The post TP-Link’s Tapo P105: A Kasa EP10 clone, or evolutionarily derived? appeared first on EDN.
The pulse of power: Mastering the PWM relay

Imagine a component that combines the heavy-duty muscle of a power relay with the surgical precision of a digital signal. That is the essence of a pulse width modulation (PWM) relay. While traditional switches are often strictly binary, the integration of pulse width modulation allows engineers to go beyond simple “on-off” control, enabling significant power savings and reduced heat signatures.
The “PWM relay” myth
While high-speed switching is often associated with the solid-state relay (SSR), the real magic happens when applying these pulses to a standard electromechanical relay (EMR). By modulating the “hold current” of an EMR coil, you can prevent overheating and drastically extend the life of your hardware. Whether you are managing automotive solenoids or optimizing industrial control panels, understanding the synergy between PWM and EMR is the key to transforming a basic mechanical switch into a sophisticated, energy-efficient power management tool.
However, if you head to an electronics distributor, looking for a “PWM relay,” you will likely hit a dead end. You cannot easily buy a dedicated PWM-enabled or PWM-driven EMR off the shelf because PWM is not a physical feature of the relay itself; it’s a control strategy applied by the external circuit.
To achieve this, you typically need a dedicated relay driver or a microcontroller to manage the signal. By sending a high-frequency pulse to a standard, inexpensive EMR, you effectively turn a “dumb” mechanical switch into a “smart” energy-saver. While an SSR is natively capable of high-speed switching for load modulation, using PWM with a traditional EMR is specifically about optimizing the coil’s efficiency, allowing you to reap the benefits of mechanical isolation without the drawback of a roasting-hot solenoid.
The “holding current” tweak
Nowadays, electromechanical relays are widely used across automation systems because they enable a low-power signal to control a high-power circuit. Yet, the conventional method of relay operation is relatively energy-intensive, often producing excess heat and demanding a sizeable power supply. In practice, energizing a relay requires more power than simply holding it in the active state.
This opens the door to efficiency gains: by applying pulse width modulation to the coil’s holding current, we can reduce the duty cycle and thereby lower the average current. The result is decreased power consumption, less heat generation, and improved thermal management—particularly valuable in applications that employ banks of relays.
As a quick design example, begin by driving the relay driver MOSFET at a 100% duty cycle, so that it is fully on and the full supply voltage is applied to the coil, for at least 100 ms. This initial energizing phase ensures that the relay pulls in reliably.
Once the relay is engaged, transition to PWM control with a reduced duty cycle—say 50%—to sustain the relay state while cutting power consumption. This approach maintains functionality while significantly lowering average current draw, reducing heat, and improving overall efficiency.

Figure 1 Basic schematic illustrates PWM control for lowering relay coil holding voltage. Source: Author
As an aside, while current is the physical mechanism at play, “holding voltage” is a very common industry term because engineers often think in terms of the voltage applied to the circuit.
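The pull-in and hold sequence described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a driver implementation: the function names, the 12 V supply, and the 360 Ω coil resistance are hypothetical, and the power estimate assumes a freewheel diode across the coil, a PWM frequency high enough that ripple is negligible, and no diode or MOSFET losses. Under those assumptions the average coil current scales with the duty ratio, so coil power scales with its square.

```python
def duty_at(t_ms, pull_in_ms=100.0, hold_duty=0.5):
    """Duty ratio at time t_ms after switch-on: 100% during pull-in, reduced afterward."""
    return 1.0 if t_ms < pull_in_ms else hold_duty


def coil_power_w(supply_v, coil_ohms, duty):
    """Approximate steady-state coil dissipation at a given duty ratio.

    Assumes average coil current = duty * supply_v / coil_ohms
    (freewheel diode fitted, ripple and switching losses ignored).
    """
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty must be between 0 and 1")
    avg_current_a = duty * supply_v / coil_ohms
    return avg_current_a ** 2 * coil_ohms


if __name__ == "__main__":
    # Hypothetical 12 V relay with a 360-ohm coil.
    for t_ms in (0, 50, 100, 500):
        duty = duty_at(t_ms)
        print(f"t={t_ms} ms duty={duty:.0%} power={coil_power_w(12.0, 360.0, duty):.3f} W")
```

For the hypothetical coil, the estimate is 0.4 W during pull-in and 0.1 W at a 50% holding duty cycle. A real design would confirm the minimum holding level against the relay datasheet rather than rely on this idealized model.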
Practical switching: EMRs and PWM
On the workbench, additional considerations arise when using PWM to drive EMRs.
In conventional relay designs, the nominal coil voltage must be continuously applied to keep the relay energized, which reduces overall energy efficiency. By contrast, PWM-driven relays can operate with reduced effective coil voltage, significantly lowering power consumption, an advantage in energy-conscious applications.
PWM drivers regulate the effective voltage by adjusting the duty cycle of a DC signal at a fixed frequency. A quick note: Duty cycle is usually given as a percentage, while duty ratio is the same concept expressed as a fraction. Relay coils, being inductive, respond to duty-cycle transitions with current fluctuations. The resulting ripple depends on coil inductance, suppression circuitry, PWM frequency, voltage level, and duty cycle.
Best practice is to begin with a 100% duty cycle until the relay pulls in and stabilizes. The required time varies with relay type and excess voltage but typically falls between 100 and 500 milliseconds. Afterward, the duty cycle can be reduced to maintain holding current.
Higher PWM frequencies reduce ripple, allowing lower effective coil voltages while keeping other parameters constant. Frequencies in the 20–100 kHz range are generally recommended. Since effective coil voltage equals the product of supply voltage and duty cycle, tight regulation is essential. Even small supply variations demand rapid duty-cycle adjustment—within a few milliseconds—to prevent the effective voltage from dropping below the relay’s minimum requirement.
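To see why a higher PWM frequency reduces ripple, a rough estimate helps. The sketch below uses the common first-order approximation for an inductive load with a freewheel diode, which holds when the PWM period is much shorter than the coil time constant L/R. The function name and the 1 H coil inductance are assumptions for illustration only; diode drop and suppression-circuit effects are ignored.

```python
def ripple_pp_a(supply_v, inductance_h, pwm_hz, duty):
    """Approximate peak-to-peak coil current ripple in amperes.

    First-order estimate: supply_v * duty * (1 - duty) / (inductance_h * pwm_hz).
    Valid only when the PWM period is much shorter than L/R.
    """
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty must be between 0 and 1")
    return supply_v * duty * (1.0 - duty) / (inductance_h * pwm_hz)


if __name__ == "__main__":
    # Hypothetical 12 V coil with 1 H inductance at a 50% duty ratio.
    for pwm_hz in (20e3, 100e3):
        ripple_ma = ripple_pp_a(12.0, 1.0, pwm_hz, 0.5) * 1e3
        print(f"{pwm_hz / 1e3:.0f} kHz: about {ripple_ma:.3f} mA peak-to-peak")
```

In this approximation the ripple is inversely proportional to frequency, so moving from 20 kHz to 100 kHz cuts it by a factor of five for the same coil and duty ratio.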
For reliable performance, coil current must always exceed the holding current plus a margin for shock and vibration. If current falls below this threshold, the armature may release, causing repeated pull-in cycles. Such instability can lead to humming noise, unintended contact opening under load, or even contact welding.
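The relationship between supply voltage, duty cycle, and holding margin can be sketched as a small calculation. This is an illustrative sketch, not a vendor-recommended method: the function name, the 10 mA holding current, the 30% margin, and the 360 Ω coil are hypothetical values, and the model again assumes average coil current equals duty ratio times supply voltage divided by coil resistance.

```python
def required_duty(hold_current_a, margin, coil_ohms, supply_v):
    """Smallest duty ratio keeping average coil current at hold current plus margin.

    margin is a fraction (0.3 means 30% above the holding current).
    Raises ValueError if the supply cannot reach the target even at 100% duty.
    """
    target_a = hold_current_a * (1.0 + margin)
    duty = target_a * coil_ohms / supply_v
    if duty > 1.0:
        raise ValueError("supply voltage too low to maintain holding current")
    return duty


if __name__ == "__main__":
    # Hypothetical relay: 10 mA holding current, 30% margin, 360-ohm coil.
    for supply_v in (12.0, 11.0, 10.0):
        duty = required_duty(0.010, 0.3, 360.0, supply_v)
        print(f"supply {supply_v:.1f} V -> duty {duty:.1%}")
```

As the supply sags, the required duty ratio rises, which is why a controller has to recompute it quickly. In a real driver this calculation would run inside the regulation loop, with the measured supply voltage as its input.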
Notably, an increasing range of EMRs now support PWM-regulated holding currents to improve thermal management and efficiency. By modulating the duty cycle once the armature is seated, these relays minimize steady-state power dissipation. The Omron G2RL-1A-E-PW1 exemplifies this trend, featuring a coil architecture optimized for PWM and reduced-voltage holding.

Figure 2 The G2RL-1A-E-PW1 relay utilizes PWM control to minimize coil power consumption and heat. Source: Omron
What is more, dedicated PWM current controllers like DRV110 and DRV120 are specifically engineered to optimize relay and solenoid operation through precise waveform regulation. These ICs rapidly ramp the current to a peak level to ensure the plunger or contactor fully seats.
Once actuation is confirmed, they transition to a significantly lower hold current, which maintains the magnetic field while drastically reducing power dissipation. By managing this peak-to-hold transition automatically, these controllers prevent excess heating and extend the operating life of the inductive load.

Figure 3 A prewired DRV120 module empowers makers and experimenters to slash relay power consumption by automatically transitioning from pull-in to hold current. Source: Tindie
Clever pulses never stop
Where does this leave us? Whether through basic RC mechanisms, dedicated integrated solutions, or the efficiency gains of PWM applied to electromechanical relays, engineers have a wide range of proven strategies to reduce relay energy consumption.
This is more significant nowadays in the era of EVs and e-mobility, where every watt saved translates into extended range and smarter system design. Yet beyond the established lies the experiment, where unproven methods await bold exploration.
Energy efficiency is not just about saving power; it’s about sparking possibilities, and the next breakthrough may come from your own trial and error. If you have worked with PWM-driven electromechanical relays or discovered alternative approaches, share your insights in the comments and help expand the collective knowledge base for engineers everywhere.
T. K. Hareendran is a self-taught electronics enthusiast with a strong passion for innovative circuit design and hands-on technology. He develops both experimental and practical electronic projects, documenting and sharing his work to support fellow tinkerers and learners. Beyond the workbench, he dedicates time to technical writing and hardware evaluations to contribute meaningfully to the maker community.
Related Content
- One Relay for multiple Supply Voltages
- Circuit makes PWM fan drive linear with temperature change
- Independent control of thyristor half-wave firing angles via PWM
- Relays Are Great, But There’s No Need to Let Them Waste Power
- Relay and solenoid driver circuit doubles supply voltage to conserve sustaining power
The post The pulse of power: Mastering the PWM relay appeared first on EDN.
How Precise Must We Be?

To how many significant digits does Pi (and its peers) remain relevant?
Some while ago, I downloaded a file of Pi calculated to one-hundred-thousand digits. A bit later, I downloaded a different file of Pi calculated to one million digits. I thought those were impressive, but just recently I read of a computer calculation of the value of Pi made to an insanely larger number of digits. I can’t find that article again but from memory, the calculation was run to two trillion digits.
The goal wasn’t to seek the value of Pi itself to that level of precision. It was a test of the computer, to see if it could run long enough to do that calculation without some kind of malfunction coming up. It was a test of the computer’s ability to run through very long computational processes without error. In that article, reference was made to NASA depending on the value of Pi to merely fifteen digits. This seeming disparity merited a look-see.
I looked up the definition of a parsec and found its numerical value in light years to a lot of significant digits, fourteen to be truthful. I then set up the geometry on which that number was based (Figure 1).

Figure 1 This graphic provides a visual definition of a parsec.
As the earth moves around the sun, a far-off object is observed for its apparent position in the sky. Because of parallax, there is an angular shift of that apparent position at earth’s two orbital extremes. Half of that angular shift is taken as an angle, which I call theta. Knowing theta and the radius of earth’s solar orbit, the distance to that object from the center of the sun may be calculated. The implicit assumptions are that the earth’s orbit is circular and that the sun is at the center of that circle, which we know is not exactly so, but we do that anyway.
When the value of theta is one arc second or one degree divided by 3600, the distance D is defined as one parsec. Table 1 derives (with some admitted finagling which I will describe shortly) the distance of one parsec in terms of light years.

Table 1 The calculation detailed here derives parsecs in terms of light years.
The finagling part here is twofold. First, I used a value of Pi to fifteen significant digits, thus mimicking NASA. Secondly, I set the radius of earth’s solar orbit to precisely that value which yields the published value of one parsec that I found online.
That orbital radius looks just about right, but just how precise these numbers really are eludes me. For example, do we really know the earth’s orbital radius to that many significant digits? Earth’s orbit is not really circular. It is slightly elliptic. What precise refinements were made to establish the published value of D to so many significant digits? I have no idea.
Colloquially, however, the value of one parsec is usually taken as 3.26 light years, which is good enough for general reading and good enough to satisfy my own curiosity. I’m perfectly happy with that fifteen-digit value of Pi.
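For readers who want to repeat the arithmetic, here is a minimal Python sketch of the parsec calculation. It assumes the right-triangle geometry of Figure 1, so the distance is the orbital radius divided by the tangent of theta. The orbital radius used here is the IAU astronomical unit and the light year is the Julian-year value; both are assumptions on my part and are not necessarily the numbers used in Table 1. Python's `math.pi` is a double-precision value, good to roughly fifteen to sixteen significant digits.

```python
import math

AU_M = 149_597_870_700                # IAU astronomical unit in metres (assumed orbital radius)
LIGHT_YEAR_M = 9_460_730_472_580_800  # Julian year times the speed of light, in metres


def parsec_in_light_years(orbit_radius_m=AU_M):
    """Distance at which the orbital radius subtends one arc second, in light years."""
    theta_rad = math.radians(1.0 / 3600.0)  # one arc second
    distance_m = orbit_radius_m / math.tan(theta_rad)
    return distance_m / LIGHT_YEAR_M


if __name__ == "__main__":
    print(f"1 parsec is about {parsec_in_light_years():.4f} light years")
```

The result rounds to the colloquial 3.26 light years. How many of the later digits are meaningful depends on how well the orbital radius is known, not on the precision of Pi.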
John Dunn is an electronics consultant and a graduate of The Polytechnic Institute of Brooklyn (BSEE) and of New York University (MSEE).
The post How Precise Must We Be? appeared first on EDN.
From AI silicon observability to governed evidence
Artificial intelligence (AI) silicon is increasingly defined not only by compute capability, but by how data moves through the system. Modern AI SoCs, edge AI processors, automotive compute platforms, and AI accelerators depend on large volumes of data moving among compute engines, memory systems, sensor interfaces, accelerators, chiplet interfaces, firmware controllers, and I/O.
This is why network-on-chip (NoC) architectures have become essential. An NoC provides the internal communication fabric that helps organize routing, arbitration, bandwidth allocation, quality of service, congestion management, and latency behavior inside complex AI silicon.
But it’s important to make a clear distinction.
An NoC is part of the chip execution architecture. It’s not the same as the external signaling interfaces that bring data into or out of the chip.
External signals may arrive through MIPI, SerDes, PCIe, CXL, UCIe, LPDDR, HBM, Ethernet, CAN, or other physical and protocol interfaces. Those interfaces use PHYs, controllers, and protocol layers to move signals into a form the SoC can process internally. Once inside the chip, the NoC routes transactions among internal blocks such as CPUs, NPUs, GPUs, DSPs, memory controllers, sensor-processing blocks, safety islands, and I/O controllers.
In other words, external interfaces move signals into and out of the silicon. The NoC organizes internal data movement inside the silicon. This distinction matters because data movement is not the same as evidence governance.
NoC is not the governance layer
An NoC can move data efficiently, but it does not determine whether a later system symptom was caused by NoC behavior, timing weakness, placement and routing (P&R), power delivery, package behavior, firmware scheduling, workload bursts, or thermal conditions.
For example, a system may observe:
- Accelerator stalls
- Latency spikes
- Traffic congestion
- Power bursts
- Voltage droop
- Timing-margin loss
- Thermal hotspots
- Memory-access delays
- Chiplet-interface errors
- Workload-dependent failures
These symptoms may involve NoC activity, but NoC activity alone does not prove NoC causality.
A thermal hotspot may correlate with NoC traffic, but the root cause could also be local transistor density, P&R, clocking behavior, package thermal resistance, power-delivery weakness, firmware scheduling, workload concentration, sensor placement, board conditions, or cooling limitations.
A latency spike may appear in an NoC counter, but the underlying contributor could be memory-controller contention, cache behavior, firmware policy, workload burstiness, arbitration settings, clock-domain crossing, timing margin, or external I/O behavior.
This is the central point: NoC may be one possible contributor to observed AI silicon behavior, but it should not be assumed to be the source of the problem without admissible evidence.
Where SEGA-AI fits
SEGA-AI does not replace NoC architecture, RTL design, physical implementation, timing closure, P&R, verification, or post-silicon debug. Its role is different.
SEGA-AI defines how NoC-related observability, telemetry, counters, workload traces, firmware logs, power data, thermal data, package evidence, and system behavior are qualified before any root-cause conclusion or lifecycle-governance decision is made.
The contribution is not that SEGA-AI sees a problem and knows the cause. The contribution is that SEGA-AI governs the evidence path required before the system is allowed to assign cause, trigger corrective action, refine assumptions, or update lifecycle policy.
This distinction is essential for complex AI silicon because many physical, architectural, and operational mechanisms can produce similar symptoms.
- A detected hotspot is a symptom
- A detected latency spike is a symptom
- A voltage droop event is a symptom
- An accelerator stall is a symptom
SEGA-AI asks whether the evidence behind that symptom is mature enough, synchronized enough, causally valid enough, and admissible enough to support a decision.
From symptom to evidence through CEMH
Consider a realized AI SoC where telemetry reports a localized hotspot during a high-throughput workload. At level 1, with raw data, the system has only a thermal sensor observation: a localized temperature rise was detected. This observation is useful, but it’s not yet decision-ready evidence.
At level 2, with interoperable data, the temperature reading can move into a diagnostic environment, firmware log, validation database, or fleet-monitoring system. But movement does not create authority. The hotspot may be visible and accessible, but its cause is still unknown.
At level 3, with normalized evidence, the observation is linked to the context required for interpretation:
- Workload type
- Timestamp and runtime epoch
- Firmware policy state
- NoC traffic counters
- Accelerator utilization
- Memory-controller activity
- Voltage droop measurements
- Clock and power state
- Floorplan region
- Thermal sensor location
- Package thermal path
- Board and cooling condition
- Package lot and assembly history
- Validation correlation status
Only at this stage can the event begin to be compared across domains.
At level 4, with admissible evidence, the evidence must pass the Trusted Convergence Governance (TCG) gate. The system must confirm provenance, synchronization, realization-state validity, causal relevance, measurement confidence, and chain-of-custody integrity before the hotspot data can influence a convergence decision.
At level 5, with convergence-authoritative evidence, the system has enough qualified evidence to support bounded action or lifecycle refinement. That action may be a firmware policy adjustment, workload throttling, degraded mode, validation update, package constraint refinement, or future design-rule feedback.
- The hotspot may be related to NoC congestion.
- It may be related to accelerator placement.
- It may be related to P&R density.
- It may be related to package thermal resistance.
- It may be related to voltage droop and increased local switching.
- It may be related to firmware scheduling or workload concentration.
The purpose of SEGA-AI is to prevent premature conclusions.
- A thermal sensor does not prove NoC causality.
- An NoC counter does not prove package causality.
- A voltage droop event does not prove timing causality.
SEGA-AI requires that the evidence mature through Convergence Evidence Maturity Hierarchy (CEMH) and pass TCG admissibility before any root-cause conclusion or lifecycle-governance action receives authority.
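The five maturity levels and the TCG gate described above can be sketched as a small data model. This is only an illustration of the idea, not SEGA-AI's actual implementation: every class, field, and function name is hypothetical, the required-context set is abbreviated from the level 3 list, and the `sufficient_for_decision` flag is an assumed stand-in for whatever criterion separates level 4 from level 5.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class CemhLevel(IntEnum):
    RAW = 1
    INTEROPERABLE = 2
    NORMALIZED = 3
    ADMISSIBLE = 4
    CONVERGENCE_AUTHORITATIVE = 5


# Abbreviated, hypothetical subset of the level 3 context fields.
REQUIRED_CONTEXT = {"workload_type", "timestamp", "firmware_policy_state",
                    "noc_traffic_counters", "thermal_sensor_location"}

# TCG gate checks named in the text for level 4.
TCG_CHECKS = ("provenance", "synchronization", "realization_state_validity",
              "causal_relevance", "measurement_confidence", "chain_of_custody")


@dataclass
class Observation:
    value: float
    exported: bool = False                 # moved into a log, database, or fleet system
    context: dict = field(default_factory=dict)
    tcg_results: dict = field(default_factory=dict)
    sufficient_for_decision: bool = False  # assumed level 5 criterion


def maturity(obs):
    """Return the highest CEMH level the observation has reached."""
    if not obs.exported:
        return CemhLevel.RAW
    if not REQUIRED_CONTEXT.issubset(obs.context):
        return CemhLevel.INTEROPERABLE
    if not all(obs.tcg_results.get(check, False) for check in TCG_CHECKS):
        return CemhLevel.NORMALIZED
    if not obs.sufficient_for_decision:
        return CemhLevel.ADMISSIBLE
    return CemhLevel.CONVERGENCE_AUTHORITATIVE


def may_act(obs):
    """Bounded action is allowed only on convergence-authoritative evidence."""
    return maturity(obs) == CemhLevel.CONVERGENCE_AUTHORITATIVE
```

In this sketch a hotspot reading with no context stays at level 1 or 2 no matter how alarming its value is, which mirrors the point that visibility does not create authority.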
The role of CEMH, TCG, and GFL
Within the SEGA-AI framework, three layers are especially relevant.
Convergence Evidence Maturity Hierarchy (CEMH) defines how information matures from raw observation into convergence-authoritative evidence. A thermal sensor value, NoC counter, voltage monitor, or firmware trace begins as raw or interoperable data. It does not become decision-ready evidence until it has been contextualized, synchronized, qualified, and connected to the correct realization state.
Trusted Convergence Governance (TCG) acts as the trust gate. It asks whether evidence preserves provenance, synchronization validity, realization-state consistency, causal relevance, and bounded authority before it influences a decision.
Governance for Lifecycle (GFL) asks whether the realized system can remain converged throughout operational life. It’s concerned not only with whether the chip worked at initial signoff, but whether chip, package, board, firmware, workload, and field behavior remain aligned over time.
Together, these layers prevent a common failure mode: mistaking observable behavior for proven causality.
Diagnostic evidence plan
This also changes how AI silicon should be planned before implementation. Here, SEGA-AI can contribute by helping define the diagnostic evidence plan.
- Which NoC counters are needed?
- Which congestion metrics should be exposed?
- Which workload tags must be preserved?
- Which timestamps and synchronization epochs are required?
- Which voltage, thermal, clock, and power monitors are needed?
- Which firmware traces must be connected to physical state?
- Which package and board conditions must be tracked?
- Which evidence fields are required to distinguish NoC behavior from timing, P&R, PDN, thermal, firmware, or package causes?
This does not mean SEGA-AI designs the NoC. It means SEGA-AI asks what evidence must exist later so that realized-system behavior can be interpreted correctly. That is the bridge between design intent and lifecycle governance.
Why data movement alone isn’t enough
NoC architectures are essential because AI silicon needs scalable internal communication. But moving data correctly inside the chip does not automatically explain system behavior after realization. An NoC may deliver a packet correctly while the system still experiences thermal drift. Likewise, a controller may report a valid transaction while the package creates a local thermal bottleneck.
Similarly, a firmware trace may show a workload transition while the underlying voltage margin is collapsing. Or a sensor may report a hotspot while the causal chain remains ambiguous. This is why observability must become governed evidence before it can support lifecycle decisions.
The key question is not only: Did the data move? The real question is: Is the observed behavior mature enough as evidence to support diagnosis, intervention, or lifecycle refinement? This distinction becomes especially important in edge AI and ADAS systems.
In an ADAS platform, camera, radar, lidar, IMU, wheel-speed, steering, and vehicle-state data enter through physical interfaces and controllers. Inside the AI SoC, the NoC routes internal traffic among image processors, AI accelerators, CPUs, memory controllers, safety islands, and I/O blocks.
The AI accelerator may detect pedestrians, lanes, vehicles, or collision risk. But if a late response, thermal event, inference delay, or braking-decision uncertainty is observed, the system should not automatically blame the NoC, the AI model, the memory controller, or the package. It must first build an admissible evidence chain.
This matters because ADAS is not only a performance application; it’s a safety-critical realization environment.
A latency spike or inference delay may affect warning time, braking distance, steering support, or driver handoff. In that context, clean data movement is not enough. The system must know whether the evidence supporting the decision is synchronized, causally valid, realization-consistent, and authoritative enough for action.
For low-risk edge AI applications, a wrong output may create inconvenience or cost. For ADAS, a wrong output may affect human safety. That changes the required evidence maturity.
A safety-critical output should not receive full action authority simply because data moved correctly through the chip. It should be supported by level 5 convergence-authoritative evidence or by a pre-qualified safety envelope that has already been validated through admissible evidence.
In SEGA-AI terms, the chain is:
Input evidence → local inference → confidence and uncertainty → synchronization check → causality check → TCG admissibility gate → bounded output authority
This is why edge AI and ADAS show the difference between data movement and evidence governance. The NoC may help move sensor data, model data, and inference results; but SEGA-AI governs whether the observed behavior is trustworthy enough to support diagnosis, intervention, degraded mode, fleet learning, or safety-critical action.
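The chain from input evidence to bounded output authority can be sketched as a small decision function. This is an illustration under stated assumptions, not a safety design: the `Authority` levels, the `Inference` fields, and both thresholds are hypothetical placeholders, and a real ADAS system would derive them from its own safety analysis.

```python
from dataclasses import dataclass
from enum import Enum


class Authority(Enum):
    """Hypothetical output-authority levels."""
    NONE = "none"          # record the observation, take no action
    ADVISORY = "advisory"  # may warn or enter a degraded mode, may not actuate
    FULL = "full"          # may support a safety-critical action


@dataclass
class Inference:
    """Hypothetical result of local inference plus its supporting evidence."""
    label: str
    confidence: float            # 0.0 to 1.0
    sync_skew_us: float          # worst-case skew between contributing sensors
    causal_chain_verified: bool  # outcome of the causality check


def bounded_authority(inf: Inference,
                      max_skew_us: float = 500.0,    # assumed threshold
                      min_confidence: float = 0.9    # assumed threshold
                      ) -> Authority:
    # Synchronization check.
    synchronized = inf.sync_skew_us <= max_skew_us
    # Admissibility gate: unsynchronized or causally unverified evidence
    # receives no authority, however confident the model is.
    if not synchronized or not inf.causal_chain_verified:
        return Authority.NONE
    # Confidence and uncertainty bound the authority of admitted evidence.
    if inf.confidence < min_confidence:
        return Authority.ADVISORY
    return Authority.FULL


print(bounded_authority(Inference("pedestrian", 0.97, 120.0, True)))  # Authority.FULL
print(bounded_authority(Inference("pedestrian", 0.97, 900.0, True)))  # Authority.NONE
```

The ordering is the point of the sketch: high model confidence cannot compensate for evidence that fails the synchronization or causality checks.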
From execution fabric to governance framework
The NoC is an execution fabric; SEGA-AI is a governance framework. The NoC helps the chip move data; SEGA-AI helps the system determine whether observed behavior can be trusted as evidence. These roles are complementary.
As AI silicon becomes more complex, the industry will need both: data-movement architecture to move information efficiently inside the chip, and evidence-governance architecture to determine whether observed behavior can support root-cause analysis, corrective action, lifecycle refinement, or fleet learning.
This becomes increasingly important as systems move from design into package, board, validation, deployment, runtime adaptation, and field operation. This discussion is not only theoretical. If realized AI systems require governed evidence, then implementation must account for evidence maturity from the beginning.
That means the design and validation plan must define not only what data moves, but what data must later be observable, timestamped, correlated, and qualified. For example, if post-silicon validation or field operation needs to distinguish NoC congestion from P&R density, package thermal resistance, memory-controller contention, or firmware scheduling, then the required evidence must be designed into the system earlier.
This includes counters, monitors, timestamping, workload tags, synchronization epochs, sensor placement, firmware traceability, package-state linkage, and validation correlation methods. In SEGA-AI terms, the theoretical model becomes practical only when it’s translated into implementation artifacts: evidence fields, admissibility checks, traceability rules, synchronization requirements, gate criteria, diagnostic workflows, and lifecycle feedback paths.
This is why the next step after governance theory is implementation specification. A system cannot govern evidence it never planned to observe.
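A design-time check along these lines could compare the evidence a project plans to expose against the evidence needed to tell candidate causes apart. The mapping below is a hypothetical example: the cause names come from the text, but which fields each cause requires is an assumption made only for illustration.

```python
# Hypothetical mapping from candidate cause to the evidence fields needed
# to confirm or rule it out. The field assignments are illustrative only.
REQUIRED_EVIDENCE = {
    "noc_congestion": {"noc_counters", "workload_tags", "timestamps"},
    "pnr_density": {"thermal_monitors", "sensor_placement"},
    "package_thermal_resistance": {"thermal_monitors", "package_state", "timestamps"},
    "memory_controller_contention": {"memory_controller_counters", "workload_tags", "timestamps"},
    "firmware_scheduling": {"firmware_trace", "sync_epochs", "timestamps"},
}


def evidence_gaps(planned):
    """Map each cause that cannot yet be evaluated to its missing fields."""
    gaps = {}
    for cause, needed in REQUIRED_EVIDENCE.items():
        missing = sorted(needed - planned)
        if missing:
            gaps[cause] = missing
    return gaps


# Fields a hypothetical design currently plans to expose.
planned = {"noc_counters", "timestamps", "workload_tags", "thermal_monitors"}
for cause, missing in evidence_gaps(planned).items():
    print(f"{cause}: missing {missing}")
```

With this plan, only NoC congestion is fully covered. A symptom observed after silicon could therefore be attributed to the NoC simply because it is the only cause the system can observe, which is the failure mode governed evidence is meant to prevent.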
Silicon governance complementing NoC
AI silicon performance depends heavily on data movement. NoC architectures are essential because they organize internal communication among compute, memory, accelerators, controllers, chiplet interfaces, and I/O. But NoC observability is not the same as causality.
A latency spike, hotspot, voltage droop, or accelerator stall may involve NoC behavior, but it may also be driven by timing, P&R, power delivery, package thermal paths, firmware policy, workload behavior, or system-level conditions.
The role of SEGA-AI is not to replace NoC design. It is to govern the evidence required before symptoms become conclusions and before conclusions become decisions.
For AI silicon, the next challenge is therefore not only moving data efficiently. It’s qualifying observed behavior into admissible, causally grounded, convergence-authoritative evidence. In short, interoperability moves data; admissibility qualifies evidence; and governed convergence closes decisions.
Dr. Moh Kolbehdari is senior director of IC/packaging at Socionext US.
The post From AI silicon observability to governed evidence appeared first on EDN.






















