EDN Network

Voice of the Engineer

Memory card interfaces keep pace with the internal bus evolution race: Part 1

8 hours 36 min ago

Clock speeds get faster. Per-cycle (and per-clock edge) address and data dollops get larger. And protocols get more efficient. But here we’re talking about external, not internal, buses.

Back in 2023, I devoted two blog posts’ worth of content to comparing various memory card technologies, products and speed bin options, initially in March (identifying a fake card in the process) and more in-depth in July. Since then, I’ve come across numerous examples of both evolutionary and revolutionary successors to the devices discussed in that two-part series, not to mention those covered in even more distant-past writeups (themed, for example, around the cameras, digital audio recorders and other devices that leverage such storage).

I’ve had this follow-up piece in my to-do list for a while now, and I’ve finally decided to actualize my longstanding aspiration before the dust pile accumulating on this specific list entry gets any deeper. Not every technology to be discussed in the paragraphs to follow will likely achieve high-volume market success, mind you, with any sooner-or-later failures not necessarily the result of implementation shortcomings, either. Note, for example, that today’s (and past) industry supply constraints encourage manufacturers to “double down” on maximizing the output and profitability of existing approaches, versus devoting scarce capacity to dubious bets.

That said, win or lose, there’s usually an interesting story behind each approach. Without further ado…and with the upfront qualifier that I’ll be intentionally delaying any discussion of USB-interface memory devices until later, since their connector locations compel them to be fully external to the system, either sticking straight out of it or cable-tethered to it…and that for related reasons, I won’t be covering eMMC and other fully internal formats, either…and lastly, that I’ll be skipping over legacy formats that were proprietary and/or otherwise non-impactful…let’s dive in.

Historical precedents

A short writeup, “History Repeating” at Virginia Tech’s website, begins as follows:

Variations on the repeating-history theme appear alongside debates about attribution. Irish statesman Edmund Burke is often misquoted as having said, “Those who don’t know history are destined to repeat it.” Spanish philosopher George Santayana is credited with the aphorism, “Those who cannot remember the past are condemned to repeat it,” while British statesman Winston Churchill wrote, “Those that fail to learn from history are doomed to repeat it.”

Long-time readers may recall that I’ve referenced variants of this same quote theme in several past writeups, consistently with a negative connotation: the downsides of ignorance of the past. That said, excessive dependence on history lessons can also be problematic, resulting in overly constraining evolutionary baby steps that suppress alternative, more revolutionary strides, which may lead to failure but may also leap dramatically beyond traditional approaches.

I’ll leave you to decide for yourselves what to conclude from this first case study, which is admittedly too personal for me to be completely arm’s-length about! Embedded within the tuple (card identifier) data structures reported by Intel’s Series 2 flash memory cards were the initials of the small team of developers, myself among them, who designed their ASIC (30 years ago…yikes!). I subsequently led the technical marketing launch of the 28F008SA 8 Mbit flash memories inside those same cards, followed by the definition, development and introduction of 16 and 32 Mbit component successors and cards based on them, all in the early-to-mid-1990s.

Products such as these, representing the industry’s first removable and high-capacity (for the era, at least) memory cards, added these tuple structures and other enhancements in order to deliver full Personal Computer Memory Card International Association (PCMCIA, later known as PC Card) compatibility, in contrast to their Series 1 precursors, which were simpler multi-chip arrays accompanied by address decode and chip select logic. Intel’s and others’ similar products were specifically referred to as linear flash memory PC Cards, both to differentiate them from other PCMCIA card types (modems, ISDN and SCSI adapters, for example, with the form factor living on, at least to a degree, in CableCARDs) and from alternative ATA-interface flash memory cards.
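The tuple structures mentioned above make up the PC Card Card Information Structure (CIS): a chain of records, each beginning with a one-byte tuple code and a one-byte link (the body length), with the chain terminated by code 0xFF. The Python sketch below walks such a chain. It assumes the CIS has already been read out of the card’s attribute memory (where, as far as we recall, only even-addressed bytes carry data) and de-interleaved into a contiguous byte string. The handful of tuple codes reflect our reading of the PC Card standard, and the sample card image, including its “ACME” vendor string, is entirely made up.

```python
# Minimal sketch of walking a PC Card CIS tuple chain (assumed layout:
# [code][link][link bytes of body] ..., terminated by CISTPL_END).
CISTPL_NULL = 0x00    # one-byte filler tuple, no link byte
CISTPL_DEVICE = 0x01  # device information tuple
CISTPL_VERS_1 = 0x15  # version/product information strings
CISTPL_END = 0xFF     # end of tuple chain


def walk_tuples(cis: bytes) -> list[tuple[int, bytes]]:
    """Return (tuple_code, body) pairs in chain order."""
    tuples = []
    i = 0
    while i < len(cis):
        code = cis[i]
        if code == CISTPL_END:
            break
        if code == CISTPL_NULL:
            i += 1  # null tuples occupy a single byte
            continue
        if i + 1 >= len(cis):
            raise ValueError("truncated tuple header")
        link = cis[i + 1]
        body = cis[i + 2:i + 2 + link]
        if len(body) != link:
            raise ValueError("truncated tuple body")
        tuples.append((code, body))
        i += 2 + link  # skip code byte, link byte and body
    return tuples


def vers1_strings(body: bytes) -> list[str]:
    """Decode a CISTPL_VERS_1 body: 2 version bytes, NUL-separated strings, 0xFF end."""
    text = body[2:].split(b"\xff", 1)[0]
    return [s.decode("ascii") for s in text.split(b"\x00") if s]


# Hypothetical card image: a device tuple with placeholder bytes, a null
# filler byte, a VERS_1 tuple, and the end marker.
vers1 = bytes([0x04, 0x01]) + b"ACME\x00FLASH CARD\x00" + b"\xff"
image = (bytes([CISTPL_DEVICE, 0x03, 0x53, 0x06, 0xFF])
         + bytes([CISTPL_NULL])
         + bytes([CISTPL_VERS_1, len(vers1)]) + vers1
         + bytes([CISTPL_END]))

for code, body in walk_tuples(image):
    if code == CISTPL_VERS_1:
        print(vers1_strings(body))  # ['ACME', 'FLASH CARD']
```

A production parser would also have to follow the link tuples that can continue a chain in another memory space, which this sketch ignores.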

The key difference between the two memory card types centered on where the flash media management intelligence was located: in the card itself for ATA flash PC Cards, thereby presenting a standardized hardware and software interface to the system regardless of what (and whose) media was inside, versus in the system, implemented as software and/or dedicated hardware, for the linear flash PC Card approach. Proponents of the latter scheme touted its claimed lower media bill-of-materials cost, not to mention the potential to execute code directly out of it (with the card acting as one big parallel-interface chip), but it was inherently relevant only to NOR (vs. NAND) memory suppliers, along with being a “heavier lift” for system developers. For these and other reasons, the ATA approach eventually won out in the marketplace.
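To make the “where does the media management live” distinction concrete, here is a deliberately tiny, hypothetical Python model of the bookkeeping that a linear flash card pushed onto host software (and that an ATA card’s onboard controller hides). It captures only two ideas. First, flash can’t be rewritten in place, so updated data goes to an already-erased block and the logical-to-physical map is updated. Second, the stale block is erased and returned to a free pool, with new writes steered to the least-erased block as a crude form of wear leveling. Real flash translation layers and flash file systems also handle sub-block writes, garbage collection, bad blocks and power-loss recovery, none of which appear here.

```python
from typing import Optional


class ToyFlashManager:
    """Hypothetical host-side block remapper for a raw (controller-less) flash array."""

    def __init__(self, logical_blocks: int, spare_blocks: int):
        if spare_blocks < 1:
            raise ValueError("need at least one spare block for out-of-place writes")
        total = logical_blocks + spare_blocks
        self.logical_blocks = logical_blocks
        self.blocks: list[Optional[bytes]] = [None] * total  # None means erased
        self.erase_counts = [0] * total
        self.free = list(range(total))  # erased, unmapped physical blocks
        self.l2p: dict[int, int] = {}   # logical block -> physical block

    def write(self, lba: int, data: bytes) -> None:
        if not 0 <= lba < self.logical_blocks:
            raise IndexError("logical block out of range")
        # Program a fresh erased block, preferring the least-worn one.
        target = min(self.free, key=lambda b: self.erase_counts[b])
        self.free.remove(target)
        self.blocks[target] = data
        old = self.l2p.get(lba)
        self.l2p[lba] = target
        if old is not None:
            self._erase(old)  # reclaim the stale copy

    def read(self, lba: int) -> Optional[bytes]:
        phys = self.l2p.get(lba)
        return None if phys is None else self.blocks[phys]

    def _erase(self, phys: int) -> None:
        self.blocks[phys] = None
        self.erase_counts[phys] += 1
        self.free.append(phys)


m = ToyFlashManager(logical_blocks=4, spare_blocks=1)
m.write(0, b"v1")
m.write(0, b"v2")  # the rewrite lands in a different physical block
print(m.read(0), m.l2p[0], m.erase_counts)  # b'v2' 1 [1, 0, 0, 0, 0]
```

The single spare block is what makes out-of-place writes possible at all; with no spare, there would be nowhere to put new data before erasing the old copy.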

Miniaturization

That said, Intel and several of its NOR flash memory partner/competitors had also taken a stab at miniaturizing the linear flash PC Card with the creatively named (ha!) Miniature Card format.

Other flash memory suppliers countered with the ultimately much more popular CompactFlash card, now maintained by the aptly named CompactFlash Association (CFA). Its hardware interface was similarly PCMCIA-derived, albeit instead focused (as with the ATA flash PC Card precursor) on the IDE/ATA (and later, UDMA) command set.

Amid this “where is the media management intelligence best located” debate, two other notable contending approaches of the same timeframe also bear mentioning. The first, SmartMedia, was championed by Toshiba (as well as, later, by its primary competitor, Samsung).

SmartMedia was essentially a single NAND flash memory die (although a few variants embedded multiple die) encapsulated within a thin plastic membrane, plus a multi-contact metallic interface wire-bonded directly to the die, with no intervening media controller intelligence.

Conceptually, that sounds like linear flash PC Cards and their derivatives, doesn’t it? Yes…and no. For one thing, SmartMedia was much smaller than either Miniature Card or CompactFlash. For another, it was based on NAND flash memory, which was more HDD-like than NOR in its core attributes (notably erase block size and speed), simplifying system-side media management development. And then there was the fact that Toshiba wasn’t just a semiconductor supplier; its various systems divisions were potential SmartMedia implementers, and the company also did a good job of cultivating business from other Japanese and broader Asian systems manufacturers.

Finally, near the end of the last century (in 1997, to be exact), SanDisk and systems partners Siemens and Nokia unveiled the MultiMediaCard (MMC), which ultimately came in multiple dimension options, as well as in both standard and clock-boosted performance variants.

MMC is best known today in its aforementioned non-removable eMMC form, which itself is slowly being supplanted by the embedded variant of the MIPI- and SCSI-based Universal Flash Storage (UFS) standard (whose own removable-card version, ironically, has been underwhelmingly adopted by the industry). Today’s generational successor to MMC is the Secure Digital (SD) card, originally referred to as SecureMMC, which built on the MMC foundation with “enhancements including a digital rights management (DRM) feature, a more durable physical casing, and a mechanical write-protect switch.” The SD standard’s successive iterations have expanded the available clock speed, protocol and electrical contact count options in a backwards-compatible fashion to keep pace with flash memory performance gains, as exemplified by high-end V90 cards such as OWC’s.

The microSD card derivative delivered substantial further size reductions, with notable market success.

One interesting newer SD (and microSD) card specification variation that I became aware of recently, when shopping for storage media for a couple of new Raspberry Pi boards, is the Application Performance Class. Quoting from Kingston Technology documentation:

A new classification has been presented with the introduction of Android’s Adopted Storage Device feature. The App Performance Class assures minimum random and sequential performance speeds to meet both run and store execution time requirements under given conditions. It does this simultaneously while providing storage for pictures, videos, music, files and other important data. Basically, they’re ideal for use in smartphones and mobile gaming devices that run applications at random read and write speeds while also being used for storage.

 There are two ratings for the App Performance Class which are known as A1 and A2. A1 has a minimum random read of 1500 IOPS and a minimum random write of 500 IOPS while A2 has a minimum random read of 4000 IOPS and a minimum random write of 2000 IOPS. Both A1 and A2 have a minimum sustained write speed of 10MB/s. The App Performance Class is something to consider [editor: for example] when planning on installing Android apps on a microSD card.
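The A1/A2 minimums quoted above reduce to a simple threshold check. This Python sketch applies only those three numbers to a hypothetical set of measured results. The full SD Association definition attaches further conditions (A2, for example, also depends on the card’s command-queuing and cache features being used), so treat it as an illustration of the thresholds rather than a compliance test.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CardResult:
    random_read_iops: float   # e.g. from a 4 KB random-read benchmark (assumed)
    random_write_iops: float
    sustained_write_mb_s: float


# (class, min random read IOPS, min random write IOPS, min sustained write MB/s),
# using the figures quoted from Kingston above; strictest class first.
APP_CLASSES = [
    ("A2", 4000, 2000, 10),
    ("A1", 1500, 500, 10),
]


def app_performance_class(r: CardResult) -> Optional[str]:
    """Return the highest App Performance Class whose minimums r meets, else None."""
    for name, min_rd, min_wr, min_seq in APP_CLASSES:
        if (r.random_read_iops >= min_rd
                and r.random_write_iops >= min_wr
                and r.sustained_write_mb_s >= min_seq):
            return name
    return None


print(app_performance_class(CardResult(2500, 900, 40)))  # A1
```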

And, by the way, unlike the SmartMedia competitor of the day, both MMC and successor SD Cards notably also embed (despite their smaller sizes) media management intelligence that simplifies and standardizes the system implementation. Moore’s Law strikes again, eh?

Hang tight; I’ll be right back

Believe it or not, I originally envisioned this being, and wrote it as, a single unified blog post. However, as I thought of more (and more…and more…) things to include, the word count grew (and grew…and grew…), transforming it into something resembling a small book (I exaggerate, but you get my drift). Having passed 1,500 words at the beginning of this paragraph, I’m instead going to pause for now, intending (God willing) to share the other half of this now-two-part series with you next week. Until then, please share your thoughts on what I’ve covered so far in the comments!

Brian Dipert is the associate editor, as well as a contributing editor, at EDN.


The post Memory card interfaces keep pace with the internal bus evolution race: Part 1 appeared first on EDN.

The hidden bottleneck in LLM inference and the impact on MLPerf benchmarking

14 hours 22 min ago

Recent frontier LLM inference benchmarks have highlighted a recurring pattern. GPU-based systems deliver outstanding throughput when latency is not a concern, but their performance drops sharply once real-time response requirements are imposed.

This behavior is sometimes attributed to software inefficiencies or suboptimal system tuning. In reality, the root cause lies much deeper. It reflects a fundamental mismatch between how GPUs are architected and how autoregressive inference works.

LLM inference: Prefill versus generation

To understand this limitation, it is useful to examine the two distinct phases of LLM inference: prefill and generation.

During the prefill phase, the model processes the entire input prompt in one pass. The prompt is tokenized, embedded, and propagated through every layer of the transformer network. At each layer, the model computes the attention relationships among all tokens and builds the key-value (KV) cache, which stores the intermediate data needed for subsequent token generation.

This stage maps extremely well onto GPU hardware. GPUs were designed to execute thousands of identical operations in parallel. In the prefill phase, the model performs massive matrix multiplications over large tensors, exactly the type of workload at which GPUs excel. When all tokens are available upfront, the calculations can be distributed across tens of thousands of cores, resulting in very high arithmetic utilization.

The generation phase is fundamentally different.

Once the KV cache has been created, the model begins producing output tokens one at a time. Each token depends on all tokens that came before it. This sequential dependency means that, regardless of how much hardware is available, the model cannot generate the next token until the current one has been completed.
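The sequential dependency is easiest to see in code. The Python toy below replaces the transformer with a made-up `toy_next_token` function (hypothetical, chosen only so the output is deterministic). The prompt is absorbed in one go; after that, each loop iteration must consume everything produced so far before it can emit the next token, and generation stops on a stand-in end-of-sequence token. The list called `kv_cache` here simply holds token IDs, whereas a real KV cache holds per-layer key and value tensors.

```python
EOS = 0  # hypothetical end-of-sequence token ID


def toy_next_token(kv_cache: list[int]) -> int:
    # Stand-in for a forward pass: the result depends on every cached entry,
    # so step t+1 cannot start until step t has produced its token.
    return (sum(kv_cache) + len(kv_cache)) % 5


def generate(prompt: list[int], max_new_tokens: int = 32) -> list[int]:
    kv_cache = list(prompt)          # "prefill": the whole prompt at once
    output = []
    for _ in range(max_new_tokens):  # "generation": one token per iteration
        token = toy_next_token(kv_cache)
        if token == EOS:
            break
        output.append(token)
        kv_cache.append(token)       # cache grows by one entry per token
    return output


print(generate([3, 1, 4]))  # [1, 3, 2]
```

No amount of extra hardware lets the loop body run for token 3 before token 2 exists; parallelism is only available within each step (or across independent requests).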

For every generated token, the model must read the parameters for every layer, consult the KV cache, compute the next-token probabilities, and then repeat the autoregressive process. The amount of computation per token is relatively modest, but the amount of data movement remains substantial.
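A back-of-envelope calculation shows why data movement dominates. Under the simplifying assumptions that every weight is read once per generated token, that weights take 2 bytes each (FP16/BF16), and that a dense model costs roughly 2 FLOPs per parameter per token, the per-token time floor set by memory traffic is bytes moved divided by memory bandwidth, while the compute floor is FLOPs divided by peak throughput. The 70-billion-parameter model, 3 TB/s of memory bandwidth and 1,000 TFLOPS of peak compute used here are round, hypothetical figures, not any particular product.

```python
def decode_step_floors(n_params: float, bytes_per_param: float,
                       kv_bytes: float, mem_bw: float, peak_flops: float):
    """Lower bounds (seconds) on one single-stream decode step.

    Assumes every weight plus the KV cache is read once per token and
    ~2 FLOPs per parameter per token (one multiply and one add).
    """
    memory_s = (n_params * bytes_per_param + kv_bytes) / mem_bw
    compute_s = 2 * n_params / peak_flops
    return memory_s, compute_s


mem_s, comp_s = decode_step_floors(
    n_params=70e9, bytes_per_param=2, kv_bytes=0,  # hypothetical model, KV ignored
    mem_bw=3e12, peak_flops=1e15)                  # hypothetical accelerator
print(f"memory floor  {mem_s * 1e3:.1f} ms/token")   # 46.7
print(f"compute floor {comp_s * 1e3:.2f} ms/token")  # 0.14
```

Even in this idealized single-stream case, the arithmetic units would be busy well under 1% of the time, which is the memory-bound behavior described next.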

Two faces of GPU architecture: Why modern GPUs struggle with real-time latency constraints

This is where the GPU architecture begins to work against the workload.

GPUs achieve peak efficiency when they execute large, highly parallel workloads with regular memory access patterns. Token generation offers neither. The workload is small, inherently sequential, and dominated by repeated memory accesses rather than dense arithmetic. Many of the GPU’s compute units remain idle while the device waits for data to arrive from high-bandwidth memory.

In other words, generation is not compute-bound; it’s memory-bound.

The distinction is crucial. In a compute-bound workload, adding more arithmetic units improves performance. In a memory-bound workload, performance is limited by how quickly data can be moved to the processors. Once memory bandwidth becomes the bottleneck, additional compute resources provide diminishing returns.

This explains why GPUs can appear extraordinarily efficient when throughput is measured without latency constraints. In that scenario, inference servers are free to buffer requests and combine them into large batches. Batching allows the system to process many token streams simultaneously, effectively transforming numerous small sequential tasks into a larger parallel workload that better matches the GPU’s strengths.

The role of batch sizes in GPU’s utilization

At first glance, batching in AI inference may appear straightforward. In practice, it is not. Unlike image inference, where every sample in a batch completes simultaneously, LLM inference involves many conversations progressing independently and asynchronously. Some requests finish quickly, others may continue for hundreds or even thousands of decoding iterations, and new requests may arrive continuously while older conversations are still active.

The workload therefore becomes highly dynamic and irregular. Each request’s generation ends only when the model produces a special “end-of-sequence” token indicating that the response is complete, so its length is not known in advance.

This characteristic fundamentally changes the nature of inference scheduling.

This is where continuous batching becomes essential. Continuous batching is the runtime orchestration algorithm responsible for managing the simultaneous execution of multiple conversations across the same accelerator resources. Instead of treating inference as a sequence of isolated batches, the scheduler continuously inserts, removes, pauses, and resumes requests as tokens are generated.
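A minimal discrete-time simulation captures the core idea. In this Python sketch (our simplification, not any particular serving framework’s scheduler), each step runs one decode iteration for every active request, finished requests leave immediately, and waiting requests are admitted into freed slots at the very next step instead of waiting for a whole batch to drain. To keep it deterministic, each hypothetical request carries a preset `tokens_to_generate` (assumed to be at least 1); in real serving that count is unknown until the end-of-sequence token appears. Prefill, preemption and KV-cache memory limits are left out.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    rid: int
    arrival_step: int
    tokens_to_generate: int  # hypothetical: unknown in practice until EOS
    generated: int = 0


def continuous_batching(requests, max_batch: int) -> dict[int, int]:
    """Return {request id: step at which it finished}."""
    pending = deque(sorted(requests, key=lambda r: r.arrival_step))
    active, done, step = [], {}, 0
    while pending or active:
        # Fill any free slots with requests that have already arrived.
        while pending and len(active) < max_batch and pending[0].arrival_step <= step:
            active.append(pending.popleft())
        # One decode iteration: every active request advances by one token.
        for r in active:
            r.generated += 1
        # Retire finished requests now, freeing their slots for the next step.
        still_running = []
        for r in active:
            if r.generated >= r.tokens_to_generate:
                done[r.rid] = step
            else:
                still_running.append(r)
        active = still_running
        step += 1
    return done


reqs = [Request(0, arrival_step=0, tokens_to_generate=3),
        Request(1, arrival_step=0, tokens_to_generate=1),
        Request(2, arrival_step=1, tokens_to_generate=2)]
print(continuous_batching(reqs, max_batch=2))  # {1: 0, 0: 2, 2: 2}
```

Request 2 arrives while the batch is full, but it joins at step 1 because request 1 finished at step 0; a static batcher would have made it wait for request 0 as well.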

The objective is to maximize hardware utilization while minimizing user-visible latency. As batch sizes increase, hardware utilization rises and throughput improves dramatically. However, batching comes at the cost of response time.

When users expect low latency, the system cannot afford to delay requests while waiting to accumulate a large batch. Each request must be processed almost immediately. As batch sizes shrink, the GPU loses the parallelism needed to keep its compute resources busy. Utilization falls, and throughput drops accordingly.
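A simple roofline-style estimate for a batch of B concurrent streams shows the trade-off described above. Weights are read once per decode step and shared by the whole batch, while compute and KV-cache traffic grow with B. Larger batches therefore raise tokens per second (and compute utilization), but they also lengthen every step and hence each user’s time per token. All figures, including the hypothetical 1 GB of KV cache per stream, are illustrative assumptions, and the model ignores compute/memory overlap, kernel overheads and interconnect.

```python
def batch_tradeoff(batch: int, n_params=70e9, bytes_per_param=2,
                   kv_bytes_per_stream=1e9, mem_bw=3e12, peak_flops=1e15):
    """Return (step_latency_s, tokens_per_s, compute_utilization) for one decode step."""
    memory_s = (n_params * bytes_per_param + batch * kv_bytes_per_stream) / mem_bw
    compute_s = 2 * n_params * batch / peak_flops
    step_s = max(memory_s, compute_s)  # whichever resource saturates first
    return step_s, batch / step_s, compute_s / step_s


for b in (1, 8, 64, 256):
    step_s, tps, util = batch_tradeoff(b)
    print(f"B={b:4d}  {step_s * 1e3:6.1f} ms/token  {tps:7.0f} tok/s  util {util:5.1%}")
```

With these assumed numbers, per-token latency grows from about 47 ms at B=1 to about 132 ms at B=256, while throughput climbs from roughly 21 to roughly 1,900 tokens/s and compute utilization rises from well under 1% to roughly a quarter.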

This is the central architectural limitation of GPUs in LLM inference.

The issue becomes even more pronounced when the same accelerator must handle both prefill and generation. Prefill is a large, compute-intensive task, while generation consists of many smaller, latency-sensitive operations. When new prompts arrive, the system may need to interrupt ongoing token generation to perform prompt processing. These context switches, often referred to as preemption, increase latency and reduce efficiency further.

Inference disaggregation: A clever shortcut to mitigate GPU’s inefficiencies

To mitigate this problem, system designers have begun disaggregating inference. Instead of assigning both phases to the same accelerator pool, they dedicate one group of GPUs to prefill and another to generation. The prefill GPUs build the KV cache and transfer it to the generation GPUs, which decode tokens independently.
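How much data the prefill pool has to ship to the generation pool follows from the standard KV-cache size formula: two tensors (keys and values) per layer, each holding one head_dim-sized vector per KV head per token. The Python below evaluates it for hypothetical dimensions (80 layers, 8 KV heads of dimension 128, a 4,096-token prompt, 2-byte elements) and an assumed 50 GB/s effective link between the pools; none of these numbers come from the article.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int) -> int:
    """Keys + values for every layer, KV head and token of one sequence."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


size = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128,
                      seq_len=4096, bytes_per_elem=2)  # hypothetical model/prompt
link_bytes_per_s = 50e9                                # hypothetical interconnect
print(f"{size / 2**30:.2f} GiB, ~{size / link_bytes_per_s * 1e3:.0f} ms to transfer")
# 1.25 GiB, ~27 ms to transfer
```

The transfer scales linearly with prompt length, so the link between the two pools is part of the cost of disaggregation rather than a free side effect.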

This separation eliminates interference between the two phases and allows each group of GPUs to operate more efficiently. Prompt processing can proceed continuously without disrupting active token generation, and generation can continue without interruption.

In controlled benchmark environments, where prompt lengths, output lengths, and request patterns are known in advance, this approach can deliver substantial improvements.

Yet the underlying limitation of GPU architectures remains.

Inference disaggregation: Does it scale in real-world applications?

The generation phase is still sequential and memory-bound. No amount of software optimization can eliminate the need to read model weights and cached data for each token. The disaggregated approach simply reduces scheduling inefficiencies and isolates the phases so that GPU resources are used more effectively.

Whether this strategy can scale efficiently in real-world applications depends on workload predictability.

Real-world AI services process a highly variable mix of requests. Some consist of long prompts and short responses. Others involve short prompts and long outputs. Demand can shift rapidly over time, changing the ideal ratio between prefill and generation resources.

Adapting to these changes requires dynamically reallocating accelerators. That process is not instantaneous. Devices must be initialized, model parameters loaded, and serving infrastructure synchronized. If traffic patterns are highly volatile, the overhead of reconfiguration can offset much of the benefit.

The broader lesson is that GPU performance in LLM inference is governed by more than raw TeraFLOPS.

The prefill phase showcases the strengths of GPUs, leveraging dense matrix operations and massive parallelism. The generation phase exposes their weaknesses, forcing highly parallel processors to execute a fundamentally sequential, memory-dominated workload.

As a result, the impressive throughput numbers often reported in unconstrained benchmarks can be misleading. They reflect idealized conditions in which batching hides architectural inefficiencies. Once latency constraints are introduced, those inefficiencies become visible.

The challenge for the industry is not simply to build larger GPUs, but to develop architectures and system designs better aligned with the realities of autoregressive inference.

Until then, the most significant limitation in real-time LLM serving will remain the same: generation is a sequential, memory-bound process running on hardware originally optimized for massively parallel computation.

Lauro Rizzatti is a business development executive with VSORA, a technology company offering semiconductor solutions that redefine design performance. He is a noted chip design verification consultant and industry expert on hardware emulation.

Editor’s Note

In a two-part series, contributor Lauro Rizzatti examines how LLM inference forced changes to MLPerf benchmarking. In the upcoming Part 2, he will illustrate the evolution of the MLPerf benchmark and detail how generative AI forced a radical shift in AI hardware evaluation.


The post The hidden bottleneck in LLM inference and the impact on MLPerf benchmarking appeared first on EDN.

Build 2026: Accumulating evidence of Microsoft’s AI independence

17 hours 25 min ago

Abundant use of the AI acronym is increasingly evident at industry events. Strip away the hype layer and look deeper, however, and interesting trends still come into view.

This is my third straight year covering Microsoft’s developer-focused conference, following up on the 2024 and 2025 show editions. And interestingly (at least to me), the event timing, both in an absolute sense and relative to other notable industry trade shows, has shifted each year.

  • 2024’s Build took place on May 21-23, the week after Google’s I/O developer event (May 14-16) and several weeks before Computex (June 4-7)
  • Last year, all three conferences took place during the same week
  • And this year, the Google I/O and Microsoft Build cadence returned to separate-weeks spacing, two weeks apart this time. Conversely, Build and Computex were still in the same-week slot.

Why the upfront focus on this seeming nuance? Well, for one thing, Computex, unlike Build, is a consumer-tailored show. That’s why, for example, Microsoft and NVIDIA co-announced one new computer (information on which I’ll share shortly) at Computex, while introducing another with a different form factor but the exact same processing subsystem at Build. Plus, to emphasize a point that is likely already obvious to at least some of you, any chronological spacing between two companies’ events enables the latter to fine-tune its announcements and their messaging in reaction to the former…and the more spacing, the better from a reaction-robustness standpoint.

Speaking of announcements, let’s get to them, shall we? Microsoft CEO Satya Nadella and his various lieutenants, along with a couple of special guests, covered a lot of ground in the 2.5-hour kickoff keynote, the video of which I’ve embedded below. I’ll hit what I thought were the highlights in the following paragraphs.

AI inference-accelerating hardware

About those computers I just mentioned…stop me if you’ve heard this before. Microsoft and a partner roll out new Windows-on-Arm computer platforms, both mobile and mini-desktop in shape, and intended for both consumers and developers. Two years ago, that partner was Qualcomm, the SoCs were the Snapdragon X Elite and Plus, and the consumer mobile systems were the Surface Laptop and Pro (also accompanied by ones from other OEMs, in a nod to Microsoft’s broader Windows-on-Arm aspirations). The developer mini-desktop was the Snapdragon Dev Kit for Windows, which never made it to production: Qualcomm “indefinitely paused” it only a few months later:

This outcome was more than a bit of a surprise to me, albeit not a complete one, as I’d been hearing for some time of both chronic hardware and software issues with the platform. That said, I already owned (and still use) its two Qualcomm application processor-based, developer-tailored predecessors, the Qualcomm-branded ECS LIVA Mini Box QC710 and Microsoft’s “Project Volterra” (officially: Windows Dev Kit 2023) system, so the Snapdragon Dev Kit for Windows was unsurprisingly on my wish list, too.

Hopefully NVIDIA will have better luck, although the situation still feels somewhat embryonic. Consumer mobile system(s) first: launched at Computex and coming “this fall” at an as-yet-unannounced price is the Microsoft Surface Laptop Ultra, based on NVIDIA’s RTX Spark SoC:

 

While you might not immediately recognize the processor from its new marketing moniker, you’ve heard about it (from me, to be precise) before. It was previously known as the N1 and N1X, as well as the GB10, and it’s the outcome of a co-development project with MediaTek, who contributed the up-to-20-core CPU constellation and reportedly also took lead on full-chip integration, including the NVLink interconnect to the up-to-6,144 core GPU cluster.

The SoC’s development has been lengthy and troubled, if longstanding and widespread rumors are to be believed, and industry analyst skepticism persists. It first appeared in a Linux-based system, the DGX Spark (rebranded from its initial name, Project DIGITS), last October.

And now, NVIDIA has determined that the RTX Spark is finally ready for Windows-based laptops (and not just from Microsoft itself, just as was the case two years earlier with Qualcomm). But not now. “This fall.” At a price to be announced later, but likely stratospheric, if only due to the currently pricey (thanks to industry supply constraints) “up to 128GB of unified memory.” And what about the developer mini-desktop system, the Surface RTX Spark Dev Box, unveiled at Build?

There’s…umm…a waitlist. Microsoft CEO Satya Nadella invited the Build attendees to join him on it. None of which inspires much in the way of confidence. Maybe one or both systems will be available for sale in time to end up on this November’s edition of my yearly “Holiday shopping guide for engineers”, but at this point, I’d be (pleasantly, mind you) surprised.

If you’re once again feeling déjà vu, by the way, it’s because Microsoft and NVIDIA have been here before. The initial attempt at bringing a Windows-on-Arm system to market, the Surface with Windows RT, was based on an NVIDIA Tegra SoC. I personally owned one and ended up tearing it apart after it eventually died. The hardware was first-rate for the time, although a dearth of native software, combined with the absence of any x86 code emulation support, doomed it.

That was 2012. Jump forward again to the other, earlier-mentioned déjà vu moment, Qualcomm’s announced partnership with Microsoft in 2024, and I feel compelled to point out that it is by no means deceased (or even on life support, for that matter). I recently acquired a gently used Microsoft Surface Pro 11 based on Qualcomm’s Snapdragon X Plus to replace my long-in-the-tooth Surface Pro X. The SP11 has 16 GBytes of RAM and a 1 TByte SSD, and it easily runs all day on its integrated battery, even when emulating x86. Microsoft systems based on second-generation Snapdragon X2 Elite (and presumably also Plus) SoCs are seemingly coming soon. And on a similar note, Microsoft’s still churning out branded systems based on x86 CPUs, too, with the most recent updates less than a month ago.

Agentic-centric O/Ss

One particularly memorable quote from Satya Nadella in the keynote was the following:

“There’s a real platform shift. We’re moving from building operating systems, devices for apps, to agents.”

Indicative of this forecasted shift is Project Solara, explained by means of a conversation between Nadella and Qualcomm President and CEO Cristiano Amon, along with an Android-derived proof-of-concept demonstration showing agent-based interactions with (and between) a smart speaker with a screen, mobile devices, and intelligent ID cards. Google also spoke a great deal about agentic AI at its I/O developer conference two weeks ago; rather than repeating myself, I’ll refer you to my coverage of that event for the background info if you need it.

Speaking of agents, Microsoft also announced Execution Containers, which keep agents from accessing unintended, critical regions of other agents and applications, the underlying operating system and system hardware. And for when you want to communicate with them, OpenClaw founder Peter Steinberger showed up on stage to introduce Scout, an OpenClaw AI Assistant gateway. If you’re thinking it sounds at least something like Gemini Spark, which Google announced two weeks back, you’re not off-base. Remember my comments at the beginning of this piece about competing-event timing and ordering and their effects on later-event messaging?

Homegrown models

Last but not least, let’s touch on an event topic that prompted the “AI Independence” title of this piece. In late April, OpenAI and Microsoft “redefined” their business relationship, in the process fundamentally freeing both companies from the various exclusivity arrangements that had previously defined (and arguably dominated) it. While a “divorce” would be overstating the result, a “softer” term such as “conscious uncoupling” wouldn’t be far off.

One tangible outcome of this redefinition was clearly evident this week, as Mustafa Suleyman, head of Microsoft AI, unveiled seven new homegrown AI models with capabilities spanning image, voice and transcription functions, and with claimed performance matching, if not exceeding, that of Google, OpenAI and other competitors’ models, both open- and closed-source. I was particularly interested in Suleyman’s declaration regarding MAI-Thinking-1, the flagship reasoning model, that:

“We trained it from the ground up on clean data, without distillation from third-party models.”

And with that, I’ll wrap up for today. As always, I welcome your thoughts in the comments on the topics I’ve covered here, as well as any others that might have caught your eye—Microsoft’s ongoing research work on quantum computing, for example, including the development of Majorana 2, the sequel to last year’s premier quantum computing chip from the company.

Next Monday, Tim Cook and his CEO successor John Ternus (I’m assuming) will hit the stage to kick off Apple’s yearly Worldwide Developers Conference (WWDC), completing the yearly big-tech-company developer conference triumvirate. I’ll see you back here then, if not before!

Brian Dipert is the associate editor, as well as a contributing editor, at EDN.


The post Build 2026: Accumulating evidence of Microsoft’s AI independence appeared first on EDN.

Agilex 9 FPGAs power COTS VPX boards

Wed, 06/03/2026 - 23:45

Altera has partnered with Mercury Systems and VadaTech to expand its Agilex 9 FPGA ecosystem with COTS VPX boards for mission-critical defense platforms. These solutions integrate Agilex 9 medium-band Direct RF FPGAs into VPX architectures, including SOSA-aligned OpenVPX, to help defense customers accelerate time-to-market, reduce SWaP, and enable flexible software-defined RF capabilities.

The Agilex 9 FPGAs combine RF data converters, FPGA fabric, and high-speed transceivers into a unified, programmable architecture, enabling real-time processing of large volumes of RF data at the edge. This integration supports distributed, multi-domain operations that require rapid decision-making and adaptation to changing mission requirements. The devices deliver the bandwidth, performance, and I/O needed for demanding embedded applications such as adaptive radar, cognitive electronic warfare, and secure, software-defined communications.

Mercury Systems’ DRF5660 boards and VadaTech’s VPX540 boards with Agilex 9 Direct RF AGRM027 FPGAs are available for order today.

Agilex 9 Direct RF series

Altera

The post Agilex 9 FPGAs power COTS VPX boards appeared first on EDN.

Value DSCs streamline embedded control

Wed, 06/03/2026 - 23:44

Digital signal controllers (DSCs) in Microchip’s dsPIC33CK Value Line provide real-time control for cost-sensitive designs. Starting at $0.51 each, they offer consistent pricing regardless of order size. The 16-bit controllers deliver 100-MHz deterministic processing, high-resolution PWM, and a 12-bit ADC supporting motor control, precision sensing and control, and touch/HMI applications.

A balanced set of peripherals helps reduce external component count, PCB footprint, and overall BOM cost. With flash memory ranging from 32 KB to 256 KB and compatibility across the dsPIC33CK family, the Value Line DSCs enable scalability and migration to future designs. The devices integrate a 12-bit ADC capable of up to 2 Msamples/s, four PWM pairs with resolution down to 2 ns, and on-chip analog comparators with a 12-bit DAC. Communication interfaces include CAN FD, LIN, SENT, UART, SPI, and I2C.
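A little arithmetic puts the quoted PWM and ADC figures in context. The Python sketch below assumes that the 2 ns figure is the duty-cycle step available at the chosen switching frequency and that the ADC uses a 3.3 V reference; both are our assumptions, so check the dsPIC33CK datasheet for the mode-dependent details. At a hypothetical 100 kHz switching frequency, 2 ns steps give 5,000 duty-cycle levels (about 12.3 bits), and a 12-bit ADC with a 3.3 V reference resolves about 0.81 mV per code.

```python
import math


def pwm_levels(pwm_freq_hz: float, step_s: float = 2e-9) -> int:
    """Number of distinct duty-cycle steps in one PWM period."""
    return round(1.0 / (pwm_freq_hz * step_s))


def adc_lsb_volts(vref: float = 3.3, bits: int = 12) -> float:
    """Voltage represented by one ADC code (assumed single-ended, 0..vref)."""
    return vref / 2**bits


levels = pwm_levels(100e3)  # hypothetical 100 kHz switching frequency
print(levels, f"{math.log2(levels):.1f} bits")          # 5000 12.3 bits
print(f"{adc_lsb_volts() * 1e3:.3f} mV per ADC code")   # 0.806 mV per ADC code
# At 2 Msamples/s one conversion takes 500 ns, so a 10 us PWM period
# leaves room for at most 20 conversions (theoretical upper bound).
print(round(2e6 / 100e3), "ADC samples per PWM period, at most")  # 20
```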

To accelerate evaluation and development, Microchip offers the dsPIC33CK Value Line Curiosity Nano evaluation kit with an onboard debugger. The evaluation platform supports the Curiosity Nano base for Click Boards and a touch adapter board for touch applications. A motor control DIM is also available for rapid prototyping of motor control designs.

Value Line DSCs are available directly from Microchip, its sales representatives, or authorized distributors.

dsPIC33CK Value Line

Microchip Technology 

The post Value DSCs streamline embedded control appeared first on EDN.

RF tool captures reusable design workflows

Wed, 06/03/2026 - 23:43

Keysight’s RF Circuit Simulation Professional software now enables engineers to document their design workflow on an executable whiteboard. The software replicates design decisions while capturing simulations, optimizations, decision trees, and parameters derived from prior analyses. Each step generates editable Python code that can be saved, shared, replayed for design reviews, and redeployed across the Keysight Advanced Design System (ADS), Cadence Virtuoso, and Synopsys Custom Compiler environments with full design data traceability.

Design teams often face workflow inefficiencies, simulation bottlenecks, and knowledge-transfer challenges. Engineers can build workflows visually on an executable whiteboard while the software automatically generates corresponding Python scripts. The platform executes simulations, optimizations, and design decisions in sequence, with support for decision-based loops and parameter settings.

Each workflow becomes a repeatable methodology that can be shared across teams, reused, and driven by AI. Captured workflows help preserve RF design expertise while creating structured design data that can support future AI-driven automation and training. Design review and tapeout tasks that previously required manual configuration now execute automatically.

RF Circuit Simulation Professional

Keysight Technologies 

The post RF tool captures reusable design workflows appeared first on EDN.

Buck controller streamlines in-vehicle USB charging

Wed, 06/03/2026 - 23:41

Diodes’ APK43070Q synchronous buck controller integrates a USB Type-C PD 3.1 source controller, simplifying automotive single- and multi-port charging designs. Operating from a 4-V to 36-V input, it enables USB Type-C charging up to 140 W. The device supports USB extended power range (EPR) and adjustable voltage supply (AVS) up to 28 V, along with standard power range (SPR) and programmable power supply (PPS) up to 21 V.

The constant-frequency controller features integrated drivers, optimized dead time, and elevated gate drive voltage for efficient mid- to high-power charging using external N-channel MOSFETs. This allows flexible MOSFET selection to balance thermal performance and power loss. A VIN DC pass-through mode further improves converter performance by enabling the high-side MOSFET to act as the VBUS switch, eliminating the need for an additional output switch.

An I2C interface with a controller/target addressing scheme enables power sharing across up to eight USB Type-C ports via resistor selection without an external MCU. The APK43070Q also includes overvoltage, overcurrent, undervoltage, and thermal protection. 

The APK43070Q is priced at $0.80 each in 1000-unit quantities.

APK43070Q product page

Diodes Inc.

The post Buck controller streamlines in-vehicle USB charging appeared first on EDN.

Low-noise USB scopes deliver 16-bit resolution

Wed, 06/03/2026 - 23:39

Pico Technology has launched the PicoScope 5000E series of USB-C oscilloscopes for analog, digital, and mixed-signal debugging. The four-channel scopes provide true 16-bit resolution with bandwidths to 200 MHz, sample rates to 2.5 Gsamples/s, and up to 1 GS of memory. PicoScope 5000E Plus models also offer a switchable 8-bit high-speed mode that raises bandwidth to 500 MHz, sample rates to 5 Gsamples/s, and memory to 2 GS.

With an ultra-low-noise front end, the oscilloscopes achieve a noise floor below 22 µV RMS and total harmonic distortion better than -73 dB. The resulting dynamic range helps reveal small-amplitude components, ripple, distortion, and other anomalies that lower resolution or noisier instruments can miss.
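As a rough point of reference for the 16-bit and 8-bit modes, the textbook limit for an ideal N-bit quantizer digitizing a full-scale sine wave is SQNR ≈ 6.02N + 1.76 dB. The short Python sketch below evaluates that formula. The results are theoretical ceilings for an ideal converter, not Pico specifications, and any real front end's noise and distortion will pull the achievable figure below them.

```python
import math


def ideal_sqnr_db(bits: int) -> float:
    """Ideal quantization SNR (dB) for a full-scale sine into an N-bit converter.

    This is the exact form of the familiar 6.02*N + 1.76 dB rule:
    20*log10(2**N * sqrt(3/2)).
    """
    if bits <= 0:
        raise ValueError("bits must be a positive integer")
    return 20.0 * math.log10((2 ** bits) * math.sqrt(1.5))


for mode_bits in (8, 16):
    print(f"{mode_bits}-bit ideal SQNR: {ideal_sqnr_db(mode_bits):.1f} dB")
```

Each added bit is worth about 6 dB, so the ideal ceiling of the 16-bit mode sits roughly 48 dB above that of the 8-bit high-speed mode.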

The compact, portable scopes connect to a host computer through a SuperSpeed USB 3.0 Type-C interface. For debug and validation, Pico 7 software provides more than 40 serial protocol decoders, advanced math channels, automated measurements including power analysis, multi-capture analysis, and measurement and mask limit testing. The Pico SDK supports custom application development using C, C#, C++, Python, MATLAB, and LabVIEW.

The PicoScope 5000E series is available in four-channel and 4+16-channel mixed-signal oscilloscope variants, with bandwidth options from 60 MHz to 500 MHz depending on model and operating mode. Units are sold through authorized distributors worldwide and directly from Pico Technology.

PicoScope 5000E product page

Pico Technology 

The post Low-noise USB scopes deliver 16-bit resolution appeared first on EDN.

Triply simply sequence supply voltages

Wed, 06/03/2026 - 15:00

This circuit design for power supply on/off sequencing uses Schmitt triggers for triple-positive-rail timing purposes.

Recent design ideas have explored the utility of timed power supply ON/OFF sequencing and provided circuit designs to implement it. Figure 1 shows a simple topology that uses Schmitt triggers to time the turn-ON and turn-OFF of three positive supply rails. Here’s how it works.

Wow the engineering world with your unique design: Design Ideas Submission Guide


Figure 1 This significantly simple supply sequencing scheme leverages Schmitt triggers.

Switching action begins with SPDT S1 in the OFF position, which holds the C1 and C2 timing caps discharged. The latter holds U1 pin 1 at 15v and therefore its pin 2 and NFET Q2’s gate at zero, forcing the 5Vout rail OFF.

Meanwhile, C1’s discharged state holds U1’s pins 3 and 5 low so pins 4 and 6 sit high.  The former holds enhancement mode PFET Q1 and the 15Vout rail OFF, while the latter does the same for level shifter Q3, PFET Q4, and the 24Vout rail.

Therefore no power flows to the connected loads.  Yet, at least. Figure 2’s left side graphs the sequence of events initiated by actuating S1.


Figure 2 This plot shows power sequence timing when S1 is flipped ON and later flopped OFF.

C2 connects to ground through R3, quickly charging it to the Schmitt trigger low-going threshold in about R3C2 = 1ms. This flips U1 pin 2 to 15v, placing a net forward bias of 15 – 5 = 10V on NFET Q2 and turning it and the 5Vout rail ON. They will remain so as long as S1 stays ON.

Meanwhile, the reset on C1 has been released, allowing it to begin charging through R1 + R3. The first event occurs at the end of T1, when U1 pin 3 reaches the ~9V Schmitt threshold. U1 pin 4 then snaps low, PFET Q1 turns ON, and 15Vout goes active. Since the timeout duration is proportional to the (R1 + R3)C1 time constant, any desired interval can be chosen with an appropriate RC product.
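The timing arithmetic behind T1 (and behind the "about R3C2" figure above) is ordinary exponential RC relaxation. The Python sketch below assumes, from the description of Figure 1, that U1 pin 1 falls from 15 V toward 0 V while C2 charges and that C1 charges toward a 15 V target. The ~9 V rising threshold and the 1 ms R3C2 product come from the text; the ~6 V low-going threshold and the 1 s time constant are hypothetical values chosen for illustration.

```python
import math


def rc_time_to_threshold(tau_s: float, v_start: float, v_final: float,
                         v_threshold: float) -> float:
    """Time for an RC node relaxing from v_start toward v_final to cross v_threshold.

    v(t) = v_final + (v_start - v_final) * exp(-t / tau)
      =>  t = tau * ln((v_final - v_start) / (v_final - v_threshold))
    """
    lo, hi = sorted((v_start, v_final))
    if not lo < v_threshold < hi:
        raise ValueError("threshold must lie strictly between start and final voltages")
    return tau_s * math.log((v_final - v_start) / (v_final - v_threshold))


# U1 pin 1 falling from 15 V toward 0 V with R3*C2 = 1 ms (from the text);
# the ~6 V low-going threshold is an assumed value.
t_5v_on = rc_time_to_threshold(1e-3, 15.0, 0.0, 6.0)

# C1 charging from 0 V toward an assumed 15 V target until it hits the ~9 V
# rising threshold; the 1 s time constant is hypothetical.
t1 = rc_time_to_threshold(1.0, 0.0, 15.0, 9.0)

print(f"5Vout turn-on delay: about {t_5v_on * 1e3:.2f} ms")
print(f"T1: about {t1:.2f} s")
```

Under these assumptions both crossings land at ln(2.5) ≈ 0.92 time constants, consistent with the text's "about R3C2". Because t scales linearly with τ, doubling C1 (or R1 + R3) doubles T1.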

Of course, C1 continues to charge, so at T2 U1 pin 5 also reaches its triggering threshold. Its pin 6 then snaps low, turning ON Q3, Q4, and 24Vout. The ratio R4 = 10 R5/(15 – 0.7) was chosen to apply an adequate and safe ~10V drive to Q4’s gate, independent of 24Vin. The S1 flip-ON sequence is now complete.
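The R4 ratio reads naturally if Q3 works as a PNP current source, which is what the formula implies: with U1 pin 6 pulling Q3's base low, roughly (15 – 0.7)/R5 flows through an emitter resistor R5 and out of the collector into R4, so the drop across R4 (Q4's gate drive) is set by the resistor ratio alone. This reading of the topology and the example R5 value are assumptions for illustration, not details taken from Figure 1. The last lines apply the same arithmetic to the Figure 3 values quoted later in the article.

```python
V_RAIL = 15.0  # rail feeding the PNP's emitter resistor (from the R4 formula), volts


def pnp_source_current(r_emitter_ohms: float, v_be: float = 0.7,
                       v_rail: float = V_RAIL) -> float:
    """PNP treated as a current source with its base held low: I = (V_rail - V_BE) / R_E."""
    return (v_rail - v_be) / r_emitter_ohms


def r4_for_gate_drive(v_drive: float, r5_ohms: float, v_be: float = 0.7) -> float:
    """R4 = V_drive * R5 / (V_rail - V_BE); V_drive = 10 V gives R4 = 10 R5/(15 - 0.7)."""
    return v_drive * r5_ohms / (V_RAIL - v_be)


r5 = 14.3e3                              # hypothetical R5, ohms
r4 = r4_for_gate_drive(10.0, r5)         # R4 needed for ~10 V of Q4 gate drive
v_gs_q4 = pnp_source_current(r5) * r4    # 24Vin never enters this product

# Same arithmetic with the Figure 3 values: 0.6 V drop, 15k emitter, 11k gate resistor.
ic_fig3 = pnp_source_current(15e3, v_be=0.6)
v_gs_fig3 = ic_fig3 * 11e3

print(f"R4 = {r4:.0f} ohms -> Q4 drive {v_gs_q4:.2f} V")
print(f"Figure 3: Ic = {ic_fig3 * 1e3:.2f} mA -> NFET drive {v_gs_fig3:.2f} V")
```

With R5 = 14.3k, R4 comes out at 10k. The Figure 3 inputs reproduce the 0.96 mA and roughly 10.6 V quoted in the text.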

The right side of Figure 2 shows what happens when S1 subsequently flops OFF. First, C1 is promptly discharged through R3, turning OFF Q1, Q3, Q4 and thereby 15Vout and 24Vout, putting them and whatever they power to sleep.  Meanwhile C2 begins ramping up, taking T3 to get to U1’s threshold.  When it completes the trip, pin 2 goes low, turning Q2 and 5Vout OFF. 

Turnoff sequencing is therefore complete.  Nighty night.

Details of the design include D1 and D2. Their purpose is to make the sequencer’s response to loss and restoration of the input rail voltage orderly, regardless of whether S1 is ON or OFF. If S1 is OFF, all output rails remain low and (safely) nothing happens when the supply voltages return. If it’s ON, a normally timed (and therefore safe) power-up sequence is executed.

Note that the MOSFETs should be chosen for adequate voltage and current handling capacities.  Because Q1 has 15v of gate drive and Q2 and Q4 get 10v, none need be sensitive logic-level types.

Okay.  But what if you also need to sequence a negative supply rail?  Figure 3 shows how.


Figure 3 This power switching circuit works with a negative rail.

When the U1 inverter’s input rises above the Schmitt trigger threshold, its output snaps low, causing the 2N3906 to pass Ic = (+15Vin – 0.6)/15k = 0.96mA. This develops 10.6V across the 11k resistor, independent of –Vin, turning the NFET fully ON. If symmetrical rails (e.g., +/-15v) are needed, Figure 3 can be added to Figure 1 to provide the negative side with no other modifications required.

Stephen Woodward‘s relationship with EDN’s DI column goes back quite a long way. Over 200 submissions have been accepted since his first contribution back in 1974.  They have included best Design Idea of the year in 1974 and 2001.

The post Triply simply sequence supply voltages appeared first on EDN.

Ruggedized connectors: Not necessarily big or bulky

Wed, 06/03/2026 - 09:44

Ruggedized connectors are usually associated with military/aerospace, industrial, and some medical applications, but there are consumer ones as well, in special circumstances. Of course, the phrase “ruggedized connector” invokes different requirements in different circumstances.

In brief, ruggedness is the ability of a connector to endure, and to keep functioning to specification, despite extreme mechanical, environmental, and thermal stresses. These stresses depend on the operating conditions but often overlap. For example:

  • Connectors in land-based military systems must handle severe vibration, dirt accumulation (dust, sand, grit), and cold and heat extremes.
  • Seaborne interconnects must withstand prolonged exposure to corrosive saltwater; deep-sea ones must also withstand crushing pressure.
  • Aerospace applications must tolerate repeated take-offs, landings, and in-flight vibrations in addition to wide temperature ranges.
  • Space applications have more extreme temperature swings, vacuum exposure and outgassing, and intense mechanical stress during launch and re-entry.
  • Industrial applications often need to function despite vibration, shock, dirt, grease, abuse, and even neglect.
  • Some consumer-facing applications such as vending machines, commercial washers/dryers, arcade games, and elevators/escalators also need ruggedized attributes; it’s a surprisingly long list here.

Meeting these requirements involves an understanding of multiple factors, including:

  • Vibration: connectors in military vehicles or fighter jets are tested to resist forces up to 20 g.
  • Shock: a high-impact force during rapid acceleration or deceleration, which is distinct from vibration. It can be as high as 50 g for standard connectors and 100 g for nano and micro designs.
  • Temperature extremes: ground-based systems may see temperatures ranging from -65°C to +125°C while space systems can go as high as 200°C.
  • Sealing and ingress protection: connectors may need to be protected against exposure to moisture, dust, and contaminants to ensure long-term operation using sealing solutions such as O-rings, gaskets, and grommets.
  • Corrosion: it’s caused by exposure to moisture and salt spray, leading to oxidation.

Deciding on a ruggedized connector requires attention to two broad design issues: the body or shell, and the electrical contacts.

For the body or shell, vendors and users consider what it’s made of, how it mates, retention and locking, and more. For this reason, rugged connectors are often associated with relatively bulky form factors, locking rings, and similar hardware, but that is not necessarily the case.

For the contacts, ruggedized connectors also have sophisticated, specially designed and fabricated contacts that use suitable base metals and are clad with advanced plating to withstand and maintain contact despite the challenges. The contact pairs are often based on a multipoint design with two or four mating surfaces for redundancy, rather than a single mating point.

Start with a classic

One widely used choice for a ruggedized connection is the classic D-subminiature connector. If you think that the classic 9-pin D-subminiature connector (often called DB-9) and the rest of the broader family of D-subminiature connectors have largely disappeared due to the fading away of the “ancient” RS-232 interface—along with the rise of various versions of USB and Ethernet connectors—that’s not the case at all.

The D-sub form factor has been in use since the 1950s and still offers many advantages. It’s fully shielded against EMI/RFI and provides a sealed or nearly sealed enclosure. It’s also mechanically rugged, and its mating halves can be locked to each other with small jackscrews or other arrangements. This class of connectors remains popular thanks to its flexibility, integrity, track record, and wide variety of models and versions; it’s so good that it is a staple of mil/aero and space-related designs.

This connector is offered in six basic standard-size bodies, but that is only part of its versatility. It also offers flexibility in its electrical contact positions and types.

In addition to offering connector shells with the same contact type at all positions, “Combo-D” D-subs such as those from Amphenol Positronic provide a mix of independent signal and power contacts within the connector shell (Figure 1). A single D-sub can support multiple signal contacts, power contacts, and more in a variety of mix-and-match arrangements. There are available contacts for signal, power, shielded, high voltage, thermocouple, and even fiber-optic applications.

Figure 1 The Combo-D subminiature connector style supports many signal- and power-path combinations (upper); these combinations are available in standardized, named shell sizes and contact arrangements (lower). Source: Amphenol Positronic

Among the material options for the shell are:

  • Thermoplastic polymers offer excellent mechanical strength, thermal resistance, and chemical stability. These materials effectively absorb vibration and shock in a low-weight structure.
  • Composite materials such as fiberglass-reinforced polymers and carbon fiber composites provide excellent strength-to-weight ratios. They can be engineered to maximize specific properties such as tensile strength, impact resistance, or thermal stability.
  • Metal enclosures of stainless steel and aluminum alloys are preferred materials for connector housing in the high-shock, high-vibration, and high-EMI environments of aerospace and defense applications.

The virtues of the D-sub shell (or any ruggedized housing) are an important part of the connector story, but they are only half of it: the electrical contacts and their attributes are equally critical. Over the years, there have been many innovations in contact technology with respect to materials, design, and electrical and mechanical performance.

For example, Amphenol Positronic uses its patented PosiBand contact technology (U.S. Patent 7,115,002) in one of its D-sub families. This contact takes a distinctive approach to enhanced performance: its external pressure-element design fully separates the mechanical action from the electrical action of the connection (Figure 2).

Figure 2 The PosiBand uses a patented design to separate the mechanical action and the electrical action of the connection. Source: Amphenol Positronic

The pressure element performs the mechanical action by applying a force pressing the male pin against the inner female cavity, achieving electrical connection along a long line of direct contact. Among its many subtle but important attributes is the spring clip within the PosiBand; it’s a small but critical part of the assembly and a key contributor to its vibration/shock performance (Figure 3).

Figure 3 The PosiBand spring clip provides a normal force across the contact area and so maximizes the electrical mating-surface contact area. Source: Amphenol Positronic

This spring-tempered beryllium copper alloy provides a normal force on the male contact, contributing to a rugged and reliable contact pairing. At the same time, it offers a lower average insertion force while meeting or exceeding performance requirements.

Consumer connectors get a little more rugged, too

The recent European initiative mandating use of USB-C for many classes of consumer end products is a major factor driving the use of this connector. Due to the wide availability of USB-C connected functions and peripherals, it seems logical that the connector and associated standard would be worth considering for medical, industrial, and other non-consumer appliances.

But there’s a problem with USB-C connectors: they are neither rugged nor sealed against intrusion, yet many applications beyond low-end consumer products need exactly those attributes.

Addressing this concern, Same Sky has introduced the UJ family of waterproof USB receptacles with IPX5, IPX6, IPX7, IPX8, IP66, IP67, and IP68 ratings, making them well-suited for applications where moisture and environmental contaminants are a concern (Figure 4). If you are not familiar with Same Sky, it was known as CUI Devices until it changed its name in September 2024.

Figure 4 These USB Type C connectors from Same Sky (formerly CUI Devices) feature water/dust intrusion-resistant O-rings to meet multiple IP ratings. Source: Same Sky
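The ratings listed above follow the IEC 60529 “IP code”: the first digit covers solid objects and dust, the second covers water, and an X means that characteristic was not rated. The small Python decoder below paraphrases the digit meanings (the test conditions themselves are defined in the standard) and covers second digits only up to 8. Note that the water digits are not strictly cumulative above 6: an IPX7 immersion rating does not by itself guarantee jet resistance, which is one reason parts are often listed with several ratings at once.

```python
import re

# Paraphrased IEC 60529 meanings; "X" means the characteristic was not rated.
SOLIDS = {
    "0": "not protected", "1": "objects > 50 mm", "2": "objects > 12.5 mm",
    "3": "objects > 2.5 mm", "4": "objects > 1 mm", "5": "dust-protected",
    "6": "dust-tight", "X": "not rated for solids",
}
WATER = {
    "0": "not protected", "1": "vertically dripping water",
    "2": "dripping water, enclosure tilted up to 15°", "3": "spraying water",
    "4": "splashing water", "5": "water jets", "6": "powerful water jets",
    "7": "temporary immersion (up to 1 m)",
    "8": "continuous immersion (conditions agreed with the manufacturer)",
    "X": "not rated for water",
}


def decode_ip(code: str) -> tuple[str, str]:
    """Split an IP code such as 'IP67' or 'IPX8' into (solids, water) descriptions."""
    m = re.fullmatch(r"IP([0-6X])([0-8X])", code.strip().upper())
    if m is None:
        raise ValueError(f"unrecognized IP code: {code!r}")
    return SOLIDS[m.group(1)], WATER[m.group(2)]


for rating in ("IPX5", "IPX6", "IPX7", "IPX8", "IP66", "IP67", "IP68"):
    solids, water = decode_ip(rating)
    print(f"{rating}: solids = {solids}; water = {water}")
```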

The five models are compatible with reflow soldering due to their UV-glued O-rings. This simplifies the PCB assembly process, as there is no need for a separate wave-soldering step (as is often the case with connectors and other larger components).

The five IP-rated USB Type C receptacles conform to a variety of USB standards, from USB 2.0 up to USB 4.0 Gen 3×2, with data-transfer speeds up to 40 Gbps as well as power delivery up to 240 W at 48 V and 5 A. The family also includes power-only models that remove the data-transfer pins to create a more cost-effective solution for designs where charging or power is the sole needed function.

If you are looking for a ruggedized connector, you have these and many other options. The first challenge is defining what you mean by “ruggedized” in your application, beyond the number and type of contacts, and then picking the available connectors that meet those criteria.
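Defining “ruggedized” for an application amounts to writing down explicit, checkable limits along the axes listed earlier (vibration, shock, temperature range, ingress) and screening candidates against them. Here is a minimal Python sketch, assuming the vendor data has already been collected; every part name and rating below is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    vibration_g: float
    shock_g: float
    t_min_c: float
    t_max_c: float
    ip_codes: frozenset[str]   # every IP rating the application needs, e.g. {"IP67"}


@dataclass(frozen=True)
class Connector:
    name: str
    vibration_g: float
    shock_g: float
    t_min_c: float
    t_max_c: float
    ip_codes: frozenset[str]   # ratings the vendor actually lists for the part


def meets(part: Connector, req: Requirements) -> bool:
    # IP codes are compared as a set, not as bigger-is-better digits, because
    # an immersion rating (7/8) does not by itself imply jet resistance (5/6).
    return (part.vibration_g >= req.vibration_g
            and part.shock_g >= req.shock_g
            and part.t_min_c <= req.t_min_c
            and part.t_max_c >= req.t_max_c
            and req.ip_codes <= part.ip_codes)


# Hypothetical candidates and requirements, not real part data.
candidates = [
    Connector("hypothetical-A", 20, 50, -65, 125, frozenset({"IP67"})),
    Connector("hypothetical-B", 10, 30, -40, 85, frozenset({"IP67", "IP68"})),
    Connector("hypothetical-C", 25, 100, -65, 150, frozenset({"IP66", "IP67"})),
]
need = Requirements(vibration_g=20, shock_g=50, t_min_c=-55, t_max_c=125,
                    ip_codes=frozenset({"IP67"}))
matches = [p.name for p in candidates if meets(p, need)]
print(matches)
```

The screen only narrows the field; contact design, plating, and mating cycles still need a human (or, perhaps, an AI-assisted) review.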

Maybe AI can help make the selection?

Bill Schweber is a degreed senior EE who has written three textbooks, hundreds of technical articles, opinion columns, and product features. Prior to becoming an author and editor, he spent his entire hands-on career on the analog side by working on power supplies, sensors, signal conditioning, and wired and wireless communication links. His work experience includes many years at Analog Devices in applications and marketing.

The post Ruggedized connectors: Not necessarily big or bulky appeared first on EDN.

One-shot surge protection

Tue, 06/02/2026 - 15:00

The moral of this story: any promises of protection and safety should be double-checked for validity.

Surge (overvoltage) protection is a common function, but implementing it robustly is not a simple task. Recently, I stumbled upon a device called an “Extension Lead With Surge Protection”. I already owned several other gadgets from the same manufacturer, and they were by and large OK. “Why not?” I thought, and I bought the new gadget as well.

My initial testing involved connecting the device to AC outlets both with and without ground. The gadget correctly identified both of these configurations, which was good, but I wasn’t yet done.

The gadget also promised protection from surges up to 2000 Volts (the normal AC voltage is 220V here where I live). This protection was its main merit, so of course I had to check this feature as well. I did so with the simple circuit shown in Figure 1.


Figure 1 This simple circuit supports confirmation of valid (or not) surge voltage protection.

The circuit produces a DC voltage of roughly double the input AC amplitude, approximately 600V in this case. The DC output shouldn’t be considered a shortcoming! The values of resistors R1 and R2, and diode Z1 (a 200V Zener diode in this case) need to be recalculated if your AC outlet voltage isn’t 220V.
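The “roughly double” figure is easy to sanity-check: 220 V RMS mains has a peak of about 311 V, and a voltage doubler charges to about twice that peak, less rectifier drops. The sketch below assumes a two-diode doubler with an unloaded output and a 0.7 V drop per diode. The text does not detail Figure 1's topology, so treat the result as an estimate, not a measurement.

```python
import math


def doubler_output(v_rms: float, v_diode: float = 0.7) -> float:
    """Unloaded DC output of a two-diode voltage doubler fed from a sine source.

    Both the half-wave (Greinacher) and full-wave (Delon) two-diode doublers
    idealize to 2 * (Vpeak - Vf) with no load.
    """
    v_peak = math.sqrt(2.0) * v_rms
    return 2.0 * (v_peak - v_diode)


for mains_rms in (220.0, 230.0, 120.0):
    print(f"{mains_rms:.0f} V RMS mains -> about {doubler_output(mains_rms):.0f} V DC unloaded")
```

For 220 V mains this gives a little over 620 V, consistent with the “600+ volts” reading reported in the test.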

The circuit also includes an LED, which illuminates only when the doubled voltage really is present at the gadget’s output. Choose an LED that is clearly visible at a current of ~1mA or less. Should I warn you here to beware of high voltage, to be cautious, and not to connect any inappropriate load to the circuit? That said, I’ll continue the story.

I connected an AC/DC voltmeter to the output of the “Extension Lead with Surge Protection”, while its input was connected to the output of the circuit. The voltmeter showed 600+ volts! The gadget was simply passing its input through to its output with no high-voltage detection, let alone protection.

To figure out what had gone wrong, I had to dismantle the gadget, which, as it turned out, was not a simple task: the screws required a very specific driver bit. At this point, I expected to see something interesting inside, and indeed there was!

The circuit within had several transistors to detect unconnected ground, which I’d already confirmed worked. It also had two varistor/thermal switch pairs, in thermal contact. Unfortunately, these thermal switches were single-use! Being one-shot fuses, they could protect the load only once, leaving it permanently disconnected afterwards. “One-shot Surge Protection” would have been a more accurate name.

It seems that the designers realized this fault too late, so they instead connected the output of the gadget directly to its input, bypassing and completely disabling any surge protection in the process! My disappointing purchase had turned into an interesting project that let me restore the gadget’s protection, albeit on a one-time-only basis.

Peter Demchenko studied math at the University of Vilnius and has worked in software development.

The post One-shot surge protection appeared first on EDN.

The firmware-hardware handshake in a silicon governance system

Tue, 06/02/2026 - 11:10

Design-time closure is no longer the end of system convergence.

In modern AI silicon—encompassing chiplet-based platforms, high-bandwidth memory systems, and advanced heterogeneous packages—the realized system continues to change after release. Workloads shift. Voltage and thermal conditions move dynamically. Network-on-chip (NoC) traffic patterns vary. Memory pressure changes. SerDes links retrain. Aging accumulates. Package and board environments influence behavior over time.

A system may pass design signoff, validation, and qualification, yet still encounter runtime states that were not fully represented during design-time closure. This does not mean the original design was wrong. It means the deployed system has entered a lifecycle regime where hardware state, firmware response, and evidence maturity must remain synchronized.

This is where the firmware–hardware handshake becomes important. Hardware senses the condition; firmware executes bounded actions; and governed evidence determines whether the action is valid.

The handshake is not an uncontrolled autonomous loop. It’s a disciplined runtime structure that connects hardware telemetry, firmware policy, causality interpretation, bounded action envelopes, rollback limits, and lifecycle evidence.

In this viewpoint, firmware is not the intelligence. Firmware is the bounded execution layer. The intelligence is in the governed interpretation of evidence: whether a signal is mature enough, synchronized enough, causally grounded enough, and safe enough to support action.

From observability to action

In complex AI silicon, observability is expanding rapidly. NoC counters, voltage monitors, thermal sensors, ECC logs, accelerator stall indicators, memory-controller events, SerDes retraining records, clock-domain telemetry, firmware traces, and package-level sensors can all provide valuable runtime information.

Here is how the firmware–hardware handshake layer works in governed runtime convergence. Source: Author

Hardware telemetry is captured, normalized into evidence, checked for admissibility, evaluated for causality, and passed through bounded firmware policy before any runtime action is executed and recorded as lifecycle evidence. But telemetry alone does not create authority.

An NoC latency spike may correlate with workload congestion, but it may also reflect a localized thermal hotspot, voltage droop, memory backpressure, firmware scheduling behavior, or package-level power delivery instability. A SerDes retraining event may indicate channel degradation, but it may also be triggered by temperature drift, reference-clock behavior, board-level noise, connector variation, or power integrity disturbance.

The runtime system therefore faces a difficult question: When should firmware act?

If firmware acts too slowly, the system may lose performance, reliability, or availability. If firmware acts too aggressively, it may create instability, hide root cause, or trigger unnecessary throttling, rollback, or degraded operation. If firmware acts on weak evidence, it may correct the wrong problem.

This is why runtime telemetry must mature into governed evidence before it’s used to drive consequential action.

Hardware as sensing layer

Hardware provides the first layer of runtime awareness.

Examples include NoC latency, congestion, retry, and utilization counters; voltage droop sensors and current monitors; thermal sensors and hotspot indicators; memory-controller stalls and ECC events; SerDes equalization, retraining, and link-margin information; accelerator utilization and stall counters; clock, reset, and power-state telemetry; and package, board, and system-level sensor data.

These signals provide visibility into how the system behaves under real workload and environmental conditions.

However, hardware signals are not self-explanatory. They must be interpreted in context. A voltage droop event means something different during peak AI workload than during idle transition. A thermal hotspot means something different if it is stable, spreading, oscillating, or correlated with a specific workload pattern. An NoC stall means something different if it aligns with memory saturation, power throttling, package temperature, or firmware scheduling.

The key point is simple: Hardware can sense state, but it does not automatically explain state. And that explanatory layer requires causality, evidence maturity, synchronization, and decision context.

Firmware as bounded execution layer

Firmware is the natural runtime bridge between hardware state and system response. Depending on the platform, firmware may be able to adjust voltage and frequency states, throttle selected regions, retrain high-speed links, reduce lane rate or link width, isolate a tile or accelerator block, migrate workload away from a stressed region, change scheduling policy, request diagnostic capture, enter deterministic degraded mode, or trigger service and validation escalation.

These actions are powerful because they allow the system to respond before a condition becomes a failure. But that power also creates risk.

Firmware should not become an unconstrained autonomous agent. A firmware action can affect performance, lifetime, reliability, customer experience, safety margin, and debug visibility. If firmware changes the operating state without traceable evidence, the system may appear to recover while the underlying cause remains unresolved.

One of the risks of adaptive firmware is that it can unintentionally hide the physical root cause. A system may appear stable because a link retrained, a frequency state changed, a workload migrated, or a region was throttled. But if the intervention is not tied to a normalized evidence record, the original cause may disappear from view. In advanced systems, repeated compensation can become a failure mode of its own.

The purpose of the firmware–hardware handshake is therefore not only to act, but to preserve the evidence trail behind the action. In other words, the correct role of firmware is not unlimited control. The correct role is bounded execution.

Firmware should execute only within approved policy limits, with clear evidence requirements, confidence thresholds, rollback rules, and auditability.

The handshake model

The firmware–hardware handshake can be described as a governed runtime sequence:

Hardware state → contextual capture → normalized evidence → admissibility check → causality assessment → firmware policy → bounded action → updated evidence → lifecycle record

Each step prevents runtime telemetry from becoming uncontrolled action.
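The governed sequence above can be sketched as an ordered list of checks in which any failure stops the flow and escalates, so raw telemetry never reaches action by default. This is a minimal Python illustration, not the author's implementation: the stage checks, field names, and the 0.8 confidence threshold are hypothetical placeholders.

```python
from typing import Callable

# One (stage name, check) pair per governed step; each check sees the event dict.
Stage = tuple[str, Callable[[dict], bool]]


def run_handshake(event: dict, stages: list[Stage], lifecycle_log: list[dict]) -> str:
    """Walk the stages in order and stop at the first one that fails.

    A failure escalates instead of acting; only an event that clears every
    stage reaches bounded action. Either way, the outcome is logged.
    """
    for name, check in stages:
        if not check(event):
            lifecycle_log.append({"event": event["id"], "outcome": "escalated", "stage": name})
            return f"escalate at {name}"
    lifecycle_log.append({"event": event["id"], "outcome": "executed", "action": event["action"]})
    return f"execute {event['action']}"


# Hypothetical stage checks standing in for real capture, admissibility,
# causality, and policy logic.
STAGES: list[Stage] = [
    ("contextual capture", lambda e: all(k in e for k in ("timestamp", "workload", "region"))),
    ("admissibility", lambda e: e.get("sensor_calibrated", False)),
    ("causality", lambda e: e.get("cause_confidence", 0.0) >= 0.8),
    ("firmware policy", lambda e: e.get("action") in {"throttle", "retrain_link"}),
]

log: list[dict] = []
strong = {"id": 1, "timestamp": 10.0, "workload": "training", "region": "noc_r3",
          "sensor_calibrated": True, "cause_confidence": 0.9, "action": "throttle"}
weak = dict(strong, id=2, cause_confidence=0.4)

print(run_handshake(strong, STAGES, log))
print(run_handshake(weak, STAGES, log))
```

The weak event is logged as an escalation at the causality stage rather than discarded, so the refusal itself becomes part of the lifecycle record.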

First, the hardware signal must be captured with context: timestamp, workload class, physical location, power state, thermal state, firmware version, configuration state, and system region. Second, the signal must be normalized into an evidence object. A raw sensor reading or counter value is not enough. It must be linked to the specific system condition it describes.
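Here is a minimal sketch of the capture and normalization steps, assuming evidence is a frozen record that binds the raw value to the context fields listed above. The field names and example values are hypothetical. A bare reading with missing context is refused rather than silently promoted to evidence.

```python
from dataclasses import dataclass

# Context fields the text lists for capture; the key names are hypothetical.
REQUIRED_CONTEXT = ("timestamp", "workload_class", "physical_location", "power_state",
                    "thermal_state", "firmware_version", "config_state", "system_region")


@dataclass(frozen=True)
class EvidenceObject:
    signal: str                                # what was measured, e.g. "noc_latency_ns"
    value: float                               # the raw reading or counter value
    context: tuple[tuple[str, object], ...]    # context frozen at capture time


def normalize(signal: str, value: float, context: dict) -> EvidenceObject:
    """Bind a raw reading to its system condition, or refuse if context is missing."""
    missing = [f for f in REQUIRED_CONTEXT if f not in context]
    if missing:
        raise ValueError(f"cannot normalize {signal!r}: missing {missing}")
    return EvidenceObject(signal, value, tuple((f, context[f]) for f in REQUIRED_CONTEXT))


ctx = {"timestamp": 1234.5, "workload_class": "inference", "physical_location": "tile_2_3",
       "power_state": "P0", "thermal_state": "nominal", "firmware_version": "fw-2.3.1",
       "config_state": "cfg-a", "system_region": "noc_east"}
evidence = normalize("noc_latency_ns", 412.0, ctx)

try:
    normalize("noc_latency_ns", 412.0, {"timestamp": 1234.5})   # bare reading
    bare_reading_accepted = True
except ValueError as err:
    bare_reading_accepted = False
    print(err)
```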

Third, the evidence must be checked for admissibility. Is the timestamp valid? Is the firmware version known? Is the sensor calibrated? Is the workload context synchronized? Is the signal consistent with voltage, thermal, memory, package, and board evidence? Is the proposed cause physically plausible?
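The admissibility questions above translate naturally into explicit checks that return every failure rather than a single yes or no, which keeps the reason for a refusal on record. In this minimal sketch the approved firmware set, calibration window, and skew tolerance are hypothetical, and the final question (physical plausibility of the proposed cause) is left to the causality stage rather than modeled.

```python
# Hypothetical policy inputs; real values would come from the platform's signed-off policy.
KNOWN_FIRMWARE = {"fw-2.3.1", "fw-2.4.0"}
MAX_CAL_AGE_S = 30 * 24 * 3600.0          # calibration validity window
MAX_SKEW_S = 0.005                        # telemetry/workload synchronization tolerance
CORROBORATING_DOMAINS = {"voltage", "thermal", "memory", "package", "board"}


def admissibility_failures(ev: dict, now_s: float) -> list[str]:
    """Return the admissibility checks an evidence record fails (empty list = admissible)."""
    failures = []
    if not 0.0 <= ev["timestamp"] <= now_s:
        failures.append("timestamp invalid")
    if ev["firmware_version"] not in KNOWN_FIRMWARE:
        failures.append("unknown firmware version")
    if ev["sensor_cal_age_s"] > MAX_CAL_AGE_S:
        failures.append("sensor calibration expired")
    if abs(ev["timestamp"] - ev["workload_timestamp"]) > MAX_SKEW_S:
        failures.append("workload context not synchronized")
    if not CORROBORATING_DOMAINS & set(ev["consistent_domains"]):
        failures.append("no consistent cross-domain evidence")
    return failures


good = {"timestamp": 100.0, "firmware_version": "fw-2.3.1", "sensor_cal_age_s": 3600.0,
        "workload_timestamp": 100.002, "consistent_domains": ["thermal", "voltage"]}
bad = dict(good, firmware_version="fw-9.9.9", workload_timestamp=100.5)

print(admissibility_failures(good, now_s=101.0))
print(admissibility_failures(bad, now_s=101.0))
```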

Fourth, firmware action must remain inside a bounded envelope. The system may allow a defined frequency reduction, limited link retraining, controlled workload migration, or temporary degraded mode. But if evidence confidence is low or the action exceeds policy authority, escalation is required.
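A bounded envelope can be sketched as a per-action limit on magnitude plus a minimum evidence confidence, with escalation as the only path outside it. The action names, units, and limits below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    max_magnitude: float    # largest permitted step (units depend on the action)
    min_confidence: float   # evidence confidence needed to act without escalation


# Hypothetical envelopes; real limits would come from an approved firmware policy.
POLICY = {
    "freq_reduction": Envelope(max_magnitude=0.20, min_confidence=0.8),  # fraction of fmax
    "link_retrain": Envelope(max_magnitude=3.0, min_confidence=0.7),     # retrains per event
    "degraded_mode": Envelope(max_magnitude=1.0, min_confidence=0.9),    # on/off
}


def authorize(action: str, magnitude: float, confidence: float) -> str:
    """Execute only inside the envelope; otherwise escalate instead of clamping."""
    env = POLICY.get(action)
    if env is None:
        return "escalate: action outside policy authority"
    if confidence < env.min_confidence:
        return "escalate: evidence confidence too low"
    if magnitude > env.max_magnitude:
        return "escalate: exceeds action envelope"
    return "execute"


print(authorize("freq_reduction", 0.10, 0.85))
print(authorize("freq_reduction", 0.35, 0.95))
print(authorize("link_retrain", 1.0, 0.50))
print(authorize("migrate_workload", 1.0, 0.99))
```

Escalating rather than silently clamping an oversized request is a deliberate choice here, mirroring the text: an action that exceeds policy authority goes to escalation, not to a smaller, unreviewed action.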

Finally, the outcome must be recorded. Did the action stabilize the system? Did the same condition recur? Did the event indicate a one-time workload excursion, a design margin issue, a package-related sensitivity, or an aging trend?

This is how runtime action becomes lifecycle evidence.

Bounded action envelopes

The bounded action envelope is the core safety mechanism. It defines what firmware may do, under what evidence conditions, and with what limits. For example, a firmware policy may allow temporary throttling if thermal evidence is mature, localized, and correlated with workload.

It may allow link retraining if signal-margin evidence crosses a defined threshold. It may allow workload migration if a tile shows repeated voltage-droop sensitivity under known conditions. It may allow deterministic degraded mode if full performance cannot be preserved without violating reliability boundaries.

But the same policy may block action when evidence is incomplete. If an NoC latency spike occurs without synchronized voltage, thermal, workload, and memory context, firmware should not automatically classify the NoC as the root cause.
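The example policies above can be expressed as evidence predicates. This is a minimal sketch assuming a dictionary of already-matured evidence; every threshold and key is a hypothetical placeholder, and "crosses a defined threshold" is read here as link margin falling below a floor. The first check encodes the blocking rule: without synchronized voltage, thermal, workload, and memory context, no automatic action is permitted.

```python
# Hypothetical thresholds; a real policy would take these from signed-off margins.
LINK_MARGIN_THRESHOLD_MV = 15.0
DROOP_REPEAT_LIMIT = 3
REQUIRED_SYNC = {"voltage", "thermal", "workload", "memory"}


def permitted_actions(ev: dict) -> list[str]:
    """Return the actions the evidence supports, or none if context is incomplete."""
    if not REQUIRED_SYNC <= set(ev.get("synchronized_context", ())):
        # Incomplete context: block automatic action and do not assign a root cause.
        return []
    allowed = []
    thermal = ev.get("thermal", {})
    if thermal.get("mature") and thermal.get("localized") and thermal.get("workload_correlated"):
        allowed.append("temporary_throttle")
    if ev.get("link_margin_mv", float("inf")) < LINK_MARGIN_THRESHOLD_MV:
        allowed.append("link_retrain")
    if ev.get("droop_repeats_known_conditions", 0) >= DROOP_REPEAT_LIMIT:
        allowed.append("workload_migration")
    if ev.get("full_perf_violates_reliability", False):
        allowed.append("deterministic_degraded_mode")
    return allowed


event = {
    "synchronized_context": ["voltage", "thermal", "workload", "memory"],
    "thermal": {"mature": True, "localized": True, "workload_correlated": True},
    "link_margin_mv": 12.0,
}
print(permitted_actions(event))
# Same evidence without synchronized memory context: nothing is allowed.
print(permitted_actions(dict(event, synchronized_context=["voltage", "thermal", "workload"])))
```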

If a link repeatedly retrains after thermal cycling, firmware should not hide the event indefinitely by retraining silently. If a voltage-droop event becomes recurrent under a specific package lot, board lot, workload class, or thermal condition, the system should escalate the event instead of silently compensating through repeated firmware action.
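One way to keep firmware from retraining or compensating silently forever is to count compensations per key inside a time window and escalate once the count exceeds a budget. In this sketch the key (for example a link, or a package lot plus workload class), the budget, and the window are hypothetical. A production design would probably also latch an escalation rather than let it age out of the window as this simple version does.

```python
from collections import defaultdict, deque


class CompensationTracker:
    """Escalate when the same compensating action recurs too often for one key."""

    def __init__(self, max_events: int, window_s: float):
        self.max_events = max_events
        self.window_s = window_s
        self._history = defaultdict(deque)  # key -> timestamps of recent compensations

    def record(self, key: tuple, t_s: float) -> str:
        q = self._history[key]
        q.append(t_s)
        # Forget compensations that have aged out of the window.
        while t_s - q[0] > self.window_s:
            q.popleft()
        return "escalate" if len(q) > self.max_events else "compensate"


tracker = CompensationTracker(max_events=3, window_s=3600.0)
key = ("serdes_link_7", "retrain", "post_thermal_cycle")   # hypothetical key
results = [tracker.record(key, t) for t in (0.0, 600.0, 1200.0, 1800.0, 9000.0)]
print(results)
```

The fourth retrain within an hour escalates; the event at 9000 s falls outside the window and is treated as an isolated compensation again.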

Bounded action does not mean passive behavior; it means disciplined behavior. The system can respond, but it must respond within governed limits.

Extending convergence into runtime

The handshake extends governed convergence beyond design time. At design time, engineers close the system against modeled requirements, simulated margins, validation data, and qualification evidence. At runtime, the system encounters real workload, real aging, real environment, and real variation.

The firmware–hardware handshake allows convergence to continue operationally. Several runtime concepts become useful here.

  • A boot-time realization baseline can capture the initial measured system state at startup. This provides a reference for later drift.
  • A corridor stability index can summarize the health of a specific governed path, such as an NoC region, power domain, HBM interface, SerDes path, or package-to-board corridor.
  • A global convergence epoch can ensure that telemetry from multiple runtime sources is compared within a valid synchronization window.
  • Realization fatigue tracking can monitor accumulated stress, repeated throttling, retraining frequency, thermal exposure, voltage events, or degradation patterns.
  • A deterministic degraded mode can preserve safe operation when full performance is no longer evidence-supported.
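Two of these concepts lend themselves to a short illustration: checking that samples from different sources fall within one synchronization window (a global convergence epoch), and measuring drift against a boot-time realization baseline. The Python below is a sketch under stated assumptions: all sources share one microsecond time base, and the window width, signal names, and relative-drift metric are hypothetical choices.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    source: str   # e.g. "noc_counter", "vdd_monitor", "thermal_sensor"
    t_us: int     # timestamp on a shared microsecond time base (assumed)
    value: float


def in_same_epoch(samples: list[Sample], window_us: int) -> bool:
    """True if every sample falls inside one synchronization window."""
    if not samples:
        return False
    times = [s.t_us for s in samples]
    return max(times) - min(times) <= window_us


def drift_from_baseline(current: dict[str, float],
                        baseline: dict[str, float]) -> dict[str, float]:
    """Relative drift of each signal versus its boot-time baseline value."""
    return {name: (current[name] - ref) / ref
            for name, ref in baseline.items()
            if name in current and ref != 0}


samples = [Sample("noc_counter", 1_000, 5.0),
           Sample("vdd_monitor", 1_400, 0.75),
           Sample("thermal_sensor", 1_900, 82.0)]
print(in_same_epoch(samples, window_us=1_000))  # cross-source comparison only within one epoch
print(drift_from_baseline({"vdd_min": 0.72, "tj_max": 90.0},
                          {"vdd_min": 0.75, "tj_max": 80.0}))
```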

These concepts are not meant to add vocabulary for its own sake. They define how runtime signals can be organized into a governed system state rather than scattered logs.

Why this matters for AI silicon

AI workloads are especially relevant because they stress systems dynamically and unevenly.

A training or inference workload may create localized NoC congestion, memory pressure, power spikes, or thermal concentration. The system may remain within global specifications while a local region experiences repeated stress. A package or board condition may interact with workload behavior in ways that were not fully visible during nominal validation.

In such systems, the firmware–hardware handshake becomes a reliability and performance tool. It allows the platform to distinguish between transient workload variation, recurring physical sensitivity, firmware scheduling artifacts, marginal power delivery behavior, thermal containment issues, aging-related degradation, validation escapes, and package or board interaction.

The goal is not to blame the NoC, firmware, package, power delivery network (PDN), memory, board, or workload too early. The goal is to preserve causality until the evidence is mature enough to support a decision.

Relationship to fleet learning

Runtime evidence becomes even more valuable when it’s aggregated across systems, products, lots, platforms, and field conditions. This is where fleet learning enters the picture.

Aggregation pays off when the same runtime pattern recurs across systems, lots, boards, packages, workloads, or field environments. For example, a recurring SerDes retraining signature after thermal exposure may indicate a package, board, connector, or policy sensitivity.

A workload-specific droop pattern across a defined power domain may inform future PDN design or validation coverage. A degradation signature that appears after a thermal-cycle threshold may reshape future qualification assumptions.

But these patterns should not automatically rewrite firmware policy. Field data should not autonomously change system behavior, alter operating limits, or modify release criteria. Fleet learning recommends; bounded gate authority approves. This preserves the difference between learning and governing.
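The split between recommending and approving can be made structural. In the minimal Python sketch below, fleet learning can only produce a recommendation, and the policy object changes only when a separate gate approval is passed in. The signature names, the min_systems threshold, and the versioned Policy record are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    signature: str        # e.g. a recurring SerDes retraining pattern
    proposed_change: str  # what fleet learning suggests changing
    systems_affected: int


@dataclass(frozen=True)
class Policy:
    version: int
    changes: tuple[str, ...] = ()


def recommend(signature_counts: dict[str, int], min_systems: int) -> list[str]:
    """Fleet learning: flag signatures that recur on at least min_systems systems."""
    return sorted(sig for sig, n in signature_counts.items() if n >= min_systems)


def apply_recommendation(policy: Policy, rec: Recommendation, gate_approved: bool) -> Policy:
    """Field data may only propose; the policy changes only with explicit gate approval."""
    if not gate_approved:
        return policy  # behavior unchanged; the recommendation stays a proposal
    return Policy(policy.version + 1, policy.changes + (rec.proposed_change,))
```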

Physical state and bounded action handshake

The firmware–hardware handshake is becoming a necessary part of advanced system realization.

As AI silicon, chiplets, HBM platforms, high-speed interconnects, and advanced packages become more dynamic, design-time closure alone cannot cover every runtime state. Hardware must sense. Firmware must respond. But the response must remain bounded by evidence maturity, causality, synchronization, rollback limits, and lifecycle governance.

So, the future system will not be defined only by better telemetry or more autonomous firmware; it will also be defined by a disciplined handshake between physical state and bounded action.

In SEGA-AI terms:

  • Observability provides signals
  • Admissibility qualifies evidence
  • Bounded firmware action preserves convergence
  • Fleet learning refines the next lifecycle decision

The system does not remain trustworthy because it can sense everything. It remains trustworthy when it knows which signals are mature enough to act on.

Dr. Moh Kolbehdari is senior director of IC/packaging at Socionext US.

Editor’s Note

This is Part 2 of the article series about a silicon governance framework for AI silicon. Part 1 described why data movement alone cannot explain system behavior in modern AI chip designs.

Related Content

The post The firmware-hardware handshake in a silicon governance system appeared first on EDN.

HIL platform automates tests to validate hardware behavior

Mon, 06/01/2026 - 19:02

A new hardware-in-the-loop (HIL) testing framework claims to make automated, hardware-validated testing accessible to every team by offering engineering resources previously available only at large enterprises. This new testing framework—called BootLoop Test—unifies bench, continuous integration (CI), and end-of-line validation on a single platform.

Though HIL testing is one of the most valuable practices in the hardware world, it’s rarely backed by rigorous testing infrastructure. That’s because building a hardened HIL framework requires dedicated test engineers, months of custom development, and specialized skills that most firmware teams don’t have.

Consequently, many companies either forgo HIL testing entirely or rely on ad hoc scripts and manual validation. That, in turn, slows development cycles, lets errors slip through, and leaves release processes fragile.

BootLoop, a startup that provides an AI platform for firmware and embedded development, addresses this problem by offering a complete HIL platform that spans the entire embedded product lifecycle. As a result, a hardware company can go from zero testing infrastructure to a fully automated pipeline in days.

“Most hardware companies know they need more rigorous firmware testing,” said Noah Pacik-Nelson, CEO of BootLoop. “They just don’t have the time or the tools. We built BootLoop Test, so they don’t have to choose between shipping quickly and shipping robust code.”

The HIL test platform helps teams to create a fully automated pipeline in days. Source: BootLoop

BootLoop’s agent ingests PCB design files and component datasheets to automatically generate tests that validate real hardware behavior down to the register level. The agent connects to serial monitors, debuggers, and test equipment and iterates until the code runs clean. According to the company, test teams can go from zero testing infrastructure to a full CI pipeline on real hardware in hours with a single-command install.

BootLoop—a Y Combinator company founded by SpaceX and MIT Media Lab engineers—covers the entire embedded development lifecycle, including development, testing, and debugging. The company was founded in 2025 and is based in San Francisco.


The post HIL platform automates tests to validate hardware behavior appeared first on EDN.

TP-Link’s Tapo P105: A Kasa EP10 clone, or evolutionarily derived?

Mon, 06/01/2026 - 15:00

Two devices. Same manufacturer. Similar cosmetics. (Near-)identical dimensions. Different branding. What about the insides?

After taking a month’s break from the TP-Link smart plug family teardown cadence, I’m back for more. This time, we’ll be looking inside the Tapo P105, one member of a four-pack, to be exact.

Back in early December, I’d noted that it was a “seeming Tapo equivalent to the Kasa EP10”, which I’d subsequently dissected for early March publication, and indeed there are many similarities between them:

  • The Kasa EP10 has published dimensions of 2.36 × 1.50 × 1.21 in (60 × 38 × 33 mm), while those of the Tapo P105 are near-identical in imperial units and identical in metric: 2.4 × 1.5 × 1.3 in (60 × 38 × 33 mm)
  • They both support switching load currents of up to 15 A
  • And they both support Amazon (Alexa), Google (Assistant and Gemini) and Samsung (SmartThings) smart device protocols, in addition to company-proprietary schemes.
A smart plug by any other name…

The last bit of that last bullet, however, is indicative of a minor-at-least deviation between them. The earlier device was the Kasa EP10; this one’s the Tapo P105. Once again requoting my early December piece, appropriately titled “Tapo or Kasa: Which TP-Link ecosystem best suits ya?”:

“Kasa” was TP-Link’s original smart home device brand, predominantly marketed and sold in North America. The company, for reasons that remain unclear to me and others, subsequently, in parallel, rolled out another product line branded as “Tapo” across the rest of the world. Even today, if you visit the “smart plugs” product page on TP-Link’s website, you’ll see a mix of Kasa- and Tapo-branded products. The same goes for wall switches, light bulbs, cameras, and other TP-Link smart home devices. And historically, you needed to have both mobile apps installed to fully control a mixed-brand setup in your home.

Fortunately, TP-Link has made some notable improvements of late, from which I’m reading between the lines and deducing that a full transition to Tapo is the ultimate intended outcome. As I tested and confirmed for myself just a couple of days ago, it’s now possible to manage both legacy Kasa and newer Tapo devices using the same Tapo app; they also leverage a common TP-Link user account…They all remain visible to Alexa, too, and there’s a separate Tapo skill that can also be set up…along with, as with Kasa, support for other services.

A perusal of the outside cosmetics also reveals some differences. The Kasa EP10’s status LED is integrated within the left-side-located multi-function on/off, pairing and reset switch:

whereas the Tapo P105’s status LED is in the top-left corner of the front panel, with the left-side switch now non-illuminated:

…would switch as sweet?

The illumination locational variance between the two devices presumably results in at least some internal-layout deviance between them, but what about the building-block components themselves? Reiterating what I’ve asked before in similar teardown comparison projects, how different (if at all) are these two product generations from a hardware standpoint, versus TP-Link relying solely on software-only differentiation schemes? Let’s find out.

I’ll start with a conceptual internal view to whet your appetite:

As mentioned previously, today’s patient was sourced from a four-pack that I’d acquired during a 2025 Thanksgiving-week Amazon Warehouse-now-Renewed promotion for $18.06 ($25.80 minus 30%). I’ll start with some outer box shots, as usual accompanied by a 0.75″ (19.1 mm) diameter U.S. penny for size comparison purposes.

The “US/1.26” bit in the upper right corner of the product label in the following photo, based on my past experiences with TP-Link gear, is suggestive of hardware v1.26 inside the box. I’ve mentioned before both the company’s tendency for hardware-iteration profusion and the inter-version compatibility problems that can result from it. That said, the Tapo P105 product page on TP-Link’s website lists only hardware versions v1 and v1.2 (but not v1.26) for both the one- and four-pack bundle variants. Dive into the product support page, on the other hand, and four to-date hardware versions are listed there (none of them v1.2, ironically):

  • v1
  • v1.26 (mine)
  • v1.60, and
  • v1.80

So…🤷‍♂️

Onward…

Time to dive inside…

The first things I found were a piece of protective foam, a slip of quick-start literature (PDF), and a small sheet of clear plastic.

What I subsequently realized was that the latter was normally folded in thirds and wrapped around two of the smart plugs. Its sibling was still in place, thereby tipping me off that (at least) one of the two lower devices in the box had been removed (and presumably tried out) pre-return by the original purchaser.

Let there (not) be blood

I went with the one in the lower left corner as my dissection victim. Front:

Left side (and upside-down, I subsequently realized):

Back (note the screw head; hold that thought):

Right side (once again upside-down, too):

Top:

And last but not least, the most informative of the lot, the bottom (the penny’s temporarily taped in place from underneath, in case you were wondering):

There’s that US/1.26 notation again, along with the always useful FCC ID (2AXJ4P105):

Remember that screw head I noted earlier? Buh-bye:

I’ve taken apart a few of these devices’ cases by now, so I’ve figured out how to do so without maiming myself like I did the first time (and yes, I realize I’ve just jinxed myself by writing this):

Mission accomplished.

SoC swap motivation: Processing necessity or product availability?

And now for the perspectives you all care about:

The switch, as noted before, is still on the left side:

but whereas with the Kasa EP10, it had been mounted to the same mini-PCB that contained the system SoC:

it’s now standalone, with the mini-PCB lodged in one corner, as already suggested by the earlier-shown conceptual teardown image and presumably to improve wireless connectivity:

The SoC itself has also evolved, from the Realtek RTL87210 to the same dual-core RTL8720 (PDF) found in the Kasa EP25, whose teardown was published in late March.

Note once again the presence of an antenna connector on the module, not used in this particular system implementation.

A relay merry-go-round

Once again on the right side is the blue-colored relay:

this time a Churod A16-V-105DA2F (PDF):

Top and bottom side perspectives follow, for your “edumacation” purposes:

And alas, as with its TP-Link-developed predecessors, I was unable to share with you any perspectives of the PCB backside, although as you might be able to tell from the glimpses in the following shots, there’s not much there to share anyway.

As usual, the FCC certification documentation provides additional visual insights.

And that’s “all” I’ve got for you today! Next up in the TP-Link smart plug dissection series, again as I initially alluded to back in December, I plan to tear down the Tapo P125, which builds on the Tapo P105 foundation with Apple HomeKit (now Apple Home) “smart” support. It’s akin to the earlier Kasa EP10-to-EP25 transition, albeit absent added energy monitoring features this time. Until then, and as always, I welcome your thoughts in the comments!

Brian Dipert is the associate editor, as well as a contributing editor, at EDN.


The post TP-Link’s Tapo P105: A Kasa EP10 clone, or evolutionarily derived? appeared first on EDN.

The pulse of power: Mastering the PWM relay

Mon, 06/01/2026 - 10:59

Imagine a component that combines the heavy-duty muscle of a power relay with the surgical precision of a digital signal. That is the essence of a pulse width modulation (PWM) relay. While traditional switches are often strictly binary, the integration of pulse width modulation allows engineers to go beyond simple “on-off” control, enabling significant power savings and reduced heat signatures.

The “PWM relay” myth

While high-speed switching is often associated with the solid-state relay (SSR), the real magic happens when applying these pulses to a standard electromechanical relay (EMR). By modulating the “hold current” of an EMR coil, you can prevent overheating and drastically extend the life of your hardware. Whether you are managing automotive solenoids or optimizing industrial control panels, understanding the synergy between PWM and EMR is the key to transforming a basic mechanical switch into a sophisticated, energy-efficient power management tool.

However, if you head to an electronics distributor, looking for a “PWM relay,” you will likely hit a dead end. You cannot easily buy a dedicated PWM-enabled or PWM-driven EMR off the shelf because PWM is not a physical feature of the relay itself; it’s a control strategy applied by the external circuit.

To achieve this, you typically need a dedicated relay driver or a microcontroller to manage the signal. By sending a high-frequency pulse train to a standard, inexpensive EMR, you effectively turn a “dumb” mechanical switch into a “smart” energy-saver. While an SSR is natively capable of high-speed switching for load modulation, using PWM with a traditional EMR is specifically about optimizing the coil’s efficiency, allowing you to reap the benefits of mechanical isolation without the drawback of a roasting-hot solenoid.

The “holding current” tweak

Nowadays, electromechanical relays are widely used across automation systems because they enable a low-power signal to control a high-power circuit. Yet the conventional method of relay operation is relatively energy-intensive, often producing excess heat and demanding a sizeable power supply. In practice, energizing a relay requires more power than simply holding it in the active state, because pulling the armature in across an open air gap takes considerably more magnetic force than keeping it seated once the magnetic circuit is closed.

This opens the door to efficiency gains: by applying pulse width modulation to the coil’s holding current, we can reduce the duty cycle and thereby lower the average current. The result is decreased power consumption, less heat generation, and improved thermal management—particularly valuable in applications that employ banks of relays.
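To put rough numbers on that, here is a minimal Python sketch. It assumes the PWM period is much shorter than the coil’s L/R time constant and that a freewheeling diode keeps coil current flowing during the off-time, so the coil behaves as if it saw duty cycle × supply voltage; switch and diode drops and ripple are ignored. The 12 V, 400 Ω coil is a hypothetical example, not a specific part.

```python
from __future__ import annotations


def coil_hold_estimate(v_supply: float, r_coil_ohm: float,
                       duty: float) -> tuple[float, float]:
    """Approximate average coil current (A) and coil dissipation (W) under PWM.

    Averaged model: with a fast PWM and a freewheeling diode, the coil behaves as
    if driven by duty * v_supply. Switch/diode drops and ripple are ignored.
    """
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty must be between 0 and 1")
    v_eff = duty * v_supply           # effective coil voltage
    i_avg = v_eff / r_coil_ohm        # average coil current
    p_coil = i_avg ** 2 * r_coil_ohm  # I^2 R dissipation in the winding
    return i_avg, p_coil


# Hypothetical 12 V relay with a 400-ohm coil.
for d in (1.0, 0.5):
    i, p = coil_hold_estimate(12.0, 400.0, d)
    print(f"duty {d:.0%}: {i * 1e3:.1f} mA, {p:.3f} W")
```

Under these assumptions, dropping from 100% to 50% duty halves the average coil current (30 mA to 15 mA for this hypothetical coil) and cuts coil dissipation to about a quarter (0.36 W to 0.09 W), since dissipation scales with the square of the effective voltage.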

As a quick design example, begin by driving the relay-driver MOSFET at a 100% duty cycle, that is, fully on, so that the full supply voltage is applied to the coil for at least 100 ms and the relay pulls in reliably.

Once the relay is engaged, transition to PWM control with a reduced duty cycle—say 50%—to sustain the relay state while cutting power consumption. This approach maintains functionality while significantly lowering average current draw, reducing heat, and improving overall efficiency.
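The two-phase drive just described can be sketched as a duty-cycle schedule. This is a Python illustration, not firmware for any particular microcontroller. The 100 ms pull-in and 50% hold values come from the example above, while the compare-value helper assumes a generic up-counting PWM timer (a hypothetical 64 MHz timer clock at 20 kHz PWM would give 3200 counts per period).

```python
def relay_duty(t_ms: float, pull_in_ms: float = 100.0, hold_duty: float = 0.5) -> float:
    """Duty cycle (0..1) to command t_ms after the relay is switched on.

    Full drive during pull-in, then the reduced hold duty. Check pull_in_ms
    and hold_duty against the relay datasheet; these are example values.
    """
    if t_ms < 0:
        return 0.0  # relay not yet commanded on
    return 1.0 if t_ms < pull_in_ms else hold_duty


def compare_value(duty: float, period_counts: int) -> int:
    """Convert a duty cycle into a compare count for a generic up-counting PWM timer."""
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty must be between 0 and 1")
    return round(duty * period_counts)


# A few points along the turn-on sequence, at 3200 counts per PWM period.
for t in (0, 50, 99, 100, 250):
    print(t, relay_duty(t), compare_value(relay_duty(t), 3200))
```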

Figure 1 Basic schematic illustrates PWM control for lowering relay coil holding voltage. Source: Author

As an aside, while current is the physical mechanism at play, “holding voltage” is a very common industry term because engineers often think in terms of the voltage applied to the circuit.

Practical switching: EMRs and PWM

On the workbench, additional considerations arise when using PWM to drive EMRs.

In conventional relay designs, the nominal coil voltage must be continuously applied to keep the relay energized, which reduces overall energy efficiency. By contrast, PWM-driven relays can operate with reduced effective coil voltage, significantly lowering power consumption, an advantage in energy-conscious applications.

PWM drivers regulate the effective voltage by adjusting the duty cycle of a DC signal at a fixed frequency. A quick note: Duty cycle is usually given as a percentage, while duty ratio is the same concept expressed as a fraction. Relay coils, being inductive, respond to duty-cycle transitions with current fluctuations. The resulting ripple depends on coil inductance, suppression circuitry, PWM frequency, voltage level, and duty cycle.
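One common first-order estimate, assumed here rather than taken from any datasheet, follows from the inductor equation: during the on-time the coil inductance sees roughly the supply voltage minus the I·R drop, about V·(1 − D), for D/f seconds. That gives a peak-to-peak ripple of about V·D·(1 − D)/(L·f), largest at 50% duty and inversely proportional to both inductance and PWM frequency. The Python sketch below ignores diode and switch drops and is only meaningful when the PWM period is much shorter than L/R; the 0.5 H inductance is a hypothetical value.

```python
def ripple_pp(v_supply: float, duty: float, l_henry: float, f_pwm_hz: float) -> float:
    """Approximate peak-to-peak coil current ripple in amperes.

    First-order, small-ripple estimate: the inductor sees about
    v_supply * (1 - duty) for duty / f_pwm_hz seconds each period.
    Assumes the PWM period is much shorter than the coil's L/R time constant.
    """
    if l_henry <= 0 or f_pwm_hz <= 0:
        raise ValueError("inductance and frequency must be positive")
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty must be between 0 and 1")
    return v_supply * duty * (1.0 - duty) / (l_henry * f_pwm_hz)


# Hypothetical 12 V coil with 0.5 H inductance, held at 50% duty.
for f in (5e3, 20e3, 100e3):
    print(f"{f / 1e3:.0f} kHz: {ripple_pp(12.0, 0.5, 0.5, f) * 1e3:.3f} mA p-p")
```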

Best practice is to begin with a 100% duty cycle until the relay pulls in and stabilizes. The required time varies with relay type and the amount of excess supply voltage, but typically falls between 100 and 500 milliseconds. Afterward, the duty cycle can be reduced to maintain the holding current.

Higher PWM frequencies reduce ripple, allowing lower effective coil voltages while keeping other parameters constant. Frequencies in the 20–100 kHz range are generally recommended. Since effective coil voltage equals the product of supply voltage and duty cycle, tight regulation is essential. Even small supply variations demand rapid duty-cycle adjustment—within a few milliseconds—to prevent the effective voltage from dropping below the relay’s minimum requirement.

For reliable performance, coil current must always exceed the holding current plus a margin for shock and vibration. If current falls below this threshold, the armature may release, causing repeated pull-in cycles. Such instability can lead to humming noise, unintended contact opening under load, or even contact welding.
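Combining the two requirements above (effective voltage equals supply voltage times duty cycle, and coil current must exceed the holding current plus a margin), firmware can recompute the minimum hold duty from each supply measurement. A minimal Python sketch, assuming coil current ≈ duty × supply voltage / coil resistance with no switch or diode drops. The 10 mA holding current and 30% margin are hypothetical, and the coil resistance used should be the hot value, since copper resistance rises with temperature.

```python
def min_hold_duty(v_supply: float, r_coil_ohm: float, i_hold_a: float,
                  margin: float = 0.3) -> float:
    """Smallest duty cycle keeping average coil current above i_hold_a * (1 + margin).

    Uses coil current ~= duty * v_supply / r_coil_ohm. Recompute whenever the
    measured supply voltage changes so the effective voltage never sags too low.
    """
    v_needed = i_hold_a * (1.0 + margin) * r_coil_ohm  # required effective coil voltage
    duty = v_needed / v_supply
    if duty > 1.0:
        raise ValueError("supply too low to hold the relay with this margin")
    return duty


# Hypothetical 400-ohm coil needing 10 mA to hold, at nominal and sagging supply.
for v in (12.0, 10.8):
    print(f"{v:.1f} V supply -> minimum hold duty {min_hold_duty(v, 400.0, 0.010):.1%}")
```

A sagging supply raises the required duty cycle, which is why the adjustment has to track supply measurements quickly.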

Notably, an increasing range of EMRs now support PWM-regulated holding currents to improve thermal management and efficiency. By modulating the duty cycle once the armature is seated, these relays minimize steady-state power dissipation. The Omron G2RL-1A-E-PW1 exemplifies this trend, featuring a coil architecture optimized for PWM and reduced-voltage holding.

Figure 2 The G2RL-1A-E-PW1 relay utilizes PWM control to minimize coil power consumption and heat. Source: Omron

What is more, dedicated PWM current controllers like DRV110 and DRV120 are specifically engineered to optimize relay and solenoid operation through precise waveform regulation. These ICs rapidly ramp the current to a peak level to ensure the plunger or contactor fully seats.

Once actuation is confirmed, they transition to a significantly lower hold current, which maintains the magnetic field while drastically reducing power dissipation. By managing this peak-to-hold transition automatically, these controllers prevent thermal overhead and extend the operating life of the inductive load.

Figure 3 A prewired DRV120 module empowers makers and experimenters to slash relay power consumption by automatically transitioning from pull-in to hold current. Source: Tindie

Clever pulses never stop

Where does this leave us? Whether through basic RC mechanisms, dedicated integrated solutions, or the efficiency gains of PWM applied to electromechanical relays, engineers have a wide range of proven strategies to reduce relay energy consumption.

This matters even more in the era of EVs and e-mobility, where every watt saved translates into extended range and smarter system design. Yet beyond the established lies the experiment, where unproven methods await bold exploration.

Energy efficiency is not just about saving power; it’s about sparking possibilities, and the next breakthrough may come from your own trial and error. If you have worked with PWM-driven electromechanical relays or discovered alternative approaches, share your insights in the comments and help expand the collective knowledge base for engineers everywhere.

T. K. Hareendran is a self-taught electronics enthusiast with a strong passion for innovative circuit design and hands-on technology. He develops both experimental and practical electronic projects, documenting and sharing his work to support fellow tinkerers and learners. Beyond the workbench, he dedicates time to technical writing and hardware evaluations to contribute meaningfully to the maker community.


The post The pulse of power: Mastering the PWM relay appeared first on EDN.

How Precise Must We Be?

Fri, 05/29/2026 - 15:00

To how many significant digits does Pi (and its peers) remain relevant?

Some while ago, I downloaded a file of Pi calculated to one-hundred-thousand digits. A bit later, I downloaded a different file of Pi calculated to one million digits. I thought those were impressive, but just recently I read of a computer calculation of the value of Pi made to an insanely larger number of digits. I can’t find that article again, but from memory, the calculation was run to two trillion digits.

The goal wasn’t to seek the value of Pi itself to that level of precision. It was a test of the computer’s ability to carry a very long computation through to completion without some kind of malfunction or error creeping in. In that article, reference was made to NASA depending on the value of Pi to merely fifteen digits. This seeming disparity merited a look-see.
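To see why fifteen digits goes so far, consider a hypothetical example: the circumference of a circle with a radius of one astronomical unit (149,597,870,700 m, an exactly defined value), computed with Pi truncated to fifteen significant digits versus a fifty-digit reference. This minimal Python sketch uses the standard decimal module so that the comparison itself isn’t limited by floating-point precision.

```python
from decimal import Decimal, getcontext

getcontext().prec = 50  # enough digits that the comparison itself adds no error

PI_REF = Decimal("3.1415926535897932384626433832795028841971693993751")  # 50 significant digits
PI_15 = Decimal("3.14159265358979")                                       # 15 significant digits
AU_M = Decimal(149_597_870_700)  # astronomical unit in meters (exact by definition)


def circumference_error_m(radius_m: Decimal, pi_approx: Decimal) -> Decimal:
    """Error in 2*pi*r caused by using pi_approx instead of the reference value."""
    return 2 * radius_m * (PI_REF - pi_approx)


err = circumference_error_m(AU_M, PI_15)
print(f"Circumference error for a 1 au radius: {err * 1000:.3f} mm")
```

The error comes out just under a millimeter on a circumference of nearly a trillion meters.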

I looked up the definition of a parsec and found its numerical value in light years to a lot of significant digits, fourteen, to be exact. I then set up the geometry on which that number was based (Figure 1).


Figure 1 This graphic provides a visual definition of a parsec.

As the earth moves around the sun, a far-off object is observed for its apparent position in the sky. Because of parallax, that apparent position shifts by some angle between earth’s two orbital extremes. Half of that angular shift is an angle I call theta. Because the sun-to-earth radius R is perpendicular to the line from the sun to the object, tan(theta) = R/D, so knowing the radius of earth’s solar orbit lets us calculate the distance D from the center of the sun to that object as D = R/tan(theta). The implicit assumptions are that the earth’s orbit is circular and that the sun is at the center of that circle, which we know is not exactly so, but we make them anyway.

When the value of theta is one arc second or one degree divided by 3600, the distance D is defined as one parsec. Table 1 derives (with some admitted finagling which I will describe shortly) the distance of one parsec in terms of light years.


Table 1 The calculation detailed here derives parsecs in terms of light years.

The finagling part here is twofold. First, I used a value of Pi to fifteen significant digits, thus mimicking NASA. Secondly, I set the radius of earth’s solar orbit to precisely that value which yields the published value of one parsec that I found online.

That orbital radius looks just about right, but just how precise these numbers really are eludes me. For example, do we really know the earth’s orbital radius to that many significant digits? Earth’s orbit is not really circular. It is slightly elliptic. What precise refinements were made to establish the published value of D to so many significant digits? I have no idea.
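For comparison, here is a minimal Python sketch of the same geometry, D = R/tan(theta), using the exactly defined values of the astronomical unit and the light year (c times a Julian year) as an assumption, rather than back-solving the orbital radius as Table 1 does. It also evaluates the small-angle form D = R/theta; the IAU now defines the parsec as exactly 648000/π au, which is that small-angle form. This may account for part of the precision questioned above.

```python
import math

AU_M = 149_597_870_700            # astronomical unit, meters (exact by IAU definition)
C_M_PER_S = 299_792_458           # speed of light, m/s (exact)
JULIAN_YEAR_S = 365.25 * 86_400   # seconds in a Julian year
LY_M = C_M_PER_S * JULIAN_YEAR_S  # light year, meters


def parallax_distance_ly(theta_arcsec: float, radius_m: float = AU_M,
                         small_angle: bool = False) -> float:
    """Distance from the sun for parallax half-angle theta, in light years.

    Default: D = R / tan(theta), the right-triangle geometry of Figure 1.
    small_angle=True uses D = R / theta (theta in radians) instead.
    """
    theta = math.radians(theta_arcsec / 3600.0)
    d_m = radius_m / theta if small_angle else radius_m / math.tan(theta)
    return d_m / LY_M


tan_form = parallax_distance_ly(1.0)
small_angle_form = parallax_distance_ly(1.0, small_angle=True)
print(f"R/tan(theta): {tan_form:.13f} ly")
print(f"R/theta:      {small_angle_form:.13f} ly")
print(f"relative difference: {(small_angle_form - tan_form) / tan_form:.2e}")
```

Under these assumptions both forms give about 3.2616 light years, and they differ by a relative amount of roughly theta²/3, a few parts in 10¹², which first shows up around the twelfth significant digit. At the fourteen-digit level, then, the choice of definition matters, and the modern au is a fixed conventional length rather than a measured radius of earth’s elliptical orbit.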

Colloquially, however, the value of one parsec is usually taken as 3.26 light years, which is good enough for general reading and good enough to satisfy my own curiosity. I’m perfectly happy with that fifteen-digit value of Pi.

John Dunn is an electronics consultant and a graduate of The Polytechnic Institute of Brooklyn (BSEE) and of New York University (MSEE).


The post How Precise Must We Be? appeared first on EDN.

From AI silicon observability to governed evidence

Fri, 05/29/2026 - 11:06

Artificial intelligence (AI) silicon is increasingly defined not only by compute capability, but by how data moves through the system. Modern AI SoCs, edge AI processors, automotive compute platforms, and AI accelerators depend on large volumes of data moving among compute engines, memory systems, sensor interfaces, accelerators, chiplet interfaces, firmware controllers, and I/O.

This is why network-on-chip (NoC) architectures have become essential. An NoC provides the internal communication fabric that helps organize routing, arbitration, bandwidth allocation, quality of service, congestion management, and latency behavior inside complex AI silicon.

But it’s important to make a clear distinction.

An NoC is part of the chip execution architecture. It’s not the same as the external signaling interfaces that bring data into or out of the chip.

External signals may arrive through MIPI, SerDes, PCIe, CXL, UCIe, LPDDR, HBM, Ethernet, CAN, or other physical and protocol interfaces. Those interfaces use PHYs, controllers, and protocol layers to move signals into a form the SoC can process internally. Once inside the chip, the NoC routes transactions among internal blocks such as CPUs, NPUs, GPUs, DSPs, memory controllers, sensor-processing blocks, safety islands, and I/O controllers.

In other words, external interfaces move signals into and out of the silicon. The NoC organizes internal data movement inside the silicon. This distinction matters because data movement is not the same as evidence governance.

NoC is not the governance layer

An NoC can move data efficiently, but it does not determine whether a later system symptom was caused by NoC behavior, timing weakness, placement and routing (P&R), power delivery, package behavior, firmware scheduling, workload bursts, or thermal conditions.

For example, a system may observe:

  • Accelerator stalls
  • Latency spikes
  • Traffic congestion
  • Power bursts
  • Voltage droop
  • Timing-margin loss
  • Thermal hotspots
  • Memory-access delays
  • Chiplet-interface errors
  • Workload-dependent failures

These symptoms may involve NoC activity, but NoC activity alone does not prove NoC causality.

A thermal hotspot may correlate with NoC traffic, but the root cause could also be local transistor density, P&R, clocking behavior, package thermal resistance, power-delivery weakness, firmware scheduling, workload concentration, sensor placement, board conditions, or cooling limitations.

A latency spike may appear in an NoC counter, but the underlying contributor could be memory-controller contention, cache behavior, firmware policy, workload burstiness, arbitration settings, clock-domain crossing, timing margin, or external I/O behavior.

This is the central point: NoC may be one possible contributor to observed AI silicon behavior, but it should not be assumed to be the source of the problem without admissible evidence.

Where SEGA-AI fits

SEGA-AI does not replace NoC architecture, RTL design, physical implementation, timing closure, P&R, verification, or post-silicon debug. Its role is different.

SEGA-AI defines how NoC-related observability, telemetry, counters, workload traces, firmware logs, power data, thermal data, package evidence, and system behavior are qualified before any root-cause conclusion or lifecycle-governance decision is made.

The contribution is not “SEGA-AI sees a problem and knows the cause.” The contribution is “SEGA-AI governs the evidence path required before the system is allowed to assign cause, trigger corrective action, refine assumptions, or update lifecycle policy.”

This distinction is essential for complex AI silicon because many physical, architectural, and operational mechanisms can produce similar symptoms.

  • A detected hotspot is a symptom
  • A detected latency spike is a symptom
  • A voltage droop event is a symptom
  • An accelerator stall is a symptom

SEGA-AI asks whether the evidence behind that symptom is mature enough, synchronized enough, causally valid enough, and admissible enough to support a decision.

From symptom to evidence through CEMH

Consider a realized AI SoC where telemetry reports a localized hotspot during a high-throughput workload. At level 1, with raw data, the system has only a thermal sensor observation: a localized temperature rise was detected. This observation is useful, but it’s not yet decision-ready evidence.

At level 2, with interoperable data, the temperature reading can move into a diagnostic environment, firmware log, validation database, or fleet-monitoring system. But movement does not create authority. The hotspot may be visible and accessible, but its cause is still unknown.

At level 3, with normalized evidence, the observation is linked to the context required for interpretation:

  • Workload type
  • Timestamp and runtime epoch
  • Firmware policy state
  • NoC traffic counters
  • Accelerator utilization
  • Memory-controller activity
  • Voltage droop measurements
  • Clock and power state
  • Floorplan region
  • Thermal sensor location
  • Package thermal path
  • Board and cooling condition
  • Package lot and assembly history
  • Validation correlation status

Only at this stage can the event begin to be compared across domains.

At level 4, with admissible evidence, the evidence must pass the Trusted Convergence Governance (TCG) gate. The system must confirm provenance, synchronization, realization-state validity, causal relevance, measurement confidence, and chain-of-custody integrity before the hotspot data can influence a convergence decision.

At level 5, with convergence-authoritative evidence, the system has enough qualified evidence to support bounded action or lifecycle refinement. That action may be a firmware policy adjustment, workload throttling, degraded mode, validation update, package constraint refinement, or future design-rule feedback.

Before the evidence matures that far, however, the hotspot remains open to several explanations:

  • The hotspot may be related to NoC congestion.
  • It may be related to accelerator placement.
  • It may be related to P&R density.
  • It may be related to package thermal resistance.
  • It may be related to voltage droop and increased local switching.
  • It may be related to firmware scheduling or workload concentration.

The purpose of SEGA-AI is to prevent premature conclusions:

  • A thermal sensor does not prove NoC causality.
  • An NoC counter does not prove package causality.
  • A voltage droop event does not prove timing causality.

SEGA-AI requires that the evidence mature through Convergence Evidence Maturity Hierarchy (CEMH) and pass TCG admissibility before any root-cause conclusion or lifecycle-governance action receives authority.
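A minimal Python sketch of how the five CEMH levels and the TCG gate could be encoded, assuming each of the six properties listed for level 4 is reduced to a pass/fail flag. The class, constant, and function names are hypothetical illustrations, not part of any published SEGA-AI specification.

```python
from dataclasses import dataclass, replace
from enum import IntEnum


class CEMHLevel(IntEnum):
    """The five CEMH maturity levels (names paraphrase the article)."""
    RAW = 1
    INTEROPERABLE = 2
    NORMALIZED = 3
    ADMISSIBLE = 4
    CONVERGENCE_AUTHORITATIVE = 5


# Hypothetical identifiers for the properties the TCG gate must confirm.
TCG_CHECKS = frozenset({
    "provenance",
    "synchronization",
    "realization_state_validity",
    "causal_relevance",
    "measurement_confidence",
    "chain_of_custody",
})


@dataclass(frozen=True)
class Evidence:
    observation: str
    level: CEMHLevel
    passed_checks: frozenset = frozenset()


def tcg_gate(ev: Evidence) -> Evidence:
    """Promote normalized (level 3) evidence to admissible (level 4) only if
    every TCG check passed; anything else is returned unchanged."""
    if ev.level == CEMHLevel.NORMALIZED and TCG_CHECKS <= ev.passed_checks:
        return replace(ev, level=CEMHLevel.ADMISSIBLE)
    return ev


def may_support_decision(ev: Evidence) -> bool:
    """Root-cause conclusions and lifecycle actions require level 4 or higher."""
    return ev.level >= CEMHLevel.ADMISSIBLE


hotspot = Evidence("localized temperature rise", CEMHLevel.RAW)
print(may_support_decision(tcg_gate(hotspot)))  # False: raw observation only

partial = replace(hotspot, level=CEMHLevel.NORMALIZED,
                  passed_checks=TCG_CHECKS - {"causal_relevance"})
print(may_support_decision(tcg_gate(partial)))  # False: causal relevance unconfirmed

complete = replace(partial, passed_checks=TCG_CHECKS)
print(may_support_decision(tcg_gate(complete)))  # True
```

The gate deliberately refuses to look at level 1 or 2 data at all: without the level 3 context, there is nothing to check provenance or causal relevance against.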

The role of CEMH, TCG, and GFL

Within the SEGA-AI framework, three layers are especially relevant.

Convergence Evidence Maturity Hierarchy (CEMH) defines how information matures from raw observation into convergence-authoritative evidence. A thermal sensor value, NoC counter, voltage monitor, or firmware trace begins as raw or interoperable data. None of these becomes decision-ready evidence until it has been contextualized, synchronized, qualified, and connected to the correct realization state.

Trusted Convergence Governance (TCG) acts as the trust gate. It asks whether evidence preserves provenance, synchronization validity, realization-state consistency, causal relevance, and bounded authority before it influences a decision.

Governance for Lifecycle (GFL) asks whether the realized system can remain converged throughout operational life. It’s concerned not only with whether the chip worked at initial signoff, but whether chip, package, board, firmware, workload, and field behavior remain aligned over time.

Together, these layers prevent a common failure mode: mistaking observable behavior for proven causality.

Diagnostic evidence plan

This also changes how AI silicon should be planned before implementation. Here, SEGA-AI can contribute by helping define the diagnostic evidence plan.

  • Which NoC counters are needed?
  • Which congestion metrics should be exposed?
  • Which workload tags must be preserved?
  • Which timestamps and synchronization epochs are required?
  • Which voltage, thermal, clock, and power monitors are needed?
  • Which firmware traces must be connected to physical state?
  • Which package and board conditions must be tracked?
  • Which evidence fields are required to distinguish NoC behavior from timing, P&R, PDN, thermal, firmware, or package causes?
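The questions above can be turned into a checkable plan: list the evidence fields each candidate cause would need, then report which causes the planned instrumentation could not isolate. In this Python sketch, the field names paraphrase the level 3 context list earlier in the article, but the cause-to-field mapping is a hypothetical assumption for illustration only.

```python
# Hypothetical evidence-plan check. Field names paraphrase the article's context
# list; which fields each cause needs is an illustrative assumption.
REQUIRED_FIELDS = {
    "NoC": {"noc_traffic_counters", "workload_type", "timestamp_epoch"},
    "timing": {"clock_power_state", "voltage_droop", "timestamp_epoch"},
    "P&R": {"floorplan_region", "thermal_sensor_location"},
    "PDN": {"voltage_droop", "accelerator_utilization", "timestamp_epoch"},
    "thermal": {"thermal_sensor_location", "package_thermal_path",
                "board_cooling_condition"},
    "firmware": {"firmware_policy_state", "workload_type", "timestamp_epoch"},
    "package": {"package_thermal_path", "package_lot_history"},
}


def coverage_gaps(planned_fields: set) -> dict:
    """Map each candidate cause to the required fields the plan leaves out.

    Causes whose required fields are all planned are omitted from the result.
    """
    gaps = {}
    for cause, required in REQUIRED_FIELDS.items():
        missing = required - planned_fields
        if missing:
            gaps[cause] = sorted(missing)
    return gaps


plan = {"noc_traffic_counters", "workload_type", "timestamp_epoch",
        "thermal_sensor_location", "floorplan_region"}
for cause, missing in coverage_gaps(plan).items():
    print(f"{cause}: cannot be isolated without {missing}")
```

A plan that instruments only NoC counters and sensor locations, as here, can speak to NoC and P&R hypotheses but leaves package, PDN, and firmware causes indistinguishable, which is exactly the gap the evidence plan is meant to expose before tapeout.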

This does not mean SEGA-AI designs the NoC. It means SEGA-AI asks what evidence must exist later so that realized-system behavior can be interpreted correctly. That is the bridge between design intent and lifecycle governance.

Why data movement alone isn’t enough

NoC architectures are essential because AI silicon needs scalable internal communication. But moving data correctly inside the chip does not automatically explain system behavior after realization. An NoC may deliver a packet correctly while the system still experiences thermal drift. Likewise, a controller may report a valid transaction while the package creates a local thermal bottleneck.

Similarly, a firmware trace may show a workload transition while the underlying voltage margin is collapsing, or a sensor may report a hotspot while the causal chain remains ambiguous. This is why observability must become governed evidence before it can support lifecycle decisions.

The key question is not only: Did the data move? The real question is: Is the observed behavior mature enough as evidence to support diagnosis, intervention, or lifecycle refinement? This distinction becomes especially important in edge AI and ADAS systems.

In an ADAS platform, camera, radar, lidar, IMU, wheel-speed, steering, and vehicle-state data enter through physical interfaces and controllers. Inside the AI SoC, the NoC routes internal traffic among image processors, AI accelerators, CPUs, memory controllers, safety islands, and I/O blocks.

The AI accelerator may detect pedestrians, lanes, vehicles, or collision risk. But if a late response, thermal event, inference delay, or braking-decision uncertainty is observed, the system should not automatically blame the NoC, the AI model, the memory controller, or the package. It must first build an admissible evidence chain.

This matters because ADAS is not only a performance application; it’s a safety-critical realization environment.

A latency spike or inference delay may affect warning time, braking distance, steering support, or driver handoff. In that context, clean data movement is not enough. The system must know whether the evidence supporting the decision is synchronized, causally valid, realization-consistent, and authoritative enough for action.

For low-risk edge AI applications, a wrong output may create inconvenience or cost. For ADAS, a wrong output may affect human safety. That changes the required evidence maturity.

A safety-critical output should not receive full action authority simply because data moved correctly through the chip. It should be supported by level 5 convergence-authoritative evidence or by a pre-qualified safety envelope that has already been validated through admissible evidence.

In SEGA-AI terms, the chain is:

Input evidence → local inference → confidence and uncertainty → synchronization check → causality check → TCG admissibility gate → bounded output authority
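One way to picture that chain is as an ordered series of checks, each able to withhold authority. The Python sketch below assumes hypothetical thresholds (MIN_CONFIDENCE, MAX_SKEW_MS), hypothetical field names, and a three-level Authority enum; none come from the article, and a real ADAS safety case would derive such limits from validated evidence.

```python
from dataclasses import dataclass
from enum import Enum


class Authority(Enum):
    FULL = "full"        # output may drive action directly
    BOUNDED = "bounded"  # output may act only inside a pre-qualified safety envelope
    NONE = "none"        # fall back to degraded mode


@dataclass
class InferenceResult:
    label: str
    confidence: float          # model confidence in [0, 1]
    sensor_skew_ms: float      # worst timestamp skew among fused sensor inputs
    causally_consistent: bool  # e.g., every input precedes the output in one epoch
    tcg_admissible: bool       # outcome of the admissibility gate
    in_safety_envelope: bool   # inside an envelope validated earlier via admissible evidence


# Hypothetical thresholds, for illustration only.
MIN_CONFIDENCE = 0.9
MAX_SKEW_MS = 5.0


def output_authority(r: InferenceResult) -> Authority:
    # Synchronization and causality checks come first, as in the chain.
    if r.sensor_skew_ms > MAX_SKEW_MS or not r.causally_consistent:
        return Authority.NONE
    # Admissible, confident results earn full authority.
    if r.tcg_admissible and r.confidence >= MIN_CONFIDENCE:
        return Authority.FULL
    # Otherwise, act only within a pre-qualified envelope, if one applies.
    return Authority.BOUNDED if r.in_safety_envelope else Authority.NONE


print(output_authority(InferenceResult("pedestrian", 0.97, 2.0, True, True, True)))
print(output_authority(InferenceResult("pedestrian", 0.97, 20.0, True, True, True)))
```

The ordering matters: a highly confident inference built on unsynchronized inputs is stopped before confidence is even considered.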

This is why edge AI and ADAS show the difference between data movement and evidence governance. The NoC may help move sensor data, model data, and inference results; but SEGA-AI governs whether the observed behavior is trustworthy enough to support diagnosis, intervention, degraded mode, fleet learning, or safety-critical action.

From execution fabric to governance framework

The NoC is an execution fabric; SEGA-AI is a governance framework. The NoC helps the chip move data; SEGA-AI helps the system determine whether observed behavior can be trusted as evidence. These roles are complementary.

As AI silicon becomes more complex, the industry will need both: data-movement architecture to move information efficiently inside the chip, and evidence-governance architecture to determine whether observed behavior can support root-cause analysis, corrective action, lifecycle refinement, or fleet learning.

This becomes increasingly important as systems move from design into package, board, validation, deployment, runtime adaptation, and field operation. And this discussion is not only theoretical. If realized AI systems require governed evidence, then implementation must account for evidence maturity from the beginning.

That means the design and validation plan must define not only what data moves, but what data must later be observable, timestamped, correlated, and qualified. For example, if post-silicon validation or field operation needs to distinguish NoC congestion from P&R density, package thermal resistance, memory-controller contention, or firmware scheduling, then the required evidence must be designed into the system earlier.

This includes counters, monitors, timestamping, workload tags, synchronization epochs, sensor placement, firmware traceability, package-state linkage, and validation correlation methods. In SEGA-AI terms, the theoretical model becomes practical only when it’s translated into implementation artifacts: evidence fields, admissibility checks, traceability rules, synchronization requirements, gate criteria, diagnostic workflows, and lifecycle feedback paths.

This is why the next step after governance theory is implementation specification. A system cannot govern evidence it never planned to observe.

Silicon governance complementing NoC

AI silicon performance depends heavily on data movement. NoC architectures are essential because they organize internal communication among compute, memory, accelerators, controllers, chiplet interfaces, and I/O. But NoC observability is not the same as causality.

A latency spike, hotspot, voltage droop, or accelerator stall may involve NoC behavior, but it may also be driven by timing, P&R, power delivery, package thermal paths, firmware policy, workload behavior, or system-level conditions.

However, the role of SEGA-AI is not to replace NoC design. The role of SEGA-AI is to govern the evidence required before symptoms become conclusions and before conclusions become decisions.

For AI silicon, the next challenge is therefore not only moving data efficiently. It’s qualifying observed behavior into admissible, causally grounded, convergence-authoritative evidence. In short, interoperability moves data; admissibility qualifies evidence; and governed convergence closes decisions.

Dr. Moh Kolbehdari is senior director of IC/packaging at Socionext US.


The post From AI silicon observability to governed evidence appeared first on EDN.

5 takeaways from Samsung Foundry’s design tie-up with Synopsys

Thu, 05/28/2026 - 20:00

A fundamentally new approach is required to fuse AI-driven automation and multiphysics intelligence across the entire design and manufacturing flow. That was the crux of the keynote by Synopsys president and CEO Sassine Ghazi at the SAFE Forum 2026, held by Samsung Foundry in San Jose, California.

Ghazi especially mentioned design and technology co-optimization (DTCO) initiatives for synthesis and layout, as well as sign-off, delivering meaningful power, performance, and area (PPA) enhancements. He also talked about the design partnership between Samsung Foundry and Synopsys, which encompasses production-ready, AI-powered EDA tools, certified interface IP, and silicon-based test capabilities.

Hyung-Ock Kim, VP and head of the Foundry Design Technology Team at Samsung Electronics, echoed similar views, stressing the need for close alignment across design, test, and manufacturing to ensure the success of AI and multi-die designs on advanced nodes.

He also presented an update on Samsung Foundry’s collaboration with Synopsys on production-ready, AI-powered digital and analog flows. “Our continued close collaboration with Synopsys delivers silicon-based, customer-validated solutions that help our customers reduce design integration risk, improve silicon predictability, and move confidently from design to production for their most innovative solutions,” Kim said.

Ravi Subramanian, chief product management officer at Synopsys, briefed on AI-powered digital and analog flows for Samsung’s second- and third-generation 2-nm processes. “As designs become more heterogeneous, customers need production-ready, silicon-proven solutions that address complexity and minimize risk from silicon to systems,” he said. “Our work with Samsung Foundry translates years of DTCO and silicon learning into enablement that helps our customers get their advanced designs to market quickly and with confidence.”

The partnership encompasses AI-powered EDA flows, multiphysics sign-off, interface IPs, and silicon-based test patterns. Source: Synopsys

Below are the five key tenets of this design partnership between Samsung Foundry and Synopsys.

  1. Production-ready digital and analog flows for 2-nm process

As part of DTCO initiatives, Synopsys Fusion Compiler delivers measurable power and performance improvements in the third-generation 2-nm class process compared to the second-generation 2-nm class process.

  2. Sign-off with certified multiphysics capabilities

Synopsys PrimeShield process sensitivity analysis and PVT Explorer support design-specific optimization and engineering change order (ECO) decisions during sign-off, yielding a frequency improvement of up to 2.7% with no more than 5% leakage-current degradation. Moreover, Synopsys Totem-SC, a newly certified electromigration (EM) and IR drop analysis solution, improves power integrity and reliability for silicon designs in second-generation 2-nm and 4-nm class processes.

  3. 3DIC with hybrid copper bonding

Samsung Foundry and Synopsys have joined hands to enable scalable 3D multi-die designs through certified multiphysics signoff solutions delivered within Synopsys 3DIC Compiler, a unified exploration-to-signoff platform being validated on a hybrid copper bonding (HCB) 3D test chip.

This platform brings together planning, implementation, and multiphysics analysis to enable co-optimization across integrated compute, memory, and advanced packaging systems for Samsung’s 3DIC solutions with HCB technology. And it replaces manual, margin-based approaches with automated, AI-driven system optimization to accelerate productivity and enhance the quality of results (QoR).

  4. Interface and foundation IP portfolio

Synopsys offers a broad portfolio of IPs across Samsung Foundry’s advanced processes, ranging from 14-nm, 8-nm, and 5-nm processes to the latest 4-nm and second-generation 2-nm nodes. The interface IP offerings cover UCIe, PCIe 7.0, 112G/224G, MIPI, LPDDR6, DDR5 MRDIMM Gen2, and USB4. Likewise, its foundation IPs include embedded memories, logic libraries, GPIOs, security IP, and Silicon Lifecycle Management (SLM).

  5. AI-powered tests

Samsung Foundry and Synopsys are also applying silicon-proven methodologies to design-for-test (DFT) and manufacturing test capabilities to reduce test cost and improve test quality for designs on advanced process nodes. Furthermore, physically aware tests and failure diagnosis at the die and multi-die level improve test quality and failure analysis turnaround time with results validated on silicon at Samsung Foundry.

For instance, Samsung Foundry teams used Synopsys TestMAX along with AI-assisted automatic test pattern generation (ATPG) technologies to reduce test patterns and test cycles by up to 20%. Samsung Foundry customers leveraging these AI-powered, silicon-based design and manufacturing test capabilities report test efficiency improvements of up to 20%.


The post 5 takeaways from Samsung Foundry’s design tie-up with Synopsys appeared first on EDN.

De-commingling (?) LAN equipment: It’s all in what you call it

Thu, 05/28/2026 - 15:00

A welcome career transition (and employer-responsibility expansion) begs for a hardware-plus-software evolution. Hold his beer; this engineer’s got this.

As some of you may have already noticed (assuming you even care about such things), my relationship with EDN recently (and happily) re-deepened. After serving full-time as a (senior, eventually) technical editor from 1997 to 2011, I returned a year later, this time as a content contributor. And now I’ve added associate editor to my EDN repertoire.

“Wait,” you might be asking, “isn’t Aalyia Shaukat the associate editor at EDN?” You’re part-right; for nearly four years, she was. And for a couple of recent months, she (somehow) worked a double shift of jobs. But she’s now the full-time editor-in-chief at Power Electronics News, where she’s already rockin’ the house with her talent abundance. And I’m grateful to follow in her EDN associate editor footsteps, along with continuing my own frequent content-contribution cadence.

What’s this all got to do with “de-commingling (or if you prefer simpler vocabulary, “separating”) LAN equipment”? An excellent question. Now that I’m more intimately interacting with the EDN website and other publication (and publisher, and corporate owner) resources and services, I needed to set up a standalone computer so that nothing attacking my home office LAN could make its way to the corporate network and other facilities, too. That said, I remained heavily broadband-reliant. And I wasn’t up for setting up a completely separate Comcast service connection just for a single (albeit also a singularly important) computer. What to do?

Just call me “guest”

That last part was actually the easiest part to solve, it turns out. My home LAN, as mentioned before, is based on a multi-node mesh implemented using multiple Google Nest Wifi routers, with the primary one connected to the cable modem in the furnace room.

One nifty nuance of the Google Nest Wifi system (shared, mind you, not only by other Google LAN equipment generations but also by gear from other suppliers) is that you can set up a distinct “guest” network that by default (which I’ve left unchanged in my case) is packet-isolated from the main LAN beyond their shared WAN connection.

The computer I’m dedicating to my EDN associate editor work is one you’ve seen before: a Microsoft Surface Pro 7+ (SP7+), mated with my longstanding tech-gear companion, a Kensington Dock.

LAN-migrating the SP7+ was easy-peasy. I disconnected the wired Ethernet cable from the back of the Kensington Dock, switched the computer from my main “RockyMountainBri” wireless network to “RockyMountainBri-guest”, and…that was it. And since my Brother multifunction laser printer was right next to the computer, I didn’t even need to bother migrating the MFC to the guest wireless network (forgoing printing support for the rest of my LAN in the process). I just ran a USB cable from the Kensington Dock to it, and…I was done. Perhaps obviously, by the way, any real guests are no longer able to use my “guest” wireless network.

Split personality

How do I handle the fact that, still acting as a contributing editor along with my other contributor colleagues, I’m now in effect submitting content to myself for subsequent publication, now wearing my associate editor hat? My contributing editor workflow is unchanged, actually. The only thing that’s different is the email address I now send my stuff to.

It used to be that I’d submit content from my personal email account to Aalyia’s corporate email address. Now, instead, it’s my corporate email address that the goods go to. I’m still using one of my other systems for initial writing—typically but not always a Mac. But, to maintain “firewall” purity between my newly transformed associate editor work system and the rest, I exclusively receive corporate email (and don’t send or receive personal email) on the SP7+.

Going loc(al, not o)

And what about backing up and archiving all this content I’m now receiving? Regular readers may remember that I’ve long been a fan, along with a frequent implementer and upgrader, of network-attached storage (NAS) for such (and other) purposes. That said, unless I wanted to dedicate a NAS solely to my “guest” network and connect it exclusively over slow Wi-Fi, I was going to need to transition to some other solution.

Therein lies the admittedly and intentionally somewhat obscure title of this piece. Instead of network-attached storage, I wanted something locally tethered. It had to be at least a dual-drive configuration, with RAID 1 support so I didn’t lose everything if a hard drive died. And ideally it’d run hardware RAID to avoid bogging down the computer. Yes, I know, if the RAID controller fails, you’re dead in the water, too, which is why I also wanted something that was reasonably popular. That way, I could, if necessary, find a replacement to slot the HDDs into without too much trouble.

I figured I’d start my search using the term “DAS”, for direct-attached storage. Interface technologies I’d used in the past (FireWire, Thunderbolt, and eSATA among them) weren’t relevant to this particular hardware configuration, so I settled on USB 3.x, as fast a flavor as possible, over USB-C. My (perhaps imperfect) search yielded exactly one result, QNAP’s TR-002, which ironically is primarily intended to capacity-expand the company’s NASs but can also find use as a standalone storage peripheral.

Tomato, tomahto

At this point, I reset my lingo-options list, expanding beyond “DAS” to also include “enclosure”. That change helped a lot from a results-options list length standpoint. What I’ve ended up with is the Mercury Elite Pro Dual from a company I’ve mentioned multiple times before, Other World Computing (aka OWC), which I bought open-box (with a 1-year warranty) for $167.50.

It’s hardware RAID-based, supporting four different operating modes (albeit only one at a time):

  • RAID 0 “Drive Striping”
  • RAID 1 “Drive Mirroring” (the mode I’m using)
  • Span, and
  • Independent Drives
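To make the trade-offs among these four modes concrete, here is a quick Python sketch of what each means for a pair of 3 TB drives like the ones repurposed later in this piece. It uses the standard textbook definitions of each mode and assumes the enclosure behaves like a generic two-bay RAID box; it is not based on OWC documentation, and its handling of mismatched drive sizes in particular may differ.

```python
def raid_summary(mode: str, drives_tb: list) -> dict:
    """Usable capacity, volume count, and single-failure tolerance for a
    multi-bay enclosure mode, using textbook definitions of each mode."""
    smallest, total = min(drives_tb), sum(drives_tb)
    if mode == "raid0":  # striping: each drive contributes the smallest drive's size
        return {"usable_tb": smallest * len(drives_tb), "volumes": 1, "survives_one_failure": False}
    if mode == "raid1":  # mirroring: each drive holds a full copy
        return {"usable_tb": smallest, "volumes": 1, "survives_one_failure": True}
    if mode == "span":  # concatenation: capacities add, no redundancy
        return {"usable_tb": total, "volumes": 1, "survives_one_failure": False}
    if mode == "independent":  # separate volumes: a failure loses that drive's data
        return {"usable_tb": total, "volumes": len(drives_tb), "survives_one_failure": False}
    raise ValueError(f"unknown mode: {mode}")


for mode in ("raid0", "raid1", "span", "independent"):
    print(mode, raid_summary(mode, [3, 3]))
```

Mirroring is the only mode that halves usable capacity (3 TB instead of 6 TB), and also the only one that survives a single drive failure without data loss.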

Its interface to the computer is 10 Gbps USB 3.2; perhaps obviously, I’m direct-connecting it to the SP7+ versus going through the Kensington Dock intermediary. It also embeds a three-port hub, a particularly attractive proposition given the SP7+’s dearth of integrated connections. And here’s a rarity (as I’ve written about before): the hub’s USB-C and dual USB-A ports are all 10 Gbps peak bandwidth-capable, too.

Why, you might be asking, did I go with HDDs instead of SSDs? I’ll turn around and ask you a question in response to yours: have you priced SSDs lately? That said, HDD price tags have also been skyrocketing lately, although HDDs still hold a tangible edge over solid-state alternatives, especially at higher capacities. And in my case, I thankfully was able to repurpose a couple of spare 3TB HDDs I’d already bought in the “before times” and still had sitting around unused (I’ll have more to say here in an already-planned upcoming follow-up post).

Software completes the magic trick

The last, but not the least, question: how to integrate it with my computer for mirroring and broader backup purposes? I planned on consistently using the SP7+’s upgraded-by-me 1 TByte SSD as primary storage of in-process and completed associate editor work, so one-way mirroring (versus two-way syncing) of that portion of the SSD to external storage would be fine.

But I wanted that mirroring to be file-by-file, not lumped together into some unified-file or otherwise nonstandard format (Apple’s Time Machine, for example) that would make it difficult to resurrect the contents if primary storage in the computer failed, say, or if I needed to physically pass the external storage device to someone else. And, of course, I was also looking for an inexpensive solution, so open source or another free source would be best.

I found my solution in a two-part open-source program suite, developed and maintained by the FreeFileSync project and supporting Linux, macOS, and Windows platforms. FreeFileSync itself does the sync-and-mirror heavy lifting for both files and the folders containing them. And the closely related RealTimeSync monitors directories for content changes, which then kick off FreeFileSync (or any other operation more broadly).

This discussion thread was very helpful when I was setting up RealTimeSync and FreeFileSync on my system. And ever since then, it’s run like a charm; the only time it pauses is when it detects an abnormally large number of changes (multiple directories-and-files moved at once) and wants my OK before it proceeds.
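The difference between one-way mirroring and two-way syncing can be sketched as a pure planning function: the target is made to match the source, and files that exist only on the target are deleted rather than copied back. The Python below is a hypothetical illustration, not FreeFileSync’s code or configuration format. It assumes files are compared by size and modification time, and its max_changes threshold is an arbitrary stand-in for the “abnormally large number of changes” pause described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    size: int     # bytes
    mtime: float  # modification time, seconds since the epoch


def mirror_plan(source: dict, target: dict) -> dict:
    """Plan a one-way mirror from source to target (both map relative path -> FileInfo).

    New or changed source files are copied; files found only on the target are
    deleted. Nothing ever flows from target back to source.
    """
    copy = sorted(path for path, info in source.items() if target.get(path) != info)
    delete = sorted(path for path in target if path not in source)
    return {"copy": copy, "delete": delete}


def needs_confirmation(plan: dict, max_changes: int = 100) -> bool:
    """Ask the user before running a plan with unusually many operations."""
    return len(plan["copy"]) + len(plan["delete"]) > max_changes


src = {"drafts/a.docx": FileInfo(1200, 10.0), "drafts/b.docx": FileInfo(800, 20.0)}
dst = {"drafts/a.docx": FileInfo(1200, 10.0), "old/c.docx": FileInfo(500, 5.0)}
plan = mirror_plan(src, dst)
print(plan)  # {'copy': ['drafts/b.docx'], 'delete': ['old/c.docx']}
print(needs_confirmation(plan, max_changes=1))  # True
```

The delete list is what distinguishes a mirror from a backup that only accumulates, and it is also why a confirmation step before large batches is worth having: an accidental mass move on the source would otherwise propagate as a mass deletion on the target.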

Oh, and by the way…since I’ve got plenty of empty capacity available, at least at this early stage in my associate editor career, I’m also using the OWC Mercury Elite Pro Dual more broadly as a successor to the NAS for my ongoing computer-wide backup purposes using Windows’ built-in File History and (deprecated but still functional) Backup and Restore facilities that I’ve mentioned before. With that, I’ll wrap up for today. I hope what I’ve shared will be of help to at least some of you in similar configuration situations either now or in the future. As always, please share your thoughts on what’s worked (or not) for you in the comments!

Brian Dipert is the associate editor, as well as a contributing editor, at EDN.


The post De-commingling (?) LAN equipment: It’s all in what you call it appeared first on EDN.

Taming the beast: Memory efficiency in an AI/crypto world

Thu, 05/28/2026 - 10:23

The planet is facing a crisis in energy demand versus supply, and data centers are at the center of this dilemma due to the increasing demand from new data-intensive applications. This article will explore the causes of data center inefficiency and speculate on methods to improve efficiency. It will also acknowledge the U.S. Department of Energy’s analysis on energy efficiency, which provides a basis for this work.

Energy demand and where it’s being used

The announcement that the Three Mile Island nuclear reactor was being recommissioned to power an AI data center might have been shocking news to some, but it’s no secret in the industry that the exploding demand for energy is outpacing our ability to deliver power to data centers. For the first time, power efficiency is a higher priority for data center architects than the performance of individual components.

Semiconductor Research Corp. modeled this increase in energy demand in the context of the planet’s projected energy generation capacity, which includes the assumption that more nuclear power plants will be deployed. Figure 1 shows a daunting projection, and the potential for the lines of supply and demand to intersect around the year 2055 has the electronics industry rethinking its choices in how data centers can be designed.

Figure 1 The worldwide energy consumption trends show that we will eventually consume more energy than we produce. Source: Stanford University

Sadasivan Shankar at Stanford University broke down where we are spending that energy. In addition to AI, another culprit in energy demand is cryptocurrency. When combined, AI and crypto are already consuming over 1.5% of the planet’s energy. Some projections estimate that their share will increase to 3% by 2030 and 4.4% by 2035 (see Figure 2). Note the scaling for the Y-axis in Figure 2: applications such as cryptocoin mining require 18 orders of magnitude more energy than the base instructions on which the computers operate.

Figure 2 The energy demands for AI and cryptocurrency are orders of magnitude greater than those of other operations. Source: The U.S. Department of Energy

With this in mind, it makes sense to determine the efficiency of a data center by measuring the work accomplished for each watt that is spent. Figure 3 breaks down the power consumption per operation. It’s critical to note that almost every operation in the top two-thirds of the table refers to moving data around, while the bottom third of the table represents data processing.

Figure 3 Data centers consume different amounts of power for different functions. Source: Wolley Inc.

The memory, storage, and communications hierarchy is commonly shown as a pyramid, with processor registers at the top, various levels of cache followed by DRAM, then storage and communications at the bottom. This article will use this simplistic model, as shown later in Figure 5. The pyramid’s biggest issue is that it does not highlight how each resource is on a separate bus. In addition, moving information from one resource to another typically involves multiple movements on many buses, each of which consumes power and generates heat.

Figure 4 shows an example in which an application is read from the disk through the CPU across one channel (for instance, PCIe) to be written to the memory over another channel (for example, DDR), only to be read back to the CPU one cache line at a time to execute the application and store the temporary results back to the memory.

Figure 4 Here is how data movement demands high power. Source: IEEE

The application may read content across a communications channel, such as PCIe to a wide area network, then crunch that data to be written back to the disk. Even in this simple example, it’s obvious that data processing is an exceptionally minor outcome and that data movement is dominant. The percentage of data operated upon rather than moved around is so close to zero as to be unmeasurable.

Why focus on memory?

Memory utilization is a focus area because there is a high potential to make substantial improvements in energy efficiency. Memory consumes as much power as many CPUs, at about 22% of server power. The increasing number of tiers of memory creates both the best and worst of trends.

The good news is that more power-efficient memories are being added closer to the processor. The bad news is that these near-memory tiers have limited capacity and require additional larger-capacity, higher-power memories to keep feeding datasets into the local memory. The power consumption of each tier adds to the total power footprint.

High bandwidth memory (HBM), for example, offers an interface energy of around 1.5 pJ/bit, which compares favorably to a double data rate (DDR) memory module at 15 pJ/bit (see Figure 5). Unfortunately, these memories still burn significant power (for instance, 75 W or 100 W per HBM stack), and they are co-located with the high-power processor on the same substrate. This makes cooling extremely challenging compared to DDR modules, which are around 15 W each but located farther from the processor in areas that may be air-cooled.
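Per-bit energy translates into interface power as bandwidth × 8 bits/byte × energy per bit. This Python sketch applies the two per-bit figures quoted above at a hypothetical sustained 1 TB/s, a round number chosen only for illustration; it counts interface energy alone, not DRAM core, refresh, or controller power.

```python
PJ = 1e-12  # joules per picojoule

# Interface energy per bit, as quoted in the article.
ENERGY_PER_BIT_J = {"HBM": 1.5 * PJ, "DDR module": 15 * PJ}


def interface_power_w(bandwidth_bytes_per_s: float, energy_per_bit_j: float) -> float:
    """Power spent only on moving bits across the memory interface."""
    return bandwidth_bytes_per_s * 8 * energy_per_bit_j


BANDWIDTH = 1e12  # hypothetical sustained 1 TB/s
for name, e_bit in ENERGY_PER_BIT_J.items():
    print(f"{name}: {interface_power_w(BANDWIDTH, e_bit):.0f} W at 1 TB/s")
```

This prints 12 W for HBM versus 120 W for the DDR path. No single DDR module sustains 1 TB/s, so read the result as a per-bit comparison: the 10× ratio holds at any bandwidth.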

Figure 5 Memory and accompanying storage consume considerable amounts of energy. Source: Monolithic Power Systems

Efficiency by tier

Speculation can improve system performance tremendously, but speculation always implies waste as well; even processor registers have implied waste. A system variable with a 32-bit integer that never assumes a value outside the range 1 to 10 has an implied waste factor of 87.5%. Processor caches have very high hit rates of 95% and higher, so one could take the complement of that number to imply a 5% waste. DRAM access efficiency drops the further the memory is from the processor, with direct-attached DDR memory at 27% waste and CXL-attached DDR at over 40% waste.
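The first two waste figures follow from simple arithmetic. The largest value, 10 (binary 1010), fits in 4 bits, so 28 of the 32 bits (87.5%) never carry information. A 95% hit rate leaves 5% of cache lookups as misses. A minimal Python sketch, assuming the variable is treated as unsigned for bit counting:

```python
def register_waste(bits_allocated: int, max_value: int) -> float:
    """Fraction of a register that never carries information, for values in 0..max_value."""
    bits_needed = max_value.bit_length()  # 10 == 0b1010, so 4 bits suffice
    return 1 - bits_needed / bits_allocated


def cache_waste(hit_rate: float) -> float:
    """Treat every miss as wasted speculation: waste is the complement of the hit rate."""
    return 1 - hit_rate


print(f"{register_waste(32, 10):.1%}")  # 87.5%
print(f"{cache_waste(0.95):.1%}")       # 5.0%
```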

These numbers may not sound bad until one considers the activity inside each DRAM behind every cache-line access. The majority of processors operate with a 64-byte cache line. Consider how 64 bytes map to the internal structure of a DRAM. Each DRAM has an internal page buffer of 1 kB, and DRAMs are typically combined into ranks, so 10 DRAMs are energized per access (see Figure 6).

Figure 6 DRAMs are typically combined into ranks for 10 DRAMs energized per access. Source: Monolithic Power Systems

To fulfill a single cache line, a DRAM module is “activated” to read 1 kB from each DRAM into its sense amplifiers, or 10 kB across the width of the module. 64 bytes are read and sent to the processor. DRAM activation is destructive—the cells of the memory core are wiped out by the activation—so the cells must be rewritten from the sense amplifiers back into the core. The math for a single random access is 20 kB moved for 64 bytes of work, or 99.7% waste.

This 0.3% efficiency accounts only for moving a single 64-byte cache line. If that DRAM tier is operating at a 60% hit rate, efficiency drops to 0.18%. If only 1 byte from that cache line was actually needed, the waste factor increases to 99.98%. As you can see in this simple example, data center efficiency is rapidly approaching zero.
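The activation arithmetic above can be reproduced in a few lines. This Python sketch assumes the article’s example values (1 kB page per chip, taken here as 1,024 bytes; 10 chips per rank; a destructive read that is fully written back) and models the hit-rate effect as a simple multiplier on useful bytes.

```python
KIB = 1024  # the article's "1 kB" page buffer, taken here as 1,024 bytes


def activation_efficiency(useful_bytes: int, page_bytes: int = KIB,
                          chips_per_rank: int = 10, hit_rate: float = 1.0) -> float:
    """Useful bytes per byte moved inside a DRAM rank for one random access.

    Activation copies one page per chip into the sense amplifiers, and because it
    is destructive the same data is written back, so twice the rank's page width moves.
    """
    bytes_moved = 2 * page_bytes * chips_per_rank  # 20 KiB in the article's example
    return useful_bytes * hit_rate / bytes_moved


for label, eff in [
    ("64-byte line", activation_efficiency(64)),
    ("64-byte line, 60% hit rate", activation_efficiency(64, hit_rate=0.6)),
    ("1 useful byte", activation_efficiency(1)),
]:
    print(f"{label}: efficiency {eff:.4%}, waste {1 - eff:.4%}")
```

Unrounded, the sketch gives about 0.31% and 0.19% efficiency for the first two cases and roughly 99.995% waste for the single-byte case. These values differ slightly from the article’s rounded figures but lead to the same conclusion.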

Another form of speculation that improves system performance is execution and access speculation, where a processor may pre-load code on both sides of a branch condition in case the branch is taken. Many SSDs do the same, pre-loading pages that may be accessed. These forms of speculation have 100% waste if the branch is not taken or the access is never made.

Total cost of ownership (TCO)

With electricity access becoming a bottleneck for data center expansion, architects are finally acknowledging that total cost of ownership (TCO) is a primary factor driving system design. While processor vendors focus strictly on performance, their customers are forced to determine whether they can power these machines and cool them. By some estimates, cooling a data center is currently consuming 43% of the cost of operating a data center, which is equivalent to the 43% required to run the machines themselves.

This expenditure is driving architects to measure efficiency not only in petaFLOPS but also in petaFLOPS per watt.

Improving memory energy efficiency

Improving the accuracy of speculative accesses is an obvious key to taming memory subsystem power consumption. Similar to telling a doctor “It hurts when I do this,” system architects should ask the question, “Is this speculative access successful often enough to pay for the energy consumed?”

For example, if a CXL memory module is in a memory pool shared by multiple processors, what is the hit rate on any particular bank of DRAM? Should a page (row) be left open, delaying precharge in case of another hit on that row, or be closed, issuing the precharge immediately on the assumption that it will not be accessed again?
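
The open-page versus closed-page question can be explored with a toy model that replays a trace of (bank, row) accesses. This is a sketch with hypothetical relative energy costs. It ignores timing, refresh, and the background power of holding rows open, all of which a real memory controller would weigh.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical relative energy costs (arbitrary units, not datasheet values).
E_ACTIVATE_PRECHARGE = 10.0  # opening a row and (eventually) precharging it
E_COLUMN_ACCESS = 1.0        # one read/write burst from an already-open row

def open_page_energy(trace: Iterable[Tuple[int, int]]) -> float:
    """Open-page policy: leave each row open; pay activate+precharge only on a row miss."""
    open_row: Dict[int, int] = {}  # bank -> row currently held in the sense amplifiers
    energy = 0.0
    for bank, row in trace:
        if open_row.get(bank) != row:
            energy += E_ACTIVATE_PRECHARGE
            open_row[bank] = row
        energy += E_COLUMN_ACCESS
    return energy

def closed_page_energy(trace: Iterable[Tuple[int, int]]) -> float:
    """Closed-page policy: precharge right after every access, so every access activates."""
    return sum(E_ACTIVATE_PRECHARGE + E_COLUMN_ACCESS for _ in trace)

# (bank, row) pairs: three hits on bank 0, row 5, then row misses.
trace = [(0, 5), (0, 5), (0, 5), (1, 7), (0, 9), (0, 5)]
print(open_page_energy(trace), closed_page_energy(trace))  # 46.0 66.0
```

Interleaved requests from several hosts in a pool tend to break up runs of same-row accesses. With no repeated rows, the two policies cost the same in this model, and leaving the page open buys nothing.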

Non-uniform memory access (NUMA) has been used in server architectures for years, allowing tightly coupled processors to share memory resources as demand shifts. However, multiple hops for each memory access can more than triple the power consumed, whereas moving the task to a processor closer to the memory resource can significantly reduce power (see Figure 7). Computational storage, a variation of task relocation, has had some success, though that success has been limited by the state of standards for the tasks executed on the devices.
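
The trade-off between reaching across NUMA hops and relocating the task can be written as a simple break-even test. The per-access and per-hop energies and the one-time migration cost below are hypothetical. They are chosen so that two hops triple the per-access energy.

```python
def remote_access_energy(accesses: int, local_energy: float, hop_energy: float, hops: int) -> float:
    """Energy to serve `accesses` memory accesses, each crossing `hops` interconnect hops."""
    return accesses * (local_energy + hops * hop_energy)

def should_migrate(accesses: int, local_energy: float, hop_energy: float,
                   hops: int, migration_energy: float) -> bool:
    """True if moving the task next to its memory once beats paying the hops on every access."""
    stay = remote_access_energy(accesses, local_energy, hop_energy, hops)
    move = migration_energy + remote_access_energy(accesses, local_energy, hop_energy, 0)
    return move < stay

# Hypothetical units: a local access costs 1, each hop adds 1, so two hops triple the cost.
print(remote_access_energy(1, 1.0, 1.0, 2))      # 3.0
print(should_migrate(1000, 1.0, 1.0, 2, 500.0))  # True: 1500 < 3000
print(should_migrate(100, 1.0, 1.0, 2, 500.0))   # False: 600 > 300
```

The migration only pays off when the task will make enough accesses to amortize its one-time cost.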

Figure 7 For a server DRAM module, moving the task to a processor closer to the memory resource can significantly reduce power. Source: Monolithic Power Systems

Similarly, placing data in the appropriate tier of memory can have a significant impact on energy consumption. Figure 8 classifies data by “temperature”: hot data is accessed often, and cold data less often.
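
A minimal way to assign data temperature is to count accesses per page over a sampling window and map the counts to tiers. The thresholds and tier names below are hypothetical placeholders, not recommendations. Pages that were never accessed do not appear in the log and would stay wherever they already live.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical thresholds, in accesses per sampling window.
HOT_THRESHOLD = 100
WARM_THRESHOLD = 10

def classify(access_log: Iterable[int]) -> Dict[int, str]:
    """Map each page number to a (hypothetical) tier based on its access count."""
    counts = Counter(access_log)
    placement: Dict[int, str] = {}
    for page, n in counts.items():
        if n >= HOT_THRESHOLD:
            placement[page] = "local DRAM"  # hot
        elif n >= WARM_THRESHOLD:
            placement[page] = "CXL memory"  # warm
        else:
            placement[page] = "SSD"         # cold
    return placement

log = [1] * 150 + [2] * 20 + [3] * 2  # page 1 hot, page 2 warm, page 3 cold
print(classify(log))  # {1: 'local DRAM', 2: 'CXL memory', 3: 'SSD'}
```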

Figure 8 Map data based on how often it’s accessed to determine its temperature (where “hotter” data is accessed more often). Source: Monolithic Power Systems

Persistent memory is a system option that can be exploited for data reliability. Persistent memory is either based on a memory technology that does not lose its contents when power fails (for example, MRAM) or uses an energy source to maintain data integrity by saving DRAM contents to a non-volatile memory (NVM), such as flash, on power failure. Persistent memory can also significantly reduce system power by eliminating the need for “checkpointing,” or saving intermediate results (see Figure 9). In many systems, checkpointing is responsible for 7% to 8% of system traffic, and therefore of power.

Figure 9 Persistent memory can reduce checkpointing. Source: Monolithic Power Systems

Hybrid memory modules that combine storage and directly accessed memory on the same module are also available to minimize system traffic. For example, flash memory mounted as an SSD can be coupled with DRAM that is accessed directly, one cache line at a time. The efficiency advantage of hybrid modules follows from a statistic: of the typical 4-kB block moved from an SSD to system memory, only about 100 bytes are used on average, an efficiency of only about 2.5%.
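
The block-versus-cache-line comparison works out as follows. The sketch takes 4 kB as 4,096 bytes, which gives about 2.4% rather than the rounded 2.5%. It also assumes, hypothetically, that the 100 useful bytes are contiguous and fall within two 64-byte cache lines, the best case for direct access.

```python
BLOCK_BYTES = 4096     # typical 4-kB block moved from SSD to system memory
AVG_USED_BYTES = 100   # average bytes actually used, per the statistic above
CACHE_LINE_BYTES = 64

def transfer_efficiency(used: int, moved: int) -> float:
    """Fraction of transferred bytes that are actually used."""
    return used / moved

ssd_path = transfer_efficiency(AVG_USED_BYTES, BLOCK_BYTES)
# Best case (assumed): the used bytes fit in the minimum number of whole cache lines.
lines = -(-AVG_USED_BYTES // CACHE_LINE_BYTES)  # ceiling division -> 2 lines
hybrid_path = transfer_efficiency(AVG_USED_BYTES, lines * CACHE_LINE_BYTES)
print(f"SSD block: {ssd_path:.1%}, cache-line access: {hybrid_path:.1%}")
# SSD block: 2.4%, cache-line access: 78.1%
```

Misaligned data spanning three lines would lower the direct-access figure, but it would remain far above the block-transfer case.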

Software has a huge impact on efficiency

Hardware cannot fix every challenge; software plays a significant role in taming this beast, too. Looking at the energy consumed by data type, complex and large data types such as floating point use orders of magnitude more energy than integer math (see Figure 10). Taking advantage of this may be as simple as programmers considering the range of values their variables actually need. For example, in “for (i=0; i<10; i++)”, i does not need to be a 32-bit counter.
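
Figure 10 concerns the energy of arithmetic operations, which Python cannot measure directly. A related effect is easy to show, though: the width of a data type sets how many bytes must be stored and moved. This sketch uses the standard-library array module. The element count is arbitrary, and the sizes printed are those of the platform's C types (1, 2, 4, and 8 bytes on mainstream platforms).

```python
from array import array

N = 1_000_000  # elements in a hypothetical working array

# The same small values (0..9) stored at four different widths.
for typecode, label in [("B", "8-bit unsigned int"), ("H", "16-bit unsigned int"),
                        ("f", "32-bit float"), ("d", "64-bit float")]:
    a = array(typecode, [i % 10 for i in range(N)])
    mb = a.itemsize * len(a) / 1e6  # total bytes to store (and move), in MB
    print(f"{label:>20}: {a.itemsize} byte(s)/element, {mb:.0f} MB")
```

Narrow types pay off most in data that is stored and streamed in bulk, such as arrays, because every byte saved is a byte the memory subsystem never has to move.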

Figure 10 Software plays a significant role in energy consumption. Source: The U.S. Department of Energy

The choice of variable types is sometimes the result of using the wrong programming language for the task (see Figure 11). Not all programming languages offer much flexibility in choosing data types for variables, and the impact is magnified tremendously by the matrix math in AI applications, which are commonly written in Python. Python has another energy-consuming characteristic: source code is compiled to bytecode and then interpreted by a virtual machine, whereas C compiles to native processor code.
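
The bytecode step is visible from Python itself through the standard dis module. This sketch disassembles a hypothetical pure-Python dot product. The exact instructions differ between CPython versions, so no particular listing should be expected, but each loop iteration runs several bytecode instructions that the interpreter must fetch and dispatch.

```python
import dis

def dot(xs, ys):
    """Pure-Python dot product (illustrative example)."""
    total = 0
    for x, y in zip(xs, ys):
        total += x * y
    return total

# Print the bytecode that CPython's virtual machine interprets for this function.
dis.dis(dot)

ops = [ins.opname for ins in dis.get_instructions(dot)]
print(len(ops), "bytecode instructions in the function body")
```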

Figure 11 Programming languages can be ranked based on their energy consumption. Source: Wireunwired Research

You can’t fix what you can’t measure

Measuring runtime power is key to tuning efficiency. The voltage regulators for memory modules, such as the MPQ8894, MPQ8895, and MPQ8896, are power management integrated circuits (PMICs) with an integrated system management interface supporting I2C, I3C, or SidebandBus. This interface allows the host system to interrogate the PMIC while the system is running. The current on each voltage rail can be read from the PMIC to calculate the total power of the memory module while test and measurement programs, or even customer applications, are running.
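
Once per-rail readings are available, module power is the sum of voltage times current over the rails. This sketch does not talk to real hardware. The rail names and readings are hypothetical stand-ins for values the host would read from the PMIC's telemetry registers over its management interface, and no register map is assumed.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RailReading:
    name: str     # hypothetical rail label
    volts: float  # output voltage reported for the rail
    amps: float   # output current reported for the rail

def module_power_watts(readings: List[RailReading]) -> float:
    """Sum P = V * I over every rail to get the module's output power."""
    return sum(r.volts * r.amps for r in readings)

# Illustrative values only; real values would come from PMIC telemetry.
readings = [
    RailReading("rail_a", 1.1, 3.0),
    RailReading("rail_b", 1.1, 2.5),
    RailReading("rail_c", 1.8, 0.5),
]
print(f"{module_power_watts(readings):.2f} W")  # 6.95 W
```

The result is the power delivered on the output rails. The PMIC's own conversion losses come on top of it, which is why regulator efficiency matters.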

Triggers may be configured into the PMICs, and these devices can keep logs of any conditions that exceed the expected maximums. The host system may respond to the triggers by reading the telemetry registers and then acting on those conditions, such as by throttling applications that exceed system-imposed limits.
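
On the host side, responding to a trigger amounts to reading the telemetry, comparing it against system-imposed limits, logging any exceedances, and deciding whether to throttle. The sketch below models only that decision. The limits, rail names, and readings are hypothetical, and the actual throttling mechanism is left to the platform.

```python
from typing import Dict, List

# Hypothetical system-imposed per-rail current limits, in amps.
LIMITS_A = {"rail_a": 4.0, "rail_b": 4.0, "rail_c": 1.0}

def check_telemetry(sample: Dict[str, float], log: List[str]) -> bool:
    """Log any rail above its limit; return True if the host should throttle."""
    exceeded = False
    for rail, amps in sample.items():
        limit = LIMITS_A.get(rail)
        if limit is not None and amps > limit:
            log.append(f"{rail}: {amps:.2f} A exceeds {limit:.2f} A")
            exceeded = True
    return exceeded

events: List[str] = []
print(check_telemetry({"rail_a": 3.2, "rail_b": 4.6, "rail_c": 0.4}, events))  # True
print(events)  # ['rail_b: 4.60 A exceeds 4.00 A']
```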

Choosing the right PMIC is also a power-saving measure. A 4% improvement in power regulation efficiency compared to competing solutions results in a 2% reduction in total data center power. For a typical installation consuming 300 megawatt-hours (MWh) per year, this would save 6 MWh and roughly 4 metric tons of CO2 emissions per year.
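
The savings figures can be checked with simple arithmetic. This sketch reads the 300 MWh figure as annual consumption, which is an interpretation. It then works backward from the quoted 4 metric tons to the grid carbon intensity that estimate implies.

```python
annual_energy_mwh = 300.0  # read as yearly consumption (interpretation)
total_reduction = 0.02     # 2% total reduction quoted for a 4% better regulator
co2_saved_tonnes = 4.0     # quoted estimate

saved_mwh = annual_energy_mwh * total_reduction
implied_t_per_mwh = co2_saved_tonnes / saved_mwh  # grid intensity the estimate implies
print(f"saved {saved_mwh:.1f} MWh/yr, implied {implied_t_per_mwh:.2f} t CO2/MWh")
# saved 6.0 MWh/yr, implied 0.67 t CO2/MWh
```

The CO2 saving scales linearly with grid carbon intensity, so a cleaner supply would yield proportionally less for the same 6 MWh.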

The power balancing act

Data centers are projected to keep increasing their power demands until further expansion becomes physically or financially impossible. As a result, total cost of ownership has become a focus for all data center architects as they balance their customers’ performance needs against the reality of providing those services cost-effectively.

Data center efficiency, as measured by data processed vs. data moved, is embarrassingly low. However, there are several ways to improve it, from cache management parameters to speculation priorities. Resource and job allocation across NUMA architectures and fabrics such as CXL enables new classes of optimization.

The careful selection of energy-efficient components such as voltage regulators can play a significant role in reducing a data center’s energy use. Every percentage point of efficiency improvement leads to major reductions in CO2 emissions, a major greenhouse gas. Voltage regulators with integrated telemetry, for instance, support a holistic view of the system, providing high efficiency coupled with methods for measuring and fine-tuning the solution to achieve optimal power savings.

Software plays a huge role in efficiency as well, from the low-level choice of data types to the choice of programming language for each task. In addition, measuring system efficiency at runtime helps data center operators monitor the health of the system and gives insight into ways to improve or limit power as needed. Telemetry information also helps system software understand where energy is being used.

Most importantly, TCO analysis requires a change in mindset from operations per second to operations per watt-hour, a major shift forced on the industry by skyrocketing power demand. The use of high-efficiency voltage regulators helps reduce data center energy usage, which lowers the cost of providing data services.

Bill Gervasi is principal memory solutions architect at Monolithic Power Systems.


The post Taming the beast: Memory efficiency in an AI/crypto world appeared first on EDN.
