EDN Network

Voice of the Engineer

Surface mount and microwaves

Fri, 06/12/2026 - 15:00

Upside-down mounting can deliver inductance upsides for surface mount passives and other components.

Consider the structure of a surface mount resistor, as shown in Figure 1:


Figure 1 Surface mount resistor constituents include this writeup’s showcase electrical contacts.

Normally this part would be installed on a circuit board with the outer coating visible for inspection and with the substrate adjacent to the circuit board’s surface. However, if the circuit board’s goodies are operating at microwave frequencies, this might not be the best idea.

There is an alternative, however, as shown in Figure 2:


Figure 2 A surface mount resistor in its normal-service mounting orientation (a) may be re-positioned for optimal microwave-service operation (b).

If the surface mount resistor is installed on the circuit board “upside down”, the inductances presented by the electrical contacts will be much lower than with the usual mounting. At microwave frequencies this can be significant, especially if the resistance is 50 Ω in a matched-impedance application.
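
To get a feel for why contact inductance matters at these frequencies, here is a minimal sketch that computes the inductive reactance X = 2πfL and the resulting return loss when a 50 Ω resistor with series parasitic inductance terminates a 50 Ω line. The inductance values (1 nH and 0.2 nH) and the 10 GHz test frequency are illustrative assumptions, not figures from the article or from any datasheet.

```python
import math


def inductive_reactance_ohms(freq_hz: float, inductance_h: float) -> float:
    """Reactance of an ideal inductor: X = 2 * pi * f * L."""
    return 2.0 * math.pi * freq_hz * inductance_h


def return_loss_db(series_reactance_ohms: float, z0: float = 50.0) -> float:
    """Return loss of a z0 resistor in series with a reactance, on a z0 line."""
    z_load = complex(z0, series_reactance_ohms)
    gamma = (z_load - z0) / (z_load + z0)  # reflection coefficient
    if abs(gamma) == 0.0:
        return math.inf  # perfect match, nothing reflected
    return -20.0 * math.log10(abs(gamma))


if __name__ == "__main__":
    freq = 10e9  # assumed test frequency, 10 GHz
    # Hypothetical parasitic inductances for the two mounting orientations.
    for label, l_henry in (("usual mounting", 1e-9), ("upside down", 0.2e-9)):
        x = inductive_reactance_ohms(freq, l_henry)
        print(f"{label}: X = {x:.1f} ohm, return loss = {return_loss_db(x):.1f} dB")
```

Under these assumed values, the reactance of the larger inductance is already comparable to the 50 Ω resistance itself, which is why the match degrades so noticeably.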

John Dunn is an electronics consultant and a graduate of The Polytechnic Institute of Brooklyn (BSEE) and of New York University (MSEE).


The post Surface mount and microwaves appeared first on EDN.

Bring-up and testing of systems with CXL Type 3 memory expanders

Fri, 06/12/2026 - 10:32

This series of articles is written for system bring-up engineers, post-silicon validation engineers, platform firmware developers, kernel and driver integrators, and test architects who are—or will soon be—working with Compute Express Link (CXL) Type 3 memory expanders in real hardware. If your job involves taking a server from first power-on to production-ready memory expansion, reconciling what firmware advertises with what the operating system actually consumes, or explaining why a workload is “slow on CXL” when link training looks clean, this material is aimed at you.

This mini-series assumes you already understand PCIe fundamentals and have a working mental model of CXL device types and topologies. It does not re-teach CXL from first principles; instead, it focuses on the practical cross-layer problems that dominate bring-up and validation: discovery versus usability, non-uniform memory access (NUMA) placement versus link health, and policy configuration versus silicon defects.

How this mini-series will help

CXL Type 3 memory is deceptively familiar. From software’s perspective, it looks like RAM; from a validation perspective, it behaves like a small distributed system spanning expander ASIC firmware, host BIOS, ACPI tables, kernel drivers, and user-space tooling. Failures at one layer often masquerade as symptoms at another—a missing NUMA node that is really an HDM validity problem, or a “slow” benchmark that is really default allocator placement on far memory.

This mini-series gives you a structured playbook to:

  • Set performance and correctness expectations using the latency–capacity pyramid and NUMA topology, so you know when a workload should tolerate CXL-attached memory and when it will not.
  • Verify platform prerequisites across CPU, BIOS, kernel, and device firmware before spending days on the wrong debug path.
  • Use the standard Linux tooling chain—cxl, ndctl, daxctl, numactl, lspci—to distinguish “device not seen,” “device seen but not consumable,” and “device online but misconfigured.”
  • Walk the boot timeline from slot power and DRAM training through DVSEC discovery, decode programming, CDAT delivery, and driver bind, with a validation mindset at each gate.
  • Interpret transport-layer and CXL-specific configuration-space indicators, run targeted memory traffic, and separate link issues from NUMA policy and memory-mode configuration faults.

The goal is to reduce time spent debugging the wrong layer and to give you checklists and command-level examples you can adapt into lab gates, CI smoke tests, and field triage runbooks.
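
As one way to turn the three triage states above into a smoke-test gate, the sketch below classifies a set of observations that a script might gather from tools such as lspci, cxl list, and numactl --hardware. The `Observations` fields, the `Triage` labels, and the decision order are hypothetical choices made for illustration; they are not taken from the series and would need adapting to a real platform.

```python
from dataclasses import dataclass
from enum import Enum


class Triage(Enum):
    NOT_SEEN = "device not seen"
    NOT_CONSUMABLE = "device seen but not consumable"
    MISCONFIGURED = "device online but misconfigured"
    OK = "device online and configured as expected"


@dataclass
class Observations:
    """Hypothetical summary of what the tooling reported."""
    pci_endpoint_present: bool   # e.g. the expander shows up in lspci
    memdev_count: int            # e.g. number of memdevs listed by `cxl list`
    memory_usable: bool          # e.g. a NUMA node or DAX device exists for it
    matches_expected_mode: bool  # site-specific policy check (assumption)


def classify(obs: Observations) -> Triage:
    """Walk from the lowest layer upward and report the first gap found."""
    if not obs.pci_endpoint_present:
        return Triage.NOT_SEEN
    if obs.memdev_count == 0 or not obs.memory_usable:
        return Triage.NOT_CONSUMABLE
    if not obs.matches_expected_mode:
        return Triage.MISCONFIGURED
    return Triage.OK


if __name__ == "__main__":
    print(classify(Observations(True, 1, False, False)).value)
```

Checking the lowest layer first mirrors the goal of not debugging the wrong layer: a policy mismatch is only worth investigating once the device is known to be visible and usable.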

What each part covers

Part 1: Why CXL Type 3 memory matters, and what your platform must provide

Part 1 establishes the system context. It explains why AI and data-intensive workloads are driving interest in memory expanders, how CXL Type 3 devices differ from local DIMMs even when they appear as ordinary RAM, and where expander memory sits in the latency–capacity pyramid relative to socket-local DRAM and storage.

It then walks through platform prerequisites—CPU enablement, BIOS/firmware, kernel support, device firmware, and RAS—and explains why features such as CXL IDE or tiered memory only work when every layer is aligned. The part closes with the NUMA story on Linux: how cxl_pci binds Type 3 endpoints, why expander memory often appears as a separate or “far” NUMA node, and why many CXL issues show up as placement and bandwidth imbalance rather than hard functional failures.

Part 2: Tooling and boot path from power-on to usable memory

Part 2 is the operational core. It introduces the user-space utilities that make CXL state visible beyond dmesg—cxl/libcxl for fabric topology, ndctl and daxctl for region and DAX/system-RAM modes, numactl for placement experiments, and lspci/hwloc for bus- and topology-level sanity checks.

It then traces the end-to-end boot sequence: power and clocks, on-device DRAM training and SPD discovery, gating of host-managed device memory (HDM) until mem_info_valid is asserted, PCIe/CXL link up and DVSEC-based discovery, decode programming and mem_enable, CDAT transport over DOE and mailbox health, ACPI handoff via CEDT/SRAT/HMAT, and final OS driver binding. Each stage is framed as an implied test with characteristic failure signatures, so you can map symptoms to the most likely layer quickly.
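
The idea of treating each boot stage as an implied test can be sketched as an ordered list of gates, where the first gate that has not passed is the most likely layer to examine. The gate names below paraphrase the stages listed above; the `first_failed_gate` helper and the convention that a missing result counts as a failure are assumptions made for this example.

```python
from typing import Dict, Optional

# Boot stages in the order described above, earliest first.
BOOT_GATES = [
    "power and clocks",
    "on-device DRAM training and SPD discovery",
    "mem_info_valid asserted",
    "PCIe/CXL link up and DVSEC discovery",
    "decode programming and mem_enable",
    "CDAT over DOE and mailbox health",
    "ACPI handoff (CEDT/SRAT/HMAT)",
    "OS driver binding",
]


def first_failed_gate(results: Dict[str, bool]) -> Optional[str]:
    """Return the earliest gate that did not pass, or None if all passed.

    A gate with no recorded result is treated as failed, since an
    unverified stage cannot be trusted by the stages that follow it.
    """
    for gate in BOOT_GATES:
        if not results.get(gate, False):
            return gate
    return None


if __name__ == "__main__":
    observed = {gate: True for gate in BOOT_GATES[:4]}
    print(first_failed_gate(observed))
```

Reporting only the earliest failing gate keeps attention on the root stage rather than on downstream symptoms.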

Part 3: Test, debug, and validation of CXL memory expanders

Part 3 turns theory into hands-on practice. It covers integration modes—system RAM versus device DAX—and when boot parameters such as efi=nosoftreserve or daxctl reconfigure-device apply.

It shows how to confirm expander memory as a distinct NUMA node with numactl, decode key lspci fields (link width/speed, CXL DVSEC capabilities, HDM range Valid/Active bits, cxl_pci binding), and drive traffic with numactl placement plus tools such as Intel MLC, stressapptest, and memtester. The series concludes with a cross-layer validation mindset, suggested future work for multi-device and pooled topologies, and references for deeper reading.
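
As a rough illustration of the NUMA check, the sketch below parses the text printed by numactl --hardware and lists nodes that have memory but no CPUs, which is how expander memory commonly appears. The sample output is made up for the example, and a CPU-less node is only a hint: it does not by itself prove that the memory is CXL-attached.

```python
import re
from typing import Dict, List

# Illustrative sample in the layout printed by `numactl --hardware`.
SAMPLE_OUTPUT = """\
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 64000 MB
node 0 free: 60000 MB
node 1 cpus:
node 1 size: 131072 MB
node 1 free: 131000 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10
"""


def memory_only_nodes(numactl_output: str) -> List[int]:
    """Return node IDs that report memory but list no CPUs."""
    cpus: Dict[int, List[str]] = {}
    size_mb: Dict[int, int] = {}
    for line in numactl_output.splitlines():
        match = re.match(r"node (\d+) cpus:(.*)$", line)
        if match:
            cpus[int(match.group(1))] = match.group(2).split()
            continue
        match = re.match(r"node (\d+) size: (\d+) MB$", line)
        if match:
            size_mb[int(match.group(1))] = int(match.group(2))
    return sorted(n for n in cpus if not cpus[n] and size_mb.get(n, 0) > 0)


if __name__ == "__main__":
    print(memory_only_nodes(SAMPLE_OUTPUT))
```

A candidate node found this way can then be targeted with numactl placement options for the traffic tests described above.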

Read all three parts if you are new to CXL Type 3 bring-up; jump to Part 2 or Part 3 if you already have a booting system and need tooling or debug guidance.

Ameet Sanghavi works in post-silicon validation for PCIe and CXL at Nvidia with a focus on interface bring-up and validation on shipping products. He has worked on PCIe since 2005 (from PCIe 1.1 onward) and on CXL since 2020 (from CXL 1.1 onward).


The post Bring-up and testing of systems with CXL Type 3 memory expanders appeared first on EDN.

Memory card interfaces keep pace with the internal bus evolution race: Part 2

Thu, 06/11/2026 - 15:00

Learning from and adapting the lessons of the past is wise, as long as it’s not taken to overly constraining excess. So, too, is adopting others’ ideas (in a non-patent-infringing way, of course).

As you already know if you read last week’s blog post (and if not, please do so first before continuing with today’s…I’ll be right here, waiting for your return…), I initially planned on covering this topic in a single writeup. It ended up, however, being at least twice as long as I’d originally envisioned, so I basically chopped it in two. Part 1 covered the historical precedents that led to the ongoing memory card innovations of more modern times, which I’ll discuss this time.

Interface evolutions

I’m spending all this time on past-history factoids and trends because, as you’ll soon see, they conceptually continue(d) to repeat themselves multiple times over with the passage of time. To that point, one other historical example, involving performance, also bears mention. PCMCIA, introduced in 1990, tackled a mid-life enhancement five years later, from the 16-bit ISA bus-derived PC Card to the PCI bus-based and 32-bit, but still backwards-compatible, CardBus.

A more radical transformation, ExpressCard (originally called NEWCARD), followed roughly a decade after that. Based on the combination of PCI Express and USB 2.0, it was not directly backwards compatible with CardBus, much less with PC Card, thereby forcing either system designers to include slots for both standards or users to resort to clumsy adapters.

More generally, as my attempted blending of two Wikipedia entry excerpts notes:

Despite being much faster in speed/bandwidth, ExpressCard was not as popular as PC Card, due in part to the ubiquity of USB ports on modern computers. When the PC Card was introduced, the only other way to connect peripherals to a laptop computer was via RS-232 and parallel ports of limited performance, so it was widely adopted for many peripherals. More recently, virtually all equipment has Hi-Speed USB ports, and most types of peripherals which formerly used a PC Card connection are available for USB (and have the advantage of being compatible with desktop computers as well as portable devices) or are built-in, making the ExpressCard less necessary than the PC Card was in its day.

Wash, rinse, repeat

Let’s now fast-forward to more modern times. CFast, short for CompactFast, which I mentioned in both of my 2023 writeups (in the context of their use by my Blackmagic cameras), is based on CompactFlash (and is also managed by the CFA) but migrates from ATA to SATA. CFast 1.x dates from 2009 and is based on SATA 2.0; the backwards-compatible CFast 2.0 upgrades to SATA 3.0 but has seen limited-at-best industry uptake since being initially unveiled in 2012.

Why? Enter, for example, the alternative CFexpress, also managed by the CFA, which switches from SATA to the solid-state media-optimized NVM Express (i.e., NVMe) as its command set and to PCI Express (PCIe) as its hardware interface foundation (as I’d mentioned at the end of 2023), as well as coming in multiple dimensional options. The smaller Type A (at left in the following image) and larger Type B (right) card variants are today commonplace in the industry, with the even larger Type C conversely not yet in production to the best of my knowledge:

In this context, an overview of the earlier XQD standard also bears mention. XQD, once again now managed by the CFA (albeit initially announced solely by Sandisk, Sony and Nikon), dates from 2010. It’s dimensionally and connector-compatible with CFexpress Type B and is also based on PCIe, albeit only in a single-lane implementation (with PCIe 3.0 support added with XQD 2.0 in mid-2012). The XQD and CFexpress standards are therefore cross-compatible, although only to a degree, generally requiring firmware updates which not all camera, memory card reader and other system manufacturers have provided.

CFexpress 1.0, announced by the CFA in September 2016 as the successor to XQD, launched with support for PCIe 3.0, albeit this time in higher-bandwidth dual-lane form (for the size option now known as Type B and used by my high-end Canon and Panasonic cameras, among others). CFexpress 2.0, following in February 2019, added the single-lane PCIe Type A and quad-lane Type C options, along with upgrading the NVMe command set from 1.2 to 1.3. And the latest iteration, August 2023’s CFexpress 4.0, upgrades the supported PCIe interface to 4.0 (again, at up to four lanes with Type C), and the NVMe command set to 1.4. CFexpress 4.0-optimized systems are not yet in the market, to the best of my knowledge, but cards (such as this OWC Atlas Pro) are prevalent and backwards-compatible with existing cameras and such:

No, I don’t know what happened to CFexpress version 3.0, either. While buying a CFexpress 4.0 card now will leave potential performance “on the table” with CFexpress 2.0-only systems, it does provide obsolescence protection for subsequent camera-or-other upgrades you might make in the future. And conversely, if future-proofing isn’t a concern, you’ll be able to (as I’ve personally done) get some great deals on CFexpress 2.0 memory cards right now, despite overall semiconductor memory supply constraints, as manufacturers strive to “fire sale” deplete their inventories of legacy product variants.

Don’t count out Donkey Kong

And what about the SD and related microSD card standards; are they in danger of falling by the wayside as these high-performance newcomers ramp into the market? Not if the SD Association has anything to say about it, specifically with next-generation “Express” offerings. See if you notice anything familiar trend-wise in the paragraphs that follow:

When the SD Association (SDA) first announced SD Express in June 2018, it set the bar high and opened a world of possibilities for manufacturers to integrate supercharged removable storage into their designs. SD Express is capable of delivering SSD performance levels of up to 4GB/sec. This makes it perfect for use in high-performance electronic devices and products. With the introduction of advanced security features in May 2022 found in the SD specification version 9, performance and versatility merge to create an innovative, and advanced powerhouse solution for SD memory cards.

SD Express leverages the PCI Express and NVMe interfaces and uses the well-known SD memory card form factor for compatibility with existing SD slot architectures. The SDA also introduced a microSD Express memory card format that is backward compatible with devices. SD Express is not just about SD memory cards getting faster, it is also about SD memory cards doing more.

After languishing for several years awaiting market demand that stubbornly refused to emerge, “Express” variants’ fortunes are finally looking up. Specifically, the microSD Express card is used in the Nintendo Switch 2 game console, notably (and singlehandedly) increasing the likelihood of a high-volume long-term future for the standard.

Blazing a trail

I’ll wrap up this writeup with coverage of a recently emergent sole-source memory card option (in spite of my earlier comment that I planned to avoid diving into past-history proprietary offerings) that I’d earlier caught mention of at The Verge and elsewhere. It’s Biwin’s Mini SSD:

Biwin is, if you hadn’t already guessed from the coin at left in this “stock” image, a China-based memory subsystem manufacturer (to the right of the 1-yuan coin is the rare U.S. $1 coin). Most of the products on the company’s website are industry standards-based: PCIe NVMe internal SSDs, for example, along with USB flash sticks and drives, DRAM DIMMs and SoDIMMs, SD/microSD and CFexpress memory cards (an image of which you saw earlier), and memory card readers. But with the Mini SSD, the company has apparently decided to try its hand at also going proprietary.

Interestingly, the Mini SSD is slightly larger (at 15x17x1.4 mm) than the microSD Express (15x11x1 mm) counterpart. And at least from a latest-generation ratified-spec standpoint, it’s seemingly no faster than microSD Express, either; both are based on dual-lane PCIe 4.0 and NVMe (once again: sound familiar?). The key differentiator that Biwin seems to be betting on is timing; as Ars Technica notes, currently available microSD Express cards “top out around 900MB per second, roughly the amount of bandwidth available from a single PCI Express 3.0 lane.”
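
The bandwidth figures in this comparison follow from simple link arithmetic, sketched below for PCIe 3.0 and 4.0. The calculation uses the per-lane signaling rates (8 and 16 GT/s) and the 128b/130b line encoding, and it deliberately ignores packet and protocol overhead, so real-world throughput will land somewhat lower. The function name is my own.

```python
# Per-lane signaling rate in gigatransfers per second, by PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0}

# PCIe 3.0 and 4.0 both use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128.0 / 130.0


def pcie_raw_throughput_mb_s(generation: int, lanes: int) -> float:
    """Upper-bound link throughput in MB/s, before protocol overhead."""
    bits_per_second = GT_PER_S[generation] * 1e9 * ENCODING_EFFICIENCY * lanes
    return bits_per_second / 8.0 / 1e6


if __name__ == "__main__":
    for gen, lanes in ((3, 1), (3, 2), (4, 2), (4, 4)):
        rate = pcie_raw_throughput_mb_s(gen, lanes)
        print(f"PCIe {gen}.0 x{lanes}: about {rate:.0f} MB/s")
```

A single PCIe 3.0 lane works out to roughly 985 MB/s, consistent with the “around 900MB per second” observation quoted above, while a dual-lane PCIe 4.0 link tops out a little under 4 GB/s.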

Conversely, Biwin was demonstrating functional products at CES in January, claiming read speeds up to 3,700 MB/s and write speeds up to 3,400 MB/s (at least in combination with the company’s own card reader peripheral), and with capacities ranging from 512 GB to 2 TB. Biwin also touts Mini SSD’s IP68-rated dust- and water-proof chops. One note: while the company was referring to them as the “BL100” series late last summer, it’s now calling them “CL100”. Why? 🤷‍♂️

Will Biwin be able to gain a defendable beachhead (and then expand its addressable customer “footprint”) before SD Association members release similar-performance microSD Express products into the market? Let me know your thoughts on that question, or anything else I’ve discussed in this series, in the comments!

Brian Dipert is the associate editor, as well as a contributing editor, at EDN.


The post Memory card interfaces keep pace with the internal bus evolution race: Part 2 appeared first on EDN.

Carbon nanotube coating creates on-chip terahertz waveguides

Thu, 06/11/2026 - 11:25

There’s considerable interest in leveraging the bandwidth and other potential virtues of terahertz waves that occupy the spectrum between the conventional RF and optical worlds, generally considered to span 100 GHz (3 mm wavelength) to 10 THz (30 μm). However, managing electromagnetic energy at these wavelengths presents many challenges, as they are too short for most electronics, yet too long for all-optical components.
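
The frequency and wavelength pairs quoted above follow from the free-space relation λ = c / f. The short sketch below reproduces them; it assumes propagation in free space, so wavelengths inside silicon or another dielectric would be shorter.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # exact, by definition of the metre


def free_space_wavelength_m(frequency_hz: float) -> float:
    """Free-space wavelength in metres for a given frequency in hertz."""
    return SPEED_OF_LIGHT_M_S / frequency_hz


if __name__ == "__main__":
    print(f"100 GHz: {free_space_wavelength_m(100e9) * 1e3:.2f} mm")
    print(f"10 THz:  {free_space_wavelength_m(10e12) * 1e6:.1f} um")
```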

Nonetheless, there’s a significant amount of ongoing research in developing the materials and components needed, especially with many potential applications, including the emerging 6G standards being developed now.

At these frequencies and corresponding wavelengths, signal energy must be conveyed via waveguides—discrete wires won’t do, of course. But making the needed waveguide physical transitions is difficult when they are fabricated in silicon as part of a larger set of on-chip functions.

Addressing this issue, a team of researchers at The Skolkovo Institute of Science and Technology—or Skoltech, a private institute in Moscow—working with a team from KTH Royal Institute of Technology in Sweden, has developed a key technology that could support silicon-based terahertz waveguides and their on-chip transitions.

Their solution is based on carbon nanotubes, one of those amazing materials that keep offering solutions to diverse problems. The single-wall carbon nanotube (SWCNT) was discovered in 1991 (see “A Brief Introduction of Carbon Nanotubes: History, Synthesis, and Properties”). Like fullerene and graphene, SWCNTs are one of the allotropes of carbon.

Allotropes present a different structural form of the same chemical element within the same physical state; because their atoms are bonded differently, allotropes have vastly different physical and chemical properties from each other—think diamond versus graphite.

A key challenge in building these complex terahertz arrangements is devising properly matched terminations. Without proper termination, reflections at device discontinuities can cascade, thus degrading performance and altering the intended operational profile. In addition, these terminations are necessary for characterization of multi-port devices such as directional couplers, where the unused ports must be terminated with matched loads.

The conventional solution is to use adiabatic or impedance-matched tapering of the waveguide cross-section to free space, gradually expanding the guided mode to induce radiation losses while operating as a dielectric rod antenna. However, the efficiency of these structures depends on the length of the tapering, therefore consuming valuable chip area; it can also radiate power in undesirable directions, thus complicating packaging, limiting integration density, and creating electromagnetic pollution.

Note that in the adiabatic-coupling approach, the optical mode is coupled from one waveguide to another by a slow change of a waveguide parameter (width, thickness, or both) such that the optical mode remains in the fundamental mode and does not couple to unwanted higher-order modes. As a result, the tapered waveguides need to be long enough to meet the requirements of the adiabatic conditions of slow change of waveguide parameter. However, at the same time, they need to meet the device compactness requirement. Therefore, there is a trade-off to be made.

The research team devised and tested a carbon nanotube-based coating that blocks electromagnetic radiation, thereby creating waveguides compatible with terahertz wavelengths. The ultrathin single-walled carbon nanotube films that they synthesized are similar to those that they used previously to create small-scale components, such as lenses and antennas, but with a big difference, as this time it’s not for standalone components. Instead, they leveraged carbon-based material to control electromagnetic radiation in 2D-integrated optical circuits, eliminate interference, and enable additional functionality.

They demonstrated a compact, broadband termination by coating silicon dielectric rod waveguides (DRW) with ultrathin single-walled carbon nanotube films. Fabricated via a floating-catalyst (aerosol) chemical vapor-deposition process, the film thickness varies from 2 to 53 nm and was characterized in the 140-220 GHz range. A 53-nm thick film introduced up to 47 dB of attenuation while maintaining over 20 dB reflection loss, confirming nearly reflection-free absorption (Figure 1).
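
To put those decibel figures in linear terms, the sketch below converts a loss in dB to the fraction of power that remains, using fraction = 10^(-dB/10). The function name is my own; the 47 dB and 20 dB inputs are the values quoted above.

```python
def db_to_power_fraction(loss_db: float) -> float:
    """Fraction of power remaining after a loss of `loss_db` decibels."""
    return 10.0 ** (-loss_db / 10.0)


if __name__ == "__main__":
    transmitted = db_to_power_fraction(47.0)  # attenuation through the film
    reflected = db_to_power_fraction(20.0)    # power sent back to the source
    print(f"47 dB attenuation passes about {transmitted:.1e} of the power")
    print(f"20 dB reflection loss returns about {reflected:.0%} of the power")
```

In other words, about two hundred-thousandths of the incident power gets through the thickest film, and no more than about 1% is reflected.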

Figure 1 Reflection measurements of the SWCNT-loaded DRWs show ∣S11∣ for the 6-mm long samples (a) and ∣S11∣ for the 12-mm long samples (b). The light grey line is baseline reflection after calibration by measuring a thru-standard (flanges of the frequency extenders connected); dark grey is the reflection coefficient of an unloaded DRW. Source: Nature Communications

Shielding analysis shows absorption dominates over reflection, and they achieved a record specific shielding efficiency of 5.5 × 10⁹ dB·cm²/g (Figure 2).

Figure 2 Shielding efficiency components for the SWCNT-coated dielectric waveguides: reflection component SE_R (a, b), absorption component SE_A (c, d), and total shielding SE_T (e, f) for 6-mm (left column) and 12-mm (right column) samples over 140-220 GHz, with light grey as the equivalent shielding efficiency of an unloaded silicon waveguide provided for reference. Source: Nature Communications

This approach offers a footprint-efficient solution for high-density terahertz circuits without bulky, radiative terminations. The work is presented in their paper “Ultrathin Single-Walled Carbon Nanotube Surface Wave Absorbers for Terahertz Dielectric Waveguides” published in Nature Communications. It’s unfortunate that the paper does not have any microphotographs of the SWCNT waveguide and transitions in silicon, so you’ll just have to visualize those yourself.

Have you had any interaction with or uses for carbon nanotubes? If so, in what way? Do you see a role for them in any of your projects, whether terahertz or other?

Bill Schweber is a degreed senior EE who has written three textbooks, hundreds of technical articles, opinion columns, and product features. Prior to becoming an author and editor, he spent his entire hands-on career on the analog side by working on power supplies, sensors, signal conditioning, and wired and wireless communication links. His work experience includes many years at Analog Devices in applications and marketing.


The post Carbon nanotube coating creates on-chip terahertz waveguides appeared first on EDN.

SoC FPGA advances wideband RF processing

Wed, 06/10/2026 - 23:28

Altera is now sampling its Agilex 9 Direct RF AGRW039 wideband SoC FPGA for aerospace, defense, and communication systems. According to Altera, the device delivers a 40% increase in compute capability per square millimeter. It also provides 45% greater logic and DSP density than the previous generation and supports DDR5 and LPDDR5 memory technologies.

With integrated 64-Gsample/s wideband RF and increased compute and memory resources, the programmable device eliminates the need for multichip designs and enables advanced beamforming, radar, and data cube processing. The AGRW039 provides high-bandwidth signal capture and generation, allowing customers to scale performance while maintaining design flexibility.

Agilex 9 Direct RF SoC FPGAs combine high-speed data converters, programmable logic, and processing elements in a single package. The integrated architecture helps reduce system complexity and power consumption for wideband RF applications that require real-time performance.

Production silicon and development kits for the Agilex 9 Direct RF AGRW039 are expected to be available in Q3 2026.

Agilex 9 Direct RF series

Altera

The post SoC FPGA advances wideband RF processing appeared first on EDN.

TO-247 SiC package boosts high-voltage isolation

Wed, 06/10/2026 - 23:28

Navitas has developed a TO-247 package offering more than 6000 V of isolation for its 1200-V, 2300-V, and 3300-V SiC MOSFETs. Designated the UHV-TO-247-4-ISO, the through-hole package supports direct-cooled thermal management through a reflow-compatible isolated thermal pad. It also provides over 12 mm of pin-to-pin creepage, enabling module-level performance in a compact discrete form factor.

Compared to standard non-isolated through-hole packages, the UHV-TO-247-4-ISO reduces the need for external high-voltage isolation while improving thermal and EMI performance. These benefits extend to high-voltage grid-tied power conversion systems, solid-state transformers, battery energy storage systems, and renewable energy applications.

The UHV-TO-247-4-ISO delivers integrated high-voltage isolation using an AlN substrate, reducing die-to-heatsink capacitance and helping lower common-mode noise and radiated EMI. Its reflow-compatible, direct-cooled thermal interface enables direct mounting to liquid- or air-cooled heatsinks, improving thermal performance while eliminating the need for external TIM and isolation materials. The package also enhances thermal cycling and power cycling lifetime through its AlN/AMB construction and robust heatsink interface.

To request samples or additional product information, please contact a Navitas sales representative or email info@navitassemi.com.

Navitas Semiconductor 

The post TO-247 SiC package boosts high-voltage isolation appeared first on EDN.

Vertical power platform cuts AI thermal bottlenecks

Wed, 06/10/2026 - 21:33

Lotus Microsystems’ vStrata vertical power delivery platform targets the electrical, thermal, and mechanical challenges of AI infrastructure. The first module in the vStrata Power Series, the LS0580, is a fully integrated power-system-in-package (PSiP) that places power conversion closer to the load to reduce distribution losses and board complexity. The device has completed tape-out for leading CPU, GPU, and AI accelerator platforms, with engineering samples shipping in Q3 2026.

Built on a silicon-based substrate, vStrata combines power delivery, thermal management, and packaging in a single architecture. Designed for kiloampere-class AI workloads, the platform delivers up to 96% point-of-load efficiency while reducing power losses and thermal constraints. Its low-profile vertical architecture is enabled by silicon PIT technology, supporting ultra-thin designs below 1 mm by placing power directly beneath the processor to shorten electrical paths and improve transient response.

The vStrata platform is compatible with existing power management controllers and reference designs. Lotus is currently evaluating the platform with hyperscale customers and additional partners through an early access program.

vStrata product page

Lotus Microsystems 

The post Vertical power platform cuts AI thermal bottlenecks appeared first on EDN.

Hall switch streamlines automotive position sensing

Wed, 06/10/2026 - 21:32

The Melexis MLX92344 is a 2-wire, 2-bit Hall-effect switch for contactless detection of up to four positions in automotive body electronics. Unlike conventional microswitch-based approaches that often require multiple mechanical switches to detect intermediate positions, the MLX92344 simplifies system design by providing programmable current levels and magnetic thresholds. It is suited for applications such as seat track positioning, soft-closing doors, and multilevel trunk locks.

A dual programmable architecture lets designers assign output current levels directly to the device’s magnetic operating and release thresholds, with temperature compensation for both neodymium and ferrite magnets. Up to four different current levels can be configured between 3 mA and 28 mA, enabling the MLX92344 to emulate standard microswitch interfaces while maintaining compatibility with existing hardware and ECUs. The device can be sensed through standard I/O triggers or an ADC, requiring only software readout adjustments.
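
For the ADC readout option, the software side could look something like the sketch below, which converts a sense-resistor voltage into a loop current and maps it to the nearest programmed level. The four current levels, the 100 Ω sense resistor, and the tolerance band are all hypothetical values chosen for illustration; they are not taken from the MLX92344 documentation.

```python
from typing import Optional, Sequence

# Hypothetical programmed current levels (mA), within the 3 mA to 28 mA range.
LEVELS_MA = (5.0, 10.0, 15.0, 20.0)
R_SENSE_OHMS = 100.0  # hypothetical sense resistor in the 2-wire loop


def decode_position(
    v_sense: float,
    levels_ma: Sequence[float] = LEVELS_MA,
    r_sense_ohms: float = R_SENSE_OHMS,
    tolerance_ma: float = 2.0,
) -> Optional[int]:
    """Map a sense voltage to a position index, or None if out of band.

    None indicates a current that matches no programmed level, which the
    caller can treat as a wiring or sensor fault.
    """
    current_ma = v_sense / r_sense_ohms * 1000.0
    index = min(range(len(levels_ma)), key=lambda k: abs(levels_ma[k] - current_ma))
    if abs(levels_ma[index] - current_ma) > tolerance_ma:
        return None
    return index


if __name__ == "__main__":
    for volts in (0.52, 1.0, 2.6):
        print(volts, "V ->", decode_position(volts))
```

Rejecting currents that fall outside every tolerance band gives the ECU a simple way to distinguish a valid position from an open or shorted line.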

The MLX92344 offers a wide magnetic operating range from 0.5 mT to 200 mT. It is ASIL B SEooC compliant, AEC-Q100 qualified, and operates from 2.7 V to 28 V over a temperature range of -40°C to +150°C. The switch is available in both surface-mount and through-hole packages.

MLX92344 product page 

Melexis

The post Hall switch streamlines automotive position sensing appeared first on EDN.

NVIDIA chip powers local AI workloads

Wed, 06/10/2026 - 21:22

NVIDIA has unveiled the RTX Spark, a “superchip” delivering up to 1 petaflop of AI compute to enable Windows PCs to run personal AI agents. The device combines 128 GB of unified memory with an NVIDIA Blackwell RTX GPU featuring 6,144 CUDA cores and fifth-generation Tensor Cores that provide FP4 precision. The GPU connects to a high-performance 20-core Grace CPU via the NVLink-C2C chip-to-chip interconnect.

NVIDIA collaborated with MediaTek on the custom CPU design, contributing to strong power efficiency, performance, and connectivity. NVIDIA also partnered with Microsoft to deliver a secure Windows platform for on-device agents, incorporating new Windows security primitives and the NVIDIA OpenShell runtime to safely run autonomous AI agents.

RTX Spark brings NVIDIA’s AI and graphics technologies to creators, developers, and gamers. It can run 120-billion-parameter language models, render large 3D scenes, and accelerate 12K video editing. For gaming, the platform supports ray tracing and NVIDIA DLSS technologies for enhanced visual quality and performance.
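
A back-of-the-envelope check shows how a 120-billion-parameter model relates to 128 GB of unified memory. The sketch below counts weight storage only, at a given number of bits per parameter; it ignores activations, KV cache, and runtime overhead, so it gives a lower bound rather than a sizing guide. The function name is my own.

```python
def weight_footprint_gb(parameter_count: float, bits_per_parameter: int) -> float:
    """Storage for model weights alone, in decimal gigabytes."""
    return parameter_count * bits_per_parameter / 8.0 / 1e9


if __name__ == "__main__":
    params = 120e9
    unified_memory_gb = 128.0
    for label, bits in (("FP4", 4), ("FP8", 8), ("FP16", 16)):
        size = weight_footprint_gb(params, bits)
        verdict = "fits" if size < unified_memory_gb else "does not fit"
        print(f"{label}: {size:.0f} GB of weights, {verdict} in 128 GB")
```

At 4 bits per parameter the weights occupy about 60 GB, leaving headroom in 128 GB, whereas 16-bit weights alone would need about 240 GB.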

RTX Spark-based laptops and compact desktops will be available this fall from leading manufacturers.

RTX Spark product page 

NVIDIA

The post NVIDIA chip powers local AI workloads appeared first on EDN.

How fleet learning works under bounded gate authority

Wed, 06/10/2026 - 20:28

The first article in this silicon governance series established a fundamental reality: observability is not automatically governed evidence. Advanced AI silicon platforms generate a massive stream of runtime telemetry, including network-on-chip (NoC) counters, voltage diagnostics, thermal maps, memory-state logs, firmware traces, error signatures, and workload-dependent behavior. But raw observability alone lacks the context, synchronization, and causality required to explain physical system behavior.

The second article extended that thesis into runtime operation by introducing the firmware–hardware handshake. Hardware senses transient states. Firmware executes localized, bounded actions. A governance layer determines whether those runtime actions remain valid, safe, and causally justified.

This third article closes the loop.

Once complex AI accelerators, multi-die chiplets, HBM modules, advanced heterogeneous packages, and cloud-scale systems are deployed at enterprise scale, a new question appears. How does field evidence refine future silicon, package, firmware, and system decisions without creating an uncontrolled feedback loop of autonomous adaptation?

That question requires fleet learning to operate under bounded gate authority. The operating principle is simple: Fleet learning recommends and bounded gate authority approves.

Fleet learning can identify macro-scale failure signatures, detect structural drift across deployed systems, and recommend policy refinement. But fleet learning should not independently close development gates, alter firmware release criteria, rewrite operating envelopes, or approve lifecycle actions.

That final step requires bounded decision authority.

From single-chip handshake to cluster-scale drift

The firmware–hardware handshake begins locally.

  • A voltage droop appears on an internal rail.
  • A thermal sensor reports a localized hot spot.
  • A SerDes lane loses operating margin.
  • A memory controller logs an error correcting code (ECC) event.
  • Firmware responds through a pre-validated, bounded action envelope.

At the single-device level, this can preserve operation. But modern AI infrastructure does not operate as isolated silicon.

  • A single accelerator becomes a board.
  • A board integrates into a rack.
  • A rack scales into a data-center cluster.
  • A cluster becomes a globally distributed fleet.

At that scale, localized runtime compensation is no longer sufficient. Thousands of multi-die devices operating under shifting workloads begin to reveal multi-physics patterns that no isolated lab test, qualification plan, or pre-silicon simulation could fully predict.

A high-speed SerDes retraining event may appear harmless on one device. Across a fleet, it may reveal an advanced-package escape, connector-aging issue, or workload-dependent signal-integrity margin deficit.

A recurrent voltage droop may look like firmware tuning noise. Across many systems, it may correlate with one package substrate lot, one raw-material source, one board configuration, or one power delivery network (PDN) resonance condition.

A persistent thermal asymmetry may look like a local cooling issue. Across a data-center tier, it may expose thermal interface material (TIM) variation, substrate warpage, lid-attach tolerance, or airflow interaction.

Scattered ECC events may appear random. Across workload, voltage, temperature, memory location, and package population, they may reveal a wafer-to-package interaction or localized timing drift.

The purpose of fleet learning is not to collect more telemetry; the purpose is to normalize field behavior into governed lifecycle evidence.

Telemetry is not convergence

Modern AI clusters are already saturated with logging mechanisms. They continuously capture physical, electrical, firmware, and workload states. But this raw telemetry stream is not system convergence.

  • A monitoring dashboard can flag a symptom.
  • A generic AI model can identify a statistical correlation.
  • An error log can timestamp an interruption.
  • A fleet database can reveal clustering.

But none of those observations automatically confirms physical causality.

A recurring signal-integrity degradation event may look like normal channel aging. In reality, the root cause could be board-level connector variation, package escape routing discontinuity, local thermal expansion, substrate variation, return-path interruption, or mechanical stress accumulation at the package-to-board interface.

A voltage instability event may look like a firmware behavior. In reality, it may originate from package inductance, PDN resonance, voltage regulator module (VRM) response, decoupling placement, silicon switching current, or thermal drift.

A thermal excursion may look like a cooling problem. In reality, it may involve workload placement, TIM thickness, lid attach, airflow, die placement, package warpage, or power-map concentration. This is why unconstrained AI analytics can be risky in high-reliability semiconductor environments.

A system that blindly changes operating bounds based on weakly governed telemetry may optimize the wrong variable, amplify false correlations, mask physical defects, or push firmware parameters outside validated design boundaries. The objective is not more raw data; the objective is trusted, admissible evidence.

SEGA-AI response: A governed feedback architecture

Fleet learning within the SEGA-AI/governance for lifecycle stack is fundamentally different from standard cloud-level log analytics.

  • It’s not generic telemetry analytics.
  • It’s not unconstrained AI optimization.
  • It’s not self-modifying infrastructure.

Fleet learning is a governed realization-feedback architecture. Its purpose is to connect deployed behavior back to the assumptions made during pre-silicon design, packaging floor-planning, post-silicon validation, qualification, manufacturing release, and firmware policy definition.

It asks: 

  • Was the original design guardband correct?
  • Was the package-level simulation model complete?
  • Did the system EM corridor have enough high-frequency margin?
  • Did the physical PDN respond as predicted under maximum dI/dt load steps?
  • Did the firmware policy preserve global convergence or only local stability?
  • Did one package lot behave differently from another?
  • Did one board configuration or connector population age differently?
  • Did field behavior expose a validation escape?

This transforms the field from a passive reliability archive into an active lifecycle evidence source. But the field does not rule the system. Instead, deployed behavior informs the governance stack, and bounded gate authority governs the decision.

Fleet learning recommends and bounded gate authority approves

The most important safety principle is that fleet learning can recommend refinement, but bounded gate authority must approve action.

This prevents a dangerous failure mode: allowing field data, machine learning, or runtime analytics to directly modify firmware policy, release criteria, validation guardbands, or corrective-action rules without sufficient evidence authority.

In large fleets, an unsafe automated update can create systemic instability. A local firmware action that works on one device may create thermal imbalance across a rack. A voltage policy that improves one workload may reduce aging margin elsewhere. A SerDes retraining policy may preserve one link but increase synchronization overhead across a cluster.

Therefore, fleet-scale learning must pass through a multi-state decision gate. Here, bounded gate authority can issue one of six outcomes.

  1. Close: The fleet evidence is mature, admissible, causally verified, and sufficient to advance the configuration.
  2. Remain open: The evidence is immature, stale, incomplete, conflicting, or not yet tied to critical to quality (CTQ) parameters.
  3. Reopen: Authoritative fleet evidence invalidates a previously closed validation, firmware, package, or release assumption.
  4. Escalate: Uncertainty, risk severity, or cross-domain conflict exceeds the bounded authority envelope and requires human engineering review.
  5. Approve bounded action: A limited mitigation is allowed inside a pre-validated safe envelope, such as narrowing a frequency range, changing a retraining threshold, adjusting a voltage policy, or applying a lot-specific firmware constraint.
  6. Block release: A critical CTQ, causality path, or reliability condition remains unresolved.

This is the difference between learning from the fleet and being controlled by the fleet. Fleet learning identifies the pattern; bounded gate authority decides whether the pattern is mature enough to authorize action.
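
The six outcomes can be sketched as a small decision function. The Python below is a minimal illustration only: the `EvidencePackage` fields, the `decide` function, and the order in which the conditions are checked are all assumptions made for this sketch, not a definition taken from the SEGA-AI stack.

```python
from dataclasses import dataclass
from enum import Enum


class GateOutcome(Enum):
    CLOSE = "close"
    REMAIN_OPEN = "remain open"
    REOPEN = "reopen"
    ESCALATE = "escalate"
    APPROVE_BOUNDED_ACTION = "approve bounded action"
    BLOCK_RELEASE = "block release"


@dataclass(frozen=True)
class EvidencePackage:
    # Hypothetical summary flags attached to a fleet-learning recommendation.
    mature: bool                   # evidence is complete, current, and consistent
    causally_verified: bool        # a physical cause has been confirmed
    ctq_unresolved: bool           # a critical-to-quality item is still open
    exceeds_authority: bool        # risk or conflict is beyond the gate's envelope
    invalidates_closed_gate: bool  # contradicts a previously closed assumption
    mitigation_in_envelope: bool   # a pre-validated bounded mitigation exists


def decide(e: EvidencePackage) -> GateOutcome:
    """Map an evidence package to one gate outcome (assumed precedence)."""
    if e.exceeds_authority:
        return GateOutcome.ESCALATE            # hand off to human review
    if not e.mature:
        return GateOutcome.REMAIN_OPEN         # keep collecting evidence
    if e.invalidates_closed_gate:
        return GateOutcome.REOPEN
    if e.ctq_unresolved:
        return GateOutcome.BLOCK_RELEASE
    if not e.causally_verified:
        # Mature but not yet explained: allow only a pre-validated mitigation.
        if e.mitigation_in_envelope:
            return GateOutcome.APPROVE_BOUNDED_ACTION
        return GateOutcome.REMAIN_OPEN
    return GateOutcome.CLOSE
```

The point of the sketch is structural: the recommendation arrives as data, and a separate, deterministic function holds the authority to turn it into an outcome.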

Example 1: SerDes retraining across a fleet

Consider a high-speed SerDes interface operating across thousands of deployed systems. A single lane retraining event may not be alarming. It may result from temperature, workload burst, supply noise, aging, or normal link management. But if fleet learning detects repeated retraining patterns across a specific package lot, board revision, connector family, thermal condition, or workload pattern, the signal becomes more important.

The system must ask:

  • Is this random runtime behavior or a repeatable system EM corridor weakness?
  • Does the pattern correlate with package escape, PCB material, connector transition, thermal gradient, return-path discontinuity, or voltage noise?
  • Does it appear only under specific workloads or across all operating conditions?
  • Does retraining preserve operation, or does it mask progressive margin loss?

Fleet learning can recommend a refinement: adjust validation thresholds, update link-margin assumptions, modify firmware retraining policy, or reopen a system EM corridor gate. But bounded gate authority decides whether that recommendation is admissible and actionable.

The gate should not close until the evidence is mature enough to distinguish a transient workload excursion from a real corridor degradation pattern.

Example 2: Voltage droop tied to one package lot

A runtime voltage droop may initially appear as a firmware or VRM issue. But fleet-scale evidence may show that the event occurs more frequently in systems built from one package lot, one substrate batch, one board stackup, one decoupling configuration, or one supplier population. That changes the engineering question.

The issue may involve package inductance, silicon switching current, decoupling placement, VRM response, PDN anti-resonance, substrate variation, thermal concentration, or workload-driven current transients.

Fleet learning can identify the population-level pattern. But the decision cannot be automatic. Bounded gate authority must determine whether the evidence is strong enough to:

  • Reopen a package PDN assumption
  • Adjust firmware voltage policy
  • Change validation stress conditions
  • Hold a package lot
  • Escalate to package reliability or failure analysis
  • Approve a bounded runtime mitigation

The field may reveal the pattern, but the gate determines authority.

Example 3: Thermal asymmetry and package realization

Thermal asymmetry is common in AI systems because workloads are uneven, packages are large, and cooling solutions interact with board and chassis design. A single hot region may not prove a package problem.

But if repeated thermal asymmetry appears across a fleet and correlates with package construction, TIM behavior, lid attach, substrate warpage, airflow condition, or power map, it becomes lifecycle evidence. Here, fleet learning may recommend updates to:

  • Thermal guardbands
  • Package model assumptions
  • Assembly admissibility criteria
  • Firmware workload placement
  • Throttling thresholds
  • Future validation conditions

However, bounded gate authority must decide whether the evidence is mature enough to change policy. Otherwise, the system risks overcorrecting a local symptom and creating a new global instability.

Example 4: ECC events under workload and temperature

ECC events are another important fleet signal. An isolated ECC event may not indicate a major issue. But patterns across workload, temperature, voltage, memory stack, package lot, board configuration, or aging profile may reveal a deeper convergence problem. The source may be memory behavior, power noise, package stress, thermal gradients, firmware scheduling, silicon aging, or a wafer-to-package interaction.

Fleet learning can detect that the event population is no longer random. Bounded gate authority must then determine whether to:

  • Remain open and collect more evidence
  • Reopen a validation assumption
  • Escalate to memory, package, or system teams
  • Approve a bounded firmware mitigation
  • Block a release configuration
  • Refine next-generation design constraints
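
One simple way to check whether an event population still looks random is the index of dispersion (the variance-to-mean ratio) of per-device event counts. The sketch below is an illustrative assumption, not a method described in the article; the function name and the sample counts are hypothetical.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Return the variance-to-mean ratio of per-device event counts.

    A value near 1 is consistent with independent, Poisson-like events.
    A value well above 1 suggests clustering that deserves a closer look.
    Requires at least two counts.
    """
    m = mean(counts)
    if m == 0:
        return 0.0
    return variance(counts) / m


# Hypothetical ECC event counts per device over the same observation window.
scattered = [1, 0, 2, 1, 0, 1, 1, 2]   # events spread across devices
clustered = [0, 0, 0, 8, 0, 0, 0, 0]   # same total, concentrated on one device

print(dispersion_index(scattered))
print(dispersion_index(clustered))
```

Both populations have the same total number of events, but the second has a much higher index. A high index flags a pattern; it does not establish a cause, which is why the outcome remains a gate decision.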

Again, the value is not only anomaly detection; it’s also governed lifecycle authority.

Example 5: When local firmware action creates fleet-level drift

The firmware–hardware handshake allows local corrective action. That is necessary. But local action can create fleet-level consequences.

A firmware policy that throttles one tile may preserve local thermal margin but shift workload stress to another region. A voltage adjustment may stabilize one condition but accelerate aging under another workload. A SerDes retraining rule may improve link continuity but increase synchronization overhead, operational variability, or latency across a cluster.

Fleet learning is needed to detect these second-order effects, and bounded gate authority is needed to prevent uncontrolled policy changes.

So, the system must ask:

  • Is the local action preserving global convergence?
  • Is the firmware response still inside the approved action envelope?
  • Does the correction create hidden thermal, timing, power, or reliability debt?
  • Should the action remain approved, be narrowed, be escalated, or be retired?
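
The second question, whether a firmware response is still inside its approved action envelope, can be sketched as a range check against pre-validated limits. The parameter names and limits below are hypothetical placeholders, not values from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Pre-validated range for one firmware-adjustable parameter."""
    min_value: float
    max_value: float

    def contains(self, value: float) -> bool:
        return self.min_value <= value <= self.max_value


# Hypothetical approved envelopes; real limits would come from validation.
APPROVED = {
    "core_voltage_mv": Envelope(720.0, 780.0),
    "retrain_threshold_db": Envelope(2.0, 4.0),
}


def check_action(parameter: str, requested: float) -> str:
    """Classify a requested setting against its approved envelope."""
    envelope = APPROVED.get(parameter)
    if envelope is None:
        return "no_envelope"        # nothing pre-validated: needs review
    if envelope.contains(requested):
        return "within_envelope"
    return "outside_envelope"       # must not be applied automatically
```

For example, `check_action("core_voltage_mv", 750.0)` falls within the envelope, while a request of 800.0 does not and would have to go back through the gate.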

This is the lifecycle version of the firmware–hardware handshake. Runtime action alone is not enough; it must remain governed as fleet evidence accumulates.

Realization in practice: Reopening a validation assumption

Consider a next-generation AI accelerator cluster that successfully cleared pre-silicon signoff, post-silicon validation, and package-level qualification. After several months of deployment, firmware on multiple independent racks begins executing repeated SerDes link retraining sequences. A standard facility log may classify these events as isolated thermal excursions or normal link maintenance.

A governed fleet learning system treats the events differently. It aggregates the retraining events across the fleet, normalizes timestamps, maps them against package lots and board configurations, and compares them with workload signatures, thermal maps, substrate data, and system operating conditions.
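
The aggregation step can be sketched as follows. This is a minimal illustration under stated assumptions: the event records, lot names, population sizes, and the 3x flagging ratio are hypothetical, and a real system would also normalize timestamps and join against workload and thermal data.

```python
from collections import Counter

# Hypothetical retraining events: (device_id, package_lot, board_revision).
events = [
    ("dev-001", "LOT-A", "rev2"),
    ("dev-002", "LOT-A", "rev2"),
    ("dev-003", "LOT-B", "rev2"),
    ("dev-001", "LOT-A", "rev2"),
    ("dev-004", "LOT-A", "rev3"),
]

# Hypothetical number of deployed devices built from each package lot.
population = {"LOT-A": 100, "LOT-B": 400}


def retrain_rate_by_lot(events, population):
    """Events per deployed device, grouped by package lot."""
    counts = Counter(lot for _, lot, _ in events)
    return {lot: counts.get(lot, 0) / n for lot, n in population.items()}


def flag_lots(events, population, ratio=3.0):
    """Return lots whose event rate exceeds `ratio` times the fleet-wide rate."""
    fleet_rate = len(events) / sum(population.values())
    rates = retrain_rate_by_lot(events, population)
    return sorted(lot for lot, r in rates.items() if r > ratio * fleet_rate)


rates = retrain_rate_by_lot(events, population)
flagged = flag_lots(events, population)
```

With these sample numbers, LOT-A is flagged because its per-device rate is well above the fleet-wide rate. The output is a recommendation for the gate to evaluate, not an action.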

The pattern becomes clear: the retraining events occur after localized multi-core workload bursts that generate a thermal gradient across a specific package/substrate population. This is no longer random operational noise. It’s a possible validation escape where real-world multi-physics interaction has violated an original design or package guardband.

Fleet learning generates the recommendation. And bounded gate authority evaluates the evidence package, checks admissibility, verifies causality, and may issue a Reopen outcome on the affected configuration milestone.

The system should not blindly mask the issue through continuous retraining. Instead, it can approve a bounded mitigation for the affected population while sending convergence-authoritative evidence back to validation, package engineering, firmware teams, and pre-silicon architecture groups.

That is the lifecycle loop. Field evidence does not simply become a log; it becomes governed input for the next design, package, validation, and firmware policy decision.

Closing the loop back to design and validation

The most important output of fleet learning is not only field mitigation; it’s lifecycle refinement. Mature fleet evidence should flow back into:

  • Pre-silicon design assumptions
  • Package constraints
  • System EM corridor models
  • PDN and CPAM assumptions
  • Firmware policies
  • Thermal guardbands
  • Qualification thresholds
  • Design for test (DFT) and observability planning
  • Manufacturing tolerances
  • Supplier and lot-level evidence models
  • Next-generation architecture decisions

This is how the silicon governance loop closes and the field becomes a governed evidence source for the next design cycle, but only if the evidence is admissible.

That requires the SEGA-AI stack, in which:

  • Test case generator (TCG) protects trust and admissibility
  • Convergence evidence maturity hierarchy (CEMH) defines evidence maturity
  • Fleet learning recommends lifecycle refinement
  • Bounded gate authority approves the decision

Without this structure, field telemetry remains operational logging. With this structure, field telemetry becomes lifecycle convergence evidence.

The SEGA-AI view

From a SEGA-AI perspective, fleet learning is not an uncontrolled feedback loop. It’s a governed lifecycle refinement system.

  • It does not replace engineering judgment.
  • It does not replace firmware teams.
  • It does not replace validation.
  • It does not replace failure analysis.
  • It does not independently close gates.

It connects runtime behavior to governed decision authority. That allows deployed systems to improve future realization decisions while preserving deterministic control. And that is the difference between learning from the fleet and being controlled by the fleet.

Closing the silicon governance loop

The semiconductor industry has moved beyond isolated design-time closure. In the era of hyperscale AI platforms, multi-die chiplets, HBM systems, advanced packages, and volatile workloads, no single signoff event can guarantee long-term physical convergence across thousands of deployed systems.

The answer is not unconstrained autonomous adaptation. The answer is governed lifecycle learning.

Fleet learning provides the analytical path to uncover systemic patterns, detect drift, and recommend refinement. Bounded gate authority provides the engineering boundary that determines whether those recommendations are mature, admissible, causally aligned, and safe enough to act upon.

Together, they close the silicon governance loop.

Dr. Moh Kolbehdari is senior director of IC/packaging at Socionext US.

Editor’s Note

This is Part 3 of the article series about the silicon governance framework. Part 1 explained why data movement alone cannot explain system behavior in modern AI chip designs. Part 2 described the firmware-hardware handshake in a silicon governance system.

The post How fleet learning works under bounded gate authority appeared first on EDN.

4mA-20mA to 0mA-20mA converter’s current mirror drives grounded load

Wed, 06/10/2026 - 15:00

The ubiquity of the 4mA-20mA current loop in analog process monitoring and control creates possibilities for peculiar designs of circuits for unusual accessory functions. Figure 1 shows an example. It does precision conversion of 4mA-20mA to 0mA-20mA. That’s useful for accommodating analog inputs that wouldn’t like a 4mA zero offset.

Wow the engineering world with your unique design: Design Ideas Submission Guide


Figure 1 This current conversion circuit’s function is defined by the following equation: Iout = (Iin·R1 – 1.24V)/R2 = 1.25(Iin – 4mA).

The core of the circuit is the Vin = Iin·R1 = 1.24V to 6.20V developed by the 4mA-20mA input working into R1 and sensed by the Vref input of Z1. The principle in play is discussed here.
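
The Figure 1 equation can be checked numerically. In the sketch below, the resistor values are not read from the schematic; they are inferred from the equation itself, on the assumption that 4mA into R1 develops exactly 1.24V and that the slope R1/R2 equals 1.25. That gives roughly 310Ω for R1 and 248Ω for R2.

```python
V_REF = 1.24      # volts; reference voltage from the Figure 1 equation
GAIN = 1.25       # slope of the transfer function, from the Figure 1 equation
I_ZERO = 4e-3     # amps; input current that should produce zero output

# Inferred values (assumptions, not taken from the schematic).
R1 = V_REF / I_ZERO   # about 310 ohms
R2 = R1 / GAIN        # about 248 ohms


def i_out(i_in):
    """Output current from the circuit form of the equation."""
    return (i_in * R1 - V_REF) / R2


def i_out_ideal(i_in):
    """Output current from the simplified form 1.25 * (Iin - 4mA)."""
    return GAIN * (i_in - I_ZERO)


for milliamps in (4, 12, 20):
    i_in = milliamps / 1000
    print(milliamps, i_out(i_in) * 1000, i_out_ideal(i_in) * 1000)
```

Both forms agree: inputs of 4mA, 12mA, and 20mA map to approximately 0mA, 10mA, and 20mA, up to floating-point rounding.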

A potentially annoying shortcoming of the Figure 1 design, however, is its current sink output that’s referred not to ground but to the V+ source node, which needs to be at least 8V. Figure 2 offers an accurate and straightforward fix: an active current mirror as described here. The maximum input overhead voltage is 8V.


Figure 2 This circuit adds an active current mirror to its predecessor to drive a grounded load.

Stephen Woodward’s relationship with EDN’s DI column goes back quite a long way. Over 200 submissions have been accepted since his first contribution back in 1974. They have included best Design Idea of the year in 1974 and 2001.

The post 4mA-20mA to 0mA-20mA converter’s current mirror drives grounded load appeared first on EDN.

Edge AI deployment made easy for system integrators

Tue, 06/09/2026 - 19:09

In 2025, Innodisk launched the “AI beyond the edge” initiative at a forum that also hosted Intel, Nvidia, and Qualcomm, which shared details of their latest developments in edge AI. But what does “AI beyond the edge” really mean?

Don Yu, special assistant to the GM at Innodisk, said that “AI beyond the edge” is about enabling systems that operate autonomously, remain connected, and scale across real-world environments. He also mentioned two complementary domains as part of this initiative.

First, industry AI—built for smart manufacturing, automation, transportation, healthcare, retail, and smart cities—enhances on-site responsiveness through real-time recognition, predictive maintenance, and intelligent workflow optimization.

Second, enterprise AI—designed for data centers, on-premise AI, and advanced models such as large language models (LLMs) and visual language models (VLMs)—supports secure, intelligent decision-making across corporate, financial, medical, and public sectors. “That allows small and mid-size businesses (SMBs) to have their own AI engines locally instead of relying on the cloud,” Yu said.

Despite all the promise, deploying edge AI has been a challenge so far. EDN asked Yu how these edge AI initiatives are faring and what Innodisk is doing to overcome the challenges of implementing edge AI effectively at scale.

Edge AI deployment challenges

Innodisk chairman Randy Chien acknowledges that the exponential rise of generative AI and LLMs has fundamentally changed the design equation at the edge. More specifically, as AI workloads grow in complexity, companies are facing increasing pressure in system integration, hardware-software coordination, and the ability to scale solutions across diverse deployment environments.

“Anticipating this shift early on, Innodisk has built on its strong hardware foundation by structuring its product portfolio into modular building blocks across memory, storage, camera modules, and a wide range of embedded peripherals,” Yu said. “On this foundation, the company has positioned itself as an AI architect, combining these building blocks to meet diverse industry requirements with tailored edge AI systems.”

So, edge AI developers can implement these solutions as individual modules or as fully integrated systems, depending on their application needs. Take the example of the APEX series of edge AI systems, which brings together key building blocks, including AI accelerators, DRAM modules, flash storage, industrial MIPI and GMSL camera modules, and embedded peripherals for networking and industrial I/O.

“The platform enables flexible system configuration based on specific use cases, while supporting customization to meet diverse deployment requirements,” Yu said.

Figure 1 Individual modules or fully integrated systems can be tailored according to edge AI application needs. Source: Innodisk

Yu added that Innodisk is heavily investing in firmware and software development to bolster its design ecosystem. Take vision-related AI, for instance, where Innodisk provides fully ported drivers for industrial camera modules, supporting both VLMs and computer-vision applications to streamline deployment and minimize integration friction.

Innodisk also provides specialized software toolkits to accelerate system integration. For example, it has introduced IQ Studio to support the development of Qualcomm-powered edge AI systems. IQ Studio is an open-source developer portal that provides essential board support packages (BSPs), reference code, and benchmarking tools.

How modular solutions aid system integrators

These modular solutions—segmented across five layers of compute, memory, storage, sensing and connectivity, and software—are aimed at addressing design challenges before the last mile of AI deployment in vertical markets. This cohesive system-level approach addresses common development challenges for system integrators and solution providers, enabling them to focus on developing their applications rather than managing integration.

Figure 2 Modular solutions handle integration complexity, which allows system integrators to focus on developing their applications. Source: Innodisk

Moreover, there is a wide range of pre-validated solutions that significantly shorten system integration development cycles. Case in point: the AI on Arm series of computer-on-modules (COMs) is designed to be deployment-ready. “They can be directly integrated into customer systems with minimal development effort,” Yu said. “Additionally, they can be paired with Innodisk carrier boards and peripherals to support different system configurations.”

Figure 3 COM modules can be paired with carrier boards and peripherals to support different system configurations. Source: Innodisk

These deployment-ready solutions provide system integrators with practical reference points and inspiration for application design when applied in real-world scenarios. Take the APEX-X200 edge AI platform, for instance, which Innodisk showcased at Nvidia GTC 2026. This on-device inference platform analyzes X-ray and CT images in real time, generating draft medical reports and clinical insights through AI-assisted healthcare workflows.

APEX-X200, powered by an Intel Core Ultra 9 processor, also integrates an Nvidia RTX PRO 6000 Blackwell Server Edition GPU with 24,064 CUDA cores and 752 Tensor cores. Furthermore, it supports up to 96 GB of industrial-grade DDR5 memory and a 1 TB PCIe Gen5 x4 NVMe SSD.

Innodisk has also developed perception systems for heavy machinery and large vehicles in collaboration with its subsidiary Aetina. It integrates the Nvidia Jetson AGX Orin platform with up to eight GMSL2 camera modules alongside capture cards and extenders that support cable lengths up to 30 meters.

Figure 4 The edge AI-based perception system facilitates surround-view stitching, blind-spot detection, and driver-monitoring functions. Source: Innodisk

These perception systems enable surround-view stitching, blind-spot detection, and driver-monitoring functions, supporting real-time environmental awareness and helping identify potential risks such as fatigue or distraction under complex operating conditions. “It’s also an example of a modular architecture that supports future system upgrades without requiring major redesign efforts,” Yu said.

Eyeing U.S. and Europe

Innodisk, headquartered in New Taipei City, Taiwan, has global ambitions with more than 1,000 field-proven edge AI deployments worldwide. In Europe and the United States, it’s operating in close collaboration with regional distributors and partners in edge AI segments such as industrial automation, healthcare, aviation, and professional workstations.

Innodisk considers industry events a key tool for bolstering its presence in these crucial markets. It has showcased its edge AI solutions at Nvidia GTC 2026 in the United States, ICE Barcelona in Spain, and Embedded World 2026 and CloudFest 2026 in Germany.

Next, to support global deployment requirements, the company ensures its products comply with regional regulations. Its edge AI solutions meet CE and UKCA requirements for Europe and the U.K. and FCC regulations for the United States.

Also, in Europe, where cybersecurity requirements have become increasingly mandatory, Innodisk attained IEC 62443-4-1 certification in late 2025, embedding security throughout the product development lifecycle rather than treating it as a separate feature. It’s critical because the EU Cyber Resilience Act (CRA) is expected to be fully enforced by 2027.

The post Edge AI deployment made easy for system integrators appeared first on EDN.

Derivative-controlled low pass filter, simplified

Tue, 06/09/2026 - 15:00

How to design a simpler filter (or filter-like circuit) with a varying time constant dependent on what kind of waveform is fed to it.

Discussions with some former coworkers have focused on how to design a filter or circuit with filter-like performance that has the characteristic of a slower time constant on increasing-signal waveforms and a faster time constant on decreasing-signal ones. Such a circuit was proposed in Reference 1, which made use of the Analog Devices AD534 chip.

Along with the “squirming baby” example in Reference 1, another example using such a filter might be a scale at a deli counter, filtering weight as a slice or two is added to the order. When weighing is complete and the slices are removed from the scale, the reading should conversely decrease quickly.

Could there be a different, simplified circuit that might find use in accomplishing the same effect? Thus this Design Idea.

Simplification using an op amp

One way to simplify is to use the same input voltage level as the output, which precludes requiring an input isolation circuit. See Figure 1 for an example.


Figure 1 This simplified derivative-controlled low pass filter has its output at V.

Starting with the circuit in Reference 1 as a foundation, the simplified circuit requires an R1C2 combination to act as the derivative function. The input signal requires a filter, with R3C1 as the filter time constant. The derivative signal should be wired to a transistor switch, Q1, a 2N2907A, which discharges C1 at a faster rate, set by R4C1. A noninverting amplifier, ¼ of an LM324N, isolates the derivative input from the transistor switch. This is accomplished by ensuring that the voltage across the Q1 emitter-to-base junction is zero at steady state, so that Q1 does not conduct.
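
The intended behavior, a slow time constant on rising signals and a fast one on falling signals, can be illustrated with a simple discrete-time model. This is a behavioral sketch only, not a simulation of the circuit: it switches time constants on the sign of the difference between input and output rather than modeling the R1C2 differentiator and Q1, and the `tau_rise` and `tau_fall` values are arbitrary stand-ins for R3C1 and R4C1.

```python
def asymmetric_lowpass(samples, dt, tau_rise, tau_fall, y0=0.0):
    """First-order low-pass filter with direction-dependent time constant.

    samples  : iterable of input values
    dt       : sample period in seconds
    tau_rise : time constant used while the output must rise to follow the input
    tau_fall : time constant used while the output must fall to follow the input
    y0       : initial output value
    """
    y = y0
    out = []
    for x in samples:
        tau = tau_rise if x > y else tau_fall
        # Backward-Euler update of dy/dt = (x - y) / tau; stable for any dt > 0.
        y += (x - y) * dt / (tau + dt)
        out.append(y)
    return out


DT = 0.01          # seconds per sample (arbitrary)
TAU_RISE = 1.0     # slow response to an increasing input (arbitrary)
TAU_FALL = 0.1     # fast response to a decreasing input (arbitrary)

# Step up from 0 to 1, then step down from 1 to 0, 50 samples each.
rising = asymmetric_lowpass([1.0] * 50, DT, TAU_RISE, TAU_FALL, y0=0.0)
falling = asymmetric_lowpass([0.0] * 50, DT, TAU_RISE, TAU_FALL, y0=1.0)
```

After the same number of samples, the rising response has covered less than half of the step, while the falling response has nearly settled, which is the asymmetry the circuit is meant to provide.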

Figures 2-4 show the actual circuit being tested, and the results.


Figure 2 The circuit in this Design Idea was breadboarded and lab-tested, not just simulated.


Figure 3 In this graph of test results, the red trace is the input, with the output at C1 in blue. Note that the output is at the same level as the input, but the time constants are different.


Figure 4 Conversely, in this graph of test results, the red trace is the output and the blue trace shows the derivative action.

Further simplification

Removing the op amp is possible if the emitter-to-base junction is biased below the cut-in voltage. Reference 2 has an extensive discussion on the subject, based on the Shockley diode equation. The emitter-base junction is the diode in question. There is a point where the forward bias current is quite low, assumed to be 1% of the maximum load current. The voltage at that point is considered to be the cut-in voltage; for silicon devices it is assumed to be 0.6V.

For this application, R1 is lowered to 500Ω, which results in a 0.238V difference across the forward-biased Q1 junction, below the cut-in voltage at steady state.


Figure 5 This schematic shows a further simplification of the previous circuit.


Figure 6 In this graph of test results for the further simplified version of the circuit, the red trace is again the input, with the output at C1 in blue.


Figure 7 Conversely, in this graph of test results for the further simplified version of the circuit, the red trace shows the voltage across R1, with the blue trace referencing the C1 voltage. Note the voltage difference in this case.

Conclusion

This circuit will not work for small changes in the input voltage, a topic which is discussed in Reference 1. The values used in these circuits are arbitrary; they can be scaled based on filtering requirements.

References

  1. Sheingold, Daniel H., Transducer Interfacing Handbook, Analog Devices, Inc., Norwood, MA., 1980.
  2. Millman, J.; Taub, H., Pulse, Digital, and Switching Waveforms, McGraw-Hill, New York, NY., 1965.

Robert Heider is a retired engineer with over 50 years’ experience with emphasis on the design of advanced process controls and process development.

The post Derivative-controlled low pass filter, simplified appeared first on EDN.

Apple’s question for the developer: Are you up for an AI do-over?

Tue, 06/09/2026 - 11:51

Take two, two years later. That’s the 2026 WWDC in a nutshell, at least for developers. And for consumers? If your Apple Watch is more than a few years old, it’s headed for retirement-and-replacement.

Ironically, albeit not atypically, Apple announced no new hardware at this year’s Worldwide Developers Conference (WWDC) keynote, even though the featured image for the event’s summary press release contained an assortment of it:

And also typically (of late, at least) and as-always disappointingly, the keynote was as-usual pre-recorded.

Which was particularly disappointing in this instance, as the company’s messaging would have benefitted greatly from the presence of live demos, regardless of whether (but especially if) they went off without a hitch. Why? In 2024, Apple made big promises regarding the AI-enhanced version of its Siri virtual assistant and the broader AI-enabled capabilities of its various coming-soon operating systems and application suites.

Two years and a $250 million class action lawsuit settlement later, the company’s trying again, this time in partnership with Google (who held its own developer event just a few weeks ago). I concur with TechCrunch that the demo videos seemed more genuine this time around, with real people interacting with real devices and doing real-life-reminiscent things. Still…pre-recorded.

It’s 2009 all over again

But Apple didn’t lead with AI…sorry, Apple Intelligence…this year. Instead, it focused first on the broader nips and tucks that upcoming (and in the first three cases, already available in developer beta form) 27-series operating systems for computers (just-christened MacOS “Golden Gate”), iOS, iPadOS, watchOS, visionOS and tvOS aspire to deliver above and beyond their generational precursors. All of which takes me back nearly two decades.

At the June 2009 WWDC, Apple unveiled Mac OS 10.6 “Snow Leopard”, which the company proudly trumpeted as having “zero new features” versus its two-years-earlier Mac OS 10.5 “Leopard” predecessor. Instead, Apple focused on, quoting from the Wikipedia entry, “improved performance, greater efficiency and the reduction of its overall memory footprint.” One key means of doing so (quite effectively, in my personal experience along with broader industry reputation) was to strip out legacy PowerPC CPU support. And one year and one O/S generation later, OS X Lion 10.7 also dropped the Rosetta emulation support that had enabled legacy PowerPC-compiled applications to continue to run on top of an Intel x86-centric operating system base.

Fast forward to today and the sense of déjà vu is strong. The last clutch of Intel-based systems (two of which I ironically own, as noted in my last-year’s WWDC coverage) are no longer supported in MacOS 27. And although Rosetta 2 emulation support for x86-compiled code is still baked in, I’d wager that (again like last time) it won’t remain there for long. More generally, all the new operating system versions focused notably on performance, stability and other improvements, such as Liquid Glass U/I tweaks.

The enemy of my enemy…

I still struggle a bit to wrap my head around the partnership between Apple and Google on both AI models and cloud services (the latter alongside NVIDIA, interestingly)…but only a bit. After all, as I noted in my recent Google I/O coverage, Google’s on quite a roll right now. Apple had previously worked with OpenAI to add ChatGPT support to Siri, with limited-at-best success as far as I can tell. And OpenAI’s made no secret of its aspirations to deliver Apple-competitive hardware, going so far as to partner with former Apple design chief Sir Jony Ive.

Yes, Google (Android and derivatives, including Wear OS, plus ChromeOS and the upcoming “Aluminum”) and Apple (iOS, iPadOS, watchOS, visionOS and tvOS) are market competitors, but so too are Microsoft (Windows) and Apple (MacOS). Microsoft is increasingly becoming a broad AI technology supplier in its own right. And then there’s Meta, still pushing VR, increasingly enthusiastic about smart glasses and rumored to be branching into other hardware. And Amazon, supposedly flirting with smartphones again. And…get my point?

While Apple (along with Apple fanboy sites) goes to great pains to position the Google arrangement as a partnership, I strongly suspect that in reality, Google-developed models were distilled (at most, and maybe not even that) to come up with Apple architecture-optimized versions, leveraging unique acceleration coprocessor capabilities, for example, or using data formats (and sizes of those formats) that inference-execute optimally on Apple Silicon.

Beyond that, along with (I suppose) a dedicated Siri AI app this time around, it all sorta feels like two years ago all over again, this time leveraging a robust trained-model foundation. Which isn’t a bad thing, mind you; quite the contrary. And Apple’s not unrecoverably late, although if the company had kept waffling for another year or few, I might be saying something different. It’s all just…well…meh.

Obsolescence by design strikes again

Switching to hardware (still mentioned, albeit not newly introduced), and beyond the aforementioned Intel-based computer support demise, the messaging was something of a mixed bag. The company is already beginning to feature-set differentiate between various Apple Silicon system generations, although it hasn’t (yet, at least) started culling any of them from the supported-at-all list. The same goes for iPhones.

Apple has apparently decided that in the midst of a shaky economy, telling folks that they need to go buy new iPhones isn’t a particularly wise move. Similarly, although not exactly so, many (but not all) iPads that run iPadOS 26 are upgradeable to iPadOS 27, too, including, I’m happy to say, the four fondleslabs in the Dipert household.

And what about smart watches? The story here is unfortunately far more ugly. Apple has apparently decided that in the midst of a shaky economy, it’s still going to be able to (or at least try to) tell lots of folks that they need to go buy new Apple Watches. Including my wife, whose first-generation Watch Ultra has just gotten knifed. I guess I now know what I’ll be buying her for her birthday in a few months…

I’ve only hit here what I thought were the high points; plenty more announcements and tidbits also got covered elsewhere. But what do you think about what I’ve focused on in this piece? As always, let me know your thoughts in the comments!

Brian Dipert is the associate editor, as well as a contributing editor, at EDN.

Related Content

The post Apple’s question for the developer: Are you up for an AI do-over? appeared first on EDN.

Would custom memory ease your SoC design?

Tue, 06/09/2026 - 07:35

Memory customization is not always a top priority when a design team plans a new system-on-chip (SoC) project. But often it should be.

This may not be an obvious statement. Granted, SRAM claims a lot of area on most SoCs. The speed and power consumption of SRAM arrays can affect the overall chip performance and energy efficiency.

But today’s memory compilers are flexible tools that support a variety of cell designs. At Faraday, for example, the 14FFC compiler offers eight variants, tuned to diverse needs, ranging from high-density to high-performance to ultra-low-power. So why consider custom memory?

One answer to the above question is the need for an unusual word or bit length. Relatively simple customization can produce the exact SRAM configuration required for a specific instance, not just the compiler’s closest approximation.

Similarly, there are times during floorplanning—or, more concerningly, during timing closure—when giving an SRAM instance an unusual aspect ratio can ease a difficult situation. This may be a more complex customization, requiring changes to array layout and routing, multiplexers, drivers, and cell designs.

Recently, we designed a multi-Mbit SRAM array with an aspect ratio of nearly 1:19. This architecture suits frame-buffer applications in display processing: the unusual aspect ratio accommodates wide I/O widths and specialized non-2^n column multiplexing requirements.

Figure 1 This special-aspect-ratio memory measures x = 1775 µm by y = 95 µm; giving an SRAM instance an unusual aspect ratio can ease a difficult situation. Source: Faraday Technology

Another situation involves yield and reliability. Compilers typically only generate a specific number of redundant columns of bit cells. In the event of a bit failure, the array can disconnect the offending cell’s column and replace it with a redundant column if one is available. This technique is effective if failures only occur in one or a few columns.

But for various reasons, some designs require more protection: redundant columns and redundant rows. The additional cells, routing, and logic to implement this expanded redundancy can be achieved by customizing the array.

Figure 2 This memory offers redundant rows and columns for additional protection. Source: Faraday Technology
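To make the difference between column-only and row-plus-column redundancy concrete, here is a minimal sketch of a repair planner. It is an illustration only, not Faraday's repair logic: the `plan_repair` function, the `RepairPlan` structure, and the greedy strategy are all assumptions, and a greedy pass is a simple heuristic rather than an optimal allocator.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class RepairPlan:
    """Hypothetical record of which spares replace which failing lines."""
    column_map: dict = field(default_factory=dict)  # failing column -> spare index
    row_map: dict = field(default_factory=dict)     # failing row -> spare index
    repairable: bool = True


def plan_repair(failing_cells, spare_columns, spare_rows):
    """Greedy heuristic: retire whichever row or column covers the most failures."""
    remaining = set(failing_cells)  # set of (row, column) pairs
    plan = RepairPlan()
    while remaining:
        col_counts = Counter(c for _, c in remaining)
        row_counts = Counter(r for r, _ in remaining)
        best_col, col_hits = col_counts.most_common(1)[0]
        best_row, row_hits = row_counts.most_common(1)[0]
        cols_left = spare_columns - len(plan.column_map)
        rows_left = spare_rows - len(plan.row_map)
        # Prefer a spare column unless a spare row would cover more failures.
        use_col = cols_left > 0 and (rows_left == 0 or col_hits >= row_hits)
        if use_col:
            plan.column_map[best_col] = len(plan.column_map)
            remaining = {(r, c) for r, c in remaining if c != best_col}
        elif rows_left > 0:
            plan.row_map[best_row] = len(plan.row_map)
            remaining = {(r, c) for r, c in remaining if r != best_row}
        else:
            plan.repairable = False  # spares exhausted with failures left over
            break
    return plan
```

With three failures spread along one row, a single spare column cannot repair the array, while adding one spare row can: `plan_repair({(2, 0), (2, 4), (2, 9)}, spare_columns=1, spare_rows=1)` maps row 2 to the spare row and leaves the spare column unused.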

An automotive case study

Another example of memory customization comes from a recent SoC design we participated in. The project was for a mission-critical automotive SoC. Our customer specified an Automotive Grade 1 (AG1) operating ambient temperature range of -40 to +125 °C.

Within that range, the customer required an extended operating life, as is customary for automotive electronics. And the chip would require ISO 26262 functional safety certification, which would require enhanced failure analysis and documentation during design.

This project illustrates the level of detail sometimes needed in memory customization. But it also shows the extent of additional support—analysis, documentation, design assistance, and test services—that a custom memory design can entail.

We determined that existing tools could produce an array that would operate reliably over the AG1 temperature range in the short term. But to achieve the required operating life, we had to address aging issues in the circuitry.

First, there was the issue of high-current signals on the array’s word lines and bit lines. The customer was rightly concerned that, over the operating life and at elevated temperatures, the high currents could cause sufficient electromigration to trigger chip failure. So, we redesigned the line drivers and the array, preserving array performance, signal integrity, and line-direction management while reducing the risk of electromigration.

Bias temperature instability (BTI) was another threat to chip life: time and elevated temperature cause a gradual but significant drift in MOSFET threshold voltages. Unfortunately, NMOS and PMOS devices age differently under BTI. So very gradually, the timing of rising and falling signal edges can diverge. Eventually, this can lead to circuit failure at points where the relative arrival times of two signals, one positive-going and one negative-going, are critical. Accordingly, we altered the memory design.
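The edge-divergence mechanism can be illustrated with a rough numerical sketch. It assumes two textbook approximations that the article does not itself cite: a power-law threshold drift and an alpha-power-law gate delay. Every coefficient below (supply voltage, initial threshold, drift prefactors, exponents) is a hypothetical placeholder, not a value from the project.

```python
def vth_shift(t_hours, prefactor, exponent):
    """Power-law BTI drift model (assumed): delta_Vth = prefactor * t ** exponent."""
    return prefactor * t_hours ** exponent


def edge_delay(vdd, vth, alpha=1.3, k=1.0):
    """Alpha-power-law delay approximation (assumed), in arbitrary units."""
    return k * vdd / (vdd - vth) ** alpha


def edge_skew(t_hours):
    """Rising-edge delay minus falling-edge delay after t_hours of stress."""
    vdd = 0.8    # hypothetical supply voltage, volts
    vth0 = 0.30  # hypothetical fresh threshold magnitude, volts
    # Assumption: the PMOS device drifts twice as fast as the NMOS device.
    vth_n = vth0 + vth_shift(t_hours, prefactor=0.002, exponent=0.2)
    vth_p = vth0 + vth_shift(t_hours, prefactor=0.004, exponent=0.2)
    falling = edge_delay(vdd, vth_n)  # pull-down path through NMOS
    rising = edge_delay(vdd, vth_p)   # pull-up path through PMOS
    return rising - falling


if __name__ == "__main__":
    for years in (0, 1, 10):
        print(years, "years:", edge_skew(years * 8760))
```

Under these assumptions the skew starts at zero and grows with stress time, which is the kind of slow divergence that erodes the margin at a timing-critical pair of edges.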

We further inspected the remaining control logic for the risk of developing race conditions over time and adjusted timing margins to account for eventual threshold-voltage drift. The result was a significant improvement in SRAM’s expected operating life.

Functional safety

Certification under ISO 26262 was another requirement. This comprehensive standard delves deep into the design process to ensure that chip failure modes are identified, traced to their root causes, and addressed. This process extends to IP used in the design and to the original circuitry. So, the documentation required for ISO 26262 certification was a deliverable for the custom memory team.

Two primary documents are required: a Design Failure Mode and Effects Analysis (DFMEA) and a safety manual. The former, as its name suggests, is an exhaustive list of the ways the IP could cause an error, the possible causes of those failure modes, and the remedial actions taken. The safety manual, in contrast, is an instruction manual for the chip and system designers who will integrate the IP into the overall design.

One entry in the DFMEA might describe a failure in which a bit cell flips, corrupting data in the SRAM. Under this heading, the document would list potential causes of a flipped bit, including design-rule violations in the cell array, radiation upset, and aging. For each cause, there would be a list of controls to prevent it or detect it, an assessment of the remaining failure risk, and recommendations for further action.
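The shape of such an entry can be sketched as a small data structure. The failure mode and the three causes come from the example above; the class names, field names, controls, risk ratings, and recommendations are hypothetical placeholders, not content from an actual DFMEA.

```python
from dataclasses import dataclass, field


@dataclass
class Cause:
    description: str
    controls: list        # measures that prevent or detect this cause
    residual_risk: str    # hypothetical scale: "low", "medium", "high"
    recommendation: str   # further action, if any


@dataclass
class DfmeaEntry:
    failure_mode: str
    effect: str
    causes: list = field(default_factory=list)

    def open_items(self):
        """Causes whose residual risk still calls for follow-up."""
        return [c for c in self.causes if c.residual_risk != "low"]


# Illustrative entry; every control and rating below is an assumption.
bit_flip = DfmeaEntry(
    failure_mode="Bit cell flips",
    effect="Corrupted data in the SRAM",
    causes=[
        Cause("Design-rule violation in the cell array",
              ["Physical verification sign-off"], "low", "None"),
        Cause("Radiation upset",
              ["Error detection at the array interface"], "medium",
              "Review the protection scheme with the integrator"),
        Cause("Aging",
              ["Aging-aware timing margins"], "medium",
              "Re-check margins at end-of-life corners"),
    ],
)
```

Calling `bit_flip.open_items()` on this sample returns the two causes rated above "low", which is the sort of list a reviewer would track to closure.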

The safety manual tells IP integrators and system developers how to use the IP without violating the conditions for which it was designed. Directions might include, for instance, input signal and supply voltage ranges, noise limits, substrate noise and temperature limits, output loading specifications, and maximum duty cycle limits.

Why custom memory design?

As these examples illustrate, custom memory design can adapt an array exactly to functional, timing, or layout requirements of a particular SoC. It can also produce arrays for demanding performance, environmental, or reliability requirements.

But seeing a customer through to a finished SoC requires far more than just providing the design files for a custom SRAM array. The design partner should be ready to assist the SoC design team with integration, provide verification and test support, and thoroughly document the characteristics and requirements of the new SRAM design.

In addition, the partner should be able to work intimately with the SoC foundry to ensure yield, and with the test vendor to ensure adequate test coverage for the new array. In many cases, it’s an advantage for the partner to have in-house testing capability.

Roger Chen is deputy division manager for memory IP development at Faraday Technology.

Related Content

The post Would custom memory ease your SoC design? appeared first on EDN.

Not smart, but solar: Analyzing another thermo-plus-hygrometer

Mon, 06/08/2026 - 15:00

Connectivity is all well and good…well, sort of, as it invariably comes with a price, literally and/or figuratively. Simple’s sometimes best, all things considered, and ambient-light power’s also nice.

When you want to monitor and adjust the internal humidity (and temperature, while you’re at it) of your residence or other facility, a “smart” connected hygrometer such as the one I tore down last month is convenient, since you can check both the measurements-of-the-moment and longer-term legacy trends from anywhere (even when you’re away) using your mobile device. A “smart” hygrometer can even alert you when those measurements stray beyond predefined boundary conditions. And if it includes a built-in display, you can keep your smartphone stowed away and still see the data.

All that connectivity and integrated intelligence comes with a bill-of-materials cost adder, however. And there’s always also the latent (or not) potential for hackers to gain access to that same data stream. While you might not care if someone halfway around the world (or down the street, for that matter) knows your home’s humidity and temperature, you’ll undoubtedly care a lot more if that same “smart” hygrometer ends up being a penetration “vector” for a broader attack, revealing your location and Wi-Fi network login details, for example, along with providing strangers with access to more privacy-violating LAN devices such as security cameras.

Acceptable = respectable

As such, a non-connected sensor is a credible (and sometimes the preferable) alternative. At the beginning of April, I saw a two-pack of BaldrTherm 2.2” solar-powered digital thermometer and hygrometers marked down to $9.99 at Amazon and, curious to try out (and tear down) such a device myself, pressed “purchase”.

I’ve subsequently seen the same two-pack listed there for as low as $8.99, exemplifying a broader BaldrTherm promotion that I’m guessing is motivated by a product line transition combo of redesign and migration to larger, more visible data-rich, 3.2” display devices:

with in-progress awkward consequences:

And to be clear, the company offers plenty of “connected” product variants, too. But today we’ll dive inside a fully standalone-operation offering, complete with a solar cell power option that’s more broadly photon-source agnostic (albeit presumably still visible light spectrum-centric).

Since I know how much you all love conceptual teardown “stock” images, I’ll start with one of ‘em:

And now for our actual patient, as usual beginning with some outer box shots, also as-usual accompanied by a 0.75″ (19.1 mm) diameter U.S. penny for size comparison purposes:

Flip open either of the latter two flaps:

and inside you’ll find two slips o’literature (the “user manual”, such as it scantly is, can be accessed in PDF form here):

and two sleeve-swathed examples of today’s teardown victim:

Diminutive in size and price

Here’s the now-“naked” device from various perspectives. Note the transparent piece of plastic (which BaldrTherm refers to as an “insulation sheet”) sticking out one side, which keeps the battery inside from prematurely draining while sitting on store shelves pre-purchase, until removed by the buyer-now-owner (and whose very presence was initially confusing to me, as I’d assumed the energy storage cell in the interior was solar-rechargeable; keep reading).

In spite of the battery still being disconnected, and after a brief delay after initial exposure to my home office’s overhead lighting:

the display came on and the device started working:

I was initially surprised by this unexpected functional transition, until I pondered and realized the underlying reason why, which the user manual also spells out:

Time to get inside. You may have already noticed in one of the earlier overview shots the two coin edge-inviting slots (one of them doing double-duty for the “insulation sheet”) on one side.

Had I thought to grab the penny I had handy, they might have sufficed. As it was, the flexible tip of the “spunger” I was trying to use made it ineffective, so much so that I peeled off the backside sticker to see if I could find any screw heads underneath it. Nope:

Switching to a flat-head screwdriver eventually accomplished my objective, however:

Hot and (not) heavy

Here’s where things started getting interesting and, in retrospect, amusing. I happened to notice that, presumably during the initial disassembly process, the spring terminal at the anode (“negative”) end of the AAA battery inside had become dislodged.

Normally, such batteries’ cases have a thin plastic outer insulating layer that prevents short-circuits with the cathode directly below it:

Not in this case (bad pun intended), however, or maybe it got scratched during disassembly, too. Because when I grabbed the sides of the battery to remove it, my fingertips got scorched. I quickly grabbed the aforementioned flat-head screwdriver and flipped the battery out of the chassis that way instead.

While I waited for it to cool, I carefully rolled it around and learned that it was a non-rechargeable conventional alkaline cell, instead.

In retrospect, including not only a rechargeable battery but also the necessary recharging circuitry in the design would have ballooned the bill-of-materials cost, and I later noticed that the documentation made it clear that the battery was not to be replaced, apparently if for no other reason than to preclude owner burns and other potential mishaps.

If so, though, then why the tempting coin-shaped slots on one side? Inquiring minds want to know. Surprisingly, the cell still held a meaningful modicum of charge; I’d apparently been sufficiently speedy in noticing and rectifying the short-circuit circumstances:

And the device still worked, both with the battery removed:

and with it temporarily reinstalled once safe to touch again.

Internal details

Onward. The solar cell is tenuously held in place with a single piece of tape on one side and the case sides on the other.

The PCB to which it’s attached is conversely more firmly ensconced by two screws.

You know what comes next:

We have a liftoff:

Now for the other, more circuitry-meaningful front side:

Flipping the LCD over reveals its elastomeric connector on one end, which normally presses up against electrical contacts on the PCB itself:

This is one rugged little device; pressing the two halves back together with my fingers and exposing the solar cell to light reignites the display and broader sensing-and-reporting capabilities (albeit with the measured temperature presumably inflated by my body proximity).

Here’s a closeup of the PCB frontside:

showing the elastomer-mating contacts at bottom, a piece of insulating tape at upper left and normally between the LCD backside and a 220-µF capacitor first glimpsed in the assembly rear-view images I shared earlier:

and at upper right, and left-to-right, the humidity and temperature sensors. Underneath the identification-blocking black epoxy blob in the center is presumably the SoC.

Capacitor and missing-battery buffers

In closing, after putting everything back together, the device still worked, after a brief wakeup delay and initially for only a short and cyclical timeframe.

After which, functionality eventually stabilized as long as sufficient light remained available.

Specifically, I’m guessing, commensurate with the fact that there’s still no battery (re)installed. What’s the relationship here? It has to do, I think, with the core purpose of that previously noted capacitor. Remember my “backup batteries and supercaps” piece from last month? This is effectively the supercapacitor, intended to smooth out transient ambient illumination variability-induced impermanence in the solar cell’s output.

I’m guessing that the capacitor is taking a few system-reboot cycles to get to full stored charge capacity, particularly given that there’s (abnormally, versus the normal configuration) no battery installed to alternatively supply the system with the necessary electrons. Agree or disagree, readers? As always, please let me know your thoughts on this and/or anything else that caught your fancy in the comments!

Brian Dipert is the associate editor, as well as a contributing editor, at EDN.

Related Content

The post Not smart, but solar: Analyzing another thermo-plus-hygrometer appeared first on EDN.

Radiosondes: Disposable guardians of the sky

Mon, 06/08/2026 - 10:09

Fifteen miles above you, a small styrofoam box is shrieking into the void. Its voice is binary—relentlessly transmitting temperature, pressure, and wind speed from the freezing stratosphere. In two hours, it will be gone, torn apart by the very atmosphere it was sent to measure.

This is the radiosonde’s hidden existence: the most successful yet expendable Internet of Things (IoT) device ever launched.

From balloons to big data

Radiosondes are the unsung workhorses of atmospheric science. First launched in the 1930s, these lightweight sensor packages ride weather balloons into the upper atmosphere, relaying streams of temperature, pressure, and humidity data that form the backbone of modern weather forecasting.

Every day, hundreds are released worldwide, their short lives fueling the long-range models that guide aviation, agriculture, and disaster preparedness. Though each unit is designed to perish after a single flight, the collective impact of radiosondes is enduring—an invisible infrastructure that keeps our understanding of the sky precise and predictive.

Vehicle vs. instrument: Understanding the weather balloon system

While people often use the terms interchangeably, a weather balloon and a radiosonde are distinct components of a single flight system. The weather balloon is an expendable transport vehicle: a large latex sphere filled with hydrogen or helium, designed to provide the lift necessary to reach the stratosphere.

In contrast, the radiosonde is the scientific payload: a small, battery-operated instrument package tethered below the balloon. While the balloon’s only job is to climb until it bursts, the radiosonde performs the actual work of measuring temperature, humidity, and pressure and then transmitting that data via radio waves to meteorologists on the ground in real time.

Figure 1 A sonde balloon and a radiosonde facilitate upper-air observations for numerical weather prediction models. Source: Azista Aerospace

The science of atmospheric sounding: How radiosondes work

A radiosonde primarily tracks pressure, temperature, and humidity using sensitive electronic sensors. While these provide the “ingredients” of the air, the device also tracks wind speed and direction by monitoring its own movement via GPS; as the balloon drifts, its change in position reveals exactly how the wind is blowing at different altitudes.
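As a minimal sketch of that idea, the snippet below derives wind speed and direction from two consecutive GPS fixes. The function name and argument layout are assumptions made for illustration; it uses a flat-Earth approximation that is reasonable over the few seconds between fixes and ignores the balloon's own pendulum motion, which real processing would filter out.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def wind_from_fixes(lat1, lon1, t1, lat2, lon2, t2):
    """Return (speed in m/s, direction the wind blows FROM in degrees).

    Latitudes and longitudes are in degrees, times in seconds.
    """
    dt = t2 - t1
    if dt <= 0:
        raise ValueError("fixes must be in increasing time order")
    mean_lat = math.radians((lat1 + lat2) / 2.0)
    # Horizontal displacement of the balloon between the two fixes.
    east_m = math.radians(lon2 - lon1) * EARTH_RADIUS_M * math.cos(mean_lat)
    north_m = math.radians(lat2 - lat1) * EARTH_RADIUS_M
    u = east_m / dt   # eastward velocity component
    v = north_m / dt  # northward velocity component
    speed = math.hypot(u, v)
    # Meteorological convention: report where the wind comes from.
    direction_from = (math.degrees(math.atan2(u, v)) + 180.0) % 360.0
    return speed, direction_from
```

A balloon drifting due east therefore reports a wind direction of 270 degrees, that is, a westerly wind.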

Together, these measurements allow meteorologists to build a complete vertical profile of the atmosphere—from the ground all the way up to the stratosphere. Furthermore, these variables are used to calculate geopotential height, which determines the precise altitude of pressure levels used to map global weather patterns.
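One common way to turn pressure and temperature into height is the hypsometric equation, which gives the thickness of the layer between two pressure levels. The sketch below is a simplification under stated assumptions: it uses the mean of the two level temperatures in place of the layer-mean virtual temperature, so the effect of humidity is ignored, and the function names are hypothetical.

```python
import math

R_DRY_AIR = 287.05  # specific gas constant for dry air, J/(kg K)
G0 = 9.80665        # standard gravity, m/s^2


def layer_thickness_m(p_bottom_hpa, p_top_hpa, mean_temp_k):
    """Hypsometric equation: thickness between two pressure levels."""
    return (R_DRY_AIR * mean_temp_k / G0) * math.log(p_bottom_hpa / p_top_hpa)


def heights_from_sounding(levels, surface_height_m=0.0):
    """Accumulate heights for (pressure_hpa, temperature_k) pairs, surface first."""
    heights = [surface_height_m]
    for (p_lo, t_lo), (p_hi, t_hi) in zip(levels, levels[1:]):
        mean_t = (t_lo + t_hi) / 2.0  # simplification: no humidity correction
        heights.append(heights[-1] + layer_thickness_m(p_lo, p_hi, mean_t))
    return heights
```

For example, `heights_from_sounding([(1000.0, 288.0), (850.0, 280.0), (500.0, 252.0)])` returns one height per level, starting from the surface value.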

Figure 2 The balloon-borne DFM-17 radiosonde provides atmospheric data for meteorological sounding. Source: graw

In essence, a radiosonde is a portable weather station integrated with a radio transmitter. Suspended from a rubber or latex balloon, the device ascends deep into the stratosphere to capture high-altitude data, transmitting real-time measurements of temperature, pressure, and humidity to a receiving station. The maximum altitude is determined by the diameter and thickness of the balloon.

By tracking the unit’s trajectory via GPS, meteorologists also map the strength and direction of winds aloft, creating a comprehensive vertical profile of the atmosphere. The flight concludes when the balloon reaches its expansion limit and bursts, triggering a small parachute to slow the radiosonde’s descent. While many units land in inaccessible areas, others are recovered by the public and returned for refurbishment, closing the loop on a single atmospheric mission.

Radiosonde system: Vertical layers from balloon to ground station

A radiosonde system is organized in vertical layers, beginning with the sounding balloon, also known as the sonde balloon, which ascends into the upper atmosphere carrying the payload. Suspended beneath is the radiosonde unit, integrating a glass bead thermistor for precise temperature measurement, a capacitive humidity sensor to monitor moisture levels, and a GPS receiver to provide accurate position, altitude, and wind data.

These measurements are transmitted through the radiosonde transmitter to a ground-based receiver and processing system, where the data is decoded and analyzed. This layered architecture—from balloon to ground station—creates a continuous vertical profile of atmospheric conditions, enabling reliable weather forecasting, climate monitoring, and deeper research into atmospheric dynamics.

Beyond the core radiosonde unit, several design enhancements improve measurement accuracy and reliability. The capacitive humidity sensor is equipped with a miniature heater element to prevent condensation and ensure stable readings in saturated conditions. The glass bead thermistor used for air temperature measurement is often treated with a hydrophobic coating, reducing the impact of water droplets and improving response time in cloud environments.

Many radiosondes also include an optional barometric pressure sensor, adding direct pressure measurements to complement GPS-derived altitude data. These refinements—heater stabilization, protective coatings, and auxiliary pressure sensing—extend the robustness of the radiosonde system, ensuring dependable atmospheric profiles even in challenging weather regimes.

Figure 3 The Vaisala Radiosonde RS41-SGP features a specialized chassis that integrates a high-precision pressure sensor into its compact design, ensuring robust and accurate atmospheric profiling even in GNSS-challenged environments. Source: Vaisala

Radio subsystem: Transmission and data handling

The radio subsystem of a radiosonde is engineered for efficient, narrow-band communication between the airborne unit and the ground station.

Modern designs support programmable frequencies and channel selection, allowing flexible operation across different meteorological networks. Transmission parameters include controlled bandwidth allocation, adjustable transmitter power, and defined coverage ranges to ensure reliable signal reception over long ascents. Data is typically modulated using Gaussian frequency-shift keying (GFSK), balancing spectral efficiency with robustness against noise.

The downlink stream carries structured data bits at a specified sampling rate, enabling continuous atmospheric profiling. For pre-launch verification, many systems integrate near field communication (NFC) capability, allowing quick ground checks of sensor calibration and transmitter health. Together, these radio features—programmable channels, efficient modulation, and diagnostic NFC—form the backbone of dependable data delivery from balloon to ground station.

Here is a side note regarding AFSK vs. GFSK. Earlier radiosonde systems often relied on audio frequency-shift keying (AFSK), a simple scheme that encodes data by alternating between two audio tones. While easy to implement, AFSK suffers from poor spectral efficiency and limited robustness in noisy RF environments.

So, modern designs have largely transitioned to GFSK, which applies Gaussian filtering to smooth frequency shifts. This reduces bandwidth usage, minimizes adjacent-channel interference, and improves reliability when multiple sondes are launched simultaneously. In practice, GFSK delivers cleaner signals and higher data integrity, making it the preferred modulation method for today’s radiosonde telemetry.
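To show what "Gaussian filtering to smooth frequency shifts" means in practice, here is a minimal baseband GFSK sketch in plain Python. The bandwidth-time product, samples per symbol, and modulation index are hypothetical defaults chosen for illustration; no particular radiosonde's parameters are implied.

```python
import math


def gaussian_taps(bt, samples_per_symbol, span_symbols=4):
    """Gaussian pulse-shaping taps, normalized so they sum to 1."""
    sigma = math.sqrt(math.log(2.0)) / (2.0 * math.pi * bt)  # in symbol periods
    half = span_symbols * samples_per_symbol // 2
    taps = [math.exp(-0.5 * ((n / samples_per_symbol) / sigma) ** 2)
            for n in range(-half, half + 1)]
    total = sum(taps)
    return [t / total for t in taps]


def gfsk_modulate(bits, bt=0.5, samples_per_symbol=8, mod_index=0.5):
    """Return (smoothed frequency track, complex baseband samples)."""
    # Map bits to +1/-1 and hold each level for one symbol period.
    nrz = [1.0 if b else -1.0 for b in bits for _ in range(samples_per_symbol)]
    taps = gaussian_taps(bt, samples_per_symbol)
    half = len(taps) // 2
    freq = []
    for i in range(len(nrz)):
        acc = 0.0
        for k, tap in enumerate(taps):
            j = min(max(i + k - half, 0), len(nrz) - 1)  # hold edge values
            acc += tap * nrz[j]
        freq.append(acc)
    # Integrate frequency into phase; a full symbol at +/-1 advances pi * mod_index.
    step = math.pi * mod_index / samples_per_symbol
    phase = 0.0
    iq = []
    for f in freq:
        phase += step * f
        iq.append(complex(math.cos(phase), math.sin(phase)))
    return freq, iq
```

Plain FSK would jump straight between the two frequency values at each bit boundary; here the Gaussian taps turn each jump into a gradual transition while the signal amplitude stays constant, which is where the narrower spectrum comes from.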

Figure 4 Modern pocket-sized radiosondes, such as the Windsond S2, capture real-time weather profiles for immediate analysis. Source: Sparv Embedded

Telemetry and ground receiver

While the airborne unit handles transmission, the ground receiver ensures accurate acquisition, synchronization, and validation of the telemetry stream. Selective filtering and error-detection routines safeguard data integrity even under weak-signal conditions, while multi-channel capability allows simultaneous monitoring of several sondes during coordinated launches. Once captured, the telemetry is processed through digital signal blocks that reconstruct temperature, humidity, pressure, and positional data into usable atmospheric profiles.
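A simple form of the error-detection step can be sketched with a cyclic redundancy check. The frame layout assumed here (payload followed by a two-byte big-endian CRC) is hypothetical; real sondes use vendor-specific framing and error-control schemes that are not described in this article.

```python
def crc16_ccitt_false(data: bytes) -> int:
    """CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def build_frame(payload: bytes) -> bytes:
    """Append the CRC to a payload (hypothetical frame layout)."""
    return payload + crc16_ccitt_false(payload).to_bytes(2, "big")


def frame_is_valid(frame: bytes) -> bool:
    """Recompute the CRC over the payload and compare with the received value."""
    if len(frame) < 3:
        return False
    payload, received = frame[:-2], int.from_bytes(frame[-2:], "big")
    return crc16_ccitt_false(payload) == received
```

A receiver built this way would discard any frame for which `frame_is_valid` returns `False` rather than pass a corrupted temperature or position value downstream.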

Modern systems further enhance reliability with multi-GNSS technology, leveraging multiple satellite constellations to improve positional accuracy and wind profiling. Coupled with real-time visualization interfaces, operators can track balloon ascent, sensor health, and data quality throughout the flight. By combining robust acquisition with intelligent decoding, the receiver transforms radiosonde measurements into actionable meteorological information for forecasting systems.

External payloads and research extensions

Beyond standard meteorological instrumentation, radiosondes can be adapted to carry external payloads for specialized research. A common example is the ozone sonde, which measures ozone concentration profiles using electrochemical sensors to support atmospheric chemistry studies.

Other payloads may include aerosol samplers, radiation detectors, or custom research modules, depending on mission objectives. These add-on packages are typically integrated beneath the radiosonde unit, sharing the balloon lift and telemetry link while operating within defined weight and power budgets.

By accommodating external payloads, radiosonde platforms extend their role from routine weather monitoring to flexible airborne laboratories, enabling targeted investigations into atmospheric composition, pollution transport, and climate dynamics.

High-altitude scavenger hunt

Every day, thousands of radiosondes drift back to Earth, largely unnoticed by the world below. However, with a modest receiver and a bit of technical curiosity, these silent travelers become the centerpiece of a high-tech scavenger hunt.

Radiosonde hunting, also known as radiosonde tracking, is a unique hobby that bridges the gap between radio engineering, software-defined radio (SDR), and outdoor exploration. By leveraging specialized hardware and open-source software, enthusiasts can intercept live telemetry, decode atmospheric data in real time, and pinpoint a sonde’s landing site for recovery.

Radiosondes as tools and inspiration

Radiosondes have proven indispensable across a wide application range—from core meteorology and climate science to agricultural forecasting, where vertical profiles of humidity, temperature, and wind inform crop management and irrigation planning. Their adaptability extends further through external payloads such as ozone sondes, and even specialized launch techniques like double-balloon configurations, which extend flight duration and altitude coverage for advanced research missions.

Yet radiosondes are more than just instruments of record; they are also objects of curiosity and experimentation. Around the world, enthusiasts collect spent sondes, hack their electronics, and repurpose them for creative experiments, turning routine weather balloons into platforms for learning and innovation. This dual identity—precision tool for science and playground for exploration—underscores why radiosondes continue to inspire both professionals and hobbyists alike.

Whether you are a researcher, a student, or a curious tinkerer, radiosondes invite you to explore the atmosphere, experiment with technology, and contribute to the collective understanding of our dynamic skies.

T. K. Hareendran is a self-taught electronics enthusiast with a strong passion for innovative circuit design and hands-on technology. He develops both experimental and practical electronic projects, documenting and sharing his work to support fellow tinkerers and learners. Beyond the workbench, he dedicates time to technical writing and hardware evaluations to contribute meaningfully to the maker community.

The post Radiosondes: Disposable guardians of the sky appeared first on EDN.

The RF-ready GaN-on-silicon with lower parasitic losses

Fri, 06/05/2026 - 17:10

A new technology addresses a key performance barrier limiting the use of GaN-on-silicon semiconductors in mainstream RF applications. According to Scott Bibaud, president and CEO of Atomera, this will change the economics of GaN in RF by unlocking breakthrough RF performance on low-cost silicon substrates.

Gallium nitride (GaN) devices for high-performance RF applications are typically built on silicon carbide (SiC) substrates; while they offer robust performance, they are also costly and difficult to scale. On the other hand, silicon substrates offer a lower-cost, more scalable foundation with the potential to support larger wafer sizes and greater compatibility with standard silicon manufacturing.

However, GaN-on-silicon underperforms in RF applications due to parasitic channel losses that reduce efficiency, especially at high frequencies. Enter Atomera’s Mears Silicon Technology (MST), which claims to reduce these losses while offering robust linearity and lower-cost GaN solutions for 5G and other high-frequency RF devices.

MST—a quantum-engineered thin-film technology—introduces a thin, oxygen-modified layer near the surface of the silicon wafer to create a more favorable platform for GaN growth, making silicon a more viable foundation for high-performance RF devices. This controlled layer modifies the silicon lattice structure and helps block the diffusion of electrical dopants. That, in turn, improves crystal quality at the GaN-silicon interface.

MST can improve various wafer-level reliability measures in nitrided oxide planar devices. Source: Atomera

Incize, which provides characterization and modeling services for RF semiconductors, has performed RF characterization of the first MST-enabled samples. The Belgian company reports a substantial reduction in parasitic interface charge and a significant reduction in RF losses.

“Beyond the small-signal improvements, the large-signal results are particularly compelling,” said Mostafa Emam, founder and CEO of Incize. “Then there is a linearity benefit that extends into the high-power regime, approaching performance levels typically associated with advanced RF SOI technologies.”

In Atomera’s own testing, MST enabled more than a 10x reduction in parasitic channel charge, reducing a key mechanism of RF power loss and supporting improved high-frequency GaN device performance. The test data also shows that MST enables devices to handle significant power while maintaining signal quality—linearity—under stress.

Robert Mears, founder and CTO of Atomera, is quick to add that linearity is a top concern for RF designers. “The new data shows MST GaN-on-silicon achieving both the ultra-low RF losses and linearity metrics of advanced trap-rich RF SOI,” he said. “At the benchmark input power of 30 mW, the linearity is exceptional, 1000x better than the GaN-on-silicon reference wafer.”

Atomera, a semiconductor materials and technology licensing company, is based in Los Gatos, California.

Related Content

The post The RF-ready GaN-on-silicon with lower parasitic losses appeared first on EDN.

How to design a digital-controlled PFC, Part 4

Fri, 06/05/2026 - 15:00

Editor’s note: This is a multi-part series on how to design a digital-controlled PFC. Previous entries: 

High efficiency is a mandatory requirement in some applications, especially in data centers. The recently announced 80 Plus Ruby certification sets the highest efficiency standard for data center power-supply units (PSUs), as shown in Table 1. The new efficiency requirement is not only higher than 80 Plus Titanium at each load condition, but also requires 90% efficiency at a 5% load, which has never been specified before.

 

 

 

80 Plus test type: 230V internal redundant

Percentage of rated load    5%               10%    20%    50%      100%
80 Plus Titanium            not specified    90%    94%    96%      91%
80 Plus Ruby                90%              91%    95%    96.5%    92%

Table 1 “Ruby” is the most recent and most stringent of the 80 Plus certification levels

With totem-pole bridgeless power factor correction (PFC) offering the best efficiency among all PFC topologies, digital control can further push the efficiency capabilities of this topology to new levels. In the fourth and final installment of this series, I will first introduce several digital methods to improve efficiency and then discuss some special PFC requirements including re-rush current control, electrical metering (e-metering) and PFC with a baby boost converter.

Dynamic dead time to achieve ZVS for synchronous switch

Theoretically, the PFC synchronous switch can operate with zero voltage switching (ZVS), but there must be a proper dead time between when the boost switch turns off and the synchronous switch turns on. As illustrated in Figure 1, assuming a positive cycle, when boost switch Q2 turns off, the inductor current (IL) starts to charge the output capacitance (COSS) of Q2 and discharge the output capacitance COSS of Q1, and the switch-node voltage rises.

If Q1 turns on before the switch-node voltage rises to the output voltage (VOUT), this is hard switching, and the switching losses are high. If Q1 turns on too late after the switch-node voltage rises to VOUT, the current will conduct in the third quadrant of Q1 with diode-like behavior. Since the gallium nitride field-effect transistor used for Q1 has a higher VSD drop compared to a silicon metal-oxide semiconductor field-effect transistor body diode, this induces a higher third-quadrant conduction loss.


Figure 1 This equivalent circuit describes a PFC synchronous switch during dead time. (Source: Texas Instruments)

Ideally, Q1 should turn on at the exact moment when the switch-node voltage rises to VOUT. Given the IL, VOUT and COSS of Q1 and Q2, the following equation calculates the time to charge the switch node from 0 to VOUT:

t=\frac{2C_{OSS}V_{OUT}}{I_L}

You can use firmware to dynamically adjust the dead time calculated from the equation to maintain ZVS for the synchronous switch.
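As a minimal sketch of that firmware calculation, the Python function below evaluates the equation and clamps the result. The function name, the clamp limits t_min and t_max, and the treatment of COSS as a constant are assumptions made for illustration; real COSS varies with voltage, and production firmware would run equivalent fixed-point code on the MCU.

```python
def zvs_dead_time(c_oss, v_out, i_l, t_min=20e-9, t_max=500e-9):
    """Return the dead time (s) for the switch node to swing from 0 to v_out.

    Evaluates t = 2 * C_OSS * V_OUT / I_L and clamps the result to
    [t_min, t_max]. The clamp limits are hypothetical values, not from
    the article.
    """
    if i_l <= 0.0:
        # No inductor current is available to charge the switch node,
        # so fall back to the longest allowed dead time.
        return t_max
    t = 2.0 * c_oss * v_out / i_l
    return min(max(t, t_min), t_max)
```

For example, with an assumed COSS of 100 pF, VOUT of 400 V and IL of 2 A, the function returns 40 ns. As IL falls near the AC zero crossing, the computed dead time grows until it reaches the upper clamp.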

CCM_TCM multimode control

A totem-pole bridgeless PFC can operate in either continuous conduction mode (CCM) or triangular current mode (TCM); each has its advantages and disadvantages. Table 2 provides a high-level comparison between the two modes.

 

CCM operation

Pros

  • Low peak-to-peak IL ripple.
  • Simple control.

Cons

  • Hard switching – high switching losses.

TCM operation

Pros

  • ZVS.

Cons

  • High peak-to-peak IL ripple.
  • Requires multiphase interleaved operation to reduce current ripple for high-power applications, resulting in low power density and high costs.
  • Complex control.

Table 2 Continuous conduction mode (CCM) and triangular current mode (TCM) options both have pros and cons for totem-pole power factor correction (PFC) operation purposes.

Ideally, a totem-pole bridgeless PFC would operate in multiple modes, as shown in Figure 2. At heavy loads or near the peak of an AC half cycle, the desired PFC input current is high and the PFC operates in CCM. When the load drops, or around the AC zero crossing where the desired PFC input current is low, the PFC switches to TCM and operates with ZVS.

Compared to pure CCM mode, this multimode operation has better efficiency at light loads because of ZVS. Compared to pure TCM mode, because the inductor current ripple is much lower, there is no need to use multiphase interleaved operation; therefore, this multimode operation significantly reduces the size and system costs. By combining the advantages of both CCM and TCM, this multimode operation can meet both high-efficiency and high-power-density requirements.


Figure 2 CCM_TCM multimode operation can meet both high-efficiency and high-power-density requirements. (Source: Texas Instruments)
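One simple way to picture the mode decision is a current threshold with hysteresis, sketched below in Python. This is only an illustration of the idea that high desired input current maps to CCM and low desired input current maps to TCM; the function name and both threshold values are hypothetical, and the control method detailed in Reference 1 may decide differently.

```python
def select_mode(i_ref, current_mode, i_enter_ccm=3.0, i_enter_tcm=2.0):
    """Return "CCM" or "TCM" for the next control interval.

    i_ref        -- desired PFC input current (A); its magnitude is used
    current_mode -- mode in use now, "CCM" or "TCM"
    The two thresholds are hypothetical; the gap between them adds
    hysteresis so the mode does not chatter near the boundary.
    """
    i_mag = abs(i_ref)
    if current_mode == "TCM" and i_mag >= i_enter_ccm:
        return "CCM"  # current is high: ripple matters more than ZVS
    if current_mode == "CCM" and i_mag <= i_enter_tcm:
        return "TCM"  # current is low: switch to TCM to gain ZVS
    return current_mode
```

With a sinusoidal current reference, this rule selects TCM around each zero crossing and CCM around each peak, provided the peak exceeds the upper threshold. At a sufficiently light load, the reference never reaches that threshold and the PFC stays in TCM for the whole cycle.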

Reference 1 provides details about this control method and its implementation. Figure 3 compares the efficiency of this CCM_TCM multimode control method with that of traditional CCM control, tested on the same board; the multimode method improves efficiency by as much as 2%.


Figure 3 CCM_TCM multimode control delivers efficiency improvements versus traditional CCM control in both low line (a) and high line (b) environments. (Source: Texas Instruments)

Special burst mode – AC cycle skipping

Burst mode is widely used to improve efficiency at light loads. Unlike traditional pulse-width modulation (PWM) pulse-skipping burst mode, where you skip PWM pulses randomly, here I would like to introduce a special burst mode: AC cycle skipping, in which you skip one or more AC cycles at light loads.

In other words, you would turn the PFC off for one or more AC cycles and turn the PFC back on for the next AC cycle. The turn-on and turn-off instants occur at the AC zero crossing so that whole AC cycles are skipped. Because the PFC turns on and off when the inductor current equals zero, there is less stress and electromagnetic interference.

The number of AC cycles to skip is inversely proportional to the load; the lighter the load, the more AC cycles skipped. Figure 4 shows the skipping of one and two AC cycles, respectively. Channel 1 is the AC voltage, and channel 4 is the AC current.


Figure 4 Shown here is AC cycle skipping at light loads: one cycle (a) and two cycles (b). (Source: Texas Instruments)

Once the PFC turns off, the switching losses, driving losses and reverse-recovery losses all drop to zero, and the power losses are just the PFC standby power.

When turning off the PFC to skip AC cycles, both the current loop and voltage loop need to be frozen; otherwise, the integrators in those loops will build up to generate a big PWM pulse when the PFC turns back on, causing a large current spike.
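The freeze can be sketched with a simple proportional-integral (PI) controller that holds its state while a flag is set. The class below is a hypothetical Python illustration, not the controller structure used in any particular PFC firmware; the gains, limits and the `frozen` attribute are all assumed names.

```python
class PIController:
    """Minimal PI controller whose state can be frozen during skipped cycles."""

    def __init__(self, kp, ki, out_min, out_max):
        self.kp = kp
        self.ki = ki
        self.out_min = out_min
        self.out_max = out_max
        self.integrator = 0.0
        self.last_output = 0.0
        self.frozen = False  # set True while the PFC is off

    def _clamp(self, value):
        return min(max(value, self.out_min), self.out_max)

    def update(self, error):
        if self.frozen:
            # Ignore the error entirely so the integrator cannot wind up
            # while the PFC is not switching.
            return self.last_output
        self.integrator = self._clamp(self.integrator + self.ki * error)
        self.last_output = self._clamp(self.kp * error + self.integrator)
        return self.last_output
```

Setting `frozen` to True at the zero crossing where the PFC turns off, and back to False where it turns on again, lets both loops resume from the state they had before the skipped cycles.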

Determining whether the PFC enters a light load requires the load information. Normally there is no current sensor at the PFC output; therefore, it’s not possible to directly measure the output load. However, because the PFC voltage-loop output is proportional to the load, you can use the voltage-loop output as a rough indicator to determine whether the PFC is operating with a light load.

If you must precisely skip an appropriate number of AC cycles to maintain VOUT ripple within a specified range, you will need accurate load information, which you can obtain through an integrated e-meter function that I will discuss after the next section.

A big concern with AC cycle skipping is the VOUT drop during a load transient. If a load step-up occurs while the PFC is off, VOUT may drop too far.

To address this issue, you can compare VOUT to a predefined threshold through a comparator. Once VOUT is below this threshold, the PFC will immediately exit burst mode, disable AC cycle skipping, and return to normal operation. The PFC will then handle the transient response as if there were no special burst mode.
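The skip-count rule and the undervoltage exit described above can be sketched together as follows. Everything numeric here is hypothetical: the 10% light-load threshold, the maximum of four skipped cycles, and the use of a load fraction (which could be derived from the voltage-loop output) are assumptions for illustration. In hardware, the exit check would be an analog comparator rather than a software comparison.

```python
def cycles_to_skip(load_fraction, light_load_threshold=0.10, max_skip=4):
    """Return how many AC cycles to skip for an estimated load.

    load_fraction is the estimated load as a fraction of rated load.
    At or above the threshold no cycles are skipped; below it, the count
    grows in inverse proportion to the load, up to max_skip.
    """
    if load_fraction >= light_load_threshold:
        return 0
    if load_fraction <= 0.0:
        return max_skip
    return min(max_skip, int(light_load_threshold / load_fraction))


def should_exit_burst(v_out, v_threshold):
    """Software stand-in for the comparator: True when VOUT is too low."""
    return v_out < v_threshold
```

A control loop would call `should_exit_burst` continuously while cycles are being skipped and, on a True result, re-enable switching and unfreeze the loops right away rather than waiting for the planned number of cycles to elapse.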

AC cycle skipping can also help reduce total harmonic distortion (THD) at light loads. Reference 2 compares THD with and without this method.

Re-rush current limit

The AC input voltage could suddenly drop out while the PFC is operating normally. Since the load is still applied, the PFC VOUT could drop to a lower value. Then, when the AC voltage returns, if the AC input voltage is higher than VOUT, there will be an inrush current. This current is called the re-rush current.

Previously, the re-rush current was unspecified and there was no special control action for this event; designs relied solely on the power-stage components’ ability to handle it. Test results show that re-rush current can jump to more than 10 times the PFC-rated maximum input current. Such a high re-rush current can either damage the power supply or reduce its lifetime.

The recently released Modular Hardware System – Common Redundant Power Supply (M-CRPS) specification requires limiting re-rush current when the input voltage resumes after an input brownout or blackout event on the power supply used in a data center. As shown in Figure 5, the root-mean-square (RMS) value of re-rush current should not exceed 5 times the maximum PSU rating over one-half cycle of input frequency, or 3.5 times the maximum PSU rating over one cycle of input frequency. In addition, the input current of the PSU should settle to a value less than or equal to two times the maximum PSU rating within two cycles of the input frequency after applying the AC input.


Figure 5 The Modular Hardware System – Common Redundant Power Supply (M-CRPS) specification documents limits on both re-rush current and timing. (Source: Texas Instruments)
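To make the three limits concrete, here is a hedged Python sketch that checks a captured current waveform against them. The function names and the sample-based approach are assumptions, and the settling check is my own reading of the requirement (the RMS of the third cycle must not exceed two times the rating); consult the M-CRPS specification itself for the exact measurement method.

```python
import math


def rms(samples):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def check_rerush(i_samples, samples_per_cycle, i_max_rating):
    """Check uniformly sampled input current against the re-rush limits.

    i_samples must start at the instant the AC input returns and cover
    at least three AC cycles. i_max_rating is the maximum PSU input
    current rating, in the same units as the samples.
    """
    if len(i_samples) < 3 * samples_per_cycle:
        raise ValueError("need at least three AC cycles of samples")
    half = samples_per_cycle // 2
    third_cycle = i_samples[2 * samples_per_cycle:3 * samples_per_cycle]
    return {
        # RMS over the first half cycle must not exceed 5x the rating.
        "half_cycle_ok": rms(i_samples[:half]) <= 5.0 * i_max_rating,
        # RMS over the first full cycle must not exceed 3.5x the rating.
        "one_cycle_ok": rms(i_samples[:samples_per_cycle]) <= 3.5 * i_max_rating,
        # Assumed interpretation: after two cycles, the RMS is within 2x.
        "settled_ok": rms(third_cycle) <= 2.0 * i_max_rating,
    }
```

A check like this is useful on scope captures during validation; it does not replace the firmware control that actually limits the current.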

Reference 3 provides a firmware-based solution to handle this re-rush current so that when the AC voltage comes back from dropout, both the re-rush current (when VIN > VOUT) and the non-re-rush current (when VIN < VOUT) are well controlled – not exceeding the M-CRPS limit specification, but high enough to rapidly boost VOUT.

E-metering

Power supplies in data centers are required to measure the input power in real time and report the measurement to the host; this is called e-metering. The M-CRPS specification requires an input power measurement error within ±1% when the load is >125W, within ±1.25W when the load is between 50W and 125W, and within ±5W when the load is <50W. To achieve such high measurement accuracy, the e-meter function is traditionally implemented through a dedicated metering device, as shown in Figure 6a.


Figure 6 These circuit diagrams show a traditional e-meter and PFC control (a), as well as combining an e-meter with PFC control (b). (Source: Texas Instruments)

A current shunt placed on the PFC input side senses the input current, and a voltage divider (not shown in Figure 6a) across the AC line and AC neutral senses the input voltage. A dedicated metering device receives this current and voltage information, calculates the input power and input RMS current, and sends the results to the host.

With a digital controller, since analog-to-digital converters (ADCs) of the microcontroller (MCU) are measuring both the input voltage and input current, it becomes possible to integrate the e-meter function into PFC control code. Figure 6b shows this e-meter configuration.

A current shunt senses the input current and an isolated delta-sigma modulator (the AMC1306 from Texas Instruments) measures the voltage drop across the current shunt. The delta-sigma modulator output is sent to the PFC controller MCU. The current information will be used for both e-metering and PFC current-loop control. A voltage divider senses the input voltage, which is then measured by the MCU’s ADC directly, just as in traditional PFC control. Reference 4 has more details about e-meter implementation and calculation.
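The core e-meter arithmetic can be sketched in a few lines of Python, along with a helper that encodes the M-CRPS accuracy bands quoted earlier. This is an illustrative outline only: the function names are hypothetical, the samples are assumed to be already scaled to volts and amps and to span a whole number of AC cycles, and the handling of loads of exactly 50W and 125W is an assumption. The implementation in Reference 4 may differ, for instance in calibration and filtering.

```python
import math


def measure_input(v_samples, i_samples):
    """Compute real power, RMS values and power factor from samples."""
    n = len(v_samples)
    if n == 0 or n != len(i_samples):
        raise ValueError("need equal, non-empty voltage and current arrays")
    # Real power is the average of the instantaneous v * i product.
    p_real = sum(v * i for v, i in zip(v_samples, i_samples)) / n
    v_rms = math.sqrt(sum(v * v for v in v_samples) / n)
    i_rms = math.sqrt(sum(i * i for i in i_samples) / n)
    apparent = v_rms * i_rms
    pf = p_real / apparent if apparent > 0.0 else 0.0
    return {"p_real_w": p_real, "v_rms_v": v_rms, "i_rms_a": i_rms, "pf": pf}


def allowed_error_w(load_w):
    """Allowed input power error (W) for a given load, per the bands above."""
    if load_w > 125.0:
        return 0.01 * load_w  # within 1% above 125W
    if load_w >= 50.0:
        return 1.25           # within 1.25W from 50W to 125W
    return 5.0                # within 5W below 50W
```

Because the same current samples feed both the current loop and the power calculation, the measurement adds computation but no extra sensing hardware.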

Integrating e-meter functionality into PFC control code eliminates the need for a dedicated metering device, not only reducing system costs, but also simplifying printed circuit board layout and expediting the design process.

PFC with a baby boost converter

In server applications, a bulk capacitor (CBULK in Figure 7) is required to hold the PSU output in regulation for more than 10 ms after AC dropout. To accomplish this, a 3kW server PSU would need a total capacitance of over 1.3mF, which would consume at least 30% of the overall space. To improve power density, you must reduce the bulk capacitance.
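The size of the bulk capacitor follows from an energy balance: the energy released as the capacitor discharges from its starting voltage to the minimum usable voltage must cover the load for the hold-up time. The Python sketch below applies that balance. The function name, the efficiency parameter and the example voltages are assumptions for illustration and are not taken from the article or its references.

```python
def holdup_capacitance(p_out_w, t_hold_s, v_start, v_min, efficiency=1.0):
    """Return the capacitance (F) needed to ride through a dropout.

    Energy balance: 0.5 * C * (v_start**2 - v_min**2) * efficiency
                    = p_out_w * t_hold_s
    v_start is the bulk voltage when the dropout begins and v_min is the
    lowest bulk voltage at which the downstream stage still regulates.
    """
    if v_min >= v_start:
        raise ValueError("v_min must be below v_start")
    return 2.0 * p_out_w * t_hold_s / (efficiency * (v_start ** 2 - v_min ** 2))
```

With 3kW, 10 ms and assumed voltages of 390 V falling to 320 V, the result is on the order of 1 mF. The formula also shows that lowering the usable minimum voltage reduces the required capacitance, because more of the stored energy can then be extracted.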

Adding a baby boost converter between PFC and DC/DC, as shown in Figure 7 and described in Reference 5, can achieve high power density. The baby boost converter is a compact boost converter that only operates during AC dropout events.


Figure 7 A PFC with a baby boost converter can achieve high power density. (Source: Texas Instruments)

Figure 8 is a flow chart of baby boost converter operation. During normal operation, the baby boost converter is off and bypassed by bypass FET Q4. When an AC line dropout occurs and VBULK drops to a certain level, Q4 turns off and the baby boost converter turns on to hold VBB at its nominal value. If AC power returns, VBULK will rise; once VBULK rises to a certain level, the MCU turns off the baby boost converter and turns on bypass FET Q4, and the PFC resumes normal operation.


Figure 8 This flow chart outlines the various stages of baby boost converter operation.
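The flow can be approximated as a two-state machine driven by VBULK, as in the Python sketch below. The state names, the two voltage thresholds and the function names are hypothetical; the actual thresholds and any additional conditions are design-specific and are covered in Reference 5.

```python
NORMAL = "NORMAL"  # bypass FET Q4 on, baby boost converter off
HOLDUP = "HOLDUP"  # bypass FET Q4 off, baby boost converter on


def next_state(state, v_bulk, v_enter=340.0, v_exit=370.0):
    """Return the next operating state from the measured VBULK.

    v_enter and v_exit are assumed thresholds; v_exit is higher than
    v_enter so the converter does not toggle near a single trip point.
    """
    if state == NORMAL and v_bulk < v_enter:
        return HOLDUP  # VBULK has sagged after an AC dropout
    if state == HOLDUP and v_bulk > v_exit:
        return NORMAL  # AC has returned and VBULK has recovered
    return state


def gate_commands(state):
    """Map a state to the Q4 and baby boost enable signals."""
    return {"q4_bypass_on": state == NORMAL, "baby_boost_on": state == HOLDUP}
```

Calling `next_state` once per control interval and applying `gate_commands` to the result keeps Q4 and the baby boost converter mutually exclusive, which matches the behavior described for Figure 8.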

Conclusion

I hope that the information imparted in this series enables you to design your own digital-controlled PFC and meet ever-more-strict specifications. You will find that digital control is so flexible that it is possible to implement advanced control algorithms that would be difficult to implement with analog control. A digital-controlled power supply also offers impressive performance.

References

  1. Sun, Bosheng. “A novel CCM-TCM multimode control method for totem-pole bridgeless PFC.” Texas Instruments Analog Design Journal article, literature No. SLYT877, 1Q 2026.
  2. Sun, Bosheng. “AC cycle skipping improves PFC light-load efficiency.” Texas Instruments Analog Design Journal article, literature No. SLYT585, 3Q 2014.
  3. Sun, Bosheng. “How to limit PFC re-rush current.” Texas Instruments Analog Design Journal article, literature No. SLYT865, 1Q 2025.
  4. Sun, Bosheng. “A low-cost and high-accuracy e-meter solution.” EDN, Aug. 26, 2024.
  5. Yu, Sheng-Yang, Benjamin Genereaux, and LiehChung Yin. “Improve power density with a baby boost converter in a PFC circuit.” Texas Instruments Analog Design Journal article, literature No. SLYT830, 2Q 2022.

Related Content

The post How to design a digital-controlled PFC, Part 4 appeared first on EDN.

MLPerf and the rise of latency-aware LLM benchmarking

Fri, 06/05/2026 - 12:28

Any discussion of modern AI system performance must include MLCommons and its MLPerf benchmark suite, which has become the industry’s de facto standard for measuring machine learning performance. Since its debut in 2018, MLPerf has provided a neutral, peer-reviewed framework for comparing hardware and software platforms across a broad range of AI workloads.

The original MLPerf benchmarks reflected the dominant AI workloads of the late 2010s. Early inference tests focused on models such as image classification with ResNet-50, natural language processing with Bidirectional Encoder Representations from Transformers (BERT), object detection with RetinaNet, and recommendation with Deep Learning Recommendation Model (DLRM).

These workloads were important and representative at the time, but they shared one characteristic: they were highly parallel and relatively easy to map onto GPU architectures.

For several years, benchmark results reinforced a simple narrative. Each new generation of accelerators delivered higher throughput, lower latency, and better energy efficiency. Because the workloads aligned well with GPU strengths, the benchmark curves rose steadily and predictably.

The generative AI shockwave: Rewriting the rules of MLPerf

Autoregressive LLMs introduced a fundamentally different inference pattern. Prompt processing remained highly parallel, but token generation became sequential and memory bound. Suddenly, raw TeraFLOPS no longer told the whole story.

MLPerf began incorporating this new reality in stages. Inference v4.0 introduced the first LLM benchmark based on Meta’s Llama 2 70B. This benchmark measured token throughput and provided the industry with its first standardized method for comparing LLM inference systems.

MLPerf Inference v5.0, released in 2025, significantly expanded the generative AI focus. It added Llama 3.1 405B Instruct, a 405-billion parameter model with a 128,000-token context window. The benchmark also introduced an interactive variant of Llama 2 70B that imposed strict limits on Time to First Token (TTFT) and Time Per Output Token (TPOT), two metrics that directly capture user experience in conversational applications.
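For readers who want to see how these two metrics fall out of raw timestamps, here is a small Python sketch. It uses a common definition (TTFT is the delay to the first token; TPOT is the average gap between subsequent tokens), which I am assuming rather than quoting from the MLPerf rules. The function names are hypothetical, and the latency limits are left as parameters because the benchmark's actual values are not given here.

```python
def ttft_tpot(request_time, token_times):
    """Return (TTFT, TPOT) in seconds for one request.

    request_time -- time the request was issued
    token_times  -- ascending completion times of each output token
    """
    if not token_times:
        raise ValueError("at least one output token is required")
    ttft = token_times[0] - request_time
    n = len(token_times)
    # Average spacing of the tokens that follow the first one.
    tpot = (token_times[-1] - token_times[0]) / (n - 1) if n > 1 else 0.0
    return ttft, tpot


def meets_limits(ttft, tpot, ttft_limit, tpot_limit):
    """True when both latency metrics are within the supplied limits."""
    return ttft <= ttft_limit and tpot <= tpot_limit
```

A request whose first token arrives after 0.4 s and whose later tokens arrive every 0.04 s has a TTFT of 0.4 s and a TPOT of about 0.04 s. Large batches tend to raise both numbers, which is why tight limits reduce batching opportunities.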

These additions were pivotal because they exposed the core weakness of GPU-based inference systems. When unconstrained by latency, GPUs could buffer requests, create large batches, and deliver excellent throughput. Under interactive latency limits, batching opportunities shrank, hardware utilization dropped, and throughput fell sharply.

In other words, MLPerf began measuring not just how fast a system could run under ideal conditions, but also how responsive it remained under realistic conditions.

Inference disaggregation: Optimization of resources

This evolution reached another milestone in MLPerf Inference v5.1 and the emerging v6.x era. The benchmark suite broadened its focus to include increasingly sophisticated workloads, including reasoning models such as DeepSeek-R1 and more demanding long-context applications. At the same time, submissions began showcasing system-level optimizations such as inference disaggregation, where prompt processing and decoding are assigned to different accelerator pools.

Disaggregation has become one of the most consequential developments in modern inference benchmarking.

Historically, MLPerf treated each benchmark run as a single system under test, leaving vendors free to optimize their hardware and software stacks as they saw fit. As long as submissions complied with accuracy and latency requirements, any architectural technique was fair game.

This openness allowed participants to introduce increasingly sophisticated serving strategies. One of the most effective has been the separation of prefill and generation across distinct groups of accelerators. The prefill cluster handles the compute-intensive prompt processing stage, while the generation cluster focuses exclusively on token decoding.

In controlled benchmark scenarios, where prompt lengths and output lengths are known in advance, disaggregation can produce dramatic gains. By eliminating interference between the two phases, systems reduce preemption and improve latency-sensitive throughput.

Yet this raises an important question. Does the benchmark still measure accelerator capability, or is it increasingly measuring system orchestration? The answer is both.

Modern AI performance depends on the interaction between processor, memory hierarchy, interconnect fabric, runtime software, and serving algorithms. MLPerf has evolved accordingly. It now rewards system-level innovation rather than isolated chip performance.

That shift is entirely appropriate, but it also means benchmark results must be interpreted carefully.

A disaggregated configuration optimized for long document summarization may perform brilliantly in MLPerf while delivering more modest benefits in production environments where workloads vary continuously. Real-world deployments must cope with unpredictable prompt lengths, bursty traffic, and rapidly changing ratios of prefill to generation demand.

Consequently, MLPerf increasingly measures a system’s ability to align resources with a known workload profile. This is a valuable metric, but it’s not synonymous with universal real-world performance.

Illustrative comparison: MLPerf 5.x versus MLPerf 6.x

The table below illustrates how benchmark methodology evolved as MLPerf shifted from throughput-oriented LLM tests to more latency-sensitive and system-aware workloads. The numbers are representative rather than exact, but they reflect the broad trends seen in published results and vendor disclosures.

Publicly discussed MLPerf inference results based on Llama 3.1 405B LLM run on a leading-edge GPU-based processor in three scenarios (off-line, server mode, and interactive mode) highlight MLPerf’s evolution. Source: Author

From chip benchmark to system benchmark

The history of MLPerf mirrors the evolution of AI itself.

The early benchmark suites focused on relatively static workloads that aligned naturally with the strengths of GPU architectures. Tasks such as image recognition, recommendation systems, and conventional deep learning inference relied heavily on dense matrix operations and large-scale parallelism, allowing GPUs to demonstrate exceptional throughput and scalability. In that era, benchmark leadership was closely associated with raw compute capability, memory bandwidth, and increasingly larger accelerator configurations.

The rise of generative AI fundamentally changed that equation.

As autoregressive LLMs became the dominant workload, MLPerf evolved accordingly, introducing larger models, longer context windows, interactive server scenarios, and increasingly strict latency constraints. These additions exposed a critical reality: while GPUs remain extraordinarily efficient during the highly parallel prefill phase, they are far less efficient during token generation, where inference becomes sequential, memory-bound, and heavily dependent on latency-sensitive execution.

This shift transformed the meaning of benchmark performance.

Modern MLPerf results no longer measure the capabilities of an isolated accelerator alone. Instead, they measure the effectiveness of an entire inference architecture.

Disaggregation, scheduling policies, key-value (KV) cache management, streaming pipelines, runtime orchestration, and workload balancing have become just as important as the underlying silicon itself. In many cases, the benchmark winner is no longer the system with the most compute power, but the one that most effectively adapts a fundamentally sequential workload to hardware originally designed for massively parallel graphics and HPC computation.

As a result, benchmark interpretation has become significantly more nuanced. The headline numbers increasingly reflect how intelligently the system orchestrates resources across racks of accelerators, separates prefill from generation, minimizes preemption, and maintains throughput under realistic latency constraints. MLPerf has evolved from a pure hardware benchmark into a broader measure of system architecture and software orchestration.

At the same time, this evolution reveals something even more profound. The latest MLPerf 6.x requirements implicitly highlight the growing limitations of conventional GPU architectures for real-time LLM inference. The industry has reached a point where increasingly sophisticated scheduling mechanisms and disaggregated serving infrastructures are being used to compensate for a deeper architectural mismatch between autoregressive inference and massively parallel processors.

In many respects, the benchmark itself is beginning to suggest the next major transition in AI infrastructure design.

Rather than continuing to optimize architectures originally developed for graphics rendering and parallel numerical computing, the future may require entirely new inference-centric architectures built specifically for the unique characteristics of LLM generation. Such architectures would need to deliver high utilization and low latency even with very small batch sizes (potentially down to a single user request) while minimizing data movement, reducing memory bottlenecks, and supporting continuous token generation without relying on increasingly complex orchestration layers to hide inefficiencies.

In that sense, MLPerf has become more than a benchmark suite. It is now a window into the architectural tensions shaping the future of AI computing, revealing both the extraordinary adaptability of modern accelerator systems and the growing need for a fundamentally new class of inference hardware designed from the ground up for the realities of autoregressive AI.

Lauro Rizzatti is a business development executive with Vsora, a technology company offering semiconductor solutions that redefine design performance. He is a noted chip design verification consultant and industry expert on hardware emulation.

Editor’s Note

This is Part 2 of the mini-series that examines how LLM inference forced changes to MLPerf benchmarking. In Part 1, contributor Lauro Rizzatti analyzes LLM inference across its two processing phases (prefill versus generation) and highlights how this workflow exposes structural inefficiencies in GPU-based accelerators.

Related Content

The post MLPerf and the rise of latency-aware LLM benchmarking appeared first on EDN.
