Feed aggregator
The hidden bottleneck in LLM inference and the impact on MLPerf benchmarking

Recent frontier LLM inference benchmarks have highlighted a recurring pattern. GPU-based systems deliver outstanding throughput when latency is not a concern, but their performance drops sharply once real-time response requirements are imposed.
This behavior is sometimes attributed to software inefficiencies or suboptimal system tuning. In reality, the root cause lies much deeper. It reflects a fundamental mismatch between how GPUs are architected and how autoregressive inference works.
LLM inference: Prefill versus generation
To understand this limitation, it is useful to examine the two distinct phases of LLM inference: prefill and generation.
During the prefill phase, the model processes the entire input prompt in one pass. The prompt is tokenized, embedded, and propagated through every layer of the transformer network. At each layer, the model computes the attention relationships among all tokens and builds the key-value (KV) cache, which stores the intermediate data needed for subsequent token generation.
This stage maps extremely well onto GPU hardware. GPUs were designed to execute thousands of identical operations in parallel. In the prefill phase, the model performs massive matrix multiplications over large tensors, exactly the type of workload at which GPUs excel. When all tokens are available upfront, the calculations can be distributed across tens of thousands of cores, resulting in very high arithmetic utilization.
The generation phase is fundamentally different.
Once the KV cache has been created, the model begins producing output tokens one at a time. Each token depends on all tokens that came before it. This sequential dependency means that, regardless of how much hardware is available, the model cannot generate the next token until the current one has been completed.
For every generated token, the model must read the parameters for every layer, consult the KV cache, compute the next token probabilities, and then repeat the autoregressive process. The amount of computation per token is relatively modest, but the amount of data movement remains substantial.
Two faces of GPU architecture: Why modern GPUs struggle with real-time latency constraints
This is where the GPU architecture begins to work against the workload.
GPUs achieve peak efficiency when they execute large, highly parallel workloads with regular memory access patterns. Token generation offers neither. The workload is small, inherently sequential, and dominated by repeated memory accesses rather than dense arithmetic. Many of the GPU’s compute units remain idle while the device waits for data to arrive from high-bandwidth memory.
In other words, generation is not compute-bound; it’s memory-bound.
The distinction is crucial. In a compute-bound workload, adding more arithmetic units improves performance. In a memory-bound workload, performance is limited by how quickly data can be moved to the processors. Once memory bandwidth becomes the bottleneck, additional compute resources provide diminishing returns.
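The compute-bound versus memory-bound distinction can be made concrete with a rough roofline-style comparison. The sketch below is illustrative only: the hardware figures (1 PFLOP/s peak, 3 TB/s memory bandwidth), the 16-bit weight format, and the rule of thumb of 2 FLOPs per parameter per generated token are all assumptions, not figures from any benchmark, and KV-cache traffic is ignored.

```python
def arithmetic_intensity(batch, bytes_per_param=2.0):
    """FLOPs performed per byte of weight traffic in one decode step.

    Assumes 2 FLOPs per parameter per token and that the weights are
    streamed from memory once per step, whatever the batch size.
    """
    return 2.0 * batch / bytes_per_param


def machine_balance(peak_flops, mem_bw):
    """FLOPs the device can execute in the time it takes to move one byte."""
    return peak_flops / mem_bw


def classify(batch, peak_flops, mem_bw):
    """Label a decode step by comparing workload intensity with machine balance."""
    if arithmetic_intensity(batch) >= machine_balance(peak_flops, mem_bw):
        return "compute-bound"
    return "memory-bound"


PEAK_FLOPS = 1e15  # assumed dense peak, FLOP/s
MEM_BW = 3e12      # assumed memory bandwidth, bytes/s

for batch in (1, 32, 512):
    print(batch, classify(batch, PEAK_FLOPS, MEM_BW))
```

Under these assumed numbers, a single token stream performs about one FLOP per byte moved, while the device would need a few hundred FLOPs per byte to keep its arithmetic units busy.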
This explains why GPUs can appear extraordinarily efficient when throughput is measured without latency constraints. In that scenario, inference servers are free to buffer requests and combine them into large batches. Batching allows the system to process many token streams simultaneously, effectively transforming numerous small sequential tasks into a larger parallel workload that better matches the GPU’s strengths.
The role of batch size in GPU utilization
At first glance, batching in AI inference may appear straightforward. It is not. Unlike image inference, where every sample in a batch completes simultaneously, LLM inference involves many conversations progressing independently and asynchronously. Some requests finish quickly, others may continue for hundreds or even thousands of decoding iterations, and new requests may arrive continuously while older conversations are still active.
The workload therefore becomes highly dynamic and irregular. Specifically, the generation of each request ends only when the model produces a special “end-of-sequence” token indicating that the response is complete.
This characteristic fundamentally changes the nature of inference scheduling.
This is where continuous batching becomes essential. Continuous batching is the runtime orchestration algorithm responsible for managing the simultaneous execution of multiple conversations across the same accelerator resources. Instead of treating inference as a sequence of isolated batches, the scheduler continuously inserts, removes, pauses, and resumes requests as tokens are generated.
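A toy simulation can illustrate the insert-and-remove behavior described here. This is a minimal sketch, not the scheduler of any particular serving framework: the function name `continuous_batching` and the request format are hypothetical, a fixed token count per request stands in for the end-of-sequence token, and pausing and resuming requests (preemption) is not modeled.

```python
from collections import deque


def continuous_batching(requests, max_batch):
    """Simulate iteration-level scheduling of decode steps.

    requests: list of (arrival_step, tokens_to_generate), with at least one
    token per request. Returns {request_id: step at which it finished}.
    """
    pending = deque(sorted(enumerate(requests), key=lambda r: r[1][0]))
    active = {}    # request_id -> tokens still to generate
    finished = {}
    step = 0
    while pending or active:
        # Before each decode step, admit arrived requests into free slots.
        while pending and len(active) < max_batch and pending[0][1][0] <= step:
            rid, (_, length) = pending.popleft()
            active[rid] = length
        # One decode step: every active request emits exactly one token.
        for rid in list(active):
            active[rid] -= 1
            if active[rid] == 0:
                # The request leaves the batch immediately, freeing its slot.
                del active[rid]
                finished[rid] = step + 1
        step += 1
    return finished


# Three requests share two slots; request 2 arrives one step later.
result = continuous_batching([(0, 3), (0, 1), (1, 2)], max_batch=2)
print(result)
```

In this example the short request vacates its slot after one step, and the late arrival takes it over without waiting for the long request to finish, which is the property that separates continuous batching from processing isolated fixed batches.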
The objective is to maximize hardware utilization while minimizing user-visible latency. As batch sizes increase, hardware utilization rises and throughput improves dramatically. However, batching comes at the cost of response time.
When users expect low latency, the system cannot afford to delay requests while waiting to accumulate a large batch. Each request must be processed almost immediately. As batch sizes shrink, the GPU loses the parallelism needed to keep its compute resources busy. Utilization falls, and throughput drops accordingly.
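The tension between batch size and response time can be sketched with a very simple model. Everything numeric here is an assumption chosen for illustration (a 70B-parameter model with 16-bit weights, 3 TB/s memory bandwidth, 1 PFLOP/s peak compute, 100 requests arriving per second), and the model treats batches as filled before they start, ignoring KV-cache traffic and prefill.

```python
def decode_step_s(batch, params=70e9, bytes_per_param=2, mem_bw=3e12, peak_flops=1e15):
    """Time for one decode step: the slower of weight streaming and arithmetic."""
    memory_s = params * bytes_per_param / mem_bw
    compute_s = 2 * params * batch / peak_flops
    return max(memory_s, compute_s)


def batching_tradeoff(batch, arrival_rate):
    """Return (wait_s, step_s, tokens_per_s) for a batch filled at arrival_rate requests/s."""
    wait_s = (batch - 1) / arrival_rate   # time the first request waits for the batch to fill
    step_s = decode_step_s(batch)
    return wait_s, step_s, batch / step_s


for b in (1, 8, 64, 512):
    wait_s, step_s, tps = batching_tradeoff(b, arrival_rate=100.0)
    print(f"batch={b:4d} wait={wait_s * 1e3:7.1f} ms "
          f"step={step_s * 1e3:6.1f} ms throughput={tps:8.0f} tok/s")
```

In this model, aggregate throughput grows almost linearly with batch size while the step is memory-bound, but so does the time the earliest request spends waiting for the batch to fill. A tight latency target caps that wait and therefore caps the batch size.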
This is the central architectural limitation of GPUs in LLM inference.
The issue becomes even more pronounced when the same accelerator must handle both prefill and generation. Prefill is a large, compute-intensive task, while generation consists of many smaller, latency-sensitive operations. When new prompts arrive, the system may need to interrupt ongoing token generation to perform prompt processing. These context switches, often referred to as preemption, increase latency and reduce efficiency further.
Inference disaggregation: A clever shortcut to mitigate GPU’s inefficiencies
To mitigate this problem, system designers have begun disaggregating inference. Instead of assigning both phases to the same accelerator pool, they dedicate one group of GPUs to prefill and another to generation. The prefill GPUs build the KV cache and transfer it to the generation GPUs, which decode tokens independently.
This separation eliminates interference between the two phases and allows each group of GPUs to operate more efficiently. Prompt processing can proceed continuously without disrupting active token generation, and generation can continue without interruption.
In controlled benchmark environments, where prompt lengths, output lengths, and request patterns are known in advance, this approach can deliver substantial improvements.
Yet the underlying limitation of GPU architectures remains.
Inference disaggregation: Does it scale in real-world applications?
The generation phase is still sequential and memory bound. No amount of software optimization can eliminate the need to read model weights and cached data for each token. The disaggregated approach simply reduces scheduling inefficiencies and isolates the phases so that GPU resources are used more effectively.
Whether this strategy can scale efficiently in real-world applications depends on workload predictability.
Real-world AI services process a highly variable mix of requests. Some consist of long prompts and short responses. Others involve short prompts and long outputs. Demand can shift rapidly over time, changing the ideal ratio between prefill and generation resources.
Adapting to these changes requires dynamically reallocating accelerators. That process is not instantaneous. Devices must be initialized, model parameters loaded, and serving infrastructure synchronized. If traffic patterns are highly volatile, the overhead of reconfiguration can offset much of the benefit.
The broader lesson is that GPU performance in LLM inference is governed by more than raw TeraFLOPS.
The prefill phase showcases the strengths of GPUs, leveraging dense matrix operations and massive parallelism. The generation phase exposes their weaknesses, forcing highly parallel processors to execute a fundamentally sequential, memory-dominated workload.
As a result, the impressive throughput numbers often reported in unconstrained benchmarks can be misleading. They reflect idealized conditions in which batching hides architectural inefficiencies. Once latency constraints are introduced, those inefficiencies become visible.
The challenge for the industry is not simply to build larger GPUs, but to develop architectures and system designs better aligned with the realities of autoregressive inference.
Until then, the most significant limitation in real-time LLM serving will remain the same: generation is a sequential, memory-bound process running on hardware originally optimized for massively parallel computation.
Lauro Rizzatti is a business development executive with VSORA, a technology company offering semiconductor solutions that redefine design performance. He is a noted chip design verification consultant and industry expert on hardware emulation.
Editor’s Note
In a two-part series, contributor Lauro Rizzatti examines how LLM inference forced changes to MLPerf benchmarking. He will illustrate the evolution of the MLPerf benchmark and detail how generative AI forced a radical shift in AI hardware evaluation in the upcoming Part 2.
Related Content
- Strategies to Dominate the AI Accelerator Market
- A closer look at LLM’s hyper growth and AI parameter explosion
- The role of AI processor architecture in power consumption efficiency
- AI GPU computing delivers data-center performance on the factory floor
- The truth about AI inference costs: Why cost-per-token isn’t what it seems
The post The hidden bottleneck in LLM inference and the impact on MLPerf benchmarking appeared first on EDN.
My first college project
| This is a portable lab device that helps with experiments such as diodes and BJT amplifier gain, and it also acts as a function generator. [link] [comments] |
Build 2026: Accumulating evidence of Microsoft’s AI independence

Abundant use of the AI acronym is increasingly evident at various industry events. Strip away the hype layer and look deeper, however, and interesting trends still emerge into view.
This is my third straight year covering Microsoft’s developer-focused conference, following up on the 2024 and 2025 show editions. And interestingly (at least to me), the event timing, both in an absolute sense and relative to other notable industry trade shows, has shifted each year.
- 2024’s Build took place on May 21-23, the week after Google’s I/O developer event (May 14-16) and several weeks before Computex (June 4-7)
- Last year, all three conferences took place on the same week
- And this year, the Google I/O and Microsoft Build cadence returned to separate-weeks spacing, two weeks apart this time. Conversely, Build and Computex were still in the same-week slot.
Why the upfront focus on this seeming nuance? Well, for one thing, Computex conversely is a consumer-tailored show. That’s why, for example, Microsoft and NVIDIA co-announced one new computer (information on which I’ll share shortly) at Computex, while introducing another with a different form factor but the exact same processing subsystem at Build. Plus, in emphasizing a point that is likely already obvious to at least some of you, any chronological spacing between two companies’ events enables the latter to fine-tune its announcements and their messaging to react to the former…and the more spacing the better from a reaction-robustness standpoint.
Speaking of announcements, let’s get to them, shall we? Microsoft CEO Satya Nadella and his various lieutenants, along with a couple of special guests, covered a lot of ground in the 2.5-hour kickoff keynote, the video of which I’ve embedded below. I’ll hit what I thought were the highlights in the following paragraphs.
AI inference-accelerating hardware
About those computers I just mentioned…stop me if you’ve heard this before. Microsoft and a partner roll out new Windows-on-Arm computer platforms, both mobile and mini-desktop in shape, and intended for both consumers and developers. Two years ago, that partner was Qualcomm, the SoCs were the Snapdragon X Elite and Plus, and the consumer mobile systems were the Surface Laptop and Pro (also accompanied by ones from other OEMs, in a nod to Microsoft’s broader Windows-on-Arm aspirations). The developer mini-desktop was the Snapdragon Dev Kit for Windows, which never made it to production: Qualcomm “indefinitely paused” it only a few months later:

This outcome was more than a bit of a surprise to me, albeit not a complete surprise, as I’d been hearing for some time of both chronic hardware and software issues with the platform. That said, I already owned (and still use) its two Qualcomm application processor-based, developer-tailored predecessors, the Qualcomm-branded ECS LIVA Mini Box QC710:
and Microsoft’s “Project Volterra” (officially: Windows Dev Kit 2023) system:

so the Snapdragon Dev Kit for Windows was unsurprisingly on my wish list, too.
Hopefully NVIDIA will have better luck, although the situation still feels somewhat embryonic. Consumer mobile system(s) first: launched at Computex and coming “this fall” at an as-yet-unannounced price is the Microsoft Surface Laptop Ultra, based on NVIDIA’s RTX Spark SoC:
While you might not immediately recognize the processor from its new marketing moniker, you’ve heard about it (from me, to be precise) before. It was previously known as the N1 and N1X, as well as the GB10, and it’s the outcome of a co-development project with MediaTek, who contributed the up-to-20-core CPU constellation and reportedly also took lead on full-chip integration, including the NVLink interconnect to the up-to-6,144 core GPU cluster.

The SoC’s development has been lengthy and troubled, if longstanding and widespread rumors are to be believed, and industry analyst skepticism persists. It first appeared in a Linux-based system, the DGX Spark (rebranded from its initial name, Project DIGITS), last October:

And now, NVIDIA has determined that the RTX Spark is finally ready for Windows-based laptops (and not just from Microsoft itself, just as was the case two years before with Qualcomm). But not now. “This fall”. At a price to be announced later, but likely stratospheric if due only to the industry constraints-driven currently pricey “up to 128GB of unified memory”. And what about the developer mini-desktop system, the Surface RTX Spark Dev Box, unveiled at Build?
There’s…umm…a waitlist. Microsoft CEO Satya Nadella invited the Build attendees to join him on it. None of which inspires much in the way of confidence. Maybe one or both systems will be available for sale in time to end up on this November’s edition of my yearly “Holiday shopping guide for engineers”, but at this point, I’d be (pleasantly, mind you) surprised.
If you’re once again feeling déjà vu, by the way, it’s because Microsoft and NVIDIA have been here before. The initial attempt at bringing a Windows-on-Arm system to market, the Surface with Windows RT, was based on an NVIDIA Tegra SoC. I personally owned one and ended up tearing it apart after it eventually died. The hardware was first-rate for the time, although a dearth of native software in conjunction with woeful x86 code emulation support doomed it.
That was 2012. Jump forward again to the other, earlier-mentioned déjà vu moment, Qualcomm’s announced partnership with Microsoft in 2024, and I feel compelled to point out that by no means is it seemingly deceased (or even on life support, for that matter). I recently acquired a gently used Microsoft Surface Pro 11 based on Qualcomm’s Snapdragon X Plus to replace my long-in-the-tooth Surface Pro X. The SP11 has 16 GBytes of RAM and a 1 TByte SSD and runs solely on its integrated battery all day with ease, even when emulating x86. Microsoft systems based on second-generation Snapdragon X2 Elite (and presumably also Plus) SoCs are seemingly coming soon. And on a similar note, Microsoft’s still churning out branded systems based on x86 CPUs, too, with most recent updates less than a month ago.
Agentic-centric O/Ss
One particularly memorable quote from Satya Nadella in the keynote was the following:
“There’s a real platform shift. We’re moving from building operating systems, devices for apps, to agents.”
Indicative of this forecasted shift is Project Solara, explained by means of a conversation between Nadella and Qualcomm President and CEO Cristiano Amon:
along with an Android-derived proof-of-concept demonstration showing agent-based interactions with (and between) a smart speaker with a screen, mobile devices, and intelligent ID cards. Google also spoke a great deal about agentic AI at its I/O developer conference two weeks ago; instead of repeating myself again, I’ll refer you to my coverage of that event for the background info if you need it.
Speaking of agents, Microsoft also announced Execution Containers, which keep agents from accessing unintended, critical regions of other agents and applications, the underlying operating system and system hardware. And for when you want to communicate with them, OpenClaw founder Peter Steinberger showed up on stage by means of introducing Scout, an OpenClaw AI Assistant gateway. If you’re thinking it sounds at least something like Gemini Spark, which Google announced two weeks back, you’re not off-base. Remember my comments at the beginning of this piece about competing-event timing and ordering and effects on later-event messaging?
Homegrown models
Last but not least, let’s touch on an event topic that prompted the “AI Independence” title of this piece. In late April, OpenAI and Microsoft “redefined” their business relationship, in the process fundamentally freeing both companies from the various exclusivity arrangements that had previously defined (and arguably dominated) it. While a “divorce” would be overstating the result, a “softer” term such as “conscious uncoupling” wouldn’t be far off.
One tangible outcome of this redefinition was clearly evident this week, as Mustafa Suleyman, head of Microsoft AI, unveiled seven new homegrown AI models with capabilities spanning image, voice and transcription functions and claimed performance matching if not exceeding that of Google, OpenAI and other competitors’ models, both open- and closed-source. I was particularly interested in Suleyman’s declaration regarding MAI-Thinking-1, the flagship reasoning model, that:
“We trained it from the ground up on clean data, without distillation from third-party models.”
And with that, I’ll wrap up for today. As always, I welcome your thoughts in the comments on the topics I’ve covered here, as well as any others that might have caught your eye—Microsoft’s ongoing research work on quantum computing, for example, including the development of Majorana 2, the sequel to last year’s premier quantum computing chip from the company.
Next Monday, Tim Cook and his CEO successor John Ternus (I’m assuming) will hit the stage to kick off Apple’s yearly Worldwide Developers Conference (WWDC), completing the yearly big-tech-company developer conference triumvirate. I’ll see you back here then, if not before!
—Brian Dipert is the associate editor, as well as a contributing editor, at EDN.
Related Content
- Google I/O 2026: Agentic AI gets serious
- Microsoft Build 2025: Arm (and AI, of course) thrive
- Microsoft’s Build 2024: Silicon and associated systems come to the fore
- A holiday shopping guide for engineers: 2025 edition
The post Build 2026: Accumulating evidence of Microsoft’s AI independence appeared first on EDN.
Agilex 9 FPGAs power COTS VPX boards

Altera has partnered with Mercury Systems and VadaTech to expand its Agilex 9 FPGA ecosystem with COTS VPX boards for mission-critical defense platforms. These solutions integrate Agilex 9 medium-band Direct RF FPGAs into VPX architectures, including SOSA-aligned OpenVPX, to help defense customers accelerate time-to-market, reduce SWaP, and enable flexible software-defined RF capabilities.

The Agilex 9 FPGAs combine RF data converters, FPGA fabric, and high-speed transceivers into a unified, programmable architecture, enabling real-time processing of large volumes of RF data at the edge. This integration supports distributed, multi-domain operations that require rapid decision-making and adaptation to changing mission requirements. The devices deliver the bandwidth, performance, and I/O needed for demanding embedded applications such as adaptive radar, cognitive electronic warfare, and secure, software-defined communications.
Mercury Systems’ DRF5660 boards and VadaTech’s VPX540 boards with Agilex 9 Direct RF AGRM027 FPGAs are available for order today.
The post Agilex 9 FPGAs power COTS VPX boards appeared first on EDN.
Value DSCs streamline embedded control

Digital signal controllers (DSCs) in Microchip’s dsPIC33CK Value Line provide real-time control for cost-sensitive designs. Starting at $0.51 each, they offer consistent pricing regardless of order size. The 16-bit controllers deliver 100-MHz deterministic processing, high-resolution PWM, and a 12-bit ADC supporting motor control, precision sensing and control, and touch/HMI applications.

A balanced set of peripherals helps reduce external component count, PCB footprint, and overall BOM cost. With flash memory ranging from 32 KB to 256 KB and compatibility across the dsPIC33CK family, the Value Line DSCs enable scalability and migration to future designs. The devices integrate a 12-bit ADC capable of up to 2 Msamples/s, four PWM pairs with resolution down to 2 ns, and on-chip analog comparators with a 12-bit DAC. Communication interfaces include CAN FD, LIN, SENT, UART, SPI, and I2C.
To accelerate evaluation and development, Microchip offers the dsPIC33CK Value Line Curiosity Nano evaluation kit with an onboard debugger. The evaluation platform supports the Curiosity Nano base for Click Boards and a touch adapter board for touch applications. A motor control DIM is also available for rapid prototyping of motor control designs.
Value Line DSCs are available directly from Microchip, its sales representatives, or authorized distributors.
The post Value DSCs streamline embedded control appeared first on EDN.
RF tool captures reusable design workflows

Keysight’s RF Circuit Simulation Professional software now enables engineers to document their design workflow on an executable whiteboard. The software replicates design decisions while capturing simulations, optimizations, decision trees, and parameters derived from prior analyses. Each step generates editable Python code that can be saved, shared, replayed for design reviews, and redeployed across the Keysight Advanced Design System (ADS), Cadence Virtuoso, and Synopsys Custom Compiler environments with full design data traceability.

Design teams often face workflow inefficiencies, simulation bottlenecks, and knowledge-transfer challenges. Engineers can build workflows visually on an executable whiteboard while the software automatically generates corresponding Python scripts. The platform executes simulations, optimizations, and design decisions in sequence, with support for decision-based loops and parameter settings.
Each workflow becomes a repeatable methodology that can be shared across teams, reused, and driven by AI. Captured workflows help preserve RF design expertise while creating structured design data that can support future AI-driven automation and training. Design review and tapeout tasks that previously required manual configuration now execute automatically.
RF Circuit Simulation Professional
The post RF tool captures reusable design workflows appeared first on EDN.
Buck controller streamlines in-vehicle USB charging

Diodes’ APK43070Q synchronous buck controller integrates a USB Type-C PD 3.1 source controller, simplifying automotive single- and multi-port charging designs. Operating from a 4-V to 36-V input, it enables USB Type-C charging up to 140 W. The device supports USB extended power range (EPR) and adjustable voltage supply (AVS) up to 28 V, along with standard power range (SPR) and programmable power supply (PPS) up to 21 V.

The constant-frequency controller features integrated drivers, optimized dead time, and elevated gate drive voltage for efficient mid- to high-power charging using external N-channel MOSFETs. This allows flexible MOSFET selection to balance thermal performance and power loss. A VIN DC pass-through mode further improves converter performance by enabling the high-side MOSFET to act as the VBUS switch, eliminating the need for an additional output switch.
An I2C interface with a controller/target addressing scheme enables power sharing across up to eight USB Type-C ports via resistor selection without an external MCU. The APK43070Q also includes overvoltage, overcurrent, undervoltage, and thermal protection.
The APK43070Q is priced at $0.80 each in 1000-unit quantities.
The post Buck controller streamlines in-vehicle USB charging appeared first on EDN.
Low-noise USB scopes deliver 16-bit resolution

Pico Technology has launched the PicoScope 5000E series of USB-C oscilloscopes for analog, digital, and mixed-signal debugging. The four-channel scopes provide true 16-bit resolution with bandwidths to 200 MHz, sample rates to 2.5 Gsamples/s, and up to 1 GS of memory. PicoScope 5000E Plus models also offer a switchable 8-bit high-speed mode that raises bandwidth to 500 MHz, sample rates to 5 Gsamples/s, and memory to 2 GS.

With an ultra-low-noise front end, the oscilloscopes achieve a noise floor below 22 µV RMS and total harmonic distortion better than -73 dB. The resulting dynamic range helps reveal small-amplitude components, ripple, distortion, and other anomalies that lower resolution or noisier instruments can miss.

The compact, portable scopes connect to a host computer through a SuperSpeed USB 3.0 Type-C interface. For debug and validation, Pico 7 software provides more than 40 serial protocol decoders, advanced math channels, automated measurements including power analysis, multi-capture analysis, and measurement and mask limit testing. The Pico SDK supports custom application development using C, C#, C++, Python, MATLAB, and LabVIEW.
The PicoScope 5000E series is available in four-channel and 4+16-channel mixed-signal oscilloscope variants, with bandwidth options from 60 MHz to 500 MHz depending on model and operating mode. Units are sold through authorized distributors worldwide and directly from Pico Technology.
The post Low-noise USB scopes deliver 16-bit resolution appeared first on EDN.
🏆 International Student Research Paper Competition in Artificial Intelligence 2026
Igor Sikorsky Kyiv Polytechnic Institute invites you to take part in the International Student Research Paper Competition in Artificial Intelligence. Participants will be able to present their own research in the field of AI and join the international scientific community.
EPC targets high-density motion systems with GaN ePower Stage technology
Navitas collaborates with NVIDIA MGX Ecosystem to accelerate 800VDC AI infrastructure
From road to rack: 800V EV innovations redefining AI data-center power architecture
Finished DIY Vacuum Tube Oscilloscope
| On the bottom left it is shown next to its accompanying vacuum tube power supply, not a single semiconductor used in the whole setup. Wiring is horrible, and its performance reflects that. But at least it looks nice. Uses a 2" diameter 902 CRT, and is based mostly on a 1945 RCA schematic for this tube. The CRT only runs at <600V (schematic specifies 577V, mine only runs on ~400V), which is remarkably low for a CRT but it definitely still hurts. Uses two 6SJ7 pentodes for vertical and horizontal amplification with a Type 884 thyratron for sawtooth generation. Has x-y mode and internal/line/external sync. Rectification is done with a Type 80 for B+ and a 6AU4 for the negative CRT supply (grounded anode). The tube could maybe use some magnetic shielding and I am trying to figure that out, but for now I just keep the power supply away from it to eliminate the interference. Whole thing uses a little over 60W when running and is fused accordingly. This is by far my highest-effort electronics project ever, and I am very glad to be done with it! I started this project over a year ago, before I got my real oscilloscope. Whadaya think? [link] [comments] |
🎥 Igor Sikorsky Kyiv Polytechnic Institute united Kyiv and Kyoto through song
For the third time since the start of the full-scale invasion, the Kyoto International Choir is organizing a live song link-up between the two sister cities on Kyiv Day, as a sign of support for the people of Kyiv and solidarity with Ukraine.
Figured out why my Xbox controller adapter burned me
| It wasn't working, so unplugged it and the metal was hot as hell. So took it apart, soldered some leads for power and gave it some juice. Got a lot hotter than I was expecting. Resistor was reading as .5Ω [link] [comments] |
I’ve never seen capacitors that look like this before.
| I’m a graduate electrical engineer with over 12 years of experience in electronics. I’ve worked on a wide range of projects, and I thought I had seen most things by now… but I’ve never seen capacitors that look like this. [link] [comments] |
Made this water level indicator as my college project.
| submitted by /u/Public_Ice_736 [link] [comments] |
Triply simply sequence supply voltages

This circuit design for power supply on/off sequencing uses Schmitt triggers for triple-positive-rail timing purposes.
Recent design ideas have explored the utility of timed power supply ON/OFF sequencing and provided circuit designs to implement it. Figure 1 shows a simple topology using Schmitt triggers for timing the turn ON and OFF of triple positive supply rails. Here’s how it works.

Figure 1 This significantly simple supply sequencing scheme leverages Schmitt triggers.
Switching action begins with SPDT S1 in the OFF position, which holds the C1 and C2 timing caps discharged. The latter holds U1 pin 1 at 15 V and therefore its pin 2 and the NFET Q2’s gate at zero, forcing the 5Vout rail OFF.
Meanwhile, C1’s discharged state holds U1’s pins 3 and 5 low so pins 4 and 6 sit high. The former holds enhancement mode PFET Q1 and the 15Vout rail OFF, while the latter does the same for level shifter Q3, PFET Q4, and the 24Vout rail.
Therefore no power flows to the connected loads. Yet, at least. Figure 2’s left side graphs the sequence of events initiated by actuating S1.

Figure 2 This plot shows power sequence timing when S1 is flipped ON and later flopped OFF.
C2 connects to ground through R3, quickly charging it to the Schmitt trigger low-going threshold in about R3C2 = 1 ms. This inverts U1 pin 2 to 15 V, placing a net forward bias of 15 – 5 = 10 V on NFET Q2, turning it and the 5Vout rail ON. Thus they will remain as long as S1 stays ON.
Meanwhile, reset of C1 has been released, allowing it to begin charging through R1 + R3. The first event occurs at the end of T1, when U1 pin 3 reaches the ~9V Schmitt threshold. Since the timeout duration is proportional to C1, any desired interval can be chosen with an appropriate RC product. U1 pin 4 then snaps low, PFET Q1 turns ON and 15Vout goes active.
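For readers who want to pick component values, the standard RC charging equation gives the time for a capacitor to reach a threshold. The sketch below assumes an ideal capacitor charging from 0 V toward the 15 V rail through a single series resistance and uses the ~9 V threshold mentioned in the text; the 1 MΩ and 1 µF values are hypothetical placeholders, not the values in Figure 1.

```python
import math


def time_to_threshold(r_ohms, c_farads, v_supply, v_threshold):
    """Seconds for an RC node charging from 0 V toward v_supply to reach v_threshold.

    Derived from v(t) = v_supply * (1 - exp(-t / (R * C))).
    """
    if not 0 < v_threshold < v_supply:
        raise ValueError("threshold must lie between 0 V and the supply voltage")
    return -r_ohms * c_farads * math.log(1 - v_threshold / v_supply)


# Hypothetical example values: 1 Mohm total series resistance, 1 uF capacitor.
t1 = time_to_threshold(1e6, 1e-6, v_supply=15.0, v_threshold=9.0)
print(f"T1 is about {t1:.3f} s")
```

With a 9 V threshold on a 15 V supply, the delay works out to roughly 0.92 times the RC product, so scaling either R or C scales the interval in direct proportion.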
Of course C1 continues to charge, so at T2 U1 pin 5 also reaches its triggering threshold. Then its pin 6 snaps low, turning ON Q3, Q4 and 24Vout. The ratio R4 = 10 R5/(15 – 0.7) was chosen to apply an adequate and safe ~10V drive to Q4’s gate, independently of 24Vin. The S1 flip ON sequence is now complete.
The right side of Figure 2 shows what happens when S1 subsequently flops OFF. First, C1 is promptly discharged through R3, turning OFF Q1, Q3, Q4 and thereby 15Vout and 24Vout, putting them and whatever they power to sleep. Meanwhile C2 begins ramping up, taking T3 to get to U1’s threshold. When it completes the trip, pin 2 goes low, turning Q2 and 5Vout OFF.
Turnoff sequencing is therefore complete. Nighty night.
Details of the design include D1 and D2. Their purpose is to make the sequencer’s response to losing and regaining of the input rail voltage orderly, and to do it regardless of whether S1 is ON or OFF. If S1 is OFF, then all output rails remain low and (a safe) nothing occurs when the supply voltages return. If it’s ON, then a normally timed (and therefore safe) power-up sequence is executed.
Note that the MOSFETs should be chosen for adequate voltage and current handling capacities. Because Q1 has 15 V of gate drive and Q2 and Q4 get 10 V, none need be sensitive logic-level types.
Okay. But what if you also need to sequence a negative supply rail? Figure 3 shows how.

Figure 3 This power switching circuit works with a negative rail.
When the U1 inverter’s input rises above the Schmitt trigger voltage, its output snaps low, causing the 2N3906 to pass Ic = (+15Vin – 0.6)/15k = 0.96 mA. This develops 10.6 V across the 11k resistor, independent of –Vin, saturating the NFET. If symmetrical polarity rails (e.g., ±15 V) are needed, Figure 3 can be added to Figure 1 to provide the negative side with no other modifications required.
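The arithmetic in this paragraph can be checked, or rerun for other resistor values, with a few lines of code. This is only a sketch of the stated formula: the function and parameter names are invented for illustration, the 0.6 V base-emitter drop is the figure used in the text, and transistor base current is neglected.

```python
def negative_rail_gate_drive(v_pos=15.0, v_be=0.6, r_current_set=15e3, r_gate=11e3):
    """Return (collector current in A, gate drive in V) for the level shifter.

    r_current_set is the 15k resistor that sets the 2N3906 current and
    r_gate is the 11k resistor across which the NFET gate drive develops.
    """
    i_c = (v_pos - v_be) / r_current_set
    v_gate = i_c * r_gate
    return i_c, v_gate


i_c, v_gate = negative_rail_gate_drive()
print(f"Ic = {i_c * 1e3:.2f} mA, gate drive = {v_gate:.2f} V")
```

Because the current is set by the positive rail alone, the computed gate drive does not depend on the magnitude of the negative rail, which matches the claim in the text.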
Stephen Woodward‘s relationship with EDN’s DI column goes back quite a long way. Over 200 submissions have been accepted since his first contribution back in 1974. They have included best Design Idea of the year in 1974 and 2001.
Related Content
- Silly simple supply sequencing
- Single switch controls sequential operation of multiple power supplies
- Short push, long push for sequential operation of multiple power supplies
The post Triply simply sequence supply voltages appeared first on EDN.
Ayar Labs joins NVIDIA NVLink Fusion ecosystem to bring CPO to rack-scale AI infrastructure
Rohde & Schwarz Secures Critical Certification for Next-Gen eCall Compliance
The hybrid eCall test specification EN 18052 states that a hybrid system must combine different transmission paths and protocols to make sure an eCall reliably reaches its destination. In practice, this means a vehicle uses NG eCall functions (IP/IMS-based voice and data over 4G/5G) but can automatically fall back to available classic CS eCall (2G/3G) transport paths when coverage or service quality degrades. Manufacturers need to validate hybrid implementations to ensure they can trigger calls, transmit the minimum set of data (MSD), maintain GNSS positioning, and deliver intelligible voice quality across multiple network scenarios, including voice over New Radio (VoNR), voice over LTE (VoLTE), and circuit-switched fallback. Tests must demonstrate that a system remains robust during handovers and under degraded radio conditions, while also complying with relevant CEN, ETSI, 3GPP, and national requirements.
“We use the solution for functional tests and protocol conformity tests as well as for the type-approval of In-Vehicle Systems (IVS) that implement hybrid eCall and NG eCall,” says Thomas Reschka, Senior Technical Consultant at cetecom advanced.
Rohde & Schwarz has updated its eCall evaluation solution, CMX-KA09x, to support compliance with EN 18052:2025 and EN 17240:2024+A1:2026. The CMX-KA099 option completed Public Safety Answering Point (PSAP) test scenarios in accordance with EN 18052:2025, while the CMX-KA098 option completed PSAP test scenarios in accordance with EN 17240:2024+A1:2026. This marks an important step toward meeting European requirements for NG eCall test systems. The test environment allows the simulation of real-world mobile network conditions and the emulation of various network scenarios. This is a significant advantage in preparing for certifications or the market launch of new vehicle models.
The post Rohde & Schwarz Secures Critical Certification for Next-Gen eCall Compliance appeared first on ELE Times.



