ST made history in 2016 when they announced their STM32H7, the most powerful Cortex-M7 MCU, which broke the 2000-point threshold in CoreMark. What was even more remarkable was that soon afterward, they released their STM32F413 and crypto-enhanced STM32F423 microcontrollers (MCUs), which offer an impressive price-per-performance ratio for entry-level systems. So in the span of a few months, ST changed the industry by offering components covering the entire spectrum of microcontroller-based applications.
The Power of Flash in the STM32F413 and STM32F423
The STM32F413/423 are head and shoulders above the competition because of ST’s mastery of the ARM® Cortex®-M4 architecture and their 90-nm process node. This translates into a CoreMark score of 339 while running at just 100 MHz, or 3.39 CoreMark/MHz. One enabler of this performance is ST’s Adaptive Real-Time (ART) Accelerator, an architectural optimization that lets code execute from Flash memory at a performance level similar to that of 0-wait-state memory, according to CoreMark benchmarks.
Given that Flash is much slower than SRAM, this seems counterintuitive. However, the ART Accelerator’s brilliance resides in ST having organized its Flash into 128-bit lines. Because Cortex-M4 instructions are either 16 or 32 bits wide, each line holds between four and eight instructions. By pre-fetching these instructions together, the MCU can immediately feed five to six instructions on average into its pipeline. With the pipeline kept full, the Cortex-M4 doesn’t have to sit idle while the system fetches the next set of instructions.
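The four-to-eight range follows directly from the 128-bit line width and the two instruction encodings. The sketch below works through that arithmetic; the function name and the `frac_16bit` parameter are illustrative assumptions, and it deliberately ignores instructions that straddle two lines.

```python
FLASH_LINE_BITS = 128  # width of one Flash line, as stated in the text


def instructions_per_line(frac_16bit: float) -> float:
    """Average number of instructions held in one 128-bit Flash line.

    frac_16bit is the assumed fraction of instructions that use the
    16-bit encoding; the remainder are taken to be 32 bits wide.
    """
    if not 0.0 <= frac_16bit <= 1.0:
        raise ValueError("frac_16bit must be between 0 and 1")
    avg_bits = 16 * frac_16bit + 32 * (1.0 - frac_16bit)
    return FLASH_LINE_BITS / avg_bits


if __name__ == "__main__":
    for frac in (0.0, 0.5, 1.0):
        count = instructions_per_line(frac)
        print(f"{frac:.0%} 16-bit code: {count:.2f} instructions per line")
```

An even mix of 16-bit and 32-bit instructions lands between five and six instructions per line, which is consistent with the average quoted above.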
The ART Accelerator also offers a branch cache that stores the instructions of a program’s specific branch. When the code is no longer sequential, and the instructions of a particular branch are not in the pipeline, the MCU typically has to wait as it fetches the relevant information. Branches slow things down, but are inevitable in programs. ST’s innovative architecture greatly optimizes performance by including a mechanism to store the content of that branch in a special cache to ensure these instructions get to the pipeline quicker the next time around.
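To make the effect of the branch cache concrete, here is a toy model, not ST’s actual implementation. The capacity, the miss penalty in cycles, and the least-recently-used replacement policy are all assumptions chosen for illustration. It only shows why a branch taken repeatedly, such as a loop, pays the Flash latency once.

```python
from collections import OrderedDict


class BranchCache:
    """Toy branch cache: remembers the Flash lines of recent branch targets."""

    def __init__(self, capacity: int = 4, miss_penalty: int = 3):
        self.capacity = capacity          # hypothetical number of cached lines
        self.miss_penalty = miss_penalty  # hypothetical stall cycles on a miss
        self.lines = OrderedDict()        # target address -> cached marker

    def fetch(self, target: int) -> int:
        """Return the stall cycles incurred when branching to target."""
        if target in self.lines:
            self.lines.move_to_end(target)  # mark as most recently used
            return 0
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used
        self.lines[target] = True
        return self.miss_penalty


def total_stalls(targets, cache: BranchCache) -> int:
    """Sum the stall cycles over a sequence of branch targets."""
    return sum(cache.fetch(t) for t in targets)


if __name__ == "__main__":
    loop = [0x0800_0100] * 10  # the same branch target taken ten times
    print("stall cycles:", total_stalls(loop, BranchCache()))
```

In this model the ten-iteration loop stalls only on its first pass. Without the cache, each of the ten branches would pay the miss penalty.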
The STM32F413/423 include many more features, such as a Quad SPI interface to access off-chip memory and peripherals, a Floating-Point Unit, DSP instructions, a serial audio interface, and an LCD interface for 16-bit QVGA or 8-bit WQVGA screens. To top it off, these MCUs offer several low-power modes to keep designs extremely power-efficient.
For instance, Sleep mode shuts down the Cortex-M4 but keeps the peripherals running, so they can wake the CPU when they require its processing capability. ST also offers many ways to customize the clocks, which can be stopped (Stop mode), slowed down, or gated to a certain threshold to maximize efficiency. Finally, Batch Acquisition Mode (BAM) puts the architecture in Sleep mode and turns the Flash off, so peripherals can still collect data and transfer it to memory over DMA while keeping power consumption to a bare minimum. In Stop mode, consumption drops to a mere 18 µA.
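To get a feel for what a low-power mode buys in a duty-cycled design, the sketch below computes a time-weighted average current. Only the 18 µA Stop-mode figure comes from the text. The active current and duty cycle in the example are hypothetical placeholders that should be replaced with datasheet or measured values.

```python
STOP_CURRENT_UA = 18.0  # Stop-mode consumption quoted in the text


def average_current_ua(active_ua: float, active_fraction: float,
                       stop_ua: float = STOP_CURRENT_UA) -> float:
    """Time-weighted average current for a design that alternates
    between an active state and Stop mode.

    active_ua       -- current while running, in microamps (assumed value)
    active_fraction -- share of time spent active, between 0 and 1
    """
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return active_fraction * active_ua + (1.0 - active_fraction) * stop_ua


if __name__ == "__main__":
    # Hypothetical: 10 mA while active, awake 1% of the time.
    avg = average_current_ua(active_ua=10_000.0, active_fraction=0.01)
    print(f"average current: {avg:.2f} uA")
```

The model ignores wake-up time and transition energy, so it gives an optimistic lower bound rather than a prediction.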
All Budgets Welcomed!
What is probably the most remarkable aspect of ST’s latest offering is that the NUCLEO-F413ZH, a development board integrating the STM32F413, costs only $20 at major resellers. At such a low price, it becomes the Swiss Army knife of entry-level boards for professionals and students developing multimedia or industrial applications. ST’s Open Development Environment ecosystem also comes with sample code, a great SDK, and well-documented libraries that make it quick to exploit the architecture’s features. Furthermore, using an STM32 microcontroller means being able to switch to other low-power STM32 MCUs, or to a more powerful processor, while still retaining most of the application’s code, thanks to a catalogue of pin-to-pin compatible MCUs that work for engineers, not against them.