Update: November 19, 2020
We just launched a qualified Alexa Voice Service for AWS IoT reference design, which replaces X-CUBE-VS4A. The new solution provides a hardware implementation that solves many technical challenges, like the close placement of microphones in tight spaces. It’s also the first single-chip design in the industry. Additionally, the firmware will help developers with signal processing, such as noise reduction measures, echo cancellation, and beam-forming algorithms. The reference design also comes with an evaluation license for the Alexa wake word detection technology. Put simply, it’s the new reference design for engineers looking to build smart embedded systems, such as appliances.
Original Text: June 8, 2018
X-CUBE-VS4A is the first software package to bring Alexa Voice Service (AVS) to our microcontrollers (MCUs). Today, we associate voice services from Amazon, Apple, or Google with smart speakers, and although these devices are new and successful, they often look alike. They adopt a cylindrical shape, they are tethered to a wall socket, and they can get so hot that they may scar certain wood surfaces. This design homogeneity often stems from the use of a beefy application processor that connects to cloud services and processes information but limits what engineers can actually create. X-CUBE-VS4A is thus a significant breakthrough because it brings AVS to more portable applications, opening designers to a whole new type of smart device.
ST continues to partner with Amazon in more ways than one (see our STM32 products in its different stores), and the focus on Alexa Voice Service brings out a new aspect of our collaboration. Thanks to Amazon’s SDK (Software Development Kit), engineers can take advantage of Amazon’s APIs to bring voice control to their device and benefit from a lot of the same infrastructure that makes Amazon Echo speakers unique. Whether it is to control home appliances, check weather forecasts, or get the answer to a burning question at three in the morning by just using voice commands instead of turning on a phone, AVS offers a rich experience that sets the benchmark for the rest of the industry. Thanks to X-CUBE-VS4A, it’s going to be a lot easier to bring AVS to small devices because using a power-hungry application processor is no longer necessary.
Alexa Voice Service: From MCU to Cloud
Indeed, the most prominent feat of X-CUBE-VS4A is that it ported to STM32 MCUs the protocols needed to connect a device to the AVS cloud, and that it optimized certain aspects for the hardware units of our microcontrollers. For instance, the libraries in X-CUBE-VS4A use our crypto-cores to accelerate cryptographic operations, thus saving energy and increasing performance. Currently, only STM32F7 and STM32H7 components are compatible with the software pack, because AVS’s current implementation requires a fair amount of memory and computational throughput. However, over time, we can expect Amazon to further optimize its solution, and X-CUBE-VS4A still represents a massive achievement as it’s the first time AVS can run efficiently on an MCU.
X-CUBE-VS4A is also a testament to the ST ecosystem. Very often, teams gravitate toward traditional Linux systems because they already come with important tools, such as a TCP/IP stack, that greatly simplify development, whereas choosing an MCU can sometimes mean starting from scratch. However, X-CUBE-VS4A provides all the libraries, drivers, and routines developers will need for our components. Furthermore, unlike competing solutions for popular operating systems, our software pack will help teams get the right AVS certifications faster. Before a company can sell a product that connects to AVS, Amazon certifies that the system meets specific latency and protocol requirements, among other things, and X-CUBE-VS4A helps meet those requirements.
From STM32F7 to AVS and Back
To ensure engineers can quickly experiment with some of the features of X-CUBE-VS4A, we’ve included application examples for our STM32F769 Discovery Kit. The board uses an STM32F769NIH6 MCU with 512 KB of RAM and 2 MB of Flash. It’s also possible to connect it to the Internet through a Wi-Fi daughter board or its Ethernet port. Finally, it has the audio front-end necessary to run a simple demo application. Put simply, the Discovery kit captures the surrounding audio using one of its omnidirectional MEMS microphones and pre-processes the signal using algorithms from Sensory to enable speech recognition and keyword spotting. This capture and pre-processing stage is what is known as the audio front-end.
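To put the 512 KB of RAM in perspective, here is a rough sizing sketch in C. It assumes a capture format of 16 kHz, 16-bit, mono PCM, which is not stated in this article, and the helper name `pcm_bytes` is purely illustrative rather than part of X-CUBE-VS4A.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed capture format (not stated in the article): 16 kHz, 16-bit, mono. */
#define ASSUMED_SAMPLE_RATE_HZ   16000u
#define ASSUMED_BYTES_PER_SAMPLE 2u
#define ASSUMED_CHANNELS         1u

/* RAM on the STM32F769NIH6 as quoted in the article: 512 KB. */
#define MCU_RAM_BYTES (512u * 1024u)

/* Hypothetical helper: bytes of raw PCM needed to hold duration_ms of audio. */
uint32_t pcm_bytes(uint32_t sample_rate_hz, uint32_t bytes_per_sample,
                   uint32_t channels, uint32_t duration_ms)
{
    /* This sketch only targets short clips (one minute at most). */
    assert(duration_ms <= 60000u);

    /* Use a 64-bit intermediate so the multiplication cannot wrap
       for realistic audio formats. */
    uint64_t bytes_per_second =
        (uint64_t)sample_rate_hz * bytes_per_sample * channels;

    return (uint32_t)(bytes_per_second * duration_ms / 1000u);
}
```

Under these assumptions, one second of audio takes 32,000 bytes, and a ten-second utterance takes 320,000 bytes, which is more than half of the 524,288 bytes of RAM. That is one reason a design of this kind would likely stream audio to the cloud in small chunks rather than buffer a whole request.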
If the system determines that the user pronounced the keyword that wakes the system up (in this case “Alexa”), the X-CUBE-VS4A libraries and tools send the clean audio buffer to the AVS cloud and receive a response from Amazon. Indeed, the cloud servers send a confirmation that AVS understands the question, then transmit an MP3 file containing the answer. Developers will then have to provide a media player to play the answer from Alexa, as well as any music from streaming services if that is a feature they want to offer to their users.
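The flow described above can be pictured as a small state machine. The following C sketch is only an illustration of that sequence: the state names, event names, and functions are all hypothetical and do not come from X-CUBE-VS4A or the AVS SDK, and the "end of speech" event is an assumption about how the request would be closed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states for one voice interaction. */
typedef enum {
    VOICE_IDLE,       /* front-end is listening for the wake word */
    VOICE_STREAMING,  /* wake word heard, audio is being sent to the cloud */
    VOICE_WAITING,    /* request sent, waiting for the cloud response */
    VOICE_PLAYING     /* media player is playing the MP3 answer */
} voice_state_t;

/* Hypothetical events raised by the front-end, network, and player. */
typedef enum {
    EVT_WAKE_WORD,       /* keyword spotter detected "Alexa" */
    EVT_END_OF_SPEECH,   /* assumed: the user stopped talking */
    EVT_RESPONSE_READY,  /* cloud returned the audio answer */
    EVT_PLAYBACK_DONE,   /* media player finished */
    EVT_ERROR            /* network or decoding failure */
} voice_event_t;

/* Return the next state; events that do not apply are ignored. */
voice_state_t voice_next_state(voice_state_t state, voice_event_t event)
{
    if (event == EVT_ERROR) {
        return VOICE_IDLE; /* any failure returns to wake-word listening */
    }
    switch (state) {
    case VOICE_IDLE:
        return (event == EVT_WAKE_WORD) ? VOICE_STREAMING : state;
    case VOICE_STREAMING:
        return (event == EVT_END_OF_SPEECH) ? VOICE_WAITING : state;
    case VOICE_WAITING:
        return (event == EVT_RESPONSE_READY) ? VOICE_PLAYING : state;
    case VOICE_PLAYING:
        return (event == EVT_PLAYBACK_DONE) ? VOICE_IDLE : state;
    }
    return state;
}

/* Feed a sequence of events through the state machine. */
voice_state_t voice_run(voice_state_t start,
                        const voice_event_t *events, size_t count)
{
    assert(events != NULL || count == 0);
    voice_state_t state = start;
    for (size_t i = 0; i < count; ++i) {
        state = voice_next_state(state, events[i]);
    }
    return state;
}
```

Keeping the transitions in one pure function makes the sequence easy to test on a PC before wiring it to real audio, network, and playback callbacks on the board.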
All the Building Blocks
It’s important to note that X-CUBE-VS4A only ports Amazon’s Alexa Voice Service SDK to our STM32 MCUs. Hence, just like when using the traditional AVS SDK, engineers using our software pack will still require third-party technologies for the audio front-end. However, since our solution uses Sensory algorithms, developers just need to get a license to replicate our system, which should shorten their development and prototyping phase. Similarly, our Discovery board only uses one omnidirectional microphone, which means that our demo won’t work as well in a noisy environment. While that’s fine in a lab, companies looking to build a commercial product will use third-party solutions to integrate beamforming technologies that help capture the user’s voice even when the ambient noise is quite high.
Ultimately, X-CUBE-VS4A offers the tremendous advantage of bringing Alexa Voice Service to microcontrollers to ensure that developers don’t have to start from scratch and don’t miss any features that are available from the AVS SDK. Furthermore, our example applications can even help engineers by pointing them in the direction of industry-leading solutions for front-end audio. Adding a smart assistant to a low-power device has never been this easy.