3 reasons why embedded heterogeneous systems are more efficient


Efficiency is defined as the ratio of the useful work performed by a machine to the total energy expended. Our job as engineers is to innovate and solve problems in the most efficient way possible.

Flying as the sole passenger in an otherwise empty Jumbo Jet, for instance, is less efficient than flying in a Lear Jet. The Jumbo Jet burns far more fuel and most of the plane’s capacity is wasted, while the Lear Jet can get you to your destination faster on less fuel, especially since you skip airport security and avoid delays. The same can be said of processes running on different types of cores. The ARM® Cortex®-A core is powerful and can run smaller processes, but doing so uses more power and adds latency. The better option may be to run smaller processes on a deterministic Cortex-M core.

More and more new-generation microprocessors incorporate a mixture of powerful cores alongside smaller cores. The combination is not new; the two types have been used together in heterogeneous computing configurations for many years. However, as computing spreads to many more applications and use cases, the advantages of mixing core types have created a need for additional configurations to support those use cases.

It is helpful to quickly review the different multiprocessing configurations, since many of the terms are used interchangeably.

Homogeneous vs. heterogeneous multicore
Homogeneous multicore systems have more than one core, and all cores share the same architecture and microarchitecture. An example is the ARM quad-core Cortex-A53 system, in which every core is identical.

Heterogeneous multicore systems have two or more cores that differ in architecture or microarchitecture. An example is the combination of a microprocessor core with a microcontroller-class core (e.g. a mix of Cortex-A, Cortex-M, or DSP cores).

Symmetric vs. asymmetric multiprocessing
The terms symmetric and asymmetric usually refer to the software environment; however, many people mistake multiprocessing for multicore. “Multicore” usually refers to the hardware.

A symmetric multiprocessing (SMP) system (one kernel, multiple cores) usually contains identical cores, or at least cores with the same instruction set, and runs a single OS with shared memory. This environment enables load balancing, allowing processes to run on various cores at various times, as decided by the scheduler.

An asymmetric multiprocessing (AMP) system (multiple software environments, multiple cores) contains multiple cores of either similar (homogeneous) or differing (heterogeneous) architecture, with either separate or shared memory. Usually more than one OS runs on the system, with each OS assigned to a core or core architecture. For example, the Cortex-A core may run a rich OS, while the Cortex-M core may run simple code or an RTOS. Consider a gateway control application that requires a rich GUI and multiple high-speed connectivity options running on the Cortex-A core, while control and monitoring algorithms run separately on the Cortex-M core.

AMP is also used in many cases that benefit from a core optimized for a specific type of computing, e.g. off-loading audio processing to a low-power Cortex-M processor. Many of these workloads run on an RTOS and require hard real-time operation. To meet these requirements, the architecture developed around the Cortex-M core in a heterogeneous system provides very fast single-cycle access from master to slave/memory. However, some high-performance, low-latency RTOSes (e.g. Nucleus® from Mentor) can also provide real-time processing on the Cortex-A core.

The homogeneous multicore configuration running in SMP mode can be considered the most popular way to scale processing. However, a heterogeneous multicore configuration running in AMP mode may be the right fit for designs that need efficiency in both processing and power consumption.

Three key reasons why a heterogeneous multicore processing configuration can be beneficial to your design:

#1 Performance optimization

Tasks should be separated based on processing needs and determinism.

Applications running on an OS such as Linux or Android require a powerful Cortex-A type of core along with its Memory Management Unit (MMU). Real-time applications needing strict determinism and/or DSP capabilities can run on the Cortex-M class core. Mixing these tasks on a single core is inefficient and may add unneeded complexity for both types of tasks.
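The partitioning rule above can be sketched in a few lines of Python. This is only an illustration: the `Task` fields, the `assign_core` function, and the task names are hypothetical, and sending tasks with neither requirement to the smaller core is an assumption based on the earlier point about small processes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task description used only for this sketch."""
    name: str
    needs_mmu: bool        # e.g. runs under a rich OS such as Linux or Android
    hard_real_time: bool   # needs strict determinism


def assign_core(task: Task) -> str:
    """Choose a core class from the two criteria named in the text."""
    if task.needs_mmu and task.hard_real_time:
        # Conflicting needs suggest the task should be split in two.
        raise ValueError(f"{task.name}: split into rich-OS and real-time parts")
    if task.needs_mmu:
        return "Cortex-A"
    # Deterministic tasks, and (by assumption) small tasks with neither
    # requirement, go to the smaller core.
    return "Cortex-M"


if __name__ == "__main__":
    for t in (Task("gui", True, False), Task("motor_loop", False, True)):
        print(t.name, "->", assign_core(t))
```

A task that asks for both an MMU-backed OS and hard real-time behavior is rejected rather than placed, which mirrors the point that mixing the two on one core adds complexity.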

#2 Reduction of power consumption

Many processes that monitor sensors and control motors or actuators require determinism and run efficiently on an RTOS on top of the smaller Cortex-M class core. If the use case also calls for a rich OS running on the Cortex-A core, the rich OS may spend much of its time waiting on user interaction or on communication from the sensors being monitored by the RTOS on the Cortex-M core. The system can take advantage of this idle time and power gate the large Cortex-A core until either a predetermined wake-up time or an interrupt generated by the lower-power Cortex-M core. Shutting down the large core and its associated silicon reduces the power needed to run the system.
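A rough way to see the effect of power gating is to compute average power as a function of how often the large core is awake. The sketch below assumes a simple linear model and uses made-up power figures; none of the numbers come from a datasheet, and the function name is hypothetical.

```python
def average_power_mw(p_m_active: float, p_a_active: float,
                     p_a_gated: float, a_duty: float) -> float:
    """Average system power in mW under a simple linear model.

    p_m_active: small core, assumed always on
    p_a_active: large core while running
    p_a_gated:  residual power of the large core while power gated
    a_duty:     fraction of time the large core is awake (0.0 to 1.0)
    """
    if not 0.0 <= a_duty <= 1.0:
        raise ValueError("a_duty must be between 0 and 1")
    return p_m_active + a_duty * p_a_active + (1.0 - a_duty) * p_a_gated


if __name__ == "__main__":
    # Illustrative numbers only, not measured values.
    always_on = average_power_mw(20.0, 300.0, 1.0, a_duty=1.0)
    gated = average_power_mw(20.0, 300.0, 1.0, a_duty=0.25)
    print(always_on, gated)  # prints 320.0 95.75
```

With these assumed figures, waking the large core a quarter of the time cuts average power to less than a third of the always-on case. The real saving depends on wake-up latency and energy, which this model ignores.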

#3 Improved system reliability and security

A natural benefit of distributing processes between the two cores is the ability to create separation between the two asymmetric processing environments. A system can now control or forbid access between the two environments and in turn provide greater stability and security, preventing processes that go awry from affecting the real-time processing domain. By separating access to the peripherals and memory between the two processing environments, a secure firewall is created that improves both system reliability and security.

Many SoCs can be clearly defined as either heterogeneous or homogeneous architectures, within either an AMP or SMP system. However, SoCs such as the i.MX 7Dual can be considered a mix. The i.MX 7Dual processor contains a homogeneous multicore architecture (the dual Cortex-A7 cores sharing memory) inside an overall heterogeneous architecture formed by adding the Cortex-M4 processor. This system allows either an SMP or AMP configuration on the dual Cortex-A7, plus an AMP configuration when a separate OS runs on the Cortex-M4.
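The separation policy described above can be modeled as a table that maps each resource to the domains allowed to touch it. This is a policy sketch only: on real silicon the enforcement is done by hardware rather than by software checks like this one, and the resource names, `Domain` enum, and `may_access` function are all hypothetical.

```python
from enum import Enum


class Domain(Enum):
    CORTEX_A = "cortex-a"
    CORTEX_M = "cortex-m"


# Hypothetical resources; each maps to the set of domains allowed to use it.
ACCESS_TABLE = {
    "motor_pwm": {Domain.CORTEX_M},                        # real-time side only
    "display": {Domain.CORTEX_A},                          # rich-OS side only
    "shared_mailbox": {Domain.CORTEX_A, Domain.CORTEX_M},  # IPC channel
}


def may_access(domain: Domain, resource: str) -> bool:
    """Deny by default: resources not listed are inaccessible to everyone."""
    return domain in ACCESS_TABLE.get(resource, frozenset())


if __name__ == "__main__":
    print(may_access(Domain.CORTEX_A, "motor_pwm"))       # prints False
    print(may_access(Domain.CORTEX_M, "shared_mailbox"))  # prints True
```

Because the rich-OS domain has no entry for `motor_pwm`, a misbehaving process there cannot disturb the motor control loop, which is the reliability argument made in the text.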


Figure 1: AMP Configuration in a Mix Architecture



Figure 2: Mix Processing and Architecture


Heterogeneous multicore processors such as the i.MX 7Dual enable rich software architecture configurations to address the requirements of complex computing devices. The heterogeneous processing enabled through the addition of the Cortex-M processor can offer a significant number of benefits; however, issues such as software configuration, booting, Inter-Process Communication (IPC), debugging and performance optimization also need to be considered. The software community is addressing these complexities with solutions from organizations such as the OpenAMP open source project, managed by The Multicore Association®. Companies such as NXP and Mentor Graphics are members of and contributors to the OpenAMP project.

These processors are supported by popular general-purpose OS and real-time OS technologies, and are complemented by runtime technologies and tools such as the newly released DS-MDK from ARM. DS-MDK is specifically designed for these modern heterogeneous multicore processors and gives the user a rich and powerful tool to debug both sides of the system simultaneously. The ability to observe shared resources, and how messages and data are passed from one side to the other, in a single unified GUI greatly accelerates the development process.
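To make the IPC concern concrete, here is a minimal sketch of one common pattern for passing messages between two cores: a single-producer, single-consumer ring buffer in shared memory. It is not the OpenAMP API; the `RingBuffer` class and its method names are hypothetical. A real implementation would also need memory barriers, cache maintenance, and an inter-core interrupt to signal new data.

```python
class RingBuffer:
    """Single-producer, single-consumer queue (illustrative sketch).

    One side only calls try_send and the other only calls try_receive,
    so each index is written by exactly one side.
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        # One slot stays unused so that "full" and "empty" are distinguishable.
        self._slots = [None] * (capacity + 1)
        self._head = 0  # next write position, owned by the producer
        self._tail = 0  # next read position, owned by the consumer

    def try_send(self, msg: bytes) -> bool:
        """Queue a message; return False instead of blocking when full."""
        nxt = (self._head + 1) % len(self._slots)
        if nxt == self._tail:
            return False
        self._slots[self._head] = msg
        self._head = nxt
        return True

    def try_receive(self):
        """Return the oldest message, or None when the queue is empty."""
        if self._tail == self._head:
            return None
        msg = self._slots[self._tail]
        self._slots[self._tail] = None
        self._tail = (self._tail + 1) % len(self._slots)
        return msg


if __name__ == "__main__":
    to_cortex_a = RingBuffer(capacity=2)
    to_cortex_a.try_send(b"temp=21")
    print(to_cortex_a.try_receive())  # prints b'temp=21'
```

Neither call blocks, so the real-time side can drop or retry a message rather than stall when the rich-OS side is slow or asleep.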

Heterogeneous computing has come a long way. The new i.MX 7Dual is a great example of an SoC built to enable embedded efficiency through heterogeneous computing. It brings many advantages, including performance optimization, reduced power consumption, and increased system reliability and security. By taking advantage of these benefits, product developers can save on cost and system power while avoiding the more expensive option of an ASSP.

Special thanks to Lori Kate Smith and Phillip Burr of ARM and Warren Kurisu of Mentor Graphics for their contribution and input.



Nik Jedrzejewski
Nik leads the eReader and Wearable product and marketing applications processor strategy for NXP within the low power i.MX processor team and is an expert in electronic paper display (EPD) applications. With 18 years of semiconductor experience, Nik was part of the team credited with capturing over 75 percent market share of the Portable Media Player (PMP) market during his 8 year career at SigmaTel. An electrical engineer, Nik has managed hardware platform design and firmware architecture from the aerospace to consumer electronics sectors. Nik holds a BSEE from the Missouri University of Science and Technology. You can follow him on Twitter @nmj55


  1. Marco Stucchi says:

    Some applications have unusual constraints, such as very short startup times (from no power) and very little power supply available at startup.
    Is it possible on these devices to first power the less power-hungry and immediately available Cortex-M4 (almost immediately, I guess, since cache is involved) and then, when the supply improves, the Cortex-A for optional functions like communications or an LCD?

    If not, what device from NXP would you advise?
