NXP likes to do things differently: to lead and innovate. We’ve been successfully supporting camera module interfaces on i.MX applications processors, and we’ve been enabling machine learning on shared resources such as the CPUs and GPUs of many NXP SoCs. While this approach still works well for many applications, this blog explains why, depending on the application requirements, we decided to boost it and add both an Image Signal Processor (ISP) and a Machine Learning Accelerator to the i.MX 8M Plus.
Machine learning in the cloud is the key technology behind voice assistants on smartphones and smart speakers, and behind the way social media services, and even cell phones, can group together photos containing a given person. But those use cases all rely on machine learning running on a server somewhere in the cloud.
The real challenge, and the one NXP addresses, is machine learning at the edge, where all of the machine learning inference runs locally on an edge processor such as the i.MX 8M Plus. Running ML inference at the edge means that the application will continue to run even if network access is disrupted, which is critical for applications such as surveillance or a smart home alarm hub, or when operating in remote areas without network access. It also provides much lower decision latency than would be possible if the data had to be sent to a server, processed, and the result sent back. Low latency is important, for example, in industrial visual inspection on the factory floor, where the system must decide whether to accept or reject products whizzing by.
Another key benefit of machine learning at the edge is user privacy. Personal data collected by the edge device, such as voice communication and commands, faces, video, and images, is processed and stays local on the device. It is not sent to the cloud for processing, where it could be recorded and tracked. The user’s privacy remains intact, and individuals keep the choice of whether or not to share personal information with the cloud.
So now, given the need for machine learning at the edge, the question becomes how much machine learning performance is needed. One way to measure machine learning accelerators is by the number of operations they can perform per second (usually 8-bit integer multiplies or accumulates), expressed in TOPS: tera (trillion) operations per second. It is a rudimentary benchmark, since overall system performance also depends on many other factors, but it is one of the most widely quoted machine learning measurements.
It turns out that full speech recognition at the edge (not just keyword spotting) takes around 1-2 TOPS, depending on the algorithm, and more if you actually want to understand what the user is saying rather than just converting speech to text. Performing object detection with an algorithm such as YOLOv3 at 60 fps also takes around 2-3 TOPS. That makes machine learning acceleration such as the 2.3 TOPS of the i.MX 8M Plus the sweet spot for these types of applications.
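The TOPS figures above come from simple arithmetic: operations per inference multiplied by inferences per second. The Python sketch below makes that explicit. The helper names `required_tops` and `fits` are hypothetical, the 30-billion-operations-per-frame workload is a placeholder rather than a measured figure for YOLOv3 or any other model, and the `utilization` factor is an assumption standing in for the other system factors that make TOPS a rudimentary benchmark. Operations must be counted the same way the accelerator's rating counts them.

```python
# Back-of-envelope TOPS estimate. All workload numbers below are
# hypothetical placeholders, not measured figures for any real model.

TERA = 1e12  # 1 TOPS = 10**12 operations per second


def required_tops(ops_per_inference: float, inferences_per_second: float) -> float:
    """Sustained throughput, in TOPS, needed to run a model at a given rate."""
    return ops_per_inference * inferences_per_second / TERA


def fits(required: float, peak_tops: float, utilization: float = 1.0) -> bool:
    """Check a requirement against a peak TOPS rating.

    `utilization` is an assumed fraction (0, 1] of peak throughput that the
    workload actually achieves; in practice it depends on memory bandwidth,
    layer types, and the software stack.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return required <= peak_tops * utilization


if __name__ == "__main__":
    # Hypothetical detector: 30 billion operations per frame, at 60 fps.
    need = required_tops(30e9, 60)
    print(f"required: {need:.2f} TOPS")                # required: 1.80 TOPS
    print(fits(need, peak_tops=2.3))                   # True
    print(fits(need, peak_tops=2.3, utilization=0.5))  # False
```

Halving the assumed utilization is enough to turn a comfortable fit into a shortfall, which is one reason a peak TOPS rating alone does not predict real application performance.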
ISP functionality exists in every camera-based system, although it may be integrated into the camera module or embedded in the applications processor, and so may be hidden from the user. ISPs typically perform many types of image enhancement, in addition to their key purpose: converting the output of a raw image sensor, which has only one color component per pixel, into the RGB or YUV images that are more commonly used elsewhere in the system. The color-reconstruction part of this conversion is known as demosaicing.
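As an illustration of the raw-to-RGB conversion described above, the Python sketch below implements the simplest possible demosaic, a "superpixel" method that collapses each 2x2 block of an RGGB Bayer mosaic into one RGB pixel, followed by the full-range BT.601 (JFIF) RGB-to-YCbCr conversion. It assumes an RGGB layout and 8-bit values; the function names are hypothetical, and this is not the algorithm used by the i.MX 8M Plus ISP or any other specific ISP.

```python
# Minimal "superpixel" demosaic for an RGGB Bayer mosaic, followed by
# full-range BT.601 (JFIF) RGB -> YCbCr conversion. Illustrative only.

from typing import List, Tuple

Pixel = Tuple[float, float, float]


def demosaic_superpixel_rggb(raw: List[List[int]]) -> List[List[Pixel]]:
    """Collapse each 2x2 RGGB block into one RGB pixel (halves resolution).

    Assumed layout of every 2x2 block:
        R  G
        G  B
    """
    h, w = len(raw), len(raw[0])
    if h % 2 or w % 2:
        raise ValueError("raw frame dimensions must be even")
    out: List[List[Pixel]] = []
    for y in range(0, h, 2):
        row: List[Pixel] = []
        for x in range(0, w, 2):
            r = float(raw[y][x])
            g = (raw[y][x + 1] + raw[y + 1][x]) / 2  # average the two greens
            b = float(raw[y + 1][x + 1])
            row.append((r, g, b))
        out.append(row)
    return out


def rgb_to_ycbcr(r: float, g: float, b: float) -> Pixel:
    """Full-range BT.601 (JFIF) conversion for 8-bit RGB values."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (y, cb, cr)


if __name__ == "__main__":
    raw = [[200, 100, 50, 60],
           [120, 30, 70, 10]]
    rgb = demosaic_superpixel_rggb(raw)
    print(rgb)  # [[(200.0, 110.0, 30.0), (50.0, 65.0, 10.0)]]
    print([[rgb_to_ycbcr(*p) for p in row] for row in rgb])
```

Production ISPs typically interpolate the missing color samples (for example with bilinear or edge-aware methods) to keep full resolution, and also handle steps such as black-level correction, white balance, and gamma, which this sketch omits.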
Applications processors without ISPs work well in vision-based systems when the camera inputs come from network or web cameras, typically connected to the applications processor over Ethernet or USB. In these applications, the camera can be some distance away from the processor, up to 100 m or so. The camera itself has a built-in ISP and processor that convert the image sensor data and encode the video stream before sending it over the network.
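A quick data-rate estimate helps explain why a network camera encodes its video before sending it. The Python sketch below computes the uncompressed bit rate of a stream carrying one sample per pixel, as a raw image sensor does; the 1080p resolution, 10-bit depth, and 60 fps frame rate are illustrative assumptions, and `raw_bitrate_bps` is a hypothetical helper name.

```python
# Rough uncompressed video data rate for a raw sensor stream.
# Resolution, bit depth, and frame rate are illustrative assumptions.


def raw_bitrate_bps(width: int, height: int, bits_per_pixel: int, fps: float) -> float:
    """Bits per second for an uncompressed one-sample-per-pixel stream."""
    return width * height * bits_per_pixel * fps


if __name__ == "__main__":
    # Hypothetical 1080p sensor with 10-bit raw output at 60 fps.
    rate = raw_bitrate_bps(1920, 1080, 10, 60)
    print(f"{rate / 1e9:.2f} Gbit/s")  # 1.24 Gbit/s
```

At roughly 1.24 Gbit/s, such an unencoded stream would exceed the line rate of a 1 Gbit/s Ethernet link, which is why the video is encoded inside the camera before it goes over the network.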
Applications processors without ISPs also work well for relatively low-resolution cameras. At resolutions of 1 Megapixel or below, image sensors often have an embedded ISP within them, and can output RGB or YUV images to an applications processor, meaning that there is no need for an ISP within the processor.
But at resolutions of around 2 Megapixels (1080p) or higher, most image sensors do not have an embedded ISP and instead rely on an ISP elsewhere in the system. This may be a standalone ISP chip (which works, but adds power and cost to the system), or an ISP integrated within the applications processor. This is the approach NXP chose for the i.MX 8M Plus, offering high image quality in an optimized imaging solution, particularly at resolutions of 2 Megapixels and higher.
Putting all of this together, with its combination of a 2.3 TOPS machine learning accelerator and an ISP, the i.MX 8M Plus applications processor is well positioned to be a key element of embedded vision systems at the edge, whether for smart home, smart building, smart city, or industrial IoT applications. Its embedded ISP makes it possible to build optimized systems with high image quality that connect directly to local image sensors, and to feed this image data to the latest machine learning algorithms, all offloaded to the local machine learning accelerator.
The i.MX 8M Plus architecture, optimized for machine learning and vision systems, enables edge device designers to do things differently: to lead and innovate, as NXP does. They have in their hands a powerful machine learning capability paired with a high-definition camera system that allows devices to see more clearly and farther. A new set of innovative opportunities is opening up in the embedded landscape.
For more information on the i.MX 8M Plus visit: nxp.com/imx8mplus