Having examined in this blog article how the shift of high-performance processing from the cloud to the edge of networks has enabled the Internet of Things (IoT) to thrive, and consequently laid the foundations for AI, we now turn to one more aspect of AI that requires our full attention: progress in AI is closely tied to the evolution of cyber threats.
Every second, five new malware variants are discovered. Organizations across the globe are hit by one hundred previously unknown malware attacks every hour. Every day, one million new malicious files appear in the connected world. With ever more devices and systems connected to the web, cybercrime has become an increasing threat to our technological assets – and to the safety of our society as a whole.
Machine learning proves to be a double-edged sword: while ML enables industry-grade malware detection programs to work more effectively, it will soon be used by bad actors to enhance the offensive capabilities of their attacks. As a matter of fact, a group of researchers from the Vrije Universiteit Amsterdam recently demonstrated how this can work. In their side-channel attack, which leaked information out of the CPU’s translation lookaside buffers (TLBs), the white-hat hackers used novel machine learning techniques to train their attack algorithm and bring it to a new level of performance. They are confident that machine learning techniques will keep improving the quality of future side-channel attacks.
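To make the idea concrete, here is a minimal, purely illustrative sketch, not the researchers’ actual code, of how an attacker might train a classifier to recover a secret-dependent bit from noisy timing measurements. The synthetic traces and the leakage model are invented for the demo:

```python
# Illustrative sketch: a classifier learns to recover a secret bit from
# raw timing traces. The synthetic "traces" below stand in for real
# side-channel measurements (e.g., TLB access latencies).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_traces, trace_len = 2000, 64

# Hypothetical leakage model: the victim's secret bit slightly shifts the
# latency of a few positions in each trace, buried in measurement noise.
secret_bits = rng.integers(0, 2, n_traces)
traces = rng.normal(100.0, 5.0, (n_traces, trace_len))  # baseline latency
traces[secret_bits == 1, 20:24] += 3.0                  # subtle leak

X_train, X_test, y_train, y_test = train_test_split(
    traces, secret_bits, test_size=0.25, random_state=0)

# A small neural network picks up the leak that naive thresholding misses.
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)
print(f"secret-bit recovery rate: {clf.score(X_test, y_test):.2%}")
```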
To prevent new, effective AI and ML techniques from tipping this balance of power in the attackers’ favor, we must focus on how to leverage artificial intelligence to improve system security and data privacy.
A good example of ML-based security is anomaly detection, where the system spots anomalous behavior or patterns in the data stream. This process has long been applied to spam and malware detection, but machine learning can be expanded to look for far more subtle and complex anomalous behavior in a system. And while monitoring and protecting against external threats is crucial for an effective system defense, few organizations pay equal attention to insider threats. A 2016 Accenture survey found that two thirds of the organizations surveyed had fallen victim to data theft from inside the organization; of those affected, 91% reported that they had no effective detection method for this type of threat. Machine learning can significantly aid in the development of effective, real-time profiling and anomaly detection capabilities to detect and neutralize user-based threats from within the system.
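As a minimal sketch of what such user profiling might look like, the following snippet trains an Isolation Forest, one common anomaly detection technique, on simulated session data. The features and numbers are invented for illustration; a real deployment would use far richer behavioral signals:

```python
# Hedged sketch of ML-based insider-threat detection with an Isolation
# Forest trained on "normal" user sessions.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Simulated per-session features: [login_hour, MB_downloaded, files_touched]
normal_sessions = np.column_stack([
    rng.normal(10, 2, 500),    # daytime logins
    rng.normal(50, 15, 500),   # modest download volumes
    rng.normal(20, 5, 500),    # typical file activity
])
insider_session = np.array([[3.0, 900.0, 400.0]])  # 3 a.m. bulk exfiltration

detector = IsolationForest(contamination=0.01, random_state=42)
detector.fit(normal_sessions)
print(detector.predict(insider_session))  # -1 flags the session as anomalous
```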
It’s easy to identify applications in which the providers of data for AI, whether in the training phase or in the inference phase, do not want to hand over their data unprotected. With the EU General Data Protection Regulation (GDPR) in effect since May 25, 2018, privacy protection is mandatory for any business dealing with the data of EU citizens, and non-compliance can result in heavy fines.
There are also applications in industrial environments where data privacy is crucial to system providers. In predictive maintenance, for example, machine data is used to determine the condition of in-service equipment and precisely predict when maintenance should be performed. This approach achieves substantial cost savings over routine or time-based preventive maintenance, because tasks are performed only when required and, ideally, before system failure. Machine owners participating in such a service have a clear interest in benefiting from the generated data; however, they also have a strong interest in not sharing that data with competitors using the same machines. This puts the maintenance service provider in a dilemma. The key question is: how can businesses continue to respect privacy concerns while still permitting the use of big data to drive business value?
Homomorphic encryption offers one answer: data used in a computation remains encrypted throughout, and becomes visible only to the intended user. The result of the computation, once decrypted, matches the result of the same computation applied to the plaintext. In addition, attribute-based authentication, as used by NXP, is based on the Identity Mixer protocol developed at IBM® Research. It relies on a combination of flexible public keys (pseudonyms) and flexible credentials that allow a user to share only the information required for a certain transaction, without revealing any other attributes.
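To illustrate the principle, here is a toy sketch of the Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. The parameters below are deliberately tiny and insecure; production systems use vetted libraries and much larger keys:

```python
# Toy Paillier cryptosystem (additively homomorphic), for illustration only.
# Requires Python 3.8+ for pow(x, -1, n) modular inverses.
from math import gcd

p, q = 293, 433                 # tiny demo primes -- never use in practice
n = p * q
n2 = n * n
g = n + 1                       # standard choice that simplifies decryption
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)    # inverse of L(g^lam mod n^2)

def encrypt(m, r=17):           # fixed r for reproducibility; use random r!
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n

c1, c2 = encrypt(5), encrypt(7, r=23)
# Multiplying ciphertexts adds the underlying plaintexts:
assert decrypt((c1 * c2) % n2) == 12
print("E(5) * E(7) decrypts to", decrypt((c1 * c2) % n2))
```

A service provider could aggregate encrypted sensor readings this way without ever seeing the individual values, which is exactly the property the predictive maintenance dilemma calls for.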
The advantages are obvious: “The Internet is like the lunar surface — it never forgets a footprint. With Identity Mixer, we can turn it into a sandy beach that regularly washes everything away,” says Jan Camenisch, cryptographer and co-inventor of Identity Mixer.
Attacks can occur as early as the first phase of ML, when the training data is collected and fed into the model. Consider a typical application: an anomaly detection tool trained on data sent in by its users. If a user “poisons” the training data by purposely sending incorrect inputs, this may result in inferior performance or even failure of the machine learning model at inference time.
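A minimal sketch of such a poisoning attack, on a synthetic data set invented for the demo: the same classifier is trained once on clean labels and once on labels partially flipped by a malicious contributor, and the two test accuracies are compared.

```python
# Label-flip poisoning sketch: a contributor mislabels many class-1
# training examples, biasing the model against detecting that class.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clean_model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Poison: flip 40% of the class-1 labels to class 0.
rng = np.random.default_rng(0)
ones = np.flatnonzero(y_tr == 1)
flip = rng.choice(ones, size=int(0.4 * len(ones)), replace=False)
poisoned_y = y_tr.copy()
poisoned_y[flip] = 0

poisoned_model = LogisticRegression(max_iter=1000).fit(X_tr, poisoned_y)
print(f"clean model accuracy:    {clean_model.score(X_te, y_te):.2%}")
print(f"poisoned model accuracy: {poisoned_model.score(X_te, y_te):.2%}")
```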
The user’s privacy must also be guaranteed during the inference phase. This is especially relevant when inference is done on private or sensitive data. But the user can also be the attacker. As a means of attack, the user may employ so-called adversarial examples. An adversarial example is a valid input that causes the machine learning model to misinterpret it. This seemingly benign attack can have catastrophic consequences, for instance in road sign classification in safety-critical situations.
By attaching a specially crafted sticker to a stop sign, researchers have shown that they can trick an image classifier into misinterpreting the sign or not recognizing it at all. While the sign looks like a regular stop sign to the human eye, the machine learning model is unable to see it as such. The concept of adversarial examples is not new. What is new is the severity of their consequences, such as crashing an autonomous car in the stop sign example.
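A stripped-down sketch of the underlying mechanism, here applied to a simple linear model rather than an image classifier: the fast gradient sign method (FGSM) nudges every input feature by a tiny amount in the direction that most increases the model’s loss, shifting the prediction toward the wrong class while the input barely changes.

```python
# FGSM-style adversarial perturbation against a logistic-regression model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=1)
clf = LogisticRegression(max_iter=1000).fit(X, y)

x = X[0]
p = clf.predict_proba(x.reshape(1, -1))[0, 1]
# Gradient of the log-loss w.r.t. the input for a logistic model: (p - y) * w
grad = (p - y[0]) * clf.coef_[0]

eps = 0.3
x_adv = x + eps * np.sign(grad)   # fast gradient sign method

print(f"true label: {y[0]}")
print(f"P(class 1) before: {clf.predict_proba(x.reshape(1, -1))[0, 1]:.3f}")
print(f"P(class 1) after:  {clf.predict_proba(x_adv.reshape(1, -1))[0, 1]:.3f}")
print(f"max per-feature change: {np.abs(x_adv - x).max():.3f}")  # == eps
```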
The value of machine learning models largely resides in the associated data sets, and training data can be very expensive to collect or difficult to obtain. When machine learning is offered as a service, the user has access only to the inputs and the outputs of the model. In the case of an image classifier, for example, a user submits an image and gets back its category. It might be tempting for a user to make a copy of the model itself to avoid paying for future usage. A possible attack is to query the service on chosen input data, collect the corresponding outputs, and train a surrogate model on the resulting data set to obtain a functionally equivalent copy.
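A minimal sketch of this extraction attack, with an invented “victim” model standing in for a paid prediction API and synthetic probe data standing in for the attacker’s chosen queries:

```python
# Model extraction sketch: query a black-box "victim" on chosen inputs,
# label them with its answers, and train a functionally similar surrogate.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=10, random_state=7)
victim = RandomForestClassifier(random_state=7).fit(X, y)  # the paid service

# The attacker sees only input -> output and queries synthetic probe data.
rng = np.random.default_rng(7)
probes = rng.normal(size=(5000, 10))
stolen_labels = victim.predict(probes)          # the "API" answers

surrogate = LogisticRegression(max_iter=1000).fit(probes, stolen_labels)

# How often does the stolen copy agree with the victim on fresh inputs?
fresh = rng.normal(size=(1000, 10))
agreement = (surrogate.predict(fresh) == victim.predict(fresh)).mean()
print(f"surrogate agrees with victim on {agreement:.1%} of queries")
```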
At NXP, we build some of the most sophisticated secure devices in the world, with built-in countermeasures that protect them against a broad range of logical and physical attacks, such as side-channel or template attacks. It’s only a matter of time before hackers rely on AI to extract secrets and critical information from secure systems, since AI dramatically enhances their “learning” capabilities.
To ready ourselves against these evolving threats, the implications of AI must become an integral part of IoT system security. If the training and inference of machine learning models are not to become a wide-open gate for future adversaries, security-by-design and privacy-by-design principles must be considered from the beginning. Fortunately, it’s not too late for the AI realm to apply the lessons already learned from IoT security. If we keep this in mind when designing the infrastructure and devices of the future, the AIoT holds the power to transform our lives. And it’s up to us to turn the black box of the future into a bright one.