Adversarial machine learning

From Wikipedia, the free encyclopedia

Adversarial machine learning is a technique employed in the field of machine learning which attempts to fool models through malicious input.[1][2] This technique can be applied for a variety of reasons, the most common being to attack or cause a malfunction in standard machine learning models.

Machine learning techniques were originally designed for stationary and benign environments in which the training and test data are assumed to be generated from the same statistical distribution. However, when those models are implemented in the real world, the presence of intelligent and adaptive adversaries may violate that statistical assumption to some degree, depending on the adversary. This technique shows how a malicious adversary can surreptitiously manipulate the input data so as to exploit specific vulnerabilities of learning algorithms and compromise the security of the machine learning system.[2][3]

History

As early as Snow Crash (1992), science fiction writers have posited scenarios of technology being vulnerable to specially constructed data. In Zero History (2010), a character dons a t-shirt decorated in a way that renders him invisible to electronic surveillance.[4]

In 2004, Nilesh Dalvi and others noted that linear classifiers used in spam filters were being defeated by simple "evasion attacks" as spammers inserted "good words" into their spam emails. (Around 2007, some spammers would also add random noise to fuzz words within "image spam" in order to defeat OCR-based filters.) In 2006, Marco Barreno and others published "Can Machine Learning Be Secure?", an influential paper suggesting a broad taxonomy of attacks against machine learning. As late as 2013, many researchers continued to hope that non-linear classifiers (such as SVMs and neural networks) might be naturally robust to adversarial examples. In 2012, deep neural networks were unexpectedly crowned as the dominant path for advanced computer vision; starting in 2014, Christian Szegedy and others demonstrated that deep neural networks could be fooled with tiny adjustments of the input.[5]
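The "good word" evasion works because a linear filter simply adds up per-word evidence, so an attacker can offset spammy words by appending words the filter associates with legitimate mail. The following is a minimal sketch, assuming a toy filter whose word weights, bias, and vocabulary are hypothetical (they are not taken from Dalvi et al. or from any real filter):

```python
# Hypothetical per-word weights of a linear spam filter: positive weights
# push a message toward "spam", negative weights toward "legitimate".
WEIGHTS = {
    "viagra": 2.5,
    "free": 1.2,
    "winner": 1.5,
    "meeting": -1.0,
    "schedule": -0.9,
    "report": -0.8,
    "thanks": -0.6,
}
BIAS = -0.5  # hypothetical intercept


def spam_score(words):
    """Linear score: bias plus the weight of each known word (unknown words count 0)."""
    return BIAS + sum(WEIGHTS.get(w, 0.0) for w in words)


def is_spam(words):
    return spam_score(words) > 0


def good_word_attack(words):
    """Greedily append the most 'legitimate' words until the filter is evaded.

    Returns the modified message, or None if the available good words are not
    enough to bring the score down to zero or below.
    """
    good_words = sorted(
        (w for w, weight in WEIGHTS.items() if weight < 0),
        key=lambda w: WEIGHTS[w],  # most negative weight first
    )
    attacked = list(words)
    for w in good_words:
        if not is_spam(attacked):
            break
        attacked.append(w)
    return attacked if not is_spam(attacked) else None
```

With these weights, `good_word_attack(["free", "winner"])` returns `["free", "winner", "meeting", "schedule", "report"]`, which the filter accepts. The stronger spam `["free", "viagra", "winner"]` cannot be offset by the four good words available and yields `None`.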

Other examples

Examples include attacks in spam filtering, where spam messages are obfuscated through the misspelling of "bad" words or the insertion of "good" words;[6][7] attacks in computer security, such as obfuscating malware code within network packets or modifying it to mislead signature detection; and attacks in biometric recognition, where fake biometric traits may be exploited to impersonate a legitimate user[8] or to compromise users' template galleries that adapt to updated traits over time.

In 2017, researchers at Kyushu University showed that by changing only one pixel it was possible to fool current deep learning algorithms.[9][10] In the same year, researchers at the Massachusetts Institute of Technology 3-D printed a toy turtle with a texture engineered to make Google's object detection AI classify it as a rifle regardless of the angle from which the turtle was viewed.[11] Creating the turtle required only low-cost, commercially available 3-D printing technology.[12] In 2018, Google Brain published a machine-tweaked image of a dog that looked like a cat both to computers and to humans.[13] A 2019 study from Johns Hopkins University showed that, when asked, humans can guess how machines will misclassify adversarial images.[14]

Researchers have also discovered methods for perturbing the appearance of a stop sign such that an autonomous vehicle will classify it as a merge or speed limit sign.[2][15][16] In 2020, cybersecurity firm McAfee demonstrated that an obsolete version of the Mobileye system used in Tesla vehicles could be fooled into accelerating 50 miles per hour over the speed limit by adding a two-inch strip of black tape to a speed limit sign. McAfee stated that "Even to a trained eye, [the black tape] hardly looks suspicious or malicious, and many who saw it didn't realize the sign had been altered at all", while Mobileye stated that the tape could have fooled human drivers as well.[17][18] Adversarial patterns on glasses or clothing designed to deceive facial-recognition systems or license-plate readers have led to a niche, protest-driven industry of "stealth streetwear".[19]

An adversarial attack on a neural network can allow an attacker to inject their own algorithms into the target AI system.[20] Researchers can also hide commands to intelligent personal assistants inside benign-seeming adversarial audio; however, robust countermeasures against such adversarial audio appear feasible.[21]

Security evaluation

Conceptual representation of the reactive arms race[22][23][24]

To understand the security properties of learning algorithms in adversarial settings, the following main issues should be addressed:[22][25][23][24][26]

  • identifying potential vulnerabilities of machine learning algorithms during learning and classification
  • devising appropriate attacks that correspond to the identified threats and evaluating their impact on the targeted system
  • proposing countermeasures to improve the security of machine learning algorithms against the considered attacks

This process amounts to simulating a proactive arms race rather than a reactive one (the two are contrasted in Figures 1 and 2): system designers try to anticipate the adversary in order to understand whether there are potential vulnerabilities that should be fixed in advance, for instance by means of specific countermeasures such as additional features or different learning algorithms. However, proactive approaches are not necessarily superior to reactive ones: under some circumstances, reactive approaches are more suitable for improving system security.[27]
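The "evaluate their impact" step is commonly summarized as a security evaluation curve: the detection rate on malicious samples plotted against the strength of the attack. The sketch below assumes a linear classifier and an attacker who may shift every feature by at most eps (an L-infinity budget). The weights, bias, and samples are hypothetical, and real evaluations use the actual model and attack:

```python
def linear_score(w, b, x):
    """Discriminant of a linear classifier; > 0 means 'malicious'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def _sign(v):
    return (v > 0) - (v < 0)


def worst_case_linf(w, x, eps):
    """Strongest change to x allowed by an L-infinity budget eps.

    For a linear model, shifting every feature by eps against the sign of its
    weight lowers the score by eps * ||w||_1, the largest decrease any
    perturbation with max-abs-change <= eps can achieve.
    """
    return [xi - eps * _sign(wi) for wi, xi in zip(w, x)]


def security_curve(w, b, malicious, budgets):
    """Return (eps, detection rate on malicious samples) for each budget."""
    curve = []
    for eps in budgets:
        detected = sum(
            linear_score(w, b, worst_case_linf(w, x, eps)) > 0 for x in malicious
        )
        curve.append((eps, detected / len(malicious)))
    return curve
```

For example, with `w = [1.0, 2.0]`, `b = -1.0` and malicious samples `[1.0, 1.0]`, `[2.0, 0.5]`, `[0.5, 2.0]`, the budgets `[0.0, 0.5, 1.0, 1.5]` give detection rates of 1.0, 1.0, about 0.33, and 0.0. A curve that collapses at small budgets flags a vulnerability that a designer may want to fix proactively.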

Conceptual representation of the proactive arms race[22][23][24]

Attacks against machine learning algorithms (supervised)

The first step of the arms race described above is identifying potential attacks against machine learning algorithms. A substantial amount of work has been done in this direction.[22][23][24][28]

Taxonomy of potential attacks against machine learning

Attacks against (supervised) machine learning algorithms have been categorized along three primary axes:[28] their influence on the classifier, the security violation they cause, and their specificity.

  • Attack influence: An attack has a causative influence if it manipulates the training data in order to introduce vulnerabilities that are later exploited at the classification phase, or an exploratory influence if it only aims to find and subsequently exploit vulnerabilities at the classification phase. The attacker's capabilities may also be limited by constraints on data manipulation.[29]
  • Security violation: An attack can cause an integrity violation if it aims to get malicious samples misclassified as legitimate, or it may cause an availability violation if the goal is to increase the wrong classification rate of legitimate samples, making the classifier unusable (e.g., a denial of service).
  • Attack specificity: An attack is targeted if specific samples are considered (e.g., the adversary aims to allow a specific intrusion or wants a given spam email to get past the filter), or indiscriminate if any sample in a broad class will do (e.g., the adversary wants any spam email to get past the filter).
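Because the three axes are independent, any attack can be recorded as one value on each axis. The sketch below is a hypothetical encoding (the class names, enum values, and the two example profiles are illustrative assumptions, not part of the cited taxonomy's notation):

```python
from dataclasses import dataclass
from enum import Enum


class Influence(Enum):
    CAUSATIVE = "manipulates the training data"
    EXPLORATORY = "only probes or exploits the trained classifier"


class Violation(Enum):
    INTEGRITY = "malicious samples misclassified as legitimate"
    AVAILABILITY = "legitimate samples misclassified (denial of service)"


class Specificity(Enum):
    TARGETED = "specific samples"
    INDISCRIMINATE = "any sample in a broad class"


@dataclass(frozen=True)
class AttackProfile:
    """One point in the three-axis taxonomy (frozen, so it is hashable)."""
    influence: Influence
    violation: Violation
    specificity: Specificity

    def label(self) -> str:
        axes = (self.influence, self.violation, self.specificity)
        return ", ".join(axis.name.lower() for axis in axes)


# Hypothetical examples: tweaking one spam email at test time, and poisoning an
# IDS's retraining data so that it flags so much legitimate traffic it is unusable.
EVADE_ONE_SPAM = AttackProfile(
    Influence.EXPLORATORY, Violation.INTEGRITY, Specificity.TARGETED
)
POISON_IDS_FOR_DOS = AttackProfile(
    Influence.CAUSATIVE, Violation.AVAILABILITY, Specificity.INDISCRIMINATE
)
```

Here `EVADE_ONE_SPAM.label()` gives `"exploratory, integrity, targeted"`, the typical profile of the evasion attacks described below.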

This taxonomy has been extended into a more comprehensive threat model that allows one to make explicit assumptions about the adversary's goal, knowledge of the attacked system, capability of manipulating the input data and/or the system components, and the corresponding (potentially formally defined) attack strategy.[22][23] Two of the main attack scenarios identified according to this threat model are described below.
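Goal, knowledge, capability, and strategy can be combined into a single optimization problem. The notation below is a hedged sketch in the spirit of the formulation used in the Biggio and Roli survey,[5] not a verbatim reproduction. Here θ stands for the adversary's (possibly partial) knowledge of the system, Φ(D_c) for the set of manipulations of the attack data D_c that the adversary's capability permits, and A for an objective function that encodes the adversary's goal. Evasion is shown as one instance. It assumes a discriminant g that is positive for malicious samples and negative for legitimate ones, a distance d, and a manipulation budget d_max. All of these symbols are chosen here for illustration.

```latex
% General attack strategy: best manipulation within the attacker's capability
\[
  \mathcal{D}_c^{\star} \in \operatorname*{arg\,max}_{\mathcal{D}_c' \in \Phi(\mathcal{D}_c)}
  \mathcal{A}(\mathcal{D}_c', \theta)
\]

% Evasion as a special case: one malicious sample x, modified at test time
\[
  \mathbf{x}^{\star} \in \operatorname*{arg\,min}_{\mathbf{x}'} \; g(\mathbf{x}')
  \quad \text{subject to} \quad d(\mathbf{x}, \mathbf{x}') \le d_{\max}
\]
```

The attack succeeds when g(x*) < 0, that is, when the manipulated sample is classified as legitimate without exceeding the budget.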

Evasion attacks

Evasion attacks[22][23][30] are the most prevalent type of attack that may be encountered in adversarial settings during system operation. For instance, spammers and hackers often attempt to evade detection by obfuscating the content of spam emails and malware code. In the evasion setting, malicious samples are modified at test time to evade detection; that is, to be misclassified as legitimate. No attacker influence over the training data is assumed. A clear example of evasion is image-based spam in which the spam content is embedded within an attached image to evade the textual analysis performed by anti-spam filters. Another example of evasion is given by spoofing attacks against biometric verification systems.[8]
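Against a differentiable classifier, evasion is often carried out by gradient descent on the classifier's discriminant function, keeping the modified sample within a distance budget of the original. This is the approach taken in gradient-based evasion of kernel SVMs. The sketch below assumes a hypothetical two-term RBF discriminant (the centres, coefficients, GAMMA, step size, and budgets are illustrative) and an L2 budget d_max:

```python
import math

# Hypothetical SVM-like discriminant built from RBF kernel terms:
#   g(x) = OFFSET + sum_i alpha_i * exp(-GAMMA * ||x - c_i||^2)
# g(x) > 0 is read as "malicious", g(x) < 0 as "legitimate".
TERMS = [(+1.0, (0.0, 0.0)), (-1.0, (2.0, 0.0))]  # (alpha_i, centre c_i)
GAMMA = 1.0
OFFSET = 0.0


def sq_dist(u, v):
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))


def g(x):
    return OFFSET + sum(a * math.exp(-GAMMA * sq_dist(x, c)) for a, c in TERMS)


def grad_g(x):
    """Analytic gradient: d/dx exp(-GAMMA*||x-c||^2) = -2*GAMMA*(x-c)*exp(...)."""
    grad = [0.0] * len(x)
    for a, c in TERMS:
        k = math.exp(-GAMMA * sq_dist(x, c))
        for i in range(len(x)):
            grad[i] += a * k * (-2.0 * GAMMA * (x[i] - c[i]))
    return grad


def evade(x0, d_max, step=0.1, max_iter=100):
    """Projected gradient descent on g within L2 distance d_max of x0.

    Returns an evading point (g < 0), or None if the budget is insufficient.
    """
    x = list(x0)
    for _ in range(max_iter):
        if g(x) < 0:
            return x
        grad = grad_g(x)
        norm = math.sqrt(sum(v * v for v in grad))
        if norm == 0.0:
            break  # flat region: no descent direction
        x = [xi - step * gi / norm for xi, gi in zip(x, grad)]
        # Project back onto the feasible set ||x - x0|| <= d_max.
        dist = math.sqrt(sq_dist(x, x0))
        if dist > d_max:
            x = [x0i + (xi - x0i) * d_max / dist for xi, x0i in zip(x, x0)]
    return x if g(x) < 0 else None
```

Starting from the malicious point `(0.2, 0.0)`, `evade((0.2, 0.0), d_max=1.5)` crosses the decision boundary near x = 1. With `d_max=0.5` the sample cannot move far enough, so the function returns `None`. No training data is touched, which matches the evasion setting.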

Poisoning attacks

Machine learning algorithms are often re-trained on data collected during operation to adapt to changes in the underlying data distribution. For instance, intrusion detection systems (IDSs) are often re-trained on a set of samples collected during network operation. Within this scenario, an attacker may poison the training data by injecting carefully designed samples that eventually compromise the whole learning process. Poisoning may thus be regarded as an adversarial contamination of the training data. Examples of poisoning attacks against machine learning algorithms, including learning in the presence of worst-case adversarial label flips in the training data, are described in the literature.[22][23][28][31][32] Manipulated stop signs are often cited as an illustration:[33] a stop sign altered only at test time so that a neural network misreads it is an evasion attack, whereas a network whose poisoned training data teaches it to classify stop signs bearing a particular trigger as other signs is the result of a poisoning (backdoor) attack.
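A few injected, mislabeled training points can be enough to shift a simple model's decision boundary. The sketch below assumes a one-dimensional nearest-centroid classifier and hypothetical data, and the attacker injects points at a chosen position with the wrong label. It illustrates the idea of poisoning only and is not an implementation of the label-flip attacks in the cited works:

```python
def centroids(train):
    """Per-class mean of 1-D training points; train is a list of (x, label),
    with labels in {0, 1} and both classes assumed present."""
    sums, counts = {0: 0.0, 1: 0.0}, {0: 0, 1: 0}
    for x, y in train:
        sums[y] += x
        counts[y] += 1
    return {c: sums[c] / counts[c] for c in (0, 1)}


def predict(cents, x):
    """Nearest-centroid rule: the class whose mean is closest to x."""
    return min(cents, key=lambda c: abs(x - cents[c]))


def accuracy(train, test):
    cents = centroids(train)
    return sum(predict(cents, x) == y for x, y in test) / len(test)


def poison(train, n_points, position, label=0):
    """Return a copy of train with n_points adversarial samples injected."""
    return train + [(position, label)] * n_points
```

With `train = [(0.0, 0), (1.0, 0), (2.0, 0), (4.0, 1), (5.0, 1), (6.0, 1)]` and test points 0.5, 1.5, 2.5 (class 0) and 3.5, 4.5, 5.5 (class 1), the clean accuracy is 1.0. Injecting two points at 10.0 labeled 0 drags the class-0 centroid toward class 1 and lowers accuracy to about 0.67. Three such points reduce it to about 0.33.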

Attacks against clustering algorithms

Clustering algorithms have been increasingly adopted in security applications to find dangerous or illicit activities. For instance, clustering of malware and computer viruses aims to identify and categorize different existing malware families, and to generate specific signatures for their detection by antivirus software or signature-based intrusion detection systems like Snort.

However, clustering algorithms were not originally devised to deal with deliberate attack attempts that are designed to subvert the clustering process itself. Whether clustering can be safely adopted in such settings remains questionable.[34]
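One frequently discussed way to subvert clustering is a "bridging" attack on single-linkage clustering. A handful of injected samples placed between two clusters chains them together, so two distinct malware families would be merged into one. The sketch below assumes single-linkage clustering cut at a fixed distance threshold, with hypothetical points, threshold, and bridge samples:

```python
import math


def single_linkage_clusters(points, threshold):
    """Single-linkage clustering cut at a fixed distance.

    Equivalent to the connected components of the graph that links every pair
    of points at distance <= threshold; computed here with union-find.
    Returns a list of clusters, each a list of point indices.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

Two well-separated families, for example three points near (0, 0) and three near (5, 0), form two clusters at threshold 1.0. Adding the four bridge points (1.5, 0), (2.5, 0), (3.5, 0) and (4.5, 0) collapses everything into a single cluster, even though the original samples have not changed.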

Secure learning in adversarial settings

A number of defense mechanisms against evasion, poisoning, and privacy attacks have been proposed in the field of adversarial machine learning, including:

  • The definition of secure learning algorithms[7][35][36]
  • The use of multiple classifier systems[6][37]
  • Having AIs write their own algorithms[20]
  • Having AIs explore the training environment; for example, in image recognition, actively navigating a 3D environment rather than passively scanning a fixed set of 2D images[20]
  • The study of privacy-preserving learning[23][38]
  • Ladder algorithm for Kaggle-style competitions
  • Game theoretic models for adversarial machine learning and data mining[39]
  • Sanitizing training data against adversarial poisoning attacks
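A simple heuristic in the spirit of the last item, sometimes called kNN-based label sanitization, drops training points whose label disagrees with the majority label of their nearest neighbours. Points with flipped or injected labels tend to sit among samples of the other class. The sketch below assumes small 2-D data and k = 3, both hypothetical. It is not a method taken from the references above:

```python
import math
from collections import Counter


def knn_label_sanitize(train, k=3):
    """Keep only the training points whose label matches the majority label of
    their k nearest neighbours (the point itself excluded).

    train: list of (features, label) pairs, features being tuples of floats.
    """
    kept = []
    for i, (x, y) in enumerate(train):
        # Distances from x to every other training point, nearest first.
        neighbours = sorted(
            ((math.dist(x, xj), yj) for j, (xj, yj) in enumerate(train) if j != i),
            key=lambda pair: pair[0],
        )[:k]
        majority, _ = Counter(label for _, label in neighbours).most_common(1)[0]
        if majority == y:
            kept.append((x, y))
    return kept
```

If a single point `((0.5, 0.5), 1)` is injected into a cluster of class-0 points at the corners of the unit square, its three nearest neighbours are all class 0 and it is removed. The genuine points keep their majority and survive. An odd k avoids ties in the two-class case.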


Some software libraries are available, mainly for testing purposes and research.

  • AdversariaLib - includes implementation of evasion attacks
  • AdLib - Python library with a scikit-style interface which includes implementations of a number of published evasion attacks and defenses
  • AlfaSVMLib - Adversarial Label Flip Attacks against Support Vector Machines[40]
  • Poisoning Attacks against Support Vector Machines, and Attacks against Clustering Algorithms
  • deep-pwning - "Metasploit for deep learning", which currently implements attacks on deep neural networks using TensorFlow.[41] The framework is updated to maintain compatibility with recent versions of Python.
  • CleverHans - A TensorFlow library to test existing deep learning models against known attacks
  • foolbox - Python library to create adversarial examples; implements multiple attacks
  • SecML - Python library for secure and explainable machine learning; includes implementations of a wide range of ML and attack algorithms, support for dense and sparse data, multiprocessing, and visualization tools

Past events

See also

References

  1. Bengio, Samy; Goodfellow, Ian J.; Kurakin, Alexey (2017). "Adversarial Machine Learning at Scale". Google AI. arXiv:1611.01236. Bibcode:2016arXiv161101236K. Retrieved 2018-12-13.
  2. Lim, Hazel Si Min; Taeihagh, Araz (2019). "Algorithmic Decision-Making in AVs: Understanding Ethical and Technical Concerns for Smart Cities". Sustainability. 11 (20): 5791. arXiv:1910.13122. Bibcode:2019arXiv191013122L. doi:10.3390/su11205791.
  3. Goodfellow, Ian; McDaniel, Patrick; Papernot, Nicolas (25 June 2018). "Making machine learning robust against adversarial inputs". Communications of the ACM. 61 (7): 56–66. doi:10.1145/3134599. ISSN 0001-0782. Retrieved 2018-12-13.
  4. Vincent, James (12 April 2017). "Magic AI: these are the optical illusions that trick, fool, and flummox computers". The Verge. Retrieved 27 March 2020.
  5. Biggio, Battista; Roli, Fabio (December 2018). "Wild patterns: Ten years after the rise of adversarial machine learning". Pattern Recognition. 84: 317–331. arXiv:1712.03141. doi:10.1016/j.patcog.2018.07.023.
  6. Biggio, Battista; Fumera, Giorgio; Roli, Fabio (2010). "Multiple classifier systems for robust classifier design in adversarial environments". International Journal of Machine Learning and Cybernetics. 1 (1–4): 27–41. doi:10.1007/s13042-010-0007-7. ISSN 1868-8071.
  7. Brückner, Michael; Kanzow, Christian; Scheffer, Tobias (2012). "Static Prediction Games for Adversarial Learning Problems" (PDF). Journal of Machine Learning Research. 13 (Sep): 2617–2654. ISSN 1533-7928.
  8. Rodrigues, Ricardo N.; Ling, Lee Luan; Govindaraju, Venu (1 June 2009). "Robustness of multimodal biometric fusion methods against spoof attacks" (PDF). Journal of Visual Languages & Computing. 20 (3): 169–179. doi:10.1016/j.jvlc.2009.01.010. ISSN 1045-926X.
  9. Su, Jiawei; Vargas, Danilo Vasconcellos; Sakurai, Kouichi (2019). "One Pixel Attack for Fooling Deep Neural Networks". IEEE Transactions on Evolutionary Computation. 23 (5): 828–841. arXiv:1710.08864. doi:10.1109/TEVC.2019.2890858.
  10. Su, Jiawei; Vargas, Danilo Vasconcellos; Sakurai, Kouichi (October 2019). "One Pixel Attack for Fooling Deep Neural Networks". IEEE Transactions on Evolutionary Computation. 23 (5): 828–841. arXiv:1710.08864. doi:10.1109/TEVC.2019.2890858. ISSN 1941-0026.
  11. "Single pixel change fools AI programs". BBC News. 3 November 2017. Retrieved 12 February 2018.
  12. Athalye, Anish; Engstrom, Logan; Ilyas, Andrew; Kwok, Kevin (2017). "Synthesizing Robust Adversarial Examples". arXiv:1707.07397.
  13. "AI Has a Hallucination Problem That's Proving Tough to Fix". WIRED. 2018. Retrieved 10 March 2018.
  14. Zhou, Z.; Firestone, C. (2019). "Humans can decipher adversarial images". Nature Communications. 10: 1334.
  15. Jain, Anant (2019-02-09). "Breaking neural networks with adversarial attacks - Towards Data Science". Medium. Retrieved 2019-07-15.
  16. Ackerman, Evan (2017-08-04). "Slight Street Sign Modifications Can Completely Fool Machine Learning Algorithms". IEEE Spectrum: Technology, Engineering, and Science News. Retrieved 2019-07-15.
  17. "A Tiny Piece of Tape Tricked Teslas Into Speeding Up 50 MPH". Wired. 2020. Retrieved 11 March 2020.
  18. "Model Hacking ADAS to Pave Safer Roads for Autonomous Vehicles". McAfee Blogs. 2020-02-19. Retrieved 2020-03-11.
  19. Seabrook, John (2020). "Dressing for the Surveillance Age". The New Yorker. Retrieved 5 April 2020.
  20. Heaven, Douglas (October 2019). "Why deep-learning AIs are so easy to fool". Nature: 163–166. doi:10.1038/d41586-019-03013-5.
  21. Hutson, Matthew (10 May 2019). "AI can now defend itself against malicious messages hidden in speech". Nature. doi:10.1038/d41586-019-01510-1.
  22. Biggio, B.; Fumera, G.; Roli, F. (2014). "Security evaluation of pattern classifiers under attack". IEEE Transactions on Knowledge and Data Engineering. 26 (4): 984–996.
  23. Biggio, Battista; Corona, Igino; Nelson, Blaine; Rubinstein, Benjamin I. P.; Maiorca, Davide; Fumera, Giorgio; Giacinto, Giorgio; Roli, Fabio (2014). "Security Evaluation of Support Vector Machines in Adversarial Environments". Support Vector Machines Applications. Springer International Publishing. pp. 105–153. arXiv:1401.7727. doi:10.1007/978-3-319-02300-7_4. ISBN 978-3-319-02300-7.
  24. Biggio, B.; Fumera, G.; Roli, F. (2014). "Pattern recognition systems under attack: Design issues and research challenges". International Journal of Pattern Recognition and Artificial Intelligence. 28 (7): 1460002.
  25. Biggio, Battista; Fumera, Giorgio; Roli, Fabio (2014). "Security Evaluation of Pattern Classifiers under Attack". IEEE Transactions on Knowledge and Data Engineering. 26 (4): 984–996. arXiv:1709.00609. doi:10.1109/TKDE.2013.57. ISSN 1041-4347.
  26. Huang, L.; Joseph, A. D.; Nelson, B.; Rubinstein, B.; Tygar, J. D. (October 2011). "Adversarial machine learning". 4th ACM Workshop on Artificial Intelligence and Security (AISec 2011). Chicago, IL, USA. pp. 43–57.
  27. Barth, A.; Rubinstein, B. I. P.; Sundararajan, M.; Mitchell, J. C.; Song, D.; Bartlett, P. L. (2012). "A learning-based approach to reactive security". IEEE Transactions on Dependable and Secure Computing. 9 (4): 482–493.
  28. Barreno, M.; Nelson, B.; Joseph, A.; Tygar, J. (2010). "The security of machine learning". Machine Learning. 81: 121–148.
  29. Sikos, Leslie F. (2019). AI in Cybersecurity. Intelligent Systems Reference Library. 151. Cham: Springer. p. 50. doi:10.1007/978-3-319-98842-9. ISBN 978-3-319-98841-2.
  30. Nelson, B.; Rubinstein, B. I.; Huang, L.; Joseph, A. D.; Lee, S. J.; Rao, S.; Tygar, J. D. (2012). "Query strategies for evading convex-inducing classifiers". Journal of Machine Learning Research. 13: 1293–1332.
  31. Biggio, B.; Nelson, B.; Laskov, P. (2011). "Support vector machines under adversarial label noise". Journal of Machine Learning Research - Proc. 3rd Asian Conf. Machine Learning. 20: 97–112.
  32. Kloft, M.; Laskov, P. (2012). "Security analysis of online centroid anomaly detection". Journal of Machine Learning Research. 13: 3647–3690.
  33. Moisejevs, Ilja (2019-07-15). "Poisoning attacks on Machine Learning - Towards Data Science". Medium. Retrieved 2019-07-15.
  34. Skillicorn, D. B. (2009). "Adversarial knowledge discovery". IEEE Intelligent Systems. 24: 54–61.
  35. Dekel, O.; Shamir, O.; Xiao, L. (2010). "Learning to classify with missing and corrupted features". Machine Learning. 81: 149–178.
  36. Liu, W.; Chawla, S. (2010). "Mining adversarial patterns via regularized loss minimization". Machine Learning. 81 (1): 69–83.
  37. Biggio, B.; Fumera, G.; Roli, F. (2009). "Evade hard multiple classifier systems". In Okun, O.; Valentini, G. (eds.). Supervised and Unsupervised Ensemble Methods and Their Applications. Studies in Computational Intelligence. 245. Springer Berlin / Heidelberg. pp. 15–38.
  38. Rubinstein, B. I. P.; Bartlett, P. L.; Huang, L.; Taft, N. (2012). "Learning in a large function space: Privacy-preserving mechanisms for SVM learning". Journal of Privacy and Confidentiality. 4 (1): 65–100.
  39. Kantarcioglu, M.; Xi, B.; Clifton, C. (January 2011). "Classifier Evaluation and Attribute Selection against Active Adversaries". Data Mining and Knowledge Discovery. 22: 291–335.
  40. Xiao, H.; Biggio, B.; Nelson, B.; Xiao, H.; Eckert, C.; Roli, F. "Support vector machines under adversarial label contamination". Neurocomputing, Special Issue on Advances in Learning with Label Noise. In press.
  41. "cchio/deep-pwning". GitHub. Retrieved 2016-08-08.
  42. Joseph, A. D.; Laskov, P.; Roli, F.; Tygar, J. D.; Nelson, B. (2013). "Machine Learning Methods for Computer Security (Dagstuhl Perspectives Workshop 12371)". Dagstuhl Manifestos. 3 (1): 1–30.