Adversarial machine learning


Adversarial machine learning is a machine learning technique that attempts to fool models by supplying deceptive input.[1][2][3] The most common goal is to cause a machine learning model to malfunction.

Most machine learning techniques were designed to work on specific problem sets in which the training and test data are generated from the same statistical distribution (the independent and identically distributed, or IID, assumption). When those models are applied to the real world, adversaries may supply data that violates that statistical assumption. This data may be arranged to exploit specific vulnerabilities and compromise the results.[3][4]


In Snow Crash (1992), Neal Stephenson offered scenarios of technology that was vulnerable to an adversarial attack. In William Gibson's Zero History (2010), a character dons a t-shirt decorated in a way that renders him invisible to electronic surveillance.[5]

In 2004, Nilesh Dalvi and others noted that linear classifiers used in spam filters could be defeated by simple "evasion attacks" as spammers inserted "good words" into their spam emails. (Around 2007, some spammers added random noise to fuzz words within "image spam" in order to defeat OCR-based filters.) In 2006, Marco Barreno and others published "Can Machine Learning Be Secure?", outlining a broad taxonomy of attacks. As late as 2013 many researchers continued to hope that non-linear classifiers (such as support vector machines and neural networks) might be robust to adversaries. In 2012, deep neural networks began to dominate computer vision problems; starting in 2014, Christian Szegedy and others demonstrated that deep neural networks could be fooled by adversaries.[6]

More recently, it was observed that adversarial attacks are harder to produce in the physical world, because environmental conditions can cancel out the effect of the added noise.[7][8] For example, a small rotation or a slight change in illumination can destroy the adversarial effect of an image.


Examples include attacks in spam filtering, where spam messages are obfuscated through the misspelling of “bad” words or the insertion of “good” words;[9][10] attacks in computer security, such as obfuscating malware code within network packets or misleading signature detection; and attacks in biometric recognition, where fake biometric traits may be exploited to impersonate a legitimate user[11] or to compromise users' template galleries that adapt to updated traits over time.

Researchers showed that by changing only one pixel it was possible to fool deep learning algorithms.[12][13] Others 3-D printed a toy turtle with a texture engineered to make Google's object detection AI classify it as a rifle regardless of the angle from which the turtle was viewed.[14] Creating the turtle required only low-cost commercially available 3-D printing technology.[15]

A machine-tweaked image of a dog was shown to look like a cat to both computers and humans.[16] A 2019 study reported that humans can guess how machines will classify adversarial images.[17] Researchers discovered methods for perturbing the appearance of a stop sign such that an autonomous vehicle classified it as a merge or speed limit sign.[3][18][19]

McAfee attacked Tesla's former Mobileye system, fooling it into driving 50 mph over the speed limit, simply by adding a two-inch strip of black tape to a speed limit sign.[20][21]

Adversarial patterns on glasses or clothing designed to deceive facial-recognition systems or license-plate readers have led to a niche industry of "stealth streetwear".[22]

An adversarial attack on a neural network can allow an attacker to inject algorithms into the target system.[23] Researchers can also create adversarial audio inputs to disguise commands to intelligent assistants in benign-seeming audio;[24] a parallel literature explores human perception of such stimuli.[25][26]

Clustering algorithms are used in security applications. Malware and computer virus analysis aims to identify malware families, and to generate specific detection signatures.[27][28]

Attack Modalities


Attacks against (supervised) machine learning algorithms have been categorized along three primary axes:[29] influence on the classifier, the security violation and their specificity.

  • Classifier influence: An attack can influence the classifier by disrupting the classification phase. This may be preceded by an exploration phase to identify vulnerabilities. The attacker's capabilities might be restricted by the presence of data manipulation constraints.[30]
  • Security violation: An attack can supply malicious data that gets classified as legitimate. Malicious data supplied during training can cause legitimate data to be rejected after training.
  • Specificity: A targeted attack attempts to allow a specific intrusion/disruption. Alternatively, an indiscriminate attack creates general mayhem.

This taxonomy has been extended into a more comprehensive threat model that allows explicit assumptions about the adversary's goal, knowledge of the attacked system, capability of manipulating the input data or system components, and attack strategy.[31][32] It has further been extended to include dimensions for defense strategies against adversarial attacks.[33] Some of the main attack scenarios are:



Evasion attacks[31][32][34] are the most prevalent type of attack. For instance, spammers and hackers often attempt to evade detection by obfuscating the content of spam emails and malware. Samples are modified to evade detection; that is, to be classified as legitimate. This does not involve influence over the training data. A clear example of evasion is image-based spam in which the spam content is embedded within an attached image to evade textual analysis by anti-spam filters. Another example of evasion is given by spoofing attacks against biometric verification systems.[11]
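The "good word" form of evasion described above can be illustrated with a minimal sketch. The word weights, the threshold and the function names below are hypothetical, and the sketch assumes the attacker already knows the filter's weights, which is a stronger assumption than most real attacks can make.

```python
# Hypothetical toy spam filter: a linear classifier over word counts.
# All weights and the threshold are invented for illustration.
WEIGHTS = {
    "viagra": 2.5, "winner": 1.5, "free": 1.0,        # "bad" words raise the score
    "meeting": -1.2, "report": -1.0, "thanks": -0.8,  # "good" words lower it
}
THRESHOLD = 2.0  # a score at or above this value is flagged as spam


def spam_score(message):
    """Sum the weights of all known words in the message."""
    return sum(WEIGHTS.get(word, 0.0) for word in message.lower().split())


def is_spam(message):
    return spam_score(message) >= THRESHOLD


def good_word_attack(message, max_words=10):
    """Append the most negatively weighted words until the filter is evaded."""
    good_words = sorted((w for w in WEIGHTS if WEIGHTS[w] < 0), key=WEIGHTS.get)
    evaded = message
    for i in range(max_words):
        if not is_spam(evaded):
            break
        evaded += " " + good_words[i % len(good_words)]
    return evaded


original = "free viagra winner"
modified = good_word_attack(original)
print(is_spam(original), original)
print(is_spam(modified), modified)
```

The training data is never touched: only the sample presented at test time is modified, which is what distinguishes evasion from poisoning.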


Poisoning is adversarial contamination of training data. Machine learning systems can be re-trained using data collected during operations. For instance, intrusion detection systems (IDSs) are often re-trained using such data. An attacker may poison this data by injecting malicious samples during operation that subsequently disrupt retraining.[31][32][29][35][36][37]
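As a rough illustration of poisoning during retraining, the following sketch uses a one-dimensional centroid-based anomaly detector. The detector, its radius and the data are assumptions made for this example and are not taken from any cited system. The attacker repeatedly injects points that lie just inside the acceptance boundary, so each retraining step drags the centroid toward the attack point.

```python
from statistics import mean
from typing import List, Tuple

RADIUS = 2.0  # hypothetical acceptance radius around the centroid


def is_anomalous(x: float, training: List[float]) -> bool:
    """Flag x if it lies farther than RADIUS from the mean of the training data."""
    return abs(x - mean(training)) > RADIUS


def poison(training: List[float], target: float,
           max_steps: int = 1000) -> Tuple[List[float], int]:
    """Inject accepted points until `target` is no longer flagged.

    Returns the poisoned training set and the number of injected points.
    """
    data = list(training)
    for step in range(max_steps):
        if not is_anomalous(target, data):
            return data, step
        centroid = mean(data)
        # Place the point just inside the boundary, on the side facing the target,
        # so the detector accepts it and includes it in the next retraining.
        direction = 1.0 if target > centroid else -1.0
        data.append(centroid + direction * 0.99 * RADIUS)
    return data, max_steps


clean = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 0.0]
target = 5.0
poisoned, injected = poison(clean, target)
print(is_anomalous(target, clean), is_anomalous(target, poisoned), injected)
```

Because each injected point has less influence as the training set grows, the number of points required increases with the amount of clean data, which is one reason bounding the share of attacker-controlled data is a common assumption in analyses of this kind.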

Model Stealing[edit]

Model stealing (also called model extraction) involves an adversary probing a black box machine learning system in order to either reconstruct the model or extract the data it was trained on.[38][39] This can cause issues when either the training data or the model itself is sensitive and confidential. For example, model stealing could be used to extract a proprietary stock trading model which the adversary could then use for their own financial benefit.
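In the simplest case, reconstruction of a black-box model can be sketched as follows. The example assumes a linear model whose prediction API returns real-valued scores; the secret parameters and function names are hypothetical. Under those assumptions, querying the origin and each unit vector recovers the parameters exactly, whereas real systems that return only labels or rounded scores generally require many more queries and yield only an approximation.

```python
def make_black_box():
    """Return a query function whose parameters are hidden from the caller."""
    secret_weights = [0.5, -1.25, 2.0]  # hypothetical proprietary model
    secret_bias = 0.75

    def query(x):
        return sum(w * xi for w, xi in zip(secret_weights, x)) + secret_bias

    return query


def extract_linear_model(query, n_features):
    """Recover the weights and bias of a linear model with n_features + 1 queries."""
    bias = query([0.0] * n_features)       # f(0) = b
    weights = []
    for i in range(n_features):
        unit = [0.0] * n_features
        unit[i] = 1.0
        weights.append(query(unit) - bias)  # f(e_i) - b = w_i
    return weights, bias


query = make_black_box()
stolen_weights, stolen_bias = extract_linear_model(query, n_features=3)


def surrogate(x):
    """The attacker's copy, usable without further access to the original."""
    return sum(w * xi for w, xi in zip(stolen_weights, x)) + stolen_bias


print(stolen_weights, stolen_bias)
```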

Specific Attack Types

There is a large variety of adversarial attacks that can be used against machine learning systems. Many of these work on both deep learning systems and traditional machine learning models such as SVMs[40] and linear regression.[41] A high-level sample of these attack types includes:

  • Adversarial Examples[42]
  • Trojan Attacks / Backdoor Attacks[43]
  • Model Inversion[44]
  • Membership Inference[45]

Adversarial Examples

An adversarial example is a specially crafted input that is designed to look "normal" to humans but causes a machine learning model to misclassify it. Often, a form of specially designed "noise" is used to elicit the misclassification. Below are some current techniques for generating adversarial examples in the literature (by no means an exhaustive list).

  • Fast Gradient Sign Method (FGSM)[46]
  • Projected Gradient Descent (PGD)[47]
  • Carlini and Wagner (C&W) attack[48]
  • Adversarial patch attack[49]
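Of the techniques listed above, the Fast Gradient Sign Method is the simplest to sketch: it moves every input feature by a fixed step epsilon in the direction that increases the model's loss. The example below applies it to a small logistic regression model rather than a deep network; the weights, the input and the value of epsilon are hypothetical and chosen only so the effect is visible.

```python
import math

# Hypothetical trained logistic regression model (illustrative values only).
WEIGHTS = [2.0, -3.0, 1.0]
BIAS = 0.5


def predict_proba(x):
    """Probability that x belongs to class 1."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def loss_gradient_wrt_input(x, y):
    """Gradient of the cross-entropy loss with respect to the input.

    For logistic regression this is (p - y) * w.
    """
    p = predict_proba(x)
    return [(p - y) * w for w in WEIGHTS]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(x, y, epsilon):
    """x_adv = x + epsilon * sign(gradient of the loss with respect to x)."""
    grad = loss_gradient_wrt_input(x, y)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


x = [0.4, 0.1, 0.2]  # correctly classified as class 1
y = 1
epsilon = 0.3
x_adv = fgsm(x, y, epsilon)
print(predict_proba(x), predict_proba(x_adv))
```

No feature changes by more than epsilon, yet the predicted class flips. Projected Gradient Descent can be viewed as repeating a smaller step of this kind several times while projecting the result back into the allowed perturbation range.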


Conceptual representation of the proactive arms race[32][28]

Researchers have proposed a multi-step approach to protecting machine learning.[6]

  • Threat modeling - Formalize the attacker's goals and capabilities with respect to the target system.
  • Attack simulation - Formalize the optimization problem the attacker tries to solve according to possible attack strategies.
  • Attack impact evaluation
  • Countermeasure design
  • Noise detection (For evasion based attack)[50]
  • Information laundering - Alter the information received by adversaries (for model stealing attacks)[39]


A number of defense mechanisms against evasion, poisoning, and privacy attacks have been proposed, including:

  • Secure learning algorithms[10][51][52]
  • Multiple classifier systems[9][53]
  • AI-written algorithms[23]
  • AIs that explore the training environment; for example, in image recognition, actively navigating a 3D environment rather than passively scanning a fixed set of 2D images.[23]
  • Privacy-preserving learning[32][54]
  • Ladder algorithm for Kaggle-style competitions
  • Game theoretic models[55][56][57]
  • Sanitizing training data
  • Adversarial training[46]
  • Backdoor detection algorithms[58]
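As one illustration of the list above, sanitizing training data can be sketched with a simple robust outlier filter. This is a minimal example under stated assumptions, not a method taken from the cited works: the data are invented, the poisoned points are assumed to be a minority that lies far from the clean data, and the cut-off k is arbitrary. An attacker who keeps poisoned points close to the legitimate data may evade a filter of this kind.

```python
from statistics import mean, median


def sanitize(samples, k=3.0):
    """Drop samples farther than k median absolute deviations from the median."""
    center = median(samples)
    mad = median([abs(x - center) for x in samples])
    return [x for x in samples if abs(x - center) <= k * mad]


clean = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 0.0]
poison = [4.0, 4.2, 4.5]  # hypothetical injected samples
training = clean + poison

sanitized = sanitize(training)
print(mean(training), mean(sanitized))
```

The median and the median absolute deviation are used instead of the mean and standard deviation because they are far less sensitive to the injected points themselves.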


Available software libraries, mainly for testing and research.

  • AdversariaLib - includes implementation of evasion attacks
  • AdLib - Python library with a scikit-style interface which includes implementations of a number of published evasion attacks and defenses
  • AlfaSVMLib - Adversarial Label Flip Attacks against Support Vector Machines[59]
  • Poisoning Attacks against Support Vector Machines, and Attacks against Clustering Algorithms
  • deep-pwning - Metasploit for deep learning, which currently has attacks on deep neural networks using TensorFlow.[60] The framework is currently being updated to maintain compatibility with the latest versions of Python.
  • Cleverhans - A TensorFlow library to test existing deep learning models against known attacks
  • foolbox - Python Library to create adversarial examples, implements multiple attacks
  • SecML - Python Library for secure and explainable machine learning - includes implementation of a wide range of ML and attack algorithms, support for dense and sparse data, multiprocessing, visualization tools.
  • TrojAI - Python Library for generating backdoored and trojaned models at scale for research into trojan detection
  • Adversarial Robustness Toolbox (IBM ART) - Python Library for Machine Learning Security
  • Advertorch - Python toolbox for adversarial robustness research whose main functions are implemented in PyTorch

See also


  1. ^ Kianpour, Mazaher; Wen, Shao-Fang (2020). "Timing Attacks on Machine Learning: State of the Art". Intelligent Systems and Applications. Advances in Intelligent Systems and Computing. 1037. pp. 111–125. doi:10.1007/978-3-030-29516-5_10. ISBN 978-3-030-29515-8.
  2. ^ Bengio, Samy; Goodfellow, Ian J.; Kurakin, Alexey (2017). "Adversarial Machine Learning at Scale". arXiv:1611.01236 [cs.CV].
  3. ^ a b c Lim, Hazel Si Min; Taeihagh, Araz (2019). "Algorithmic Decision-Making in AVs: Understanding Ethical and Technical Concerns for Smart Cities". Sustainability. 11 (20): 5791. arXiv:1910.13122. Bibcode:2019arXiv191013122L. doi:10.3390/su11205791. S2CID 204951009.
  4. ^ Goodfellow, Ian; McDaniel, Patrick; Papernot, Nicolas (25 June 2018). "Making machine learning robust against adversarial inputs". Communications of the ACM. 61 (7): 56–66. doi:10.1145/3134599. ISSN 0001-0782. Retrieved 2018-12-13.
  5. ^ Vincent, James (12 April 2017). "Magic AI: these are the optical illusions that trick, fool, and flummox computers". The Verge. Retrieved 27 March 2020.
  6. ^ a b Biggio, Battista; Roli, Fabio (December 2018). "Wild patterns: Ten years after the rise of adversarial machine learning". Pattern Recognition. 84: 317–331. arXiv:1712.03141. doi:10.1016/j.patcog.2018.07.023. S2CID 207324435.
  7. ^ Kurakin, Alexey; Goodfellow, Ian; Bengio, Samy (2016). "Adversarial examples in the physical world". arXiv:1607.02533 [cs.CV].
  8. ^ Gupta, Kishor Datta, Dipankar Dasgupta, and Zahid Akhtar. "Applicability issues of Evasion-Based Adversarial Attacks and Mitigation Techniques." 2020 IEEE Symposium Series on Computational Intelligence (SSCI). 2020.
  9. ^ a b Biggio, Battista; Fumera, Giorgio; Roli, Fabio (2010). "Multiple classifier systems for robust classifier design in adversarial environments". International Journal of Machine Learning and Cybernetics. 1 (1–4): 27–41. doi:10.1007/s13042-010-0007-7. ISSN 1868-8071. S2CID 8729381.
  10. ^ a b Brückner, Michael; Kanzow, Christian; Scheffer, Tobias (2012). "Static Prediction Games for Adversarial Learning Problems" (PDF). Journal of Machine Learning Research. 13 (Sep): 2617–2654. ISSN 1533-7928.
  11. ^ a b Rodrigues, Ricardo N.; Ling, Lee Luan; Govindaraju, Venu (1 June 2009). "Robustness of multimodal biometric fusion methods against spoof attacks" (PDF). Journal of Visual Languages & Computing. 20 (3): 169–179. doi:10.1016/j.jvlc.2009.01.010. ISSN 1045-926X.
  12. ^ Su, Jiawei; Vargas, Danilo Vasconcellos; Sakurai, Kouichi (2019). "One Pixel Attack for Fooling Deep Neural Networks". IEEE Transactions on Evolutionary Computation. 23 (5): 828–841. arXiv:1710.08864. doi:10.1109/TEVC.2019.2890858. S2CID 2698863.
  13. ^ Su, Jiawei; Vargas, Danilo Vasconcellos; Sakurai, Kouichi (October 2019). "One Pixel Attack for Fooling Deep Neural Networks". IEEE Transactions on Evolutionary Computation. 23 (5): 828–841. arXiv:1710.08864. doi:10.1109/TEVC.2019.2890858. ISSN 1941-0026. S2CID 2698863.
  14. ^ "Single pixel change fools AI programs". BBC News. 3 November 2017. Retrieved 12 February 2018.
  15. ^ Athalye, Anish; Engstrom, Logan; Ilyas, Andrew; Kwok, Kevin (2017). "Synthesizing Robust Adversarial Examples". arXiv:1707.07397 [cs.CV].
  16. ^ "AI Has a Hallucination Problem That's Proving Tough to Fix". WIRED. 2018. Retrieved 10 March 2018.
  17. ^ Zhou, Zhenglong; Firestone, Chaz (2019). "Humans can decipher adversarial images". Nature Communications. 10 (1): 1334. arXiv:1809.04120. Bibcode:2019NatCo..10.1334Z. doi:10.1038/s41467-019-08931-6. PMC 6430776. PMID 30902973.
  18. ^ Jain, Anant (2019-02-09). "Breaking neural networks with adversarial attacks - Towards Data Science". Medium. Retrieved 2019-07-15.
  19. ^ Ackerman, Evan (2017-08-04). "Slight Street Sign Modifications Can Completely Fool Machine Learning Algorithms". IEEE Spectrum: Technology, Engineering, and Science News. Retrieved 2019-07-15.
  20. ^ "A Tiny Piece of Tape Tricked Teslas Into Speeding Up 50 MPH". Wired. 2020. Retrieved 11 March 2020.
  21. ^ "Model Hacking ADAS to Pave Safer Roads for Autonomous Vehicles". McAfee Blogs. 2020-02-19. Retrieved 2020-03-11.
  22. ^ Seabrook, John (2020). "Dressing for the Surveillance Age". The New Yorker. Retrieved 5 April 2020.
  23. ^ a b c Heaven, Douglas (October 2019). "Why deep-learning AIs are so easy to fool". Nature. 574 (7777): 163–166. Bibcode:2019Natur.574..163H. doi:10.1038/d41586-019-03013-5. PMID 31597977.
  24. ^ Hutson, Matthew (10 May 2019). "AI can now defend itself against malicious messages hidden in speech". Nature. doi:10.1038/d41586-019-01510-1. PMID 32385365.
  25. ^ Lepori, Michael A; Firestone, Chaz (2020-03-27). "Can you hear me now? Sensitive comparisons of human and machine perception". arXiv:2003.12362 [eess.AS].
  26. ^ Vadillo, Jon; Santana, Roberto (2020-01-23). "On the human evaluation of audio adversarial examples". arXiv:2001.08444 [eess.AS].
  27. ^ D. B. Skillicorn. "Adversarial knowledge discovery". IEEE Intelligent Systems, 24:54–61, 2009.
  28. ^ a b B. Biggio, G. Fumera, and F. Roli. "Pattern recognition systems under attack: Design issues and research challenges". Int'l J. Patt. Recogn. Artif. Intell., 28(7):1460002, 2014.
  29. ^ a b Barreno, Marco; Nelson, Blaine; Joseph, Anthony D.; Tygar, J. D. (2010). "The security of machine learning" (PDF). Machine Learning. 81 (2): 121–148. doi:10.1007/s10994-010-5188-5. S2CID 2304759.
  30. ^ Sikos, Leslie F. (2019). AI in Cybersecurity. Intelligent Systems Reference Library. 151. Cham: Springer. p. 50. doi:10.1007/978-3-319-98842-9. ISBN 978-3-319-98841-2.
  31. ^ a b c B. Biggio, G. Fumera, and F. Roli. "Security evaluation of pattern classifiers under attack". IEEE Transactions on Knowledge and Data Engineering, 26(4):984–996, 2014.
  32. ^ a b c d e Biggio, Battista; Corona, Igino; Nelson, Blaine; Rubinstein, Benjamin I. P.; Maiorca, Davide; Fumera, Giorgio; Giacinto, Giorgio; Roli, Fabio (2014). "Security Evaluation of Support Vector Machines in Adversarial Environments". Support Vector Machines Applications. Springer International Publishing. pp. 105–153. arXiv:1401.7727. doi:10.1007/978-3-319-02300-7_4. ISBN 978-3-319-02300-7. S2CID 18666561.
  33. ^ Heinrich, Kai; Graf, Johannes; Chen, Ji; Laurisch, Jakob; Zschech, Patrick (2020-06-15). "FOOL ME ONCE, SHAME ON YOU, FOOL ME TWICE, SHAME ON ME: A TAXONOMY OF ATTACK AND DEFENSE PATTERNS FOR AI SECURITY". ECIS 2020 Research Papers.
  34. ^ B. Nelson, B. I. Rubinstein, L. Huang, A. D. Joseph, S. J. Lee, S. Rao, and J. D. Tygar. "Query strategies for evading convex-inducing classifiers". J. Mach. Learn. Res., 13:1293–1332, 2012.
  35. ^ B. Biggio, B. Nelson, and P. Laskov. "Support vector machines under adversarial label noise". In Journal of Machine Learning Research - Proc. 3rd Asian Conf. Machine Learning, volume 20, pp. 97–112, 2011.
  36. ^ M. Kloft and P. Laskov. "Security analysis of online centroid anomaly detection". Journal of Machine Learning Research, 13:3647–3690, 2012.
  37. ^ Moisejevs, Ilja (2019-07-15). "Poisoning attacks on Machine Learning - Towards Data Science". Medium. Retrieved 2019-07-15.
  38. ^ "How to steal modern NLP systems with gibberish?". cleverhans-blog. 2020-04-06. Retrieved 2020-10-15.
  39. ^ a b Wang, Xinran; Xiang, Yu; Gao, Jun; Ding, Jie (2020-09-13). "Information Laundering for Model Privacy". arXiv:2009.06112 [cs.CR].
  40. ^ Biggio, Battista; Nelson, Blaine; Laskov, Pavel (2013-03-25). "Poisoning Attacks against Support Vector Machines". arXiv:1206.6389 [cs.LG].
  41. ^ Jagielski, Matthew; Oprea, Alina; Biggio, Battista; Liu, Chang; Nita-Rotaru, Cristina; Li, Bo (May 2018). "Manipulating Machine Learning: Poisoning Attacks and Countermeasures for Regression Learning". 2018 IEEE Symposium on Security and Privacy (SP). IEEE: 19–35. arXiv:1804.00308. doi:10.1109/sp.2018.00057. ISBN 978-1-5386-4353-2. S2CID 4551073.
  42. ^ "Attacking Machine Learning with Adversarial Examples". OpenAI. 2017-02-24. Retrieved 2020-10-15.
  43. ^ Gu, Tianyu; Dolan-Gavitt, Brendan; Garg, Siddharth (2019-03-11). "BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain". arXiv:1708.06733 [cs.CR].
  44. ^ Veale, Michael; Binns, Reuben; Edwards, Lilian (2018-11-28). "Algorithms that remember: model inversion attacks and data protection law". Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences. 376 (2133). arXiv:1807.04644. Bibcode:2018RSPTA.37680083V. doi:10.1098/rsta.2018.0083. ISSN 1364-503X. PMC 6191664. PMID 30322998.
  45. ^ Shokri, Reza; Stronati, Marco; Song, Congzheng; Shmatikov, Vitaly (2017-03-31). "Membership Inference Attacks against Machine Learning Models". arXiv:1610.05820 [cs.CR].
  46. ^ a b Goodfellow, Ian J.; Shlens, Jonathon; Szegedy, Christian (2015-03-20). "Explaining and Harnessing Adversarial Examples". arXiv:1412.6572 [stat.ML].
  47. ^ Madry, Aleksander; Makelov, Aleksandar; Schmidt, Ludwig; Tsipras, Dimitris; Vladu, Adrian (2019-09-04). "Towards Deep Learning Models Resistant to Adversarial Attacks". arXiv:1706.06083 [stat.ML].
  48. ^ Carlini, Nicholas; Wagner, David (2017-03-22). "Towards Evaluating the Robustness of Neural Networks". arXiv:1608.04644 [cs.CR].
  49. ^ Brown, Tom B.; Mané, Dandelion; Roy, Aurko; Abadi, Martín; Gilmer, Justin (2018-05-16). "Adversarial Patch". arXiv:1712.09665 [cs.CV].
  50. ^ Kishor Datta Gupta; Akhtar, Zahid; Dasgupta, Dipankar (2020). "Determining Sequence of Image Processing Technique (IPT) to Detect Adversarial Attacks". arXiv:2007.00337 [cs.CV].
  51. ^ O. Dekel, O. Shamir, and L. Xiao. "Learning to classify with missing and corrupted features". Machine Learning, 81:149–178, 2010.
  52. ^ Liu, Wei; Chawla, Sanjay (2010). "Mining adversarial patterns via regularized loss minimization" (PDF). Machine Learning. 81: 69–83. doi:10.1007/s10994-010-5199-2. S2CID 17497168.
  53. ^ B. Biggio, G. Fumera, and F. Roli. "Evade hard multiple classifier systems". In O. Okun and G. Valentini, editors, Supervised and Unsupervised Ensemble Methods and Their Applications, volume 245 of Studies in Computational Intelligence, pages 15–38. Springer Berlin / Heidelberg, 2009.
  54. ^ B. I. P. Rubinstein, P. L. Bartlett, L. Huang, and N. Taft. "Learning in a large function space: Privacy-preserving mechanisms for SVM learning". Journal of Privacy and Confidentiality, 4(1):65–100, 2012.
  55. ^ M. Kantarcioglu, B. Xi, C. Clifton. "Classifier Evaluation and Attribute Selection against Active Adversaries". Data Min. Knowl. Discov., 22:291–335, January 2011.
  56. ^ Chivukula, Aneesh; Yang, Xinghao; Liu, Wei; Zhu, Tianqing; Zhou, Wanlei (2020). "Game Theoretical Adversarial Deep Learning with Variational Adversaries". IEEE Transactions on Knowledge and Data Engineering: 1. doi:10.1109/TKDE.2020.2972320. ISSN 1558-2191.
  57. ^ Chivukula, Aneesh Sreevallabh; Liu, Wei (2019). "Adversarial Deep Learning Models with Multiple Adversaries". IEEE Transactions on Knowledge and Data Engineering. 31 (6): 1066–1079. doi:10.1109/TKDE.2018.2851247. ISSN 1558-2191. S2CID 67024195.
  58. ^ "TrojAI". Retrieved 2020-10-14.
  59. ^ H. Xiao, B. Biggio, B. Nelson, H. Xiao, C. Eckert, and F. Roli. "Support vector machines under adversarial label contamination". Neurocomputing, Special Issue on Advances in Learning with Label Noise, In Press.
  60. ^ "cchio/deep-pwning". GitHub. Retrieved 2016-08-08.

External links