Adversarial machine learning

Adversarial machine learning is a research field that lies at the intersection of machine learning and computer security. It aims to enable the safe adoption of machine learning techniques in adversarial settings such as spam filtering, malware detection, and biometric recognition.


The problem arises because machine learning techniques were originally designed for stationary environments, in which the training and test data are assumed to be drawn from the same (possibly unknown) distribution. In the presence of intelligent and adaptive adversaries, however, this working assumption is likely to be violated to a degree that depends on the adversary. A malicious adversary can carefully manipulate the input data, exploiting specific vulnerabilities of the learning algorithm, to compromise the security of the whole system.[citation needed]


Examples include attacks on spam filtering, where spam messages are obfuscated by misspelling "bad" words or inserting "good" words;[1][2][3][4][5][6][7][8][9][10][11][12] attacks in computer security, for example to obfuscate malware code within network packets[13][14][15][16][17][18] or to mislead signature detection;[19] and attacks on biometric recognition, where fake biometric traits may be used to impersonate a legitimate user (biometric spoofing)[20] or to compromise users' template galleries that are adaptively updated over time.[21][22]

In 2017, MIT researchers 3-D printed a toy turtle with a texture engineered to make Google's object detection AI classify it as a rifle, regardless of the angle from which the turtle was viewed.[23][24][25] Creating the turtle required only low-cost, commercially available 3-D printing technology.[26] In 2018, Google Brain published a machine-tweaked image of a dog that looked like a cat both to computers and to humans.[27][28]

Security evaluation

Figure 1. Conceptual representation of the reactive arms race.[29][30][31]

To understand the security properties of learning algorithms in adversarial settings, one should address the following main issues:[29][30][31][32]

  • identifying potential vulnerabilities of machine learning algorithms during learning and classification;
  • devising appropriate attacks that correspond to the identified threats and evaluating their impact on the targeted system;
  • proposing countermeasures to improve the security of machine learning algorithms against the considered attacks.

This process amounts to simulating a proactive arms race rather than a reactive one (the two are depicted in Figures 1 and 2): system designers try to anticipate the adversary in order to find potential vulnerabilities that should be fixed in advance, for instance by means of specific countermeasures such as additional features or different learning algorithms. However, proactive approaches are not necessarily superior to reactive ones. For instance, Barth et al.[33] showed that under some circumstances reactive approaches are more suitable for improving system security.
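
The evaluation steps above can be illustrated with a toy empirical security evaluation that measures how a detector's detection rate degrades as a simulated attacker is given a larger budget. This is a minimal sketch, assuming a hypothetical linear spam scorer over word features; the weights, threshold, sample messages, and function names are invented for illustration and are not taken from the cited works.

```python
# Hypothetical linear detector: a message is flagged when its score >= THRESHOLD.
WEIGHTS = {
    "viagra": 2.0, "free": 1.0, "winner": 1.5,       # words typical of spam
    "meeting": -1.0, "report": -1.5, "thanks": -0.5,  # words typical of legitimate mail
}
THRESHOLD = 1.0

def score(words):
    """Sum the weights of the distinct known words in a message."""
    return sum(WEIGHTS.get(w, 0.0) for w in set(words))

def is_flagged(words):
    return score(words) >= THRESHOLD

def attack(words, budget):
    """Simulated attacker: insert up to `budget` of the most negative-weight words."""
    good = sorted((w for w in WEIGHTS if WEIGHTS[w] < 0 and w not in words),
                  key=WEIGHTS.get)
    return list(words) + good[:budget]

def detection_rate(samples, budget):
    """Fraction of malicious samples still flagged after the simulated attack."""
    return sum(is_flagged(attack(s, budget)) for s in samples) / len(samples)

def security_curve(samples, max_budget):
    """Detection rate as a function of attack strength (0..max_budget insertions)."""
    return [detection_rate(samples, b) for b in range(max_budget + 1)]

SPAM = [["viagra", "free"], ["winner", "free"],
        ["free", "winner", "viagra"], ["viagra"]]

print(security_curve(SPAM, 3))  # [1.0, 0.75, 0.25, 0.25]
```

A curve that drops sharply at small budgets suggests a vulnerability worth addressing with a countermeasure before deployment, which is the proactive step described above.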

Figure 2. Conceptual representation of the proactive arms race.[29][30][31]

Attacks against machine learning algorithms (supervised)

The first step of the above-sketched arms race is identifying potential attacks against machine learning algorithms. A substantial amount of work has been done in this direction.[29][30][31][32][34][35][36][37][38][39]

A taxonomy of potential attacks against machine learning

Attacks against (supervised) machine learning algorithms have been categorized along three primary axes:[32][34][35] their influence on the classifier, the security violation they cause, and their specificity.

  • Attack influence. An attack is causative if it aims to introduce vulnerabilities (to be exploited during the classification phase) by manipulating training data, or exploratory if it aims to find and subsequently exploit vulnerabilities during the classification phase.
  • Security violation. An attack is an integrity violation if it aims to get malicious samples misclassified as legitimate, or an availability violation if the goal is to increase the misclassification rate of legitimate samples, making the classifier unusable (e.g., a denial of service).
  • Attack specificity. An attack is targeted if specific samples are considered (e.g., the adversary aims to allow a specific intrusion or to get a given spam email past the filter), or indiscriminate if any sample from a broad class will serve.
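
The three axes above can be written down as a small data structure, which makes it easy to label a given attack or to enumerate all combinations. This is a minimal sketch; the class and field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum

class Influence(Enum):
    CAUSATIVE = "causative"        # manipulates the training data
    EXPLORATORY = "exploratory"    # probes and exploits the trained classifier

class Violation(Enum):
    INTEGRITY = "integrity"        # malicious samples pass as legitimate
    AVAILABILITY = "availability"  # legitimate samples are misclassified

class Specificity(Enum):
    TARGETED = "targeted"              # specific samples are considered
    INDISCRIMINATE = "indiscriminate"  # no particular sample is singled out

@dataclass(frozen=True)
class AttackProfile:
    influence: Influence
    violation: Violation
    specificity: Specificity

    def describe(self):
        return f"{self.influence.value}/{self.violation.value}/{self.specificity.value}"

# A spammer obfuscating one particular email at classification time.
spam_evasion = AttackProfile(Influence.EXPLORATORY, Violation.INTEGRITY,
                             Specificity.TARGETED)

# Every combination of the three axes.
all_profiles = [AttackProfile(i, v, s)
                for i in Influence for v in Violation for s in Specificity]

print(spam_evasion.describe())  # exploratory/integrity/targeted
print(len(all_profiles))        # 8
```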

This taxonomy has been extended into a more comprehensive threat model that makes explicit the assumptions about the adversary's goal, knowledge of the attacked system, capability of manipulating the input data and/or the system components, and the corresponding (potentially formally defined) attack strategy. Details can be found in the cited works.[29][30] Two of the main attack scenarios identified according to this threat model are sketched below.

Evasion attacks

Evasion attacks[13][14][29][30][36][37][39][40][41] are the most prevalent type of attack that may be encountered in adversarial settings during system operation. For instance, spammers and hackers often attempt to evade detection by obfuscating the content of spam emails and malware code. In the evasion setting, malicious samples are modified at test time to evade detection, that is, to be misclassified as legitimate. No influence over the training data is assumed. A clear example of evasion is image-based spam, in which the spam content is embedded within an attached image to evade the textual analysis performed by anti-spam filters. Another example is the spoofing attack against biometric verification systems.[20][21]
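
The evasion setting can be sketched for the simplest case of a linear detector. The example below is a toy illustration, not the method of any cited work. It assumes a hypothetical detector g(x) = w·x + b that flags a sample when g(x) >= 0, an attacker who knows w and b exactly, and a limit max_dist on how far (in Euclidean distance) the sample may be moved at test time. The training data is never touched.

```python
import math

W = [2.0, -1.0, 0.5]   # hypothetical detector weights
B = -1.0               # hypothetical bias

def g(x):
    """Detector score; a sample is flagged as malicious when g(x) >= 0."""
    return sum(wi * xi for wi, xi in zip(W, x)) + B

def evade(x, max_dist, step=0.05):
    """Move x against the gradient of g (the gradient of a linear g is W)
    until it is classified as legitimate or the distance budget runs out.
    Returns the modified sample and the distance actually travelled."""
    norm = math.sqrt(sum(wi * wi for wi in W))
    direction = [-wi / norm for wi in W]   # unit vector of steepest descent
    x = list(x)
    moved = 0.0
    while g(x) >= 0 and moved + step <= max_dist + 1e-12:
        x = [xi + step * di for xi, di in zip(x, direction)]
        moved += step
    return x, moved

malicious = [1.0, 0.5, 1.0]            # g(malicious) = 1.0, so it is flagged

adv, dist = evade(malicious, max_dist=1.0)
print(g(adv) < 0, dist <= 1.0)         # True True: evasion succeeds

weak, _ = evade(malicious, max_dist=0.2)
print(g(weak) >= 0)                    # True: budget too small, still flagged
```

The distance budget stands in for the constraint that the modified sample must keep its malicious functionality, so a tighter budget models a less capable attacker.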

Poisoning attacks

Machine learning algorithms are often re-trained on data collected during operation to adapt to changes in the underlying data distribution. For instance, intrusion detection systems (IDSs) are often re-trained on a set of samples collected during network operation. Within this scenario, an attacker may poison the training data by injecting carefully designed samples to eventually compromise the whole learning process. Poisoning may thus be regarded as an adversarial contamination of the training data. Examples of poisoning attacks against machine learning algorithms (including learning in the presence of worst-case adversarial label flips in the training data) can be found in the literature.[13][15][16][17][19][21][29][30][32][34][35][38][42][43][44][45][46][47][48][49]
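
The re-training scenario described above can be illustrated with a toy centroid-based anomaly detector that keeps learning from the samples it accepts. This is a minimal sketch under invented assumptions (the class name, radius, and data are hypothetical): the attacker repeatedly injects points that lie just inside the acceptance boundary, on the side facing a target point, so that each re-training step drags the centroid toward the target.

```python
import math

def centroid(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

class CentroidDetector:
    """Flags a point as anomalous when it lies farther than `radius` from the
    mean of the training data, and re-trains on every point it accepts."""

    def __init__(self, training, radius):
        self.data = [list(p) for p in training]
        self.radius = radius
        self.center = centroid(self.data)

    def is_anomalous(self, x):
        return dist(x, self.center) > self.radius

    def observe(self, x):
        """Re-training step: accepted points join the training set."""
        if self.is_anomalous(x):
            return False
        self.data.append(list(x))
        self.center = centroid(self.data)
        return True

def poison(detector, target, max_points):
    """Inject up to `max_points` points just inside the boundary, on the line
    from the current centroid to `target`. Returns the number injected."""
    used = 0
    while detector.is_anomalous(target) and used < max_points:
        c = detector.center
        scale = 0.99 * detector.radius / dist(target, c)
        detector.observe([ci + scale * (ti - ci) for ci, ti in zip(c, target)])
        used += 1
    return used

clean = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target = [3.0, 0.5]                      # initially rejected as anomalous

det = CentroidDetector(clean, radius=1.0)
print(det.is_anomalous(target))          # True
used = poison(det, target, max_points=50)
print(det.is_anomalous(target))          # False: the target is now accepted
```

Because each injected point is averaged with all earlier data, the centroid moves less with every step, so the number of poisoning points needed grows with the amount of clean data already collected.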

Attacks against clustering algorithms

Clustering algorithms have been increasingly adopted in security applications to find dangerous or illicit activities. For instance, clustering of malware and computer viruses aims to identify and categorize different existing malware families, and to generate specific signatures for their detection by antivirus software or by signature-based intrusion detection systems such as Snort. However, clustering algorithms were not originally devised to deal with deliberate attacks designed to subvert the clustering process itself. Whether clustering can be safely adopted in such settings thus remains questionable. Preliminary work reporting some vulnerabilities of clustering can be found in the literature.[50][51][52][53][54]
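
One way a clustering process can be subverted is by "bridging": the attacker adds a few samples that chain two distinct clusters together so that they are merged into one. The sketch below is a toy illustration on one-dimensional data, assuming single-linkage clustering cut at a fixed distance threshold; the data, threshold, and function names are hypothetical and not taken from the cited works.

```python
import math

def single_link_clusters(points, threshold):
    """Single-linkage clustering of 1-D points cut at `threshold`: two points
    share a cluster if a chain of points, each at most `threshold` from the
    next, connects them. In 1-D this reduces to splitting the sorted points
    wherever the gap between neighbours exceeds the threshold."""
    ordered = sorted(points)
    clusters = [[ordered[0]]]
    for p in ordered[1:]:
        if p - clusters[-1][-1] <= threshold:
            clusters[-1].append(p)
        else:
            clusters.append([p])
    return clusters

def bridge(a, b, threshold):
    """Fewest evenly spaced points that chain the gap between a and b (a < b)."""
    n = max(math.ceil((b - a) / threshold) - 1, 0)
    gap = (b - a) / (n + 1)
    return [a + k * gap for k in range(1, n + 1)]

family_a = [0.0, 0.5, 1.0]   # e.g. feature values of one malware family
family_b = [5.0, 5.5, 6.0]   # a second, well-separated family

before = single_link_clusters(family_a + family_b, threshold=1.0)
injected = bridge(max(family_a), min(family_b), threshold=1.0)
after = single_link_clusters(family_a + family_b + injected, threshold=1.0)

print(len(before), len(injected), len(after))  # 2 3 1
```

Here three injected samples are enough to merge two families of three samples each, which would prevent family-specific signatures from being generated.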

Secure learning in adversarial settings

A number of defense mechanisms against evasion, poisoning and privacy attacks have been proposed in the field of adversarial machine learning, including:

  1. The definition of secure learning algorithms;[6][7][8][9][55][56][57][58]
  2. The use of multiple classifier systems;[3][4][5][10][59][60][61]
  3. The use of randomization or disinformation to mislead the attacker as they try to acquire knowledge of the system;[4][14][32][34][35][62]
  4. The study of privacy-preserving learning;[30][63]
  5. The Ladder algorithm for Kaggle-style competitions;[64]
  6. Game-theoretic models for adversarial machine learning and data mining;[65][66][67][68]
  7. The sanitization of training data against adversarial poisoning attacks.[15][16][17]
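
Of the mechanisms listed above, the Ladder[64] is simple enough to sketch in a few lines. The version below follows the commonly described form of the mechanism with a fixed step size: the leaderboard reports a new score only when a submission improves on the best reported loss by more than the step, and rounds the reported value to a multiple of the step. This limits how much a participant can learn about the holdout set by submitting many small variations. The class and method names are hypothetical, and the step size is an arbitrary choice for illustration.

```python
import math

class Ladder:
    """Leaderboard that releases a new score only on a significant improvement."""

    def __init__(self, step):
        self.step = step
        self.best = math.inf   # best loss reported so far; nothing reported yet

    def submit(self, holdout_loss):
        """Take a submission's true loss on the holdout set and return the
        loss that the leaderboard displays afterwards."""
        if holdout_loss < self.best - self.step:
            # Significant improvement: report it, rounded to a multiple of step.
            self.best = round(holdout_loss / self.step) * self.step
        return self.best

board = Ladder(step=0.01)
for loss in [0.30, 0.295, 0.27]:
    print(round(board.submit(loss), 2))
# The second submission improves by less than the step, so the displayed
# loss does not change until the third submission.
```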


Some software libraries are available, mainly for testing purposes and research.

References

  1. ^ N. Dalvi, P. Domingos, Mausam, S. Sanghai, and D. Verma. "Adversarial classification". In Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 99–108, Seattle, 2004.
  2. ^ D. Lowd and C. Meek. "Adversarial learning". In A. Press, editor, Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 641–647, Chicago, IL., 2005.
  3. ^ a b B. Biggio, I. Corona, G. Fumera, G. Giacinto, and F. Roli. "Bagging classifiers for fighting poisoning attacks in adversarial classification tasks". In C. Sansone, J. Kittler, and F. Roli, editors, 10th International Workshop on Multiple Classifier Systems (MCS), volume 6713 of Lecture Notes in Computer Science, pages 350–359. Springer-Verlag, 2011.
  4. ^ a b c B. Biggio, G. Fumera, and F. Roli. "Adversarial pattern classification using multiple classifiers and randomisation". In 12th Joint IAPR International Workshop on Structural and Syntactic Pattern Recognition (SSPR 2008), volume 5342 of Lecture Notes in Computer Science, pages 500–509, Orlando, Florida, USA, 2008. Springer-Verlag.
  5. ^ a b B. Biggio, G. Fumera, and F. Roli. "Multiple classifier systems for robust classifier design in adversarial environments". International Journal of Machine Learning and Cybernetics, 1(1):27–41, 2010.
  6. ^ a b M. Bruckner, C. Kanzow, and T. Scheffer. "Static prediction games for adversarial learning problems". J. Mach. Learn. Res., 13:2617–2654, 2012.
  7. ^ a b M. Bruckner and T. Scheffer. "Nash equilibria of static prediction games". In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 171–179. 2009.
  8. ^ a b M. Bruckner and T. Scheffer. "Stackelberg games for adversarial prediction problems". In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '11, pages 547–555, New York, NY, USA, 2011. ACM.
  9. ^ a b A. Globerson and S. T. Roweis. "Nightmare at test time: robust learning by feature deletion". In W. W. Cohen and A. Moore, editors, Proceedings of the 23rd International Conference on Machine Learning, volume 148, pages 353–360. ACM, 2006.
  10. ^ a b A. Kolcz and C. H. Teo. "Feature weighting for improved classifier robustness". In Sixth Conference on Email and Anti-Spam (CEAS), Mountain View, CA, USA, 2009.
  11. ^ B. Nelson, M. Barreno, F. J. Chi, A. D. Joseph, B. I. P. Rubinstein, U. Saini, C. Sutton, J. D. Tygar, and K. Xia. "Exploiting machine learning to subvert your spam filter". In LEET'08: Proceedings of the 1st Usenix Workshop on Large-Scale Exploits and Emergent Threats, pages 1–9, Berkeley, CA, USA, 2008. USENIX Association.
  12. ^ G. L. Wittel and S. F. Wu. "On attacking statistical spam filters". In First Conference on Email and Anti-Spam (CEAS), Microsoft Research Silicon Valley, Mountain View, California, 2004.
  13. ^ a b c D. Wagner and P. Soto. "Mimicry Attacks on Host-Based Intrusion Detection Systems". In ACM CCS, 2002.
  14. ^ a b c Ke Wang, Janak J. Parekh, Salvatore J. Stolfo; "Anagram: A Content Anomaly Detector Resistant To Mimicry Attack;" Proceedings of the Ninth International Symposium on Recent Advances in Intrusion Detection (RAID); 2006.
  15. ^ a b c Gabriela F. Cretu, Angelos Stavrou, Michael E. Locasto, Salvatore J. Stolfo; "Extended Abstract: Online Training and Sanitization of AD Systems;" NIPS Workshop on Machine Learning in Adversarial Environments for Computer Security; Vancouver, BC, CA; 2007/12
  16. ^ a b c Gabriela F. Cretu, Angelos Stavrou, Salvatore J. Stolfo, Angelos D. Keromytis; "Data Sanitization: Improving the Forensic Utility of Anomaly Detection Systems;" Proceedings of the Third Workshop on Hot Topics in System Dependability; Edinburgh, UK; 2007/06.
  17. ^ a b c Gabriela F. Cretu, Angelos Stavrou, Michael E. Locasto, Salvatore J. Stolfo, Angelos D. Keromytis; "Casting out Demons: Sanitizing Training Data for Anomaly Sensors;" Proceedings of the IEEE Symposium on Security & Privacy; Oakland, CA; 2008/05.
  18. ^ P. Fogla, M. Sharif, R. Perdisci, O. Kolesnikov, and W. Lee. "Polymorphic blending attacks". In USENIX-SS'06: Proc. of the 15th Conf. on USENIX Security Symp., CA, USA, 2006. USENIX Association.
  19. ^ a b J. Newsome, B. Karp, and D. Song. "Paragraph: Thwarting signature learning by training maliciously". In Recent Advances in Intrusion Detection, LNCS, pages 81–105. Springer, 2006.
  20. ^ a b R. N. Rodrigues, L. L. Ling, and V. Govindaraju. "Robustness of multimodal biometric fusion methods against spoof attacks". J. Vis. Lang. Comput., 20(3):169–179, 2009.
  21. ^ a b c B. Biggio, L. Didaci, G. Fumera, and F. Roli. "Poisoning attacks to compromise face templates". In 6th IAPR Int'l Conf. on Biometrics (ICB 2013), pages 1–7, Madrid, Spain, 2013.
  22. ^ M. Torkamani and D. Lowd "Convex Adversarial Collective Classification". In Proceedings of the 30th International Conference on Machine Learning (pp. 642-650), Atlanta, GA., 2013.
  23. ^ Gershgorn, Dave (2 November 2017). "Your computer thinks this turtle is a rifle". Quartz. Retrieved 12 February 2018. 
  24. ^ "Single pixel change fools AI programs". BBC News. 3 November 2017. Retrieved 12 February 2018. 
  25. ^ "Optical illusions for computers". CBC Radio. 12 November 2017. Retrieved 12 February 2018. 
  26. ^ Athalye, A., & Sutskever, I. (2017). Synthesizing robust adversarial examples. arXiv preprint arXiv:1707.07397.
  27. ^ "Hacking the Brain With Adversarial Images". IEEE Spectrum: Technology, Engineering, and Science News. 2018. Retrieved 10 March 2018. 
  28. ^ "AI Has a Hallucination Problem That's Proving Tough to Fix". WIRED. 2018. Retrieved 10 March 2018. 
  29. ^ a b c d e f g B. Biggio, G. Fumera, and F. Roli. "Security evaluation of pattern classifiers under attack". IEEE Transactions on Knowledge and Data Engineering, 26(4):984–996, 2014.
  30. ^ a b c d e f g h B. Biggio, I. Corona, B. Nelson, B. Rubinstein, D. Maiorca, G. Fumera, G. Giacinto, and F. Roli. "Security evaluation of support vector machines in adversarial environments". In Y. Ma and G. Guo, editors, Support Vector Machines Applications, pp. 105–153. Springer, 2014.
  31. ^ a b c d B. Biggio, G. Fumera, and F. Roli. "Pattern recognition systems under attack: Design issues and research challenges". Int'l J. Patt. Recogn. Artif. Intell., 28(7):1460002, 2014.
  32. ^ a b c d e L. Huang, A. D. Joseph, B. Nelson, B. Rubinstein, and J. D. Tygar. "Adversarial machine learning". In 4th ACM Workshop on Artificial Intelligence and Security (AISec 2011), pages 43–57, Chicago, IL, USA, October 2011.
  33. ^ A. Barth, B. I. P. Rubinstein, M. Sundararajan, J. C. Mitchell, D. Song, and P. L. Bartlett. "A learning-based approach to reactive security". IEEE Transactions on Dependable and Secure Computing, 9(4):482–493, 2012.
  34. ^ a b c d M. Barreno, B. Nelson, R. Sears, A. D. Joseph, and J. D. Tygar. Can machine learning be secure? In ASIACCS '06: Proceedings of the 2006 ACM Symposium on Information, computer and communications security, pages 16–25, New York, NY, USA, 2006. ACM
  35. ^ a b c d M. Barreno, B. Nelson, A. Joseph, and J. Tygar. "The security of machine learning". Machine Learning, 81:121–148, 2010
  36. ^ a b Vorobeychik, Yevgeniy; Li, Bo (2014-01-01). "Optimal Randomized Classification in Adversarial Settings". Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems. AAMAS '14. Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems: 485–492. ISBN 9781450327381. 
  37. ^ a b Li, Bo; Vorobeychik, Yevgeniy (2014-01-01). "Feature Cross-substitution in Adversarial Classification". Proceedings of the 27th International Conference on Neural Information Processing Systems. NIPS'14. Cambridge, MA, USA: MIT Press: 2087–2095. 
  38. ^ a b Li, Bo; Wang, Yining; Singh, Aarti; Vorobeychik, Yevgeniy (2016-01-01). Lee, D. D.; Sugiyama, M.; Luxburg, U. V.; Guyon, I.; Garnett, R., eds. Advances in Neural Information Processing Systems 29 (PDF). Curran Associates, Inc. pp. 1885–1893. 
  39. ^ a b Li, Bo; Vorobeychik, Yevgeniy; Chen, Xinyun (2016-04-09). "A General Retraining Framework for Scalable Adversarial Classification". arXiv:1604.02606.
  40. ^ a b B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Srndic, P. Laskov, G. Giacinto, and F. Roli. "Evasion attacks against machine learning at test time". In H. Blockeel, K. Kersting, S. Nijssen, and F. Zelezny, editors, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), Part III, volume 8190 of Lecture Notes in Computer Science, pages 387–402. Springer Berlin Heidelberg, 2013.
  41. ^ B. Nelson, B. I. Rubinstein, L. Huang, A. D. Joseph, S. J. Lee, S. Rao, and J. D. Tygar. "Query strategies for evading convex-inducing classifiers". J. Mach. Learn. Res., 13:1293–1332, 2012
  42. ^ B. Biggio, B. Nelson, and P. Laskov. "Support vector machines under adversarial label noise". In Journal of Machine Learning Research - Proc. 3rd Asian Conf. Machine Learning, volume 20, pp. 97–112, 2011.
  43. ^ a b B. Biggio, B. Nelson, and P. Laskov. "Poisoning attacks against support vector machines". In J. Langford and J. Pineau, editors, 29th Int'l Conf. on Machine Learning. Omnipress, 2012.
  44. ^ M. Kloft and P. Laskov. "Online anomaly detection under adversarial impact". In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS), pages 405–412, 2010.
  45. ^ M. Kloft and P. Laskov. "Security analysis of online centroid anomaly detection". Journal of Machine Learning Research, 13:3647–3690, 2012.
  46. ^ P. Laskov and M. Kloft. "A framework for quantitative security analysis of machine learning". In AISec '09: Proceedings of the 2nd ACM workshop on Security and artificial intelligence, pages 1–4, New York, NY, USA, 2009. ACM.
  47. ^ H. Xiao, H. Xiao, and C. Eckert. "Adversarial label flips attack on support vector machines". In 20th European Conference on Artificial Intelligence, 2012.
  48. ^ B. I. P. Rubinstein, B. Nelson, L. Huang, A. D. Joseph, S.-h. Lau, S. Rao, N. Taft, and J. D. Tygar. "Antidote: understanding and defending against poisoning of anomaly detectors". In Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference, IMC '09, pages 1–14, New York, NY, USA, 2009. ACM.
  49. ^ B. Nelson, B. Biggio, and P. Laskov. "Understanding the risk factors of learning in adversarial environments". In 4th ACM Workshop on Artificial Intelligence and Security, AISec '11, pages 87–92, Chicago, IL, USA, October 2011
  50. ^ a b B. Biggio, I. Pillai, S. R. Bulò, D. Ariu, M. Pelillo, and F. Roli. "Is data clustering in adversarial settings secure?" In Proceedings of the 2013 ACM Workshop on Artificial Intelligence and Security, AISec '13, pages 87–98, New York, NY, USA, 2013. ACM.
  51. ^ J. G. Dutrisac and D. Skillicorn. "Hiding clusters in adversarial settings". In IEEE International Conference on Intelligence and Security Informatics (ISI 2008), pages 185–187, 2008.
  52. ^ D. B. Skillicorn. "Adversarial knowledge discovery". IEEE Intelligent Systems, 24:54–61, 2009.
  53. ^ B. Biggio, K. Rieck, D. Ariu, C. Wressnegger, I. Corona, G. Giacinto, and F. Roli. "Poisoning behavioral malware clustering". In Proc. 2014 Workshop on Artificial Intelligent and Security Workshop, AISec '14, pages 27–36, New York, NY, USA, 2014. ACM.
  54. ^ B. Biggio, S. R. Bulò, I. Pillai, M. Mura, E. Z. Mequanint, M. Pelillo, and F. Roli. "Poisoning complete-linkage hierarchical clustering". In P. Franti, G. Brown, M. Loog, F. Escolano, and M. Pelillo, editors, Joint IAPR Int'l Workshop on Structural, Syntactic, and Statistical Pattern Recognition, volume 8621 of Lecture Notes in Computer Science, pages 42–52, Joensuu, Finland, 2014. Springer Berlin Heidelberg.
  55. ^ O. Dekel, O. Shamir, and L. Xiao. "Learning to classify with missing and corrupted features". Machine Learning, 81:149–178, 2010.
  56. ^ M. Grosshans, C. Sawade, M. Bruckner, and T. Scheffer. "Bayesian games for adversarial regression problems". In Journal of Machine Learning Research - Proc. 30th International Conference on Machine Learning (ICML), volume 28, 2013.
  57. ^ B. Biggio, G. Fumera, and F. Roli. "Design of robust classifiers for adversarial environments". In IEEE Int'l Conf. on Systems, Man, and Cybernetics (SMC), pages 977–982, 2011.
  58. ^ W. Liu and S. Chawla. "Mining adversarial patterns via regularized loss minimization". Machine Learning, 81(1):69–83, 2010.
  59. ^ B. Biggio, G. Fumera, and F. Roli. "Evade hard multiple classifier systems". In O. Okun and G. Valentini, editors, Supervised and Unsupervised Ensemble Methods and Their Applications, volume 245 of Studies in Computational Intelligence, pages 15–38. Springer Berlin / Heidelberg, 2009.
  60. ^ B. Biggio, G. Fumera, and F. Roli. "Multiple classifier systems for adversarial classification tasks". In J. A. Benediktsson, J. Kittler, and F. Roli, editors, Proceedings of the 8th International Workshop on Multiple Classifier Systems, volume 5519 of Lecture Notes in Computer Science, pages 132–141. Springer, 2009.
  61. ^ B. Biggio, G. Fumera, and F. Roli. "Multiple classifier systems under attack". In N. E. Gayar, J. Kittler, and F. Roli, editors, MCS, Lecture Notes in Computer Science, pages 74–83. Springer, 2010.
  62. ^ Li, B and Vorobeychik, Y. Scalable Optimization of Randomized Operational Decisions in Adversarial Classification Settings. AISTATS, 2015.
  63. ^ B. I. P. Rubinstein, P. L. Bartlett, L. Huang, and N. Taft. "Learning in a large function space: Privacy-preserving mechanisms for SVM learning". Journal of Privacy and Confidentiality, 4(1):65–100, 2012.
  64. ^ Avrim Blum, Moritz Hardt. "The Ladder: A Reliable Leaderboard for Machine Learning Competitions". 2015.
  65. ^ M. Kantarcioglu, B. Xi, C. Clifton. "Classifier Evaluation and Attribute Selection against Active Adversaries". Data Min. Knowl. Discov., 22:291–335, January 2011.
  66. ^ Y. Zhou, M. Kantarcioglu, B. Thuraisingham, B. Xi. "Adversarial Support Vector Machine Learning". In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '12, pages 1059–1067, New York, NY, USA, 2012.
  67. ^ Y. Zhou, M. Kantarcioglu, B. M. Thuraisingham. "Sparse Bayesian Adversarial Learning Using Relevance Vector Machine Ensembles". In ICDM, pages 1206–1211, 2012.
  68. ^ Y. Zhou, M. Kantarcioglu. "Modeling Adversarial Learning as Nested Stackelberg Games". in Advances in Knowledge Discovery and Data Mining: 20th Pacific-Asia Conference, PAKDD 2016, Auckland, New Zealand, April 19–22, 2016.
  69. ^ H. Xiao, B. Biggio, B. Nelson, H. Xiao, C. Eckert, and F. Roli. "Support vector machines under adversarial label contamination". Neurocomputing, Special Issue on Advances in Learning with Label Noise, In Press.
  70. ^ "cchio/deep-pwning". GitHub. Retrieved 2016-08-08. 
  71. ^ A. D. Joseph, P. Laskov, F. Roli, J. D. Tygar, and B. Nelson. "Machine Learning Methods for Computer Security" (Dagstuhl Perspectives Workshop 12371). Dagstuhl Manifestos, 3(1):1–30, 2013.