Bernhard Schölkopf

From Wikipedia, the free encyclopedia

Bernhard Schölkopf in 2018
Born: 20 February 1968
Awards:
BBVA Foundation Frontiers of Knowledge Award (2020)
Körber European Science Prize (2019)
Causality in Statistics Education Award, American Statistical Association[1]
Leibniz Prize (2018)
Fellow of the ACM (Association for Computing Machinery) (2018)
Member of the German National Academy of Science (Leopoldina) (2017)
Milner Award (2014)
Academy Prize of the Berlin-Brandenburg Academy of Sciences and Humanities (2012)
Max Planck Research Award (2011)
J. K. Aggarwal Prize of the International Association for Pattern Recognition (2006)
Scientific career
Institutions: Max Planck Institute for Intelligent Systems

Bernhard Schölkopf (born 20 February 1968) is a German computer scientist known for his work in machine learning, especially on kernel methods and causality. He is a director at the Max Planck Institute for Intelligent Systems in Tübingen, Germany, where he heads the Department of Empirical Inference. He is also an affiliated professor at ETH Zürich, an honorary professor at the University of Tübingen and the Technical University of Berlin, and chairman of the European Laboratory for Learning and Intelligent Systems (ELLIS).


Kernel methods

Schölkopf developed SVM methods that achieved world-record performance on the MNIST pattern recognition benchmark at the time.[2] With the introduction of kernel PCA, Schölkopf and coauthors argued that SVMs are a special case of a much larger class of methods: any algorithm that can be expressed in terms of dot products can be generalized to a nonlinear setting by means of what are known as reproducing kernels.[3][4] Another significant observation was that the data on which the kernel is defined need not be vectorial, as long as the kernel Gram matrix is positive definite.[5] Together, these insights founded the field of kernel methods, which encompasses SVMs and many other algorithms. Kernel methods are now textbook knowledge and one of the major machine learning paradigms in research and applications.
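The nonlinear generalization via kernels can be illustrated with a minimal kernel PCA sketch (an illustrative example using an RBF kernel; the function names and parameters here are hypothetical, not taken from the cited papers):

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def kernel_pca(X, n_components=2, gamma=1.0):
    K = rbf_kernel(X, gamma)
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one   # center in feature space
    vals, vecs = np.linalg.eigh(Kc)              # eigenvalues in ascending order
    vals, vecs = vals[::-1], vecs[:, ::-1]       # sort descending
    # projections of the training points onto the top principal components
    return vecs[:, :n_components] * np.sqrt(np.maximum(vals[:n_components], 0.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
Z = kernel_pca(X, n_components=2)
print(Z.shape)  # (50, 2)
```

The centering step performs ordinary PCA in the feature space induced by the kernel without ever computing feature vectors explicitly; only the Gram matrix of pairwise kernel evaluations is needed, which is the essence of the kernel trick described above.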

After developing kernel PCA, Schölkopf extended it to extract invariant features and to design invariant kernels,[6][7][8] and showed how other major dimensionality reduction methods, such as LLE and Isomap, can be viewed as special cases. In further work with Alex Smola and others, he extended the SVM method to regression and classification with pre-specified sparsity[9] and to quantile/support estimation.[10] He proved a representer theorem implying that SVMs, kernel PCA, and most other kernel algorithms regularized by a norm in a reproducing kernel Hilbert space have solutions that take the form of kernel expansions over the training data, thus reducing an infinite-dimensional optimization problem to a finite-dimensional one. He co-developed kernel embeddings of distributions, methods that represent probability distributions in Hilbert spaces,[11][12][13][14] with links to Fraunhofer diffraction[15] as well as applications to independence testing.[16][17][18]
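The kernel embedding idea underlying the cited two-sample tests can be sketched as follows (a minimal biased estimate of the Maximum Mean Discrepancy with an RBF kernel; a simplified illustration, not the cited implementations):

```python
import numpy as np

def mmd2(X, Y, gamma=1.0):
    # Biased estimate of squared Maximum Mean Discrepancy:
    # MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)],
    # i.e. the squared distance between the two kernel mean embeddings.
    def k(A, B):
        d2 = (np.sum(A**2, axis=1)[:, None]
              + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

rng = np.random.default_rng(0)
same = mmd2(rng.normal(size=(200, 1)), rng.normal(size=(200, 1)))
diff = mmd2(rng.normal(size=(200, 1)), rng.normal(3.0, 1.0, size=(200, 1)))
print(same < diff)  # samples from different distributions give a larger MMD
```

With a characteristic kernel such as the RBF kernel, the embedding of a distribution is injective, so the MMD is zero only when the two distributions coincide, which is what makes it usable as a two-sample test statistic.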


Causal inference

Starting in 2005, Schölkopf turned his attention to causal inference. Causal mechanisms in the world give rise to statistical dependencies as epiphenomena, but only the latter are exploited by popular machine learning algorithms. Knowledge of causal structures and mechanisms is useful because it lets us predict not only future data from the same source but also the effects of interventions in a system, and because it facilitates the transfer of detected regularities to new situations.[19]

Schölkopf and co-workers addressed (and in certain settings solved) the problem of causal discovery for the two-variable setting[20][21][22][23][24] and connected causality to Kolmogorov complexity.[25]
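The additive-noise-model approach to two-variable causal discovery can be sketched on toy data (an illustrative example only: a crude correlation-of-magnitudes score stands in for the HSIC independence test used in the cited papers):

```python
import numpy as np

def dependence(a, b):
    # Crude surrogate for an independence test (the cited papers use HSIC):
    # correlation between the magnitudes of the two variables.
    return abs(np.corrcoef(np.abs(a - a.mean()), np.abs(b - b.mean()))[0, 1])

def anm_score(x, y, deg=4):
    # Regress y on x with a polynomial fit and measure how strongly the
    # residuals still depend on x; a small score means the additive-noise
    # model y = f(x) + noise is plausible in this direction.
    resid = y - np.polyval(np.polyfit(x, y, deg), x)
    return dependence(resid, x)

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 2000)
y = x**3 + rng.uniform(-1, 1, 2000)   # data generated causally as x -> y

forward, backward = anm_score(x, y), anm_score(y, x)
print(forward < backward)  # residuals are closer to independent in the causal direction
```

The asymmetry exploited here is that an additive-noise model with independent residuals typically fits in the true causal direction but not in the reverse one, which is what makes the two-variable case identifiable under suitable assumptions.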

Around 2010, Schölkopf began to explore how to use causality for machine learning, exploiting assumptions of independence of mechanisms and invariance.[26] His early work on causal learning was exposed to a wider machine learning audience during his Posner lecture[27] at NeurIPS 2011, as well as in a keynote talk at ICML 2017.[28] He investigated how to exploit underlying causal structures in order to make machine learning methods more robust with respect to distribution shifts[29][30][31] and systematic errors,[32] the latter leading to the discovery of a number of new exoplanets,[33] including K2-18b, which was subsequently found to contain water vapour in its atmosphere, a first for an exoplanet in the habitable zone.
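The half-sibling regression idea used to remove systematic errors from telescope data can be sketched on synthetic data (an illustrative toy example, not the authors' pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 400)
systematic = np.sin(1.3 * t) + 0.5 * np.sin(4.1 * t)   # shared instrument error

# Other "stars": each sees a scaled copy of the systematics plus its own
# noise, but none of them contains the target star's signal.
siblings = np.stack([a * systematic + 0.05 * rng.normal(size=t.size)
                     for a in rng.uniform(0.5, 1.5, 30)])

signal = np.where(np.abs(t - 5.0) < 0.2, -0.5, 0.0)    # a transit-like dip
target = 0.8 * systematic + signal + 0.05 * rng.normal(size=t.size)

# Half-sibling regression: predict the target from the siblings and keep
# the residual, removing shared systematics while preserving the signal.
coef, *_ = np.linalg.lstsq(siblings.T, target, rcond=None)
cleaned = target - siblings.T @ coef

print(np.corrcoef(cleaned, signal)[0, 1] > 0.5)
```

The causal justification is that the instrument affects all stars while no star affects another, so whatever the siblings can predict about the target must stem from the shared systematics rather than from the target's own astrophysical signal.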

Education and employment

Schölkopf studied mathematics, physics, and philosophy in Tübingen and London. He was supported by the Studienstiftung and won the Lionel Cooper Memorial Prize for the best M.Sc. in mathematics at the University of London.[34] After completing a Diplom in physics, he moved to Bell Labs in New Jersey, where he worked with Vladimir Vapnik, who became co-adviser of his PhD thesis at the TU Berlin (with Stefan Jähnichen). The thesis, defended in 1997, won the annual award of the German Informatics Association.[35] In 2001, following positions in Berlin, Cambridge, and New York, he founded the Department for Empirical Inference at the Max Planck Institute for Biological Cybernetics, which grew into a leading center for machine learning research. In 2011, he became a founding director of the Max Planck Institute for Intelligent Systems.[36][37]

With Alex Smola, Schölkopf co-founded the series of Machine Learning Summer Schools.[38] He also co-founded a Cambridge-Tübingen PhD Programme[39] and the Max Planck-ETH Center for Learning Systems.[40] In 2016, he co-founded the Cyber Valley research consortium.[41] He participated in the IEEE Global Initiative on "Ethically Aligned Design".[42]

Schölkopf is co-editor-in-chief of the Journal of Machine Learning Research, a journal he helped found as part of a mass resignation of the editorial board of the journal Machine Learning. He is among the world’s most cited computer scientists.[43] Alumni of his lab include Ulrike von Luxburg, Carl Rasmussen, Matthias Hein, Arthur Gretton, Gunnar Rätsch, Matthias Bethge, Stefanie Jegelka, Jason Weston, Olivier Bousquet, Olivier Chapelle, Joaquin Quinonero-Candela, and Sebastian Nowozin.[44]


Awards

Schölkopf’s awards include the Royal Society Milner Award and, shared with Isabelle Guyon and Vladimir Vapnik, the BBVA Foundation Frontiers of Knowledge Award in the Information and Communication Technologies category. He was the first scientist working in Europe to receive this award.[45]


References

  1. ^ "Causality in Statistics Education Award".
  2. ^ Decoste, Dennis; Schölkopf, Bernhard (1 January 2002). "Training Invariant Support Vector Machines". Machine Learning. 46 (1): 161–190. doi:10.1023/A:1012454411458. hdl:11858/00-001M-0000-0013-E06A-A. S2CID 85843 – via Springer Link.
  3. ^
  4. ^ Burges, Christopher J.C. (1 June 1998). "A Tutorial on Support Vector Machines for Pattern Recognition". Data Mining and Knowledge Discovery. 2 (2): 121–167. doi:10.1023/A:1009715923555. S2CID 221627509 – via Springer Link.
  5. ^ B. Schölkopf, Support Vector Learning. PhD Thesis, 1997,
  6. ^ B. Schölkopf, A. J. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299–1319, 1998e
  7. ^ Schölkopf, P. Simard, A. J. Smola, and V. Vapnik. Prior knowledge in support vector kernels. In M. Jordan, M. Kearns, and S. Solla, editors, Advances in Neural Information Processing Systems 10, pages 640–646, Cambridge, MA, USA, 1998d. MIT Press
  8. ^ Chapelle and B. Schölkopf. Incorporating invariances in nonlinear SVMs. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 609–616, Cambridge, MA, USA, 2002. MIT Press
  9. ^ B. Schölkopf, A. J. Smola, R. C. Williamson, and P. L. Bartlett. New support vector algorithms. Neural Computation, 12(5):1207–1245, 2000a
  10. ^ B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13(7):1443–1471, 2001b
  11. ^ A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf and A. Smola. A Kernel Method for the Two-Sample-Problem. Advances in Neural Information Processing Systems 19: 513--520, 2007
  12. ^ A. J. Smola and A. Gretton and L. Song and B. Schölkopf. A Hilbert Space Embedding for Distributions. Algorithmic Learning Theory: 18th International Conference: 13--31, 2007
  13. ^ B. Sriperumbudur, A. Gretton, K. Fukumizu, B. Schölkopf and G. Lanckriet. Hilbert Space Embeddings and Metrics on Probability Measures. Journal of Machine Learning Research, 11: 1517--1561, 2010
  14. ^ A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf and A. J. Smola. A Kernel Two-Sample Test. Journal of Machine Learning Research, 13: 723--773, 2012
  15. ^ S. Harmeling, M. Hirsch, and B. Schölkopf. On a link between kernel mean maps and Fraunhofer diffraction, with an application to super-resolution beyond the diffraction limit. In Computer Vision and Pattern Recognition (CVPR), pages 1083–1090. IEEE, 2013
  16. ^ A. Gretton, R. Herbrich, A. J. Smola, O. Bousquet, and B. Schölkopf. Kernel methods for measuring independence. Journal of Machine Learning Research, 6:2075–2129, 2005a
  17. ^ A. Gretton, O. Bousquet, A. J. Smola and B. Schölkopf. Measuring Statistical Dependence with Hilbert-Schmidt Norms. Algorithmic Learning Theory: 16th International Conference, 2005b
  18. ^ A. Gretton, K. Fukumizu, C.H. Teo, L. Song, B. Schölkopf and A. J. Smola. A Kernel Statistical Test of Independence. Advances in Neural Information Processing Systems 20, 2007
  19. ^ B. Schölkopf, D. Janzing, J. Peters, E. Sgouritsa, K. Zhang, and J. Mooij. On causal and anticausal learning. In J. Langford and J. Pineau, editors, Proceedings of the 29th International Conference on Machine Learning (ICML), pages 1255–1262, New York, NY, USA, 2012. Omnipress
  20. ^ P. O. Hoyer, D. Janzing, J. M. Mooij, J. Peters, and B. Schölkopf. Nonlinear causal discovery with additive noise models. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 689–696, Red Hook, NY, USA, 2009. Curran
  21. ^ D. Janzing, P. Hoyer, and B. Schölkopf. Telling cause from effect based on high-dimensional observations. In J. Fürnkranz and T. Joachims, editors, Proceedings of the 27th International Conference on Machine Learning, pages 479–486, Madison, WI, USA, 2010. International Machine Learning Society
  22. ^ J.M. Mooij, J. Peters, D. Janzing, J. Zscheischler, and B. Schölkopf. Distinguishing cause from effect using observational data: methods and benchmarks. Journal of Machine Learning Research, 17(32):1–102, 2016
  23. ^ J. Peters, JM. Mooij, D. Janzing, and B. Schölkopf. Causal discovery with continuous additive noise models. Journal of Machine Learning Research, 15:2009–2053, 2014
  24. ^ P. Daniusis, D. Janzing, J. Mooij, J. Zscheischler, B. Steudel, K. Zhang, and B. Schölkopf. Inferring deterministic causal relations. In P. Grünwald and P. Spirtes, editors, 26th Conference on Uncertainty in Artificial Intelligence, pages 143–150, Corvallis, OR, 2010. AUAI Press. Best student paper award
  25. ^ Janzing, Dominik; Schölkopf, Bernhard (6 October 2010). "Causal Inference Using the Algorithmic Markov Condition". IEEE Transactions on Information Theory. 56 (10): 5168–5194. arXiv:0804.3678. doi:10.1109/TIT.2010.2060095. S2CID 11867432 – via IEEE Xplore.
  26. ^
  27. ^ "From kernels to causal inference".
  28. ^ "Causal Learning --- Bernhard Schölkopf". 15 October 2017 – via Vimeo.
  29. ^ B. Schölkopf, D. Janzing, J. Peters, E. Sgouritsa, K. Zhang, and J. Mooij. On causal and anticausal learning. In J. Langford and J. Pineau, editors, Proceedings of the 29th International Conference on Machine Learning (ICML), pages 1255–1262, New York, NY, USA, 2012. Omnipress
  30. ^ K. Zhang, B. Schölkopf, K. Muandet, and Z. Wang. Domain adaptation under target and conditional shift. In S. Dasgupta and D. McAllester, editors, Proceedings of the 30th International Conference on Machine Learning, volume 28 of JMLR Workshop and Conference Proceedings, pages 819–827, 2013
  31. ^ Schölkopf, Bernhard (6 February 2015). "Learning to see and act". Nature. 518 (7540): 486–487. doi:10.1038/518486a. PMID 25719660. S2CID 4461791.
  32. ^ Schölkopf, Bernhard; Hogg, David W.; Wang, Dun; Foreman-Mackey, Daniel; Janzing, Dominik; Simon-Gabriel, Carl-Johann; Peters, Jonas (5 July 2016). "Modeling confounding by half-sibling regression". Proceedings of the National Academy of Sciences. 113 (27): 7391–7398. doi:10.1073/pnas.1511656113. PMC 4941423. PMID 27382154.
  33. ^ D. Foreman-Mackey, B. T. Montet, D. W. Hogg, T. D. Morton, D. Wang, and B. Schölkopf. A systematic search for transiting planets in the K2 data. The Astrophysical Journal, 806(2), 2015
  34. ^
  35. ^ "TU Berlin – Medieninformation Nr. 209 – 17. September 1998".
  36. ^ "History of the Institute".
  37. ^
  38. ^ "Machine Learning Summer Schools – MLSS".
  39. ^ "Cambridge Machine Learning Group | PhD Programme in Advanced Machine Learning".
  40. ^ Williams, Jonathan. "Max Planck ETH Center for Learning Systems".
  41. ^ "Service". Baden-Württemberg.
  42. ^
  43. ^ "World's Top Computer Scientists: H-Index Computer Science Ranking".
  44. ^
  45. ^ Williams, Jon. "Bernhard Schölkopf receives Frontiers of Knowledge Award | Empirical Inference". Max Planck Institute for Intelligent Systems.

External links