Differentiable programming

Differentiable programming is a programming paradigm in which a numeric computer program can be differentiated throughout via automatic differentiation.^[1]^[2]^[3]^[4]^[5] This allows gradient-based optimization of the program's parameters, often via gradient descent, as well as other learning approaches based on higher-order derivative information. Differentiable programming has found use in a wide variety of areas, particularly scientific computing and machine learning.^[5] One of the early proposals to adopt such a framework systematically to improve learning algorithms was made by the Advanced Concepts Team at the European Space Agency in early 2016.^[6]

Approaches

Most differentiable programming frameworks work by constructing a graph containing the control flow and data structures in the program.^[7] These frameworks generally fall into two groups:

Static, compiled graph-based approaches such as TensorFlow,^[note 1] Theano, and MXNet. They tend to allow for good compiler optimization and easier scaling to large systems, but their static nature limits interactivity and the kinds of programs that are easy to express (e.g. those involving loops or recursion), and makes it harder for users to reason about their programs.^[7] A proof-of-concept compiler toolchain called Myia uses a subset of Python as a front end and supports higher-order functions, recursion, and higher-order derivatives.^[8]^[9]^[10]

Operator overloading, dynamic graph-based approaches such as PyTorch and the Autograd package for NumPy. Their dynamic and interactive nature lets most programs be written and reasoned about more easily. However, they incur interpreter overhead (particularly when composing many small operations), scale less well, and benefit less from compiler optimization.^[9]^[10] Zygote, a package for the Julia programming language, works directly on Julia's intermediate representation, which allows the differentiated code to still be optimized by Julia's just-in-time compiler.^[7]^[11]^[5]
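The contrast between the two groups can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how TensorFlow, PyTorch, or any other named framework is implemented; the tuple-based graph, `diff`, `evaluate`, `Tracked`, and `backward` are all hypothetical names introduced for illustration. In the static style the graph is built and differentiated before any input is seen. In the dynamic style, operations are recorded while ordinary code runs, so a data-dependent loop needs no special support.

```python
# --- Static style: build a graph first, then differentiate and evaluate it ---

def var(name):
    return ("var", name)

def const(c):
    return ("const", c)

def add(a, b):
    return ("add", a, b)

def mul(a, b):
    return ("mul", a, b)

def diff(node, wrt):
    """Return a new graph representing d(node)/d(wrt)."""
    kind = node[0]
    if kind == "var":
        return const(1.0 if node[1] == wrt else 0.0)
    if kind == "const":
        return const(0.0)
    a, b = node[1], node[2]
    if kind == "add":
        return add(diff(a, wrt), diff(b, wrt))
    return add(mul(diff(a, wrt), b), mul(a, diff(b, wrt)))  # product rule

def evaluate(node, env):
    """Compute the numeric value of a graph given variable bindings."""
    kind = node[0]
    if kind == "var":
        return env[node[1]]
    if kind == "const":
        return node[1]
    a, b = evaluate(node[1], env), evaluate(node[2], env)
    return a + b if kind == "add" else a * b

x = var("x")
graph = add(mul(x, x), mul(const(3.0), x))  # x*x + 3x, fixed before any input
dgraph = diff(graph, "x")                   # the derivative is itself a graph
static_grad = evaluate(dgraph, {"x": 2.0})  # 2*2 + 3


# --- Dynamic style: record operations while ordinary Python code runs ---

class Tracked:
    """A value that remembers which values it was computed from."""
    def __init__(self, val, parents=()):
        self.val, self.parents, self.grad = val, parents, 0.0

    def __add__(self, other):
        other = other if isinstance(other, Tracked) else Tracked(other)
        return Tracked(self.val + other.val, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Tracked) else Tracked(other)
        return Tracked(self.val * other.val,
                       ((self, other.val), (other, self.val)))

def backward(out):
    """Accumulate d(out)/d(node) into .grad, visiting nodes output-first."""
    order, seen = [], set()
    def visit(n):
        if id(n) not in seen:
            seen.add(id(n))
            for p, _ in n.parents:
                visit(p)
            order.append(n)
    visit(out)
    out.grad = 1.0
    for n in reversed(order):
        for p, local in n.parents:
            p.grad += local * n.grad  # chain rule

def f(x):
    y = x
    while y.val < 100.0:  # trip count depends on the input value
        y = y * x
    return y

x_in = Tracked(3.0)
backward(f(x_in))
dynamic_grad = x_in.grad
```

For the input 3.0 the loop in `f` multiplies four times, so `f` behaves like x^5 at that point and `dynamic_grad` is 5 · 3^4 = 405. A different input could take a different number of iterations, and the recorded graph would change with it.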

A limitation of earlier approaches is that they can only differentiate code written in a manner suited to the framework, which limits their interoperability with other programs. Newer approaches address this by constructing the graph from the language's syntax or intermediate representation (IR), allowing arbitrary code to be differentiated.^[7]^[9]
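The limitation can be illustrated with a small, hypothetical operator-overloading tracer (`Traced` and `framework_sin` are names invented for this sketch). Code written for plain floats calls `math.sin`, which rejects the tracer's value type, so the function has to be rewritten against the framework's own primitives before it can be differentiated. The sketch shows only the limitation; a syntax- or IR-based tool, which would aim to handle the unmodified function, is not implemented here.

```python
import math


class Traced:
    """Hypothetical minimal forward-mode value: (value, derivative)."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __mul__(self, other):
        other = other if isinstance(other, Traced) else Traced(other)
        return Traced(self.val * other.val,
                      self.dot * other.val + self.val * other.dot)


def framework_sin(x):
    """A sine that the toy framework knows how to differentiate."""
    return Traced(math.sin(x.val), math.cos(x.val) * x.dot)


def third_party(x):
    """Existing code written for plain floats, unaware of Traced."""
    return math.sin(x * x)


def ported(x):
    """The same function rewritten against the framework's primitives."""
    return framework_sin(x * x)


seed = Traced(0.5, 1.0)  # evaluate at x = 0.5 with dx/dx = 1
try:
    third_party(seed)
    differentiable = True
except TypeError:
    # math.sin only accepts real numbers, so tracing cannot pass through it.
    differentiable = False

grad = ported(seed).dot  # d/dx sin(x^2) = 2x cos(x^2)
```

Here `differentiable` ends up `False` for the unmodified function, while the ported version yields the expected derivative 2x·cos(x^2) at x = 0.5.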

Applications

Differentiable programming has been applied in areas such as combining deep learning with physics engines in robotics,^[12] solving electronic structure problems with differentiable density functional theory,^[13] differentiable ray tracing,^[14] image processing,^[15] and probabilistic programming.^[5]

Multidisciplinary application

Differentiable programming has also been applied in fields beyond its traditional applications. In healthcare and the life sciences, for example, it is used for deep learning in biophysics-based modelling of molecular mechanisms, in areas such as protein structure prediction and drug discovery. Such applications may advance the understanding of complex biological systems and support the development of healthcare solutions.^[16]

Notes

[note 1] TensorFlow 1 uses the static graph approach, whereas TensorFlow 2 uses the dynamic graph approach by default.

References

[1] Izzo, Dario; Biscani, Francesco; Mereta, Alessio (2017). "Differentiable Genetic Programming". Genetic Programming. Lecture Notes in Computer Science. Vol. 10196. pp. 35–51. arXiv:1611.04766. doi:10.1007/978-3-319-55696-3_3. ISBN 978-3-319-55695-6. S2CID 17786263.
[2] Baydin, Atilim Gunes; Pearlmutter, Barak A.; Radul, Alexey Andreyevich; Siskind, Jeffrey Mark (2018). "Automatic Differentiation in Machine Learning: a Survey". Journal of Machine Learning Research. 18 (153): 1–43.
[3] Wang, Fei; Decker, James; Wu, Xilun; Essertel, Gregory; Rompf, Tiark (2018). "Backpropagation with Callbacks: Foundations for Efficient and Expressive Differentiable Programming" (PDF). In Bengio, S.; Wallach, H.; Larochelle, H.; Grauman, K. (eds.). NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems. Curran Associates. pp. 10201–10212.
[4] Innes, Mike (2018). "On Machine Learning and Programming Languages" (PDF). SysML Conference 2018. Archived from the original (PDF) on 2019-07-17. Retrieved 2019-07-04.
[5] Innes, Mike; Edelman, Alan; Fischer, Keno; Rackauckas, Chris; Saba, Elliot; Shah, Viral B.; Tebbutt, Will (2019). "A Differentiable Programming System to Bridge Machine Learning and Scientific Computing". arXiv:1907.07587.
[6] "Differential Intelligence". October 2016. Retrieved 2022-10-19.
[7] Innes, Michael; Saba, Elliot; Fischer, Keno; Gandhi, Dhairya; Rudilosso, Marco Concetto; Joy, Neethu Mariya; Karmali, Tejan; Pal, Avik; Shah, Viral (2018). "Fashionable Modelling with Flux". arXiv:1811.01457.
[8] van Merriënboer, Bart; Breuleux, Olivier; Bergeron, Arnaud; Lamblin, Pascal (3 December 2018). "Automatic differentiation in ML: where we are and where we should be going". NIPS'18. Vol. 31. pp. 8771–81.
[9] Breuleux, O.; van Merriënboer, B. (2017). "Automatic Differentiation in Myia" (PDF). Archived from the original (PDF) on 2019-06-24. Retrieved 2019-06-24.
[10] "TensorFlow: Static Graphs". Tutorials: Learning PyTorch. PyTorch.org. Retrieved 2019-03-04.
[11] Innes, Michael (2018). "Don't Unroll Adjoint: Differentiating SSA-Form Programs". arXiv:1810.07951.
[12] Degrave, Jonas; Hermans, Michiel; Dambre, Joni; wyffels, Francis (2016). "A Differentiable Physics Engine for Deep Learning in Robotics". arXiv:1611.01652.
[13] Li, Li; Hoyer, Stephan; Pederson, Ryan; Sun, Ruoxi; Cubuk, Ekin D.; Riley, Patrick; Burke, Kieron (2021). "Kohn-Sham Equations as Regularizer: Building Prior Knowledge into Machine-Learned Physics". Physical Review Letters. 126 (3): 036401. arXiv:2009.08551. Bibcode:2021PhRvL.126c6401L. doi:10.1103/PhysRevLett.126.036401. PMID 33543980.
[14] Li, Tzu-Mao; Aittala, Miika; Durand, Frédo; Lehtinen, Jaakko (2018). "Differentiable Monte Carlo Ray Tracing through Edge Sampling". ACM Transactions on Graphics. 37 (6): 222:1–11. doi:10.1145/3272127.3275109. S2CID 52839714.
[15] Li, Tzu-Mao; Gharbi, Michaël; Adams, Andrew; Durand, Frédo; Ragan-Kelley, Jonathan (August 2018). "Differentiable Programming for Image Processing and Deep Learning in Halide". ACM Transactions on Graphics. 37 (4): 139:1–13. doi:10.1145/3197517.3201383. S2CID 46927588.
[16] AlQuraishi, Mohammed; Sorger, Peter K. (October 2021). "Differentiable biology: using deep learning for biophysics-based and data-driven modeling of molecular mechanisms". Nature Methods. 18 (10): 1169–1180. doi:10.1038/s41592-021-01283-4. PMC 8793939. PMID 34608321.
