# Transfer learning

Transfer learning (TL) is a research problem in machine learning (ML) that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem.[1] For example, knowledge gained while learning to recognize cars could be applied when trying to recognize trucks. This area of research bears some relation to the long history of psychological literature on transfer of learning, although practical ties between the two fields are limited. From a practical standpoint, reusing or transferring information from previously learned tasks when learning new tasks has the potential to significantly improve the sample efficiency of a reinforcement learning agent.[2]

## History

In 1976, Stevo Bozinovski and Ante Fulgosi published a paper explicitly addressing transfer learning in neural network training.[3][4] The paper gives a mathematical and geometrical model of transfer learning. In 1981, a report was given on the application of transfer learning in training a neural network on a dataset of images representing letters of computer terminals. Both positive and negative transfer learning were experimentally demonstrated.[5]

In 1993, Lorien Pratt published a paper on transfer in machine learning, formulating the discriminability-based transfer (DBT) algorithm.[6]

In 1997, Pratt and Sebastian Thrun guest edited a special issue of Machine Learning devoted to transfer learning,[7] and by 1998, the field had advanced to include multi-task learning,[8] along with a more formal analysis of its theoretical foundations.[9] Learning to Learn,[10] edited by Thrun and Pratt, is a 1998 review of the subject.

Transfer learning has also been applied in cognitive science. In 1996, Pratt guest-edited an issue of Connection Science on the reuse of neural networks through transfer.[11]

In his NIPS 2016 tutorial,[12][13][14] Andrew Ng said that TL would be the next driver of ML commercial success after supervised learning, highlighting its importance.

## Definition

Transfer learning is defined in terms of domains and tasks. A domain $\mathcal{D}$ consists of a feature space $\mathcal{X}$ and a marginal probability distribution $P(X)$, where $X=\{x_{1},\ldots,x_{n}\}\in \mathcal{X}$. Given a specific domain, $\mathcal{D}=\{\mathcal{X},P(X)\}$, a task consists of two components: a label space $\mathcal{Y}$ and an objective predictive function $f:\mathcal{X}\rightarrow \mathcal{Y}$. The function $f$ is used to predict the corresponding label $f(x)$ of a new instance $x$. This task, denoted by $\mathcal{T}=\{\mathcal{Y},f(x)\}$, is learned from the training data consisting of pairs $\{x_{i},y_{i}\}$, where $x_{i}\in X$ and $y_{i}\in \mathcal{Y}$.[15]

Given a source domain $\mathcal{D}_{S}$ and learning task $\mathcal{T}_{S}$, a target domain $\mathcal{D}_{T}$ and learning task $\mathcal{T}_{T}$, where $\mathcal{D}_{S}\neq \mathcal{D}_{T}$ or $\mathcal{T}_{S}\neq \mathcal{T}_{T}$, transfer learning aims to help improve the learning of the target predictive function $f_{T}(\cdot )$ in $\mathcal{D}_{T}$ using the knowledge in $\mathcal{D}_{S}$ and $\mathcal{T}_{S}$.[15]
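
As an illustration of this definition, the following Python sketch sets up a toy source domain and target domain that differ both in the marginal distribution $P(X)$ and in the predictive function, and transfers knowledge by reusing the parameters learned on the source task as the starting point for the target task. The linear model, the uniform input ranges, the coefficients, the sample sizes, and helper names such as `make_data` and `train` are all assumptions made for the example; parameter reuse is only one of several possible ways to transfer knowledge.

```python
import random


def make_data(n, x_low, x_high, true_w, true_b, rng):
    """Sample n inputs uniformly from [x_low, x_high]; label with true_w * x + true_b."""
    xs = [rng.uniform(x_low, x_high) for _ in range(n)]
    ys = [true_w * x + true_b for x in xs]
    return xs, ys


def mse(params, xs, ys):
    """Mean squared error of the linear model (w, b) on the given pairs."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(params, xs, ys, steps, lr=0.1):
    """Full-batch gradient descent on the mean squared error; returns new (w, b)."""
    w, b = params
    n = len(xs)
    for _ in range(steps):
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 * sum(e * x for e, x in zip(errs, xs)) / n
        grad_b = 2.0 * sum(errs) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


rng = random.Random(0)

# Assumed source domain D_S: x ~ U(0, 1); source task T_S: f_S(x) = 2.0 x + 1.0.
src_x, src_y = make_data(200, 0.0, 1.0, 2.0, 1.0, rng)
# Assumed target domain D_T: x ~ U(0.5, 1.5); target task T_T: f_T(x) = 2.2 x + 0.8.
# Only five labelled target examples are available.
tgt_x, tgt_y = make_data(5, 0.5, 1.5, 2.2, 0.8, rng)

# Knowledge from the source: parameters fitted on plentiful source data.
source_params = train((0.0, 0.0), src_x, src_y, steps=2000)

# Target loss before any target training, with and without transfer.
loss_scratch_start = mse((0.0, 0.0), tgt_x, tgt_y)
loss_transfer_start = mse(source_params, tgt_x, tgt_y)

# The same small fine-tuning budget on the target data for both starting points.
scratch_params = train((0.0, 0.0), tgt_x, tgt_y, steps=20)
transfer_params = train(source_params, tgt_x, tgt_y, steps=20)
loss_scratch_end = mse(scratch_params, tgt_x, tgt_y)
loss_transfer_end = mse(transfer_params, tgt_x, tgt_y)

print("start:", loss_scratch_start, loss_transfer_start)
print("after fine-tuning:", loss_scratch_end, loss_transfer_end)
```

In this toy setting the transferred parameters start much closer to the target function, so the target loss is lower both before any target training and after the same small number of fine-tuning steps.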

## Applications

Algorithms are available for transfer learning in Markov logic networks[16] and Bayesian networks.[17] Transfer learning has also been applied to cancer subtype discovery,[18] building utilization,[19][20] general game playing,[21] text classification,[22][23] digit recognition,[24] medical imaging and spam filtering.[25]

In 2020, it was reported that, owing to their similar physical natures, transfer learning is possible between electromyographic (EMG) signals from the muscles and electroencephalographic (EEG) brainwaves, from the gesture recognition domain to the mental state recognition domain. The relationship also held in the reverse direction: EEG could likewise be used to classify EMG.[26] The experiments found that the accuracy of neural networks and convolutional neural networks was improved[27] through transfer learning both at the first epoch (prior to any learning, i.e. compared to a standard random weight distribution) and at the asymptote (the end of the learning process). That is, the models were improved by exposure to another domain. Moreover, the end user of a pre-trained model can change the structure of its fully connected layers to achieve superior performance.[28]
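
One common way to reuse a pre-trained model, sketched below in Python, is to keep its feature layers fixed and train only a replacement fully connected head on the target data. The sketch is a minimal illustration under stated assumptions: the hidden-layer weights in `PRETRAINED_W` and `PRETRAINED_B` are hypothetical stand-ins for a model trained on a source task, the target function is invented, and the replacement head is a plain linear layer rather than the architecture proposed in the cited work.

```python
import math
import random

# Hypothetical "pre-trained" hidden layer: 3 tanh units over 2 inputs.
# These values stand in for weights learned on some source task.
PRETRAINED_W = [[0.8, -0.5], [-0.3, 0.9], [0.6, 0.4]]
PRETRAINED_B = [0.1, -0.2, 0.0]


def features(x):
    """Frozen feature extractor: the weights are read but never updated."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(PRETRAINED_W, PRETRAINED_B)]


def head_predict(head, feats):
    """Replacement fully connected head: one linear output unit with a bias."""
    weights, bias = head
    return sum(w * f for w, f in zip(weights, feats)) + bias


def head_loss(head, feat_rows, ys):
    """Mean squared error of the head on precomputed features."""
    return sum((head_predict(head, f) - y) ** 2
               for f, y in zip(feat_rows, ys)) / len(ys)


def train_head(head, feat_rows, ys, steps=200, lr=0.1):
    """Gradient descent on the head parameters only; returns the new head."""
    weights, bias = list(head[0]), head[1]
    n = len(ys)
    for _ in range(steps):
        # Errors are computed once per step, so all parameters update together.
        errs = [head_predict((weights, bias), f) - y
                for f, y in zip(feat_rows, ys)]
        for j in range(len(weights)):
            grad_j = 2.0 * sum(e * f[j] for e, f in zip(errs, feat_rows)) / n
            weights[j] -= lr * grad_j
        bias -= lr * 2.0 * sum(errs) / n
    return weights, bias


rng = random.Random(0)
# Invented target task: 30 points in [-1, 1]^2 labelled with y = x0 - x1.
xs = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(30)]
ys = [x[0] - x[1] for x in xs]

frozen_copy = [row[:] for row in PRETRAINED_W]
# The extractor is frozen, so its outputs can be computed once and cached.
feat_rows = [features(x) for x in xs]

new_head = ([0.0, 0.0, 0.0], 0.0)
loss_before = head_loss(new_head, feat_rows, ys)
new_head = train_head(new_head, feat_rows, ys)
loss_after = head_loss(new_head, feat_rows, ys)

print("head loss before and after training:", loss_before, loss_after)
```

Because only the head is trained, the cost of adapting to the target data is that of fitting a small linear model. Whether frozen features are adequate for a given target task is an empirical question that this sketch does not address.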

## References

1. ^ West, Jeremy; Ventura, Dan; Warnick, Sean (2007). "Spring Research Presentation: A Theoretical Foundation for Inductive Transfer". Brigham Young University, College of Physical and Mathematical Sciences. Archived from the original on 2007-08-01. Retrieved 2007-08-05.
2. ^ George Karimpanal, Thommen; Bouffanais, Roland (2019). "Self-organizing maps for storage and transfer of knowledge in reinforcement learning". Adaptive Behavior. 27 (2): 111–126. arXiv:1811.08318. doi:10.1177/1059712318818568. ISSN 1059-7123. S2CID 53774629.
3. ^ Stevo Bozinovski and Ante Fulgosi (1976). "The influence of pattern similarity and transfer learning upon the training of a base perceptron B2." (original in Croatian) Proceedings of Symposium Informatica 3-121-5, Bled.
4. ^ Stevo Bozinovski (2020) "Reminder of the first paper on transfer learning in neural networks, 1976". Informatica 44: 291–302.
5. ^ S. Bozinovski (1981). "Teaching space: A representation concept for adaptive pattern classification." COINS Technical Report, the University of Massachusetts at Amherst, No 81-28 [available online: UM-CS-1981-028.pdf]
6. ^ Pratt, L. Y. (1993). "Discriminability-based transfer between neural networks" (PDF). NIPS Conference: Advances in Neural Information Processing Systems 5. Morgan Kaufmann Publishers. pp. 204–211.
7. ^ Pratt, L. Y.; Thrun, Sebastian (July 1997). "Machine Learning - Special Issue on Inductive Transfer". link.springer.com. Springer. Retrieved 2017-08-10.
8. ^ Caruana, R., "Multitask Learning", pp. 95–134 in Thrun & Pratt 2012.
9. ^ Baxter, J., "Theoretical Models of Learning to Learn", pp. 71–95 in Thrun & Pratt 2012.
10. ^ Thrun & Pratt 2012.
11. ^ Pratt, L. (1996). "Special Issue: Reuse of Neural Networks through Transfer". Connection Science. 8 (2). Retrieved 2017-08-10.
12. ^
13. ^ "NIPS 2016 Schedule". nips.cc. Retrieved 2019-12-28.
14. ^ Nuts and bolts of building AI applications using Deep Learning, slides
15. ^ a b Lin, Yuan-Pin; Jung, Tzyy-Ping (27 June 2017). "Improving EEG-Based Emotion Classification Using Conditional Transfer Learning". Frontiers in Human Neuroscience. 11: 334. doi:10.3389/fnhum.2017.00334. PMC 5486154. PMID 28701938. Material was copied from this source, which is available under a Creative Commons Attribution 4.0 International License.
16. ^ Mihalkova, Lilyana; Huynh, Tuyen; Mooney, Raymond J. (July 2007), "Mapping and Revising Markov Logic Networks for Transfer" (PDF), Proceedings of the 22nd AAAI Conference on Artificial Intelligence (AAAI-2007), Vancouver, BC, pp. 608–614, retrieved 2007-08-05
17. ^ Niculescu-Mizil, Alexandru; Caruana, Rich (March 21–24, 2007), "Inductive Transfer for Bayesian Network Structure Learning" (PDF), Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics (AISTATS 2007), retrieved 2007-08-05
18. ^ Hajiramezanali, E. & Dadaneh, S. Z. & Karbalayghareh, A. & Zhou, Z. & Qian, X. Bayesian multi-domain learning for cancer subtype discovery from next-generation sequencing count data. 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada. arXiv:1810.09433
19. ^ Arief-Ang, I.B.; Salim, F.D.; Hamilton, M. (2017-11-08). DA-HOC: semi-supervised domain adaptation for room occupancy prediction using CO2 sensor data. 4th ACM International Conference on Systems for Energy-Efficient Built Environments (BuildSys). Delft, Netherlands. pp. 1–10. doi:10.1145/3137133.3137146. ISBN 978-1-4503-5544-5.
20. ^ Arief-Ang, I.B.; Hamilton, M.; Salim, F.D. (2018-12-01). "A Scalable Room Occupancy Prediction with Transferable Time Series Decomposition of CO2 Sensor Data". ACM Transactions on Sensor Networks. 14 (3–4): 21:1–21:28. doi:10.1145/3217214. S2CID 54066723.
21. ^ Banerjee, Bikramjit, and Peter Stone. "General Game Learning Using Knowledge Transfer." IJCAI. 2007.
22. ^ Do, Chuong B.; Ng, Andrew Y. (2005). "Transfer learning for text classification". Neural Information Processing Systems Foundation, NIPS*2005 (PDF). Retrieved 2007-08-05.
23. ^ Raina, Rajat; Ng, Andrew Y.; Koller, Daphne (2006). "Constructing Informative Priors using Transfer Learning". Twenty-third International Conference on Machine Learning (PDF). Retrieved 2007-08-05.
24. ^ Maitra, D. S.; Bhattacharya, U.; Parui, S. K. (August 2015). "CNN based common approach to handwritten character recognition of multiple scripts". 2015 13th International Conference on Document Analysis and Recognition (ICDAR): 1021–1025. doi:10.1109/ICDAR.2015.7333916. ISBN 978-1-4799-1805-8. S2CID 25739012.
25. ^ Bickel, Steffen (2006). "ECML-PKDD Discovery Challenge 2006 Overview". ECML-PKDD Discovery Challenge Workshop (PDF). Retrieved 2007-08-05.
26. ^ Bird, Jordan J.; Kobylarz, Jhonatan; Faria, Diego R.; Ekart, Aniko; Ribeiro, Eduardo P. (2020). "Cross-Domain MLP and CNN Transfer Learning for Biological Signal Processing: EEG and EMG". IEEE Access. Institute of Electrical and Electronics Engineers (IEEE). 8: 54789–54801. doi:10.1109/access.2020.2979074. ISSN 2169-3536.
27. ^ Maitra, Durjoy Sen; Bhattacharya, Ujjwal; Parui, Swapan K. (August 2015). "CNN based common approach to handwritten character recognition of multiple scripts". 2015 13th International Conference on Document Analysis and Recognition (ICDAR): 1021–1025. doi:10.1109/ICDAR.2015.7333916.
28. ^ Kabir, H. M., Abdar, M., Jalali, S. M. J., Khosravi, A., Atiya, A. F., Nahavandi, S., & Srinivasan, D. (2020). Spinalnet: Deep neural network with gradual input. arXiv preprint arXiv:2007.03347.