# Matrix factorization (recommender systems)

Matrix factorization is a class of collaborative filtering algorithms used in recommender systems. Matrix factorization algorithms work by decomposing the user-item interaction matrix into the product of two lower dimensionality rectangular matrices.[1] This family of methods became widely known during the Netflix prize challenge due to its effectiveness as reported by Simon Funk in his 2006 blog post,[2] where he shared his findings with the research community.

## Techniques

The idea behind matrix factorization is to represent users and items in a lower dimensional latent space. Since the initial work by Funk in 2006 a multitude of matrix factorization approaches have been proposed for recommender systems. Some of the most used and simpler ones are listed in the following sections.

### Funk MF

The original algorithm proposed by Simon Funk in his blog post [2] factorized the user-item rating matrix as the product of two lower dimensional matrices, the first one has a row for each user, while the second has a column for each item. The row or column associated to a specific user or item is referred to as latent factors.[3] Note that, in Funk MF no singular value decomposition is applied, it is a SVD-like machine learning model [2]. The predicted ratings can be computed as ${\displaystyle {\tilde {R}}=HW}$, where ${\displaystyle {\tilde {R}}\in \mathbb {R} ^{users\times items}}$ is the user-item rating matrix, ${\displaystyle H\in \mathbb {R} ^{users\times latentfactors}}$ contains the user's latent factors and ${\displaystyle W\in \mathbb {R} ^{latentfactors\times items}}$ the item's latent factors.

Specifically, the predicted rating user u will give to item i is computed as:

${\displaystyle {\tilde {r}}_{ui}=\sum _{f=0}^{nfactors}H_{u,f}W_{f,i}}$

It is possible to tune the expressive power of the model by changing the number of latent factors. It has been demonstrated [4] that a matrix factorization with one latent factor is equivalent to a most popular or top popular recommender (e.g. recommends the items with the most interactions without any personalization). Increasing the number of latent factor will improve personalization, therefore recommendation quality, until the number of factors becomes too high, at which point the model starts to overfit and the recommendation quality will decrease. A common strategy to avoid overfitting is to add regularization terms to the objective function. Funk MF was developed as a rating prediction problem, therefore it uses explicit numerical ratings as user-item interactions.

All things considered, Funk MF minimizes the following objective function:

${\displaystyle {\underset {H,W}{\operatorname {arg\,min} }}\,\|R-{\tilde {R}}\|_{\rm {F}}+\alpha \|H\|+\beta \|W\|}$

Where ${\displaystyle \|.\|_{\rm {F}}}$ is defined to be the frobenius norm whereas the other norms might be either frobenius or another norm depending on the specific recommending problem.[5]

### SVD++

While Funk MF is able to provide very good recommendation quality, its ability to use only explicit numerical ratings as user-items interactions constitutes a limitation. Modern day recommender systems should exploit all available interactions both explicit (e.g. numerical ratings) and implicit (e.g. likes, purchases, skipped, bookmarked). To this end SVD++ was designed to take into account implicit interactions as well.[6][7] Compared to Funk MF, SVD++ takes also into account user and item bias.

The predicted rating user u will give to item i is computed as:

${\displaystyle {\tilde {r}}_{ui}=\mu +b_{i}+b_{u}+\sum _{f=0}^{nfactors}H_{u,f}W_{f,i}}$

SVD++ has however some disadvantages, with the main drawback being that this method is not model-based. This means that if a new user is added, the algorithm is incapable of modeling it unless the whole model is retrained. Even though the system might have gathered some interactions for that new user, its latent factors are not available and therefore no recommendations can be computed. This is an example of a cold-start problem, that is the recommender cannot deal efficiently with new users or items and specific strategies should be put in place to handle this disadvantage.[8]

A possible way to address this cold start problem is to modify SVD++ in order for it to become a model-based algorithm, therefore allowing to easily manage new items and new users.

As previously mentioned in SVD++ we don't have the latent factors of new users, therefore it is necessary to represent them in a different way. The user's latent factors represent the preference of that user for the corresponding item's latent factors, therefore user's latent factors can be estimated via the past user interactions. If the system is able to gather some interactions for the new user it is possible to estimate its latent factors. Note that this does not entirely solve the cold-start problem, since the recommender still requires some reliable interactions for new users, but at least there is no need to recompute the whole model every time. It has been demonstrated that this formulation is almost equivalent to a SLIM model,[9] which is an item-item model based recommender.

${\displaystyle {\tilde {r}}_{ui}=\mu +b_{i}+b_{u}+\sum _{f=0}^{nfactors}{\biggl (}\sum _{j=0}^{nitems}r_{uj}W_{j,f}{\biggr )}W_{f,i}}$

With this formulation, the equivalent item-item recommender would be ${\displaystyle {\tilde {R}}=RS=RW^{\rm {T}}W}$. Therefore the similarity matrix is symmetric.

### Asymmetric SVD

Asymmetric SVD aims at combining the advantages of SVD++ while being a model based algorithm, therefore being able to consider new users with a few ratings without needing to retrain the whole model. As opposed to the model-based SVD here the user latent factor matrix H is replaced by Q, which learns the user's preferences as function of their ratings.[10]

The predicted rating user u will give to item i is computed as:

${\displaystyle {\tilde {r}}_{ui}=\mu +b_{i}+b_{u}+\sum _{f=0}^{nfactors}\sum _{j=0}^{nitems}r_{uj}Q_{j,f}W_{f,i}}$

With this formulation, the equivalent item-item recommender would be ${\displaystyle {\tilde {R}}=RS=RQ^{\rm {T}}W}$. Since matrices Q and W are different the similarity matrix is asymmetric, hence the name of the model.

### Hybrid MF

In recent years many other matrix factorization models have been developed to exploit the ever increasing amount and variety of available interaction data and use cases. Hybrid matrix factorization algorithms are capable of merging explicit and implicit interactions [11] or both content and collaborative data [12][13][14]

## References

1. ^ Koren, Yehuda; Bell, Robert; Volinsky, Chris (August 2009). "Matrix Factorization Techniques for Recommender Systems". Computer. 42 (8): 30–37. CiteSeerX 10.1.1.147.8295. doi:10.1109/MC.2009.263.
2. ^ a b c Funk, Simon. "Netflix Update: Try This at Home".
3. ^ Agarwal, Deepak; Chen, Bee-Chung (28 June 2009). "Regression-based latent factor models". Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '09. ACM. pp. 19–28. doi:10.1145/1557019.1557029. ISBN 9781605584959.
4. ^ Jannach, Dietmar; Lerche, Lukas; Gedikli, Fatih; Bonnin, Geoffray (2013). What Recommenders Recommend – An Analysis of Accuracy, Popularity, and Sales Diversity Effects. User Modeling, Adaptation, and Personalization. Lecture Notes in Computer Science. 7899. Springer Berlin Heidelberg. pp. 25–37. CiteSeerX 10.1.1.465.96. doi:10.1007/978-3-642-38844-6_3. ISBN 978-3-642-38843-9.
5. ^ Paterek, Arkadiusz (2007). "Improving regularized singular value decomposition for collaborative filtering" (PDF). Proceedings of KDD Cup and Workshop.
6. ^ Cao, Jian; Hu, Hengkui; Luo, Tianyan; Wang, Jia; Huang, May; Wang, Karl; Wu, Zhonghai; Zhang, Xing (2015). Distributed Design and Implementation of SVD++ Algorithm for E-commerce Personalized Recommender System. Communications in Computer and Information Science. 572. Springer Singapore. pp. 30–44. doi:10.1007/978-981-10-0421-6_4. ISBN 978-981-10-0420-9.
7. ^ Jia, Yancheng (September 2014). Users' brands preference based on SVD++ in recommender systems. 2014 IEEE Workshop on Advanced Research and Technology in Industry Applications (WARTIA). IEEE. pp. 1175–1178. doi:10.1109/wartia.2014.6976489. ISBN 978-1-4799-6989-0.
8. ^ Kluver, Daniel; Konstan, Joseph A. (6 October 2014). "Evaluating recommender behavior for new users". Proceedings of the 8th ACM Conference on Recommender systems - Rec Sys '14. ACM. pp. 121–128. doi:10.1145/2645710.2645742. ISBN 9781450326681.
9. ^ Zheng, Yong; Mobasher, Bamshad; Burke, Robin (6 October 2014). "CSLIM". CSLIM: contextual SLIM recommendation algorithms. ACM. pp. 301–304. doi:10.1145/2645710.2645756. ISBN 9781450326681.
10. ^ Pu, Li; Faltings, Boi (12 October 2013). "Understanding and improving relational matrix factorization in recommender systems". Proceedings of the 7th ACM conference on Recommender systems - Rec Sys '13. ACM. pp. 41–48. doi:10.1145/2507157.2507178. ISBN 9781450324090.
11. ^ Zhao, Changwei; Sun, Suhuan; Han, Linqian; Peng, Qinke (2016). "HYBRID MATRIX FACTORIZATION FOR RECOMMENDER SYSTEMS IN SOCIAL NETWORKS". Neural Network World. 26 (6): 559–569. doi:10.14311/NNW.2016.26.032.
12. ^ Zhou, Tinghui; Shan, Hanhuai; Banerjee, Arindam; Sapiro, Guillermo (26 April 2012). Kernelized Probabilistic Matrix Factorization: Exploiting Graphs and Side Information. Proceedings of the 2012 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics. pp. 403–414. doi:10.1137/1.9781611972825.35. ISBN 978-1-61197-232-0.
13. ^ Adams, Ryan Prescott; Dahl, George E.; Murray, Iain (25 March 2010). "Incorporating Side Information in Probabilistic Matrix Factorization with Gaussian Processes 1003.4944". arXiv:1003.4944 [stat.ML].
14. ^ Fang, Yi; Si, Luo (27 October 2011). "Matrix co-factorization for recommendation with rich side information and implicit feedback". Proceedings of the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems - Het Rec '11. ACM. pp. 65–69. doi:10.1145/2039320.2039330. ISBN 9781450310277.