# User:Lixx0607/sandbox

Ranking SVM is an application of the Support vector machine, used to solve certain ranking problems. The Ranking SVM algorithm was published by Torsten Joachims in 2003 [1]. The original purpose of Ranking SVM was to improve the performance of internet search engines. However, it was later found that Ranking SVM can also be used to solve other problems, such as Rank-SIFT [2].

## Description

Ranking SVM, one of the pair-wise ranking methods, is used to adaptively sort web-pages by their relevance to a specific query. A mapping function is required to define such relevance; it projects each data pair (query and clicked web-page) onto a feature space. These features, combined with the user's click-through data (which implies page ranks for a specific query), can be used as training data for machine learning algorithms.

Generally, Ranking SVM includes three steps in the training period:

1. It maps the similarities between queries and the clicked pages onto a certain feature space.
2. It calculates the distances between any two of the vectors obtained in step 1.
3. It forms an optimization problem that is similar to SVM classification and solves the problem with a regular SVM solver.
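As a concrete illustration of steps 1 and 2 (the feature values and page names below are hypothetical), the clicked-page preferences of one query become difference vectors that a standard SVM solver can consume in step 3:

```python
# Sketch of steps 1-2 for a single query; the feature values below are
# hypothetical similarity features (e.g. title match, body match).
features = {"page_a": [0.75, 0.125],
            "page_b": [0.25, 0.625],
            "page_c": [0.25, 0.25]}

# Click-through data implies page_a should rank above page_b and page_c.
preferred_pairs = [("page_a", "page_b"), ("page_a", "page_c")]

# Step 2: the difference vector of each preferred pair becomes one
# training example for the SVM solver used in step 3.
training = [[hi - lo for hi, lo in zip(features[a], features[b])]
            for a, b in preferred_pairs]
# training == [[0.5, -0.5], [0.5, -0.125]]
```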

## Background

### Ranking Method

Suppose ${\displaystyle \mathbb {C} }$ is a data set containing ${\displaystyle C}$ elements ${\displaystyle c_{i}}$, and ${\displaystyle r}$ is a ranking method applied to ${\displaystyle \mathbb {C} }$. Then ${\displaystyle r}$ can be represented as a ${\displaystyle C}$ by ${\displaystyle C}$ asymmetric binary matrix: if the rank of ${\displaystyle c_{i}}$ is higher than the rank of ${\displaystyle c_{j}}$, the corresponding entry of the matrix is set to "1"; otherwise it is set to "0".
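As a small illustration with a hypothetical three-element ranking, the matrix can be built directly from the rank positions:

```python
# Build the C-by-C binary matrix for a ranking of C elements.
# rank[i] is the position of element c_i (1 = highest rank).
def ranking_matrix(rank):
    C = len(rank)
    # entry (i, j) is 1 when c_i is ranked higher than c_j, else 0
    return [[1 if rank[i] < rank[j] else 0 for j in range(C)]
            for i in range(C)]

# Hypothetical ranking: c_1 first, c_0 second, c_2 last.
M = ranking_matrix([2, 1, 3])
# M == [[0, 0, 1], [1, 0, 1], [0, 0, 0]]; note the matrix is asymmetric.
```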

### Kendall’s Tau [3][4]

Kendall's Tau, also known as the Kendall tau rank correlation coefficient, is commonly used to compare two ranking methods on the same data set.

Suppose ${\displaystyle r_{1}}$ and ${\displaystyle r_{2}}$ are two ranking methods applied to the data set ${\displaystyle \mathbb {C} }$. The Kendall's Tau between ${\displaystyle r_{1}}$ and ${\displaystyle r_{2}}$ can be represented as follows:

${\displaystyle \tau (r_{1},r_{2})={P-Q \over P+Q}=1-{2Q \over P+Q}}$

where ${\displaystyle P}$ is the number of matching (concordant) elements in the upper triangular parts of the matrices of ${\displaystyle r_{1}}$ and ${\displaystyle r_{2}}$, and ${\displaystyle Q}$ is the number of differing (discordant) elements. The diagonals of the matrices are not included in the upper triangular parts.
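A minimal sketch of this computation (hypothetical rank lists), counting the concordant and discordant pairs directly:

```python
def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two rankings of the same C elements,
    computed as (P - Q) / (P + Q) over the C*(C-1)/2 element pairs."""
    C = len(rank_a)
    P = Q = 0
    for i in range(C):
        for j in range(i + 1, C):
            # concordant pair: both rankings order (i, j) the same way
            if (rank_a[i] < rank_a[j]) == (rank_b[i] < rank_b[j]):
                P += 1
            else:
                Q += 1
    return (P - Q) / (P + Q)

# Identical rankings give tau = 1; fully reversed rankings give tau = -1.
```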

### Information Retrieval Quality [5][6][7]

Information retrieval quality is usually evaluated by the following three measurements:

1. Precision
2. Recall
3. Average Precision

For a specific query to a database, let ${\displaystyle P_{relevant}}$ be the set of relevant information elements in the database and ${\displaystyle P_{retrieved}}$ be the set of retrieved information elements. Then the above three measurements can be represented as follows:

${\displaystyle {\begin{array}{lcl}Precision={\left\vert P_{relevant}\cap P_{retrieved}\right\vert \over \left\vert P_{retrieved}\right\vert };\\\\Recall={\left\vert P_{relevant}\cap P_{retrieved}\right\vert \over \left\vert P_{relevant}\right\vert };\\\\AveragePrecision=\int _{0}^{1}{Prec(Recall)}\,dRecall,\\\end{array}}}$

where ${\displaystyle Prec(Recall)}$ is Precision as a function of ${\displaystyle Recall}$.
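The three measurements can be sketched in code; the element ids below are hypothetical, and AveragePrecision is computed with the common discrete approximation of the integral (mean precision at each relevant element's position in the result list):

```python
def precision_recall(relevant, retrieved):
    """Precision and Recall for one query, given sets of element ids."""
    hits = len(relevant & retrieved)
    return hits / len(retrieved), hits / len(relevant)

def average_precision(relevant, ranked_list):
    """Common discrete approximation of the AveragePrecision integral:
    the mean of the precision values measured at the position of each
    relevant element in the ranked result list."""
    hits, total = 0, 0.0
    for k, element in enumerate(ranked_list, start=1):
        if element in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)

# Hypothetical query: 3 relevant elements, 3 retrieved, 2 in common.
p, r = precision_recall({"d1", "d2", "d3"}, {"d1", "d2", "d4"})
ap = average_precision({"d1", "d2"}, ["d1", "d3", "d2"])
```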

Let ${\displaystyle r^{*}}$ and ${\displaystyle r_{f(q)}}$ be the expected and proposed ranking methods of a database, respectively. The Average Precision of method ${\displaystyle r_{f(q)}}$ then has the following lower bound:

${\displaystyle AvgPrec(r_{f(q)})\geqq {1 \over R}\left[Q+{\binom {R+1}{2}}\right]^{-1}(\sum _{i=1}^{R}{\sqrt {i}})^{2}}$

where ${\displaystyle Q}$ is the number of discordant elements in the upper triangular parts of the matrices of ${\displaystyle r^{*}}$ and ${\displaystyle r_{f(q)}}$, and ${\displaystyle R}$ is the number of relevant elements for the query.

### SVM Classifier [8]

Suppose ${\displaystyle ({\vec {x}}_{i},y_{i})}$ is an element of a training data set, where ${\displaystyle {\vec {x}}_{i}}$ is the feature vector and ${\displaystyle y_{i}}$ is the label (which classifies the category of ${\displaystyle {\vec {x}}_{i}}$). A typical SVM classifier can be defined as the solution of the following optimization problem:

${\displaystyle {\begin{array}{lcl}minimize:V({\vec {w}},{\vec {\xi }})={1 \over 2}{\vec {w}}\cdot {\vec {w}}+C\sum {\xi _{i}^{\sigma }}\\s.t.\\{\begin{array}{lcl}\sigma \geqq 0;\\\forall y_{i}({\vec {w}}\cdot {\vec {x}}_{i}+b)\geqq 1-\xi _{i}^{\sigma };\end{array}}\\where\\{\begin{array}{lcl}b\ is\ a\ scalar;\\\forall y_{i}\in \left\{-1,1\right\};\\\forall \xi _{i}\geqq 0;\\\end{array}}\end{array}}}$

The solution of the above optimization problem can be represented as a linear combination of the feature vectors ${\displaystyle {\vec {x}}_{i}}$:

${\displaystyle {\vec {w}}=\sum _{i}{\alpha _{i}y_{i}{\vec {x}}_{i}}}$

where ${\displaystyle \alpha _{i}}$ are the coefficients to be determined.
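A minimal sketch of such a soft-margin classifier with ${\displaystyle \sigma =1}$, trained by plain subgradient descent on hypothetical toy data (a production solver would instead solve the quadratic program exactly):

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Soft-margin linear SVM trained by subgradient descent on
    1/2 w.w + C * sum_i max(0, 1 - y_i (w.x_i + b)); returns (w, b)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                      # margin violators (xi_i > 0)
        grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical linearly separable 2-D data.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = train_linear_svm(X, y)
preds = np.sign(X @ w + b)
```

In the dual view, the returned ${\displaystyle {\vec {w}}}$ is exactly such a linear combination of the training vectors that violated the margin during training.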

## Ranking SVM algorithm

### Loss Function

Let ${\displaystyle \tau _{P(f)}}$ be the Kendall's tau between the expected ranking method ${\displaystyle r^{*}}$ and a proposed method ${\displaystyle r_{f(q)}}$. It can be proved that maximizing ${\displaystyle \tau _{P(f)}}$ improves the lower bound of the Average Precision of ${\displaystyle r_{f(q)}}$.

• Expected Loss Function [9]

The negative ${\displaystyle \tau _{P(f)}}$ can be selected as the loss function; minimizing it maximizes the lower bound of the Average Precision of ${\displaystyle r_{f(q)}}$:

${\displaystyle L_{expected}=-\tau _{P(f)}=-\int \tau (r_{f(q)},r^{*})dPr(q,r^{*})}$

where ${\displaystyle Pr(q,r^{*})}$ is the statistical distribution of ${\displaystyle r^{*}}$ for a certain query ${\displaystyle q}$.

• Empirical Loss Function

Since the distribution ${\displaystyle Pr(q,r^{*})}$ is unknown, the expected loss function cannot be evaluated directly. Instead, the following empirical average of Kendall's tau over the training queries is maximized:

${\displaystyle \tau _{S}(f)={1 \over n}\sum _{i=1}^{n}{\tau (r_{f(q_{i})},r_{i}^{*})}}$

### Collecting training data

${\displaystyle n}$ i.i.d. queries are applied to a database, and each query corresponds to a ranking method. The training data therefore has ${\displaystyle n}$ elements, each containing a query and the corresponding ranking method.

### Features Space

A mapping function ${\displaystyle \Phi (q,d)}$ [10][11] is required to map each query and each element of the database to a feature space. Each point in the feature space is then labelled with a certain rank by the ranking method.

### Optimization problem

The points generated by the training data lie in the feature space and carry rank information (the labels). These labeled points can be used to find the boundary (classifier) that specifies their order. In the linear case, such a boundary (classifier) is a vector.
Let vector ${\displaystyle {\vec {w}}}$ be the optimal linear classifier in the feature space. Then the original ranking problem can be represented as the following optimization problem:

${\displaystyle {\begin{array}{lcl}minimize:V({\vec {w}},{\vec {\xi }})={1 \over 2}{\vec {w}}\cdot {\vec {w}}+C\sum {\xi _{i,j,k}}\\s.t.\\{\begin{array}{lcl}\forall \xi _{i,j,k}\geqq 0\\\forall (d_{i},d_{j})\in r_{1}^{*}:{\vec {w}}(\Phi (q_{1},d_{i})-\Phi (q_{1},d_{j}))\geqq 1-\xi _{i,j,1}\\...\\\forall (d_{i},d_{j})\in r_{n}^{*}:{\vec {w}}(\Phi (q_{n},d_{i})-\Phi (q_{n},d_{j}))\geqq 1-\xi _{i,j,n}\\\end{array}}\end{array}}}$

The above optimization problem has the same form as the classical SVM classification problem, applied to the difference vectors ${\displaystyle \Phi (q_{k},d_{i})-\Phi (q_{k},d_{j})}$, which is why this algorithm is called Ranking SVM.
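A minimal sketch of this reduction, assuming hypothetical 2-D feature vectors for a single query: each preferred pair contributes a difference vector ${\displaystyle \Phi (q,d_{i})-\Phi (q,d_{j})}$, and a bias-free SVM is trained on those differences by subgradient descent (not a production QP solver):

```python
import numpy as np

def train_rank_svm(pairs, C=1.0, lr=0.01, epochs=300):
    """pairs: (phi_hi, phi_lo) tuples, phi_hi being the feature vector
    of the higher-ranked document.  Minimizes, by subgradient descent,
    1/2 w.w + C * sum max(0, 1 - w.(phi_hi - phi_lo)),
    i.e. an SVM over the pairwise difference vectors (no bias term)."""
    diffs = np.array([hi - lo for hi, lo in pairs])
    w = np.zeros(diffs.shape[1])
    for _ in range(epochs):
        viol = diffs @ w < 1                    # pairs violating the margin
        grad = w - C * diffs[viol].sum(axis=0)
        w -= lr * grad
    return w

# Hypothetical 2-D features Phi(q, d) for three documents of one query.
phi = {"d1": np.array([3.0, 1.0]),
       "d2": np.array([2.0, 1.5]),
       "d3": np.array([1.0, 0.5])}
# Desired ranking for the query: d1 > d2 > d3.
w = train_rank_svm([(phi["d1"], phi["d2"]),
                    (phi["d2"], phi["d3"]),
                    (phi["d1"], phi["d3"])])
scores = {d: float(v @ w) for d, v in phi.items()}  # higher = ranked higher
```

Sorting the documents by `scores` recovers the desired order d1 > d2 > d3.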

### Retrieval Function

The optimal vector ${\displaystyle {\vec {w}}^{*}}$ obtained from the training samples is

${\displaystyle {\vec {w}}^{*}=\sum {\alpha _{k,l}^{*}\Phi (q_{k},d_{l})}}$

The retrieval function can then be formed based on this optimal classifier.
For a new query ${\displaystyle q}$, the retrieval function first projects all elements of the database onto the feature space. It then orders these feature points by the values of their inner products with the optimal vector; the rank of each feature point is the rank of the corresponding database element for the query ${\displaystyle q}$.
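A minimal sketch of this ordering step, with a hypothetical learned vector and hypothetical feature values:

```python
# Hypothetical learned vector w and feature values Phi(q, d) for a new
# query q; the retrieval function sorts elements by inner product with w.
w = [4.0, 1.0]

features = {"page_a": [0.9, 0.2],
            "page_b": [0.1, 0.8],
            "page_c": [0.5, 0.5]}

def score(phi, w):
    # inner product of a feature point with the optimal vector
    return sum(p * wi for p, wi in zip(phi, w))

ranking = sorted(features, key=lambda d: score(features[d], w), reverse=True)
# ranking == ['page_a', 'page_c', 'page_b']
```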

## Ranking SVM application

Ranking SVM can be applied to rank pages according to a query. Training the algorithm requires click-through data, which consists of three parts:

1. Query.
2. Page ranks obtained from the previously trained result.
3. Pages that the user clicks.

The combination of 2 and 3 cannot provide the full ordering of the training data that the original algorithm requires. Instead, it provides only part of the ranking information, so the algorithm can be slightly revised as follows.

${\displaystyle {\begin{array}{lcl}minimize:V({\vec {w}},{\vec {\xi }})={1 \over 2}{\vec {w}}\cdot {\vec {w}}+C\sum {\xi _{i,j,k}}\\s.t.\\{\begin{array}{lcl}\forall \xi _{i,j,k}\geqq 0\\\forall (d_{i},d_{j})\in r_{1}^{'}:{\vec {w}}(\Phi (q_{1},d_{i})-\Phi (q_{1},d_{j}))\geqq 1-\xi _{i,j,1}\\...\\\forall (d_{i},d_{j})\in r_{n}^{'}:{\vec {w}}(\Phi (q_{n},d_{i})-\Phi (q_{n},d_{j}))\geqq 1-\xi _{i,j,n}\\\end{array}}\end{array}}}$

The method ${\displaystyle r'}$ does not provide ranking information for the whole dataset; it is a subset of the full ranking method. Hence the constraints of the optimization problem become more relaxed compared with the original Ranking SVM.

## References

1. ^ Joachims, T. (2003), "Optimizing Search Engines using Clickthrough Data", Proceedings of the ACM Conference on Knowledge Discovery and Data Mining
2. ^ Bing Li; Rong Xiao; Zhiwei Li; Rui Cai; Bao-Liang Lu; Lei Zhang; "Rank-SIFT: Learning to rank repeatable local interest points", Computer Vision and Pattern Recognition (CVPR), 2011
3. ^ M. Kendall. Rank Correlation Methods, Hafner, 1955
4. ^ A. Mood, F. Graybill, and D. Boes. Introduction to the Theory of Statistics. McGraw-Hill, 3rd edition, 1974
5. ^ J. Kemeny and L. Snell. Mathematical Models in the Social Sciences. Ginn & Co., 1962
6. ^ Y. Yao. Measuring retrieval effectiveness based on user preference of documents. Journal of the American Society for Information Science, 46(2): 133-145, 1995
7. ^ R. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley-Longman, Harlow, UK, May 1999
8. ^ C. Cortes and V. N. Vapnik. Support-vector networks. Machine Learning Journal, 20: 273-297, 1995
9. ^ V. Vapnik. Statistical Learning Theory. Wiley, Chichester, GB, 1998
10. ^ N. Fuhr. Optimum polynomial retrieval functions based on the probability ranking principle. ACM Transactions on Information Systems, 7(3): 183-204
11. ^ N. Fuhr, S. Hartmann, G. Lustig, M. Schwantner, K. Tzeras, and G. Knorz. AIR/X - a rule-based multistage indexing system for large subject fields. In RIAO, 1991