# Pathfinder network

A method for pruning dense networks to highlight key links

## Rationale

Relationships among a set of elements are often represented as a square matrix whose entries give the relation between each pair of elements. Distances, dissimilarities, similarities, relatedness, correlations, co-occurrences, and conditional probabilities can all be represented by such matrices. The same data can also be represented as a network with weighted links between the elements. These matrices and networks are typically very dense and are hard to interpret without some form of data reduction or pruning.

A pathfinder network results from applying a pruning method that removes weaker links from a (usually dense) network according to the lengths of alternative paths (see below). It is a psychometric scaling method based on graph theory and is used in the study of expertise, education, knowledge acquisition, mental models, and knowledge engineering. It is also employed in generating communication networks, software debugging, visualizing scientific citation patterns, information retrieval, and other forms of data visualization. Pathfinder networks are potentially applicable to any problem addressed by network theory.

## Overview

Network pruning aims to highlight the more important links between the elements represented in a network. It simplifies the collection of connections, which is valuable in data visualization and in comprehending the essential relations among those elements.

Several psychometric scaling methods start from pairwise data and yield structures revealing the underlying organization of the data. Data clustering and multidimensional scaling are two such methods. Network scaling is another such method, based on graph theory. Pathfinder networks are derived from matrices of data for pairs of entities. Because the algorithm uses distances, similarity data are inverted to yield dissimilarities for the computations.

In the pathfinder network, the entities correspond to the nodes of the generated network, and the links in the network are determined by the patterns of proximities. For example, if the proximities are similarities, links will generally connect nodes of high similarity. When proximities are distances or dissimilarities, links will generally connect nodes separated by shorter distances. The links in the network will be undirected if the proximities are symmetrical for every pair of entities. Symmetrical proximities mean that the order of the entities is not important, so the proximity of $i$ and $j$ is the same as the proximity of $j$ and $i$ for all pairs $i,j$. If the proximities are not symmetrical for every pair, the links will be directed.
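The symmetry condition described above can be checked directly on a proximity matrix. The following is a minimal sketch, assuming the proximities are stored as a list of lists; the helper names `is_symmetric` and `link_type` and the tolerance `tol` are hypothetical and chosen only for illustration.

```python
def is_symmetric(prox, tol=1e-12):
    """Return True if prox[i][j] matches prox[j][i] for every pair (i, j)."""
    n = len(prox)
    return all(
        abs(prox[i][j] - prox[j][i]) <= tol
        for i in range(n)
        for j in range(i + 1, n)
    )


def link_type(prox):
    """Undirected links for symmetrical proximities, directed links otherwise."""
    return "undirected" if is_symmetric(prox) else "directed"


# Illustrative data only: the second matrix differs in the (1, 0) entry.
symmetric_prox = [[0, 2, 5], [2, 0, 3], [5, 3, 0]]
asymmetric_prox = [[0, 2, 5], [4, 0, 3], [5, 3, 0]]

print(link_type(symmetric_prox))
print(link_type(asymmetric_prox))
```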

## Algorithm

The pathfinder algorithm uses two parameters.

1. The $q$ parameter constrains the number of indirect proximities examined in generating the network. $q$ is an integer between $2$ and $n-1$, inclusive, where $n$ is the number of nodes or items. Shortest paths can have no more than $q$ links. When $q=n-1$, all possible paths are included.
2. The $r$ parameter defines the metric used for computing the distance of paths (cf. the Minkowski distance). $r$ is a real number between $1$ and $\infty$, inclusive.

Path distance $d_{p}$ is computed as $d_{p}=\left(\sum _{i=1}^{k}l_{i}^{r}\right)^{1/r}$, where $l_{i}$ is the distance of the $i$-th link in the path and $2\leq k\leq q$. For $r=1$, $d_{p}$ is simply the sum of the distances of the links in the path. For $r=\infty$, $d_{p}$ is the maximum of the distances of the links in the path because $\lim _{r\rightarrow \infty }d_{p}=\max _{i=1}^{k}l_{i}$. A link is pruned if its distance is greater than the minimum distance of the paths (of at most $q$ links) between the nodes connected by the link. Efficient methods for finding minimum distances include the Floyd-Warshall algorithm (for $q=n-1$) and Dijkstra's algorithm (for any value of $q$).
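The path distance formula can be illustrated with a short function. This is a sketch only; the name `path_distance` is hypothetical, and the case $r=\infty$ is handled explicitly as the maximum link distance rather than by evaluating the limit numerically.

```python
import math


def path_distance(link_distances, r):
    """Distance of a path from its link distances under the r-metric."""
    if r < 1:
        raise ValueError("r must be between 1 and infinity, inclusive")
    if math.isinf(r):
        # Limit case: the path distance is the largest link distance.
        return max(link_distances)
    return sum(l ** r for l in link_distances) ** (1.0 / r)


# A two-link path with link distances 3 and 4 (illustrative values).
print(path_distance([3, 4], 1))         # sum of the link distances
print(path_distance([3, 4], 2))         # Euclidean-style combination
print(path_distance([3, 4], math.inf))  # maximum link distance
```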

A network generated with particular values of $q$ and $r$ is called a $PFNet(q,r)$. Increasing the value of either parameter decreases (or at least never increases) the number of links in the network. The network with the minimum number of links is obtained when $q=n-1$ and $r=\infty$, i.e., $PFNet(n-1,\infty )$.
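The effect of the two parameters can be sketched in code. The example below is a simplified illustration, not a reference implementation: instead of the Floyd-Warshall or Dijkstra approaches mentioned above, it uses a plain dynamic program over the number of links in a path, which is easy to follow but slow for large $n$. The function names `combine` and `pfnet` and the distance matrix `D` are assumptions made for this example.

```python
import math


def combine(a, b, r):
    """Distance of two path segments joined end to end under the r-metric."""
    if math.isinf(r):
        return max(a, b)
    return (a ** r + b ** r) ** (1.0 / r)


def pfnet(dist, q, r):
    """Return the ordered pairs (i, j) whose links survive in PFNet(q, r).

    dist is an n x n matrix of link distances.
    """
    n = len(dist)
    # best[i][j]: minimum distance over paths from i to j with at most k links,
    # starting with k = 1 (the direct links).
    best = [row[:] for row in dist]
    for _ in range(q - 1):
        nxt = [row[:] for row in best]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                for m in range(n):
                    if m == i or m == j:
                        continue
                    # Extend a path i..m by the direct link m -> j.
                    cand = combine(best[i][m], dist[m][j], r)
                    if cand < nxt[i][j]:
                        nxt[i][j] = cand
        best = nxt
    # Keep a link unless some path of at most q links is shorter.
    return {
        (i, j)
        for i in range(n)
        for j in range(n)
        if i != j
        and math.isfinite(dist[i][j])
        and math.isclose(dist[i][j], best[i][j], rel_tol=1e-9)
    }


# Illustrative symmetric distances among four nodes.
D = [
    [0, 2, 8, 7],
    [2, 0, 4, 12],
    [8, 4, 0, 6],
    [7, 12, 6, 0],
]

for q, r in [(2, math.inf), (3, math.inf), (3, 1)]:
    links = sorted((i, j) for i, j in pfnet(D, q, r) if i < j)
    print(q, r, links)
```

With these made-up distances, the link between nodes 0 and 3 survives in $PFNet(2,\infty )$ and $PFNet(3,1)$ but is pruned in $PFNet(3,\infty )$, because the three-link path 0-1-2-3 has a maximum link distance of 6, which is less than 7.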

With ordinal-scale data (see level of measurement), the $r$ parameter should be $\infty$ because the same $PFNet$ would result from any positive monotonic transformation of the proximity data. Other values of $r$ require data measured on a ratio scale. The $q$ parameter can be varied to yield the desired number of links in the network or to focus on more local relations with smaller values of $q$.
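A small worked example may help show why $r=\infty$ is the safe choice for ordinal data. The numbers are hypothetical: suppose a direct link of distance 8 competes with a two-link path whose links have distances 2 and 4, and consider the monotonic transformation $f(x)=\sqrt{x}$ applied to all distances.

$$
r=1:\quad 2+4=6<8\ \text{(link pruned)},\qquad \sqrt{2}+\sqrt{4}\approx 3.41>\sqrt{8}\approx 2.83\ \text{(link kept)}
$$

$$
r=\infty:\quad \max(2,4)=4<8\ \text{(link pruned)},\qquad \max(\sqrt{2},\sqrt{4})=2<\sqrt{8}\approx 2.83\ \text{(link pruned)}
$$

With $r=1$ the transformation changes the outcome, whereas with $r=\infty$ the comparison depends only on the rank order of the distances, which a positive monotonic transformation preserves.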

Essentially, pathfinder networks preserve the shortest possible paths given the data. Therefore, links are eliminated when they are not on shortest paths. The $PFNet(n-1,\infty )$ will be the minimum spanning tree for the links defined by the proximity data if a unique minimum spanning tree exists. In general, the $PFNet(n-1,\infty )$ includes all of the links in any minimum spanning tree.
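The relationship to minimum spanning trees can be explored with a short sketch. It assumes symmetric distances and computes $PFNet(n-1,\infty )$ with a Floyd-Warshall-style pass that uses the maximum in place of the sum, then compares the result with a tree found by Kruskal's algorithm. The function names and both example matrices are hypothetical.

```python
def minimax_distances(dist):
    """Smallest achievable maximum link distance between every pair of nodes."""
    n = len(dist)
    best = [row[:] for row in dist]
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via = max(best[i][m], best[m][j])
                if via < best[i][j]:
                    best[i][j] = via
    return best


def pfnet_full_inf(dist):
    """Links (i, j), i < j, kept in PFNet(n-1, infinity) for symmetric dist."""
    n = len(dist)
    best = minimax_distances(dist)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if dist[i][j] <= best[i][j]
    }


def kruskal_mst(dist):
    """One minimum spanning tree of the complete graph defined by dist."""
    n = len(dist)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = set()
    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.add((i, j))
    return tree


# All distances distinct: the minimum spanning tree is unique.
distinct = [[0, 2, 8, 7], [2, 0, 4, 12], [8, 4, 0, 6], [7, 12, 6, 0]]
# All distances tied: several minimum spanning trees exist.
tied = [[0, 5, 5], [5, 0, 5], [5, 5, 0]]

print(pfnet_full_inf(distinct) == kruskal_mst(distinct))
print(kruskal_mst(tied) <= pfnet_full_inf(tied), len(pfnet_full_inf(tied)))
```

In the tied case no link is strictly longer than its alternative path, so all three links are kept and the network contains a cycle, while any single minimum spanning tree has only two links.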

## Example

Here is an example of an undirected pathfinder network derived from average ratings of a group of biology graduate students. The students rated the relatedness of all pairs of the terms shown, and the mean rating for each pair was computed. The solid blue links are the $PFNet(n-1,\infty )$ (labeled "both" in the figure). The dotted red links are added in the $PFNet(2,\infty )$. For the added links, there are no 2-link paths shorter than the link distance, but there is at least one shorter path with more than two links in the data. A minimum spanning tree would have 24 links, so the 26 links in $PFNet(n-1,\infty )$ imply that there is more than one minimum spanning tree. Two cycles are present, so each cycle contains links with tied distances. Breaking each cycle would require removing one of the tied links in that cycle.
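The link counts in this example can be checked with a short calculation, assuming the network is connected. Here $n$ is the number of terms and $m$ is the number of links in $PFNet(n-1,\infty )$ ($m$ is a symbol introduced only for this check).

$$
n-1=24\ \Rightarrow\ n=25,\qquad m-(n-1)=26-24=2
$$

Each link beyond those of a spanning tree closes one independent cycle, which is consistent with the two cycles noted in the example.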