A pathfinder network is a psychometric scaling method based on graph theory and used in the study of expertise, knowledge acquisition, knowledge engineering, scientific citation patterns, information retrieval, and data visualization. Pathfinder networks are potentially applicable to any problem addressed by network theory.
Several psychometric scaling methods start from proximity data and yield structures revealing the underlying organization of the data. Data clustering and multidimensional scaling are two such methods. Network scaling represents another method based on graph theory. Pathfinder networks are derived from proximities for pairs of entities.
Proximities can be obtained from similarities, correlations, distances, conditional probabilities, or any other measure of the relationships among entities. The entities are often concepts of some sort, but they can be anything with a pattern of relationships.
In the pathfinder network, the entities correspond to the nodes of the generated network, and the links in the network are determined by the patterns of proximities. For example, if the proximities are similarities, links will generally connect nodes of high similarity. The links in the network will be undirected if the proximities are symmetrical for every pair of entities. Symmetrical proximities mean that the order of the entities is not important, so the proximity of i and j is the same as the proximity of j and i for all pairs i,j. If the proximities are not symmetrical for every pair, the links will be directed.
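The symmetry condition above can be checked directly on a proximity matrix. The following is a minimal sketch, assuming the proximities are stored as a square list of lists; the names `is_symmetric`, `link_type`, and the `tol` parameter are illustrative and not part of any pathfinder software.

```python
def is_symmetric(prox, tol=0.0):
    """True if prox[i][j] equals prox[j][i] (within tol) for every pair i, j."""
    n = len(prox)
    return all(abs(prox[i][j] - prox[j][i]) <= tol
               for i in range(n) for j in range(i + 1, n))

def link_type(prox, tol=0.0):
    """Symmetric proximities give undirected links; otherwise links are directed."""
    return "undirected" if is_symmetric(prox, tol) else "directed"

# Hypothetical proximity matrices for three entities.
symmetric_prox = [[0, 2, 5],
                  [2, 0, 3],
                  [5, 3, 0]]
asymmetric_prox = [[0, 2, 5],
                   [4, 0, 3],   # proximity of (1, 0) differs from (0, 1)
                   [5, 3, 0]]

print(link_type(symmetric_prox))   # undirected
print(link_type(asymmetric_prox))  # directed
```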
As an example, consider an undirected pathfinder network derived from average similarity ratings of a group of biology graduate students. The students rated the relatedness of all pairs of a set of biology terms, and the mean rating for each pair was computed. The resulting network is the PFnet(2, ∞).
The pathfinder algorithm uses two parameters.
- The q parameter constrains the number of indirect proximities examined in generating the network. The q parameter is an integer between 2 and n − 1, inclusive, where n is the number of nodes or items.
- The r parameter defines the metric used for computing the distance of paths (cf. the Minkowski distance). The r parameter is a real number between 1 and infinity, inclusive.
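The role of the r parameter can be illustrated with a small function that computes the length of a path from the distances of its links, using the Minkowski form mentioned above. This is a sketch under the assumption that link weights are non-negative distances; the function name `path_distance` is illustrative.

```python
import math

def path_distance(link_weights, r):
    """Minkowski (r-metric) length of a path, given the distances of its links.

    r = 1 sums the link distances, r = 2 is the Euclidean combination,
    and r = infinity takes the largest single link distance on the path.
    """
    if r < 1:
        raise ValueError("r must be at least 1")
    if math.isinf(r):
        return max(link_weights)
    return sum(w ** r for w in link_weights) ** (1.0 / r)

# A two-link path with link distances 3 and 4.
print(path_distance([3, 4], 1))         # 7.0
print(path_distance([3, 4], 2))         # 5.0
print(path_distance([3, 4], math.inf))  # 4
```

As r grows, indirect paths become shorter relative to direct links, which is why larger values of r tend to remove more links.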
A network generated with particular values of q and r is called a PFnet(q, r). Both of the parameters have the effect of decreasing the number of links in the network as their values are increased. The network with the minimum number of links is obtained when q = n − 1 and r = ∞, i.e., PFnet(n − 1, ∞).
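The effect of the two parameters can be seen in a small, unoptimized sketch. It assumes the proximities have already been expressed as a distance matrix (smaller means more closely related, `math.inf` for a missing link, zeros on the diagonal) and uses the commonly described criterion that a link is kept only if no path of at most q links is shorter under the r-metric. The names `combine`, `pfnet`, `undirected`, and the example matrix are illustrative, and this is not one of the fast implementations cited in the references.

```python
import math

def combine(a, b, r):
    """Join two path segments under the Minkowski r-metric."""
    if math.isinf(r):
        return max(a, b)
    return (a ** r + b ** r) ** (1.0 / r)

def pfnet(dist, q, r):
    """Return the ordered pairs (i, j) whose direct link survives in PFnet(q, r)."""
    n = len(dist)
    best = [row[:] for row in dist]          # shortest paths using at most 1 link
    for _ in range(q - 1):                   # allow one more link per pass
        nxt = [row[:] for row in best]
        for i in range(n):
            for j in range(n):
                for m in range(n):
                    cand = combine(dist[i][m], best[m][j], r)
                    if cand < nxt[i][j]:
                        nxt[i][j] = cand
        best = nxt
    # Keep a link when no path of at most q links is shorter than it.
    return {(i, j) for i in range(n) for j in range(n)
            if i != j and math.isfinite(dist[i][j])
            and (dist[i][j] <= best[i][j] or math.isclose(dist[i][j], best[i][j]))}

def undirected(links):
    """List each symmetric link once, as (i, j) with i < j."""
    return sorted((i, j) for i, j in links if i < j)

# Hypothetical symmetric distances among four items (0..3).
dist = [[0,   1, 3, 2.5],
        [1,   0, 2, 4],
        [3,   2, 0, 1],
        [2.5, 4, 1, 0]]

print(undirected(pfnet(dist, 3, 1)))         # [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)]
print(undirected(pfnet(dist, 2, math.inf)))  # [(0, 1), (0, 3), (1, 2), (2, 3)]
print(undirected(pfnet(dist, 3, math.inf)))  # [(0, 1), (1, 2), (2, 3)]
```

In this example, raising r from 1 to ∞ and raising q from 2 to n − 1 = 3 each remove links, and PFnet(3, ∞) has the fewest.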
With ordinal-scale data (see level of measurement), the r parameter should be infinity, because with r = ∞ the same PFnet results from any positive monotonic transformation of the proximity data. Other values of r require data measured on a ratio scale. The q parameter can be varied to yield the desired number of links in the network.
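The invariance under monotonic transformations can be made explicit with a short derivation. It is a sketch that assumes the usual retention criterion (a link is kept when no path of at most q links is shorter than it) and writes w_1, ..., w_k for the link distances along a path P and f for any strictly increasing transformation; these symbols are introduced here for illustration.

```latex
% Path length under the r-metric, and its limit as r goes to infinity
W_r(P) = \Bigl(\sum_{l=1}^{k} w_l^{\,r}\Bigr)^{1/r},
\qquad
W_\infty(P) = \max_{1 \le l \le k} w_l .

% A strictly increasing f commutes with the maximum
f\bigl(\max_{l} w_l\bigr) = \max_{l} f(w_l),

% so the comparison that decides whether link (i, j) is kept is unchanged
w_{ij} \le \max_{l} w_l
\;\Longleftrightarrow\;
f(w_{ij}) \le \max_{l} f(w_l).
```

For finite r, f does not in general commute with the sum inside W_r(P), so the comparison can change after transformation. This is why only r = ∞ is appropriate when just the rank order of the proximities is meaningful.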
Essentially, pathfinder networks preserve the shortest possible paths given the data, so links are eliminated when they are not on shortest paths. The PFnet(n − 1, ∞) will be the minimum spanning tree for the links defined by the proximity data if a unique minimum spanning tree exists. In general, the PFnet(n − 1, ∞) includes all of the links in any minimum spanning tree.
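The relationship to minimum spanning trees can be checked on small examples. The sketch below assumes a symmetric distance matrix and computes PFnet(n − 1, ∞) with a Floyd–Warshall-style minimax pass, then compares it with one minimum spanning tree found by Kruskal's algorithm. Both techniques are chosen here for illustration, and the function names and example matrices are hypothetical.

```python
import math

def minimax_distances(dist):
    """For each pair, the smallest achievable 'largest link' over all paths."""
    n = len(dist)
    best = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(best[i][k], best[k][j])
                if via_k < best[i][j]:
                    best[i][j] = via_k
    return best

def pfnet_full_inf(dist):
    """Links (i, j), i < j, of PFnet(n - 1, infinity) for symmetric distances."""
    n = len(dist)
    best = minimax_distances(dist)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if math.isfinite(dist[i][j]) and dist[i][j] <= best[i][j]}

def kruskal_mst(dist):
    """One minimum spanning tree, as a set of links (i, j) with i < j."""
    n = len(dist)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted((dist[i][j], i, j) for i in range(n)
                   for j in range(i + 1, n) if math.isfinite(dist[i][j]))
    tree = set()
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                       # link joins two separate components
            parent[ri] = rj
            tree.add((i, j))
    return tree

# Distinct distances: the minimum spanning tree is unique.
unique_dist = [[0, 1, 3, 2.5], [1, 0, 2, 4], [3, 2, 0, 1], [2.5, 4, 1, 0]]
# Tied distances (0-2 and 1-2 both equal 2): more than one minimum spanning tree.
tied_dist = [[0, 1, 2, 4], [1, 0, 2, 3], [2, 2, 0, 1], [4, 3, 1, 0]]

print(pfnet_full_inf(unique_dist) == kruskal_mst(unique_dist))  # True
print(sorted(pfnet_full_inf(tied_dist)))  # [(0, 1), (0, 2), (1, 2), (2, 3)]
print(kruskal_mst(tied_dist) <= pfnet_full_inf(tied_dist))      # True
```

In the tied example the PFnet keeps both of the equally good links 0-2 and 1-2, whereas any single minimum spanning tree contains only one of them.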
Further information on pathfinder networks and several examples of the application of PFnets to a variety of problems can be found in:
- Schvaneveldt, R. W. (Ed.) (1990). Pathfinder Associative Networks: Studies in Knowledge Organization. Norwood, NJ: Ablex. The book is out of print; a zipped copy of the chapters in PDF form has been made available for download.
A shorter article summarizing pathfinder networks:
- Schvaneveldt, R. W., Durso, F. T., & Dearholt, D. W. (1989). Network structures in proximity data. In G. Bower (Ed.), The psychology of learning and motivation: Advances in research and theory, Vol. 24 (pp. 249–284). New York: Academic Press.
Three papers describing fast implementations of pathfinder networks:
- Guerrero-Bote, V.; Zapico-Alonso, F.; Espinosa-Calvo, M.; Gomez-Crisostomo, R.; Moya-Anegon, F. (2006). "Binary pathfinder: An improvement to the pathfinder algorithm". Information Processing and Management. 42 (6): 1484–1490. CiteSeerX 10.1.1.378.5375. doi:10.1016/j.ipm.2006.03.015.
- Quirin, A.; Cordón, O.; Santamaría, J.; Vargas-Quesada, B.; Moya-Anegón, F. (2008). "A new variant of the Pathfinder algorithm to generate large visual science maps in cubic time". Information Processing and Management. 44 (4): 1611–1623. doi:10.1016/j.ipm.2007.09.005.
- Quirin, A.; Cordón, O.; Guerrero-Bote, V. P.; Vargas-Quesada, B.; Moya-Anegón, F. (2008). "A Quick MST-based Algorithm to Obtain Pathfinder Networks". Journal of the American Society for Information Science and Technology. 59 (12): 1912–1924. CiteSeerX 10.1.1.331.1548. doi:10.1002/asi.20904.
(The two variants by Quirin et al. are significantly faster than the original algorithm. The former can be applied with q = 2 or q = n − 1 and any value of r, whereas the latter can only be applied when q = n − 1 and r = ∞.)