# h-index

The h-index is an index that attempts to measure both the productivity and impact of the published work of a scientist or scholar. The index is based on the set of the scientist's most cited papers and the number of citations that they have received in other publications. The index can also be applied to the productivity and impact of a group of scientists, such as a department or university or country, as well as a scholarly journal. The index was suggested in 2005 by Jorge E. Hirsch, a physicist at UCSD, as a tool for determining theoretical physicists' relative quality[1] and is sometimes called the Hirsch index or Hirsch number.

## Definition and purpose

Figure: h-index from a plot of decreasing citations for numbered papers.

The index is based on the distribution of citations received by a given researcher's publications. Hirsch writes:

> A scientist has index $h$ if $h$ of his/her $N_p$ papers have at least $h$ citations each, and the other $(N_p - h)$ papers have no more than $h$ citations each.

In other words, a scholar with an index of h has published h papers each of which has been cited in other papers at least h times.[2] Thus, the h-index reflects both the number of publications and the number of citations per publication. The index is designed to improve upon simpler measures such as the total number of citations or publications. The index works properly only for comparing scientists working in the same field; citation conventions differ widely among different fields.
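The definition above can be turned into a short calculation: rank the papers by decreasing citation count and find the last rank at which the citation count is still at least as large as the rank. The following is a minimal sketch, not a reference implementation; the function name `h_index` and the sample citation counts are illustrative assumptions.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank   # the top `rank` papers all have at least `rank` citations
        else:
            break      # later papers are cited even less, so h cannot grow
    return h


# Hypothetical citation counts for five papers.
print(h_index([10, 8, 5, 4, 3]))  # 4
```

In this example four papers have at least four citations each, and the remaining paper has no more than four, so h = 4.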

The h-index serves as an alternative to more traditional journal impact factor metrics in the evaluation of the impact of the work of a particular researcher. Because only the most highly cited articles contribute to the h-index, its determination is a simpler process. Hirsch has demonstrated that h has high predictive value for whether a scientist has won honors like National Academy membership or the Nobel Prize. The h-index grows as citations accumulate and thus it depends on the "academic age" of a researcher.

## The h-index across disciplines and career levels

Hirsch suggested (with large error bars) that, for physicists, a value for h of about 12 might be typical for advancement to tenure (associate professor) at major research universities. A value of about 18 could mean a full professorship, 15–20 could mean a fellowship in the American Physical Society, and 45 or higher could mean membership in the United States National Academy of Sciences.[3]

The London School of Economics found that professors in the social sciences had average h-indices ranging from 2.8 (in law) to 7.6 (in economics).[4]

Among the 22 scientific disciplines listed in the Thomson Reuters Essential Science Indicators Citation Thresholds, physics has the second most citations after space science.[5] During the period January 1, 2000–February 28, 2010, a physicist had to receive 2073 citations to be among the most cited 1% of physicists in the world.[5] The threshold for space science is the highest (2236 citations), and physics is followed by clinical medicine (1390) and molecular biology & genetics (1229). Most disciplines, such as environment/ecology (390), have fewer scientists, fewer papers, and fewer citations.[5] Therefore, these disciplines have lower citation thresholds in the Essential Science Indicators, with the lowest citation thresholds observed in social sciences (154), computer science (149), and multidisciplinary sciences (147).[5]

Little systematic investigation has been made of how academic recognition correlates with the h-index across different institutions, nations and fields of study. Hirsch nevertheless estimates that after 20 years a "successful scientist" will have an h-index of 20, an "outstanding scientist" an h-index of 40, and a "truly unique" individual an h-index of 60, while pointing out that values of h will vary between different fields.[6]

For the most highly cited scientists in the period 1983–2002, Hirsch identified the top 10 in the life sciences (in order of decreasing h): Solomon H. Snyder, h = 191; David Baltimore, h = 160; Robert C. Gallo, h = 154; Pierre Chambon, h = 153; Bert Vogelstein, h = 151; Salvador Moncada, h = 143; Charles A. Dinarello, h = 138; Tadamitsu Kishimoto, h = 134; Ronald M. Evans, h = 127; and Axel Ullrich, h = 120. Among 36 new inductees in the National Academy of Sciences in biological and biomedical sciences in 2005, the median h-index was 57.[1]

## Calculation

The h-index can be determined manually using citation databases or calculated with automatic tools. Subscription-based databases such as Scopus and the Web of Knowledge provide automated calculators. Harzing's Publish or Perish program calculates the h-index based on Google Scholar entries. In July 2011 Google trialled a tool which allows scholars to keep track of their own citations and also produces an h-index and an i10-index.[7] In addition, specific databases, such as the INSPIRE-HEP database, can automatically calculate the h-index for researchers working in high energy physics.

Each database is likely to produce a different h for the same scholar, because of different coverage. A detailed study showed that the Web of Knowledge has strong coverage of journal publications, but poor coverage of high impact conferences. Scopus has better coverage of conferences, but poor coverage of publications prior to 1996; Google Scholar has the best coverage of conferences and most journals (though not all), but like Scopus has limited coverage of pre-1990 publications.[8][9] The exclusion of conference proceedings papers is a particular problem for scholars in computer science, where conference proceedings are considered an important part of the literature.[10] Google Scholar has been criticized for producing "phantom citations," including gray literature in its citation counts, and failing to follow the rules of Boolean logic when combining search terms.[11] For example, the Meho and Yang study found that Google Scholar identified 53% more citations than Web of Knowledge and Scopus combined, but noted that because most of the additional citations reported by Google Scholar were from low-impact journals or conference proceedings, they did not significantly alter the relative ranking of the individuals. It has been suggested that in order to deal with the sometimes wide variation in h for a single academic measured across the possible citation databases, one should assume false negatives in the databases are more problematic than false positives and take the maximum h measured for an academic.[12]

Hirsch intended the h-index to address the main disadvantages of other bibliometric indicators, such as total number of papers or total number of citations. Total number of papers does not account for the quality of scientific publications, while total number of citations can be disproportionately affected by participation in a single publication of major influence (for instance, methodological papers proposing successful new techniques, methods or approximations, which can generate a large number of citations), or having many publications with few citations each. The h-index is intended to measure simultaneously the quality and quantity of scientific output.
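The contrast between the three indicators can be illustrated with a small sketch. The three citation records below are invented for illustration only, and `h_index` and `summarize` are hypothetical helper names, not part of any tool mentioned in this article.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    # With papers sorted by decreasing citations, `cites >= rank` holds
    # exactly for the first h ranks, so counting those ranks gives h.
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)


def summarize(citations):
    """Return (number of papers, total citations, h-index)."""
    return len(citations), sum(citations), h_index(citations)


# Hypothetical records (assumed data, not real researchers).
one_hit = [1000, 2, 1, 0]                          # one highly influential paper
many_minor = [1] * 50                              # many papers, few citations each
steady = [30, 25, 20, 18, 15, 12, 10, 9, 8, 8]     # consistently cited output

print(summarize(one_hit))     # (4, 1003, 2)
print(summarize(many_minor))  # (50, 50, 1)
print(summarize(steady))      # (10, 155, 8)
```

The first record leads on total citations and the second on total papers, but only the third, with sustained citations across many papers, obtains a high h.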

## Criticism

There are a number of situations in which h may provide misleading information about a scientist's output:[13] (However, most of these are not exclusive to the h-index.)

• The h-index does not account for the number of authors of a paper. In the original paper, Hirsch suggested partitioning citations among co-authors. Even in the absence of explicit gaming, the h-index and similar indexes tend to favor fields with larger groups, e.g. experimental over theoretical.
• The h-index does not account for the typical number of citations in different fields. Different fields, or journals, traditionally use different numbers of citations.
• The h-index discards the information contained in author placement in the authors' list, which in some scientific fields is significant.[14][15]
• The h-index is bounded by the total number of publications. This means that scientists with a short career are at an inherent disadvantage, regardless of the importance of their discoveries. For example, Évariste Galois' h-index is 2, and will remain so forever. Had Albert Einstein died after publishing his four groundbreaking Annus Mirabilis papers in 1905, his h-index would be stuck at 4 or 5. This is also a problem for any measure that relies on the number of publications. However, as Hirsch indicated in the original paper, the index is intended as a tool to evaluate researchers in the same stage of their careers. It is not meant as a tool for historical comparisons.
• The h-index does not consider the context of citations. For example, citations in a paper are often made simply to flesh out an introduction, otherwise having no other significance to the work. h also does not resolve other contextual instances: citations made in a negative context and citations made to fraudulent or retracted work. This is also a problem for regular citation counts.
• The h-index gives books the same count as articles, making it difficult to compare scholars in fields that are more book-oriented, such as the humanities.
• The h-index does not account for confounding factors such as "gratuitous authorship", the so-called Matthew effect, and the favorable citation bias associated with review articles. Again, this is a problem for all other metrics using publications or citations.
• The h-index has been found to have slightly less predictive accuracy and precision than the simpler measure of mean citations per paper.[16] However, this finding was contradicted by another study.[17]
• The h-index is a natural number which reduces its discriminatory power. Ruane and Tol therefore propose a rational h-index that interpolates between h and h + 1.[18]
• The h-index can be manipulated through self-citations,[19][20] and if based on Google Scholar output, then even computer-generated documents can be used for that purpose, e.g. using SCIgen.[21]
• The h-index does not provide a significantly more accurate measure of impact than the total number of citations for a given scholar. In particular, by modeling the distribution of citations among papers as a random integer partition and the h-index as the Durfee square of the partition, Yong[22] arrived at the formula $h\approx 0.54\sqrt N$, where N is the total number of citations, which, for mathematicians, turns out to provide a highly accurate approximation of h-index in most cases.
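Yong's approximation in the last item above is simple enough to evaluate directly. The sketch below only applies the formula $h\approx 0.54\sqrt N$ as quoted; the function name `estimate_h` and the sample value of N are assumptions made for illustration.

```python
import math


def estimate_h(total_citations):
    """Estimate the h-index from the total citation count N as 0.54 * sqrt(N)."""
    if total_citations < 0:
        raise ValueError("total_citations must be non-negative")
    return 0.54 * math.sqrt(total_citations)


# A hypothetical scholar with N = 400 citations in total.
print(f"{estimate_h(400):.1f}")  # 10.8
```

The estimate is a real number, so in practice it would be compared with the integer h obtained from the full citation list.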

## Alternatives and modifications

Various proposals to modify the h-index in order to emphasize different features have been made.[23][24][25][26][27][28] As the variants have proliferated, comparative studies have become possible and they demonstrate that most proposals do not differ significantly from the original h-index as they remain highly correlated with it.[29]

• An individual h-index normalized by the average number of co-authors in the h-core has been proposed.[23] It was found that the distribution of the h-index, although it depends on the field, can be normalized by a simple rescaling factor. For example, taking the h values for biology as the standard, the distribution of h for mathematics collapses onto it if each mathematician's h is multiplied by three; that is, a mathematician with h = 3 is equivalent to a biologist with h = 9. This method has not been readily adopted, perhaps because of its complexity. It might be simpler to divide citation counts by the number of authors before ordering the papers and obtaining the h-index, as originally suggested by Hirsch.
• The m-index is defined as h/n, where n is the number of years since the first published paper of the scientist;[1] also called m-quotient.[30][31]
• A generalization of the h-index and some other indices that gives additional information about the shape of the author's citation function (heavy-tailed, flat/peaked, etc.) has been proposed.[32]
• A successive Hirsch-type-index for institutions has also been devised.[33][34] A scientific institution has a successive Hirsch-type-index of i when at least i researchers from that institution have an h-index of at least i.
• Three additional metrics have been proposed: $h^2$ lower, $h^2$ center, and $h^2$ upper, to give a more accurate representation of the distribution shape. The three $h^2$ metrics measure the relative area within a scientist's citation distribution in the low impact area ($h^2$ lower), the area captured by the h-index ($h^2$ center), and the area from publications with the highest visibility ($h^2$ upper). Scientists with high $h^2$ upper percentages are perfectionists, whereas scientists with high $h^2$ lower percentages are mass producers. As these metrics are percentages, they are intended to give a qualitative description to supplement the quantitative h-index.[35]
• The g-index can be seen as the h-index for an averaged citations count.[36]
• It has been argued that "For an individual researcher, a measure such as Erdős number captures the structural properties of network whereas the h-index captures the citation impact of the publications. One can be easily convinced that ranking in coauthorship networks should take into account both measures to generate a realistic and acceptable ranking." Several author ranking systems such as eigenfactor (based on eigenvector centrality) have been proposed already, for instance the Phys Author Rank Algorithm.[37]
• The c-index accounts not only for the citations but for the quality of the citations in terms of the collaboration distance between citing and cited authors. A scientist has c-index n if n of [his/her] N citations are from authors which are at collaboration distance at least n, and the other (N − n) citations are from authors which are at collaboration distance at most n.[38]
• An s-index, accounting for the non-entropic distribution of citations, has been proposed and it has been shown to be in a very good correlation with h.[39]
• The e-index, the square root of surplus citations for the h-set beyond $h^2$, complements the h-index for ignored citations, and therefore is especially useful for highly cited scientists and for comparing those with the same h-index (iso-h-index group).[40][41]
• Because the h-index was never meant to measure future publication success, a group of researchers has investigated the features that are most predictive of future h-index. It is possible to try the predictions using an online tool.[42] However, later work has shown that since the h-index is a cumulative measure, it contains intrinsic auto-correlation that leads to significant overestimation of its predictability. Thus, the true predictability of future h-index is much lower than has been claimed before.[43]
• The h-index has been applied to Internet media, such as YouTube channels. The h-index is defined as the number of videos with ≥ $h \times 10^5$ views. When compared with a video creator's total view count, the h-index and g-index better capture both productivity and impact in a single metric.[44]
• The i10-index is a measure developed by Google Scholar. It is the number of publications with at least ten citations.
• The h-index has been shown to have a strong discipline bias. However, a simple normalization $h/\langle h \rangle_d$ by the average h of scholars in a discipline d is an effective way to mitigate this bias, yielding a universal impact metric that allows scholars to be compared across different disciplines.[45] This method does not, however, deal with academic age bias.
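Several of the variants listed above reduce to a few lines of arithmetic on a citation list. The sketch below is illustrative only, and all function names are hypothetical. The m-index, e-index, i10-index, media h-index and discipline normalization follow the descriptions in the list. The g-index is computed with Egghe's usual definition (the largest g such that the g most cited papers together have at least $g^2$ citations), which is an assumption here because the list only describes it informally.

```python
import math


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)


def m_quotient(h, years_since_first_paper):
    """m-index: h divided by the number of years since the first paper."""
    if years_since_first_paper <= 0:
        raise ValueError("years_since_first_paper must be positive")
    return h / years_since_first_paper


def g_index(citations):
    """Largest g such that the top g papers have at least g**2 citations in total
    (assumed definition; capped at the number of papers)."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites
        if total >= rank * rank:
            g = rank
    return g


def i10_index(citations):
    """Number of publications with at least ten citations."""
    return sum(1 for cites in citations if cites >= 10)


def e_index(citations):
    """Square root of the h-core citations in excess of h**2."""
    h = h_index(citations)
    core = sorted(citations, reverse=True)[:h]
    return math.sqrt(sum(core) - h * h)


def media_h_index(views, unit=100_000):
    """h such that h videos have at least h * unit views each."""
    return h_index([v // unit for v in views])


def normalized_h(h, discipline_h_values):
    """h divided by the average h of scholars in the same discipline."""
    return h / (sum(discipline_h_values) / len(discipline_h_values))


cites = [10, 8, 5, 4, 3]               # hypothetical citation counts
print(h_index(cites))                  # 4
print(g_index(cites))                  # 5
print(i10_index(cites))                # 1
print(round(e_index(cites), 3))        # 3.317
print(m_quotient(h_index(cites), 8))   # 0.5
print(media_h_index([1_200_000, 450_000, 300_000, 90_000]))  # 3
```

For the sample list, the four h-core papers hold 27 citations, of which $h^2 = 16$ are already accounted for by h, so the e-index is $\sqrt{11}$.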

## References

1. ^ a b c Hirsch, J. E. (15 November 2005). "An index to quantify an individual's scientific research output". PNAS 102 (46): 16569–16572. arXiv:physics/0508025. Bibcode:2005PNAS..10216569H. doi:10.1073/pnas.0507655102. PMC 1283832. PMID 16275915.
2. ^ McDonald, Kim (8 November 2005). "Physicist Proposes New Way to Rank Scientific Output". PhysOrg. Retrieved 13 May 2010.
3. ^ Peterson, Ivars (December 2, 2005). "Rating Researchers". Science News. Retrieved 13 May 2010.
4. ^ Key Measures of Academic Influence
5. ^ a b c d "Citation Thresholds". Science Watch. May 1, 2010. Retrieved 13 May 2010.
6. ^ Meho, L.I. (2007) The rise and rise of citation analysis. Physics World, January 2007, 32-36
7. ^ Google Scholar Citations Help, retrieved 2012-09-18.
8. ^ Meho, L. I.; Yang, K. (2007). "Impact of Data Sources on Citation Counts and Rankings of LIS Faculty: Web of Science vs. Scopus and Google Scholar". Journal of the American Society for Information Science and Technology 58 (13): 2105–2125. doi:10.1002/asi.20677.
9. ^ Meho, L. I. and Yang, K (23 December 2006). "A New Era in Citation and Bibliometric Analyses: Web of Science, Scopus, and Google Scholar". arXiv:cs/0612132 (preprint of paper published as 'Impact of data sources on citation counts and rankings of LIS faculty: Web of Science versus Scopus and Google Scholar', in Journal of the American Society for Information Science and Technology, Vol. 58, No. 13, 2007, 2105–2125)
10. ^ Meyer, Bertrand; Choppy, Christine; Staunstrup, Jørgen; Van Leeuwen, Jan (2009). "Research Evaluation for Computer Science". Communications of the ACM 52 (4): 31–34. doi:10.1145/1498765.1498780.
11. ^ Jacsó, Péter (2006). "Dubious hit counts and cuckoo's eggs". Online Information Review 30 (2): 188–193. doi:10.1108/14684520610659201.
12. ^ Sanderson, Mark (2008). "Revisiting h measured on UK LIS and IR academics". Journal of the American Society for Information Science and Technology 59 (7): 1184–1190. doi:10.1002/asi.20771.
13. ^ Wendl, Michael (2007). "H-index: however ranked, citations need context". Nature 449 (7161): 403. Bibcode:2007Natur.449..403W. doi:10.1038/449403b. PMID 17898746.
14. ^ Sekercioglu, Cagan H. (2008). "Quantifying coauthor contributions". Science 322 (5900): 371. doi:10.1126/science.322.5900.371a. PMID 18927373.
15. ^ Zhang, Chun-Ting (2009). "A proposal for calculating weighted citations based on author rank". EMBO reports 10 (5): 416–7. doi:10.1038/embor.2009.74. PMC 2680883. PMID 19415071.
16. ^ Sune Lehmann, Jackson, Lautrup (2006). "Measures for measures". Nature 444 (7122): 1003–4. Bibcode:2006Natur.444.1003L. doi:10.1038/4441003a. PMID 17183295.
17. ^ Hirsch J. E. (2007). "Does the h-index have predictive power?". PNAS 104 (49): 19193–19198. arXiv:0708.0646. Bibcode:2007PNAS..10419193H. doi:10.1073/pnas.0707962104. PMC 2148266. PMID 18040045.
18. ^ Ruane, Frances; Tol, Richard S. J. (2008). "Rational (successive) h-indices: An application to economics in the Republic of Ireland". Scientometrics 75 (2): 395–405. doi:10.1007/s11192-007-1869-7.
19. ^ Bartneck, Christoph; Kokkelmans, Servaas (2011). "Detecting h-index manipulation through self-citation analysis". Scientometrics 87 (1): 85–98. doi:10.1007/s11192-010-0306-5. PMC 3043246. PMID 21472020.
20. ^ Ferrara, Emilio; Romero, Alfonso (2013). "Scientific impact evaluation and the effect of self-citations: Mitigating the bias by discounting the h-index". Journal of the American Society for Information Science and Technology 64 (11): 2332–2339. doi:10.1002/asi.22976.
21. ^ Labbé, Cyril (2010). "Ike Antkare one of the great stars in the scientific firmament" (PDF). Laboratoire d'Informatique de Grenoble RR-LIG-2008 (technical report) (Joseph Fourier University).
22. ^ Alexander Yong, Critique of Hirsch’s Citation Index: A Combinatorial Fermi Problem, Notices of the American Mathematical Society, vol. 61 (2014), no. 11, pp. 1040-1050
23. ^ a b Batista P. D. et al. (2006). "Is it possible to compare researchers with different scientific interests?". Scientometrics 68 (1): 179–189. doi:10.1007/s11192-006-0090-4.
24. ^ Sidiropoulos, Antonis; Katsaros, Dimitrios; Manolopoulos, Yannis (2007). "Generalized Hirsch h-index for disclosing latent facts in citation networks". Scientometrics 72 (2): 253–280. doi:10.1007/s11192-007-1722-z.
25. ^ Jayant S Vaidya (December 2005). "V-index: A fairer index to quantify an individual's research output capacity". BMJ 331: 339–c–1340–c.
26. ^ Katsaros D., Sidiropoulos A., Manolopous Y., (2007), Age Decaying H-Index for Social Network of Citations in Proceedings of Workshop on Social Aspects of the Web Poznan, Poland, April 27, 2007
27. ^ Anderson, T.R.; Hankin, R.K.S and Killworth, P.D. (2008). "Beyond the Durfee square: Enhancing the h-index to score total publication output". Scientometrics 76 (3): 577–588. doi:10.1007/s11192-007-2071-2.
28. ^ Baldock, C.; Ma, R.M.S.; Orton, C.G. (2009). "The h index is the best measure of a scientist's research productivity". Medical Physics 36 (4): 1043–1045. Bibcode:2009MedPh..36.1043B. doi:10.1118/1.3089421. PMID 19472608.
29. ^ Bornmann L., et al.,(2011), A multilevel meta-analysis of studies reporting correlations between the h-index and 37 different h-index variants, J. of Informetrics, Vol.5, Iss. 3, July 2011, p.346-59
30. ^ Anne-Wil Harzing (2008-04-23). "Reflections on the h-index". Retrieved 2013-07-18.
31. ^ von Bohlen und Halbach O (2011). "How to judge a book by its cover? How useful are bibliometric indices for the evaluation of "scientific quality" or "scientific productivity"?". Annals of Anatomy 193 (3): 191–6. doi:10.1016/j.aanat.2011.03.011. PMID 21507617.
32. ^ Gągolewski, M.; Grzegorzewski, P. (2009). "A geometric approach to the construction of scientific impact indices". Scientometrics 81 (3): 617–634. doi:10.1007/s11192-008-2253-y.
33. ^ Kosmulski, M. (2006). "I—a bibliometric index". Forum Akademickie 11: 31.
34. ^ Prathap, G. (2006). "Hirsch-type indices for ranking institutions' scientific research output". Current Science. 91 (11): 1439.
35. ^ Bornmann, Lutz; Mutz, Rüdiger; Daniel, Hans-Dieter (2010). "The h index research output measurement: Two approaches to enhance its accuracy". Journal of Informetrics 4 (3): 407. doi:10.1016/j.joi.2010.03.005.
36. ^ Egghe, Leo (2006). "Theory and practise of the g-index". Scientometrics 69: 131. doi:10.1007/s11192-006-0144-7.
37. ^ Kashyap Dixit, S Kameshwaran, Sameep Mehta, Vinayaka Pandit, N Viswanadham, (February 2009). "Towards simultaneously exploiting structure and outcomes in interaction networks for node ranking". IBM Research Report R109002.; see also Kameshwaran, Sampath; Pandit, Vinayaka; Mehta, Sameep; Viswanadham, Nukala; Dixit, Kashyap (2010). "Outcome aware ranking in interaction networks". "Proceedings of the 19th ACM international conference on Information and knowledge management - CIKM '10". p. 229. doi:10.1145/1871437.1871470. ISBN 9781450300995.
38. ^ Bras-Amorós, M.; Domingo-Ferrer, J.; Torra, V. (2011). "A bibliometric index based on the collaboration distance between cited and citing authors". Journal of Informetrics 5 (2): 248–264. doi:10.1016/j.joi.2010.11.001.
39. ^ Silagadze, Z. K. (2010). "Citation entropy and research impact estimation". Acta Phys. Polon. B 41: 2325–2333. arXiv:0905.1039v2. Bibcode:2009arXiv0905.1039S.
40. ^ Zhang, Chun-Ting (2009). Joly, Etienne, ed. "The e-Index, Complementing the h-Index for Excess Citations". PLoS ONE 4 (5): e5429. doi:10.1371/journal.pone.0005429. PMC 2673580. PMID 19415119.
41. ^ Dodson, M.V. (2009). "Citation analysis: Maintenance of h-index and use of e-index". Biochemical and Biophysical Research Communications 387 (4): 625–6. doi:10.1016/j.bbrc.2009.07.091. PMID 19632203.
42. ^ Acuna, Daniel E.; Allesina, Stefano; Kording, Konrad P. (2012). "Future impact: Predicting scientific success". Nature 489 (7415): 201–2. doi:10.1038/489201a. PMC 3770471. PMID 22972278.
43. ^ Penner, Orion; Pan, Raj K.; Petersen, Alexander M.; Kaski, Kimmo; Fortunato, Santo (2013). "On the Predictability of Future Impact in Science". Scientific Reports 3: 3052. doi:10.1038/srep03052. PMC 3810665. PMID 24165898.
44. ^ Hovden, R. (2013). "Bibliometrics for Internet media: Applying the h-index to YouTube". Journal of the American Society for Information Science and Technology 64 (11): 2326. arXiv:1303.0766. doi:10.1002/asi.22936.
45. ^ Kaur, Jasleen; Radicchi, Filippo; Menczer, Filippo (2013). "Universality of scholarly impact metrics". Journal of Informetrics 7 (4): 924–932. arXiv:1305.6339. doi:10.1016/j.joi.2013.09.002.