Talk:H-index


Merger with Hirsch number

As the creator of Hirsch number, the merger is fine with me. I'm not sure what article name will serve future readers best, but as long as there is a redirect from the eliminated name, I guess it doesn't much matter. Is there an automated way to update all links to the article that is merged away? Alison Chaiken 17:48, 3 June 2006 (UTC)

Yeah, I think there will be redirects for all of them and that there will be no broken links. I'm not sure what the best name would be either. H-index is used in business, but that is what Hirsch called it originally, so I don't know which to pick. (Rajah 04:46, 11 June 2006 (UTC))

I chose "Hirsch number" because that's what the original _Science_ article I read used. "h-index" has way more google hits than "hirsch number" but undoubtedly a lot of the "h index" references are to something else. Is there any way to tell which terms users have most frequently looked up at the Wikipedia site? That is, are Wikipedia HTTP statistics gathered anywhere? Consulting those would seem to me to be the best way to pick an article name. Alison Chaiken 17:05, 11 June 2006 (UTC)
I went to the Science web article linked from your Hirsch number page and it quotes Hirsch himself referring to it as "h-index". I say we go with h-index. However, the H(erfindahl)-index of finance might confuse, so there should definitely be a disambiguation page as well. And maybe just make the main article "H-index (science)" etc. (Rajah 18:48, 11 June 2006 (UTC))
Merge completed. Still need to rewrite some of the material though. --Rajah 22:37, 26 June 2006 (UTC)

active or passive

In other words, a scholar with an index of h has published h papers with at least h citations each. Active voice or passive voice (did the scholar's papers cite h other papers or were the scholar's papers cited in h other papers)? --Abdull 19:34, 8 June 2006 (UTC)

Yes, that's correct. The strict definition on the page, though, also precisely bounds the number from below, i.e. under your definition an author of 100 papers with 100 citations each could have a Hirsch number anywhere from 0 to 100 inclusive. You can add a "plain English" translation if you'd like. (Rajah 04:46, 11 June 2006 (UTC))
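
For reference, the strict definition mentioned above can be written compactly. The notation (N papers with citation counts c_1, ..., c_N sorted in decreasing order) is introduced here only for illustration and is not taken from the article text:

```latex
% Citation counts sorted so that c_1 >= c_2 >= ... >= c_N (notation assumed for this note).
\[
  h \;=\; \max\{\, k \in \{0, 1, \dots, N\} \;:\; k = 0 \ \text{or}\ c_k \ge k \,\}
\]
% Consequently the top h papers have at least h citations each,
% and every remaining paper satisfies c_j <= h for j > h.
```

Taking the maximum is what pins the number down from below: for 100 papers with 100 citations each, every k up to 100 satisfies the condition, and only k = 100 is the h-index.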


Can somebody fix the middle part? It's shooting across the page. No wordwrap? -anon

what middle part are you talking about? --Rajah 20:31, 21 November 2006 (UTC)

matlab script

(should this be here? advertising?) by anon

i think it's fine as it is a free script. --Rajah 20:30, 21 November 2006 (UTC)

lists of scientists

I do not see the point of a list of physicists according to h index in Web of knowledge, when there is one according to Web of science, which is the part of WoK that includes all of physics. But the lists are different, so there must be some purpose to it--could it be explained? And could somebody source or explain " a moderately productive scientist should have an h equal to the number of years of service "

DGG 19:58, 3 December 2006 (UTC)
I second this. Separating WoS and WoK does not make any sense, besides, the latter list just does not make any sense. h=54 or even 71 is by far not spectacular enough to deserve publishing the corresponding names in Wikipedia.
mdina

moved from H-index/Comments

The descriptive part of the article is relatively OK, although there is room for improvement (see below). However, the main problem with this article is that 3/4 of it consists of lists of scientists with allegedly large h-indices, which they are not.

The way the article is written suggests the listed people are THE people with the highest h-index in their respective field, which is ridiculous. In reality, each list has a few randomly selected high-h individuals on top and the rest is totally random - just the names the compiler happened to be familiar with. As a result, physicists with an h-index of 30 appear in the list of "scientists with high h", while in reality there are many hundreds, if not thousands, of physicists with h>30. On the other hand, some truly highly cited physicists are missing (just for example, A.A. Tseytlin, h=62, from the first, high-energy list, or A. Zunger, h=76, from the second, condensed matter list). Either these lists should be completely removed, or just 3-4 scientists from the top of each should be listed, with a very clear explanation that these are not THE most h-cited scientists, but SOME highly h-cited scientists.

Second, at least one thing needs to be added to the criticisms section: take two scientists with the same h-index, say, h=30, but one has 10 papers cited more than 200 times, and the other has none. Clearly the scientific output of the former is more valuable. Several recipes to correct for that have been proposed, but none has gained universal recognition.
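
The point about two scientists with the same h-index can be illustrated with a small Python sketch. Both citation records below are invented for the illustration; they are not real data, and the helper name h_index is an assumption of this note.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    # With counts in decreasing order, "cites >= rank" holds for exactly the first h ranks.
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)

# Hypothetical records: A has ten papers cited 250 times each, B has none above 35.
scientist_a = [250] * 10 + [30] * 20 + [5] * 15
scientist_b = [35] * 30 + [5] * 15

for name, record in (("A", scientist_a), ("B", scientist_b)):
    highly_cited = sum(1 for c in record if c > 200)
    print(name, "h =", h_index(record),
          "| total citations =", sum(record),
          "| papers cited >200 times =", highly_cited)
```

Both records give h = 30, although the totals and the number of very highly cited papers differ widely, which is the information the index discards.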

Finally, Google Scholar is not a reliable source for computing the h-index, for it often indexes the same paper twice: as a preprint and as a paper, and also misses many of the journals that ISI indexes. Only ISI WOK can be used to reliably compute the h-index. —The preceding unsigned comment was added by Chris 73 (talkcontribs) 17:48, 13 January 2007 (UTC).

I altogether agree--they are a proposed measure that has won considerable acceptance, primarily because of dissatisfaction with the incorrect use of impact factors--and sometimes from dissatisfaction with the results of correct use of impact factors. Like most measures, they are more reliable in the middle of the range--Impact factors, when used for journals, have the advantage of being more sensitive at the ends of the ranges because of the Zipfian distribution of citations.
The h-index per se is a pretty good gauge of scientific performance. I found it to be much more reliable than all other popular gauges, such as the total number of publications, total number of citations, number of citations weighted with the impact factor, or the "100 index" (the number of papers cited more than 100 times). However, I found that using GS produces unreliable values for the h-index. Contrary to your assertion, I found the h-index and its time derivative to be very sensitive precisely at the upper end of the range - every single point above 30 indicates substantial progress. mdina
I am not sure whether the GS algorithm takes repeated hits into account--do you know of a comparison with WOK results?
I did such a comparison myself, and the results were highly unsatisfactory, with deviations of 2-4 points in the range of h=35 (and that's a lot!). Errors go both ways, because GS does not index some sources that WOK does, but indexes preprints. mdina
  • In any case, these lists do not belong here, and I wonder where to put them. I do not want to give them the respectability of separate article status, and WP does not do long footnotes. The only reason I did not remove them myself months ago is that I have made a number of skeptical comments here and elsewhere about h factors. DGG 21:46, 13 January 2007 (UTC)
I agree that the lists were way too long. I've just cut them down to five items per list. Rrelf 21:31, 14 January 2007 (UTC)

I don't see much use for these (rather arbitrary) lists at all and think someone should remove them. Crusio 10:51, 23 April 2007 (UTC)

''h''-index or <math>h</math>-index?

Do you think we should write h-index (''h''-index) or h-index (<math>h</math>-index) in the text? It currently uses the former, but I think that the latter looks quite a bit better. It is a little more cumbersome to write, however. In any case, it is important that we keep it consistent. Rrelf 21:17, 14 January 2007 (UTC)

Usefulness/Criticism?

It could also be argued that the number isn't useful when it should be, that is, when scientists are first applying for jobs. Maybe the article should discuss when the h-index is useful? Kaw in stl 19:42, 16 January 2007 (UTC)

There are people who have suggested this use, but in the context of deprecating it. If you can find a source the other way, put it in together with one on the other side.DGG 19:48, 16 January 2007 (UTC)

The article says that Hirsch is a high-energy physicist. This is incorrect. He works in theoretical condensed matter, and is known for his criticism of the BCS mechanism for superconductivity. — Preceding unsigned comment added by 128.227.53.143 (talk) 19:33, 9 June 2012 (UTC)

Much too many irrelevant names

Indiscriminate copying of various lists from unreliable sources does a disservice to the community. The second list, "Based on the ISI Web of Science", makes sense as long as only the solid state physicists are concerned. One may believe that if there are others with h>90, they are only one or two. Note that between Cardona and Heeger there is only a 17 point gap. On the contrary, the first list is already suspicious: it is unbelievable that Witten's index grew by 22 points in 4 months, or that between the #1 and the #5 high-energy physicist there is a gap of 47 points! These numbers are bogus.

The "Based on the ISI Web of Knowledge" list is a joke. WoK is no different from WoS, and 54-71 is not large enough to warrant mentioning in Wikipedia. Knowing nothing about economics, I was able, using WoK and ISI Essential Science Indicators, to find an economist, J. Tirole, with h=46, that is, 30% higher than the alleged highest-h economist, Andrei Shleifer!

If there will be no objections here, I am going to delete most of the names from this article.

I think having some list illustrates the method, but from my previous comments you will realize that I do not consider the method valid, and am therefore not the one to judge either way. These lists have appeared on the internet, probably in stable places. Why not simply link to them? DGG 05:28, 23 January 2007 (UTC)
This is fine with me. Unfortunately, most of these lists do not have URLs listed, and others are disclaimed at the original web pages as random entertainment of compilers with no claims at actual accuracy. I am going to verify the first list with WoK, add a proper disclaimer and delete the rest. —The preceding unsigned comment was added by Mdina (talkcontribs) 23:55, 25 January 2007 (UTC).

Comment: the previous longer lists may have had some value as tracking lists for Wikipedia, to identify which highly rated scientists did and did not yet have entries on WP. This useful aspect is lost if the lists are deleted.

In some cases the lists also identified scientists of the first rank of importance with notably low H-index scores -- a valuable check against uncritical overenthusiasm for the measure. Jheald 09:59, 26 January 2007 (UTC)

The comments based on low ranking scientists would have been relevant had all scientists in the field been covered, but only the top few percent were included in the first place. If a scientist is defined as someone who publishes a scientific article, since about half of them never publish another, the average h for a field is 2. By the time we get to h values of 20 we are already talking about the top 5 or 10%. some of the above comments are unduly restrictive in terms of WP coverage. DGG 02:03, 3 February 2007 (UTC)DGG 02:04, 3 February 2007 (UTC)

I agree that the lists are too long for the average reader and in proportion to the article as a whole. Maybe some lists should be outsourced into a separate page and linked in. Especially the list limited to Stanford and Berkeley physicists is overly specific and should be removed. Jakob Suckale 13:58, 2 October 2007 (UTC)

Inspiration

The h-index is an original idea, but is Hirsch the first to propose that type of construct? Here is something about Sir Arthur Eddington: "His cycling record was coded in a simple Eddington number E, where E was the number of days in which he had cycled more than E miles" ( J.Barrow, The Constants of Nature (ch.5) p.83) al 09:49, 4 February 2007 (UTC)
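
The structural parallel with the Eddington number can be made concrete with a short Python sketch. The function name, the strict flag and the sample rides are assumptions made for this illustration. The Barrow quotation says "more than E miles", which corresponds to strict=True; the "at least E" reading (strict=False) mirrors the h-index definition exactly.

```python
def eddington_number(daily_miles, strict=False):
    """Largest E such that on at least E days the distance was >= E miles
    (or > E miles when strict=True, following the wording of the quotation)."""
    ranked = sorted(daily_miles, reverse=True)
    e = 0
    for rank, miles in enumerate(ranked, start=1):
        qualifies = miles > rank if strict else miles >= rank
        if not qualifies:
            break
        e = rank
    return e

# Hypothetical ride log, in miles per day.
rides = [5, 12, 3, 40, 8, 8, 21, 4]
print(eddington_number(rides))               # "at least E" reading
print(eddington_number(rides, strict=True))  # "more than E" reading
```

Replacing days by papers and miles by citations turns the non-strict version into the h-index computation.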

Error in Description

I believe the following summary statement from the article is in error:

In other words, a scholar with an index of h has published h papers with at least h citations each

The point is that others have published h papers citing the author's papers at least h times each, not that the author has published h papers. I think the statement is confused about the meaning of 'citation'. —The preceding unsigned comment was added by 71.202.248.81 (talk) 14:44, 5 February 2007 (UTC).

Sorry, but that does not make sense: 50 people have published 50 papers citing the work of author X. The formulation you propose would mean that each of the 50 papers would contain 50 citations. The original sentence means not that the author's 50 papers each contain 50 citations, but that there are 50 papers, each of which has been cited by others 50 times or more. DGG 02:57, 6 February 2007 (UTC)
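
To make the reading of "citation" in this exchange concrete, here is a minimal Python sketch. The input is the number of times each of the author's papers has been cited by others (incoming citations), not the number of references each paper contains. The function name and the sample records are made up for the illustration.

```python
def h_index(times_cited):
    """times_cited[i] = how many times paper i has been cited by other papers.
    Returns the largest h such that h papers have been cited at least h times each."""
    ranked = sorted(times_cited, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites < rank:
            break
        h = rank
    return h

# Author X: 50 papers, each cited 50 times or more by others (hypothetical).
author_x = [50] * 50
print(h_index(author_x))

# A smaller hypothetical record: four papers have at least 4 citations each.
print(h_index([10, 8, 5, 4, 3]))
```

Adding uncited papers to either record leaves the result unchanged, since they never reach the threshold.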

Search reliability

After trying some of the software I added a paragraph on the (un)reliability of automated citation counts. I am not sure how reliable the numbers are, but my hand count using Web of Science made me a 25, and automated processes claimed I was a 12. I have also seen this count issue arise in citation reports in tenure proposals. George Phillies

This counting issue is probably more important in disciplines where more work is published in the form of books or book chapters. My h-index comes out exactly correct at 25 in ISI's WoS. Of course, that does not take into account any citations to my work made in books or chapters. Crusio 10:09, 27 March 2007 (UTC)

m value and v-index

Somebody anonymous has added comments on an "m value" and "v-index". This should be explained or otherwise removed, I think. Crusio 12:07, 27 March 2007 (UTC)

POV in Criticism section

I've put a POV template towards the end of the h-index#Criticism section because the following text appears to have an axe to grind. However, the points made there could be legitimate — if written in a neutral way, and certainly with sources. Unfortunately, as it stands it appears that the questionable content represents the author's own criticisms of the h-index. Ryan Reich 00:06, 6 May 2007 (UTC)


Alternative Introduction

A change to the introductory sentence. I argue that this sentence, which currently reads as :

“The h-index is an index for quantifying the scientific productivity of scientists based on their publication record. It was suggested in 2005 by Jorge E. Hirsch of the University of California, San Diego.”

Should actually be rephrased more along the following lines :

"The h-index is an index for attempting to quantify scientific productivity based upon individual publication records (and not, for example, competition-based or exam based performance). "

The advantage of this approach is that it avoids misleading the reader into believing that the index is either a widely respected or even widely applicable indexing measure. It also shows that very important aspects of individual performance (such as competition performance or examination performance) are not taken into account when trying to quantify scientific productivity and so forth.

Of course, you could argue that the sentence was short before, though it still is – except now it adds a little information concerning the possible future use of the index. It would also be a good idea to discuss whether or not the actual use of the index is controversial. If it is controversial, then the page should be tagged – especially if the index is used to determine government funding allocation and the like.

The part about the guy from San Diego has been cut out. It probably isn't relevant to what the h-index actually *is*.

ConcernedScientist 22:44, 7 May 2007 (UTC)

My first impression is that, regardless of your opinion of the h-index itself, the identity of its creator is certainly relevant to this article. Your argument implies that you would consider justified the removal of all attribution to the creator of any concept, since for the most part their identities are irrelevant to the concepts themselves. My second impression is that although your proposal appears to advance the cause of neutrality, making a statement like "(and not, for example, competition-based or exam based performance)" is making a claim about both of those two "metrics" which is at least as loaded as the original one made about the h-index. Simply saying "The h-index is an index for attempting to quantify scientific productivity based upon individual publication records" gets your point across (the h-index is not universally accepted for what it claims to accomplish) without making additional, more insidious claims. However, I feel personally as though saying that the h-index "attempts to quantify" in some way minimizes it. Why don't we say instead that "The h-index rates a scientist's productivity based on the number of their most influential papers."? A rating is just what you want: an attempted quantification, and in addition, it gets across the gist of the index in the first sentence. We can put the bit about Hirsch in the second sentence, since as you say it's not relevant to the actual index...though it is relevant to the article :) Ryan Reich 23:23, 7 May 2007 (UTC)


The h-index does actually quantify scientific productivity (turns scientific productivity into a number), it does not simply "attempt" to do so. If the quantity is relevant as a measure or not is another discussion and does not change the fact that the h-index is a quantification of scientific productivity. Another way to quantify scientific productivity would be to count the papers written by the scientist. Yet another way to quantify scientific productivity would be counting all papers read by the scientist. Neither of these quantifications are very good measures of scientific productivity (or its quality), but they would be quantifications nevertheless. If we want to convey the information that the h-index is not a widely accepted index, it is better if we simply write that.
I do not see the need to say what the h-index is not based on. There are many things that the h-index is not based on (e.g., number of dollars in funded research, number of conferences attended, number of journal articles reviewed, number of hours spent working as a journal editor, and so on).
One of the reasons for saying who came up with the index is to give the reader an insight into why it is called as it is.
I'm reediting the intro so that the name of the index and its connection to the inventor is more apparent. I'm also editing the sentence in accordance with my discussion above. Furthermore, I think that the phrase "publication record" could be expanded to something that better conveys what the h-index is and I'll attempt to address that too. Rrelf 03:41, 8 May 2007 (UTC)
I like what you said, I cleaned it a little, and I added a bit about use for groups (though I wonder if it is not too much detail for an intro paragraph). I am also not yet sure about the wording of the last sentence. It is not universally accepted, but it is accepted to some extent as one of the frequently used metrics. I'm not sure I'm happy about that, but that's just my 2 cents and doesn't count. DGG 05:19, 8 May 2007 (UTC)
Nice! I changed the order of the sentences slightly and got a better flow. Rrelf 08:23, 8 May 2007 (UTC)

Prolificity/productivity or impact?

In its current form, the article states that the h-index is a measure of a researcher's productivity. I do not really agree. A very productive researcher, cranking out a new paper weekly, could still have a very low h-index if nobody cites all these articles. On the other hand, as the article notes, the h-index has an upper bound defined by the number of articles that someone has produced. Nevertheless, it appears to me that the h-index is more a measure of the impact of the research someone has done than of this person's productivity. Before making any changes though, I'd like to see what other people think about this.

--Crusio 19:22, 4 September 2007 (UTC)


I tend to agree but with some caution. One possible reading is that h "quantifies productivity" in terms of good papers, i.e. only papers satisfying some condition are counted. This is different from simple productivity measured by the overall number of papers. Another reading is that h shows how good your top papers are, i.e. having an index h means that your best papers have been quoted at least h times. So it says something about the impact of your work. Non-quoted papers are not really counted, so a modest writer of junk cannot be distinguished from a terribly prolific one. The author of few widely quoted papers however would be judged solely by their number. One is led to conclude that the index quantifies 'good' productivity, which is to say that it also qualifies. Of course, the interesting feature of the h-index is its ambiguity: for standard cases it can be seen either as a number of papers or as a number of citations. In non-standard cases the ambiguity breaks down. Anyway the word productivity is repeated 4 times in the first 6 lines, which gives it undue emphasis. al 11:24, 8 September 2007 (UTC)

Good point. I have slightly modified the introductory paragraph. Have a look whether you like this better. --Crusio 15:12, 8 September 2007 (UTC)
Better is the enemy of good: I also attempted to rewrite. If you disagree and decide to revert, please insert a "both" i.e. 'based on both the number... and...'. The other reason were the 5 'science' words in the first 4 lines. Probably someone will object to the adjective 'hybrid' but I like it; 'his/her' is awkward but pc. al 10:31, 9 September 2007 (UTC)

Not widely accepted

The following phrase was removed: "The h-index is currently not a widely accepted measure of scientific productivity, but only one of a number of possible indices.[citation needed]" I think this sentence should be left and the [citation needed] removed. I don't think that such a general statement needs any citation. If the statement had said that the h-index was a widely accepted metric, then it would have needed a citation. Rrelf 08:07, 12 September 2007 (UTC)

Well, since we have no idea what goes on in most departmental hiring/tenure committee meetings, who is to say that the h-index is not used? Furthermore, the h-index is only a formalization of a process that such committees have used for a long time; "Dr Fulani sure has a lot of citations." "Yes, but has only published the one paper that anybody cares about." SolidPlaid 08:45, 12 September 2007 (UTC)
I am sure the h-index is being used, but the sentence did not say anything about whether or not it was used. It only said that it was "not a widely accepted measure of scientific productivity" and that it is "only one of a number of possible indices". It can very well be used without being widely accepted. Both statements are quite obvious though (both statements are true for almost all imaginable indices) and therefore I don't think they need a citation. I think, however, that they should be in the article as they are very useful for conveying that the h-index is not the standard metric nor the most widely accepted metric for scientific productivity. Rrelf 12:19, 12 September 2007 (UTC)
I re-inserted citation needed. It's an empirical question whether the h-index is or is not widely accepted. If the article is going to assert that it's not widely accepted, there needs to be some evidence for the claim. It's not obvious at all. In my university the h-index is an explicit part of the hiring and promotion evaluation process.--Cooper24 (talk) 11:14, 21 January 2008 (UTC)

In the same vein, I removed the sentence stating that the h-index had yet to replace other bibliographic indices. First, it is not true, at least in some institutions. For instance in my department, group leaders are evaluated and compared using h-index and h-index trajectories. But even if it were "true", there would be no way of proving it. Plus the sentence did not bring any useful information. Nicolas Le Novere (talk) 10:21, 8 May 2008 (UTC)

weaknesses of h-index

I have tried to understand how exactly h-index is computed. It appears to me that there is another weakness of h-index that is not discussed in the article.

H-index relies on information from the Web of Science. However, the Web of Science itself is not good at dealing with citations of several versions of the same paper, particularly pre-publication versions and the published version. At least in my field, mathematics, it is common for a prepublication version of the paper to circulate in a preprint form. In fact, there is often more than one pre-publication version of the same paper and sometimes the title changes a little as well. It is rather common to cite a preprint version of the paper (since the journal publication speed in math is rather slow) and very many such references appear in print. Thus it is rather common for a single paper to appear in the person's Web of Science record several times (as many as 4), corresponding to different versions, with the total number of citations divided between them. This is exacerbated by the fact that there is no uniform style of citing preprints and even the same preprint version can be cited in several different ways, further splitting the Web of Science record of the author. This often produces the following effect. Say there is a paper for which the total number of citations is 25, with 10 citations for the published version and 7 and 8 for each of the two preprint versions. In the Web of Science record this paper will appear as 3 separate entries with citation hits of 10, 7, 8. I looked up my own record and this splitting phenomenon seems to be very pronounced. My understanding is that this has a rather substantial effect on the h-index value.
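
The splitting effect described above can be sketched in Python. The record below is invented: paper "P2" stands for the 25-citation paper split into entries of 10, 7 and 8, and the other counts are chosen only so that the effect is visible. Merging entries by a known paper identifier is an idealization; matching real database entries to papers is exactly the hard part.

```python
def h_index(citations):
    """Largest h such that h entries have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)

# Hypothetical record: paper id -> citation counts of its separate database entries.
record = {
    "P1": [30], "P2": [10, 7, 8], "P3": [20], "P4": [18], "P5": [15],
    "P6": [14], "P7": [13], "P8": [12], "P9": [12], "P10": [12],
    "P11": [12], "P12": [12], "P13": [4], "P14": [2],
}

as_listed = [count for parts in record.values() for count in parts]  # split entries
merged = [sum(parts) for parts in record.values()]                   # one entry per paper

print("h from split entries: ", h_index(as_listed))
print("h from merged entries:", h_index(merged))
```

In this made-up record the split entries give h = 11 while the merged record gives h = 12, because the 25-citation paper drops out of the top group once it is divided.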

I was wondering if anyone has seen any references discussing this aspect of h-index.... Regards, Nsk92 13:28, 13 September 2007 (UTC)

I haven't seen any references about this, but obviously it happens. Also, WOS only counts certain types of publications (articles in covered journals). Books, book chapters, or articles in non-covered journals are not counted. If somebody has significant amounts of citations to such items, the h-index will have to be adjusted manually. However, isn't this always the case with automatic procedures, that one has to look at them carefully? A more serious problem, perhaps, is that citations from the non-covered items listed above to covered items are not listed in any database. This may cause a considerable bias in fields (like social sciences and humanities) where books are the preferred way of publishing. --Crusio 14:04, 13 September 2007 (UTC)

I don't think that this is really a 'weakness of the h-index', it is rather a problem of collecting reliable data for its calculation. The h-index is a single number that somehow characterizes a whole corpus of data; if the data is lousy it cannot be much better. However if the splitting of citations is taken to be a basic phenomenon, their distribution would deviate markedly from ~1/n and then the h-index and some other metrics should be ditched, along with ideas from Zipf, Pareto & Co. al 22:49, 13 September 2007 (UTC)

For the abstract concept, it is true that this is not a weakness of h-index but rather a problem of data collection. But as a practical matter it is a problem with h-index, since the data collection sources currently used/available for its calculation (WoS, Google Scholar, QuadSearch, etc) all have a widespread problem with splitting references. I looked up the original records in WoS for myself and a number of other people in my field and the splitting references problem is so bad there that it is not really possible to compute the h-index manually using their records (well, it is possible, but it will take a couple of hours of painstaking work). GoogleScholar and QuadSearch are a little better but not by much. Interestingly, I observed that the split references phenomenon may both lower and raise one's h-index. For relatively young researchers it usually lowers their index if one paper with, say, 25 citations total, is split in the records as, say, 10-7-7-1. For more senior authors with several extremely highly cited papers/monographs, reference splitting may increase their h-index. For example, you have a monograph with 400 citations total. If it is split as 100-100-100-100, this will raise the h-index if its true value was under 100. I saw this effect clearly at play in the record of a famous and prominent modern mathematician, Mikhail Gromov. When you search his record in QuadSearch under "M.Gromov", you get an h-index of 43. But it is apparent that several of his papers and monographs, because of the slight differences in the way they are cited, are split about 4-ways each. It is hard to say what his actual h-index is, but it could easily be, say, 34 or 53.
I also looked up my own record in QuadSearch. I am a relatively junior mathematician, and just got tenure a year ago. When I searched my record in QuadSearch (using first initial and the last name seems to produce the most complete results), I got an h-index of 10. But it is clear that most of my records are substantially split and it is really hard to say what the true number is (even just using their records that exclude citations in books). It might easily be 12-15. My informal impression is that currently tools like QuadSearch and GoogleScholar usually produce an h-index with a plus/minus 10 margin of error and probably a rather high average deviation from the correct answer. As a practical matter, this seems to me to be a substantial weakness of h-index, at least for the moment. Regards, Nsk92 05:21, 14 September 2007 (UTC)
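
The two directions of the effect described in this comment can be checked with a small Python sketch. The split patterns (25 as 10-7-7-1, 400 as 100-100-100-100) come from the comment; the rest of each record is invented so that the shift shows up, and says nothing about any real author.

```python
def h_index(citations):
    """Largest h such that h entries have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)

# Junior author (hypothetical): one 25-citation paper plus ten papers with 11 citations.
junior_true = [25] + [11] * 10 + [2]
junior_split = [10, 7, 7, 1] + [11] * 10 + [2]

# Senior author (hypothetical): a 400-citation monograph plus many mid-cited papers.
senior_true = [400] + [50] * 45 + [10] * 20
senior_split = [100] * 4 + [50] * 45 + [10] * 20

print("junior:", h_index(junior_true), "->", h_index(junior_split))
print("senior:", h_index(senior_true), "->", h_index(senior_split))
```

With these invented numbers the junior record drops from 11 to 10, while the senior record rises from 46 to 49, since each 100-citation fragment still clears the threshold and counts as a separate paper.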
People with large publication records need not worry about the splitting of citations. Ideally, h is the solution of C/x=x, an equation which admits the obvious rescaling (C/k)/x=x/k. If due to splitting, the hyperbola of citations is decreased k times, the number of papers increases by the same factor, but the ranking appears to be decreased by the same number. To put it otherwise: a paper with rank r had R citations; after rescaling, the same value R would correspond to the lower rank r/k. We should not be in a hurry to ditch the h-index just because of the split citing. 91.92.179.156 21:31, 14 September 2007 (UTC)
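
The rescaling argument can be written out step by step. This is only a sketch under the comment's own idealization that the paper of rank r has C/r citations, plus the added assumption that every paper is split evenly into k entries; the symbols c, c', r' and h' are introduced here for illustration.

```latex
% Idealized citation curve: the paper of rank r has c(r) citations.
\begin{align*}
  c(r) &= \frac{C}{r}, &
  c(h) &= h \;\Longrightarrow\; h = \sqrt{C}. \\
% Even k-way split: the paper of rank r becomes k entries near rank r' = k r,
% each carrying c(r)/k citations.
  c'(r') &= \frac{1}{k}\, c\!\left(\frac{r'}{k}\right)
          = \frac{1}{k} \cdot \frac{C}{r'/k}
          = \frac{C}{r'}, &
  c'(h') &= h' \;\Longrightarrow\; h' = \sqrt{C} = h.
\end{align*}
```

Under these assumptions the curve maps onto itself, so h is unchanged. The invariance relies on the exact 1/r shape and on uniform splitting, so it does not cover records where only a few papers are split.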
  • removed item: I have removed the following phrase: "*Some potential drawbacks of the impact factor apply equally to the h-index. For example, review articles are usually more cited than original articles, so a hypothetic author who would only write review articles would have a higher h-index than authors who would actually contribute original research." The reason for this is that I feel that it is debatable whether this is a drawback or not. It all depends on how one sees "reviews". If you consider them simply to be enumerations of published primary research, then I agree that they have not much value and that citations to these reviews unjustifiedly inflate h-indices and impact factors. However, people often put forward very original ideas in a review paper, and the influence/impact this has should be reflected in the h-index of that person and the impact factor of the journal that published the review. I'll be interested to see whether others agree with this position. --Crusio 16:44, 21 September 2007 (UTC)
I am OK with this modification. Regards, Nsk92 19:21, 21 September 2007 (UTC)
I am not quite that OK, because I think this is a more general matter. (We need a good article on "scientific reviews".) It is no indication of lack of importance, just that it's a separate population with a different distribution. Maybe we should try rewording? DGG (talk) 00:56, 22 September 2007 (UTC)
Yes, I think it may not be a bad idea to reword the original paragraph and replace the word "drawback" by something more neutral. E.g. something along the lines: "Similarly to the impact factor, the h-index has been criticized for rewarding review articles that are often more highly cited than original research articles." I did not object to Crusio's edit for several reasons. The main reason is that this particular criticism of h-index (as are most others) is unsourced in the main article. Second, the review issue mentioned appears to me to be relatively minor in the grand scheme of things. Third, (and this reflects my personal POV), I myself think that writing reviews is a valuable activity and should be encouraged.
I would really like to see the "Criticism" section of the article better sourced and documented. I happen to think that the split citation phenomenon is a much more serious problem with h-index than many others, but I would not put it in the article unless I find a reasonable source to cite. Regards, Nsk92 06:06, 22 September 2007 (UTC)
A crude numerical test apparently confirms the robustness of h with regard to split citing, but OR is anathema in WP. In Criticism(4) the problem seems to be addressed in general terms such as 'limitations of data bases' and 'not properly matching'.
Next item, Criticism(5), states a fact (the number of authors is ignored), then strays from the topic with suggestions and hypotheses; gaming illustrates the point but does not look realistic. Collaborative effort is not simply additive and in many cases partial contributions are better evaluated by dividing by sqrt(number of authors)[citation needed]. al 17:41, 22 September 2007 (UTC)
this page should simply be about h factor--we probably need a good page or 2 to give a general discussion of journal metrics. And discussions about scientific authorship in general should go elsewhere as well--this is not the place. The meaning and evaluation of coauthorship might well be a separate article, there has been enough written about it, including dozens of articles by Garfield. My personal view is that it is so different in different subjects that there is no general method. And between people too-- James Watson in his autobiography explains why he almost never added his name as a coauthor to his students' work. That was a good deal of the ethos in molecular biology generally. My advisor Gunther Stent only added his when he actually made a contribution to the work, or totally rewrote the ms, or when he thought a beginner's paper needed his name as support. In medical science it seems to be very different. And high energy experimental physics is another game entirely. 06:57, 3 October 2007 (UTC)

Reverted

I am not starting a revert war: I sincerely believe the old introduction was much better. At least it offers a general description of what the h-index is instead of giving an inaccurate definition which is reworded precisely a few lines below. Reading the definition of h for the first time is always confusing because it is one number for two different things: number of papers and lower limit of the number of citations; 'hybrid' is a good word here. Many arguments can be found in the preceding discussion so there is not really a point in repeating them. Please read at least the discussion about 'attempting'. It seems obvious that 'the set of most quoted papers' is a subset of all papers. The image fails to offer any insight or real explanation. It is much better to look up the original paper. The bright idea behind the h-index was to use rank statistics: arranging the papers according to decreasing number of quotes and relating the two, i.e. rank and citations. Or graphically: a falling hyperbola (C/x) and a rising slope x. And this is in fact the recipe to find h manually: 1/ make a list of papers; 2/ write in front the number of quotes; 3/ arrange in decreasing order; 4/ start counting lines; 5/ stop when the line number becomes greater than the number written on it; h is the last line number reached before that. al 22:02, 6 October 2007 (UTC)
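For anyone who wants to check the five-step recipe above by machine rather than by hand, here is a minimal sketch in Python. The function name h_index and the sample citation counts are made up for illustration; they do not come from Hirsch's paper or from any database.

```python
def h_index(citations):
    """Largest rank r such that the r-th most cited paper has at least r citations."""
    ranked = sorted(citations, reverse=True)   # step 3: arrange in decreasing order
    h = 0
    for rank, cites in enumerate(ranked, start=1):   # step 4: count the lines
        if rank > cites:   # step 5: line number exceeds the number written on it
            break
        h = rank           # this line still qualifies
    return h


# Illustrative (invented) citation counts for five papers.
print(h_index([10, 8, 5, 4, 3]))   # prints 4
```

The fourth paper has 4 citations and the fifth only 3, so counting stops at line 5 and h is 4.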

Original Research?

Where are we getting some of these h-index numbers from? Some have obvious references, they are ok. But the three tables that supposedly use the three databases of research come unreferenced. Where did we get the h-index data from in those tables? My concern is whether these h-index tables (say, the one for the SPIRES HEP Database) were computed by a Wikipedia user or came from a verifiable source. Keep in mind that the databases are constantly expanding not just with current research but also old research. Even if we ignored published research from beyond a certain point in time, these databases will continue to grow. -- KarlHallowell 23:00, 25 October 2007 (UTC)

they were actually from published sources, and if it's not clear, it needs to be reconstructed. But the intention is only to use them as representative values to illustrate the method. You are right that this has to be made clear--some of the information is over a year old. Given the sources (Scopus or WoS or whatever), anyone should be able to come up with the same figure--it's not really a problematic calculation, except for Google Scholar, where it takes careful consideration which items should be included. 08:35, 27 October 2007 (UTC)
The three tables illustrate mostly how far the different databases are from being complete. In all cases the estimates are biased in the same way: the number of citations is always underestimated. To obtain a more reliable value of h one has 1/ to compile the list of most quoted papers from all databases, 2/ to eliminate occurrences of the same paper with lower numbers of citations, 3/ to compute h.
Some comments here: the fuzzy set of most quoted papers will rapidly acquire shape with the first two or three sources and will not change significantly if more bases are used. The value of h will converge to a reasonable underestimate. With access to N sources it would be instructive to compare results obtained from different subsets including N-1 sources.
Of course this would be original research, but you might get a grant for doing it. al 07:53, 28 October 2007 (UTC)
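The three-step merge described above, plus the suggested comparison over subsets of N-1 sources, could be sketched roughly as follows in Python. Everything here is hypothetical: the three "databases", the paper labels and the citation counts are invented, and the sketch assumes the same paper carries an identical key in every source, which is exactly the matching problem real databases do not solve for you.

```python
from itertools import combinations


def h_index(citations):
    """Count the ranks r at which the r-th most cited paper has at least r citations."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)


def merge_sources(sources):
    """Steps 1 and 2: pool all papers, keeping the highest citation count per paper."""
    merged = {}
    for source in sources:
        for paper, cites in source.items():
            merged[paper] = max(cites, merged.get(paper, 0))
    return merged


# Invented data: each dict maps a paper label to the citations one source reports.
db_a = {"paper A": 12, "paper B": 7, "paper C": 4}
db_b = {"paper A": 9, "paper B": 8, "paper D": 4}
db_c = {"paper B": 5, "paper E": 6}
sources = [db_a, db_b, db_c]

h_each = [h_index(db.values()) for db in sources]
h_all = h_index(merge_sources(sources).values())          # step 3 on the merged list
h_leave_one_out = [
    h_index(merge_sources(subset).values())
    for subset in combinations(sources, len(sources) - 1)  # every subset of N-1 sources
]

print(h_each, h_all, h_leave_one_out)   # prints [3, 3, 2] 4 [4, 4, 4]
```

With these invented numbers each single source underestimates h, while any two sources together already give the same value as all three. That only illustrates the convergence claim; it is not evidence for it.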

Rankings

The lists of rankings of individual scientists do not belong here, any more than a list of IFs would belong in that article. I'm not sure where to put them, though. But then, I'm not sure they are in the least encyclopedic at all, as a list. Suggestions welcome. DGG (talk) 10:43, 21 December 2007 (UTC)

I agree, not useful and not encyclopedic. I would suggest removing them completely. In addition, to keep these lists updated, original research is needed. --Crusio (talk) 10:46, 21 December 2007 (UTC)
Not true. The indices can be checked at the databases mentioned by typing in the person's name, having the database sort the list by times cited (usually via a pull-down menu) and scrolling down until the number on the article is greater than the number of times cited. This is no more original research than might be used to gather information from the US Census website, or for finding an Erdős number for a particular mathematician (see List of people by Erdős number). The article needs a few examples, although all these lists may be a bit much. AnteaterZot (talk) 10:58, 21 December 2007 (UTC)
I think it may be appropriate to include those people's h-indices that were in Hirsch's original paper, but I agree that accumulating lists from various disciplines is not a good thing. Especially, as DGG often points out, the databases don't go back in time very consistently and often produce incorrect results for cases whose output is not entirely in the digital age, this tips it into WP:OR for me. Pete.Hurd (talk) 18:32, 21 December 2007 (UTC)
I agree that these lists are not useful--it is likely one or more of the h values listed changes every day. Is someone going to spend their life updating these? And what's the point of including long lists of specific names? As was pointed out earlier, many of these names seem arbitrary. How do these lists help people understand what the h-index is? A couple of examples from the original Hirsch article, as someone else suggested, would make sense as an illustration, and maybe a list of the top h in each field (although I don't think it's easy to discover this without an exhaustive search).--Cooper24 (talk) 11:08, 21 January 2008 (UTC)
WP:Be Bold! --Crusio (talk) 15:44, 21 January 2008 (UTC)
Go for it. T DGG (talk) 06:09, 22 January 2008 (UTC)

Date h-index was invented

I'm certain the H-number has been around since before 2005 -- even if it was not published until then. I removed "in 2005" from the first sentence of this entry. —Preceding unsigned comment added by 135.245.8.37 (talk) 21:46, 19 January 2008 (UTC)

If you're certain, where's the citation?--Cooper24 (talk) 11:08, 21 January 2008 (UTC)
Eventually see above Inspiration. al (talk) 11:58, 23 January 2008 (UTC)

Splitting of references phenomenon

I would like to bring up again the matter of the split references phenomenon, to see if the others have encountered similar problems and if this phenomenon has been mentioned somewhere else. Briefly, the problem is that the search tools used to calculate h-index (such as Web of Science and Google Scholar) frequently do not register correctly the citations of a particular work, and count them as citations of separate works. This is due to the fact that many papers are first cited as preprints, or as "to appear" papers (without the volume number and the year), and because of other minor variations in how a paper is cited even after it has appeared (e.g. sometimes the issue number is omitted, etc.). In many instances people are also fairly lazy in updating their references (they just cut and paste) and continue to refer to a paper as "to appear" or as a preprint, even after it has appeared in print. In my experience, the Web of Science is particularly terrible in this regard: even very minor variations in how exactly the paper is cited often result in split references. I have recently repeated an h-index experiment for Mikhail Gromov, one of the most influential and famous modern mathematicians.

1) The h-index search on QuadSearch [1], for "M Gromov", with disciplines restricted to mathematical and physical sciences. The resulting h-index was 41.

The total number of articles referenced was given as 542, while in reality, a MathSciNet search shows that Gromov has only 121 published papers.


2) The results of a Web of Science h-index calculation are quite different.

First I did an author search for "Gromov, M*" and then refined the discipline to mathematics. The result was a list of only 49 articles. When I sorted them by times cited, and computed the h-index, the value produced was only 19!

I then did a "cited reference" search under "Gromov, M*". I got a list of 967 (!) items attributed to Gromov (the great majority of them in Math). The list clearly demonstrates the pervasive nature of the split reference phenomenon, with most references split multiple ways. Of course, it was not possible to extract an h-index value from a list of 967 items.

The experiment shows that the results of h-index computations produced by Web of Science search and by QuadSearch (that seems to use Google Scholar) are widely divergent. I don't know what Gromov's actual h-index is, but I think it is a lot closer to 41 rather than to 19, and, in all likelihood, is greater than 41. A more careful analysis of the QuadSearch results shows that there is substantial reference splitting that occurs there as well (not surprising, with 542 items listed where only 121 papers have been published), although not as bad as in the Web of Science.


I did a similar experiment for another famous mathematician, Steve Smale, and again got widely divergent results for his h-index: 41 via QuadSearch and 21 via Web of Science, with substantial reference splitting in both cases.


I would really like to know how widespread this kind of phenomenon is in other disciplines, and if anyone has written about it anywhere. Regards, Nsk92 (talk) 23:05, 19 March 2008 (UTC)

it is in my opinion poor invalid work to use automatically calculated values from any of the databases. There are discussions of this and other problems in quite a number of sources by now. The best general place to find them all is the SIGMETRICS mailing list. DGG (talk) 18:24, 20 March 2008 (UTC)
I'll look up SIGMETRICS, thanks. However, if not using automated indices, then how is one supposed to compute h-index in practice? Especially for someone like Gromov, whose total number of citations is in the thousands? Regards, Nsk92 (talk) 23:12, 20 March 2008 (UTC)
one advantage of h index is the ease of manual computation. Get a list of articles and the number of citations for each. Sort them in descending order. Count down from the top. As a bonus, you do not have to even consider the papers with only a few citations which will be in the majority for almost everyone, or be concerned that you might miss some minor ones. DGG (talk) 21:05, 8 May 2008 (UTC)
Sorry, but my experience is exactly the opposite and I have found manually computing h-index essentially impossible, at least in mathematics, which is my field. The problem is with getting "a list of articles and the number of citations for each". If you do manage to get an accurate list like that, then, sure, computing h-index manually is easy. The difficulty, however, is that the currently available tools are extremely poor in providing accurate lists of this sort. The Web of Science is simply terrible in terms of the split citation phenomenon (not to mention the fact that it does not count citations in books and monographs). GoogleScholar is a little better with split citations but still pretty bad. GoogleScholar has another problem in that it also counts citations that occur in pre-publications, such as preprints in ArXiv and then again when these preprints are published as research papers. Both WoS and GoogleScholar also have a problem with filtering the results for right names (which often causes difficulties if the name is fairly common or if the person had used several spellings of their name, had variations in using or not using their middle name etc; this often happens with foreign-born scientists). I suggest that you conduct an experiment and try to get an accurate list of papers with citation hits from either WoS or GoogleScholar for a few mathematicians, from very famous to just notable to maybe a few non-notable. I found that the only times when I was able to compute h-index manually with any degree of accuracy was when it was very low, around 3 or 4. Once it gets over 10, it is complete mayhem and over 20 it is simply impossible to do anything by hand when trying to create an accurate citation list. You can try to look at the WoS or GoogleScholar records of Mikhail Gromov (who is sometimes cited as Misha Gromov, Michael Gromov, etc), Steve Smale, and Alexander S. Kechris.
All are well-known and influential mathematicians but I have no idea what their real h-index is. Nsk92 (talk) 23:26, 8 May 2008 (UTC)
That is a separate problem. There has been considerable discussion, of course--the SIGMETRICS list is the central place, especially since Garfield posts there abstracts of all work relating to bibliometrics, broadly defined, that is published elsewhere. The most recent publication is indexes. The two main difficulties, as you say, are the split indexing problem, and the coverage of non-journal publications. The field which has been most studied is, not surprisingly, information science, and the problem here is whether or not to include conference proceedings. WoS at present includes those published in journal format, or in collections which can be referred to as if they were journals. -- it also does include material published in books and the like.
I do have a bias towards WoS, partially because my own subject is molecular biology--and WoS does perfectly there, which is again not surprising, because it was specifically designed for that subject; the beta was Garfield's 1963/64 print "Genetics Citation Index", prepared on a NSF contract. As the development of the science was in practice almost entirely Anglo-American, with almost no Russian contributions for many decades, the Slavic name problem is not prominent. As the work cannot be done outside of fairly large size organised laboratories, there is also relatively little fragmentation. And almost all the important symposia are in fact published in series that are referred to as if they were journals. I also am comfortable with it since I have known it since the beginning, and know how to deal with the quirks and inadequacies.
I have a bias against GS, for it is impossible to actually know their standards. In particular, they include not just formal publications, and informal versions of them, and conferences of all sorts, peer-reviewed or otherwise, but also blog postings of various sorts that fit an apparently rather rough idea of respectability, primarily being published in .edu, which is a very rough standard indeed, much wider than accepted as RSs in Wikipedia.
the preprint problem is different--WoS essentially ignores them, but GS does a fairly good job of combining them as versions with the published paper. It does not get all of them, but it is usually fairly accurate if the title is identical. In my experience for physics, arXiv does an excellent job, for it does not combine merely by algorithm, but allows manual specification of the grouping--as does RePEc. The problem here is not with the bibliometrics, but with the publication system. I remind you that it cannot be assumed without inspection that what is claimed to be a preprint is identical to the paper--people have been known to modify extensively.
So I'm not really clear what you propose instead--any of the suggested modifications of this measure would have the same objection in terms of the list of documents. DGG (talk) 00:13, 12 May 2008 (UTC)
I still do not see why it should affect preferentially the really major authors, especially since the intrinsic weakness of the h index is that it does not distinguish well at the very top of the spectrum. If someone has published, say, papers with citations 100 20 19 18 17 16 ... it does not matter whether the first paper is cited 100 times or 120 or for that matter 200, for the value will not be affected. It should be a practical problem more at the low to middle levels, where it is not obvious which apparently separate papers should be combined.
But the problem of reference fragmentation is not peculiar to the h index--it will affect any attempt to retrieve or calculate, since it affects the base data.
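Both points in the comment above can be tried out on toy numbers. The sketch below is Python with invented data: the first list is the 100 20 19 18 17 16 example, cut off at six papers as an assumption (the original list is open-ended), and the "fragmented" list is a made-up case where three papers are each recorded as two separate entries whose counts add up to the true total.

```python
def h_index(citations):
    """Count the ranks r at which the r-th most cited paper has at least r citations."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)


# Insensitivity at the top: raising the most cited paper from 100 to 200 changes nothing.
base = [100, 20, 19, 18, 17, 16]
boosted = [200, 20, 19, 18, 17, 16]
print(h_index(base), h_index(boosted))             # prints 6 6

# Reference splitting (invented): papers with 12, 10 and 8 citations are recorded
# as 7+5, 6+4 and 5+3, while the three papers with 6 citations are recorded correctly.
true_counts = [12, 10, 8, 6, 6, 6]
fragmented = [7, 5, 6, 4, 5, 3, 6, 6, 6]
print(h_index(true_counts), h_index(fragmented))   # prints 6 5
```

In this toy case splitting lowers h by one. How large the effect is on real records depends on how the citations are actually distributed across the split entries, which this sketch does not model.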
I don't really have a good suggestion regarding how to deal with the problem of adequate indexes for now. Hopefully there will be better ones in the future. MathSciNet actually is getting much better in this regard: they now list references for all the items reviewed and, moreover, they actually try to update and cross-link these references when the papers they refer to do appear in print. Of course, MathSciNet has a big staff working on this and there is a lot of labor involved. Also, they still do not have the reference info on older papers although they are slowly expanding in that direction as well. All I can say is that, for all its faults, I will mainly stick with GoogleScholar for the time being (and I definitely would not try to compute h-index manually). My impression is that, on average, GS has a smaller margin of error than WoS for computing h-index (at least when it comes to mathematics). Nsk92 (talk) 00:29, 12 May 2008 (UTC)

Debasement

In an attempt to improve the Wikipedia article, I changed the words "the apparent scientific impact of a scientist" to "the apparent scientific effect of a scientist." In changing "impact" to "effect," I felt justified because I assumed that scientific articles should contain correct scientific terms. The word "impact" is a scientific term and has, for many years, meant "the striking or impinging of one body against another body." Only since the 1980s has it also been used as a lazy substitute by people who don't know the difference between "affect" and "effect." However, Crusio has reversed my change and has asserted that "This is not slang, 'impact' is an accepted term in this field." Thus, according to Crusio, this use of "impact" is not colloquial or beneath the standard of cultivated speech. This may be true since there do not seem to be any standards today, in speech or any other behavior. I would have thought that, of all possible fields, this use of "impact" would not be accepted in the field of science where it has unambiguously had its own meaning for quite a long time. The widespread usage of a word in the mass media does not mean that the usage is correct. This can also be seen in the current popular misuse of the phrase "begs the question."Lestrade (talk) 18:04, 11 June 2008 (UTC)Lestrade

Dear Lestrade, I agree with you about the deplorable state of English (I do know the difference between affect and effect, and all too often I see things in Wikipedia like "referances" and not just as a typo as some people use that systematically...). However, "impact" really is an accepted term in the field of scientometrics/bibliometrics, as can already be seen from the term Impact factor used to measure the "impact" of a journal. Etymologically, this may be incorrect, but it is a fact of life now.... --Crusio (talk) 18:12, 11 June 2008 (UTC)

Tant pis (shrug). Schopenhauer was right.Lestrade (talk) 19:18, 11 June 2008 (UTC)Lestrade

I agree with Crusio. By the way, according to the OED we've got Coleridge, in 1817, to thank for the earliest record use of "impact" meaning "effect".--Phil Barker 12:54, 12 June 2008 (UTC) —Preceding unsigned comment added by Philbarker (talkcontribs)
I also agree with Crusio. As a scientist myself, I can tell you that the word "impact" is very widely used now in academia exactly in the way Crusio used this term here. Nsk92 (talk) 15:06, 12 June 2008 (UTC)
the meaning is in fact a little more specific. The factor measures the cites in the first two years after publication. It thus describes not the total effect of the works published in the journals, which is arguably best measured by the total citations, but the relatively rapid impact in the narrow sense, upon the field, just as in the analogy of hitting a ball. Thus the factors are high in fast-moving fields like molecular biology, and low in descriptive biology. Garfield used language carefully. (detailed cites available, if not already in the article) DGG (talk) 04:57, 13 June 2008 (UTC)
There is actually a distinction between "impact" in a generic sense (the effect a scientist's work has on a field) and "impact" in the very specific bibliometric sense, as defined by the Institute for Scientific Information's Web of Knowledge citation database (link). (The latter is what DGG describes above.) Both usages are now well-established in the culture and tradition of science, so the term can be properly used here.Agricola44 (talk) 14:41, 20 June 2008 (UTC)

I believe that "…the word 'impact' is very widely used now in academia exactly in the way Crusio used this term here." In academia, one might also notice the wide use of "multitasking," "front–ending," "crashing," and "inputting." These may or may not be accompanied by the wiggling of fingers to signify quotation marks.Lestrade (talk) 21:58, 13 June 2008 (UTC)Lestrade

Not that it matters, but I wish Phil Barker would say where, in Coleridge's writings, the OED found that he used "impact" for "affect" or "effect." In Biographia Literaria, he used it to mean physical striking (Ch. V) and impressed contact (Ch. VIII).Lestrade (talk) 16:54, 19 June 2008 (UTC)Lestrade

B class

I've assessed this article as "b-class" and further categorized it in the "history of science" project, as I don't know how else to categorize a meta-discussion of citation. The article has an instructive illustration, clear and concise explanation and appropriate references. The external link section could be cleaned up, but the breakdown of different resources is helpful and (largely) clear. Some claims require more explicit inline citation (see the tail end of "criticism" or the "advantages" sections). The criticism section itself should be rewritten as a series of paragraphs, rather than a disconnected list. The second paragraph in "calculating h" provides an outstanding example of what such a section should look like. Likewise, the "advantages" section is a little unclear. Either way, the article is informative, NPOV and well sourced. Protonk (talk) 18:44, 4 July 2008 (UTC)

age

Can we just admit that the H-score is meant to measure the performance of academics within their age peer groups. You simply can't compare an old & young person's H-score. What an H-score tells you is when an old person has been either very prolific or has underperformed. So you can use H-score when hiring full professors, chairs, etc., or even when assigning administrative tasks & teaching, but you can't use it when hiring young people. 204.52.215.123 (talk) 20:09, 12 November 2008 (UTC)

irrational stuff about rational H index

This remark that one can refine the H index to get an index which can take values between n and n+1 and therefore is much better, is nonsense. If an H-index could easily be off by 1 or 2 or 3 or even 5 or 7 by using different databases, then it makes no sense whatsoever to rank university departments or top scientists by looking at differences in the first digit after the decimal point. Hopefully the work in reference [11] has been debunked by someone, and then another reference can be added that this proposal is thought to be nonsense by some reputable scientometricians. Gill110951 (talk) 11:06, 15 July 2009 (UTC)

Article issues

  • Slowly but steadily, the article seems to be filling up with personal opinions, original research, and unsourced statements. Especially the Criticism section is becoming a mess. Most of these criticisms seem to be based on the idea that the h-index is not a perfect measure of impact/productivity and that, in fact, such a measure can exist. Obviously, capturing a scientist's whole career in a single number is rather ludicrous (although I admit that this remark is as much OR and POV as what the article now contains). Much of this stuff should be cut back and only the sourceable statements should remain. --Crusio (talk) 01:21, 7 August 2009 (UTC)
Some of these are not problems. The article now does represent a range of views, and does not rely on primary sources. The comments about the weaknesses of it can perhaps be sourced better, but there is an immense amount of literature available. I agree that it needs rewriting and better organization--the only reason I have not done it myself is because I do not have a NPOV here--I have had from its introduction a similarly low opinion about the value of the measure as Crusio. Nonetheless, there is at least one article on various modifications and aspects of this in each issue of JASIS. It seems to have attracted a good many information scientists (in addition to the bureaucrats who like things over-simplified). DGG ( talk ) 13:56, 11 August 2009 (UTC)
DGG, while you're here, how widely is the h-index used in practice? I'm on our college P&T committee and when we found the h-index listed on one person's dossier, no one other than me had even heard of it. (I share your skepticism toward the index -- at worst it reduces "deans don't have to know your field, they just have to count" to "now deans don't even have to count." ;-) Short Brigade Harvester Boris (talk) 04:06, 12 August 2009 (UTC)
  • Over here in Europe the bean counters carry the day... The h-index is often asked for specifically when you apply for grants, positions, or promotions. Our institute is up for its 4-year evaluation and everybody here is calculating her/his h, the h of her/his group, and even the institute's h... This, of course, in addition to the almighty impact factor... Publications are listed in two different lists, one of them so-called "rank A" journals (IF>7). All the things we have to do to avoid actually reading papers!! :-) --Crusio (talk) 10:29, 12 August 2009 (UTC)
  • My impression is that use of H in tenure & promotion deliberations is on the rise, sometimes as a "sanity check" to lend confirmation to a decision and sometimes as a component of the decision itself. I have a friend at the senior professor level who says he now always checks H ... and he has sat on quite a number of P&T committees. Anecdotal, perhaps, but I think the trend is unmistakable. Respectfully, Agricola44 (talk) 19:25, 17 September 2009 (UTC).
I have the same impression. (And I am happy about that. A few years ago, only the number of publications was taken into account. A totally uninteresting paper, published because of shallow peer review, was equal to hugely acclaimed papers. And in my field, the advent of genomics led to papers with thousands of citations but hundreds of authors, which defeated the citation index. H-index is far from perfect, but as a bibliometric index, it is useful. Which does not mean we must judge only from bibliometric measures). It is on the rise, and it is respected. Our institute even used it to compare our production with other institutes, by superposing "clouds" of H-index, and also to trace the evolution of H-indices since faculty employment. Nicolas Le Novere (talk) 20:06, 16 March 2010 (UTC)
  • Just to add one more thing, it's a triviality to determine H using WoS by ordering publications according to number of citations. Respectfully, Agricola44 (talk) 19:27, 17 September 2009 (UTC).
I beg to differ. If your name is uncommon, mis-citations are very frequent (believe someone who has a space and an accent in his name). If your name is common, you gather the citations of all the homonyms. By experience, such a procedure is only trivial for 50% or so of scientists. MOREOVER, WoS is not a good source of citations for some fields, such as computing science. In fact, I systematically use WoS, Scopus and Google Scholar. And depending on the publication, any one of the three may give the maximum. Nicolas Le Novere (talk) 20:06, 16 March 2010 (UTC)
So, you are saying that WoS systematically attributes papers to you that you did not write and/or overlooks papers you did write in their database? That would, in fact, be unusual. They do have a formal reporting mechanism for submitting corrections. If you're instead talking about searches that mess-up one's stats, then the queries are not sufficiently focused. For example, false-positives are easily rooted-out by further constraining upon subject area, institution, even department name. Finally, I think the notion that WoS is no good for compsci has taken on mythical proportions. There are compsci journals, compsci people do publish in them, and they are covered by WoS. Agricola44 (talk) 06:45, 14 December 2010 (UTC).
The subset of cs that is published in journals is highly biased (some subfields publish regularly in journals, others have conferences whose proceedings are journals, others almost completely avoid journals). So when you use a source like WoS that only covers the journals and not the conferences, you get systematic distortions. That to me is a much bigger problem than if it were just a random but unbiased subset. —David Eppstein (talk) 12:04, 14 December 2010 (UTC)

Institutions, nations and fields of study

The quote I have edited has so often been used out of context in the academic AfD page of Wikipedia that I have thought best to de-emphasise it by moving it from the lede to the body and add a proviso. Do revert if research exists. Xxanthippe (talk) 01:19, 16 November 2009 (UTC).

I agree. This quote is so frequently misused that I would not mind removing it from the article altogether. Nsk92 (talk) 02:16, 16 November 2009 (UTC)

Criticism?

Practically all of the criticisms of the h-index apply to practically all forms of bibliometry; they are not specific criticisms of the h-index. —Preceding unsigned comment added by 193.29.77.101 (talk) 14:25, 8 April 2010 (UTC)

In addition, one of the criticisms appears to be plain wrong. It says: "The h-index does not take into account self-citations. A researcher working in the same field for a long time will likely refer to his or her previous publications." In fact, the description of the index is given at the head of the article: "The index is based on the set of the scientist's most cited papers and the number of citations that they have received in other people's publications. [my emphasis]" I think that the last criticism should be removed, and perhaps the in other people's publications in the opening section placed in italics so that this is noted. Kmasters0 (talk) 04:03, 31 May 2010 (UTC)

WP:NOTAFORUM
The following discussion has been closed. Please do not modify it.
When prominent groups publish your work as their own without citing you, they get the kudos and you suffer. Senior academics routinely place their names on works to which they did not contribute ("routine" in some fields). Both degrade the h-index for the person who deserves the credit, and inflate the h-index of the bullies. 2A01:CB0C:56A:9700:CD27:7068:5AEE:5938 (talk) 06:18, 30 August 2018 (UTC)

Why the hyphen?[edit]

It appears to me that this should be called the "h index", not the "h-index", according to WP:Manual_of_Style#Hyphens (and general English usage rules). —156.106.216.156 (talk) 11:18, 20 July 2010 (UTC)

Notability of "successive Hirsch-type-index"[edit]

My recent edit about "successive Hirsch-type-index" was removed by User:Crusio for non-notability. I'd like to challenge that opinion. The article by Prathap alone is cited quite a few times (44 hits on Google Scholar: http://scholar.google.pl/scholar?q=%22Hirsch-type+indices+for+ranking+institutions%27+scientific+research%22&hl=en&btnG=Szukaj&lr= , Web of Science and other databases can be checked for comparison), Google web search shows 178 hits for the exact phrase of article's title. Reference 7 for example (also from 2006) has only 10 hits in Scholar: http://scholar.google.pl/scholar?hl=pl&q=%22Dubious+hit+counts+and+cuckoo%27s+eggs%22&btnG=Szukaj&lr=&as_ylo=&as_vis=0 . Therefore, I think "successive Hirsch-type-index" meets notability criteria for a single-sentence mention in the section about Hirsch Index variants.

Regards, Michał Kosmulski (talk) 20:08, 29 November 2010 (UTC)

  • I have no time to look into this right now, so I'll self-revert and will leave it up to other editors to decide whether this link should stand. --Crusio (talk) 23:12, 4 December 2010 (UTC)
Thank you. --Michał Kosmulski (talk) 12:57, 5 December 2010 (UTC)

Cited by scholarly source?[edit]

This site, which ranks academic journals, tells the reader they can read about the H-factor on Wikipedia, and then links here. AerobicFox (talk) 04:48, 11 March 2011 (UTC)

AuthorRank[edit]

Someone should create a page for [2], it's already mentioned in several papers in google scholar, like [3]. Tijfo098 (talk) 15:36, 7 April 2011 (UTC)

I've added something about it here from that secondary source. The original 2005 AuthorRank paper doi:10.1016/j.ipm.2005.03.012 has some 69 citations in Scopus and over 100 in GS, which probably makes it notable for a topic like this. Tijfo098 (talk) 16:24, 7 April 2011 (UTC)

New External Link[edit]

This external link:

  • Scholar H-Index Batch Calculator an online tool designed by Luca Boscolo that calculates h-index and other parameters.

has been deleted because it was considered spam. It is not spam. It links to very good software I designed for calculating the h-index and other parameters online. It sends the results by email in CSV format. For more info please see (link removed- consensus is that this is spam). Also, I'm not advertising my personal website; it is an organisation of Italian scientists working abroad with around 400 members, including professors and researchers working for universities around the world, but mainly in the UK. This software has been tested by the Via-academy's members and has been used by many Italian academics in Italy. The results of this software are under review by the ANVUR, the National Agency for the Evaluation of Italian Universities. — Preceding unsigned comment added by Luca boscolo (talkcontribs) 09:57, 9 September 2011 (UTC) Luca boscolo (talkcontribs) has made few or no other edits outside this topic.

I've tried the above and it seems to work as it says, provided you apply the usual caveats such as checking the references are not duplicated, making sure you have specified the correct author, etc. There is no charge for use of the service, so on balance I think this is not spam, rather a useful tool for demonstrating the application of the h-index. Therefore I think the external link should stay. Norman21 (talk) 17:03, 12 September 2011 (UTC)
If the gadget did not work there would be good reason for deleting it from Wikipedia, but I see no reason why, if it does work, that fact should be sufficient to include it in Wikipedia. There are many such devices and Wikipedia does not need to list them all. The use of Wikipedia for self-promotion is generally frowned upon, particularly when done by spas, anons, canvassers, stalkers and possible socks (I don't include you in those categories). Xxanthippe (talk) 08:31, 13 September 2011 (UTC).

Thanks for your comments. Well, it is a sort of self-promotion; knowledge has to be promoted, otherwise we would all be ignorant. Also, what about the other external links, are they not promoting themselves? Are they special? Why?--Luca boscolo (talk) 10:08, 13 September 2011 (UTC)

I've taken a look at this H-index calculator. In my opinion, the problem is not so much self-promotion as the fact that the calculator asks for the e-mail of the person doing the search and then claims to e-mail the search results to that e-mail address. This creates a serious privacy issue, as collecting e-mails of people interested in a particular kind of information is exactly the sort of thing that various spammers look for. I feel rather strongly that a link such as this should not be added to a Wikipedia page. Nsk92 (talk) 12:28, 13 September 2011 (UTC)
Come on, asking for an email is not for spamming reasons; it is only because it is a batch calculation and the results must be sent later on in some way. I do not think it is a privacy issue: an email address does not mean much, and you can always specify a different email to get the results. If you really believe this is the only issue, then I can think of another way of sending the results, for example providing a download link.--Luca boscolo (talk) 13:32, 13 September 2011 (UTC)
Asking for an e-mail is not at all a minor issue. If the calculator simply displayed the search results right then and there, instead of e-mailing them, I would have no problems with adding the link. As it is, I am very much against it. Nsk92 (talk) 13:58, 13 September 2011 (UTC)
I can not display the data immediately, because, as I said, it is a batch process. If we all agree this is the only issue, I'll change the software to provide a link to use to download the results when ready. — Preceding unsigned comment added by Luca boscolo (talkcontribs) 14:10, 13 September 2011 (UTC)

I agree that the current state of our external links on this article is not great. However that is a reason to remove existing links, not add new ones. Protonk (talk) 21:04, 13 September 2011 (UTC)

I'd like to know why, for example, this link: "A tool by INRIA Lille, France" is in the list, while mine, which provides more information, is not (OK, apart from the privacy issue, which, as I said, I could fix)... I think there is something here that is not fair and not open!!!--Luca boscolo (talk) 08:18, 14 September 2011 (UTC)

Wikipedia:Other stuff exists. Xxanthippe (talk) 09:43, 14 September 2011 (UTC).
  • Luca's argument does raise a point and, as said by Protonk, we need to clean up the external links section here (the section has been tagged for a while for cleanup, too). Which, if any, links should stay? I think "Publish or Perish" is probably the best known one, but don't see much reason to keep any of the others (including the complete list of lists...) --Crusio (talk) 10:01, 14 September 2011 (UTC)
  • Crusio, I agree to delete the external links section from the h-index page and to create a new page called H-index Calculators, or something like that, which contains the list of the calculators. I agree that Publish or Perish is probably the best known one, but it is fair to give anyone the chance to show his/her work and let people decide.--Luca boscolo (talk) 10:51, 14 September 2011 (UTC)
  • I don't think there is a need for a separate article on h-index calculators. As for which ones to include, I definitely remember that PoP was discussed in some scientific journals (that's how I first found it), so that is a good reason to include it. I doubt that there are sources on the other ones, so that's a good reason to exclude them. --Crusio (talk) 10:53, 14 September 2011 (UTC)
    • ok, then delete them all and add only PoP. A few articles discussing the Scholar H-Index Batch Calculator will come out very soon in scientific journals; after that, can I add the link on the H-Index page?--Luca boscolo (talk) 12:46, 14 September 2011 (UTC)

Promotional links[edit]

I removed the external links at the top of this thread. The first statement with the links appears to be blatant self promotion WP:PROMOTION. Links that follow in the body of the paragraph also appear to be promotional (self promotional). This contradicts guidelines and policies. Wikipedia is not meant for advertising or personal notoriety. Please see Wikipedia:Autobiography, WP:COI.

To be helpful, I believe a link to your own personal web page is allowed on your User page. I doubt it would be appropriate in any talk page discussions (of course there might be exceptions). ---- Steve Quinn (talk) 14:27, 14 September 2011 (UTC)

Also, the icon at the top of this thread is being misused. It appears to me that the icon was placed to emphasize the (probably inappropriate) external links that were there. This is another indicator of an attempt at self-promotion. ---- Steve Quinn (talk) 14:37, 14 September 2011 (UTC)

  • I have trimmed the external link section. I don't mind repec on there (as they are a pretty well established directory service) but a lot of the other links seemed marginal. Protonk (talk) 17:01, 14 September 2011 (UTC)
Thanks Protonk. ---- Steve Quinn (talk) 03:16, 15 September 2011 (UTC)
Feel free to add some as necessary. However w/ a topic like the h-index, low content ELs are going to proliferate. There are just so many different ways of measuring citations, computing the index and so forth. Protonk (talk) 04:50, 15 September 2011 (UTC)

H-Index Calculators Page[edit]

I think there is now a need to create a new article/page for the H-Index Calculators, saying who they are and what they do and writing down independent comments for each of those.--Luca boscolo (talk) 11:03, 15 September 2011 (UTC) --Luca boscolo (talk) 06:53, 16 September 2011 (UTC)

  • If you have independent reliable sources discussing such calculators and comparing them, that would (perhaps) be possible. I don't recall, however, ever seeing much in this direction. --Crusio (talk) 12:07, 15 September 2011 (UTC)
  • Since you guys have been so quick in deleting those external links, I thought you are independent reliable sources, if not, we need to establish rules, one that seems already in place, see the PoP case, is whether it has been discussed/used in a scientific journal.--Luca boscolo (talk) 14:20, 15 September 2011 (UTC)
    • I'm going to tell you right now not to waste your time. We aren't going to create a page of h-index calculators because we aren't a directory of h-index calculators. We certainly aren't going to create one so that you can promote your site. You are welcome to edit wikipedia but you are going to see a lot of frustration if you continue down this path. There are thousands of statistics articles out there which need improvement and dozens of bibliometric articles as well. Work on those! But please don't keep trying to add this link to wikipedia. Protonk (talk) 16:50, 15 September 2011 (UTC)
      • dear protonk, ok, now I see who is behind these articles, thank you very much for your help. Luca boscolo

Delete due to recentism[edit]

There are several articles cited in this article from 2010 and 2011. Hence, the choice to delete the edit on 22:58, 19 December 2011 to the "Alternatives and modifications" section due to "recentism" is quite unjustified. Thallium18 (talk) 16:57, 20 December 2011 (UTC)

  • I have no opinion on the "recentism" argument, but the stuff you added was not about a modification of the h-index or such, so it doesn't seem to belong here. --Guillaume2303 (talk) 17:51, 20 December 2011 (UTC)

Formatting[edit]

The term h-index is formatted inconsistently in the article. Although in most places editors rendered it as h-index, in some places it is h-index or H-index. Could someone please provide a peer-reviewed source that makes the preferred capitalisation and italicisation clear? Ringbang (talk) 18:12, 6 September 2012 (UTC)

I did some harmonization in the article. Now all instances use the h-index format. --Waldir talk 14:59, 12 February 2013 (UTC)

tori and riq indices[edit]

I added information and links to the tori index and the riq index which are important to fight against the impact of the self-citations, and to reduce the effects of the academic age. They allow for meaningful comparisons across different fields with different citation practices. 45 Wuz (talk) 10:38, 9 November 2013 (UTC)

  • "fight"?? Anyway, the tori and riq indexes are based on a single article and it is unclear whether anybody (except apparently one single database) has taken any notice of them. The added blog is not a reliable source. Not sure this addition should stay. --Randykitty (talk) 17:48, 9 November 2013 (UTC)
  • My reaction is also that it's WP:TOOSOON to have separate articles on these, or even to mention them in the h-index article. —David Eppstein (talk) 18:52, 9 November 2013 (UTC)
  • Wait until it becomes demonstrably more established. Xxanthippe (talk) 21:27, 9 November 2013 (UTC).

The END of science[edit]

This bullshit is going to kill science. How would the giants of science be rated according to this index? Don't think of modern physicists only, just think about every science, in every era. Gnome-against-trolls (talk) 22:05, 22 February 2014 (UTC)

  • I share your opinion. Unfortunately, that's all it is, opinion, and talk pages are for discussing improvements to the article, not for venting our own opinions/frustrations. --Randykitty (talk) 22:10, 22 February 2014 (UTC)

Ok, my question is: how would past giants of science be rated according to this and similar indexes? Gnome-against-trolls (talk) 22:21, 22 February 2014 (UTC)

Easy to answer. Take a look at the citation data for Einstein, Maxwell and Dirac on Google scholar. Xxanthippe (talk) 22:54, 22 February 2014 (UTC).
So these indexes are a good "tool" to evaluate the impact of works in the field of physics only? What about other natural sciences or formal sciences or even humanities? I read a long time ago about criticism of the whole "publish or perish" matter; if I find those sources, I will propose including such a section in each relevant article. Gnome-against-trolls (talk) 04:18, 23 February 2014 (UTC)
There has already been extensive discussion (almost ad nauseam) of all these issues on the policy pages and its talks and their archives. I suggest you look at these first. Best wishes. Xxanthippe (talk) 05:02, 23 February 2014 (UTC).
Where exactly? The publish or perish article includes some relevant criticism, should I look in its talk page, or even elsewhere? Gnome-against-trolls (talk) 09:38, 23 February 2014 (UTC)
h-index and WP:Prof. Xxanthippe (talk) 10:52, 23 February 2014 (UTC).

Removed L-index[edit]

Anonymous editors have been repeatedly adding the following text to the article:

  • The L-index has been proposed [1]. It accounts for the number of coauthors, the age of publications, is independent of the number of publications and conveniently ranges from 0.0 to 9.9.

References

  1. ^ Belikov, A.V.; Belikov, V.V. (2015). "A citation-based, author- and age-normalized, logarithmic index for evaluation of individual researchers independently of publication counts". F1000Research. 4: 884. doi:10.12688/f1000research.7070.1.

My own opinion is that this is WP:REFSPAM, it is an insignificant aspect of the subject, it is only one of what Google scholar tells me are 31500 articles on the h-index, among which it has no better claim to fame than any other, as a newly published article it is too soon to have accumulated any impact, and Wikipedia is an inappropriate choice for publicizing new research. Accordingly, I have semiprotected the article, not to prevent this material from ever being added, but rather to redirect this from being an edit war into a discussion where we can build a consensus about what to include. That way, I hope, we can make a decision based on Wikipedia policy rather than including things based on how persistent their proponents are. As such, other opinions are of course welcome here; what do the other editors of this article think? —David Eppstein (talk) 20:21, 15 October 2015 (UTC)

The Belikov paper has so far proved to be of little interest and should be excluded from the article. WP:Semiprotection should be made indefinite to stop this trolling. Xxanthippe (talk) 21:59, 15 October 2015 (UTC).


Journal H index[edit]

Please note that SCImago Journal Rank uses an h-index (or H index) in their journal ranking: http://www.scimagojr.com/journalsearch.php?q=21100230018&tip=sid&clean=0

Maybe the article should be updated with a section on this. (Simiprof (talk)) —Preceding undated comment added 16:26, 17 September 2016 (UTC)

Example given in "Calculation" section might be erroneous[edit]

The article states the following:

First we order the values of f from the largest to the lowest value. Then, we look for the last position in which f is greater than or equal to the position (we call h this position). For example, if we have a researcher with 5 publications A, B, C, D, and E with 10, 8, 5, 4, and 3 citations, respectively, the h index is equal to 4 because the 4th publication has 4 citations and the 5th has only 3. In contrast, if the same publications have 25, 8, 5, 3, and 3, then the index is 3 because the fourth paper has only 3 citations.

f(A)=10, f(B)=8, f(C)=5, f(D)=4, f(E)=3 → h-index=4
f(A)=25, f(B)=8, f(C)=5, f(D)=3, f(E)=3 → h-index=3

According to the definition in bold the first example should have an h of 4, because the 4th publication (sorted from most to least citations) is equal to that position and the 5th publication has a value of 3. By the same token, the second example should have an h of 5, because the 3rd publication (sorted from most to least citations) is bigger than the position and the 4th publication has a value of 3, and is therefore less than the position.

Additionally, this is in accordance with the explanation and examples found at the University of Waterloo and the University of Wageningen.

David Eppstein, since you reverted my change to the page, would you care to clarify why the second example is correct even though it does not conform to the definition? — Preceding unsigned comment added by Chibby0ne (talkcontribs) 02:06, 12 March 2018 (UTC)

The h-index is the position, not the value at that position. In the second example, "the last position in which..." is the third position, so h=3. —David Eppstein (talk) 02:13, 12 March 2018 (UTC)
Oh, I see now. Thanks for the clarification. Perhaps the definition should put the word position in bold, because I failed to see this subtlety with the given numbers of the examples on two different occasions. --Chibby0ne (talk) 02:21, 12 March 2018 (UTC)
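
For anyone else reading this thread, the rule quoted above (sort the citation counts in descending order, then take the last position whose count is at least that position) can be checked with a minimal sketch. The function name h_index and its list-of-counts input are illustrative assumptions, not anything taken from the article or from a particular tool:

```python
def h_index(citations):
    """Return the h-index for a list of per-paper citation counts."""
    # Order the counts from the largest to the lowest value.
    ranked = sorted(citations, reverse=True)
    h = 0
    # Positions are 1-based, as in the article's wording.
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position  # h is the position, not the count at that position
        else:
            break  # counts only decrease, so no later position can qualify
    return h


print(h_index([10, 8, 5, 4, 3]))  # 4
print(h_index([25, 8, 5, 3, 3]))  # 3
```

Both examples from the "Calculation" section come out as the article states. In the second one the loop stops at the fourth position, where the count (3) is below the position (4), so the result is the position 3 rather than the count 5 found there.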

To Remove Certain Statements that are not well sourced[edit]

This article has some bias and statements made without concrete sources and representation. The statement that a certain h-index is equivalent to election to the National Academy of Sciences is flawed. Also, the source for that statement is not publicly accessible. It does not represent the formal view of the National Academy of Sciences and should be removed. Comments welcome.

Also, that the h-index does not reflect the top work of authors is a fact, and it is clear from the graph: the h-index takes the middle of the spread. So, @Xxanthippe: editor Xxanthippe should not revert things right away without communicating with other editors. And there is nothing wrong with stating facts and helping readers to understand. 2405:800:9030:2C47:28:AD81:44C5:1EB5 (talk) 15:14, 16 June 2019 (UTC)

Unless you can make a case against Ivars Peterson's piece in Science News, you ought not be removing cited content claiming only that it's biased. Do you have a citation for NAS's formal view? Chris Troutman (talk) 15:34, 16 June 2019 (UTC)
@Chris troutman: Yes, the NAS will be informed. A person's published view in Science News does not represent the whole National Academy of Sciences' formal view. I believe it is too early to even allow such general statements to be made. Election to the National Academy of Sciences is based on many factors, not merely an h-index.— Preceding unsigned comment added by 2405:800:9030:2c47:28:ad81:44c5:1eb5 (talkcontribs) 11:47, 16 June 2019 (UTC)
But that section of text starts with "Hirsch suggested that..." We're not construing this as the official NAS view nor are we implying that membership is based upon h-index. The citation is one person's view, which informs the reader. Chris Troutman (talk) 15:51, 16 June 2019 (UTC)