User:Rockpocket/Ref desk stats

This page documents occasional statistical surveys of Wikipedia:Reference desk, its contributors and its impact on the encyclopaedia. Unless otherwise stated, the work here was carried out by Rockpocket (talk · contribs). Please feel free to make changes to this page if you spot a mistake; however, I would prefer that large-scale changes in content be discussed on the talk page first. If you would like the raw data for further analysis, drop me a note on my talk page.

Query/response analysis, October 2007

Figure 1. The rate of questions asked is linear with respect to those left unanswered.

Inspired by a request on the desk itself, I decided to determine the number of questions asked at the Reference Desk and the proportion that were answered, as a function of time and subject. To do this I sampled all questions asked in a 4-week (28-day) period between October 1 and October 28, 2007. Questions that were deemed inappropriate for the board and had previously been removed (either because they were examples of trolling or requests for medical/legal advice) were discounted. Questions that were duplicated across boards were counted once, in the most suitable category.

Results

In total, 1860 questions were asked during the 4-week study period. 75 of these questions remained unanswered (4%), giving a response rate of 96%. This breaks down as follows:

  • 465 questions asked per week, 18.8 not answered
  • 66.4 questions asked per day, 2.7 not answered
  • 2.8 questions asked per hour, 0.1 not answered
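
The per-week, per-day and per-hour figures above follow directly from the two totals and the 28-day window. A minimal sketch of that arithmetic (the variable and function names are illustrative, not part of the original analysis):

```python
# Totals reported in the Results section.
TOTAL_QUESTIONS = 1860
UNANSWERED = 75
STUDY_DAYS = 28


def rates(total, days):
    """Return (per week, per day, per hour) averages for a count over `days`."""
    return total / (days / 7), total / days, total / (days * 24)


asked_rates = rates(TOTAL_QUESTIONS, STUDY_DAYS)
unanswered_rates = rates(UNANSWERED, STUDY_DAYS)
response_rate = 1 - UNANSWERED / TOTAL_QUESTIONS

for label, asked, missed in zip(("week", "day", "hour"), asked_rates, unanswered_rates):
    print(f"{asked:.1f} questions asked per {label}, {missed:.1f} not answered")
print(f"response rate: {response_rate:.0%}")
```

These are simple averages over the whole window; they say nothing about how evenly the questions were spread within it.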

This assumes the rate of answering questions is consistent with the rate at which they are asked. To ascertain this I plotted (Figure 1) the cumulative number of questions asked (x axis) against the cumulative number of questions that were not answered (y axis). Indeed, the relationship between the two does not significantly deviate from linear. These data demonstrate that unanswered questions are distributed evenly across the 28-day sampling period, and that, beyond the first seven days after posting, additional time does not appear to increase the probability of a question being answered. This is an artifact of the archiving/transclusion system: questions are only "active" on the desk for seven days before archiving, so if a question is not answered within a week of posting, it is unlikely ever to be answered.

Figure 2. Questions asked by day of the week

Another possibility is that the query/response profile may vary by day of the week. Since the Ref Desk is a truly global site, the collective work week of the querents and responders is ill defined. For the purposes of this study, all questions were attributed to days according to Zulu time, thus the majority of questions could be attributed incorrectly by as much as -8 hours (contributors from the Pacific coast of the Americas) to +12 hours (New Zealanders). Figure 2 shows the mean number of questions asked on each day of the week. The data are further subdivided to show the distribution of questions between subjects. The standard error is also shown (n=4). While there is a trend towards fewer questions being asked at weekends, the difference is not statistically significant.
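
To illustrate how attributing questions by Zulu (UTC) time can shift the apparent day, here is a small sketch using the timestamp of one of the unanswered Computing questions quoted on this page (01:40, 5 October 2007 UTC). The fixed -8 and +12 hour offsets are the ones mentioned above; they are a simplifying assumption that ignores daylight saving time, and the helper name is illustrative.

```python
from datetime import datetime, timedelta, timezone

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def weekday_at_offset(utc_time, offset_hours):
    """Name of the weekday seen by a poster whose clock is offset_hours from UTC."""
    local = utc_time.astimezone(timezone(timedelta(hours=offset_hours)))
    return DAY_NAMES[local.weekday()]


# Timestamp taken from a question quoted on this page.
posted = datetime(2007, 10, 5, 1, 40, tzinfo=timezone.utc)

utc_day = weekday_at_offset(posted, 0)       # day used in this study
pacific_day = weekday_at_offset(posted, -8)  # Pacific coast of the Americas
nz_day = weekday_at_offset(posted, 12)       # New Zealand

print(utc_day, pacific_day, nz_day)
```

Under these assumptions the question is counted as a Friday question in the study, although a poster on the Pacific coast would have asked it on Thursday afternoon.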

Subjects

Historically, a single reference desk accommodated questions on any and every subject. In August 2005 this was split into 4 different desks, followed by others in November 2005, July 2006 and December 2006 as the service became more popular. Seven different desks are currently available, covering Computing, Science, Mathematics, Humanities, Language, Entertainment and one for everything else.

Figure 3a shows the distribution of questions asked within these subjects. Humanities was the most popular subject with 421 questions (23%), closely followed by Science (414, 22%). Miscellaneous (346, 19%) and Computing (296, 16%) were next most popular, with Language (142, 8%), Entertainment (122, 7%) and Maths (119, 6%) significantly less popular. As Figure 2 demonstrates, this distribution is relatively consistent day to day, with Computing questions showing the most relative variability (everyone's computer works on a Tuesday, it would appear).
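
The shares quoted above can be recomputed from the per-desk counts. A minimal sketch (the dictionary simply restates the counts given in the text; the names are illustrative):

```python
# Questions asked per desk during the 28-day sample, as reported above.
questions_by_desk = {
    "Humanities": 421,
    "Science": 414,
    "Miscellaneous": 346,
    "Computing": 296,
    "Language": 142,
    "Entertainment": 122,
    "Maths": 119,
}

total = sum(questions_by_desk.values())
shares = {desk: count / total for desk, count in questions_by_desk.items()}

for desk, share in shares.items():
    print(f"{desk:<14}{questions_by_desk[desk]:>5}{share:>7.1%}")
print(f"{'Total':<14}{total:>5}")
```

The counts sum to the 1860 questions reported in the Results, which is a useful consistency check on the breakdown.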

As a measure of how proficient Ref Desk volunteers are at providing answers for questions from each subject, I plotted unanswered questions in the same manner (Figure 3b). If each type of question was answered equally well, the plots would be segmented in roughly the same proportions for each subject. This is generally the case, with the most obvious exceptions being Science, Language and Entertainment questions. Science and Language questions tend to be answered more frequently than average (less than 2.5% remained unanswered, compared to 4% overall). Most striking was the difference in the Entertainment questions. Of the 122 Entertainment questions asked, 15 remained unanswered (12.3%). Thus an Entertainment question is three times more likely to remain unanswered than a randomly selected question, and six times more likely than a question pertaining to Language. Possible reasons for this will be discussed in greater detail below. It is important to note, however, that while the relative proportion of unanswered questions is unusually high for Entertainment questions, the total number of unanswered Entertainment questions (15) is less than the number of unanswered Miscellaneous questions (17), and similar to Humanities (14) and Computing (13). This can be observed in Figure 3c, showing the mean number of questions asked by subject each day, subdivided into the number answered (in colour) and unanswered (in black).
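
The "three times more likely" comparison can be checked from the counts given in the text. The sketch below covers only the four desks whose unanswered counts are stated explicitly; per-desk unanswered counts for Science, Language and Maths are not given on this page, so they are left out rather than guessed.

```python
# (asked, unanswered) for the desks whose unanswered counts are stated above.
desk_counts = {
    "Entertainment": (122, 15),
    "Miscellaneous": (346, 17),
    "Humanities": (421, 14),
    "Computing": (296, 13),
}
TOTAL_ASKED, TOTAL_UNANSWERED = 1860, 75

overall_rate = TOTAL_UNANSWERED / TOTAL_ASKED
unanswered_rates = {
    desk: missed / asked for desk, (asked, missed) in desk_counts.items()
}
# How much more likely an Entertainment question is to go unanswered
# than a question drawn at random from the whole sample.
entertainment_ratio = unanswered_rates["Entertainment"] / overall_rate

for desk, rate in unanswered_rates.items():
    print(f"{desk:<14}{rate:>6.1%} unanswered")
print(f"overall       {overall_rate:>6.1%} unanswered")
print(f"Entertainment vs overall: {entertainment_ratio:.1f}x")
```

This shows why the relative and absolute views differ: Entertainment has by far the highest unanswered rate, yet Miscellaneous has more unanswered questions in absolute terms because far more questions are asked there.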

These data also demonstrate that the subjects segregate into three statistically distinct groups based on popularity. Humanities and Science are not significantly different in terms of questions asked, but do differ from Miscellaneous and Computing (Student's t-test, P<0.05). Similarly, Miscellaneous and Computing do not differ from each other, but do differ significantly from Maths, Language and Entertainment (Student's t-test, P<0.01). These three are not significantly different from each other.

Discussion

While the quantity of questions answered can be assessed, the quality of those answers is beyond the scope of these analyses. That would require a survey of those who asked the questions, to assess whether the querent considered the answers helpful, and a third-party analysis of whether each question was, in fact, answered correctly. Considering that this would require research into over 1000 disparate subjects, it is unlikely ever to be attempted on a sample of this size. I considered a question answered if there was what I considered to be a good faith attempt to address the question. This amounted to a response that provided a link, information or opinion related to the question. It did not include requests for clarification of the original question, asides, additional questions that did not inform the first, notices that the question was inappropriate, or bad faith responses.

What is possible is a qualitative analysis of the questions that remained unanswered. These appear to divide into three different classes. Some questions appear to be answerable, in that - from a non-expert perspective - they do have an answer that is not outwith the normal scope of that which is provided on the desk.

Examples of this class of unanswered question
  • Maths:
  • Simple question: Is the strong topology on distribution space the same as the compact open topology? I think so but I am a little out of touch and hence need confirmation. Thank you in advance. twma 03:51, 9 October 2007 (UTC)
  • Suppose one had a power series w/radius of conv = 1, say, around z=0, say. If z were allowed to be quaternionic, then the boundary of convergence would be a 3 sphere of rad 1. I bet the subset of the boundary where the series converges could be quite beautiful and fascinating. Does anyone know of results/research in this area? It would be a great addition to put in this article.Rich 07:47, 25 September 2006 (UT --I've moved this here from article talk page. Thanks.--Rich Peterson130.86.14.25 03:17, 26 October 2007 (UTC)
  • Computing:
  • Can anyone tell me if Microsoft Publisher can be considered as a Word Processing software? —Preceding unsigned comment added by Shelaghmccormick (talk · contribs) 17:31, 5 October 2007 (UTC)
  • I was looking through Stepmania, and I was wondering, which one is the version that has the same steps as DDR? --JDitto 01:40, 5 October 2007 (UTC)
  • I heard that there are three types of advertising sugarmama has. 1) text messaging ads, 2) video ads, 3) filling forms. Can you explain a bit of the procedure of these three types of ads. If you know only one or two types, please explain. Thanks —Preceding unsigned comment added by 59.92.123.57 (talk) 19:24, 10 October 2007 (UTC)
  • I saw on wikipedia that support for .hlp help files was removed on windows vista to encourage use of newer help formats. What is this new format?? If possible tell me some programs to create this type of help. —Preceding unsigned comment added by 200.242.17.5 (talk) 01:15, 12 October 2007 (UTC)
  • Humanities:
  • So I was reading the "due diligence" section of the takeover article. What, if any, safeguards are in place to prevent a company from proceeding through the due diligence phase, and then backing out of the merger and using that newly gained information to eliminate that company as a competitor? I think this might have been a subject of a Dilbert cartoon. —Preceding unsigned comment added by 141.209.45.88 (talk) 16:09, 3 October 2007 (UTC)
  • What is the ticker symbol of ACl International? And, on LA Gear, is it ACL or ACI?§§§ —Preceding unsigned comment added by 76.222.101.109 (talk) 21:06, 3 October 2007 (UTC)
  • Hi, does anyone know the exact drum notation to Phil Collin's Both Sides of the Story? --Writer Cartoonist 00:07, 28 October 2007 (UTC)
  • Entertainment:
  • the second episode of spooks series six was broadcast on BBC3 last night half an hour after episode one finished on BBC1. Does the character Zafar Younis played by Raza Jaffrey die during this episode?? 88.111.45.33 13:01, 18 October 2007 (UTC)
  • I'm a big fan of the piano/wind band composer Caesar Giovannini. In fact, I'd say my favorite wind band piece ever is his "Jubilance: An Overture." It's an older piece, and I can't seem to find a recording of it. Anyone know where I can find one?72.219.143.150 03:40, 19 October 2007 (UTC)
  • Science:
  • Do jellyfish stings effect sharks? If so where can I locate this answer with more info? —Preceding unsigned comment added by 63.3.17.130 (talk) 00:06, 18 October 2007 (UTC)
  • Has anyone ever setup a camera and an Anemometer and recorded the movement of the Sailing stones and the wind speed? Clem 22:02, 19 October 2007 (UTC)
  • Miscellaneous:
  • In what year did Kaikora North change its name to Otane? —Preceding unsigned comment added by 58.28.136.29 (talk) 21:27, 22 October 2007 (UTC)
  • The Major League Soccer page says that the league is organized by the USSF and the CSA, but does the CSA have any say over how MLS is run? 71.36.181.218 19:10, 3 October 2007 (UTC)

The reason these questions are not answered may simply be that the individual(s) with expertise or experience in this area were not available or were disinclined to answer. In a minority of cases it may be that potential respondents consider these questions to be homework (consider, for example, the computing request about the "three types of advertising sugarmama has"). It is currently against Ref desk policy to answer homework questions, though usually responders will note that and provide hints or direction rather than ignore the question (these I have considered as good faith answers for the sake of this analysis). Another reason for unanswered questions of this class may simply be misplacement. For example, the request for the LA Gear ticker may have been answered if placed on the Miscellaneous desk rather than the Humanities desk. Often helpful respondents will recommend the question be asked on the appropriate desk, or copy it there themselves. Another reason may be the phrasing of the question. Consider:

Due to its phrasing, it is very difficult to answer that question in the negative, as it requires complete knowledge of what has not happened, rather than a single example of knowledge about what has happened. Only an expert could answer "no" and be confident of being correct. A rephrasing of the question, for example

may have been more likely to draw a helpful, if incomplete, response. Nonetheless, it is this class of unanswered question where the Ref desk could most easily improve its record of response. This is the largest class of unanswered question. Some questions are difficult to classify accurately; nevertheless, this class accounts for as much as half of those unanswered on the Ref desks.

A second class of unanswered question comprises those that require extremely specialist knowledge beyond what is likely to be available from Ref Desk volunteers. While these questions can be answered, to do so would probably require access to information not available online or through generally accessible books or other reference sources. Occasionally such questions are answered when, by coincidence, an expert is available to respond. However, short of recruiting more "volunteers" to ensure there is a sufficient pool of expertise, it is difficult to see how to ensure this class of question is answered more often. This is the second largest class, accounting for around one third of the unanswered questions.

Examples of this class of unanswered question
  • Language:
  • Is there anyone here with access to a dictionary or other resources on the Hopi language? It's claimed that "Hakomi" means "How do you stand in relation to these many realms?", but I'm skeptical and would like confirmation. (See Talk:Hakomi.) --Alivemajor 09:17, 2 October 2007 (UTC)
  • Miscellaneous:
  • ESPN Used to have a Bottom Line sports update at the 28th and 58th minute of every hour. What is the reasoning for changing that to the 18th minute instead of the 28th minute?
  • Help Needed with getting licensed with the SEC, via the Series 22 Study material. There is not enough people taking this type of test as compared to other securities tests so the examination facilities don't put money into the 22 exam because it's not worth their time and effort. This makes it hard to find material to study the 22. I have some material and have already taken the test once and failed. I need more material via CD,DVD...ect. Is there anyone out there who can help me???? —Preceding unsigned comment added by Pittpat1 (talk · contribs) 16:13, 16 October 2007 (UTC)
  • Entertainment:
  • What is the US market revenue $ figure for feature FILMS shown on free tv/cable/pay tv? Growth rate in %? (IE, not theatrical box office, etc., just the market figures for films in these media. Not tv shows, etc.) What is the global market revenue for visual entertainment digital content in the internet/mobile market? (not info, business, music, etc. - just visual entertainment.) What is the projected growth rate?Thanks, Timothy —Preceding unsigned comment added by 65.184.49.135 (talk) 12:34, 24 October 2007 (UTC)
  • Can anyone provide me with a link to Nintendo DS demographics, especially want ages of DS players. Beetle120 10:28, 14 October 2007 (UTC)
  • How many 10th graders got accepted into Brooklyn Tech in 2006? Also, about how much do you need to score to get into Brooklyn Tech if you are taking the score as a 9th grader trying to enter 10th? —Preceding unsigned comment added by 24.189.57.235 (talk) 22:48, 21 October 2007 (UTC)
  • Humanities:
  • Microsoft/aQuantive deal, Google/DoubleClick deal. Which banks acted as advisors, or financed these deals? - 204.104.55.242 13:02, 1 October 2007 (UTC)
  • Do you know of anyone who has expressed a criticism or opposing views to those contained in Hayek's Why I Am Not a Conservative?? Your help is much appreciated, as I had trouble using Google Scholar and JSTOR (and unfortunately there is no separate article for this postscript of The Constitution of Liberty). Regards --Dami 23:13, 8 October 2007 (UTC)
  • How many Bengali-Canadians politicians are participating in the Ontario election? —Preceding unsigned comment added by 76.64.53.159 (talk) 02:27, 11 October 2007 (UTC)
  • I really appreciate any help finding resources to determine the distribution of average hours of vacation paid time off taken annually. Minimally I need it by month, but further resolution wouldn't hurt. An average for US would work, but I am looking for vacation taken (not awarded or available) by non-union health care workers in Texas. Thanks, Aqualinx 15:54, 11 October 2007 (UTC)
  • Years ago, somewhere in the mid 90s, I went to the Art Institute in Chicago. While there I saw a wool suit on a hanger which was displayed as a piece of art. It was as if someone took it out of their closet and hung it on the wall. I remember seeing an article about this work, maybe it was mentioned in the article about the artist, here at Wikipedia. Does anyone know of this piece and know what the name of it might be? Dismas|(talk) 12:36, 11 October 2007 (UTC)
  • Computing:
  • What I'm looking for is a comparison between a ray traced scene and an image (greyscale perhaps) representing the number of BSP-boundarys/triangles/bounding boxes/sum of all traversed per ray-pixel. Anyone got or seen anything like this? Cheers83.100.254.51 17:11, 13 October 2007 (UTC)

A final class of unanswered question is essentially unanswerable, either because the question lacks context or coherence, or because it requests information that is realistically impossible to provide. Often these generate requests for clarification, and they will occasionally draw an answer based on what the responder thinks is being asked for. This class is the smallest, accounting for around 15% of the unanswered questions. Interestingly, these appear to be enriched on the Entertainment desk especially, and on the Miscellaneous desk, perhaps reflecting the trivial nature of some of the questions.

Examples of this class of unanswered question
  • Entertainment:
  • i am about having my own stage drama academy.pls,give me the names and personnels of stage drama and the instruments.also,give me the personnels and the faculties of stage drama —Preceding unsigned comment added by Sagbastar (talk · contribs) 15:58, 22 October 2007 (UTC)
  • I was busy doing other things and I spotted a link for an angel game that needed an article, and I wanted to get my son playing it with a view to writing an article with his help, so if anyone knows the one I am talking about I would be grateful if they could let me know, thank you.--DitsyDaisy 20:54, 14 October 2007 (UTC)
  • Hello, I am trying to find out what the largest street hockey game (played on ice) is. Like, for example if someone flooded the street in the winter and it froze or if there was ice naturally present for playing on. If this information is not available I would also take any info on the largest street hockey game ever played (ie longest street, most players etc.) Thanks in advance! —Lost —Preceding unsigned comment added by 69.77.165.34 (talk) 13:49, 4 October 2007 (UTC)
  • In The Sisterhood of the Traveling Pants is there a poem that on of the girls is writing to a guy that they are with or married to. I was wondering if anyone could help me. I was at a wedding and the maid of honor read a poem and it was from a movie about sisters, but that's all I know. —Preceding unsigned comment added by 69.210.98.23 (talk) 21:53, 3 October 2007 (UTC)
  • Miscellaneous:
  • please find jurnals of training220.247.225.30 08:40, 9 October 2007 (UTC)
  • Hi, Can anyone put suggestion regarding Media Gateway control protocol for me at. Thanks and regards Upendra —Preceding unsigned comment added by Ubhatnagar (talk · contribs) 07:30, 12 October 2007 (UTC)
  • Computing:
  • Somewhere that present me the suitable devices of my given requirements and characteristics for PC?Flakture 18:06, 5 October 2007 (UTC)

Conclusion

This study of Ref Desk efficiency demonstrated an impressive 96% response rate over 1860 questions in 28 days. An analysis of the questions not answered suggests this could increase to around 98% if the current repertoire of Ref desk responders made a concerted effort to ensure questions were not left unanswered. The additional 2% would be more challenging to address. The two subjects that show the weakest response rates are Entertainment, by some margin, followed by Miscellaneous. However, these also attract the most unanswerable questions. If these are discounted then there is little difference in answering rate by subject. Thus it can be concluded that the Wikipedia Reference Desk functions remarkably well in terms of directing querents to information, though the quality of that information cannot be easily evaluated. In addition, during the sample period, six encyclopaedia articles were created or significantly improved as a direct result of questions being asked and answers provided (this brings the total to 68 articles created or significantly improved in the 10 months since records have been kept).
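
The figure of around 98% can be reproduced approximately from numbers given on this page. The sketch below assumes that the first class of unanswered question makes up half of the 75 unanswered questions (the Discussion says "as much as half") and that every question in that class would have been answered; both are upper-bound assumptions for illustration, not measured results.

```python
TOTAL_QUESTIONS = 1860
UNANSWERED = 75

# Assumption: the "answerable" class is half of all unanswered questions,
# the upper estimate given in the Discussion.
ANSWERABLE_SHARE = 0.5

answered = TOTAL_QUESTIONS - UNANSWERED
recoverable = UNANSWERED * ANSWERABLE_SHARE

observed_rate = answered / TOTAL_QUESTIONS
projected_rate = (answered + recoverable) / TOTAL_QUESTIONS

print(f"observed response rate:  {observed_rate:.1%}")
print(f"projected response rate: {projected_rate:.1%}")
```

Because "as much as half" is an upper estimate, the projected rate should be read as a ceiling on what answering this class alone could achieve.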

See also