In an article published recently in the Journal of Documentation, library researcher Pnina Shachaf analyzed the quality of answers at the Wikipedia Reference desk. The reference desk serves as an open forum for visitors to ask questions about any topic not directly related to Wikipedia itself (the Help desk answers questions about the site); anyone is welcome to help answer questions. This paper is the first to study Wikipedia's reference desk. It found that Wikipedia volunteers performed as well as or better than traditional library reference desk services on most quality measures, providing a similar level of service.
Shachaf's study analyzes the quality of Wikipedia Reference desk answers in the context of other, similar studies. It reviews the literature on online question and answer boards, such as Yahoo! Answers, and briefly reviews classic studies of the effectiveness of traditional library reference services. Shachaf notes that collaborative Q&A sites are a new model for reference, and that research into their quality is still new and rarely takes account of findings from traditional reference research.
The study uses content analysis to evaluate reference desk answers on three measures: reliability ("a response that is accurate, complete, and verifiable"); responsiveness ("promptness of response"); and assurance ("a courteous signed response that uses information sources") (p. 982). These are based on the SERVQUAL measures, which have been used extensively in other studies of library reference services. They also map to the basic guidelines given to question-answerers on the Wikipedia reference desk (for instance, to sign responses).
The sample was drawn from April 2007: 77 questions, with a total of 357 responses, were analyzed out of the 2,095 questions received that month (an average of 299 transactions for each of the seven topical reference desks). Shachaf notes that "on average, the Wikipedia Reference Desk received 70 requests per day and users provided an average of 4.6 responses for each request" (p. 980). Shachaf first analyzed whether the questions were asked and answered by "experienced" users (defined for the purposes of this study as editors with a userpage) or "novice" users (those without a userpage), finding that 85% of answers were provided by "experienced" users.
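The averages quoted above can be checked against the raw counts with a short calculation. The following is only a sketch: the variable names are illustrative, and it assumes that the 4.6 responses-per-request figure derives from the 77-question sample, which the summary above does not state explicitly.

```python
# Counts as reported in the summary above.
QUESTIONS_IN_APRIL = 2095   # questions received in April 2007
DAYS_IN_APRIL = 30
TOPICAL_DESKS = 7
SAMPLED_QUESTIONS = 77
SAMPLED_RESPONSES = 357     # assumption: basis of the 4.6 figure


def average(total, count):
    """Return total / count as a float."""
    return total / count


per_day = average(QUESTIONS_IN_APRIL, DAYS_IN_APRIL)
per_desk = average(QUESTIONS_IN_APRIL, TOPICAL_DESKS)
responses_per_question = average(SAMPLED_RESPONSES, SAMPLED_QUESTIONS)

print(f"requests per day: {per_day:.1f}")                               # 69.8
print(f"transactions per desk: {per_desk:.0f}")                         # 299
print(f"responses per sampled question: {responses_per_question:.1f}")  # 4.6
```

Rounded, these figures agree with the 70 requests per day, 299 transactions per desk, and 4.6 responses per request given in the text.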
Of the questions analyzed, Shachaf found that most were answered quickly (on average, the first response was given after four hours); that answers were signed with Wikipedia usernames; and that 92% of the questions were given a partial or complete answer. 63% of the questions were answered completely. Of the factual questions where the coders were able to determine accuracy, 55% of the answers were accurate, 26% were not accurate, and in 18% of the cases no consensus was reached on the reference desk. The 55% figure is comparable to those found in studies of the accuracy of traditional one-on-one reference.
The sources used in reference desk answers were also examined, based on a sample of 210 interactions. Wikipedia articles were referred to in 93% of these transactions and account for 44% of the references listed. Sources such as journals, databases, and books were very rarely used. This is a major difference from answers provided in traditional library reference services; librarians tend to use and cite sources, including traditional information sources such as journals and databases.
Shachaf compares these statistics to traditional library reference services. Overall, answers at the Wikipedia reference desk are comparable to library reference services in accuracy, responses are on average posted more quickly than libraries reply to emails, questions are answered more completely at the Wikipedia reference desk than via library virtual reference services, and thank-yous from question askers are received at the same rate. The conclusion is that "The quality of answers on the Wikipedia Reference Desk is similar to that of traditional reference service. Wikipedia volunteers outperformed librarians or performed at the same level on most quality measures" (p. 989).
However, Shachaf cautions that these results are only achieved in the aggregate. Shachaf writes:
...while the amalgamated (group) answer on the Wikipedia Reference Desk was as good as a librarian's answer, an amateur did not answer at the same level as an expert librarian. Answering requests in this amateur manner creates a forest of mediocrity, and, at times, the "wisdom" of the crowd, not of individuals, reaches a higher level. For a user whose request received more than four answers, sorting out the best answer becomes a time consuming task. ... The quality of an individual message did not provide answers at the same level as individual librarians do, but an aggregated answer made it as accurate as a librarian's answer (pp. 988–989).
Shachaf offers some ideas as to why the all-volunteer Wikipedia Reference Desk service might work as well as library reference services: that experienced question-answerers gain practice in answering reference questions, much as professional librarians do; that the wiki itself is conducive to providing collaborative question-answering services (more so than most software used for library reference); that the type of questions being asked may differ between Wikipedia and libraries; and that (according to Shachaf, the most likely possibility) the collaborative aspects of the service, where answers can be expanded on, improved and discussed, help improve answer quality. She concludes that more research is needed into the nature of online Q&A boards staffed by volunteers.
^Shachaf, Pnina. (2009). "The paradox of expertise: is the Wikipedia Reference Desk as good as your library?" Journal of Documentation, v. 65 (6), pp. 977–996. Not available freely online.
^Traditional library reference is understood in this context as a questioner interacting one-on-one with a professionally trained librarian, either in-person at a library reference desk, or via email/chat/phone.
^Note that it is quite difficult to determine accuracy for most reference transactions, since answers may be partially accurate or have qualitative situational differences; 55% accuracy is a standard estimate based on studies of in-person reference desk interactions in the 1980s (citation to Hernon and McClure (1986) and later analyses given by Shachaf).