Talk:Data library

From Wikipedia, the free encyclopedia

Untitled

This is a collaborative space for data services professionals to contribute commentary on this entry. Note: I read elsewhere that you should sign your contributions on talk pages with four tildes. ~

Would it help to add a link to the IASSIST blog discussion on data librarianship at http://iassistblog.org/?p=29 ? -Robin Rice


Data are becoming increasingly important in the world. Research studies are being duplicated because the original data were never preserved, resulting in millions of dollars being spent needlessly. Academic institutions are beginning to question whether they own the data collected in their faculty's research in the same way that they own patents arising from such research. Publishers are beginning to assert their ownership of data that form the basis of articles they publish. Data are becoming an economic commodity.

Being able to find the right data in a usable form (e.g., in digital format rather than on magnetic tape) is invaluable to researchers. That's the function of data libraries: to preserve and make available data from previous research to inform future research. Data libraries manage data with the owners' permission, providing an infrastructure to support their continued use and value. --Michele Hayslett


Those are excellent points, Michele. I've just added a link to the Open Data article, which touches on some of that. Someone else has added links to digital preservation and digital curation, both good ideas. (Not sure if that was a Wikipedia editor or a data librarian?)

I've also re-jigged the Services section to try to take into account most of what was discussed on the blog (see above), and to expand some of the original points there.

Also, does anyone have any favourite references to do with data libraries? I've added IQ in general and the historical IASSIST bibliography, but it would be good to get more particular articles listed. Rcrice 18:21, 30 November 2006 (UTC) Robin Rice


Some thoughts about the entry, though I haven't yet composed any text to bring these about! One is that I think that some of the "introduction" concern could be met by a simplest-terms, lay definition of dataset. Starting with social science data, the concept of using numbers to represent individual answers to surveys, and compiling all those numbers in a computer file to be analyzed with statistical software, is not something that's obvious if one is not involved in the social sciences. Expanding the definition to include all datasets, while keeping it at a simplest-terms level for a lay reader... well, that's an even broader challenge. Which brings me to the additional thought: how broad should the data library entry be? Is it about social science data libraries, or some category that's broader than that? Being specific on that question could help, I think, in simplifying the dataset definition question. -- Joanne Juhnke


Hi Joanne - Data set is actually another Wikipedia entry. It has been linked under Services, but you'll see that it's a pretty unsatisfactory definition and is now designated as disputed. I added a social science-type definition in the talk section, but haven't tried to edit the entry at all. Someone has linked the first occurrence of dataset in our entry to an LIS reference book definition, which is better. It would be good to have an authoritative definition from our field, perhaps from the ICPSR summer school data library course? Regarding your question about the scope of the data library entry - I think what is written under What is a Data Library is fairly complete and not necessarily limited to the social sciences. Sometimes the term is used for things that are not organisations, such as Stata data library, but these are more like software code libraries, which are not libraries at all in the usual sense. It would be good to get to the point where we have satisfied the wiki editors that there is sufficient context and evidence of its importance - don't know what that will take! I hope they are monitoring progress. Rcrice 14:08, 3 December 2006 (UTC) Robin Rice


From an ARL press release, 11/14/2006, "NSF and ARL Team on Groundbreaking Workshop on Digital Data Stewardship: Final Report Now Available" (http://www.arl.org/arl/pr/standtesttimesdsc.html):

"Research and education communities are straining under a deluge of data," said Berman. "The preservation of this data and its transformation into useful and usable information is critical for new discovery in virtually every community...The final workshop report, 'To Stand the Test of Time: Long-Term Stewardship of Digital Data Sets in Science and Engineering' (Washington, DC: ARL, 2006) provides a wealth of information on the issues of digital preservation and a sourcebook of related reports. The report is now available on the ARL Web site (http://www.arl.org/info/frn/other/ottoc.html)." (emphasis added)

Also, from Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century (Pre-publication Draft Approved by the National Science Board May 26, 2005, subject to final editorial changes) (http://www.nsf.gov/nsb/documents/2005/LLDDC_report.pdf):

"It is exceedingly rare that fundamentally new approaches to research and education arise. Information technology has ushered in such a fundamental change. Digital data collections are at the heart of this change. They enable analysis at unprecedented levels of accuracy and sophistication and provide novel insights through innovative information integration. Through their very size and complexity, such digital collections provide new phenomena for study. At the same time, such collections are a powerful force for inclusion, removing barriers to participation at all ages and levels of education...


• Long-lived digital data collections are powerful catalysts for progress and for democratization of science and education. Proper stewardship of research requires effective policy in order to maximize their potential.
• The need for digital collections is increasing rapidly, driven by the exponential increase in the volume of digital information..."

--Michele Hayslett (NCSU), 9:41, 5 Dec 2006.