Data curation is the organization and integration of data collected from various sources. It involves annotation, publication, and presentation of the data so that its value is maintained over time and the data remains available for reuse and preservation. Data curation includes "all the processes needed for principled and controlled data creation, maintenance, and management, together with the capacity to add value to data". In science, data curation may refer to the process in which experts extract important information from scientific texts, such as research articles, and convert it into an electronic format, such as an entry in a biological database. The term is also used in the humanities, where the growing volume of cultural and scholarly data from digital humanities projects requires the expertise and analytical practices of data curation. In broad terms, curation means a range of activities and processes done to create, manage, maintain, and validate a component.
Definition and practice
According to the University of Illinois' Graduate School of Library and Information Science, "Data curation is the active and on-going management of data through its lifecycle of interest and usefulness to scholarship, science, and education; curation activities enable data discovery and retrieval, maintain quality, add value, and provide for re-use over time."
Deep background on data libraries appeared in a 1982 issue of the Illinois journal Library Trends. For historical background on the data archive movement, see "Social Scientific Information Needs for Numeric Data: The Evolution of the International Data Archive Infrastructure."
The term is sometimes used in the context of biological databases, where specific biological information is first obtained from a range of research articles and then stored within a specific category of a database. For instance, information about antidepressant drugs can be obtained from various sources and, after checking whether it is already available in a database, saved under the antidepressant category of a drug database. Enterprises also use data curation within their operational and strategic processes to ensure data quality and accuracy.
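The biological-database workflow described above (extract a fact, check whether it is already recorded, then file it under a category) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Record` and `CuratedDatabase` classes, the `"antidepressant"` category key, and the `article-A`-style source labels are all hypothetical and do not correspond to any real curation system or publication.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """One curated fact extracted from a source (hypothetical structure)."""
    name: str    # e.g. a drug name
    source: str  # placeholder label for the article it was extracted from


@dataclass
class CuratedDatabase:
    """Hypothetical in-memory database that groups records by category."""
    categories: dict[str, dict[str, Record]] = field(default_factory=dict)

    def contains(self, category: str, name: str) -> bool:
        # Compare names case-insensitively to catch trivial duplicates.
        return name.lower() in self.categories.get(category, {})

    def add(self, category: str, record: Record) -> bool:
        """File a record under a category; return False if already present."""
        if self.contains(category, record.name):
            return False
        self.categories.setdefault(category, {})[record.name.lower()] = record
        return True


db = CuratedDatabase()
extracted = [
    Record("Fluoxetine", "article-A"),
    Record("Sertraline", "article-B"),
    Record("fluoxetine", "article-C"),  # same drug reported by another source
]
# Each entry is True if the record was newly filed, False if it was a duplicate.
added = [db.add("antidepressant", r) for r in extracted]
```

In practice a curator would also validate each extracted fact and keep provenance for every source rather than only the first one; the sketch keeps just the duplicate check and the category filing that the paragraph describes.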
Projects and studies
The Dissemination Information Packages (DIPs) for Information Reuse (DIPIR) project is studying research data produced and used by quantitative social scientists, archaeologists, and zoologists. The intended audience is researchers who use secondary data, as well as the digital curators, digital repository managers, data center staff, and others who collect, manage, and store digital information.
See also
- Data archaeology
- Data degradation
- Data format management
- Data governance
- Data management
- Data stewardship
- Data wrangling, low-level activities to parse and reformat data
- Informationist, an individual with extensive industry expertise, close familiarity with organizational structures and processes, deep domain-level information mastery, and technical savvy in information systems
References
- Renée J. Miller, “Big Data Curation,” in 20th International Conference on Management of Data (COMAD) 2014, Hyderabad, India, December 17-19, 2014
- BioCreative Glossary at http://biocreative.sourceforge.net/biocreative_glossary.html
- "An Introduction to Humanities Data Curation" by Julia Flanders and Trevor Muñoz http://guide.dhcuration.org/intro/
- Pilin Glossary at http://www.pilin.net.au/Project_Documents/Glossary.htm
- Cragin, Melissa; Heidorn, P. Bryan; Palmer, Carole L.; Smith, Linda C. (2007). "An Educational Program on Data Curation". ALA Science & Technology Section Conference. Retrieved 7 October 2013.
- Heim, Kathleen M. (editor), Library Trends 30 (3) Winter 1982: Data Libraries for the Social Sciences. Graduate School of Library and Information Science. University of Illinois at Urbana-Champaign.
- Kathleen M. Heim, "Social Scientific Information Needs for Numeric Data: The Evolution of the International Data Archive Infrastructure." in Collection Management 9 (Spring 1987): 1-53.
- E. Curry, A. Freitas, and S. O’Riáin, “The Role of Community-Driven Data Curation for Enterprises,” in Linking Enterprise Data, D. Wood, Ed. Boston, MA: Springer US, 2010, pp. 25-47. ISBN 978-1-4419-7664-2
- Dissemination Information Packages for Information Reuse (DIPIR) project http://www.oclc.org/research/themes/user-studies/dipir.html
External links
- Curation of ecological and environmental data: DataONE
- Data management tools and services spanning multiple scientific disciplines: DataConservancy