User:Daniel Mietchen/Talks/Naturalis hackathon 2014


Wikimedia projects, open science and biodiversity information

Open science


Sharing research with the world as soon as it is recorded.


What have others said on the topic?

I want publishers to publish my workflows. — Philip E. Bourne

...thinking outside the paper doesn't come naturally! --David De Roure

what I think we need in scholarship is the web, but editable — Peter Sefton

There is but one journal: The scientific literature. — Richard Gordon

Science is already a wiki if you look at it a certain way. It’s just a highly inefficient one — the incremental edits are made in papers instead of wikispace, and significant effort is expended to recapitulate existing knowledge in a paper in order to support the one to three new assertions made in any one paper. — John Wilbanks. Illustration: papers and wikispace.

What if everyone in the world were in your lab – a ‘hive mind’ of sorts, but composed of countless creative intellects rather than mindless worker ants, and one in which resources, reagents and effort could be shared, along with ideas, in a manner not dictated by institutional and geographical constraints? — Chris Patil and Vivian Siegel

Somewhere at the fringe of science, someone will start using wiki publishing for science publishing. — John Schmidt (2006)

Whenever danger exists of unnecessarily duplicating efforts to solve problems, ought not scientists try to discover whether the experiments have been performed elsewhere? Ought not all scientists be concerned about rapid publication and wide distribution of results, and even of experiments under way, so as to avoid waste? When a scientist in one field discovers evidence of methods which he cannot use but which may be useful in other fields, ought he not inform others about it? If a new and better technique has been discovered in one field, ought not scientists in other fields investigate its workability, or adaptability, in their fields? When a newly confirmed discovery in one field implies need for revising assumptions or conclusions in another field, do not scientists in the one field have a duty to publicize it and scientists in the other field a duty to hasten to inform themselves about it? — Archie J. Bahm

Better still, if you assert something said in another paper, sod the citation, transclude the relevant text, with a full electronic citation allowing you to verify it. — Christopher Gutteridge

The current Open Access model is provisioning for legacy genres and formats of scholarly communication. That's great for archival purposes, but this is not the next real destination for scholarly discourse. Why? Because consequential intellectual work takes place in myriad ways outside of traditional scholarly genres, that's why, and the digital realm is ready to capture, organize, value, and disseminate those other ways of generating knowledge. — Gideon Burton

The internet allows for a much more powerful system than the current journal system, much more powerful than even an open journal system.
Some things I'd like to see in a unified online open system
  • Hyperlinking between papers
  • Discussion threads for papers
  • Collaborative mark ups of papers, so that difficult papers can be communally dissected and fleshed out, or so that students can work through a paper and provide a mark up to ease the reading for other students
  • An ongoing wiki for every subfield, detailing current outstanding problems, papers to read to get up to speed, most recent progress, etc, as well as curating accepted knowledge. Wikis should also be able to be marked up by students, so that difficult material can be broken down and fleshed out for the sake of other students. — TheEzEzz

So, we had the idea that you do your systematic review before you do your research; you do your research, and then if you haven't changed much, you haven't really made a big impact, whereas if you've actually shifted things one way or the other and made it more precise then you have. — Elizabeth Wager

What I wonder is why professors don't curate [pages on] Wikipedia and add course materials and open access sections of textbooks, much of which they post online anyways. We aren't really seeing the potential that you would hope for with all of the Web 2.0 tools out there. We aren't seeing the academic community take advantage of them as much as other subsets of the community. — David Lipman

The Internet represents an opportunity to change this system, one which has created a 300-year-old, collective long-term memory, into something new and more efficient, perhaps adding in a current, collective short-term working memory at the same time. With new online tools, scientists could begin to share techniques, data and ideas online to the benefit of all parties, and the public at large. — Robert J. Simpson, paraphrasing Michael Nielsen

While scientists have gloried in the disruptive effect that the Web is having on publishers and libraries, with many fields strongly pushing open publication models, we are much more resistant to letting it be a disruptive force in the practice of our disciplines. — James Hendler

Technological revolutions are a privileged moment in which old customs and legitimations may be put under scrutiny. The world of academy should not risk at missing this opportunity to rethink about the fundamental aims and responsibilities of our profession. — Gloria Origgi

I’m feeling frustrated. What else can you feel when the system is broken, you know that system must change, but there is little incentive for those perpetuating the system to change it for the better. — Steven Bell

Wikipedia is probably the most robust Petri dish we have for actually studying the process of words and contributions, because it is auditable. — Peter Frishauf

How cool would it be to fork articles, a la Github. - Jason Priem

What can you do with what you know? — Dale Dougherty

Healthcare is one of the areas where open data will potentially take off soonest and have the biggest impact. — Tim O'Reilly

Reality today

  • Adélie penguins are identified and weighed each time they cross the automated weighbridge on their way to and from the sea. The data are recorded but not made public, not even in or alongside the "publication".[1]
  • Two videos providing the basis for an article[2] but not published along with it: a water droplet surviving an attempt to be cut by a knife, and a water droplet cut by a knife.
  • Inconsistent XML as a Barrier to Reuse of Open Access Content[3]

How things could be

A specification anyone can edit:

  1. Dynamics: Research is a process. The scientific journal of the future provides a platform for continuous and rapid publishing of workflows and other information pertaining to a research project, and for updating any such content by its original authors or collaboratively by relevant communities. Eventually, all scientific records should have a public version history or a public justification for not having one.
    The research cycle
    • example
      • Version of record
      • Updates automatically
      • Editable format (SVG)
      • Data and code on GitHub under open licenses
    • PLOS Computational Biology Topic Pages (list)
    • The workflows include writing of research documents, as piloted by the Biodiversity Data Journal.[4]
  2. Scope: Data come in many different formats. The scientific journal of the future interoperates with databases and ontologies by way of open standards and concentrates on the contextualization of knowledge newly acquired through research, without limiting its scope in terms of topic or methodology.
    Paratype of the centipede Eupolybothrus cavernicolus, a species described[5] with its full transcriptome, micro CT and DNA barcoding, in addition to a morphological description.
  3. Access: Free access to scientific knowledge, and permissions to re-use and re-purpose it, are an invaluable source for research, innovation and education. The scientific journal of the future provides legally and technically barrier-free access to its contents, along with options for re-use and re-purposing that are stated clearly for both humans and machines.
    This image of Xanthichthys ringens is sourced from an open-access scholarly article[6] licensed for re-use.
  4. Replicability: Open access to all relevant core elements of a publication facilitates the verification and subsequent re-use of published content. The scientific journal of the future requires the publication of detailed methodologies — including all data and code — that form the basis of any research project.
    NIH plans to enhance reproducibility[7]
  5. Review: The critical, transparent and impartial examination of information submitted by the professional community enhances the quality of publications. The scientific journal of the future supports post-publication peer review, and qualified reviews of submitted content shall always be made public.
  6. Presentation: Digitization opens up new opportunities to provide content, such as through semantic and multimedia enrichment. The scientific journal of the future adheres to open Web standards and creates a framework in which the technological possibilities of digital media can be exploited by authors, readers and machines alike, and content remains continuously linkable.
    Demo of interactive taxonomic paper from ZooKeys. Courtesy of Rod Page.
  7. Transparency: Disclosure of conflicts of interest creates transparency. The scientific journal of the future promotes transparency by requiring its editorial board, the editors and the authors to disclose both existing and potential conflicts of interest with respect to a publication and to make explicit their contributions to any publication.
  8. Sustainability: Resources are limited. Ecological considerations are reflected in the design and production of the scientific journal of the future.
  9. Flexibility: Innovation is stifled by inflexible rules. Exceptions to the above rules are possible if justified in public.
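The "public version history" asked for under Dynamics (item 1) already exists for wiki pages, and a small sketch may help show what that means in practice. The Python below builds a MediaWiki Action API request for a page's latest revisions and flattens a response into (timestamp, user, comment) rows. The endpoint, the function names and the sample response are assumptions made for illustration; the sample is hand-written in the shape of the API's default JSON output and is not real data.

```python
from urllib.parse import urlencode

# Assumption: English Wikipedia endpoint; any MediaWiki site exposes a similar api.php.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def revision_history_url(title, limit=5):
    """Build a URL asking the MediaWiki API for the latest revisions of a page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|comment",
        "rvlimit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def summarize_revisions(response):
    """Flatten a decoded JSON response into (timestamp, user, comment) tuples."""
    rows = []
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        for rev in page.get("revisions", []):
            rows.append((rev.get("timestamp"), rev.get("user"), rev.get("comment", "")))
    return rows


# Hypothetical, hand-written response in the shape the API returns by default.
SAMPLE_RESPONSE = {
    "query": {
        "pages": {
            "12345": {
                "title": "Example page",
                "revisions": [
                    {"revid": 2, "timestamp": "2014-03-18T10:00:00Z",
                     "user": "ExampleUserB", "comment": "add data link"},
                    {"revid": 1, "timestamp": "2014-03-17T09:00:00Z",
                     "user": "ExampleUserA", "comment": "first draft"},
                ],
            }
        }
    }
}

if __name__ == "__main__":
    print(revision_history_url("Open science"))
    for timestamp, user, comment in summarize_revisions(SAMPLE_RESPONSE):
        print(timestamp, user, comment)
```

Fetching the printed URL with any HTTP client and passing the decoded JSON to summarize_revisions would list who changed what and when, which is the kind of audit trail the specification asks of scientific records.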

Future of publishing


Wikimedia projects


Key policies

Open knowledge

Semantic approaches

MediaWiki API

Data dumps



See also wikidata:Wikidata:Tools/External tools.





  1. Lescroël, A. L.; Ballard, G.; Grémillet, D.; Authier, M.; Ainley, D. G. (2014). Descamps, Sébastien (ed.). "Antarctic Climate Change: Extreme Events Disrupt Plastic Phenotypic Response in Adélie Penguins". PLoS ONE. 9 (1): e85291. doi:10.1371/journal.pone.0085291. PMC 3906005. PMID 24489657.
  2. Yanashima, R.; García, A. A.; Aldridge, J.; Weiss, N.; Hayes, M. A.; Andrews, J. H. (2012). Docoslis, Aristides (ed.). "Cutting a Drop of Water Pinned by Wire Loops Using a Superhydrophobic Surface and Knife". PLoS ONE. 7 (9): e45893. doi:10.1371/journal.pone.0045893. PMC 3454355. PMID 23029297.
  3. Mietchen, D.; Maloney, C.; Moskopp, N. D. (2013). "Inconsistent XML as a Barrier to Reuse of Open Access Content". Journal Article Tag Suite Conference (JATS-Con) Proceedings 2013.
  4. Smith, V.; Georgiev, T.; Stoev, P.; Biserkov, J.; Miller, J.; Livermore, L.; Baker, E.; Mietchen, D.; Couvreur, T. L. P.; Mueller, G.; Dikow, T.; Helgen, K. M.; Frank, J. I.; Agosti, D.; Roberts, D.; Penev, L. (2013). "Beyond dead trees: integrating the scientific process in the Biodiversity Data Journal". Biodiversity Data Journal. 1: e995. doi:10.3897/BDJ.1.e995.
  5. Stoev, P.; Komerički, A.; Akkari, N.; Liu, S.; Zhou, X.; Weigand, A. M.; Hostens, J.; Hunter, C. I.; Edmunds, S. C.; Porco, D.; Zapparoli, M.; Georgiev, T.; Mietchen, D.; Roberts, D.; Faulwetter, S.; Smith, V.; Penev, L. (2013). "Eupolybothrus cavernicolus Komerički & Stoev sp. n. (Chilopoda: Lithobiomorpha: Lithobiidae): The first eukaryotic species description combining transcriptomic, DNA barcoding and micro-CT imaging data". Biodiversity Data Journal. 1: e1013. doi:10.3897/BDJ.1.e1013.
  6. Williams, J. T.; Carpenter, K. E.; Van Tassell, J. L.; Hoetjes, P.; Toller, W.; Etnoyer, P.; Smith, M. (2010). Gratwicke, Brian (ed.). "Biodiversity Assessment of the Fishes of Saba Bank Atoll, Netherlands Antilles". PLoS ONE. 5 (5): e10676. doi:10.1371/journal.pone.0010676. PMC 2873961. PMID 20505760. CC0.
  7. Collins, F. S.; Tabak, L. A. (2014). "Policy: NIH plans to enhance reproducibility". Nature. 505 (7485): 612. doi:10.1038/505612a. PMC 4058759. PMID 24482835.
  8. "Reviewer's Comments". Journal of Applied Behavior Analysis. 7 (3): 497–1453. 1974. doi:10.1901/jaba.1974.7-497b.
  9. Mietchen, D.; Keupp, H.; Manz, B.; Volke, F. (2005). "Non-invasive diagnostics in fossils - Magnetic Resonance Imaging of pathological belemnites". Biogeosciences. 2 (2): 133. doi:10.5194/bg-2-133-2005.


This talk was given on March 18, 2014 at the Data enrichment hackathon at Naturalis in Leiden.