Web address
Commercial? No
Type of site
Available in English
Launched 2012
Alexa rank 11,291,062 (December 2013)[1]
Current status Offline

SciDiver is an academic paper search engine for the physical sciences. The service currently maintains an index over arXiv, the preprint service for mathematics, physics, astronomy, computer science, quantitative finance and related disciplines; expansion to additional repositories is expected in the course of the site's continued development.


SciDiver makes more of the structured content of scientific literature searchable by retaining additional information from the source paper, including figures and tables, their relationship to associated captions, paper metadata, and section headings. The interface offers an option to search captions separately from the main text; relevant figures are then displayed directly in the browser. Individual hits of interest can be explored further to reveal other content from the paper, including the remaining figures and captions, the abstract text, and keyword information.

Pseudo Natural Language search

SciDiver offers a 'near natural language' search interface in which the '/' character can be used to indicate that two or more words should be semantically linked in the text.
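The exact query grammar is not documented here, so the following Python sketch is only an illustration of how such a query might be split into ordinary bag-of-words terms and '/'-linked groups. The names ParsedQuery and parse_query are hypothetical, and the rule that linked words are written without spaces around '/' (for example band/gap) is an assumption rather than SciDiver's confirmed syntax.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedQuery:
    """Hypothetical container: plain terms plus groups of linked terms."""
    terms: list[str] = field(default_factory=list)
    linked: list[tuple[str, ...]] = field(default_factory=list)


def parse_query(query: str) -> ParsedQuery:
    """Split a query on whitespace; tokens containing '/' become linked groups.

    Assumption: linked words are joined directly by '/', e.g. 'band/gap'.
    """
    parsed = ParsedQuery()
    for token in query.lower().split():
        # Drop empty pieces so stray slashes ('/', 'x/') do not create groups.
        parts = tuple(p for p in token.split("/") if p)
        if len(parts) >= 2:
            parsed.linked.append(parts)
        elif parts:
            parsed.terms.append(parts[0])
    return parsed


if __name__ == "__main__":
    q = parse_query("graphene band/gap measurement")
    print(q.terms)   # plain bag-of-words terms
    print(q.linked)  # groups of terms the user asked to be linked
```

Under these assumptions, unlinked words keep their usual bag-of-words meaning, while each linked group can be passed to a separate relation lookup.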


The SciDiver search engine was developed by Dr Andrew Guzman to explore how scientific information could be made available more quickly and accurately. Searching at the level of a scientific paper or its abstract is useful, but taking a user directly to the relevant sections or figures within a paper can save time. Natural language and semantic search engines already exist, but natural language queries can be difficult to compose and can make a query too specific. SciDiver therefore extends the familiar bag-of-words query interface with a semantic relationship designator, the '/' symbol, which allows a user to specify an explicit relationship between terms in the query. This extension to the search syntax makes the document ingestion process more expensive, because text has to be parsed and tagged with parts of speech before it is indexed. However, indexing only the semantically relevant relations between words keeps query times in line with those of standard inverted index search systems.
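The trade-off described above, more work at indexing time in exchange for cheap lookups at query time, can be illustrated with a toy relation index in Python. This is a sketch under stated assumptions and not SciDiver's implementation: a small stopword list stands in for real parsing and part-of-speech tagging, two words are treated as related simply because they share a sentence, and the names STOPWORDS, content_words, build_relation_index and lookup are hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations

# Stand-in for part-of-speech filtering: a tiny stopword list (an assumption).
STOPWORDS = {"the", "a", "an", "of", "in", "is", "are", "and", "to", "for", "with", "on"}


def content_words(sentence: str) -> list[str]:
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", sentence.lower()) if w not in STOPWORDS]


def build_relation_index(docs: dict[str, str]) -> dict[frozenset, set]:
    """Map each unordered pair of same-sentence content words to document ids.

    All of the expensive work (splitting, filtering, pairing) happens here,
    once per document, at indexing time.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for sentence in re.split(r"[.!?]+", text):
            words = sorted(set(content_words(sentence)))
            for a, b in combinations(words, 2):
                index[frozenset((a, b))].add(doc_id)
    return index


def lookup(index, a: str, b: str) -> set:
    """Return ids of documents in which a and b were indexed as related.

    At query time this is a single dictionary lookup, as in an inverted index.
    """
    return set(index.get(frozenset((a.lower(), b.lower())), set()))


if __name__ == "__main__":
    docs = {
        "p1": "The band gap of graphene is small. Measurements are noisy.",
        "p2": "The band plays music. A gap appears in the data.",
    }
    index = build_relation_index(docs)
    print(lookup(index, "band", "gap"))
```

In this example both documents contain the words "band" and "gap", but only the first has them in the same sentence, so a plain bag-of-words search would match both while the relation lookup matches only the first.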


  1. ^ "Site Info". Alexa Internet. Retrieved 2013-12-01.

External links