List of SIMILE projects


The following is a list of SIMILE projects.

The SIMILE tools assist in the storage, querying, transformation and mapping of very large collections of RDF data. The tools developed within SIMILE are meant to allow people who are not Semantic Web developers to create ontologies which describe their specialized metadata, create RDF and convert other types of metadata into RDF. These open source tools are designed to be scalable and provide for cross-community sharing of metadata at low cost.


Longwell

Longwell is a faceted browser that lets the user visualize and browse any RDF data set and quickly build a user-friendly web site from the RDF data without writing any RDF code. Facets are metadata fields considered important for a given data set. In the default configuration, the facets are listed along the right-hand side of the page; clicking on a facet restricts the data and refines the remaining facets to match the data retrieved. Longwell then displays, on the left-hand side of the page, only the subset of the data that meets those restrictions. Previously selected restrictions can be removed, which broadens the subset of items displayed.
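The restrict-and-refine behaviour of a faceted browser can be illustrated with a minimal Python sketch. It is not Longwell's code: the items, the facet names and the helper functions `restrict` and `facet_counts` are all hypothetical, and plain dictionaries stand in for RDF resources.

```python
from collections import Counter

# Hypothetical items standing in for RDF resources; each maps a
# property name (a candidate facet) to a single value.
ITEMS = [
    {"title": "Map of Boston", "type": "map", "creator": "Smith"},
    {"title": "Harbor photo", "type": "photo", "creator": "Smith"},
    {"title": "Map of Paris", "type": "map", "creator": "Dupont"},
]
FACETS = ["type", "creator"]  # fields chosen as facets (an assumption)


def restrict(items, restrictions):
    """Keep only the items that satisfy every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in restrictions.items())
    ]


def facet_counts(items, facets):
    """Count how many of the given items carry each value of each facet."""
    return {
        facet: Counter(item[facet] for item in items if facet in item)
        for facet in facets
    }


# Selecting type=map narrows the result set and refines the other facets.
selected = {"type": "map"}
subset = restrict(ITEMS, selected)
counts = facet_counts(subset, FACETS)
print([item["title"] for item in subset])
print(dict(counts["creator"]))

# Removing the restriction broadens the subset again.
selected.pop("type")
print(len(restrict(ITEMS, selected)))
```

Recomputing the facet counts on the restricted subset is what makes the facet list shrink or grow as restrictions are added or removed.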

Piggy Bank

Piggy Bank is a Firefox extension that enables the user to collect information from the Web, save it for future use, tag it with keywords, search and browse the collected information, retrieve saved information, share it, and install screen scrapers. Piggy Bank gathers RDF data where it is available; where it is not, it generates RDF from HTML by using screen scrapers. This incremental approach to realizing the Semantic Web vision allows users to save and tag information gathered from web pages without having to cut, paste and label the various products of their browsing. By clicking on the keyword they have used to tag a particular type of item, users can view all of those items together within their browser, without having to open other applications. Users can also deposit saved data in the Semantic Bank, where other users can browse it and add their own contributions. This pooling of keywords underlies services such as Flickr, where communities can collaborate to build a taxonomy for shared data. These taxonomies, which emerge as information is accumulated, are known as folksonomies.
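How pooled keywords give rise to a shared index can be shown with a small Python sketch. It does not reflect how Piggy Bank or the Semantic Bank store data; the users, the placeholder URLs and the function `build_tag_index` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical saved items: (user, URL, tags). The URLs are placeholders.
SAVED = [
    ("alice", "http://example.org/recipe/1", {"cooking", "dinner"}),
    ("bob", "http://example.org/recipe/2", {"cooking"}),
    ("bob", "http://example.org/paper/9", {"rdf"}),
]


def build_tag_index(saved):
    """Map each tag to the set of URLs that any user filed under it."""
    index = defaultdict(set)
    for _user, url, tags in saved:
        for tag in tags:
            index[tag].add(url)
    return index


index = build_tag_index(SAVED)

# Clicking a keyword corresponds to looking up every item tagged with it.
print(sorted(index["cooking"]))

# The set of tags in use is the vocabulary that emerges from the pooled data.
print(sorted(index))
```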


Solvent

Solvent is a Firefox extension that enables the user to write screen scrapers for Piggy Bank.


Gadget

Gadget is an XML inspector which enables the user to condense large amounts of well-formed XML data.


Welkin

Welkin is a graph-based RDF visualizer. It graphs RDF data sets, allowing the user to visualize the global shape and clustering characteristics of the data, which can aid them in mentally modeling it, seeing how it connects and identifying mappings between the set and possible ontologies. A particular data cluster which stands out when graphed might well be missed when browsed at closer range.


Fresnel

Fresnel is a vocabulary for specifying how RDF graphs are presented. Fresnel addresses the problem that currently, each RDF browser and visualization tool decides, on an ad hoc basis, what information in an RDF graph is presented and how to present it. Fresnel uses the concepts of lenses and formats. Lenses determine which properties are displayed and how they are ordered. Formats control how resources and properties are presented.
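The division of labour between lenses and formats can be sketched in Python. This is only an analogy: Fresnel itself is an RDF vocabulary, and the data structures `LENS` and `FORMATS`, the function `render` and the sample resource below are hypothetical.

```python
# A resource described as property -> value (a stand-in for RDF statements).
PERSON = {
    "name": "Ada Lovelace",
    "homepage": "http://example.org/ada",
    "birthYear": "1815",
    "internalId": "x-42",
}

# "Lens": which properties are shown, and in which order.
LENS = ["name", "birthYear", "homepage"]

# "Format": how each property is presented (a label and a value template).
FORMATS = {
    "name": ("Name", "{}"),
    "birthYear": ("Born", "{}"),
    "homepage": ("Homepage", "<{}>"),
}


def render(resource, lens, formats):
    """Apply the lens (selection, order), then the formats (presentation)."""
    lines = []
    for prop in lens:
        if prop not in resource:
            continue  # the lens may name properties this resource lacks
        label, template = formats.get(prop, (prop, "{}"))
        lines.append(f"{label}: {template.format(resource[prop])}")
    return lines


rendered = render(PERSON, LENS, FORMATS)
print("\n".join(rendered))
```

Because `internalId` is absent from the lens, it is never displayed, whatever the formats say. Keeping selection and presentation separate means either one can be changed without touching the other.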


Timeline

Timeline is a tool for visualizing events over time. It can be populated by pointing it at an XML file.
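As a rough illustration of reading events from an XML file, the following Python sketch parses a small document and orders the events by start time. The element and attribute names (`data`, `event`, `start`, `title`) and the ISO date format are assumptions made for this example, not a statement of the schema Timeline expects.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical event file contents; the schema is assumed for illustration.
EVENTS_XML = """
<data>
  <event start="2006-05-28T09:00:00" title="Workshop opens">Registration</event>
  <event start="2006-05-27T18:00:00" title="Reception">Evening before</event>
</data>
"""


def load_events(xml_text):
    """Parse the events and return them sorted by start time."""
    root = ET.fromstring(xml_text.strip())
    events = []
    for node in root.iter("event"):
        events.append({
            "start": datetime.fromisoformat(node.get("start")),
            "title": node.get("title"),
            "description": (node.text or "").strip(),
        })
    return sorted(events, key=lambda event: event["start"])


events = load_events(EVENTS_XML)
for event in events:
    print(event["start"].isoformat(), event["title"])
```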


Exhibit

Exhibit is a technology that enables developers to provide browsing of faceted classifications in a web browser.


Referee

Referee is a program that crawls the links that point to its user's pages. It extracts metadata from those pages and the text around the links that point to its user's pages, converting it, if need be, into RDF. Referee distinguishes between the pages that refer to the user's pages and the comments, meaning the text immediately surrounding each link. It generates a data graph, which lets it show, for example, that exactly the same comment about the user's pages appears on more than one page (the page being the container of the comment). A page can have more than one comment, and a comment can appear on more than one page. This can be represented in a data graph but not in a data tree, such as the one generated by the XML data model.
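The many-to-many relationship between pages and comments can be made concrete with a short Python sketch. It is not Referee's implementation; the observations and the function `build_graph` are hypothetical, and two dictionaries of sets stand in for the data graph.

```python
from collections import defaultdict

# Hypothetical (referring page, comment text) pairs found by a crawl.
OBSERVATIONS = [
    ("http://example.org/blog/a", "A useful RDF tutorial"),
    ("http://example.org/blog/b", "A useful RDF tutorial"),
    ("http://example.org/blog/a", "See also the slides"),
]


def build_graph(observations):
    """Index the page-comment edges in both directions."""
    comments_on_page = defaultdict(set)
    pages_with_comment = defaultdict(set)
    for page, comment in observations:
        comments_on_page[page].add(comment)
        pages_with_comment[comment].add(page)
    return comments_on_page, pages_with_comment


comments_on_page, pages_with_comment = build_graph(OBSERVATIONS)

# One page holds several comments ...
print(len(comments_on_page["http://example.org/blog/a"]))

# ... and one comment is shared by several pages.
print(sorted(pages_with_comment["A useful RDF tutorial"]))
```

In a tree each comment would have exactly one parent page, so the shared comment would have to be duplicated under each page; the graph keeps it as a single node with two edges.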


RDFizers

The RDFizer project is a directory of tools for converting various data formats into RDF. MIT Libraries provides a home for some of these tools. RDFizers are a group of tools that allow existing data to be transformed into an RDF representation. Given a database of interest, these tools can often convert the data into RDF without human intervention when the data formats are highly structured, first determining what ontology to use to express the information. Where semantic relationships are implicit, the RDFizers will not be as successful without human input. The SIMILE project has built RDFizers that convert from the following formats:

  • JPEG: Joint Photographic Experts Group (digital photo metadata).
  • MARC: United States Library of Congress MAchine-Readable Cataloging of bibliographic data.
  • MODS: Metadata Object Description Schema for bibliographic element sets.
  • OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting.
  • OCW: OpenCourseWare.
  • Email
  • BibTeX: a tool for formatting lists of references, usually associated with LaTeX documents.
  • Flat
  • Weather
  • Java: an object-oriented application programming language.
  • Javadoc: a tool for generating API documentation in HTML format from Java source code.
  • Subversion (SVN): a software revision control system.
  • Random
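To give a sense of what converting a highly structured record into RDF involves, here is a minimal Python sketch that turns a bibliographic record into N-Triples statements. It is not taken from any SIMILE RDFizer: the record layout, the base URI `http://example.org/bib/` and the functions `escape` and `to_ntriples` are assumptions, and the choice of the Dublin Core element namespace as the target ontology is made by hand here.

```python
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace

# A hypothetical, already-parsed bibliographic record.
RECORD = {
    "id": "knuth1984",
    "title": "The TeXbook",
    "creator": "Donald E. Knuth",
    "date": "1984",
}


def escape(value):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(record, base="http://example.org/bib/"):
    """Emit one triple per known field, using the record id as the subject."""
    subject = f"<{base}{record['id']}>"
    triples = []
    for field in ("title", "creator", "date"):
        if field in record:
            triples.append(
                f'{subject} <{DC}{field}> "{escape(record[field])}" .'
            )
    return triples


triples = to_ntriples(RECORD)
print("\n".join(triples))
```

The mapping is mechanical only because each field already has an obvious counterpart in the target vocabulary; where the meaning of a field is implicit, a person has to supply that mapping.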


Crowbar

Crowbar is a web scraping environment based on a server-side, headless, Mozilla-based browser. It is a research prototype for investigating how to run Piggy Bank JavaScript scrapers from the command line and thus automate web site scraping.