Lucene Geographic and Temporal (LGTE) is an information retrieval tool developed at the Technical University of Lisbon. It can be used as a search engine or, for research purposes, as an evaluation system for information retrieval techniques. The first implementation powered by LGTE was the search engine of DIGMAP, a project co-funded by the community programme eContentplus between 2006 and 2008, which aimed to provide web services over old digitized maps held by a group of partners across Europe, including several national libraries.
LGTE is written in the Java programming language around the Lucene library for full-text search, and introduces several extensions for dealing with geographical and temporal information. The package also includes utilities for information retrieval evaluation, such as classes for handling CLEF/TREC (Cross Language Evaluation Forum/Text Retrieval Conference) topics and document collections.
Technically, LGTE is a layer on top of Lucene that provides an extended Lucene API integrating several services, including snippet generation, query expansion, and others. LGTE also makes it possible to implement new probabilistic retrieval models. The API depends on a set of modifications at the Lucene level, originally created by researchers at the University of Amsterdam in a software tool named Lucene-lm, developed by the Information and Language Processing Systems (ILPS) group. The tool has been tested successfully with the Okapi BM25 model and a multinomial language model, and it also includes divergence from randomness models.
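As a rough illustration of the kind of probabilistic ranking function mentioned above, the following sketch implements the textbook Okapi BM25 formula in plain Java. It is not code from LGTE or Lucene-lm: the class name `Bm25Sketch`, its method names, the non-negative IDF variant, and the parameter values k1 = 1.2 and b = 0.75 are all assumptions chosen for the example.

```java
import java.util.List;
import java.util.Map;

// Illustrative only: a textbook Okapi BM25 scorer, not LGTE or Lucene-lm code.
class Bm25Sketch {
    static final double K1 = 1.2;  // term-frequency saturation (assumed default)
    static final double B = 0.75;  // document-length normalization (assumed default)

    // Inverse document frequency; the 1.0 inside the log keeps it non-negative.
    static double idf(long numDocs, long docFreq) {
        return Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    // BM25 contribution of a single query term to one document.
    static double termScore(int tf, int docLen, double avgDocLen,
                            long numDocs, long docFreq) {
        if (tf == 0) {
            return 0.0;
        }
        double norm = K1 * (1.0 - B + B * docLen / avgDocLen);
        return idf(numDocs, docFreq) * (tf * (K1 + 1.0)) / (tf + norm);
    }

    // Sum of the per-term contributions for a whole query.
    static double score(List<String> queryTerms, Map<String, Integer> docTf,
                        int docLen, double avgDocLen, long numDocs,
                        Map<String, Long> docFreqs) {
        double total = 0.0;
        for (String term : queryTerms) {
            int tf = docTf.getOrDefault(term, 0);
            long df = docFreqs.getOrDefault(term, 0L);
            total += termScore(tf, docLen, avgDocLen, numDocs, df);
        }
        return total;
    }

    public static void main(String[] args) {
        // Toy statistics for one document in a 1000-document collection.
        Map<String, Integer> docTf = Map.of("map", 3, "lisbon", 1);
        Map<String, Long> docFreqs = Map.of("map", 50L, "lisbon", 5L);
        double s = score(List.of("old", "map", "lisbon"), docTf,
                         120, 100.0, 1000L, docFreqs);
        System.out.println("BM25 score: " + s);
    }
}
```

The score grows with term frequency but saturates, and longer-than-average documents are penalized through the `B` parameter; rare terms contribute more through the IDF factor.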
LGTE 1.1.9 and later versions can also isolate index fields in separate index folders. Another recent feature is the configuration of hierarchic indexes using foreign key fields. This makes it possible, for example, to compute a score from the text of a sentence combined with the overall score of the page that contains it.
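The idea of combining a sentence-level score with the score of its parent page can be sketched as follows. This is plain Java rather than the LGTE API, and every name in it (`HierarchicScoreSketch`, `SentenceHit`, `CombinedHit`, `combine`, `sentenceWeight`) is hypothetical. The source does not say how LGTE merges the two levels, so the linear interpolation used here is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: plain Java, not the LGTE API. All names are hypothetical.
class HierarchicScoreSketch {
    // A sentence-level hit; pageId plays the role of the foreign key field.
    static final class SentenceHit {
        final String sentenceId;
        final String pageId;
        final double score;

        SentenceHit(String sentenceId, String pageId, double score) {
            this.sentenceId = sentenceId;
            this.pageId = pageId;
            this.score = score;
        }
    }

    // A sentence hit whose score has been merged with its parent page score.
    static final class CombinedHit {
        final String sentenceId;
        final String pageId;
        final double score;

        CombinedHit(String sentenceId, String pageId, double score) {
            this.sentenceId = sentenceId;
            this.pageId = pageId;
            this.score = score;
        }
    }

    // Assumption: the two levels are merged by linear interpolation.
    static List<CombinedHit> combine(List<SentenceHit> sentences,
                                     Map<String, Double> pageScores,
                                     double sentenceWeight) {
        List<CombinedHit> out = new ArrayList<>();
        for (SentenceHit hit : sentences) {
            // Follow the foreign key to the parent page; unknown pages score 0.
            double pageScore = pageScores.getOrDefault(hit.pageId, 0.0);
            double combined = sentenceWeight * hit.score
                    + (1.0 - sentenceWeight) * pageScore;
            out.add(new CombinedHit(hit.sentenceId, hit.pageId, combined));
        }
        // Highest combined score first.
        out.sort((a, b) -> Double.compare(b.score, a.score));
        return out;
    }

    public static void main(String[] args) {
        List<SentenceHit> sentences = new ArrayList<>();
        sentences.add(new SentenceHit("s1", "p1", 0.9));
        sentences.add(new SentenceHit("s2", "p2", 0.8));
        Map<String, Double> pageScores = new HashMap<>();
        pageScores.put("p1", 0.1);
        pageScores.put("p2", 0.9);
        for (CombinedHit hit : combine(sentences, pageScores, 0.5)) {
            System.out.println(hit.sentenceId + " (page " + hit.pageId + "): " + hit.score);
        }
    }
}
```

With equal weights, a sentence that matches slightly worse but sits on a strongly matching page can outrank a better-matching sentence on a weak page, which is the effect the hierarchic configuration is meant to enable.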
- Provides isolated index fields stored in separate folders.
- Provides hierarchic indexes through foreign key fields.
- Provides classes to parse documents using Yahoo PlaceMaker.
- Provides a simple and effective abstraction layer on top of Lucene.
- Supports integrated retrieval and ranking based on thematic, temporal and geographical aspects.
- Supports the standard Lucene retrieval model, as well as more advanced probabilistic retrieval approaches.
- Supports Rocchio query expansion.
- Provides a framework for IR evaluation experiments (e.g. handling CLEF/TREC topics).
- Includes a Java alternative to the trec_eval tool, capable of performing significance tests over pairs of runs.
- Includes a simple test application for searching over the Braun Corpus or the Cranfield Corpus.
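Rocchio query expansion, listed among the features above, can be illustrated with a short sketch of the textbook formula over term-weight maps. This is not the LGTE implementation: the class `RocchioSketch`, the method names, and the weights used in `main` are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: textbook Rocchio feedback, not the LGTE implementation.
class RocchioSketch {
    // q' = alpha * q + beta * centroid(relevant) - gamma * centroid(nonRelevant)
    static Map<String, Double> expand(Map<String, Double> query,
                                      List<Map<String, Double>> relevant,
                                      List<Map<String, Double>> nonRelevant,
                                      double alpha, double beta, double gamma) {
        Map<String, Double> expanded = new HashMap<>();
        query.forEach((term, w) -> expanded.merge(term, alpha * w, Double::sum));
        addCentroid(expanded, relevant, beta);
        addCentroid(expanded, nonRelevant, -gamma);
        // Terms whose weight is not positive are dropped from the new query.
        expanded.values().removeIf(w -> w <= 0.0);
        return expanded;
    }

    // Adds factor * (average document vector) to the accumulator.
    private static void addCentroid(Map<String, Double> acc,
                                    List<Map<String, Double>> docs,
                                    double factor) {
        if (docs.isEmpty()) {
            return;
        }
        for (Map<String, Double> doc : docs) {
            doc.forEach((term, w) ->
                    acc.merge(term, factor * w / docs.size(), Double::sum));
        }
    }

    public static void main(String[] args) {
        Map<String, Double> query = Map.of("map", 1.0);
        // Pseudo-relevance feedback: treat the top-ranked documents as relevant.
        List<Map<String, Double>> topDocs = List.of(
                Map.of("map", 1.0, "lisbon", 2.0),
                Map.of("lisbon", 2.0, "old", 1.0));
        Map<String, Double> expanded =
                expand(query, topDocs, List.of(), 1.0, 0.5, 0.0);
        System.out.println(expanded);
    }
}
```

In a pseudo-relevance feedback setting no negative examples are available, so `gamma` is typically set to zero and only the top-ranked documents of an initial search contribute expansion terms.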
- Jorge Machado, Bruno Martins, José Borbinha, Gilberto Pedrosa "LGTE: Sistema aberto de Recuperação de Informação Textual, Geográfica e Temporal", II Jornadas SASIG, Évora, 2-4 November 2009.
- Jorge Machado, Bruno Martins, José Borbinha "Experiments with N-Gram Prefixes on a Multinomial Language Model versus Lucene’s off-the-shelf ranking scheme and Rocchio Query Expansion (TEL@CLEF Monolingual Task)", European Conference of Digital Libraries/Cross Language Evaluation Forum, Corfu, Greece, 2009.
- Jorge Machado, Gilberto Pedrosa, José Borbinha "LGTE: Lucene Extensions for Geo-Temporal Information Retrieval", European Conference on Information Retrieval/Workshop for Geographic Information on the Internet, Toulouse, 2009.
- Jorge Machado, Gilberto Pedrosa, José Borbinha "User interface for a geo-temporal search service using DIGMAP components", in Springer LNCS proceedings of European Conference of Digital Libraries, Corfu, Greece, 2009.
- Jorge Machado, Gilberto Pedrosa, José Borbinha "Experiments on a Multinomial Language Model versus Lucene’s off-the-shelf ranking scheme and Rocchio Query Expansion (TEL@CLEF Monolingual Task)", European Conference of Digital Libraries/in Springer LNCS proceedings of Cross Language Evaluation Forum, Aarhus, 2008.