Google Search Appliance
The Google Search Appliance is a rack-mounted device that indexes documents. It can be integrated into an intranet, document management system or web site, and gives end users a Google Search-like interface for retrieving results. Its operating system is based on CentOS. The software is produced by Google and the hardware is manufactured by Dell; current generations are based on Dell's PowerEdge R710.
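As an illustration of that integration, a web site or intranet portal typically sends an HTTP GET request to the appliance's `/search` path and then renders the returned results. The sketch below builds such a request URL. The parameter names (`q`, `site`, `client`, `output`, `num`, `start`, `filter`) follow the appliance's search protocol as best understood here. The host name `gsa.example.internal` and the helper `build_search_url` are hypothetical, so treat this as a sketch rather than a reference client.

```python
from urllib.parse import urlencode, urlunsplit


def build_search_url(host: str, query: str,
                     collection: str = "default_collection",
                     frontend: str = "default_frontend",
                     num: int = 10, start: int = 0,
                     filter_duplicates: bool = True) -> str:
    """Build a GET URL for an appliance's /search path (hypothetical helper)."""
    params = {
        "q": query,               # the user's search terms
        "site": collection,       # which collection of indexed documents to search
        "client": frontend,       # which front end (result presentation settings)
        "output": "xml_no_dtd",   # raw XML results without a DTD reference
        "num": num,               # number of results per page
        "start": start,           # zero-based offset of the first result
        "filter": 1 if filter_duplicates else 0,  # 1 = collapse near-duplicates
    }
    # urlencode escapes spaces as "+" and keeps the dict's insertion order.
    return urlunsplit(("http", host, "/search", urlencode(params), ""))


url = build_search_url("gsa.example.internal", "cell phone")
# url == "http://gsa.example.internal/search?q=cell+phone&site=default_collection"
#        "&client=default_frontend&output=xml_no_dtd&num=10&start=0&filter=1"
```

Passing `filter_duplicates=False` sends `filter=0`, which asks the appliance to return near-duplicate results instead of grouping them. Front-end settings, such as the result stylesheet, are selected by the `client` value.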
The device is supplied in two models: a 2U model (GB-7007) that can index up to 10 million documents, and a 5U model (GB-9009, a 2U unit plus 3U of storage) that can index up to 30 million documents. Later software versions allow multiple appliances to be connected so that "millions or billions" of documents can be searched. The appliance is sold under a license that starts with a two-year contract covering maintenance, support and software updates.
The Google Search Appliance contains Google search technologies and a means of configuring and customizing the appliance. The appliance also comes with a T-shirt.
Other features include:
- Support for Google Analytics and Google Sitemaps functionality
- Search capabilities that cover web content, other file types (e.g. HTML, PDF, Microsoft Office documents), databases (Oracle, MySQL, Microsoft SQL Server, IBM DB2, Sybase) and content management systems (EMC Documentum, FileNet, Open Text LiveLink, Microsoft SharePoint)
- Indexing (crawling) of searchable content, configured by specifying the URLs to crawl. Patterns can also be specified to limit the content that is indexed, and search can be customized using the OneBox API
- Result sets displayed with a Google-like appearance. The default behavior can be customized by using XSL Transformations
- Keywords (KeyMatches) that return specific results when particular keywords are searched. For example, associating “cell phone” with http://SampleCellProvider.com makes that link appear at the top of the results whenever someone searches for “cell phone”, regardless of where it would normally appear in the result set
- Synonyms that suggest alternative search terms. For example, when a user types “cell phone”, the search adds suggestions such as “mobile phone” to the result set
- Cached results, where each result item includes a "cached" link. Clicking it displays an HTML version of the page or document, so the original document does not need to be opened
- Statistics in result sets, including the number of results returned, search duration, document title, document URL and date modified.
- Highlighted search terms, which show search hits and let users see words in context without having to open documents.
- Grouping similar results to hide duplicates.
- Specifying document types in result sets
- Sorting of result sets by date or relevance
- Multiple appliances can be linked together to scale to billions of documents.
- Physical hardware can be distributed across multiple locations.
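To make the keyword and synonym behavior concrete, here is a minimal, hypothetical model of how a front end might apply them to a query: a keyword table (`keymatches`) pins chosen URLs above the normal results, and a synonym table adds suggested alternative queries. The `FrontEnd` class, its fields and the `serve` method are invented for illustration and are not the appliance's configuration format or API. The sketch also assumes exact, case-insensitive matching of the whole query only.

```python
from dataclasses import dataclass, field


@dataclass
class FrontEnd:
    """Hypothetical model of keyword pinning and synonym suggestions.

    Illustration only; not the appliance's configuration format or API.
    """

    # Lower-case query phrase -> URLs pinned above the normal results.
    keymatches: dict[str, list[str]] = field(default_factory=dict)
    # Lower-case query phrase -> alternative queries suggested to the user.
    synonyms: dict[str, list[str]] = field(default_factory=dict)

    def serve(self, query: str, organic: list[str]) -> dict[str, list[str]]:
        # Assumption: only an exact, case-insensitive match of the whole query counts.
        key = query.strip().lower()
        return {
            "pinned": list(self.keymatches.get(key, [])),    # shown first, regardless of rank
            "results": list(organic),                        # normal relevance-ranked results
            "suggestions": list(self.synonyms.get(key, [])), # e.g. "mobile phone"
        }


fe = FrontEnd(
    keymatches={"cell phone": ["http://SampleCellProvider.com"]},
    synonyms={"cell phone": ["mobile phone"]},
)
page = fe.serve("Cell Phone", ["http://a.example/1", "http://b.example/2"])
# page["pinned"] == ["http://SampleCellProvider.com"]
# page["suggestions"] == ["mobile phone"]
```

Keeping pinned URLs in a separate list, instead of splicing them into the ranked results, mirrors the description of the pinned link appearing at the top no matter where it would normally rank.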
According to Google's web site, the appliance requires minimal support infrastructure and system administration staff: it “…doesn’t need a tech support baby-sitter. You simply plug it in, configure it, and let it run…”. The device also comes with a web-based administrative console for making configuration changes where needed. Further customization is possible through a Representational State Transfer (REST) API that allows tasks to be automated, and existing modules are also available for customization.
Software version 6.0 was released in June 2009. It runs on some hardware versions of the GB-1001 model (all units whose "Appliance ID" begins with "S5") and on all GB-7007 and GB-9009 models. New features in this release include:
- Enhanced, customizable relevancy tuning to bias results from certain nodes and collections.
- Administration APIs for .NET and Java programmers to automate tasks.
- Early binding to increase serving performance.
- Customizable SAML authentication and authorization.
- User-added results in search results.
- Search-as-you-type functionality.
- Query translation to 40 different languages.
- Replication of search results.
- Clustering of multiple GSAs using a new technology called (GSA)n, which makes it possible to index up to 1 billion documents.
The Google Search Appliance is sold in two versions, distinguished by the number of documents they can index: the G100, a 2U appliance, can index up to 20,000,000 documents, and the G500, a 5U appliance, can index up to 100,000,000 documents.
Google used to sell a 1U appliance (GB-1001) capable of indexing up to 5,000,000 documents, a half-rack cluster (GB-5005) of five 2U nodes capable of indexing up to 10,000,000 documents, and a full-rack cluster (GB-8008) of eight and later twelve nodes capable of indexing up to 30,000,000 documents. Some models were based on Dell PowerEdge 2950 2U rackmount servers.
The Google Mini was a smaller, lower-cost solution that let small and medium-sized businesses set up a search engine to index and search up to 300,000 documents. As part of Google's 2012 "spring cleaning", the Google Mini was discontinued as of July 31, 2012.
Google Search Appliance virtual edition for developers
For a brief period in 2008, Google offered a virtual version of the Google Search Appliance aimed at developers. The virtual edition could be downloaded free of charge and could index up to 50,000 documents. It was soon discontinued for unknown reasons.
The Google Search Appliance is available in the United States, Canada, Europe, Japan, parts of Asia, the Middle East, North Africa and South America. Customers in other regions can deploy the appliance at a location or data center in the US, Canada or Europe.
Although organizations have reported advantages from implementing Google search and Google Search Appliances, some business analysts have suggested that the appliances may introduce two risks: breaches of privacy legislation, and exposure of the organization to commercial security risks.