
Windows Search

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 221.128.147.172 (talk) at 11:08, 28 March 2008 (+ EFS support in WS4! and separated previous plans of WLSS and later WS4 announcement into separate paras.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Windows Search (known as Windows Desktop Search or WDS on Windows XP and Windows Server 2003) is an indexed desktop search platform released by Microsoft for the Windows operating system.[1] Windows Search for Windows Vista and Windows Server 2008 (also referred to as Instant Search)[2] is a successor of the Windows Indexing Service, a remnant of the Object File System feature of the Cairo project, which never materialized. Windows Search uses a different architecture and a new indexer compared to Indexing Service. For Windows XP, Windows Desktop Search is available as an add-on application.

Windows Search collectively refers to both Indexed Search on Windows Vista and WDS on Windows XP. They share a common architecture and indexing technology,[1] and are also API-compatible with one another.

Overview

Upon installation, Windows Search (and Windows Desktop Search) builds an index of the files on a user's hard drive. The initial creation of this index can take up to several hours, but this is a one-time event. Once the indexing is complete, Windows Search is able to use this index to return search results more rapidly than it could by searching through all the files on the computer in real time. Searches are performed not only on file names, but also on the contents of the file (provided a proper handler for the file type is installed) as well as the keywords, comments and metadata the file might be tagged with. For instance, searching the computer for The Beatles would return a list of the Beatles music on the computer, as well as any e-mails and documents that include the phrase "The Beatles" in their titles or contents. Windows Search also features word-wheeled search (or search-as-you-type). It begins searching as soon as characters are entered in the search box, and keeps refining and filtering the search results as more characters are typed in. As a result, the required files are often found before the full search text has been entered.
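The combination of an index and search-as-you-type can be illustrated with a toy inverted index. The sketch below is not how Windows Search is implemented; the class name `TinyIndex` and its methods are hypothetical, and the rule that only the last typed word is matched as a prefix is an assumption made for the example.

```python
from collections import defaultdict
import re


class TinyIndex:
    """Toy inverted index: maps each lowercased word to the items containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, item, text):
        # Index every word of the item's name, contents, or tags.
        for word in re.findall(r"\w+", text.lower()):
            self.postings[word].add(item)

    def search(self, typed):
        """Return items matching every typed term; the last term matches as a prefix."""
        terms = re.findall(r"\w+", typed.lower())
        if not terms:
            return set()
        results = None
        for position, term in enumerate(terms):
            if position == len(terms) - 1:
                # The word still being typed: accept any indexed word starting with it.
                matches = set()
                for word, items in self.postings.items():
                    if word.startswith(term):
                        matches |= items
            else:
                matches = set(self.postings.get(term, set()))
            results = matches if results is None else results & matches
        return results


index = TinyIndex()
index.add("help.mp3", "The Beatles Help")
index.add("notes.txt", "meeting notes about the beat of the drum")
index.add("mail-17", "Re: The Beatles tickets")
```

Calling `index.search("the bea")` matches all three items, and typing further to `"the beatl"` narrows the result to the two Beatles items, without rescanning any file contents.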

Windows Search supports IFilters, which are a set of interfaces that can be implemented for any file format. Once a file format has an associated IFilter, the IFilter is used to extract the text from files in that format.[3] Windows Search by default includes handlers for common file types, including Word documents, Excel spreadsheets, PowerPoint presentations, HTML documents, text files, MP3 and WMA music files, WMV, ASF and AVI videos, and JPEG, BMP and PNG images, among others.[4] It uses property handlers to handle metadata from file formats. A property handler needs a property description and a schema for the property for WDS to index the metadata.[5] Protocol handlers are used for indexing specific data stores. For example, files are accessed using the File System Protocol Handler, Outlook data stores using the Outlook Protocol Handler and the IE cache using the IE History/Cache Protocol Handler.[6] Network shares can be added to the index by installing the appropriate protocol handler.[7]
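The division of labour between text extraction and metadata extraction can be sketched as two registries keyed by file extension. This is only an analogy in Python: real IFilters and property handlers are COM components, and the names `register`, `extract`, `text_filters` and `property_handlers` are hypothetical.

```python
import os

# Hypothetical registries, keyed by lowercased file extension.
text_filters = {}       # plays the role of IFilters: bytes -> text
property_handlers = {}  # plays the role of property handlers: bytes -> dict


def register(extension, text_filter=None, property_handler=None):
    """Install a text filter and/or a property handler for one file extension."""
    if text_filter is not None:
        text_filters[extension] = text_filter
    if property_handler is not None:
        property_handlers[extension] = property_handler


def extract(filename, data):
    """Return (text, properties) using whichever handlers are installed."""
    ext = os.path.splitext(filename)[1].lower()
    text = text_filters[ext](data) if ext in text_filters else ""
    props = property_handlers[ext](data) if ext in property_handlers else {}
    return text, props


# Example registrations: plain text yields content, MP3 yields only metadata here.
register(".txt", text_filter=lambda data: data.decode("utf-8", errors="replace"))
register(".mp3", property_handler=lambda data: {"size_bytes": len(data)})
```

A file type with no registered handler still produces an entry, but with empty text and no properties, which mirrors the point that contents are searchable only when a proper handler is installed.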

Architecture

Windows Search is implemented as a Windows Service which implements the Windows Search runtime and APIs, as well as acting as host for the index stores and controlling the components. The most important component of Windows Search is the Indexer, which crawls the file system periodically and creates and maintains the index of the data. It achieves this using three processes:[8]

  1. SearchIndexer.exe, which hosts the indexes and the list of URIs that require indexing, as well as exposes the external APIs that other applications use to leverage the Windows Search features.
  2. SearchProtocolHost.exe, which hosts the protocol handler. It runs with the least permission required for the protocol handler. For example, when accessing the file system, it runs with the credentials of the system account, but when accessing network shares, it runs with the credentials of the user.
  3. SearchFilterHost.exe, which hosts the IFilter and property handlers to extract metadata and textual content. It is a low integrity process, which means that it does not have any permission to change the system settings. So, even if it encounters files with malicious content that manage to take over the process, they will not be able to change any system settings.

The Indexer consists of two components, the Gatherer and the Merger.[9] The Gatherer retrieves the list of URIs that need to be crawled and invokes the proper protocol handler to access the store that hosts each URI, then the proper property handler (to extract metadata) and IFilter (to extract the document text). Different indices are created during different runs; it is the job of the Merger to periodically merge them.[9] While indexing, the indices are generally maintained in memory and then flushed to disk after a merge, to reduce disk I/O. The metadata is stored in property stores, which are kept in a database maintained by the ESE database engine.[9] The text is stored in a custom database built using inverted indices.[9] Apart from the indices and property store, another persistent data structure is maintained: the Gather Queue.[9] The Gather Queue holds, in FIFO order, the list of URIs that need indexing; the Gatherer reads its work from this queue. The Indexer also includes another component, the Resource Monitor, which monitors the available resources and controls the indexer. The indexer has three states:[9]

Windows Search architecture
  1. Running: In this state, the indexer runs without any restrictions. The indexer runs in this state only when there is no contention for resources.
  2. Throttled: In this state, the crawling of URIs and extraction of text and metadata is deliberately throttled, so that the number of operations per minute is kept under tight control. The indexer is in this state when there is contention for resources, for example, when other applications are running. Throttling the operations ensures that other applications are not starved of resources they might need.
  3. Backed off: In this state, no indexing is done. Only the Gather Queues are kept active so that items do not go unindexed. This state is activated on extreme resource shortage (less than 5 MB of RAM or 200 MB of disk space), or if indexing is configured to be disabled when the computer is on battery power, or if the indexer is manually paused.
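The three states can be read as a simple decision rule. The sketch below encodes only the conditions listed above; the function name, its parameters and the single `contention` flag are hypothetical simplifications, since the text does not say how the Resource Monitor measures contention.

```python
from enum import Enum


class IndexerState(Enum):
    RUNNING = "running"
    THROTTLED = "throttled"
    BACKED_OFF = "backed off"


# Thresholds quoted in the text for "extreme resource shortage".
MIN_FREE_RAM_MB = 5
MIN_FREE_DISK_MB = 200


def choose_state(free_ram_mb, free_disk_mb, contention,
                 on_battery=False, pause_on_battery=False, manually_paused=False):
    """Pick the indexer state from the conditions described in the list above."""
    extreme_shortage = (free_ram_mb < MIN_FREE_RAM_MB
                        or free_disk_mb < MIN_FREE_DISK_MB)
    if extreme_shortage or (on_battery and pause_on_battery) or manually_paused:
        return IndexerState.BACKED_OFF
    if contention:
        # Other applications need resources: keep indexing, but slowly.
        return IndexerState.THROTTLED
    return IndexerState.RUNNING
```

The backed-off checks come first because they override everything else: an indexer that is paused or out of disk space should not index at all, whether or not other applications are competing for resources.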

Advanced Query Syntax

Windows Search queries are specified in Advanced Query Syntax (AQS), which supports not only simple text searches but also advanced query operations.[10] AQS defines keywords that can be used to refine a search query, for example by applying boolean operations to the search terms (AND, OR, NOT) or by adding filters based on file metadata or file type. It can also be used to limit results to specific information stores such as regular files, the offline files cache, or e-mail stores. File type specific operators are available as well.[11] WDS also supports wildcard searches.[12] It also includes several SQL-like operators like GROUP BY.
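As a rough illustration of how such a query string is put together, the following sketch joins search terms and keyword filters with a boolean operator. The helper names are hypothetical, and the filter keywords used in the usage note (`kind`, `author`) are examples that should be checked against the AQS reference for the version in use.

```python
def aqs_term(text):
    """Quote a term that contains whitespace so it is treated as one phrase."""
    return f'"{text}"' if any(ch.isspace() for ch in text) else text


def aqs_query(terms, filters=None, operator="AND"):
    """Join plain terms and keyword:value filters with a boolean operator."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be AND or OR")
    parts = [aqs_term(term) for term in terms]
    for keyword, value in (filters or {}).items():
        parts.append(f"{keyword}:{aqs_term(value)}")
    return f" {operator} ".join(parts)
```

For example, `aqs_query(["The Beatles"], {"kind": "music"})` produces the string `"The Beatles" AND kind:music`, which combines a phrase search with a file-kind filter.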

Programmability

The Windows Search index can be accessed programmatically using both managed and native code.[13] Native code connects to the index catalog by using a Data Source Object retrieved from the shell's Indexing Service OLE DB provider. Managed code uses the MSIDXS ADO.NET provider with the index catalog name. A catalog on a remote machine can also be specified using a UNC path. The criteria for the search are specified using SQL, though some operators are restricted. The SQL query can either be created by hand, or by using an implementation of the ISearchQueryHelper interface. Windows Search provides implementations of the interface to convert AQS or NQS queries to their SQL counterparts.[14][15]

The OLE DB/SQL API implements the functionality for searching and querying across the indices and property stores. It uses a variant of SQL (regular SQL with certain restrictions) to represent the query. Results are returned as OLE DB Rowsets.[9] Whenever a query is executed, the parts of the index it used are temporarily cached, so that further searches that filter the result set need not access the disk, which improves performance. Windows Search stores its index in an internal database file named Windows.edb that exists in the \All Users\Application Data\Microsoft\Search\Data\Applications\Windows\ folder inside the Documents and Settings folder in Windows XP or the Users folder in Windows Vista. The size of the database file depends on the size of the index. For example, an index of 230,000 files is around 800 MB in size.

The default catalog is called SystemIndex and it stores all the properties of indexed items with a predefined naming pattern. For example, the name and location of documents in the system are exposed as a table with the column names System.ItemName and System.ItemURL respectively.[16] An SQL query can directly refer to these tables and index catalogs and use the MSIDXS provider to run queries against them. The search index can also be used via OLE DB, using the CollatorDSO provider.[17] However, the OLE DB provider is read-only, supporting only SELECT and GROUP ON SQL statements.
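A hand-written query against the default catalog might be assembled as in the sketch below. The catalog and column names are those quoted in the text; the function name is hypothetical, and the use of a `CONTAINS` predicate with a quoted phrase and an optional `TOP` clause is an assumption based on the published Windows Search SQL syntax, which should be consulted before relying on it.

```python
def build_search_sql(phrase, columns=("System.ItemName", "System.ItemURL"), limit=None):
    """Build a read-only SELECT statement for the SystemIndex catalog."""
    # Double single quotes for the SQL literal and drop double quotes,
    # which delimit the phrase inside CONTAINS.
    escaped = phrase.replace("'", "''").replace('"', "")
    top = f"TOP {int(limit)} " if limit is not None else ""
    column_list = ", ".join(columns)
    return (f"SELECT {top}{column_list} FROM SystemIndex "
            f"WHERE CONTAINS('\"{escaped}\"')")
```

For instance, `build_search_sql("The Beatles", limit=10)` returns `SELECT TOP 10 System.ItemName, System.ItemURL FROM SystemIndex WHERE CONTAINS('"The Beatles"')`. Executing the statement requires an OLE DB or ADO.NET connection on a Windows machine with the index available, which is outside the scope of this sketch.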

Windows Search also registers a search-ms application protocol, which can be used to represent searches as URIs.[18] The search parameters and filters are encoded in the URI using AQS, or its natural language counterpart, NQS. When the URI is invoked, Windows Search (which is registered as handler for the protocol) launches the Search Explorer with the results of the search. Windows Search is currently the default handler for the protocol, but with Windows Vista SP1, third party handlers will be able to register themselves as the protocol handler, so that searches can be performed using any search engine which the user has set as default, and not just Windows Search.
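Encoding a query as a search-ms URI amounts to percent-encoding the query text and attaching it as a parameter. The sketch below covers only that minimal case; the function name is hypothetical, and the `query` parameter name and the trailing `&` follow one reading of the published protocol description and should be verified against it.

```python
from urllib.parse import quote


def search_ms_uri(query):
    """Encode an AQS (or NQS) query string as a minimal search-ms URI."""
    # Percent-encode everything outside the unreserved set, including spaces and colons.
    encoded = quote(query, safe="")
    return f"search-ms:query={encoded}&"
```

For example, `search_ms_uri("the beatles")` yields `search-ms:query=the%20beatles&`. On a system where a handler is registered for the protocol, invoking such a URI would open the search results rather than a web page.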

The Windows Search service provides the Notifications API component to allow applications to "push" items that need indexing to the Windows Search indexer.[9] Applications use the component to supply the URIs of the items that need to be indexed, and the URIs are written to the Gather Queue, where they are read off by the indexer. Microsoft Office Outlook 2007 and Microsoft Office OneNote 2007 use this ability to index the items managed by them and use Windows Search to provide their in-application searching features. The Notifications API is also used by the internal USN Journal Notifier component of Windows Search, which monitors the Change Journal in an NTFS volume to keep track of files that have changed on the volume.[19] If the file is in a location indexed by Windows Search and does not have the FANCI (File Attribute Not Content Indexed) attribute set,[9] the Windows Search service is notified of its path via the Notifications API.
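The push model and the filtering done by the journal notifier can be sketched as follows. This is a simulation rather than a use of the real Notifications API: `GatherQueue`, `notify_changes` and the `file:///` URI form are hypothetical, and the change records are passed in by hand instead of being read from an NTFS change journal. The attribute value 0x2000 is the Windows constant FILE_ATTRIBUTE_NOT_CONTENT_INDEXED.

```python
from collections import deque

FILE_ATTRIBUTE_NOT_CONTENT_INDEXED = 0x2000  # the "FANCI" attribute bit


class GatherQueue:
    """FIFO queue of URIs waiting to be indexed."""

    def __init__(self):
        self._items = deque()

    def push(self, uri):
        self._items.append(uri)

    def pop(self):
        return self._items.popleft()

    def __len__(self):
        return len(self._items)


def notify_changes(changes, indexed_roots, queue):
    """Queue changed files that are in an indexed location and not marked FANCI.

    changes is an iterable of (path, attributes) pairs.
    """
    for path, attributes in changes:
        in_scope = any(path.lower().startswith(root.lower()) for root in indexed_roots)
        excluded = bool(attributes & FILE_ATTRIBUTE_NOT_CONTENT_INDEXED)
        if in_scope and not excluded:
            # Hypothetical URI form, used here only to label the queue entry.
            queue.push("file:///" + path.replace("\\", "/"))
```

Given three changed files, of which one carries the FANCI bit and one lies outside the indexed roots, only the remaining file is queued, and the indexer later reads it off in first-in, first-out order.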

Windows Search Configuration APIs are used to specify configuration settings, such as the roots of the URIs that need to be monitored and the frequency of crawling, and to view status information such as the number of items indexed, the length of the gather queue or the reason for throttling the indexer.[9][20] Windows Search also exposes APIs to register protocol handlers (via the ISearchProtocol interface), property handlers (via the IPropertyStore interface) or IFilter implementations (via the IFilter interface). IFilter implementations allow only extraction of text, whereas IPropertyStore allows reading as well as modifying properties, both in the file and in the property store database.[9]

Windows Desktop Search
Developer(s): Microsoft
Stable release
Operating system: Windows XP/Server 2003
License: Proprietary EULA
Website: Windows Desktop Search Website

Windows Desktop Search is the implementation of Windows Search for Windows XP and Windows Server 2003. It offers word wheeling of searches, which are specified using the Advanced Query Syntax. By default, it comes with a number of IFilters for the most common file types (documents, audio and video) as well as protocol handlers for Microsoft Outlook e-mails. Other protocol handlers and IFilters can be installed as needed.

User interface

Windows Desktop Search Deskbar

The WDS functionality is exposed via a taskbar mounted deskbar. It provides a text field to type the query and the results are presented in a flyout pane. It also integrates as a Windows Explorer window. On selecting a file in the Explorer window, a preview of the file is shown in the right hand side of the window, without opening the application which created the file. Web searches can be initiated from both interfaces, but that will open the browser to search the terms using the default search engine.

The WDS deskbar can also create application aliases, which are short strings that can be set to open different applications. This functionality is accessed by prefixing the ! character to the predefined string. For example, "!calc" opens the Windows Calculator. This feature can also be used to create shortcuts for URLs, which, when entered, open the specified URL in the browser. It can also be used to send parametrized information over the URL, which makes it possible to create search aliases. For example, "w text" can be configured to search for "text" in Wikipedia.

Releases

WDS was initially released as MSN Desktop Search, as a part of the MSN Toolbar suite. It was re-introduced as Windows Desktop Search with version 2, while still being distributed with MSN Toolbar Suite.

WDS preview pane showing thumbnails of search results

For Windows 2000, Windows XP and Windows Server 2003, it came in two flavors, one for home users and the other for enterprise use. The only difference between the two was that the latter could be configured via group policy. The home edition was bundled with MSN Toolbar, while the other was available as a stand alone application. Later, when MSN Toolbar was discontinued in favor of Windows Live Toolbar, the home edition of WDS was discontinued as well.

For Windows XP and Windows Server 2003,[21] version 3.0 of Windows Desktop Search was provided as a standalone release, separate from Windows Live Toolbar. WDS 3.01 is geared for pre-Windows Vista users; its indexer was implemented as a Windows Service, rather than as a per-user application, so that the same index and a single instance of the service can be shared across all users, thereby improving performance. WDS found itself in the midst of a controversy on October 25, 2007, when WDS 3.01 was automatically pushed out and installed on Windows systems when they updated themselves via WSUS. Microsoft has not yet responded to the situation.[22]

Windows Search
Developer(s): Microsoft
Operating system: Windows Vista/Server 2008
License: Proprietary EULA
Website: Windows Vista Features: Instant Search

Windows Search is the indexed search platform in Windows Vista and Windows Server 2008, and offers a superset of the features provided by Windows Desktop Search, while being API compatible with it. Unlike WDS, it can seamlessly search indexed as well as non-indexed locations: for indexed locations the index is used, and for non-indexed locations the property handlers and IFilters are invoked on the fly as the search is being performed. This allows for more consistent results, though at the cost of search speed. Windows Search uses Group Policy for centralized management.[23]

Windows Search indexes offline caches of network shares, in addition to the file systems, Microsoft Outlook e-mail stores and Microsoft OneNote stores indexed by WDS.[1] Windows Search currently cannot index removable drives. Windows Search also supports queries against a remote index. This means that if the file server hosting a network file share is running either Windows Vista or Windows Server 2008, any search against the share is run against the server's index, and the results are presented to the client system after filtering out the files the user does not have access to. This procedure is transparent to the user.[1]

Unlike WDS, though, the Windows Search indexer performs its I/O operations with low priority, and the process itself also runs with low priority. As a result, whenever other processes require the I/O bandwidth or processor time, they are able to pre-empt the indexer, thereby significantly reducing the performance hit associated with the indexer running in the background.

Windows Search supports natural language searches; so the user can search for things like "photo taken last week" or "email sent from Dave". However, this is disabled by default.[24] Natural language search expresses the queries in Natural Query Syntax (NQS), which is the natural language equivalent of AQS.

User interface

The search functionality is exposed using the search bars in the Start menu and the upper right hand corner of Windows Explorer windows, as well as Open/Save dialog boxes. When searching from the Start menu, the results are shown in the Start menu itself, overlapping the recently used programs. From the Start menu, it is also possible to launch an application by searching for its executable image name or display name. Searching from the search bars in Explorer windows replaces the content of the current folder with the search results. The Explorer windows can also render thumbnails in the search results if a Thumbnail Handler is registered for a particular file type. It can also render enhanced previews of items in a Preview Pane without launching the default application, if the application has registered a Preview Handler. This can provide functionality such as file type-specific navigation (such as browsing a presentation using next/previous controls, or seeking inside a media file).[25] Preview handlers can also allow certain kinds of edits (such as highlighting a text snippet) to be performed from the preview pane itself. In the Control Panel, the search bar in the window can also search for Control Panel options. However, unlike WDS, Windows Search does not support creating aliases.

A combination of virtual and real folders in Windows Vista. The Virtual folders are recognizable by their distinctive icon and blue color.
A search can be saved to create a virtual folder (saved search) with the same query string as the original search

There is also a Search Explorer, which is an integrated Windows Explorer window that is used for searches. It presents the user interface to specify the search parameters, including locations and file types that should be searched, and certain operators, without crafting the AQS queries by hand. With Windows Vista SP1, third party applications will be able to override the Search Explorer as the default search interface so that the registered third party application will be launched, instead of bringing up the Search Explorer, when invoked by any means.[26] However, the Windows Search indexer will not be disabled after doing this, and the search bars will still continue to use the Windows Search index.

In Windows Search, which is part of Windows Vista, it is also possible to save a search query as a Virtual Folder, called a Saved Search or Search Folder,[1] which, when accessed, runs the search with the saved query and returns the results as a folder listing. Physically, a search folder is just an XML file (with a .search-ms extension) which stores the search query (in either AQS or NQS), including the search operators as well. Windows Vista also supports query composition, where a saved search (called a scope) can be nested within the query string of another search.[27] Search Folders are also distributable via RSS. They can also be shared as a SearchMelt, which is accessible over a network.[28] Accessing a SearchMelt over the network, like a regular Search Folder, makes the results of the search available as a virtual shared folder. The search will be performed on the machine which shares the SearchMelt, and will return only the results accessible from the network. However, by default, search folders are scoped for local use only; before sharing, they must be configured for remote access. Microsoft makes a SearchMelt Creator tool available for this as well.[29]

See also: Search functionality in Windows Vista

Windows Search 4.0


Screenshot of a beta version of Windows Live Search Center before it was rebranded.

Windows Search 4.0 is the successor to the Windows Search platform, for both Windows Desktop Search 3.0 on Windows XP and Instant Search on Windows Vista. It was initially developed as a research project named Windows Live Search Center,[30] codenamed Casino or OneView. The project was intended to aggregate searches from various local and remote indexes, including the Windows Search index (both local and those of networked systems), the Windows RSS Platform common feed store, and Microsoft Exchange and Microsoft SharePoint indexes, among others.[31] It would also perform searches against web services[32] that use the OpenSearch specification to make search results available as a web feed, and present the results in a unified interface.[33]

Later, it was revealed that it would not be part of Microsoft's Windows Live services; rather, it would be merged with the desktop search platform in Windows and released as its successor.[33]

The first beta of Windows Search 4.0 was released on March 27, 2008.[30] It includes numerous performance improvements to the indexer and brings new features, some of them previously exclusive to Windows Vista, to Windows XP. These include Group Policy integration, federation of searches to remote indexes, support for EFS-encrypted files and Windows Vista-style preview handlers that allow document-type specific browsing of documents in the preview pane.[34][35] It does not currently include the search federation capabilities of Windows Live Search Center, which allow searches to be delegated to remote web services that use the OpenSearch specification, such as Microsoft Search Server. Windows Search 4.0 is supported on Windows XP, Windows Server 2003, Windows Vista, Windows Server 2008 and Windows Home Server.

References

  1. ^ a b c d e "Windows Search Technologies for Business Customers". Retrieved 2007-07-14.
  2. ^ "Windows Vista: Features Explained: Instant Search". Retrieved 2007-03-16.
  3. ^ "IFilter". Retrieved 2007-06-23.
  4. ^ "List of searchable file types". Retrieved 2007-06-23.
  5. ^ "Developing Property Handlers for Windows Search". Retrieved 2007-06-23.
  6. ^ Brandon Paddock. "FAQ: How does indexing work? What are IFilters and Protocol Handlers?". Retrieved 2007-06-23.
  7. ^ "Windows Desktop Search: Add-in for Files on Microsoft Networks". Retrieved 2007-07-14.
  8. ^ Brandon Paddock. "FAQ: Why does WDS / Windows Vista use so many processes?". Retrieved 2007-06-23.
  9. ^ a b c d e f g h i j k "Good Citizenship When Developing Background Services That Run on Windows Vista". Retrieved 2007-07-14.
  10. ^ "Advanced Query Syntax". MSDN TechNet. Retrieved 2007-06-23.
  11. ^ Nick White. "Advanced search techniques". Retrieved 2007-06-23.
  12. ^ "Seek and Ye Shall Find". Retrieved 2007-07-05.
  13. ^ "Searching data". Retrieved 2007-03-17.
  14. ^ "Development Platform Overview". MSDN. Retrieved 2007-10-12.
  15. ^ "Querying the Index programmatically". MSDN. Retrieved 2007-10-12.
  16. ^ Catherine Heller. "Windows Vista Search: Syntax Update". Retrieved 2007-06-23.
  17. ^ "Querying the Index Programmatically". Retrieved 2007-06-23.
  18. ^ "Using the search-ms Protocol". Retrieved 2007-09-24.
  19. ^ "Change Journals (Windows)". Retrieved 2007-07-14.
  20. ^ "Managing the Index". MSDN. Retrieved 2007-10-12.
  21. ^ "Windows Desktop Search". Retrieved 2007-03-16.
  22. ^ "More gnashing of teeth after Microsoft update brings PCs to a standstill". Retrieved 2007-10-25.
  23. ^ "Windows Search".
  24. ^ "Natural Language Search in Windows Vista". Retrieved 2007-06-22.
  25. ^ "Windows Search 3.x". MSDN. Retrieved 2007-10-12.
  26. ^ "Overview of the Windows Vista desktop search changes in Windows Vista Service Pack 1". Retrieved 2007-07-14.
  27. ^ "Query Composition: Building a search upon another search". Retrieved 2007-06-22.
  28. ^ Nick White. "Searching, part III: Do you know what a SearchMelt is?". Retrieved 2007-06-23.
  29. ^ "SearchMelt Creator tool". Retrieved 2007-07-14.
  30. ^ a b Mary Jo Foley. "Microsoft releases first public test build of Windows Search 4.0". Retrieved 2008-03-28.
  31. ^ Brandon Paddock. "Where is YOUR stuff?". Retrieved 2007-06-14.
  32. ^ Brandon Paddock. "Open Search". Retrieved 2007-06-14.
  33. ^ a b Brandon Paddock. "The fate of codename "Casino"". Retrieved 2007-06-14.
  34. ^ Brandon. "Windows Search 4.0 Preview Release". Retrieved 2008-03-28.
  35. ^ "Description of Windows Search 4.0 and Multilingual User Interface Pack for Windows Search 4.0". Microsoft. Retrieved 2008-03-28.
