Portal:Web search engines
Introduction
A web search engine is a software system that is designed to search for information on the World Wide Web. The search results are generally presented in a line of results, often referred to as search engine results pages (SERPs). The information may be a mix of web pages, images and other types of files. Some search engines also mine data available in databases or open directories. Unlike web directories, which are maintained only by human editors, search engines also maintain real-time information by running an algorithm on a web crawler. Internet content that is not capable of being searched by a web search engine is generally described as the deep web.
Selected general articles
- Voice search, also called voice-enabled search, allows the user to use a voice command to search the Internet or a portable device. In a narrow sense, voice search is currently most commonly used for "directory assistance" or local search. Examples include Google 411, Tellme directory assistance and Yellowpages.com's 1-800-YellowPages.
In a broader definition, voice search includes open-domain keyword queries on any information on the Internet, for example in Google Voice Search, Cortana, Siri and Amazon Echo. Because voice-based systems are interactive, they are also called open-domain question answering systems. Read more... - The use of search engine technology is the main integration component in an information system. In a traditional business environment, the architectural layer usually occupied by a relational database management system (RDBMS) is supplemented or replaced with a search engine or the indexing technology used to build search engines. Queries for information that would usually be performed using Structured Query Language (SQL) are replaced by keyword or fielded (field-enabled) searches over structured, semi-structured, or unstructured data.
In a typical multi-tier or n-tier architecture, information is maintained in a data tier, where it can be stored in and retrieved from a database or file system. When information is needed, the data tier is queried by the logic or business tier using a data-retrieval language such as SQL. Read more... - Semantic search seeks to improve search accuracy by understanding the searcher's intent and the contextual meaning of terms as they appear in the searchable dataspace, whether on the Web or within a closed system, to generate more relevant results. Semantic search systems consider various factors, including the context of the search, location, intent, variation of words, synonyms, generalized and specialized queries, concept matching and natural-language queries, to provide relevant search results. Major web search engines like Google and Bing incorporate some elements of semantic search. In vertical search, LinkedIn has published its semantic search approach to job search, which recognizes and standardizes entities in both queries and documents (e.g., companies, titles and skills) and then constructs various entity-aware features based on those entities.
Guha et al. distinguish two major forms of search: navigational and research. In navigational search, the user is using the search engine as a navigation tool to reach a particular intended document; semantic search is not applicable to navigational searches. In research search, the user provides the search engine with a phrase intended to denote an object about which the user is trying to gather information. There is no particular document which the user knows about and is trying to get to; rather, the user is trying to locate a number of documents which together will provide the desired information. Semantic search lends itself well to this approach, which is closely related to exploratory search. Read more... - Multisearch is a multitasking search engine which includes both search engine and metasearch engine characteristics, with the additional capability of retrieving search result sets that were previously classified by users. It enables the user to gather results from its own search index as well as from one or more search engines, metasearch engines, databases or other information retrieval (IR) programs.
Multisearch is an emerging feature of automated search and information retrieval systems which combines the capabilities of computer search programs with results classification made by a human.
Multisearch is a way to take advantage of the power of multiple search engines with a flexibility not seen in traditional metasearch engines. To the end user, a multisearch may appear to be just a customizable search engine; however, its behind-the-scenes technology also enables it to retrieve and display result sets which were classified by a human during a multisearch session and automatically included in the document index.
There are additional features available in many search engines and metasearch engines, but the basic idea is the same: reducing the time required to search for resources by improving the accuracy and relevance of individual searches and by making the results easier to manage. Read more... - Search engine indexing collects, parses, and stores data to facilitate fast and accurate information retrieval. Index design incorporates interdisciplinary concepts from linguistics, cognitive psychology, mathematics, informatics, and computer science. An alternate name for the process, in the context of search engines designed to find web pages on the Internet, is web indexing.
Popular engines focus on the full-text indexing of online, natural-language documents. Media types such as video, audio, and graphics are also searchable. Read more... - Text mining, also referred to as text data mining and roughly equivalent to text analytics, is the process of deriving high-quality information from text. High-quality information is typically derived by devising patterns and trends through means such as statistical pattern learning. Text mining usually involves structuring the input text (usually parsing, along with the addition of some derived linguistic features, the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluating and interpreting the output. 'High quality' in text mining usually refers to some combination of relevance, novelty, and interestingness. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between named entities).
Text analysis involves information retrieval, lexical analysis to study word frequency distributions, pattern recognition, tagging/annotation, information extraction, data mining techniques including link and association analysis, visualization, and predictive analytics. The overarching goal is, essentially, to turn text into data for analysis via application of natural language processing (NLP) and analytical methods. Read more... - Enterprise search is the practice of making content from multiple enterprise-type sources, such as databases and intranets, searchable to a defined audience.
"Enterprise search" describes software for searching information within an enterprise (though the search function and its results may still be public). Enterprise search can be contrasted with web search, which applies search technology to documents on the open web, and desktop search, which applies search technology to the content on a single computer. Read more... - A search aggregator is a type of metasearch engine which gathers results from multiple search engines simultaneously, typically through RSS search results. It combines user-specified search feeds (parameterized RSS feeds which return search results) to give the user the same level of control over content as a general aggregator.
Soon after the introduction of RSS, sites began publicising their search results in parameterized RSS feeds. Search aggregators are an increasingly popular way to take advantage of the power of multiple search engines with a flexibility not seen in traditional metasearch engines. To the end user, a search aggregator may appear to be just a customizable search engine, and the use of RSS may be completely hidden. However, the presence of RSS is directly responsible for the existence of search aggregators and is a critical component of the behind-the-scenes technology. Read more... - A vertical search engine is distinct from a general web search engine in that it focuses on a specific segment of online content. Vertical search engines are also called specialty or topical search engines. The vertical content area may be based on topicality, media type, or genre of content. Common verticals include shopping, the automotive industry, legal information, medical information, scholarly literature, job search and travel. Examples of vertical search engines include the Library of Congress, Mocavo, Nuroa, Trulia and Yelp.
In contrast to general web search engines, which attempt to index large portions of the World Wide Web using a web crawler, vertical search engines typically use a focused crawler which attempts to index only web pages relevant to a pre-defined topic or set of topics. Some vertical search sites focus on individual verticals, while other sites include multiple vertical searches within one search engine. Read more... - Collaborative search engines (CSE) are Web search engines, and enterprise searches within company intranets, that let users combine their efforts in information retrieval (IR) activities, share information resources collaboratively using knowledge tags, and allow experts to guide less experienced people through their searches. Collaboration partners do so by providing query terms, collective tagging, adding comments or opinions, rating search results, and sharing the links clicked in former (successful) IR activities with users who have the same or a related information need. Read more...
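The focused-crawling strategy described above can be sketched in a few lines. The crawler below works over a hypothetical in-memory "web" (all URLs and page texts are invented for illustration) and only expands links found on pages that match a caller-supplied relevance predicate:

```python
from collections import deque

# A toy in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch pages over HTTP instead.
WEB = {
    "http://a.example": ("solar power basics", ["http://b.example", "http://c.example"]),
    "http://b.example": ("solar panel efficiency", ["http://d.example"]),
    "http://c.example": ("cooking recipes", ["http://d.example"]),
    "http://d.example": ("grid-scale solar storage", []),
}

def focused_crawl(seed, is_relevant, limit=10):
    """Breadth-first crawl that only expands links found on relevant pages."""
    frontier, seen, collected = deque([seed]), {seed}, []
    while frontier and len(collected) < limit:
        url = frontier.popleft()
        text, links = WEB.get(url, ("", []))
        if is_relevant(text):
            collected.append(url)
            for link in links:  # only relevant pages contribute to the frontier
                if link not in seen:
                    seen.add(link)
                    frontier.append(link)
    return collected

pages = focused_crawl("http://a.example", lambda t: "solar" in t)
# -> ['http://a.example', 'http://b.example', 'http://d.example']
```

Note that the irrelevant page is still downloaded before being rejected; the classifiers and anchor-text predictors surveyed later in this portal exist precisely to avoid fetching such pages in the first place.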
- Social search is the behavior of retrieving and searching on a social search engine that mainly searches user-generated content, such as news, videos and images related to search queries, on social media like Facebook, LinkedIn, Twitter, Instagram and Flickr. It is an enhanced version of web search that supplements traditional ranking algorithms with social signals. The idea behind social search is that instead of ranking search results purely based on semantic relevance between a query and the results, a social search system also takes into account the social relationships between the results and the searcher. These social relationships can take various forms. For example, in the LinkedIn people search engine, they include the social connections between the searcher and each result: whether they are in the same industry, work for the same company, belong to the same social groups, or attended the same school.
Social search may not be demonstrably better than algorithm-driven search. In the algorithmic ranking model that search engines used in the past, the relevance of a site is determined by analyzing the text and content on the page and the link structure of the document. In contrast, search results with social search highlight content that was created or touched by other users who are in the Social Graph of the person conducting a search. It is a personalized search technology with online community filtering to produce highly personalized results. Social search takes many forms, ranging from simple shared bookmarks or tagging of content with descriptive labels to more sophisticated approaches that combine human intelligence with computer algorithms. Depending on the feature set of a particular search engine, these results may then be saved and added to community search results, further improving the relevance of results for future searches of that keyword. The principle behind social search is that human-network-oriented results would be more meaningful and relevant to the user than results chosen by computer algorithms for specific queries. Read more... - Document retrieval is defined as the matching of some stated user query against a set of free-text records. These records could be any type of mainly unstructured text, such as newspaper articles, real estate records or paragraphs in a manual. User queries can range from multi-sentence full descriptions of an information need to a few words.
Document retrieval is sometimes referred to as, or as a branch of, text retrieval. Text retrieval is a branch of information retrieval where the information is stored primarily in the form of text. Text databases became decentralized thanks to the personal computer and the CD-ROM. Text retrieval is a critical area of study today, since it is the fundamental basis of all internet search engines. Read more... - A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information which must be consulted, akin to other techniques for managing information overload.
The most public, visible form of a search engine is a Web search engine which searches for information on the World Wide Web. Read more...
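The "hits" described above are typically produced from an inverted index, the core data structure of most search engines: a map from each term to the documents containing it. A minimal sketch (the documents and queries here are invented for illustration):

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """AND-query: return ids of documents containing every query term (the 'hits')."""
    postings = [index.get(t, set()) for t in query.lower().split()]
    return set.intersection(*postings) if postings else set()

docs = {1: "web search engine", 2: "image search", 3: "desktop search engine"}
idx = build_index(docs)
hits = search(idx, "search engine")   # -> {1, 3}
```

Intersecting precomputed posting sets is what makes lookups fast: query time depends on the size of the postings, not on the size of the whole collection.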
Desktop search tools search within a user's own computer files, as opposed to searching the Internet. These tools are designed to find information on the user's PC, including web browser history, e-mail archives, text documents, sound files, images, and video. A variety of desktop search programs are now available; most are standalone applications. Desktop search products are software alternatives to the search software included in the operating system, helping users sift through desktop files, emails, attachments, and more.
Desktop search emerged as a concern for large firms for two main reasons: untapped productivity and security. According to analyst firm Gartner, up to 80% of some companies' data is locked up inside unstructured data: the information stored on a user's PC, the directories (folders) and files they've created on a network, documents stored in repositories such as corporate intranets, and a multitude of other locations. Moreover, many companies have structured or unstructured information stored in older file formats to which they don't have ready access. Read more...
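At its simplest, a desktop search tool does something like the following: walk a directory tree and report files containing a keyword. Real products build and maintain a persistent index rather than rescanning on every query; the function here is an illustrative sketch only:

```python
import os

def desktop_search(root, keyword):
    """Walk a directory tree and return paths of files whose text contains keyword."""
    matches = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as fh:
                    if keyword.lower() in fh.read().lower():
                        matches.append(path)
            except OSError:
                continue  # unreadable file: skip it rather than abort the scan
    return sorted(matches)
```

The case-folding and the `errors="ignore"` decoding stand in for the far richer text extraction (email archives, attachments, media metadata) that commercial desktop search performs.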
A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web, typically for the purpose of Web indexing (web spidering).
Web search engines and some other sites use Web crawling or spidering software to update their own web content or indices of other sites' web content. Web crawlers copy pages for processing by a search engine, which indexes the downloaded pages so users can search more efficiently. Read more... - Z39.50 is an international standard client–server, application-layer communications protocol for searching and retrieving information from a database over a TCP/IP computer network. It is covered by ANSI/NISO standard Z39.50 and ISO standard 23950. The standard's maintenance agency is the Library of Congress.
Z39.50 is widely used in library environments, often incorporated into integrated library systems and personal bibliographic reference software. Interlibrary catalogue searches for interlibrary loan are often implemented with Z39.50 queries. Read more... - A web search query is a query that a user enters into a web search engine to satisfy his or her information needs. Web search queries are distinctive in that they are often plain text or hypertext with optional search directives (such as "and"/"or", with "-" to exclude). They differ greatly from standard query languages, which are governed by strict syntax rules, such as command languages with keyword or positional parameters. Read more...
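The optional search directives mentioned above (plain terms plus a "-" prefix for exclusion) can be handled by a small parser. This is an illustrative sketch, not the query grammar of any particular engine:

```python
def parse_query(query):
    """Split a free-text query into (required, excluded) term lists.

    Terms prefixed with '-' are excluded, mirroring the common
    search-directive syntax; everything else is required.
    """
    required, excluded = [], []
    for token in query.split():
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:].lower())
        else:
            required.append(token.lower())
    return required, excluded

def matches(text, query):
    """True when text contains every required term and no excluded term."""
    required, excluded = parse_query(query)
    words = set(text.lower().split())
    return all(t in words for t in required) and not any(t in words for t in excluded)

matches("apple pie recipe", "apple -iphone")    # -> True
matches("apple iphone review", "apple -iphone") # -> False
```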
A Web query topic classification/categorization is a problem in information science. The task is to assign a Web search query to one or more predefined categories based on its topics. The importance of query classification is underscored by the many services provided by Web search. A direct application is to provide better search result pages for users with interests in different categories. For example, users issuing the Web query "apple" might expect to see Web pages related to the fruit, or they may prefer to see products or news related to the computer company. Online advertisement services can rely on query classification results to promote different products more accurately. Search result pages can be grouped according to the categories predicted by a query classification algorithm. The computation of query classification, however, is non-trivial: unlike document classification tasks, queries submitted by Web search users are usually short and ambiguous, and their meanings evolve over time. Therefore, query topic classification is much more difficult than traditional document classification tasks. Read more...
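A toy version of query topic classification makes the ambiguity problem concrete: the query "apple" overlaps several category profiles at once, while a longer query disambiguates. Real systems train classifiers over much richer features; the categories and vocabularies below are invented for illustration:

```python
# Hand-written category profiles; a production classifier would be
# trained on labeled query logs rather than fixed keyword sets.
CATEGORIES = {
    "computing": {"apple", "mac", "laptop", "software", "iphone"},
    "food": {"apple", "fruit", "recipe", "pie", "juice"},
}

def classify_query(query, min_score=1):
    """Return the best-matching categories for a query by term overlap.

    Short, ambiguous queries (e.g. "apple") can tie across several
    categories, which is exactly the difficulty described above.
    """
    terms = set(query.lower().split())
    scores = {c: len(terms & vocab) for c, vocab in CATEGORIES.items()}
    best = max(scores.values())
    if best < min_score:
        return []
    return sorted(c for c, s in scores.items() if s == best)

classify_query("apple")            # ambiguous -> ['computing', 'food']
classify_query("apple pie recipe") # disambiguated -> ['food']
```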
- A selection-based search system is a search engine system in which the user invokes a search query using only the mouse. It allows the user to search the internet for more information about any keyword or phrase contained within a document or webpage in any software application on their desktop computer.
Traditional browser-based search systems require the user to launch a web browser, navigate to a search page, type or paste a query into a search box, review a list of results, and click a hyperlink to view them. Three characteristic features of a selection-based search system are that the user can invoke search using only the mouse from within the context of any application on their desktop (for example Microsoft Office, Adobe Reader, Mozilla Firefox, etc.), receive categorized suggestions based on the context of the user-selected text (or, in some cases, the wisdom of crowds), and view the results in floating information boxes which can be sized, shared, docked, closed and stacked on top of the document that has the user's primary focus. Read more... - Distributed web crawling is a distributed computing technique whereby Internet search engines employ many computers to index the Internet via web crawling. Such systems may allow users to voluntarily offer their own computing and bandwidth resources towards crawling web pages. By spreading the load of these tasks across many computers, costs that would otherwise be spent on maintaining large computing clusters are avoided. Read more...
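A common building block of distributed crawling is deterministic URL partitioning, so that every worker knows which pages are its responsibility without per-URL central coordination. A sketch (the worker-assignment scheme is illustrative; hashing the host rather than the full URL keeps a whole site on one worker, which simplifies per-site politeness limits):

```python
import hashlib

def assign_worker(url, n_workers):
    """Deterministically assign a URL's host to one of n crawl workers."""
    # Extract the host part of an absolute URL ("http://host/path" -> "host").
    host = url.split("/")[2] if "//" in url else url
    digest = hashlib.sha1(host.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_workers

# Both URLs share a host, so they land on the same worker.
w1 = assign_worker("http://example.com/page-a", 4)
w2 = assign_worker("http://example.com/page-b", 4)
```

Because the assignment is a pure function of the host, any worker that discovers a link can compute its owner locally and forward it, with no shared queue required.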
OpenSearch is a collection of technologies that allow publishing of search results in a format suitable for syndication and aggregation. It is a way for websites and search engines to publish search results in a standard and accessible format.
OpenSearch was developed by Amazon.com subsidiary A9, and the first version, OpenSearch 1.0, was unveiled by Jeff Bezos at the O'Reilly Emerging Technology Conference in March 2005. Draft versions of OpenSearch 1.1 were released during September and December 2005. The OpenSearch specification is licensed by A9 under the Creative Commons Attribution-ShareAlike 2.5 License. Read more... - Web archiving is the process of collecting portions of the World Wide Web to ensure the information is preserved in an archive for future researchers, historians, and the public. Web archivists typically employ web crawlers for automated capture due to the massive size and amount of information on the Web. The largest web archiving organization based on a bulk crawling approach is the Internet Archive, which strives to maintain an archive of the entire Web.
The International Web Archiving Workshop (IWAW), begun in 2001, has provided a platform to share experiences and exchange ideas. The later founding of the International Internet Preservation Consortium (IIPC), in 2003, has greatly facilitated international collaboration in developing standards and open-source tools for the creation of web archives. These developments, and the growing portion of human culture created and recorded on the web, make it inevitable that more and more libraries and archives will have to face the challenges of web archiving. National libraries, national archives and various consortia of organizations are also involved in archiving culturally important Web content. Read more... - Search engine optimization (SEO) is the process of affecting the online visibility of a website or a web page in a web search engine's unpaid results, often referred to as "natural", "organic", or "earned" results. In general, the earlier (or higher ranked on the search results page) and more frequently a website appears in the search results list, the more visitors it will receive from the search engine's users; these visitors can then be converted into customers. SEO may target different kinds of search, including image search, video search, academic search, news search, and industry-specific vertical search engines. SEO differs from local search engine optimization in that the latter is focused on optimizing a business's online presence so that its web pages will be displayed by search engines when a user enters a local search for its products or services, while the former is more focused on national or international searches.
As an Internet marketing strategy, SEO considers how search engines work, the computer-programmed algorithms which dictate search engine behavior, what people search for, the actual search terms or keywords typed into search engines, and which search engines are preferred by their targeted audience. Optimizing a website may involve editing its content, adding content, and editing HTML and associated coding to both increase its relevance to specific keywords and remove barriers to the indexing activities of search engines. Promoting a site to increase the number of backlinks, or inbound links, is another SEO tactic. By May 2015, mobile search had surpassed desktop search. In 2015, it was reported that Google was developing and promoting mobile search as a key feature within future products. In response, many brands began to take a different approach to their Internet marketing strategies. Read more... - A focused crawler is a web crawler that collects Web pages that satisfy some specific property, by carefully prioritizing the crawl frontier and managing the hyperlink exploration process. Some predicates may be based on simple, deterministic and surface properties: for example, a crawler's mission may be to crawl pages from only the .jp domain. Other predicates may be softer or comparative, e.g., "crawl pages about baseball", or "crawl pages with large PageRank". An important page property pertains to topics, leading to topical crawlers. For example, a topical crawler may be deployed to collect pages about solar power, swine flu, or even more abstract concepts like controversy, while minimizing resources spent fetching pages on other topics. Crawl frontier management may not be the only device used by focused crawlers; they may also use a Web directory, a Web text index, backlinks, or any other Web artifact.
A focused crawler must predict the probability that an unvisited page will be relevant before actually downloading it. A possible predictor is the anchor text of links; this was the approach taken by Pinkerton in a crawler developed in the early days of the Web. Topical crawling was first introduced by Filippo Menczer. Chakrabarti et al. coined the term focused crawler and used a text classifier to prioritize the crawl frontier. Andrew McCallum and co-authors also used reinforcement learning to focus crawlers. Diligenti et al. traced the context graph leading up to relevant pages, and their text content, to train classifiers. A form of online reinforcement learning has been used, along with features extracted from the DOM tree and text of linking pages, to continually train the classifiers that guide the crawl. In a review of topical crawling algorithms, Menczer et al. show that such simple strategies are very effective for short crawls, while more sophisticated techniques such as reinforcement learning and evolutionary adaptation can give the best performance over longer crawls. It has also been shown that spatial information is important to classify Web documents. Read more... - Wide Area Information Server (WAIS) is a client–server text searching system that uses the ANSI Standard Z39.50 "Information Retrieval Service Definition and Protocol Specifications for Library Applications" (Z39.50:1988) to search index databases on remote computers. It was developed in the late 1980s as a project of Thinking Machines, Apple Computer, Dow Jones, and KPMG Peat Marwick.
WAIS did not adhere to either the standard or its OSI framework (adopting TCP/IP instead) but created a unique protocol inspired by Z39.50:1988. Read more... - Evaluation measures for an information retrieval system are used to assess how well the search results satisfy the user's query intent. Such metrics are often split into two kinds: online metrics look at users' interactions with the search system, while offline metrics measure relevance, in other words how likely each result, or the search engine results page (SERP) as a whole, is to meet the information needs of the user. Read more...
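Two of the standard offline metrics alluded to above, precision at k and discounted cumulative gain (DCG), are short enough to state directly. Here `relevance` is a list of per-result relevance grades in rank order (0/1 for precision, arbitrary non-negative grades for DCG); the example judgments are invented:

```python
import math

def precision_at_k(relevance, k):
    """Fraction of the top-k results judged relevant (0/1 grades, rank order)."""
    return sum(relevance[:k]) / k

def dcg_at_k(relevance, k):
    """Discounted cumulative gain: graded relevance discounted by log2(rank + 1)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevance[:k]))

precision_at_k([1, 0, 1, 1, 0], 3)   # -> 2/3
```

The logarithmic discount in DCG encodes the intuition that a relevant result in position 1 is worth more than the same result in position 10, which plain precision ignores.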
- Federated search is an information retrieval technology that allows the simultaneous search of multiple searchable resources. A user makes a single query request which is distributed to the search engines, databases or other query engines participating in the federation. The federated search then aggregates the results that are received from the search engines for presentation to the user. Federated search can be used to integrate disparate information resources within a single large organization ("enterprise") or for the entire web.
Federated search, unlike distributed search, requires centralized coordination of the searchable resources. This involves both coordination of the queries transmitted to the individual search engines and fusion of the search results returned by each of them. Read more... - Representational State Transfer (REST) is a software architectural style that defines a set of constraints to be used for creating web services. Web services that conform to the REST architectural style, termed RESTful web services, provide interoperability between computer systems on the Internet. RESTful web services allow the requesting systems to access and manipulate textual representations of web resources by using a uniform and predefined set of stateless operations. Other kinds of web services, such as SOAP web services, expose their own arbitrary sets of operations.
"Web resources" were first defined on the World Wide Web as documents or files identified by their URLs. However, today they have a much more generic and abstract definition that encompasses every thing or entity that can be identified, named, addressed, or handled, in any way whatsoever, on the web. In a RESTful web service, requests made to a resource's URI will elicit a response with a payload formatted in either HTML, XML, JSON, or some other format. The response can confirm that some alteration has been made to the stored resource, and the response can provide hypertext links to other related resources or collections of resources. When HTTP is used, as is most common, the operations available are GET, POST, PUT, DELETE, and other predefined CRUD HTTP methods. Read more... - Human flesh search engine (Chinese: 人肉搜索; pinyin: Rénròu Sōusuǒ) is a Chinese term for the phenomenon of distributed researching using Internet media such as blogs and forums. It is similar to the concept of "doxing", a practice often associated with the social activist group Anonymous. Both human flesh search engine and doxing have generally been stigmatized as being for the purpose of identifying and exposing individuals to public humiliation, sometimes out of vigilantism, nationalist or patriotic sentiments, or to break the Internet censorship in the People's Republic of China. More recent analyses, however, have shown that it is also used for a number of other reasons, including exposing government corruption, identifying hit and run drivers, and exposing scientific fraud, as well as for more "entertainment"-related items such as identifying people seen in pictures. A categorization of hundreds of Human flesh search (HFS) episodes can be found in the 2010 IEEE Computer paper A Study of the Human Flesh Search Engine: Crowd-Powered Expansion of Online Knowledge.
The system is based on massive human collaboration. The name refers both to the use of knowledge contributed by human beings through social networking, and to the fact that the searches are usually dedicated to finding the identity of a human being who has committed some sort of offense or social breach online. People conducting such research are commonly referred to collectively as "Human Flesh Search Engines". Read more...
A metasearch engine (or aggregator) is a search tool that uses other search engines' data to produce its own results from the Internet. Metasearch engines take input from a user and simultaneously send out queries to third-party search engines for results. The gathered data are then ranked, formatted and presented to the users.
Metasearch engines have their own sets of unique problems. Because the websites stored by each underlying search engine differ, merged results can include irrelevant content, and problems such as spamming reduce result accuracy. The process of fusion aims to tackle these issues and improve the engineering of a metasearch engine. Read more... - An image retrieval system is a computer system for browsing, searching and retrieving images from a large database of digital images. Most traditional and common methods of image retrieval utilize some method of adding metadata, such as captioning, keywords, or descriptions, to the images so that retrieval can be performed over the annotation words. Manual image annotation is time-consuming, laborious and expensive; to address this, there has been a large amount of research on automatic image annotation. Additionally, the increase in social web applications and the semantic web has inspired the development of several web-based image annotation tools.
The first microcomputer-based image database retrieval system was developed at MIT, in the 1990s, by Banireddy Prasaad, Amar Gupta, Hoo-min Toong, and Stuart Madnick. Read more... - The robots exclusion standard, also known as the robots exclusion protocol or simply robots.txt, is a standard used by websites to communicate with web crawlers and other web robots. The standard specifies how to inform the web robot about which areas of the website should not be processed or scanned. Robots are often used by search engines to categorize websites. Not all robots cooperate with the standard; email harvesters, spambots, malware, and robots that scan for security vulnerabilities may even start with the portions of the website where they have been told to stay out. The standard is different from, but can be used in conjunction with, Sitemaps, a robot inclusion standard for websites. Read more...
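Python's standard library can evaluate robots.txt rules directly, which makes the standard easy to demonstrate. The file contents and crawler name below are illustrative; a real crawler would fetch /robots.txt from the target site before crawling it:

```python
from urllib import robotparser

# An illustrative robots.txt: everything under /private/ is off-limits.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

allowed = parser.can_fetch("MyCrawler", "http://example.com/index.html")   # True
blocked = parser.can_fetch("MyCrawler", "http://example.com/private/x")    # False
```

A cooperating crawler calls `can_fetch` before every request; as the article notes, nothing in the standard forces a robot to comply.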
- Natural-language user interface (LUI or NLUI) is a type of human-computer interface where linguistic phenomena such as verbs, phrases and clauses act as UI controls for creating, selecting and modifying data in software applications.
In interface design, natural-language interfaces are sought after for their speed and ease of use, but most face challenges in understanding the wide variety of ambiguous input.
Natural-language interfaces are an active area of study in the field of natural-language processing and computational linguistics. An intuitive general natural-language interface is one of the active goals of the Semantic Web. Read more... - Search engine marketing (SEM) is a form of Internet marketing that involves the promotion of websites by increasing their visibility in search engine results pages (SERPs) primarily through paid advertising. SEM may incorporate search engine optimization (SEO), which adjusts or rewrites website content and site architecture to achieve a higher ranking in search engine results pages to enhance pay per click (PPC) listings. Read more...
- A video search engine is a web-based search engine which crawls the web for video content. Some video search engines parse externally hosted content while others allow content to be uploaded and hosted on their own servers. Some engines also allow users to search by video format type and by length of the clip. The video search results are usually accompanied by a thumbnail view of the video.
Video search engines are computer programs designed to find videos stored on digital devices, either on Internet servers or in the local storage of the same computer. These searches can be made through audiovisual indexing, which extracts information from audiovisual material and records it as metadata to be tracked by search engines. Read more... - Local search is the use of specialized Internet search engines that allow users to submit geographically constrained searches against a structured database of local business listings. Typical local search queries include not only information about "what" the site visitor is searching for (such as keywords, a business category, or the name of a consumer product) but also "where" information, such as a street address, city name, postal code, or geographic coordinates like latitude and longitude. Examples of local searches include "Hong Kong hotels", "Manhattan restaurants", and "Dublin car rental". Local searches exhibit explicit or implicit local intent. A search that includes a location modifier, such as "Bellevue, WA" or "14th arrondissement", is an explicit local search. A search that references a product or service that is typically consumed locally, such as "restaurant" or "nail salon", is an implicit local search.
Local searches typically trigger Google to return organic results and a local 3-pack. More local results can be obtained by clicking on “more places” under the 3-pack. The list of results one obtains is also called the Local Finder. Read more...
Selected search engines
- Sogou, Inc. is a public company, founded on 9 August 2010 by Wang Xiaochuan. It is the owner and developer of Sogou (Chinese: 搜狗; pinyin: Sōugǒu; literally: "searching dog") search engine, Sogou Input and Sogou browser. According to iResearch statistics, Sogou accounted for 16.7% of the mobile search market share in China. It is headquartered in Beijing, China. The offices of Sogou are located on the southeast corner of Tsinghua University.
Sogou listed on the New York Stock Exchange on November 9, 2017, under the ticker “SOGO”. The night before the IPO, Tencent owned 44% of Sogou while Sohu owned 38%. In the quarter prior to the IPO, Sogou reported 55% year-over-year and 22% quarter-over-quarter revenue growth. Read more...
- Dogpile is a metasearch engine for information on the World Wide Web that fetches results from Google, Yahoo!, and Yandex, and includes results from several other popular search engines, including those from audio and video content providers. Read more...
Ask.com (originally known as Ask Jeeves) is a question answering-focused e-business founded in 1996 by Garrett Gruener and David Warthen in Berkeley, California.
The original software was implemented by Gary Chevsky from his own design. Warthen, Chevsky, Justin Grant, and others built the early AskJeeves.com website around that core engine. In late 2010, facing insurmountable competition from more popular search engines, the company outsourced its web search technology and returned to its roots as a question and answer site. Douglas Leeds was elevated from president to CEO in 2010. Read more...
The home page of the English Wikipedia
A home page or a start page is the initial or main web page of a website or a browser. The initial page of a website is sometimes called main page as well. Read more...
Google LLC is an American multinational technology company that specializes in Internet-related services and products, which include online advertising technologies, search engine, cloud computing, software, and hardware. Google was founded in 1998 by Larry Page and Sergey Brin while they were Ph.D. students at Stanford University in California. Together they own about 14 percent of its shares and control 56 percent of the stockholder voting power through supervoting stock. They incorporated Google as a privately held company on September 4, 1998. An initial public offering (IPO) took place on August 19, 2004, and Google moved to its headquarters in Mountain View, California, nicknamed the Googleplex. In August 2015, Google announced plans to reorganize its various interests as a conglomerate called Alphabet Inc. Google is Alphabet's leading subsidiary and will continue to be the umbrella company for Alphabet's Internet interests. Sundar Pichai was appointed CEO of Google, replacing Larry Page who became the CEO of Alphabet.
The company's rapid growth since incorporation has triggered a chain of products, acquisitions, and partnerships beyond Google's core search engine (Google Search). It offers services designed for work and productivity (Google Docs, Google Sheets, and Google Slides), email (Gmail/Inbox), scheduling and time management (Google Calendar), cloud storage (Google Drive), social networking (Google+), instant messaging and video chat (Google Allo, Duo, Hangouts), language translation (Google Translate), mapping and navigation (Google Maps, Waze, Google Earth, Street View), video sharing (YouTube), note-taking (Google Keep), and photo organizing and editing (Google Photos). The company leads the development of the Android mobile operating system, the Google Chrome web browser, and Chrome OS, a lightweight operating system based on the Chrome browser. Google has moved increasingly into hardware; from 2010 to 2015, it partnered with major electronics manufacturers in the production of its Nexus devices, and it released multiple hardware products in October 2016, including the Google Pixel smartphone, Google Home smart speaker, Google Wifi mesh wireless router, and Google Daydream virtual reality headset. Google has also experimented with becoming an Internet carrier. In February 2010, it announced Google Fiber, a fiber-optic infrastructure that was installed in Kansas City; in April 2015, it launched Project Fi in the United States, combining Wi-Fi and cellular networks from different providers; and in 2016, it announced the Google Station initiative to make public Wi-Fi available around the world, with initial deployment in India. Read more...
Baidu, Inc. (Chinese: 百度; pinyin: Bǎidù, anglicized /ˈbaɪduː/ BY-doo), incorporated on 18 January 2000, is a Chinese multinational technology company specializing in Internet-related services and products and artificial intelligence, headquartered at the Baidu Campus in Beijing's Haidian District. It is one of the largest AI and internet companies in the world. The holding company of the group was incorporated in the Cayman Islands. Baidu was established in 2000 by Robin Li and Eric Xu. Baidu is currently ranked 4th overall in the Alexa Internet rankings.
Baidu's Global Business Unit, formed under the name of DU Group or DU Apps Studio, is an app developer with various apps and services. It has over 2 billion active users worldwide. Baidu also provides official international and Chinese versions of its popular online digital distribution services, Baidu App Store and Shouji Baidu respectively, both hosting downloadable content and applications. Baidu's advertisement platform is DU Ad Platform. Read more...
- Yooz.ir (Persian: یوز, lit. 'cheetah') is an Iranian search engine. Iran's Ministry of Communication and IT claims the search engine is capable of supporting up to one billion Persian websites, and it has currently indexed over 1 billion web pages.
The Yooz search engine receives 100,000 hits and more than 60,000 searches per day. Read more...
- Ecosia is a web search engine based in Berlin, Germany, which donates 80% of its surplus income to non-profit conservationist organizations, with a focus on tree planting. Ecosia considers itself a "social business", is CO2-neutral and claims to support full financial transparency, and is certified by B-Lab as a benefit corporation. According to Ecosia, over 42.4 million trees have been planted with its support as of 17 November 2018. Read more...
AOL (stylized as Aol., formerly a company known as AOL Inc. and originally known as America Online) is a web portal and online service provider based in New York City. It is a brand marketed by Oath, a subsidiary of Verizon Communications.
The service traces its history to an online service known as PlayNET, which hosted multi-player games for the Commodore 64. PlayNET licensed its software to a new service, Quantum Link (Q-Link), which went online in November 1985. PlayNET shut down shortly thereafter. The initial Q-Link service was similar to the original PlayNET, but over time Q-Link added many new services. When a new IBM PC client was released, the company focused on the non-gaming services and launched it under the name America Online. The original Q-Link was shut down on November 1, 1995, while AOL grew to become the largest online service, displacing established players like CompuServe and The Source. By 1995, AOL had about 20 million active users. Read more...
- Youdao (有道) is a search engine released by Chinese internet company NetEase (網易) in 2007. It is the featured search engine of its parent company's web portal, 163.com, and lets users search for web pages, images, news, music, blogs, Chinese-to-English dictionary entries, and more. Read more...
Qwant is a French company that was founded by security specialist Éric Leandri, investor Jean Manuel Rozan and search-engine expert Patrick Constant in 2011. It launched its eponymous web search engine in July 2013. It claims not to employ user tracking, and it does not personalize search results, in order to avoid trapping users in a filter bubble.
The website processes well over 10 million search requests per day and over 50 million individual users a month worldwide, spread over its three main entry points: the normal homepage, a 'lite' version, and a 'Qwant Junior' portal for children that filters results. Read more...
Startpage is a search engine which highlights privacy as its distinguishing feature. It was previously known as the metasearch engine Ixquick, Startpage being then a variant service. Both sites were merged in 2016.
Founded by David Bodnick in 1998, Ixquick is owned by Surfboard Holding BV of the Netherlands, which acquired the internet company in 2000. Ixquick and its sibling project Startpage.com reached their latest record (28-day average) of 5.7 million daily direct queries on 2 February 2015. Read more...
- GenieKnows Inc. is a privately owned vertical search engine company based in Halifax, Nova Scotia. Like many internet search engines, its revenue model centers on an online advertising platform and B2B transactions. It focuses on a set of niche search markets, or verticals, including health search, video games search, and local business directory search. Read more...
- Gigablast is a web search engine founded in 2000. In 2015, it claimed to have indexed over 12 billion web pages, and received billions of queries per month.
The search engine source code is written in the programming languages C and C++. It was released as open-source software under the Apache License version 2, in July 2013. Read more...
Parsijoo (Persian: پارسیجو) is an independent knowledge-based Internet company operating a search engine for the Persian language. As of 2016, Parsijoo had 600,000 hits and 120,000 searches per day. Parsijoo is Iran's second most visited search engine after Google. Read more...
- Blackle is a website powered by Google Custom Search and created by Heap Media, which aims to save energy by displaying a black background and using grayish-white font color for search results. Blackle claims to have saved nearly 6 MWh of electrical energy up to December 2016, a claim currently under dispute. For comparison, the average American household consumes 11 MWh of electrical energy per year. Read more...
- EXALEAD /ɛɡˈzæliːd/ is a software company, created in 2000, that provided search platforms and search-based applications (SBA) for consumer and business users. The company is headquartered in Paris, France, and is a subsidiary of Dassault Systèmes (French pronunciation: [daˈso]). Read more...
Yandex N.V. (/ˈjʌndɛks/; Russian: Яндекс, IPA: [ˈjandəks]) is a Russian multinational corporation specializing in Internet-related products and services, including search and information services, eCommerce, transportation, navigation, mobile applications, and online advertising. Yandex provides over 70 services in total.
Incorporated in the Netherlands, Yandex primarily serves audiences in Russia and the Commonwealth of Independent States. The company founders and most of the team members are located in Russia. The company has 18 commercial offices worldwide.
It is the largest technology company in Russia and the largest search engine on the internet in Russian, with a market share of over 52%. The Yandex.ru home page is the 4th most popular website in Russia. It also has the largest market share of any search engine in the Commonwealth of Independent States and is the 5th largest search engine worldwide after Google, Baidu, Bing, and Yahoo!. Read more...
- Lycos, Inc., is a web search engine and web portal established in 1995, spun out of Carnegie Mellon University. Lycos also encompasses a network of email, webhosting, social networking, and entertainment websites. Read more...
- Picollator is an Internet search engine that searches for web sites and multimedia by visual query (image), text, or a combination of visual query and text. Picollator recognizes objects in the image, determines their relevance to the text and vice versa, and searches in accordance with all the information provided. Read more...
Searx (/sɜːrks/) is a free metasearch engine, available under the GNU Affero General Public License version 3, with the aim of protecting the privacy of its users. To this end, Searx does not share users' IP addresses or search history with the search engines from which it gathers results. Tracking cookies served by the search engines are blocked, preventing user-profiling-based results modification. By default, Searx queries are submitted via HTTP POST, to prevent users' query keywords from appearing in webserver logs. Searx was inspired by the Seeks project, though it does not implement Seeks' peer-to-peer user-sourced results ranking.
Each search result is given as a direct link to the respective site, rather than a tracked redirect link as used by Google. In addition, when available, these direct links are accompanied by "cached" and/or "proxied" links that allow viewing results pages without actually visiting the sites in question. The "cached" links point to saved versions of a page on archive.org, while the "proxied" links allow viewing the current live page via a Searx-based web proxy. In addition to the general search, the engine also features tabs to search within specific domains: files, images, IT, maps, music, news, science, social media, and videos. Read more...
- Bing is a web search engine developed by Microsoft. Read more...
Naver (Hangul: 네이버) is a South Korean online platform operated by Naver Corporation. It debuted in 1999 as the first web portal in Korea to develop and use its own search engine. It was also the world's first operator to introduce the comprehensive search feature, which compiles search results from various categories and presents them in a single page. Naver has since added a multitude of new services ranging from basic features such as e-mail and news to the world's first online Q&A platform Knowledge iN.
As of September 2017, the search engine handled 74.7% of all web searches in South Korea and had 42 million enrolled users. More than 25 million Koreans have Naver as the start page on their default browser, and the mobile application has 28 million daily visitors. Naver is frequently referred to as 'the Google of South Korea'. Read more...
Seznam.cz (or just Seznam, which means directory in English) is a web portal and search engine in the Czech Republic. It was founded in 1996 by Ivo Lukačovič in Prague as the first web portal in the Czech Republic. Seznam started with a search engine and an internet version of yellow pages. Today, Seznam runs more than 15 different web services and associated brands. Seznam had more than 6 million real users per month at the end of 2014. Among the most popular services, according to NetMonitor, are its homepage seznam.cz, email.cz, search.seznam.cz and its yellow pages firmy.cz.
In the Czech market, Seznam.cz competed until 2008 with the portals Centrum.cz and Atlas.cz. Despite merging in 2008, these two portals no longer appear likely to overcome Seznam in the near future. Seznam's biggest competitor now appears to be Google, especially in the area of full-text search. Read more...
Screenshot of the DuckDuckGo home page as of August 2017
DuckDuckGo (DDG) is an Internet search engine that emphasizes protecting searchers' privacy and avoiding the filter bubble of personalized search results. DuckDuckGo distinguishes itself from other search engines by not profiling its users and by deliberately showing all users the same search results for a given search term. It emphasizes returning the best results, rather than the most results, generating them from over 400 individual sources, including crowdsourced sites such as Wikipedia and other search engines like Bing, Yahoo!, and Yandex. In November 2018, it averaged 29,661,659 daily direct searches.
The company is based in Paoli, Pennsylvania, in Greater Philadelphia, and has 55 employees. The company name is a reference to the children's game duck, duck, goose. Read more...
YaCy (pronounced "ya see") is a free distributed search engine built on principles of peer-to-peer (P2P) networks. Its core is a computer program written in Java, distributed on several hundred computers, so-called YaCy-peers. Each YaCy-peer independently crawls the Internet, analyzes and indexes the web pages it finds, and stores the indexing results in a common database (the so-called index), which is shared with other YaCy-peers using principles of P2P networks. It is a free search engine that everyone can use to build a search portal for their intranet and to help search the public internet.
Compared to semi-distributed search engines, the YaCy network has a decentralised architecture. All YaCy-peers are equal and no central server exists. It can be run either in a crawling mode or as a local proxy server, indexing web pages visited by the person running YaCy on his or her computer. (Several mechanisms are provided to protect the user's privacy.) Access to the search functions is made by a locally running web server which provides a search box to enter search terms, and returns search results in a similar format to other popular search engines. Read more...
- Excite (stylized as excite) is an internet portal launched in December 1995 that provides a variety of content including news and weather, a metasearch engine, web-based email, instant messaging, stock quotes, and a customizable user homepage. The content is collated from over 100 different sources.
Excite's portal and services are owned by Excite Networks, but in the US, Excite is a personal portal, called My Excite, which is operated by Mindspark and owned by IAC Search and Media. Read more...
Yahoo! is a web services provider headquartered in Sunnyvale, California, and owned by Verizon Communications through Oath Inc. The original Yahoo! company was founded by Jerry Yang and David Filo in January 1994 and was incorporated on March 2, 1995. Yahoo! was one of the pioneers of the early Internet era in the 1990s.
It was globally known for its Web portal, search engine Yahoo! Search, and related services, including Yahoo! Directory, Yahoo! Mail, Yahoo! News, Yahoo! Finance, Yahoo! Groups, Yahoo! Answers, advertising, online mapping, video sharing, fantasy sports, and its social media website. At its height it was one of the most popular sites in the United States. According to third-party web analytics providers Alexa and SimilarWeb, Yahoo! was the most widely read news and media website – with over 7 billion views per month – ranking as the sixth-most-visited website globally in 2016. Read more...
Need help?
Do you have a question about Web search engines that you can't find the answer to?
Consider asking it at the Wikipedia reference desk.
Topics
Associated Wikimedia
The following Wikimedia Foundation sister projects provide more on this subject:
Wikibooks
Books
Commons
Media
Wikinews
News
Wikiquote
Quotations
Wikisource
Texts
Wikiversity
Learning resources
Wiktionary
Definitions
Wikidata
Database
- What are portals?
- List of portals

