Talk:Faceted search

The following Wikipedia contributor may be personally or professionally connected to the subject of this article. Relevant policies and guidelines may include conflict of interest, autobiography, and neutral point of view.

Dtunkelang (talk · contribs)

Taxonomy

In the distinction made for "multiple classifications" vs. a "single, pre-determined, taxonomic order," I'm not sure that "taxonomic" should be used to describe a hierarchical type of organization, since taxonomies are not generally required to be "single" or "pre-determined" or hierarchical as is implied here. 69.91.164.31 (talk) 19:23, 20 January 2009 (UTC)

Can you suggest more precise wording? The distinction is an important one, and I think the only issue is finding the precise words to describe it. Dtunkelang (talk) 18:41, 2 February 2009 (UTC)

I don't think faceted search implies this distinction at all! How about faceted search as a method of classifying search results according to one or more methods of extracting patterns from the results? Examples include patterns derived from text-clustering the results, linked attributes of results, classifications of results to external data such as taxonomies and folksonomies, etc. (In my opinion) back in the day Oren Etzioni's Grouper was doing faceted search via simple clustering of results, as was Northern Light, faceting results via a pre-computed taxonomy (the loose definition - see below). I'm also not sure that the act of doing an imprecise classification into a taxonomy implies that an object could not have a measure of fitness into more than one node of the taxonomy... or that the word 'taxonomy' must be taken to mean that nodes/leaves of the tree must live in only one place in the tree. Nealrichter —Preceding undated comment added 15:19, 21 May 2009 (UTC).

Marti Hearst makes a distinction between clustering and faceted search in "Clustering versus faceted categories for information exploration" that I agree with. There is a looser category of exploratory search interfaces, but I don't think we should call them all faceted if they're not. What does faceted mean, if not that there are multiple facets? I'm not knocking other approaches, just trying to be precise. Dtunkelang (talk) 15:47, 21 May 2009 (UTC)

This smells like a straw-man argument to me. Are you making the assumption that clustering means each document lives in only one group? Or that one could not do a first-level clustering based upon meta-data (like faceting uses)? While I agree that vanilla clustering has the problems she described (my commercial implementations suffered from these issues), once meta-data is included as a heavy factor and multiple assignment is allowed, the results can become indistinguishable from faceting. Both techniques are functions that produce a filtering/grouping of results by some label, where the assignment of the labels to documents is governed by some extraction + classification task.

Maybe this boils down to the interpretation/semantics of the words 'clustering' and 'faceting'. If one assumes that faceting implies multiple membership (dimensions) in meta-data and clustering must be interpreted as being the application of some classic-ish clustering-algorithm then I agree with Hearst. Yet those two words have broader meanings. Nealrichter (talk) 16:39, 21 May 2009 (UTC)

Nealrichter, I actually think you have boiled down a useful technical description of the basic distinction. I realize there are other elements, but your formulation (with notes) works for me:

faceting implies multiple membership (dimensions) in meta-data and clustering must be interpreted as being the application of some classic-ish clustering-algorithm

We shouldn't ignore the other, broader, issues, but this is a darn good start. --Searchtools (talk) 18:10, 21 May 2009 (UTC)

Let me try a different tack: do we agree that faceted search assumes a faceted classification of the information being searched? If so, do we agree that not all ways of organizing result sets use faceted classification? Specifically, can we agree that using a pairwise document similarity measure to arrange documents into groups doesn't use a faceted classification? And that neither does using a single hierarchical topic organization? If I understand what you mean about faceting implying multiple membership, then we're all on the same page. Dtunkelang (talk) 18:38, 21 May 2009 (UTC)

I can agree with your points and I'm mostly on the same page with you. However, one could define a pairwise document similarity that uses metadata as a highly weighted feature in the computation. Example: Imagine kmeans that is seeded (non-randomly) with n buckets of n "facet-labels". Each document is classified (with high scoring of metadata) to one or more buckets and the means updated w/ the original seed marked as special. At the end we have groupings of documents and mean vectors still containing the marked seed yet with other related dimensions relevant to the cluster. After completion, accumulate the means into a single list of unique dimensions and counts. Is this clustering or classification now? It certainly still has the property of a hill-climber like kmeans, yet the outputs are a set of dimensions, counts & membership weights that can be used for faceting the documents (possibly with facets they didn't explicitly contain a priori!). I'm cheating here: I wrote a paper and code (with co-workers) extending this method to a hierarchical clustering.

I guess I'm arguing for a definition of faceting that supports any suitable method of assigning documents to a set of n distinct dimensions and subsequently allowing a UI to filter based upon that membership... without being unnecessarily pejorative to 'clustering algorithms'. Nealrichter (talk) 19:21, 21 May 2009 (UTC)
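The seeded, multiple-membership k-means idea described in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions: the sparse term vectors, the metadata weight of 3.0, the 0.5 cosine threshold, the sample documents, and every function name are hypothetical, and this is not the code from the paper mentioned above.

```python
import math
from collections import defaultdict

# Hypothetical sample data: free text plus metadata labels per document.
DOCS = {
    "a": {"text": "red widget sale", "meta": ["brand:acme", "type:widget"]},
    "b": {"text": "blue gadget", "meta": ["brand:acme"]},
    "c": {"text": "cheap widget", "meta": ["type:widget"]},
}
SEEDS = ["brand:acme", "type:widget"]  # one seed per facet label


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def vectorize(doc, meta_weight=3.0):
    """Bag of words, with metadata labels weighted more heavily (assumed weight)."""
    vec = defaultdict(float)
    for term in doc["text"].split():
        vec[term] += 1.0
    for label in doc["meta"]:
        vec[label] += meta_weight
    return dict(vec)


def seeded_soft_kmeans(docs, seeds, threshold=0.5, iterations=5):
    """Seed one mean per facet label; a document joins every cluster it is close to."""
    vectors = {doc_id: vectorize(doc) for doc_id, doc in docs.items()}
    means = {seed: {seed: 1.0} for seed in seeds}
    members = {seed: {} for seed in seeds}
    for _ in range(iterations):
        # Assignment step: multiple membership, no forced single cluster.
        members = {seed: {} for seed in seeds}
        for doc_id, vec in vectors.items():
            for seed, mean in means.items():
                sim = cosine(vec, mean)
                if sim >= threshold:
                    members[seed][doc_id] = sim
        # Update step: average the member vectors, keeping the seed dimension.
        for seed, assigned in members.items():
            if not assigned:
                continue
            mean = defaultdict(float)
            for doc_id in assigned:
                for term, weight in vectors[doc_id].items():
                    mean[term] += weight / len(assigned)
            mean[seed] = max(mean[seed], 1.0)  # the "marked" seed never drops out
            means[seed] = dict(mean)
    return means, members


means, members = seeded_soft_kmeans(DOCS, SEEDS)
# Per-label document counts, usable as facet counts in a UI.
facet_counts = {seed: len(assigned) for seed, assigned in members.items()}
```

In this toy run document "a" ends up in both clusters, which is the multiple-membership property under discussion. With a lower threshold, documents can drift into clusters whose label they never carried, which corresponds to the "facets they didn't explicitly contain" remark.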

I think the fundamental thing that faceted search requires is the faceted classification, like Dtunkelang said. That means that unlike the clustering approach outlined by Nealrichter above, a document does not live in exactly one location, but rather exists in multiple locations. If we had two facets, brand and type, then an Acme Widget exists in both the brand=Acme and type=widget locations. You don't have to decide if an Acme Widget is more "Acme" or more "Widget," it's both, because it 'is' both. However, this doesn't explain where the facets come from. When we cluster documents together, we're just putting together statistically similar documents. There are no semantics about what gets put together. Whereas with facets, the documents are grouped in semantically meaningful ways. This is because a human is in the loop. A human says, "What type of metadata can we talk about? Let's group by that." Conversely, in clustering, a machine puts the documents into groups and then a human says, "Okay, so why are these documents grouped together?" We've seen before (I'm sure I can find a citation if requested) that labeling clusters of documents is tricky, and it's even more tricky to get machines to do it well.

So in conclusion: No, clustering is not facets.

1. Documents are categorized into multiple orthogonal hierarchies, whereas in clustering a document belongs to only one cluster. Even in hierarchical clustering, the document belongs to only one hierarchy.

2. Facets group the documents into semantically meaningful ways. We know this, because the facets that describe the document are manually chosen. In clustering, documents are automatically grouped based on some similarity metric, such as cosine similarity, or correlation, or probabilistic methods. This can lead to clusters that appear to be noisy when judged by human users.

128.114.60.40 (talk) 22:08, 21 May 2009 (UTC)

Please read it again. I took pains to state that clustering does not have to imply single membership. It's possible to build a faceting algorithm that uses a clustering approach and pre-labeled facets. There is a strong tendency to assume that clustering implies singular membership in a cluster and that the cluster labeling problem is too hard. Neither of these is ironclad if you allow multiple membership in clusters and utilize the same meta-data used by faceted classification to produce labels for the clusters. The point of the kmeans-derived algorithm was to demonstrate this idea. I'm arguing that 'clustering' should not be used as a straw man to define faceting as NOT clustering.

We have a good working definition now that faceting means multiple membership of documents to multiple possibly orthogonal facets and that the facets should have some semantic meaning. —Preceding unsigned comment added by Nealrichter (talk • contribs) 23:45, 21 May 2009 (UTC)

New distinction via Dtunkelang (private communication). Facets are defined to be key-value pairs, where a document can be associated with multiple keys and each key may have multiple value assignments. Machine assignment is allowable. Since classic clustering approaches (bag of words) are not structured into multiple keys, they are not faceting. This definition does NOT preclude usage of clustering algorithms to infer or generate new key-value assignments to documents. While faceting is a form of result refinement, not all result refinement is faceting. Nealrichter (talk) 02:47, 22 May 2009 (UTC)
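The key-value definition above can be illustrated with a small Python sketch. The catalog contents and the function names `refine` and `facet_counts` are made up for illustration; the only thing taken from the discussion is the structure, in which a document carries several keys and each key may hold several values.

```python
from collections import Counter, defaultdict

# Hypothetical catalog: each document maps facet keys to one or more values.
DOCS = {
    "doc1": {"brand": {"Acme"}, "type": {"widget"}},
    "doc2": {"brand": {"Acme"}, "type": {"gadget", "widget"}},
    "doc3": {"brand": {"Bolt"}, "type": {"gadget"}},
}


def refine(docs, selections):
    """Keep the documents that carry every selected (key, value) pair."""
    return {
        doc_id: facets
        for doc_id, facets in docs.items()
        if all(value in facets.get(key, set()) for key, value in selections)
    }


def facet_counts(docs):
    """Count, per facet key, how many documents carry each value."""
    counts = defaultdict(Counter)
    for facets in docs.values():
        for key, values in facets.items():
            for value in values:
                counts[key][value] += 1
    return counts


acme = refine(DOCS, [("brand", "Acme")])
acme_type_counts = dict(facet_counts(acme)["type"])
```

Here `doc2` has two values under the same key, so it is counted under both `widget` and `gadget` after refining by brand. Whether those assignments were made by a person or inferred by a clustering algorithm makes no difference to the structure.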

The very first question on the page is about the distinction between faceted metadata and taxonomy. Maybe we could make the differences more specific, like this 'facets might include topics, subjects, or concepts (like traditional taxonomies), but facets are not limited to those elements'. For example, few taxonomies would include structures for price, size, compatibility or date, while these are common facets in online catalogs. That way we could remove the question of taxonomy structure and concentrate on the meaningful differences between various data structures. --Searchtools (talk) 18:10, 21 May 2009 (UTC)

Razorbase External Link

razorbase as a faceted browser —Preceding unsigned comment added by 76.73.133.188 (talk) 15:06, 19 May 2009 (UTC) Regarding the addition of razorbase.com to the list, the article says that FBs "allow users to explore by filtering available information". Go to the home page, type "Bill Clinton", click the 'named' link, then choose 'connected to'. Now in the resulting page, click the Categories tab, then click one of the blue right arrows. Now you have a list of things related to Bill Clinton that only belong in that category. You got to this by filtering based on Category (you can also do it based on information via the 'About' tab). Click the 'Your query' link to view the filters; the controls there allow you to remove and add filters.

I talked with Sherman Monroe (who wrote the previous comment) about the razorbase.com link. I still am not convinced either that it is a faceted browser or that, even if it were, it would be an appropriate external link. We've moved our private dialogue into the talk page to promote public discussion. Dtunkelang (talk) 15:12, 19 May 2009 (UTC)

PrismaStar

Looks like I'm having a little reversion war with 78.105.108.216 over the inclusion of the following sentence: "Newer solutions employing faceted search are increasingly being offered to retailers by companies such as PrismaStar. Such solutions can enable faceted search results to be ordered based on their relevance, rather than simply filtered in or out entirely." I think it's spammy, and that PrismaStar, which is marked as an orphan, isn't notable enough for inclusion. I'm being accused of WP:CONFLICT because of my past affiliation with Endeca and my present one with Google. Perhaps others without any real or perceived conflicts of interest can chime in. Dtunkelang (talk) 14:10, 7 December 2009 (UTC)

Company Mentions

It seems that a recurring issue on all Wikipedia entries related to search is that companies want to be mentioned in the entry (see the previous two sections as examples). I've included only a handful of companies in this entry that are not only notable enough to have Wikipedia entries, but have established associations with faceted search. Since I have a past affiliation with Endeca, I could be accused of bias, but I count on others to keep the entry honest.

Nonetheless, I refuse to let this entry become overrun with mentions from companies that don't meet the above criteria--that quickly devolves into spam. I'd sooner remove all company mentions--and even mentions of open-source software if those are controversial too. I've been maintaining this page with something of an iron fist, but I'm open to discussion here if anyone disagrees with my approach. Dtunkelang (talk) 16:03, 30 December 2009 (UTC)

That does it. I've eliminated all references to companies, including ones I feel are worth including. Hopefully everyone can live with this as a fair solution. I refuse to let this entry become a cesspool of spam. Dtunkelang (talk) 01:57, 5 January 2010 (UTC)

External Links

I'm concerned that any site that is an "example of faceted search" might show up in the external links. Can we agree on a standard of notability and/or content type? Dtunkelang (talk) 18:19, 7 May 2010 (UTC)

I'm going to start taking a hard line on external links: no links to pages that are just example applications, and no purely commercial links. Links should either be to free, open-source software or to educational materials. Wikipedia is not a sales and marketing tool. Dtunkelang (talk) 13:37, 13 May 2010 (UTC)

WP:UNDUE concerns

Over half the citations and a big gob of the text are devoted to the work of one researcher, plus colleagues. One of the cites might be to self-published work. Yakushima (talk) 12:17, 4 September 2011 (UTC)

Too technical

The wording of this article is recursive, which is very poor English. First, the article needs to begin with a simple definition of faceted search. "Faceted search, also called faceted navigation or faceted browsing, is a technique for accessing information organized according to a faceted classification system..." This clause "according to a faceted classification system" is using "faceted" to describe a "faceted search", which is horrible English because you have not defined the term "faceted". When using technical terms, you need to define what they mean; otherwise you lose all the people who want to read this article but do not have the technical background relating specifically to this topic.

An introduction should be simple, not delving deep into the topic. That should be saved for the following paragraphs.

Wikipedia is going to lose its interest to a wide variety of stakeholders if the articles become too academic with nothing for the lay person to read and understand. Save the heavy academic writing for later in the article.

It is truly shameful that a person can't look at this article and grasp in the first two sentences what "faceted browsing" or "faceted searching" or "faceted navigation" means in simple terms. I consider that a real laziness on the part of the author.

IT personnel have been accused for decades of not being able to communicate well. This article is a clear example of that.

As a software engineer and consultant with over 20 years' experience, I have dealt with stakeholders from all walks of life - from secretaries to CEOs, managers, factory floor assembly workers and engineers, etc. Those are the people Wikipedia needs to reach and this article surely does not communicate to them. — Preceding unsigned comment added by 84.24.63.85 (talk • contribs) 06:10, 31 December 2014

FS different from faceted classification

I disagree with the statement that faceted search is search against data organized with a faceted classification. There are very, very few actual faceted classifications in use, and most online sites with facets are simply using regular metadata attributes to provide limits. So this article either needs to limit itself to data for which faceted classification has been applied, or it needs to drop the part about mass market search, since that is merely about offering useful limiters on the page. Also, there's nothing faceted about WorldCat AFAIK, other than their use of FAST, but because FAST does not link the facets it is actually a removal of facets rather than an application of them. In other words, I think this article is highly problematic, at best. Even the references are poor. I seriously doubt the information presented here. LaMona (talk) 20:07, 8 May 2015 (UTC)

Aren't facets just database views, searches, and ordering?

These database concepts are ancient. Why does this new terminology exist?

The language is abstract and ambiguous:

"A faceted classification system classifies each information element along multiple explicit dimensions, called facets, enabling the classifications to be accessed and ordered in multiple ways rather than in a single, pre-determined, taxonomic order.[1]"

So what is:

an information element?
a dimension of an element?
a classification? (probably a specific value of a "dimension")

Do facets only apply to information elements stored in a predefined order? (implied as part of the definition here)

This is a classic example of computer science trying to be excessively abstract to prove sophistication and just muddling up a simple concept. Just say "search attributes" like they did in the 1980s. — Preceding unsigned comment added by BenjaminGSlade (talk • contribs) 11:31, September 20, 2021 (UTC)
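To make the database reading of this question concrete, here is a minimal sketch using Python's standard `sqlite3` module. The `products` table, its columns, and the helper `type_counts_for_brand` are hypothetical. It shows that a filter on one attribute plus per-value counts on another amounts to an ordinary `WHERE` clause with `GROUP BY`.

```python
import sqlite3

# Hypothetical in-memory catalog with one row per product.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, brand TEXT, type TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        ("Acme Widget", "Acme", "widget", 9.5),
        ("Acme Gadget", "Acme", "gadget", 20.0),
        ("Bolt Widget", "Bolt", "widget", 12.0),
    ],
)


def type_counts_for_brand(brand):
    """Filter on one attribute, then count the values of another."""
    return conn.execute(
        "SELECT type, COUNT(*) FROM products "
        "WHERE brand = ? GROUP BY type ORDER BY type",
        (brand,),
    ).fetchall()


acme_counts = type_counts_for_brand("Acme")
```

One caveat about this sketch: a single row per product allows only one value per attribute. Supporting several values under the same attribute, as discussed in the Taxonomy section above, would need a separate table of (item, key, value) rows.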

Your points do indeed deserve more attention, and seem to be somewhat related to other comments above.

At the very least, a reference to something like Search attributes, Parametric search (user interface)/Parametric search (Information science), or Multidimensional search seems warranted.

(Most of the items listed at Category:Search algorithms appear specifically focused on computer programming theory, for example. Perhaps Category:Library cataloging and classification and Category:Knowledge representation may provide more leads?)

Related: Does Wikipedia have an existing, non-technical, outline or summary of Information search methods that we could link to here? Jim Grisham (talk) 00:08, 23 July 2022 (UTC)

Faceted semantic browsers?

This page was long ago moved from the title Faceted browser, but all of the information regarding non-website applications appears to have since been removed.

(see, e.g., the last revision prior to page move; c. 2008)

A subsection titled Faceted browsers or Faceted semantic browsers should probably be re-added to the current article. Does anyone know of a good reason why that information was removed and therefore why it should not be restored in some fashion?

Over the years, many academic references, descriptions, and other ‘non-spammy’ details were removed, e.g.:

another example of removed content

Finally, there was also a separate article titled Informative Faceted Searching (see last revision prior to blanking and redirect) that was redirected here. It is not clear how much of that article’s content was actually migrated, however.

- Jim Grisham (talk) 01:50, 23 July 2022 (UTC)