Talk:Bitmap index

From Wikipedia, the free encyclopedia
Jump to: navigation, search
WikiProject Databases / Computer science  (Rated C-class, Mid-importance)
WikiProject icon This article is within the scope of WikiProject Databases, a collaborative effort to improve the coverage of database related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
C-Class article C  This article has been rated as C-Class on the project's quality scale.
 Mid  This article has been rated as Mid-importance on the project's importance scale.
Taskforce icon
This article is supported by WikiProject Computer science (marked as Mid-importance).
 
edit·history·watch·refresh Stock post message.svg To-do list for Bitmap index:

To be merged with Index (database). Please post comments by 5/19/2007. SqlPac 04:17, 17 May 2007 (UTC)

Priority 8

Something to note?[edit]

Should we note that bitmap indexes are good for equality operations in WHERE clauses, but not so good for less than or greater than operations? - Ta bu shi da yu 15:45, 10 November 2007 (UTC)

Huh, in what way aren't they "good" for < or >? -- intgr [talk] 23:20, 10 November 2007 (UTC)
"Bitmap indexes are also not suitable for columns that are primarily queried with less than or greater than comparisons. For example, a salary column that usually appears in WHERE clauses in a comparison to a certain value is better served with a B-tree index. Bitmapped indexes are only useful with equality queries, especially in combination with AND, OR, and NOT operators." Oracle Concepts document. Reread what they do and I'm sure you'll work out why this is the case. - Ta bu shi da yu 08:42, 11 November 2007 (UTC)
Bitmap indexes can be generated for any logical operation; e.g. you can have a bitmap index for the expression 'salary > 10000' and every time you query WHERE salary > 10000, this bitmap will be useful. I have no idea why Oracle's documentation singles out equality operations as the only use case for bitmap indexes — it isn't. -- intgr [talk] 21:12, 11 November 2007 (UTC)
I suppose that it's to do with a higher cardinality. My understanding of bitmap indexes is that the higher the cardinality (or percentage of unique elements) then the less efficient in storing the data in the index. Could be wrong. - Ta bu shi da yu 02:05, 12 November 2007 (UTC)
Ok I see where's the confusion. When the Oracle documentation talks of bitmap indexes, they actually imply that there will be a separate bitmap for every possible value in an indexed column; for example, one for gender='M', another for gender='F' — for a cardinality of two, 2 bitmaps will be created. When you perform either of these queries, the planner recognizes these expressions and uses the appropriate one of the indexes.
In a similar way, you could create bitmap indexes for two expressions salary > 10000, salary > 20000, and there would be just two bitmaps. However, to use this approach, you have to create indexes for both of the expressions manually, because the DBMS cannot know which expressions your queries will be using. And if the query expression doesn't exactly match any of the index expressions, no indexes can be used. -- intgr [talk] 11:34, 12 November 2007 (UTC)
Ah, I follow. Maybe this is something we should add to the article. - Ta bu shi da yu 12:19, 12 November 2007 (UTC)

Oracle uses BBC to compress bitmap index, it stores each compressed bitmap for each distinct value. This kind of bitmap doesn't support range queries well. Chan had introduced range or interval encoding schemes for range queries and two-sided range queries, which can improve range query performance. In the above example," A>10000", if using range encoding, it only needs two bitmap to answer that query. In oracle it may have to use 10000 bitmaps, so it is slow.

zhuo wang —Preceding unsigned comment added by 218.25.35.20 (talk) 07:43, 10 June 2008 (UTC)

Oracle's compressed bitmap indexes in fact works very well even when the indexed column has many distinct values. This is one of the main points of reference # 1.

Compressed bitmap indexes are very efficient for range queries such as "A > 10000", reference #4 and [1] give proofs in terms of theoretical analysis and timing measurements. In fact, the timing measurements published there are exclusively on range queries (not equality queries). Even though it might involve ORing a lot of bitmaps to answer such a query, each bitmap must be very small (after compression) in this case. Since these bitwise logical OR operations are very efficiently, the answer can be computed quickly.

A historical note: in Dr. O'Neil's paper on Model 204, Query 3 involves a range condition "Ownername > 'Z'" -- even the first commercial implementation of bitmap index did not shy away from range conditions. -- user:oaf2 21:02, 26 August 2008 (UTC)

Usage[edit]

See Prof. Lemire's comment on the usability of bitmap indices. --Ragib (talk) 15:01, 20 August 2008 (UTC)

I just read the article by Vivek Sharma that Daniel Lemire linked to in his post. I think that anyone who is editing this page would do well to read it. Dtunkelang (talk) 15:19, 20 August 2008 (UTC)

Dubious[edit]

For a table with n columns, the total number of distinct indexes to satisfy all possible queries is n!

Not that it matters much, but shouldn't this be 2n-1? Set of all possible indexes is the set of all column subsets, i.e. a power set (minus one, because you can't have a zero-column index). GregorB (talk) 15:16, 15 April 2009 (UTC)

You also have to consider the order of the columns. According to the PostgreSQL documentation about multicolumn indexes:
The query planner can use a multicolumn index for queries that involve the leftmost column in the index definition plus any number of columns listed to the right of it, without a gap. For example, an index on (a, b, c) can be used in queries involving all of a, b, and c, or in queries involving both a and b, or in queries involving only a, but not in other combinations.
This means that, to be able to answer every possible query using indexes, you need one for each possible column permutation, that is, n! --Pezezin (talk) 21:30, 5 May 2009 (UTC)


Order of columns is relevant not only for PostgreSQL. You cannot easily use an index that accesses columns A,B,C (in that order) to execute a query that uses B and C: http://richardfoote.wordpress.com/2008/03/10/index-skip-scan-does-index-column-order-matter-any-more-warning-sign/ --194.139.55.62 (talk) 05:52, 12 May 2009 (UTC)

You're correct, e.g. SQL Server and Oracle will use the index for any leading set of columns. However, my line of thinking was this: a WHERE clause can constrain an n-column query result in at most 2n-1 ways. For each of these WHERE clauses you need exactly one index, covering all the columns used by the clause. So, even if databases could not use leading sets of columns (i.e. even if your WHERE clause columns had to match precisely those in the index), 2n-1 indexes would still be enough. GregorB (talk) 15:06, 10 June 2009 (UTC)

Correct number is C(n/2, n). See here and here for the strict proof (in Russian) Abolen (talk) 18:17, 2 July 2009 (UTC)

Thanks! I've expanded the formula in the text - being a little rusty, it wasn't really obvious to me what C(n/2, n) meant, a condition likely shared by many readers. GregorB (talk) 18:51, 2 July 2009 (UTC)

Reasons for removing the formula[edit]

My point for removing the formula is that it's really irrelevant. What needs to be explained, is that the number of indexes grows very quickly with the number of columns, and this can be explained better in words than with the formula given in the article.

Nobody is going to whip out a calculator and wonder "How many indexes will I need to create in order to satisfy all possible queries to this table?". In practice, the number of different possible queries that make sense is always lower than the theoretical number; for instance, rarely will anyone specify *both* a person's name and email address in a query, because either of these already identifies a person almost uniquely. Even if the user specifies both, an index on either the name or email column will satisfy the query quickly enough because the number of results is so small anyway.

Further, you often have date columns that are exclusively used for range queries -- so they always need to be last in any useful index. Not to mention full text indexes. So the formlua is pretty much useless anyway. -- intgr [talk] 13:47, 13 October 2009 (UTC)

While none will create all possible indexes for 20 columns indeed, this formula is still of interest to many users, since they don't know how big the number is after it had "grown very fast". And yes, many will whip out a calculator and count, just to make sure they don't really want to create this many indexes. This is not a guide to the table indexing, this is an encyclopedia article and it is perfectly valid to cover some theoretical concepts here, even if they are not of immediate practical use. Abolen (talk) 16:56, 13 October 2009 (UTC)


Copyedit[edit]

WikiProject Guild of Copy Editors
WikiProject icon A version of this article was copy edited by lfstevens, a member of the Guild of Copy Editors, on March, 2011. The Guild welcomes all editors with a good grasp of English and Wikipedia's policies and guidelines to help in the drive to improve articles. Visit our project page if you're interested in joining! If you have questions, please direct them to our talk page.
 

Enjoy.

Adding information/examples of how the bitmap is utilized, not just its structure.[edit]

So while the article describes the structure of the bitmap index with an example, as well as many enhancements/variations of it in other sections, it doesn't describe how they are utilized. The analogy would be performing a binary search with a tree structure, really exemplifies what the benefits are of the structure. In other words, there is no example that explains how a database engine might utilize a bitmap index to perform a join.

I tried following this linked reference #2, but the link is broken: http://www.dwoptimize.com/2007/06/101010-answer-to-life-universe-and.html

207.156.50.129 (talk) 23:09, 2 March 2012 (UTC)