What are composite indexes?
Database engines (e.g. MySQL, PostgreSQL) can create composite indexes on table data, that is, indexes on multiple columns at once rather than on a single column. For instance, MySQL allows indexes of up to 16 columns. A multiple-column index can be thought of as a sorted array whose entries are created by concatenating the values of the indexed columns. A composite index definition looks like this:
    index( column_A, column_B, column_C )
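The sorted-array picture can be sketched in a few lines of Python. This is only a toy model, not how any particular engine stores its indexes: the sample rows, the `row_id` values, and the `lookup` helper are all made up for illustration, and Python tuples stand in for the concatenated column values because they compare column by column, left to right.

```python
import bisect

# Hypothetical rows: (column_A, column_B, column_C, row_id).
rows = [
    ("fr", "paris", 3, 101),
    ("de", "berlin", 1, 102),
    ("fr", "lyon", 7, 103),
    ("de", "bonn", 2, 104),
]

# Model the composite index as entries sorted by the (A, B, C) key tuple.
entries = sorted(((a, b, c), row_id) for a, b, c, row_id in rows)
keys = [key for key, _ in entries]        # sorted key tuples
row_ids = [rid for _, rid in entries]     # row ids in the same order

def lookup(a, b, c):
    """Return the row ids whose indexed columns equal (a, b, c)."""
    key = (a, b, c)
    # Binary search for the slice of entries that share this exact key.
    start = bisect.bisect_left(keys, key)
    end = bisect.bisect_right(keys, key)
    return row_ids[start:end]

print(lookup("fr", "lyon", 7))
print(lookup("fr", "nice", 1))
```

Because the entries are sorted by the whole key, a lookup on all three columns is a binary search rather than a pass over every row.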
When should you use a composite index in a database?
The simple answer is to use composite indexes when your queries benefit from them. Digging a bit deeper, you should use multiple-column indexes for queries that join, filter, and sometimes select by testing all the columns in the index at once.
You can also take advantage of composite indexes for queries that test just the first (leftmost) column of the index, the first two columns, the first three columns, and so on. So the above index will also accelerate queries that test rows for:
    column_A, column_B, column_C
    column_A, column_B
    column_A
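One way to check the leftmost-prefix behaviour is to ask a query planner directly. The sketch below uses SQLite through Python's built-in `sqlite3` module purely as a convenient stand-in; MySQL's `EXPLAIN` output looks different, and its optimizer may make other choices. The table `t`, the extra `payload` column, and the index name `idx_abc` are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 'payload' is not indexed, so the index alone cannot answer SELECT *.
conn.execute(
    "CREATE TABLE t (column_A INTEGER, column_B INTEGER, "
    "column_C INTEGER, payload TEXT)"
)
conn.execute("CREATE INDEX idx_abc ON t (column_A, column_B, column_C)")

def plan(where_clause):
    """Return SQLite's plan description for SELECT * with this WHERE clause."""
    sql = "EXPLAIN QUERY PLAN SELECT * FROM t WHERE " + where_clause
    # The fourth field of each EXPLAIN QUERY PLAN row is the readable detail.
    return " | ".join(row[3] for row in conn.execute(sql))

prefix_plans = {
    where: plan(where)
    for where in (
        "column_A = 1 AND column_B = 2 AND column_C = 3",
        "column_A = 1 AND column_B = 2",
        "column_A = 1",
    )
}
for where, detail in prefix_plans.items():
    print(where, "->", detail)
```

In SQLite, each of the three plans should report a search that uses `idx_abc`, with only the tested prefix columns listed as search terms.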
This assumes that there are no better indexes on those specific columns, such as a single-column index on column_A. On the other hand, this composite index will be of limited use for queries that search for:
    column_A, column_C
Notice how column_B is missing. MySQL can use the index for lookups only on a leftmost prefix of its columns, so here at most the column_A part of the index helps narrow the search, and the condition on column_C has to be checked against each candidate row. Queries that skip column_A entirely (for example, testing only column_B, or column_B and column_C) cannot use the index for lookups at all. In that case, if no other suitable indexes exist, the optimizer falls back to a full table scan.
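The effect of a missing column can be observed with the same kind of planner check. Again, this is a sketch that uses SQLite via Python's `sqlite3` module as a stand-in, so the details will not match MySQL exactly; the table `t`, the `payload` column, and the index name `idx_abc` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (column_A INTEGER, column_B INTEGER, "
    "column_C INTEGER, payload TEXT)"
)
conn.execute("CREATE INDEX idx_abc ON t (column_A, column_B, column_C)")

def plan(where_clause):
    """Return SQLite's plan description for SELECT * with this WHERE clause."""
    sql = "EXPLAIN QUERY PLAN SELECT * FROM t WHERE " + where_clause
    return " | ".join(row[3] for row in conn.execute(sql))

# column_B is skipped: the index can seek on column_A only.
gap_plan = plan("column_A = 1 AND column_C = 3")

# column_A is skipped: there is no usable prefix at all.
no_prefix_plans = [plan("column_B = 2"), plan("column_B = 2 AND column_C = 3")]

print(gap_plan)
for detail in no_prefix_plans:
    print(detail)
```

In this SQLite sketch, the query with the gap should still seek through `idx_abc`, but with column_A as its only search term, while the queries that leave out column_A should be reported as plain scans of the table.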
This means that if you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table without the problems derived from using several single-column indexes.
What happens when other indexes on those columns exist?
When performing a query on column_A, column_B, column_C and a multiple-column index exists on those columns, the rows can be fetched directly. Now, if separate (single-column) indexes exist on column_A, column_B and column_C, the optimizer attempts to use the Index Merge optimization (see “Index Merge Optimization” in the MySQL manual). Failing that, it attempts to find the most restrictive index by deciding which index excludes more rows and using that index to fetch the rows. In the end, it is the optimizer that determines which indexes are used during the query.
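The difference between the two setups can be seen in a planner's output. The sketch below uses SQLite via Python's `sqlite3` module, which is an assumption made for convenience: SQLite has no equivalent of MySQL's Index Merge for conditions joined with AND, so it only illustrates the "pick a single index" case. The table `t`, the `payload` column, and all index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (column_A INTEGER, column_B INTEGER, "
    "column_C INTEGER, payload TEXT)"
)
# Start with one single-column index per column.
conn.execute("CREATE INDEX idx_only_a ON t (column_A)")
conn.execute("CREATE INDEX idx_only_b ON t (column_B)")
conn.execute("CREATE INDEX idx_only_c ON t (column_C)")

QUERY = "SELECT * FROM t WHERE column_A = 1 AND column_B = 2 AND column_C = 3"

def plan():
    """Return SQLite's current plan description for QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY)
    return " | ".join(row[3] for row in rows)

single_only = plan()

# Add the composite index and ask the planner again.
conn.execute("CREATE INDEX idx_abc ON t (column_A, column_B, column_C)")
with_composite = plan()

print("single-column indexes only:", single_only)
print("with composite index:      ", with_composite)
```

With only the single-column indexes, SQLite should choose one of them and test the remaining conditions row by row; once `idx_abc` exists, the plan should switch to it because all three conditions can be used in the index search.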
Know your data
In the end, it is your data (and how your application needs to retrieve it) that defines how indexes should be created to optimize queries without wasting resources (memory, CPU time and disk space). Sometimes composite indexes are best, but sometimes it’s better to create separate indexes and rely on index-combined searches.
For instance, if your data is a set of geographic data, you may want to check what spatial databases have to offer before moving ahead and using a composite index for latitude/longitude data.
Conversely, if your workload includes a mix of queries that sometimes involve only column X, sometimes only column Y, and sometimes both columns, you might choose to create two separate indexes on X and Y, relying on index combination to process the queries that use both columns.
You could also create only the composite index on (X, Y). This index would typically be more efficient than index combination for queries involving both columns but, as stated above, it would be almost useless for queries involving only column Y.
A combination of the multi-column index and a separate index on Y would serve reasonably well in these cases. For queries involving only X, the multi-column index could be used, though it would be larger and hence slower than an index on X alone.
The last alternative is to create all three indexes, but this is probably only reasonable if the table is searched much more often than it is updated and all three types of query are common. Since indexes are updated every time the data changes, extra indexes can sharply increase resource consumption on tables that change very often.
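The combination of a composite index on (X, Y) plus a separate index on Y can be tried out with a planner check. This is a sketch using SQLite via Python's `sqlite3` module, chosen only because it runs anywhere; other engines may plan these queries differently. The table name `points`, the `payload` column, and the index names are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE points (x INTEGER, y INTEGER, payload TEXT)")
conn.execute("CREATE INDEX idx_xy ON points (x, y)")     # composite index
conn.execute("CREATE INDEX idx_only_y ON points (y)")    # separate index on y

def plan(where_clause):
    """Return SQLite's plan description for SELECT * with this WHERE clause."""
    sql = "EXPLAIN QUERY PLAN SELECT * FROM points WHERE " + where_clause
    return " | ".join(row[3] for row in conn.execute(sql))

# The three query shapes from the mixed workload.
plans = {where: plan(where) for where in ("x = 1", "y = 2", "x = 1 AND y = 2")}

for where, detail in plans.items():
    print(where, "->", detail)
```

In SQLite, the queries on X alone and on both columns should use `idx_xy`, while the query on Y alone should use `idx_only_y`, so two indexes cover all three query shapes.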