The data processing performance is one of the most significant factors that affect the fluent user experience of data driven applications. Badly designed queries for database CRUD operations could easily become a bottleneck and significantly reduce the responsiveness of any application. That is why developers should invest some time for query planning and database schema optimization. One of the most powerful tools in the hands of database developers that helps optimize and speed up the data retrieval process is usage of well-prepared indices. With this post, I will try to introduce the database indices in general and their support in Data Storage for Windows 8

 Database indices are much the same as book index, providing the database with quick jump points on where to find the required data for query execution. A database table can have one or more indices associated with it. The query planner of SQLite deduces which one of them to use during the query processing in order to minimize the execution time. It is up to the developer to create adequate indices for the needs of prepared statements. Of course, the SQLite engine implicitly prepares some optimizations (like automatic table indices on integer primary keys), but they are not enough for complex queries and there is field for further optimization left. 

 Generally, the improvement is not noticeable with small tables but can be significant for large tables. While most people are aware of the usefulness of indices, not everyone is aware of the performance pitfalls associated with creating large amount of indices in a database. Indices can slow down the performance of insert, update and delete queries because the database engine has to maintain indices as well as the related tables. The question that emerges is when to create the indices. There is a general rule that anything that is used to limit the number of required query results should be indexed.

In order to clarify this let us look at some basic examples demonstrating index support in Data Storage for Windows 8. Below are two definitions of domain classes “Person” and “IndexedPerson”. Both contain the same five fields, but the latter additionally has two index definitions – “NameIdx” and “ComplexIdx”. 

class Person
        public long ID { get; set; }
        public string Name { get; set; }
        public int Age { get; set; }
        public DateTime BornDate { get; set; }
        public int PersonHeight { get; set; }  
   class IndexedPerson
        public long ID { get; set; }
        [DatabaseIndex("NameIdx", SortOrder.Asc, 0, false)]
        [DatabaseIndex("ComplexIdx", SortOrder.Asc, 0, false)]
        public string Name { get; set; }
        [DatabaseIndex("ComplexIdx", SortOrder.Asc, 1, false)]
        public int Age { get; set; }
        public DateTime BornDate { get; set; }
        [DatabaseIndex("ComplexIdx", SortOrder.Asc, 2, false)]
        public int PersonHeight { get; set; }  

Next, I will create and measure the execution time of some queries on these two tables in order to prove the concept of performance improvements because of indices. For this test, I will create a demo application that uses the “Person” and “Indexed Person” as entity classes and allows us to measure the exact time needed for query execution. You can find the sources of this application on the following link: 

First, I will fill in both tables with 30 000 records of same data.

Insertion times slightly differ for indexed and non-indexed tables and this is because of the extra processing commands required for indices initialization and maintenance. The results show about 6% of slower insertion time for indexed table. We should consider this in scenarios with massive data insertions or updates. In any case, we can say that this delay is fully acceptable, as we will see the performance benefits for select queries that are mostly used in real applications.

The first query will be a simple select query on person name:

SELECT * FROM <(Indexed)Person> WHERE Name = “Person Name 4505”;

This query is executed about 6 times faster on indexed table “IndexedPerson” than the same one but on non-indexed table. 

Let us explain what is happening for this simple query on indexed table. The SQLite query planer checks the “where” clause and decides to use the index “NameIdx” because it is created on “Name” field only and this makes it the best candidate for this query. Here is an abstract view of the relation between “NameIdx” index and “IndexedPerson” table:

The index contains a sorted list of the person names along with references to the records that contain these values in corresponding table. The SQLite engine can quickly find the required value in an index by searching in the sorted list and then use the memory reference to jump directly to the record in the original table. Without an index, the engine must sequentially search the entire database table row by row to find the record with value “Peter” for “Name” field. 

The same results are observed for the following query:

SELECT * FROM < (Indexed) Person> WHERE Name = ' Person Name 4505' AND Age == 30 AND PersonHeight == 170;

In this case, the SQLite engine uses “ComplexIdx” index and the execution time is about 16 times shorter. It is interesting to see that for the last two queries the execution time for non-indexed table is the same. That is because in the both cases the table should be fully traversed in order the required records to be found.

Next, we will remove the index “NameIdx” and run the following query:

SELECT * FROM < (Indexed) Person> WHERE Age == 30 AND PersonHeight == 170 AND Name == ' Person Name 4505';

The execution is again about 15 times faster, because the “ComplexIdx” index is used by SQLite engine even if the field order is mixed. This is because all three fields are combined with “AND”.

The most interesting results are observed for the following queries:

SELECT * FROM < (Indexed) Person> WHERE Age == 30 AND PersonHeight ==170;

SELECT * FROM < (Indexed) Person> WHERE Age == 30 AND PersonHeight >= 170;

Тhe execution time in this case is 1.5 times slower for indexed table, thus we can conclude that SQLite engine uses one of  the available indices but it is not well configured and does not improve the performance and even the execution is slightly slower. To prove this let us remove the “Name” field from “ComplexIdx” index and run again the queries. This action results in two times faster execution. Now “ComplexIdx” is well configured for these two queries and is used appropriately. The order and position of fields in an index is important and should correspond to the order in “WHERE“ clause.

All queries mentioned above use “AND” as comparison operator. Let us look at some queries with “OR” operator.

SELECT * FROM < (Indexed) Person> WHERE Name = 'Peter' OR Age == 79;

SELECT * FROM < (Indexed) Person> WHERE Age == 79 OR PersonHeight >= 109;

The execution for both the indexed and non-indexed table takes the same amount of time, proving that indices do not help at all in this case.

Let us compare the following two queries:

SELECT * FROM < (Indexed) Person> WHERE Name = 'Peter' AND Age == 79 OR PersonHeight == 109;

SELECT * FROM < (Indexed) Person> WHERE Name = 'Peter' AND (Age == 79 OR PersonHeight == 109);

The execution of the first query takes the same time for the both tables. The second one is executed 10 times faster on the indexed table than on the non-indexed. The conclusion from this test is that if we group the OR statement in parenthesis the indices are applied and greatly improve the performance.

Let us summarize the conclusions we have made until now:

  • If your “WHERE” conditions are based on one field at a time (for example: Name = “Peter”) then create an index on these fields
  • Generally the order of fields in conditions is significant for index performance, that is why the indices should use the fields in the same order as they appear in “WHERE” clause
  • If the logical operator in where clause is “AND”, then build a complex index on all fields used in condition.
  • If the logical operator in where clause is OR, indices do not improve the performance. In this case, they are useless, so avoid them.
  • If the where clause contains both “AND” and “OR” operators, then consider grouping the “OR” conditions.
Index support in Data Storage gives to the developers a powerful tool to improve the execution time of queries. It is important to understand the database engine behavior in order to define indices correctly. Developers should measure the worst-case execution time of queries in their application and finely tune up the related indices. For more information about how SQLite query planner uses the indices, you can check the following documentation article: SQLite Query Planning.
You can download the test project from here.

Real results from query execution:

Related Posts


Comments are disabled in preview mode.