Indexing
Indexing is the mechanism that lets JonoonDB perform efficient lookups, aggregations and sorting. Without indexes, JonoonDB must do a full collection scan, i.e., examine every document in the collection. JonoonDB's approach to indexing is very different from that of traditional databases, and this is a large part of what sets it apart.
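To make the difference concrete, here is a toy C++ sketch (not JonoonDB code) that answers `State = CA` in two ways: with a full scan that examines every document, and with a simple hash-map index, built once, that maps each value to the IDs of the documents holding it. The `Doc` struct and all function names are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Toy "document" holding only the field we filter on (hypothetical).
struct Doc {
    std::string state;
};

// Full collection scan: every document is examined, so each query costs
// O(N) no matter how few documents match.
std::vector<uint32_t> ScanByState(const std::vector<Doc>& docs,
                                  const std::string& state) {
    std::vector<uint32_t> ids;
    for (uint32_t id = 0; id < docs.size(); ++id) {
        if (docs[id].state == state) ids.push_back(id);
    }
    return ids;
}

// A minimal index: value -> ids of the documents holding that value.
using StateIndex = std::unordered_map<std::string, std::vector<uint32_t>>;

// Built once by a single pass over the collection.
StateIndex BuildStateIndex(const std::vector<Doc>& docs) {
    StateIndex index;
    for (uint32_t id = 0; id < docs.size(); ++id) {
        index[docs[id].state].push_back(id);
    }
    return index;
}

// A lookup touches only the entry for the requested value.
std::vector<uint32_t> LookupByState(const StateIndex& index,
                                    const std::string& state) {
    auto it = index.find(state);
    return it == index.end() ? std::vector<uint32_t>{} : it->second;
}
```

Both approaches return the same IDs; the scan pays a cost proportional to the collection size on every query, while the index lookup only touches the matching entries.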
In traditional databases, the practical number of indexes you can create on a table is small. The reason is simple: every insert must also update each index, so once you create more than a few indexes, insert performance becomes so slow that the whole database grinds to a halt. In JonoonDB, all indexes are maintained in memory using fast in-memory data structures, which lets JonoonDB keep inserts fast even with dozens of indexes. This of course means that your indexes should fit in memory for good performance, but that is a requirement you will find in all leading databases.
The second big difference is that the JonoonDB query planner can use multiple indexes in a single query plan, whereas many leading databases can use only one index per query plan. For example, a query with State = CA and Age > 10 can use an index on State and an index on Age at the same time.
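One common way to use several indexes in one plan is to evaluate each predicate against its own index and then combine the resulting sets of document IDs. The sketch below assumes each index lookup yields a sorted list of document IDs; `AndDocIds` and `OrDocIds` are hypothetical helpers, and JonoonDB's planner may well combine indexes differently (for example, by AND-ing bitmaps directly).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// AND step: documents matched by BOTH predicates, e.g.
//   State == "CA"  (answered by an index on State)
//   Age > 10       (answered by an index on Age)
// Both inputs must be sorted ascending; the output is sorted too.
std::vector<uint32_t> AndDocIds(const std::vector<uint32_t>& a,
                                const std::vector<uint32_t>& b) {
    std::vector<uint32_t> out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}

// OR step: documents matched by EITHER predicate,
// e.g. State == "CA" OR Age > 10. Inputs must be sorted ascending.
std::vector<uint32_t> OrDocIds(const std::vector<uint32_t>& a,
                               const std::vector<uint32_t>& b) {
    std::vector<uint32_t> out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(out));
    return out;
}
```

Because each index narrows the candidates independently, only the final intersection has to be fetched, instead of scanning all documents that match just one of the predicates.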
The third difference is JonoonDB's extensible indexing design. JonoonDB is designed from the ground up so that new types of indexes can be developed and added, which means more and more index implementations will be added in the future. This gives you a tremendous amount of flexibility: you can even write your own index implementation if you want, so you will never be locked in to what a given database provides. This is one of the important features that enables JonoonDB to be a one-size-fits-all database.
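The sketch below illustrates what a pluggable index design can look like: an abstract index interface, a user-written implementation of it, and a collection that updates every registered index on each insert. Everything here (`Index`, `SortedMapIndex`, `Collection` and the toy `Document` type) is hypothetical and is not JonoonDB's actual extension API; it only shows the shape of the idea.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy document: field name -> integer value (hypothetical).
using Document = std::map<std::string, int64_t>;

// Hypothetical extension point: every index type implements this interface.
class Index {
public:
    explicit Index(std::string field) : field_(std::move(field)) {}
    virtual ~Index() = default;

    const std::string& field() const { return field_; }

    // Called for every inserted document that has the indexed field.
    virtual void Add(uint32_t doc_id, int64_t value) = 0;

    // Ids of documents whose indexed field equals `value`.
    virtual std::vector<uint32_t> Equals(int64_t value) const = 0;

private:
    std::string field_;
};

// A user-written index: sorted map from each value to the ids holding it.
class SortedMapIndex : public Index {
public:
    using Index::Index;  // inherit the field-name constructor

    void Add(uint32_t doc_id, int64_t value) override {
        postings_[value].push_back(doc_id);
    }

    std::vector<uint32_t> Equals(int64_t value) const override {
        auto it = postings_.find(value);
        return it == postings_.end() ? std::vector<uint32_t>{} : it->second;
    }

private:
    std::map<int64_t, std::vector<uint32_t>> postings_;
};

// The collection only sees the Index interface, so new index types
// plug in without changing this class.
class Collection {
public:
    Index& AddIndex(std::unique_ptr<Index> index) {
        indexes_.push_back(std::move(index));
        return *indexes_.back();
    }

    // Each insert updates every registered index, so insert cost grows with
    // the number of indexes; in-memory structures keep each update cheap.
    uint32_t Insert(const Document& doc) {
        uint32_t id = next_id_++;
        for (auto& index : indexes_) {
            auto it = doc.find(index->field());
            if (it != doc.end()) index->Add(id, it->second);
        }
        return id;
    }

private:
    std::vector<std::unique_ptr<Index>> indexes_;
    uint32_t next_id_ = 0;
};
```

A new index type only needs to implement `Add` and `Equals`; the collection's insert path stays the same regardless of how many or which kinds of indexes are registered.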
The following index implementations exist in JonoonDB.
- InvertedCompressedBitmap: In this data structure, each value is mapped to the IDs of the documents in which it appears. For example, if you have a field State, then a value such as CA is mapped to the IDs of all documents with State = CA; hence the word inverted. The document IDs are stored as a compressed bitmap, which can save a huge amount of space when used on the right field. In addition, all values are stored in a sorted tree data structure, which enables efficient range lookups: for example, on an Age field you can efficiently answer a query like Age > 10 and Age < 20. You should almost always use this index type for low-cardinality fields (fewer than roughly 50K distinct values, but always measure for yourself), and it can yield superior performance even on high-cardinality fields. Here is a link to an article that talks about bitmap indexes and offers good advice and benchmarks.
- Vector: This index is a simple vector (array) of values as they appear in the documents. The position in the vector is the document ID, and the element at that position is that document's value for the indexed field. This is a good default if you want to arrange your data as a column store: column-oriented data makes scans and aggregations very fast. Inserts into this data structure are also much faster.
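The two index types above can be sketched in a few dozen lines of C++. This is a simplified illustration, not JonoonDB's implementation: the bitmaps here are uncompressed (a real compressed bitmap, such as a Roaring bitmap, saves far more space), `std::map` stands in for the sorted tree, and all class names are hypothetical. The range query mirrors the Age > 10 and Age < 20 example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Plain (uncompressed) bitmap over document ids: bit i set => doc i matches.
class Bitmap {
public:
    void Set(uint32_t id) {
        if (id / 64 >= words_.size()) words_.resize(id / 64 + 1, 0);
        words_[id / 64] |= uint64_t{1} << (id % 64);
    }
    // OR another bitmap into this one (combines the values in a range).
    void OrWith(const Bitmap& other) {
        if (other.words_.size() > words_.size()) words_.resize(other.words_.size(), 0);
        for (std::size_t i = 0; i < other.words_.size(); ++i) words_[i] |= other.words_[i];
    }
    // Set bits as ascending document ids.
    std::vector<uint32_t> Ids() const {
        std::vector<uint32_t> ids;
        for (std::size_t w = 0; w < words_.size(); ++w)
            for (uint32_t b = 0; b < 64; ++b)
                if ((words_[w] >> b) & 1) ids.push_back(static_cast<uint32_t>(w * 64 + b));
        return ids;
    }
private:
    std::vector<uint64_t> words_;
};

// Inverted index: sorted tree (std::map) from value -> bitmap of doc ids.
class InvertedBitmapIndex {
public:
    void Add(uint32_t doc_id, int64_t value) { postings_[value].Set(doc_id); }
    // Documents with lo < value < hi, e.g. Age > 10 and Age < 20.
    // Only the keys inside the range are visited, thanks to the sort order.
    Bitmap RangeExclusive(int64_t lo, int64_t hi) const {
        Bitmap result;
        for (auto it = postings_.upper_bound(lo); it != postings_.end() && it->first < hi; ++it)
            result.OrWith(it->second);
        return result;
    }
private:
    std::map<int64_t, Bitmap> postings_;
};

// Vector index: position = document id, element = that document's value.
// Assumes dense ids; missing positions are padded with 0.
class VectorIndex {
public:
    void Add(uint32_t doc_id, int64_t value) {
        if (doc_id >= values_.size()) values_.resize(doc_id + 1, 0);
        values_[doc_id] = value;
    }
    // Aggregation is one sequential pass over contiguous memory.
    int64_t Sum() const {
        int64_t total = 0;
        for (int64_t v : values_) total += v;
        return total;
    }
    int64_t ValueOf(uint32_t doc_id) const { return values_.at(doc_id); }
private:
    std::vector<int64_t> values_;
};
```

The inverted index answers the range query by walking only the keys between the bounds and OR-ing their bitmaps, while the vector index answers an aggregation with a single sequential pass, which is why it suits column-store-style scans.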
The following code snippet builds on the tutorial example and shows how to create indexes when creating a collection.
    vector<IndexInfo> indexes;

    // Vector (column-store) index on the "name" field.
    indexes.push_back(IndexInfo("idx_name",          // Name
                                IndexType::VECTOR,   // Type
                                "name",              // Indexed field
                                true));              // IsAscending

    // Inverted compressed bitmap index on the "age" field.
    indexes.push_back(IndexInfo("idx_age",                              // Name
                                IndexType::INVERTED_COMPRESSED_BITMAP,  // Type
                                "age",                                  // Indexed field
                                true));                                 // IsAscending

    // `db` and `schema` come from the tutorial example.
    db.CreateCollection("character",               // Collection name
                        SchemaType::FLAT_BUFFERS,  // Collection schema type
                        schema,                    // Collection schema
                        indexes);                  // Indexes to create