Tutorial-JonoonDB

Tutorial

This tutorial assumes that you are familiar with the flatbuffers serialization library and know how to use it. The complete code for this tutorial is available on GitHub.

Prerequisite

You need the flatbuffers compiler flatc and the flatbuffers header files. You can build them from source by following the directions at this link.
You need the JonoonDB library.

Limitations related to flatbuffers

There are some limitations on using flatbuffers in JonoonDB. Most of them will be removed in future releases of JonoonDB. The current limitations are:

Currently you cannot query union and vector fields using SQL. They can be part of your schema and JonoonDB will store your data correctly, but you cannot reference any union or vector field in your SQL queries.
Currently the unsigned 64-bit integer type (ulong) is not supported for storage or SQL queries. Internally JonoonDB has a signed 64-bit integer type but no unsigned 64-bit integer type. If this is important to you, let us know and we may add support for it. Until then, the recommendation is to use the long type instead.
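
If your application holds values as unsigned 64-bit integers, they need a range check before they can be written to a long field. The following is a minimal sketch of such a check. TryToSignedLong is a hypothetical helper written for illustration; it is not part of JonoonDB or flatbuffers.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not a JonoonDB or flatbuffers API).
// Converts an unsigned 64-bit value to the signed 64-bit type used by a
// flatbuffers `long` field. Returns false and leaves *out untouched when
// the value is too large to be represented as a signed 64-bit integer.
bool TryToSignedLong(std::uint64_t value, std::int64_t* out) {
  assert(out != nullptr);
  const std::uint64_t max_signed =
      static_cast<std::uint64_t>(std::numeric_limits<std::int64_t>::max());
  if (value > max_signed) {
    return false;  // would overflow a signed 64-bit integer
  }
  *out = static_cast<std::int64_t>(value);
  return true;
}
```

Values for which the function returns false cannot be stored losslessly in a long field, so the application has to decide how to handle them, for example by rejecting the document.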

Open a database

A JonoonDB database has a name that corresponds to a file system directory. All of the contents of the database are stored in this directory. The following example shows how to open a database, creating it if necessary:

#include "jonoondb/jonoondb_api/database.h"

using namespace std;
using namespace flatbuffers;
using namespace jonoondb_tutorial;
using namespace jonoondb_api;

// Open a database with default options
Database db("/path/to/db",     // path where db files will be created
            "game_of_thrones"  // database name              
);

If you want an error to be raised when the database does not exist, instead of creating it, use the constructor overload of the jonoondb_api::Database class that takes an Options object and call SetCreateDBIfMissing(false):

Options opt;
opt.SetCreateDBIfMissing(false);
Database db("/path/to/db",     // path where db files will be created
            "game_of_thrones", // database name 
            opt                // options             
);

Create a collection

In JonoonDB, collections are like tables. A given database can have one or more collections. Each collection has a schema that specifies the collection's fields and their types. Currently JonoonDB supports only the flatbuffers schema type.

Schema

Let's look at the flatbuffers schema that we will use for this tutorial.

namespace jonoondb_tutorial;

table Actor {
  name: string;
  date_of_birth: string;
  birth_city: string;
}

table Character {
  name:string;
  house:string;  
  played_by:Actor;
  age:int;
  first_seen:string;  
}

root_type Character;

Before proceeding any further, we need to generate a few files using the flatbuffers compiler flatc. Save the schema shown above in a text file named character.fbs, then compile it using the following command.

flatc -c -b --schema character.fbs

This will generate a header file character_generated.h and a binary flatbuffers schema file character.bfbs. The character_generated.h file has helper functions for building flatbuffers objects of the Character type. The character.bfbs file contains the same schema shown above, but in binary form. It is required because JonoonDB internally uses the flatbuffers reflection mechanism to read flatbuffers objects, and reflection needs the binary schema.

The code below reads the binary flatbuffers schema from the file character.bfbs and creates a collection.

auto schema = ReadFile("path/to/character.bfbs");
vector<IndexInfo> indexes;
db.CreateCollection("character",                  // collection name   
                    SchemaType::FLAT_BUFFERS,     // collection schema type
                    schema,                       // collection schema
                    indexes                       // indexes to create
);

The indexes parameter specifies the indexes that should be created for this collection. The use of indexes is covered in the indexing section here. For now we will not create any indexes and pass an empty vector. ReadFile() simply reads the entire binary file and returns its contents. You can look at its implementation here.
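
For readers who want to see the idea without following the link, here is a minimal sketch of what such a helper could look like. It is not the tutorial's actual implementation: the std::string return type and the exception thrown on failure are assumptions made for this sketch.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Sketch of a ReadFile() helper; the real implementation may differ.
// Reads the whole file in binary mode and returns its bytes as a string.
// Throws std::runtime_error if the file cannot be opened (an assumption
// of this sketch, not documented JonoonDB behavior).
std::string ReadFile(const std::string& path) {
  // Binary mode keeps the bytes unchanged on every platform.
  std::ifstream file(path, std::ios::binary);
  if (!file) {
    throw std::runtime_error("Failed to open file: " + path);
  }
  // Copy every byte from the stream buffer, including embedded '\0' bytes.
  return std::string(std::istreambuf_iterator<char>(file),
                     std::istreambuf_iterator<char>());
}
```

Opening the file in binary mode matters here because a .bfbs file is not text and may contain zero bytes and newline-like bytes that a text-mode read could alter.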

Insert single document

First we construct a flatbuffers object using the CreateActor() and CreateCharacter() functions that flatc generated inside character_generated.h. Next we construct an object of type Buffer and pass it to the db.Insert() function.

#include "character_generated.h"

FlatBufferBuilder fbb;
auto actor = CreateActor(fbb, fbb.CreateString("Peter Dinklage"), // name
                         fbb.CreateString("1969-06-11"),          // date_of_birth
                         fbb.CreateString("Morristown"));         // birth_city
auto obj = CreateCharacter(fbb, fbb.CreateString("Tyrion Lannister"),
                           fbb.CreateString("Lannister"),
                           actor,
                           39, fbb.CreateString("Winter is Coming"));
fbb.Finish(obj);
Buffer tyrion(reinterpret_cast<char*>(fbb.GetBufferPointer()), // Buffer pointer
              fbb.GetSize(), // Buffer size
              fbb.GetSize(), // Buffer capacity
              nullptr);      // Deleter func ptr, nullptr means don't delete memory
db.Insert("character",   // collection name in which to insert
          tyrion         // data that is to be inserted
);

One important thing to note here is how the Buffer object "tyrion" was constructed. A Buffer object can be one of two kinds. The first kind owns the underlying memory and deletes it on destruction. The second kind is just a view on top of memory owned by someone else, and it does not delete anything on destruction. Here we use the second kind by passing nullptr as the deleter function pointer.
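
The difference between the two kinds can be sketched with a small stand-in class. SketchBuffer below is hypothetical and written only to illustrate the ownership rule; it is not jonoondb_api::Buffer and its members are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in used for illustration; not jonoondb_api::Buffer.
class SketchBuffer {
 public:
  using Deleter = void (*)(char*);

  // Wraps existing memory. With a null deleter the object is only a view
  // and never frees the memory; otherwise the deleter runs on destruction.
  SketchBuffer(char* data, std::size_t size, std::size_t capacity,
               Deleter deleter)
      : data_(data), size_(size), capacity_(capacity), deleter_(deleter) {
    assert(size <= capacity);
  }

  // Copies `size` bytes into freshly allocated memory owned by this object.
  SketchBuffer(const char* data, std::size_t size)
      : data_(new char[size]), size_(size), capacity_(size),
        deleter_(&FreeArray) {
    std::memcpy(data_, data, size);
  }

  ~SketchBuffer() {
    if (deleter_ != nullptr) deleter_(data_);  // views skip this step
  }

  // Copying is disabled so the memory cannot be freed twice.
  SketchBuffer(const SketchBuffer&) = delete;
  SketchBuffer& operator=(const SketchBuffer&) = delete;

  const char* data() const { return data_; }
  std::size_t size() const { return size_; }
  bool owns_memory() const { return deleter_ != nullptr; }

 private:
  static void FreeArray(char* p) { delete[] p; }

  char* data_;
  std::size_t size_;
  std::size_t capacity_;
  Deleter deleter_;
};
```

A view is cheap because nothing is copied, but it is only valid while the memory it points to stays alive and unchanged. In the Insert() example that means the FlatBufferBuilder must not be cleared or destroyed before the call returns.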

Insert multiple documents / Bulk Insert

The MultiInsert() function is optimized for loading a large number of documents into the database. It is much faster than the Insert() function and should be the preferred way to load data into the database.

std::vector<Buffer> characters; // vector to hold all documents to be inserted
fbb.Clear(); // This is necessary if we want to reuse the FlatBufferBuilder
actor = CreateActor(fbb, fbb.CreateString("Kit Harington"), // name
                    fbb.CreateString("1986-12-26"),         // date_of_birth
                    fbb.CreateString("London"));            // birth_city
obj = CreateCharacter(fbb, fbb.CreateString("Jon Snow"),
                      fbb.CreateString("Stark"),
                      actor,
                      21, fbb.CreateString("Winter is Coming"));
fbb.Finish(obj);
characters.push_back(
    Buffer(reinterpret_cast<char*>(fbb.GetBufferPointer()), // Buffer pointer
           fbb.GetSize()  // Buffer size
    )
);

fbb.Clear();
actor = CreateActor(fbb, fbb.CreateString("Aidan Gillen"), // name
                    fbb.CreateString("1968-04-24"),        // date_of_birth
                    fbb.CreateString("Dublin"));           // birth_city
obj = CreateCharacter(fbb, fbb.CreateString("Petyr Baelish"),
                      fbb.CreateString("Baelish"),
                      actor,
                      51, fbb.CreateString("Lord Snow"));
fbb.Finish(obj);
characters.push_back(
  Buffer(reinterpret_cast<char*>(fbb.GetBufferPointer()), // Buffer pointer
         fbb.GetSize()  // Buffer size
  )
);

db.MultiInsert("character",   // collection name in which to insert
               characters     // data that is to be inserted
);

Note that here we construct each Buffer using a different constructor overload. This kind of Buffer copies the data into its own memory, which it deletes on destruction. The copy is necessary because we reuse the FlatBufferBuilder object fbb by calling fbb.Clear() before constructing every new character. The memory held by fbb is no longer valid after that call, so we copy it into a Buffer before calling Clear().
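
The copy-before-reuse pattern can be shown without any library code. In the sketch below, ScratchBuilder is a hypothetical stand-in for a reused FlatBufferBuilder and std::string plays the role of an owning Buffer; neither is a JonoonDB or flatbuffers type.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a reusable builder: every Build() call
// overwrites the same storage, so earlier contents are lost.
struct ScratchBuilder {
  std::vector<char> storage;

  void Build(const std::string& payload) {
    storage.assign(payload.begin(), payload.end());
  }
  const char* data() const { return storage.data(); }
  std::size_t size() const { return storage.size(); }
};

// Builds every payload with one reused builder and returns owning copies.
// std::string stands in for an owning buffer here.
std::vector<std::string> BuildOwnedDocuments(
    const std::vector<std::string>& payloads) {
  ScratchBuilder builder;
  std::vector<std::string> documents;
  documents.reserve(payloads.size());
  for (const std::string& payload : payloads) {
    builder.Build(payload);  // overwrites the previous document
    // Copy the bytes now, before the next iteration reuses the builder.
    documents.emplace_back(builder.data(), builder.size());
  }
  return documents;
}
```

If the loop stored only pointers into the builder instead of copies, every stored entry would refer to storage that the next Build() call overwrites or reallocates.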

There are other, more optimized ways to do this. For example, instead of creating a new Buffer object for every document, you can keep a reusable pool of Buffer objects. Alternatively, you can use a separate FlatBufferBuilder object for each character and construct non-owning Buffer views, as we did for the Insert() call by passing nullptr as the deleter function pointer. Which approach works best depends on your application.

Querying data

JonoonDB supports querying documents using SQL. Consider the following example:

auto rs = db.ExecuteSelect("SELECT name, house, age "
                           "FROM character;");
while (rs.Next()) {
  auto name = rs.GetString(rs.GetColumnIndex("name"));
  auto house = rs.GetString(rs.GetColumnIndex("house"));
  auto age = rs.GetInteger(rs.GetColumnIndex("age"));
}

The ExecuteSelect() function can be used to issue SELECT statements. It returns a Resultset object. rs.Next() moves to the next document in the resultset and keeps returning true as long as more documents are available.

Here is another example that uses some query constraints:

rs = db.ExecuteSelect("SELECT name, house, age "
                      "FROM character "
                      "WHERE age > 10 AND house = 'Stark';");
while (rs.Next()) {
  auto name = rs.GetString(rs.GetColumnIndex("name"));
  auto house = rs.GetString(rs.GetColumnIndex("house"));
  auto age = rs.GetInteger(rs.GetColumnIndex("age"));
}
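
When a constraint value comes from a variable rather than a literal, the query string has to be assembled at runtime. The sketch below shows one way to do that under the assumption that string literals follow the standard SQL rule of doubling embedded single quotes. QuoteSqlString and BuildHouseQuery are hypothetical helpers, not JonoonDB functions, and this tutorial does not cover whether JonoonDB offers parameter binding.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a JonoonDB API). Wraps a value in single
// quotes and doubles any embedded single quote, which is the standard
// SQL way to escape a string literal.
std::string QuoteSqlString(const std::string& value) {
  std::string quoted;
  quoted.reserve(value.size() + 2);
  quoted.push_back('\'');
  for (char c : value) {
    if (c == '\'') quoted.push_back('\'');  // ' becomes ''
    quoted.push_back(c);
  }
  quoted.push_back('\'');
  return quoted;
}

// Hypothetical helper that assembles the query used in the example above
// from runtime values.
std::string BuildHouseQuery(const std::string& house, int min_age) {
  return "SELECT name, house, age FROM character WHERE age > " +
         std::to_string(min_age) + " AND house = " + QuoteSqlString(house) +
         ";";
}
```

The resulting string can then be passed to db.ExecuteSelect() in the same way as the literal query shown above.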

Getting the raw documents

The queries written above give you the data as a structured resultset. What if you want to get back the raw document blob that you inserted? Each collection has a hidden virtual column named _document. When used in a query, it evaluates to the raw document blob that was originally inserted. For example:

rs = db.ExecuteSelect("SELECT _document FROM character;");
while (rs.Next()) {
  auto doc = rs.GetBlob(rs.GetColumnIndex("_document"));      
}

This will return the original document that we inserted.