Understanding MongoDB performance - Part 2 Indexes

0 25
Avatar for Skimo
Written by
2 years ago

In part 1, we learned how hardware affects MongoDB performance. We also saw what indexes are and how are they stored on disk.

In this part, we will dive deep into types of indexes and their usage.

First in the list are:

Single field Indexes

These are simplest indexes in MongoDB.

Syntax: db.<collection>.createIndex({<field>: <direction>})

<collection> will be replaced with actual collection name in above query.

Similarly, <field> will be replaced by actual field name

Lets say we have a people collection in some database which contains following fields


first_name
last_name
ssn
job
address
email
...

Run the following query and examine the results:

db.people.find({ssn: "720-38-5636"}).explain("executionStats")

Results:


{
    "queryPlanner" : {
        ...
        "winningPlan" : {
            "stage" : "COLLSCAN",
            "filter" : {
                "ssn" : {
                    "$eq" : "720-38-5636"
                }
            },
            "direction" : "forward"
        },
        "rejectedPlans" : []
    },
    "executionStats" : {
        "executionSuccess" : true,
        "nReturned" : 1,
        "executionTimeMillis" : 698,
        "totalKeysExamined" : 0,
        "totalDocsExamined" : 50474,
        ...
    },
    ...
}

See the value of winningPlan, the stage is COLLSCAN, this means the data is fetched via a column scan which means that MongoDB had to go over each row and check for the specified column.

Also see, executionTimeMillis is the time the query took and totalDocsExamined is the number of documents the MongoDB server had to examine/check to find the result.

Now, lets see how adding an index affects our query.

Let's create an index on ssn field by running:

db.people.createIndex({ssn: 1})

Now, again run the previous query and examine the output.


{
    "queryPlanner" : {
        ...
        "winningPlan" : {
            "stage" : "FETCH",
            "inputStage" : {
                "stage" : "IXSCAN",
                ...
                "direction" : "forward",
                "indexBounds" : {
                    "ssn" : [ 
                        "[\"720-38-5636\", \"720-38-5636\"]"
                    ]
                }
            }
        },
        "rejectedPlans" : []
    },
    "executionStats" : {
        "executionSuccess" : true,
        "nReturned" : 1,
        "executionTimeMillis" : 5,
        "totalKeysExamined" : 1,
        "totalDocsExamined" : 1,
        ...
        }
    },
    ...
}

Again, See the value of winningPlan, the stage is IXSCAN which means that data is fetched by scanning index.

Also see the value of executionTimeMillis, it says 5 milliseconds, which is way much smaller than 698.

totalDocsExamined is 1 which means MongoDB had to go through only 1 document to fetch the results.

See, how performant are indexes. We just saved a lot of time for getting the same result.

MongoDB  also allows to create index and query on a sub-document and fields of a sub-document. But we should never create indexes on entire sub-document because then we have to query on the entire sub-document. Instead we should index a field of a sub-document.

Indexes also work on ranges, $in queries and if we query on two fields where only one field is indexed.

  • Run: db.people.find({ssn: {$in:["001-29-9184", "177-45-0950", "265-67-9973"]}, last_name: {$gte: "H"}}).explain("executionStats") and examine the results

Sorting

  • In memory - Data is copied from disk to RAM and sort is applied, hence slow and expensive. 32 MB is limit and when its reached, sorting is aborted.

  • Using indexes - Since fields are already ordered by the field specified during index creation, there is no need to sort. Hence its much faster and efficient. E.g. if we have an ascending index on ssn, the records will be automatically sorted.

    Try the following:

    db.people.find({}).sort({ssn: 1}).explain()

    db.people.find({}).sort({first_name: 1}).explain()

    And examine the results

When sorting by an indexed field, asc and dsc sort will have the same effect.

We just learned about Simple indexes. Now lets learn about more powerful index which is

Compound Index

An index on two or more fields is called compound index. For example, index created on both last_name and first_name will be called compound index.

db.people.createIndex({last_name: 1, first_name: 1})

Compound indexes are also stored in one dimension.

Analogy for above example can be seen in phone books

Now before we move forward, here is a task for you.

  1. Create a comments collection.

  2. Insert some data(can be copied from here)

  3. Run a find query db.comments.find({name: "alias odio sit", email: "Lew@alysha.tv"}).explain("executionStats")

  4. Examine the results

  5. create a simple index on name, run the same query and check results

  6. create a compound index on name and email, run the same query and check results.

Okay, so, this is too much to digest in one go. Take your time to understand and run the queries.

Sponsors of Skimo
empty
empty
empty

There is still so much to learn. We'll continue with other types of indexes in upcoming part.

Stay tuned and happy learning... : ).

1
$ 0.00
Sponsors of Skimo
empty
empty
empty
Avatar for Skimo
Written by
2 years ago

Comments