Many of you probably use MongoDB to store data. But do you know how well your MongoDB deployment performs?
In this part (part 1), we'll look at the factors that MongoDB performance depends on and the things to consider when using MongoDB.
Hardware considerations
The first thing is hardware (if you want to set up your own machine or buy one). A few things to consider are:
Memory
More memory (RAM) generally means better performance. Memory is used for the following operations:
Aggregations
Index traversal
Write operations
Query engine (to retrieve query results)
Connections (each connection requires roughly 1 MB of memory)
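To get a feel for the connection figure above, here is a rough back-of-the-envelope sketch. The function name connection_memory_mb is made up for illustration, and the 1 MB default is simply the approximate figure quoted in the list, not a measured value.

```python
def connection_memory_mb(open_connections, mb_per_connection=1.0):
    """Estimate the RAM used by client connections alone, in MB.

    mb_per_connection defaults to the rough 1 MB figure from the text;
    treat it as an assumption and adjust it for your own deployment.
    """
    if open_connections < 0:
        raise ValueError("open_connections must not be negative")
    return open_connections * mb_per_connection


if __name__ == "__main__":
    # Hypothetical connection counts, chosen only for illustration.
    for count in (100, 1_000, 5_000):
        print(count, "connections need about", connection_memory_mb(count), "MB")
```

This memory comes on top of what aggregations, indexes, and the query engine need, so a large number of idle connections is not free.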
CPU
Every process on the system needs CPU time. MongoDB tries to use all available CPU cores, so more cores mean more concurrent requests can be served and hence better performance. Operations such as page compression, data calculations, aggregation framework operations, and map-reduce are CPU-intensive.
IO (HDDs and SSDs)
Data is stored on disk for persistence. Disks with more IOPS (input/output operations per second) can lead to better performance.
The disk architecture also affects performance. The most commonly used disk architecture is RAID (redundant array of independent disks). RAID level 10 is the most widely used and is the one recommended for MongoDB.
You can read more about RAID here.
Network
Network bandwidth also plays an important role in MongoDB performance: the more bandwidth, the better the performance. Even the network switches, load balancers, and firewalls contribute to performance.
Indexes
The next thing to consider is indexing your data.
Indexes are like the index found in a book for quickly looking up some text. They are used to make our searches and queries fast.
Just as without an index in a book we have to go through every page to find the desired content, without indexes we have to scan every document in the collection to satisfy a query.
Without indexes, the number of documents scanned grows linearly with the number of documents in the collection.
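The linear growth described above can be illustrated with a small simulation. This is only a sketch in plain Python, not MongoDB code: collection_scan is a hypothetical helper that mimics checking every document and counts how many it had to examine.

```python
def collection_scan(docs, field, value):
    """Return (matching documents, number of documents examined).

    Without an index every document has to be checked, even after
    a match has been found, because more matches may follow.
    """
    examined = 0
    matches = []
    for doc in docs:
        examined += 1
        if doc.get(field) == value:
            matches.append(doc)
    return matches, examined


if __name__ == "__main__":
    # Made-up documents; only the collection size matters here.
    for size in (1_000, 10_000, 100_000):
        docs = [{"_id": i, "user": f"user{i}"} for i in range(size)]
        matches, examined = collection_scan(docs, "user", "user0")
        print(f"collection size {size}: examined {examined}, matched {len(matches)}")
```

However many documents match, the number examined always equals the size of the collection, which is exactly the cost an index is meant to avoid.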
Index data is kept in memory where possible. The value of the indexed field is stored as the key, and a reference to the corresponding document is stored as its value. If a document has no value for the indexed field, the key “null” is stored for it.
The “_id” field is automatically indexed in every collection. Indexes can slow down write operations, because the indexes might need to be updated on every write.
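The key-to-reference idea, the “null” key for missing fields, and the extra work on writes can be sketched with a toy model. Note the assumptions: this uses a plain Python dict and list positions as document references, whereas MongoDB's real indexes are B-tree structures managed by the storage engine. The names build_index, find_with_index, and insert are hypothetical.

```python
from collections import defaultdict


def build_index(docs, field):
    """Map each value of `field` to the positions of the documents holding it.

    Documents without the field are filed under the key None,
    mirroring the "null" key described in the text.
    """
    index = defaultdict(list)
    for position, doc in enumerate(docs):
        index[doc.get(field)].append(position)
    return index


def find_with_index(docs, index, value):
    """Fetch matching documents by following the references in the index."""
    return [docs[position] for position in index.get(value, [])]


def insert(docs, indexes, doc):
    """Append a document and update every index: the extra cost of a write."""
    docs.append(doc)
    position = len(docs) - 1
    for field, index in indexes.items():
        index[doc.get(field)].append(position)


if __name__ == "__main__":
    # Made-up sample documents.
    docs = [{"_id": 1, "city": "Pune"}, {"_id": 2}, {"_id": 3, "city": "Pune"}]
    city_index = build_index(docs, "city")
    print(find_with_index(docs, city_index, "Pune"))
    print(city_index[None])  # positions of documents that have no "city" field
    insert(docs, {"city": city_index}, {"_id": 4, "city": "Delhi"})
    print(find_with_index(docs, city_index, "Delhi"))
```

A lookup touches only the documents the index points to, while each insert has to update every index on the collection, which is why more indexes mean slower writes.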
Now, let's see how data is stored on disk.
On disk, data is stored at the path specified as dbPath when running the mongod server.
Run the MongoDB server with the following configuration and, after inserting something into the database, list the files at our dbPath.
storage:
  dbPath: /var/mongodb/db/mongo
systemLog:
  path: /var/mongodb/db/mongo/mongo.log
  destination: file
  logAppend: true
net:
  bindIp: 127.0.0.1, 192.168.103.100
  port: 27000
security:
  authorization: enabled
processManagement:
  fork: true
List of files: note that all data-related and index-related files are here. This is how files are stored by default.
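If you want to inspect the layout yourself, a small helper like the one below can list everything under the dbPath. It is only a sketch: list_db_files is a made-up name, the path is the dbPath from the configuration above, and reading that directory may require suitable permissions on your machine.

```python
import os


def list_db_files(db_path):
    """Return every file and directory under db_path, relative to it, sorted."""
    entries = []
    for root, dirs, files in os.walk(db_path):
        for name in dirs + files:
            full_path = os.path.join(root, name)
            entries.append(os.path.relpath(full_path, db_path))
    return sorted(entries)


if __name__ == "__main__":
    # dbPath taken from the configuration shown above.
    for entry in list_db_files("/var/mongodb/db/mongo"):
        print(entry)
```

Running the same helper after each configuration change in this part makes it easy to compare how the directory structure differs.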
Now, let's see how we can change the default structure.
Shut down the MongoDB server, delete and re-create the dbPath folder, start the server again with the following configuration, insert some data, and list the files.
storage:
  dbPath: /var/mongodb/db/mongo
  # this means that for each database there will be a single/separate directory
  # assigned to it.
  directoryPerDB: true
systemLog:
  path: /var/mongodb/db/mongo/mongo.log
  destination: file
  logAppend: true
net:
  bindIp: 127.0.0.1, 192.168.103.100
  port: 27000
security:
  authorization: enabled
processManagement:
  fork: true
List of files:
As you can see, each database now has its own directory.
Now follow the above steps again and run the MongoDB server with the following configuration.
storage:
  dbPath: /var/mongodb/db/mongo
  directoryPerDB: true
  # this means there will be separate directories for collection and indexes
  wiredTiger:
    engineConfig:
      directoryForIndexes: true
systemLog:
  path: /var/mongodb/db/mongo/mongo.log
  destination: file
  logAppend: true
net:
  bindIp: 127.0.0.1, 192.168.103.100
  port: 27000
security:
  authorization: enabled
processManagement:
  fork: true
List of files:
You can see that there are separate directories for collections and indexes.
Having separate directories for indexes and collections is beneficial when we have multiple disks, where one can be used for indexes and another for collections. Symbolic links are created between the disks so that they can be used as one to access the data. Because reads and writes are then spread across the disks, more concurrent requests can be satisfied.
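The symbolic-link idea can be sketched as follows. This is a hedged illustration, not a deployment recipe: link_index_directory is a hypothetical helper, the directory name "index" is an assumption about how the index directory is named inside each database directory, and the demo uses temporary folders in place of real disks. On a real server you would do this only while mongod is stopped.

```python
import os
import tempfile


def link_index_directory(db_dir, index_disk_dir, link_name="index"):
    """Create db_dir/<link_name> as a symbolic link to index_disk_dir.

    link_name="index" is an assumed directory name; check what your own
    dbPath actually contains before relying on it.
    """
    link_path = os.path.join(db_dir, link_name)
    os.symlink(index_disk_dir, link_path, target_is_directory=True)
    return link_path


if __name__ == "__main__":
    # Temporary folders stand in for a database directory and a second disk.
    with tempfile.TemporaryDirectory() as db_dir, \
            tempfile.TemporaryDirectory() as second_disk:
        link = link_index_directory(db_dir, second_disk)
        print(link, "->", os.path.realpath(link))
```

Anything written through the link ends up on the second disk, while the server still sees a single directory tree under its dbPath.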
So, that concludes part 1. In part 2, we'll learn more about indexes, their types, and how to use them.
Stay tuned.... :)
And Happy Learning...