MongoDB basics and CRUD
These are the notes I took while reading the book MongoDB: The Definitive Guide. If you want to learn MongoDB, there are several free online courses on MongoDB University that go from the very basics to advanced concepts of database design.
- A document is the basic unit of data. It is roughly equivalent to a row in a relational database.
- A collection can be thought of as a table with a dynamic schema. Unlike a relational database table, it does not have a fixed schema.
- A database is composed of different collections, and best of all, a single MongoDB instance can host multiple independent databases, each with its own collections.
- Every document has a unique identifier, “_id”, which is unique within a collection.
- MongoDB is type-sensitive and case-sensitive.
A document is an ordered set of key/value pairs.
- Keys must not contain \0 (the null character). This is used to signify the end of a key.
- The . and $ characters have special meaning and should be used sparingly.
- NO DUPLICATE KEYS are allowed.
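The key rules above can be sketched as a small check in plain JavaScript. `isSafeKey` is a hypothetical helper, not part of MongoDB or its drivers, and it is deliberately conservative: it rejects any `.` and any leading `$`, while the server's real rules are more nuanced and depend on the version.

```javascript
// Hypothetical helper: conservative check for a document key.
function isSafeKey(key) {
  if (typeof key !== "string") return false;
  if (key.includes("\0")) return false;   // the null character marks the end of a key
  if (key.includes(".")) return false;    // "." is used for dotted paths
  if (key.startsWith("$")) return false;  // "$" prefixes operators
  return true;
}

console.log(isSafeKey("name"));  // true
console.log(isSafeKey("a.b"));   // false
console.log(isSafeKey("$set"));  // false
```

Keys are also case-sensitive, so `"name"` and `"Name"` would be two distinct keys in the same document.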
A collection is a group of documents.
It is roughly equivalent to a table in a relational database.
Group related types of documents together, even though MongoDB doesn’t enforce it:
- It is much faster to get a list of collections than to extract a list of the types in a collection. For example, if we had a “type” field in each document that specified whether the document was a “skim,” “whole,” or “chunky monkey,” it would be much slower to find those three values in a single collection than to have three separate collections and query the correct collection.
- Grouping documents of the same kind together in the same collection allows for data locality. Getting several blog posts from a collection containing only posts will likely require fewer disk seeks than getting the same posts from a collection containing posts and author data
- By putting only documents of a single type into the same collection, we can index our collections more efficiently.
- The `system.` prefix is reserved for internal collections, and `$` is a reserved character. Don’t use either when naming collections.
- Use of sub-collections like `blog.posts` is highly encouraged. It can help give order to your data and makes organizing it easier.
A database has its own permissions, and each database is stored in separate files on disk. A good rule of thumb is to store all data for a single application in the same database. Separate databases are useful when storing data for several applications or users on the same MongoDB server.
- Database names are case-sensitive, even on non-case-sensitive file systems. To keep things simple, try to just use lowercase characters.
- Database names are limited to a maximum of 64 bytes.
- One thing to remember about database names is that they will actually end up as files on your file system.
- Reserved database name: `admin`, the root database. A user added to `admin` can access all databases.
- The `.insert()` operation can be used to insert a document into a collection.
- `find()` and `findOne()` can be used to read several documents and one document, respectively.
JSON is easy to understand, parse and remember
JSON’s expressive capabilities are limited to six data types:
- null
- boolean
- numeric
- string
- array
- object
MongoDB adds more data types. To name a few:
- date
- regular expression
- embedded document
- object id
- binary data
- code
- Use `new Date()` to create a date object. Calling `Date()` without `new` returns a string. See the ECMAScript specification for how the Date class works.
- Dates are stored in the database as milliseconds since the epoch and have no time zone associated with them.
- Time zone information can be stored as the value for another key.
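The difference between `new Date()` and `Date()` is easy to check in plain JavaScript. This is a small sketch that runs in Node or the shell; the `event` document, its field names, and the timestamp are made up for illustration.

```javascript
// Calling Date() without `new` returns a string, not a Date object.
const asString = Date();
const asDate = new Date();

console.log(typeof asString);          // "string"
console.log(asDate instanceof Date);   // true

// A Date is a count of milliseconds since the Unix epoch (UTC).
const epochStart = new Date(0);
console.log(epochStart.toISOString()); // "1970-01-01T00:00:00.000Z"

// The date carries no time zone, so keep it under a separate key if needed.
// Field names here are illustrative only.
const event = { startsAt: new Date(1700000000000), timeZone: "Europe/Berlin" };
console.log(event.startsAt.getTime()); // 1700000000000
```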
- `[]` in JSON represents an array.
- Arrays can hold different data types as values, for example a string, a number, a boolean and null.
One of the great things about arrays in documents is that MongoDB “understands” their structure and knows how to reach inside of arrays to perform operations on their contents. This allows us to query on arrays and build indexes using their contents. For instance, given documents with a “things” array, MongoDB can query for all documents where 3.14 is an element of that array. If this is a common query, you can even create an index on the “things” key to improve the query’s speed.
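As a rough illustration, here is an in-memory analogue of matching on an array element. The documents and the `example` collection name in the comments are hypothetical, and `Array.prototype.includes` only mimics what the query matches; it says nothing about how MongoDB evaluates it.

```javascript
// Hypothetical document: an array may mix types freely.
const doc = { things: ["pie", 3.14, true, null] };
const docs = [doc, { things: ["cake", 2.71] }];

// In-memory analogue of matching on an array element.
// In the shell, the equivalent query would be: db.example.find({ things: 3.14 })
const matches = docs.filter((d) => d.things.includes(3.14));
console.log(matches.length); // 1

// An index on the array key would be created in the shell with:
// db.example.createIndex({ things: 1 })
```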
- A document can be used as the value for a key. This is called an embedded document.
- Embedded documents can be used to organize data in a more natural way.
- Embedded documents are not JS objects: they support all the data types that a document supports, as discussed above.
- As with arrays, MongoDB “understands” the structure of embedded documents and is able to reach inside them to build indexes, perform queries, or make updates.
In a relational database, a person and their address would probably be modeled as two separate rows in two different tables (one for “people” and one for “addresses”). With MongoDB we can embed the address document directly within the person document. When used properly, embedded documents can provide a more natural representation of information.
- The flip side to this is that there will be more data repetition with MongoDB.
- Suppose “addresses” were a separate table in a relational database and we needed to fix a typo in an address. When we did a join with “people” and “addresses,” we’d get the updated address for everyone who shares it. With MongoDB, we’d need to fix the typo in each person’s document.
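A sketch of a person document with an embedded address, plus a tiny helper that resolves a dotted path the way dot notation reads in queries. The document contents, the `getPath` helper, and the `people` collection name are all assumptions for illustration; `getPath` is not MongoDB's implementation.

```javascript
// Hypothetical person document with an embedded address document.
const person = {
  name: "John Doe",
  address: { street: "123 Park Street", city: "Anytown", state: "NY" },
};

// Resolve a dotted path such as "address.city" against a plain object.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

console.log(getPath(person, "address.city")); // "Anytown"
// Shell equivalent of querying on the embedded field:
// db.people.find({ "address.city": "Anytown" })
```

Because the address lives inside each person document, two people sharing the address each carry their own copy, which is the repetition described above.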
_id and ObjectIds
- Every document stored in MongoDB must have an `_id` key. The `_id` key’s value can be of any type, but it defaults to an `ObjectId`.
- Every document must have a unique `_id` within a collection.
ObjectId is the default type for “_id”. The ObjectId class is designed to be lightweight, while still being easy to generate in a globally unique way across different machines. MongoDB’s distributed nature is the main reason why it uses ObjectIds as opposed to something more traditional, like an auto-incrementing primary key: it is difficult and time-consuming to synchronize auto-incrementing primary keys across multiple servers. Because MongoDB was designed to be a distributed database, it was important to be able to generate unique identifiers in a sharded environment.
ObjectIds use 12 bytes of storage, which gives them a string representation of 24 hexadecimal digits: 2 digits for each byte.
- The first four bytes of an ObjectId are a timestamp in seconds since the epoch. Because the timestamp comes first, ObjectIds will sort in roughly insertion order. This is not a strong guarantee but does have some nice properties, such as making ObjectIds efficient to index.
- The next three bytes of an ObjectId are a unique identifier of the machine on which it was generated. This is usually a hash of the machine’s hostname. By including these bytes, we guarantee that different machines will not generate colliding ObjectIds.
- To provide uniqueness among different processes generating ObjectIds concurrently on a single machine, the next two bytes are taken from the process identifier (PID) of the ObjectId-generating process.
- These first nine bytes of an ObjectId guarantee its uniqueness across machines and processes for a single second. The last three bytes are simply an incrementing counter that is responsible for uniqueness within a second in a single process. This allows for up to 256^3 (16,777,216) unique ObjectIds to be generated per process in a single second.
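The byte layout above can be sketched as a decoder for the 24-digit hex string. `decodeObjectId` is a hypothetical helper, not part of any MongoDB driver, and the sample id is just an example value. It assumes the layout described in these notes; newer MongoDB versions replace the machine and PID fields with a 5-byte random value, so only the timestamp and counter positions carry over there.

```javascript
// Split a 24-digit hex ObjectId string into the four fields described above.
// Hypothetical helper, assuming the 4 + 3 + 2 + 3 byte layout from these notes.
function decodeObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hexadecimal digits");
  }
  return {
    timestampSeconds: parseInt(hex.slice(0, 8), 16), // bytes 0-3
    machine: hex.slice(8, 14),                        // bytes 4-6
    pid: parseInt(hex.slice(14, 18), 16),             // bytes 7-8
    counter: parseInt(hex.slice(18, 24), 16),         // bytes 9-11
  };
}

const parts = decodeObjectId("507f1f77bcf86cd799439011"); // sample value
console.log(new Date(parts.timestampSeconds * 1000).toISOString());
console.log(parts);
console.log(256 ** 3); // 16777216 possible counter values
```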
Auto-generation of _id
If there is no “_id” key present when a document is inserted, one will be automatically added to the inserted document. This can be handled by the MongoDB server but will generally be done by the driver on the client side. The decision to generate them on the client side reflects an overall philosophy of MongoDB: work should be pushed out of the server and to the drivers whenever possible. This philosophy reflects the fact that, even with scalable databases like MongoDB, it is easier to scale out at the application layer than at the database layer. Moving work to the client side reduces the burden on the database and the need for it to scale.
Using the Shell
```
$ help
db.help()
show users
```