Sombriks Has A Plan



SQL / NoSQL quick comparison

Small samples using MySQL and MongoDB as subjects to compare a few chores that are similar in day to day database handling.

Basic management

The most basic tools to secure, interact and do other basic tasks on each database engine.

Installation

There are several ways to install mysql.

This is how to install it on fedora / red hat and compatible systems:

sudo npm install community-mysql community-mysql-server

Mongodb provides several installation options.

Just choose the proper download option that better suits your operating system.

Running the service

When installed on bare metal, it's just a matter of enable the service and start it:

For mysql:

sudo systemctl start mysqld

For mongodb:

sudo systemctl start mongod

Secure installation

Once mysql is installed, it's important to run mysql_secure_installation. It takes care of allow/deny root connections, remove anonymous users and test database.

Mongodb has a dedicated security checklist with several chores to perform after install it.

Most critical are:

Example creating a root user via mongosh:

db.createUser(
  {
    user: "root",
    pwd: passwordPrompt(), // or clear text password
    roles: [
      {role: "userAdminAnyDatabase", db: "admin"},
      {role: "readWriteAnyDatabase", db: "admin"}
    ]
  }
)

Example mongo configuration (/etc/mongod.conf) enforcing auth:

# mongod.conf
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log
storage:
  dbPath: /var/lib/mongo
  journal:
    enabled: true
processManagement:
  timeZoneInfo: /usr/share/zoneinfo
net:
  port: 27017
  bindIpAll: true
security:
  authorization: enabled

Note: mongosh still connects normally without any credentials, but it's denied to perform anything but authenticate:

use
admin
db.auth("root", passwordPrompt()) // or clear text password

Tools

The following tools are fine options to work with mysql databases:

There is no official cloud-ready service offering mysql databases specifically, but you can setup an oracle always free vm and spin up mysql on it.

These are the tools to work with mongodb more easily:

Mongodb Atlas is a nice online service which you can use for free to keep your data online.

Login

With mysql, there is a command line tool to log in into the databases:

mysql -h localhost -u root -ppassword 

With mongodb, there is mongosh, a command line tool to interact with the collections:

mongosh "mongodb://localhost" \ 
  --username root \ 
  --password password \
  --authenticationDatabase=admin

It's important to indicate the authenticationDatabase parameter to proper authenticate. It's a mongo thing.

Database creation

With mysql, it's a single command. You need to create a database before anything else in order to save your data:

create
database movies; 

With mongodb you don't need to create a database upfront.

Database selection/use

Once database is created, tell mysql you want to use it.

Pretty straightforward:

use
movies; 

You can tell mongodb (when using mongosh or any other means of connection) that you want to use a database.

It does not, however, create it. More details ahead.

use
movies; 

Creating tables/documents

Once you created you database and instructed mysql to use it, you can create a table.

A table is a blueprint of your data. It defines how it looks like.

-- sample create table from https://github.com/bbrumm/databasestar
CREATE TABLE movie
(
    movie_id     INT NOT NULL AUTO_INCREMENT,
    title        VARCHAR(1000)  DEFAULT NULL,
    budget       INT            DEFAULT NULL,
    homepage     VARCHAR(1000)  DEFAULT NULL,
    overview     VARCHAR(1000)  DEFAULT NULL,
    popularity   DECIMAL(12, 6) DEFAULT NULL,
    release_date DATE           DEFAULT NULL,
    revenue      BIGINT(20) DEFAULT NULL,
    runtime      INT            DEFAULT NULL,
    movie_status VARCHAR(50)    DEFAULT NULL,
    tagline      VARCHAR(1000)  DEFAULT NULL,
    vote_average DECIMAL(4, 2)  DEFAULT NULL,
    vote_count   INT            DEFAULT NULL,
    CONSTRAINT pk_movie PRIMARY KEY (movie_id)
);

There is no such thing for mongodb.

Mongodb documents are free-form, therefore there is no direct equivalent on it.

Some projects tries to define some schema-ish types at application level, like Spring Data and Mongoose, but in it's heart mongodb doesn't care about blueprints.

Relations between entities

During table definition you can refer to previously created tables and that way define a relation between them. It can be done creating foreign keys:

CREATE TABLE movie_company
(
    movie_id   INT DEFAULT NULL,
    company_id INT DEFAULT NULL,
    CONSTRAINT fk_mc_comp FOREIGN KEY (company_id) REFERENCES production_company (company_id),
    CONSTRAINT fk_mc_movie FOREIGN KEY (movie_id) REFERENCES movie (movie_id)
);

There is no equivalent on mongodb.

It is possible, however, to perform 'joins' between collections which results in similar output when building queries.

Data ingestion

Data dump/restore

When data volume is too high, both mysql and mongodb offer alternative ways to handle data ingestion.

Data restoration in mysql can be done by redirecting a sql script from prompt. It will restore all tables and data at once.

mysql -u root -ppassword -h localhost movies < movies.sql

Data restoration in mongodb must be done one collection at time. There is a 16MB file size limit.

mongoimport --uri mongodb://localhost/movies \
  --authenticationDatabase=admin \ 
  --jsonArray --username root --password password \
  --collection movie --file movie.json 

For data exportation there is mysqldump and mongoexport with similar behavior of their import counterparts.

Insert

On both tools data insertion is pretty straigtforward:

MySQL:

INSERT INTO movie_crew
    (movie_id, person_id, department_id, job)
VALUES (285, 2445, 3, 'Producer');

MongoDB:

db.movie_crew.insertOne({
  movie_id: 285,
  person_id: 2445,
  department_id: 3,
  job: 'Producer'
})

Delete

Deletion is also very clean:

MySQL:

delete
from movie_crew
where movie_id = 285
  and person_id = 2445;

MongoDB:

db.movie_crew.deleteOne({
  person_id: 2445,
  movie_id: 285
})

Update

Data update also goes with no bigger hassle:

MySQL:

update movie_crew
set department_id = 9
where movie_id = 285
  and person_id = 2445;

MongoDB:

db.movie_crew.updateOne({
  person_id: 2445,
  movie_id: 285
}, {
  $set: {
    department_id: 9
  }
})

One quick note on mongo: its important to use the $set operator otherwise the update operation will replace the selected document entirely.

More on mongo operators here.

Queries

Find out information from your data.

Select/Find

Full scan

MySQL:

select *
from movie;

MongoDB:

db.movie.find({})

Specific movies given part of title

MySQL:

select *
from movie
where title like '%History%';

MongoDB:

db.movie.find({title: /History/})

Movies given an actor

MySQL:

select *
from movie
where movie_id in (select movie_id
                   from movie_cast
                   where actor_id = 85);

MongoDB:

// build a pipeline
db.movie.aggregate([
  {
    $lookup: {
      from: "movie_cast",
      localField: "movie_id",
      foreignField: "movie_id",
      as: "join_data"
    }
  },
  {
    $match: {
      "join_data.person_id": 85
    }
  }
])

Note: this is where data denormalization could benefit mongo. instead of query across different collections, just duplicate data over the db.

This is also why some criticize so fiercely NoSQL databases.

A query under a denormalized document would be like this:

db.movie.find({"movie_cast.person_id": 85})

Where movie_cast isn't a standalone collection anymore but a embedded one inside the movie collection.

Join tables/collections

Joining data helps to produce more relevant information.

Example, lets see the crew member names:

MySQL:

select * from person natural join movie_crew;

Note:

Natural joins differ from regular joins because they try to join tables by matching every column with same name on both tables.

MongoDB:

db.person.aggregate([
  {
    $lookup: {
      from: "movie_crew",
      localField: "person_id",
      foreignField: "person_id",
      as: "crew"
    }
  },
  {
    $match: {
      "crew": {
        $exists: true,
        $not: {
          $size: 0
        }
      }
    }
  }
])

Note:

Again you must take care on how you model your schemaless data, since every time you need to perform aggregations and lookups there will be a huge performance cost.

Advanced data manipulation

Build more complex information, data manipulation and some statistic queries on your data.

count / min / max

Aggregation functions helps to extract information from data.

Example, let's figure out how many actors are in the database:

MySQL:

select count(distinct person_id)
from movie_cast;

MongoDB:

db.movie_cast.aggregate([
  {$group: {_id: '$person_id'}},
  {$count: 'count_distinct_person_id'}
])

Now let's figure out the movies with the biggest cast:

MySQL:

select title, count(person_id)
from movie_cast
         join movie on movie_cast.movie_id = movie.movie_id
group by title
order by count(person_id) desc

MongoDB:

db.movie_cast.aggregate([
  {$group: {_id: '$movie_id', count: {$sum: 1}}},
  {$sort: {"count": -1}}
])

This is the closest possible query. Denormalize for better results.

Views

Conceptually speaking, Views are a specific way to select data from the databases that are so common that there is this formalized way to do that without all the heavy-lifting of a complex join query.

In mysql a view can be declared like this:

create
or replace view 
    vw_movies_count_cast 
as
select m.*,
       mc.*
from movie m
         join
     movie_cast mc on m.movie_id = mc.movie_id

There are views in mongodb, and you can declare them like this:

db.createView(
  "movie_count_cast",
  "movie",
  [
    {
      $lookup: {
        from: "movie_cast",
        localField: "movie_id",
        foreignField: "movie_id",
        as: "cast"
      }
    }
  ]
)

Conclusion

This is just a fast ride on Relational vs non-relational databases and there is much left out of this guide.

For instance, mongo will beat mysql on every single direct query not involving $lookups into other collections. When it needs to do that, mysql beats mongo every time.

The key lesson here remains: mind your data modelling even if it's schemaless.

Happy Hacking.

References