Small samples using MySQL and MongoDB as subjects to compare a few chores that are similar in day to day database handling.
The most basic tools to secure, interact and do other basic tasks on each database engine.
There are several ways to install mysql.
This is how to install it on fedora / red hat and compatible systems:
sudo npm install community-mysql community-mysql-server
Mongodb provides several installation options.
Just choose the proper download option that better suits your operating system.
When installed on bare metal, it's just a matter of enable the service and start it:
For mysql:
sudo systemctl start mysqld
sudo systemctl start mongod
Once mysql is installed, it's important to run mysql_secure_installation
. It
takes care of allow/deny root connections, remove anonymous users and test
database.
Mongodb has a dedicated security checklist with several chores to perform after install it.
Most critical are:
Example creating a root user via mongosh:
db.createUser(
{
user: "root",
pwd: passwordPrompt(), // or clear text password
roles: [
{role: "userAdminAnyDatabase", db: "admin"},
{role: "readWriteAnyDatabase", db: "admin"}
]
}
)
Example mongo configuration (/etc/mongod.conf
) enforcing auth:
# mongod.conf
systemLog:
destination: file
logAppend: true
path: /var/log/mongodb/mongod.log
storage:
dbPath: /var/lib/mongo
journal:
enabled: true
processManagement:
timeZoneInfo: /usr/share/zoneinfo
net:
port: 27017
bindIpAll: true
security:
authorization: enabled
Note: mongosh still connects normally without any credentials, but it's denied to perform anything but authenticate:
use
admin
db.auth("root", passwordPrompt()) // or clear text password
The following tools are fine options to work with mysql databases:
There is no official cloud-ready service offering mysql databases specifically, but you can setup an oracle always free vm and spin up mysql on it.
These are the tools to work with mongodb more easily:
Mongodb Atlas is a nice online service which you can use for free to keep your data online.
With mysql, there is a command line tool to log in into the databases:
mysql -h localhost -u root -ppassword
With mongodb, there is mongosh, a command line tool to interact with the collections:
mongosh "mongodb://localhost" \
--username root \
--password password \
--authenticationDatabase=admin
It's important to indicate the authenticationDatabase parameter to proper authenticate. It's a mongo thing.
With mysql, it's a single command. You need to create a database before anything else in order to save your data:
create
database movies;
With mongodb you don't need to create a database upfront.
Once database is created, tell mysql you want to use it.
Pretty straightforward:
use
movies;
You can tell mongodb (when using mongosh or any other means of connection) that you want to use a database.
It does not, however, create it. More details ahead.
use
movies;
Once you created you database and instructed mysql to use it, you can create a table.
A table is a blueprint of your data. It defines how it looks like.
-- sample create table from https://github.com/bbrumm/databasestar
CREATE TABLE movie
(
movie_id INT NOT NULL AUTO_INCREMENT,
title VARCHAR(1000) DEFAULT NULL,
budget INT DEFAULT NULL,
homepage VARCHAR(1000) DEFAULT NULL,
overview VARCHAR(1000) DEFAULT NULL,
popularity DECIMAL(12, 6) DEFAULT NULL,
release_date DATE DEFAULT NULL,
revenue BIGINT(20) DEFAULT NULL,
runtime INT DEFAULT NULL,
movie_status VARCHAR(50) DEFAULT NULL,
tagline VARCHAR(1000) DEFAULT NULL,
vote_average DECIMAL(4, 2) DEFAULT NULL,
vote_count INT DEFAULT NULL,
CONSTRAINT pk_movie PRIMARY KEY (movie_id)
);
There is no such thing for mongodb.
Mongodb documents are free-form, therefore there is no direct equivalent on it.
Some projects tries to define some schema-ish types at application level, like Spring Data and Mongoose, but in it's heart mongodb doesn't care about blueprints.
During table definition you can refer to previously created tables and that way define a relation between them. It can be done creating foreign keys:
CREATE TABLE movie_company
(
movie_id INT DEFAULT NULL,
company_id INT DEFAULT NULL,
CONSTRAINT fk_mc_comp FOREIGN KEY (company_id) REFERENCES production_company (company_id),
CONSTRAINT fk_mc_movie FOREIGN KEY (movie_id) REFERENCES movie (movie_id)
);
There is no equivalent on mongodb.
It is possible, however, to perform 'joins' between collections which results in similar output when building queries.
When data volume is too high, both mysql and mongodb offer alternative ways to handle data ingestion.
Data restoration in mysql can be done by redirecting a sql script from prompt. It will restore all tables and data at once.
mysql -u root -ppassword -h localhost movies < movies.sql
Data restoration in mongodb must be done one collection at time. There is a 16MB file size limit.
mongoimport --uri mongodb://localhost/movies \
--authenticationDatabase=admin \
--jsonArray --username root --password password \
--collection movie --file movie.json
For data exportation there is mysqldump and mongoexport with similar behavior of their import counterparts.
On both tools data insertion is pretty straigtforward:
MySQL:
INSERT INTO movie_crew
(movie_id, person_id, department_id, job)
VALUES (285, 2445, 3, 'Producer');
MongoDB:
db.movie_crew.insertOne({
movie_id: 285,
person_id: 2445,
department_id: 3,
job: 'Producer'
})
Deletion is also very clean:
MySQL:
delete
from movie_crew
where movie_id = 285
and person_id = 2445;
MongoDB:
db.movie_crew.deleteOne({
person_id: 2445,
movie_id: 285
})
Data update also goes with no bigger hassle:
MySQL:
update movie_crew
set department_id = 9
where movie_id = 285
and person_id = 2445;
MongoDB:
db.movie_crew.updateOne({
person_id: 2445,
movie_id: 285
}, {
$set: {
department_id: 9
}
})
One quick note on mongo: its important to use the $set
operator otherwise the
update operation will replace the selected document entirely.
More on mongo operators here.
Find out information from your data.
MySQL:
select *
from movie;
MongoDB:
db.movie.find({})
MySQL:
select *
from movie
where title like '%History%';
MongoDB:
db.movie.find({title: /History/})
MySQL:
select *
from movie
where movie_id in (select movie_id
from movie_cast
where actor_id = 85);
MongoDB:
// build a pipeline
db.movie.aggregate([
{
$lookup: {
from: "movie_cast",
localField: "movie_id",
foreignField: "movie_id",
as: "join_data"
}
},
{
$match: {
"join_data.person_id": 85
}
}
])
Note: this is where data denormalization could benefit mongo. instead of query across different collections, just duplicate data over the db.
This is also why some criticize so fiercely NoSQL databases.
A query under a denormalized document would be like this:
db.movie.find({"movie_cast.person_id": 85})
Where movie_cast
isn't a standalone collection anymore but a embedded one
inside the movie
collection.
Joining data helps to produce more relevant information.
Example, lets see the crew member names:
MySQL:
select * from person natural join movie_crew;
Note:
Natural joins differ from regular joins because they try to join tables by matching every column with same name on both tables.
MongoDB:
db.person.aggregate([
{
$lookup: {
from: "movie_crew",
localField: "person_id",
foreignField: "person_id",
as: "crew"
}
},
{
$match: {
"crew": {
$exists: true,
$not: {
$size: 0
}
}
}
}
])
Note:
Again you must take care on how you model your schemaless data, since every time you need to perform aggregations and lookups there will be a huge performance cost.
Build more complex information, data manipulation and some statistic queries on your data.
Aggregation functions helps to extract information from data.
Example, let's figure out how many actors are in the database:
MySQL:
select count(distinct person_id)
from movie_cast;
MongoDB:
db.movie_cast.aggregate([
{$group: {_id: '$person_id'}},
{$count: 'count_distinct_person_id'}
])
Now let's figure out the movies with the biggest cast:
MySQL:
select title, count(person_id)
from movie_cast
join movie on movie_cast.movie_id = movie.movie_id
group by title
order by count(person_id) desc
MongoDB:
db.movie_cast.aggregate([
{$group: {_id: '$movie_id', count: {$sum: 1}}},
{$sort: {"count": -1}}
])
This is the closest possible query. Denormalize for better results.
Conceptually speaking, Views are a specific way to select data from the databases that are so common that there is this formalized way to do that without all the heavy-lifting of a complex join query.
In mysql a view can be declared like this:
create
or replace view
vw_movies_count_cast
as
select m.*,
mc.*
from movie m
join
movie_cast mc on m.movie_id = mc.movie_id
There are views in mongodb, and you can declare them like this:
db.createView(
"movie_count_cast",
"movie",
[
{
$lookup: {
from: "movie_cast",
localField: "movie_id",
foreignField: "movie_id",
as: "cast"
}
}
]
)
This is just a fast ride on Relational vs non-relational databases and there is much left out of this guide.
For instance, mongo will beat mysql on every single direct query not involving
$lookups
into other collections. When it needs to do that, mysql beats mongo
every time.
The key lesson here remains: mind your data modelling even if it's schemaless.
Happy Hacking.