To Sql or to NoSql?
Irrational relational, schemaless folks, tentacular graphs
& other horror stories.
~ # whoami
Fulvio Meden
Developer @
eMaze Networks
Student @ UniTS (information engineering)
First THing
I was told at Database lesson, was a horror story.
A horror story about the ancient times.
Don't know how ancient, but it sounded something like..
.. the age of Lovecraftian Old Gods.
It was something along these lines:
there was a class of database systems
in which the relations among data records
were implemented via pointers aimed directly at memory locations.
This was to tell us that
then came Codd
to bring forth word of his /1\d/ rules,
and there was light,
and the Old Databases were banished from this world,
and so on.
And then the
relational era
But peace was not achieved as among DBMS vendors a war arose, like feudal Lords with the users as their armies.
But that's not important, it has always be the same for languages, IDEs and OSs and it always will be.
But enough for the fairytales and on with something more technical.
Here are some wonderful things I was taught about Relational databases:
Pieces of information present in the database
have one and only one representation
To eliminate redundancy we use the
Normal Forms
to split big aggregates of information in more smaller ones interrelated to each other.
Splitting data across more tables look like a pain when retrieving it, but we were given
SQL
An elegant, declarative language that allows us to
ask the engine for data instead of tell an engine how to fetch it.
To compose interrelated data pieces to a bigger aggregate, SQL has the powerful
join
construct.
Vistas
are to be used to hide the real table structure, to protect it from the gaze of the outside world.
We might need to alter the table structure,
but we don't want the external api to change.
They're
ACID
(Atomic, Consistent, Isolation, Durable).
Managers and whoever pays for the system
LOVES to hear it.
And since when crossing the border between DBMS and object-oriented application mapping tables to objects was a pain the ass
ORM
frameworks were invented.
But then something happened
A manager walked in the DBA's room and said:
"it's slow."
"You know man, performances matters."
"Well, let's just add some resources to the DB machine. RDBMS are vertically scalable, you know. I'll need some cash to buy new hardware."
"I'm giving you ten dollars and deal with it."
So the DBA was forced to invent something else for that.
He called in the Analyst and they came to the following realization:
"We have to violate the one-representation rule and denormalize data. Introduce redundancy to enhance performances.
But this shall be done only when necessary, and to decide when it is, we'll create Access Esteem tables, to identify the most sensitive spots."
At first, the DBA trembled in fear to hear those words.
"and what about the consistency of the data?"
In the darkness of the room, the analyst grinned.
"RDBMS are powerful because they're programmable too. We'll put business rules inside the database, as triggers and stored procedures."
So RDBMS proved to be powerful enough to stand every problem.
It became the standard, and since obtaining the top still wasn't a task for commoners, specialized analysts and consultants and DBA proliferated and spread around the IT world.
At the same time, somewhere around the forgotten lands, another family of databases was born:
object-oriented, graph-structured, simple key-value stores, document collections, or simply XML structures.
They were young, but they were many, and the united under a single banner forming the House of NoSql.
But the world laughed ad them
"You are only paesants pretending to be noblemen."
"We have the technology, we have the power.
We have tooling, literature, mathematical bases and consultants."
"We were embraced by great houses like Oracle and Microsoft.
And even you poor blog where you proclaim yourself as a 'DB' is built on top of MySql, the most poor of us, but still one of us."
"NoSQL are just toys, for kids to play with."
Well, interesting story, isn't it?
But now, in an era where the internet is widespread, connections are fast and there's the cloud too, what are the real requirements of a modern application?
An application needs to scale.
So once again
And the DBA was called to the Management room
"You called, your Grace the CEO?"
"Marketing reports that users claims the application is slow. And the analyst says the database is at fault. I gave you thousand of dollars to scale up the machine when the user based increased again last week, what of it now? Speak your words, DBA."
"Your highness, with that money we deployed the most powerful machine on the market to production. But last night the application went viral, and it's not enough anymore. And we're already at the limit with denormalization and optimization."
"Then I hereby declare that you are to be stripped of all your administrative titles and privileged accesses, and condemned to be burnt on the spot. You're fired."
We want
our applications never to be offline.
High Availability
Live updates
Failover
Hot Standby
Disaster recovery
The meeting went on until deep in the night. Suddenly the Analyst broke the silence of the conference room.
"Again, why a Master-Slave solution isn't enough?"
"Because we have to put the slave in read-only mode to protect consistency. And if the master fails, it has to be promoted manually."
It was the tenth time the DBA had to explain. There was no hint of a real solution, and he really just wanted to go home.
"And again, why do we have to promote it manually? Can't we build some infrastructure that does it for us?"
"Because it could incur in a Split Brain. And the top management won't grant us the budget for an arbiter."
Our applications are
Object-Oriented
and when data is fragmented across a lot of tables the price of ORMs and JOINs becomes heavy.
SELECT {many many columns}
FROM (SELECT * FROM dbo.T_CUS_TSK_TASK WHERE is_deleted=0) T
INNER JOIN dbo.V_CUS_GRP_GROUP G ON (T.group_id = G.group_id)
INNER JOIN dbo.T_BKK_DISCOUNT_TYPE DT ON (DT.discount_type_id=T.discount_type_id)
INNER JOIN dbo.T_BKK_CURRENCY DC ON (T.debit_currency_id=DC.currency_id)
INNER JOIN dbo.T_BKK_CURRENCY PC ON (T.payback_currency_id=PC.currency_id)
INNER JOIN dbo.T_BKK_CURRENCY FC ON (T.final_debit_currency_id=FC.currency_id)
INNER JOIN dbo.T_GLOBAL_COUNTER D1C ON (D1C.company_id=T.company_id AND
D1C.counter_name='PROFORMA_INVOICE_COUNTER')
INNER JOIN dbo.T_GLOBAL_COUNTER D2C ON (D2C.company_id=T.company_id AND
D2C.counter_name='TAX_INVOICE_COUNTER')
INNER JOIN dbo.T_GLOBAL_COUNTER D3C ON (D3C.company_id=T.company_id AND
D3C.counter_name='INVOICE_RECEIPT_COUNTER')
INNER JOIN dbo.T_GLOBAL_COUNTER D4C ON (D4C.company_id=T.company_id AND
D4C.counter_name='DELIVERY_NOTE_COUNTER')
INNER JOIN dbo.T_GLOBAL_COUNTER D5C ON (D5C.company_id=T.company_id AND
D5C.counter_name='BILL_OF_LADING_COUNTER')
INNER JOIN dbo.T_GLOBAL_COUNTER D6C ON (D6C.company_id=T.company_id AND
D6C.counter_name='CREDIT_INVOICE_COUNTER')
LEFT JOIN dbo.V_SYS_BRANCH BR ON (T.branch_id = BR.branch_id)
LEFT JOIN dbo.T_CUS_TSK_TASKS_ARRAY AR ON (T.array_id = AR.array_id)
LEFT JOIN dbo.T_DRIVER D ON (T.driver_id = D.driver_id)
LEFT JOIN dbo.T_VEHICLE V ON (T.vehicle_id = V.vehicle_id)
LEFT JOIN dbo.T_STF_INVITER I ON (T.inviter_id = I.inviter_id)
LEFT JOIN dbo.T_STF_SUBCONTRACTOR SC1 ON (SC1.subcontractor_id = D.subcontractor_id)
LEFT JOIN dbo.T_STF_SUBCONTRACTOR SC2 ON (SC2.subcontractor_id = T.subcontractor_id)
LEFT JOIN dbo.T_CUS_TSK_TASK_STATUS S ON (S.task_status_id=T.task_status_id)
LEFT JOIN dbo.V_STF_SUB_LOCATION SL1 ON (SL1.sub_location_id=T.start_sub_location_id)
LEFT JOIN dbo.V_STF_SUB_LOCATION SL2 ON (SL2.sub_location_id=T.end_sub_location_id)
LEFT JOIN dbo.T_STF_CUSTOMER CU ON (CU.customer_id=T.customer_id)
LEFT JOIN dbo.T_STF_CUSTOMER_SPLITTING_CODE SP ON (SP.splitting_id=T.splitting_id)
LEFT JOIN dbo.V_CUS_TSK_CREDIT_FOR_TASK CR ON CR.task_id=T.task_id
LEFT JOIN dbo.T_BKK_PROFORMA_INVOICE D1 ON (T.proforma_invoice_id=D1.proforma_invoice_id)
LEFT JOIN dbo.T_BKK_TAX_INVOICE D2 ON (T.tax_invoice_id=D2.tax_invoice_id)
LEFT JOIN dbo.T_BKK_INVOICE_RECEIPT D3 ON (T.invoice_receipt_id=D3.invoice_receipt_id)
LEFT JOIN dbo.T_BKK_DELIVERY_NOTE D4 ON (T.delivery_note_id=D4.delivery_note_id)
LEFT JOIN dbo.T_BKK_BILL_OF_LADING D5 ON (T.bill_of_lading_id=D5.bill_of_lading_id)
LEFT JOIN dbo.V_CUS_TSK_CONTAINER CONTAINER1 ON (CONTAINER1.container_id=T.container1_id)
LEFT JOIN dbo.V_CUS_TSK_CONTAINER CONTAINER2 ON (CONTAINER2.container_id=T.container2_id)
LEFT JOIN dbo.V_STF_TRAILER TRAILER1 ON (TRAILER1.trailer_id=T.trailer1_id)
LEFT JOIN dbo.V_STF_TRAILER TRAILER2 ON (TRAILER2.trailer_id=T.trailer2_id)
LEFT JOIN dbo.T_STF_LUGGAGE_TYPE LUGGAGE_TYPE ON (LUGGAGE_TYPE.luggage_type_id=T.luggage_type_id)
Our teams are
Agile
requirements change, they change often and they change fast!
It was the fourth meeting in a week over the same problem.
But this time, the top management was present. And angry.
The air was heavy, when the DBA was asked for his reasons.
"Why isn't that business requirement of our app implemented already?
You said it was just a new column on a table. Three weeks ago."
He started to sweat heavily.
"Will they believe my words, or just take it as an attempt to blame someone else?", he thought.
"Yes, sir. But that table is also used by the team of another app that integrates with the database, and they're behind schedule to adapt all of their Triggers to this change.
And the operation team has not yet chosen a date to apply the migrations, we'll need a full day of disservice."
"So you claim to have no blame?"
"Well... There's also this Stored Procedure that my predecessor had written... It has been three days since I stared working on it, and I don't understand how it works yet. And there's no way to test it."
Programmability of RDBMSs
is a powerful feature for sure.
There are many reasons not to do it, and the topic
is covered very well in
(in Italian)
Sometimes
data does not have a clearly defined structure.
Or the modeling just couldn't be done right at the first try.
Or the semantics change over time, as new related data are introduced in the system.
Another night, another late meeting.
"The new product we launched yesterday comes in four different colors. Customers must be able to choose the color from our web store. And it must be clear that it's one product, not four."
"Then I'll have to add a column for the color for all products. Even if it is to be used only for this one."
"Fine. Do it."
"Are you sure? Don't you remember the hell of last week?"
And something we just need something
specific
that RDBMS are not good at.
Graph structures
High speed without persistence
High parallel, distributed queries
and then
NoSQLs, once despised, stood up to bring solutions to these real-world problems.
They are many, and they're all different, but they have some features in common that address the aforesaid requirements.
schema-free
out-of-the-box replication and/or sharding support
simple APIs
BASE (instead of ACID)
capable of handling huge amounts of data
can run on commodity hardware
Apache Cassandra
Column Store
And many more
Object databases, grid/cloud, XML, multidimensional, multivalue...
A comprehensive list:
But
That does not make the NoSQLs the solution to all of world's problems.
They're still young.
That means there's a lack of tooling and expertise.
Business demands
support
And NoSQL vendors are often small startups.
They're not able to provide the same consulting and warranties that an Oracle or a Microsoft would.
Standards
are a good thing.
And where RDBMS all adhere to the same mechanics
and the SQL language,
every NoSQL has its own peculiarities and API.
They're
BASE
instead of ACID.
(Basically Available, Soft state, Eventual consistency)
also
A lot of people complains that NoSQLs are not mature yet.
But in the last years they gave proof
to be able to handle real-world, mission critical jobs with success.
So this is true only for the most recent ones.
That means
Not SQLs nor NoSQLs are the definitive solution.
But either one or the other might be more effective
in a given context.
Lastly
A new category of databases are trying to bring the
scalability features of NoSQLs into the SQL world
all of these legends
passed from mouth to mouth, now embody what it is my personal vision of the DB world.
Do not take them as accurate history, nor as an exquisite technical speech.
But keep in mind
that in every legend there is a seed of truth.