JBoss Developer Experience, Red Hat
- Shared Everything vs Shared Nothing database architectures
- An overview of Data Grids
- Impact on storage
Let's talk about databases first
Shared Everything database architectures
Shared Disk database architectures
- We won't discuss shared-memory designs; they do not help much in understanding data grids.
- Nodes have their own memory, and share mass storage.
- Disk interconnect may be achieved via SANs.
- OLTP databases like Oracle, IBM DB2, MySQL, PostgreSQL are architected this way.
Issues with shared disk architectures
- Shared storage becomes a single point of contention. Limits horizontal scaling.
- Improved performance is achieved through vertical scaling. Faster nodes == better performance.
- Consistent writes require:
  - disk-based lock tables, or
  - synchronization among individual nodes.
Shared Nothing architectures
In shared-nothing architectures:
- Nodes exhibit independence and self-sufficiency with no single point of contention.
- Horizontal scaling is done by adding more nodes and typically does not come at the cost of performance.
- A node can typically operate only on data available locally, which leads to data partitioning techniques.
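As a minimal sketch of the partitioning idea above, the snippet below assigns each key to exactly one node with a stable hash. The function name `owner_of`, the node names, and the choice of SHA-256 with modulo placement are illustrative assumptions, not the scheme of any particular data grid.

```python
import hashlib

def owner_of(key, nodes):
    """Return the single node that holds `key` (hypothetical placement rule)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as a stable integer hash.
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]

# Hypothetical three-node cluster and a handful of keys.
nodes = ["node-a", "node-b", "node-c"]
partitions = {node: [] for node in nodes}
for key in ("order-%d" % i for i in range(12)):
    partitions[owner_of(key, nodes)].append(key)
```

Every key lands on exactly one node, so a node can serve requests for its own keys without consulting the others. Simple modulo placement moves many keys when the node count changes; consistent hashing is a common way to reduce that movement.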
Comparison with shared-disk architectures
- Shared-nothing architectures incur data-shipping problems when operating on data that spans multiple nodes. (Think of access plans for queries that span nodes or involve joins.)
- Without good data affinity or partitioning techniques, loads will not be uniformly distributed.
- Without data replication, nodes are single points of failure.
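The point about replication can be illustrated with a small sketch: each key is placed on a primary node plus backups, so losing one node does not lose the key. The names `owners_of` and `surviving_owners`, the `copies` parameter, and the "next node in the list" backup rule are assumptions made for illustration only.

```python
import hashlib

def owners_of(key, nodes, copies=2):
    """Return the primary owner followed by backup owners (hypothetical rule)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    count = min(copies, len(nodes))
    # Backups are simply the next nodes in list order, wrapping around.
    return [nodes[(start + i) % len(nodes)] for i in range(count)]

def surviving_owners(key, nodes, failed, copies=2):
    """Owners of `key` that are still reachable after `failed` nodes go down."""
    return [n for n in owners_of(key, nodes, copies) if n not in failed]

nodes = ["node-a", "node-b", "node-c"]
primary = owners_of("order-7", nodes)[0]

# With one copy, losing the primary loses the key.
lost = surviving_owners("order-7", nodes, failed={primary}, copies=1)
# With two copies, a backup still holds it.
kept = surviving_owners("order-7", nodes, failed={primary}, copies=2)
```

With `copies=1` the failed primary leaves no owner for the key, while with `copies=2` one backup remains.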
Defining Data Grids
Most people know what a database is. Very few know what a data grid is. Data grids are not:
- in-memory relational databases, or
- simple data caching solutions, or
- even NoSQL databases.
"A Data Grid is a system composed of multiple servers that work together to manage information and related operations – such as computations – in a distributed environment."
Cameron Purdy (2008), "Defining a Data Grid"
Data grids are distributed databases designed for scalability, with the characteristics of shared-nothing architectures. Note the lack of a 'relational' qualifier for the database: data grids typically store objects, not tuples.
In-memory data grids keep data in memory for fast access to large volumes of data.
Data grids and storage
The design constraints that data grids impose on storage systems are not uniform. They depend on the implementation of the data grid and on the applications accessing it.
Online and offline writes
Data grids can be instructed to write to disk:
- Online writes == Write-through mode. Clients block until the write is complete.
- Offline writes == Write-behind mode. Clients don't block waiting for the write. The write is completed in the background.
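The two modes above can be sketched as follows. This is a toy model under stated assumptions: a plain dict stands in for the disk, and the class names `WriteThroughCache` and `WriteBehindCache` and the `flush` method are hypothetical, not the API of any data grid product.

```python
import queue
import threading

class WriteThroughCache:
    """Online write: put() returns only after the store has the value."""
    def __init__(self, store):
        self.memory = {}
        self.store = store

    def put(self, key, value):
        self.memory[key] = value
        self.store[key] = value  # the caller waits for this write

class WriteBehindCache:
    """Offline write: put() queues the write; a background thread persists it."""
    def __init__(self, store):
        self.memory = {}
        self.store = store
        self._pending = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def put(self, key, value):
        self.memory[key] = value
        self._pending.put((key, value))  # returns without touching the store

    def _drain(self):
        while True:
            key, value = self._pending.get()
            self.store[key] = value
            self._pending.task_done()

    def flush(self):
        """Block until every queued write has reached the store."""
        self._pending.join()
```

In the write-behind case a value that has been accepted by `put` may not yet be in the store, which is the trade-off for not blocking the client.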
Modelling IO operations - #1
Data grids modelled after traditional databases will typically:
- perform a write to disk when objects in the grid are updated.
- evict objects (remove from memory + no local disk write) from the grid based on policies - LRU, LIRS etc.
- expire objects (remove from memory + cluster disk write) from data grid based on policies.
- read objects from memory if available, else read from disk.
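A minimal single-node sketch of this first model, assuming a dict as the disk and LRU as the eviction policy. The class name `WriteOnUpdateGrid` and its methods are hypothetical, and expiration is left out to keep the example short.

```python
from collections import OrderedDict

class WriteOnUpdateGrid:
    """Toy model: every update goes to disk; eviction only frees memory."""
    def __init__(self, disk, capacity):
        self.disk = disk              # stand-in for persistent storage
        self.capacity = capacity      # max objects held in memory
        self.memory = OrderedDict()   # ordered from least to most recently used

    def put(self, key, value):
        self.disk[key] = value        # write to disk on every update
        self._remember(key, value)

    def get(self, key):
        if key in self.memory:        # serve from memory when possible
            self.memory.move_to_end(key)
            return self.memory[key]
        value = self.disk[key]        # otherwise read from disk (KeyError if absent)
        self._remember(key, value)
        return value

    def _remember(self, key, value):
        self.memory[key] = value
        self.memory.move_to_end(key)
        while len(self.memory) > self.capacity:
            # Evict the least recently used object; disk already has it.
            self.memory.popitem(last=False)
```

Because the disk copy is always current, eviction needs no write, which matches the "no local disk write" note on the eviction bullet.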
Modelling IO operations - #2
As data grids evolve, implementations may instead:
- perform writes to disk only during eviction.
- ensure HA through replication.
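A toy sketch of this second model, again assuming a dict as the disk and LRU eviction. The class name `WriteOnEvictionGrid` is hypothetical. Objects that have not been evicted exist only in memory here, which is presumably why the model leans on replication rather than disk for availability; replication itself is not modelled in this snippet.

```python
from collections import OrderedDict

class WriteOnEvictionGrid:
    """Toy model: updates stay in memory; disk is written only on eviction."""
    def __init__(self, disk, capacity):
        self.disk = disk              # stand-in for persistent storage
        self.capacity = capacity      # max objects held in memory
        self.memory = OrderedDict()   # ordered from least to most recently used

    def put(self, key, value):
        self._remember(key, value)    # no disk write on update

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key]
        value = self.disk[key]        # evicted earlier, so read it back
        self._remember(key, value)
        return value

    def _remember(self, key, value):
        self.memory[key] = value
        self.memory.move_to_end(key)
        while len(self.memory) > self.capacity:
            old_key, old_value = self.memory.popitem(last=False)
            self.disk[old_key] = old_value  # the only place disk is written
```

Compared with writing on every update, disk traffic now depends on the eviction rate rather than the update rate.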
Middleware demystified - datagrids