DDIA ch.3

Storage and retrieval

DB Data structures

Worlds Simplest database



- Straightforward, but not really

- Reading requires you to find the start of the next key, hence some form of escape is necessary

- escape characters will mess up your storage if not handled properly

- Writing to multiple files and hence reading, requires some form of pointer to remember where you are and where to write

Hash indexes - in memory hash



- Segmenting

- Compacting (on older files in background) 

- Merging segments (can be done with compacting

- Crash recovery (in prev example)

- Checksums for crash recovery


- sequential writing is fast (esp for disk drives)

- concurrency & crash recovery (segments are immutable and/or append only)

- little disk fragmentation


- hash tables must fit into memory

- range queries are not efficient


- its very easy to forget that you lose the hash on crashes (in code pen a refresh simulates that)

- If you want to build your hash from source of truth instead of saving it, remember to write the index into the hash-value rather than the actual record-value

mmm hash

SSTables + LSM trees (kind of)



- my misconception is that LSM tree is just some tree and then we write to SSTables

- the LSM in LSM trees mean Log-Structured Merge, which happens periodically, in a rolling function

- this implies having 2 trees/structures (or more) C0 and C1 which can be in the form of SSTables

- C0 is stored in memory and is very fast, containing frequently ref-ed nodes

actual rolling merge


- bloom filter:  space-efficient probabilistic data structure 

  - may contain false +ves (but better than false -ves)

  - check if key exists before going through LSMtree

- 2 strategies in merging+compaction

  - size-tiered, small tables merge into bigger tables

  - level-tiered, oldest tables are merged




- B+Trees are actually more established than LSM trees

- B+Trees store the tree as pages, of two kinds

  - A series of pointers representing the index

  - Leaf pages which contain the data

- B+Trees modify in place

- B+Trees are built from "bottom up"; every time you split, you grow more parents

not sure what I was expecting


- Write Ahead Logs (WAL) - basically an undo list

- Alternatively, use copy on write scheme, also useful for snapshots

- add lightweight thread locks to maintain integrity during updates

- abbreviate keys, only need to know the boundaries between ranges

- Arrange page orders on disk such to be sequential

- sibling pointers of leafs, to reduce need to go back to root

B trees vs LSM trees

Faster Writes

Sequential Writes

Compact Data

Compaction blocks read/write

Faster Reads

Pages are fixed sizes

Each key only appear once

Light locking on write

Other Indexing Structures

Secondary Indexes

-Secondary indexes can be created from a key-value index.

  -The main difference between primary and secondary indexes is that secondary indexes are not unique. This can be solved two ways:
    -Make the value in the index a list of matching row identifiers.
    - Make each key unique by adding a row identifier to the secondary key.
- Both B-trees and log-structured indexes can be used as secondary indexes.

Values within Index (clustered index)

- Sometimes hopping from index to heap file (unordered insertion) is too expensive, we will store the index row directly into the index
- There can only be one clustered index, and secondary indexes will then point to the clustered index
- A compromise between a non-clustered index and a clustered index is a covering index or index with included columns, where only some columns are stored within the index.

Multi Column Indexes

- concatenating indexes
- To index a two-dimensional location database (with latitude and longitude), we can use a space-filling curve and use a regular B-tree index.

- Specialized spatial index such as R-trees are also used.

  - group various dimensions (columns) in a n-dimensional space to form rectangles

Full text search and Fuzzy Index

- Some applications require searching for a similar key. This can be done by fuzzy indexes.
- For example, Lucene allows searching for words within edit distance 

Everything in memory

- Memcached, intended for caching only, data lost on restarts
- Counterintuitively, in-memory databases are faster mainly because they avoid serialization and deserialization between in-memory structures and binaries.
- In-memory databases could also provide data models that are hard to implement with disk-based indexes
- The anti-caching approach: least-recently-used data dumped to disk and loaded back when queried


Transactions VS Analytics

Main read pattern Small number of records, fetched by key Aggregated over large number of records
Main write pattern Random-access, low-latency writes from user input Bulk import or event stream
Primarily used by Customer, through web-application Analysts, for making business decisions
Data Latest state of the world History of events
Size GB to TB TB to PB
Bottleneck Disk seek time Disk bandwidth

Data warehousing

- High availability, low latency

- Usually copy-only data

- Can be optimised for analytics access patterns

- key indexing not as important

OLAP Structures

Stars and Snowflakes schema

each dimension is normalised

no data redundancy

less storage required

Data integrity

easy to maintain

one fact with dimensions

easier for analysts to use

faster query time (less joins)

Data could mismatch

easy to set up

Column-Oriented Storage

- In column-oriented storage, data of all rows from each column are stored together. It relies on each file containing the rows in the same order.

- Access time: usually in OLAP we only need a columns, no need to waste time getting entire rows

- Compression: Adjacent similar data can be compressed, just need to track how many rows are the same data. eg. Gender has very few cardinality, Age is also small (none older than 150)  we can sort by the low cardinality columns to achieve better compression

- Optimisations: bitmap encoding for distinct values (eg. country) allows faster equality comparisons, we can also have multiple sort orders copies for regularly used queries

Writing to Column-Oriented Storage

- Sorted columns optimizes for read-only queries, yet writes are more difficult.
- In-place updates would require rewriting the whole column on every write. Instead, we can use a LSM-tree like structure where a in-memory store buffers the new writes. When enough new writes are accumulated, they are then merged with the column files and written to new files in bulk.
- Queries, in this case, would require reading data from both disk and the in-memory store. This will be hidden within the engine and invisible to the user.


Aggregation: Materialized Views

- Materialized view is defined like a standard view, but with results stored on disk, while virtual views are expanded and processed at query time.

- Updating materialized view when data changes is expensive

Aggregation: Data Cube

A common special case of materialized view is a data cube or OLAP cube, where data is pre-summarized over each dimension. This enables fast queries with precomputed queries, yet a data cube may not have the flexibility as raw data.


Can slice and dice (pictured above) and rollup and drill down

Some thoughts

- We don't usually overwrite old data until compaction

- Caching is fast because we save on transforming data to write

- Clustered index in the book is defined quite differently from resources online (book: index contains data from rows, online: index defines how table is sorted i.e not heap) ref: https://docs.microsoft.com/en-us/sql/relational-databases/indexes/clustered-and-nonclustered-indexes-described




- ...this chapter feels like it has a lot of words

- ...i was too ambitous


- Binary search tree: https://gist.github.com/wileybenet/4a82937aceb2b7532c958775ace85d8a

- LSM tree graphic: https://blog.acolyer.org/2014/11/26/the-log-structured-merge-tree-lsm-tree/

- Bloom filter definition: https://www.geeksforgeeks.org/bloom-filters-introduction-and-python-implementation/

- B+tree tutorial: https://www.programiz.com/dsa/b-plus-tree

- multi column index: https://dataschool.com/sql-optimization/multicolumn-indexes/

- star snowflake schema: keboola.com/blog/star-schema-vs-snowflake-schema


- column oriented storage: https://en.wikipedia.org/wiki/Column-oriented_DBMS

- data cube: http://www.proxml.be/losd/cubeops.html

- Summary of chapter https://aweather.github.io/software-engineering/ddia-3/

DDIA ch.3

By tzyinc

DDIA ch.3

  • 427