Merkle Trees and Cryptographic Hash Functions
Tsvetan Dimitrov
Agenda
- Merkle Trees
  - Definition
  - Benefits
  - Use Cases
- Sparse Merkle Trees
- Cryptographic Hash Functions
- SHA-256
- Summary
Merkle Trees
Definition / 1
- Also called a binary hash tree.
- Leaves hold the hashes of individual pieces of data, such as transactions in a blockchain.
Definition / 2
- Each internal node holds the hash of the concatenation of its two child nodes' hashes.
- This continues level by level until a single root node is reached; its hash, called the Merkle root, represents the entire tree.
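A minimal Python sketch of how the Merkle root is computed, under stated assumptions: SHA-256 is the hash, every leaf is hashed first, and an odd node at any level is paired with a copy of itself (one common convention, used for example by Bitcoin; other systems handle odd levels differently). `merkle_root` is a hypothetical helper name.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 digest of `data`."""
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Hash every leaf, then hash adjacent pairs level by level until one node is left."""
    if not leaves:
        raise ValueError("a Merkle tree needs at least one leaf")
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: an odd node is paired with a copy of itself
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


transactions = [b"tx1", b"tx2", b"tx3", b"tx4"]
print(merkle_root(transactions).hex())
```

Changing any single transaction changes the printed root, which is what lets one 32-byte value stand for the whole tree.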
Definition / 3
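- A piece of data can be verified against the Merkle root in O(log n) time, where n is the number of leaves. The list of sibling hashes needed for this check is called a Merkle proof.
- The Merkle path is the branch from the audited leaf up to the root: the verifier recomputes each hash along it, using the sibling hashes from the proof, and compares the result with the Merkle root.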
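To make the O(log n) claim concrete, here is a minimal sketch of generating and checking a Merkle proof, assuming SHA-256 and that an odd node is paired with a copy of itself. The verifier needs only the leaf, one sibling hash per level, and a trusted root; `build_levels`, `merkle_proof` and `verify` are hypothetical names.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """All levels of the tree, from the hashed leaves (index 0) up to the root."""
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        level = levels[-1]
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # odd node is paired with a copy of itself
        levels.append([h(level[i] + level[i + 1]) for i in range(0, len(level), 2)])
    return levels


def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes on the path from leaf `index` to the root,
    each stored as (sibling_hash, sibling_is_left)."""
    proof = []
    for level in build_levels(leaves)[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index  # last node of an odd level: its sibling is itself
        proof.append((level[sibling], index % 2 == 1))
        index //= 2
    return proof


def verify(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the Merkle path from the leaf upward and compare with the root."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


leaves = [b"tx0", b"tx1", b"tx2", b"tx3", b"tx4"]
root = build_levels(leaves)[-1][0]
proof = merkle_proof(leaves, 2)
print(len(proof), verify(b"tx2", proof, root))  # 3 True
```

With five leaves the proof holds three hashes, one per level below the root.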
Benefits
- Validate data integrity: changing any leaf changes the Merkle root.
- Take a small amount of disk space: a verifier only needs to keep the root hash.
- Only a small amount of information is sent across the network.
- Efficient data format enabling fast verification.
- Prove that a transaction is included in a block by sending only a tiny amount of information (the Merkle proof) across the network.
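A back-of-the-envelope calculation shows why so little data crosses the network: a proof carries one sibling hash per level of a balanced binary tree, so its size grows with log₂ of the number of leaves. The 32-byte hash size is an assumption matching SHA-256; `proof_size_bytes` is a hypothetical helper.

```python
def proof_size_bytes(num_leaves: int, hash_bytes: int = 32) -> int:
    """Bytes of sibling hashes in a Merkle proof for a balanced binary tree:
    ceil(log2(n)) levels, one hash per level."""
    if num_leaves < 1:
        raise ValueError("need at least one leaf")
    depth = (num_leaves - 1).bit_length()  # equals ceil(log2(num_leaves))
    return depth * hash_bytes


for n in (4, 1_000, 1_000_000):
    print(n, proof_size_bytes(n))
# prints "4 64", "1000 320" and "1000000 640" on separate lines
```

Even for a million leaves the proof stays at 20 hashes, 640 bytes.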
Use Cases in Blockchain
Other Use Cases
- Git
- InterPlanetary File System (IPFS), a peer-to-peer distributed protocol, which uses a Merkle DAG.
- Certificate Transparency uses Merkle trees to make its logs of issued certificates verifiable.
- Amazon DynamoDB and Apache Cassandra use them during data replication to detect discrepancies between replicas.
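A simplified sketch of the replica-comparison idea: two replicas build Merkle trees over the same key ranges, compare hashes from the root down, and descend only into subtrees whose hashes differ. This is a hypothetical illustration, not the actual DynamoDB or Cassandra implementation; it assumes SHA-256 and a power-of-two number of key ranges, one leaf per range.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Tree levels from the hashed leaves up to the root (power-of-two leaf count)."""
    if not leaves or len(leaves) & (len(leaves) - 1):
        raise ValueError("this sketch assumes a power-of-two number of leaves")
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def diff_ranges(a_levels: list[list[bytes]], b_levels: list[list[bytes]]) -> list[int]:
    """Indices of leaves (key ranges) that differ, found by descending from the root
    and only following children whose hashes disagree."""
    top = len(a_levels) - 1
    suspects = [0] if a_levels[top][0] != b_levels[top][0] else []
    for level in range(top - 1, -1, -1):
        suspects = [
            child
            for i in suspects
            for child in (2 * i, 2 * i + 1)
            if a_levels[level][child] != b_levels[level][child]
        ]
    return suspects


replica_a = [f"range-{i}:v1".encode() for i in range(8)]
replica_b = list(replica_a)
replica_b[5] = b"range-5:v2"
print(diff_ranges(build_levels(replica_a), build_levels(replica_b)))  # [5]
```

Only the hashes along the differing branch are compared, so replicas that mostly agree exchange very little data.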
Sparse Merkle Trees
Definition
- A standard Merkle tree, except that the data is indexed and each data point is placed at the leaf that corresponds to its index.
- Suitable for proving non-inclusion: showing that the leaf for a given index is empty.
- Acts as a key-value store inside a Merkle tree.
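A minimal sketch of the key-value idea, under these assumptions: SHA-256 throughout, a key's leaf index is the SHA-256 of the key, a stored value is represented by H(value), and an empty leaf holds H(null), taken here to be the SHA-256 of the empty byte string. A non-inclusion proof is then an ordinary Merkle proof that the key's leaf still holds the empty constant. `SparseMerkleTree` and `verify` are hypothetical names.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


EMPTY_LEAF = h(b"")  # assumption: H(null) is the hash of the empty byte string


class SparseMerkleTree:
    """Toy sparse Merkle tree: a key is mapped to a leaf index by hashing it,
    and only non-empty leaves are stored."""

    def __init__(self, depth: int = 256):
        self.depth = depth
        self.leaves: dict[int, bytes] = {}
        # empty[i] = hash of an empty subtree of height i
        self.empty = [EMPTY_LEAF]
        for _ in range(depth):
            self.empty.append(h(self.empty[-1] + self.empty[-1]))

    def index(self, key: bytes) -> int:
        # The top `depth` bits of SHA-256(key) select the leaf.
        return int.from_bytes(h(key), "big") >> (256 - self.depth)

    def set(self, key: bytes, value: bytes) -> None:
        self.leaves[self.index(key)] = h(value)

    def _node(self, height: int, idx: int) -> bytes:
        lo, hi = idx << height, (idx + 1) << height
        if not any(lo <= i < hi for i in self.leaves):
            return self.empty[height]  # whole subtree is empty: use the constant
        if height == 0:
            return self.leaves[idx]
        return h(self._node(height - 1, 2 * idx) + self._node(height - 1, 2 * idx + 1))

    def root(self) -> bytes:
        return self._node(self.depth, 0)

    def proof(self, key: bytes) -> list[bytes]:
        """Sibling hashes from the leaf level up to the children of the root."""
        idx = self.index(key)
        return [self._node(height, (idx >> height) ^ 1) for height in range(self.depth)]


def verify(root: bytes, idx: int, leaf_hash: bytes, proof: list[bytes]) -> bool:
    node = leaf_hash
    for height, sibling in enumerate(proof):
        node = h(sibling + node) if (idx >> height) & 1 else h(node + sibling)
    return node == root


tree = SparseMerkleTree()
tree.set(b"alice", b"10")
tree.set(b"bob", b"20")
root = tree.root()
# Non-inclusion: the leaf for "carol" provably still holds the empty constant.
carol = tree.index(b"carol")
print(verify(root, carol, EMPTY_LEAF, tree.proof(b"carol")))  # True
```

Inclusion and non-inclusion use the same `verify`; the only difference is whether the claimed leaf is H(value) or the empty constant.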
Space Complexity and Caching
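- Efficient proofs of non-inclusion, but at the cost of huge trees (2²⁵⁶ leaves when indices are 256-bit hashes).
- Luckily, caching is possible: the hash of an empty leaf, H(null), is a constant, and so is the hash of an empty subtree at every height, e.g. H(H(null) || H(null)), where || denotes concatenation.
- A naive caching strategy could record every hash that is non-empty.
- The goal is to represent each empty branch, all the way down to the leaves, by the constant for its height, so that only branches containing data need to be stored.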
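A minimal sketch of the caching idea, assuming SHA-256 and taking H(null) to be the hash of the empty byte string: one list of 257 precomputed hashes covers every empty subtree of a 256-level tree, so the root of a tree with a single non-empty leaf costs one hash per level. `root_with_one_leaf` is a hypothetical helper.

```python
import hashlib

DEPTH = 256  # 256-bit indices, so 2**256 leaves


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


# Hash of an empty subtree for every height, computed once:
# empty[0] = H(null), empty[1] = H(H(null) || H(null)), and so on.
empty = [h(b"")]
for _ in range(DEPTH):
    empty.append(h(empty[-1] + empty[-1]))


def root_with_one_leaf(index: int, leaf_hash: bytes) -> tuple[bytes, int]:
    """Root of a tree whose only non-empty leaf sits at `index`.
    Every sibling on the path is an empty subtree, so it comes from the cache.
    Returns the root and the number of hash calls made."""
    node, calls = leaf_hash, 0
    for height in range(DEPTH):
        sibling = empty[height]
        node = h(sibling + node) if (index >> height) & 1 else h(node + sibling)
        calls += 1
    return node, calls


root, calls = root_with_one_leaf(12345, h(b"some value"))
print(calls)  # 256 hash calls instead of about 2**256
```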
Cryptographic Hash Functions
Non-Cryptographic Hash Functions / 1
- Avoid collisions for non-malicious input.
- Aim to detect accidental changes in data (e.g. CRCs).
Non-Cryptographic Hash Functions / 2
- Put objects into different buckets in a hash table with as few collisions as possible.
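A small illustration of both uses with the standard library's `zlib.crc32`, standing in here as an example of a non-cryptographic hash. It reliably flags accidental corruption and spreads keys over buckets, but because CRCs are linear it is easy to craft inputs with a chosen CRC, so it offers no protection against deliberate tampering. `NUM_BUCKETS` and `bucket` are hypothetical names.

```python
import zlib

NUM_BUCKETS = 8


def bucket(key: bytes) -> int:
    """Pick a hash-table bucket with CRC-32: fast and well spread, not cryptographic."""
    return zlib.crc32(key) % NUM_BUCKETS


# Detecting an accidental change: "d" (0x64) and "e" (0x65) differ in one bit,
# and CRC-32 is guaranteed to catch every single-bit error.
original = b"hello world"
corrupted = b"hello worle"
print(zlib.crc32(original) != zlib.crc32(corrupted))  # True

for key in (b"alice", b"bob", b"carol"):
    print(key, bucket(key))
```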
Properties of CHF
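- Deterministic: the same input always produces the same hash.
- Pre-image resistant: given a hash, it is infeasible to find an input that produces it.
- Computationally efficient.
- Cannot be reverse-engineered.
- Collision resistant: it is infeasible to find two different inputs with the same hash.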
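To make collision resistance tangible, here is a sketch that truncates SHA-256 to 24 bits (an assumption purely for the demo) and searches for a collision. By the birthday bound, a b-bit output yields a collision after roughly 2^(b/2) attempts, so 24 bits falls almost instantly, while the full 256-bit output would need about 2¹²⁸ attempts. `truncated_hash` and `find_collision` are hypothetical helpers.

```python
import hashlib
from itertools import count


def truncated_hash(data: bytes, bits: int) -> int:
    """First `bits` bits of SHA-256(data), as an integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") >> (256 - bits)


def find_collision(bits: int) -> tuple[bytes, bytes, int]:
    """Birthday search: hash distinct inputs until two share the truncated hash."""
    seen: dict[int, bytes] = {}
    for attempt in count(1):
        message = str(attempt).encode()
        digest = truncated_hash(message, bits)
        if digest in seen:
            return seen[digest], message, attempt
        seen[digest] = message


# Deterministic: the same input always gives the same hash.
assert hashlib.sha256(b"abc").digest() == hashlib.sha256(b"abc").digest()

a, b, attempts = find_collision(24)
print(a, b, attempts)  # a few thousand attempts are typical for 24 bits
```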
CHF Algorithm Families
SHA-256
Definition
- Produces a fixed-size, 256-bit hash value.
- Part of the SHA-2 family of hash functions, designed by the National Security Agency (NSA).
- Built with the Merkle-Damgård construction from a one-way compression function.
- A brute-force pre-image attack would need on the order of 2²⁵⁶ attempts to find an input that produces a given hash.
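A sketch of both claims using Python's `hashlib`: the digest is always 256 bits, and a brute-force pre-image search on SHA-256 truncated to 12 bits (an assumption for the demo) succeeds after about 2¹² attempts on average. Each extra output bit doubles the expected work, which is what puts the full 256-bit output out of reach. The search usually finds a different input than the original, which is still a valid pre-image of the truncated value. `brute_force_preimage` is a hypothetical helper.

```python
import hashlib
from itertools import count

digest = hashlib.sha256(b"Merkle").digest()
print(len(digest) * 8)  # 256: fixed size, whatever the input length


def truncated_hash(data: bytes, bits: int) -> int:
    """First `bits` bits of SHA-256(data), as an integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") >> (256 - bits)


def brute_force_preimage(target: int, bits: int) -> tuple[bytes, int]:
    """Try inputs until one maps to `target`; expect about 2**bits attempts."""
    for attempt in count(1):
        candidate = str(attempt).encode()
        if truncated_hash(candidate, bits) == target:
            return candidate, attempt


target = truncated_hash(b"secret", 12)
found, attempts = brute_force_preimage(target, 12)
print(found, attempts)
```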
Step 1: Padding Bits
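- Padding bits are appended to the original message so that it can be split into the fixed-size blocks the hash function works on.
- After padding, the length in bits must be exactly 64 bits short of a multiple of 512 (length ≡ 448 mod 512). Padding is always added, even if the message already has such a length.
- The first bit added is 1 and the rest are zeros.
[Figure: Original Message | Padding Bits]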
Step 2: Padding Length
- The length of the original message, in bits, is appended to the result of step 1 as a 64-bit big-endian integer.
- The final data to be hashed is now a multiple of 512 bits, so it splits evenly into 512-bit blocks. Encoding the length also guarantees that two different messages never pad to the same data (Merkle-Damgård strengthening).
[Figure: Original Message | Padding Bits | Input Length]
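A minimal sketch of steps 1 and 2 in Python. It assumes the input is a whole number of bytes, so the single 1 bit followed by zeros becomes the byte 0x80 followed by zero bytes; `sha256_pad` is a hypothetical name.

```python
def sha256_pad(message: bytes) -> bytes:
    """Apply SHA-256 padding (steps 1 and 2) to a byte string."""
    bit_length = len(message) * 8
    # Step 1: a single 1 bit, then 0 bits; for whole bytes that is 0x80 then 0x00s.
    padded = message + b"\x80"
    # Zero bytes until the length is 8 bytes (64 bits) short of a multiple of 64 bytes (512 bits).
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    # Step 2: append the original length in bits as a 64-bit big-endian integer.
    return padded + bit_length.to_bytes(8, "big")


padded = sha256_pad(b"abc")
print(len(padded) * 8)    # 512
print(padded[-8:].hex())  # 0000000000000018 (24 bits)
```

A 55-byte message still fits in one 512-bit block, while a 56-byte message spills into a second block because the 0x80 byte and the 8-byte length no longer fit.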
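Step 3: Initialize Chaining Variables
- The padded message is broken down into blocks of 512 bits each.
- The state consists of 8 buffers of 32 bits each: words named A, B, C, D, E, F, G, and H.
- The first iteration uses fixed hex values (the first 32 bits of the fractional parts of the square roots of the first eight primes, 2 to 19):
A = 6a 09 e6 67
B = bb 67 ae 85
C = 3c 6e f3 72
D = a5 4f f5 3a
E = 51 0e 52 7f
F = 9b 05 68 8c
G = 1f 83 d9 ab
H = 5b e0 cd 19
Step 4.1: Process Each Block
- 512-bit block = 16 sub-blocks * 32 bits each.
- The 16 sub-blocks are expanded into a message schedule of 64 words, W[0] to W[63].
- 64 rounds of operations, utilizing the A to H registers, one word W[t], and a constant named K[t].
- K[t] is an array of 64 elements (the first 32 bits of the fractional parts of the cube roots of the first 64 primes).
- After the 64 rounds, A to H are added (mod 2³²) to the values they held before the block. After the last block, A to H concatenated form the 256-bit hash.
Step 4.2: Process Each Block
- On each round the following values are calculated (all additions mod 2³²):
T1 = H + Σ1(E) + Ch(E, F, G) + K[t] + W[t]
T2 = Σ0(A) + Maj(A, B, C)
- The registers are then shifted: H = G, G = F, F = E, E = D + T1, D = C, C = B, B = A, A = T1 + T2.
- Ch and Maj = non-linear logical functions: Ch(E, F, G) = (E AND F) XOR (NOT E AND G), Maj(A, B, C) = (A AND B) XOR (A AND C) XOR (B AND C).
- Σ0(A) = S²(A) XOR S¹³(A) XOR S²²(A) and Σ1(E) = S⁶(E) XOR S¹¹(E) XOR S²⁵(E), where Sⁿ = circular right shift by n bits.
- W[t] = a 32-bit word of the message schedule, derived from the block's sub-blocks.
- K[t] = one of the 64 constants, a different one for each round.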
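Putting steps 1 to 4 together, here is a minimal pure-Python sketch of SHA-256 following FIPS 180-4. It is written for readability rather than speed; real code should use `hashlib`. Instead of hard-coding the 8 initial values and 64 round constants, it derives them from the square and cube roots of the first primes, which is how the specification defines them. Helper names such as `compress` and `sha256_hex` are illustrative.

```python
import hashlib
import math

MASK = 0xFFFFFFFF  # all arithmetic is on 32-bit words, mod 2**32


def first_primes(n: int) -> list[int]:
    primes: list[int] = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def icbrt(n: int) -> int:
    """Integer cube root: the largest r with r**3 <= n."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 2)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


PRIMES = first_primes(64)
# Initial A..H: first 32 bits of the fractional parts of sqrt(first 8 primes).
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]
# K[0..63]: first 32 bits of the fractional parts of cbrt(first 64 primes).
K = [icbrt(p << 96) & MASK for p in PRIMES]


def rotr(x: int, n: int) -> int:
    """Circular right shift of a 32-bit word by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK


def pad(message: bytes) -> bytes:
    """Steps 1 and 2: 0x80, zero bytes, then the 64-bit big-endian bit length."""
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    return padded + (len(message) * 8).to_bytes(8, "big")


def compress(state: list[int], block: bytes) -> list[int]:
    """Step 4: run the 64 rounds over one 512-bit block."""
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    for t in range(16, 64):  # expand 16 words into the 64-word schedule
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        sigma1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + sigma1 + ch + K[t] + w[t]) & MASK
        sigma0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (sigma0 + maj) & MASK
        h, g, f, e = g, f, e, (d + t1) & MASK
        d, c, b, a = c, b, a, (t1 + t2) & MASK
    # Add the block's result to the incoming state (mod 2**32).
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def sha256_hex(message: bytes) -> str:
    state = list(H0)
    data = pad(message)
    for i in range(0, len(data), 64):  # one 512-bit (64-byte) block at a time
        state = compress(state, data[i:i + 64])
    return "".join(f"{word:08x}" for word in state)


print(sha256_hex(b"abc") == hashlib.sha256(b"abc").hexdigest())  # True
```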
Summary
- Merkle trees and sparse Merkle trees: definitions, use cases, benefits and drawbacks.
- Differences between cryptographic and non-cryptographic hash functions.
- SHA-256 definition and algorithm explanation.
Questions?