The Bitcoin blockchain

Toptal academy Blockchain lectures #2

2018-02-26

ivan.voras@toptal.com

Today's agenda

  • Satoshi's big invention
  • How are transactions formed, PoW
  • Addresses, Pseudonymity
  • UTXO, DSHA256, ECC
  • The "Script" language
  • How are blocks formed
  • How does it look in the file system
  • Tx index, why not use compression
  • Limitations, tx/s, block time, block size
  • Why forks exist

Satoshi's big invention

Who is Satoshi? -- a guy from the forums...


https://bitcointalk.org/index.php?action=profile;u=3

Satoshi's big invention

Cryptocurrencies before Bitcoin:

  • DigiCash - used RSA (blind) signatures to support transactions without counterfeiting
  • BitGold - PoW, double-spend protection 10 years before Bitcoin
  • HashCash - e-mail spam prevention by PoW

Satoshi's big invention

In Bitcoin, known algorithms were combined in a new way,
sometime around 2008.

 

"The receiver of the first bitcoin transaction was cypherpunk Hal Finney, who created the first reusable proof-of-work system (RPOW) in 2004.[34] Finney downloaded the bitcoin software the day it was released, and received 10 bitcoins from Nakamoto.[35][36] Other early cypherpunk supporters were Wei Dai, creator of bitcoin predecessor b-money, and Nick Szabo, creator of bitcoin predecessor bit gold.[37]"

 

  • (Reusable) Proof of Work with ECC
  • Distributed ledger (the blockchain)
  • Peer-to-peer networking

Proof of Work, Reusable proof of work

Proof of Work: the user (in a general way) does some computationally hard work to prove their eligibility to the service provider, and that work is trivially easy for the provider to verify.

 

For example:

  1. The server generates a random string of bytes
  2. Sends the random string to the client
  3. Asks the client to find a prefix (its own random string) such that the hash of the prefix followed by the server's string meets some condition
  4. The client tries various prefixes until the resulting hash meets the condition
  5. The client sends the prefix and the hash to the server, which verifies them with a single hash computation
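The five steps above can be sketched in Python as follows. This is a minimal sketch, not any particular system's protocol: the condition "the hash starts with at least `difficulty_bits` zero bits", the 4-byte prefix, and the function names `server_challenge`, `client_solve` and `server_verify` are all assumptions chosen for illustration.

```python
import hashlib
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count how many leading bits of the digest are 0."""
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()


def server_challenge(size: int = 32) -> bytes:
    # Steps 1-2: the server generates random bytes and sends them to the client
    return os.urandom(size)


def client_solve(challenge: bytes, difficulty_bits: int) -> bytes:
    # Steps 3-4: try 4-byte prefixes until SHA256(prefix + challenge) meets the condition
    for counter in range(2**32):
        prefix = counter.to_bytes(4, "little")
        if leading_zero_bits(hashlib.sha256(prefix + challenge).digest()) >= difficulty_bits:
            return prefix
    raise ValueError("no prefix found in the 4-byte search space")


def server_verify(challenge: bytes, prefix: bytes, difficulty_bits: int) -> bool:
    # Step 5: verification costs one hash, however long the client's search took
    return leading_zero_bits(hashlib.sha256(prefix + challenge).digest()) >= difficulty_bits


challenge = server_challenge()
prefix = client_solve(challenge, difficulty_bits=12)   # about 4096 attempts on average
print(prefix.hex(), server_verify(challenge, prefix, 12))
```

Here the client sends only the prefix and the server recomputes the hash itself; that is the cheap verification step. Each extra bit of difficulty doubles the client's expected work but leaves the server's cost unchanged.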

Proof of Work, Reusable proof of work

Homework

 

  • The server has generated a random string of bytes, which is hex-encoded to look like this:
f064b8b61422a3456cb273a474a1fb0cabb04200a6a82a9426bd01f56c97fbf8c4ef58634fd5cf21af29e7db3406de4f886fe71408696789f853af9932a84b79
  • The server asks you to find a 4-byte prefix to this string of bytes, such that, when SHA256 is applied to the whole, the result of the hash ends with these 2 bytes: 0xff, 0xff.

Be careful! Everything we're talking about here is about binary data, not hexadecimal representation of binary data
(or any other representation).
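The binary-versus-hex warning matters in practice. The sketch below is only a checker for a candidate answer. It does not search for one. It decodes the hex string into its 64 raw bytes before hashing. The name `check_answer` is hypothetical.

```python
import hashlib

CHALLENGE_HEX = "f064b8b61422a3456cb273a474a1fb0cabb04200a6a82a9426bd01f56c97fbf8c4ef58634fd5cf21af29e7db3406de4f886fe71408696789f853af9932a84b79"
challenge = bytes.fromhex(CHALLENGE_HEX)   # 64 raw bytes, not 128 ASCII characters


def check_answer(prefix: bytes) -> bool:
    """Return True if SHA256(prefix || challenge) ends with the bytes 0xff 0xff."""
    if len(prefix) != 4:
        return False
    return hashlib.sha256(prefix + challenge).digest()[-2:] == b"\xff\xff"


# Hashing the hex *text* instead of the decoded bytes gives an unrelated digest:
print(hashlib.sha256(CHALLENGE_HEX.encode("ascii")).hexdigest())
print(hashlib.sha256(challenge).hexdigest())
```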

Proof of Work, Reusable proof of work

Reusable Proof of Work: instead of making every single client do the computationally complex work, make only some of them do it and "trade" the results.

 

--> RPoW = cryptocurrency <--

 

This is how Bitcoin works: miners do the hard work, "earn" coins in return, and then trade them so everyone else exchanges pieces of the "earned" effort.

Distributed transaction processing

A solution for having an unknown number of participants in the network all agreeing on what is going on: the blockchain.

 

--> RPoW + Blockchain = Bitcoin <--

 

RPoW is implemented by using ECC (elliptic curve cryptography).

 

Transactions within the blockchain are also chained.

How Bitcoins are created

  1. A miner solves a PoW problem, gets awarded coins.
  2. These initial coins are recorded in a "coinbase" transaction in the blockchain so that only the owner of the private key corresponding to a certain "address" (which is derived from the public key) can spend them.
  3. To spend the coins, the owner of the private key creates another transaction, signed by it, which records that the coins can be spent by owners of private keys associated with certain (other) addresses.

Transaction structure

  1. All transactions (except the "coinbase" ones which create new coins) have one or more INPUTS and one or more OUTPUTS.
  2. Outputs of previous transactions become inputs of new transactions.
  3. Each input carries a signature which proves that the creator of the spending transaction is the party the previous sender intended.
  4. Inputs are always spent in full. If there is "change", it is sent back (e.g. to the same address) in a new output.
  5. Transactions are identified (their "id") by their hash.
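The structure above can be modelled as a toy data type. This is a sketch under stated assumptions: the serialization is a hypothetical simplified encoding, **not** Bitcoin's real wire format, and the coinbase transaction is simplified to have no inputs. What it does share with Bitcoin is that the id is the double SHA-256 of the serialized transaction, displayed byte-reversed.

```python
import hashlib
from dataclasses import dataclass


def dsha256(data: bytes) -> bytes:
    """Double SHA-256, which Bitcoin uses for transaction and block ids."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def _field(data: bytes) -> bytes:
    # Length-prefix variable-sized fields so the encoding is unambiguous
    return len(data).to_bytes(2, "little") + data


@dataclass(frozen=True)
class TxInput:
    prev_txid: str      # id of the transaction whose output is being spent
    prev_index: int     # which output of that transaction
    signature: bytes    # proof that the spender controls the referenced address


@dataclass(frozen=True)
class TxOutput:
    amount: int         # in satoshis (1 BTC = 100,000,000 satoshis)
    address: str        # who may spend this output later


@dataclass(frozen=True)
class Transaction:
    inputs: tuple       # simplified: empty for a coinbase transaction
    outputs: tuple

    def serialize(self) -> bytes:
        # Hypothetical simplified encoding, NOT Bitcoin's real format
        out = len(self.inputs).to_bytes(2, "little")
        for i in self.inputs:
            out += _field(i.prev_txid.encode()) + i.prev_index.to_bytes(4, "little") + _field(i.signature)
        out += len(self.outputs).to_bytes(2, "little")
        for o in self.outputs:
            out += o.amount.to_bytes(8, "little") + _field(o.address.encode())
        return out

    def txid(self) -> str:
        # Bitcoin displays hashes byte-reversed; mimic that convention
        return dsha256(self.serialize())[::-1].hex()


coinbase = Transaction(inputs=(), outputs=(TxOutput(10 * 10**8, "1abc123"),))
spend = Transaction(
    # Output numbered 1 to match the slides; real Bitcoin numbers outputs from 0
    inputs=(TxInput(coinbase.txid(), 1, b"<signature by 1abc123's key>"),),
    outputs=(TxOutput(5 * 10**8, "1dg23fc"), TxOutput(5 * 10**8, "1abc123")),
)
print(coinbase.txid())
print(spend.txid())
```

The id depends on every serialized field, so changing any amount, address or input reference produces a different transaction id.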

How transactions work

TX 1000:

Inputs: ...

Outputs:

  1. 10 coins, spendable by address 1abc123

TX 2341:

Inputs:

  1. Tx 1000, output #1, signed by the private key of 1abc123 (an ECDSA signature)

Outputs:

  1. 5 coins, spendable by address 1dg23fc
  2. 5 coins, spendable by address 1abc123 (this is "change", returning to the same wallet)

...

Miners, and every other node, check that the new transaction is signed with the private key that "owns" the address which the previous transaction said can spend the coins.

"UTXO": unspent transaction output. TX 1000's output #1 was a UTXO until TX 2341 spent it; TX 2341's two outputs are now UTXOs.

How addresses are generated

  • Addresses are hashes of certain representations of public keys. The process goes through several stages and types of hashes, and includes a checksum for resistance to typos.
  • For each private key, there is one public key, and for each public key, there is one address (except for different representations of the same public key, "(un)compressed")
  • Generating an address in Bitcoin actually means generating a keypair
  • HD wallets are a way to generate a practically unlimited number of keypairs from a single seed. In this way, only the "seed", acting as a "super private key", is needed to generate all the others.
  • Pseudonymity: no identifying information is attached to addresses (but network sniffers may be able to determine who uses them)
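The checksum step mentioned above is Base58Check encoding: append the first 4 bytes of a double SHA-256 of the versioned payload, then write the result in base 58. The sketch below assumes the 20-byte public-key hash (SHA-256 followed by RIPEMD-160) has already been computed, because RIPEMD-160 is not always available in Python's `hashlib`. The version byte 0x00 is the one used for ordinary mainnet pay-to-pubkey-hash addresses; the function names are illustrative.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"  # no 0, O, I, l


def _checksum(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]


def base58check_encode(version: int, payload: bytes) -> str:
    full = bytes([version]) + payload
    full += _checksum(full)
    num = int.from_bytes(full, "big")
    encoded = ""
    while num > 0:
        num, rem = divmod(num, 58)
        encoded = B58_ALPHABET[rem] + encoded
    # Each leading zero byte is written as the character "1"
    leading_zeros = len(full) - len(full.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def base58check_is_valid(address: str) -> bool:
    num = 0
    for ch in address:
        if ch not in B58_ALPHABET:
            return False
        num = num * 58 + B58_ALPHABET.index(ch)
    leading_ones = len(address) - len(address.lstrip("1"))
    full = b"\x00" * leading_ones + num.to_bytes((num.bit_length() + 7) // 8, "big")
    if len(full) < 5:
        return False
    return _checksum(full[:-4]) == full[-4:]


pubkey_hash = bytes(range(20))        # stand-in for a real 20-byte public-key hash
address = base58check_encode(0x00, pubkey_hash)
typo = address[:-1] + ("2" if address[-1] != "2" else "3")
print(address, base58check_is_valid(address), base58check_is_valid(typo))
```

A single mistyped character almost certainly breaks the 4-byte checksum, which is how wallets catch typos before coins are sent to a wrong address.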

Transaction signatures

Public keys are included with each transaction input so that nodes can verify that the creator of the transaction is allowed to spend the referenced coins. The process looks like this:

 

  • The previous tx output contains the address which can spend the coins:
    tx_output = [ output_address, amount ]
  • The new tx input references the old tx's output and attaches the public key (which must hash to output_address), together with a signature made with the matching private key; nodes check that signature against the attached public key:
    tx_input =
    [ signature(tx_output, private_key), tx_output, public_key ]
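To show "signed with the private key, checked with the public key" concretely, here is a toy pure-Python ECDSA over secp256k1, the curve Bitcoin uses. It is a teaching sketch only. It is not constant-time and must not be used for real keys. It also hashes the message with a single SHA-256, whereas Bitcoin signs a double-SHA-256 "sighash" of a specific transaction serialization. The message strings are hypothetical. It needs Python 3.8+ for `pow(x, -1, m)`.

```python
import hashlib
import secrets

# secp256k1 domain parameters: y^2 = x^3 + 7 over the prime field F_P
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def point_add(p1, p2):
    """Add two curve points in affine coordinates; None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)


def scalar_mult(k, point):
    """Double-and-add: compute k * point."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def message_hash(msg: bytes) -> int:
    # Simplification: Bitcoin signs a double-SHA256 sighash, not the raw message
    return int.from_bytes(hashlib.sha256(msg).digest(), "big")


def sign(private_key: int, msg: bytes):
    z = message_hash(msg)
    while True:
        k = secrets.randbelow(N - 1) + 1          # fresh secret nonce per signature
        r = scalar_mult(k, G)[0] % N
        s = pow(k, -1, N) * (z + r * private_key) % N
        if r and s:
            return r, s


def verify(public_key, msg: bytes, sig) -> bool:
    r, s = sig
    if not (1 <= r < N and 1 <= s < N):
        return False
    w = pow(s, -1, N)
    u1, u2 = message_hash(msg) * w % N, r * w % N
    point = point_add(scalar_mult(u1, G), scalar_mult(u2, public_key))
    return point is not None and point[0] % N == r


private_key = secrets.randbelow(N - 1) + 1
public_key = scalar_mult(private_key, G)       # this is what the input attaches
signature = sign(private_key, b"spend tx 1000, output #1")
print(verify(public_key, b"spend tx 1000, output #1", signature))
print(verify(public_key, b"spend tx 1000, output #2", signature))
```

The first check succeeds and the second fails: a signature is bound to the exact data it signed, so it cannot be moved to a different transaction.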

UTXO prevents double-spending

  • Since a transaction's output can be referenced at most once (as the input to a new transaction), this prevents spending the same coins twice.
  • It also prevents replay attacks (where an attacker tries to drain a person's wallet by re-sending their transaction into the network).
  • It also makes each transaction unique, so their hash is also unique (up to the guarantees of the hash algorithm).
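The rule "an output can be referenced at most once" amounts to keeping a set of unspent outputs and deleting each one as it is spent. The toy model below replays TX 1000 and TX 2341 from the earlier slide and then tries to spend TX 1000's output a second time. It is a sketch only: signatures are not checked, the names `UTXOSet` and `apply` are hypothetical, and outputs are numbered from 1 to match the slides (real Bitcoin numbers them from 0).

```python
class UTXOSet:
    """Toy set of unspent transaction outputs, keyed by (txid, output_number)."""

    def __init__(self):
        self.unspent = {}   # (txid, output_number) -> (amount, address)

    def add_outputs(self, txid, outputs):
        for number, (amount, address) in enumerate(outputs, start=1):
            self.unspent[(txid, number)] = (amount, address)

    def apply(self, txid, inputs, outputs):
        """Spend `inputs` (output references) and create `outputs`; signatures not checked."""
        if len(set(inputs)) != len(inputs):
            raise ValueError("the same output is referenced twice")
        for ref in inputs:
            if ref not in self.unspent:
                raise ValueError(f"double spend or unknown output: {ref}")
        if sum(a for a, _ in outputs) > sum(self.unspent[r][0] for r in inputs):
            raise ValueError("outputs exceed inputs")
        for ref in inputs:
            del self.unspent[ref]          # a spent output can never be referenced again
        self.add_outputs(txid, outputs)


utxo = UTXOSet()
utxo.add_outputs("tx1000", [(10, "1abc123")])
utxo.apply("tx2341", [("tx1000", 1)], [(5, "1dg23fc"), (5, "1abc123")])
try:
    utxo.apply("tx2342", [("tx1000", 1)], [(10, "1xyz789")])   # re-spending the same coins
except ValueError as err:
    print("rejected:", err)
```

The replayed transaction fails because `("tx1000", 1)` is no longer in the set. Nodes apply essentially this check, plus signature verification, to every transaction in every block.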

A complication: "Script"

There's a small programming language that actually performs transaction verification in Bitcoin; it's called "Script".

Bitcoin "Script"

Because of Script, Bitcoin can support multiple types of transactions, whatever the "Script" language can execute:

 

  • Pay-to-pubkey-hash: this is the case described above, sending coins in such a way that the holder of a specific private key (to which the address is related) can spend them.
  • Pay-to-script-hash: the general case, sending coins to (the hash of) a script (nowadays it would be called a "contract"), which then decides how they can be spent. For example, multi-signature payments.

 

Big limitation: "Script" is deliberately very simple,
not even Turing complete, so no "smart contracts" here.
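Script is a simple stack machine with no loops. The sketch below evaluates a tiny, simplified subset of it, using real opcode names. It runs the spender's unlocking script and then the coin's locking script on one shared stack. Assumptions: numbers are plain Python ints instead of Script's byte-encoded numbers, and signature opcodes such as OP_CHECKSIG are left out. The example locking script is a "hash puzzle": whoever reveals a preimage of the stored hash can spend.

```python
import hashlib


def run_script(script) -> bool:
    """Evaluate a tiny subset of Bitcoin Script; True if a true value ends on top."""
    stack = []
    try:
        for op in script:
            if isinstance(op, (bytes, int)):
                stack.append(op)                       # data push
            elif op == "OP_DUP":
                stack.append(stack[-1])
            elif op == "OP_ADD":
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif op == "OP_SHA256":
                stack.append(hashlib.sha256(stack.pop()).digest())
            elif op == "OP_EQUAL":
                b, a = stack.pop(), stack.pop()
                stack.append(1 if a == b else 0)
            elif op == "OP_VERIFY":
                if not stack.pop():
                    return False
            else:
                raise ValueError(f"unsupported opcode: {op}")
    except (IndexError, TypeError):
        return False                                   # stack underflow or type mismatch
    return bool(stack) and bool(stack[-1])


secret = b"correct horse"
locking = ["OP_SHA256", hashlib.sha256(secret).digest(), "OP_EQUAL"]
print(run_script([secret] + locking))          # unlocking script: push the preimage
print(run_script([b"wrong guess"] + locking))
print(run_script([2, 3, "OP_ADD", 5, "OP_EQUAL"]))
```

Because there are no loops or jumps backwards, every script terminates after at most one pass over its opcodes. That is the simplicity, and the limitation, described above.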

Transaction fees

As incentives for the miners to include transactions in their blocks, there are "fees".

 

Basically, the miner fee is the sum of all coins referenced as inputs but not spent as outputs in all transactions in the block.

Input 1:  0.5 BTC   (recorded in tx)
Input 2:  0.2 BTC   (recorded in tx)
Output 1: 0.6 BTC   (recorded in tx)
Fee:      0.1 BTC   (implicit: inputs minus outputs)
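The fee arithmetic from the example can be checked in code. This sketch works in integer satoshis (1 BTC = 100,000,000 satoshis) and parses amounts with `Decimal`, because binary floats cannot represent values like 0.1 exactly. The helper names are hypothetical.

```python
from decimal import Decimal

SATOSHIS_PER_BTC = 100_000_000


def btc_to_sat(amount: str) -> int:
    # Parse decimal strings exactly; floats would introduce rounding errors
    return int(Decimal(amount) * SATOSHIS_PER_BTC)


def tx_fee(input_amounts, output_amounts) -> int:
    """The fee is implicit: whatever the inputs provide that the outputs don't claim."""
    fee = sum(input_amounts) - sum(output_amounts)
    if fee < 0:
        raise ValueError("outputs exceed inputs: invalid transaction")
    return fee


fee = tx_fee([btc_to_sat("0.5"), btc_to_sat("0.2")], [btc_to_sat("0.6")])
print(fee, fee / SATOSHIS_PER_BTC)   # 10000000 0.1
```

The miner's total fee income for a block is simply the sum of `tx_fee` over all its transactions.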

Homework

Using a service like https://blockchain.info, inspect the structure of the following transactions, including their scripts:

 

4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b

a1075db55d416d3ca199f55b6084e2115b9345e16c5cf302fc80e9d5fbf5d48d

3a1b9e330d32fef1ee42f8e86420d2be978bbe0dc5862f17da9027cf9e11f8c4

1c12443203a48f42cdf7b1acee5b4b1c1fedc144cb909a3bf5edbffafb0cd204

930a2114cdaa86e1fac46d15c74e81c09eee1d4150ff9d48e76cb0697d8e1d72


From transactions to blocks

  1. Transactions are hashed; those hashes are their IDs.
  2. The block header contains the root of a Merkle tree built from all the transaction hashes (the "Merkle root").
  3. The block header is hashed to get the block hash; this is the block ID.

 

So "blocks" contain a list of transactions (which the miner has picked to include in their block), the block header contains the Merkle root of those transactions' hashes, and the block header is "mined" so that its hash looks like this:

 

000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
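A Merkle root is built by hashing transaction hashes pairwise, level by level, until one hash remains. This sketch follows Bitcoin's convention of double SHA-256 and duplicating the last hash when a level has an odd count. It assumes the inputs are txids in internal byte order, i.e. not the byte-reversed form shown by block explorers.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(txids: list) -> bytes:
    """Compute a Bitcoin-style Merkle root from transaction hashes (internal byte order)."""
    if not txids:
        raise ValueError("a block always has at least one (coinbase) transaction")
    level = list(txids)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])              # odd count: duplicate the last hash
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


txids = [dsha256(name) for name in (b"tx-a", b"tx-b", b"tx-c")]   # stand-in transactions
print(merkle_root(txids).hex())
```

With a single transaction the root is just that transaction's hash. This is why the genesis block's Merkle root equals the id of its only (coinbase) transaction, 4a5e1e4b... in the homework list. Changing any transaction changes the root, and therefore the block hash.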

 

Proof-of-work mining

Among other data, the header contains a field called "nonce". It is a 32-bit integer with no purpose other than being varied until the hash of the header meets the requirements for mining difficulty.

Example mining difficulty: "the hash must start with 43 zero bits"

[ ...data... | nonce | ...data... ]  --hash f(x)-->  000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
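The example hash is the hash of Bitcoin's genesis block (block 0), and it can be reproduced from the block's six header fields. The sketch packs the 80-byte header (version, previous block hash, Merkle root, timestamp, difficulty "bits", nonce; integers little-endian), double-SHA-256 hashes it, and counts the leading zero bits. One caveat: real Bitcoin compares the hash against a numeric target encoded in "bits", and "N leading zero bits" is the simplified view used on this slide.

```python
import hashlib
import struct


def dsha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def leading_zero_bits(digest: bytes) -> int:
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()


# Header fields of Bitcoin's genesis block
version = 1
prev_block_hash = bytes(32)                 # all zeros: there is no previous block
merkle_root = bytes.fromhex(
    "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b")[::-1]
timestamp = 1231006505                      # 2009-01-03 18:15:05 UTC
bits = 0x1d00ffff                           # compact form of the difficulty target
nonce = 2083236893                          # the value the miner searched for

header = (struct.pack("<I", version) + prev_block_hash + merkle_root
          + struct.pack("<III", timestamp, bits, nonce))
block_hash = dsha256(header)[::-1]          # hashes are displayed byte-reversed
print(len(header), block_hash.hex(), leading_zero_bits(block_hash))
```

This should print an 80-byte header, the hash shown above, and 43 leading zero bits. Try changing `nonce` by one: the hash becomes unrelated and almost certainly no longer starts with many zeros.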

Proof-of-work mining

Homework

There is a data structure defined as:

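#include <stdint.h>
#include <string.h>

struct blockheader {
    char     payload[16];
    uint32_t nonce;
    uint32_t timestamp;
};  /* 16 + 4 + 4 = 24 bytes, no padding on common ABIs */

There is an instance of this data structure initialised like this:

struct blockheader bh;
memcpy(bh.payload, "Hello, World!xx", 16);  /* 15 characters + the 0-byte terminator */
bh.timestamp = 1519640946;

(i.e. the payload is an ASCII string with a 0-byte terminator)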

Proof-of-work mining

Homework

Your task is to find a value for nonce so that the SHA256 hash of the instance of the data structure as described on a previous slide starts with 23 bits which are all 0.

 

Notes:

  • Many nonces satisfy the condition; the solution is not unique.
  • One such solution is 000001710c33ae5db89ccc3a6020e19303aa55366fbc91cd63b4d579deaffa7c for the nonce 43040045
    (assuming little-endian x86 here)
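One way to set up the homework is sketched below. It assumes the same 24-byte little-endian layout as the x86 note: 16 payload bytes, then `nonce`, then `timestamp`, with no padding. The hash is a single SHA-256, as the task says. The demo uses an easy 12-bit target so that it finishes quickly. For the real 23-bit target, expect millions of attempts. You can also pass the nonce given in the notes to `pack_header` to check that solution yourself. The function names are hypothetical.

```python
import hashlib
import struct

PAYLOAD = b"Hello, World!xx\x00"    # 15 ASCII characters + 0-byte terminator = 16 bytes
TIMESTAMP = 1519640946


def pack_header(payload: bytes, nonce: int, timestamp: int) -> bytes:
    # "<" = little-endian, no alignment padding: 16s + uint32 + uint32 = 24 bytes
    return struct.pack("<16sII", payload, nonce, timestamp)


def leading_zero_bits(digest: bytes) -> int:
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()


def mine(payload: bytes, timestamp: int, difficulty_bits: int):
    """Try nonces in order until SHA256(header) starts with difficulty_bits zero bits."""
    for nonce in range(2**32):
        digest = hashlib.sha256(pack_header(payload, nonce, timestamp)).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return nonce, digest.hex()
    return None


nonce, digest_hex = mine(PAYLOAD, TIMESTAMP, difficulty_bits=12)
print(nonce, digest_hex)
```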

Orphan blocks

All the miners in the world are competing to calculate the "current" block in the blockchain. Occasionally, multiple miners find a valid block at the same position in the chain at about the same time.

  • These blocks might have different transactions, nonces, etc., but they continue the chain at the same place.
  • Other miners might pick up either one and calculate another block on top of it, etc.

The consensus rule is: the longest chain wins (strictly, the chain with the most cumulative proof of work). Blocks left off the winning chain become "orphan" blocks.
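The fork-choice rule can be sketched on a small block tree. Assumptions: blocks are identified by made-up names, each block records only its parent, and chains are compared by length, as on this slide. Real nodes compare cumulative work and, on a tie, keep the chain they saw first. Here ties are broken by sorted tip name, just to keep the sketch deterministic.

```python
def chain_to_genesis(tip, parents):
    """Walk parent links from a tip back to the genesis block."""
    chain = []
    while tip is not None:
        chain.append(tip)
        tip = parents[tip]
    return chain[::-1]


def pick_main_chain(parents):
    """parents maps block id -> parent id (None for genesis). Longest chain wins."""
    has_children = {p for p in parents.values() if p is not None}
    tips = sorted(set(parents) - has_children)
    main = max((chain_to_genesis(t, parents) for t in tips), key=len)
    orphans = set(parents) - set(main)
    return main, orphans


# Two miners both extended block A; a third miner then built C on top of B2
parents = {"G": None, "A": "G", "B1": "A", "B2": "A", "C": "B2"}
main, orphans = pick_main_chain(parents)
print(main, orphans)
```

Block B1 ends up orphaned. Any of its transactions that are not also in B2 or C go back to waiting for a later block.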

The role of Proof-of-work

Proof of Work in Bitcoin serves 2 main purposes:

  1. "Anti-spam": some hard work needs to be done to mine coins, so it's not possible to just spam the network with random blocks; they are difficult to calculate.
  2. Enforcing blockchain immutability (because of #1): because it takes a lot of effort to mine blocks, it's computationally too hard for someone to start forging blocks or to re-calculate a part of the blockchain to suit their purpose.

The role of Proof-of-work

Many things in Bitcoin are probabilistic - "will probably happen":

 

  • Transactions will probably be picked up by miners and included in blocks
  • There will probably be a nonce which generates a block hash with the given difficulty requirements
  • There will probably not be so many orphan blocks that your transaction gets "lost"
  • There will probably not be hash collisions for blocks and transactions

 

Though for the cryptographic parts, the probability of failure is usually on the order of 1/(2^256) (e.g. two specific 256-bit hashes colliding by chance).
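The "there will probably be a nonce" point can be quantified. Assume that hash outputs behave like uniformly random bits. Then each attempt succeeds independently with probability $2^{-k}$ for a k-zero-bit target, and the number of attempts until the first success is geometric. Applied to the 23-bit homework and the 43-bit example hash:

```latex
\begin{aligned}
P(\text{a hash starts with } k \text{ zero bits}) &= 2^{-k} \\
\mathbb{E}[\text{attempts until success}] &= \frac{1}{2^{-k}} = 2^{k} \\
k = 23:\quad 2^{23} &= 8\,388\,608 \\
k = 43:\quad 2^{43} &\approx 8.8 \times 10^{12}
\end{aligned}
```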

In the filesystem

Bitcoin's code dates from 2008... 

  • A data directory full of binary files
  • Most of the files are application-specific (basically memory dumps); only wallet.dat is in a stable format, BerkeleyDB 4.8
  • Blocks are dumped in the blocks/ folder
    (multiple blocks per file)
  • Full transaction index (optional) is in blocks/index/
  • Metadata and a list of unspent transactions is in chainstate/
  • BerkeleyDB-specific logs are in database/
~/.bitcoin
├── blocks
│   └── index
├── chainstate
└── database
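Reading the blocks/ folder directly can be sketched as follows. This assumes the raw, unobfuscated blk*.dat layout: each record is a 4-byte network magic (f9beb4d9 on mainnet), a 4-byte little-endian length, and the serialized block, whose first 80 bytes are the header. Blocks inside a file are not necessarily in height order. The path in the commented usage is the default data directory shown above.

```python
import hashlib
import struct

MAINNET_MAGIC = bytes.fromhex("f9beb4d9")   # network magic that precedes each block record


def iter_blocks(data: bytes):
    """Yield (block_hash_hex, raw_block) for each record in a blk*.dat buffer."""
    offset = 0
    while offset + 8 <= len(data):
        if data[offset:offset + 4] != MAINNET_MAGIC:
            break                               # e.g. zero padding at the end of the file
        (size,) = struct.unpack_from("<I", data, offset + 4)
        block = data[offset + 8:offset + 8 + size]
        header = block[:80]                     # fixed-size 80-byte block header
        block_hash = hashlib.sha256(hashlib.sha256(header).digest()).digest()[::-1]
        yield block_hash.hex(), block
        offset += 8 + size


# Usage on a real node's data directory:
# import os
# with open(os.path.expanduser("~/.bitcoin/blocks/blk00000.dat"), "rb") as f:
#     for block_hash, raw in iter_blocks(f.read()):
#         print(block_hash, len(raw))
```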

Why chainstate?

As an optimisation, the blockchain node only needs to keep an index of unspent transaction outputs (UTXOs) to verify that new blocks are acceptable.

 

For each block, the outputs it spends are removed from the set of unspent outputs, and the outputs it creates are added.

 

Normally (without starting bitcoind with -txindex), there is no way to look up arbitrary transactions in the blockchain using the built-in functionality.

(you can read it externally e.g. with
https://github.com/toidi/pyblockchain)

The size of the blockchain

Today (February 2018), Bitcoin's blockchain is around 145 GB in size.

With all the auxiliary data, including the txindex, it's 183 GB.

 

When imported to a SQL database, with added indexes to quickly access all transactions and addresses, it climbs to 700 GB.

 

Blockchain data itself does not compress very well, because addresses, public keys and signatures are random-like.

 

A 128 MB file from the blocks/ folder
compresses to around 101 MB with xz.
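Why random-like data doesn't compress can be shown with Python's `lzma` module, whose default output is the .xz format written by the xz tool. In this sketch, random bytes stand in for keys, signatures and hashes, and a repeated line stands in for highly redundant data. Real blk files sit in between, since they also contain fixed-format fields and recurring script patterns; that is presumably where the slide's roughly 21% saving comes from.

```python
import lzma
import os

samples = {
    "random-like": os.urandom(1_000_000),                       # like keys, signatures, hashes
    "repetitive": b"spendable by address 1abc123\n" * 35_000,   # highly redundant text
}
ratios = {}
for name, data in samples.items():
    compressed = lzma.compress(data)    # default container is .xz
    ratios[name] = len(compressed) / len(data)
    print(f"{name}: {len(data)} -> {len(compressed)} bytes ({ratios[name]:.0%})")
```

The random-like sample comes out no smaller (slightly larger, because of container overhead), while the repetitive one shrinks to a tiny fraction of its size.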

The great scaling debate

Transactions are expensive (in terms of miner fees) and take a long time (block time: 10 minutes + waiting for confirmations).

  • So a tx from a customer may take 30 minutes or more to be considered "valid" and safely in the blockchain.

 

The blockchain's size grows.

Increasing either the block size or the number of blocks created per unit of time increases the blockchain size
and complicates mining.
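The tx/s limit and the growth rate follow from simple arithmetic. The sketch takes the original 1 MB block size limit and the 10-minute block time from Bitcoin. The 250-byte average transaction size is an assumption for illustration; real averages vary.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
BLOCK_INTERVAL_S = 600          # 10-minute target block time
BLOCK_SIZE_BYTES = 1_000_000    # Bitcoin's original 1 MB block size limit
AVG_TX_BYTES = 250              # assumption: a typical simple transaction


def tx_per_second(block_bytes, tx_bytes, interval_s):
    return block_bytes / tx_bytes / interval_s


def growth_gb_per_year(block_bytes, interval_s):
    return block_bytes * (SECONDS_PER_YEAR / interval_s) / 1e9


for size in (BLOCK_SIZE_BYTES, 2 * BLOCK_SIZE_BYTES):
    print(f"{size / 1e6:.0f} MB blocks: "
          f"{tx_per_second(size, AVG_TX_BYTES, BLOCK_INTERVAL_S):.1f} tx/s, "
          f"{growth_gb_per_year(size, BLOCK_INTERVAL_S):.1f} GB/year")
```

This gives about 6.7 tx/s and up to 52.6 GB/year for 1 MB blocks, and 13.3 tx/s with 105.1 GB/year for 2 MB blocks. Throughput and blockchain growth scale together, which is the core of the trade-off.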

 

Some alternatives:
Lightning, Braided PoW, Tangle, (D)PoS, Sharding

Forking is cheap

Some of the reasons for forking:

  • Get-rich-quick schemes (probably 95%+)
  • Implementing a feature differently (e.g. Litecoin)
  • Implementing a new feature / extending it (e.g. BCH)
  • Implementing a new way of using it (e.g. Zcash)
  • (not a fork: Ethereum, a completely new paradigm)

 

Bitcoin and other cryptocurrencies are open source because they are security-sensitive:
users need to trust the implementation.

 

This makes them easy to fork.

THE END

ivan.voras@toptal.com

Blockchain lecture #2: The Bitcoin blockchain

February 2018

Q&A?

Blockchain lectures #2: The Bitcoin blockchain

By Ivan Voras

The structure of the Bitcoin blockchain. The structure of blocks, transactions, file structures. The Script language for transactions. How are transactions validated and accepted. How PoW mining works.
