Ivan Voras
Ivan Voras is a freelancer and entrepreneur, proud of the breadth of the projects under his belt, which span from Bitcoin and blockchain technologies to hardware solutions for Internet of Things devices.
Toptal academy blockchain lectures #1
2018-02-21
ivan.voras@toptal.com
I'll be here all day...
Wrote a "for dummies" book about it:
http://scepticsguide.ivoras.net/
(It's not for the technically-minded)
It's commonly implemented like this (as a singly-linked list):
[Diagram: Node 1 (head) -> Node 2 -> ... -> Node N (tail) -> NULL; each arrow is the node's "next node pointer".]
Nodes can be added to either end. We're interested in the case of adding nodes to the front:
[Diagram: New node (new head) -> Node 1 (former head, now node 2) -> Node 2 (now node 3) -> ... -> Node N (tail) -> NULL; each arrow is a "next node pointer".]
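Adding a node to the front can be sketched in a few lines of Python. This is only a minimal illustration; the names `Node`, `push_front` and `to_list` are made up for this example and do not come from the lecture.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: str
    next: Optional["Node"] = None  # pointer to the next node; None marks the tail


def push_front(head: Optional[Node], value: str) -> Node:
    """Create a node that points at the old head and return it as the new head."""
    return Node(value, next=head)


def to_list(head: Optional[Node]) -> list:
    """Walk the list from head to tail and collect the values."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values


# Build "node 1 -> node 2 -> node N", then add a new head in front of it.
head = None
for name in ["node N", "node 2", "node 1"]:
    head = push_front(head, name)
head = push_front(head, "new node")

print(to_list(head))
```

Note that adding to the front never touches the existing nodes: the new node simply points at the old head.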
Step 1: Replace pointers with cryptographic signatures.
Step 2: Done.
[Diagram: New node (new block) -> Node 3 -> Node 2 -> Node 1 (genesis block, previous: NULL); each arrow is a "previous block signature" field. The new block becomes the "first" (head) node.]
Every new node includes a cryptographic signature of the previous node (usually just a hash).
A blockchain is a data structure in which data is grouped into blocks, and each block (among other things) contains a cryptographic signature (usually a hash) of the previous block in the chain.
This structure guarantees that each new block effectively contains a signature of all previous blocks, making the data stored in the blockchain immutable (i.e. any changes become obvious because the signatures don't match).
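A minimal sketch of that idea, assuming blocks are plain dictionaries hashed over their JSON encoding with SHA256. The block layout and the function names are hypothetical and much simpler than any real cryptocurrency format.

```python
import hashlib
import json
from typing import Optional


def block_hash(block: dict) -> str:
    """Hash a block's canonical JSON encoding with SHA256."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain: list, data: str) -> None:
    """Add a block that stores the hash of the current last block."""
    prev = block_hash(chain[-1]) if chain else None  # genesis block has no predecessor
    chain.append({"prev_hash": prev, "data": data})


def first_broken_link(chain: list) -> Optional[int]:
    """Return the index of the first block whose prev_hash does not match, else None."""
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != block_hash(chain[i - 1]):
            return i
    return None


chain: list = []
for payload in ["genesis", "alice pays bob 5", "bob pays carol 2"]:
    append_block(chain, payload)

print(first_broken_link(chain))          # None: every link matches
chain[1]["data"] = "alice pays bob 500"  # tamper with an old block
print(first_broken_link(chain))          # 2: block 2 no longer matches block 1
```

To hide the change, an attacker would have to recompute the stored hash in every later block, which is what makes tampering obvious.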
Because of this, it's suitable for sensitive data, e.g. financial data, and that is why it was used in Bitcoin and other cryptocurrencies.
Having blocks which, among other useful data, contain digital signatures of the preceding block, is sufficient to make a blockchain data structure. On top of that, other, database-like functionalities are usually added.
In practice, think of a blockchain as a type of database.
"relational database", "graph database", "nosql document database" and "blockchain" are all types of databases.
Bitcoin contains a specific implementation of a blockchain database, just as PostgreSQL contains a specific implementation of a relational database.
Hash functions
They accept inputs of arbitrary size (buffers of bytes) and produce a fixed-size output. Because of that, they are sometimes called "compression functions", but that name is misleading because they are unidirectional and irreversible (there's no "uncompress").
Cryptographic hashes
In addition to these, have the following properties:
Common hash functions are SHA256 and SHA3-256.
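The fixed-size output is easy to check with Python's standard `hashlib` module. This is just a quick illustration; the choice of inputs is arbitrary.

```python
import hashlib

inputs = [b"", b"abc", b"x" * 1_000_000]

for data in inputs:
    sha2 = hashlib.sha256(data).digest()
    sha3 = hashlib.sha3_256(data).digest()
    # Both functions always return 32 bytes, whatever the input size.
    print(f"{len(data):>7} bytes in -> {len(sha2)} bytes (SHA256), {len(sha3)} bytes (SHA3-256)")

# Well-known test vector: SHA256 of the three bytes "abc".
print(hashlib.sha256(b"abc").hexdigest())
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```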
Is CRC32 a good cryptographic hash?
NO.
#1: 32 bits is way too short: even without any advanced techniques or theory, it's enough to make at most 2^32 (4 Gi) changes to a custom document you create in order for it to have the same CRC32() as the original one. This can be done quickly on modern CPUs. If you apply a little cryptanalysis (the theory is on Wikipedia), you can find CRC32() collisions much faster.
#2: Its structure was deliberately designed for non-security purposes (error detection), and it does not have good "avalanching".
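One way to see that structure: CRC32 is linear over XOR (strictly speaking affine), so for three messages of equal length the CRC of their XOR equals the XOR of their CRCs. The sketch below checks this with the standard `zlib.crc32` and compares it with SHA256, which has no such relation. The messages and the helper `xor_bytes` are made up for this example.

```python
import hashlib
import zlib


def xor_bytes(*buffers: bytes) -> bytes:
    """XOR equal-length byte strings together."""
    out = bytearray(len(buffers[0]))
    for buf in buffers:
        for i, byte in enumerate(buf):
            out[i] ^= byte
    return bytes(out)


# Three arbitrary messages of the same length (13 bytes each).
a, b, c = b"pay alice 100", b"pay bobby 200", b"pay carol 300"
combined = xor_bytes(a, b, c)

crc_of_xor = zlib.crc32(combined)
xor_of_crcs = zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
print(crc_of_xor == xor_of_crcs)  # True: the checksum is predictable from its parts

sha_of_xor = hashlib.sha256(combined).digest()
xor_of_shas = xor_bytes(*(hashlib.sha256(m).digest() for m in (a, b, c)))
print(sha_of_xor == xor_of_shas)  # False: no such shortcut for a cryptographic hash
```

A relation like this lets an attacker compute how a chosen change to the input changes the checksum, which is exactly what a cryptographic hash must prevent.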
Homework:
Find a document (a buffer of ASCII bytes) which has the CRC32 hash (the gzip variant of CRC32) of 0xbbd1264f.
Symmetric ("ordinary") cryptography
This is the kind where the same password is used for encryption and decryption.
Important: passwords stored in a database should not be encrypted, because encryption is reversible; they should be hashed.
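A minimal sketch of hashing a password for storage, using only the Python standard library. It goes slightly beyond the slide by using a salted, deliberately slow hash (PBKDF2 with HMAC-SHA256), which is the usual practice; the iteration count and the function names are assumptions made for this example.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # assumed work factor; tune it for your hardware


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); store both, never the password itself."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))                   # False
```

There is no function here that turns `stored` back into the password, which is the point of hashing instead of encrypting.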
There are block encryption algorithms and stream encryption algorithms. The former (e.g. AES) only encrypt fixed-sized blocks of data (e.g. 16 bytes), and the latter encrypt arbitrary data sizes. Commonly, block encryption algorithms can be adapted with special additional algorithms to create stream encryption algorithms (e.g. AES-CTR).
AES256: this algorithm encrypts blocks of 16 bytes (128 bits), and uses 32-byte keys (256 bits, hence the name) to do it. Keys here are often the result of hashing a user-entered password.
In practice, the basic algorithm always needs to be augmented with additional processing to avoid specific forms of attacks. Block ciphers at the very least need to be chained so the output of a previous block is XOR-ed with the input of the next, which is called the CBC "mode". Hence, AES256-CBC. Other common modes are CTR and GCM which convert AES to a stream cipher.
See "Cryptography Engineering" (Ferguson, Schneier, Kohno), or the "Crypto 101" online intro.
Never ever invent your own cipher or mode!
Because people have spent their entire lives, or at least their academic careers, finding the right way to encrypt data and combine encryption methods so that they are safe, and still failed.
The probability that you will think something up over the weekend and create a working, secure cipher is ... low, very, very low.
Literally - this is mathematics more than CS. You need a career devoted to cryptography just to be aware of all the methods that have previously been tried and failed.
Asymmetric (public key) cryptography
Commonly, there are 2 keys which are related. What is encrypted with one of them can be decrypted only with the other. These algorithms are generally much slower than symmetric algorithms, so they are used in combination with them to increase the overall performance. If A wants to send a message to B:
Asymmetric (public key) signatures
Signatures can be implemented as encryption of the hash of the message. If A wishes to sign a message (i.e. vouch that the message is authentic so that B can verify its authenticity):
This proves that A had the same message as B and that it intentionally signed it (i.e. encrypted its hash).
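The "signature = encryption of the hash" idea can be sketched with textbook RSA on a deliberately tiny key. This is an illustration only: the key is built from two small, publicly known primes, there is no padding, and it is nowhere near secure. Real signatures should come from a vetted tool or library (such as the openssl utility used in the homework). All names below are made up for this sketch.

```python
import hashlib

# Toy key pair from two well-known Mersenne primes (NOT secure, illustration only).
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def message_hash(message: bytes) -> int:
    """SHA256 of the message, reduced to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """A 'encrypts' the hash of the message with its private key."""
    return pow(message_hash(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """B 'decrypts' the signature with A's public key and compares hashes."""
    return pow(signature, E, N) == message_hash(message)


signature = sign(b"I owe Bob 5 coins")
print(verify(b"I owe Bob 5 coins", signature))    # True
print(verify(b"I owe Bob 500 coins", signature))  # False: hash no longer matches
```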
Homework
I've used the "openssl" command line utility (available on Linux, Windows and OSX) in this way:
openssl genrsa -out keypair.pem 2048
openssl rsa -in keypair.pem -out public.pem -outform PEM -pubout
My public key from public.pem is:
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA0UyKNoJYVqwW3Cte31Ec
HAc8fAUmeR0UfbuCCkpyOSbADirXuVNiVYpQgPkphlml3KrgprAdA/5X1xUujF+l
mxp+Bm3UsskpB+N55Nulep0uW0BdKx1DiCuAj6qVDc+Tqp1i7NvqDbIiUAL2VlUq
9sunUdiCbaztuayam8gqmqnO73dlboMiB5DK/OitmNJcGW9I8LkZvZLiS9znQgJz
lrbCeCK0ym3KbVGo3ZG8ei8Zbf8EiGxdb84V0QDvTm15l3gqiOcmH++HqNkbUeXZ
b+fzCFFZoeHD67od5TXh1YlfeiW3WLP7ZT3kROD4hVgIbjYQmEtLc50WXXRIYTAg
5QIDAQAB
-----END PUBLIC KEY-----
Homework
With my private key, I have signed the file found on this link:
https://i.imgur.com/O7yLqqq.jpg
by using this command:
openssl dgst -sha256 -sign keypair.pem -out file.signature O7yLqqq.jpg
I've used the base64 command-line utility to generate this representation of the "file.signature" file:
Btdgy9GhfmX0fphIC77as2s5OU5xLDLnjrpPP3uujg55Wf7vIJ8OW47Kcw0VYCVh/kFZtwLfhsgv
xQbZqPzyr2PEAqA8Y5e7Pp1NtX4w7qgBgV3VGEl6oWKXHwU/z0cMhZ9U6m5IzaENMLUaLjjHvBcT
yYHxCXRMytyh9s5LmlRisjAH9xuJIVqz623dALlwTabypdL8PnwEiwRwH+3KCbKH1LvWu0i696kY
YOR0kTib2mOOI/R5jiQpYuo8Qnm8TwBk04wplSgcZ/OHr7arTeZ9yZTRCKnl8Gq7qc1lPj8BtCs7
5x+gruR5G5LuCPUABhTSVA1KPb50aV8xyM1IkQ==
Your task is to write down the openssl command line which will verify my signature of that file.
Pros:
Cons:
In many ways, saying that a product is using "blockchain technology" is similar to saying a project uses "linked list technology" ... what matters is not the technology but what you do with it.
Everyone can participate.
So data needs to be accessible by everyone, including for mining.
Leads to the design of Proof of Work mining.
Security concerns lead to lower performance: the blockchain is global, so its operations need to be globally synchronised.
Only a known set of nodes can participate.
Those nodes can identify themselves directly.
No need for Proof of Work.
Leads to Proof of Authority design.
Can be more performant as the nodes are known and there's a known number (or upper bound) of them.
Bitcoin contains the first popular implementation of the blockchain; it brought the concept into the spotlight.
Pioneered the concept of combining blockchains with "proof of work" algorithms in order to force the users to "work hard" to "sign" the blocks: mining was born.
Dirty "implementation is the specification" principle - the "Bitcoin system" is whatever the "Bitcoin code" implements.
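The "work hard to sign a block" idea behind proof of work can be sketched as a search for a nonce that makes the block's hash fall below a target. This is a simplified, hypothetical version: Bitcoin itself hashes a structured block header with double SHA256 and encodes the target differently. The function names and the 12-bit difficulty are assumptions for the demo.

```python
import hashlib


def pow_hash(block_data: bytes, nonce: int) -> int:
    """Hash the block data together with the nonce and read it as a big integer."""
    digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def mine(block_data: bytes, difficulty_bits: int) -> int:
    """Try nonces one by one until the hash has `difficulty_bits` leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while pow_hash(block_data, nonce) >= target:
        nonce += 1
    return nonce


def check(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Verifying takes a single hash, however long the search took."""
    return pow_hash(block_data, nonce) < (1 << (256 - difficulty_bits))


DIFFICULTY_BITS = 12  # about 4096 attempts on average; real networks use far more
nonce = mine(b"block payload", DIFFICULTY_BITS)
print(nonce, check(b"block payload", nonce, DIFFICULTY_BITS))
```

The asymmetry is the point: finding the nonce takes many hash evaluations, while checking it takes one.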
Ethereum contains arguably the second (by popularity) blockchain; it extended the original idea.
Has a more-or-less well-defined specification outside the base implementation - there are Ethereum nodes in Go, C++, Python...
A more elegant implementation, extending all features in some way: faster block mining, adding smart contracts, supporting "uncle" blocks, simpler transaction structure...
Used to construct hashes for large objects from their parts.
Motivation: sometimes the parts are unknown or too big.
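The description above appears to refer to a hash tree (Merkle tree); that reading is an assumption, since the heading is missing here. A minimal sketch: hash every part, then hash pairs of hashes until one root remains. Duplicating the last hash on odd-sized levels is one common convention (Bitcoin does this), chosen here only for the example.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(parts: list) -> bytes:
    """Combine the hashes of all parts pairwise until a single root hash is left."""
    if not parts:
        raise ValueError("need at least one part")
    level = [sha256(part) for part in parts]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last hash with itself
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


parts = [b"part A", b"part B", b"part C", b"part D"]
root = merkle_root(parts)

# Someone who holds only parts A and B, plus the 32-byte hash of the (C, D)
# subtree, can still recompute the root without ever seeing C or D.
left = sha256(sha256(b"part A") + sha256(b"part B"))
right = sha256(sha256(b"part C") + sha256(b"part D"))
print(sha256(left + right) == root)  # True
```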
As implemented in common cryptocurrency blockchains it literally means "everyone who runs the same executable follows the same rules."
Examples of consensus rules are:
For common cryptocurrencies, these are often a series of if-then checks which examine the contents of a block, take previous blocks into account, and decide whether the new one is valid.
For example: "calculate a new difficulty based on past N blocks; if the difficulty from the newly received block is lower, discard it."
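Such a row of checks might look like the sketch below. The `Block` fields, the window size and the averaging rule are all hypothetical; real difficulty adjustment rules are more involved.

```python
from dataclasses import dataclass
from typing import List

WINDOW = 3  # hypothetical "N": how many past blocks feed the difficulty rule


@dataclass
class Block:
    height: int
    difficulty: int


def expected_difficulty(chain: List[Block]) -> int:
    """Hypothetical rule: the average difficulty of the past N blocks."""
    recent = chain[-WINDOW:]
    return sum(block.difficulty for block in recent) // len(recent)


def accept(chain: List[Block], new_block: Block) -> bool:
    """Apply the consensus rules one by one; any failed check discards the block."""
    if new_block.height != chain[-1].height + 1:
        return False  # must extend the current tip
    if new_block.difficulty < expected_difficulty(chain):
        return False  # difficulty lower than the rules demand
    return True


chain = [Block(0, 10), Block(1, 12), Block(2, 14)]
print(accept(chain, Block(3, 12)))  # True: 12 >= (10 + 12 + 14) // 3
print(accept(chain, Block(3, 11)))  # False: difficulty too low
print(accept(chain, Block(5, 20)))  # False: does not extend the tip
```

Every node running the same checks on the same chain reaches the same verdict, which is all "consensus rules" means here.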
The concepts tie in with the idea of distributed consensus.
Soft fork
Introduces limitations on what is valid. New versions of software simply stop producing some forms of transactions / blocks, etc. which were previously valid.
Old software continues to accept data created by new versions - the data will simply lack certain features.
Hard fork
Introduces new features which old versions of software do not support or recognise as valid. Backwards-incompatible.
Old versions of the software will discard data created by new versions, often because it has features they don't know how to handle.
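The difference between the two kinds of fork can be sketched with a single made-up rule, a maximum block size. The limits below are hypothetical and serve only to show which side accepts what.

```python
OLD_MAX_SIZE = 100        # hypothetical rule in the old software
SOFT_FORK_MAX_SIZE = 50   # soft fork: stricter than before
HARD_FORK_MAX_SIZE = 200  # hard fork: allows what used to be invalid


def old_software_accepts(block_size: int) -> bool:
    return block_size <= OLD_MAX_SIZE


def soft_fork_software_accepts(block_size: int) -> bool:
    return block_size <= SOFT_FORK_MAX_SIZE


def hard_fork_software_accepts(block_size: int) -> bool:
    return block_size <= HARD_FORK_MAX_SIZE


soft_fork_block = 40   # produced by soft-fork software
hard_fork_block = 150  # produced by hard-fork software

print(old_software_accepts(soft_fork_block))  # True: old nodes still accept it
print(old_software_accepts(hard_fork_block))  # False: old nodes discard it
```

Everything valid under the soft-fork rule is also valid under the old rule, so old nodes keep following the chain; the hard-fork rule breaks that property.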
Many of these are also tied in with the distributed consensus.
... but it's good for the specific niches where its good sides outweigh the bad ones. Specifically:
Outside of those, simpler solutions are likely to be better.
Algorithms and data structures used for communication are standardised because every party needs to understand them.
Note in particular that, e.g., SHA256 is defined as accepting binary input (bytes) and producing binary output (32 bytes).
It is NOT defined as producing 64 hexadecimal characters as output.
When dealing with binary data structures, things such as endianness might be a part of the specification.
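A short illustration of why this matters, using only the Python standard library: hashing the 32 raw bytes and hashing their 64-character hex text give different results, and the same integer has different byte layouts depending on endianness. The input `b"abc"` and the variable names are arbitrary choices for the example.

```python
import hashlib

digest = hashlib.sha256(b"abc").digest()  # the actual SHA256 output: 32 bytes
hex_text = digest.hex()                   # a 64-character presentation of it

print(len(digest), len(hex_text))  # 32 64

# Feeding the next hash the hex text instead of the bytes changes the result,
# so two implementations that disagree on this will never interoperate.
hash_of_bytes = hashlib.sha256(digest).hexdigest()
hash_of_hex = hashlib.sha256(hex_text.encode("ascii")).hexdigest()
print(hash_of_bytes == hash_of_hex)  # False

# Endianness: the same number, two different byte sequences.
value = 1
big = value.to_bytes(4, "big")        # b'\x00\x00\x00\x01'
little = value.to_bytes(4, "little")  # b'\x01\x00\x00\x00'
print(hashlib.sha256(big).digest() == hashlib.sha256(little).digest())  # False
```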
Blockchain lecture #1: Introduction
February 2018
By Ivan Voras
Introduction: what the blockchain is, what its optimal use cases are, and how it is implemented today in major cryptocurrencies: Bitcoin, Ethereum. What the key components of a blockchain are and how they are implemented. Differences and similarities between blockchain implementations. "Why" in addition to "how".