Decentralizing the Cloud

Who Am I?

+ David Vorick

+ Bitcoin Enthusiast since 2011

+ Bitcoin Researcher since 2013

+ Blockchain Technical Expert

+ Part of the Sia team since 2014

What is Sia?

+ Decentralized Data Store

+ Low latency, high throughput uploads and downloads

+ Functioning Prototype Since June 2015

+ 3 full-time devs, active community

Modern Cloud Storage

+ Data owned by 1 company

+ Often unencrypted, unauthenticated

+ Often in 1 legal jurisdiction

+ Profit motives may not align with the consumer

+ Gated ecosystems can inflate prices, hurt compatibility

Goals of Decentralization

+ Give control to the owner - the owner should know that data is safe, available, and private

+ Eliminate trust - the owner of the data should not need to rely on anyone to guarantee the security of the data

+ Spread out power within the ecosystem - when a single party controls too much, they can dictate the industry on their terms, often to the disadvantage of everyone else

Blockchains help us achieve decentralization

The Blockchain

+ A database

+ A database with specific rules on updates

+ A database with a specific ordering for updates

+ A database where specific ordering is enforced without a central party

Example Blockchain Rules

+ Some people have money

+ Money can be transferred from one person to another

+ Money cannot be duplicated - once money has been transferred from one person to another, the original owner no longer has the money

Multiple Histories Problem

+ 'Alice' has $10

+ 'Alice' sends $10 to 'Bob'

+ 'Alice' sends $10 to 'Charlie'

+ Who has the money now, Alice, Bob, or Charlie? For the system to work, everybody must have the same answer to this question.
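
A minimal sketch of why ordering matters. The `apply_history` helper and the dictionary ledger are illustrative assumptions, not part of Bitcoin or Sia. The same two transfers give different balances depending on which one is applied first, because the second transfer is rejected once Alice's $10 is gone.

```python
def apply_history(balances, transfers):
    """Apply transfers in order, rejecting any the sender cannot afford."""
    ledger = dict(balances)
    for sender, receiver, amount in transfers:
        if ledger.get(sender, 0) >= amount:
            ledger[sender] -= amount
            ledger[receiver] = ledger.get(receiver, 0) + amount
    return ledger

start = {"Alice": 10}
to_bob = ("Alice", "Bob", 10)
to_charlie = ("Alice", "Charlie", 10)

# Two nodes that saw the same transfers in a different order.
history_a = apply_history(start, [to_bob, to_charlie])
history_b = apply_history(start, [to_charlie, to_bob])

print(history_a)  # {'Alice': 0, 'Bob': 10}
print(history_b)  # {'Alice': 0, 'Charlie': 10}
```

Both results obey the "money cannot be duplicated" rule, yet they disagree, which is why everyone needs to agree on a single ordering.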

Proof-of-Work

+ When looking at multiple histories, everyone agrees that the history with the most work is the valid history

+ Work is easy to verify, so it is easy to see which history has the most work

+ To alter history, more work needs to be placed on the alternate history than is available on the currently accepted history
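
The rule above can be sketched with a toy proof-of-work. Everything here is an assumption chosen for illustration: the SHA-256 block ID, the fixed `TARGET`, and the tuple layout are not Bitcoin's or Sia's actual formats. The sketch shows that mining takes many hash attempts, that verifying takes one hash per block, and that nodes pick the history with the most verified work.

```python
import hashlib

GENESIS = b"\x00" * 32
TARGET = 2**256 // 256  # toy difficulty: roughly 256 hash attempts per block

def block_id(parent, payload, nonce):
    return hashlib.sha256(parent + payload + nonce.to_bytes(8, "big")).digest()

def mine(parent, payload):
    """Search for a nonce whose block ID falls below the target (the 'work')."""
    nonce = 0
    while int.from_bytes(block_id(parent, payload, nonce), "big") >= TARGET:
        nonce += 1
    return nonce

def extend(history, payload):
    """Return a new history with one more mined block appended."""
    parent = history[-1][0] if history else GENESIS
    nonce = mine(parent, payload)
    return history + [(block_id(parent, payload, nonce), payload, nonce)]

def verified_work(history):
    """Total expected work in a history, or 0 if any block fails to verify."""
    parent, work = GENESIS, 0
    for bid, payload, nonce in history:
        if block_id(parent, payload, nonce) != bid:
            return 0  # block does not link to its parent
        if int.from_bytes(bid, "big") >= TARGET:
            return 0  # block does not meet the difficulty target
        work += 2**256 // TARGET
        parent = bid
    return work

def best_history(histories):
    """Everyone independently picks the valid history with the most work."""
    return max(histories, key=verified_work)

main = []
for payload in (b"a", b"b", b"c"):
    main = extend(main, payload)
fork = extend(extend([], b"a"), b"x")          # shorter alternate history
forged = fork + [(b"\xff" * 32, b"fake", 0)]   # extra block with no work behind it

print(best_history([fork, forged, main]) is main)
```

To displace `main`, an attacker would have to mine an alternate history with more total work, which is the cost the slide refers to.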

Mining

+ Miners get paid to extend the longest history

+ Miners will not get paid for work that extends an alternate history unless it becomes the longest history

+ Miners are incentivized to work on the longest known history because they know all other miners will be working on the longest known history

Trustless Verification

+ Anybody can verify the longest history

+ Anybody can verify that alternate histories don't have as much work

+ The only requirement for knowing the truth as accepted by everyone else is having the longest history

+ In Bitcoin, all full nodes keep a full copy of the longest history, and upload it freely to anyone requesting it

Trustless verification is unique to proof-of-work blockchains!

Goal of Sia

+ Sia would like to bring this same trustless verification to cloud storage

+ The ultimate goal is to know that your data is safe, and to know that there's nothing anybody can do to harm your data or prevent you from accessing it

Limitations of the Cloud

+ Cloud storage means someone else controls the data, by definition

+ The host can unplug or delete data

+ Host can share data without permission

+ No way to guarantee data safety.

Decentralization with Sia

+ Despite limitations, Sia is powerful

+ Data given to many hosts

+ Encrypted and authenticated

+ Many simultaneous legal jurisdictions

+ Incentives align with the consumer

+ Open marketplace drives prices down, encourages innovation

Creating a Host Ecosystem

+ Any host should be able to offer storage to the network

+ This includes potentially untrustworthy hosts

+ Cryptography, erasure coding, and smart contracts allow us to trust that data is safe, even if we don't trust the host

Creating a Host Ecosystem

+ The core of the host ecosystem is the file contract

+ The file contract is essentially escrow for storage payments - host is guaranteed to get paid, but only if they can prove that they held the data

+ Bad hosts don't get paid

File Contracts

+ Renter and host both put money into a file contract

+ File contract contains a Merkle root of the file, along with the size of the file

+ File contract contains a duration. At the end of the duration, the host must provide a proof-of-storage to the blockchain to get the money in the contract

The Storage Proof

+ A random 64-byte segment of the file is chosen

+ Host must upload the segment to the blockchain, along with a Merkle proof that the segment belongs to the file

+ Random number seeded by a block ID after the contract duration ends (block IDs are expensive/difficult to manipulate)
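
A rough sketch of the storage proof described above. The hash function (SHA-256), the leaf and node prefixes, the rule of duplicating the last node on odd levels, and the `challenge_index` derivation are all assumptions made for illustration; Sia's real tree construction and seeding may differ. The renter only needs the Merkle root from the file contract to check the host's answer.

```python
import hashlib

SEGMENT_SIZE = 64  # bytes per segment, as on the slide

def h(data):
    return hashlib.sha256(data).digest()

def split_segments(data):
    return [data[i:i + SEGMENT_SIZE] for i in range(0, len(data), SEGMENT_SIZE)]

def merkle_levels(segments):
    """Build every level of the tree: leaf hashes first, root last."""
    levels = [[h(b"\x00" + s) for s in segments]]
    while len(levels[-1]) > 1:
        level = levels[-1]
        if len(level) % 2:  # odd level: duplicate the last node (assumption)
            level = level + [level[-1]]
            levels[-1] = level
        levels.append([h(b"\x01" + level[i] + level[i + 1])
                       for i in range(0, len(level), 2)])
    return levels

def challenge_index(block_id, num_segments):
    """Pick the challenged segment from a block ID unknown before the deadline."""
    return int.from_bytes(h(block_id), "big") % num_segments

def prove(levels, index):
    """Host side: collect the sibling hash at each level below the root."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])
        index //= 2
    return proof

def verify(root, segment, index, proof):
    """Verifier side: rebuild the path to the root from one segment."""
    node = h(b"\x00" + segment)
    for sibling in proof:
        if index % 2 == 0:
            node = h(b"\x01" + node + sibling)
        else:
            node = h(b"\x01" + sibling + node)
        index //= 2
    return node == root

file_data = bytes(range(256)) * 5          # 1280 bytes = 20 segments
segments = split_segments(file_data)
levels = merkle_levels(segments)
root = levels[-1][0]                       # stored in the file contract

index = challenge_index(b"hypothetical block id", len(segments))
proof = prove(levels, index)
print(verify(root, segments[index], index, proof))
```

The proof is one segment plus one hash per tree level, so the on-chain cost grows with the logarithm of the file size rather than with the file size itself.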

Storage Proof 2

+ Host has proven storage for 1 random segment

+ No way to predict which segment

+ Cheating has negative expected value due to the host and renter both adding money to the file contract
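
A back-of-the-envelope sketch of the incentive. The payoff model and all numbers are hypothetical: it assumes a host that keeps only a fraction of the file passes the single-segment challenge with probability equal to that fraction, receives the renter's payment on success, and forfeits its own collateral on failure.

```python
def expected_profit(fraction_stored, payment, collateral, storage_cost):
    """Expected profit for a host that keeps only part of the file."""
    p_pass = fraction_stored  # chance the random segment is one it kept
    return (p_pass * payment
            - (1 - p_pass) * collateral
            - fraction_stored * storage_cost)

# Hypothetical contract: renter pays 10, host locks up 20, full storage costs 6.
PAYMENT, COLLATERAL, COST = 10, 20, 6

honest = expected_profit(1.0, PAYMENT, COLLATERAL, COST)
for fraction in (0.0, 0.5, 0.9):
    cheat = expected_profit(fraction, PAYMENT, COLLATERAL, COST)
    print(fraction, cheat, cheat < honest)
```

Under this model, dropping data never beats honest storage as long as the full storage cost is below the payment plus the collateral.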

Recap

+ There are a bunch of hosts competing for storage contracts

+ Storage contracts force hosts to keep data - there is financial penalty for any attempt to cheat

+ The renter can leverage many hosts to create a safe file upload

Uploading Strategy

+ Use Reed-Solomon erasure coding to upload to many hosts

+ Assuming 95% reliable hosts, '7 of 21' provides 99.999999999% reliability, 3x overhead

+ Using lots of hosts means high parallelism for downloads - lots of throughput, though latency may be affected
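
The '7 of 21' figure can be sanity-checked with a binomial sum. This sketch assumes host failures are independent and that "95% reliable" means each host is reachable with probability 0.95; the function names are illustrative. It computes the failure probability directly, because the reliability itself is too close to 1 to print meaningfully as a float.

```python
from math import comb

def failure_probability(k, n, p_up):
    """Probability that fewer than k of n independent hosts are reachable."""
    return sum(comb(n, i) * p_up**i * (1 - p_up)**(n - i) for i in range(k))

def overhead(k, n):
    """Stored bytes divided by original bytes for a k-of-n erasure code."""
    return n / k

p_fail = failure_probability(7, 21, 0.95)
print("overhead:", overhead(7, 21))
print("failure probability:", p_fail)
print("at least eleven nines:", p_fail < 1e-11)
```

The same function can be used to compare other schemes, trading overhead against the number of hosts.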

Uploading Strategy

+ Increasing the number of hosts means increasing file reliability

+ With 100 hosts, 95% uptime per host, independent failures, and 1.2x redundancy, 99.9999999999% uptime can be achieved

+ The benefits from the network effects are huge!

Uploading Strategy

+ Uploading scheme is fully customizable - any erasure scheme is allowed

+ Uploading can be set to optimize for cost, for uptime, for throughput

+ Sia's global network of hosts makes a great foundation for a CDN

Potential Problems

+ Host is financially incentivized to keep the data. Encryption protects the data

+ Host is not necessarily required to upload the data upon request - data can be held hostage

+ Hosts may be very slow, even if honest

Mitigating Issues

+ High redundancy + parallelism is used, so many slow hosts can collectively still provide high throughput

+ Hosts attempting to hold data hostage can be ignored as long as there's a full copy spread among the non-malicious hosts

+ Hosts that get ignored lose out on bandwidth revenue

Not all Storage Created Equal

+ Hosts can have lots of downtime

+ Hosts can have slow speeds

+ Hosts can have high latency

+ Hosts may be Glacier-like in retrieval time

+ Hosts may execute Sybil attacks

Sybil Attacks

+ A Sybil attack is when one person pretends to be many

+ A single host can pretend to be 100,000 hosts, enabling them to get all of a target's data

+ Hostage attacks are now possible

Reputation System Needed

+ Need some way to tell what type of storage / service a host can provide

+ Don't want to trust the host, or any network of unknowns

+ Reputation system needs to handle Sybil attacks as well

+ Biggest challenge in Sia is to determine reliability of unknown / untrusted hosts

Option 1: Internal Tracking

+ Renter tracks host uptime, frequently challenges host to do off-chain storage proofs

+ Renter measures latency, throughput, and other relevant metrics

+ Creates a fully trustless, direct observation based reputation system

Option 2: Third Party Tracking

+ Trust a third party to track all host metrics

+ Third party can be more diligent

+ Host may learn to cheat / prioritize the third party

+ Requires trusting the third party

Option 3: Trust, but Verify

+ Keep internal observations

+ Get observations from multiple third parties

+ Cross-reference third party observations against each other and against internal observations

+ More complex, but a stronger solution
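
One possible shape for the cross-referencing step, as a sketch only. The median rule, the `tolerance` value, and the tracker and host names are assumptions, not a description of how Sia scores hosts. Each host's uptime estimate is the median of the renter's own observation and every third-party report, and any third party that strays too far from that median is flagged.

```python
from statistics import median

def cross_reference(internal, reports, tolerance=0.10):
    """Combine own and third-party uptime observations, flagging outlier sources."""
    consensus = {}
    for host, own_value in internal.items():
        values = [own_value] + [r[host] for r in reports.values() if host in r]
        consensus[host] = median(values)

    suspicious = set()
    for source, report in reports.items():
        for host, value in report.items():
            if host in consensus and abs(value - consensus[host]) > tolerance:
                suspicious.add(source)
    return consensus, suspicious

# Hypothetical observations: fraction of time each host was reachable.
internal = {"hostA": 0.95, "hostB": 0.60}
reports = {
    "tracker1": {"hostA": 0.96, "hostB": 0.62},
    "tracker2": {"hostA": 0.94, "hostB": 0.99},  # suspiciously generous to hostB
}

consensus, suspicious = cross_reference(internal, reports)
print(consensus)
print(suspicious)
```

A median is used here because a single dishonest source cannot drag it arbitrarily far, unlike a mean.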

Sybil Attack Mitigation

+ Sybil attacks can fool third parties too

+ Sybil attacker may be the third party

+ Use Proof-of-Burn to make Sybil attacks expensive

Proof of Burn

+ Provably destroy wealth, in this case, coins

+ Linear relationship between credibility and volume destroyed

+ Hosts that do not destroy coins are not viewed as credible

+ Sybil attacks become very expensive
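
A small sketch of why burn makes Sybil attacks expensive. It assumes renters choose hosts with probability proportional to coins burned (the linear relationship on the slide); the host counts and coin amounts are hypothetical. Creating more identities changes nothing, since only the total amount burned determines an attacker's share.

```python
def selection_share(burned_by_host, owned_hosts):
    """Fraction of selection weight controlled by one owner's identities."""
    total = sum(burned_by_host.values())
    return sum(burned_by_host[h] for h in owned_hosts) / total

# 50 honest hosts, each having burned 100 coins.
honest = {f"honest-{i}": 100 for i in range(50)}

# Attacker with 100,000 identities and no burn: zero weight.
free_sybils = {f"sybil-{i}": 0 for i in range(100_000)}
share_free = selection_share({**honest, **free_sybils}, free_sybils)

# Attacker with 1,000 identities burning 5 coins each (5,000 coins in total).
paid_sybils = {f"sybil-{i}": 5 for i in range(1_000)}
share_paid = selection_share({**honest, **paid_sybils}, paid_sybils)

print(share_free)  # 0.0
print(share_paid)  # 0.5
```

To control half of the selection weight, the attacker has to destroy as many coins as all honest hosts combined.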

Limitations of Burn

+ Proving credibility costs money

+ The wealthy have an advantage

+ The wealthy, however, likely have more and better storage, so should have an advantage

+ Long term reputation will correct for imbalances

Alternate Sybil Defenses

+ Rely on other forms of identity

+ Can use government IDs, web-of-trust, or other identity systems that exist

+ Most non-burn methods of identity can be cheated, or are centralized

+ Decentralized identity is largely an unsolved problem

Recap

+ Cannot eliminate attack, but can use multiple methods to make attacks expensive

+ At some point, attacks become not worthwhile

+ Finding the optimal set of defenses is an ongoing effort

+ What we have now is already pretty good

Scalability of Sia

+ One file contract per renter-host relationship

+ Unlimited amount of data per file contract

+ Sia currently supports about 50 million file contracts per year

Scalability of Sia

+ Most users will need between 20 and 200 file contracts per year.

+ Scale is therefore between 250,000 and 2.5 million users.

+ Potential improvements on horizon promise 5x - 100x improvements

+ Still not great.
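
The user estimate follows directly from the contract budget. A quick sketch of the arithmetic, using only the numbers on these slides (the function name is illustrative):

```python
CONTRACTS_PER_YEAR = 50_000_000  # current capacity quoted on the slide

def supported_users(contracts_per_user, improvement=1):
    """Users the chain can serve if each needs this many contracts per year."""
    return CONTRACTS_PER_YEAR * improvement // contracts_per_user

print(supported_users(200))       # heavy users:  250000
print(supported_users(20))        # light users:  2500000

# Range implied by the promised 5x to 100x improvements.
print(supported_users(200, 5))    # 1250000
print(supported_users(20, 100))   # 250000000
```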

Scalability of Sia

+ As scaling limits are reached, transaction fees will be used to determine who can access the blockchain

+ Relationships that store a lot of data per file contract will have proportionally less blockchain overhead

+ More favorable to enterprises than to consumers - 2.5 million enterprises is a LOT!

Ecosystem

+ Decentralized ecosystem encourages stability and collaboration - breaking compatibility is extremely difficult by design

+ Many developers are already building apps on top of Sia, and Sia is still a prototype

Ecosystem

+ A competitive marketplace of hosts keeps prices at their absolute minimum

+ Renters with a wide diversity of needs and geographic locations keep the ecosystem spread out

+ Reputation system means quality of service is emphasized over branding

Conclusion

Sia brings a revolutionary change to the cloud storage industry. Sia eliminates trust in a single, central source, spreads out data geographically, drives prices down, and successfully navigates adversarial conditions despite challenges.

Thanks

Q & A

Deck is safe for reuse - best effort has been made to include only images that are 'labeled for reuse'

Decentralizing the Cloud

By David Vorick
