Decentralizing the Cloud
Who Am I?
+ David Vorick
+ Bitcoin Enthusiast since 2011
+ Bitcoin Researcher since 2013
+ Blockchain Technical Expert
+ Part of the Sia team since 2014
What is Sia?
+ Decentralized Data Store
+ Low latency, high throughput uploads and downloads
+ Functioning Prototype Since June 2015
+ 3 full-time devs, active community
Modern Cloud Storage
+ Data owned by 1 company
+ Often unencrypted, unauthenticated
+ Often in 1 legal jurisdiction
+ Profit motives may not align with the consumer
+ Gated ecosystems can inflate prices, hurt compatibility
Goals of Decentralization
+ Give control to the owner - the owner should know that data is safe, available, and private
+ Eliminate trust - the owner of the data should not need to rely on anyone to guarantee the security of the data
+ Spread out power within the ecosystem - when a single party controls too much, they can dictate the industry on their terms, often to the disadvantage of everyone else
Blockchains help us achieve decentralization
The Blockchain
+ A database
+ A database with specific rules on updates
+ A database with a specific ordering for updates
+ A database where specific ordering is enforced without a central party
Example Blockchain Rules
+ Some people have money
+ Money can be transferred from one person to another
+ Money cannot be duplicated - once money has been transferred from one person to another, the original owner no longer has the money
Multiple Histories Problem
+ 'Alice' has $10
+ 'Alice' sends $10 to 'Bob'
+ 'Alice' sends $10 to 'Charlie'
+ Who has the money now, Alice, Bob, or Charlie? For the system to work, everybody must have the same answer to this question.
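The multiple histories problem can be reproduced with a toy ledger. This is a minimal sketch, not how any real blockchain stores balances; the function name `apply_history` and the tuple layout are made up for illustration. It applies the two conflicting transfers in both possible orders and shows that the final owner depends entirely on which order is accepted.

```python
def apply_history(balances, transfers):
    """Apply transfers in order, skipping any that would overspend."""
    state = dict(balances)
    for sender, receiver, amount in transfers:
        if state.get(sender, 0) >= amount:  # money cannot be duplicated
            state[sender] -= amount
            state[receiver] = state.get(receiver, 0) + amount
    return state

start = {"Alice": 10, "Bob": 0, "Charlie": 0}
to_bob = ("Alice", "Bob", 10)
to_charlie = ("Alice", "Charlie", 10)

# Same two transfers, two different orderings.
history_a = apply_history(start, [to_bob, to_charlie])
history_b = apply_history(start, [to_charlie, to_bob])

print(history_a)  # Bob ends up with the money
print(history_b)  # Charlie ends up with the money
```

Both orderings obey the rules, yet they disagree on who holds the $10, which is why everyone needs to agree on a single ordering.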
Proof-of-Work
+ When looking at multiple histories, everyone agrees that the history with the most work is the valid history
+ Work is easy to verify, easy to see which history has the most work
+ To alter history, more work needs to be placed on the alternate history than is available on the currently accepted history
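The points above can be sketched in a few lines. This is a toy model under stated assumptions: SHA-256 over a made-up `prev|payload|nonce` string, a difficulty expressed as a number of leading zero bits, and hypothetical helper names. It is not Bitcoin's or Sia's actual block format. It shows that finding a block takes many hashes, checking one takes a single hash, and the history with the most total work wins.

```python
import hashlib

GENESIS = "0" * 64  # placeholder parent for the first block

def block_hash(prev_hash, payload, nonce):
    data = f"{prev_hash}|{payload}|{nonce}".encode()
    return hashlib.sha256(data).hexdigest()

def mine_block(prev_hash, payload, bits):
    """Search for a nonce whose hash falls below the target (the expensive part)."""
    target = 1 << (256 - bits)
    nonce = 0
    while int(block_hash(prev_hash, payload, nonce), 16) >= target:
        nonce += 1
    return {"prev": prev_hash, "payload": payload, "nonce": nonce, "bits": bits}

def build_history(payloads, bits=8):
    history, prev = [], GENESIS
    for payload in payloads:
        block = mine_block(prev, payload, bits)
        history.append(block)
        prev = block_hash(block["prev"], block["payload"], block["nonce"])
    return history

def history_work(history):
    """Verify each block with one hash and return total expected work (0 if invalid)."""
    prev, work = GENESIS, 0
    for block in history:
        digest = block_hash(block["prev"], block["payload"], block["nonce"])
        if block["prev"] != prev or int(digest, 16) >= 1 << (256 - block["bits"]):
            return 0
        work += 2 ** block["bits"]  # expected number of hashes to find this block
        prev = digest
    return work

short = build_history(["Alice pays Bob 10"])
longer = build_history(["Alice pays Charlie 10", "Charlie pays Dave 5"])
accepted = max([short, longer], key=history_work)

# Editing an old block breaks the chain unless the work is redone.
tampered = [dict(longer[0], payload="Alice pays Mallory 10"), longer[1]]
```

Here `accepted` is the two-block history, and `tampered` scores zero work because the edited block no longer links to its successor.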
Mining
+ Miners get paid to extend the longest history
+ Miners will not get paid for work that extends an alternate history unless it becomes the longest history
+ Miners are incentivized to work on the longest known history because they know all other miners will be working on the longest known history
Trustless Verification
+ Anybody can verify the longest history
+ Anybody can verify that alternate histories don't have as much work
+ The only requirement for knowing the truth as accepted by everyone else is having the longest history
+ In Bitcoin, all full nodes keep a full copy of the longest history, and upload it freely to anyone requesting it
Trustless verification is unique to proof-of-work (PoW) blockchains!
Goal of Sia
+ Sia would like to bring this same trustless verification to cloud storage
+ The ultimate goal is to know that your data is safe, and to know that there's nothing anybody can do to harm your data or prevent you from accessing it
Limitations of the Cloud
+ Cloud storage means someone else controls the data, by definition
+ The host can unplug or delete data
+ Host can share data without permission
+ No way to guarantee data safety.
Decentralization with Sia
+ Despite limitations, Sia is powerful
+ Data given to many hosts
+ Encrypted and authenticated
+ Many simultaneous legal jurisdictions
+ Incentives align with the consumer
+ Open marketplace drives prices down, encourages innovation
Creating a Host Ecosystem
+ Any host should be able to offer storage to the network
+ This includes potentially untrustworthy hosts
+ Cryptography, erasure coding, and smart contracts allow us to trust that data is safe, even if we don't trust the host
Creating a Host Ecosystem
+ The core of the host ecosystem is the file contract
+ The file contract is essentially escrow for storage payments - host is guaranteed to get paid, but only if they can prove that they held the data
+ Bad hosts don't get paid
File Contracts
+ Renter and host both put money into a file contract
+ File contract contains a Merkle root of the file, along with the size of the file
+ File contract contains a duration. At the end of the duration, the host must provide a proof-of-storage to the blockchain to get the money in the contract
The Storage Proof
+ A random 64-byte segment of the file is chosen
+ Host must upload segment to the blockchain, along with a Merkle tree proof.
+ The segment is picked using a random number seeded by the ID of a block mined after the contract duration ends (block IDs are expensive/difficult to manipulate)
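A storage proof of this shape can be sketched as follows. The details are assumptions made for illustration: SHA-256, leaf/node prefix bytes, promotion of unpaired nodes, and the way `challenge_index` turns a block ID into a segment index are all hypothetical choices, not Sia's actual on-chain format. The sketch builds a Merkle root over 64-byte segments, picks a segment from a block ID, and verifies the segment against the root using only the sibling hashes.

```python
import hashlib

SEGMENT_SIZE = 64  # bytes, as on the slide

def leaf_hash(segment):
    return hashlib.sha256(b"\x00" + segment).digest()

def node_hash(left, right):
    return hashlib.sha256(b"\x01" + left + right).digest()

def build_levels(data):
    """Hash each 64-byte segment, then pair hashes upward until one root remains."""
    level = [leaf_hash(data[i:i + SEGMENT_SIZE]) for i in range(0, len(data), SEGMENT_SIZE)]
    levels = [level]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(node_hash(level[i], level[i + 1]))
            else:
                nxt.append(level[i])  # unpaired node is promoted unchanged
        level = nxt
        levels.append(level)
    return levels

def challenge_index(block_id, num_segments):
    """Derive the challenged segment from a block ID (hypothetical derivation)."""
    return int.from_bytes(hashlib.sha256(block_id).digest(), "big") % num_segments

def make_proof(levels, index):
    """Host side: collect the sibling hash at every level that has one."""
    proof = []
    for level in levels[:-1]:
        if (index ^ 1) < len(level):
            proof.append(level[index ^ 1])
        index //= 2
    return proof

def verify_proof(root, segment, index, num_segments, proof):
    """Verifier side: needs only the root and file size from the contract."""
    digest, width, remaining = leaf_hash(segment), num_segments, list(proof)
    while width > 1:
        if (index ^ 1) < width:
            if not remaining:
                return False
            sibling = remaining.pop(0)
            digest = node_hash(digest, sibling) if index % 2 == 0 else node_hash(sibling, digest)
        index //= 2
        width = (width + 1) // 2
    return digest == root and not remaining

file_data = bytes(range(256)) * 3 + b"tail"
levels = build_levels(file_data)
root, num_segments = levels[-1][0], len(levels[0])
index = challenge_index(b"hypothetical block id", num_segments)
segment = file_data[index * SEGMENT_SIZE:(index + 1) * SEGMENT_SIZE]
proof = make_proof(levels, index)
print(verify_proof(root, segment, index, num_segments, proof))
```

The proof grows with the logarithm of the file size, so only a small amount of data has to be placed on the blockchain regardless of how large the file is.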
Storage Proof 2
+ Host has proven storage for 1 random segment
+ No way to predict which segment
+ Cheating has negative expected value due to the host and renter both adding money to the file contract
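The expected-value argument can be made concrete with a simple model. Everything here is an assumption for illustration: the host keeps a fraction of the segments, passes the proof with probability equal to that fraction, earns the renter's payment on success, loses its own deposit on failure, and pays a storage cost proportional to what it keeps. The numbers are invented.

```python
def expected_profit(kept_fraction, payment, collateral, storage_cost):
    """Expected host profit when only `kept_fraction` of the segments is stored."""
    p_pass = kept_fraction                  # one uniformly random segment is challenged
    reward = p_pass * payment               # paid only if the proof succeeds
    penalty = (1 - p_pass) * collateral     # host's own deposit is lost on failure
    return reward - penalty - kept_fraction * storage_cost

payment, collateral, storage_cost = 10.0, 10.0, 4.0

honest = expected_profit(1.0, payment, collateral, storage_cost)
drop_10_percent = expected_profit(0.9, payment, collateral, storage_cost)
drop_everything = expected_profit(0.0, payment, collateral, storage_cost)

print(honest, drop_10_percent, drop_everything)
```

In this model the gain from cheating is (1 - kept_fraction) * (storage_cost - payment - collateral), which is negative whenever storing the data costs less than the payment plus the deposit.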
Recap
+ There are a bunch of hosts competing for storage contracts
+ Storage contracts force hosts to keep data - there is financial penalty for any attempt to cheat
+ The renter can leverage many hosts to create a safe file upload
Uploading Strategy
+ Use Reed-Solomon erasure coding to upload to many hosts
+ Assuming 95% reliable hosts, '7 of 21' provides 99.999999999% reliability, 3x overhead
+ Using lots of hosts means high parallelism for downloads - lots of throughput, though latency may be affected
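The reliability figure can be checked with a short calculation, assuming each host is online independently with the same probability (a simplification; real host failures may be correlated). The function name `unavailability` is made up for this sketch. A '7 of 21' file is unreadable only when fewer than 7 of the 21 hosts are online, which is a binomial tail.

```python
from math import comb

def unavailability(k, n, host_uptime):
    """Probability that fewer than k of n independent hosts are online."""
    down = 1 - host_uptime
    return sum(comb(n, i) * host_uptime**i * down**(n - i) for i in range(k))

failure = unavailability(7, 21, 0.95)
overhead = 21 / 7

print(f"chance the file is unavailable: {failure:.3e}")
print(f"storage overhead: {overhead}x")
```

Under these assumptions the failure probability comes out well below the 1e-11 implied by the slide's figure, so the quoted reliability is conservative.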
Uploading Strategy
+ Increasing the number of hosts means increasing file reliability
+ With 100 hosts, 95% uptime per host, independent failures, and 1.2x redundancy, 99.9999999999% uptime can be achieved
+ The benefits from the network effects are huge!
Uploading Strategy
+ Uploading scheme is fully customizable - any erasure scheme is allowed
+ Uploading can be set to optimize for cost, for uptime, for throughput
+ Sia's global network of hosts makes a great foundation for a CDN
Potential Problems
+ Host is financially incentivized to keep the data. Encryption protects the data
+ Host is not necessarily required to upload the data upon request - data can be held hostage
+ Hosts may be very slow, even if honest
Mitigating Issues
+ High redundancy + parallelism is used: many slow hosts can collectively still provide high throughput
+ Hosts attempting to hold data hostage can be ignored as long as there's a full copy spread among the non-malicious hosts
+ Hosts that get ignored lose out on bandwidth revenue
Not all Storage Created Equal
+ Hosts can have lots of downtime
+ Hosts can have slow speeds
+ Hosts can have high latency
+ Hosts may be Glacier-like in retrieval time
+ Hosts may execute Sybil attacks
Sybil Attacks
+ A Sybil attack is when one person pretends to be many
+ A single host can pretend to be 100,000 hosts, enabling them to get all of a target's data
+ Hostage attacks are now possible
Reputation System Needed
+ Need some way to tell what type of storage / service a host can provide
+ Don't want to trust the host, or any network of unknowns
+ Reputation system needs to handle Sybil attacks as well
+ Biggest challenge in Sia is to determine reliability of unknown / untrusted hosts
Option 1: Internal Tracking
+ Renter tracks host uptime, frequently challenges host to do off-chain storage proofs.
+ Renter measures latency, throughput, and other relevant metrics
+ Creates a fully trustless, direct observation based reputation system
Option 2: Third Party Tracking
+ Trust a third party to track all host metrics
+ Third party can be more diligent
+ Host may learn to cheat / prioritize the third party
+ Requires trusting the third party
Option 3: Trust, but Verify
+ Keep internal observations
+ Get observations from multiple third parties
+ Cross reference third party observations against each other and against internal observations
+ More complex, but a stronger solution
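One possible way to cross reference observations is sketched below. The median-plus-tolerance rule, the function name `cross_reference`, and the tracker names are all assumptions for illustration; the slides do not specify a method. The renter pools its own uptime measurement with third-party reports, takes the median as the working estimate, and flags any third party that strays too far from it.

```python
from statistics import median

def cross_reference(internal, reports, tolerance=0.05):
    """Return a consensus estimate and the third parties that disagree with it."""
    consensus = median(list(reports.values()) + [internal])
    outliers = sorted(
        name for name, value in reports.items()
        if abs(value - consensus) > tolerance
    )
    return consensus, outliers

# Uptime of one host as seen by the renter and by three hypothetical trackers.
internal_uptime = 0.93
third_party = {"tracker-a": 0.95, "tracker-b": 0.94, "tracker-c": 0.999}

consensus, suspicious = cross_reference(internal_uptime, third_party)
print(consensus, suspicious)
```

A tracker that is repeatedly flagged could be given less weight over time, though choosing that policy is outside this sketch.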
Sybil Attack Mitigation
+ Sybil attacks can fool third parties too
+ Sybil attacker may be the third party
+ Use Proof-of-Burn to make Sybil attacks expensive
Proof of Burn
+ Provably destroy wealth, in this case, coins
+ Linear relationship between credibility and volume destroyed
+ Hosts that do not destroy coins are not viewed as credible
+ Sybil attacks become very expensive
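The linear relationship can be illustrated with a small model. It assumes that a renter picks hosts with probability proportional to coins burned; that selection rule and the function names are hypothetical, and the amounts are invented. Because weight is linear in the amount burned, splitting one budget across many fake identities gains nothing, and capturing most of the selection weight costs a multiple of what all honest hosts burned together.

```python
def selection_shares(burned):
    """Map each host to its share of selection weight, linear in coins burned."""
    total = sum(burned.values())
    return {host: amount / total for host, amount in burned.items()}

def burn_needed(target_share, honest_burn):
    """Coins an attacker must burn to hold `target_share` of the total weight."""
    return target_share * honest_burn / (1 - target_share)

honest = {f"honest-{i}": 10.0 for i in range(100)}   # 1,000 coins burned in total
cost = burn_needed(0.9, sum(honest.values()))        # coins needed for a 90% share

# The attacker spreads that budget over 1,000 Sybil identities.
sybils = {f"sybil-{i}": cost / 1000 for i in range(1000)}
shares = selection_shares({**honest, **sybils})
sybil_share = sum(share for host, share in shares.items() if host.startswith("sybil-"))

print(cost, sybil_share)
```

Identities that burn nothing receive zero weight in this model, no matter how many of them exist.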
Limitations of Burn
+ Proving credibility costs money
+ The wealthy have an advantage
+ The wealthy, however, likely have more and better storage, so should have an advantage
+ Long term reputation will correct for imbalances
Alternate Sybil Defenses
+ Rely on other forms of identity
+ Can use government IDs, web-of-trust, or other identity systems that exist
+ Most non-burn methods of identity can be cheated, or are centralized
+ Decentralized identity is largely an unsolved problem
Recap
+ Cannot eliminate attack, but can use multiple methods to make attacks expensive
+ At some point, attacks become not worthwhile
+ Finding the optimal set of defenses is an ongoing effort
+ What we have now is already pretty good
Scalability of Sia
+ One file contract per renter-host relationship
+ Unlimited amount of data per file contract
+ Sia currently supports about 50 million file contracts per year
Scalability of Sia
+ Most users will need between 20 and 200 file contracts per year.
+ Scale is therefore between 250,000 and 2.5 million users.
+ Potential improvements on the horizon promise 5x - 100x gains
+ Still not great.
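The user estimate follows directly from the figures on these slides. The sketch below just repeats that arithmetic, then applies the 5x and 100x improvement factors to the two ends of the range; combining them this way is an illustration, not a projection from the slides.

```python
CONTRACTS_PER_YEAR = 50_000_000  # current capacity quoted on the slide

def supported_users(contracts_per_user, capacity=CONTRACTS_PER_YEAR):
    """Number of users that fit if each needs this many file contracts per year."""
    return capacity // contracts_per_user

heavy_users = supported_users(200)  # every user needs 200 contracts per year
light_users = supported_users(20)   # every user needs 20 contracts per year

# Pessimistic end with a 5x improvement, optimistic end with a 100x improvement.
improved_low = supported_users(200, CONTRACTS_PER_YEAR * 5)
improved_high = supported_users(20, CONTRACTS_PER_YEAR * 100)

print(heavy_users, light_users, improved_low, improved_high)
```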
Scalability of Sia
+ As scaling limits are reached, transaction fees will be used to determine who can access the blockchain
+ Relationships that store lots of data per file contract will have less relative overhead
+ More favorable to enterprises than to consumers - 2.5 million enterprises is a LOT!
Ecosystem
+ Decentralized ecosystem encourages stability and collaboration - breaking compatibility is extremely difficult by design
+ Many developers already building apps on top of Sia, and Sia is still a prototype
Ecosystem
+ A competitive marketplace of hosts keeps prices at their absolute minimum
+ Renters with a wide diversity of needs and geographic locations keep the ecosystem spread out
+ Reputation system means quality of service is emphasized over branding
Conclusion
Sia brings a revolutionary change to the cloud storage industry. Sia eliminates trust in a single, central source, spreads out data geographically, drives prices down, and successfully navigates adversarial conditions despite challenges.
Thanks
Q & A
Deck is safe for reuse - best effort has been made to include only images that are 'labeled for reuse'
Decentralizing the Cloud
By David Vorick