Apache Cassandra
Finding the beauty in the architecture
whoami?
Backend Developer at Rooter Sports
Shweta Suman
cosmologist10
A day on call!
SQL or NoSQL
Journey from SQL to NoSQL databases
Problem statement
- Storing large volumes of streamed geospatial data every minute
- Indexing a large number of documents
- Handling very heavy read and write workloads
Geospatial data
Why NoSQL over SQL?
- Does not require a fixed schema
- Avoids joins
- Easy to scale
- Suited to distributed data stores with very large storage needs
- Stores large amounts of data without degrading read and write performance
- Cheaper to maintain
Different types of NoSQL databases
Column family
Key-value
Document
Graph
CAP theorem
Why Cassandra?
- Masterless, with no single point of failure
- Linear scalability
- Geographical distribution
- Consistency
- Highly available services
- 100% uptime
- Predictable scalability
- Peer-to-peer architecture
- Schema free
- Tunable consistency and CAP trade-offs
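The tunable consistency point can be made concrete with a small sketch. It relies on the commonly cited rule that a read overlaps the latest write when the number of replicas read plus the number of replicas written exceeds the replication factor. The function names `quorum` and `reads_see_latest_write` are hypothetical helpers for illustration, not part of any Cassandra driver.

```python
def quorum(replication_factor):
    """Smallest majority of replicas for a given replication factor."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas, write_replicas, replication_factor):
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    rf = 3
    q = quorum(rf)
    # Quorum reads combined with quorum writes overlap on at least one replica.
    print(q, reads_see_latest_write(q, q, rf))
    # Reading and writing a single replica each trades consistency for latency.
    print(reads_see_latest_write(1, 1, rf))
```

Lower read and write counts favour latency and availability; higher counts favour consistency. That choice is the trade-off the CAP theorem describes.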
Architecture
1. Design goals: scale with continuous availability
2. Data distribution
3. Replication
Data distribution
Masterless “ring” distributed architecture
Searching algorithms time complexity:
Time complexity: O(1)
Problem : Finite Memory
Solution :
Problem : Collision
Solution:
1. Chaining
2. Probing
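The two problems above can be illustrated with a minimal sketch of a hash table that resolves collisions by chaining. It assumes that finite memory is handled by taking the hash modulo a fixed bucket count, which the slide does not state. The class name `ChainedHashTable` and its methods are hypothetical, chosen only for this example.

```python
class ChainedHashTable:
    """Fixed-size hash table that resolves collisions by chaining."""

    def __init__(self, num_buckets=8):
        # Each bucket is a list (the "chain") of (key, value) pairs.
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        # Assumption: fold the hash into the finite bucket range with modulo.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))  # colliding keys share the same chain

    def get(self, key, default=None):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default


if __name__ == "__main__":
    table = ChainedHashTable(num_buckets=2)
    for i in range(6):
        table.put(i, i * i)  # six keys in two buckets forces collisions
    print(table.get(4), table.get(42))
```

Lookups stay close to O(1) only while the chains are short. Once the data outgrows the bucket count, the table has to be resized and every key rehashed.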
Problem:
The amount of data increases
The size of storage must be increased
All data has to be rehashed
Lookups for existing data break!
Solution: consistent hashing
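A minimal sketch of consistent hashing, under stated assumptions: nodes and keys are hashed onto the same ring, and a key belongs to the first node found clockwise from its position. MD5 is used here only because it is in the standard library (Cassandra's default partitioner is based on Murmur3). Virtual nodes are left out, and the names `ring_position` and `HashRing` are hypothetical.

```python
import bisect
import hashlib


def ring_position(value):
    """Hash a string to an integer position on the ring (MD5 for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with one position per node."""

    def __init__(self, nodes=()):
        self._positions = []  # sorted ring positions
        self._owners = {}     # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        pos = ring_position(node)
        bisect.insort(self._positions, pos)
        self._owners[pos] = node

    def node_for(self, key):
        if not self._positions:
            raise ValueError("ring has no nodes")
        # First node at or after the key's position, wrapping past the end.
        idx = bisect.bisect_left(self._positions, ring_position(key))
        return self._owners[self._positions[idx % len(self._positions)]]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    keys = [f"key-{i}" for i in range(1000)]
    before = {k: ring.node_for(k) for k in keys}
    ring.add_node("node-d")
    moved = [k for k in keys if ring.node_for(k) != before[k]]
    # Only keys now owned by the new node change owner; the rest stay put.
    print(len(moved), "of", len(keys), "keys moved")
```

With modulo hashing, changing the node count would remap most keys. On the ring, adding a node only takes over part of one neighbour's range.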
Data replication
Strategies
- SimpleStrategy
- NetworkTopologyStrategy
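A rough sketch of how the simple strategy picks replicas: the first replica goes to the node that owns the key's position on the ring, and the remaining replicas go to the next nodes clockwise, ignoring racks and data centers. The function names `token` and `simple_strategy_replicas` and the MD5-based hash are assumptions for illustration. The network topology strategy, which also takes racks and data centers into account, is not modelled here.

```python
import bisect
import hashlib


def token(value):
    """Hash a string to a ring position (MD5 used only for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def simple_strategy_replicas(nodes, key, replication_factor):
    """Return the nodes that would hold replicas of `key`, in ring order."""
    ring = sorted((token(node), node) for node in nodes)
    positions = [pos for pos, _ in ring]
    # Owner of the key: first node at or after the key's token, with wrap-around.
    start = bisect.bisect_left(positions, token(key)) % len(ring)
    # A cluster cannot hold more distinct replicas than it has nodes.
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


if __name__ == "__main__":
    cluster = ["node-a", "node-b", "node-c", "node-d"]
    print(simple_strategy_replicas(cluster, "user:42", replication_factor=3))
```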
Remaining topics :
Node Architecture
Write operation
Read operation
Thank you !
By Shweta Suman