Apache Cassandra
Finding the beauty in the architecture

whoami ?

Shweta Suman
Backend Developer at Rooter Sports
cosmologist10
A day on call!

SQL or NoSQL
Journey from SQL to NoSQL databases

Problem statement

- Storing tons of streamed Geospatial data every minute
- Indexing a large number of documents
- Lots and lots of read and write operations
Geospatial data
Why NoSQL over SQL?
- Does not require a fixed schema
- Avoids joins
- Easy to scale
- Suited to distributed data stores with very large storage needs
- Stores large volumes of data without degrading read and write performance
- Cheaper to maintain
Different types of NoSQL databases
- Column family
- Key-value
- Document
- Graph
CAP theorem

Why Cassandra!
- Masterless with no single point of failure
- Linear scalability
- Geographical distribution
- Consistency
- Highly available services
- 100% uptime as a design goal
- Predictable scalability
- Peer-to-peer architecture
- Schema free
- Tunable consistency and CAP parameters
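
The "tunable consistency" point is commonly summarised by one rule: with a replication factor RF, a read is guaranteed to overlap the latest acknowledged write when the number of replicas read (R) plus the number of replicas written (W) exceeds RF. The sketch below only illustrates that arithmetic; the function names are hypothetical helpers, not part of any Cassandra driver.

```python
def quorum(replication_factor):
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def read_overlaps_latest_write(read_replicas, write_replicas, replication_factor):
    """True when R + W > RF, so every read set shares a replica with every write set."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)  # majority of 3 replicas
strong = read_overlaps_latest_write(q, q, rf)   # quorum reads + quorum writes
eventual = read_overlaps_latest_write(1, 1, rf)  # one replica each way
```

Reading and writing at quorum trades some latency for overlap between reads and writes; reading and writing a single replica favours availability and speed.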
Architecture
1. Design goals: scale with continuous availability
2. Data distribution
3. Replication
Data distribution

masterless “ring” distributed architecture
Searching algorithms time complexity:

Time complexity: O(1)

Problem : Finite Memory

Solution :
Problem: Collision
Solution:
1. Chaining
2. Probing
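
A minimal sketch of the chaining approach, to make the two problems concrete. It assumes the usual way of fitting hashes into finite memory (reducing the hash modulo a fixed number of buckets), which the slide itself does not spell out; the class and method names are illustrative only.

```python
class ChainedHashTable:
    """Fixed-size hash table that resolves collisions by chaining."""

    def __init__(self, num_buckets=8):
        # Finite memory: a fixed number of buckets, each holding a chain (list).
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        # Assumption: map the hash into the bucket range with a modulo.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        chain = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)  # overwrite an existing key
                return
        chain.append((key, value))  # colliding keys share the same chain

    def get(self, key, default=None):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default


table = ChainedHashTable(num_buckets=2)  # a tiny table forces collisions
for city in ["Delhi", "Mumbai", "Pune", "Goa"]:
    table.put(city, len(city))
```

Probing would instead keep one entry per slot and step to another slot on collision; chaining is shown here only because it is the shorter of the two to sketch.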
Problem: the amount of data increases
- Increase the size of storage
- Rehash all data
- Lookup issues with data!
Solution: consistent hashing
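
A rough sketch of consistent hashing, assuming one token per node and MD5 as the hash function (chosen only because it is in the standard library, not as a claim about the partitioner used in the talk). Nodes and keys are hashed onto the same ring, and a key belongs to the first node clockwise from its position. `HashRing`, `token`, and the node names are hypothetical.

```python
import bisect
import hashlib


def token(value):
    """Hash a string to an integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=()):
        self._tokens = []  # sorted ring positions
        self._owners = {}  # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        t = token(node)
        bisect.insort(self._tokens, t)
        self._owners[t] = node

    def node_for(self, key):
        # First node clockwise from the key's position, wrapping around the ring.
        i = bisect.bisect_left(self._tokens, token(key))
        return self._owners[self._tokens[i % len(self._tokens)]]


keys = [f"sensor-{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # grow the cluster
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` now belongs to the new node and all other keys stay where they were, so growing the cluster does not force a rehash of all data.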


Data replication
Strategies
- SimpleStrategy
- NetworkTopologyStrategy
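
As a rough illustration of the simple strategy: the first replica goes to the node that owns the key's token, and the remaining replicas go to the next nodes clockwise on the ring, ignoring racks and data centers. The sketch below assumes a toy ring with small integer tokens; the function name and node names are hypothetical. The network topology strategy additionally takes data centers and racks into account, which is not modelled here.

```python
import bisect


def simple_strategy_replicas(ring, key_token, replication_factor):
    """ring: list of (token, node) pairs. Returns the nodes that store the key."""
    ring = sorted(ring)
    tokens = [t for t, _ in ring]
    # Owner: first node whose token is >= the key's token, wrapping around.
    start = bisect.bisect_left(tokens, key_token) % len(ring)
    replicas = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in replicas:
            replicas.append(node)  # next distinct node clockwise
        if len(replicas) == replication_factor:
            break
    return replicas


ring = [(0, "node-a"), (25, "node-b"), (50, "node-c"), (75, "node-d")]
placement = simple_strategy_replicas(ring, key_token=60, replication_factor=3)
# placement is ["node-d", "node-a", "node-b"]
```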


Remaining topics:
- Node architecture
- Write operation
- Read operation
Thank you !
By Shweta Suman