Apache Cassandra

Finding the beauty in the architecture

whoami ?

Backend Developer at Rooter Sports

Shweta Suman 

cosmologist10

A day on call!

SQL or NoSQL

Journey from SQL to NoSQL databases

Problem statement

  1. Storing large volumes of streamed geospatial data every minute
  2. Indexing a large number of documents
  3. Handling a heavy load of read and write operations

Geospatial data

Why NoSQL over SQL?

  • Does not require a fixed schema
  • Avoids joins
  • Easy to scale horizontally
  • Suited to distributed data stores with very large storage needs
  • Stores large amounts of data without degrading read and write performance
  • Cheaper to maintain

Different types of NoSQL databases

Column family 

Key-value

Document

Graph

CAP theorem

Why Cassandra?

  1. Masterless, with no single point of failure
  2. Linear scalability
  3. Geographical distribution
  4. Consistency
  5. Highly available services
  6. Designed for 100% uptime
  7. Predictable scalability
  8. Peer-to-peer architecture
  9. Schema-free
  10. Tunable consistency and CAP parameters
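Tunable consistency, from the list of Cassandra features, can be illustrated with a little arithmetic. The sketch below is not Cassandra code. It only assumes the commonly cited rule that a read is guaranteed to see the latest write when the number of replicas read (R) plus the number of replicas written (W) exceeds the replication factor (RF). The function names are made up for illustration.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)  # 2 of 3 replicas
print(is_strongly_consistent(q, q, rf))  # QUORUM reads + QUORUM writes -> True
print(is_strongly_consistent(1, 1, rf))  # ONE reads + ONE writes -> False
```

Lower levels such as ONE trade that guarantee for latency and availability, which is the "tunable" part.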

Architecture

1. Design goals: scale with continuous availability

2. Data distribution

3. Replication

Data distribution

Masterless “ring” distributed architecture

Search algorithms and their time complexity:

Hashing: time complexity O(1)

Problem: Finite memory

Solution:

Problem: Collisions

Solution:

1. Chaining

2. Probing
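A minimal sketch of the first option, chaining, assuming a fixed number of buckets: keys that hash to the same bucket are kept together in a small list. The class and method names are illustrative only. Probing would instead place a colliding key in the next free slot of a flat array.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining entries in a bucket."""

    def __init__(self, bucket_count: int = 8) -> None:
        self.buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, key):
        # Map the key's hash onto one of the finite buckets.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value) -> None:
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))  # new key, or a collision: extend the chain

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default


table = ChainedHashTable(bucket_count=2)
table.put(1, "a")
table.put(3, "b")  # 1 and 3 both land in bucket 1 and share a chain
print(table.get(1), table.get(3))
```

Lookups stay close to O(1) as long as the chains remain short.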

Amount of data increases

Storage size must be increased

All data must be rehashed

Lookups for existing data break!
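The rehashing cost can be made concrete with a small experiment. This sketch assumes the naive placement rule "node = hash(key) mod number of nodes". The key names and node counts are arbitrary. With that rule, growing from 4 to 5 nodes should relocate on the order of four in five keys.

```python
import hashlib


def stable_hash(key: str) -> int:
    # md5 is used only to get a hash that is stable across runs.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


def node_for(key: str, node_count: int) -> int:
    """Naive placement: hash modulo the number of nodes."""
    return stable_hash(key) % node_count


keys = [f"user-{i}" for i in range(1000)]
moved = sum(1 for k in keys if node_for(k, 4) != node_for(k, 5))
print(f"{moved} of {len(keys)} keys change nodes when growing from 4 to 5 nodes")
```

Every moved key has to be copied to its new node, and until it is, a lookup goes to a node that does not hold it.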

Problem:

Solution: Consistent hashing
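A minimal sketch of consistent hashing, assuming one token per node and md5 as the hash function. Cassandra's own partitioner and token assignment differ, and the `HashRing` class is hypothetical. Nodes and keys are hashed onto the same ring, and a key belongs to the first node found clockwise from its position.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Position of a node name or key on the ring."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=()):
        self._tokens = []  # sorted node tokens
        self._owners = {}  # token -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        t = token(node)
        bisect.insort(self._tokens, t)
        self._owners[t] = node

    def lookup(self, key: str) -> str:
        # First node token at or after the key's token, wrapping around the ring.
        i = bisect.bisect_left(self._tokens, token(key)) % len(self._tokens)
        return self._owners[self._tokens[i]]


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
keys = [f"user-{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("node-e")
moved = [k for k in keys if ring.lookup(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all of them to node-e")
```

Adding a node only takes over the arc just before its token, so every key that moves goes to the new node and the rest stay where they were.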

Data replication

Strategies

  • SimpleStrategy
  • NetworkTopologyStrategy
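A rough sketch of how the simple strategy picks replicas, assuming a tiny hypothetical ring with hand-picked tokens: the first replica goes to the node that owns the key's token, and the remaining replicas go to the next nodes clockwise, ignoring racks and data centers. The network topology strategy instead takes a replica count per data center and tries to spread replicas across racks. That part is not modeled here.

```python
import bisect

# Hypothetical ring: (token, node) pairs sorted by token.
RING = [(0, "node-a"), (25, "node-b"), (50, "node-c"), (75, "node-d")]


def simple_strategy_replicas(key_token: int, replication_factor: int) -> list:
    """Owner of the token plus the next nodes clockwise on the ring."""
    tokens = [t for t, _ in RING]
    start = bisect.bisect_left(tokens, key_token) % len(RING)
    count = min(replication_factor, len(RING))  # cannot exceed the node count
    return [RING[(start + i) % len(RING)][1] for i in range(count)]


print(simple_strategy_replicas(60, 3))  # ['node-d', 'node-a', 'node-b']
```

The wrap-around from node-d back to node-a is what makes the layout a ring rather than a line.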

Remaining topics:

Node Architecture

Write operation

Read operation

Thank you!

By Shweta Suman