What is Cassandra?

  • Distributed Database
  • Open Source
  • No single point of failure
  • Highly Scalable
  • Tunable Consistency

Why pick Cassandra?

  • High availability
  • Store massive amounts of data
  • Handle a high volume of reads or writes
  • Easily integrate with analytics, search, etc.

Who uses Cassandra?

Tons of people... some have listed themselves here:

http://planetcassandra.org/companies/

  • Netflix
  • Microsoft
  • CERN
  • NASA
  • Instagram
  • Call of Duty
  • ... and tons more

How does it work?

  • This is a cluster
  • composed of peer-to-peer nodes (the dots)
  • A "node" is an instance of Cassandra
  • The "ring" is the range of token values, from -2^63 to 2^63 - 1 with the default partitioner
  • (out of scope: racks, data centers)
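
A minimal sketch of how a key might land on a node in the ring. This is not Cassandra's own code: MD5 stands in for the Murmur3 hash the default partitioner uses, and the node names and token assignments are made up for illustration.

```python
import bisect
import hashlib

MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1

def token_for(key):
    """Hash a partition key to a signed 64-bit token (MD5 is a stand-in hash here)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

def owner(ring, token):
    """Return the first node whose token is >= `token`, wrapping around the ring."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token)
    return ring[i % len(ring)][1]

# Hypothetical 4-node cluster with evenly spaced tokens, sorted by token.
RING = sorted([
    (-2**63, "node-a"),
    (-2**62, "node-b"),
    (0, "node-c"),
    (2**62, "node-d"),
])

if __name__ == "__main__":
    key = "user:42"
    print(key, "->", owner(RING, token_for(key)))
```

Each node owns the slice of the ring that ends at its own token, so a token past the last node wraps around to the first one.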

Change the Cluster

Cool Bits

  • "Seed nodes" are just regular nodes that you (the ops person) list in the config so a joining node knows whom to contact first.
  • No "special" node
  • The topology and node liveness are determined via a gossip protocol, not a central authority.
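
A toy simulation of the gossip idea, heavily simplified and not Cassandra's actual protocol (which also tracks generations and uses a failure detector). Node names, the single seed "n0", and the heartbeat numbers are all assumptions for illustration.

```python
import random

def gossip_round(views, rng):
    """Each node swaps state with one random peer it already knows about."""
    for name in list(views):
        known = [n for n in views[name] if n != name]
        if not known:
            continue  # knows nobody yet, so it waits to be contacted
        peer = rng.choice(known)
        merged = dict(views[name])
        for node, beat in views[peer].items():
            # Keep the newest heartbeat seen for each node.
            merged[node] = max(beat, merged.get(node, -1))
        views[name] = dict(merged)
        views[peer] = dict(merged)

def converge(views, rng, max_rounds=1000):
    """Gossip until every node knows every other node; return the rounds used."""
    for rounds in range(1, max_rounds + 1):
        gossip_round(views, rng)
        if all(set(v) == set(views) for v in views.values()):
            return rounds
    raise RuntimeError("cluster view did not converge")

# Hypothetical 5-node cluster: every node starts out knowing itself and the seed "n0".
VIEWS = {f"n{i}": {f"n{i}": 1, "n0": 1} for i in range(5)}
ROUNDS = converge(VIEWS, random.Random(7))

if __name__ == "__main__":
    print("all nodes know the full cluster after", ROUNDS, "round(s)")
```

No node is in charge here: the seed is only the first address everyone happens to know, and the full membership spreads peer to peer.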

Read/Write

Coordinator

  • Client picks *any* node to talk to
  • Any node can be a coordinator
  • If a node goes down, just move on to the next one.
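
The "just move on to the next one" behaviour could be sketched like this. `send` and `NodeDown` are hypothetical placeholders, not part of any real Cassandra driver; real drivers handle this for you with their own load-balancing and retry policies.

```python
class NodeDown(Exception):
    """Raised by the hypothetical `send` callable when a node cannot be reached."""

def execute(nodes, request, send):
    """Try each node as coordinator; return (node, response) from the first that answers."""
    for node in nodes:
        try:
            return node, send(node, request)
        except NodeDown:
            continue  # this node is down, move on to the next one
    raise NodeDown("no node in the contact list is reachable")
```

Because any node can coordinate, the client only needs one reachable node from its list for the request to go through.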

Write Replication

Writes can be Replicated

  • The number of nodes that store a write is called the Replication Factor (RF).
  • RF can be as high as the number of nodes, in which case every node in the cluster stores every write.
  • You can control where writes are stored with a replication strategy.
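
A rough sketch of the simplest placement rule: start at the node that owns the token and keep walking clockwise until RF nodes are picked. This roughly mirrors what SimpleStrategy does; the ring and node names are invented, and rack or data center awareness is ignored.

```python
import bisect

def replicas(ring, token, rf):
    """Pick `rf` nodes for `token`: the owner plus the next nodes clockwise."""
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token)
    count = min(rf, len(ring))  # RF cannot exceed the number of nodes
    return [ring[(start + i) % len(ring)][1] for i in range(count)]

# Hypothetical 4-node ring, sorted by token.
RING = [
    (-2**63, "node-a"),
    (-2**62, "node-b"),
    (0, "node-c"),
    (2**62, "node-d"),
]

if __name__ == "__main__":
    print(replicas(RING, 5, 3))
```

With RF equal to the cluster size, the walk covers the whole ring and every node holds a copy.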

Client Consistency

Cassandra offers tunable consistency

  • The Consistency Level (CL) is the number of replicas that must acknowledge a client request.
  • Varying CL helps achieve differing availability or performance goals.
  • Common CLs: ONE, QUORUM, ALL
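
The trade-off comes down to a bit of arithmetic, sketched below. The function names are made up for this example; the rule that a read sees the latest write when read acks plus write acks exceed RF is the standard quorum-overlap argument.

```python
def required_acks(cl, rf):
    """Number of replicas that must acknowledge a request at consistency level `cl`."""
    if cl == "ONE":
        return 1
    if cl == "QUORUM":
        return rf // 2 + 1  # a strict majority of the replicas
    if cl == "ALL":
        return rf
    raise ValueError("unsupported consistency level: " + cl)

def tolerated_failures(cl, rf):
    """How many replicas can be down while requests at `cl` still succeed."""
    return rf - required_acks(cl, rf)

def reads_see_latest_write(write_cl, read_cl, rf):
    """True when the read and write replica sets must overlap (R + W > RF)."""
    return required_acks(write_cl, rf) + required_acks(read_cl, rf) > rf

if __name__ == "__main__":
    for cl in ("ONE", "QUORUM", "ALL"):
        print(cl, required_acks(cl, 3), tolerated_failures(cl, 3))
```

With RF = 3, QUORUM needs 2 acks and survives one replica being down, ALL survives none, and ONE survives two but gives up the overlap guarantee unless the other side of the request uses ALL.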

Learn More

https://academy.datastax.com/

Intro To Cassandra

By Philip Doctor

What is Cassandra? When to use it?
