Writes in Cassandra aren’t free, but they’re awfully cheap. Cassandra is optimized for high write throughput, and almost all writes are equally efficient. If you can perform extra writes to improve the efficiency of your read queries, it’s almost always a good tradeoff. Reads tend to be more expensive and are much more difficult to tune.
Disk space is generally the cheapest resource (compared to CPU, memory, disk IOPS, or network), and Cassandra is architected around that fact. To get the most efficient reads, you often need to duplicate data.
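As a concrete sketch of trading extra writes for cheap reads (the table and column names here are hypothetical), the same user record can be duplicated into two tables, one per query pattern, and kept in step with a logged batch:

    -- Each table serves one query and is read from exactly one partition.
    CREATE TABLE users_by_username (
        username text PRIMARY KEY,
        email text,
        age int
    );

    CREATE TABLE users_by_email (
        email text PRIMARY KEY,
        username text,
        age int
    );

    -- A logged batch keeps the two copies of the record consistent with each other.
    BEGIN BATCH
        INSERT INTO users_by_username (username, email, age) VALUES ('alice', 'alice@example.com', 30);
        INSERT INTO users_by_email (email, username, age) VALUES ('alice@example.com', 'alice', 30);
    APPLY BATCH;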
These are the two high-level goals for your data model: spread data evenly around the cluster, and minimize the number of partitions read.
Rows are spread around the cluster based on a hash of the partition key, which is the first element of the PRIMARY KEY. So, the key to spreading data evenly is this: pick a good primary key.
Partitions are groups of rows that share the same partition key. When you issue a read query, you want to read rows from as few partitions as possible.
The way to minimize partition reads is to model your data to fit your queries. Don’t model around relations. Don’t model around objects. Model around your queries.
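For example, a minimal query-first sketch (the query and table are assumed for illustration): to serve "show the latest posts by a given user", make the partition key match the query's WHERE clause so the read touches exactly one partition, and let the clustering order match the desired sort:

    -- One partition per user; rows within it are stored newest-first.
    CREATE TABLE posts_by_user (
        username text,
        created_at timeuuid,
        body text,
        PRIMARY KEY ((username), created_at)
    ) WITH CLUSTERING ORDER BY (created_at DESC);

    -- Reads a single partition and returns rows already in the right order.
    SELECT body FROM posts_by_user WHERE username = 'alice' LIMIT 10;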
Tokens produced by the default Murmur3 partitioner range from -2^63 (Long.MIN_VALUE) to +2^63-1 (Long.MAX_VALUE); the hash of the partition key determines where in this range each row lands.
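To see where a given partition key lands, CQL's built-in token() function exposes the hash (using the hypothetical table from the sketch above):

    -- token() returns the Murmur3 hash of the partition key,
    -- a value in the [-2^63, 2^63-1] range described above.
    SELECT token(username), username FROM users_by_username;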
* Only if the consistency level (CL) is met; otherwise the write operation is reported back to the client as failed. Note that Cassandra does not roll back such a write: any replica that did apply it keeps the data.
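For example, in cqlsh the session's consistency level can be set before issuing the write (QUORUM is just an illustrative choice):

    -- Require a quorum of replicas to acknowledge the write;
    -- if fewer respond in time, the operation is reported as failed.
    CONSISTENCY QUORUM
    INSERT INTO users_by_username (username, email, age) VALUES ('bob', 'bob@example.com', 41);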
When to run nodetool repair: routinely, at least once every gc_grace_seconds (10 days by default), so tombstones reach all replicas before they are garbage-collected and deleted data cannot resurface; and after any node has been down longer than the hint window (max_hint_window_in_ms, 3 hours by default).
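One common pattern (a sketch; scheduling details vary by cluster) is to stagger primary-range repairs across the nodes so that every token range is repaired once per gc_grace_seconds window:

    # On each node in turn, repair only the ranges it owns as a primary
    # replica, so no range is repaired redundantly across replicas.
    nodetool repair -pr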
NEVER leave the default settings for data_file_directories and commitlog_directory: put the commit log on a physical disk separate from the data directories. The commit log is append-only, so a dedicated disk (or SSD) lets its sequential writes proceed without being interleaved with compaction and read I/O.
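For example, in cassandra.yaml (the mount points are illustrative):

    # Data files and commit log on separate physical devices.
    data_file_directories:
        - /mnt/data-disk/cassandra/data
    commitlog_directory: /mnt/commitlog-disk/cassandra/commitlog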