Cassandra

Apache Cassandra is an open-source, distributed NoSQL database used by thousands of companies for its scalability and high availability without compromising performance. Linear scalability and proven fault tolerance on commodity hardware or cloud infrastructure make it well suited to mission-critical data.

Key Concepts

Cluster

A cluster in Cassandra is a collection of nodes, optionally grouped into data centers, arranged in a ring architecture. Every cluster must be given a name, which the participating nodes then use to identify the cluster they belong to.
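To make the ring idea concrete, here is a minimal sketch of how a row key could be mapped to a node on a ring. It is an illustration under stated assumptions, not Cassandra's actual implementation: the `Ring` class, the `token` helper, and the node names are hypothetical, and MD5 stands in for whichever partitioner a real cluster is configured with. Real clusters also typically assign many tokens per node, which this sketch ignores.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a position on the ring (MD5 is only a stand-in hash)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    """Toy ring: each node owns the token range that ends at its own token."""

    def __init__(self, node_names):
        # Assumption: one token per node, derived from the node's name.
        self._entries = sorted((token(name), name) for name in node_names)
        self._tokens = [t for t, _ in self._entries]

    def owner(self, row_key: str) -> str:
        # First node token at or after the key's token, wrapping past the end.
        i = bisect.bisect_left(self._tokens, token(row_key)) % len(self._entries)
        return self._entries[i][1]


ring = Ring(["node-a", "node-b", "node-c"])
print(ring.owner("user:42"))
```

The same key always lands on the same node, and adding a node only changes ownership for the keys that fall into the new node's range.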

Keyspace

If you are familiar with relational databases, a keyspace is Cassandra's counterpart to a schema. The keyspace is the outermost container for data in Cassandra. The main attributes set per keyspace are the replication factor and the replica placement strategy; the keyspace also contains the column families.
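As a rough illustration of where the replication factor and placement strategy are set, the sketch below builds a CQL CREATE KEYSPACE statement as a string. The helper name `create_keyspace_cql` and the keyspace name `demo_ks` are hypothetical. It assumes the single-data-center `SimpleStrategy`; `NetworkTopologyStrategy` takes per-data-center replica counts instead and is not covered here.

```python
def create_keyspace_cql(name: str, replication_factor: int) -> str:
    """Build a CREATE KEYSPACE statement using SimpleStrategy (an assumption)."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} "
        f"WITH replication = {{'class': 'SimpleStrategy', "
        f"'replication_factor': {replication_factor}}};"
    )


# Hypothetical keyspace that keeps three copies of each row.
print(create_keyspace_cql("demo_ks", 3))
```

The resulting statement could be run from cqlsh or passed to a driver session. Building it as a string keeps the example runnable without a live cluster.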

Column Family

Column families in Cassandra are like tables in traditional relational databases. Each column family contains a collection of rows, which can be represented as a Map<RowKey, SortedMap<ColumnKey, ColumnValue>>. The row key makes it possible to access related data together.
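The nested-map view can be sketched in a few lines of Python. This is a toy in-memory model for intuition only, not how Cassandra stores data; the `ColumnFamily` class and the sample rows are made up for illustration.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy model of Map<RowKey, SortedMap<ColumnKey, ColumnValue>>."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, column_key, value):
        self._rows[row_key][column_key] = value

    def get_row(self, row_key):
        # Return the row's columns ordered by column key, as a SortedMap would.
        return sorted(self._rows.get(row_key, {}).items())


users = ColumnFamily()
users.put("alice", "email", "alice@example.com")
users.put("alice", "city", "Oslo")
users.put("bob", "email", "bob@example.com")
print(users.get_row("alice"))  # [('city', 'Oslo'), ('email', 'alice@example.com')]
```

One lookup by row key returns all of that row's columns together, and the two rows hold different numbers of columns.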

Column

A column in Cassandra is a data structure that contains a column name, a value, and a timestamp. The columns, and the number of columns, may vary from row to row, in contrast with a relational database, where data is rigidly structured.
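A small sketch shows the name/value/timestamp triple and one common use of the timestamp: choosing the most recent of two versions of the same column (last write wins). The `Column` dataclass and the `reconcile` helper are hypothetical names, and the tie-breaking here (keep the first argument on equal timestamps) is a simplification rather than Cassandra's actual rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int  # write time, e.g. microseconds since the epoch


def reconcile(a: Column, b: Column) -> Column:
    """Pick the newer of two versions of the same column (last write wins)."""
    if a.name != b.name:
        raise ValueError("can only reconcile versions of the same column")
    # Simplification: on equal timestamps the first argument is kept.
    return a if a.timestamp >= b.timestamp else b


old = Column("email", "alice@old.example", timestamp=1_000)
new = Column("email", "alice@new.example", timestamp=2_000)
print(reconcile(old, new).value)  # alice@new.example
```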

When and when not to use

Use

You need scalability to store massive amounts of data (> 1 TB).
You need scalability for read/write-intensive applications (> 50,000 IOPS).
You require high availability/disaster recovery (HA/DR) characteristics such as hot/hot deployments or global replication.
Your functional requirements include specialized use cases such as temporal/time-series data or a flexible schema.

Don’t Use

As a replacement for a relational database in cases where a relational data model is the most effective.
If your dataset is highly normalized and you have frequent dynamic reporting requirements across tables.