Cassandra
Apache Cassandra is an open-source, distributed NoSQL database used by thousands of companies for scalability and high availability without compromising performance. Linear scalability and proven fault tolerance on commodity hardware or cloud infrastructure make it well suited to mission-critical data.
Key Concepts
Cluster
In Cassandra, a cluster is a collection of nodes, grouped into one or more data centers, arranged in a ring architecture. Every cluster must be assigned a name, which the participating nodes then use.
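To make the ring idea concrete, here is a minimal sketch of how keys could be mapped onto nodes arranged in a ring. It is an illustration only: the `Ring` class and the node names are hypothetical, MD5 stands in for a hash function, and each node gets a single token, whereas a real cluster typically uses a different partitioner and many tokens per node.

```python
import bisect
import hashlib


class Ring:
    """Toy token ring: a key belongs to the first node at or after its token."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("a ring needs at least one node")
        ring = sorted((self._token(n), n) for n in nodes)
        self._tokens = [t for t, _ in ring]
        self._nodes = [n for _, n in ring]

    @staticmethod
    def _token(value: str) -> int:
        # Assumption: MD5 is used here only as a convenient, stable hash.
        digest = hashlib.md5(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def owner(self, key: str) -> str:
        # Walk clockwise to the next token, wrapping around at the end.
        i = bisect.bisect_left(self._tokens, self._token(key)) % len(self._tokens)
        return self._nodes[i]


ring = Ring(["node-a", "node-b", "node-c"])  # hypothetical node names
print(ring.owner("user:42"))
```

Because ownership depends only on the hash of the key, any node can work out which node holds a given key without a central lookup.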
Keyspace
If you are familiar with relational databases, a keyspace in Cassandra corresponds to a schema. The keyspace is the outermost container for data in Cassandra. The main attributes to set per keyspace are the replication factor, the replica placement strategy, and the column families.
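As a rough illustration of how the replication factor and placement strategy are set on a keyspace, the sketch below builds a CQL `CREATE KEYSPACE` statement as a string. The helper `create_keyspace_cql` and the keyspace name `demo` are hypothetical, and only `SimpleStrategy` is covered; multi-data-center strategies take different options.

```python
def create_keyspace_cql(name: str, replication_factor: int) -> str:
    """Build a CREATE KEYSPACE statement using SimpleStrategy (a sketch)."""
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} "
        f"WITH replication = {{'class': 'SimpleStrategy', "
        f"'replication_factor': {replication_factor}}};"
    )


# 'demo' is a made-up keyspace name; 3 replicas is a common choice, not a rule.
print(create_keyspace_cql("demo", 3))
```

The resulting statement can be run in `cqlsh` or passed to a client driver.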
Column Family
Column families in Cassandra are like tables in traditional relational databases. Each column family contains a collection of rows, which can be represented as a Map<RowKey, SortedMap<ColumnKey, ColumnValue>>. The row key makes it possible to access related data together.
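The nested-map shape described above can be sketched in a few lines. This is only a model of the data structure, not of how Cassandra stores data on disk; the `ColumnFamily` class and the sample keys are hypothetical, and a plain dict sorted on read stands in for a sorted map.

```python
class ColumnFamily:
    """Toy model of Map<RowKey, SortedMap<ColumnKey, ColumnValue>>."""

    def __init__(self):
        self._rows = {}  # row key -> {column key: column value}

    def put(self, row_key, column_key, value):
        self._rows.setdefault(row_key, {})[column_key] = value

    def get_row(self, row_key):
        # Return the row's columns ordered by column key.
        row = self._rows.get(row_key, {})
        return dict(sorted(row.items()))


users = ColumnFamily()
users.put("user:1", "name", "Ada")
users.put("user:1", "email", "ada@example.com")
users.put("user:2", "name", "Bob")  # this row has fewer columns
print(users.get_row("user:1"))
```

All columns for one row key come back from a single lookup, which is what lets related data be read together.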
Column
A column in Cassandra is a data structure that contains a column name, a value, and a timestamp. The columns, and the number of columns, may vary from row to row, in contrast with a relational database, where data is well structured.
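A column's three parts can be modeled as a small record. The sketch below also shows one use of the timestamp: when two versions of the same column exist, the one with the newer timestamp is kept. The `Column` class and `latest` helper are hypothetical, and the tie-breaking rule here (keep the first argument) is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int  # write time, e.g. microseconds since the epoch


def latest(a: Column, b: Column) -> Column:
    """Pick the version with the newer timestamp (ties keep `a`)."""
    return a if a.timestamp >= b.timestamp else b


old = Column("email", "ada@old.example", timestamp=1)
new = Column("email", "ada@new.example", timestamp=2)
print(latest(old, new).value)
```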
When and when not to use
Use
- Need scalability to store massive amounts of data (> 1 TB).
- Need scalability for read/write-intensive applications (> 50,000 IOPS).
- Require high availability/disaster recovery (HA/DR) characteristics such as hot/hot deployments or global replication.
- Functional requirements include specialized use cases such as temporal/time-series data or flexible schemas.
Don’t Use
- As a replacement for a relational database in cases where a relational data model is the most effective.
- If your dataset is highly normalized, and you have frequent dynamic reporting requirements across tables.