Cassandra
Cassandra overview
Cassandra is an open source distributed database management system with high scalability and availability.
Cassandra was originally developed at Facebook to store large amounts of data, and its source code was released in 2008.
After Facebook itself adopted HBase, a separate product modeled on Google’s Bigtable, adoption of Cassandra by other companies was limited for a while. More recently, companies such as Apple and Netflix have adopted Cassandra for their large-scale systems because of its availability, and it is attracting attention again.
Because Cassandra is an open source project, the source code can be obtained and used in production free of charge. DataStax also offers a commercial edition backed by vendor testing and support.
Major release 4.0.0 came out in July 2021.
Its main changes are as follows.
- Added experimental support for Java 11 and transient replication
- Added Virtual Tables, which publish metrics and YAML configuration information
- Added audit logging useful for compliance and debugging
- Added Full Query Logging (FQL) for live traffic capture and replay
- Optimized the internode messaging protocol
- Improved streaming used to exchange data between cluster nodes
Main features of Cassandra
The main features of Cassandra are as follows.
Masterless architecture
Most database cluster architectures use a master-slave design, whereas Cassandra is masterless.
In this design, all nodes are equal and no master supervises them, so there is no single point of failure. In addition, data written to one node is automatically replicated to other nodes according to the replication settings, which enables high availability.
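The placement idea behind a masterless cluster can be sketched as a toy token ring. This is only an illustrative model, not Cassandra's implementation: the `Ring` class, the node names, and the use of MD5 as the hash are assumptions made for the example (Cassandra's default partitioner uses Murmur3, and real clusters usually assign many tokens per node).

```python
import hashlib
from bisect import bisect_left


def token(value: str) -> int:
    """Map a string to a ring position. MD5 is an assumption made for this
    sketch; Cassandra's default partitioner uses Murmur3."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring: every node is a peer and there is no coordinator."""

    def __init__(self, nodes, replication_factor):
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication_factor must be between 1 and the node count")
        self.replication_factor = replication_factor
        # Each node owns one position on the ring, sorted by token.
        self._ring = sorted((token(name), name) for name in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, partition_key):
        """Return the nodes holding a copy of the key: the first node at or
        after the key's token, then the next ones clockwise."""
        start = bisect_left(self._tokens, token(partition_key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]

    def live_replicas(self, partition_key, down=()):
        """Replicas that can still serve the key when some nodes are down."""
        return [n for n in self.replicas(partition_key) if n not in down]


# Hypothetical four-node cluster keeping three copies of every row.
ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
print(ring.replicas("user:42"))
print(ring.live_replicas("user:42", down={"node-a"}))
```

Because every key has three owners in this model, losing any single node still leaves at least two replicas able to answer, which is the property the masterless design relies on.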
Schema definitions exist
Unlike document databases such as MongoDB and key-value stores such as Redis, Cassandra has schema definitions. This makes it easier for developers and operators to understand what the data contains during application development and operation. Unlike relational databases, however, Cassandra is not rigidly bound to the schema and allows somewhat flexible table structures, such as storing multiple values in a single column by using collection types.
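As a rough illustration, the following sketch defines a table whose `emails` column is a CQL `set<text>` collection and therefore holds several values per row. The keyspace, table, column names, and the UUID are hypothetical. The `session` argument is assumed to be any object with an `execute()` method, such as a session from a Cassandra client driver; the block does not import a driver itself.

```python
# Hypothetical keyspace, table, and column names, for illustration only.
SCHEMA_STATEMENTS = [
    # A keyspace with a single replica, suitable only for a local test node.
    """
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
    """,
    # The schema is declared up front, but one column can hold many values.
    """
    CREATE TABLE IF NOT EXISTS demo.users (
        user_id uuid PRIMARY KEY,
        name    text,
        emails  set<text>
    )
    """,
    # Insert a row with two values in the collection column.
    """
    INSERT INTO demo.users (user_id, name, emails)
    VALUES (123e4567-e89b-12d3-a456-426614174000, 'Alice',
            {'alice@example.com', 'alice@example.org'})
    """,
    # Add one more value to the existing set without rewriting the row.
    """
    UPDATE demo.users
    SET emails = emails + {'a.smith@example.com'}
    WHERE user_id = 123e4567-e89b-12d3-a456-426614174000
    """,
]


def apply_statements(session, statements=SCHEMA_STATEMENTS):
    """Run each CQL statement in order on a session-like object."""
    for statement in statements:
        session.execute(statement.strip())
```

The same statements can also be pasted into cqlsh one at a time, with a semicolon appended to each.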
Performance scales linearly
Cassandra increases its processing capacity as nodes are added.
The Cassandra project claims that performance scales linearly with the number of nodes, and a benchmark conducted by Netflix supports this.
This scaling performance is thought to be one reason Cassandra is adopted in large-scale systems such as Apple’s iCloud.
SQL-like queries available (CQL)
Cassandra can be operated with SQL-like queries called CQL (Cassandra Query Language).
Basic queries such as SELECT, UPDATE, and DELETE can be written almost exactly as in SQL. GROUP BY is supported as of CQL 3.4.3, which shipped with Cassandra 3.10. However, joins (JOIN) are not implemented.
Functions and sorting with ORDER BY are also subject to restrictions, so care is required.
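The queries below sketch what these restrictions look like in practice. The `demo.page_views` table and its columns are hypothetical, and `session` is again assumed to be any object whose `execute()` method returns an iterable of rows, such as a client driver session. In this sketch, ORDER BY is applied to a clustering column within a single partition and GROUP BY is applied to the partition key, which are forms CQL accepts.

```python
# Hypothetical table: one partition per page, rows ordered by view_time.
TABLE_DDL = """
CREATE TABLE IF NOT EXISTS demo.page_views (
    page_id   text,
    view_time timestamp,
    country   text,
    PRIMARY KEY ((page_id), view_time)
)
"""

SUPPORTED_QUERIES = {
    # ORDER BY works on a clustering column when the partition key is fixed.
    "latest_views": (
        "SELECT view_time, country FROM demo.page_views "
        "WHERE page_id = 'home' ORDER BY view_time DESC LIMIT 10"
    ),
    # GROUP BY works on primary key columns, here the partition key.
    "views_per_page": (
        "SELECT page_id, count(*) FROM demo.page_views GROUP BY page_id"
    ),
}

# Not expressible in CQL:
#   - ORDER BY country      (country is not a clustering column)
#   - any query using JOIN  (joins are not implemented)


def run_queries(session, queries=SUPPORTED_QUERIES):
    """Execute each query and collect the returned rows by query name."""
    return {name: list(session.execute(cql)) for name, cql in queries.items()}
```

Data that would be joined in a relational database is typically stored together in one table designed around the query that reads it.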
Cassandra license
Cassandra is released under the Apache License, Version 2.0. Anyone can use, modify, and redistribute it free of charge, for commercial or non-commercial purposes.
Cassandra operating environment
Cassandra is a Java application.
It requires the latest release of Oracle Java Standard Edition 8 or OpenJDK 8. Java 11 support is still experimental and is not recommended for production use.
To use cqlsh, Python 3.6 or later is required.
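A minimal pre-installation check for these requirements might look like the following. The helper names are hypothetical, and the parser assumes the usual `version "..."` text printed by `java -version`, where Java 8 identifies itself as `1.8`.

```python
import re
import sys


def java_major_version(version_output: str) -> int:
    """Extract the major Java version from the text printed by `java -version`."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no Java version string found")
    first, second = match.group(1), match.group(2)
    # Java 8 and earlier report themselves as 1.x, for example "1.8.0_292".
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def python_ok_for_cqlsh(version_info=sys.version_info) -> bool:
    """True when the interpreter is Python 3.6 or later."""
    return tuple(version_info[:2]) >= (3, 6)


def java_ok_for_production(version_output: str) -> bool:
    """True only for Java 8, the non-experimental option described above."""
    return java_major_version(version_output) == 8
```

In practice the version text could come from `subprocess.run(["java", "-version"], capture_output=True, text=True).stderr`, since `java -version` writes to standard error.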
Cassandra download
http://cassandra.apache.org/download/