Introduction to Keyspaces by AWS

Amazon Keyspaces is a fully managed database service from AWS that is compatible with Apache Cassandra. Cassandra is well known for fast reads and for keeping its performance at high levels of horizontal scale. So, with AWS’s expertise in infrastructure management and Cassandra’s performance at scale, you are unlikely to hit a performance bottleneck on the database side at least. I call this database a hybrid DB: like a document database, it lets you run select queries that filter on columns other than the key, but only in a very limited form, because such queries need “ALLOW FILTERING”. It is mainly designed for lookups by id, that is, by the partition key.
Before we dig deeper, there are a few terms we should acquaint ourselves with first.
- Keyspace: If you know schemas from relational databases, a keyspace is the closest equivalent. A keyspace in Cassandra is a namespace that groups tables and defines how their data is replicated across nodes. A cluster typically has one keyspace per application. Amazon Keyspaces, which takes its name from this concept, uses keyspaces in the same way, except that AWS manages the nodes and the replication for you.
- Partition Key: One of the most important concepts to understand, if not the most important. It is the column (or set of columns) whose value decides which partition, and therefore which node, stores a row, and it is what you use to select rows from a table within a keyspace. Because it controls how data is distributed across your nodes, choosing the right partition key is paramount for a performant data model.
- Clustering Key: Together with the partition key, this key is important for performant select queries. It determines the order in which rows are stored within a partition, so if the query result set must be sorted by a column, that column should be part of the clustering key.
- Consistency Level: Since Amazon Keyspaces, like Cassandra, is a distributed system, there are several levels of data consistency to choose from. In general, the higher the consistency, the lower the performance. Unlike Cassandra in its original form, Keyspaces supports only a few of them:
- ONE (read operations only)
- LOCAL_ONE (read operations only)
- LOCAL_QUORUM (both read and write)
For details, please refer to the following link:
https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/dml/dmlConfigConsistency.html
- Lightweight transactions: These are conditional writes, such as an INSERT with IF NOT EXISTS, and they matter because Cassandra does not offer transactions in the relational sense. As a non-relational database, Cassandra does not support joins or foreign keys, and consequently does not offer consistency in the ACID sense. (An example of ACID consistency: when moving money from account A to B, the total in the accounts does not change.) Cassandra supports atomicity and isolation at the row level, but trades transactional isolation and atomicity for high availability and fast write performance. Cassandra writes are durable. For further detail, please read the following page, from which the above extract is taken.
https://docs.datastax.com/en/cassandra-oss/2.2/cassandra/dml/dmlTransactionsDiffer.html
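To make the terms above a little more concrete, here is a minimal toy model in Python. It is not how Cassandra or Keyspaces is implemented: the node names, the `Table` class and the SHA-256 based `node_for` function are all assumptions made for illustration (Cassandra uses its own partitioner and token ring). It only shows the idea that the partition key picks where a row lives, the clustering key orders rows inside a partition, and only some consistency levels are accepted per operation.

```python
import hashlib
from collections import defaultdict

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names

# Consistency levels per operation, as listed in this article.
SUPPORTED_CONSISTENCY = {
    "read": {"ONE", "LOCAL_ONE", "LOCAL_QUORUM"},
    "write": {"LOCAL_QUORUM"},
}


def node_for(partition_key: str) -> str:
    """Map a partition key to a node via a hash (stand-in for the real partitioner)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


def is_supported(operation: str, level: str) -> bool:
    """Check whether a consistency level is allowed for 'read' or 'write'."""
    return level in SUPPORTED_CONSISTENCY[operation]


class Table:
    """Toy table: rows grouped by partition key, ordered by clustering key."""

    def __init__(self):
        # partition key -> {clustering key: row}
        self.partitions = defaultdict(dict)

    def insert(self, partition_key, clustering_key, row):
        self.partitions[partition_key][clustering_key] = row

    def select(self, partition_key):
        """Return the rows of one partition, sorted by clustering key."""
        partition = self.partitions.get(partition_key, {})
        return [partition[key] for key in sorted(partition)]


orders = Table()
orders.insert("customer-42", "2024-03-01", "order C")
orders.insert("customer-42", "2024-01-15", "order A")
orders.insert("customer-42", "2024-02-10", "order B")

print(orders.select("customer-42"))    # ['order A', 'order B', 'order C']
print(is_supported("write", "ONE"))    # False
```

Note that the rows come back sorted by the clustering key (the order date here) regardless of insertion order, and that the same partition key always maps to the same node.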
Starting with Keyspaces
Before you start with Keyspaces, complete the following steps:
- Get an AWS account. This is obvious, but I mention it to keep the record straight.
- Configure the AWS CLI on your system. Details can be found at
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html
- Connect to Keyspaces programmatically. I suggest this straight away, as Keyspaces will have to be used from the system or software application that you are using or developing.
Details can be found at https://docs.aws.amazon.com/keyspaces/latest/devguide/programmatic.html
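As a rough idea of what a programmatic connection can look like, here is a minimal Python sketch. It assumes the open-source `cassandra-driver` package is installed, that you authenticate with service-specific credentials (a user name and password generated for Keyspaces), and that `ca_file` points to the CA certificate the AWS guide asks you to download. The function names `keyspaces_endpoint` and `connect` are my own; please treat the AWS guide linked above as the authoritative source.

```python
import ssl


def keyspaces_endpoint(region: str) -> tuple:
    """Return (host, port) of the regional Keyspaces endpoint."""
    return (f"cassandra.{region}.amazonaws.com", 9142)


def connect(region: str, username: str, password: str, ca_file: str):
    """Open a session over TLS using service-specific credentials.

    Assumption: cassandra-driver is installed. The imports are done here
    so that keyspaces_endpoint() works even without the driver.
    """
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster

    host, port = keyspaces_endpoint(region)

    # Keyspaces only accepts encrypted connections, so build a TLS context
    # that verifies the server against the downloaded CA certificate.
    ssl_context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ssl_context.load_verify_locations(ca_file)

    auth = PlainTextAuthProvider(username=username, password=password)
    cluster = Cluster([host], port=port, ssl_context=ssl_context, auth_provider=auth)
    return cluster.connect()


print(keyspaces_endpoint("us-east-1"))  # ('cassandra.us-east-1.amazonaws.com', 9142)
```

A call would then look something like `session = connect("us-east-1", "my-user", "my-password", "/path/to/ca.crt")`, where all four values are placeholders for your own.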
Keyspaces features
- Very high scalability in real time.
- Single-digit-millisecond response times, which AWS advertises at any scale.
- Low cost, as in on-demand mode you pay per request (read or write), and there are no servers to manage.
- The reliability of any managed solution by AWS.
Keyspaces data modelling
Like any other database, how Keyspaces performs depends on how the data is modelled. In Keyspaces, as in Cassandra, there are two goals to target to make the most of your data:
- Spread data evenly around the cluster
- Minimize the number of partitions read
To achieve these goals, there are a few steps to follow:
Step 1: Determine which queries your application needs to support
Step 2: Try to create a table where you can satisfy each query by reading (roughly) one partition
The goals and steps above are taken from the link below; don’t forget to read it before starting.
https://www.datastax.com/blog/basic-rules-cassandra-data-modeling
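The two goals can be checked on paper before creating any table. Below is a small Python sketch that does this for a made-up data set of sensor readings (the data, the query and all function names are hypothetical). For each candidate partition key it counts how many rows land in each partition (goal 1) and how many partitions one query has to read (goal 2).

```python
from collections import Counter

# Hypothetical sensor readings: (sensor_id, day, value)
READINGS = [
    ("s1", "2024-01-01", 20), ("s1", "2024-01-02", 21),
    ("s2", "2024-01-01", 18), ("s2", "2024-01-02", 19),
    ("s3", "2024-01-01", 25), ("s3", "2024-01-02", 24),
]


def by_sensor(row):
    """Candidate partition key: the sensor id."""
    return row[0]


def by_day(row):
    """Candidate partition key: the day."""
    return row[1]


def wants_s1(row):
    """The query to support: all readings for sensor s1."""
    return row[0] == "s1"


def partition_sizes(rows, key_fn):
    """Goal 1: count rows per partition to see how evenly data is spread."""
    return Counter(key_fn(row) for row in rows)


def partitions_read(rows, key_fn, predicate):
    """Goal 2: number of distinct partitions holding rows that match the query."""
    return len({key_fn(row) for row in rows if predicate(row)})


print(dict(partition_sizes(READINGS, by_sensor)))    # {'s1': 2, 's2': 2, 's3': 2}
print(partitions_read(READINGS, by_sensor, wants_s1))  # 1
print(partitions_read(READINGS, by_day, wants_s1))     # 2
```

Both candidate keys spread this tiny data set evenly, but only partitioning by sensor answers the query from a single partition, which is why the queries have to be decided first.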
Keyspaces is not a panacea
Like every technology, Keyspaces has constraints that make it less useful in scenarios such as these:
- If your data does not need huge scaling
- If you don’t need super fast read/write operations
- If you need to run aggregations on your data
- If you are not clear about the queries that may be run on your data set in the near future
- If you need transactions and a very high level of data consistency