Apache Cassandra on AWS
Partner Solution Deployment Guide

August 2023
Raks Krishna, DataStax
Tony Vattathil, AWS Integration & Automation team

Refer to the GitHub repository to view source files, report bugs, submit feature ideas, and post feedback about this Partner Solution. To comment on the documentation, refer to Feedback.
This Partner Solution was created by DataStax in collaboration with Amazon Web Services (AWS). Partner Solutions are automated reference deployments that help people deploy popular technologies on AWS according to AWS best practices. If you’re unfamiliar with AWS Partner Solutions, refer to the AWS Partner Solution General Information Guide.
Overview
This guide covers the information you need to deploy the Apache Cassandra Partner Solution in the AWS Cloud.
As Cassandra adoption grows within your organization, so can the challenges involved in using, maintaining, and supporting the technology. These challenges can add considerable cost, complexity, and administrative burden. Cassandra for AWS helps address these challenges by streamlining operations and controlling costs for your Cassandra workloads.
This Partner Solution reference deployment guide provides step-by-step instructions for deploying Apache Cassandra 4.0 on the AWS Cloud. Cassandra is an open-source, scalable, active-everywhere NoSQL database used by some of the internet’s largest applications. Its masterless architecture enables zero downtime, zero lock-in, and global scale for data sovereignty.
Cassandra for AWS gives users and enterprises a deployment process, based on cloud best practices, to quickly install a Cassandra cluster on AWS in a single Region across multiple Availability Zones.
Each cluster also includes a virtual machine (VM) that provides a complete set of development resources, including code examples, documentation, and data integration tools.
Optionally, DataStax Luna provides subscription-based support for open-source Cassandra on AWS. DataStax Luna subscribers get the benefits of open-source software and direct access to the engineers who authored most of Cassandra’s code and provide support for some of the largest Cassandra deployments.
For details about Cassandra support, see DataStax Luna.
Costs and licenses
There is no cost to use this Partner Solution, but you will be billed for any AWS services or resources that this Partner Solution deploys. For more information, refer to the AWS Partner Solution General Information Guide.
Architecture
Deploying this Partner Solution with default parameters builds the following Apache Cassandra environment in the AWS Cloud.

As shown in Figure 1, this Partner Solution sets up the following:
- A highly available architecture that spans three Availability Zones.*
- A virtual private cloud (VPC) configured with public and private subnets, according to AWS best practices, to provide you with your own virtual network on AWS.*
- In the public subnets:
  - Managed network address translation (NAT) gateways to allow outbound internet access for resources in the private subnets.*
  - A DevOps instance where you can use tools to help manage the cluster.
- In the private subnets:
  - Amazon Elastic Compute Cloud (Amazon EC2) seed nodes used to bootstrap the gossip process for new nodes joining the cluster. These are the initial EC2 instances. They run outside the Amazon EC2 Auto Scaling group and help bootstrap the cluster.
  - Cassandra non-seed nodes (the fourth node onward) that are part of the Amazon EC2 Auto Scaling group. Making every node a seed node is not recommended because it increases maintenance and reduces gossip performance. Even where gossip optimization is not critical, a small seed list is still recommended.
  - An Amazon EC2 Auto Scaling group that scales the Cassandra nodes in the private subnets based on workload demand.
- An Amazon Simple Storage Service (Amazon S3) bucket for storing the AWS CloudFormation templates and scripts.
An Amazon Simple Storage Service (Amazon S3) bucket for storing the AWS CloudFormation templates and scripts.
* The template that deploys this Partner Solution into an existing VPC skips the components marked by asterisks and prompts you for your existing VPC configuration.
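Each node reads its seed list from the seed_provider section of cassandra.yaml. This assumes the default SimpleSeedProvider, where the list is a single comma-separated `seeds` value. To check which seed nodes a node uses, a small helper like the following prints them one per line. This is a sketch. The function name list_seeds is hypothetical, and the default path /etc/cassandra/cassandra.yaml (the usual location for package installations) is an assumption that may not match these instances.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the seed addresses configured in cassandra.yaml,
# one per line. The default path is an assumption; pass the real path if needed.
list_seeds() {
  local yaml="${1:-/etc/cassandra/cassandra.yaml}"
  # Keep only the (optionally quoted) value of a line such as
  #   - seeds: "10.0.1.10:7000,10.0.2.10:7000"
  # then split on commas, trim spaces around each entry, and drop empty entries.
  sed -n 's/^[[:space:]]*-[[:space:]]*seeds:[[:space:]]*"\{0,1\}\([^"]*\)"\{0,1\}[[:space:]]*$/\1/p' "$yaml" |
    tr ',' '\n' |
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
    grep -v '^$'
}
```

For example, `list_seeds | wc -l` on a node should report a small number, such as one seed per Availability Zone, rather than every node in the cluster.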
Deployment options
This Partner Solution provides the following deployment options:
- Deploy Apache Cassandra into a new VPC. This option builds a new AWS environment that consists of the VPC, subnets, NAT gateways, security groups, bastion hosts, and other infrastructure components. It then deploys Apache Cassandra into this new VPC.
- Deploy Apache Cassandra into an existing VPC. This option provisions Apache Cassandra in your existing AWS infrastructure.
This Partner Solution provides separate templates for these options. It also lets you configure Classless Inter-Domain Routing (CIDR) blocks, instance types, and Apache Cassandra settings.
Predeployment steps
Prepare your AWS account
If you don’t already have an AWS account, create one at https://aws.amazon.com by following the on-screen instructions. Part of the sign-up process involves receiving a phone call and entering a PIN using the phone keypad. Use the Region selector in the navigation bar to choose the AWS Region where you want to deploy this Partner Solution.

Choosing an AWS Region
Choose a Region closest to your data center or corporate network to reduce network latency between systems running on AWS and the systems and users on your corporate network.
Also, note that your choice of Region determines whether this Partner Solution deploys network address translation (NAT) gateways or NAT instances for network connections. For a list of Regions that support NAT gateways, see Amazon VPC pricing.
Create a key pair in your preferred Region. To do this, in the navigation pane of the Amazon EC2 console, choose Key Pairs, Create Key Pair, type a name, and then choose Create.

Creating a key pair
Amazon EC2 uses public-key cryptography to encrypt and decrypt login information. To log in to your instances, you must create a key pair. With Windows instances, a key pair is used to obtain the administrator password through the Amazon EC2 console and then log in using Remote Desktop Protocol (RDP), as explained in the Amazon Elastic Compute Cloud User Guide. When using Linux, a key pair is used to authenticate Secure Shell (SSH) login.
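The console steps above for creating a key pair can also be done with the AWS CLI. The sketch below uses the real `aws ec2 create-key-pair` command and saves the private key with owner-only permissions. The function name create_key_pair is hypothetical, and the key name and Region are yours to choose. The key pair must be in the same Region where you deploy the stack.

```shell
#!/usr/bin/env bash
# Sketch: create an EC2 key pair with the AWS CLI and save the private key
# locally as <name>.pem. The function name is hypothetical.
create_key_pair() {
  local name="$1" region="$2" keyfile="$1.pem"
  # Don't overwrite an existing private key.
  if [ -e "$keyfile" ]; then
    echo "$keyfile already exists" >&2
    return 1
  fi
  # Create the file with owner-only permissions from the start (umask 077).
  if ! (umask 077 && aws ec2 create-key-pair --region "$region" --key-name "$name" \
        --query 'KeyMaterial' --output text > "$keyfile"); then
    rm -f "$keyfile"
    return 1
  fi
  # SSH rejects private keys that other users can read.
  chmod 400 "$keyfile"
}
```

For example, `create_key_pair my-cassandra-key us-east-1` creates my-cassandra-key.pem, which you can later use as KEY_FILE when connecting to the instances.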
Prepare your DataStax account
(Optional) Create a free account in DataStax Academy for Apache Cassandra certification resources, courses, and role-based learning paths.
Deployment steps
- Sign in to your AWS account, and launch this Partner Solution, as described under Deployment options. The AWS CloudFormation console opens with a prepopulated template.
- Confirm that the AWS Region shown in the navigation bar is the one where you want to deploy, and change it if necessary.
- On the Create stack page, keep the default setting for the template URL, and then choose Next.
- On the Specify stack details page, change the stack name if needed. Review the parameters for the template. Provide values for the parameters that require input. For all other parameters, review the default settings and customize them as necessary. When you finish reviewing and customizing the parameters, choose Next.
  Unless you’re customizing the Partner Solution templates or are instructed otherwise in this guide’s Predeployment steps section, don’t change the default settings for the following parameters: QSS3BucketName, QSS3BucketRegion, and QSS3KeyPrefix. Changing the values of these parameters will modify code references that point to the Amazon Simple Storage Service (Amazon S3) bucket name, Region, and key prefix. For more information, refer to the AWS Partner Solutions Contributor’s Guide.
- On the Configure stack options page, you can specify tags (key-value pairs) for resources in your stack and set advanced options. When you finish, choose Next.
- On the Review page, review and confirm the template settings. Under Capabilities, select all of the check boxes to acknowledge that the template creates AWS Identity and Access Management (IAM) resources and might require the ability to automatically expand macros.
- Choose Create stack. The stack takes about 15-20 minutes to deploy.
- Monitor the stack’s status. When the status is CREATE_COMPLETE, the Apache Cassandra deployment is ready.
- To view the created resources, choose the Outputs tab.
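The console steps above can be sketched with the AWS CLI as follows. The function name launch_stack, the stack name, and the parameters file are hypothetical. Take the template URL from the Partner Solution's GitHub repository, and write the parameters file as a JSON list of ParameterKey/ParameterValue pairs for that template's parameters. The capabilities correspond to the check boxes on the Review page. Passing all three is a conservative assumption, because this guide doesn't list which ones the template requires.

```shell
#!/usr/bin/env bash
# Sketch of the console deployment steps using the AWS CLI.
# Hypothetical inputs: stack name, template URL, JSON parameters file.
launch_stack() {
  local stack="$1" template_url="$2" params_file="$3"

  # Equivalent of choosing Create stack with the Capabilities check boxes selected.
  aws cloudformation create-stack \
    --stack-name "$stack" \
    --template-url "$template_url" \
    --parameters "file://$params_file" \
    --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM CAPABILITY_AUTO_EXPAND ||
    return 1

  # Poll until CREATE_COMPLETE; returns non-zero if creation fails or the wait times out.
  aws cloudformation wait stack-create-complete --stack-name "$stack" || return 1

  # Equivalent of the Outputs tab.
  aws cloudformation describe-stacks --stack-name "$stack" \
    --query 'Stacks[0].Outputs' --output table
}
```

For example: `launch_stack cassandra-demo "$TEMPLATE_URL" params.json`. The waiter polls the stack periodically, so the call blocks for roughly the 15-20 minutes the deployment takes.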

Postdeployment steps
Cassandra backs up data by taking a snapshot of all on-disk data files (SSTable files) stored in the data directory. You can take a snapshot of all keyspaces, a single keyspace, or a single table while the system is online. For more information about backing up and storing data, see Backups on the Cassandra documentation website. For information about storing backups in Amazon S3, see Batch upload files to the cloud.
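As a minimal sketch of that process on one node, the script below takes a named snapshot of a keyspace with `nodetool snapshot` and copies each table's snapshot directory to S3 with `aws s3 cp`. `nodetool snapshot` flushes memtables and hard-links the current SSTables under `<data_dir>/<keyspace>/<table>-<id>/snapshots/<tag>`. Snapshots are per node, so you would run this on every node. The function names are hypothetical. The data directory default (/var/lib/cassandra/data, the usual package-install location) and the existence of a writable bucket are assumptions to adjust for your deployment.

```shell
#!/usr/bin/env bash
# Sketch: snapshot one keyspace on this node and copy the snapshot to S3.
# Assumptions (adjust to your deployment):
#   CASSANDRA_DATA_DIR  node data directory; the default is the usual
#                       package-install location and may differ here.
#   The bucket already exists and this node's credentials can write to it.
CASSANDRA_DATA_DIR="${CASSANDRA_DATA_DIR:-/var/lib/cassandra/data}"

# Snapshot directories look like <data_dir>/<keyspace>/<table>-<id>/snapshots/<tag>
list_snapshot_dirs() {
  find "$CASSANDRA_DATA_DIR/$1" -type d -path "*/snapshots/$2" 2>/dev/null | sort
}

backup_keyspace() {
  local keyspace="$1" tag="$2" bucket="$3" host
  host=$(uname -n)
  # Flush memtables and hard-link the current SSTables under the given tag.
  nodetool snapshot -t "$tag" "$keyspace" || return 1
  # Upload each snapshot to s3://<bucket>/<host>/<keyspace>/<table>-<id>/snapshots/<tag>/
  list_snapshot_dirs "$keyspace" "$tag" | (
    rc=0
    while IFS= read -r dir; do
      rel=${dir#"$CASSANDRA_DATA_DIR"/}
      aws s3 cp --recursive "$dir" "s3://$bucket/$host/$rel/" || rc=1
    done
    exit "$rc"
  )
}
```

For example: `backup_keyspace my_keyspace nightly-2023-08-01 my-backup-bucket`. After the upload succeeds, you can remove the local snapshot with `nodetool clearsnapshot -t <tag>` to free disk space.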
Test the deployment
After you deploy the Cassandra cluster, the fastest way to begin using it is to connect with SSH to the DevOps instance and, from there, to one of the node instances. You can use SSH agent forwarding with your key pair so that the private key doesn’t have to be copied to the DevOps instance (replace the KEY_FILE and DevIpAddress values with those of your cluster).
You can get the IP addresses of the nodes from the Outputs tab of the stack.

Stack Output
ssh-add $KEY_FILE
ssh -A ubuntu@$DevIpAddress
You can get the Seed1PrivateIpAddress from the Outputs tab of the stack.

Seed1 IP
Once logged in to the DevOps instance, run the following command:
ssh ubuntu@$Seed1PrivateIpAddress
If you chose to create the cluster in the public subnets, you can skip the steps above and connect with SSH directly to one of the nodes using its public IP address from the Outputs tab.
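The two SSH hops above can also be combined into one command with OpenSSH's ProxyJump option (`-J`, OpenSSH 7.3 or later), with the addresses looked up from the stack outputs through the AWS CLI. This sketch assumes that the output keys are named DevIpAddress and Seed1PrivateIpAddress, as the text above suggests. It also assumes your key is loaded in ssh-agent (`ssh-add $KEY_FILE`), because command-line options such as `-i` generally apply to the destination host and not to the jump host. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: reach the first seed node through the DevOps instance in one step.
# Assumes output keys DevIpAddress and Seed1PrivateIpAddress, and that the
# key pair is already loaded in ssh-agent.
stack_output() {
  aws cloudformation describe-stacks --stack-name "$1" \
    --query "Stacks[0].Outputs[?OutputKey=='$2'].OutputValue" --output text
}

ssh_to_seed() {
  local stack="$1" dev seed
  dev=$(stack_output "$stack" DevIpAddress) || return 1
  seed=$(stack_output "$stack" Seed1PrivateIpAddress) || return 1
  if [ -z "$dev" ] || [ -z "$seed" ]; then
    echo "stack outputs not found for $stack" >&2
    return 1
  fi
  # Connect to the DevOps instance first, then on to the seed node's private IP.
  ssh -J "ubuntu@$dev" "ubuntu@$seed"
}
```

For example: `ssh_to_seed cassandra-demo`, where cassandra-demo is your stack name.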
Then you can view the status of the Cassandra cluster:
nodetool status
For a six-node cluster, the output should list all six nodes with the status UN (Up/Normal):

NodeTool Status
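To make that check scriptable, the sketch below counts the node lines that `nodetool status` reports as UN and fails if fewer than the expected number are up. It also prints any node in another state, for example DN (Down/Normal) or UJ (Up/Joining). The function name check_cluster_up is hypothetical. It relies only on node lines starting with the two-letter status/state code.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if at least the expected number of nodes report UN.
check_cluster_up() {
  local expected="$1" out up
  out=$(nodetool status) || return 1
  # Node lines start with U/D (up/down) followed by N/L/J/M (node state).
  up=$(printf '%s\n' "$out" | grep -c '^UN[[:space:]]' || true)
  # Show any node that is not Up/Normal.
  printf '%s\n' "$out" | grep -E '^[UD][NLJM][[:space:]]' | grep -v '^UN' || true
  echo "$up of $expected expected nodes are Up/Normal (UN)"
  [ "$up" -ge "$expected" ]
}
```

For the default six-node cluster, run `check_cluster_up 6` on any node.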
The developer resource website is available at the Dev Url value shown in the Outputs tab.
Best practices for using Apache Cassandra on AWS
See the following resources:
Troubleshooting
For troubleshooting common Partner Solution issues, refer to the AWS Partner Solution General Information Guide and Troubleshooting CloudFormation.
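When a deployment fails, the stack events usually name the resource that failed first and the reason. The sketch below lists CREATE_FAILED events with `aws cloudformation describe-stack-events`. The function name is hypothetical. If the reported reason only says that an embedded (nested) stack was not successfully created, run the same command against that nested stack's name or ARN to find the underlying error.

```shell
#!/usr/bin/env bash
# Sketch: list resources that failed to create, with the recorded reason.
show_create_failures() {
  aws cloudformation describe-stack-events --stack-name "$1" \
    --query "StackEvents[?ResourceStatus=='CREATE_FAILED'].[LogicalResourceId,ResourceStatusReason]" \
    --output text
}
```

For example: `show_create_failures cassandra-demo`.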
Other useful information
AWS services
- AWS CloudFormation: AWS CloudFormation Documentation
- Amazon EC2: What is Amazon EC2?
- Amazon VPC: Amazon Virtual Private Cloud Documentation
Apache Cassandra
Customer responsibility
After you deploy a Partner Solution, confirm that your resources and services are updated and configured—including any required patches—to meet your security and other needs. For more information, refer to the Shared Responsibility Model.
Feedback
To submit feature ideas and report bugs, use the Issues section of the GitHub repository for this Partner Solution. To submit code, refer to the Partner Solution Contributor’s Guide. To submit feedback on this deployment guide, use the following GitHub links:
Notices
This document is provided for informational purposes only. It represents current AWS product offerings and practices as of the date of issue of this document, which are subject to change without notice. Customers are responsible for making their own independent assessment of the information in this document and any use of AWS products or services, each of which is provided "as is" without warranty of any kind, whether expressed or implied. This document does not create any warranties, representations, contractual commitments, conditions, or assurances from AWS, its affiliates, suppliers, or licensors. The responsibilities and liabilities of AWS to its customers are controlled by AWS agreements, and this document is not part of, nor does it modify, any agreement between AWS and its customers.
The software included with this paper is licensed under the Apache License, version 2.0 (the "License"). You may not use this file except in compliance with the License. A copy of the License is located at https://aws.amazon.com/apache2.0/ or in the accompanying "license" file. This code is distributed on an "as is" basis, without warranties or conditions of any kind, either expressed or implied. Refer to the License for specific language governing permissions and limitations.