Apache Superset on AWS

Partner Solution Deployment Guide

August 2023
Ke Yi, AWS Greater China Region Solution team
Troy Ameigh, AWS Integration & Automation team

Refer to the GitHub repository to view source files, report bugs, submit feature ideas, and post feedback about this Partner Solution. To comment on the documentation, refer to Feedback.

This Partner Solution was created by Apache in collaboration with Amazon Web Services (AWS). Partner Solutions are automated reference deployments that help people deploy popular technologies on AWS according to AWS best practices. If you’re unfamiliar with AWS Partner Solutions, refer to the AWS Partner Solution General Information Guide.

Overview

This guide covers the information you need to deploy the Apache Superset Partner Solution in the AWS Cloud.

This deployment supports many database management systems to store data that can be visualized through Apache Superset. Examples of supported database systems include Amazon Athena, Amazon Redshift, Amazon DynamoDB, ClickHouse, MySQL, and PostgreSQL.

This Partner Solution is for users who want to run Apache Superset as a business intelligence platform, following AWS best practices, as they move from data-driven cognition to data-driven decision making.

Inbound client traffic is routed through an Application Load Balancer to an Amazon Elastic Container Service (Amazon ECS) cluster. Amazon ECS is the core service for all Superset modules, including the core business framework, cache, database, and message queue. Each module runs separately as its own Amazon ECS service, and Amazon ECS relaunches any task that fails.

Discovery of all Amazon ECS services is handled by AWS Cloud Map through an internal, private DNS. Outbound traffic, such as for software updates, connects through network address translation (NAT) gateways. Persistent data, which includes queried data and system metadata, is stored in Amazon Elastic File System (Amazon EFS) for security and cost reasons.

Compared to the community version of Apache Superset, this Partner Solution provides the following:

  • Core modules (that is, Superset, cache, and database) that are highly available.

  • Business data (that is, metadata, query data, and interactive data) persistence.

  • Platform elasticity and scalability. Users do not need to maintain the infrastructure and associated resource scheduling.

  • Visualizations for existing data, with preinstalled data-source drivers for MySQL, PostgreSQL, Amazon Redshift, Amazon Athena, Amazon DynamoDB, and ClickHouse.

  • Future trend predictions, through a preinstalled time-series prediction algorithm that works on imported data.

  • Visual Kanban for real-time service metrics and comprehensive application monitoring.

Costs and licenses

There is no cost to use this Partner Solution, but you will be billed for any AWS services or resources that this Partner Solution deploys. For more information, refer to the AWS Partner Solution General Information Guide.

Architecture

Deploying this Partner Solution with default parameters builds the following Apache Superset environment in the AWS Cloud.

Figure 1. Partner Solution architecture for Apache Superset on AWS

As shown in Figure 1, this Partner Solution sets up the following:

  • A highly available architecture that spans two Availability Zones.*

  • A VPC configured with public and private subnets, according to AWS best practices, to provide you with your own virtual network on AWS.*

  • A managed internet gateway to direct inbound traffic to a Network Load Balancer, which manages traffic to the AWS Fargate cluster.

  • In the public subnets, managed network address translation (NAT) gateways to provide outbound internet access for resources in the private subnets.*

  • In the private subnets:

    • An Amazon ECS cluster using AWS Fargate to provide all Superset functions, including the core system, cache, database, message queue, and frontend.

    • Amazon EFS to persist metadata and cached query data, and to share files among the service modules.

    • AWS Cloud Map, a discovery service for application resources.

    • AWS Secrets Manager to generate and store a key named "SECRET_KEY", which is used to securely sign the session cookie and to encrypt sensitive information in the database.

  • Amazon Athena, a serverless, interactive service to query data and analyze big data in Amazon S3 using standard SQL.

  • Amazon Redshift, a fully managed, data-warehouse service.

  • Supported database systems, such as Amazon Athena, Amazon Redshift, Amazon DynamoDB, ClickHouse, MySQL, and PostgreSQL.

* The template that deploys this Partner Solution into an existing VPC skips the components marked by asterisks and prompts you for your existing VPC configuration.
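
The "SECRET_KEY" stored in AWS Secrets Manager is a random string. As a rough illustration of what such a value looks like, the following sketch produces a comparable key locally with the Python standard library. The 42-byte length and the function name generate_secret_key are assumptions made for this example; the Partner Solution templates may generate the key differently.

```python
import base64
import secrets


def generate_secret_key(num_bytes: int = 42) -> str:
    """Return a random, base64-encoded string usable as a signing key.

    Assumption: 42 random bytes, chosen for illustration only.
    """
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


key = generate_secret_key()
print(len(key))  # 56 characters for 42 random bytes
```

A key like this should be treated as a secret: keep it out of source control and logs.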

Deployment options

This Partner Solution provides the following deployment options:

This Partner Solution provides separate templates for these options. It also lets you configure Classless Inter-Domain Routing (CIDR) blocks, instance types, and Apache Superset settings.
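
When you customize the CIDR blocks, the subnets must fit inside the VPC range and must not overlap. The following is a minimal sketch of how you might plan two public and two private subnets, one of each per Availability Zone, before entering the values. The 10.0.0.0/16 range and the labels in the plan are assumptions for illustration, not the template's actual defaults or parameter names.

```python
import ipaddress

# Assumption: a 10.0.0.0/16 VPC range, used here only as an example.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Split the VPC into /18 blocks. Four blocks cover two public and two
# private subnets, one of each per Availability Zone.
blocks = list(vpc.subnets(new_prefix=18))

# Hypothetical labels; the template's parameter names may differ.
plan = {
    "PublicSubnet1": blocks[0],
    "PublicSubnet2": blocks[1],
    "PrivateSubnet1": blocks[2],
    "PrivateSubnet2": blocks[3],
}


def overlaps_any(networks) -> bool:
    """Return True if any two networks in the collection overlap."""
    nets = list(networks)
    return any(
        a.overlaps(b) for i, a in enumerate(nets) for b in nets[i + 1:]
    )


for name, block in plan.items():
    print(f"{name}: {block} ({block.num_addresses} addresses)")
print("Overlap found:", overlaps_any(plan.values()))
```

The same overlaps_any check can be run against an existing VPC's subnets before you deploy into it.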

Predeployment steps

  • For initial exploration, use the following default settings:

    Launch Options

Deployment steps

  1. Sign in to your AWS account, and launch this Partner Solution, as described under Deployment options. The AWS CloudFormation console opens with a prepopulated template.

  2. Choose the correct AWS Region, and then choose Next.

  3. On the Create stack page, keep the default setting for the template URL, and then choose Next.

  4. On the Specify stack details page, change the stack name if needed. Review the parameters for the template. Provide values for the parameters that require input. For all other parameters, review the default settings and customize them as necessary. When you finish reviewing and customizing the parameters, choose Next.

    Unless you’re customizing the Partner Solution templates or are instructed otherwise in this guide’s Predeployment section, don’t change the default settings for the following parameters: QSS3BucketName, QSS3BucketRegion, and QSS3KeyPrefix. Changing the values of these parameters will modify code references that point to the Amazon Simple Storage Service (Amazon S3) bucket name and key prefix. For more information, refer to the AWS Partner Solutions Contributor’s Guide.

  5. On the Configure stack options page, you can specify tags (key-value pairs) for resources in your stack and set advanced options. When you finish, choose Next.

  6. On the Review page, review and confirm the template settings. Under Capabilities, select all of the check boxes to acknowledge that the template creates AWS Identity and Access Management (IAM) resources and that it might require the ability to automatically expand macros.

  7. Choose Create stack. The stack takes about 30 minutes to deploy.

  8. Monitor the stack’s status, and when the status is CREATE_COMPLETE, the Apache Superset deployment is ready.

  9. To view the created resources, choose the Outputs tab.
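
If you prefer to script the monitoring in step 8 rather than watch the console, the logic can be sketched as a polling loop. In this sketch, get_status is a hypothetical callable that you supply, for example a thin wrapper around an AWS SDK call that returns the stack's current status string. The polling interval and timeout are also assumptions.

```python
import time
from typing import Callable

# Stack statuses after which a create operation makes no further progress.
TERMINAL_STATUSES = {
    "CREATE_COMPLETE",
    "CREATE_FAILED",
    "ROLLBACK_COMPLETE",
    "ROLLBACK_FAILED",
}


def wait_for_stack(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
) -> str:
    """Poll get_status until the stack reaches a terminal status.

    get_status is a hypothetical caller-supplied function that returns
    the current stack status as a string.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"stack still in {status} after timeout")
        time.sleep(poll_seconds)


# Usage with a simulated sequence of statuses instead of a real stack.
simulated = iter(["CREATE_IN_PROGRESS", "CREATE_IN_PROGRESS", "CREATE_COMPLETE"])
print(wait_for_stack(lambda: next(simulated), poll_seconds=0))  # CREATE_COMPLETE
```

Only CREATE_COMPLETE means the deployment is ready. Any other terminal status points to a failed or rolled-back stack that needs troubleshooting.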

Postdeployment steps

  • After the stack is deployed successfully, check the outputs for the Superset dashboard address.

    Result Output

  • Log in to the Superset dashboard using your preconfigured user name and password.

    Login Page
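
The stack outputs can also be read programmatically. The following sketch flattens the Outputs list of a response shaped like the one that the CloudFormation DescribeStacks API returns. The sample stack name, the output key ExampleDashboardURL, and the URL are placeholders for illustration; check the Outputs tab of your own stack for the actual key names.

```python
import json

# Hypothetical sample shaped like a CloudFormation DescribeStacks response.
# The stack name, output key, and URL are placeholders, not real values.
SAMPLE_RESPONSE = """
{
  "Stacks": [
    {
      "StackName": "superset-example",
      "StackStatus": "CREATE_COMPLETE",
      "Outputs": [
        {"OutputKey": "ExampleDashboardURL",
         "OutputValue": "http://example-load-balancer.example.com"}
      ]
    }
  ]
}
"""


def stack_outputs(response: dict) -> dict:
    """Flatten the Outputs list of the first stack into a key/value dict."""
    stacks = response.get("Stacks", [])
    if not stacks:
        return {}
    return {
        item["OutputKey"]: item["OutputValue"]
        for item in stacks[0].get("Outputs", [])
    }


outputs = stack_outputs(json.loads(SAMPLE_RESPONSE))
for key, value in outputs.items():
    print(f"{key} = {value}")
```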

Troubleshooting

For troubleshooting common Partner Solution issues, refer to the AWS Partner Solution General Information Guide and Troubleshooting CloudFormation.

Best practices for using Apache Superset on AWS

If you choose Yes for the WithExample option, you can explore a sample dashboard with a predefined dataset after you log in to the Superset console. Otherwise, you must connect to your own data source (for example, Amazon Redshift or Amazon S3), prepare your datasets, and build a dashboard from the provided visualization plugins. For more information, see the Superset documentation.
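
Superset connects to databases through SQLAlchemy-style connection URIs. The following sketch shows one way to assemble such a URI and URL-encode the credentials so that special characters in a password do not break it. The helper name build_uri, the host names, and the credentials are placeholders. The dialect prefixes shown are the ones commonly used for Amazon Redshift and PostgreSQL, but verify them against the Superset documentation for your version.

```python
from urllib.parse import quote_plus


def build_uri(dialect: str, user: str, password: str,
              host: str, port: int, database: str) -> str:
    """Build a SQLAlchemy-style connection URI.

    The user name and password are URL-encoded so that characters such
    as "@" or "/" do not break the URI.
    """
    return (
        f"{dialect}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# All host names and credentials below are placeholders.
redshift_uri = build_uri("redshift+psycopg2", "analyst", "p@ss/word",
                         "example-cluster.example.com", 5439, "dev")
postgres_uri = build_uri("postgresql", "analyst", "p@ss/word",
                         "db.example.com", 5432, "sales")
print(redshift_uri)
print(postgres_uri)
```

In practice, avoid hardcoding credentials; read them from a secure store at the time you register the database in Superset.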

Security

Apache Superset provides a granular security model through users, roles, and customized permissions. You can assign a user multiple roles, each of which comprises a set of permissions to access different Superset resources, including Model & Action, Views, Data Sources, and Databases. By creating filters and assigning them to a particular table, you can control data access at the row level, which means your users see only the rows that meet the conditions you specify. For more information, see Superset Security.
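
To make the idea of row-level filtering concrete, the following sketch applies a per-role filter clause to a query against an in-memory SQLite table. This is a conceptual illustration only, not Superset's implementation. The role names, the sales table, and the choice to return no rows for a role without a filter are all assumptions for this example.

```python
import sqlite3

# Hypothetical mapping from a role to a row-level filter clause.
ROLE_FILTERS = {
    "emea_analyst": "region = 'EMEA'",
    "apac_analyst": "region = 'APAC'",
}


def rows_for_role(conn: sqlite3.Connection, role: str) -> list:
    """Return only the rows that the role's filter clause allows."""
    clause = ROLE_FILTERS.get(role)
    if clause is None:
        return []  # In this sketch, roles without a filter see nothing.
    # The clause comes from trusted configuration, never from user input.
    query = f"SELECT region, amount FROM sales WHERE {clause} ORDER BY amount"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EMEA", 100), ("APAC", 250), ("EMEA", 75)],
)
print(rows_for_role(conn, "emea_analyst"))  # [('EMEA', 75), ('EMEA', 100)]
```

In Superset itself, you define such filters in the security settings and attach them to tables and roles rather than writing the query logic yourself.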

Customer responsibility

After you deploy a Partner Solution, confirm that your resources and services are updated and configured—including any required patches—to meet your security and other needs. For more information, refer to the Shared Responsibility Model.

Feedback

To submit feature ideas and report bugs, use the Issues section of the GitHub repository for this Partner Solution. To submit code, refer to the Partner Solution Contributor’s Guide. To submit feedback on this deployment guide, use the following GitHub links:

Notices

This document is provided for informational purposes only. It represents current AWS product offerings and practices as of the date of issue of this document, which are subject to change without notice. Customers are responsible for making their own independent assessment of the information in this document and any use of AWS products or services, each of which is provided "as is" without warranty of any kind, whether expressed or implied. This document does not create any warranties, representations, contractual commitments, conditions, or assurances from AWS, its affiliates, suppliers, or licensors. The responsibilities and liabilities of AWS to its customers are controlled by AWS agreements, and this document is not part of, nor does it modify, any agreement between AWS and its customers.

The software included with this paper is licensed under the Apache License, version 2.0 (the "License"). You may not use this file except in compliance with the License. A copy of the License is located at https://aws.amazon.com/apache2.0/ or in the accompanying "license" file. This code is distributed on an "as is" basis, without warranties or conditions of any kind, either expressed or implied. Refer to the License for specific language governing permissions and limitations.