Getting Started with Apache Superset
2021/03/01
Eric

What is Apache Superset?
Apache Superset is one of the hottest free / open-source business intelligence tools available. Superset makes it very easy to quickly explore data at petabyte scale.
Lets Get Started!
The documentation is rather frustrating deploy Superset. It is much faster / easier to deploy an image directly from the AWS Marketplace. Apache Superset Certified by RO Labs, provides an ease of deployment and contains several of the most popular SQL Alchemy data connectors.
Quick Start Steps
- Navigate to the Apache Superset Certified by RO Labs
- Click the
Continue to Subscribe
button, accept the terms. - Accept the default Configuration and Click
Continue to Launch
- On the next configuration page under
Security Settings
selectCreate New Based On Seller Settings
. - Select Launch to kick off the creation of the AWS Image. Follow the link to your
EC2 Console
(I launched my image in AWS East Region which can be found here) - The running instance is listed in the EC2 Console give it 30 seconds or so to get started. After navigate to
http://<your ip here>
to pull up the superset login page. - To login use the username is
admin
with the password being your instance-id.Logging into Superset
Note: If you have any issues with launching the image a fairly indepth guide is available on the RO Labs website
Setting up a Amazon RDS Database
We will need data to create our Superset Charts, if you have a database handy great! If not instructions for setting up a Amazon RDS Database instance can be found in the AWS documentation.
I chose MySQL as its what I am comfortable with, but Superset has many different database connectors. One thing to consider, if you do not set Database OptionsPublic Access
to Yes
make sure that RDS is in the same VPC as Apache Superset. Using your favorite program build out a table structure and load data. I played around with data found in this reddit post
- First navigate to Amazon RDS.
- Under the Create database section click the
Create database
button. - Use the following Configuration, and then hit the
Create Database
Button- Engine Type: MySQL
- Templates: Free Tier and give it a unique name and password.
- Public Access: Yes
- Give it a few moments to provision then it should be listed under Amazon RDS Databases. From here you can click the DB instance and find the endpoint URL.
Creating a Table and Loading Data
If you do not have a data source handy to perform a little testing you can follow this section to load sample data.
- Using your favorite database management tool connect to the database.
- Create a database
- Use the table definition below to create a table:
-- superset.CPIvsPopulationByDate definition
CREATE TABLE `CPIvsPopulationByDate` (
`DATE` date DEFAULT NULL,
`CPI` double DEFAULT NULL,
`Population` int DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
- Download sample INSERT statements and apply them to populate table.
Designing Our First Chart
In this section we will connect to a database using Apache Superset. If you dont have a database handy check out the steps below to create one in Amazon RDS quickly.
Getting around in Superset
Navigating to add a database
Adding a Database
Adding a Database into Supersetmysql://<username>:<password>@<database url>/<database name>
so as an example to connect to a SQL database that would be mysql://admin:password@database-2.awsurl.us-east-1.rds.amazonaws.com/superset
Database Name | Connection String |
---|---|
Amazon Redshift | redshift+psycopg2://:@:5439/ |
Apache Druid | druid://:@:/druid/v2/sql |
Apache Hive | hive://hive@{hostname}:{port}/{database} |
Apache Impala | impala://{hostname}:{port}/{database} |
Apache Spark SQL | hive://hive@{hostname}:{port}/{database} |
Azure MS SQL | mssql+pymssql://UserName@presetSQL: TestPassword@presetSQL. database.windows.net:1433/TestSchema |
MySQL | mysql://:@/ |
PostgreSQL | postgresql://:@/ |
Presto | presto:// |
SQL Server | mssql:// |
Teradata | teradata://{user}:{password}@{host} |
Before clicking the Add button, click the Test Connection button to verify that Superset can reach your database.
Creating a Dataset
Adding a Dataset
After adding the dataset – we are all set to create our first chart!
Creating a Chart
Creating a new Chart Simple Line Chart Example
Depending on your dataset you may have to adjust the time range to ensure data is displayed, you can do so by clicking the gray button in the red box. Click the Run button after making any changes to update the chart.
Superset features many different types of charts with the ability to aggregate data based on count, average, sum, etc. You can adjust the primary data used by clicking the Metrics button and selecting the column along with the aggregation type.

Dual Axis Chart Example