Getting Started with Apache Superset


2021/03/01

Eric


What is Apache Superset?

Apache Superset is one of the hottest free / open-source business intelligence tools available. Superset makes it very easy to quickly explore data at petabyte scale.

Lets Get Started!

The documentation is rather frustrating deploy Superset. It is much faster / easier to deploy an image directly from the AWS Marketplace. Apache Superset Certified by RO Labs, provides an ease of deployment and contains several of the most popular SQL Alchemy data connectors.

Quick Start Steps

  1. Navigate to the Apache Superset Certified by RO Labs
  2. Click the Continue to Subscribe button, accept the terms.
  3. Accept the default Configuration and Click Continue to Launch
  4. On the next configuration page under Security Settings select Create New Based On Seller Settings.
  5. Select Launch to kick off the creation of the AWS Image. Follow the link to your EC2 Console (I launched my image in AWS East Region which can be found here)
  6. The running instance is listed in the EC2 Console give it 30 seconds or so to get started. After navigate to http://<your ip here> to pull up the superset login page.
  7. To login use the username is admin with the password being your instance-id.
    Logging into Superset

    Logging into Superset

Note: If you have any issues with launching the image a fairly indepth guide is available on the RO Labs website

Setting up a Amazon RDS Database

We will need data to create our Superset Charts, if you have a database handy great! If not instructions for setting up a Amazon RDS Database instance can be found in the AWS documentation. I chose MySQL as its what I am comfortable with, but Superset has many different database connectors. One thing to consider, if you do not set Public Access to Yes make sure that RDS is in the same VPC as Apache Superset. Using your favorite program build out a table structure and load data. I played around with data found in this reddit post

Database Options

Database Options

  1. First navigate to Amazon RDS.
  2. Under the Create database section click the Create database button.
  3. Use the following Configuration, and then hit the Create Database Button
    • Engine Type: MySQL
    • Templates: Free Tier and give it a unique name and password.
    • Public Access: Yes
  4. Give it a few moments to provision then it should be listed under Amazon RDS Databases. From here you can click the DB instance and find the endpoint URL.

Creating a Table and Loading Data

If you do not have a data source handy to perform a little testing you can follow this section to load sample data.

  1. Using your favorite database management tool connect to the database.
  2. Create a database
  3. Use the table definition below to create a table:
-- superset.CPIvsPopulationByDate definition

CREATE TABLE `CPIvsPopulationByDate` (
  `DATE` date DEFAULT NULL,
  `CPI` double DEFAULT NULL,
  `Population` int DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
  1. Download sample INSERT statements and apply them to populate table.

Designing Our First Chart

In this section we will connect to a database using Apache Superset. If you dont have a database handy check out the steps below to create one in Amazon RDS quickly.

Getting around in Superset

Navigating to add a database

Navigating to add a database

After logging into Superset you are presented with a blank table. We first must add our database connection. To do so on the menu select Data -> Database. Then click the Add Database button.

Adding a Database

Adding a Database

Adding a Database into Superset

Enter the Database identifier, and the SQL Alchemy Connection string. That is in the format mysql://<username>:<password>@<database url>/<database name> so as an example to connect to a SQL database that would be mysql://admin:password@database-2.awsurl.us-east-1.rds.amazonaws.com/superset

Database Name Connection String
Amazon Redshift redshift+psycopg2://:@:5439/
Apache Druid druid://:@:/druid/v2/sql
Apache Hive hive://hive@{hostname}:{port}/{database}
Apache Impala impala://{hostname}:{port}/{database}
Apache Spark SQL hive://hive@{hostname}:{port}/{database}
Azure MS SQL mssql+pymssql://UserName@presetSQL: TestPassword@presetSQL. database.windows.net:1433/TestSchema
MySQL mysql://:@/
PostgreSQL postgresql://:@/
Presto presto://
SQL Server mssql://
Teradata teradata://{user}:{password}@{host}

Before clicking the Add button, click the Test Connection button to verify that Superset can reach your database.

Creating a Dataset

Adding a Dataset

Adding a Dataset

After adding a database next we must add a dataset, which corresponds to a table. On the menubar click the Dataset button, followed by the + Dataset button in the top right corner. For your dataset select the options based on a database, schema, and table.

After adding the dataset – we are all set to create our first chart!

Creating a Chart

Creating a new Chart

Creating a new Chart

A Chart can be created by going to the Charts Menu item. Select the + Chart button, and select your dataset, and chart type. Click the Create New Chart button to continue. Note: Your dataset will need a date/time column as the charts rely on time-series data.
Simple Line Chart

Simple Line Chart Example

Depending on your dataset you may have to adjust the time range to ensure data is displayed, you can do so by clicking the gray button in the red box. Click the Run button after making any changes to update the chart.

Superset features many different types of charts with the ability to aggregate data based on count, average, sum, etc. You can adjust the primary data used by clicking the Metrics button and selecting the column along with the aggregation type.

Dual Axis Chart

Dual Axis Chart Example