Data Modeling Quick Start: Hands On Lab

Datasets are often created by data teams in order to enable business users' data exploration. Datasets may be created for many reasons, including:

  • Joining and flattening dimensional models
  • Masking sensitive data
  • Materializing data to improve performance
  • Ensuring common calculations and Key Performance Indicators (KPIs) are consistent

This step-by-step guide will walk you through creating a dataset that flattens out a dimensional model.

Summary of Content

Requirements
Creating Datasets
Modeling Data
      Pre-calculated Metrics/KPIs
      Joining Additional Tables
      Extracting JSON
      Filtering
      Linking Other Tables
      Badging/Endorsing Datasets
Permissions & Sharing
      Direct Share
      Move to Workspace
Materialization
      Requirements
      Adding Materialization to Datasets

Requirements

  • You must be assigned the Creator account type or have a custom account type that includes modeling permissions.

Create Datasets

  1. At the top of the home page, click Create New and select Dataset.01-home-page-new-menu.png
  2. On the Select a Data Source page, choose the Table option.
    02-data-source-picker.png
  3. Find the Connections section in the In the left panel and select Sigma Sample Database.
  4. Locate the Examples database and the PLUGS_ELECTRONICS schema.
  5. Click on the F_POINT_OF_SALE table to see a preview of the data.
    03-table-select.png
  6. Click Get Started at the top right corner of the page header.

Modeling Data

Create Calculations

Now that you have located the table from which to build your dataset, its time to add additional columns with calculations. Adding calculated columns in the dataset ensures consistency in the way metrics and KPIs are computed by your organization. Calculated columns are a valuable time-saver - as analysts do not need to add common calculations to their workbooks every time. 

  1. Click the Worksheet tab at the top of the screen.
    04-worksheet-tab.png
  2. Click the caret icon (caret.png) on the Sales Amount column.
  3. Click Add New Column.
    05-add-new-column.png
  4. In the formula bar, enter [Sales Amount] * [Sales Quantity].
    06-enter-formula.png
  5. Click Enter on your keyboard.
  6. Double click the new column's name and rename it to Revenue
  7. Click the caret icon (caret.png) on the Revenue column.
  8. Select Format.
  9. Select Currency.
    07-format-currency.png
  10. Repeat steps 1-5 to add another column called COGS (Cost of Goods Sold).
    • In the formula bar enter [Cost Amount] * [Sales Quantity].
    • Click Enter on your keyboard.
    • Click the caret icon (caret.png)on the COGS column select Format
    • Select Currency.

Joining Additional Tables 

Let's join this dataset to additional tables to add more context to the dataset we created above.

Note: Sigma supports left, right, full, and inner join types, plus lookups. You may also join more than one column as well as define arbitrary functions within the join (e.g. truncate date). Learn more.

  1. Click to open the database icon, located in the right-side panel.
    08-database-icon.png
  2. Click on the plus icon (add-datasource.png) located below the database icon. This will open the Add Source page.
    09-add-data-source.png
  3. Select Table as data source type.
    10-select-table-for-join.png
  4. On the left side of the page, under Connections, select Sigma Sample Database.
  5. Locate the Examples database and the PLUGS_ELECTRONICS schema
  6. Click on the F_SALES table to see a preview of the data.
  7. Click Next at the top right corner of the page header.
  8. Under Join Type, use the default Left Join.
  9. Set the left join key selector to ORDER_NUMBER
  10. Set the right join key selector to ORDER_NUMBER
    11-join-type.png
    Note: These two selectors are usually pre-selected if the same field exists in both tables.
  11. Click Done to go back to the dataset worksheet.
    Note: If you look in the left-side panel, you will see that the columns from F_SALES are present in the worksheet, and the F_SALES table is listed as a data source.
  12. Repeat steps 2 to 5 for each of the following tables:
    • D_STORE
      Join Key - STORE_KEY
      Columns -STORE_NAME, STORE_CITY, STORE_STATE, STORE_JSON_FIELD
    • D_CUSTOMER
      Join Key - CUST_KEY
      Columns - CUST_NAME, CUST_JSON_FIELD

12-4-joins.png

Note: All columns added via joins will have the table name appended to the column in parentheses. Double click on column names to rename them. Move columns by dragging them to a new position in the worksheet. 

 

Extracting JSON

Sigma gives you the ability to extract JSON.

  1. Click the caret icon (caret.png) on the Cust Json Field (D_CUSTOMER) column.
  2. Select Extract Columns.
    13-extract-menu.png
  3. On the Extract Fields modal select the AGE_GROUP and LOYALTY_PROGRAM fields.
    14-extract-modal-1.png
  4. Click the caret icon (caret.png) on the LOYALTY_PROGRAM
  5. Select Logical.
    15-extract-modal-2.png
  6. Click the Confirm button.

Filter Your Data

Filtering datasets limits the volume of data available to analysts during data exploration.

Add Relative Date Filters

  1. Click the caret icon (caret.png) on the [Date (F_SALES)].
  2. Select Add Filter.
    16-add-filter.png
  3. Select Date Range as the Filter Type.
  4. Click the Select Date Range field to bring up the date selector menu.
    17-select-filter-type.png
  5. On the left side of the modal, select Relative.
  6. Enter 1 in the text entry field.
  7. In the ... dropdown, select Weeks.
    Note:
    This filter ensures the data is always filtered for the most recent week.

    18-relative-date-filter.png
  8. Click Save.

Add Text Filters

  1. Click the down arrow on the Store State column and select Add Filter.
  2. Click the Filter Values dropdown menu.
    19-select-filter-type-include.png
  3. Check the boxes for California, Texas, and Michigan.
    20-include-filter.png
  4. Click Save.

 

Link Tables

By defining links in datasets, users have to option to add joined data to their workbook. In addition, Sigma automatically links tables that have relational metadata (foreign keys) defined in Snowflake.

  1. Click the Links tab at the top of the screen.
    21-links-tab.png
  2. Click Add links to other source.
    22-add-links.png
  3. Click Connections at the bottom of the left panel.
    23-select-connection.png
  4. Select Sigma Sample Database.
  5. Locate the Examples and PLUGS_ELECTRONICS folders.
  6. Click on the D_PRODUCT table to see a preview of the data.
    24-select-link-table.png
  7. Click Next.
  8. Locate the Join Type selection and select Left Join.
  9. Set the left join key selector to PRODUCT KEY Link (F_POINT_OF_SALE).
  10. Set the right join key selector to PRODUCT KEY.
    25-select-join-keys-links.png
  11. Click Save
  12. Click the Publish button at the top of the screen to save all your dataset changes.

Badging Datasets

Add badges to datasets to indicate whether the content is Endorsed, has a Warning, or has been Deprecated.  Optional badge notes can be used to provide additional context for all organization members.

Add or Update a Badge

  1. Click the information icon (info-icon.png) in the page header. This will open the page's info popup.
  2. Select the Endorsed badge type from the dropdown list.
    26-badging.png

Setting Permissions & Sharing Datasets

Once your modeling is complete, the dataset can be shared. You may do so by direct share to a team of users or individuals or more broadly to a workspace. Once permissions have been granted, people may use the dataset as the basis for their analysis. Learn more.

Direct Share

  1. Click the Permissions tab at the top of the screen.
  2. Click Add Permission.
    27-add-permission.png
  3. Search for a member email address or a team name.
    28-search-for-team.png
  4. Select the type of permission you would like to give the selected member or team.
    29-grant-permission.png
    Note: users will be able to access the dataset from the “Shared with Me” tab on the home page.

Move to Workspace

  1. Click the ... (kebab.png) button in the dataset header.
  2. Click Move.
    30-move-dataset.png
  3. Click the Workspaces tab.31-select-workspace.png
  4. Select the workspace to which you would like to move the dataset.
    32-workspace.png
  5. Click Move

Materialization

Materialization allows you to write datasets back to your warehouse as tables. These tables act as simplified versions of the complex SQL queries you build in Sigma and can improve the speed and performance of your downstream reports. Each table will be written and rewritten to your warehouse on the schedule that you set. Sigma writes a view on top of the table so that other apps outside of Sigma can access the prepared/transformed data that Sigma creates.

Requirements

  • Write access must be enabled on your dataset’s connection.
  • You must be an organization Admin.

Adding Materialization to Datasets

  1. Click the Materialization tab at the top of the screen.
    33-materialization-tab.png
  2. Click Create Schedule.
    34-create-schedule.png
  3. Select the frequency to materialize the data.
    35-schedule.png
  4. Click Save.

Once saved, Sigma will run the first materialization. You can view all scheduled materializations on the dataset Materialization tab.

 

Related Resources

Data Modeling Best Practices
Annotating Tables
Dataset Lineage
Edit Warehouse Data from Sigma
Dataset Warehouse Views