Tutorial: Data modeling with datasets

Data teams often create datasets to enable data exploration by business users, but datasets can be created for many reasons, including:

  • Joining and flattening dimensional models
  • Masking sensitive data
  • Materializing data to improve performance
  • Ensuring common calculations and Key Performance Indicators (KPIs) are consistent

This step-by-step guide walks you through creating a dataset that flattens out a dimensional model.

Requirements

  • You must be assigned the Creator account type or have a custom account type that includes modeling permissions.

Create Datasets

  1. At the top of the home page, click Create New and select Dataset.
    Create new dataset.
  2. On the Select a Data Source page, choose the Table option.
    From the Data Source option, select Table.
  3. Find the Connections section in the In the left panel and select Sigma Sample Database.
  4. Locate the Examples database and the PLUGS_ELECTRONICS schema.
  5. Click on the F_POINT_OF_SALE table to see a preview of the data.
    Preview the table F_POINT_OF_SALE.
  6. Click Get Started at the top right corner of the page header.

Modeling Data

Create Calculations

Now that you have located the table from which to build your dataset, its time to add additional columns with calculations. Adding calculated columns in the dataset ensures consistency in the way metrics and KPIs are computed by your organization. Calculated columns are a valuable time-saver - as analysts do not need to add common calculations to their workbooks every time.

  1. Click the Worksheet tab at the top of the screen.
    Select the Worksheet tab of the interface.
  2. Click the caret icon (Caret icon) on the Sales Amount column.
  3. Click Add New Column.
    Add a new column to the dataset.
  4. In the formula bar, enter [Sales Amount] * [Sales Quantity].
    **Specify the formula definition in the formula bar.
  5. Click Enter on your keyboard.
  6. Double click the new column's name and rename it to Revenue
  7. Click the caret icon (“The) on the Revenue column.
  8. Select Format.
  9. Select Currency.
    In the column format specification, select Currency.png
  10. Repeat steps 1-5 to add another column called COGS (Cost of Goods Sold).
    • In the formula bar enter [Cost Amount] * [Sales Quantity].
    • Click Enter on your keyboard.
    • Click the caret icon (The caret icon)on the COGS column select Format
    • Select Currency.

Joining additional tables 

Let's join this dataset to additional tables to add more context to the dataset we created above.

Sigma supports left, right, full, and inner join types, and lookups. You may also join more than one column as well as define arbitrary functions within the join (e.g. truncate date); see Join types.

  1. With the dataset in Edit mode, click the database icon to open its tab. This is located in the right-side panel.
    Open the databse interface by clicking the database icon.

  2. Click on the plus icon (Add datasource icon, +) located below the database icon. This will open the Add Source page.
    Add the datasource to the new dataset.

  3. Select Table as data source type.
    For the datasource type, select table.

  4. On the left side of the page, under Connections, select Sigma Sample Database.

  5. Locate the Examples database and the PLUGS_ELECTRONICS schema.

  6. Click the F_SALES table to see a preview of the data.

  7. Click Next at the top right corner of the page header.

  8. Under Join Type, use the default Left Join.

  9. Set the left join key selector to the ORDER_NUMBER column.

  10. Set the right join key selector to the ORDER_NUMBER column.
    Set the right join key selector to the ORDER_NUMBER column.
    Note: Sigma populates these fields automatically if it finds identically-named columns in both tables.

  11. Click Done to go back to the dataset worksheet.

    Note: If you look in the left-side panel, you will see that the columns from F_SALES are present in the worksheet, and the F_SALES table is listed as a data source.

  12. Repeat steps 2 to 5 for each of the following tables:

    • D_STORE
      Join Key: STORE_KEY
      Columns: STORE_NAME, STORE_CITY, STORE_STATE, STORE_JSON_FIELD
    • D_CUSTOMER
      Join Key: CUST_KEY
      Columns: CUST_NAME, CUST_JSON_FIELD
Create joins with tables D_STORE and D_CUSTOMER.

Note: All columns added via joins will have the table name appended to the column in parentheses. Double click on column names to rename them. Move columns by dragging them to a new position in the worksheet.

Extracting JSON

Sigma gives you the ability to extract JSON.

  1. Click the caret icon () on the Cust Json Field (D_CUSTOMER) column.
  2. Select Extract Columns.
    Choose the action 'Extract columns'.
  3. On the Extract Fields modal, select the AGE_GROUP and LOYALTY_PROGRAM fields.
    Select the fields that represent the columns you plan to extract.png
  4. Click the caret icon (The caret icon) on the LOYALTY_PROGRAM
  5. Select Logical.
    Specify the Logical datatype for the extracted colummn.png
  6. Click the Confirm button.

Filter Your Data

Filtering datasets limits the volume of data available to analysts during data exploration.

Add Relative Date Filters

  1. Click the caret icon (The caret icon) on the [Date (F_SALES)].
  2. Select Add Filter.
  3. Select Date Range as the Filter Type.
  4. Click the Select Date Range field to bring up the date selector menu.
    Select the Date range filter type, and enter the value.
  5. On the left side of the modal, select Relative.
  6. Enter 1 in the text entry field.
  7. In the ... menu, select Weeks.
    Note:
    This filter ensures the data is always filtered for the most recent week.
    Complete the values in the date filter, the 'Relative' option.
  8. Click Save.

Add Text Filters

  1. Click the down arrow on the Store State column and select Add Filter.
  2. Click the Filter Values dropdown menu.
    19-select-filter-type-include.png
  3. Check the boxes for California, Texas, and Michigan.
    “Specify
  4. Click Save.

Link Tables

By defining links in datasets, users have to option to add joined data to their workbook. In addition, Sigma automatically links tables that have relational metadata (foreign keys) defined in Snowflake.

  1. Click the Links tab at the top of the screen.
    Click the 'Links' tab.
  2. Click Add links to other source.
    Click 'Add links to other sources'.
  3. Click Connections at the bottom of the left panel.
    Click the connections icon to open the connection picker.
  4. Select Sigma Sample Database.
  5. Locate the Examples and PLUGS_ELECTRONICS folders.
  6. Click on the D_PRODUCT table to see a preview of the data.
    Select table D_PRODUCT to link. Preview the data in the table.
  7. Click Next.
  8. Locate the Join Type selection and select Left Join.
  9. Set the left join key selector to PRODUCT KEY Link (F_POINT_OF_SALE).
  10. Set the right join key selector to PRODUCT KEY.
    “Set
  11. Click Save
  12. Click the Publish button at the top of the screen to save all your dataset changes.

Badging Datasets

Add badges to datasets to indicate whether the content is Endorsed, has a Warning, or has been Deprecated.  Optional badge notes can be used to provide additional context for all organization members.

Add or Update a Badge

  1. Click the information icon () in the page header. This will open the page's info popup.
  2. Select the Endorsed badge type from the dropdown list.

Setting Permissions & Sharing Datasets

Once your modeling is complete, the dataset can be shared. You may do so by direct share to a team of users or individuals or more broadly to a workspace. Once permissions have been granted, people may use the dataset as the basis for their analysis. See Data permissions.

Direct Share

  1. Click the Permissions tab at the top of the screen.
  2. Click Add Permission.
  3. Search for a member email address or a team name.
  4. Select the type of permission you would like to give the selected member or team.

    Note: users will be able to access the dataset from the “Shared with Me” tab on the home page.

Move to Workspace

  1. Click More menuMore in the dataset header.
  2. Click Move.
  3. Click the Workspaces tab.“Select
  4. Select the workspace to which you would like to move the dataset.
  5. Click Move.

Materialization

Materialization allows you to write datasets back to your warehouse as tables. These tables act as simplified versions of the complex SQL queries you build in Sigma and can improve the speed and performance of your downstream reports. Each table will be written and rewritten to your warehouse on the schedule that you set. Sigma writes a view on top of the table so that other apps outside of Sigma can access the prepared/transformed data that Sigma creates.

Requirements

  • Write access must be enabled on your dataset’s connection.
  • You must be an organization Admin.

Adding Materialization to Datasets

  1. Click the Materialization tab at the top of the screen.
  2. Click Create Schedule.
  3. Select the frequency to materialize the data.
    Specify the materialization schedule: periodicity and further details.
  4. Click Save.

After you save the materialization schedule, Sigma runs the first materialization.

View all scheduled materializations on the dataset's Materialization tab.