Migrate a dataset to a data model

You can migrate a dataset to a data model to take advantage of the capabilities that exist only in data models, such as the following:

  • Version tagging and materialization of tagged versions
  • Relationships to perform joins only as needed
  • Entity relationship diagram of data model contents and relationships

What is migrated from a dataset to a data model

When you migrate a dataset to a data model, Sigma creates a new data model with elements that correspond to the components of the published dataset. Any draft changes and warehouse views are not migrated. The following components of a dataset are automatically recreated in the data model:

Dataset componentRecreated in data model as
Dataset, including calculated columns and column descriptions.Table in the data model.
Joins across datasetsJoined tables in the data model
Links to other datasetsRelationships
Links from warehouse tablesNot migrated. To create a relationship from a warehouse table, some manual steps are required. See How links are migrated to relationships.
Column-level security (CLS) rulesCLS rules
Dataset parametersControls in the data model
FiltersFilters
MetricsMetrics
Warehouse viewsNot migrated. To create a warehouse view for the table in the data model created from the migrated dataset, see Create and manage warehouse views.
Materialization scheduleMaterialization schedule for the element.
🚧

If a materialization schedule is migrated, the materialization runs immediately when the data model is created.

How CLS rules are migrated

Column-level security (CLS) rules and column visibility rules are defined differently in a data model than a dataset. After migration, rules defined in a dataset are set up according to the following table:

Setting in datasetSetting in data model after migration
Column visibility is AvailableNo CLS rule is applied. Column is visible in Source columns but not visible by default.
Column visibility is RestrictedOne CLS rule is created set to No one can view
Column visibility is set to a user attributeTwo CLS rules are created, one rule with the user attribute set to 0 and one rule with the value set to 1 to allow users assigned a value of 0 or 1 to view the data in the column.

For more details about CLS in data models, see Configure column-level security.

How links are migrated to relationships

When you migrate a dataset with links to other datasets, all linked datasets are migrated to one data model. Each dataset is created as a table in the data model. If you choose to update references, all references to the datasets are updated in workbooks that use the datasets as sources.

📘

If you migrate a dataset with links to other datasets, but one of the links references the migrated dataset, the migration cannot be performed. Remove the link that references the dataset to perform the migration, or create a data model and migrate the dataset configuration manually.

For example, if you have a dataset with no links, the dataset is migrated to a new data model. The migration looks like the following:

graph LR
  A[Migrated Dataset] --> B[New Data model]

  style A fill:#FF9A74
  style B fill:#4CEC8C

If you migrate a dataset with links to two other datasets, the dataset and its linked datasets are migrated to one data model. The migration looks like the following:

graph TD
  A[Migrated Dataset] --> B[New Data model]
  A -->|Link| C[Dataset]
  A -->|Link| D[Dataset]
  C --> B
  D --> B

  style A fill:#FF9A74
  style B fill:#4CEC8C
  style C fill:#FF9A74
  style D fill:#FF9A74

If you migrate a dataset with links from a table in your data warehouse, those links are not recreated as relationships. Instead, manually migrate those links by adding the table to the created data model, then recreating the link as a relationship in the data model.

How parameters are migrated

Existing dataset parameters are migrated to control elements in the data model.

🚧

If a parameter in a migrated dataset targets a parameter in another dataset, both parameters are recreated as control elements, but the targets are removed. After the migration, refer to the existing dataset setup to manually update the target for each control element to the relevant table created from the dataset.

How references are updated

When you migrate a dataset to a data model, any documents that use the dataset or linked datasets as a source (reference the dataset) can be updated to instead reference the newly created data model and related columns. When you migrate a dataset to a data model, you can choose whether to update references or not.

If you choose to update references from the dataset when migrating to a data model, the following references are not updated:

  • Explorations that use the dataset. Instead, manually change the source of the exploration.
  • An unpublished draft of a document that use the dataset, if the published version of the document does not use the dataset. Instead, manually change the source of the relevant elements in the document.
  • Tagged document versions with a dataset as a source. Instead, tag the data model created by the migration, then manually re-tag and swap the source of the tagged workbook version.
  • Custom SQL statements that reference the dataset warehouse view. Instead, update the SQL statement to reference the data model instead of the dataset.

Requirements

  • You must be the dataset owner or be granted Can Edit access to the dataset
  • You must have the Create, edit, and publish data models permission enabled on your account type.

Migrate a dataset to a data model

To migrate a dataset to a data model:

  1. Open the dataset.

  2. Select the caret () to open the document menu, and then select Migrate to data model.

  3. In the Migrate to data model modal, review the details of what will be created when you migrate the dataset:

    • If your dataset is linked to other datasets, the linked datasets are listed.
    • If your dataset is used by other documents as a data source, the total number of references to be updated is listed.
  4. (Optional) If your dataset is used by other datasets, workbooks, or data models as a data source, Sigma can update the data sources of affected elements to use the new data model automatically. Select the checkbox for Update references from dataset to new data model.

    📘

    If you prefer to manually update references from relevant datasets to the new data model, leave the checkbox deselected.

    To manually update references, identify each affected document and change the source manually:

    1. For each dataset migrated to the data model, review the Lineage.
    2. For each document, follow the steps to change the source.
  5. Select Create.

    Sigma creates a data model in the same location and with the same name as the migrated dataset, with you as the owner. Any linked datasets are created as tables and named to match the relevant datasets.

    The created data model opens in the same browser window. Review the details to ensure that complex configurations migrated properly.

💡

After you migrate a dataset, you cannot migrate it again unless you delete the data model created when you first migrated the dataset.

Review migration status for all datasets in your organization

If you are assigned the Admin account type, you can review the datasets in your organization and the migration status for each one. Optionally migrate datasets from the list, or open a dataset and migrate the dataset from the document menu.

📘

You must be assigned the Admin account type to review the migration status for all datasets in your organization.

  1. Go to Home and select the user icon with your initials.
  2. Select Administration, then choose Dataset migration.

On the dataset migration page, you can perform different tasks:

  • Search, sort, and review the list of datasets in your organization. The following information is available:

    • Dataset lists the name of the dataset with a link to open the dataset.
    • Migration status lists whether the dataset has been migrated. Datasets marked Not required are not used as a data source (are not referenced by other documents) and likely do not need to be migrated to data models. SQL statements that reference datasets are not counted by the remaining references.
    • Migrated by lists the username of the user that selected the migration.
    • Migration date lists the time at which the migration was performed.
    • Remaining references lists the total number of references to the dataset by other documents, such as a workbook, data model, or another dataset. SQL statements that reference datasets are not counted by the remaining references.
    • Data model lists the data model created by the dataset migration.
  • As you review the list, you can search, sort, and filter the list depending on your use case:

    • Search by dataset name to identify a particular dataset.
    • Filter the datasets by Migration status to identify the datasets that still require migration.
    • Sort the datasets by Remaining references to identify the most-used datasets.
  • Optionally perform other management tasks for datasets:

    • Select More () > Delete dataset to delete a dataset.
    • Select More () > Migrate dataset to migrate a dataset that has not yet been migrated.

Programmatically manage dataset migration to data models

To assist with the migration, you can use the List datasets (GET /v2/datasets) endpoint to plan your dataset migration or guide cleanup after performing a migration.

📘

You cannot programmatically migrate datasets to data models due to the different document structures. To prevent duplicate migrations of linked datasets, using the option in the Sigma UI is required.

Plan your migration

  • An admin can retrieve a list of all datasets in the organization by calling the List datasets endpoint with the skipPermissionCheck option set to true. Review the response and sort or group the datasets by owner, location (path) of the dataset in Sigma, migration status, and total number of documents that reference the dataset. This information is also available in the Sigma UI.

  • Identify workbooks that use a dataset as a source by calling the List workbooks (GET /v2/workbooks) endpoint with the skipPermissionCheck option set to true and excludeExplorations set to true, then calling the List workbook sources (GET /v2/workbooks/workbookId/sources) endpoint for each workbook.

    • Identify data models that use a dataset as a source by calling the List data models (GET /v2/dataModels) endpoint, then calling the List data model sources (GET /v2/dataModels/dataModelId/sources) endpoint for each data model. When you programmatically list data models, only data models that the user associated with the API credentials has access to are returned.
  • Identify tagged workbook versions that use a dataset as a source and need to be re-tagged to a tagged data model version:

    1. For each version tag, call the List workbooks for a tag (GET /v2/tags/{tagId}/workbooks) endpoint.
    2. For each workbook, use the taggedWorkbookId as the workbookId and call the List workbook sources (GET /v2/workbooks/workbookId/sources) endpoint.
    3. Review the response for datasets.

Clean up after a migration

After migrating datasets in your organization, you can perform cleanup tasks like the following:

  • Identify datasets with documents that still reference the dataset. An admin can retrieve a list of all datasets in the organization by calling the List datasets endpoint with the skipPermissionCheck option set to true, then review the response for datasets where the referenceCount is more than 0.
  • Delete migrated datasets with no remaining references after performing manual QA to confirm that all dataset configurations are recreated in a data model. An admin can retrieve a list of all datasets in the organization by calling the List datasets endpoint with the skipPermissionCheck option set to true. For any datasets in the response with a migrationStatus of migrated, call the Delete a file (DELETE /v2/files/{inodeId}) endpoint using the datasetId as the inodeId.
  • Delete datasets not referenced by any documents directly. Datasets referenced by custom SQL are not listed in the reference count, so if you use custom SQL, manually confirm that the dataset is unused. An admin can retrieve a list of all datasets in the organization by calling the List datasets endpoint with the skipPermissionCheck option set to true. For any datasets in the response with a migrationStatus of not-required, call the Delete a file (DELETE /v2/files/{inodeId}) endpoint using the datasetId as the inodeId.