Schedule incremental refresh for an extract
You can schedule incremental refreshes for a dataset extract. An incremental refresh adds only the rows that are new since the previous refresh, as identified by a date column. For example, if your data source receives new transactions every day, you can append just that day's transactions instead of rebuilding the entire extract.
When performing an incremental refresh, consider the following:
-
A dataset must have a date column that is updated when new rows are added.
-
Incremental refresh picks up only the rows appended to the data source. Changes made to existing rows are ignored.
-
Incremental refresh does not identify newly added columns.
Consequently, you should still occasionally run a full refresh. For details, see Schedule full refresh for an extract. For information on refresh types, see Data extracts.
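The considerations above can be illustrated with a minimal sketch. It assumes the refresh compares each source row's date with the newest date already in the extract; the product's actual mechanism is not described here, and the function name `incremental_refresh` and the column name `updated_at` are hypothetical.

```python
from datetime import datetime


def incremental_refresh(extract, source, date_key="updated_at"):
    """Append source rows whose date is later than the newest date in the extract.

    Assumption: new rows are detected with a simple "newest date so far" check.
    """
    if not extract:
        # Nothing extracted yet, so every source row counts as new.
        return list(source)
    newest = max(row[date_key] for row in extract)
    new_rows = [row for row in source if row[date_key] > newest]
    return extract + new_rows


extract = [
    {"id": 1, "amount": 10, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "amount": 20, "updated_at": datetime(2024, 1, 2)},
]
source = [
    # Existing row edited in the source; its date did not change.
    {"id": 1, "amount": 99, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "amount": 20, "updated_at": datetime(2024, 1, 2)},
    # Row appended since the previous refresh.
    {"id": 3, "amount": 30, "updated_at": datetime(2024, 1, 3)},
]

refreshed = incremental_refresh(extract, source)
```

In this sketch the appended row (id 3) reaches the extract, while the edit to row 1 does not. That gap is why an occasional full refresh is still needed.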
Prerequisites
-
You have a dataset extract in the Datasets pane.
-
The dataset extract has a column with dates (DateTime) to identify the new rows.
-
You have all the necessary role permissions granted by an administrator in Access Manager.
Permissions
dataprep.access
dataprep.dataset.create
dataprep.dataset.extract
-
For the content in Shared with me, you need to have the View and Edit shared content permissions granted by the sharer.
Procedure
-
On the sidebar, click Datasets.
The Datasets pane appears.
Tile view is selected by default.
-
For the dataset, point to More actions, and then click Modify > Extract > Schedule refresh.
The Schedule extract refresh dialog appears.
-
Select Incremental for the refresh type.
-
(Single-table extract) Select the date column that the system uses to identify new rows.
(Multi-table extract) If the extract consists of multiple tables:
-
Click Table and select a table that you want to refresh from the dropdown list.
-
Select the date column that the system uses to identify new rows in that table.
-
Repeat the previous steps for other tables that need to be refreshed.
Note: To define the schedule once for all tables, select the Use the same schedule for all tables option. You can deselect the option at any time and adjust the schedule for each table.
-
Click Schedule and define the frequency of the refresh:
-
Hourly – Run every hour on the selected days of the week.
-
Daily – Run once a day at the specified hour (including time zone).
-
Weekly – Run on the selected days of the week at the specified hour (including time zone).
-
Monthly – Run on the selected days of the month at the specified hour (including time zone). To always run on the last day of the month, select Last. The selected days are highlighted. To deselect a day, click it again.
Click Save.
The schedule summary is updated.
Figure 1: Scheduling incremental refresh for a multi-table extract
Figure 2: Scheduling incremental refresh for a single-table extract
-
Click Save.
You can review the date and time of the last extract refresh in the Datasets pane. In the List view, point to the info icon next to the Extract label.
-
If you no longer need a schedule, you can do the following:
-
A: Inactivate a schedule (put it on hold) – Deselect the checkbox in front of the schedule record.
-
B: Remove a schedule – Point to the schedule record and click Delete (multi-table extract) or Clear (single-table extract).
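To illustrate why the Last option in a monthly schedule is useful, here is a minimal sketch of how selected days could resolve to calendar dates. The function `monthly_run_dates` is hypothetical, and skipping a selected day that does not exist in a shorter month is an assumption; the product may handle that case differently.

```python
import calendar
from datetime import date


def monthly_run_dates(year, month, days):
    """Resolve selected days (integers or the string "Last") to dates in one month."""
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    resolved = set()
    for day in days:
        if day == "Last":
            resolved.add(last_day)
        elif day <= last_day:
            resolved.add(day)
        # Assumption: a day that the month does not have (for example, 30 in
        # February) is skipped rather than moved to another date.
    return [date(year, month, d) for d in sorted(resolved)]


february_leap = monthly_run_dates(2024, 2, [15, "Last"])
february_regular = monthly_run_dates(2023, 2, [30, "Last"])
```

Under these assumptions, a fixed day such as 30 produces no run in February, whereas Last always resolves to the final day of the month.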