Data sources

To create a dataset, you can use various data sources: data files, IBM data sources, databases, or any combination of these.

If you create a dataset from two or more data sources (for example, a cube view and an Excel file), you need to define joins between them. For details, see Define joins between data sources.
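Conceptually, a join combines rows from two data sources that share a value in a key column. The minimal Python sketch below illustrates an inner join on a hypothetical `Region` column. The sample rows, column names, and the `inner_join` helper are illustrative assumptions. The sketch does not show how Data Preparation itself performs joins or which join types it offers.

```python
# Hypothetical rows from two data sources: a cube view and an Excel sheet.
cube_view = [
    {"Region": "North", "Revenue": 1200},
    {"Region": "South", "Revenue": 800},
    {"Region": "West", "Revenue": 450},
]
excel_sheet = [
    {"Region": "North", "Manager": "Ana"},
    {"Region": "South", "Manager": "Ben"},
]


def inner_join(left, right, key):
    """Return merged rows for every key value present in both inputs."""
    # Index the right-hand rows by key so each lookup is a dictionary access.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined = []
    for left_row in left:
        # Rows whose key has no match on the right side are dropped (inner join).
        for right_row in index.get(left_row[key], []):
            joined.append({**left_row, **right_row})
    return joined


result = inner_join(cube_view, excel_sheet, "Region")
for row in result:
    print(row)
```

In this example, the `West` row from the cube view has no matching row in the Excel sheet, so an inner join leaves it out of the result.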

Each table from a data source is displayed as a separate data source. You can modify these data sources, for example, by removing columns or adding calculations. Changes you make in Data Preparation are not saved to the original files.

Data files

You can use data files of the following types: .csv, .tsv, .txt, .xlsx, .xls, and .sav.
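If you prepare many files for upload, a quick local check of their extensions against the supported types can catch unsupported files early. The Python sketch below is a hypothetical helper, not part of the product. It assumes that extensions are compared case-insensitively, which may differ from how the upload itself validates files.

```python
from pathlib import Path

# Supported data file extensions as listed above. Lowercasing the file's
# extension before comparing is an assumption, not documented behavior.
SUPPORTED_EXTENSIONS = {".csv", ".tsv", ".txt", ".xlsx", ".xls", ".sav"}


def is_supported_data_file(path: str) -> bool:
    """Return True if the file's last extension is one of the supported types."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


for name in ["sales.csv", "Budget.XLSX", "survey.sav", "report.pdf", "archive.tar.gz"]:
    print(name, is_supported_data_file(name))
```

`Path.suffix` returns only the last extension, so a name such as `archive.tar.gz` is checked as `.gz` and reported as unsupported.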

To create a dataset, you can use one or more data files. You may need to prepare your files before uploading them. For details, see Preparing data files.

Databases

You can create datasets using the following database types:

  • Amazon Athena
  • Amazon Aurora (MySQL)
  • Amazon Aurora (PostgreSQL)
  • Amazon Redshift
  • Apache Derby
  • Apache Hive
  • Dremio
  • FirebirdSQL
  • Google Cloud SQL (MySQL)
  • Google Cloud SQL (PostgreSQL)
  • HyperSQL
  • IBM Cognos TM1 / IBM Planning Analytics - cubes
  • IBM Cognos TM1 / IBM Planning Analytics - cube views
  • IBM Cognos packages:

    • Relational package
    • DMR package
    • PowerCube
    • Dynamic Cube
    • TM1/PA cube
  • Informix
  • MariaDB
  • Microsoft Azure SQL Database
  • Microsoft Azure SQL Data Warehouse
  • Microsoft SQL Server
  • Microsoft SQL Server Analysis Services
  • MySQL
  • MemSQL
  • Oracle
  • PostgreSQL
  • SAP BusinessObjects
  • SAP HANA
  • Snowflake

To create a dataset from a database, you need to define a connection to the respective server. For details, see Add data connections.