IMPORTANT: This article covers setting up a warehouse that receives data loaded from Improvado, not a customer data warehouse from which data is extracted. It also doesn't cover setting up a customer data warehouse for Data Prep.
Select how you want to authenticate to Google Cloud Storage
There are two ways to authenticate to GCS in the Improvado UI:
You can learn how to use either of these methods by following the instructions below.
Workload Identity Federation authentication
With identity federation, you can use Identity and Access Management (IAM) to grant external identities IAM roles, including the ability to impersonate service accounts. This approach eliminates the maintenance and security burden associated with service account keys. Learn more about Identity Federation here: Workload identity federation | IAM Documentation | Google Cloud.
Configure Workload Identity Federation
Set up a workload identity pool and provider for your Google Cloud project
Specify the Improvado AWS account ID, which you can find in the Improvado UI:
Configure attribute mapping and conditions so that only the AWS IAM role named "workload_identity_federation" is allowed
Learn more about Workload Identity Federation configuration here.
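The attribute condition from the last step can be sketched as follows. This is a minimal illustration, assuming the provider keeps Google's default AWS attribute mapping, which exposes the assumed role as `attribute.aws_role`. The helper name `build_attribute_condition` and the account ID `123456789012` are hypothetical placeholders; use the account ID shown in the Improvado UI.

```python
import re


def build_attribute_condition(
    aws_account_id: str,
    role_name: str = "workload_identity_federation",
) -> str:
    """Build a CEL condition that admits a single AWS assumed role.

    Assumption: the provider maps the role to `attribute.aws_role`,
    as in Google's default attribute mapping for AWS providers.
    """
    # AWS account IDs are 12-digit numbers.
    if not re.fullmatch(r"[0-9]{12}", aws_account_id):
        raise ValueError("expected a 12-digit AWS account ID")
    role_arn = f"arn:aws:sts::{aws_account_id}:assumed-role/{role_name}"
    return f"attribute.aws_role == '{role_arn}'"


# Placeholder account ID, not a real Improvado value.
condition = build_attribute_condition("123456789012")
print(condition)
```

The resulting string is what you would paste into the provider's attribute condition field, so that identities assuming any other role are rejected.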
Required information
Title
Bucket Name
~Preferred bucket for GCS uploading
~Bucket Name can only contain letters, numbers, dots and underscores and must start and end with a letter or number
~Bucket Name length must be between 3 and 222 characters
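As a rough sketch, the Bucket Name rules listed above can be checked before you submit the form. The function name `is_valid_bucket_name` is hypothetical, and the check encodes only the rules stated in this article; Google Cloud Storage may apply further naming requirements of its own.

```python
import re

# Rules as listed above: letters, digits, dots and underscores only;
# the first and last characters must be a letter or digit;
# total length between 3 and 222 characters.
_BUCKET_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._]{1,220}[A-Za-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if `name` satisfies the rules listed in this article."""
    return _BUCKET_NAME_RE.fullmatch(name) is not None


print(is_valid_bucket_name("improvado_exports.2024"))  # meets all three rules
print(is_valid_bucket_name("_exports"))                # starts with an underscore
```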
```{{filename}}``` is the same as the destination table name
```{{dataclass}}``` is an optional parameter that describes how the data will be updated in the destination.
~Possible data class values:
~~```daily```
~~```monthly```
~~```weekly```
~~```last_day```
~~```last_day_incremental```
~~```unknown```
IMPORTANT: You cannot use ```{{DD}}``` when partitioning by month.
```{{filename}}-{{YYYY}}-{{MM}}-{{DD}}``` – for partition by day
```{{filename}}-{{YYYY}}-{{MM}}``` – for partition by month
You can also use “_” instead of “-”, or omit the separator entirely, for example:
```{{filename}}_{{YYYY}}-{{MM}}-{{DD}}```
```{{filename}}{{YYYY}}{{MM}}{{DD}}```
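To see how these patterns expand, here is a minimal sketch of placeholder substitution. It is not Improvado's implementation: the function `render_filename` and the table name `ad_spend` are hypothetical, and zero-padded month and day values are an assumption.

```python
import re
from datetime import date


def render_filename(pattern: str, table_name: str, day: date,
                    partition_by: str = "day") -> str:
    """Expand {{filename}}, {{YYYY}}, {{MM}} and {{DD}} in a pattern."""
    # Mirrors the rule above: {{DD}} is not allowed with monthly partitions.
    if partition_by == "month" and "{{DD}}" in pattern:
        raise ValueError("{{DD}} cannot be used when partitioning by month")
    values = {
        "filename": table_name,
        "YYYY": f"{day.year:04d}",
        "MM": f"{day.month:02d}",  # assumed zero-padded
        "DD": f"{day.day:02d}",    # assumed zero-padded
    }
    # Replace known placeholders; leave unknown ones untouched.
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: values.get(m.group(1), m.group(0)),
        pattern,
    )


daily = render_filename("{{filename}}-{{YYYY}}-{{MM}}-{{DD}}",
                        "ad_spend", date(2024, 3, 5))
monthly = render_filename("{{filename}}-{{YYYY}}-{{MM}}",
                          "ad_spend", date(2024, 3, 5), partition_by="month")
print(daily)    # ad_spend-2024-03-05
print(monthly)  # ad_spend-2024-03
```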
File format
Possible formats:
csv
csv+gzip
json
json+gzip
parquet
Separator
Possible delimiters that can separate data in your file:
comma
semicolon
tab
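The File format and Separator settings determine how a consumer has to read the resulting files. As a sketch using only Python's standard library, the following reads a `csv+gzip` payload with a semicolon separator. The sample contents, the column names, the UTF-8 encoding and the presence of a header row are all assumptions made for illustration.

```python
import csv
import gzip
import io

# Maps the separator names listed above to the actual delimiter characters.
DELIMITERS = {"comma": ",", "semicolon": ";", "tab": "\t"}


def read_rows(raw: bytes, separator: str = "comma",
              gzipped: bool = False) -> list[dict]:
    """Parse CSV bytes (optionally gzip-compressed) into a list of dicts."""
    data = gzip.decompress(raw) if gzipped else raw
    text = io.StringIO(data.decode("utf-8"))  # UTF-8 is an assumption
    return list(csv.DictReader(text, delimiter=DELIMITERS[separator]))


# Made-up sample standing in for a downloaded csv+gzip file.
sample = gzip.compress("date;clicks\n2024-03-05;42\n".encode("utf-8"))
rows = read_rows(sample, separator="semicolon", gzipped=True)
print(rows)
```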
Partition by
Possible ways of splitting data:
Day (default value)
Month
Encryption
Possible options:
Default Cloud Storage encryption
Customer-managed encryption keys
Customer-supplied encryption keys
Encryption Key
If you selected the Default Cloud Storage encryption type, this field is not editable and its default value is stub. Otherwise, enter either your AES-256 key encoded in standard Base64 (for customer-supplied encryption keys) or the resource name of the Cloud KMS key (for customer-managed encryption keys) used to encrypt the blob’s contents. For more info, see Google Cloud Storage encryption docs.
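For the customer-supplied option, one way to produce a key in the expected shape is sketched below: 32 random bytes (256 bits) encoded in standard Base64. The function name `generate_encryption_key` is hypothetical, and how you store and rotate the key is up to your own security policy.

```python
import base64
import secrets


def generate_encryption_key() -> str:
    """Return a random 256-bit key encoded in standard Base64."""
    raw_key = secrets.token_bytes(32)  # 32 bytes = 256 bits
    return base64.b64encode(raw_key).decode("ascii")


key = generate_encryption_key()
print(len(key))  # 44 characters, including Base64 padding
```

Keep the generated value somewhere safe: with customer-supplied keys, the same key is needed again to read the encrypted objects.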
```{{data_source}}``` is a data provider, integration, connector
```{{data_table_title}}``` is an object that contains all extraction orders with the same granularity (dimensional schema)
```{{report_type}}``` is a set of such fields as metrics, properties, dimensions, etc.
```{{timestamp}}``` is the date and time when data load started
If you use the ```/{{YYYY}}/{{MM}}/{{DD}}``` setting, the data will be added to folders daily. A new record does not delete the previous one, even for data that contains no date.
On request, the support team can set up a different root structure in a bucket.
Use static IP
Select Yes for the Use static IP option if you allow Improvado to connect to your database from the static IPs listed on the Destination connection page.
Select No if you permit access to your database from any IP. In this case, Improvado connects to your database using dynamic IPs that are not listed on the Destination connection page.