Google Cloud Storage

Updated on Dec 18, 2023

IMPORTANT: This article covers setting up a warehouse that Improvado loads data into, not a customer data warehouse that data is extracted from. It also does not cover setting up a customer data warehouse for Data Prep.

Select how you want to authenticate to Google Cloud Storage

There are two ways to authenticate to GCS in the Improvado UI:

  • Workload Identity Federation authentication
  • Service Account Key authentication

Follow the instructions below for the method you choose.

Workload Identity Federation authentication

With identity federation, you can use Identity and Access Management (IAM) to grant external identities IAM roles, including the ability to impersonate service accounts. This approach eliminates the maintenance and security burden associated with service account keys.
Learn more about Identity Federation here: Workload identity federation | IAM Documentation | Google Cloud.

Configure Workload Identity Federation

  1. Set up a workload identity pool and provider for your Google Cloud project.
  2. Specify the Improvado AWS account ID, which you can find in the Improvado UI.
  3. Configure attribute mapping and conditions so that only one AWS IAM role is allowed, named "workload_identity_federation".

Learn more about Workload Identity Federation configuration here.

Required information

  • Title
  • Bucket Name
    • Preferred bucket for GCS uploading
    • Bucket Name can only contain letters, numbers, dots and underscores and must start and end with a letter or number
    • Bucket Name length must be between 3 and 222 characters
  • Filename
  • File format
  • Separator
  • GCS Region
  • Partition by
  • Encryption
  • Encryption Key
  • Root Name for GCS uploading (optional)
    • Root Name can only contain letters and numbers and have between 1 and 64 characters in length
  • Use static IP
  • Authentication type
    • Select Workload Identity Federation
  • GCP Project Number
  • Workload Pool ID - Pool IDs are used as identifiers in IAM
  • AWS Provider ID - Providers manage and verify identities
  • Service Account Email - A service account is identified by its email address, which is unique to the account
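
To illustrate how the last four fields relate to each other, the minimal Python sketch below assembles the provider resource name that Google Cloud uses to address a workload identity pool provider, and the endpoint used to impersonate a service account. The function names and sample values are hypothetical, and how Improvado uses these fields internally is an assumption; only the two URL formats follow Google Cloud conventions.

```python
def provider_resource_name(project_number: str, pool_id: str, provider_id: str) -> str:
    """Full resource name of a workload identity pool provider."""
    return (
        f"//iam.googleapis.com/projects/{project_number}"
        f"/locations/global/workloadIdentityPools/{pool_id}"
        f"/providers/{provider_id}"
    )


def impersonation_url(service_account_email: str) -> str:
    """IAM Credentials endpoint that issues tokens for the service account."""
    return (
        "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/"
        f"{service_account_email}:generateAccessToken"
    )


# Hypothetical values, for illustration only.
print(provider_resource_name("123456789012", "example-pool", "example-aws-provider"))
print(impersonation_url("loader@example-project.iam.gserviceaccount.com"))
```

Note that the resource name is built from the project number, not the project ID, which is why the form asks for the GCP Project Number.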

Service Account Key authentication

Generate a Service Account Key JSON file

To use Service Account Key authentication, you first need to generate a JSON key file in Google Cloud Console, using either the official documentation or the interactive step-by-step guide provided by Google. Alternatively, you can follow the instructions below:

  1. In Google Cloud Console, go to IAM & Admin > Service Accounts.
  2. Click the Actions button for your service account and select Manage keys.
  3. In the KEYS tab, click ADD KEY > Create new key. Choose JSON as the key type and click Create.
  4. In the JSON file, make a note of your Project ID.
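
If you prefer not to open the downloaded file by hand, the short Python sketch below reads the project ID from it. The function name is hypothetical; the `type`, `project_id` and `client_email` fields are part of the standard service account key file that Google generates.

```python
import json


def read_key_info(path: str) -> dict:
    """Return the project ID and service account email from a key file."""
    with open(path, encoding="utf-8") as fh:
        key = json.load(fh)
    # Key files created in Google Cloud Console carry this type marker.
    if key.get("type") != "service_account":
        raise ValueError("not a service account key file")
    return {
        "project_id": key["project_id"],
        "client_email": key["client_email"],
    }
```

For example, `read_key_info("key.json")["project_id"]` returns the value to note down in step 4. The key file contains a private key, so keep it out of version control.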

Required information

  • Title
  • Bucket Name
    • Preferred bucket for GCS uploading
    • Bucket Name can only contain letters, numbers, dots and underscores and must start and end with a letter or number
    • Bucket Name length must be between 3 and 222 characters
  • Filename
  • File format
  • Separator
  • GCS Region
  • Partition by
  • Encryption
  • Encryption Key
  • Root Name for GCS uploading (optional)
    • Root Name can only contain letters and numbers and have between 1 and 64 characters in length
  • Use static IP
  • Authentication type
    • Select Service account key
  • Service account key
    • Upload your JSON file here
  • GCP Project Number
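
The Bucket Name rules listed above can be checked before you fill in the form. The following Python sketch mirrors only the rules as stated on this page; the function name is hypothetical, and Google Cloud Storage applies its own naming requirements on top, so treat this as a pre-check rather than a guarantee.

```python
import re

# Letters, numbers, dots and underscores; first and last character
# must be a letter or a number (rules as stated in this article).
_BUCKET_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._]*[A-Za-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Check a bucket name against the length and character rules above."""
    return 3 <= len(name) <= 222 and _BUCKET_NAME.fullmatch(name) is not None


print(is_valid_bucket_name("reports.2023_q4"))  # True
print(is_valid_bucket_name("_reports"))         # False: starts with "_"
print(is_valid_bucket_name("ab"))               # False: shorter than 3
```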

How to connect

Grant the Improvado Google service account improvado-gcs-loader@green-post-223109.iam.gserviceaccount.com access to your Google Cloud Storage bucket by assigning it the Storage Object Admin role on the bucket.

Learn more here.
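
If you manage bucket permissions from code rather than from Google Cloud Console, the grant could look roughly like the Python sketch below. It assumes the `google-cloud-storage` client library is installed and that your own credentials are allowed to change the bucket's IAM policy; the function names are hypothetical. `roles/storage.objectAdmin` is the role ID behind the Storage Object Admin role.

```python
IMPROVADO_MEMBER = (
    "serviceAccount:improvado-gcs-loader@green-post-223109.iam.gserviceaccount.com"
)
STORAGE_OBJECT_ADMIN = "roles/storage.objectAdmin"


def improvado_binding() -> dict:
    """IAM binding that gives the Improvado account Storage Object Admin."""
    return {"role": STORAGE_OBJECT_ADMIN, "members": {IMPROVADO_MEMBER}}


def grant_improvado_access(bucket_name: str) -> None:
    """Append the binding to the bucket-level IAM policy."""
    # Imported here so the rest of the sketch works without the library.
    from google.cloud import storage  # pip install google-cloud-storage

    bucket = storage.Client().bucket(bucket_name)
    policy = bucket.get_iam_policy(requested_policy_version=3)
    policy.bindings.append(improvado_binding())
    bucket.set_iam_policy(policy)
```

Calling `grant_improvado_access("your-bucket-name")` applies the role at the bucket level only, which matches the scope described above.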

Additional information

Filename

Possible parameters:

```{{filename}}-{{dataclass}}-{{YYYY}}-{{MM}}-{{DD}}```

  • ```{{filename}}``` is the same as the destination table name
  • ```{{dataclass}}``` is an optional parameter that describes how the data will be updated in the destination
    • Possible data class values:
      • ```daily```
      • ```monthly```
      • ```weekly```
      • ```last_day```
      • ```last_day_incremental```
      • ```unknown```

IMPORTANT: You cannot use {{DD}} when partitioning by month. Use one of the following patterns:

  • ```{{filename}}-{{YYYY}}-{{MM}}-{{DD}}``` – for partition by day
  • ```{{filename}}-{{YYYY}}-{{MM}}``` – for partition by month

You can also use “_” instead of “-”, or omit the separator entirely, for example:

  • ```{{filename}}_{{YYYY}}-{{MM}}-{{DD}}```
  • ```{{filename}}{{YYYY}}{{MM}}{{DD}}```
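
To preview what a filename pattern produces, the Python sketch below expands the placeholders and enforces the month-partition rule. It is an illustration only: the function name, the lowercase `partition_by` values and the zero-padded month and day are assumptions, not a description of how Improvado renders names internally.

```python
import re
from datetime import date

ALLOWED = {"filename", "dataclass", "YYYY", "MM", "DD"}
_PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_filename(template, filename, day, partition_by="day", dataclass=""):
    """Expand {{...}} placeholders in a filename pattern."""
    names = set(_PLACEHOLDER.findall(template))
    unknown = names - ALLOWED
    if unknown:
        raise ValueError(f"unknown placeholders: {sorted(unknown)}")
    if partition_by == "month" and "DD" in names:
        raise ValueError("DD cannot be used when partitioning by month")
    values = {
        "filename": filename,
        "dataclass": dataclass,
        "YYYY": f"{day:%Y}",
        "MM": f"{day:%m}",  # assumed zero-padded
        "DD": f"{day:%d}",  # assumed zero-padded
    }
    return _PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


print(render_filename("{{filename}}_{{YYYY}}-{{MM}}-{{DD}}", "ads_daily", date(2023, 12, 18)))
# prints: ads_daily_2023-12-18
```

Passing `partition_by="month"` with a pattern that contains the day placeholder raises a `ValueError`, which mirrors the restriction above.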

File format

Possible formats:

  • csv
  • csv+gzip
  • json
  • json+gzip
  • parquet

Separator

Possible delimiters that can separate data in your file:

  • comma
  • semicolon
  • tab
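
On the consuming side, the chosen file format and separator determine how a loaded file should be read. The Python sketch below reads a csv+gzip file with the standard library; it assumes that csv+gzip means a gzip-compressed CSV file encoded as UTF-8, and the function name and the mapping keys are hypothetical.

```python
import csv
import gzip

# Separator option -> the character it stands for.
DELIMITERS = {"comma": ",", "semicolon": ";", "tab": "\t"}


def read_csv_gzip(path: str, separator: str = "comma") -> list:
    """Read every row of a gzip-compressed CSV file into a list of lists."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as fh:
        return list(csv.reader(fh, delimiter=DELIMITERS[separator]))
```

For a plain csv file, replacing `gzip.open` with the built-in `open` and the same arguments would be enough.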

Partition by

Possible ways of splitting data:

  • Day (default value)
  • Month

Encryption

Possible options:

  • Default Cloud Storage encryption
  • Customer-managed encryption keys
  • Customer-supplied encryption keys

Encryption Key

If you selected the Default Cloud Storage encryption type, this field is not editable; its default value is stub.
Otherwise, enter your AES-256 key, encoded in standard Base64, or the resource name of the Cloud KMS key used to encrypt the blob’s contents. For more information, see the Google Cloud Storage encryption docs.
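
For the customer-supplied option, a key in the expected shape (32 random bytes, standard Base64) can be produced with the Python standard library. This is a minimal sketch with hypothetical function names; how you store and rotate the key is up to you.

```python
import base64
import os


def generate_aes256_key() -> str:
    """Return 32 random bytes (256 bits) encoded in standard Base64."""
    return base64.b64encode(os.urandom(32)).decode("ascii")


def is_valid_aes256_key(value: str) -> bool:
    """True if the value is standard Base64 that decodes to 32 bytes."""
    try:
        return len(base64.b64decode(value, validate=True)) == 32
    except ValueError:
        return False


key = generate_aes256_key()
print(len(key))  # 44 characters for a 32-byte key
```

Keep a copy of the key: objects encrypted with a customer-supplied key cannot be read without it.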

Root Name

Possible parameters:

```/{{data_source}}/{{data_table_title}}/{{report_type}}/{{YYYY}}/{{MM}}/{{DD}}/{{timestamp}}```

  • ```{{data_source}}``` is a data provider, integration, connector
  • ```{{data_table_title}}``` is an object that contains all extraction orders with the same granularity (dimensional schema)
  • ```{{report_type}}``` is a set of such fields as metrics, properties, dimensions, etc.
  • ```{{timestamp}}``` is the date and time when data load started

If you use the ```/{{YYYY}}/{{MM}}/{{DD}}``` settings, data is added to a new folder each day. A new load does not delete the previous one, even for data that contains no date.

On request to the support team, we can support different root structures in a bucket.
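
As an illustration of the daily folder layout, the Python sketch below builds a root path for a given load start time. The function name and sample values are hypothetical, and rendering the timestamp as Unix seconds is an assumption; the actual timestamp format is not specified here.

```python
from datetime import datetime, timezone


def root_prefix(data_source, data_table_title, report_type, started_at):
    """Build a root path following the pattern shown above."""
    return (
        f"/{data_source}/{data_table_title}/{report_type}"
        f"/{started_at:%Y/%m/%d}"
        f"/{int(started_at.timestamp())}"  # assumed timestamp format
    )


monday = datetime(2023, 12, 18, 6, 0, tzinfo=timezone.utc)
tuesday = datetime(2023, 12, 19, 6, 0, tzinfo=timezone.utc)
print(root_prefix("example_source", "example_table", "example_report", monday))
print(root_prefix("example_source", "example_table", "example_report", tuesday))
```

The two printed paths differ in both the day folder and the timestamp, which is why a later load does not overwrite an earlier one.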

Use static IP

Select Yes for the Use static IP option if you allow Improvado to connect to your database only from the static IPs listed on the Destination connection page.

Select No if you have permitted access to your database from any IP. In this case, Improvado will connect to your database using dynamic IPs that are not listed on the Destination connection page.

Troubleshooting

Check out the troubleshooting guides for Google Cloud Storage here.

Questions?

The Improvado team is always happy to help with any other questions you might have. Send us an email.

Contact your Customer Success Manager or raise a request in Improvado Service Desk.