Description

Read data from files in Google Cloud Storage and store it in BigQuery.


Available Actions

Load Data Non-Natively

Read data from files in a Google Cloud Storage bucket and load it into a BigQuery table by way of in-memory DataFrames.


Variables

Source Parameters
gcloud_connection required

Credentials to use to authenticate with the Google Cloud Platform to access Google Cloud Storage.

bucket required

Specify the name of the Cloud Storage bucket that contains the files to load.

folder_path

Specify the path to the folder containing the files to load. This parameter is used as a key prefix filter, so only objects whose names begin with this path are considered.

process_subfolders required

If enabled, files in subfolders under folder_path are also loaded.
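
As a rough illustration of how a key prefix filter and the subfolder switch could interact, here is a minimal sketch in Python. The function name select_objects and the decision to normalize the prefix with a trailing slash are assumptions made for this example; the action's actual matching rules are not documented here.

```python
def select_objects(object_names, folder_path="", process_subfolders=False):
    """Hypothetical helper: pick the object names a load would consider."""
    # Assumption: treat folder_path as a folder, so "in" becomes the prefix "in/".
    prefix = folder_path.strip("/")
    prefix = prefix + "/" if prefix else ""
    selected = []
    for name in object_names:
        # Skip objects outside the prefix and zero-length "folder" placeholders.
        if not name.startswith(prefix) or name.endswith("/"):
            continue
        remainder = name[len(prefix):]
        # A "/" left in the remainder means the object sits in a subfolder.
        if "/" in remainder and not process_subfolders:
            continue
        selected.append(name)
    return selected


names = ["in/a.csv", "in/2024/b.csv", "other/c.csv", "in/"]
top_level_only = select_objects(names, "in")
with_subfolders = select_objects(names, "in", process_subfolders=True)
```

In this sketch, top_level_only contains only in/a.csv, while with_subfolders also picks up in/2024/b.csv.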

files_lookback

Specify the number of load records to examine, starting with the most recent and working backwards, when determining which files have not yet been loaded. If no value is specified, all load records are examined.
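
The lookback comparison can be pictured with the following sketch. It assumes that load records are simply file paths ordered from oldest to newest; the record format, the ordering, and the function name files_not_yet_loaded are all hypothetical.

```python
def files_not_yet_loaded(candidate_files, load_records, files_lookback=None):
    """Hypothetical helper: return candidates absent from the examined records."""
    if files_lookback is None:
        # No lookback given: examine every load record.
        examined = load_records
    else:
        # Examine only the most recent N records (assumed to be at the end).
        start = max(len(load_records) - files_lookback, 0)
        examined = load_records[start:]
    already_loaded = set(examined)
    return [f for f in candidate_files if f not in already_loaded]


records = ["a.csv", "b.csv", "c.csv"]
candidates = ["a.csv", "c.csv", "d.csv"]
pending_full = files_not_yet_loaded(candidates, records)
pending_short = files_not_yet_loaded(candidates, records, files_lookback=1)
```

Under these assumptions, a small lookback is faster but can treat an older, already loaded file (a.csv above) as new, so the value should cover at least as many records as files that may reappear in the folder.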

process_increment required

Number of files to read into the in-memory DataFrame before the DataFrame is loaded into BigQuery.
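
One way to think about process_increment is as a batch size over the list of pending files. The sketch below only shows the batching; the generator name file_batches is hypothetical, and reading the files and writing to BigQuery are left out.

```python
def file_batches(files, process_increment):
    """Hypothetical helper: yield consecutive groups of at most process_increment files."""
    if process_increment < 1:
        raise ValueError("process_increment must be at least 1")
    for start in range(0, len(files), process_increment):
        yield files[start:start + process_increment]


pending = ["a.csv", "b.csv", "c.csv", "d.csv", "e.csv"]
groups = list(file_batches(pending, 2))
```

Each group would be read into one DataFrame and written to BigQuery before the next group starts, so a larger value means fewer writes but more memory per write.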


File Format Parameters
format_type required

Select the format of the data in the files to load.

skip_rows

Number of rows to ignore at the beginning of files with delimited data format.

delimiter

Specify the character used to separate data values in files with delimited data format.

quote_char

Specify the character used to enclose string data values in files with delimited data format.

null_value

Specify the characters used to represent a null value in files with delimited data format.

escape_char

Specify the character used to represent an escaped value in files with delimited data format.

line_terminator

Specify the characters used to represent the end of a line.
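
To show how the delimited-format parameters fit together, here is a minimal sketch that parses text with Python's standard csv module. It is not the action's implementation: the function name parse_delimited, the default values, and the choice to split on line_terminator before parsing are assumptions, and that last choice means quoted values containing the terminator are not handled.

```python
import csv


def parse_delimited(text, skip_rows=0, delimiter=",", quote_char='"',
                    null_value="", escape_char=None, line_terminator="\n"):
    """Hypothetical helper: parse delimited text into rows, mapping null_value to None."""
    # Split into lines on the configured terminator and drop empty trailing lines.
    lines = [line for line in text.split(line_terminator) if line]
    reader = csv.reader(lines[skip_rows:], delimiter=delimiter,
                        quotechar=quote_char, escapechar=escape_char)
    # Replace the configured null marker with None in every row.
    return [[None if value == null_value else value for value in row]
            for row in reader]


sample = "id|name\r\n1|'Ann|Lee'\r\n2|NULL\r\n"
rows = parse_delimited(sample, skip_rows=1, delimiter="|", quote_char="'",
                       null_value="NULL", line_terminator="\r\n")
```

Here skip_rows drops the header line, the quote character protects the delimiter inside 'Ann|Lee', and the literal NULL becomes a missing value.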


Destination Parameters
bigquery_connection required

Credentials to use to authenticate with the Google Cloud Platform to access BigQuery.

project_id required

Specify the unique identifier of the Google Cloud project that contains the dataset where the data will be loaded.

dataset required

Specify the name of the dataset that contains the table where the data will be loaded.

bigquery_table required

Specify the name of the table where the data will be loaded.

region

Optionally specify the region where the dataset is located.

source_file_column required

Column in the destination table that stores the path of the source file each row was loaded from.

fields_mapping required

Mapping between the source file columns or JSON paths and BigQuery columns.
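
The sketch below illustrates one possible shape for fields_mapping and how source_file_column could be filled in. The mapping direction (source key to BigQuery column), the dotted JSON path syntax, and the function name map_record are assumptions for illustration only; the format the action expects is not specified here.

```python
def map_record(record, fields_mapping, source_file_column, source_path):
    """Hypothetical helper: build one destination row from one source record."""
    row = {}
    for source_key, column in fields_mapping.items():
        value = record
        # Assumption: nested JSON values are addressed with dotted paths.
        for part in source_key.split("."):
            value = value.get(part) if isinstance(value, dict) else None
        row[column] = value
    # Record which file the row came from.
    row[source_file_column] = source_path
    return row


mapping = {"id": "customer_id", "user.name": "customer_name"}
record = {"id": 7, "user": {"name": "Ann"}}
row = map_record(record, mapping, "source_file", "in/a.json")
```

In this sketch, a path that does not exist in the record yields None rather than raising an error.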


Load Data Natively

Read data from files in a Google Cloud Storage bucket and load it into a BigQuery table using a native command.


Variables

Source Parameters
gcloud_connection required

Credentials to use to authenticate with the Google Cloud Platform to access Google Cloud Storage.

bucket required

Specify the name of the Cloud Storage bucket that contains the files to load.

folder_path

Specify the path to the folder containing the files to load. This parameter is used as a key prefix filter, so only objects whose names begin with this path are considered.

process_subfolders required

If enabled, files in subfolders under folder_path are also loaded.


File Format Parameters
format_type required

Select the format of the data in the files to load.

skip_rows

Number of rows to ignore at the beginning of files with delimited data format.

delimiter

Specify the character used to separate data values in files with delimited data format.

quote_char

Specify the character used to enclose string data values in files with delimited data format.

null_value

Specify the characters used to represent a null value in files with delimited data format.

allow_quoted_new_lines

Allow quoted data sections that contain newline characters in files with delimited data format.

allow_jagged_rows

Allow rows that are missing trailing optional columns in files with delimited data format.

ignore_unknown_values

Allow extra values that are not represented in the table schema.

encoding

Character encoding of the data.

max_bad_rows

Maximum number of bad records that BigQuery will allow before failing the entire load.
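
These parameters resemble options of a BigQuery load job. As a sketch, the dictionary below pairs each parameter with the similarly described field of the load configuration in the BigQuery REST API. The pairing is an assumption based on the descriptions above, not a statement of how the action is implemented, and build_load_options is a hypothetical helper.

```python
# Assumed correspondence between this action's parameters and
# BigQuery REST API load configuration fields.
PARAM_TO_LOAD_OPTION = {
    "skip_rows": "skipLeadingRows",
    "delimiter": "fieldDelimiter",
    "quote_char": "quote",
    "null_value": "nullMarker",
    "allow_quoted_new_lines": "allowQuotedNewlines",
    "allow_jagged_rows": "allowJaggedRows",
    "ignore_unknown_values": "ignoreUnknownValues",
    "encoding": "encoding",
    "max_bad_rows": "maxBadRecords",
}


def build_load_options(params):
    """Hypothetical helper: translate set parameters, dropping unset (None) ones."""
    return {
        PARAM_TO_LOAD_OPTION[name]: value
        for name, value in params.items()
        if name in PARAM_TO_LOAD_OPTION and value is not None
    }


options = build_load_options(
    {"skip_rows": 1, "delimiter": "|", "max_bad_rows": 0, "encoding": None}
)
```

Note that a value of 0 for max_bad_rows is kept, since it is a meaningful setting (no bad records tolerated), while unset parameters are omitted.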


Destination Parameters
bigquery_connection required

Credentials to use to authenticate with the Google Cloud Platform to access BigQuery.

project_id required

Specify the unique identifier of the Google Cloud project that contains the dataset where the data will be loaded.

dataset required

Specify the name of the dataset that contains the table where the data will be loaded.

bigquery_table required

Specify the name of the table where the data will be loaded.

region

Optionally specify the region where the dataset is located.
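
The three destination parameters together identify one table. A minimal sketch of combining them into the dotted project.dataset.table form that BigQuery uses for table IDs follows; the function name full_table_id is hypothetical.

```python
def full_table_id(project_id, dataset, bigquery_table):
    """Hypothetical helper: join the destination parameters into one table ID."""
    parts = {"project_id": project_id, "dataset": dataset,
             "bigquery_table": bigquery_table}
    for name, value in parts.items():
        # All three parameters are marked required above.
        if not value:
            raise ValueError(f"{name} is required")
    return f"{project_id}.{dataset}.{bigquery_table}"


destination = full_table_id("my-project", "sales", "orders")
```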