Clustering in BigQuery

We’re excited to announce the introduction of Clustering for BigQuery warehouses in Daton! This functionality is designed to optimize query costs and enhance performance when using BigQuery.

What is Clustering?

Clustering in BigQuery arranges rows with similar values closer together, based on one or more specified columns.

Benefit: Optimized query performance through data pruning, where BigQuery skips blocks that cannot match the filter criteria and scans only the relevant subsets of data.
Target Column: Daton implements clustering on the _daton_batch_runtime column to maximize efficiency.
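To make the idea of pruning concrete, here is a minimal Python sketch of a simplified model. It is not BigQuery's actual storage implementation: the `Block` class, the fixed block size, and the integer runtimes are all assumptions made for illustration. Rows are sorted by the clustering column, split into blocks that record their minimum and maximum value, and a filter reads only the blocks that could contain a match.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """A group of rows plus min/max metadata for the clustering column."""
    rows: List[int]   # clustering column values, e.g. batch runtimes
    min_value: int
    max_value: int


def build_clustered_blocks(values: List[int], block_size: int) -> List[Block]:
    """Sort rows by the clustering column and split them into fixed-size blocks."""
    ordered = sorted(values)
    blocks = []
    for start in range(0, len(ordered), block_size):
        chunk = ordered[start:start + block_size]
        blocks.append(Block(rows=chunk, min_value=chunk[0], max_value=chunk[-1]))
    return blocks


def scan_with_pruning(blocks: List[Block], threshold: int) -> Tuple[List[int], int]:
    """Return the rows with value >= threshold and the number of rows read.

    A block whose max value is below the threshold cannot contain a match,
    so it is skipped without being read.
    """
    matches: List[int] = []
    rows_read = 0
    for block in blocks:
        if block.max_value < threshold:
            continue  # pruned: no row in this block can match
        rows_read += len(block.rows)
        matches.extend(v for v in block.rows if v >= threshold)
    return matches, rows_read


# Hypothetical batch runtimes: 1000 rows with values 0..999.
runtimes = list(range(1000))
blocks = build_clustered_blocks(runtimes, block_size=100)
matches, rows_read = scan_with_pruning(blocks, threshold=950)
print(len(matches), rows_read)  # prints "50 100"
```

In this toy model, a filter for the most recent runtimes reads 100 of the 1000 rows, because nine of the ten blocks are skipped based on their metadata alone. A filter on _daton_batch_runtime benefits from clustering in a similar way.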

How to Enable Clustering?

New Destination Setup

During destination creation, you can select TRUE for the enable clustering option.

Existing Destination

Go to the Destinations page.
Click on Edit destination.
Set the enable clustering drop-down to TRUE.

Note: Tables created or reloaded after this change are automatically clustered on _daton_batch_runtime. Tables that already exist are not affected until they are reloaded.
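One way to confirm that a table is clustered is to look at BigQuery's INFORMATION_SCHEMA.COLUMNS view, where clustering columns have a non-null clustering_ordinal_position. The Python sketch below only builds the query text. The helper name and the project, dataset, and table names are placeholders rather than values from Daton, and the resulting SQL can be pasted into the BigQuery console.

```python
def clustering_check_sql(project: str, dataset: str, table: str) -> str:
    """Build a query that lists the clustering columns of one table, in order."""
    return (
        "SELECT column_name, clustering_ordinal_position\n"
        f"FROM `{project}`.{dataset}.INFORMATION_SCHEMA.COLUMNS\n"
        f"WHERE table_name = '{table}'\n"
        "  AND clustering_ordinal_position IS NOT NULL\n"
        "ORDER BY clustering_ordinal_position"
    )


# Placeholder identifiers: replace them with your own project, dataset, and table.
sql = clustering_check_sql("my-project", "my_dataset", "my_table")
print(sql)
```

For a table created or reloaded after clustering was enabled, the result should include a row for _daton_batch_runtime. An empty result suggests the table has not been reloaded yet.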

Impact on Queries

The following example compares the cost of the same query before and after clustering.

Before Clustering:

Before clustering, the query that fetches the latest data pulled by Daton had to scan the entire data set, processing 1.46 GB.

After Clustering:

Once we enabled clustering, the same query processed only 10 MB, a reduction of more than 99% in data scanned.
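The reduction can be checked with a quick calculation. The sketch below assumes binary units (1 GB = 1024 MB), and the function name is illustrative. Using decimal units changes the result only slightly.

```python
def reduction_percent(before_mb: float, after_mb: float) -> float:
    """Percentage reduction in data processed between two runs of a query."""
    return (1 - after_mb / before_mb) * 100


MB_PER_GB = 1024  # assumption: binary units

before_mb = 1.46 * MB_PER_GB  # data processed before clustering
after_mb = 10.0               # data processed after clustering

saving = reduction_percent(before_mb, after_mb)
print(f"Data processed dropped by {saving:.2f}%")
```

Under on-demand pricing, BigQuery bills by bytes processed, so a reduction in data scanned translates into a comparable reduction in query cost.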

Start optimizing your BigQuery queries with clustering today!
