Clustering in BigQuery
We’re excited to announce clustering for BigQuery warehouses in Daton! This feature is designed to reduce query costs and improve query performance in BigQuery.
What is Clustering?
Clustering in BigQuery arranges rows with similar values closer together, based on one or more specified columns.
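- Benefit: Faster, cheaper queries through data pruning. When a query filters on the clustered column, BigQuery scans only the subsets of data that can contain matching rows instead of the whole table.
- Target Column: Daton clusters tables on the _daton_batch_runtime column, so queries that filter on it (for example, to fetch the most recently loaded data) gain the most.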
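To make the pruning idea concrete, here is a toy Python model. It is not how BigQuery stores or prunes data internally: the Block structure, the per-row size, and the use of small integers for batch runtimes are all hypothetical simplifications. It assumes that without clustering every block must be read, while with clustering rows are sorted by the column and any block whose maximum value falls below the filter is skipped.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Hypothetical storage block with min/max metadata for the clustered column."""
    min_runtime: int
    max_runtime: int
    size_bytes: int


def build_blocks(runtimes: List[int], rows_per_block: int, row_bytes: int) -> List[Block]:
    """Sort rows by the clustering column, then cut them into fixed-size blocks."""
    rows = sorted(runtimes)  # clustering places rows with similar values together
    blocks = []
    for start in range(0, len(rows), rows_per_block):
        chunk = rows[start:start + rows_per_block]
        blocks.append(Block(min(chunk), max(chunk), len(chunk) * row_bytes))
    return blocks


def bytes_scanned(blocks: List[Block], latest_runtime: int, clustered: bool) -> int:
    """Bytes read by a filter like `_daton_batch_runtime >= latest_runtime` (toy model)."""
    if not clustered:
        # Assumption: with no clustering metadata, every block is read.
        return sum(b.size_bytes for b in blocks)
    # A block whose max value is below the filter cannot hold a match, so it is skipped.
    return sum(b.size_bytes for b in blocks if b.max_runtime >= latest_runtime)


# Hypothetical data: ten loads (runtimes 1..10) of 100 rows each, 1,000 bytes per row.
runtimes = [batch for batch in range(1, 11) for _ in range(100)]
blocks = build_blocks(runtimes, rows_per_block=100, row_bytes=1_000)
full = bytes_scanned(blocks, latest_runtime=10, clustered=False)
pruned = bytes_scanned(blocks, latest_runtime=10, clustered=True)
print(full, pruned)  # 1000000 100000
```

In this model, asking only for the latest load reads one block out of ten once the data is clustered, which is the same effect the comparison in this post measures on a real table.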
How to Enable Clustering
New Destination Setup
- When creating a destination, set the enable clustering option to TRUE.
Existing Destination
- Go to the Destinations page.
- Click Edit destination.
- Set the enable clustering drop-down to TRUE.
Note: Tables created or reloaded after this change are automatically clustered on _daton_batch_runtime. Existing tables are not clustered until they are reloaded.
Impact on Queries
Here's an example comparing the cost of the same query before and after clustering:
Before Clustering:
A query that fetches the latest data loaded by Daton had to scan the entire dataset, processing 1.46 GB.
After Clustering:
After clustering was enabled, the same query processed only 10 MB, a reduction of more than 99%.
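The "more than 99%" figure follows directly from the two byte counts above. The small helper below makes the arithmetic explicit. It assumes decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes); with binary units the result is about 99.33%, so the conclusion holds either way. The function name is illustrative only.

```python
def scan_reduction(before_bytes: float, after_bytes: float) -> float:
    """Fraction of scanned bytes saved, from 0.0 (no saving) to 1.0."""
    if before_bytes <= 0:
        raise ValueError("before_bytes must be positive")
    return 1 - after_bytes / before_bytes


GB = 10**9  # assumption: decimal units
MB = 10**6

reduction = scan_reduction(before_bytes=1.46 * GB, after_bytes=10 * MB)
print(f"{reduction:.2%}")  # 99.32%
```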
Start optimizing your BigQuery queries with clustering today!