Get GA4 events into BigQuery without sampling

The native export is good. The sGTM-driven export is better, especially for high-traffic sites and custom dimensions.

GA4 ships a free BigQuery export that handles up to one million events a day. Above that threshold you need the GA4 360 license, which costs more than most teams want to spend on what is fundamentally a data pipeline. There is a cleaner option: forward your GA4 events from your sGTM container directly to BigQuery in parallel with the GA4 export.

Why bother if the native export works

The native export has three limitations that the sGTM route avoids: the free tier caps the daily export at one million events per day (and exceeding the cap can pause the export rather than merely sample it down), newly registered custom dimensions take up to 24 hours to appear in the export, and the daily tables land with a delay of several hours rather than in near real time.

If you are building dashboards from BigQuery and need same-day data, or you are running custom dimensions that change weekly, the sGTM route gives you control over both freshness and schema.

The architecture

  1. Add a Custom Template tag in your sGTM container that triggers on the same events as your GA4 tag.
  2. Map the event payload to a BigQuery row, including the event_id, event_name, parameters, user_pseudo_id, and timestamp.
  3. POST the row to a Cloud Function or webhook endpoint that streams it into BigQuery using the streaming insert API.
  4. For high volume, batch rows in the Cloud Function and use load jobs instead of streaming inserts: load jobs are free, while streaming inserts are billed per gigabyte ingested, so batching is cheaper at scale.
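The mapping in step 2 can be sketched in Python. Everything here is an assumption about your own pipeline: the incoming field names (event_name, user_pseudo_id, timestamp_micros, params) are whatever your sGTM tag forwards, and event_to_row is a hypothetical helper, not part of any library.

```python
# Sketch of step 2: map a forwarded sGTM event payload to a row shaped like
# the native GA4 export. Input field names are assumptions about your payload.
from datetime import datetime, timezone

def _param_value(v):
    """Wrap a parameter value the way the export schema does: one typed
    sub-field populated, the others left unset."""
    if isinstance(v, bool):  # check bool before int: bool subclasses int
        return {"string_value": str(v).lower()}
    if isinstance(v, int):
        return {"int_value": v}
    if isinstance(v, float):
        return {"double_value": v}
    return {"string_value": str(v)}

def event_to_row(event):
    """Turn one forwarded event dict into a BigQuery row dict."""
    ts = event.get("timestamp_micros") or int(
        datetime.now(timezone.utc).timestamp() * 1_000_000)
    return {
        # ISO date (not the native export's YYYYMMDD string) so the
        # destination table can be DATE-partitioned on this column.
        "event_date": datetime.fromtimestamp(
            ts / 1_000_000, tz=timezone.utc).strftime("%Y-%m-%d"),
        "event_timestamp": ts,
        "event_name": event["event_name"],
        "user_pseudo_id": event.get("user_pseudo_id"),
        # Repeated RECORD, same shape as the export's event_params.
        "event_params": [
            {"key": k, "value": _param_value(v)}
            for k, v in event.get("params", {}).items()
        ],
    }
```

Rows in this shape can be handed directly to the google-cloud-bigquery client's insert_rows_json in the Cloud Function from step 3, or accumulated and written via a load job as in step 4.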

Schema design that scales

Use a single events table with the same shape as the native GA4 export, so that your existing GA4 BigQuery queries keep working against the new table with minimal changes. At minimum, include: event_timestamp, event_name, user_pseudo_id, event_params (as a repeated record), user_properties, geo, and device.

Partition the table by event_date and cluster by event_name. Without these two settings, queries scan the full table on every run, and costs can easily be five to ten times what they need to be.
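A minimal DDL sketch of that layout; the project, dataset, and table names are placeholders, and the value struct is trimmed to the three most common typed fields:

```sql
-- Placeholder names; add user_properties, geo, and device the same way.
CREATE TABLE `my_project.analytics.sgtm_events` (
  event_date      DATE,
  event_timestamp INT64,
  event_name      STRING,
  user_pseudo_id  STRING,
  event_params ARRAY<STRUCT<
    key   STRING,
    value STRUCT<string_value STRING, int_value INT64, double_value FLOAT64>
  >>
)
PARTITION BY event_date
CLUSTER BY event_name;
```

Note one deliberate divergence: the native export ships sharded events_YYYYMMDD tables, while this is a single DATE-partitioned table, which is what makes partition pruning possible.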

When to keep the native export anyway

If your data team relies on the native GA4 schema for downstream models, run both. The sGTM-driven table for real-time dashboards, the native export for compatibility. Storage is cheap; rebuilding queries is not.

For the inverse case (you want to send the BigQuery data back into sGTM for activation), there is a separate post on that pipeline.