Data Analysis and Reporting in Marketing

GA4 to Google Sheets: Sampling, Thresholds & Cardinality (2026)

Adela
April 20, 2026
GA4 to Google Sheets for Data Integrity: Sampling Thresholding 2026

GA4 doesn't tell you when it's lying to you. The numbers come back, the rows look populated, but underneath: sampled data, hidden rows from privacy thresholding, queries collapsed into an opaque (other) bucket, dimensions silently dropped because they exceeded cardinality limits. Most teams pull GA4 into Sheets and treat the output as ground truth. It isn't.

This guide is for getting GA4 data into Sheets with the data integrity layer intact. Three methods to extract (free Reports Builder, Apps Script, scheduled connector), the GA4-specific quirks at each layer (sampling thresholds, Google Signals thresholding, (other) bucket triggers, cardinality limits, 14-month retention), and then — the part the rest of the internet skips — the two data integrity dashboards: a sampling/threshold detector and a high-cardinality dimension splitter.

For paid media reporting (creative testing on Meta Ads, firmographic targeting on LinkedIn Ads) or CRM-side velocity (HubSpot to Sheets for RevOps), we have separate guides. This post is about the data trust layer: when can you actually believe what GA4 returned? For the equivalent trust-layer questions on organic search (anonymized queries, the 16-month retention wall), see our Search Console to Google Sheets guide.

Why GA4 reports lie (and how to detect it)

Four mechanisms cause GA4 to return technically valid but practically misleading data. Knowing the trigger for each lets you avoid them by query design.

1. Sampling. When a query combines high-cardinality dimensions over a long date range, GA4 samples your sessions and extrapolates. Standard GA4 properties hit sampling at ~10 million events per query in Explorations and at the equivalent threshold via the Data API (the exact threshold isn't published, but it's lower than 360's 1 billion). The query returns successfully — there's just no warning in the API response that the numbers are estimates.

2. Data thresholding (privacy threshold). When Google Signals is enabled and a row contains too few users to be considered privacy-safe (typically fewer than 10), GA4 hides the row entirely. Your sessions total at the top doesn't match the sum of rows below it. Smaller properties are hit hardest — a niche B2B site with 500 weekly users can lose 30%+ of rows to thresholding without any visible warning.

3. The (other) bucket. Each GA4 query has a cardinality limit. For standard properties, dimension combinations that exceed ~50,000 distinct values get rolled up into a single (other) row containing the leftover totals. Query page_path with query strings across a year of data and you'll see most of your traffic disappear into (other). The metric totals are right; the per-row attribution is wrong.

4. Cardinality drops on the dimension itself. Custom dimensions configured with high cardinality (a user_id, a transaction_id, a unique session ID) hit the GA4 cardinality limit and stop being recorded for new events. The Data API still queries against the field, just with truncated data.

None of these throw errors. The query returns 200 OK. You paste into your dashboard. The dashboard lies. The fix is either query design (smaller date ranges, fewer dimensions, aggregated paths) or moving to GA4 → BigQuery export, where sampling and thresholding don't apply because you query raw event data directly. But that's a different pipeline; this post is about Sheets.

What the four problems share: they're detectable from the response if you know what to look for. The Data API returns metadata flags for sampling, threshold counts for hidden rows, and the (other) row appears literally in the data. The methods below explain how each extraction path exposes (or hides) those flags.

Method 1: GA4 Reports Builder (free official add-on)

This is Google's own spreadsheet add-on. It queries the GA4 Data API under the hood but hides the code from you — and hides the data integrity flags too, which is its biggest weakness for serious reporting.

Setup: open a new Google Sheet → Extensions → Add-ons → Get add-ons → search GA4 Reports Builder for Google Analytics → install → Extensions → GA4 Reports Builder → Create new report. Pick property, dimensions, metrics, date range, click Create reports, then Run reports.

GA4-specific gotchas at this layer:

  • No sampling indicator. The add-on doesn't surface the samplingMetadatas field returned by the Data API. Your report looks clean while half the data is extrapolated.
  • No threshold count. The Data API returns propertyQuota.thresholdedConcurrentRequests in the response metadata, but the add-on doesn't display it. Rows hidden by Google Signals are silently missing.
  • The (other) bucket shows up as a literal row labeled (other). This one you can see — but only if you scroll the table. If you sort by sessions descending, (other) can be the top row.
  • Row limit per call. The Data API returns max 100,000 rows per call. Reports Builder doesn't paginate; queries needing more get truncated.
  • No 360 distinction. The add-on doesn't tell you whether you're on standard or 360, which matters because the sampling thresholds are 100x different.

Fits when: quarterly board snapshot at a high dimension (channel, source/medium), small-property weekly traffic check.

Doesn't fit when: any report where you need to know whether the data was sampled, multi-property reporting, queries that approach the cardinality limit (most page-level analyses).

Method 2: Google Apps Script + GA4 Data API

For analysts who want the sampling and threshold metadata exposed. The Data API returns it in every response; you just have to read and surface it. Setup: 1–2 hours. Maintenance: yours, including handling OAuth token refresh.

Step 1 — Enable the API. In Google Sheets, open Extensions → Apps Script → Services → Add Google Analytics Data API v1beta. The service handles OAuth automatically using your Google account.

Step 2 — Apps Script that surfaces data integrity flags. Paste into Apps Script:

function pullGA4WithIntegrityFlags() {
  const PROPERTY_ID = 'YOUR_PROPERTY_ID';

  const request = {
    dateRanges: [{ startDate: '30daysAgo', endDate: 'yesterday' }],
    dimensions: [
      { name: 'sessionDefaultChannelGrouping' },
      { name: 'date' }
    ],
    metrics: [
      { name: 'sessions' },
      { name: 'totalUsers' },
      { name: 'keyEvents' }
    ],
    // Request the metadata that signals sampling and thresholding
    returnPropertyQuota: true,
    keepEmptyRows: false
  };

  const report = AnalyticsData.Properties.runReport(request, 'properties/' + PROPERTY_ID);

  // Inspect data integrity flags
  const sampled = (report.metadata && report.metadata.samplingMetadatas)
    ? report.metadata.samplingMetadatas
    : null;
  const dataLossFromOther = (report.metadata && report.metadata.dataLossFromOtherRow) || false;
  const quota = report.propertyQuota || {};
  const concurrent = quota.concurrentRequests || {};
  const thresholdedReports = (concurrent.consumed || 0);

  const sheet = SpreadsheetApp.getActiveSheet();
  sheet.clear();

  // Top row: data integrity warnings
  sheet.appendRow(['DATA INTEGRITY FLAGS']);
  sheet.appendRow(['Sampled?', sampled ? 'YES - check samplingMetadatas' : 'No']);
  sheet.appendRow(['(other) row present?', dataLossFromOther ? 'YES - cardinality exceeded' : 'No']);
  sheet.appendRow(['Threshold concurrent consumed', thresholdedReports]);
  sheet.appendRow([]);

  // Then the data
  sheet.appendRow(['Channel', 'Date', 'Sessions', 'Users', 'Key events']);
  (report.rows || []).forEach(r => {
    sheet.appendRow([
      r.dimensionValues[0].value,
      r.dimensionValues[1].value,
      r.metricValues[0].value,
      r.metricValues[1].value,
      r.metricValues[2].value
    ]);
  });
}

Step 3 — Schedule. Apps Script → Triggers → Add Trigger → run daily at 6 AM. Critical: the top rows show data integrity flags so you know whether to trust the report before you read it.

GA4-specific gotchas at the API layer:

  • The Data API returns sampling metadata only when you request it. You have to read response.metadata.samplingMetadatas per date range. Skipping that check is why most homegrown scripts hide sampling.
  • dataLossFromOtherRow in response.metadata. Signals that the (other) bucket absorbed rows. If your dimension is high-cardinality (page_path with query strings, full URLs), this flag is what tells you to redesign the query.
  • Quota errors come back as HTTP 429 with quotaExceeded in the body. Free properties: 25,000 tokens/day standard. 360 properties: 250,000/day. Daily breakdowns × many properties hits this fast.
  • 14-month retention for standard properties. If you query data older than 14 months, you get nothing (no error, just empty rows). For longer history, GA4 → BigQuery export is the only path.
  • Custom dimension cardinality cap. Custom event-scoped dimensions stop recording new values after the daily cardinality cap (~50,000 distinct values). The API still queries the field — just with truncated data.

For the dashboard-side context once data lands in Sheets, see our Google Sheets automation guide.

Skip the GA4 quota gymnastics

Quota tokens, sampling metadata parsing, (other) bucket detection, threshold counts. Dataslayer surfaces all of it automatically, plus 50+ other connectors (Google Ads, Meta, LinkedIn, HubSpot) in the same workbook.

Try Dataslayer Free

Method 3: Scheduled connector (no-code)

For teams reporting GA4 data weekly across multiple properties, with data integrity flags visible and quota usage tracked across queries.

Dataslayer connects GA4 to Google Sheets in under 10 minutes. From Google Sheets: Extensions → Add-ons → Get add-ons → install Dataslayer. Then Extensions → Dataslayer → Launch, pick GA4, authenticate, pick properties, pick metrics + dimensions (the field picker categorizes by Acquisition, Engagement, Monetization, Tech, Custom), set date range, click Run.

Why a connector wins specifically for GA4 data integrity work:

  • Sampling flag surfaced per query. When the Data API returns samplingMetadatas, the connector shows it next to the result so you know to discount the numbers.
  • (other) bucket detection. If your query hit the cardinality limit, the connector warns and suggests breaking the date range.
  • Multi-property batched within quota. Pulling 10 client properties in one workbook is one query block per property. Each respects its own quota; you don't manually balance.
  • 14-month retention awareness. Queries for older dates return a clear warning instead of silent empty rows.
  • Reporting identity exposure. The GA4 Data API uses whatever identity you specify; the connector defaults to DEVICE_BASED (matches the API default) and lets you switch to BLENDED (matches the UI default) per query so reconciliation between Sheets and the GA4 UI is straightforward.

Pricing: Free for 1 connector and 1 user. Starter $35/month annual covers 3 connectors and 1 destination. Advanced $115/month adds hourly schedule, AI Insights, and MCP integration for Claude/ChatGPT (ask Claude "did our GA4 data get sampled last week?" and it queries live). Pro $345/month covers 100+ accounts per connector. See dataslayer.ai/pricing.

The GA4 gotcha NO method solves: the GA4 → BigQuery export. For unsampled, unthresholded, full-cardinality data, you have to enable the BigQuery export in GA4 admin, then query BigQuery directly. The methods in this post pull from the Data API — which is convenient but subject to all four integrity problems above. If you need raw event data, BigQuery is the only path.

The 2 data integrity dashboards worth building

Once data flows to Sheets with metadata flags exposed (Apps Script or connector method), here are the two reports that make GA4 reporting trustworthy. Neither is something the GA4 UI shows, and both require the metadata layer Method 1 hides.

Dashboard 1: Sampling and threshold detector

The question: across all my reports, which ones contain sampled, thresholded, or (other)-truncated data — and how much of the report is affected?

Columns to pull from GA4:

  • Report name (one row per query in your workbook)
  • samplingMetadatas (per date range): contains samplesReadCount and samplingSpaceSize
  • dataLossFromOtherRow (boolean per query)
  • Sessions total (top-level metric)
  • Sum of session rows in the result (sum the metric column)

Calculations in Sheets:

  • Sampling ratio: samplesReadCount / samplingSpaceSize per date range — values below 1.0 mean the data is sampled
  • Threshold ratio: (top-level sessions total - SUM(row sessions)) / top-level sessions total — values above 0 mean rows are hidden by thresholding
  • (other) ratio: (other)_row_sessions / total_sessions — values above 0.05 mean the cardinality limit is biting
  • Integrity score: compound metric, e.g. =SI(O(sampling_ratio<1, threshold_ratio>0, other_ratio>0.05), "WARNING", "OK")

Pivot layout:

  • Rows: report name (one per query in your workbook)
  • Columns: sampling ratio, threshold ratio, (other) ratio, integrity score
  • Conditional formatting: red when integrity score = WARNING

What this surfaces that the GA4 UI doesn't:

  • "The landing page report has 47% of sessions in the (other) bucket" → query is too granular, aggregate to top-level pages
  • "The cohort retention report shows sampling_ratio = 0.34" → only 34% of the data was actually read; the rest is extrapolated. Move this to a BigQuery export.
  • "The audience segment report has threshold_ratio = 0.18" → 18% of rows are hidden. Property too small for that segmentation; consolidate.

Dashboard 2: High-cardinality dimension splitter

The question: when I need page-level data over a long date range, how do I split the query to avoid the cardinality limit and the (other) bucket?

Strategy: instead of one query for (page_path, date) × 90 days, split into many smaller queries. Each query stays under the ~50,000 distinct-value limit. Stitch the results in Sheets.

Splitting strategies:

  • By date chunk. Query weekly instead of monthly. 4 queries per month, each well under the limit.
  • By section. Filter by page_path prefix. Query /blog/*, /product/*, /pricing/* separately, then stitch.
  • By traffic source. If a single source dominates, filter by sessionMedium=organic, then run separately for paid and direct.
  • By landing page only. Use landingPage instead of pagePath. Lower cardinality because users land on fewer unique URLs than they browse.

Calculations in Sheets:

  • Cardinality estimate per query: =COUNTUNIQUE(page_path_column) per chunk — flag chunks above 40,000 to split further
  • Sessions stitched: =SUM over all chunks for the same period and dimension combo
  • (other) leak: If any chunk still shows (other), isolate the offending date or section and re-split

What this enables that a single query doesn't:

  • Page-level performance across 365 days without losing data to (other)
  • Content audits where you need every URL's actual session count, not the rolled-up bucket
  • Reliable joins to other data sources by URL — the join only works when every URL is in its own row

For content-side analysis once you have unsampled URL-level data, see Why Marketing Attribution Is Broken in 2026.

Comparing the three methods

Aspect Reports Builder Apps Script Scheduled connector
Setup time 5 min 1–2 hr Under 10 min
Cost Free Free From $35/mo
Sampling flag visible No Yes (with code) Yes (auto)
Threshold count visible No Yes (with code) Yes (auto)
(other) bucket warning Only as a row Via dataLossFromOtherRow Auto warning
Multi-property in one sheet One report block each Loop in code Native, one query block each
Pagination over 100K rows No (truncates) Yes (write the loop) Auto
Scheduled refresh Manual Apps Script triggers Built-in (hourly tier)
Code maintenance None Owner Vendor

The decision: for GA4, the answer almost always involves the metadata layer. If you don't have the sampling flag, threshold count, and (other) warning visible, you're flying blind. That means Apps Script or connector — and for multi-property reporting, the connector wins on quota awareness alone.

GA4-specific quirks worth knowing

Five GA4 behaviors that affect every method and every dashboard:

  • Google Signals thresholding. Properties with Signals on hide rows that don't meet the privacy threshold (typically <10 users). The property owner can turn off Signals for the reporting identity in Admin → Property → Data Settings → Reporting Identity (use "Device-based" instead of "Blended"). Only do this if your privacy policy allows it.
  • 14-month standard retention. Data older than 14 months returns empty rows on standard properties. 360 retains up to 50 months. For longer history, GA4 → BigQuery export is the only path.
  • Reporting identity choice changes the numbers. The GA4 UI defaults to Blended; the Data API defaults to Device-based. Same property, same date range, different numbers. Pick one identity in your query and stick with it across all reports.
  • The Data API doesn't include real-time data. The standard runReport endpoint excludes the last 30 minutes by default. For live data use runRealtimeReport, which has its own (much smaller) dimension and metric set.
  • Standard vs 360 sampling thresholds differ ~100x. Standard properties hit sampling much sooner. If your property is on 360, you can ignore many of the sampling concerns; if not, design every query assuming sampling will hit.

Common errors and how to read them

A few errors show up often enough that recognizing them saves debugging time.

RESOURCE_EXHAUSTED or HTTP 429 with quotaExceeded: you hit a Data API quota — daily tokens, hourly tokens, or concurrent requests. Standard properties: 25,000 tokens/day. 360 properties: 250,000/day. Solution: reduce frequency, batch fewer reports per script run, or move heavy queries to a later hour when quota resets.

INVALID_ARGUMENT on a dimension or metric name: GA4 deprecates and renames metrics. conversions was renamed to keyEvents in 2025. Old queries break silently. Solution: check the Data API schema reference for the current name.

Empty rows with no error: the property is too small for the dimension combination + thresholding is on, or the date range is older than 14 months, or your reporting identity returned zero users. Check the threshold flag in metadata first.

Mismatch with the GA4 UI: the most common cause is different reporting identity (UI default Blended vs API default Device-based). Second most common: the UI's 30-minute processing window not yet applied to the API. Third: sampling on one side and not the other for the same date range. Match identity and date range exactly when reconciling.

(other) row dominating the report: your dimension combination exceeded the cardinality limit. Solution: split by date chunk, by section, or by traffic source as described in Dashboard 2.

FAQ

How do I know if my GA4 query was sampled?
Read response.metadata.samplingMetadatas. If samplesReadCount / samplingSpaceSize is less than 1.0, your data is sampled. Reports Builder hides this; Apps Script and connectors surface it. On 360 the sampling thresholds are much higher (~100x), so most queries stay unsampled.

Why don't my GA4 sessions add up between Sheets and the UI?
Three common causes: (1) different reporting identity (UI default Blended vs Data API default Device-based), (2) Google Signals thresholding hides rows in one but not the other, (3) sampling triggered on the larger query but not the smaller one. Match identity and date range, and check the threshold/sampling metadata.

What's the (other) row, and why is it sometimes huge?
GA4 has a cardinality limit per query (~50,000 distinct dimension values for standard properties). When your dimension combination exceeds that, everything past the limit gets rolled into a single (other) row containing the leftover totals. The metric totals are correct; the per-row attribution is lost. Fix: split the query by date or by section to stay under the limit.

Does sampling apply to BigQuery exports?
No. The GA4 → BigQuery export contains raw event data, not aggregated reports. Sampling and thresholding are aggregation-time artifacts; they don't apply to raw events. For unsampled analysis at scale, BigQuery is the only path. Setup requires enabling the export in GA4 admin and a Google Cloud project with billing enabled.

How do I get more than 14 months of GA4 data into Sheets?
You can't directly from the Data API on a standard property. The options are: enable GA4 → BigQuery export (going forward), upgrade to GA4 360 (50-month retention), or archive monthly Data API pulls to a separate Sheet so you build a history beyond what GA4 retains.

Why are some of my custom dimensions returning truncated data?
Custom event-scoped dimensions have a daily cardinality cap (~50,000 distinct values per day on standard properties). After hitting the cap, new distinct values stop being recorded for the rest of the day. Solution: don't use unique IDs (user_id, transaction_id) as event-scoped custom dimensions — use them at user scope or set up the BigQuery export.

Should I use the GA4 Data API or the GA4 → BigQuery export?
Data API for aggregated reports (sessions, conversions, channel grouping) up to standard sampling/threshold limits. BigQuery export for raw event data, joins to other warehouses, long-term retention, and any analysis that cannot tolerate sampling. Many teams run both: Data API for weekly dashboards, BigQuery for analytical deep dives.

Conclusion

Three methods for getting GA4 data into Google Sheets exist on a spectrum: free Reports Builder for quarterly snapshots at high dimensions, Apps Script for analysts who want the integrity flags exposed via code, scheduled connector for teams running multi-property weekly reports with quota and sampling awareness baked in.

But the methods are means, not the end. The deliverable is the two data integrity reports: a sampling/threshold detector that flags reports you can't trust, and a high-cardinality dimension splitter that lets you query page-level data without losing it to the (other) bucket. Both require the metadata layer the free Reports Builder hides — and most teams accept whatever GA4 returns as truth, then build dashboards on data that's been quietly sampled, thresholded, or truncated.

Surface the integrity flags. Split the cardinality. The connector is a Thursday afternoon of setup; the data trust is something you'll rely on every Monday for the next year. Start a free Dataslayer trial if you want to skip the metadata parsing and get to the dashboards faster.

HOW CAN WE HELP?

Knowledge baseSupport ticketContact

RELATED POST

How to Connect Klaviyo to Google Sheets for Ecommerce Email and SMS Attribution

Visual Query Builder vs SQL: Which One Should Your Marketing Team Use?

Marketing Data Warehouse in 2026: When You Need One and Which Path Fits

Our Partners

Google Cloud Partner
Microsoft Partner