Google's BigQuery AI functions have evolved significantly throughout 2025 and into 2026. The foundational functions AI.GENERATE and AI.GENERATE_TABLE became generally available in late January 2026, while Google introduced three new managed AI functions (AI.IF, AI.SCORE, AI.CLASSIFY) in public preview in November 2025.
These functions eliminate the need for Python scripts in most data analysis workflows. If you're still exporting CSVs to categorize thousands of ad variations manually, this guide shows what's now possible directly in SQL.
What Are BigQuery AI Functions?
BigQuery AI functions let you call Gemini models directly from SQL queries. Instead of building complex CASE statements or writing Python scripts to classify data, you can write natural language prompts directly in your queries and let the AI handle the interpretation.
General-Purpose AI Functions (Generally Available)
Two core functions went GA in late January 2026:
- AI.GENERATE: Returns structured data (JSON, STRUCT) based on your prompt
- AI.GENERATE_TABLE: Returns a complete table with the schema you define
Managed AI Functions (Public Preview)
Three additional managed functions launched in November 2025 and are currently in public preview:
- AI.IF: Filter and join data based on semantic meaning
- AI.SCORE: Rate and rank inputs based on natural language criteria
- AI.CLASSIFY: Classify text or images into user-defined categories
All functions work with text, images, video, audio, and PDFs. According to Google Cloud's BigQuery ML documentation, you can use these functions anywhere in SQL: SELECT, WHERE, ORDER BY, GROUP BY.
The GA release also simplified authentication: you can now use End User Credentials instead of managing service account permissions. Alongside it came full support for Gemini 3.0 Pro (which launched in November 2025) and two new semantic search functions, AI.EMBED and AI.SIMILARITY.
The generative AI overview covers the full feature set, including support for Anthropic Claude and Mistral models if you prefer those over Gemini.
How to Use BigQuery AI Functions: Setup
Before you can start, you need to connect BigQuery to Vertex AI's Gemini models. The setup takes about 5 minutes once you know the steps.
Enable two APIs in Google Cloud Console:
- BigQuery Connection API
- Vertex AI API
Create an external connection:
Navigate to BigQuery, then External Connections, then Add Connection, then Vertex AI remote models. Give it a name (such as vertex_ai_connection) and match your dataset's region: the connection must live in the same location as the data it queries, and co-location also keeps latency down.
Grant the right permissions:
Copy the service account email that appears, head to IAM & Admin, and add the "Vertex AI User" role to that account. This is a common setup mistake. Without this role, every query fails with permission errors.
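If you prefer the command line to the console, the same three steps can be sketched with gcloud and bq. A minimal sketch, assuming a project called my-project and a US dataset; swap in your own project ID, region, connection name, and the service account email that `bq show` prints:

```shell
# Step 1: enable the two required APIs
gcloud services enable bigqueryconnection.googleapis.com aiplatform.googleapis.com \
  --project=my-project

# Step 2: create the external connection (CLOUD_RESOURCE connections back Vertex AI remote models)
bq mk --connection --location=US --project_id=my-project \
  --connection_type=CLOUD_RESOURCE vertex_ai_connection

# Step 3: look up the connection's service account, then grant it the Vertex AI User role
bq show --connection my-project.US.vertex_ai_connection
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:<service-account-email-from-bq-show>" \
  --role="roles/aiplatform.user"
```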
Create your remote model:
CREATE OR REPLACE MODEL `project.dataset.gemini_model`
REMOTE WITH CONNECTION `project.us.vertex_ai_connection`
OPTIONS (ENDPOINT = 'gemini-2.0-flash-001');
For Gemini 3.0 Pro:
OPTIONS (ENDPOINT = 'gemini-3-pro');
Once your model is set up, you can start running queries. Here's what's possible with BigQuery AI functions.
7 Practical Marketing Queries
These seven use cases solve real marketing problems. Each example shows how to structure queries and what insights become available.
1. Classify Ad Headlines by Emotional Trigger
Say you have several thousand Google Ads headlines and want to understand which emotional triggers drive conversions. Manual tagging isn't realistic for that volume.
SELECT
headline,
clicks,
impressions,
AI.GENERATE(
MODEL `project.dataset.gemini_model`,
headline,
'Classify into one category: urgency, curiosity, benefit, neutral. Return only the category.'
) AS emotion_type
FROM `project.dataset.google_ads_headlines`
WHERE impressions > 1000
ORDER BY clicks DESC;
The query runs in minutes. In B2B campaigns, you might find that "curiosity" headlines significantly outperform "urgency" angles, the kind of pattern that stays hidden when you review thousands of variants by hand.
Important: If you don't batch queries properly, you might wait 20+ minutes for results. Filter your data with WHERE clauses before AI.GENERATE runs.
2. Extract Customer Pain Points from Reviews
SELECT pain_point, COUNT(*) AS mentions
FROM AI.GENERATE_TABLE(
MODEL `project.dataset.gemini_model`,
TABLE `project.dataset.customer_reviews`,
STRUCT(
'Extract the main complaint as a short phrase (max 10 words).' AS prompt,
'pain_point STRING, severity STRING' AS output_schema
)
)
WHERE severity = 'high'
GROUP BY pain_point
ORDER BY mentions DESC
LIMIT 5;
This approach pulls structured data from customer reviews showing which complaints appear most frequently. One common pattern: customers mention issues like "packaging arrives damaged" hundreds of times in reviews but only file a handful of support tickets. People complain in reviews, not to support teams.
Danish candy company Lakrids by Bülow used similar sentiment analysis on customer feedback, identified a significant increase in packaging complaints, updated their packaging design, and reduced complaints by 26% according to their customer service data.
Fair warning: the model sometimes gets creative with severity labels. Always spot-check the first 50-100 results before trusting the full dataset.
3. Generate Campaign Concepts from Your Best Performers
Instead of brainstorming in a vacuum, feed Gemini your top campaigns and ask for variations.
WITH top_campaigns AS (
SELECT campaign_name, ad_copy, roas
FROM `project.dataset.facebook_ads_performance`
WHERE roas > 3.0
ORDER BY roas DESC
LIMIT 10
)
SELECT AI.GENERATE(
MODEL `project.dataset.gemini_model`,
CONCAT(
'Based on these high-ROAS campaigns: ',
STRING_AGG(CONCAT(campaign_name, ' - ', ad_copy), ', '),
'. Generate 5 new campaign concepts following similar patterns. Numbered list.'
)
) AS campaign_ideas
FROM top_campaigns;
If "limited-time discount" campaigns consistently hit ROAS above 3.0, Gemini suggests variations like "exclusive early access" or "VIP pricing": angles that follow the same psychological triggers with fresh messaging.
4. Track Sentiment at Scale
Here's a query pattern that works well for tracking brand perception:
SELECT
DATE_TRUNC(comment_date, MONTH) AS month,
AI.GENERATE(
MODEL `project.dataset.gemini_model`,
comment_text,
'Sentiment: positive, negative, or neutral. One word only.'
) AS sentiment,
COUNT(*) AS comments
FROM `project.dataset.social_comments`
WHERE comment_date >= '2025-01-01'
GROUP BY month, sentiment
ORDER BY month DESC;
Track brand perception across tens of thousands of social comments month-over-month, all in SQL. When sentiment drops significantly, you know something changed. Then you can dig into which specific topics or products are getting negative mentions.
5. Classify Landing Pages by Funnel Stage
This query helps identify where paid budget is actually going:
SELECT
page_url,
sessions,
AI.GENERATE(
MODEL `project.dataset.gemini_model`,
page_content,
'Funnel stage: awareness, consideration, or decision. Base on content tone, CTA strength, product detail. Return stage only.'
) AS funnel_stage
FROM `project.dataset.landing_pages`
WHERE sessions > 500;
Marketing teams often discover they're allocating the majority of paid budget to awareness content when most conversions happen from decision-stage pages. This query makes budget reallocation decisions obvious.
If you're manually pulling campaign data from multiple platforms before loading it into BigQuery, that's your real bottleneck. Dataslayer automates data transfers from Google Ads, Facebook, LinkedIn, TikTok, and 50+ other sources directly to BigQuery. Once everything's centralized, you can run these AI functions across all campaigns in a single query.
6. Extract Structured Data from Campaign Briefs
If you store client briefs as PDFs in Cloud Storage, this eliminates hours of manual searching:
SELECT * FROM AI.GENERATE_TABLE(
MODEL `project.dataset.gemini_model`,
TABLE `project.dataset.campaign_brief_pdfs`,
STRUCT(
'Extract: monthly_budget (number), primary_kpi (string), target_age_range (string), target_locations (array).' AS prompt,
'monthly_budget INT64, primary_kpi STRING, target_age_range STRING, target_locations ARRAY<STRING>' AS output_schema
)
);
Now you can query "which clients target 25-34 year olds with budgets above $50k" instead of opening dozens of PDFs. Useful for resource planning and capacity forecasting.
7. Find Gaps in Competitor Messaging
Competitive analysis on advertising messaging reveals positioning opportunities:
SELECT
competitor_name,
AI.GENERATE(
MODEL `project.dataset.gemini_model`,
STRING_AGG(ad_headline, ' | '),
'Analyze these headlines. Top 3 messaging themes they emphasize? What angle are they NOT covering?'
) AS messaging_analysis
FROM `project.dataset.competitor_ad_scrapes`
WHERE scrape_date >= CURRENT_DATE() - 30
GROUP BY competitor_name;
When analyzing 50+ competitor ads reveals that everyone hammers the same 2-3 angles while ignoring others, you've found potential differentiation opportunities. This works especially well in crowded markets.
What This Actually Costs
No separate AI subscription needed. You pay regular BigQuery costs plus Gemini API usage.
Here's a real example: sentiment analysis on 10,000 customer reviews (200 characters average):
- Input: 10,000 × 200 = 2M chars = $0.15
- Output: 10,000 labels × 10 chars = 100K chars = $0.03
- Total: $0.18
Compare that to manual review at 15 reviews per hour. You're looking at 667 hours of analyst time. Even at $25/hour, that's $16,675 in labor costs versus eighteen cents in API costs.
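The arithmetic above is easy to sanity-check in a few lines. A minimal sketch; the per-million-character rates below are back-solved from this example, not official pricing, so check the Vertex AI pricing page for current numbers:

```python
import math

# Back-solved illustrative rates (USD per 1M characters) -- NOT official pricing
INPUT_RATE_PER_M = 0.075   # implies $0.15 for 2M input chars
OUTPUT_RATE_PER_M = 0.30   # implies $0.03 for 100K output chars

def estimate_costs(rows, avg_in_chars, avg_out_chars,
                   reviews_per_hour=15, hourly_rate=25.0):
    """Compare API cost vs. manual-review labor cost for a batch job."""
    api_cost = (rows * avg_in_chars / 1e6) * INPUT_RATE_PER_M \
             + (rows * avg_out_chars / 1e6) * OUTPUT_RATE_PER_M
    labor_hours = math.ceil(rows / reviews_per_hour)  # 10,000 / 15 -> 667 hours
    labor_cost = labor_hours * hourly_rate
    return round(api_cost, 2), labor_cost

print(estimate_costs(10_000, 200, 10))  # (0.18, 16675.0)
```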
The BigQuery pricing page breaks down on-demand versus reserved capacity. For Gemini API costs specifically, check Vertex AI pricing.
Four Ways to Keep Costs Down
- Batch your queries: Process 1,000 rows at once instead of making 1,000 individual API calls
- Filter before analyzing: Use WHERE clauses to reduce rows before AI.GENERATE runs
- Start with Flash: Gemini 2.0 Flash costs 75% less than Pro and works fine for most marketing use cases
- Partition your tables: Query only recent data instead of processing your entire history every time
Google's cost optimization guide covers additional strategies if you're processing massive volumes.
Troubleshooting Common Issues
"Permission denied" errors when running queries:
You forgot to grant the Vertex AI User role to your BigQuery service account. That email address shows up when you create the external connection. Copy it, go to IAM & Admin, and add the role. This is the most common setup mistake.
Queries taking 2-3 minutes to complete:
You're calling AI.GENERATE on every single row instead of batching. Process rows in groups of 100-1000:
-- SLOW: Makes 10,000 API calls
SELECT AI.GENERATE(MODEL m, headline, prompt) FROM ads;
-- FAST: Makes 10 API calls (batches of 1,000)
SELECT
batch_id,
AI.GENERATE(MODEL m, STRING_AGG(headline, ' | '), prompt)
FROM (
SELECT headline, DIV(ROW_NUMBER() OVER () - 1, 1000) AS batch_id
FROM ads
)
GROUP BY batch_id;
Getting inconsistent output formats:
If AI.GENERATE returns "High" sometimes, "high" other times, and "Very High" occasionally, switch to AI.GENERATE_TABLE with explicit schemas:
SELECT * FROM AI.GENERATE_TABLE(
MODEL m,
TABLE source,
STRUCT(
'Rate urgency' AS prompt,
'urgency STRING OPTIONS (description="Must be exactly: high, medium, or low")' AS output_schema
)
);
The OPTIONS constraint forces Gemini to stick to your defined values.
When to Use BigQuery AI vs Exporting to ChatGPT
Both approaches have their place. Here's when to use each:
Use BigQuery AI when:
- Analyzing more than 1,000 rows (exporting gets tedious)
- Analysis needs to run daily or weekly on a schedule
- Data can't leave BigQuery due to compliance requirements
- Joining multiple tables before running the analysis
- Data pipelines already flow into BigQuery
Export to ChatGPT or Claude when:
- Doing one-off analysis on a small sample (under 500 rows)
- Still exploring and figuring out what questions to ask
- Team doesn't write SQL
- Testing prompt ideas before building them into automated queries
- Need to show someone a quick proof-of-concept
For most teams with established BigQuery pipelines, the AI functions make more sense. They eliminate manual export steps, process millions of rows in minutes, and keep sensitive customer data where it belongs.
Semantic Search with AI.EMBED and AI.SIMILARITY
Beyond text generation, Google also launched AI.EMBED and AI.SIMILARITY. These let you do semantic search without building custom vector databases.
Here's how to find campaigns similar to your top performer:
SELECT
campaign_name,
AI.SIMILARITY(
AI.EMBED(MODEL embed_model, campaign_description),
AI.EMBED(MODEL embed_model, 'your best campaign description')
) AS similarity
FROM campaigns
ORDER BY similarity DESC
LIMIT 10;
This searches by meaning, not just matching keywords. A campaign about "summer sale" would match "warm weather discount" even though they share zero words. Useful for finding which historical campaigns are similar to new concepts before testing. Helps predict performance before committing budget.
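Under the hood, similarity between embeddings is typically cosine similarity between vectors. A toy sketch with made-up 3-dimensional vectors (real embedding models return hundreds of dimensions, and the values here are invented purely for illustration):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|), the usual metric for embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings" -- semantically close texts get nearby vectors
summer_sale = [0.9, 0.1, 0.3]
warm_weather_discount = [0.8, 0.2, 0.4]
quarterly_report = [0.1, 0.9, 0.1]

print(cosine_similarity(summer_sale, warm_weather_discount))  # high, ~0.98
print(cosine_similarity(summer_sale, quarterly_report))       # low, ~0.24
```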
How to Get Started
Pick one use case from this list. Starting with ad copy classification or sentiment analysis works well. They're straightforward and show results immediately.
Run it on 100-500 rows first. Validate the AI outputs with manual spot-checks to make sure they match expectations. Then scale it up to your full dataset and set it to run on a schedule.
Most teams waste time trying to craft the perfect prompt on day one. Better approach: start with simple classifications (sentiment, basic categories), learn what Gemini handles well with your specific data, then gradually expand to more complex extraction tasks.
Always test new prompts on small samples before running them on production data. And use Gemini 2.0 Flash for testing. It costs significantly less than Pro and works fine for validating your approach.
This works great for English. For other languages, your mileage may vary. Test thoroughly with sample data before committing to production workflows.
FAQ
Do I need to know Python to use BigQuery AI functions?
No. These are standard SQL functions. If you can write SELECT * FROM table, you can use AI.GENERATE. The prompt is just a text string inside your query.
Can I analyze images or videos with these functions?
Yes. BigQuery supports object tables that reference files stored in Cloud Storage. You can analyze ad creative images, extract key moments from videos, or transcribe audio files, all through SQL queries.
How accurate are the AI responses?
Depends on your prompt quality and how much context you provide. For classification tasks with well-written prompts, accuracy typically ranges from 85-95%. Always spot-check outputs on samples before trusting large-scale results. The more specific your prompt, the better.
What's the difference between AI.GENERATE and AI.GENERATE_TABLE?
AI.GENERATE returns a STRUCT (structured data within one column). AI.GENERATE_TABLE returns a complete table with columns you define. Use AI.GENERATE_TABLE when you need multiple output fields like "sentiment" AND "confidence_score" AND "category."
Does this replace the need for data analysts?
No. It automates repetitive classification and extraction tasks, but you still need analysts to write effective prompts, interpret results in business context, spot data quality issues, and design experiments based on insights. Think of it as giving analysts 10x leverage, not replacing them.
When will the managed AI functions (AI.IF, AI.SCORE, AI.CLASSIFY) be generally available?
These functions are currently in public preview as of November 2025. Google hasn't announced a specific GA date yet, but they're actively optimizing them with plans for significant performance improvements by moving more processing directly into BigQuery.
If you're still manually exporting data to BigQuery before running these analyses, Dataslayer can schedule automatic transfers from Google Ads, Facebook, LinkedIn, TikTok, and 50+ other sources directly to BigQuery. Try it free for 15 days.






