Can I use this research for my own content?

Yes. All sources listed on this page are publicly available. We encourage brands and practitioners to read the original research and form their own conclusions. If you reference this page, a link to ai-seeyou.com/research is appreciated.

Why do you rate evidence quality?

The field is new and much of the published analysis comes from vendors with commercial interests. Rating evidence quality helps our clients (and ourselves) distinguish between findings grounded in rigorous methodology and claims that sound compelling but lack verification. We apply the same standard to our own findings.

Last updated 13 May 2026

RESEARCH LIBRARY

The research behind AI Recommendation Infrastructure.

AI See You's methodology is built on published research, official platform documentation, and practitioner analysis from the leading experts in the field. Every source on this page has been evaluated for evidence quality and directly informs how we set up AI Knowledge Centres. We do not use tactics that lack credible evidence, regardless of how widely they are promoted.

By David Willey, Founder and CEO. Updated monthly as new research is published.

HOW WE RATE EVIDENCE

Not all research is equal.

Vendor-funded studies, self-reported data, and reverse-engineered analyses are common in this field. We rate every source using three tiers so our clients (and we ourselves) can distinguish between findings grounded in rigorous methodology and claims that sound compelling but lack verification.

HIGH

Peer-reviewed research published at recognised academic conferences (KDD, NeurIPS, EMNLP), official documentation from AI platform operators (OpenAI, Google, Anthropic), or data from senior industry practitioners with access to large verifiable datasets.

MEDIUM

Analytical research with credible methodology but not peer-reviewed. Includes reverse-engineered studies of AI citation patterns, large-scale content audits, and practitioner reports from recognised industry sources. Useful for informing strategy. Not sufficient for definitive claims.

SPECULATIVE

Logical hypotheses consistent with known AI architecture but lacking empirical validation. Single-source claims from vendors without independent corroboration. Included for completeness where the underlying logic is sound, but not used as a basis for client recommendations.

ACADEMIC RESEARCH

Peer-reviewed academic research.

Princeton / IIT Delhi Generative Engine Optimisation Study

HIGH

Authors: Pranjal Aggarwal et al.

Published: KDD 2024 (30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining)

Paper: arxiv.org/abs/2311.09735

The landmark study in the GEO field. Tested nine optimisation tactics across 10,000 search queries using a controlled experimental framework.

Key findings:

Citing credible sources produced a 115.1% visibility increase for lower-ranked sites. The single strongest tactic tested.
Adding specific statistics delivered up to a 40% visibility improvement.
Adding expert quotations improved visibility by 22 to 37%.
Keyword stuffing performed worse than baseline.
Lower-ranked sites benefited more than already-dominant ones. This is the structural finding that makes GEO especially relevant for challenger brands.

How we apply it: every AI Knowledge Centre we set up implements citation density, statistical frequency, expert quotation, and answer-first structure as core requirements.

RAG architecture research (multiple papers)

HIGH

Sources: Multiple papers from NeurIPS and EMNLP proceedings on Retrieval-Augmented Generation

Field: AI retrieval architecture

RAG (Retrieval-Augmented Generation) is the fundamental architecture behind how ChatGPT, Perplexity, and Google AI Overviews retrieve and synthesise information. Understanding RAG mechanics is essential for building content that AI systems can reliably extract.

RAG systems typically convert queries and content into vectors and match them by cosine similarity, so content that closely mirrors how users phrase their questions to AI is more likely to be retrieved. AI systems break pages into individual passages for retrieval and re-ranking, which means each section of a page must stand alone as a self-contained answer. LLMs typically pull from 5 to 16 sources per answer. Different models favour different source ecosystems: Perplexity leans heavily on community platforms, while Gemini relies on them far less.

How we apply it: Knowledge Centre content is structured to semantically match the natural language patterns consumers use when asking AI systems for recommendations. Each section is modular and extractable.
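To make the retrieval step described above concrete, here is a minimal sketch of passage ranking by cosine similarity. It is an illustration only: production RAG systems use learned embedding models, whereas this example substitutes simple word-count vectors so that it runs without external libraries. The function names, the query, and the two passages are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_passages(query: str, passages: list[str]) -> list[tuple[float, str]]:
    """Score each passage against the query, highest similarity first."""
    # Assumption: word counts stand in for the embeddings a real system would use.
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine_similarity(query_vec, Counter(tokenize(p))), p) for p in passages
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical page content, already split into self-contained passages.
passages = [
    "Which running shoes are best for flat feet? Stability shoes with firm arch support.",
    "Our company was founded in 2012 and has offices in three countries.",
]
ranked = rank_passages("best running shoes for flat feet", passages)
```

In this toy example the passage that echoes the wording of the question ranks first, while the company-history passage shares no terms with the query and scores zero. It is the same effect, in miniature, that motivates writing each section as a standalone answer.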

PLATFORM DOCUMENTATION

Official platform documentation.

OpenAI ChatGPT Search documentation

HIGH

Source: OpenAI

URL: help.openai.com/en/articles/9237897-chatgpt-search

OpenAI's official documentation confirms that ChatGPT Search uses specific crawlers (OAI-SearchBot, GPTBot) to discover content, and rewrites user prompts into search queries sent to Bing. This makes Bing indexing a prerequisite for ChatGPT citation. OpenAI states: "Ranking in ChatGPT Search is based on a number of factors designed to help users find reliable, relevant information. There is no way to guarantee top placement." Separately, 80% of top publishers reportedly block AI crawlers, which creates a significant opportunity for brands that allow crawl access.

How we apply it: every AI Knowledge Centre is submitted to Bing for indexing through Bing Webmaster Tools. All AI crawlers are explicitly allowed in robots.txt, and none is blocked.
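One way to verify the robots.txt rule above is to parse the file and ask whether each crawler may fetch a given URL. The sketch below uses Python's standard `urllib.robotparser` module. The robots.txt content, the `example.com` URL, and the `crawler_access` helper are hypothetical; the two user-agent names are the ones cited from OpenAI's documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; paths and rules are placeholders.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Allow: /

User-agent: *
Disallow: /admin/
"""


def crawler_access(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Report, per user agent, whether robots.txt permits fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


access = crawler_access(
    ROBOTS_TXT,
    ["OAI-SearchBot", "GPTBot"],
    "https://example.com/knowledge-centre/",
)
blocked = [agent for agent, allowed in access.items() if not allowed]
```

An empty `blocked` list means neither crawler is denied access to that URL. A real check would load the live robots.txt and cover every crawler the site intends to allow.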

Google Search Central structured data documentation

HIGH

Source: Google

URL: developers.google.com/search/docs/appearance/structured-data

Google's documentation recommends putting Product markup in initial HTML rather than relying solely on JavaScript rendering. Schema markup (Product, Organization, FAQ, HowTo, Review, Article) increases the probability of inclusion in AI Overviews.

How we apply it: every AI Knowledge Centre page includes JSON-LD structured data in pre-rendered HTML that AI crawlers receive directly, without requiring JavaScript execution.
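As a rough illustration of JSON-LD delivered in pre-rendered HTML, the sketch below serialises a schema.org Product object into a script tag at build time, so the markup is present in the initial response. The `json_ld_script` helper and the product values are hypothetical and do not describe our actual build pipeline.

```python
import json


def json_ld_script(data: dict) -> str:
    """Serialise a schema.org object into a JSON-LD script tag for static HTML."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a string value cannot close the script element early.
    # "\/" is a valid JSON escape, so parsers still read the original value.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical product data; every value here is a placeholder.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Trail Shoe",
    "description": "A stability running shoe with firm arch support.",
    "brand": {"@type": "Brand", "name": "Example Brand"},
}

# The tag is written into the page head when the HTML is generated,
# so no JavaScript has to run before a crawler can read it.
head_html = f"<head>\n<title>Example Trail Shoe</title>\n{json_ld_script(product)}\n</head>"
```

Generating the tag on the server or at build time, rather than injecting it with client-side script, is what keeps the structured data visible to crawlers that do not execute JavaScript.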

PRACTITIONER RESEARCH

Industry practitioner research.

Lily Ray, VP SEO, Amsive (March 2026)

HIGH

Lily Ray identified the foundational constraint: 100% of sites that lost Google organic traffic also lost AI citations across ChatGPT, AI Mode, and Gemini. Strong organic SEO remains a prerequisite for AI recommendation.

Ray has also flagged the following tactics as dangerous: scaled AI content production (triggers Google's "Scaled Content Abuse" spam policy), Reddit and forum astroturfing (detectable and carries brand risk), and self-promotional listicles.

How we apply it: AI Knowledge Centres are built on genuine human expertise and verified research. We do not use AI-generated content at scale, forum astroturfing, or self-promotional listicles.

Ahrefs AI citation analysis

MEDIUM-HIGH

Ahrefs analysis found that Google AI Overviews pull 74% of citations from top-10 organic results. This confirms that organic search performance feeds directly into AI recommendation probability.

How we apply it: AI Knowledge Centres are designed to rank organically as well as serve AI systems, because organic ranking is a primary input to AI recommendation.

ZipTie.dev ChatGPT citation pattern analysis

MEDIUM

ZipTie.dev analysed ChatGPT citation patterns and found:

Sites with Trustpilot presence average 4.6 to 6.3 citations from ChatGPT.
Domain trust scores of 97 to 100 average 8.4 ChatGPT citations versus 1.6 for scores below 43 (a 5.25x gap).
Content updated within 30 days receives 3.2x more ChatGPT citations than stale content.

How we apply it: AI Knowledge Centres strengthen cross-platform brand signals and maintain content freshness with visible update timestamps on every page.

Growth Memo ChatGPT citation positioning analysis

MEDIUM

Growth Memo analysis found that 44% of ChatGPT citations come from the first third of a page's content.

How we apply it: every AI Knowledge Centre page puts the direct answer in the first 40 to 60 words.
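The answer-first rule can be checked mechanically. The sketch below tests whether a page's opening paragraph falls within the 40 to 60 word window. It assumes the opening paragraph ends at the first blank line and that the paragraph's length is a fair proxy for where the answer sits. Both are simplifications made for this example, and the helper names are hypothetical.

```python
import re


def opening_paragraph(page_text: str) -> str:
    """Return the text before the first blank line (assumed paragraph break)."""
    return re.split(r"\n\s*\n", page_text.strip(), maxsplit=1)[0]


def answer_first_ok(page_text: str, low: int = 40, high: int = 60) -> bool:
    """True when the opening paragraph is between `low` and `high` words long."""
    words = len(opening_paragraph(page_text).split())
    return low <= words <= high


# Hypothetical page: a 50-word opening answer followed by supporting detail.
page = " ".join(["answer"] * 50) + "\n\n" + " ".join(["detail"] * 300)
result = answer_first_ok(page)
```

A check like this only measures length and position. Whether the opening paragraph actually answers the question still needs editorial review.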

First Page Sage algorithm research

MEDIUM-HIGH

First Page Sage research found that being featured in independent "best [category]" articles is more valuable for AI recommendation than any single on-site optimisation.

How we apply it: AI Knowledge Centres include comparison and "best of" content that positions the brand within broader category discussions, rather than only self-promotional product pages.

Brandlight GEO research (2026)

MEDIUM

Brandlight analysis suggests the overlap between top Google links and AI-cited sources has dropped from 70% to below 20%. Organic SEO dominance no longer guarantees AI recommendation, and the playing field for challenger brands is more open than it has been in a decade.

How we apply it: our methodology treats AI recommendation as a distinct channel from organic SEO, with its own requirements for structure, citation, and schema, while maintaining the organic SEO foundations that remain a prerequisite.

MARKET CONTEXT

Market data and category context.

Gartner search volume forecast

HIGH (Gartner)

Traditional search engine volume predicted to drop 25% by end of 2026, with AI chatbots absorbing that share. Organic search traffic predicted to decrease by 50% or more in the extended forecast.

Previsible 2025 AI traffic report

MEDIUM-HIGH

AI-referred web sessions grew 527% year-over-year between January and May 2025.

McKinsey 2025 consumer AI usage

HIGH (McKinsey)

44% of consumers now use AI as a primary source of information for purchasing decisions.

eMarketer / Reddit citation analysis

MEDIUM-HIGH

Reddit accounts for 21% of Google AI Overview citations.

Search Engine Land multi-market GEO guide (April 2026)

HIGH (Motoko Hunt, Gianluca Fiorelli)

Common Crawl uses domain structure to classify content geographically. ccTLDs are classified at the data-ingestion stage. At least 20% of content on a local page must be unique to prevent AI systems from collapsing local identity.

How we apply it: multi-market Knowledge Centres use market-specific content, regulatory references, currency, and pricing to maintain distinct geographic identity per market.
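The 20% uniqueness threshold can be approximated with a simple comparison between market pages. The sketch below measures the share of sentences on one local page that appear on no other market's page. Sentence-level exact matching is an assumption made for this example. The guide's own definition of "unique" may differ, and the sample pages and helper names are hypothetical.

```python
import re


def sentences(text: str) -> set[str]:
    """Split text into normalised sentences (lowercased, whitespace collapsed)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return {" ".join(p.lower().split()) for p in parts if p.strip()}


def unique_share(local_page: str, other_pages: list[str]) -> float:
    """Fraction of the local page's sentences found on no other market page."""
    local = sentences(local_page)
    if not local:
        return 0.0
    seen_elsewhere: set[str] = set()
    for page in other_pages:
        seen_elsewhere |= sentences(page)
    return len(local - seen_elsewhere) / len(local)


# Hypothetical UK and Ireland pages that share three of their five sentences.
uk = (
    "We ship across the UK. Prices are in GBP. Returns are free for 30 days. "
    "Our shoes suit flat feet. Sizing runs true."
)
ie = (
    "We ship across Ireland. Prices are in EUR. Returns are free for 30 days. "
    "Our shoes suit flat feet. Sizing runs true."
)
share = unique_share(uk, [ie])
meets_threshold = share >= 0.20
```

Here two of the five UK sentences are market-specific, so the page clears the 20% threshold under this rough measure. A stricter check might compare content at paragraph level or by word count.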

NOT IN OUR METHODOLOGY

Tactics we have evaluated and rejected.

We actively track tactics promoted in the field and evaluate them against the evidence base. The following have been assessed and are not used in our methodology.

"Triple JSON-LD schema stacking." Single-source claim from GenOptima. No independent validation.
Press releases generating AI citations within 14 to 21 days. GenOptima internal claim. Not externally verified.
"74.2% of AI citations come from Top N content." Self-reported vendor data with no external verification.
Specific "freshness cycle" timings (7 to 14 day update windows). No rigorous evidence for exact timing.
Keyword stuffing or density targets. The Princeton study showed this performs worse than baseline.
Scaled AI content production. Lily Ray (VP SEO, Amsive) warns this triggers Google's spam policies.

We include this section because transparency about what we reject is as important as transparency about what we use. AI Recommendation Infrastructure is built on evidence, not on vendor hype.

Frequently asked questions

How often is this page updated?

We review and update this page monthly. New research is added as it is published and evaluated. Sources that are superseded or contradicted by newer evidence are noted and revised. The "Last updated" date at the top of the page reflects the most recent review.

Related reading: how AI Recommendation Infrastructure works, the Insights library, the featured article on why challenger brands win problems, not categories, and our pricing.

See how we apply this research.

Read how AI Recommendation Infrastructure works in practice, or book a 15-minute call to see your baseline Recommendation Score.
