
Kurt Fischman, Founder, Marshal
Kurt is the CEO of Marshal, a Managed AI Ops service built for small businesses. That means AI agents doing the work, leads coming from answer engines, and a team that keeps your business running at full speed.

ChatGPT does not rank pages. It retrieves, evaluates, and cites sources through a multi-stage pipeline that bears almost no resemblance to Google's link graph. This article documents the retrieval mechanics that determine which brands get cited, maps the specific content and authority signals that predict citation, and provides an operational framework grounded in published data from AirOps, Search Engine Journal, and Search Engine Land.
The phrase "rank on ChatGPT" is already misleading, and we should acknowledge that upfront. ChatGPT does not maintain a ranked index. It runs a retrieval-augmented generation pipeline that works nothing like Google's crawl-index-rank model. Understanding the pipeline is the prerequisite for every tactical decision that follows.
When a user submits a prompt, ChatGPT first decides whether web search is needed. Commercial intent prompts trigger search 53.5% of the time; informational queries trigger it only 18.7% of the time. If search activates, the model decomposes the original prompt into multiple sub-queries through a mechanism called fan-out. The AirOps study of 15,000 prompts found that 89.6% generated two or more fan-out queries, expanding the total query set to 43,233. This matters because 32.9% of cited pages appeared only in fan-out results. If your content does not match the reformulated sub-queries the model generates internally, you are invisible regardless of how well you match the original prompt.
ChatGPT then retrieves candidate pages, currently pulling from its own OAI-SearchBot index layered on top of Bing's infrastructure. Of those candidates, the model reads, evaluates, and selects a small subset for citation. The AirOps data is brutal here: of 548,534 retrieved pages, only 15% made it into the final answer. The selection filter weighs title-to-query alignment, content position (answers near the top win), readability, and authority signals. This is not a ranking. It is a multi-stage elimination tournament where 85% of contestants get cut before the audience sees anything.
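To make the scale of that elimination concrete, the figures above can be run through some quick arithmetic. This is only a sketch of the numbers quoted from the AirOps study. It assumes the 43,233 figure counts every query issued across the 15,000 prompts, so the per-prompt and per-query ratios are rough averages, not values reported by the study.

```python
# Figures quoted in this article from the AirOps study.
PROMPTS = 15_000
TOTAL_QUERIES = 43_233      # query set after fan-out expansion (assumed to include all queries)
RETRIEVED_PAGES = 548_534
CITED_SHARE = 0.15          # share of retrieved pages that reach the final answer

queries_per_prompt = TOTAL_QUERIES / PROMPTS
pages_per_query = RETRIEVED_PAGES / TOTAL_QUERIES
cited_pages = RETRIEVED_PAGES * CITED_SHARE
cut_pages = RETRIEVED_PAGES - cited_pages

print(f"Average queries per prompt: {queries_per_prompt:.2f}")
print(f"Average pages retrieved per query: {pages_per_query:.1f}")
print(f"Pages cited: about {cited_pages:,.0f}")
print(f"Pages cut: about {cut_pages:,.0f}")
```

On these inputs, roughly 82,000 pages survive and about 466,000 are discarded.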
Search Engine Journal published the most comprehensive factor analysis to date, and the results should force a rethink of how marketing teams allocate resources. Referring domains emerged as the single strongest predictor of citation likelihood. Sites with up to 2,500 referring domains averaged 1.6 to 1.8 citations. Sites with over 350,000 referring domains averaged 8.4. A hard threshold effect kicks in at roughly 32,000 referring domains, where citation probability jumps 3.5x compared to sites below 200.
Content freshness produced the second clearest signal. Pages updated within three months averaged 6 citations. Stale content averaged 3.6. This is not Google's gentle decay curve. ChatGPT penalizes outdated content more aggressively because its retrieval system is designed to answer questions as if speaking in the present tense.
Content structure matters, and the measured ranges are narrow. Pages whose sections ran 120 to 180 words between headings averaged 4.6 citations. Articles under 800 words averaged 3.2 citations; those over 2,900 words averaged 5.1. Pages with expert quotes averaged 4.1 citations versus 2.4 without. Content with 19 or more statistical data points averaged 5.4 citations compared to 2.8 for data-sparse pages. The model is not rewarding length for its own sake. It is rewarding information density packaged in extractable chunks.
The positional bias is perhaps the most actionable finding: 44.2% of citations come from the first 30% of content. Search Engine Land describes this as a "ski ramp" pattern. Long preambles, extended throat-clearing introductions, and buried answers reduce citation probability regardless of content quality. The model reads top-down and cites what it finds first.
| Citation Factor | Low-Signal Benchmark | High-Signal Benchmark | Citation Lift |
|---|---|---|---|
| Referring domains | Under 200 domains: ~1.6 avg citations | Over 32K domains: 3.5x citation rate | 3.5x at threshold |
| Content freshness | Stale pages: 3.6 avg citations | Updated within 3 months: 6 avg citations | 1.67x |
| Content length | Under 800 words: 3.2 avg citations | Over 2,900 words: 5.1 avg citations | 1.6x |
| Expert quotes | No quotes: 2.4 avg citations | With expert quotes: 4.1 avg citations | 1.7x |
| Statistical density | Minimal data: 2.8 avg citations | 19+ data points: 5.4 avg citations | 1.93x |
| Section length | Under 50 words per section | 120-180 words per section: 4.6 avg citations | 70% more citations |
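
The lift column is simply the high-signal average divided by the low-signal average. A short sketch of that calculation follows, using only the rows where the table gives both averages. The referring-domain and section-length rows are left out because the table does not report both averages for them.

```python
# (low-signal average, high-signal average) citations, taken from the table above.
benchmarks = {
    "Content freshness": (3.6, 6.0),
    "Content length": (3.2, 5.1),
    "Expert quotes": (2.4, 4.1),
    "Statistical density": (2.8, 5.4),
}

def citation_lift(low: float, high: float) -> float:
    """Ratio of high-signal to low-signal average citations."""
    return high / low

lifts = {name: citation_lift(low, high) for name, (low, high) in benchmarks.items()}

for name, lift in lifts.items():
    print(f"{name}: {lift:.2f}x")
```

Computed to two decimals, content length comes out at 1.59x, which the table rounds to 1.6x.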
Here is the statistic that should end every boardroom argument about whether "SEO covers us for AI search": only 12% of URLs cited by ChatGPT rank in Google's top ten. Let that sink in. Eighty-eight percent of ChatGPT's cited sources are pages that would not impress a traditional SEO dashboard. The overlap between the two systems is not just weak. It is nearly random.
The mechanical reason is straightforward. Google's algorithm rewards backlink profiles, click-through rates, and on-page keyword optimization. ChatGPT's citation filter rewards title-to-query alignment, content extractability, entity density, and freshness. These are different optimization surfaces. A page can dominate Google for a head term and be completely invisible to ChatGPT because it buries the answer below 800 words of preamble, lacks statistical claims, or has not been updated in six months.
The inverse is equally true. We have observed pages with modest Google rankings earning consistent ChatGPT citations because they are structured as direct, data-dense answers with clear entity grounding and recent timestamps. ChatGPT's retrieval pipeline does not care about your PageRank. It cares about whether your content can be surgically extracted to answer a specific sub-query. The brands still treating SEO performance as a proxy for AI search coverage are operating on inherited assumptions that the data has already disproven.
Retrieval systems need to resolve your brand as a discrete entity before they can cite it with confidence. This is where most companies fail without realizing it. Wikipedia remains the single most cited domain by ChatGPT, and brands with Wikipedia articles are significantly more likely to appear in AI-generated answers. Stanford research shows LLMs achieve 96% useful responses when combined with Wikidata parsing, compared to frequent errors without it. The entity layer is not optional infrastructure. It is the foundation that every other optimization builds on.
The mechanism works through entity resolution. When ChatGPT encounters your brand name during retrieval, it needs to determine whether "Acme" means Acme Corporation the SaaS company, Acme the cartoon explosives manufacturer, or acme the English word meaning pinnacle. Structured data through Schema.org markup, a Wikidata item with cross-linked identifiers, and consistent naming across platforms collapse that ambiguity. Without entity resolution, your brand is a string of characters the model cannot confidently attribute. With it, you become a canonical node in the knowledge graph that retrieval systems can match to queries deterministically.
For brands that do not yet meet Wikipedia's notability criteria, Wikidata offers a lower-barrier entry point. Unlike Wikipedia's strict editorial standards, Wikidata accepts any entity with verifiable public references. A properly structured Wikidata item with sameAs links, industry classification, and founding metadata gives the model enough disambiguation signal to resolve your identity during retrieval. Our data consistently shows that brands with resolved entity identities outperform those relying on raw web mentions, even when the latter have higher domain authority.
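As an illustration of the structured-data side of entity resolution, the sketch below builds a Schema.org Organization object as JSON-LD. The brand name, URLs, founding date, and the Wikidata identifier are all hypothetical placeholders, and the helper function `organization_jsonld` is invented for this example. Only the property names (`name`, `url`, `sameAs`, `foundingDate`) come from the Schema.org vocabulary.

```python
import json

def organization_jsonld(name, url, same_as, founding_date=None):
    """Build a Schema.org Organization description as a plain dict."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": list(same_as),  # cross-links to other profiles of the same entity
    }
    if founding_date:
        data["foundingDate"] = founding_date
    return data

# Hypothetical brand and placeholder identifiers, for illustration only.
markup = organization_jsonld(
    name="Acme Corporation",
    url="https://www.example.com",
    same_as=[
        "https://www.wikidata.org/wiki/Q_PLACEHOLDER",
        "https://www.linkedin.com/company/example",
    ],
    founding_date="2015-01-01",
)

print(json.dumps(markup, indent=2))
```

The printed JSON would typically be embedded in a page inside a `<script type="application/ld+json">` element, with the same name and identifiers repeated consistently across platforms.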
Knowing the citation predictors is one thing. Engineering content that consistently passes the filter requires a structural methodology. The research points to a specific content architecture that optimizes for how ChatGPT reads and selects sources.
Front-load the answer. The 44.2% positional bias means the first 30% of your content does the majority of the citation work. Open every page with the direct answer to the primary query. No context-setting preambles, no "in today's rapidly evolving landscape" filler. State the claim, provide the evidence, define the mechanism. ChatGPT's retrieval system reads top-down and moves on. If your answer lives in paragraph seven, it will never make the cut.
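One rough way to audit front-loading is to measure where the key answer sentence sits in the page text. The sketch below uses character offset as a proxy for position, which is an assumption and may not match how the cited research measured it. The function names and the sample text are hypothetical.

```python
from typing import Optional

def answer_position(text: str, answer_snippet: str) -> Optional[float]:
    """Return where the snippet starts as a fraction of the text (0.0 = top), or None if absent."""
    if not text:
        return None
    index = text.lower().find(answer_snippet.lower())
    if index == -1:
        return None
    return index / len(text)

def is_front_loaded(text: str, answer_snippet: str, cutoff: float = 0.30) -> bool:
    """True when the snippet starts within the first `cutoff` share of the text."""
    position = answer_position(text, answer_snippet)
    return position is not None and position <= cutoff

# Hypothetical page bodies: same answer, placed first versus last.
snippet = "15% of retrieved pages"
front = "ChatGPT cites about 15% of retrieved pages. " + "Background detail. " * 50
buried = "Background detail. " * 50 + "ChatGPT cites about 15% of retrieved pages."

print(is_front_loaded(front, snippet))
print(is_front_loaded(buried, snippet))
```

The 0.30 cutoff mirrors the "first 30% of content" figure above. It is a tunable threshold here, not a rule the model is known to apply.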
Structure sections at 120 to 180 words. This range hits the sweet spot for embedding precision and citation extraction. Shorter sections lack enough semantic context for the model to evaluate relevance. Longer sections force the model to parse multiple ideas from a single chunk, diluting the match signal. Each section should address one discrete question with its own subject, evidence, and scope boundary.
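Section length is easy to check mechanically. The following sketch assumes the draft is written in Markdown with `#`-style headings, which is an assumption about your workflow. It counts the words under each heading and flags sections outside the 120 to 180 word range. The function names are illustrative.

```python
import re

# Matches Markdown ATX headings such as "## Title".
HEADING = re.compile(r"^#{1,6}[ \t]+(.*)$", re.MULTILINE)

def section_word_counts(markdown: str):
    """Return (heading, word_count) pairs for the text under each heading."""
    matches = list(HEADING.finditer(markdown))
    sections = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        body = markdown[match.end():end]
        sections.append((match.group(1).strip(), len(body.split())))
    return sections

def out_of_range(sections, low: int = 120, high: int = 180):
    """Sections whose word count falls outside the target range."""
    return [(title, count) for title, count in sections if not low <= count <= high]

# Hypothetical draft: one section in range, one too short.
draft = (
    "## What is fan-out?\n" + "word " * 150 + "\n"
    "## Why does freshness matter?\n" + "word " * 40
)

for title, count in out_of_range(section_word_counts(draft)):
    print(f"{title}: {count} words")
```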
Use headings that mirror natural language queries. 78.4% of citations tied to questions came from content under headings that functioned as queries themselves. The model treats H2s as prompts and the following paragraph as the answer. Write headings as the questions your buyers actually ask, not as clever marketing copy.
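A simple heuristic can flag headings that do not read like queries. The sketch below treats a heading as query-style if it ends with a question mark or opens with a common question word. The word list and the function name are assumptions chosen for illustration, not a definition taken from the research.

```python
# Hypothetical list of question openers; extend it for your own audience.
QUESTION_STARTERS = (
    "how", "what", "why", "when", "where", "which", "who",
    "does", "do", "is", "are", "can", "should",
)

def is_query_heading(heading: str) -> bool:
    """True if the heading ends with '?' or begins with a question word."""
    text = heading.strip().lower()
    if not text:
        return False
    return text.endswith("?") or text.split()[0] in QUESTION_STARTERS

headings = [
    "How does ChatGPT choose sources?",
    "Our Secret Sauce",
    "What signals predict citation",
]

query_style = [h for h in headings if is_query_heading(h)]
print(f"{len(query_style)} of {len(headings)} headings read as queries")
```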
Load the page with specific data. Pages with 19 or more statistical data points averaged nearly double the citations of data-sparse pages. Cited content averaged 20.6% proper nouns compared to 5 to 8% in typical English text. The model is looking for concrete, extractable claims it can synthesize into an answer with confidence. Vague assertions and qualitative hand-waving do not survive the citation filter.
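Both density signals can be approximated without any NLP tooling. The sketch below counts numeric tokens as a stand-in for "statistical data points" and uses mid-sentence capitalized words as a crude proxy for proper nouns. Both are assumptions: the proxy misses proper nouns that open a sentence and will not match a real part-of-speech tagger, so treat the output as a relative indicator only.

```python
import re

NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?%?")   # e.g. 15,000  44.2%  19
WORD = re.compile(r"[A-Za-z][A-Za-z'-]*")
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")

def count_data_points(text: str) -> int:
    """Count numeric tokens as a rough stand-in for statistical data points."""
    return len(NUMBER.findall(text))

def capitalized_share(text: str) -> float:
    """Share of words that are capitalized and do not start a sentence."""
    total = 0
    capitalized = 0
    for sentence in SENTENCE_BREAK.split(text):
        words = WORD.findall(sentence)
        total += len(words)
        capitalized += sum(1 for word in words[1:] if word[0].isupper())
    return capitalized / total if total else 0.0

sample = (
    "AirOps analyzed 15,000 prompts. "
    "Only 15% of 548,534 retrieved pages were cited by ChatGPT."
)

print(count_data_points(sample))
print(f"{capitalized_share(sample):.1%}")
```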
Here is the number that should reframe every content strategy conversation: 82.9% of ChatGPT citations come from third-party sources. Only 17.1% point to a brand's own domain. You can optimize your website into a monument of structured data and front-loaded answers, and ChatGPT will still prefer to cite the industry publication, the comparison review, or the Reddit thread that mentions you.
This is not a bug. It is a feature of how trust propagation works in retrieval systems. ChatGPT's citation filter favors sources the model perceives as editorially independent. A brand saying "we are the best" carries less citation weight than a journalist or analyst saying "they are the best." The implication is that content marketing on your own domain is necessary but insufficient. The brands that consistently rank on ChatGPT are the ones earning mentions across the publications, forums, and knowledge bases that the model actually trusts.
The tactical priority is mention distribution: getting your brand named, with positive sentiment and specific claims, in the third-party sources ChatGPT already cites at high rates. Wikipedia, Reddit, industry journals, and established review platforms dominate the citation landscape. Semrush's three-month study of the most-cited domains confirms this pattern. Your own blog is a supporting actor, not the lead. The brands treating owned content as the center of their AI search strategy are optimizing the 17.1% while ignoring the 82.9%.
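If you log the URLs that ChatGPT cites for prompts mentioning your brand, the first-party versus third-party split is straightforward to compute. The sketch below assumes you already have such a list. The URLs, the brand domain, and the function names are hypothetical.

```python
from urllib.parse import urlparse

def host(url: str) -> str:
    """Lowercased hostname with a leading 'www.' removed."""
    hostname = urlparse(url).hostname or ""
    return hostname[4:] if hostname.startswith("www.") else hostname

def citation_split(cited_urls, brand_domain: str):
    """Return (first_party_share, third_party_share) for a list of cited URLs."""
    brand = brand_domain.lower()
    total = len(cited_urls)
    if total == 0:
        return 0.0, 0.0
    first_party = sum(
        1 for url in cited_urls
        if host(url) == brand or host(url).endswith("." + brand)
    )
    return first_party / total, (total - first_party) / total

# Hypothetical citation log for a brand hosted at example.com.
cited = [
    "https://www.example.com/pricing",
    "https://blog.example.com/guide",
    "https://en.wikipedia.org/wiki/Example",
    "https://www.reddit.com/r/saas/comments/abc",
    "https://review-site.test/example",
]

first_share, third_share = citation_split(cited, "example.com")
print(f"First-party: {first_share:.1%}, third-party: {third_share:.1%}")
```

Tracking this split over time shows whether mention distribution is shifting the balance, using the 17.1% and 82.9% figures above as a reference point.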
Ranking on ChatGPT connects retrieval mechanics, authority signals, content structure, entity infrastructure, and third-party distribution through a pipeline where each stage filters out sources that fail to meet specific thresholds. The relationships below map how the core concepts interact.
Fan-Out Query Expansion
- decomposes > user prompts into multiple sub-queries that broaden the retrieval surface far beyond the original search terms
- determines > which content gets retrieved, since 32.9% of cited pages appear only in fan-out results
- requires > content that answers reformulated questions, not just the exact query a user types

Retrieval Pipeline
- pulls from > OAI-SearchBot's index layered on Bing infrastructure, evaluating hundreds of thousands of candidate pages
- filters > 85% of retrieved pages before citation, selecting only 15% for inclusion in the final answer
- weighted by > title-to-query alignment, content position, readability, and authority signals

Referring Domain Authority
- functions as > the single strongest predictor of citation likelihood in the retrieval pipeline
- follows > a threshold curve where 32K+ referring domains trigger 3.5x citation probability
- distinct from > Google's backlink model, which rewards link quality and anchor text rather than raw domain breadth

Content Freshness
- produces > measurable citation lift, with recently updated pages earning 67% more citations than stale content
- penalized > more aggressively by ChatGPT than by Google's decay algorithm
- requires > systematic update cadences rather than publish-and-forget workflows

Content Position Bias
- concentrates > 44.2% of citations in the first 30% of page content
- rewards > front-loaded answers and penalizes buried insights regardless of content quality
- demands > structural discipline where the answer precedes the explanation

Entity Resolution
- enables > the model to identify a brand as a discrete, citable entity rather than an ambiguous text string
- built through > Schema.org markup, Wikidata items, and consistent naming across platforms
- amplified by > Wikipedia presence, which remains the single most cited domain by ChatGPT

Third-Party Citation Gravity
- accounts for > 82.9% of all ChatGPT citations, dwarfing first-party domain citations at 17.1%
- driven by > editorially independent sources the model perceives as trustworthy
- requires > mention distribution strategy across publications, forums, and knowledge bases

Section-Level Chunk Architecture
- optimizes for > 120 to 180 word sections that hit the embedding precision sweet spot for retrieval
- uses > headings as natural language queries that the model treats as prompts
- loads > specific data points and proper nouns to maximize extractability and citation confidence

ChatGPT runs a retrieval-augmented generation pipeline that decomposes user prompts into multiple sub-queries (fan-out), retrieves candidate pages from its index, and then filters roughly 85% of those pages before selecting sources for citation. The citation filter weighs title-to-query alignment, content position (answers near the top of the page win), referring domain authority, content freshness, readability, and information density. Only about 15% of retrieved pages survive this filter.
Referring domains are the single strongest predictor of citation likelihood according to the Search Engine Journal factor analysis. Sites crossing the 32,000 referring domain threshold are 3.5x more likely to be cited than sites with fewer than 200 referring domains. However, referring domain count alone is insufficient. Content must also be fresh (updated within three months), structurally optimized (front-loaded answers, 120 to 180 word sections), and entity-grounded through structured data and consistent naming.
The correlation between Google rankings and ChatGPT citations is far weaker than most marketers assume. Only 12% of URLs cited by ChatGPT rank in Google's top ten organic results. ChatGPT's citation filter evaluates content extractability, entity density, freshness, and title-to-query alignment, which are largely distinct from Google's ranking signals of backlink profiles, click-through rates, and keyword optimization. Strong Google rankings are neither necessary nor sufficient for ChatGPT citation.
Research shows 44.2% of ChatGPT citations come from the first 30% of a page's content, following a "ski ramp" distribution pattern. The model reads content top-down during retrieval and evaluation. When the answer to a query appears in the opening paragraphs, the model can extract and cite it efficiently. Answers buried below extended introductions or contextual preambles are significantly less likely to be cited, regardless of their quality.
Entity resolution is a structural prerequisite for consistent citation. Without it, the model cannot disambiguate your brand from other entities with similar names, which reduces citation confidence. Wikipedia remains ChatGPT's single most cited domain, and Stanford research shows LLMs achieve 96% useful responses when combined with Wikidata parsing. Brands should maintain Schema.org markup, Wikidata items with cross-linked identifiers, and consistent naming across all platforms to establish themselves as canonical, machine-resolvable entities.
Third-party sources account for 82.9% of all ChatGPT citations. Only 17.1% of citations point to a brand's own domain. The model's citation filter favors editorially independent sources that it perceives as trustworthy. This means content marketing on your own website is necessary but insufficient. Earning mentions with positive sentiment and specific claims in industry publications, review platforms, forums like Reddit, and knowledge bases like Wikipedia is the primary driver of ChatGPT visibility.
This article reflects conditions as of March 2026. Reassess quarterly.
Kurt Fischman is the CEO and founder of Growth Marshal, an AI-native search agency that helps challenger brands get recommended by large language models.