What a real AI visibility audit looks like — inside the Citation Audit Method

Most “AI visibility audits” you can buy right now measure one thing: whether your site is built in a way that AI engines could read. That’s worth knowing. It’s also not an audit of your AI visibility. It’s a readiness check, using the word “audit” because the word sells better.

Here’s the distinction that the whole category blurs. A readiness check looks at your pages and asks, “Is the content structured, accessible, and authoritative enough for an AI crawler to parse and trust?” A citation audit asks a different and harder question. When a buyer in your category types a real commercial prompt into ChatGPT, Claude, Perplexity, or Gemini, does your name come back, or does a competitor’s?

You can pass the first and fail the second. Plenty of technically clean sites never get cited, because being readable and being chosen are not the same problem. This piece walks through what an actual citation audit measures, using the method Revenue Experts AI runs as the worked example. I’ll show the structure, the math, the five outcomes every prompt resolves into, and the three things this audit will not tell you. The limits are part of the honest answer.

We run two products, and they answer different questions. I’ll cover both, but the deep dive is the paid one, because that’s where the measurement actually happens.

New here? The Revenue Signal is my weekly read on what’s actually changing in AI search: one shift, one company that responded, and one move you can make. It goes out every Thursday, and it’s free.

The two questions, and which audit answers each

The free 60-second AI Visibility Audit answers a technical question: Is my site ready for AI engines to find, parse, and trust it? It runs an automated check on a single URL or up to 10 pages across a domain. It uses an 8-strategy fallback system to simulate how different AI crawlers actually see your pages, getting past the bot protection that often hides content from LLMs, then scores what it finds against five categories: citation readiness, content structure, authority signals, technical accessibility, and semantic clarity. You get a composite readiness score, a category breakdown, and a prioritized fix list. It takes a minute and costs nothing.

What it does not do is run a single prompt through a single AI engine. It never asks, “Are you actually cited?” It tells you whether the door is open, not whether anyone walked through it.

The $497 AI Visibility Audit answers the question the free tool can’t: in my real category, against my real competitors, am I being cited, and if not, why not? It’s the Citation Audit Method run end-to-end on your business. That’s the rest of this article.

If you haven’t checked your readiness yet, start with the free audit. If you want the citation map across your category, the paid audit is the one. They stack. They don’t replace each other.

The anatomy of the $497 audit: 50 × 4 × 3 = 600

Here’s the math that defines the engagement.

Fifty prompts. Four LLMs. Three runs each. Six hundred measured calls.

Each number is doing real work, so I’ll take them one at a time.

Fifty prompts, built for your niche. This is the part most audits skip. The prompts are not a generic template that gets reused across every client. A B2B fintech RAG company and a B2B AEO consultancy get different prompt sets, because their buyers ask different questions and their competitors are different companies. We build the 50 around your category, your named competitors, and the way your buyers actually research. A generic prompt set tells you how a generic company performs. You are not a generic company.

The 50 span the full buyer journey across three intent types:

Research intent. Early-stage questions like “What is RAG and why would a B2B company need it?” The buyer is learning. They don’t know vendors yet.
Comparison intent. Mid-stage prompts like “[Competitor] alternatives for B2B AI visibility.” The buyer is building a shortlist. Your absence here is expensive.
Decision intent. Late-stage prompts like “Is fractional AI search advisory worth it for a Series B startup?” The buyer is close to a choice.

A prompt set that’s all research intent flatters you, because those answers rarely name vendors anyway. A set that’s all decision intent misses where most buyers actually start. The spread across all three is what makes the result a map instead of a snapshot.

Four LLMs, because they don’t agree. ChatGPT, Claude, Perplexity, and Gemini pull from different indexes, weight sources differently, and cite differently. Being cited on Perplexity tells you almost nothing about whether you’re cited on Gemini. A single-model audit gives you a number that feels like coverage and isn’t. Running every prompt across all four is the only way to see where you win, where you lose, and where the engines disagree about your category.

Three runs each, because these systems are not deterministic. Ask the same model the same question three times, and you can get three different answers: different sources cited, different vendors named, sometimes you’re in, and sometimes you’re out. One run is an anecdote. Three runs per prompt per engine let us see whether a citation is stable or a coin flip. A result that shows up once in three is a fragile citation. A result that shows up three for three is a position you own. The audit reports the difference because the two mean different things for what you do next.
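To make the stable-versus-coin-flip distinction concrete, here is a minimal sketch of how three runs on one engine could be labeled. The function name and the "absent" label are my own illustrations, not the audit's internal tooling; "owned" and "fragile" follow the wording above.

```python
def citation_stability(runs_cited: list[bool]) -> str:
    """Label one prompt on one engine from its per-run results.

    Each entry is True if the brand was named in that run.
    """
    if not runs_cited:
        raise ValueError("need at least one run")
    hits = sum(runs_cited)
    if hits == len(runs_cited):
        return "owned"    # cited on every run: a position you own
    if hits == 0:
        return "absent"   # never cited (illustrative label)
    return "fragile"      # cited on some runs only: a coin flip


print(citation_stability([True, True, True]))    # owned
print(citation_stability([True, False, False]))  # fragile
```

With a single run, every result would come back as either "owned" or "absent", which is exactly why one run can't separate a stable citation from a lucky one.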

Fifty prompts, four engines, three runs: 600 calls. That’s the measured surface of one audit. Turnaround is 5 to 7 days.
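If it helps to see the arithmetic as a grid, this short sketch enumerates every measured call. The prompt names are placeholders I made up; a real audit uses 50 prompts written for the client's niche.

```python
from itertools import product

# Placeholder prompt IDs; the real set is built per client.
prompts = [f"prompt-{i:02d}" for i in range(1, 51)]
engines = ["ChatGPT", "Claude", "Perplexity", "Gemini"]
runs = [1, 2, 3]

# One tuple per measured call: (prompt, engine, run number).
call_grid = list(product(prompts, engines, runs))

print(len(call_grid))  # 600
print(call_grid[0])    # ('prompt-01', 'ChatGPT', 1)
```

Every prompt appears 12 times in the grid (four engines times three runs), which is the per-prompt evidence the rest of the report is built on.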

A note on that 600 number, because it matters for how you read our other work. Six hundred calls is also the size of our research-grade verification runs, the independent studies where we test citation-overlap claims across engines and publish the protocol before we run anything. The commercial audit and the research study are different things. The audit maps your category for you. The research study tests claims about how AI search behaves in general. Same call volume, different purpose. I’m flagging it so you don’t read a cross-reference later and think they’re the same project.

The five outcomes: every prompt resolves into one of these

When a prompt runs across four engines three times, the result for that prompt lands in one of five buckets. This is the core of the deliverable, because it turns “Are we visible?” into something you can act on, prompt by prompt.

You’re cited. Your brand appears in the answer for that prompt. “Stable across runs” means you own it. “Intermittent” means you’re contested.
Competitor-only. The answer names vendors, but not you, only competitors. This is the most actionable bucket. The engine is willing to cite someone for this prompt; it just isn’t you. That’s a content and authority gap, not a “nobody gets cited here” problem.
Open territory. The engine answers without naming any vendor. Nobody owns this prompt yet. For early-stage research intent, this is common, and it’s an opportunity: the first brand to publish genuinely useful, citable content on the question can claim it.
Nobody cited or closed. The engine refuses to recommend or name vendors for this prompt at all, sometimes for compliance reasons, sometimes because the question doesn’t lend itself to citation. Don’t waste effort here.
Mixed/unstable. Across the three runs, the outcome flips: cited once, competitor-only twice, for example. This bucket exists precisely because we run three times. It tells you a citation is fragile and worth shoring up before a competitor takes it.

The reason five buckets beat a single score: “competitor-only” and “open territory” demand opposite responses. One means you’re losing a fight that’s being had; the other means there’s no fight yet, and you can walk in. A composite percentage hides that. The buckets surface it.
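The bucket logic can be sketched in a few lines. Treat this as an illustration under assumptions, not the audit's actual classifier: the RunResult shape, the refused flag, and the choice to bucket one engine's three runs at a time are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    brands_named: set[str]  # vendors the answer named in this run
    refused: bool = False   # assumed flag: engine declined to name vendors


def run_outcome(result: RunResult, brand: str) -> str:
    """Outcome of a single run for the audited brand."""
    if result.refused:
        return "closed"
    if brand in result.brands_named:
        return "cited"
    if result.brands_named:
        return "competitor_only"
    return "open_territory"


def bucket(results: list[RunResult], brand: str) -> str:
    """Collapse repeated runs into one of the five buckets."""
    if not results:
        raise ValueError("need at least one run")
    outcomes = {run_outcome(r, brand) for r in results}
    # If every run agrees, that outcome is the bucket; otherwise it flipped.
    return outcomes.pop() if len(outcomes) == 1 else "mixed_unstable"


runs = [RunResult({"You", "RivalA"}), RunResult({"RivalA"}), RunResult({"RivalA"})]
print(bucket(runs, "You"))  # mixed_unstable
```

Note that the fifth bucket only exists because there is more than one run to compare. A single run can never return "mixed_unstable".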

From outcome to action: the gap-to-action matrix

A list of outcomes isn’t a plan. The audit pairs every gap with a diagnosis and a prescription, ranked by expected impact, so you know what to do first.

For each prompt where you’re not cited, the report answers two questions. Why aren’t you cited? And what specifically would change that?

The “why” usually comes down to one of a handful of causes: the relevant page doesn’t exist, the page exists but isn’t structured for AI parsing, the topical relevance is too thin, the schema is missing or wrong, or the content is stale and recency-sensitive engines like Perplexity have moved on. The “what to do” is specific to the cause, and it’s ranked, because a fix that moves you on a high-traffic comparison prompt matters more than one that moves you on a fringe research query.

The shape of the prioritization looks like this:

Gap type	Typical priority	Effort	Recommended fix
Competitor-only on a decision-intent prompt	High	Medium	New or restructured page targeting the prompt directly
Competitor-only on a comparison prompt	High	Medium	Comparison content + authority signals
Open territory on a research prompt	Medium	Low–Medium	Publish citable answer content; claim it first
Unstable citation (mixed across runs)	Medium	Low	Shore up an existing page; small structural fixes
Nobody cited; closed prompt	Skip	n/a	No action; the engine won’t cite anyone here

The point of the matrix is sequencing. You can’t fix 50 prompts at once. The audit tells you which five to fix this month.
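Here is a rough sketch of that sequencing step. It ranks only by the priority label from the table, which is a simplification: the real report ranks by expected impact, and the fallback for combinations the table doesn't list is a placeholder of mine.

```python
PRIORITY_ORDER = ["High", "Medium", "Skip"]


def priority(bucket: str, intent: str) -> str:
    """Look up the typical priority for a gap, following the table above."""
    if bucket == "closed":
        return "Skip"
    if bucket == "mixed_unstable":
        return "Medium"
    if bucket == "competitor_only" and intent in ("decision", "comparison"):
        return "High"
    if bucket == "open_territory" and intent == "research":
        return "Medium"
    return "Medium"  # placeholder: the table does not cover this combination


def top_gaps(gaps: list[dict], limit: int = 5) -> list[dict]:
    """Drop skipped gaps, sort the rest by priority, keep the first few."""
    actionable = [g for g in gaps if priority(g["bucket"], g["intent"]) != "Skip"]
    ranked = sorted(
        actionable,
        key=lambda g: PRIORITY_ORDER.index(priority(g["bucket"], g["intent"])),
    )
    return ranked[:limit]


gaps = [
    {"prompt": "A", "bucket": "open_territory", "intent": "research"},
    {"prompt": "B", "bucket": "closed", "intent": "decision"},
    {"prompt": "C", "bucket": "competitor_only", "intent": "decision"},
    {"prompt": "D", "bucket": "mixed_unstable", "intent": "comparison"},
]
print([g["prompt"] for g in top_gaps(gaps)])  # ['C', 'A', 'D']
```

Because Python's sort is stable, gaps with the same priority keep their input order, so a secondary signal such as prompt volume would need its own sort key.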

What the deliverable actually looks like

The report is built around per-prompt details, not dashboard numbers. For each of the 50 prompts, you get the prompt itself, its intent type, the outcome across all four engines and three runs, which competitors surfaced where, the diagnosis for any gap, and the ranked prescription.

On top of that sit two summary views. The first is the citation rate, how often you surface in your own category versus how often each competitor does. The second is the competitive map, showing which competitors win which prompts on which engines. The competitive map is usually the part clients sit with the longest, because it’s the first time they see, prompt by prompt, who the AI engines actually consider the authority in their space. Sometimes it’s the competitor they expected. Often it isn’t.
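Both summary views are simple aggregations over the per-call records. The sketch below assumes each call is stored as a (prompt, engine, brands named) tuple and that the citation rate is the share of calls naming a brand; the sample data and the denominator are my assumptions, not the report's documented definition.

```python
from collections import Counter, defaultdict

# Made-up sample records: (prompt, engine, brands named in the answer).
calls = [
    ("p1", "ChatGPT", {"RivalA"}),
    ("p1", "Gemini", {"You", "RivalA"}),
    ("p2", "ChatGPT", set()),
    ("p2", "Gemini", {"RivalB"}),
]


def citation_rates(calls):
    """Share of all calls in which each brand was named."""
    counts = Counter(b for _, _, brands in calls for b in brands)
    return {brand: n / len(calls) for brand, n in counts.items()}


def competitive_map(calls):
    """For each (prompt, engine) cell, count how often each brand surfaced."""
    cells = defaultdict(Counter)
    for prompt, engine, brands in calls:
        cells[(prompt, engine)].update(brands)
    return cells


print(citation_rates(calls)["RivalA"])             # 0.5
print(competitive_map(calls)[("p1", "Gemini")])
```

The rate gives you one number per brand. The map keeps the prompt and engine dimensions, which is why it's the view that shows who owns what.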

A worked example: reading a competitive map

Abstract descriptions of the deliverable only get you so far, so here’s how the map reads in practice. The numbers below are illustrative, not from a specific client. They show the shape of the finding, not a real engagement.

Picture a B2B AEO consultancy that runs the audit. Across 50 prompts, the citation-rate summary comes back something like this: cited on 9 prompts, competitor-only on 22, open territory on 13, closed on 4, and unstable on 2. The headline reaction is usually disappointment at the 9. The useful reaction is to look at the 22.

Competitor-only on 22 prompts means the engines are willing to name a vendor on those questions, just not this one. That’s not an “AI doesn’t cite anyone in our space” problem. It’s a “someone else owns these answers” problem, and that’s fixable. Drilling into those 22, the per-prompt view shows which competitor surfaces where. Maybe one competitor owns the comparison prompts on ChatGPT and Perplexity but is absent on Gemini. Maybe a second competitor owns the decision-intent prompts everywhere. The pattern tells you who to study and where the soft spots are.

The 13 open-territory prompts are a quiet opportunity. Nobody is cited on them yet. They’re mostly research-intent questions where the engines answer without naming vendors today, but that behavior shifts as the engines get more comfortable citing sources, and the brand that has published the clearest, most citable answer when that shift happens tends to claim the slot. Open territory is where you plant flags cheaply.

The 4 closed prompts you ignore. The 2 unstable ones you note and recheck. That’s how a 50-prompt audit becomes a five-item plan: fix the highest-impact competitor-only prompts first, plant content on the best open-territory questions second, and leave the rest.
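For anyone who wants to check the illustrative numbers, this quick tally confirms the five buckets add up to the 50 prompts and shows each bucket's share. The figures are the made-up ones from this example, not client data.

```python
# Illustrative tally from the worked example above.
tally = {
    "cited": 9,
    "competitor_only": 22,
    "open_territory": 13,
    "closed": 4,
    "unstable": 2,
}

total = sum(tally.values())
shares = {name: count / total for name, count in tally.items()}

print(total)  # 50
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

Competitor-only comes out at 44% of the prompt set against 18% cited, which is the arithmetic behind looking at the 22 before the 9.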

Why recency changes the answer engine to engine

One reason a single-engine audit misleads: the engines weight freshness very differently, so the same content can be cited on one and ignored on another.

Perplexity leans hard on recent, dated, source-linked material. Stale content drops out of its answers fast, which is why a page that was cited six months ago can vanish without you changing anything. Gemini and ChatGPT behave differently again, drawing on their own indexes and update cycles. Claude blends trained knowledge with search depending on the query. The practical consequence: a recency-sensitive engine rewards you for keeping content current and punishes you for letting it age, while a less recency-sensitive engine may keep citing an older page. The audit captures this by running every prompt across all four, so a “you’re cited” result comes with the context of where you were cited, and a competitor-only result on a recency-sensitive engine often points to a stale-content fix rather than a structural one.

How to read your own results

When you get the report, resist the urge to fixate on the composite citation rate. The number is real, but it’s the least actionable thing in the document. Three habits make the audit pay off.

Read the competitor-only bucket first. It’s the most actionable: the engines have already decided to cite vendors there, so the gap is yours to close, not the category’s to create.

Sort the gaps by intent, not by engine. A competitor-only result on a decision-intent prompt is worth more than three open-territory wins on research prompts, because decision-intent prompts sit closest to a buying choice. The matrix ranks for this, but it’s worth internalizing why.

Treat the report as a baseline, not a verdict. The honest use of a citation audit is the first measurement in a series. Run it, fix the top gaps, re-run it, see what moved. A single audit read as a permanent score will mislead you, because the systems it measures don’t hold still.
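The "see what moved" step amounts to a diff between two audits. A minimal sketch, assuming each audit is stored as a mapping from prompt to bucket; the function name and sample data are hypothetical.

```python
def what_moved(baseline: dict[str, str], rerun: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return prompts whose bucket changed, as (before, after) pairs."""
    return {
        prompt: (baseline[prompt], rerun[prompt])
        for prompt in baseline
        if prompt in rerun and baseline[prompt] != rerun[prompt]
    }


baseline = {"p1": "competitor_only", "p2": "open_territory", "p3": "cited"}
rerun = {"p1": "cited", "p2": "open_territory", "p3": "mixed_unstable"}

print(what_moved(baseline, rerun))
# {'p1': ('competitor_only', 'cited'), 'p3': ('cited', 'mixed_unstable')}
```

A diff like this catches movement in both directions: the gap you closed on p1 and the citation that went fragile on p3.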

Three things this audit will not tell you

The honest framing matters more than the pitch, so here are the limits.

It won’t tell you your exact future traffic numbers. The audit measures citations, whether and where you’re named. It does not predict how many clicks or how much pipeline a given citation produces. Citation is a necessary condition for AI-sourced traffic, not a guarantee of volume. Anyone selling you a traffic forecast off a citation audit is selling a model they can’t validate.

It won’t tell you that you’ll be cited next month, even if you do the work. These engines change. Indexes update, models get replaced, and ranking behavior shifts. The audit is a measurement at a point in time with a method you can re-run to track change. It is not a promise that a fix locks in a citation permanently. We retest for exactly this reason; a one-time audit treated as a permanent verdict will mislead you.

It won’t tell you the engines’ internal ranking logic. We measure observable behavior, what gets cited across runs and across engines, and we diagnose the likely cause from page structure, topical relevance, schema, and recency. We do not have access to how ChatGPT or Gemini rank sources internally, and neither does anyone selling you an audit. What we give you is a well-evidenced diagnosis, not a leaked algorithm. Be skeptical of anyone who claims the latter.

Naming the limits is the point. An audit that promises certainty about systems that are non-deterministic and changing is promising something it can’t deliver. The value here is a measured, repeatable map of where you stand right now and a ranked plan for the gaps you can actually close.

Where to start

If you’ve never checked whether AI engines can even read your site, run the free 60-second AI Visibility Audit first. It’s the readiness layer.

If you already know your site is technically sound and you want the actual citation map, showing whether the engines surface you, your competitors, or nobody in your category across all four engines, the $497 AI Visibility Audit is the one. Five to seven days, 600 measured calls, a per-prompt diagnosis, and a ranked fix list.

For the method behind both, the Citation Audit Method pillar lays out the full framework.

If you want this kind of breakdown in your inbox every week, subscribe to The Revenue Signal. Each Thursday, I take one shift in AI search, one company that responded to it, and one concrete move you can make. No theory padding, no recap of things you already know. Free, weekly, and built for B2B operators who’d rather measure than guess.