Running Citation Audits

Master a structured framework for identifying, categorising and auditing brand citations across major Large Language Models to benchmark visibility and site-source attribution.

Foundations

Introduction

In the era of Generative Engine Optimisation (GEO), a brand's visibility is no longer measured solely by blue links. Instead, the metric of success is the frequency and quality of 'citations': the references AI models provide to support their claims. Running a citation audit is the process of systematically cataloguing where LLMs like ChatGPT, Claude, and Gemini are sourcing their information and how often your brand (or your client’s brand) is being featured.

This lesson provides a repeatable, data-led methodology for capturing these citations at scale. Unlike traditional SEO audits that rely on crawler data, citation audits require a mix of prompt engineering, sentiment analysis, and source attribution tracking to understand why a model chooses one source over another.

The Citation Audit Framework

A professional citation audit consists of four distinct phases: Discovery, Extraction, Categorisation, and Gap Analysis. By following this sequence, practitioners can move from anecdotal 'spot checks' to a comprehensive visibility profile.

1. The Discovery Phase: Defining the Prompt Set

You cannot audit 'everything'. You must define a representative set of prompts across the user journey. We categorise these into three buckets:

  • Brand/Navigational: "What is [Brand Name]?" or "Who founded [Brand Name]?"
  • Category/Commercial: "What are the best CRM tools for small businesses in the UK?"
  • Informational/How-to: "How do I calculate VAT for a remote workforce?"

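For a standard audit, aim for a sample size of 50 to 100 prompts per target model. A sample of this size is large enough to show recurring patterns rather than one-off anecdotes, although it is a practical rule of thumb and not a guarantee of statistical significance.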
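One way to keep the prompt set organised is to generate it from templates, one list per bucket. The sketch below is illustrative only: the bucket names follow the three categories above, while the template strings, the `build_prompt_set` helper, and the placeholder brand 'FinTrack' are assumptions made for the example.

```python
from itertools import product

# Hypothetical templates, one list per bucket described in the lesson.
TEMPLATES = {
    "brand": ["What is {brand}?", "Who founded {brand}?"],
    "category": ["What are the best {category} for {audience}?"],
    "informational": ["How do I choose {category}?"],
}


def build_prompt_set(brand, categories, audiences):
    """Expand the templates into a de-duplicated list of (bucket, prompt) pairs."""
    prompts = []
    for bucket, templates in TEMPLATES.items():
        for template, category, audience in product(templates, categories, audiences):
            # str.format ignores keyword arguments a template does not use.
            prompt = template.format(brand=brand, category=category, audience=audience)
            if (bucket, prompt) not in prompts:
                prompts.append((bucket, prompt))
    return prompts


prompt_set = build_prompt_set(
    brand="FinTrack",  # placeholder brand
    categories=["expense management tools", "CRM tools"],
    audiences=["small businesses in the UK", "UK mid-market firms"],
)
for bucket, prompt in prompt_set:
    print(f"{bucket}: {prompt}")
```

Keeping the bucket alongside each prompt makes it possible to report results per stage of the user journey later on.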

2. The Extraction Phase: Capturing the Reference

When an AI generates a response, citations appear in various forms: inline footnotes, 'Sources' lists at the bottom, or embedded hyperlinks.

  • Direct Citations: Explicit links to a URL.
  • Implicit References: Mentioning a brand name without a link (this is still valuable for brand salience).
  • Attribution Weight: Does the AI quote your data directly, or merely include you in a 'Top 10' list?
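The first two forms above can be separated with simple text matching. The following is a minimal sketch, assuming the response is available as plain text with any links written out as full URLs; the `extract_references` helper and the sample strings are hypothetical, and attribution weight is left for manual review because it needs judgement about how the brand is used.

```python
import re

# Matches http(s) URLs up to the next whitespace or closing bracket.
URL_PATTERN = re.compile(r"https?://[^\s)\]>]+")


def extract_references(response_text, brand):
    """Return the URLs found in a response and whether the brand is named."""
    # Strip trailing punctuation that often follows a URL in running text.
    urls = [u.rstrip(".,;") for u in URL_PATTERN.findall(response_text)]
    brand_mentioned = re.search(re.escape(brand), response_text, re.IGNORECASE) is not None
    return {
        "direct_citations": urls,
        "brand_mentioned": brand_mentioned,
        # Implicit reference (simplified): brand named, no link anywhere in the answer.
        "implicit_only": brand_mentioned and not urls,
    }


sample = (
    "FinTrack is a solid option for mid-market firms [1].\n"
    "Sources:\n[1] https://www.example.com/reviews/fintrack."
)
print(extract_references(sample, "FinTrack"))
```

Real interfaces often return citations as structured metadata instead of inline text, in which case the URL list would come from that metadata and only the brand check would use the response body.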

3. The Categorisation Phase: Source Mapping

Once citations are collected, map them by source type. This helps identify which channels the AI trusts for your niche:

  • Owned Media: Your official website and blog.
  • Earned Media: PR pieces, news sites, and guest posts.
  • Community/Social: Reddit threads, Quora, and niche forums.
  • Aggregators: Review sites like G2, Trustpilot, or Capterra.
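Source mapping can be started with a plain lookup table keyed on domain. This is a sketch under stated assumptions: the `SOURCE_TYPES` table and `source_type` helper are hypothetical, `fintrack.com` stands in for your owned domain, and the labels assigned to each third-party domain are a judgement call you would adjust for your own niche.

```python
from urllib.parse import urlparse

# Hypothetical lookup table; extend it as new domains appear in the audit.
SOURCE_TYPES = {
    "fintrack.com": "Owned",       # placeholder owned domain
    "techradar.com": "Earned",
    "reddit.com": "Community",
    "quora.com": "Community",
    "g2.com": "Aggregator",
    "trustpilot.com": "Aggregator",
    "capterra.com": "Aggregator",
}


def source_type(url):
    """Map a cited URL to a source type, matching subdomains as well."""
    host = (urlparse(url).hostname or "").lower()
    for domain, label in SOURCE_TYPES.items():
        if host == domain or host.endswith("." + domain):
            return label
    return "Unclassified"  # review these by hand


for url in [
    "https://www.g2.com/products/example",
    "https://old.reddit.com/r/example",
    "https://example.org/post",
]:
    print(url, "->", source_type(url))
```

Anything that comes back as 'Unclassified' is worth reviewing manually, since new domains are often where the most interesting findings sit.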

Worked Example: Auditing an Enterprise SaaS Brand

Let’s assume we are auditing 'FinTrack', a fictional expense management software.

Step 1: Prompting. On Perplexity and ChatGPT, we run the prompt: "Compare the top 5 expense management tools for UK mid-market firms."

Step 2: Observation. ChatGPT lists FinTrack as #3 and provides a footnote link. However, the link does not go to FinTrack.com; it goes to a 'TechRadar' review from 2022.

Step 3: Analysis. The citation audit reveals that the 'Visibility Source' is not the brand site but a third-party review. The takeaway: to improve this citation, FinTrack needs to get the TechRadar review refreshed or publish more authoritative, structured data on its own 'Compare' pages, giving the AI a reason to source from the official site.

Technical Considerations for Scaling

Performing this manually is time-consuming. To scale, use the following approach:

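  1. API Integration: Use the OpenAI or Anthropic APIs to run your prompt list through a script. Bear in mind that API responses can differ from what users see in the consumer apps, and may contain no citations unless a web search tool is enabled.
  2. Web Scraping (Perplexity/Gemini): Because these products retrieve from the live web and list their sources in the interface, use tools like Browse.ai or custom Python scripts to capture the 'Sources' section of the UI when those sources are not available through an API.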
  3. Sentiment Tagging: Use a secondary AI layer to tag if the citation is positive, neutral, or negative.
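The batch-running part of this approach can be sketched independently of any particular vendor. In the example below, `ask(model, prompt)` is a hypothetical callable that you would implement around whichever API client or scraper you use; `fake_ask` is a stand-in so the sketch runs offline, and the model names are placeholders. No real API call is shown.

```python
import csv
import io


def run_audit(prompts, models, ask):
    """Run every prompt against every model and collect one row per pair.

    `ask(model, prompt)` is a hypothetical callable supplied by the caller;
    it should return the response text for that model and prompt.
    """
    rows = []
    for model in models:
        for prompt in prompts:
            try:
                response, error = ask(model, prompt), ""
            except Exception as exc:  # record the failure and keep going
                response, error = "", str(exc)
            rows.append({"model": model, "prompt": prompt,
                         "response": response, "error": error})
    return rows


def to_csv(rows):
    """Serialise audit rows to CSV text for a spreadsheet import."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["model", "prompt", "response", "error"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def fake_ask(model, prompt):
    """Stand-in for a real client so the sketch runs without network access."""
    return f"[{model}] canned answer to: {prompt}"


rows = run_audit(["What is FinTrack?"], ["model-a", "model-b"], fake_ask)
print(to_csv(rows))
```

Passing the client in as a function keeps the loop identical whether the response comes from an API, a scraper, or a saved transcript, and the same pattern could wrap a second call for sentiment tagging.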

Identifying Citation 'Leakage'

Citation leakage occurs when an AI discusses your product but attributes the information to a competitor or an outdated third-party source. During your audit, flag any instance where your brand is mentioned but the citation link points elsewhere. Taken together, these instances make up the 'Citation Gap', and measuring it is the work of the fourth phase of the framework, Gap Analysis. Reducing this gap is the primary goal of an AI Visibility Practitioner.
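The flagging rule described above is simple enough to express as a single check per response. This is a minimal sketch, assuming you already know whether the brand was mentioned and which URLs were cited; the `is_leak` helper, the `fintrack.com` domain, and the example URLs are hypothetical.

```python
from urllib.parse import urlparse


def is_leak(brand_mentioned, citation_urls, owned_domain):
    """True when the brand is named but no cited URL is on the owned domain."""
    if not brand_mentioned:
        return False  # nothing to leak if the brand is absent
    for url in citation_urls:
        host = (urlparse(url).hostname or "").lower()
        if host == owned_domain or host.endswith("." + owned_domain):
            return False  # at least one citation credits the brand's own site
    return True  # mentioned, but every citation (if any) points elsewhere


print(is_leak(True, ["https://www.techradar.com/reviews/example"], "fintrack.com"))
print(is_leak(True, ["https://www.fintrack.com/compare"], "fintrack.com"))
```

Note that this version also flags brand mentions with no citations at all, which treats an implicit reference as part of the gap; split the two cases if you want to report them separately.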

Putting it into Practice

To begin your first citation audit, follow these steps:

  1. Select your targets: Choose the 3 most relevant AI assistants for your audience (e.g., ChatGPT running GPT-4o, Perplexity, and Gemini).
  2. Build a spreadsheet: Create columns for 'Prompt', 'Response Headline', 'Brand Mentioned (Y/N)', 'Citation URL', and 'Source Type'.
  3. Run the 'Niche Authority' test: Use broad informational prompts like "What are the current trends in [Industry]?" and see which domains are cited most frequently. These are your 'Authority Benchmarks'.
  4. Cross-reference with SEO: Compare your citation list with your top-ranking pages in Google Search. If a page ranks #1 in Search but is never cited by AI, the page's formatting or 'readability' may be making it hard for the model's training data or retrieval system to use.
  5. Report findings: Present the 'Share of Citations' (similar to Share of Voice) to the client to justify investment in content updates aimed at AEO (Answer Engine Optimisation).
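Steps 3 to 5 reduce to a few counts over the collected citation URLs. The sketch below is one possible way to compute them, assuming 'Share of Citations' means owned-domain citations divided by all citations; that definition, the helper names, and the sample URLs are assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url):
    """Lower-cased hostname with a leading 'www.' removed."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def share_of_citations(citation_urls, owned_domain):
    """Fraction of all cited URLs that sit on the owned domain."""
    if not citation_urls:
        return 0.0
    owned = sum(1 for url in citation_urls if domain_of(url) == owned_domain)
    return owned / len(citation_urls)


def authority_benchmarks(citation_urls, top_n=3):
    """Most frequently cited domains, as (domain, count) pairs."""
    return Counter(domain_of(u) for u in citation_urls).most_common(top_n)


def ranked_but_not_cited(top_ranking_urls, citation_urls):
    """Pages that rank in Search but never appear as a citation."""
    cited = set(citation_urls)
    return [url for url in top_ranking_urls if url not in cited]


citations = [  # hypothetical audit output
    "https://www.techradar.com/a",
    "https://www.techradar.com/b",
    "https://fintrack.com/compare",
    "https://www.g2.com/x",
]
print(share_of_citations(citations, "fintrack.com"))
print(authority_benchmarks(citations, top_n=1))
print(ranked_but_not_cited(
    ["https://fintrack.com/compare", "https://fintrack.com/pricing"], citations))
```

The URL comparison in `ranked_but_not_cited` is an exact string match, so in practice you would normalise trailing slashes and tracking parameters before comparing.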

Visual diagram

[Diagram: a workflow showing the 4 stages of a citation audit: Discovery (Prompts), Extraction (AI Output), Categorisation (Data Normalisation), and Gap Analysis (Insights).]

Exercise

Take five high-volume keywords for your brand and run them as prompts through Perplexity. Record every URL cited in the 'Sources' section into a spreadsheet and categorise them as either 'Owned', 'Earned', or 'Competitor'.

Key takeaways

  • Citations serve as the primary validation metric for brand visibility in AI responses.
  • Audits should cover three prompt types: Navigational, Commercial and Informational.
  • A standard audit sample size is typically 50-100 prompts per AI model.
  • AI models cite sources through footnotes, 'Sources' panels, and inline hyperlinks.
  • Categorising citations by 'Source Type' (Owned, Earned, Community, Aggregator) reveals where trust lies.
  • Implicit mentions without links contribute to brand salience but represent a citation gap.
  • Third-party aggregators often serve as the primary source for AI comparisons.
  • Scaling audits requires API usage or scraping tools to handle bulk prompt sets.
  • Citation leakage occurs when your brand is discussed but credit is given to a third party.
  • Comparing AI citations to SEO rankings helps identify content that 'ranks' but isn't 'cited'.

Lesson Quiz

Pass at 70%.

1. What is 'Citation Leakage' in the context of an AI audit?
2. Which prompt type focuses on users looking for general industry knowledge?
3. Why is it important to audit multiple AI models like Gemini and Claude simultaneously?
4. In a citation audit, what does 'Earned Media' refer to?
5. What is a recommended sample size for a representative citation audit?
6. What tool would you use to scale citation extraction from models that lack an API?
7. If your site ranks #1 in Google but has 0 citations in LLMs, what is a likely cause?
8. Which of these is an 'Implicit Reference'?
9. A 'Compare' prompt usually falls into which category?
10. What is the final phase of the Citation Audit Framework?