Perplexity processes over 100M queries per month. Here is how it decides which websites to cite -- explained by a team that queries its API every day.
## Why Perplexity SEO Is Different
Perplexity is not a search engine in the traditional sense. It does not return a list of ten blue links. It reads the web, synthesizes an answer, and attaches citations -- inline references to the sources it used. If your website is one of those citations, you get visibility. If it is not, you are invisible to that user.
This is a fundamentally different game from Google SEO. On Google, you optimize for ranking position. On Perplexity, you optimize for citation inclusion. There is no "page one." There is only "cited" or "not cited."
We know this from the inside. Our AI Citation Check feature queries Perplexity's Sonar API directly to track whether your website appears in AI-generated answers. We have run thousands of citation checks across hundreds of domains. What follows is what we have learned about how Perplexity selects its sources.
## How Perplexity Discovers Content
Perplexity's answer engine has two layers: retrieval and generation. Understanding both is essential for optimization.
### The retrieval layer
When you submit a query, Perplexity first retrieves candidate sources from its web index. This step is similar to traditional search -- it identifies pages that are topically relevant to the query. Perplexity's index overlaps heavily with what Google indexes, which means traditional SEO fundamentals still matter here. If Google cannot find your page, Perplexity probably cannot either.
But retrieval is not the whole story. Perplexity retrieves far more candidates than it ultimately cites. The real selection happens in the next layer.
### The generation layer
Perplexity's language model reads the retrieved sources and generates a synthesized answer. During generation, it decides which sources to actually cite -- which pages contributed useful information to the final answer. This is where traditional SEO and Perplexity SEO diverge sharply.
A page can be retrieved but not cited. This happens when the page is topically relevant but does not contain a clear, extractable answer to the query. The model needs to be able to pull a specific fact, recommendation, or explanation from your content. If your page is vague, padded with filler, or buries its key points under walls of introductory text, it gets retrieved and discarded.
## The 6 Signals That Drive Perplexity Citations
Based on our analysis of thousands of citation checks through the Sonar API, these are the signals that consistently separate cited sites from non-cited sites.
### Direct answer density
Pages that state facts, figures, and conclusions explicitly get cited. Pages that hint, hedge, or require inference do not. Perplexity is looking for extractable statements -- sentences it can reference as a source for a specific claim in its answer.
### Structural clarity
Content organized with clear headings, short paragraphs, and logical flow gets cited more often. The language model parses your page during generation. If it can quickly locate the relevant section, it cites you. If your content is a wall of text, it moves on to a source that is easier to extract from.
### Topical authority signals
Sites that cover a topic comprehensively -- multiple pages on related subtopics, deep expertise, original data -- get cited over sites with a single shallow page. Perplexity's retrieval layer favors domains that demonstrate depth in a subject area.
### Freshness and recency
For queries where recency matters (pricing, statistics, tool comparisons), Perplexity strongly favors recently updated content. We have seen pages lose citations within weeks of becoming outdated. Dates on your content matter.
### Source diversity preference
Perplexity deliberately cites multiple sources per answer. It does not want to depend on a single domain. This means you are competing for one of 5-15 citation slots, not trying to be the only answer. Your content needs to be among the best sources, not the only source.
### Machine-readable structure
Sites with llms.txt files, clean metadata, and structured data give the retrieval layer stronger signals about what the page contains. This is the technical foundation -- making it easy for AI to understand what your site is about before it even reads the content.
## What Gets Cited vs What Gets Skipped
Here is a concrete breakdown from our citation data. These patterns hold across industries.
| Gets Cited | Gets Skipped |
|---|---|
| "Our pricing starts at $49/mo for teams of 5-20" | "Contact us for a custom quote" |
| "The 3 most common causes of foundation cracks are..." | "Foundation problems can be caused by many things" |
| A comparison table with specific specs and prices | A page that says "we offer competitive pricing" |
| A blog post updated in 2026 with current statistics | A blog post from 2022 with outdated numbers |
| Clear H2 sections answering distinct questions | A 3,000-word page with no subheadings |
| Original research, case studies, proprietary data | Rewritten content that exists on 50 other sites |
The pattern is clear. Perplexity cites content that makes specific, extractable claims. It skips content that is vague, generic, or forces the model to guess.
## How the Sonar API Works (And Why It Matters)
Perplexity offers the Sonar API -- the same retrieval-augmented generation engine that powers perplexity.ai -- as a developer product. This is what we use for our citation checks at llmstxt.studio. Here is what happens under the hood.
When we send a query to the Sonar API, it returns two things: a generated answer and a structured citations array -- an ordered list of URLs that the model used as sources. This is not a list of "related links." These are the specific pages the model read and referenced while constructing its answer.
The citations array is the ground truth. If your domain appears in it, the model used your content. If it does not, the model either did not retrieve your page or retrieved it and chose not to cite it.
We check this programmatically. For each site on our platform, we generate queries across three categories -- brand discovery (can AI find this business?), topic authority (does AI treat this site as a knowledge source?), and competitive landscape (who dominates the broader space?) -- and run each query through Sonar. We then parse the citations array, check for the user's domain, and record which competitors appeared instead.
This is not a proxy metric. It is the actual citation data from the same engine that serves perplexity.ai users.
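As a rough sketch of what a check like this looks like in code, here is a minimal Python example. It assumes Perplexity's publicly documented OpenAI-compatible chat completions endpoint and the `sonar` model name; the `query_sonar` helper, the sample citations list, and the `acme.dev` domain are illustrative, not a copy of our production implementation.

```python
import json
import urllib.request
from urllib.parse import urlparse

SONAR_URL = "https://api.perplexity.ai/chat/completions"


def query_sonar(api_key: str, prompt: str) -> dict:
    """Send one query to the Sonar API and return the parsed JSON response."""
    payload = {
        "model": "sonar",
        "messages": [{"role": "user", "content": prompt}],
    }
    req = urllib.request.Request(
        SONAR_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def domain_cited(citations: list[str], domain: str) -> bool:
    """True if any cited URL belongs to `domain`, including its subdomains."""
    target = domain.lower().removeprefix("www.")
    for url in citations:
        host = (urlparse(url).hostname or "").lower().removeprefix("www.")
        if host == target or host.endswith("." + target):
            return True
    return False


# A live response would carry the citations as a list of URLs
# (e.g. resp.get("citations", [])); here we use an illustrative sample.
sample = [
    "https://www.example-competitor.com/pricing",
    "https://blog.acme.dev/foundation-repair-costs",
]
assert domain_cited(sample, "acme.dev")  # cited via a subdomain
assert not domain_cited(sample, "other-firm.com")
```

The domain comparison normalizes `www.` and matches subdomains, since a citation of `blog.yoursite.com` should still count as a citation of your site.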
## Perplexity GEO: A Practical Playbook
Generative engine optimization for Perplexity comes down to five actions. These are ordered by impact.
### Make your content extractable

Rewrite key pages so that every important claim is stated explicitly in a single sentence. If someone asks "what does [your company] charge?", the answer should appear on your pricing page as a sentence that leads with the actual number. Do not make the model hunt for it.
### Structure pages for machine reading
Use descriptive H2 headings that match common query patterns. "How much does foundation repair cost?" is a better heading than "Our Services." Each section should be independently useful -- a model should be able to extract value from one section without reading the whole page.
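As a sketch, a page structured this way might look like the following. The business, headings, and figures are invented for illustration; the point is that each H2 mirrors a real query and the first sentence under it answers that query outright.

```html
<h2>How much does foundation repair cost?</h2>
<p>Foundation repair typically costs $2,500-$7,500 for pier installation;
   hairline crack injection starts around $500.</p>

<h2>How long does foundation repair take?</h2>
<p>Most residential repairs take 1-3 days, depending on the number of piers.</p>
```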
### Deploy an llms.txt file
Give the retrieval layer a structured summary of your entire site. An llms.txt file lists your key pages with descriptions, making it trivially easy for AI to understand your site's scope and expertise. This is the equivalent of submitting a sitemap to Google -- a direct signal about what you offer.
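A minimal llms.txt follows the convention of an H1 site name, a blockquote summary, and linked key pages with one-line descriptions. The business and URLs below are invented for illustration:

```markdown
# Acme Foundation Repair

> Residential foundation repair in Austin, TX: crack injection, pier
> installation, and free inspections.

## Services
- [Foundation Crack Repair](https://acme-foundations.example/cracks): Causes, repair methods, and typical costs
- [Pricing](https://acme-foundations.example/pricing): Itemized pricing for common repairs

## Guides
- [Signs of Foundation Damage](https://acme-foundations.example/signs): How to spot settling early
```

The file lives at your domain root (`/llms.txt`), where crawlers can fetch it without parsing your HTML.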
### Publish original, specific content
Perplexity cites sources that add unique value. If your blog post says the same thing as 20 other blog posts on the topic, you are competing for citation slots against all of them. Original data, case studies, unique analysis, and specific examples give you an edge.
### Monitor and iterate
Run citation checks regularly to see if your changes are working. Track which queries cite you, which cite competitors, and which cite no one in your space. Use this data to identify gaps and prioritize content updates.
## The Citation Position Myth
In Google SEO, position matters enormously. The difference between #1 and #5 is a massive drop in click-through rate. Perplexity citations work differently.
The Sonar API returns citations in a numbered array, and each citation is referenced inline in the answer text (e.g., [1], [2], [3]). But the position in the array does not correlate with visibility the way Google rankings do. A citation at position [7] can appear in the most important sentence of the answer, while [1] might support a throwaway introductory fact.
What matters is whether you are cited at all and in what context. Being cited as the source for the core recommendation is worth more than being cited first for a background detail. This is why we track citation presence, not citation rank, as the primary metric.
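Because context matters more than rank, it can be useful to map each citation to the sentences that reference it rather than just recording its array position. The following is a minimal sketch, assuming inline markers of the form `[1]`, `[2]` in the answer text (as described above); the answer string and URLs are invented:

```python
import re


def citation_contexts(answer: str, citations: list[str]) -> dict[str, list[str]]:
    """Map each cited URL to the sentences in the answer that reference it."""
    contexts: dict[str, list[str]] = {url: [] for url in citations}
    # Split on sentence-ending punctuation; crude, but enough for a sketch.
    for sentence in re.split(r"(?<=[.!?])\s+", answer):
        for marker in re.findall(r"\[(\d+)\]", sentence):
            idx = int(marker) - 1  # inline markers are 1-indexed
            if 0 <= idx < len(citations):
                contexts[citations[idx]].append(sentence)
    return contexts


answer = ("Repair costs vary widely[1]. "
          "Epoxy injection is the standard fix for hairline cracks[2].")
urls = ["https://stats.example/costs", "https://acme.example/cracks"]
ctx = citation_contexts(answer, urls)
# ctx maps each URL to the sentences citing it, so you can see whether
# your page backs the core recommendation or a throwaway background fact.
```

With this mapping you can score citations by what they support, not where they sit in the array.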
## Perplexity vs Google: What to Optimize Differently
| | Google | Perplexity |
|---|---|---|
| Primary goal | Rank on page one | Get cited in the answer |
| Content length | Longer often ranks better | Concise and extractable wins |
| Keywords | Critical for ranking | Helpful for retrieval, not generation |
| Backlinks | Major ranking factor | Indirect signal via domain authority |
| Metadata | Title tags and meta descriptions | llms.txt + structured data |
| Freshness | Matters for some queries | Matters for most queries |
| Measurement | Search Console rankings | Citation checks via Sonar API |
The key takeaway: Google rewards pages that match search intent. Perplexity rewards pages that answer questions directly. These often overlap, but not always. A page can rank #1 on Google by being comprehensive and well-linked. That same page can be skipped by Perplexity if it buries its answer under 800 words of context-setting.
## Check Your Perplexity Citations
We built our AI Citation Check specifically for this. It queries Perplexity's Sonar API with prompts tailored to your business -- brand queries, topic queries, and competitive queries -- and shows you exactly who gets cited.
You see whether your domain appears, which competitors show up instead, and how your citation presence changes over time. No guessing. No manual spot checks. Actual citation data from the same API that powers Perplexity's answer engine.
Start with a free AI Readiness Check to see how AI-visible your website is right now. Takes 30 seconds.
