The Read

How to structure a page so AI engines actually cite it

AI visibility is not a mysterious new channel. It is retrieval, and retrieval rewards one boring page structure you can read straight off the mechanism.

By Kenny Gimpert, Founder of Web Everything

There is a new acronym doing the rounds in our corner of marketing. GEO, AEO, LLMO, pick your favorite. Every one of them arrives with the same posture: AI search is a mysterious new channel, the rules are unknowable, and you will need a specialist, often the person explaining this to you, to decode it.

I have spent the past year building a product that measures whether AI engines name a business when a real person asks. I have watched a lot of answers get assembled in real time, across five engines, for practices that had no idea they were being described at all. The honest version of what I saw is less mysterious than the acronyms want it to be, and more useful.

AI visibility is not a new channel. It is retrieval. And retrieval has a structure you can read straight off the mechanism.

What actually happens when an engine answers

The model does not know your client. It was not trained on your service page, and even if a sentence from it landed in the training data, that is not where the citation comes from. When someone asks, the engine retrieves passages from an index at query time and writes an answer over what it pulled back. The answer is only as good, and only as attributed, as what got retrieved in that moment.

Two things follow from that, and both matter more than any tactic.

First, the engines eat different diets. One leans on the Bing index, one pulls heavily from community threads, one leans on Google and its map listings. Ask the same question across them and you will watch them disagree, because they are reading different sources, not because one is smarter. There is no single AI to optimize for. There is a panel, and it does not caucus.

Second, and this is the one people skip: the unit of citation is not the page. It is the passage. The engine lifts a small, self-contained chunk that answers the question and drops everything around it. Which means a page can rank beautifully, pass every audit, and still never get quoted, because the answer is buried three paragraphs down inside setup the retriever cannot cleanly excerpt.
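To make the passage-level point concrete, here is a toy sketch. It is not how any particular engine works: real retrievers use embeddings and rerankers that are not public, and the word-overlap scoring, the function names, and the sample dental page below are all assumptions for illustration. What it shows is the shape of the behavior: the page gets split into chunks, each chunk is scored against the query on its own, and only the winner is lifted.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for real query/passage matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_passage(page_text: str, query: str) -> tuple[float, str]:
    """Split a page on blank lines and return (score, passage) for the
    passage containing the largest share of the query's words."""
    query_words = tokenize(query)
    scored = []
    for passage in re.split(r"\n\s*\n", page_text.strip()):
        overlap = len(query_words & tokenize(passage))
        scored.append((overlap / max(len(query_words), 1), passage.strip()))
    # max() keeps the first passage on ties, so page order breaks ties.
    return max(scored, key=lambda pair: pair[0])


# Hypothetical page: the middle paragraph is the only self-contained answer.
page = """Our practice has served the community for many years and we take pride in our approach.

How long does a root canal take? A root canal takes 60 to 90 minutes in a single visit.

Contact us to learn more about our services."""

score, passage = best_passage(page, "how long does a root canal take")
print(score, "|", passage)
```

The first and last paragraphs score near zero however good the page is as a whole, which is the sense in which the page is not the unit being judged.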

That reframes the whole job. You are not optimizing a page. You are shipping quotable passages, and making it easy to attach your client’s name to them.

The structure

Once you see it as retrieval, the structure that wins is not clever. I now build every page that matters to the same shape, and the shape is boring on purpose.

Lead with the question, in the user’s words, as the heading. Not “Our Approach.” The actual thing a person types. The retriever is matching intent, so hand it the intent verbatim instead of making it infer.

Answer it in the first two sentences, self-contained. Write the answer so it survives being lifted out of the page with zero surrounding context, because that is exactly what will happen to it. The test is simple: if a reader needs the paragraph above to understand the sentence, the engine cannot quote it either.

Put one real number in it that exists nowhere else. The published research on how generative engines choose what to quote keeps landing on the same lever: original statistics beat almost everything else you can do to a page. Not the industry stat every competitor also cites. A number you counted, about your client, that no one else can publish. It is the hardest thing for a rival to copy and the easiest thing for an engine to prefer.

Mark it up so the machine and the human read the same text. Schema is not a growth hack, and there is no file you can drop in the root to manufacture a citation. What structured data does is remove ambiguity. It tells the parser which text is the question, which is the answer, and which entity is speaking. That is disambiguation, not magic, and it is worth doing for exactly that reason and no grander one.

Say the name and the facts the same way everywhere. Retrieval pulls from more than your site. If the founder’s name, the brand name, and the legal entity read as three different businesses across the directories and profiles, the engine splits the evidence three ways and none of them clears the bar to get named. Pick one canonical set of facts and point every surface at it.

Question, self-contained answer, one owned number, clean markup, consistent entity. None of it is impressive at a conference. All of it maps to how retrieval actually behaves.
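Part of that checklist can be approximated mechanically. The sketch below is a rough heuristic, not a validated test: the list of back-reference openers is my own guess at phrases that lean on earlier text, the function name is hypothetical, and a digit check cannot tell whether a number is original to your client. It only flags the obvious misses before a human reads the passage.

```python
import re

# Assumed list of openers that usually depend on a preceding paragraph.
BACKREFERENCE_OPENERS = (
    "this ", "that ", "these ", "those ", "it ", "they ",
    "as mentioned", "as noted", "as above",
)


def lint_passage(heading: str, answer: str) -> list[str]:
    """Return human-readable warnings for one heading/answer block."""
    warnings = []
    if not heading.strip().endswith("?"):
        warnings.append("heading is not phrased as a question")

    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", answer.strip())
    if sentences[0].lower().startswith(BACKREFERENCE_OPENERS):
        warnings.append("answer opens with a back-reference")

    # The answer should land in the first two sentences, number included.
    lead = " ".join(sentences[:2])
    if not re.search(r"\d", lead):
        warnings.append("no number in the first two sentences")
    return warnings


print(lint_passage("Our Approach", "It depends on the case. We will assess."))
print(lint_passage(
    "How long does a root canal take?",
    "A root canal takes 60 to 90 minutes in a single visit. "
    "Most patients return to work the same day.",
))
```

The first call trips all three warnings and the second trips none. Markup and entity consistency are not covered here because they depend on more than the text of one passage.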

My technical recommendations

The structure above is the writing. Under it sits a thin layer of engineering that decides whether any of the writing survives the trip through a retriever. Four things, in the order I check them.

Render the answer in the HTML, not after it. If your quotable passage is injected by JavaScript after the page loads, you are betting that every retriever executes the script before it reads. Some do not, and some that do give up before the script finishes. Put the answer in the initial server-rendered markup, where it exists whether or not anything runs. This is the most common way a well-written answer goes invisible, and nobody catches it because the page looks perfect in a browser.
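One way to check this without a browser is to look for the passage in the raw HTML, with script and style contents ignored. The sketch below uses only the Python standard library. The class and function names are hypothetical, and the two sample documents are made up to show the contrast.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def passage_in_static_html(raw_html: str, passage: str) -> bool:
    """True if the passage appears in the markup without running any script."""
    parser = VisibleText()
    parser.feed(raw_html)
    parser.close()
    # Collapse whitespace on both sides so line breaks do not cause misses.
    text = " ".join(" ".join(parser.chunks).split())
    return " ".join(passage.split()) in text


answer = "A root canal takes 60 to 90 minutes."

server_rendered = (
    "<html><body><h2>How long does a root canal take?</h2>"
    "<p>A root canal takes 60 to 90 minutes.</p></body></html>"
)
client_rendered = (
    '<html><body><div id="app"></div><script>'
    'document.getElementById("app").textContent = '
    '"A root canal takes 60 to 90 minutes.";'
    "</script></body></html>"
)

print(passage_in_static_html(server_rendered, answer))
print(passage_in_static_html(client_rendered, answer))
```

In practice you would feed it the response body from a plain HTTP fetch of the live page, not the DOM your browser shows you. The match is deliberately simple: a passage broken mid-word by inline tags, or with punctuation directly after a closing tag, will read as missing, so treat a negative as a prompt to look at the source, not a verdict.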

Use schema for disambiguation, and stop there. FAQPage, QAPage, and Organization markup earn their place by telling the parser which text is the question, which is the answer, and which entity is speaking. That is the whole job. I have watched teams treat schema as a lever they can pull harder, stacking types, and get nothing back, because there was no clean answer in the page for the markup to point at. Structure the content first. The schema describes it; it does not rescue it.
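For reference, this is roughly what the FAQ case looks like when generated from the same question and answer text that appears on the page. It is a minimal sketch: the function name and the sample pair are hypothetical, and the property names follow the schema.org FAQPage, Question, and Answer types.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Build FAQPage JSON-LD from (question, answer) pairs.

    The strings passed in should be the same text a visitor sees,
    so the markup and the visible page never disagree.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


print(faq_jsonld([
    ("How long does a root canal take?",
     "A root canal takes 60 to 90 minutes in a single visit."),
]))
```

The output goes inside a script tag of type application/ld+json in the server-rendered HTML. If an answer could ever contain a closing script tag, escape it before embedding, since json.dumps does not.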

Pick canonical facts and enforce them. One spelling of the name, one address format, one specialty phrasing, identical across the site, the profiles, and the directories. Retrieval assembles an entity out of scattered evidence, and inconsistency reads as three weak entities instead of one strong one. This is unglamorous data hygiene, and it outperforms most of what gets sold as AI optimization.
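The enforcement step is simple enough to script once you have the facts from each surface in hand. The sketch below assumes you have already collected them by whatever means; the field names, surface names, and sample business are all hypothetical. It compares strictly on purpose, because the goal is identical strings, not similar ones.

```python
def find_mismatches(
    canonical: dict[str, str],
    surfaces: dict[str, dict[str, str]],
) -> list[tuple[str, str, str]]:
    """Return (surface, field, found_value) for every field that does not
    match the canonical value exactly, ignoring only outer whitespace."""
    mismatches = []
    for surface, facts in surfaces.items():
        for field, expected in canonical.items():
            found = facts.get(field, "")  # a missing field counts as a mismatch
            if found.strip() != expected.strip():
                mismatches.append((surface, field, found))
    return mismatches


# Hypothetical canonical record and two surfaces to audit.
canonical = {
    "name": "Example Dental Care",
    "address": "12 Main Street, Suite 4",
}
surfaces = {
    "website": {
        "name": "Example Dental Care",
        "address": "12 Main Street, Suite 4",
    },
    "directory": {
        "name": "Example Dental Care, LLC",
        "address": "12 Main St. #4",
    },
}

for surface, field, found in find_mismatches(canonical, surfaces):
    print(f"{surface}: {field} reads {found!r}")
```

Here the website passes and the directory is flagged on both fields, which is the kind of split that turns one business into two weaker ones.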

Let the crawlers in. Check your robots.txt file and your firewall or CDN bot rules before anything else, because the fastest way to guarantee zero citations is to block the exact agents you want quoting you. I have seen a site do everything else right and still go unread because a well-meaning security setting was quietly erasing it from the index. Absence is not a strategy.
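The robots.txt half of that check can be done with Python's standard library. In the sketch below, the list of user-agent tokens is an assumption on my part and will drift, so confirm each one against the vendor's current crawler documentation; the function name and sample rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed crawler tokens; verify against each vendor's published docs.
AI_AGENTS = ["GPTBot", "PerplexityBot", "ClaudeBot", "Google-Extended", "bingbot"]


def blocked_agents(robots_txt: str, url: str, agents: list[str] = AI_AGENTS) -> list[str]:
    """Return the agents that the given robots.txt text forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Hypothetical rules: one AI crawler blocked outright, everyone else
# kept out of /admin/ only.
sample_robots = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

print(blocked_agents(sample_robots, "https://example.com/services/root-canal"))
```

This only reads robots.txt. A firewall or CDN rule that blocks or challenges bots will not show up here, so pair it with a look at server logs or the security dashboard for requests from those agents.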

None of this is exotic. It is the plumbing that lets good content reach the surface, and it is usually a one-time pass, not a program.

Why the boring version wins

The content treadmill was a search-era instinct. Publish more, cover more, refresh more. Retrieval does not reward more. It rewards legible and singular, and a fortieth interchangeable page saying what the other thirty-nine say gives an engine no reason to pick yours. Volume used to be leverage. Now it is noise the retriever has to sort through to find the one passage worth quoting, and often that passage belongs to someone else.

Here is the uncomfortable part for our industry, and I would rather say it than sell around it: most of this is a one-time structural fix, not a monthly retainer. You can bill a subscription against a problem that a good afternoon of structure work largely solves, but you should not pretend the subscription is the reason it worked.

We just shipped a tool that automates the tedious middle of this, turning a set of real answers into that question, answer, and schema structure without anyone hand-writing the markup. I am not going to pitch it here, and I am not going to tell you what it costs or where to get it. I mention it only because building it is what forced me to get specific about the structure, and the structure is the part worth sharing.

The engines are not a black box. They are retrieval systems with public, boring preferences. Write the answer, own a number, mark it up, keep the name straight. If someone tells you it takes more mystery than that, they are selling the mystery.
