What Is Share of Model? The Core AI Visibility Metric
Share of model measures how often AI systems mention and recommend your brand. Learn to measure it with prompt sets, a 0-3 rubric, and cross-model sampling.
Your buyers now ask ChatGPT, Gemini, and Perplexity which vendors to shortlist before they ever load a results page, and the model’s answer either includes you or quietly buries you. The problem is that “are we showing up in AI?” is not a metric you can manage, defend in a board meeting, or trend over time. Share of model is the discipline that turns that vague anxiety into a number you can move.
What is share of model?
Share of model is the percentage of relevant AI-generated answers in which your brand appears, weighted by how prominently and how favorably it is mentioned. It is the AI-era successor to share of voice: instead of measuring your slice of ad impressions or organic rankings, it measures your slice of the synthesized recommendation that a large language model hands a buyer.
The metric exists because the surface has changed. In classic search there were ten blue links and a buyer could scroll. In an AI answer there is often one paragraph, three named vendors, and no second page. If the model recommends three competitors and omits you, you did not “rank lower” — you were structurally excluded from the consideration set. Share of model quantifies how often that exclusion happens across the questions your buyers actually ask. If you are still mapping that shift, GEO vs traditional SEO covers what changes when buyers ask AI first.
Why a single mention is the wrong unit
The naive version of this metric counts mentions: ask a question, note whether your brand appears, divide by the number of prompts. That number is almost always misleading, because not all mentions are equal:
- A brand named first, with a clean recommendation, is worth far more than one listed last with a caveat.
- A mention that states your pricing wrong actively costs you deals — it is a negative event, not a neutral one.
- A mention buried in a list of twelve “other options” barely registers with a buyer reading the top three.
So share of model is best understood not as a count but as a weighted score. The weighting is where practitioner craft lives, and it is what separates a real KPI from a vanity tally.
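To make the gap between a count and a weighted score concrete, here is a minimal sketch. It assumes the 0-3 presence scale described later in this article; the score values and function names are illustrative, not measured data.

```python
# Rubric scores (0-3) for one brand across five prompts; illustrative values only.
scores = [3, 1, 0, 1, 1]

MAX_SCORE = 3  # top of the 0-3 rubric


def mention_rate(scores):
    """Naive metric: fraction of prompts where the brand appears at all."""
    return sum(1 for s in scores if s > 0) / len(scores)


def weighted_score(scores):
    """Weighted metric: points earned as a fraction of points available."""
    return sum(scores) / (MAX_SCORE * len(scores))


naive = mention_rate(scores)
weighted = weighted_score(scores)
print(f"mention rate: {naive:.0%}, weighted score: {weighted:.0%}")
```

In this made-up sample the brand appears in four of five answers, yet it earns well under half of the available points because most of those appearances are passive mentions.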
How do you measure share of model?
You measure share of model by running a fixed prompt set across multiple AI systems, scoring each response on a defined rubric, and aggregating those scores into a competitor-relative percentage. The method has four moving parts — the prompt set, the scoring rubric, cross-model sampling, and the sentiment-and-accuracy layer — and skipping any one of them produces a number you cannot trust.
The single most important principle: fix your inputs before you trust your outputs. Because model outputs are non-deterministic, the only way to detect a real change in your visibility is to hold everything else constant: the same prompts, the same rubric, the same sampling cadence. If you change the questions every quarter, you are measuring your questions, not your brand.
Step 1: Build a representative prompt set
Your prompt set is the denominator of the whole metric, so it has to mirror how buyers actually talk, not your internal category language. Aim for 30 to 60 prompts grouped by intent, and then freeze the list so it stays comparable across runs.
- Category prompts — “What are the best [category] tools for [segment]?”
- Comparison prompts — “[Competitor] vs alternatives for [use case]”
- Problem-led prompts — “How do I fix [the pain your product solves]?”
- Branded prompts — “What is [your brand] and who is it for?”
The most under-built category is problem-led prompts, because that is where buyers who have never heard of you begin. A brand that only surfaces for branded prompts has high recall and zero discovery — it is confirming demand, not creating it. This same prompt-set discipline underpins a full AI visibility audit; share of model is the recurring score you extract from it.
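One way to keep the list frozen is to store it as data and record a fingerprint of it with every run. The sketch below is an assumption about how you might do that, not a required tool: the `PROMPT_SET` structure, the intent keys, and the `fingerprint` helper are all hypothetical names, and the bracketed placeholders are left for you to fill in.

```python
import hashlib
import json

# Hypothetical frozen prompt set, grouped by the four intents above.
# A real set would hold 30 to 60 prompts; one per intent is shown here.
PROMPT_SET = {
    "category": ["What are the best [category] tools for [segment]?"],
    "comparison": ["[Competitor] vs alternatives for [use case]"],
    "problem_led": ["How do I fix [the pain your product solves]?"],
    "branded": ["What is [your brand] and who is it for?"],
}


def fingerprint(prompt_set):
    """Return a stable SHA-256 hash of the prompt set for run logs."""
    canonical = json.dumps(prompt_set, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def intent_counts(prompt_set):
    """Count prompts per intent to check the set stays balanced."""
    return {intent: len(prompts) for intent, prompts in prompt_set.items()}


FROZEN_ID = fingerprint(PROMPT_SET)
print(FROZEN_ID[:12], intent_counts(PROMPT_SET))
```

If the fingerprint stored with a later run differs from the one stored with an earlier run, the two runs used different questions and should not be compared directly.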
Step 2: Score every response on a 0-3 rubric
Scoring is where you convert a wall of generated text into a comparable number. A four-point scale (0 to 3) is a practical sweet spot: it is granular enough to distinguish “recommended” from “merely mentioned,” but coarse enough that two analysts will agree on the score without a three-hour calibration meeting.
Here is a core rubric to use as a starting point. Treat it as a template to adapt, not gospel:
| Score | Label | What it means in the answer | Example signal |
|---|---|---|---|
| 0 | Absent | Brand is not mentioned for a relevant prompt | Model lists 5 vendors, you are not one |
| 1 | Mentioned | Named, but passively or in a long tail list | “Other options include [you]…” |
| 2 | Considered | Named with context as a genuine option | “[You] is a solid choice for mid-market teams” |
| 3 | Recommended | Named first, as a primary or default pick | “The best option for this is [you], because…” |
To make the rubric defensible, write a one-line decision rule for each boundary (for example: “a brand only scores 3 if it appears in the first two sentences AND is framed as a lead recommendation”). Those written rules are what keep your scoring consistent when you re-run the set three months later or hand it to a different analyst.
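Written boundary rules can also be encoded so that every analyst applies them in the same order. This is a sketch under stated assumptions: the analyst still reads the response and records what they observed, and the `Judgment` fields and the `score` function are hypothetical names. Only the rule for a score of 3 comes from the example above; the other boundaries are placeholders to adapt.

```python
from dataclasses import dataclass
from enum import IntEnum


class Presence(IntEnum):
    ABSENT = 0
    MENTIONED = 1
    CONSIDERED = 2
    RECOMMENDED = 3


@dataclass
class Judgment:
    """What the analyst observed in one response (hypothetical fields)."""
    mentioned: bool
    has_context: bool             # named with context as a genuine option
    in_first_two_sentences: bool
    lead_recommendation: bool     # framed as the primary or default pick


def score(j: Judgment) -> Presence:
    """Apply the boundary rules from the top of the scale down."""
    if not j.mentioned:
        return Presence.ABSENT
    # Example rule from the text: both conditions must hold for a 3.
    if j.in_first_two_sentences and j.lead_recommendation:
        return Presence.RECOMMENDED
    if j.has_context:
        return Presence.CONSIDERED
    return Presence.MENTIONED


example = score(Judgment(True, True, True, False))
print(example.name, int(example))
```

Because the rules are checked in a fixed order, a brand that is named early but not framed as the lead pick falls through to a 2 or a 1 rather than being scored on a hunch.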
Step 3: Sample across models, and repeat
You must run each prompt across several systems and multiple times per system, because a single query is statistically meaningless. Outputs are non-deterministic, coverage differs between platforms, and one lucky run can flatter you into complacency.
At minimum, sample across:
- ChatGPT — broad consumer and increasingly buyer-side research.
- Google Gemini and AI Overviews — tied to the largest search index.
- Perplexity — citation-heavy, and one of the clearest windows into which sources a model trusts.
- Microsoft Copilot — relevant for enterprise and Microsoft-stack buyers.
Sampling hygiene that practitioners tend to learn the hard way:
- Run logged-out, in fresh sessions so your own history does not personalize the answer.
- Query each prompt three to five times and average, rather than trusting one shot.
- Record the full response and any cited sources, not just your score — the citations tell you why you scored what you did.
- Stamp every run with a date, because model updates silently shift behavior and you will need to explain a swing later.
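The hygiene rules above translate naturally into a run log. The following is a minimal sketch, assuming a simple in-memory record per run; the `Run` fields, the model labels, the dates, and the scores are all placeholders rather than real results.

```python
from dataclasses import dataclass, field
from datetime import date
from statistics import mean


@dataclass
class Run:
    prompt: str
    model: str          # hypothetical label, e.g. "model-a"
    run_date: date      # date stamp for explaining later swings
    score: int          # 0-3 presence score
    response: str       # full response text, not just the score
    citations: list = field(default_factory=list)


def average_scores(runs):
    """Average the presence score per (prompt, model) pair."""
    grouped = {}
    for run in runs:
        grouped.setdefault((run.prompt, run.model), []).append(run.score)
    return {key: mean(scores) for key, scores in grouped.items()}


def under_sampled(runs, minimum=3):
    """List (prompt, model) pairs with fewer than `minimum` runs."""
    counts = {}
    for run in runs:
        key = (run.prompt, run.model)
        counts[key] = counts.get(key, 0) + 1
    return [key for key, n in counts.items() if n < minimum]


# Placeholder data: three runs on one model, a single run on another.
runs = [
    Run("p1", "model-a", date(2025, 1, 6), 3, "full text 1", ["source-a"]),
    Run("p1", "model-a", date(2025, 1, 6), 2, "full text 2"),
    Run("p1", "model-a", date(2025, 1, 6), 1, "full text 3"),
    Run("p1", "model-b", date(2025, 1, 6), 0, "full text 4"),
]
averages = average_scores(runs)
print(averages, under_sampled(runs))
```

The `under_sampled` check is a design choice worth keeping: it stops a single lucky or unlucky run from being averaged in as if it were a reliable reading.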
How do sentiment and accuracy change the score?
Sentiment and accuracy act as modifiers on top of the presence score, because a prominent mention that is hostile or factually wrong is not a win. This is the layer most dashboards skip, and it is exactly where the metric stops being a vanity count and starts reflecting commercial reality.
Layer in sentiment
After you assign a 0-3 presence score, tag the framing of each mention as positive, neutral, or negative. A score of 3 wrapped in “but it is expensive and support is slow” is a very different commercial event than a clean score of 3. Keep sentiment as a parallel tag rather than folding it into one blended number, so you can answer two distinct questions: “How often do we appear?” and “How are we described when we do?”
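Keeping sentiment as its own tag means the two questions can be computed separately. A minimal sketch, assuming each record is a `(presence_score, sentiment)` pair with illustrative values and hypothetical function names:

```python
from collections import Counter

# Each record is (presence_score, sentiment).
# Sentiment is None when the brand is absent. Values are illustrative.
records = [
    (3, "positive"),
    (3, "negative"),
    (2, "neutral"),
    (0, None),
    (1, "neutral"),
]


def appearance_rate(records):
    """How often do we appear? Fraction of answers with any mention."""
    return sum(1 for score, _ in records if score > 0) / len(records)


def sentiment_mix(records):
    """How are we described? Share of each tag among mentions only."""
    tags = Counter(tag for score, tag in records if score > 0)
    total = sum(tags.values())
    if total == 0:
        return {}
    return {tag: count / total for tag, count in tags.items()}


print(appearance_rate(records), sentiment_mix(records))
```

Note that the two top-scoring records carry opposite sentiment here, which a single blended number would hide.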
Treat accuracy as a hard gate
Accuracy deserves its own column because a confidently wrong mention is worse than absence — it shapes a buyer’s perception with information you would never publish yourself. When a model misstates your pricing, claims a feature you do not have, or confuses you with a competitor, flag it as a defect regardless of how prominent the mention is.
These factual errors are rarely random. Many assistants ground answers using retrieval-augmented generation, pulling from sources at answer time, so an inaccuracy usually traces back to a thin, outdated, or contradictory source the model trusted. That makes accuracy defects unusually actionable: fix the underlying source — your own site, a knowledge graph entry, a Wikidata record, or a high-authority third party — and the error often clears on the next sampling run. Strengthening the structured, machine-readable facts about your brand is the core of entity and knowledge-graph work, and it is frequently the fastest lever on a stubborn accuracy gap.
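If you recorded the cited sources for each response, you can tally which sources appear alongside accuracy defects and start your fixes there. This is a sketch under assumptions: the row layout, the defect labels, and the source names are hypothetical, and a source appearing next to a defect suggests where to look rather than proving it caused the error.

```python
from collections import Counter

# Hypothetical scored rows; "accurate" is the separate accuracy column.
rows = [
    {"prompt": "p1", "score": 3, "accurate": False,
     "defect": "pricing", "cited": ["source-a"]},
    {"prompt": "p2", "score": 1, "accurate": True,
     "defect": None, "cited": ["source-b"]},
    {"prompt": "p3", "score": 2, "accurate": False,
     "defect": "feature", "cited": ["source-a", "source-c"]},
]


def defects(rows):
    """Every inaccurate mention is a defect, however prominent it is."""
    return [r for r in rows if r["score"] > 0 and not r["accurate"]]


def sources_behind_defects(rows):
    """Count how often each cited source appears alongside a defect."""
    counts = Counter()
    for row in defects(rows):
        counts.update(row["cited"])
    return counts.most_common()


print(len(defects(rows)), sources_behind_defects(rows))
```

In this sample the top-scoring response is also a defect, which is exactly the case the accuracy column exists to catch.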
How do you turn share of model into a managed KPI?
You turn share of model into a managed KPI by aggregating your weighted scores into a competitor-relative percentage, tracking it on a fixed cadence, and assigning ownership for moving it. A number that nobody owns and nobody trends is a slide, not a KPI.
Make it competitor-relative
Share of model is meaningless in isolation: “we scored 1.8 on average” tells you nothing. The metric only comes alive when it is relative. Score the same prompt set for three to five named competitors and express your result as your share of the total score earned by all tracked brands across the set (for example, your summed score divided by the sum of every brand’s score). That reframes the conversation from “are we visible?” to “what slice of the AI consideration set do we own versus the people we lose deals to?”
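One plausible way to compute a competitor-relative share is to divide each brand's summed rubric score by the sum across all tracked brands. That definition is an assumption for illustration, as are the brand labels and totals below.

```python
def share_of_model(totals):
    """Each brand's summed score as a fraction of all brands' summed scores."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No brand was mentioned anywhere in the set.
        return {brand: 0.0 for brand in totals}
    return {brand: total / grand_total for brand, total in totals.items()}


# Illustrative summed 0-3 scores over the same frozen prompt set.
totals = {"you": 54, "competitor_a": 81, "competitor_b": 45}
shares = share_of_model(totals)

for brand, share in shares.items():
    print(f"{brand}: {share:.0%}")
```

Because every brand is scored on the same prompts with the same rubric, the shares sum to one and a gain for one brand shows up as a loss for another.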
Choose a cadence and hold it
Re-sample on a regular interval (monthly for fast-moving categories, quarterly for stable ones) using the identical frozen prompt set and rubric. Because individual runs are noisy, never react to a single cycle’s wobble; look for a sustained trend across two or three cycles before declaring a win or a regression. Continuous AI citation tracking automates the heavy lifting here, but the principle holds even with a manual spreadsheet: consistency of method beats sophistication of tooling.
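The "sustained trend" rule can be written down so that nobody declares a win off one noisy reading. A minimal sketch, assuming you keep one share value per sampling cycle in chronological order; the function name, the `cycles` default, and the `min_change` threshold are hypothetical choices to tune.

```python
def sustained_trend(shares, cycles=2, min_change=0.0):
    """Return 'up', 'down', or 'noise' from the last `cycles` changes.

    A trend counts only if every one of the most recent `cycles`
    cycle-over-cycle changes moves in the same direction by more
    than `min_change`.
    """
    if len(shares) < cycles + 1:
        return "noise"  # not enough history to judge
    recent = shares[-(cycles + 1):]
    deltas = [later - earlier for earlier, later in zip(recent, recent[1:])]
    if all(d > min_change for d in deltas):
        return "up"
    if all(d < -min_change for d in deltas):
        return "down"
    return "noise"


# Illustrative share-of-model values, one per cycle, oldest first.
history = [0.30, 0.28, 0.31, 0.34]
print(sustained_trend(history))
```

Setting `min_change` above zero is a reasonable way to ignore movements smaller than your typical run-to-run variation.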
A practical KPI checklist
- Prompt set is frozen, documented, and intent-balanced (30-60 prompts).
- Scoring rubric has written boundary rules, applied by the same person or a calibrated team.
- Every prompt sampled across 3+ models, 3-5 runs each, logged-out.
- Presence, sentiment, and accuracy tracked as separate columns.
- Scores are competitor-relative, not absolute.
- Fixed re-sampling cadence with a date stamp on every run.
- One named owner accountable for the trend line.
What share of model can and cannot tell you
Share of model reliably tells you whether and how AI systems represent you relative to competitors over time, but it cannot tell you the exact reason a model made a given choice. The ranking and retrieval logic inside these systems is proprietary and only partly documented, so anyone selling you a precise “ranking factor” breakdown is overselling. Treat your scores as directional evidence, not a decoded algorithm — and be honest about that uncertainty when you present them.
What the metric does expose, run after run, is which sources the models lean on. When you read the citations behind your scores, patterns emerge: assistants disproportionately trust well-structured, authoritative, frequently referenced content. That points your actual work at durable fundamentals: earning real third-party citations through digital PR, publishing genuinely helpful, accurate content, and shipping clean structured data so machines parse your facts correctly.
A note on risky tactics
Some practitioners experiment with stuffing hidden instructions or keyword-loaded boilerplate into pages, hoping to manipulate what models say. Treat these as high-risk and short-lived: they are fragile against model updates, can backfire when a system surfaces the manipulation, and do nothing for the human reader. The durable path is the same one structured LLM SEO and GEO is built on: be the most accurate, most cited, most clearly structured source in your category, and let the share of model number confirm it.
If you want to see your current share of model — scored, competitor-benchmarked, and broken down by where you are absent, misrepresented, or merely mentioned — request a free AI visibility audit. We will run your prompt set across the major models, hand you the rubric-scored results, and show you exactly which gaps to close first.