GoatEO
Scoring Rubric v0.1

Methodology

We don't guess what AI thinks about you. We ask each engine directly and score what it actually says. Every scoring decision is documented, versioned, and public.

Core principle

The thing being tested produces the output being scored.

When you test ChatGPT's response, ChatGPT is the one answering. Scores are derived by analyzing each engine's real output against the rubric criteria. We don't ask one model what another model thinks. Every finding is grounded in a real response from the engine being evaluated.

Five pillars

40 criteria across 5 weighted pillars

Discoverability

25%

Can answer engines find you?

Domain resolvability across AI crawlers
Crawlability for GPTBot, ClaudeBot, PerplexityBot, Google-Extended (see the sketch after this list)
llms.txt presence, completeness, and currency
Sitemap freshness and AI-accessible format
Schema.org completeness (Organization, Product, FAQPage)
Canonical signal integrity
Page render cost and JS dependency
CDN / geo-availability for crawler access
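
As a rough illustration of the crawlability and llms.txt checks, here is a minimal Python sketch using the standard robots.txt parser. It is not our production crawler; the function name and the returned fields are placeholders.

```python
# Sketch: probe basic discoverability signals for a domain.
# Illustrative only; the real audit covers many more criteria.
from urllib import robotparser
import requests

AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]

def discoverability_signals(domain: str) -> dict:
    base = f"https://{domain}"

    # Can each AI crawler fetch the homepage under robots.txt rules?
    rp = robotparser.RobotFileParser()
    rp.set_url(f"{base}/robots.txt")
    rp.read()
    crawlable = {bot: rp.can_fetch(bot, base + "/") for bot in AI_CRAWLERS}

    # llms.txt presence (completeness and currency need content checks).
    llms_txt = requests.get(f"{base}/llms.txt", timeout=10)

    return {
        "crawlable": crawlable,
        "llms_txt_present": llms_txt.status_code == 200,
    }

print(discoverability_signals("example.com"))
```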

Citation Presence

30%

When buyers ask, do you get cited?

Brand named in direct "best [category]" queries (5 queries)
Brand named in comparative queries (10 queries)
Brand named in problem-framing queries (15 queries)
Brand named in objection queries (10 queries)
Brand named in feature queries (10 queries)
Cited-and-positive vs cited-and-neutral vs cited-and-negative
Not-cited detection across all query types (sketch after this list)
Hallucinated-incorrectly detection
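
To make the panel mix concrete, here is a hedged sketch of the query composition above plus a deliberately simplified cited / not-cited check. Real brand detection handles aliases and phrasing, and cited responses are additionally judged positive, neutral, or negative in a separate step.

```python
# Sketch: panel composition and a simplified citation check per response.
# The category keys and the substring match are illustrative only.
PANEL = {
    "best_category": 5,      # direct "best [category]" queries
    "comparative": 10,
    "problem_framing": 15,
    "objection": 10,
    "feature": 10,
}                            # totals the 50 prompts sent to each engine

def citation_status(response_text: str, brand: str) -> str:
    """Simplified: the real detector is more than a substring match."""
    return "cited" if brand.lower() in response_text.lower() else "not_cited"

assert sum(PANEL.values()) == 50
print(citation_status("Acme is a solid pick for small teams.", "Acme"))  # cited
```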

Answer Quality

20%

When cited, is the description accurate?

Factual accuracy of feature claims
Pricing accuracy across all engines
Positioning clarity — how well engines describe what you do
Competitive framing — how you're positioned vs competitors
Recency of information signal
Hallucination rate — fabricated claims about your product (worked example after this list)
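
As a worked example of the hallucination-rate criterion: the rate is fabricated claims divided by total claims made about your product. The claim records and verdict labels below are illustrative assumptions.

```python
# Sketch: hallucination rate = fabricated claims / total claims about the product.
# The fact-checking verdicts themselves are produced elsewhere in the pipeline.
def hallucination_rate(claims: list[dict]) -> float:
    """Each claim carries a 'verdict': accurate | inaccurate | fabricated (assumed labels)."""
    if not claims:
        return 0.0
    fabricated = sum(1 for c in claims if c["verdict"] == "fabricated")
    return fabricated / len(claims)

claims = [
    {"text": "Offers a free tier", "verdict": "accurate"},
    {"text": "Ships an on-prem appliance", "verdict": "fabricated"},
]
print(hallucination_rate(claims))  # 0.5
```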

Source Authority

15%

What sources do engines cite when mentioning you?

Own-site citation rate (sketch below this list)
Owned-media citation rate (blog, docs, changelog)
Third-party citation rate (G2, Capterra, TechCrunch)
Community citation rate (Reddit, HN, Stack Overflow)
Stale source detection — how old are the sources engines use
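
A sketch of how cited source URLs could be bucketed into the rate categories above. The domain lists are illustrative, not the full classifier.

```python
# Sketch: bucket each cited source URL into the authority categories above.
from urllib.parse import urlparse
from collections import Counter

THIRD_PARTY = {"g2.com", "capterra.com", "techcrunch.com"}
COMMUNITY = {"reddit.com", "news.ycombinator.com", "stackoverflow.com"}

def bucket(url: str, own_domains: set[str]) -> str:
    host = urlparse(url).netloc.removeprefix("www.")
    if host in own_domains:
        return "own_or_owned_media"   # site, blog, docs, changelog
    if host in THIRD_PARTY:
        return "third_party"
    if host in COMMUNITY:
        return "community"
    return "other"

citations = ["https://www.g2.com/products/acme", "https://docs.acme.com/start"]
print(Counter(bucket(u, {"acme.com", "docs.acme.com"}) for u in citations))
```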

Content Architecture

10%

How well is your site structured for AI consumption?

FAQ density and structure (sketch after this list)
Comparison page presence (/compare/you-vs-competitor)
Use-case landing pages
Pricing page transparency and structured data
Documentation coverage and accessibility
Semantic HTML (heading hierarchy, landmark elements)
Answer-first paragraph structure
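
A minimal sketch of two of these checks, heading hierarchy and FAQ structured data, using BeautifulSoup. The selectors and thresholds are illustrative.

```python
# Sketch: check heading hierarchy and FAQPage structured data in a page's HTML.
from bs4 import BeautifulSoup

def architecture_signals(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    h1_count = len(soup.find_all("h1"))

    # Look for a schema.org FAQPage declaration in JSON-LD blocks.
    has_faq_schema = any(
        "FAQPage" in (tag.string or "")
        for tag in soup.find_all("script", type="application/ld+json")
    )
    return {"single_h1": h1_count == 1, "faq_schema": has_faq_schema}

html = '<h1>Acme</h1><script type="application/ld+json">{"@type": "FAQPage"}</script>'
print(architecture_signals(html))
```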

Scoring pipeline

From query to playbook in 6 steps

1

Evidence gathering

Query panel sends 50 prompts to each of the 5 AI engines. Each engine responds as itself. Playwright crawls your site for technical signals.
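
The crawl half of this step can be pictured with the Playwright sketch below; the prompt panel goes through each engine's own API separately. Everything outside the Playwright calls is a placeholder.

```python
# Sketch: gather raw technical evidence for one page with Playwright.
# The real crawl collects many more signals (schema, sitemaps, render cost, etc.).
from playwright.sync_api import sync_playwright

def crawl_page(url: str) -> dict:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        evidence = {
            "title": page.title(),
            "html": page.content(),   # rendered DOM, after JS execution
        }
        browser.close()
    return evidence

print(crawl_page("https://example.com")["title"])
```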

2

Response analysis

Parse each engine's raw response into structured evidence: citations found, claims made, sources referenced, factual statements extracted.
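
One way to picture the structured evidence this step produces is the dataclass sketch below. The field names mirror the description above; the exact schema is an assumption.

```python
# Sketch: structured evidence parsed from one engine's raw response.
from dataclasses import dataclass, field

@dataclass
class ResponseEvidence:
    engine: str                                            # e.g. "chatgpt"
    query_id: str
    citations: list[str] = field(default_factory=list)     # URLs the engine cited
    claims: list[str] = field(default_factory=list)        # claims made about the brand
    sources: list[str] = field(default_factory=list)       # sources referenced by name
    factual_statements: list[str] = field(default_factory=list)

ev = ResponseEvidence(engine="chatgpt", query_id="comparative-03",
                      claims=["Acme has a free tier"])
print(ev)
```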

3

Rubric scoring

Structured evidence scored against criterion definitions. Deterministic criteria scored programmatically. Judgment criteria scored against anchored descriptors.
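
A hedged sketch of the deterministic / judgment split. The criterion keys and the anchored descriptor scale are illustrative, not the published definitions.

```python
# Sketch: deterministic criteria score from evidence programmatically;
# judgment criteria map a matched anchored descriptor to its point value.
ANCHORS = {0: "no signal", 1: "weak", 2: "adequate", 3: "strong"}   # assumed scale

def score_deterministic(evidence: dict, key: str) -> int:
    # e.g. llms.txt present -> full marks for that criterion (illustrative)
    return 3 if evidence.get(key) else 0

def score_judgment(descriptor: str) -> int:
    """Map the descriptor selected during review back to its anchor value."""
    return {v: k for k, v in ANCHORS.items()}[descriptor]

print(score_deterministic({"llms_txt_present": True}, "llms_txt_present"))  # 3
print(score_judgment("adequate"))                                           # 2
```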

4

Aggregation

Weighted sum across criteria → pillar score → overall GOAT Score (0-100).
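
In sketch form, with the published pillar weights, the aggregation is a weighted sum that stays on a 0-100 scale (assuming each pillar score is already 0-100; the example numbers are made up).

```python
# Sketch: pillar scores (0-100 each) -> overall GOAT Score (0-100).
WEIGHTS = {
    "discoverability": 0.25,
    "citation_presence": 0.30,
    "answer_quality": 0.20,
    "source_authority": 0.15,
    "content_architecture": 0.10,
}

def goat_score(pillar_scores: dict[str, float]) -> float:
    return round(sum(WEIGHTS[p] * pillar_scores[p] for p in WEIGHTS), 1)

print(goat_score({
    "discoverability": 80,
    "citation_presence": 55,
    "answer_quality": 70,
    "source_authority": 60,
    "content_architecture": 90,
}))  # 68.5
```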

5

Playbook generation

For every criterion below threshold, generate engine-specific remediation playbooks grounded in the real findings.
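
A rough sketch of the threshold-and-remediate loop this step describes. The threshold value and the playbook record shape are assumptions.

```python
# Sketch: turn below-threshold criteria into engine-specific remediation items.
THRESHOLD = 60   # assumed pass mark per criterion (0-100)

def build_playbook(criterion_scores: list[dict]) -> list[dict]:
    playbook = []
    for item in criterion_scores:
        if item["score"] < THRESHOLD:
            playbook.append({
                "criterion": item["criterion"],
                "engine": item["engine"],
                "evidence": item["finding"],   # the real response that triggered it
                "action": f"Remediate {item['criterion']} for {item['engine']}",
            })
    return playbook

print(build_playbook([
    {"criterion": "llms.txt presence", "engine": "claude",
     "score": 0, "finding": "no llms.txt found"},
]))
```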

6

Versioning

Every run is versioned. Methodology changes are auditable. The audit_type field determines which rubric applies.
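
A sketch of what a versioned run record might carry. Only audit_type is named in the methodology; the other fields and values are assumptions.

```python
# Sketch: every run records the rubric version it was scored against.
from dataclasses import dataclass

@dataclass
class AuditRun:
    audit_type: str          # selects which rubric applies
    rubric_version: str      # e.g. "v0.1"
    panel_version: str
    run_at: str              # ISO timestamp

run = AuditRun(audit_type="standard", rubric_version="v0.1",
               panel_version="v0.1", run_at="2025-01-01T00:00:00Z")
print(run)
```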

Disclosures

Synthetic panel ≠ real user queries

GoatEO runs a curated panel of 50 queries through AI engine APIs. These approximate what real buyers ask, but they are not real buyer sessions. The panel composition is published and versioned.

API responses ≠ consumer UX

API responses may differ from web and mobile app experiences. UI-specific features (like citation cards or image results) are not captured. Google AI Overviews are captured via SERP proxy to approximate the consumer experience.

Transparency stance

Published

Category structure and taxonomy
Pillar weights and rationale
Criterion definitions and detection methods
Scoring philosophy
Version history with changelogs
Panel query structure (not exact queries)
The "why" behind every scoring decision

Held back

Raw prompt templates
ML features and embeddings
Proprietary benchmarks
Exact panel query wording
Cache architecture
Abuse defense specifics