GoatEO
Scoring Rubric v0.1

Methodology

We don't guess what AI thinks about you. We ask each engine directly and score what it actually says. Every scoring decision is documented, versioned, and public.

Core principle

The thing being tested produces the output being scored.

When you test ChatGPT's response, ChatGPT is the one answering. Scores are derived by analyzing each engine's real output against the rubric criteria. We don't ask one model what another model thinks. Every finding is grounded in a real response from the engine being evaluated.

Five pillars

40 criteria across 5 weighted pillars

Discoverability

25%

Can answer engines find you?

Domain resolvability across AI crawlers
Crawlability for GPTBot, ClaudeBot, PerplexityBot, Google-Extended (see the sketch after this list)
llms.txt presence, completeness, and currency
Sitemap freshness and AI-accessible format
Schema.org completeness (Organization, Product, FAQPage)
Canonical signal integrity
Page render cost and JS dependency
CDN / geo-availability for crawler access
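
As a rough illustration of the crawlability and llms.txt checks, here is a minimal Python sketch using the standard robots.txt parser. It is not our production crawler; the function name and the returned fields are placeholders.

```python
# Sketch: probe basic discoverability signals for a domain.
# Illustrative only; the real audit covers many more criteria.
from urllib import robotparser
import requests

AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]

def discoverability_signals(domain: str) -> dict:
    base = f"https://{domain}"

    # Can each AI crawler fetch the homepage under robots.txt rules?
    rp = robotparser.RobotFileParser()
    rp.set_url(f"{base}/robots.txt")
    rp.read()
    crawlable = {bot: rp.can_fetch(bot, base + "/") for bot in AI_CRAWLERS}

    # llms.txt presence (completeness and currency need content checks).
    llms_txt = requests.get(f"{base}/llms.txt", timeout=10)

    return {
        "crawlable": crawlable,
        "llms_txt_present": llms_txt.status_code == 200,
    }

print(discoverability_signals("example.com"))
```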

Citation Presence

30%

When buyers ask, do you get cited?

Brand named in direct "best [category]" queries (5 queries)
Brand named in comparative queries (10 queries)
Brand named in problem-framing queries (15 queries)
Brand named in objection queries (10 queries)
Brand named in feature queries (10 queries)
Cited-and-positive vs cited-and-neutral vs cited-and-negative
Not-cited detection across all query types (sketch after this list)
Hallucinated-incorrectly detection
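
To make the panel mix concrete, here is a hedged sketch of the query composition above plus a deliberately simplified cited / not-cited check. Real brand detection handles aliases and phrasing, and cited responses are additionally judged positive, neutral, or negative in a separate step.

```python
# Sketch: panel composition and a simplified citation check per response.
# The category keys and the substring match are illustrative only.
PANEL = {
    "best_category": 5,      # direct "best [category]" queries
    "comparative": 10,
    "problem_framing": 15,
    "objection": 10,
    "feature": 10,
}                            # totals the 50 prompts sent to each engine

def citation_status(response_text: str, brand: str) -> str:
    """Simplified: the real detector is more than a substring match."""
    return "cited" if brand.lower() in response_text.lower() else "not_cited"

assert sum(PANEL.values()) == 50
print(citation_status("Acme is a solid pick for small teams.", "Acme"))  # cited
```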

Answer Quality

20%

When cited, is the description accurate?

Factual accuracy of feature claims
Pricing accuracy across all engines
Positioning clarity — how well engines describe what you do
Competitive framing — how you're positioned vs competitors
Recency of information signal
Hallucination rate — fabricated claims about your product (worked example after this list)
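
As a worked example of the hallucination-rate criterion: the rate is fabricated claims divided by total claims made about your product. The claim records and verdict labels below are illustrative assumptions.

```python
# Sketch: hallucination rate = fabricated claims / total claims about the product.
# The fact-checking verdicts themselves are produced elsewhere in the pipeline.
def hallucination_rate(claims: list[dict]) -> float:
    """Each claim carries a 'verdict': accurate | inaccurate | fabricated (assumed labels)."""
    if not claims:
        return 0.0
    fabricated = sum(1 for c in claims if c["verdict"] == "fabricated")
    return fabricated / len(claims)

claims = [
    {"text": "Offers a free tier", "verdict": "accurate"},
    {"text": "Ships an on-prem appliance", "verdict": "fabricated"},
]
print(hallucination_rate(claims))  # 0.5
```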

Source Authority

15%

What sources do engines cite when mentioning you?

Own-site citation rate (sketch below this list)
Owned-media citation rate (blog, docs, changelog)
Third-party citation rate (G2, Capterra, TechCrunch)
Community citation rate (Reddit, HN, Stack Overflow)
Stale source detection — how old are the sources engines use
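
A sketch of how cited source URLs could be bucketed into the rate categories above. The domain lists are illustrative, not the full classifier.

```python
# Sketch: bucket each cited source URL into the authority categories above.
from urllib.parse import urlparse
from collections import Counter

THIRD_PARTY = {"g2.com", "capterra.com", "techcrunch.com"}
COMMUNITY = {"reddit.com", "news.ycombinator.com", "stackoverflow.com"}

def bucket(url: str, own_domains: set[str]) -> str:
    host = urlparse(url).netloc.removeprefix("www.")
    if host in own_domains:
        return "own_or_owned_media"   # site, blog, docs, changelog
    if host in THIRD_PARTY:
        return "third_party"
    if host in COMMUNITY:
        return "community"
    return "other"

citations = ["https://www.g2.com/products/acme", "https://docs.acme.com/start"]
print(Counter(bucket(u, {"acme.com", "docs.acme.com"}) for u in citations))
```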

Content Architecture

10%

How well is your site structured for AI consumption?

FAQ density and structure (sketch after this list)
Comparison page presence (/compare/you-vs-competitor)
Use-case landing pages
Pricing page transparency and structured data
Documentation coverage and accessibility
Semantic HTML (heading hierarchy, landmark elements)
Answer-first paragraph structure
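
A minimal sketch of two of these checks, heading hierarchy and FAQ structured data, using BeautifulSoup. The selectors and thresholds are illustrative.

```python
# Sketch: check heading hierarchy and FAQPage structured data in a page's HTML.
from bs4 import BeautifulSoup

def architecture_signals(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    h1_count = len(soup.find_all("h1"))

    # Look for a schema.org FAQPage declaration in JSON-LD blocks.
    has_faq_schema = any(
        "FAQPage" in (tag.string or "")
        for tag in soup.find_all("script", type="application/ld+json")
    )
    return {"single_h1": h1_count == 1, "faq_schema": has_faq_schema}

html = '<h1>Acme</h1><script type="application/ld+json">{"@type": "FAQPage"}</script>'
print(architecture_signals(html))
```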

Scoring pipeline

From query to playbook in 6 steps

1

Evidence gathering

Query panel sends 50 prompts to each of the 5 AI engines. Each engine responds as itself. Playwright crawls your site for technical signals.
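
The crawl half of this step can be pictured with the Playwright sketch below; the prompt panel goes through each engine's own API separately. Everything outside the Playwright calls is a placeholder.

```python
# Sketch: gather raw technical evidence for one page with Playwright.
# The real crawl collects many more signals (schema, sitemaps, render cost, etc.).
from playwright.sync_api import sync_playwright

def crawl_page(url: str) -> dict:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        evidence = {
            "title": page.title(),
            "html": page.content(),   # rendered DOM, after JS execution
        }
        browser.close()
    return evidence

print(crawl_page("https://example.com")["title"])
```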

2

Response analysis

Parse each engine's raw response into structured evidence: citations found, claims made, sources referenced, factual statements extracted.
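
One way to picture the structured evidence this step produces is the dataclass sketch below. The field names mirror the description above; the exact schema is an assumption.

```python
# Sketch: structured evidence parsed from one engine's raw response.
from dataclasses import dataclass, field

@dataclass
class ResponseEvidence:
    engine: str                                            # e.g. "chatgpt"
    query_id: str
    citations: list[str] = field(default_factory=list)     # URLs the engine cited
    claims: list[str] = field(default_factory=list)        # claims made about the brand
    sources: list[str] = field(default_factory=list)       # sources referenced by name
    factual_statements: list[str] = field(default_factory=list)

ev = ResponseEvidence(engine="chatgpt", query_id="comparative-03",
                      claims=["Acme has a free tier"])
print(ev)
```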

3

Rubric scoring

Structured evidence scored against criterion definitions. Deterministic criteria scored programmatically. Judgment criteria scored against anchored descriptors.
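
A hedged sketch of the deterministic / judgment split. The criterion keys and the anchored descriptor scale are illustrative, not the published definitions.

```python
# Sketch: deterministic criteria score from evidence programmatically;
# judgment criteria map a matched anchored descriptor to its point value.
ANCHORS = {0: "no signal", 1: "weak", 2: "adequate", 3: "strong"}   # assumed scale

def score_deterministic(evidence: dict, key: str) -> int:
    # e.g. llms.txt present -> full marks for that criterion (illustrative)
    return 3 if evidence.get(key) else 0

def score_judgment(descriptor: str) -> int:
    """Map the descriptor selected during review back to its anchor value."""
    return {v: k for k, v in ANCHORS.items()}[descriptor]

print(score_deterministic({"llms_txt_present": True}, "llms_txt_present"))  # 3
print(score_judgment("adequate"))                                           # 2
```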

4

Aggregation

Weighted sum across criteria → pillar score → overall GOAT Score (0-100).
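
In sketch form, with the published pillar weights, the aggregation is a weighted sum that stays on a 0-100 scale (assuming each pillar score is already 0-100; the example numbers are made up).

```python
# Sketch: pillar scores (0-100 each) -> overall GOAT Score (0-100).
WEIGHTS = {
    "discoverability": 0.25,
    "citation_presence": 0.30,
    "answer_quality": 0.20,
    "source_authority": 0.15,
    "content_architecture": 0.10,
}

def goat_score(pillar_scores: dict[str, float]) -> float:
    return round(sum(WEIGHTS[p] * pillar_scores[p] for p in WEIGHTS), 1)

print(goat_score({
    "discoverability": 80,
    "citation_presence": 55,
    "answer_quality": 70,
    "source_authority": 60,
    "content_architecture": 90,
}))  # 68.5
```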

5

Playbook generation

For every criterion below threshold, generate engine-specific remediation playbooks grounded in the real findings.
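
A rough sketch of the threshold-and-remediate loop this step describes. The threshold value and the playbook record shape are assumptions.

```python
# Sketch: turn below-threshold criteria into engine-specific remediation items.
THRESHOLD = 60   # assumed pass mark per criterion (0-100)

def build_playbook(criterion_scores: list[dict]) -> list[dict]:
    playbook = []
    for item in criterion_scores:
        if item["score"] < THRESHOLD:
            playbook.append({
                "criterion": item["criterion"],
                "engine": item["engine"],
                "evidence": item["finding"],   # the real response that triggered it
                "action": f"Remediate {item['criterion']} for {item['engine']}",
            })
    return playbook

print(build_playbook([
    {"criterion": "llms.txt presence", "engine": "claude",
     "score": 0, "finding": "no llms.txt found"},
]))
```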

6

Versioning

Every run is versioned. Methodology changes are auditable. The audit_type field determines which rubric applies.
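
A sketch of what a versioned run record might carry. Only audit_type is named in the methodology; the other fields and values are assumptions.

```python
# Sketch: every run records the rubric version it was scored against.
from dataclasses import dataclass

@dataclass
class AuditRun:
    audit_type: str          # selects which rubric applies
    rubric_version: str      # e.g. "v0.1"
    panel_version: str
    run_at: str              # ISO timestamp

run = AuditRun(audit_type="standard", rubric_version="v0.1",
               panel_version="v0.1", run_at="2025-01-01T00:00:00Z")
print(run)
```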

Disclosures

Synthetic panel ≠ real user queries

GoatEO runs a curated panel of 50 queries through AI engine APIs. These approximate what real buyers ask, but they are not real buyer sessions. The panel composition is published and versioned.

API responses ≠ consumer UX

API responses may differ from web and mobile app experiences. UI-specific features (like citation cards or image results) are not captured. Google AI Overviews are captured via SERP proxy to approximate the consumer experience.

Transparency stance

Published

Category structure and taxonomy
Pillar weights and rationale
Criterion definitions and detection methods
Scoring philosophy
Version history with changelogs
Panel query structure (not exact queries)
The "why" behind every scoring decision

Held back

Raw prompt templates
ML features and embeddings
Proprietary benchmarks
Exact panel query wording
Cache architecture
Abuse defense specifics