April 25, 2026 · 9 min read

Introducing NoticeMeAI: a Lighthouse-style audit for AI search and agent readiness

Score your site on 8 dimensions of LLM visibility — robots.txt, llms.txt, structured data, MCP, agent skills, x402, and more. We built the audit we kept wishing existed.

The web has reshaped itself for new readers before. It learned to speak to browsers, then to search engines. Now it has to learn to speak to AI agents — and most sites haven't.

Today we're launching NoticeMeAI, a Lighthouse-style audit that scores any website on how well it can be found, understood, cited, and acted on by ChatGPT, Claude, Perplexity, Gemini, and the next generation of autonomous agents. Run a free scan, see where you stand, and get a prompt-ready action plan you can hand straight to your coding agent.

This post walks through the eight categories we score, the standards we test against, and why each one matters as the web shifts from a human-read web to a machine-read web.

What NoticeMeAI does, in one paragraph

Paste a URL. We fetch the homepage, robots.txt, sitemap, llms.txt, .well-known endpoints, and a sample of internal pages. We run dozens of checks across eight categories — some basic (does the homepage return 200, is there a meta description), some emerging (MCP server card, x402, agent skills manifest). You get a 0–100 score per category, a Lighthouse-style report, and on paid scans an LLM-generated action plan ranked by impact and effort.

No headless browser, no crawl bombing your origin. Just lightweight HTTP fetches and a frontier model doing the synthesis on top.

The four questions an LLM asks about your site

Before we dive into the categories, it helps to remember what we're optimizing for. When ChatGPT, Claude, Perplexity, or an autonomous agent encounters your site, it effectively works through a short checklist:

Can I find it? Is the homepage reachable, does robots.txt allow me, does a sitemap or llms.txt exist?
Can I read it? Is the content in the initial HTML, or locked behind JavaScript I can't wait for?
Can I trust it? Is there structured data, an about page, author info, verifiable facts?
Can I act on it? Are there discoverable APIs, a usable MCP server, a way to authenticate, a way to pay?

Miss any one and the model picks somebody else. We've written more about this in "How to make AI search notice your website in 2026".

The eight categories we score

1. Discoverability

This is the foundation. We check that your homepage returns a clean 200, that robots.txt exists and doesn't accidentally block the world, that a sitemap is published and points at real URLs, and that Link response headers (RFC 8288) describe the important resources directly in the HTTP response — not buried inside HTML that an agent has to parse.

HTTP/1.1 200 OK
Link: </.well-known/api-catalog>; rel="api-catalog"
Link: </llms.txt>; rel="llm-index"

Most sites pass the homepage and robots.txt checks. Far fewer expose anything in Link headers, which is one of the cheapest agent-readiness wins on the list.
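To make the Link header check concrete, here is a minimal Python sketch of how an agent might read those headers. It is a simplified reader written for this post, not NoticeMeAI's implementation, and it does not cover every RFC 8288 edge case; the function name `parse_link_rels` is hypothetical.

```python
import re

# One link-value: <target> followed by its ;-separated parameters.
_LINK_VALUE = re.compile(r'<([^>]*)>([^<]*)')
# The rel parameter, either quoted or as a bare token.
_REL_PARAM = re.compile(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,]+))', re.IGNORECASE)


def parse_link_rels(header_values):
    """Map each rel token to the list of targets that advertise it.

    header_values: iterable of raw Link header strings, one per header line.
    """
    rels = {}
    for raw in header_values:
        for target, params in _LINK_VALUE.findall(raw):
            match = _REL_PARAM.search(params)
            if not match:
                continue  # a link without rel tells the agent nothing useful
            rel_value = match.group(1) if match.group(1) is not None else match.group(2)
            # A single rel parameter may carry several space-separated relation types.
            for rel in rel_value.split():
                rels.setdefault(rel.lower(), []).append(target)
    return rels


headers = [
    '</.well-known/api-catalog>; rel="api-catalog"',
    '</llms.txt>; rel="llm-index"',
]
links = parse_link_rels(headers)
print(links)
```

An agent that finds `api-catalog` in the result can go straight to that URL without downloading or parsing any HTML.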

2. Crawl & bot access

Having a robots.txt is one thing. Saying useful things in it is another. We check three layers:

AI bot policy — do you have explicit rules for GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot, Google-Extended, and friends, or are they lumped into a single * block?
Content Signals — do you use the new Content-Signal directive to control training, inference, and search independently?
Web Bot Auth — do you publish a public-key directory at /.well-known/http-message-signatures-directory so well-behaved bots can authenticate themselves to you?

Content Signals are the under-rated one. Instead of a binary allow/disallow, you can split intent:

User-agent: *
Content-Signal: ai-train=no, search=yes, ai-input=yes

Translation: don't train on my pages, but feel free to surface them in answers and search. That's the policy most publishers actually want, and almost none of them have written it down.
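A rough Python sketch of how a crawler could turn that line into a policy it can act on. It assumes the simple `name=yes|no` syntax shown above, ignores user-agent grouping, and lets later lines overwrite earlier ones; `parse_content_signals` is a hypothetical helper, not part of any library.

```python
def parse_content_signals(robots_txt):
    """Return {signal_name: bool} from the Content-Signal lines in a robots.txt body.

    Simplification (an assumption of this sketch): user-agent groups are
    ignored, and a signal repeated later in the file overwrites the earlier one.
    """
    signals = {}
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        name, sep, value = line.partition(":")
        if not sep or name.strip().lower() != "content-signal":
            continue
        for pair in value.split(","):
            key, eq, flag = pair.partition("=")
            if eq:
                signals[key.strip().lower()] = flag.strip().lower() == "yes"
    return signals


robots = """User-agent: *
Content-Signal: ai-train=no, search=yes, ai-input=yes
Allow: /
"""
policy = parse_content_signals(robots)
print(policy)
```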

If you only fix one robots.txt thing this quarter, separate search and retrieval user agents (OAI-SearchBot, ChatGPT-User, PerplexityBot, Claude-SearchBot, Claude-User) from training-time crawlers (GPTBot, ClaudeBot, Google-Extended). The first group is the one that drives citations and referral traffic. Block everything together and you can vanish from cited answers.
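You can sanity-check a split policy with Python's standard-library `urllib.robotparser` before deploying it. The robots.txt body below is an illustrative example, not a recommendation for your site, and `bot_access` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy: block one training crawler, allow a search crawler,
# and leave everyone else on the default rule.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Allow: /
"""


def bot_access(robots_txt, bots, path="/"):
    """Return {bot_name: allowed} for a robots.txt body, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, path) for bot in bots}


access = bot_access(ROBOTS_TXT, ["GPTBot", "OAI-SearchBot", "PerplexityBot"])
print(access)
```

Here GPTBot is refused, OAI-SearchBot is allowed by its own group, and PerplexityBot falls through to the `*` group. Note that the standard-library parser ignores directives it doesn't know, so a Content-Signal line would not affect this result.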

3. Content extractability

Once an agent is in, can it read what you've written? Three signals matter:

Server-rendered text density. We measure the ratio of useful text in the initial HTML against the total payload. JS-heavy SPAs that render content client-side score poorly here, because most retrieval-time agents won't wait for hydration.
llms.txt. A plain-text "site map for LLMs" at the root, listing your most important pages with descriptions an LLM can ingest in a single context window.
Markdown content negotiation. We send Accept: text/markdown to your URL and check whether the server returns a clean markdown version. Markdown can be 60–80% smaller than the equivalent HTML, which means cheaper, faster, more complete answers.
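As an illustration of the first signal, text density can be approximated with nothing but the standard library. The exact formula NoticeMeAI uses isn't spelled out here, so treat this as one plausible definition (visible characters divided by total characters), with hypothetical names throughout.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collects text that sits outside script, style, noscript and template tags."""

    SKIPPED = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(" ".join(data.split()))  # collapse whitespace


def text_density(html):
    """Ratio of visible text characters to total characters in the HTML string."""
    if not html:
        return 0.0
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return len(" ".join(parser.chunks)) / len(html)


SSR_PAGE = (
    "<html><body><h1>Acme Robotics</h1>"
    "<p>Industrial automation for small manufacturers.</p></body></html>"
)
SPA_SHELL = (
    '<html><head><title>App</title><script src="/bundle.js"></script></head>'
    '<body><div id="root"></div></body></html>'
)
print(round(text_density(SSR_PAGE), 2), round(text_density(SPA_SHELL), 2))
```

The server-rendered page scores far higher than the empty SPA shell, whose only visible text is its title.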

A minimal llms.txt looks like this:

# Acme Robotics
> Industrial automation for small manufacturers.

## Documentation
- [Getting Started](https://acme.example/docs/start.md)
- [API Reference](https://acme.example/docs/api.md)

## Changelog
- [2026 Releases](https://acme.example/changelog.md)

A few lines per section, ten minutes of work, double-digit points on the extractability score for most sites we scan.
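To show how little machinery a consumer needs, here is a small Python sketch that reads a file in the shape above into a title, a summary, and per-section links. It only handles the constructs used in the example (one `#` title, one `>` summary, `##` sections, `- [label](url)` items) and is not a complete llms.txt parser; `parse_llms_txt` is a hypothetical name.

```python
import re

# A list item of the form "- [Label](https://example/path)".
_LINK = re.compile(r"^-\s*\[([^\]]+)\]\(([^)\s]+)\)")


def parse_llms_txt(text):
    """Parse a minimal llms.txt into {"title", "summary", "sections"}."""
    doc = {"title": None, "summary": None, "sections": {}}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            doc["sections"][current] = []
        elif line.startswith("# ") and doc["title"] is None:
            doc["title"] = line[2:].strip()
        elif line.startswith(">") and doc["summary"] is None:
            doc["summary"] = line[1:].strip()
        else:
            match = _LINK.match(line)
            if match and current is not None:
                doc["sections"][current].append((match.group(1), match.group(2)))
    return doc


SAMPLE = """# Acme Robotics
> Industrial automation for small manufacturers.

## Documentation
- [Getting Started](https://acme.example/docs/start.md)
- [API Reference](https://acme.example/docs/api.md)

## Changelog
- [2026 Releases](https://acme.example/changelog.md)
"""
parsed = parse_llms_txt(SAMPLE)
print(parsed["title"], list(parsed["sections"]))
```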

4. Structured data & entities

<title>, <meta name="description">, <link rel="canonical">, Open Graph tags, and the big one: JSON-LD. We check for presence and validity, and on paid scans we validate the schema against schema.org and flag the common errors (broken @type, missing mainEntity, schema attached to the wrong page).

Structured data is what lets a model say "Acme Robotics, founded in 2019, headquartered in Lyon" instead of "a company called Acme". The first version gets cited; the second gets paraphrased away.
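A presence-and-validity check for JSON-LD can be sketched in a few lines of Python. This version only confirms that each block parses as JSON and carries an `@type`; it does not validate against schema.org or walk `@graph` containers, and the names `audit_json_ld` and `PAGE` are hypothetical.

```python
import json
from html.parser import HTMLParser


class _JsonLdCollector(HTMLParser):
    """Collects the raw contents of script tags typed application/ld+json."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def audit_json_ld(html):
    """Return (types_found, problems) for the JSON-LD blocks in an HTML string."""
    collector = _JsonLdCollector()
    collector.feed(html)
    collector.close()
    types, problems = [], []
    for index, raw in enumerate(collector.blocks):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            problems.append(f"block {index}: invalid JSON")
            continue
        for item in data if isinstance(data, list) else [data]:
            item_type = item.get("@type") if isinstance(item, dict) else None
            if item_type:
                types.append(item_type)
            else:
                problems.append(f"block {index}: missing @type")
    return types, problems


PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization",
 "name": "Acme Robotics", "foundingDate": "2019"}
</script>
<script type="application/ld+json">{"@type": "Product",}</script>
</head><body></body></html>
"""
found_types, found_problems = audit_json_ld(PAGE)
print(found_types, found_problems)
```

The second block has a trailing comma, so it is reported as invalid JSON rather than silently ignored, which is how a model would treat it too.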

5. Citation readiness

Models cite sentences, not pages. We score how citable your content actually is: do your H1s describe the page, does the meta description make sense out of context, are there self-contained quotable passages with enough scaffolding to stand alone in a chat answer?

This is the category most teams skip because it feels like a copywriting issue. It is — but it has a measurable AI-search impact, and it's why a competitor with worse SEO sometimes gets cited instead.

6. Agent readiness

This is where the new web standards live. We probe the .well-known endpoints that agents look for first:

MCP server card at /.well-known/mcp/server-card.json — describes your Model Context Protocol server before an agent connects: tools, transport, auth.
A2A agent card at /.well-known/agent.json, or /.well-known/agent-card.json in newer revisions of the A2A spec: Agent-to-Agent protocol metadata.
Agent Skills at /.well-known/agent-skills/index.json — a manifest of named skills agents can invoke.
WebMCP at /.well-known/webmcp — browser-side MCP discovery.
API catalog (RFC 9727) at /.well-known/api-catalog.
OAuth discovery (RFC 8414) and Protected Resource metadata (RFC 9728) — so agents can run a real OAuth flow on the user's behalf instead of hijacking a logged-in browser session.
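The probe list above can be sketched as a small Python table. The first five paths are the ones named in this post; the two OAuth paths are the well-known URIs registered by RFC 8414 and RFC 9728. The dictionary keys and the `probe_urls` helper are hypothetical, and the actual HTTP fetch is left out so the sketch stays network-free.

```python
from urllib.parse import urljoin

WELL_KNOWN_PATHS = {
    "mcp_server_card": "/.well-known/mcp/server-card.json",
    "a2a_agent_card": "/.well-known/agent.json",
    "agent_skills": "/.well-known/agent-skills/index.json",
    "webmcp": "/.well-known/webmcp",
    "api_catalog": "/.well-known/api-catalog",                              # RFC 9727
    "oauth_authorization_server": "/.well-known/oauth-authorization-server",  # RFC 8414
    "oauth_protected_resource": "/.well-known/oauth-protected-resource",      # RFC 9728
}


def probe_urls(site_url):
    """Build absolute probe URLs at the site's origin, ignoring any path in site_url."""
    return {name: urljoin(site_url, path) for name, path in WELL_KNOWN_PATHS.items()}


targets = probe_urls("https://acme.example/docs/start")
for name, url in targets.items():
    print(f"{name}: {url}")
```

Because every path starts with a slash, `urljoin` resolves it against the origin, so pasting a deep link still probes the right place.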

A minimal MCP server card:

{
  "$schema": "https://static.modelcontextprotocol.io/schemas/mcp-server-card/v1.json",
  "version": "1.0",
  "protocolVersion": "2025-06-18",
  "serverInfo": { "name": "acme-mcp", "version": "1.0.0" },
  "transport": { "type": "streamable-http", "endpoint": "/mcp" },
  "authentication": { "required": false },
  "tools": [
    {
      "name": "search_products",
      "description": "Search the Acme catalog by keyword",
      "inputSchema": {
        "type": "object",
        "properties": { "query": { "type": "string" } },
        "required": ["query"]
      }
    }
  ]
}

Drop that file at the right path and you're ahead of 99% of the web. We track adoption across our scan history and the absolute count is still tiny — which means it's also the cheapest place to differentiate.
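Before publishing a card, a quick structural check catches the obvious slips. The Python sketch below treats the fields in the example above as required, which is an assumption of this sketch and not a statement of what the MCP server card schema mandates; `check_server_card` and `REQUIRED_KEYS` are hypothetical names.

```python
import json

# Assumption: these top-level keys are taken from the example card in this post,
# not from the published schema.
REQUIRED_KEYS = ("serverInfo", "transport", "tools")
REQUIRED_TOOL_FIELDS = ("name", "description", "inputSchema")


def check_server_card(raw):
    """Return a list of human-readable problems; an empty list means the card looks sane."""
    try:
        card = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(card, dict):
        return ["top level must be a JSON object"]

    problems = [f"missing {key}" for key in REQUIRED_KEYS if key not in card]

    transport = card.get("transport")
    if isinstance(transport, dict) and not transport.get("endpoint"):
        problems.append("transport.endpoint is empty")

    tools = card.get("tools")
    if tools is not None and not isinstance(tools, list):
        problems.append("tools must be a list")
        tools = []
    for index, tool in enumerate(tools or []):
        if not isinstance(tool, dict):
            problems.append(f"tools[{index}] must be an object")
            continue
        for field in REQUIRED_TOOL_FIELDS:
            if field not in tool:
                problems.append(f"tools[{index}] missing {field}")
    return problems


good_card = json.dumps({
    "serverInfo": {"name": "acme-mcp", "version": "1.0.0"},
    "transport": {"type": "streamable-http", "endpoint": "/mcp"},
    "tools": [{"name": "search_products",
               "description": "Search the Acme catalog by keyword",
               "inputSchema": {"type": "object"}}],
})
print(check_server_card(good_card))
print(check_server_card('{"serverInfo": {}, "transport": {"type": "streamable-http"}}'))
```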

7. Prompt & model visibility

This is the synthesis tier. On paid scans we ask a frontier model — using the public content we fetched — to estimate how the site would surface for a handful of representative prompts in your category. The output is qualitative ("you're likely to be cited for X, unlikely for Y because Z") rather than a hard score, but it consistently points at content gaps that don't show up in any of the structural checks.

8. Commerce & API readiness

Agents are starting to buy things. The traditional checkout — add to cart, type a card number, click pay — falls apart when the buyer is an LLM. We check three emerging standards:

x402 revives the HTTP 402 Payment Required status code, which has been reserved since HTTP/1.1 but rarely used. The agent requests a paid resource, the server replies with a machine-readable payment description, and the agent pays and retries.
Universal Commerce Protocol (UCP) — broader product discovery and purchase flow.
Agentic Commerce Protocol (ACP) — OpenAI's variant, similar shape.

This category is weighted lightest in the score for now because the standards are early. But for ecommerce sites, being among the first to ship even a basic x402 endpoint is an early-mover advantage, and one that lasts only until everybody else catches up.
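The request, 402, pay, retry loop is easier to see in code. The Python toy below simulates both sides in memory. It illustrates the shape of the flow only: the field names, the token format, and every function here are hypothetical and do not reproduce the real x402 wire format.

```python
def paid_resource_server(path, payment_token=None):
    """Toy server returning (status, body). Hypothetical, not the x402 wire format."""
    price = {"amount": "0.01", "currency": "USD", "pay_to": "acme-demo-wallet"}
    expected = f"paid:{price['amount']}:{price['pay_to']}"
    if payment_token is None:
        # First contact: describe the payment in a machine-readable way.
        return 402, {"error": "payment required", "accepts": [price]}
    if payment_token != expected:
        return 402, {"error": "payment not accepted", "accepts": [price]}
    return 200, {"path": path, "data": "premium catalog export"}


def pay(requirement):
    """Stand-in for a wallet: returns a fake proof of payment."""
    return f"paid:{requirement['amount']}:{requirement['pay_to']}"


def fetch_with_payment(path, max_amount):
    """Request a resource, and pay once if the server asks and the price is within budget."""
    status, body = paid_resource_server(path)
    if status != 402:
        return status, body
    requirement = body["accepts"][0]
    if float(requirement["amount"]) > max_amount:
        return status, body  # over budget: surface the 402 to the caller
    return paid_resource_server(path, payment_token=pay(requirement))


print(fetch_with_payment("/export", max_amount=0.05))
print(fetch_with_payment("/export", max_amount=0.001)[0])
```

The budget check is the design point worth keeping in a real client: the agent decides whether to pay based on the machine-readable price, with no checkout page involved.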

What you actually get from a scan

The free scan covers the basics: homepage status, robots.txt, sitemap, AI bot policy, head tags, JSON-LD presence, llms.txt, markdown negotiation. It's enough to catch the obvious mistakes that knock 20–30 points off most sites.

The paid scan adds the agent-readiness .well-known probes, schema validation, server-rendered text density, the prompt-visibility synthesis, and an LLM-generated action plan that turns each failing check into a prompt you can paste straight into Claude Code, Cursor, or whatever your coding agent is. Pricing is a flat monthly subscription, no per-scan metering — see the pricing page for the details.

You also get the report as Markdown or JSON, exportable, so you can hand it to a teammate, an agent, or a CI job. We never persist scan reports beyond 24 hours — see the privacy section of our legal page for what we do and don't keep.

A note on what we don't do

We don't run a headless browser. We don't crawl your whole site. We don't store your reports forever, and we don't share them with anyone other than the user who ran the scan. We don't use your content for model training. We're transparent about every sub-processor we use (Stripe, Hetzner, OpenAI, plus Google and GitHub for OAuth) on the legal page.

If anything in the report is unclear, or you spot a check that disagrees with the spec, tell us. We'd rather fix the audit than be politely wrong about your site.

Make your site agent-ready today

The web is now on its third major change of readership. The first sites to ship <meta> tags got found. The first sites to publish sitemaps got crawled. The first sites to expose an MCP server, an llms.txt, a Web Bot Auth key directory, or an x402 endpoint are the sites agents will route real traffic and real transactions through over the next two years.

Run a free scan and see where your site lands across all eight categories. Five minutes, no credit card. The action plan does the rest.