Soft 404 · crawl budget · status codes

Is it a real 404 or a soft 404?

A soft 404 reads as "not found" to people but returns HTTP 200 to crawlers, so Google wastes crawl budget on it and may flag it in Search Console. Paste a URL and soft404scan compares it against a guaranteed-missing page on the same host, then gives you the verdict and the exact signal that gave it away.

What it checks

How a soft 404 gives itself away

Status code

Whether the URL returns a clean 404/410, a 200 it shouldn't, a server error, or a redirect — the first tell of a soft 404.
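As a rough illustration of this first check, the sketch below buckets a status code into the outcomes listed above. The function name and the bucket labels are hypothetical, chosen for this example only; they are not soft404scan's own identifiers.

```python
def classify_status(status: int) -> str:
    """Bucket an HTTP status code (labels are illustrative assumptions)."""
    if status in (404, 410):
        return "clean-not-found"   # the honest answer for missing content
    if 300 <= status < 400:
        return "redirect"          # needs a look at where it points
    if 500 <= status < 600:
        return "server-error"
    if 200 <= status < 300:
        return "ok"                # only suspicious once content is compared
    return "other"


print(classify_status(410))
```

A 2xx result is not a verdict on its own; it only means the content checks below have to decide.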

The missing-URL baseline

soft404scan fetches a made-up URL in the same directory to learn what "not found" looks like on this host: a real 404, a catch-all 200, or a redirect to the homepage.
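One way to build such a baseline URL is to keep the scheme, host, and directory and swap the last path segment for a long random token. This is a minimal sketch under that assumption; the `soft404-probe-` prefix and the function name are invented for the example.

```python
import secrets
from urllib.parse import urlsplit, urlunsplit


def baseline_url(url: str) -> str:
    """Return a URL in the same directory that almost certainly does not exist."""
    parts = urlsplit(url)
    # Everything before the last "/" is treated as the directory.
    directory = parts.path.rsplit("/", 1)[0]
    # 24 random hex characters make an accidental collision implausible.
    token = secrets.token_hex(12)
    probe_path = f"{directory}/soft404-probe-{token}"
    # Query string and fragment are dropped on purpose.
    return urlunsplit((parts.scheme, parts.netloc, probe_path, "", ""))


print(baseline_url("https://example.com/blog/old-post"))
```

Staying in the same directory matters because some sites route "not found" differently per section, so a probe at the site root could learn the wrong template.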

Content match

If your 200 page is near-identical to that guaranteed-missing page, it's the same not-found template with the wrong status — a soft 404.

Not-found wording

A page that returns 200 while its title or heading says "page not found" is a classic soft 404, even with unique styling.
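A wording check of this kind can be sketched with the standard-library HTML parser: collect the text of the title and top-level heading, then look for a not-found phrase. The phrase pattern and the choice of `title` and `h1` are assumptions made for this example; the tool's actual phrase list is not given here.

```python
import re
from html.parser import HTMLParser

# Assumed phrase pattern: "not found" or "page not found", case-insensitive.
NOT_FOUND = re.compile(r"\b(?:page\s+)?not\s+found\b", re.IGNORECASE)


class TitleHeadingText(HTMLParser):
    """Collects text that appears inside <title> and <h1> elements."""

    def __init__(self):
        super().__init__()
        self._open = None
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1"):
            self._open = tag

    def handle_endtag(self, tag):
        if tag == self._open:
            self._open = None

    def handle_data(self, data):
        if self._open:
            self.texts.append(data)


def says_not_found(html: str) -> bool:
    parser = TitleHeadingText()
    parser.feed(html)
    parser.close()
    return any(NOT_FOUND.search(text) for text in parser.texts)


print(says_not_found("<title>Page Not Found</title><h1>Oops</h1>"))
```

Body text is ignored deliberately, so an article that merely mentions the words "not found" does not trip the check.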

Redirect-to-home

Redirecting a missing URL to the homepage instead of returning 404 is, by Google's own definition, a soft 404. soft404scan checks for this case explicitly.
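A redirect-to-home test can be reduced to comparing the requested URL with the final URL after redirects. The sketch below assumes the final URL is already known, and treats a leading `www.` as the same host; both the function name and that host rule are assumptions for illustration.

```python
from urllib.parse import urlsplit


def redirected_to_home(requested: str, final: str) -> bool:
    """True if a non-homepage request ended up on the same host's homepage."""
    req, fin = urlsplit(requested), urlsplit(final)
    if req.path in ("", "/"):
        return False  # the homepage itself was requested
    req_host = (req.hostname or "").removeprefix("www.")
    fin_host = (fin.hostname or "").removeprefix("www.")
    lands_on_root = fin.path in ("", "/") and not fin.query
    return req_host == fin_host and lands_on_root


print(redirected_to_home("https://example.com/old-page", "https://www.example.com/"))
```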

JS-app honesty

Many single-page apps serve the same HTML shell for every URL, so a static comparison cannot tell a real page from a missing one. soft404scan detects that case and says "verify in a browser" instead of reporting a false soft 404.

Open methodology

Every rule, in the open

No mystery score. Here is exactly how each verdict is decided — so you can verify and trust the result.

Content similarity is a shingled-token Jaccard score (3-word shingles) over the server-rendered visible text. It reproduces the well-known soft-404 approach — comparing a URL to a guaranteed-missing baseline — but it is not Google's private classifier, so treat a verdict as a strong, transparent signal, not a guarantee.
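To make the score concrete, here is a minimal sketch of a Jaccard similarity over 3-word shingles. The tokenizer (lowercased `\w+` runs) and the handling of very short texts are assumptions for this example; it also expects visible text that has already been extracted from the HTML.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles (as tuples) found in the text."""
    tokens = re.findall(r"\w+", text.lower())
    if len(tokens) < k:
        # Too short for a full shingle: treat the whole text as one unit.
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard score: shared shingles divided by all distinct shingles."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty pages are identical
    return len(sa & sb) / len(sa | sb)


print(jaccard("a b c d", "a b c e"))  # 1 shared shingle out of 3 distinct
```

A score of 1.0 means the two pages share every shingle, and 0.0 means they share none. Where to draw the "near-identical" line is a threshold choice that this sketch leaves open.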

Frequently asked questions

What is a soft 404?

A soft 404 is a page that tells a human "this content does not exist" but tells crawlers everything is fine by returning HTTP 200 (or by redirecting a missing URL to the homepage). Google then wastes crawl budget on it and may report it as "Soft 404" in Search Console. The fix is to return a real 404 or 410 status for content that is genuinely gone.

How does soft404scan detect a soft 404?

It fetches your URL and, at the same time, a made-up URL in the same directory that is guaranteed not to exist. It then compares the two: the HTTP status code, whether either redirects to the homepage, and how similar the page content is. If your URL returns 200 but is the same page a non-existent URL returns — or its title/heading says "not found", or it redirects to the homepage like a missing page does — that is a soft 404. Every rule is published in the methodology.
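The comparison described above can be pictured as a small decision function. This is a sketch, not the published rule set: the rule order, the 0.9 similarity threshold, and every label except "True 404" are assumptions made for illustration, and the similarity value is assumed to come from comparing the target page with the baseline page.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    status: int            # final HTTP status of the checked URL
    redirected_home: bool  # did the request end on the homepage?
    says_not_found: bool   # does the title/heading say "not found"?


def verdict(target: Probe, similarity: float, threshold: float = 0.9) -> str:
    """Combine the signals into one verdict (hypothetical ordering)."""
    if target.status in (404, 410):
        return "True 404"
    if target.redirected_home:
        return "Soft 404"
    if 200 <= target.status < 300:
        if target.says_not_found or similarity >= threshold:
            return "Soft 404"
        return "Real page"
    return "Inconclusive"  # e.g. server errors or unexpected codes


print(verdict(Probe(200, False, False), similarity=0.97))
```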

Is it free?

Yes — free, no account, no sign-up. Enter a URL and get an instant verdict. We keep no logs of the URLs you check.

Why does this matter for SEO?

Search engines allot each site a limited crawl budget. Soft 404s burn that budget on pages that should not be indexed, can keep dead URLs lingering in the index, and muddy your coverage reports. Returning a clean 404/410 lets engines drop missing pages quickly and spend crawl budget on pages that matter.

Can I use it to confirm I fixed a soft 404?

Yes, that is a primary use case. After you change a missing URL to return a real 404/410, paste it here again: a "True 404" verdict confirms the fix is live, so search engines can drop the URL cleanly.

Does it run JavaScript / work on SPAs?

No — it reads the server-rendered HTML, like a crawler that does not execute JavaScript. Many single-page apps serve the same HTML shell for every URL, including ones that do not exist, so the static comparison can look identical. soft404scan detects that case and tells you to verify in a browser or Search Console rather than giving a false "soft 404" verdict.
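How the shell case is recognized is not spelled out here, so the following is only one plausible heuristic: a page that loads scripts but carries almost no server-rendered visible text is probably an app shell. The 30-word cutoff and the function name are assumptions for this sketch.

```python
import re


def looks_like_spa_shell(html: str, max_words: int = 30) -> bool:
    """Guess whether the HTML is a near-empty shell that a script fills in."""
    has_script = re.search(r"<script\b", html, re.IGNORECASE) is not None
    # Drop script/style/noscript bodies, then strip all remaining tags.
    body = re.sub(r"(?is)<(script|style|noscript)\b.*?</\1\s*>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", body)
    return has_script and len(text.split()) <= max_words


shell = ('<html><head><title>App</title><script src="/app.js"></script>'
         '</head><body><div id="root"></div></body></html>')
print(looks_like_spa_shell(shell))
```

When a check like this fires for both the target and the baseline, an identical-content match says nothing about the rendered page, which is why a browser or Search Console is the better judge.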

Is this the same heuristic Google uses?

It reproduces the well-known, published approach (compare a URL against a guaranteed-missing baseline by status and content similarity). It is not Google's private classifier, so treat the result as a strong, transparent signal — not a guarantee of exactly what Search Console will say.

Is my data safe? Any SSRF concerns?

The scan runs on Cloudflare and only fetches public http(s) URLs; requests to private, loopback, link-local and cloud-metadata addresses are blocked, redirects are re-validated on every hop, and responses are size- and time-capped. We keep no logs of what you check.
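The address filtering described above can be approximated with Python's standard `ipaddress` module. This is a sketch under stated assumptions, not the service's code: it expects an already-resolved IP address, and a real fetcher would need to run the same check on every resolved address and on every redirect hop.

```python
import ipaddress
from urllib.parse import urlsplit


def has_allowed_scheme(url: str) -> bool:
    """Only plain http and https URLs are fetched."""
    return urlsplit(url).scheme.lower() in ("http", "https")


def is_blocked_address(ip: str) -> bool:
    """True for private, loopback, link-local and similar non-public addresses."""
    addr = ipaddress.ip_address(ip)
    # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:127.0.0.1) before checking.
    mapped = getattr(addr, "ipv4_mapped", None)
    if mapped is not None:
        addr = mapped
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local   # includes 169.254.169.254 (cloud metadata)
        or addr.is_reserved
        or addr.is_multicast
        or addr.is_unspecified
    )


print(is_blocked_address("169.254.169.254"))
```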