Soft 404 · crawl budget · status codes
Is it a real 404 — or a soft 404?
A soft 404 looks "not found" to people but returns HTTP 200 to crawlers, so
Google wastes crawl budget on it and flags it in Search Console. soft404scan compares your URL against a
guaranteed-missing page on the same host and shows the verdict plus the exact signal that betrays it.
How a soft 404 gives itself away
The status code is the first tell: a clean 404/410, a 200 it shouldn't return, a server error, or a redirect.
A made-up URL in the same directory shows what "not found" looks like on this host: a real 404, a catch-all 200, or a homepage redirect.
If your 200 page is near-identical to that guaranteed-missing page, it's the not-found template with the wrong status.
A 200 page whose title says "page not found", or a missing URL redirected to the homepage, is a soft 404 by Google's own definition.
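As a rough illustration, the four signals above can be combined into a single verdict. This is a minimal sketch, not soft404scan's actual rule set: the Page fields, the 0.9 similarity threshold, the phrase list, and the verdict labels are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Page:
    status: int        # final HTTP status after following redirects
    to_homepage: bool  # True if a redirect ended on the site's homepage
    title: str         # text of the <title> element, "" if absent


# Hypothetical values chosen for illustration only.
SIMILARITY_THRESHOLD = 0.9
NOT_FOUND_PHRASES = ("page not found", "not found")


def verdict(target: Page, baseline: Page, similarity: float) -> str:
    """Classify `target` against `baseline`, the guaranteed-missing page."""
    # Signal 1: the status code.
    if target.status in (404, 410):
        return "true 404"
    if target.status >= 500:
        return "server error"
    # Signal 4a: the URL redirects to the homepage, just like a missing URL does.
    if target.to_homepage and baseline.to_homepage:
        return "soft 404"
    # Signal 4b: a 200 page whose title admits the content is missing.
    if any(phrase in target.title.lower() for phrase in NOT_FOUND_PHRASES):
        return "soft 404"
    # Signals 2 and 3: the host answers 200 for a missing URL, and the target
    # is near-identical to that answer.
    if baseline.status == 200 and similarity >= SIMILARITY_THRESHOLD:
        return "soft 404"
    return "ok"
```

The order of the checks mirrors the list above: status first, then the content signals, so a clean 404/410 is never second-guessed by the similarity score.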
Why it matters
Search engines have a limited crawl budget per site. Soft 404s burn it on pages that should be gone, keep dead URLs lingering in the index, and muddy coverage reports. Returning a clean 404/410 lets engines drop missing pages fast. soft404scan finds them with an open, published methodology — no black-box score.
Frequently asked questions
What is a soft 404?
A soft 404 is a page that tells a human "this content does not exist" but tells crawlers everything is fine by returning HTTP 200 (or by redirecting a missing URL to the homepage). Google then wastes crawl budget on it and may report it as "Soft 404" in Search Console. The fix is to return a real 404 or 410 status for content that is genuinely gone.
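To make the fix concrete, here is a minimal sketch of a server that sends the not-found template with a real 404 or 410 instead of 200. It uses Python's standard http.server module; the PAGES and GONE tables, the status_for helper, and the port are hypothetical and stand in for whatever routing your own stack uses.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site content, for illustration only.
PAGES = {"/": "<h1>Home</h1>"}
GONE = {"/old-promo"}  # content that was removed on purpose
NOT_FOUND_HTML = "<h1>Page not found</h1>"


def status_for(path: str) -> int:
    """Pick the status code that matches what the visitor will see."""
    if path in PAGES:
        return 200
    if path in GONE:
        return 410  # permanently removed
    return 404      # never existed, or unknown


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        path = self.path.split("?", 1)[0]  # ignore the query string
        body = PAGES.get(path, NOT_FOUND_HTML).encode("utf-8")
        # The key point: the not-found template is sent with 404/410, not 200.
        self.send_response(status_for(path))
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port: int = 8000) -> None:
    """Serve on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling run() and then requesting a missing path, for example with curl -i http://127.0.0.1:8000/old-promo, should show the 410 status line above the same "Page not found" markup a visitor sees.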
How does soft404scan detect a soft 404?
It fetches your URL and, at the same time, a made-up URL in the same directory that is guaranteed not to exist. It then compares the two: the HTTP status code, whether either redirects to the homepage, and how similar the page content is. If your URL returns 200 but is the same page a non-existent URL returns — or its title/heading says "not found", or it redirects to the homepage like a missing page does — that is a soft 404. Every rule is published in the methodology.
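The baseline step can be sketched as follows. This assumes a random slug is an acceptable stand-in for "guaranteed not to exist" and uses difflib from the standard library as the similarity measure; the published methodology may use a different probe name and a different measure. The function names and the soft404-probe- prefix are hypothetical.

```python
import difflib
import secrets
from urllib.parse import urlsplit, urlunsplit


def baseline_url(url: str) -> str:
    """Build a sibling URL in the same directory that almost certainly does not exist."""
    parts = urlsplit(url)
    directory = parts.path.rsplit("/", 1)[0]  # drop the last path segment
    slug = "soft404-probe-" + secrets.token_hex(8)  # 16 random hex characters
    # Query string and fragment are dropped so the probe is a plain path.
    return urlunsplit((parts.scheme, parts.netloc, f"{directory}/{slug}", "", ""))


def similarity(html_a: str, html_b: str) -> float:
    """Return a 0.0 to 1.0 similarity ratio between two HTML documents."""
    return difflib.SequenceMatcher(None, html_a, html_b).ratio()
```

For https://example.com/blog/old-post the probe lands under https://example.com/blog/, so it exercises the same routing and not-found handling as the URL being checked.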
Is it free?
Yes — free, no account, no sign-up. Enter a URL and get an instant verdict. We keep no logs of the URLs you check.
Why does this matter for SEO?
Search engines have a limited crawl budget per site. Soft 404s burn that budget on pages that should not be indexed, can keep dead URLs lingering in the index, and muddy your coverage reports. Returning a clean 404/410 lets engines drop missing pages quickly and spend crawl budget on pages that matter.
Can I use it to confirm I fixed a soft 404?
Yes — that is a primary use case. After you change a missing URL to return a real 404/410, paste it here again: a "True 404" verdict confirms the fix is live and that engines will now drop the URL cleanly.
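If you want to run the same check from a script, for example over a list of URLs after a cleanup, a minimal sketch with Python's standard urllib looks like this. The function names and the User-Agent string are hypothetical, and this only checks the status code; it does not reproduce the baseline comparison.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status for `url` (urllib follows redirects)."""
    request = urllib.request.Request(url, headers={"User-Agent": "fix-check/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is carried on the exception.
        return err.code


def fix_is_live(status: int) -> bool:
    """A removed page is fixed once it answers with a real 404 or 410."""
    return status in (404, 410)
```

A call such as fix_is_live(fetch_status(url)) returning True corresponds to the "True 404" verdict described above.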
Does it run JavaScript / work on SPAs?
No — it reads the server-rendered HTML, like a crawler that does not execute JavaScript. Many single-page apps serve the same HTML shell for every URL, including ones that do not exist, so the static comparison can look identical. soft404scan detects that case and tells you to verify in a browser or Search Console rather than giving a false "soft 404" verdict.
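How soft404scan recognises an SPA shell is not described here, so the following is only one plausible heuristic, offered as an assumption: treat a page as a shell when its server-rendered HTML loads scripts but carries almost no visible text. The 200-character cut-off and all names are hypothetical.

```python
from html.parser import HTMLParser


class _ShellStats(HTMLParser):
    """Count <script> tags and visible text characters in an HTML document."""

    def __init__(self):
        super().__init__()
        self.scripts = 0
        self.text_chars = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.scripts += 1
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_chars += len(data.strip())


def looks_like_spa_shell(html: str, max_text_chars: int = 200) -> bool:
    """Heuristic: scripts are present but almost no text is rendered server-side."""
    stats = _ShellStats()
    stats.feed(html)
    stats.close()
    return stats.scripts > 0 and stats.text_chars <= max_text_chars
```

When both the checked URL and the guaranteed-missing URL look like a shell, a content comparison says nothing about what a browser would render, which is why a "verify in a browser" result is safer than a verdict.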
Is this the same heuristic Google uses?
It reproduces the well-known, published approach (compare a URL against a guaranteed-missing baseline by status and content similarity). It is not Google's private classifier, so treat the result as a strong, transparent signal — not a guarantee of exactly what Search Console will say.
Is my data safe? Any SSRF concerns?
The scan runs on Cloudflare and only fetches public http(s) URLs; requests to private, loopback, link-local and cloud-metadata addresses are blocked, redirects are re-validated on every hop, and responses are size- and time-capped. We keep no logs of what you check.
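The address checks described above can be sketched with Python's standard ipaddress module. This is an illustration of the general technique, not the code soft404scan runs on Cloudflare: the function names are hypothetical, and DNS resolution is left to the caller so that the same check can be repeated for every redirect hop.

```python
import ipaddress
from typing import List
from urllib.parse import urlsplit


def is_blocked_ip(ip: str) -> bool:
    """True for addresses a public scanner should never fetch."""
    addr = ipaddress.ip_address(ip)
    # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:127.0.0.1) before classifying.
    if addr.version == 6 and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local  # includes 169.254.169.254, a common cloud-metadata address
        or addr.is_reserved
        or addr.is_multicast
        or addr.is_unspecified
    )


def hop_is_allowed(url: str, resolved_ips: List[str]) -> bool:
    """Validate one hop: public http(s) only, and every resolved address must be public."""
    if urlsplit(url).scheme not in ("http", "https"):
        return False
    return bool(resolved_ips) and not any(is_blocked_ip(ip) for ip in resolved_ips)
```

Running hop_is_allowed on the initial URL and again on each redirect target is what "re-validated on every hop" amounts to: a public URL that redirects to an internal address is rejected at the hop where it turns internal.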