DNSSEC BOGUS and SERVFAIL: the Complete Troubleshooting Guide

Fix DNSSEC BOGUS and SERVFAIL errors fast: clock drift, stripped RRSIGs, expired signatures, EDNS fragmentation — diagnosed with dig, delv, and DNSViz.

A DNSSEC BOGUS verdict means your resolver attempted cryptographic validation of a DNS answer and the validation failed — so it returns SERVFAIL to the client rather than hand over data it cannot trust. The frustrating part: SERVFAIL is the same generic error a dozen unrelated problems produce, and the real cause is usually not the domain but your clock, your upstream, or your network path.

Most advice on this is scattered across GitHub issues and forum threads, each covering one cause. This guide consolidates them into one diagnostic sequence that works for Pi-hole, Unbound, BIND, systemd-resolved, and EliHole alike.

The diagnostic checklist

Work through these in order — sorted by how often each turns out to be the culprit:

  1. Check the system clock. Run timedatectl status on the validating machine. A skewed clock invalidates every signature’s validity window.
  2. Bypass your resolver. Run dig @1.1.1.1 example.com +dnssec. If a known-good validating resolver answers fine, the problem is local, not the domain.
  3. Check the domain itself. Paste it into DNSViz. Red nodes mean the operator broke their own signatures — nothing on your side will fix that.
  4. Verify your upstream returns signatures. Run dig @<your-upstream> example.com A +dnssec and look for RRSIG records. No RRSIGs means your validator can never succeed.
  5. Rule out local zones. If the failures are all .lan, .home, or router-generated names, your validator is correctly rejecting unsigned private answers (see below).
  6. Test TCP and a smaller EDNS buffer. Run dig example.com DNSKEY +dnssec +tcp, then repeat over UDP with +bufsize=1232 in place of +tcp. If TCP works where UDP fails, a middlebox is eating large responses.

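Steps 1 and 3 through 6 each map to one of the six root causes below; step 2 is triage that tells you whether to look locally at all. Double validation (cause 5) has no one-line test and is covered in its own section.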
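
Step 2 of the checklist can be sketched as a small comparison. This is a minimal sketch, not a complete diagnostic: dig_status and triage are hypothetical helper names, and the messages are this guide's wording, not output from any tool. It assumes the usual dig header line containing "status: <RCODE>,".

```shell
#!/bin/sh
# dig_status: read full dig output on stdin and print the rcode from the
# header line (for example NOERROR or SERVFAIL). Prints nothing if dig got
# no response and therefore printed no header.
dig_status() {
  sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# triage LOCAL_STATUS REFERENCE_STATUS (hypothetical helper):
# compare your own resolver with a known-good validating resolver
# and suggest where to look next.
triage() {
  case "$1/$2" in
    /*|*/) echo "no response: check connectivity first" ;;
    SERVFAIL/NOERROR|SERVFAIL/NXDOMAIN) echo "local: check clock, upstream, network path" ;;
    SERVFAIL/SERVFAIL) echo "domain: check the zone in DNSViz" ;;
    *) echo "no validation failure visible from this pair" ;;
  esac
}

# Usage sketch (needs network access):
#   local_status=$(dig example.com A +dnssec | dig_status)
#   ref_status=$(dig @1.1.1.1 example.com A +dnssec | dig_status)
#   triage "$local_status" "$ref_status"
```

The helper only narrows the search. A SERVFAIL from both resolvers still deserves a look in DNSViz before you conclude the domain is broken.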

Secure, insecure, bogus: what the verdicts mean

DNSSEC validation, defined in RFC 4035, classifies every answer as secure, insecure, bogus, or indeterminate. The first three are the ones you will meet in practice:

Verdict  | Meaning | What the client sees
---------|---------|---------------------
Secure   | Full chain of trust verified from the root down; signatures check out | Normal answer (AD bit set)
Insecure | The zone is provably unsigned: a validated proof shows no DS record at some delegation | Normal answer, no AD bit
Bogus    | The zone should be signed, but validation failed: missing, expired, or invalid signatures | SERVFAIL

The key distinction: insecure is not an error. Most domains on the internet are unsigned, and a validating resolver passes their answers through untouched. Bogus is the only failure state: validation was attempted and the cryptography didn’t hold up.
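
From the client side, the three verdicts show up as a combination of rcode and the AD flag in dig's header. A minimal sketch of reading them, assuming dig's usual ";; flags: ..." header line; has_ad_flag and observed_verdict are hypothetical helper names. The labels are deliberately cautious, because a missing AD bit can also mean the resolver does not validate at all, and SERVFAIL has causes other than a bogus verdict.

```shell
#!/bin/sh
# has_ad_flag: read dig output on stdin; succeed if the header flags include "ad".
has_ad_flag() {
  grep -Eq '^;; flags:[^;]* ad[ ;]'
}

# observed_verdict DIG_OUTPUT (hypothetical helper): map what the client sees
# onto the verdict table. This is a guess from the outside, not a validation.
observed_verdict() {
  out=$1
  rcode=$(printf '%s\n' "$out" | sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1)
  if [ -z "$rcode" ]; then
    echo "no response"
  elif [ "$rcode" = "SERVFAIL" ]; then
    echo "possibly bogus"
  elif printf '%s\n' "$out" | has_ad_flag; then
    echo "secure"
  else
    echo "insecure or not validated"
  fi
}

# Usage sketch (needs network access):
#   observed_verdict "$(dig example.com A +dnssec)"
```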

Cause 1: system clock drift

Every DNSSEC signature (an RRSIG record, RFC 4034) carries explicit inception and expiration timestamps. If your validator’s clock sits outside that window, every signature on the internet looks expired or not-yet-valid, and everything goes bogus at once.

This is the classic Raspberry Pi failure mode: no battery-backed RTC, so after a power cut the clock boots wrong — and broken DNS can prevent NTP from ever correcting it. Even a drift of a couple of minutes is enough to trip tightly-signed zones.

Check and fix:

timedatectl status          # look at "System clock synchronized"
sudo timedatectl set-ntp true
sudo systemctl restart systemd-timesyncd

For Pis that lose power often, a DS3231 RTC module costs a few euros and ends the problem permanently. To break the bootstrap deadlock, point NTP at an IP address instead of a hostname.

Telltale sign: every signed domain fails at once, starting after a reboot or power outage.
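
To see why a wrong clock breaks everything at once, it helps to run the window check yourself. The sketch below compares a timestamp against an RRSIG's inception and expiration, all in the RRSIG presentation format YYYYMMDDHHmmSS (UTC). rrsig_window_state is a hypothetical helper and the timestamps in the usage lines are made-up examples; a real validator does more than this comparison.

```shell
#!/bin/sh
# rrsig_window_state EXPIRATION INCEPTION [NOW]
# All values are UTC timestamps in the form YYYYMMDDHHmmSS.
# NOW defaults to this machine's clock, the same clock a local validator uses.
rrsig_window_state() {
  exp=$1
  inc=$2
  now=${3:-$(date -u +%Y%m%d%H%M%S)}
  if [ "$now" -lt "$inc" ]; then
    echo "not-yet-valid"
  elif [ "$now" -gt "$exp" ]; then
    echo "expired"
  else
    echo "valid"
  fi
}

# Usage sketch with made-up timestamps:
#   rrsig_window_state 20250115000000 20250101000000                  # judged by the local clock
#   rrsig_window_state 20250115000000 20250101000000 19700101000030   # a clock that reset to 1970
```

Passing the third argument lets you replay what the validator saw when its clock was wrong, without touching the system time.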

Cause 2: your upstream strips RRSIGs or isn’t DNSSEC-aware

A validating forwarder — Pi-hole with DNSSEC enabled, EliHole, dnsmasq — doesn’t run full recursion itself. It asks an upstream for the records plus their signatures (by setting the DO bit) and validates what comes back. No RRSIGs from upstream, no validation.

Common offenders:

  • ISP resolvers without DNSSEC support
  • Home routers acting as DNS proxies that drop EDNS0 options and RRSIG records
  • Forwarding to the router’s IP instead of a real resolver

Test your upstream directly:

dig @192.168.1.1 example.com A +dnssec +multi

A DNSSEC-capable upstream returns an RRSIG A ... record alongside the A record. If you only get the bare A record, that upstream can never feed a local validator — switch to one that passes signatures through: 1.1.1.1, 8.8.8.8, or 9.9.9.9.
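
Checking for the RRSIG by eye works, but it is easy to script. A minimal sketch, assuming dig's one-record-per-line answer format (so +multi is left off); count_rrsigs is a hypothetical helper name.

```shell
#!/bin/sh
# count_rrsigs TYPE: read `dig +noall +answer +dnssec` output on stdin and
# print how many RRSIG records cover TYPE. In dig's answer lines the fourth
# field is the record type and, for an RRSIG, the fifth is the type it covers.
count_rrsigs() {
  awk -v t="$1" '$4 == "RRSIG" && $5 == t { n++ } END { print n + 0 }'
}

# Usage sketch (needs network access; 192.168.1.1 is the example upstream above):
#   dig @192.168.1.1 example.com A +dnssec +noall +answer | count_rrsigs A
```

A count of 0 for a domain that returns signatures through 1.1.1.1 suggests the upstream under test is dropping them.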

This is also where Pi-hole’s “BOGUS (refused upstream)” message comes from: the upstream itself answered SERVFAIL or REFUSED, so the local validator had nothing to check.

One related point: encrypting the path to your upstream with DoH or DoT protects the query in transit, but it does not authenticate the DNS data — only DNSSEC does that. The two solve different problems, as covered in running DoH without cloudflared.

Cause 3: the domain’s signatures are genuinely broken

Sometimes the verdict is correct: the operator let an RRSIG expire, rolled a key without updating the parent DS record, or signed with a mismatched key. Validation then fails for everyone, and only the operator can fix it.

Confirm it’s them, not you:

# Full validation trace — shows exactly which link in the chain fails
delv example.com A +rtrace

# Inspect signature validity dates by hand
dig example.com A +dnssec +multi
# RRSIG fields: expiration then inception, format YYYYMMDDHHmmSS

delv (ships with BIND utilities) is the most honest tool here: it performs validation itself and prints "fully validated", "unsigned answer", or the precise failure reason, such as "RRSIG has expired" or "no valid signature found".

For a visual second opinion, DNSViz renders the entire chain of trust from root to domain and marks every broken edge in red, including which RRSIG expired and when. If DNSViz shows errors and dig @1.1.1.1 also SERVFAILs, stop debugging your setup — the domain is broken at the source. Wait, contact the operator, or add a temporary per-domain exception if you must reach it.
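
If you check many domains, delv's result can be reduced to a one-word verdict. This is a sketch under assumptions: delv_verdict is a hypothetical helper, and the strings it matches ("fully validated", "unsigned answer", ";; resolution failed: ...") are the ones commonly printed by BIND's delv; the exact wording may differ between versions.

```shell
#!/bin/sh
# delv_verdict: read delv output on stdin and print a short verdict.
# The matched strings are assumed delv wording, not a stable interface.
delv_verdict() {
  out=$(cat)
  case "$out" in
    *"fully validated"*) echo "secure" ;;
    *"unsigned answer"*) echo "insecure" ;;
    *"resolution failed"*)
      # Keep the reason delv printed after "resolution failed: ".
      reason=$(printf '%s\n' "$out" | sed -n 's/^;; resolution failed: //p' | head -n 1)
      echo "failed: $reason"
      ;;
    *) echo "unknown" ;;
  esac
}

# Usage sketch (needs network access):
#   delv example.com A 2>&1 | delv_verdict
```

A "failed" result is not automatically bogus: delv also reports resolution failures caused by an unreachable server, so read the reason before blaming the signatures.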

Cause 4: local TLDs and private zones (.lan, .home, .local)

Private names like nas.lan or printer.home go bogus for a structural reason: the root zone is signed, and it can cryptographically prove that the .lan TLD does not exist. When your router fabricates an answer for nas.lan, a strict validator sees a positive answer for a name the root proves cannot exist — exactly what a spoofing attack looks like.

This matters for negative answers too: in DNSSEC, “this name doesn’t exist” must itself be proven with signed NSEC or NSEC3 records (RFC 4035 §3.1.3). An unsigned NXDOMAIN carries no such proof, so it can’t validate either.

Fixes, in order of preference:

  • Use a real subdomain you own for internal names (nas.home.example.com) — it inherits a legitimate chain of trust, or is provably insecure if unsigned.
  • Tell the validator the zone is intentionally insecure: domain-insecure: "lan." in Unbound, or answer the zone locally before validation happens.
  • .local belongs to mDNS/Bonjour and should never reach a unicast resolver at all. If .local queries hit your sinkhole, fix the client’s resolver configuration.
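
Before changing any validator settings, it is worth confirming that the failing names really are all private. A minimal sketch: is_private_name and all_private are hypothetical helpers, and the suffix list covers only the three suffixes named in this section, so extend it if your network uses others.

```shell
#!/bin/sh
# is_private_name NAME: succeed if NAME ends in .lan, .home, or .local.
# Case and a trailing dot are ignored.
is_private_name() {
  name=$(printf '%s' "${1%.}" | tr 'A-Z' 'a-z')
  case "$name" in
    *.lan|*.home|*.local|lan|home|local) return 0 ;;
    *) return 1 ;;
  esac
}

# all_private: read failing names on stdin, one per line; succeed only if
# every non-empty line is a private name.
all_private() {
  while IFS= read -r n; do
    [ -n "$n" ] || continue
    is_private_name "$n" || return 1
  done
  return 0
}

# Usage sketch, with a hypothetical file of names copied from your query log:
#   all_private < failing-names.txt && echo "only private names are failing"
```

Note that nas.home.example.com is not matched: it sits under a real domain and follows that domain's chain of trust.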

Cause 5: double validation

Running a validating forwarder behind a validating recursive resolver — say, EliHole or Pi-hole with DNSSEC on, forwarding to your own Unbound that also validates — works most of the time but fails confusingly at the edges:

  • Clock skew between the two boxes: a signature near its validity boundary passes on one and fails on the other.
  • The upstream answers from cache without the full signature set the downstream needs, producing a bogus verdict for an answer the upstream itself considers secure.
  • CD-bit handling: the downstream should set the Checking Disabled bit so the upstream hands over even bogus data. If the upstream validates anyway and SERVFAILs first, the downstream can’t tell why — “refused upstream” with no detail.

The practical rule: validate in exactly one place, wherever you have the best visibility. The symptom to watch for: one resolver says a domain is fine while the box in front of it intermittently reports bogus.
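
One way to find out which box is doing the rejecting is to query the upstream twice, once normally and once with the Checking Disabled bit set (dig's +cdflag option), and compare the rcodes. A minimal sketch; cd_compare is a hypothetical helper and the messages are this guide's interpretation, not tool output.

```shell
#!/bin/sh
# cd_compare STATUS_NORMAL STATUS_WITH_CD
# STATUS_NORMAL:  rcode from a plain query to the upstream.
# STATUS_WITH_CD: rcode from the same query sent with +cdflag.
cd_compare() {
  case "$1/$2" in
    /*|*/) echo "no response: check connectivity first" ;;
    SERVFAIL/NOERROR|SERVFAIL/NXDOMAIN)
      echo "upstream validates and rejects this answer" ;;
    SERVFAIL/SERVFAIL)
      echo "upstream fails even with CD set: not only a validation problem" ;;
    *)
      echo "upstream answers normally: look at the downstream validator" ;;
  esac
}

# Usage sketch (needs network access; replace <your-upstream> with its address):
#   dig @<your-upstream> example.com A +dnssec
#   dig @<your-upstream> example.com A +dnssec +cdflag
#   cd_compare SERVFAIL NOERROR   # pass the two "status:" values you observed
```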

Cause 6: middleboxes mangling EDNS0 and large UDP responses

DNSSEC makes DNS answers big. A root-zone DNSKEY response has reached 1,425 bytes during key rollovers — beyond what many networks deliver in one unfragmented UDP packet. Fragmented UDP is widely dropped by firewalls, and some middleboxes strip EDNS0 options outright. The result: signatures silently never arrive, and the validator reports bogus for answers that are perfectly fine.

Diagnose by comparing transports and buffer sizes:

dig example.com DNSKEY +dnssec            # default UDP, large buffer
dig example.com DNSKEY +dnssec +bufsize=1232
dig example.com DNSKEY +dnssec +tcp

If TCP succeeds where plain UDP times out or truncates, the path is the problem. Fixes:

  • Allow TCP on port 53 end to end — TCP fallback after a truncated answer is mandatory resolver behavior, and DNSSEC frequently needs it.
  • Cap the EDNS buffer at 1232 bytes, the DNS Flag Day 2020 recommendation that avoids fragmentation on practically all real-world paths.
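
The transport comparison can be made less of a judgment call by reading two things from dig's output: the received message size and the TC (truncated) flag. A minimal sketch, assuming dig's usual ";; MSG SIZE  rcvd: N" and ";; flags: ..." lines; dig_msg_size, is_truncated, and fits_in_buffer are hypothetical helper names.

```shell
#!/bin/sh
# dig_msg_size: read full dig output on stdin, print the received size in bytes.
dig_msg_size() {
  sed -n 's/^;; MSG SIZE  *rcvd: *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# is_truncated: succeed if the header flags include "tc".
is_truncated() {
  grep -Eq '^;; flags:[^;]* tc[ ;]'
}

# fits_in_buffer SIZE [LIMIT]: succeed if SIZE is at most LIMIT (default 1232).
fits_in_buffer() {
  [ "$1" -le "${2:-1232}" ]
}

# Usage sketch (needs network access). +ignore makes dig show a truncated UDP
# answer instead of silently retrying over TCP:
#   dig example.com DNSKEY +dnssec +bufsize=1232 +ignore | is_truncated && echo "truncated over UDP"
#   size=$(dig example.com DNSKEY +dnssec +tcp | dig_msg_size)
#   fits_in_buffer "$size" || echo "answer needs TCP at a 1232-byte buffer"
```

A truncated UDP answer followed by a clean TCP answer is healthy behavior. The path is the problem only when the UDP query times out or TCP is blocked as well.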

How EliHole turns this from guessing into reading

These bugs take evenings to chase because most setups give you only a bare SERVFAIL and hide the verdict.

EliHole validates the full DNSSEC chain of trust from the ICANN root anchor itself and labels every query in its log with a per-query verdict — secure, insecure, or bogus — turning SERVFAIL debugging from blind guessing into reading a label. All-bogus-everywhere points at your clock or upstream, a single bogus domain points at DNSViz, and bogus .lan names point at private-zone handling.

Two design choices matter here. Enforcement mode is off by default — verdicts are recorded but answers still flow, so enabling DNSSEC can’t break your network while you investigate; switch it on and EliHole returns SERVFAIL on bogus answers and sets the AD bit on secure ones. And for unsigned delegations, EliHole requires signed NSEC/NSEC3 proofs of DS absence before treating a zone as insecure, closing the downgrade attack where an attacker strips signatures and claims the zone was never signed.

If you’re weighing it against Pi-hole’s dnsmasq-based validation, the EliHole vs Pi-hole comparison covers the differences, and installing with Docker takes about five minutes.

The toolbox, summarized

Tool | What it tells you
-----|------------------
timedatectl status | Whether your clock can invalidate every signature at once
dig +dnssec +multi | Whether RRSIGs arrive, and their validity windows
dig @upstream domain DNSKEY | Whether your upstream is DNSSEC-capable
delv +rtrace | A local validation run with the exact failure reason
dig +tcp / +bufsize=1232 | Whether middleboxes are eating large responses
dnsviz.net | The whole chain of trust, with broken links in red

SERVFAIL with DNSSEC enabled is rarely random. It’s a clock, an upstream, a broken domain, a private zone, a duplicated validator, or a hostile middlebox — roughly in that order. Walk the checklist top to bottom and you’ll have the answer in minutes, not evenings.

Frequently asked questions

What is the difference between DNSSEC bogus and insecure?
Insecure means the zone isn't signed: the resolver found a proven absence of a DS record at a delegation, so there is nothing to validate. That's normal and most of the internet works this way. Bogus means the zone claims to be signed but validation failed: missing, expired, or wrong signatures. Resolvers return SERVFAIL for bogus answers, never for insecure ones.

Why does Pi-hole show BOGUS (refused upstream)?
Your upstream resolver refused the query or returned SERVFAIL itself, so the local validator never got signatures to check. The usual culprits are an upstream that isn't DNSSEC-capable, a router DNS proxy stripping RRSIG records, or system clock drift on the Pi. Check timedatectl status first, then switch to a validating upstream like 1.1.1.1 or 9.9.9.9.

How do I tell whether the domain is broken or my resolver is at fault?
Query a known-good validating resolver directly: dig @1.1.1.1 example.com +dnssec. If that also returns SERVFAIL and dnsviz.net shows red errors in the chain, the domain operator's signatures are broken and only they can fix it. If 1.1.1.1 answers normally, the problem is on your side: clock, upstream, or network path.

Should I disable DNSSEC to fix SERVFAIL errors?
Not as a first move. SERVFAIL on a bogus answer is DNSSEC doing its job: refusing data that fails cryptographic checks. Disabling validation removes your protection against cache poisoning and spoofed answers. Work through clock, upstream, and network causes first; if a single domain is genuinely misconfigured, wait it out or add a temporary per-domain exception.