This week I got paged at 2am because a service running in an EKS cluster suddenly couldn’t resolve any of our internal *.internal.company.com hostnames. Public DNS resolved fine. The Route 53 private hosted zone had all the records. The VPC was associated with the zone. I spent an hour in a panic before realizing the VPC had been updated to use a custom DHCP options set a few hours earlier — one that pointed DNS to a corporate forwarder that didn’t know about the private zone. Route 53 was blameless. The DHCP change had cut the VPC off from VPC DNS.
Route 53 issues are often not Route 53 issues — they’re DHCP, VPC, or Resolver rule issues that manifest as DNS failure. Here’s how to trace them quickly.
## The Problem
DNS queries fail or return unexpected results:
| Symptom | What It Means |
|---|---|
| NXDOMAIN for a private record | Private hosted zone not associated with the VPC, or VPC not using VPC DNS |
| Intermittent SERVFAIL | Resolver endpoint unavailable, or a forwarding loop |
| Correct answer but wrong IP | A conflicting record in another zone (e.g., split-horizon failure) |
| Resolution works from EC2 but not from Lambda | Lambda's VPC config attaches it to a different VPC, one missing the zone association or DNS attributes |
| Health check flapping | Health checker source IPs blocked, or TCP check hitting a slow endpoint |
The DNS layer fails silently — applications usually log it as a connection error, a timeout, or a generic exception, which sends teams hunting the wrong problem.
## Why Does This Happen?
- VPC not associated with the private hosted zone: A private hosted zone is only resolvable from VPCs explicitly associated with it. New VPCs, new accounts in an organization, or cross-account VPCs all require explicit association — creating the zone is not enough.
- VPC DNS hostnames or DNS resolution disabled: Every VPC has two flags, `enableDnsHostnames` and `enableDnsSupport`. If either is off, instances can't use the VPC's DNS resolver at `.2`, and private hosted zones won't work.
- Custom DHCP options set pointing to external DNS: A DHCP options set with custom DNS servers overrides the VPC resolver. Instances then use the listed servers, which don't know about private zones unless those servers forward the private domains back into the VPC.
- Resolver endpoint rules missing or misrouted: In a hybrid setup, outbound Resolver endpoints forward specific domains to on-prem DNS. If the rule targets aren't reachable or the rule domain is too broad (e.g., forwarding `amazonaws.com`), it breaks AWS service resolution.
- Health check source IPs blocked: Route 53 health checkers query from a specific set of IP ranges. If a security group or WAF blocks those ranges, every check fails and the health check alarm fires even though the endpoint is healthy.
- DNS answer TTL too long during migration: A record’s TTL was 1 hour when you changed it — clients keep hitting the old value for up to an hour, making the “DNS change” appear to have failed.
## The Fix
### Step 1: Confirm the Resolution Path from the Failing Host

First, find out which DNS server the host is actually using. From Linux:

```bash
cat /etc/resolv.conf
```

The `nameserver` line should be the VPC's `.2` address (e.g., `10.0.0.2`). If it's anything else, a DHCP options set is overriding the VPC resolver.

Test resolution directly against the VPC resolver:

```bash
dig @10.0.0.2 internal.company.com
```

If that works but `dig internal.company.com` (default resolver) doesn't, the DHCP options set is the problem.
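If you'd rather script this check across a fleet, the `.2` rule is mechanical: the Amazon-provided resolver sits at the VPC CIDR's base address plus two. A minimal Python sketch (the CIDR and resolv.conf contents below are illustrative):

```python
import ipaddress

def vpc_resolver_ip(vpc_cidr: str) -> str:
    """The Amazon-provided resolver lives at the VPC base address plus two."""
    net = ipaddress.ip_network(vpc_cidr)
    return str(net.network_address + 2)

def uses_vpc_resolver(resolv_conf: str, vpc_cidr: str) -> bool:
    """Check whether any 'nameserver' line points at the VPC resolver."""
    expected = vpc_resolver_ip(vpc_cidr)
    nameservers = [
        line.split()[1]
        for line in resolv_conf.splitlines()
        if line.strip().startswith("nameserver")
    ]
    return expected in nameservers

# A host on the VPC resolver vs. one overridden by a DHCP options set:
print(vpc_resolver_ip("10.0.0.0/16"))                                 # 10.0.0.2
print(uses_vpc_resolver("nameserver 10.0.0.2\n", "10.0.0.0/16"))      # True
print(uses_vpc_resolver("nameserver 192.168.5.10\n", "10.0.0.0/16"))  # False
```

Run it against each host's `/etc/resolv.conf` and you have an instant fleet-wide view of who's been diverted off VPC DNS.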
### Step 2: Verify VPC DNS Attributes

Both flags must be `true`:

```bash
aws ec2 describe-vpc-attribute \
  --vpc-id vpc-0abc123def \
  --attribute enableDnsSupport \
  --query "EnableDnsSupport.Value"

aws ec2 describe-vpc-attribute \
  --vpc-id vpc-0abc123def \
  --attribute enableDnsHostnames \
  --query "EnableDnsHostnames.Value"
```

If either is `false`, enable it:

```bash
aws ec2 modify-vpc-attribute \
  --vpc-id vpc-0abc123def \
  --enable-dns-hostnames '{"Value":true}'

aws ec2 modify-vpc-attribute \
  --vpc-id vpc-0abc123def \
  --enable-dns-support '{"Value":true}'
```
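If you're auditing many VPCs, the decision logic is easy to script. A sketch that takes the raw JSON from the two `describe-vpc-attribute` calls above and reports which flags still need enabling (the VPC ID in the sample payloads is illustrative):

```python
import json

def dns_flags_to_fix(support_resp: str, hostnames_resp: str) -> list:
    """Given the JSON output of the two describe-vpc-attribute calls,
    return the attribute names that still need enabling."""
    fixes = []
    if not json.loads(support_resp)["EnableDnsSupport"]["Value"]:
        fixes.append("enableDnsSupport")
    if not json.loads(hostnames_resp)["EnableDnsHostnames"]["Value"]:
        fixes.append("enableDnsHostnames")
    return fixes

# Example: DNS support is on, hostnames is off.
support = '{"VpcId": "vpc-0abc123def", "EnableDnsSupport": {"Value": true}}'
hostnames = '{"VpcId": "vpc-0abc123def", "EnableDnsHostnames": {"Value": false}}'
print(dns_flags_to_fix(support, hostnames))  # ['enableDnsHostnames']
```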
### Step 3: Verify Private Hosted Zone Association

List all VPCs currently associated with the zone:

```bash
aws route53 get-hosted-zone \
  --id Z1ABCDEF12345 \
  --query "VPCs[*].{VPC:VPCId,Region:VPCRegion}" \
  --output table
```

If the failing VPC is missing, associate it:

```bash
aws route53 associate-vpc-with-hosted-zone \
  --hosted-zone-id Z1ABCDEF12345 \
  --vpc VPCRegion=us-east-1,VPCId=vpc-0abc123def
```

For cross-account association, the zone owner must first authorize it:

```bash
aws route53 create-vpc-association-authorization \
  --hosted-zone-id Z1ABCDEF12345 \
  --vpc VPCRegion=us-east-1,VPCId=vpc-0abc123def
```

Then the VPC owner runs `associate-vpc-with-hosted-zone` with their own credentials.
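The association check itself is one list lookup against the `get-hosted-zone` output, which makes it easy to fold into a drift-detection script. A sketch (VPC IDs are illustrative):

```python
import json

def missing_association(get_zone_resp: str, vpc_id: str, region: str) -> bool:
    """True if the given VPC is absent from the zone's association list
    (the VPCs array in get-hosted-zone output)."""
    vpcs = json.loads(get_zone_resp).get("VPCs", [])
    return not any(
        v["VPCId"] == vpc_id and v["VPCRegion"] == region for v in vpcs
    )

resp = json.dumps({"VPCs": [{"VPCRegion": "us-east-1", "VPCId": "vpc-0aaa111"}]})
print(missing_association(resp, "vpc-0abc123def", "us-east-1"))  # True -> associate it
print(missing_association(resp, "vpc-0aaa111", "us-east-1"))     # False -> already done
```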
### Step 4: Check the DHCP Options Set

```bash
aws ec2 describe-vpcs \
  --vpc-ids vpc-0abc123def \
  --query "Vpcs[0].DhcpOptionsId" \
  --output text

aws ec2 describe-dhcp-options \
  --dhcp-options-ids dopt-0abc12345 \
  --query "DhcpOptions[0].DhcpConfigurations" \
  --output json
```

If `domain-name-servers` is anything other than `AmazonProvidedDNS`, the VPC is using custom DNS. For private hosted zones to work, either revert to `AmazonProvidedDNS` or have the custom DNS servers conditionally forward your private domain back to a Route 53 Resolver inbound endpoint in the VPC.
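This check also scripts cleanly. A sketch that parses the `DhcpConfigurations` array from the `describe-dhcp-options` call above and surfaces any custom DNS servers (the sample IPs are illustrative):

```python
import json

def custom_dns_servers(dhcp_configs: str) -> list:
    """Return configured DNS servers that aren't AmazonProvidedDNS.
    Input is the DhcpConfigurations array from describe-dhcp-options."""
    for cfg in json.loads(dhcp_configs):
        if cfg["Key"] == "domain-name-servers":
            servers = [v["Value"] for v in cfg["Values"]]
            return [s for s in servers if s != "AmazonProvidedDNS"]
    return []

configs = json.dumps([
    {"Key": "domain-name-servers",
     "Values": [{"Value": "192.168.5.10"}, {"Value": "192.168.5.11"}]}
])
print(custom_dns_servers(configs))  # ['192.168.5.10', '192.168.5.11'] -> custom DNS in play
```

An empty result means the VPC resolver is in charge and your problem lies elsewhere.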
### Step 5: Fix Hybrid DNS via Resolver Rules

If you need custom forwarding (e.g., sending `corp.local` to on-prem), use Resolver rules instead of DHCP options:

```bash
aws route53resolver create-resolver-rule \
  --creator-request-id $(date +%s) \
  --name forward-corp-local \
  --rule-type FORWARD \
  --domain-name corp.local \
  --resolver-endpoint-id rslvr-out-abc12345 \
  --target-ips "Ip=10.1.1.53,Port=53" "Ip=10.1.2.53,Port=53"
```

Then associate the rule with the VPC:

```bash
aws route53resolver associate-resolver-rule \
  --resolver-rule-id rslvr-rr-abc12345 \
  --vpc-id vpc-0abc123def
```

This lets the VPC resolver handle everything else while still forwarding specific domains on-prem.
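A FORWARD rule matches the named domain and everything beneath it, which is how the "too broad" failure from the causes list happens: forward `amazonaws.com` and you've sent S3, STS, and everything else to on-prem DNS. A sketch of a pre-flight sanity check you might run before creating a rule (the blocked patterns are my own heuristic, not an AWS-enforced list):

```python
def forwarding_domain_is_safe(domain: str) -> bool:
    """Reject rule domains that would swallow AWS service resolution.
    A FORWARD rule matches the domain and all of its subdomains."""
    d = domain.rstrip(".").lower()
    # Never forward amazonaws.com or anything under it off the VPC resolver.
    if d == "amazonaws.com" or d.endswith(".amazonaws.com"):
        return False
    # A single-label domain ('com', 'local') is almost always too broad.
    return "." in d

print(forwarding_domain_is_safe("corp.local"))     # True  -> scoped, fine
print(forwarding_domain_is_safe("amazonaws.com"))  # False -> would break AWS services
print(forwarding_domain_is_safe("com"))            # False -> far too broad
```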
### Step 6: Fix Health Check Failures

Check the actual health check status and failure reason:

```bash
aws route53 get-health-check-status \
  --health-check-id abc12345-def6-7890-ghij-klmnopqrstuv \
  --query "HealthCheckObservations[*].{Region:Region,Status:StatusReport.Status}" \
  --output table
```

If some regions succeed and others fail, the target responds differently by region (geo-blocking or regional network issues). If all regions fail uniformly, check whether you're blocking the Route 53 health checker IP ranges:

```bash
aws route53 get-checker-ip-ranges \
  --query "CheckerIpRanges" \
  --output text
```

Add these CIDR ranges to your security group or ALB allow list.
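When you're staring at flow logs trying to work out whether a rejected source was a health checker or random internet noise, a CIDR membership check settles it. A sketch using Python's `ipaddress` module (the ranges below are illustrative; use the real output of `get-checker-ip-ranges`):

```python
import ipaddress

def is_health_checker(source_ip: str, checker_ranges: list) -> bool:
    """True if the IP belongs to one of the Route 53 checker CIDR ranges."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in checker_ranges)

# Illustrative ranges only -- pull the real list from get-checker-ip-ranges.
ranges = ["15.177.0.0/18", "54.183.255.128/26"]
print(is_health_checker("15.177.10.4", ranges))  # True  -> allow it, it's a checker
print(is_health_checker("203.0.113.9", ranges))  # False -> not a checker
```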
### Step 7: Verify Resolution and Propagation

After changes, flush the caller's DNS cache and resolve again:

```bash
dig +short internal.company.com
```

For TTL-related issues, check the current TTL on the record:

```bash
aws route53 list-resource-record-sets \
  --hosted-zone-id Z1ABCDEF12345 \
  --query "ResourceRecordSets[?Name=='internal.company.com.'].{Name:Name,TTL:TTL,Type:Type}" \
  --output table
```

Reduce the TTL well before making planned record changes in the future:

```bash
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1ABCDEF12345 \
  --change-batch file://change.json
```
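A minimal sketch of what `change.json` might look like for lowering a TTL (the record value is illustrative; note that `UPSERT` replaces the whole record set, so include the record's full current value):

```json
{
  "Comment": "Lower TTL ahead of a planned cutover",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "internal.company.com.",
        "Type": "A",
        "TTL": 60,
        "ResourceRecords": [{ "Value": "10.0.1.25" }]
      }
    }
  ]
}
```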
## Is This Safe?
Mostly yes. All the describe and get commands are read-only. Associating a VPC with a private hosted zone is additive — it doesn’t remove existing associations. Enabling VPC DNS attributes is non-disruptive. The riskier changes are DHCP options modifications and Resolver rule associations, which take effect on DHCP lease renewal or immediately for new queries — these can surprise long-running connections that cached the old DNS. Always coordinate DHCP options set changes during a maintenance window if possible.
## Key Takeaway
When DNS is broken in AWS, the actual root cause is almost never in Route 53 itself. It’s usually the VPC’s DHCP options pointing to external DNS, the VPC not being associated with the private hosted zone, or VPC DNS attributes disabled. Start by checking what resolver the host is actually using (resolv.conf or equivalent) and work outward from there. Use Route 53 Resolver rules instead of DHCP options whenever you need custom forwarding — they’re specific to domains and don’t break everything else.
Have questions or ran into a different Route 53 issue? Connect with me on LinkedIn or X.