This page eventually gets around to the point, but the intro paragraphs don't tell me what's actually happening, and are written in an extremely confusing way. I almost just closed the tab since it's not even clear that any explanation is coming.
"This change affects only sites which operate software which is not following published standards. Are you affected?"
Affected _in what way_? Instead of linking to the Wikipedia page about what DNS is (if your visitor doesn't know that, then the rest of the page is pointless), there needs to be a sentence or three up top telling me what is _actually happening_ and what the _actual impact_ could potentially be, and for that matter, which standards you are even talking about. Then, tell me what the forms you are asking me to type my domain names into are going to do.
As it is, this page is just going to leave a lot of people confused.
A number of DNS software and service providers have announced that we will all cease implementing DNS resolver workarounds to accommodate DNS authoritative systems that don’t follow the EDNS protocol.
Domains served by DNS servers that are not compliant with the standard will not function reliably after February 1, 2019, and may become unavailable.
If your company’s DNS zones are served by non-compliant servers, your online presence will slowly degrade or disappear as ISPs and other organizations update their resolvers. When you update your own internal DNS resolvers to versions that don’t implement workarounds, some sites and email servers may become unreachable.
Why make this change now?
Extension Mechanisms for DNS were specified in 1999, with a minor update in 2013, establishing the ‘rules of the road’ for responding to queries with EDNS options or flags. Despite this, some implementations continue to violate the rules. DNS software developers have tried to solve the problems with the interoperability of the DNS protocol and especially its EDNS extension (RFC 6891 standard) by various workarounds for non-standard behaviors. This is not unlike the way a driver with the right-of-way might hesitate at an intersection before proceeding if there were another driver in the intersection behaving erratically. These workarounds excessively complicate DNS software and are now also negatively impacting the DNS as a whole.
The most obvious problems caused by these workarounds are slower responses to DNS queries and the difficulty of deploying new DNS protocol features. Some of these new features (e.g. DNS Cookies) would help reduce DDoS attacks based on DNS protocol abuse.
> This domain is going to work after the 2019 DNS flag day BUT it does not support the latest DNS standards. As a consequence this domain cannot support the latest security features and might be an easier target for network attackers than necessary, and might face other issues later on. We recommend your domain administrator to fix issues listed in the following technical report: redacted.
It means the authoritative servers of your domain simply do not support EDNS (a 20-year-old protocol...), but they aren't falsely advertising EDNS support, which is what the Flag Day is targeting, so your situation is good.
Route53 absolutely does support EDNS0 and large answers without needing to truncate.
The minor error reported is that Route 53 does not respond with a "BADVERS" error in response to a query reporting EDNS version 1.
There is no EDNS version 1 standardized, and the specs say to respond with an error when you see a version you don't support. Per the specs Route 53 is "in the wrong". The thinking from the spec authors is that this kind of behavior will make it easier for resolvers to negotiate future EDNS versions and figure out what works. I'm skeptical that this will ever actually work in practice: most protocols tend to have backwards-compatible version upgrades that don't require round-trips.
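For anyone who wants to see what that check looks like on the wire, here is a rough Python sketch (standard library only). It builds a query whose OPT record claims EDNS version 1, and shows how the 12-bit response code is reassembled from the header RCODE and the OPT TTL field as RFC 6891 describes. The function names and the query ID are made up for illustration; this is not the ednscomp tool's code, and it sends nothing over the network.

```python
import struct

BADVERS = 16  # extended RCODE from RFC 6891


def build_query(name, edns_version=0, udp_size=4096, qid=0x1234):
    """Build an A query with a single OPT pseudo-record (illustrative only)."""
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1 (the OPT record)
    header = struct.pack("!HHHHHH", qid, 0x0000, 1, 0, 0, 1)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    # OPT: root owner name, TYPE=41, CLASS=UDP payload size,
    # TTL = extended RCODE (8 bits) | version (8 bits) | flags (16 bits), RDLEN=0
    opt = b"\x00" + struct.pack("!HHBBHH", 41, udp_size, 0, edns_version, 0, 0)
    return header + question + opt


def full_rcode(header_flags, opt_ttl):
    """Combine the 4-bit header RCODE with the 8-bit extended RCODE in the OPT TTL."""
    return (((opt_ttl >> 24) & 0xFF) << 4) | (header_flags & 0xF)


def expected_reply(query_version, supported_version=0):
    """What the spec asks of a responder that implements up to `supported_version`."""
    return "BADVERS" if query_version > supported_version else "ANSWER"
```

The behaviour described for Route 53 would correspond to returning "ANSWER" (treating the query as non-EDNS) where `expected_reply(1)` says "BADVERS".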
The thinking on the Route 53 side, or at least my thinking - I wrote Route 53's EDNS support - is that we see incorrect versions occasionally on the wire, from weird clients where people have clearly made some kind of mistake, and that it is better to fail gracefully and give them /an/ answer than an error that could lead to outages. I might be completely wrong about that, but that's the thinking. We tend to always heavily lean in the direction that is likely to increase availability and avoid any potential for outages.
The problem is that that behavior means people won't have to fix the software that's emitting bad version numbers, which will then make it difficult to upgrade to a real EDNS version 1 (or other version) in the future. A similar problem often occurs with other Internet protocol upgrades, as well as with things like Linux kernel syscalls that don't error out on unknown flags. If you don't give an error on things you don't understand, you make it much harder to build newer software.
If you see "incorrect versions occasionally on the wire from weird clients", perhaps the people building those weird clients will know to fix them if they get BADVERS errors.
Personally I have no appetite for breaking any customer. Sure they may be "in the wrong" because they have an old dig recipe, or a load balancer health check tool, or a latency measurer, or some kind of DNS canary, that mistakenly used a wrong EDNS version, but if we change our behavior and break them they will a) feel it viscerally and b) rightly, blame us for breaking things. We're pretty serious about maintaining backwards compatibility always and treating every API like a promise.
The other side of this is that being too tolerant can lead to network ossification. This impacted TLS1.3's roll-out, which had to be made to look similar enough to TLS1.2 session resumption that many network middle boxes can't tell the difference and let the traffic through. This is less elegant than a cleaner new design that isn't tainted by having to appear like the old formats.
Personally, I worry about this a lot less and consider the extra effort to be backwards compatible very worth it. I think it will still be very easy to find totally unused never-seen-in-the-wild EDNS version numbers; there's no shortage. On the TLS working group we've been through this a few times, having to find unused magic numbers that hadn't been burned as part of the various experiments. We've done this with protocol versions, cipher suites, etc.
On the DNS side; the proposed hypothetical EDNS version negotiation scheme that might show up in the future requires round-trips and state. Resolvers would have to first send an EDNSv1 query, observe it fail, store that state somewhere and then try with EDNSv0. What if the authoritative server rolls back their support? What if the auth service is mixed and only some of the fleet supports the new version? These challenges tend to make that kind of version negotiation impractical ... hence my skepticism that it will ever work like that.
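To make the round-trip and state concern concrete, here is a toy Python simulation of a trial-and-fallback scheme. Everything in it is hypothetical (the `lookup` function, the `supported` table, the per-server cache); no such negotiation is standardized, and a real resolver would have many more moving parts.

```python
def lookup(server, supported, cache, preferred=1):
    """Query `server`, stepping down one EDNS version per BADVERS reply.

    `supported` maps a server to the highest EDNS version it implements.
    `cache` remembers the version that last worked for each server.
    Returns (version_used, round_trips).
    """
    version = cache.get(server, preferred)
    trips = 0
    while True:
        trips += 1
        if version <= supported[server]:
            cache[server] = version  # remember what worked
            return version, trips
        version -= 1  # simulated BADVERS: retry with an older version


supported = {"ns1": 0}
cache = {}
first = lookup("ns1", supported, cache)   # pays an extra round trip
second = lookup("ns1", supported, cache)  # cached, one round trip

supported["ns1"] = 1                      # server upgrades...
stale = lookup("ns1", supported, cache)   # ...but the cache keeps us on v0

cache["ns1"] = 1                          # resolver believes v1 works...
supported["ns1"] = 0                      # ...then the server rolls back
rollback = lookup("ns1", supported, cache)
```

In this sketch the first lookup costs two round trips, the cached state goes stale when the server upgrades, and a rollback brings the extra round trip back. A mixed fleet would flip between these cases from one query to the next.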
A small part of all this is that the DNS WG isn't really stewarded with these practical issues front of mind. DNAME and DNSSEC have each had backwards-incompatible implications that real-world operators had to fudge around and ignore precisely what the specs say, to be able to satisfy customers. Due to that history, IMO the DNS specs have lost a certain level of moral authority that other specifications still enjoy.
> Personally I have no appetite for breaking any customer. Sure they may be "in the wrong" because they have an old dig recipe, or a load balancer health check tool, or a latency measurer, or some kind of DNS canary, that mistakenly used a wrong EDNS version, but if we change our behavior and break them they will a) feel it viscerally and b) rightly, blame us for breaking things. We're pretty serious about maintaining backwards compatibility always and treating every API like a promise.
That's completely understandable.
What I'm wondering is whether you could accept it for now, but reach out to customers you see it from and provide guidance, in addition to blogging about the issue.
I'm not suggesting "turn it off tomorrow and break people", I'm suggesting "work towards a long-term plan of turning it off".
That's probably what we will do. Rightly or wrongly, the new tool will probably accelerate it because I'm sure we'll be getting support queries from needlessly worried customers. Sometimes that kind of effort is good, but in this case I have sad feelings about that result, because the impetus is misguided. I totally predict that any future EDNS rev will not use a trial-and-fallback kind of negotiation, making all of this work pointless.
What do other DNS services do? If they do the same thing, this behavior makes sense to me. But if other services do fail these invalid requests, then I really don't see how this helps anyone. If these "weird" broken clients can only talk to Route 53, it's difficult for me to believe that they are production clients and worth the effort of this workaround.
Says someone who's clearly never tried to maintain production uptime :)
Seriously, if you have responsibility for a large system (ie, think the GE network) - you can take the AWS route - and uptime will be good, or you can do the backwards incompatible changes - and everyone will hate you. Seriously, just the printer DNS lookup clients / the email lookup clients / the amount of cruft in a big system is mind bending.
The other thing I'm not getting: you're going to be doing DNS round trips to get to a version match? DNS is on the critical path to web page response - I hope this is a joke. Put this into a hint in a recursive lookup or something so when I look up what DNS server handles abc.com I (can optionally) see what EDNS version that server supports? Or a TXT field or something? Does the error returned say what is supported? Or will the client need to do multiple R/T's to find the correct version? That can't be right - I'm not a DNS expert, but I'm not seeing the point of this approach.
Quote for those without access:
There is a known bug with Route 53 DNS servers in which we are not RFC compliant in how we handle a specific kind of invalid query. Namely, when Route 53 gets a query with an unknown EDNS version, Route 53 treats the query as a non-EDNS query instead of responding with BADVERS as ednscomp expects.
We expect to have this bug fixed within a year or two. The good news is that this bug is not impacting, so you'll be ok even if we're slow to fix this. The "dnsflagday" news means that servers that don't support EDNS will be treated as unavailable to resolvers. But Route 53 generally supports EDNS0, valid queries will continue to work regardless.
If you have N domains, all served from the same DNS server, you only need to test one of them with their tester, right? This is about whether or not the server handles the EDNS protocol, not about anything to do with the content of any particular domain records, so it either works for them all or for none of them?
I can't really find a simple explanation of what this means. The warnings in the EDNS compliance tester aren't really helpful either. Is there a simple explanation somewhere?
To most people, it's not the "domains" that should be checked, but DNS servers and DNS resolvers, both the authoritative and recursive type. If you are using a major DNS provider for your domain, no action is needed, but just to be sure, use the test tool on the webpage to see if your provider has broken EDNS, and do check your local recursive server.
Classic DNS messages carried over UDP were restricted to 512 bytes. EDNS raised this limit and also introduced some flags, and major DNS servers have supported it since 1999. But in practice, many authoritative deployments are broken: they signal EDNS support, but EDNS replies are silently dropped because of broken DNS servers, misconfigured routers, broken NAT, broken ISP installations, or broken firewalls and other middleboxes.
Previously, various DNS resolvers contained a workaround that disabled EDNS in reaction to a DNS query timeout. Now the workaround will be removed. If a DNS server has EDNS but it's broken, it will be marked as a dead server.
"Starting February 1st, 2019 there will be no attempt to disable EDNS as reaction to a DNS query timeout.
This effectively means that all DNS servers which do not respond at all to EDNS queries are going to be treated as dead."
So it basically means that authoritative DNS servers are now required to support the EDNS protocol. If not, it is no longer guaranteed the domain will resolve on DNS resolvers.
This is a performance improvement, because the EDNS fallback method requires a timeout.
Edit: My answer isn't correct, see the comments below.
No. It means that content DNS servers (and their concomitant network infrastructure) must either support EDNS properly or ignore it properly. The halfway house of having clients fall back to re-trying without EDNS, because some bad servers failed to send replies (or the network infrastructure that they communicated over failed to send on those replies) in response to EDNS queries, is going away.
No, authoritative servers are simply required to follow standards, EDNS support is not required, none of this is about any sort of forced migration, it is only about discontinuing support for broken software after 20 years of compatibility hacks.
The EDNS fallback method does not require a timeout. A standards-compliant DNS server that does not support EDNS should ignore EDNS records in the request and simply send an old-style response that any EDNS capable resolver is still required to handle correctly. The problem is only with servers that simply don't respond to requests containing EDNS records at all, even though they are required to by the (old, pre-EDNS) standard.
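Spelled out as a small Python sketch, the distinction looks roughly like this. The `Behaviour` names and the outcome strings are my own labels, not anything from the flag day site or from a resolver's source; it only models the three cases discussed here.

```python
from enum import Enum


class Behaviour(Enum):
    EDNS_REPLY = "answers with a proper EDNS response"
    PLAIN_REPLY = "ignores the OPT record and sends a pre-EDNS response"
    NO_REPLY = "drops queries that carry an OPT record"


def resolver_outcome(behaviour, workarounds):
    """What a resolver sees when it sends an EDNS query to such a server."""
    if behaviour in (Behaviour.EDNS_REPLY, Behaviour.PLAIN_REPLY):
        # Both are standards-compliant; no timeout, no retry needed.
        return "resolved"
    if workarounds:
        # Pre-flag-day bodge: wait for the timeout, retry without EDNS.
        return "resolved after timeout and non-EDNS retry"
    return "server treated as dead"
```

Only the `NO_REPLY` row changes between `workarounds=True` and `workarounds=False`, which is the whole point: servers that do not implement EDNS but answer anyway are unaffected.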
This issue isn't affected by the configuration change on flag day and, judging by the slow adoption so far, by the time EDNS support really becomes mandatory, the then-current version of the distro will long since have updated its built-in PowerDNS version.
The DNS protocol can be layered over UDP or over TCP. In its original form DNS/UDP has some quite draconian packet size limits that are reached quite quickly in the modern world. Originally, this mandated falling back to DNS/TCP. But TCP is significantly more expensive as a transport protocol, especially as the client has already had to try to perform the transaction once over DNS/UDP before falling back to it, and trickier for servers to implement than DNS/UDP.
EDNS0 ameliorated this greatly, allowing clients and servers to keep talking DNS/UDP without falling back to DNS/TCP, up to much larger packet sizes. That is primarily why one would want it, even if one did not want any of the other things that it incorporates.
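A back-of-the-envelope Python sketch of that size rule, assuming the RFC 6891 convention that the requester advertises its UDP payload size in the OPT record and that advertised values below 512 are treated as 512. The helper names are invented for illustration, and real servers may apply their own lower caps.

```python
from typing import Optional

CLASSIC_UDP_LIMIT = 512  # bytes, the pre-EDNS maximum for DNS over UDP


def udp_limit(advertised: Optional[int]) -> int:
    """Largest UDP response the client can take; None means no OPT record was sent."""
    if advertised is None:
        return CLASSIC_UDP_LIMIT
    return max(CLASSIC_UDP_LIMIT, advertised)


def transport_for(response_len: int, advertised: Optional[int] = None) -> str:
    """Which transport ends up carrying a response of `response_len` bytes."""
    if response_len <= udp_limit(advertised):
        return "udp"
    # Server sends a truncated UDP reply; client repeats the query over TCP.
    return "tcp after truncated udp reply"
```

So a 1400-byte response needs the TCP retry for a classic client, but fits in one UDP datagram for a client that advertised, say, 4096.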
While packet size is one thing you can do with EDNS, it’s really a mechanism to allow the DNS protocols to add new features, as there’s no version in the DNS header.
Also, a lot of DNS hosts do not allow for large packet sizes over UDP to attempt to reduce the effect of reflection attacks.
"The use of the EDNS(0) padding only provides a benefit when DNS
packets are not transported in cleartext. Further, it is possible
that EDNS(0) padding may make DNS amplification attacks easier.
Therefore, implementations MUST NOT use this option if the DNS
transport is not encrypted."
Apparently it does have a padding proposal, but it wasn't thought through very well. They only had the use case of confidentiality in mind, and decided to deal with amplification by forbidding cleartext use, no matter what the response:request size ratio is.
basically small clarifications and enhancements of the original DNS standard with rather huge impact which allow the functioning of modern DNS features like DNSSEC (due to the need for larger messages, which need EDNS compliance, as EDNS mandates larger minimum supported message sizes), and EDNS in itself is a signaling mechanism to indicate support for these newer features (like bigger message sizes, DNSSEC, etc)
If I remember correctly, the big thing about EDNS is that DNS UDP datagrams were originally restricted in size to a total length that would be unlikely to see IP fragmentation on a network of 1980s hardware, meaning that in practice DNS packets had to be much smaller than the maximum reasonable size of an IP packet. In addition to setting up extended options, EDNS0 also allows DNS packets to be large.
Yes, this page provides virtually no detail either before or after getting a SLOW or SOME PROBLEMS result. I spent a lot of time in the weeds of EDNS a couple years back and I tested that company's domain, and have frankly no idea what the results mean, what the change is, etc. Seems like a source article trying to encourage change could have been a little helpful beyond "upgrade yo' shit" without explaining what it means.
If you're running a DNS server and have problems, it's probably not a simple matter of blowing out the latest version of your software. You probably have a rat's nest of thousands of entries added over a decade or more, different firewall policies in front of different authoritative servers, caches, etc.
I am very familiar with DNS. I run the tests and get minor failures. The test results barely make any sense and don't really offer good pointers on how to resolve them.
The documentation is obtuse in typical ISC fashion.
I could probably figure it out with more effort. But the average DNS Joe won't, they'll ignore this and move on.
> This domain will face issues after the 2019 DNS flag day. It will work in practice, BUT clients will experience delays when accessing this domain. We recommend you request a fix from your domain administrator!
Weird since they are listed as one of the supporters
I think the test works by testing querying an authoritative server with EDNS on, and see if the request times out. Naturally, there will be false positives due to random packet loss on the Internet.
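If that guess about the test is right, the obvious way to cut down on false positives is to repeat the probe and only call a server broken when every attempt times out. A minimal Python sketch, where `probe` is a stand-in for whatever actually sends the EDNS query (it is an assumption of mine, not part of the real tester):

```python
def classify(probe, attempts=3):
    """Run `probe` several times; it returns True on a reply, False on a timeout."""
    replies = sum(1 for _ in range(attempts) if probe())
    if replies == attempts:
        return "ok"
    if replies == 0:
        return "no response"   # consistent timeouts: likely a real problem
    return "intermittent"      # mixed results: packet loss, or servers that differ
```

For example, `classify(iter([True, False, True]).__next__)` reports "intermittent" rather than flagging the server as dead on the strength of one lost packet.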
Moreover, this isn't about "you have to implement EDNS" (you should, but you don't have to). This is about "Either implement EDNS or when asked for EDNS, just say you don't offer that".
This isn't so much "You must offer a vegetarian option" as "Don't offer a vegetarian option but then when anybody orders it you force a steak down their throat while screaming 'Choke on it you hippie!'".
The industry's complacency in getting broad EDNS support has been a huge embarrassment imnsho. This is proof that we don't need to put up with "but the middle boxes, argh!!?" excuses if the industry works together to force people to be compatible with standards.
Thanks to all those involved with making this project a reality. It really helps everyone on the internet, whether they know it or not.
Presumably they are not going to remove the workarounds until the software has actually had a chance to disseminate? For example the resolvers they list are all new releases and aren't in any current Linux distro. Which means they are saying that in a couple weeks all Linux resolvers could stop resolving some domains unless they adopt an unsupported release of the resolver?
They don't list it, but presumably the same applies to the Authoritative servers. I run a pretty vanilla NSD install and it fails several tests. NLNETLABS (the authors/maintainers of nsd/unbound) is listed as a supporter, but they don't mention anything about this on their site nor does their documentation mention anything about settings to control these features.
If this will actually have the consequences they describe it was terribly mismanaged.
Just to clear the record for NSD users... I misread the results and my NSD setup (4.1.0) does pass the test. It was my secondary at buddyns.com that is having issues. I'll report the issue to them, but I'm still concerned about the supported recursive resolvers not being compliant.
I am a bit confused about the versions they say are safe. I'm running Bind 9.11 and their tester says there are no issues, and it supports all the necessary EDNS behaviour.
I think the point is that those versions mentioned will remove support for the bad behaviour, but it's not clear what you do need to do to prepare. I guess the only steps you can take are to run the tests and update to the latest version available.
The default PowerDNS server in Debian stable also fails. This is a reasonably common setup for DNS servers. It is a bit weird to have a server running current/stable that does not pass the test.
Apparently the fix is to use the external repository from powerdns.
Because the OP link is spreading awareness. Your link states nothing about what it does and probably the output of the one you linked is much less understandable to the population at large as well.
Well, I guess 'DNS flag day' is a catchy name, and the text is intentionally FUD-y, so it has better chance of viral distribution. I just like the concise summary on ISC page better, needs less scrolling.
I have conflicting thoughts on this. I'm having a problem with the credibility of this whole thing. Very few people know about it. Twitter @dnsflagday has 94 followers, telling me it's just an individual account with no public validation yet. The ISC.org site also doesn't seem to be authoritative.
I understand the technical problem but I'm not convinced this is as big a deal as it seems. There are some big names associated with the flag day event, lending credibility. But there are also big names like Slack.com, Twitter.com, GitHub.com, BitBucket.com, and web hosting companies, that all fail the EDNS/dig test.
So - the stated threat is that sites will go dark on 1 Feb, but do we really expect that to happen? Might these companies simply be threatening to reject invalid EDNS responses? Will people blame Them for rejecting queries rather than the services being resolved and their DNS?
I can't decide if this is something to scream loudly about, or to just let it happen. I'm not asking for opinions on that. I'm asking for a credible source of information to confirm what is truly expected.
The site doesn’t claim that slack.com, twitter.com, etc. will go down when the flag is flipped. Just try entering those domains into the testing widget within the “Domain owners” section, and you can see that their status is not so dire:
> Minor problems detected!
> This domain is going to work after the 2019 DNS flag day BUT it does not support the latest DNS standards. As a consequence this domain cannot support the latest security features and might be an easier target for network attackers than necessary, and might face other issues later on. We recommend your domain administrator to fix issues listed in the following technical report [link to report]
As other comments have said, the new rule does not require that all domains support EDNS, merely that all domains respond with “EDNS not supported” when applicable instead of pretending that they support it. Those sites you linked follow that rule.
Flag Day means a software change that is not backward-compatible, especially in a network system where the upgrade is scheduled in advance and everybody must deploy it on (or before) the date, or the system will stop working. The term came into use when a change was made to the definition of the ASCII character set during the development of Multics. The change was scheduled for Flag Day (a U.S. holiday, https://en.wikipedia.org/wiki/Flag_Day_(United_States) ), June 14, 1966.
Which is the cheapest DNS service that allows one to buy "Registry Lock" (not the simpler registrar lock)? I'm talking about out-of-band confirmation of any DNS changes.
The big ones like MarkMonitor, etc. are super expensive. I thought Cloudflare Business DNS would be cheaper, but they start at $1000-2000 per year.
Any recommendations? I have heard of Safenames, but not many others.
It depends on the TLD. For example for my national domain, I can just send a request form with my signature to the national TLD registry operator, and that's it. My domain becomes untransferable, and I also have an option to disable change of NS set.
It's free, except for the official signature verification, that can be done at the post office for $1.5.
Tinydns doesn't need patches to continue working after Feb. 1, 2019. The article website is about authoritative servers that give broken EDNS answers. You either want to correctly support EDNS, or send normal pre-EDNS answers. Both work, and tinydns does the latter.
I don't know of any latest-version DNS software that breaks in this case. AFAIK this is exclusively a middleware problem, as is quite often the case.
tinydns speaks the unextended protocol, but properly handles clients that try to speak EDNS0 to it.
It does not exhibit any of the faulty behaviours that are common to bad EDNS0 implementations. It does not blindly echo various parts of the query back in the response, for example. Indeed, the ISC test actually complains that tinydns does not blindly echo an OPT record back. (-:
But you cannot speak DNS/UDP to it with datagrams over 0.5KiB. Patches are needed to make it do anything but wholly ignore EDNS (of any version) and treat it as though it were the unextended protocol. But that is not what this flag day is about.
Indeed, given that tinydns does not send DNS/UDP responses greater than 0.5KiB, it does not trigger any EDNS0 problems in the network infrastructure that a server is using, either.
Ironically, one of the people who tried to rewrite djbdns to have EDNS0 support discovered exactly the problem that this flag day is about: The bad content DNS servers for outlook.com in 2017-2018 gave bad responses to clients that attempted to speak EDNS0.
One pernicious problem I had with tinydns and EDNS support was in an environment where one firewall, for whatever reason, did not permit tcp/53. So DNS replies worked, EDNS0 replies worked, but not from certain clients that insisted on retrying the query over TCP.
Needless to say, after tons of time getting into the weeds with EDNS, and STILL not understanding why certain clients weren't working, that was a fun one to solve.
This likely means that your DNS server IP is actually a load-balanced pool of servers, with some running compliant software and others not. If you see any errors when running the test repeatedly, I’d check with your DNS provider, and share the test results.
I've got 2 clients using them and wish I could say this info surprised me, but they are a terrible registrar & DNS provider. I've asked both of my clients to reach out to them about the issue but don't have any information past that.
What is happening is that all of the stuff that I have been explaining in the "Local fix" section for a decade and a half, alongside the practice that clients had of timing out and then retrying the transaction without the extensions, is now being declared moot. If DNS lookup does not work because of this problem, it is being declared entirely the server-end's problem. Clients are no longer going to be expected to put such local fixes in place, or have timeout and retry bodges.
As someone who's been required to maintain bug-compatible software before, this fills me with great joy. Congratulations on getting to deprecate the brokenness!
EDNS is basically DNS2, and it has better support for handling larger messages (needed for DNSSEC and such), and EDNS itself is about signaling what features you support.
More than that: DNS servers made special exceptions for servers that do not implement DNS properly and ignore some requests entirely. They're now stopping that: servers either have to support a 20-year old extension spec to DNS, or follow the original, 30+ year old specs and answer the query while ignoring the extension.