Cloudflare 1.1.1.1 Outage Explained: How CNAME Ordering Broke the DNS Service (2026)

The recent article titled "What came first: the CNAME or the A record?" explains what disrupted Cloudflare's widely used 1.1.1.1 DNS service. At the core of the problem are ambiguities in the Request for Comments (RFC) specifications about the order of records in a DNS response. That ambiguity led to resolution failures for users relying on Cloudflare's service and prompted the company to propose a more explicit specification.

On January 8, 2026, a routine update that had begun rolling out the previous day changed the order in which CNAME records were returned in DNS responses. Some DNS clients then failed to resolve domain names because they expect CNAME records to appear first in the response. Most contemporary software does not treat the order of DNS records as significant, but the Cloudflare team found that some implementations are built on the assumption that CNAME records precede other record types.

The resulting resolution failures caused a notable outage for the popular 1.1.1.1 DNS service. Sebastiaan Neuteboom, a systems engineer at Cloudflare, described the change and its timing:

"While we were working on improvements to reduce memory usage in our caching system, we made a slight adjustment to the order of CNAME records. This change was made on December 2, 2025, tested on December 10, and began deployment on January 7, 2026."

When a DNS resolver queries a name that has a CNAME record, it typically follows a chain of alias records that connect the original name to the final IP address. The resolver caches each step of the chain with its own expiry time. Cloudflare explains that if any part of the chain has expired from the cache, the resolver refetches only the expired part and combines it with the still-valid cached parts to build the full response.
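
The partial-expiry behavior can be illustrated with a small sketch. This is a toy model, not Cloudflare's implementation: the `Record` and `ChainCache` classes, the `resolve` function, and the `UPSTREAM` dictionary standing in for an authoritative lookup are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    name: str    # owner name, e.g. "www.example.com."
    rtype: str   # "CNAME" or "A" in this sketch
    value: str   # alias target for CNAME, address for A
    ttl: int     # lifetime in seconds


class ChainCache:
    """Toy cache: one entry per owner name, each with its own expiry."""

    def __init__(self):
        self._entries = {}  # name -> (record, expires_at)

    def put(self, record, now):
        self._entries[record.name] = (record, now + record.ttl)

    def get(self, name, now):
        entry = self._entries.get(name)
        if entry is None or entry[1] <= now:
            return None  # missing or expired
        return entry[0]


# Hypothetical stand-in for querying authoritative servers.
UPSTREAM = {
    "www.example.com.": Record("www.example.com.", "CNAME", "cdn.example.com.", 3600),
    "cdn.example.com.": Record("cdn.example.com.", "A", "198.51.100.1", 300),
}

MAX_HOPS = 8  # guard against alias loops


def resolve(name, cache, upstream, now):
    """Walk the alias chain, refetching only links missing from the cache.

    Returns (chain, refetched_names).
    """
    chain, refetched = [], []
    current = name
    for _ in range(MAX_HOPS):
        record = cache.get(current, now)
        if record is None:
            record = upstream[current]  # only this link is fetched again
            cache.put(record, now)
            refetched.append(current)
        chain.append(record)
        if record.rtype != "CNAME":
            return chain, refetched
        current = record.value
    raise RuntimeError("alias chain too long")
```

In this model, resolving `www.example.com.` at time 0 and again at time 400 refetches only the `cdn.example.com.` entry on the second call, because its 300-second TTL has expired while the 3600-second CNAME is still cached.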

Previously, the code created a new list, placing the existing CNAME chain at the start before appending the newly fetched records. To reduce memory usage, the code was changed to append the CNAMEs directly to the existing answer list instead. As a result, CNAME records could end up at the end of the response, after the final resolved address.

For example, a response produced after the change could look like this, with the A record listed ahead of the CNAME that points to it:

    ;; QUESTION SECTION:
    ;; www.example.com.       IN    A

    ;; ANSWER SECTION:
    cdn.example.com.    300   IN    A      198.51.100.1
    www.example.com.    3600  IN    CNAME  cdn.example.com.
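
The difference between the two assembly strategies can be sketched as follows. This is a simplified illustration of the behavior described above, not Cloudflare's actual code; the function names `assemble_old` and `assemble_new` and the tuple layout are assumptions made for the example.

```python
# Records are (name, ttl, rtype, value) tuples in this sketch.
CACHED_CNAMES = [("www.example.com.", 3600, "CNAME", "cdn.example.com.")]
FETCHED = [("cdn.example.com.", 300, "A", "198.51.100.1")]


def assemble_old(cached_cnames, fetched):
    """Earlier behavior as described: build a new list, CNAME chain first."""
    answer = list(cached_cnames)  # allocates a fresh list
    answer.extend(fetched)
    return answer


def assemble_new(cached_cnames, fetched):
    """Changed behavior as described: reuse the existing list and append."""
    fetched.extend(cached_cnames)  # no new list, but CNAMEs now come last
    return fetched


old_answer = assemble_old(CACHED_CNAMES, list(FETCHED))
new_answer = assemble_new(CACHED_CNAMES, list(FETCHED))
```

Both answers contain the same records. Only the order differs: `old_answer` starts with the CNAME, while `new_answer` starts with the A record, matching the response shown above.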

Many DNS client implementations, such as systemd-resolved, do not depend on record order. Others, such as the getaddrinfo function in glibc, process the resolution chain sequentially: they keep track of the name they expect next and move through the records in order, assuming that CNAME records appear before any final answers. When the CNAME arrives after the address record, such a client has already skipped the address by the time it learns the alias target. The incident has sparked discussion among users. One Redditor commented:

"On one hand, I greatly admire their thorough approach to post-mortems and their high engineering standards. But, I can't shake the feeling that they might lack proper testing protocols and a culture that recognizes the global impact of their changes."

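To make the client-side difference concrete, here is a minimal sketch of the two parsing styles. `resolve_sequential` is loosely modeled on the order-sensitive behavior attributed to glibc's getaddrinfo and `resolve_order_independent` on clients that index the records first; neither is the actual code of glibc or systemd-resolved, and all names are hypothetical.

```python
# Records are (name, ttl, rtype, value) tuples in this sketch.
CNAME_FIRST = [
    ("www.example.com.", 3600, "CNAME", "cdn.example.com."),
    ("cdn.example.com.", 300, "A", "198.51.100.1"),
]
CNAME_LAST = list(reversed(CNAME_FIRST))


def resolve_sequential(qname, answers):
    """Single forward pass that tracks the name expected next."""
    expected = qname
    addresses = []
    for name, _ttl, rtype, value in answers:
        if name != expected:
            continue  # skipped for good, even if it matches later
        if rtype == "CNAME":
            expected = value  # follow the alias
        elif rtype == "A":
            addresses.append(value)
    return addresses


def resolve_order_independent(qname, answers):
    """Index all records first, then walk the alias chain."""
    cnames = {n: v for n, _ttl, t, v in answers if t == "CNAME"}
    addrs = {}
    for name, _ttl, rtype, value in answers:
        if rtype == "A":
            addrs.setdefault(name, []).append(value)
    current, seen = qname, set()
    while current in cnames and current not in seen:  # loop guard
        seen.add(current)
        current = cnames[current]
    return addrs.get(current, [])
```

With `CNAME_FIRST`, both functions return the address. With `CNAME_LAST`, the sequential parser returns an empty list because it passes over the A record before it has seen the CNAME, while the order-independent parser still finds the address.
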
A lively debate unfolded on Hacker News, where users examined whether the RFC was indeed ambiguous regarding the subtle differences between RRsets and RRs in message sections or if the Cloudflare developers misinterpreted it. Patrick May notably pointed out:

"This is a classic illustration of Hyrum's Law: 'With a sufficient number of users of an API, it doesn't matter what you promise in the contract; all observable behaviors will be relied upon by someone.' This also reflects a failure to adhere to Postel's Law: 'Be conservative in what you send, be liberal in what you accept.'"

In response to the incident, Cloudflare has prepared an Internet-Draft to be discussed at the IETF, proposing an RFC that clearly outlines how CNAME records should be handled in DNS responses.

According to the timeline shared by Cloudflare, the global rollout of the change began on January 7, 2026 and had reached 90% of its servers by 17:40 UTC on January 8. The company acknowledged the incident and began reverting the change shortly afterwards, completing the rollback by 19:55 UTC the same day.

This incident raises important questions about the reliability of DNS standards and the responsibilities of companies managing such critical infrastructure. How can operators ensure that future updates do not lead to similar outages? What measures should be taken to strengthen testing protocols and communication within development teams?

Author: Duncan Muller
