Public Suffix Lookups Without Parsing the PSL

Tackling a New Challenge with the DNS

While working at SSE on the security-first DNS provider deSEC, I needed a quick way to look up the so-called public suffix of a given domain name. If the domain name is, say, amazon.co.uk, then the public suffix is co.uk. Such lookups are usually done by loading and parsing the Public Suffix List (PSL) and then matching the trailing labels of the domain name against the list, settling on the longest match.

As it turns out, this approach requires application awareness of the PSL and comes with significant maintenance overhead (more on that later). Let’s first understand both the benefits and the quirks of the PSL, and then see why it’s more complicated than it looks at first sight. Employing the DNS as a key-value store will lead to an elegant way out of the mess.

The Idea behind the PSL

The PSL has a wide range of applications, as there are several (usually security-related) scenarios in which applications or service providers need to make a policy decision based on the public suffix of the domain name at hand. A special case is when the domain name is itself a public suffix. For example, Certificate Authorities most likely would not want to issue a wildcard TLS certificate for the name *.co.uk, although a wildcard certificate for *.t.co (Twitter’s URL shortener service) may be perfectly fine.

Here are some common applications in which knowledge of a domain name’s public suffix is required:

In order to decide how exactly cookie scoping should be restricted across domains, browsers determine public suffixes. Some browsers also highlight a domain’s public suffix in the address bar to aid visual inspection and hamper phishing.
Certificate Authorities (CAs) checking for wildcard misissuance should not allow a wildcard certificate for a public suffix name such as *.co.uk.
Some CAs such as Let’s Encrypt limit the number of certificates that can be requested for a given domain, including subdomains (within a given interval, such as per week). CAs need to be aware of each domain’s public suffix in order to decide on which part of the domain the limit should be applied. For example, a certificate request for console.cloud.google.com should be counted towards the google.com name (2nd-to-last label) while a certificate request for www.google.co.uk would be counted towards google.co.uk (3rd-to-last label!).
The DMARC email authentication protocol, intended for fighting spam by validating the message origin, is configured via DNS records on the registrable domain (organizational domain in DMARC speak), i.e. the domain name whose direct parent is the domain name’s public suffix. For example, DMARC for the address admin@services.staff.example.org is configured as a DNS record under example.org (to be precise: as a TXT record at _dmarc.example.org). DMARC validators need to know where to look for that record: Should they use _dmarc.staff.example.org or _dmarc.example.org? The PSL answers this question.
As a consequence of a little-known subtlety in the specification of the DNS, DNS providers, too, need to be aware of which names are public suffixes.
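The rate-limiting and DMARC cases above boil down to the same small computation: given a name and its public suffix, keep exactly one more label. Here is a minimal sketch in Python. The function names are made up for illustration, and the public suffix is passed in by the caller rather than looked up.

```python
from typing import Optional


def registrable_domain(name: str, public_suffix: str) -> Optional[str]:
    """Return the public suffix plus one label, or None if `name` is the suffix itself."""
    name_labels = name.lower().rstrip(".").split(".")
    suffix_labels = public_suffix.lower().rstrip(".").split(".")
    if name_labels[-len(suffix_labels):] != suffix_labels:
        raise ValueError(f"{public_suffix!r} is not a suffix of {name!r}")
    if len(name_labels) == len(suffix_labels):
        return None  # the name is a public suffix; nothing is registrable here
    return ".".join(name_labels[-(len(suffix_labels) + 1):])


def dmarc_record_name(email_domain: str, public_suffix: str) -> str:
    """Name of the TXT record holding the DMARC policy of the organizational domain."""
    org_domain = registrable_domain(email_domain, public_suffix)
    if org_domain is None:
        raise ValueError(f"{email_domain!r} is a public suffix")
    return "_dmarc." + org_domain


print(registrable_domain("console.cloud.google.com", "com"))   # google.com
print(registrable_domain("www.google.co.uk", "co.uk"))         # google.co.uk
print(dmarc_record_name("services.staff.example.org", "org"))  # _dmarc.example.org
```

The hard part, of course, is obtaining the public suffix in the first place, which is what the rest of this article is about.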

I first encountered the need for ad-hoc PSL lookups while working on the deSEC DNS hosting platform, and so I would like to devote a separate section to this last use case. If you are not interested in the intricacies of setting up a DNS platform, you can skip the next section.

Deep Dive: The Relevance of Public Suffixes for a DNS Provider

Here’s why: In DNS, it is possible to store conflicting information on the same name server. For example, one customer could register the zone example.com and create a DNS record for the subdomain www with IP address 1.2.3.4, while another can register the zone www.example.com and create a record there, with IP address 6.6.6.6. Each customer "owns" a part of the global DNS tree, but the two overlap. If this situation occurs, RFC 1034 Sec. 4.3.2 prescribes that the most specific zone (subtree) wins.

In other words, DNS queries for www.example.com are answered with 6.6.6.6, which might not be what the owner of example.com was expecting. (Such subzone takeovers may include dangerous names such as _acme-challenge.www.example.com, allowing the attacker to obtain a TLS certificate for the parent name. You can find some real world cases in a paper I wrote in 2018.)
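The "most specific zone wins" rule can be illustrated with a toy model. This is only a sketch of the zone selection step: the data layout and function names are invented for the example, and a real name server does far more than this.

```python
from typing import Dict, Optional

# Two customers' zones on the same server, each mapping record names to an IP address.
ZONES: Dict[str, Dict[str, str]] = {
    "example.com": {"www.example.com": "1.2.3.4"},
    "www.example.com": {"www.example.com": "6.6.6.6"},
}


def closest_zone(qname: str, zones: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Pick the zone that is the nearest ancestor of (or equal to) the query name."""
    labels = qname.lower().rstrip(".").split(".")
    for i in range(len(labels)):  # longest candidate first
        candidate = ".".join(labels[i:])
        if candidate in zones:
            return candidate
    return None


def answer(qname: str, zones: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Answer from the most specific zone only; less specific zones are never consulted."""
    zone = closest_zone(qname, zones)
    if zone is None:
        return None
    return zones[zone].get(qname)


print(answer("www.example.com", ZONES))  # 6.6.6.6
```

The record that the owner of example.com created for www is still stored, but it is shadowed by the more specific zone.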

To avoid this problem, we introduced a check at deSEC to reject zone registrations for domain names if any parent domain name is owned by another user. — Great, we’re safe then, aren’t we?

Unfortunately, no. With this check in place, a malicious (or imprudent) user may register co.uk and, as a consequence, cause all other users to be blocked from registering their legitimate <something>.co.uk names: the security check would interpret such registrations as a hijacking attempt. We find ourselves in a catch-22: it seems that we have no choice but to either allow a malicious user to hijack another customer’s subdomains, or — with the security check in place — to allow them to occupy a large chunk of the DNS tree, such as co.uk!

This is where the PSL comes in: unless you’re a registry and managing a public suffix (such as a top level domain), there is no legitimate use case for any user to register a domain that is a public suffix itself. After all, domain owners purchase a domain name under a public suffix, and then need DNS services for that registrable domain, not for the public suffix. Thus, barring a few very special cases, it is safe to reject registrations of domain names which are public suffixes themselves.

By combining the two security checks (rejecting both subzone registration and public suffix registration), safe operation of the DNS service is ensured. But it comes at a price: At deSEC, we need to perform a public suffix lookup for each domain name for which registration is attempted.
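As a rough sketch, the two checks could be combined like this. It is not deSEC's actual code: the function names, the ownership dictionary, and the stand-in for the public suffix test are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple


def ancestors(name: str) -> List[str]:
    """All proper parent names, e.g. a.b.c -> [b.c, c]."""
    labels = name.split(".")
    return [".".join(labels[i:]) for i in range(1, len(labels))]


def check_registration(
    name: str,
    user: str,
    zone_owners: Dict[str, str],
    is_public_suffix: Callable[[str], bool],
) -> Tuple[bool, str]:
    name = name.lower().rstrip(".")
    # Check 1: nobody (except a registry) needs a zone for a public suffix itself.
    if is_public_suffix(name):
        return False, "name is a public suffix"
    # Check 2: no zone may be created underneath another user's zone.
    for parent in ancestors(name):
        owner = zone_owners.get(parent)
        if owner is not None and owner != user:
            return False, f"parent zone {parent} belongs to another user"
    return True, "ok"


# Stand-in for a real public suffix lookup, for demonstration purposes only.
KNOWN_SUFFIXES = {"com", "uk", "co.uk"}
owners = {"example.co.uk": "alice"}

print(check_registration("co.uk", "mallory", owners, KNOWN_SUFFIXES.__contains__))
print(check_registration("www.example.co.uk", "mallory", owners, KNOWN_SUFFIXES.__contains__))
print(check_registration("shop.co.uk", "bob", owners, KNOWN_SUFFIXES.__contains__))
```

The first two attempts are rejected (one by each check), while the third goes through because co.uk can never be owned by a regular user.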

Practical Complications

All of the above types of applications, and possibly even more, need to perform lookups in the PSL. The list, ever changing, is now over 200 KB in size (there are thousands of public suffixes), and only loosely structured. The usual approach is to distribute the official PSL file in text format along with the application itself, and update it once in a while.

This poses several problems:

Parsing the PSL is not trivial. For example, it supports wildcards as well as exceptions: all direct children of kawasaki.jp are declared as public suffixes with the PSL entry *.kawasaki.jp, but city.kawasaki.jp is exempt (!city.kawasaki.jp).
When a given domain name is matched by several entries, only the longest one is the public suffix: the public suffix of some-bucket.s3.amazonaws.com is s3.amazonaws.com (and not com — but both are on the list).
In light of these parsing issues, applications need to transform the PSL text file into a suitable representation in which lookups are efficient (typically a tree structure; a good choice requires some expertise).
Also, a storage solution is required. It’s easiest to simply store the PSL file in the file system, but with the above considerations in mind, it may not be the best solution.
Developers have to come up with some sort of update mechanism. Do random updates at deployment time suffice?
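To give an impression of what even a bare-bones implementation has to handle, here is a sketch of parsing and matching on a tiny hand-written excerpt in PSL syntax. It follows the matching rules as published with the list (exceptions win, otherwise the longest match, with an implicit catch-all for unlisted TLDs), but it skips internationalized names, the ICANN/private split, and any efficient data structure. All names in the code are made up for this example.

```python
from typing import Set

PSL_EXCERPT = """
// A tiny excerpt in PSL syntax; the real list has thousands of entries.
com
uk
co.uk
jp
*.kawasaki.jp
!city.kawasaki.jp
s3.amazonaws.com
"""


def parse_rules(text: str) -> Set[str]:
    rules = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("//"):
            continue  # skip blank lines and comments
        rules.add(line.split()[0].lower())
    return rules


def public_suffix(domain: str, rules: Set[str]) -> str:
    labels = domain.lower().rstrip(".").split(".")
    # Exception rules take priority; the suffix is the exception minus its first label.
    for i in range(len(labels)):
        if "!" + ".".join(labels[i:]) in rules:
            return ".".join(labels[i + 1:])
    # Otherwise the longest matching rule wins (candidates are tried longest first).
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        wildcard = ".".join(["*"] + labels[i + 1:])
        if candidate in rules or wildcard in rules:
            return candidate
    return labels[-1]  # implicit "*" rule for TLDs that are not listed


RULES = parse_rules(PSL_EXCERPT)
print(public_suffix("some-bucket.s3.amazonaws.com", RULES))  # s3.amazonaws.com
print(public_suffix("foo.bar.kawasaki.jp", RULES))           # bar.kawasaki.jp
print(public_suffix("www.city.kawasaki.jp", RULES))          # kawasaki.jp
```

Even this toy version needs three different code paths, and it still leaves storage and updates unsolved.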

PSL users have to worry about all of this. While the above problems have been solved several times in various implementations for different browsers, programming languages, etc., there is no one-stop solution so far.

That leads to the obvious question: Is there a better, more generic approach?

Solution: Mapping the PSL onto the DNS

The DNS, being a domain-based key-value store, is well suited to serve as a directory for domain-related lookups. Both keys and values are rather arbitrary: the constraints are only that the key must be a valid (sub-)domain name, and the value must fit one of the established DNS record types (e.g. A for IP addresses, TXT for strings, or PTR for mapping a key onto another domain name). In recent years, a few interesting DNS-based applications have evolved, such as storage of TLS public keys using TLSA records (DANE).

The suffixes listed in the PSL qualify as proper DNS names (keys), with the exception of wildcard exception rules (let’s worry about that later). It is thus possible to map the PSL onto the DNS structure itself, forming a tree structure, and then "mount the PSL" as a subtree somewhere in the DNS. We chose to use the domain publicsuffix.zone as our home, and use query.publicsuffix.zone as the PSL mount point.

To represent PSL information, we use PTR records and encode all public suffixes as the values of such records. Keys are set up such that one can simply take any domain name and query a PTR record for that name, with .query.publicsuffix.zone appended. The zone contains crafty CNAME redirects and nifty wildcard configurations, such that the DNS lookup eventually arrives at a PTR record which does indeed point to the public suffix of the domain of interest. To ensure authenticity, we use DNSSEC (as is the case for all deSEC-managed domains).

As an example, let’s figure out the public suffix of www.google.co.uk. Here’s what we have configured under query.publicsuffix.zone:

There is a PTR record at co.uk.query.publicsuffix.zone with a value of co.uk. This is the record that the query reply will be expected to contain.
We also have configured a CNAME redirect record for all domains under this suffix, pointing one level up in the DNS hierarchy: *.co.uk.query.publicsuffix.zone CNAME co.uk.query.publicsuffix.zone.

The lookup then proceeds as follows:

We send a PTR query for www.google.co.uk.query.publicsuffix.zone.
The DNS resolution process will encounter the CNAME record at *.co.uk.query.publicsuffix.zone and redirect the question to co.uk.query.publicsuffix.zone.
The PTR record at co.uk.query.publicsuffix.zone will be returned, yielding the answer co.uk. Voilà! 🎉
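The three steps can be replayed offline with a toy resolver. The zone dictionary below is a hypothetical model of the two records just described, not a dump of the real zone, and the wildcard handling is simplified: it only treats names that own a record as existing, which is enough for this example.

```python
from typing import Dict, Optional, Tuple

MOUNT = "query.publicsuffix.zone"

# name -> (record type, value); modelled on the two records described above
ZONE: Dict[str, Tuple[str, str]] = {
    f"co.uk.{MOUNT}": ("PTR", "co.uk"),
    f"*.co.uk.{MOUNT}": ("CNAME", f"co.uk.{MOUNT}"),
}


def find_record(name: str, zone: Dict[str, Tuple[str, str]]) -> Optional[Tuple[str, str]]:
    """Exact match first; otherwise the wildcard below the closest existing ancestor."""
    if name in zone:
        return zone[name]
    labels = name.split(".")
    for i in range(1, len(labels)):
        ancestor = ".".join(labels[i:])
        if ancestor in zone:
            return zone.get(f"*.{ancestor}")
    return None


def lookup_ptr(domain: str, zone: Dict[str, Tuple[str, str]], max_hops: int = 8) -> Optional[str]:
    name = f"{domain}.{MOUNT}"
    for _ in range(max_hops):  # guard against CNAME loops
        record = find_record(name, zone)
        if record is None:
            return None
        rtype, value = record
        if rtype == "PTR":
            return value
        name = value  # CNAME: restart the lookup at the target name
    return None


print(lookup_ptr("www.google.co.uk", ZONE))  # co.uk
```

In the real DNS, the resolver performs the wildcard matching and the CNAME chase on its own, so the client only ever sends a single PTR query.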

PSL wildcard rules are accommodated effortlessly with this approach: we simply prepend the wildcard label *. to the value of the PTR record. Wildcard exceptions are taken care of by adding explicit records at the domain name representing the exception rule, cutting the corresponding DNS subtree out of the wildcard’s sphere of influence. Consider the following example:

*.kawasaki.jp is a wildcard public suffix, and city.kawasaki.jp is an exception.
As usual, we define a PTR record at *.kawasaki.jp.query.publicsuffix.zone for the wildcard public suffix, with value *.kawasaki.jp.
To take care of the exception, we set an explicit PTR record at city.kawasaki.jp.query.publicsuffix.zone, overriding the wildcard rule with an explicit PTR value of kawasaki.jp (per the PSL algorithm, an exception rule yields the rule name minus its leftmost label).
Finally, we configure a CNAME record at *.city.kawasaki.jp.query.publicsuffix.zone, pointing one level up.
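Put together, the three kinds of PSL rules translate into DNS records roughly as follows. This is a reconstruction from the description above, not the generator actually used for publicsuffix.zone, and the real zone may contain further records. In particular, the PTR value for an exception rule is computed here as the rule minus its leftmost label, as the PSL algorithm defines it; treat that as an assumption of the sketch.

```python
from typing import List, Tuple

MOUNT = "query.publicsuffix.zone"
Record = Tuple[str, str, str]  # (owner name, record type, value)


def records_for_rule(rule: str) -> List[Record]:
    if rule.startswith("!"):
        # Exception rule: an explicit PTR cuts this subtree out of the wildcard.
        name = rule[1:]
        suffix = name.split(".", 1)[1]  # assumption: exception minus its leftmost label
        return [
            (f"{name}.{MOUNT}", "PTR", suffix),
            (f"*.{name}.{MOUNT}", "CNAME", f"{name}.{MOUNT}"),
        ]
    if rule.startswith("*."):
        # Wildcard rule: a DNS wildcard PTR whose value keeps the "*." prefix.
        return [(f"{rule}.{MOUNT}", "PTR", rule)]
    # Plain rule: a PTR at the suffix, plus a wildcard CNAME pointing one level up.
    return [
        (f"{rule}.{MOUNT}", "PTR", rule),
        (f"*.{rule}.{MOUNT}", "CNAME", f"{rule}.{MOUNT}"),
    ]


for rule in ["co.uk", "*.kawasaki.jp", "!city.kawasaki.jp"]:
    for owner, rtype, value in records_for_rule(rule):
        print(f"{owner} {rtype} {value}")
```

The explicit record at city.kawasaki.jp works because a DNS wildcard never applies to a name that exists in the zone, nor to anything below such a name.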

Things Worth Considering

Generic and platform-independent. The DNS-based PSL lookup solution solves all of the problems mentioned above. Applications only need the ability to perform DNS queries (which should almost always be the case). Parsing of the PSL is not required on the application layer; no decisions on storage or internal representation need to be made. Also, PSL information in the DNS is always up to date: we propagate changes from the official list on a daily basis.
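For illustration, a lookup without any PSL-specific library might look like the sketch below. It assumes the third-party dnspython package for the actual query (imported lazily, so the rest of the code works without it); the function names are made up. The expansion of a wildcard answer such as *.kawasaki.jp into a concrete suffix is my reading of the scheme described above, so treat it as an assumption.

```python
from typing import Callable, List

MOUNT = "query.publicsuffix.zone"


def to_ascii(domain: str) -> str:
    """Normalize to lowercase ASCII (Punycode for internationalized names)."""
    return domain.rstrip(".").lower().encode("idna").decode("ascii")


def resolve_ptr(qname: str) -> List[str]:
    """Default resolver backend; requires dnspython (pip install dnspython)."""
    import dns.resolver

    answer = dns.resolver.resolve(qname, "PTR")
    return [rdata.target.to_text(omit_final_dot=True) for rdata in answer]


def get_public_suffix(domain: str, resolve: Callable[[str], List[str]] = resolve_ptr) -> str:
    ascii_domain = to_ascii(domain)
    targets = resolve(f"{ascii_domain}.{MOUNT}.")
    if not targets:
        raise LookupError(f"no PTR record found for {domain!r}")
    target = targets[0].rstrip(".").lower()
    if target.startswith("*."):
        # Wildcard suffix: keep as many trailing labels of the domain as the rule has.
        keep = target.count(".") + 1
        return ".".join(ascii_domain.split(".")[-keep:])
    return target


# Offline demonstration with a canned resolver; drop the argument for a live query.
canned = {
    f"www.google.co.uk.{MOUNT}.": ["co.uk"],
    f"www.foo.kawasaki.jp.{MOUNT}.": ["*.kawasaki.jp"],
}
print(get_public_suffix("www.google.co.uk", resolve=canned.__getitem__))
print(get_public_suffix("www.foo.kawasaki.jp", resolve=canned.__getitem__))
```

Making the resolver a parameter keeps the logic testable offline and lets you plug in whatever DNS client your environment already provides. For production use, the resolver should validate DNSSEC, since authenticity of the answer depends on it.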

Library support. For convenience, there is a Python library (psl-dns) that comes with a handy interface to "simply answer your question", such as psl.is_public_suffix('city.kawasaki.jp') or psl.get_public_suffix('some-bucket.s3.amazonaws.com'). The library provides some extra convenience: For internationalized domain names, it will preserve the Unicode vs. Punycode format choice between question and answer. For those who can’t get enough, it also allows listing all PSL rules pertinent to a domain name using psl.get_rules() (there may be several override rules in certain wildcard configurations). For information on library support for other languages, check out the documentation at publicsuffix.zone. In any case, the use of a library is by no means necessary: if you like to stick to the KISS principle or if your language is not supported, simple DNS lookups will get you there as well.

PTR record support. Some DNS resolvers (especially those run by consumer Internet access providers) do not support PTR queries. The problem can be worked around by using another resolver (such as 8.8.8.8), or by querying the ANY record type instead (which will return all records for the given name, including PTR). When asking for ANY records, you may find that the response will sometimes contain additional TXT records. Those are purely informational and represent PSL rules that cover the domain name, but were overridden by some other rule (again, this can happen in case of wildcard exceptions). Unless you are interested in these no-op rules, you can ignore them completely.

Edge cases. Domain names are limited to 255 octets in DNS wire format, which allows at most 127 labels (126 dots), each up to 63 octets long. Appending .query.publicsuffix.zone to the name of interest costs 3 labels (24 octets), so names that come close to these limits cannot be looked up. This will not usually be an issue. [Update 02/2022: The remainder of this paragraph no longer applies, as the PSL specification has been updated to allow wildcards only at the first label.] Furthermore, the PSL specification allows rules with inline wildcards, such as inline.*.wildcard.test. Such constructs cannot be mapped onto the DNS, as DNS requires wildcards to be in the leftmost position. However, the PSL does not currently contain any actual rules of this kind, and PSL maintainers are planning to drop inline wildcard support entirely. Consequently, this edge case is currently not a practical issue, and it most likely never will be.
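If you want to reject over-long names before sending a query, the length constraint can be checked up front. The sketch below uses the limits from RFC 1035 (255 octets per name in wire format, 63 octets per label) and assumes the input is already in ASCII (Punycode) form; the function names are invented for this example.

```python
from typing import List

MOUNT_LABELS = ["query", "publicsuffix", "zone"]
MAX_NAME_OCTETS = 255   # RFC 1035 limit on a whole name in wire format
MAX_LABEL_OCTETS = 63   # RFC 1035 limit on a single label


def wire_length(labels: List[str]) -> int:
    """Each label costs one length octet plus its bytes; the root label adds one octet."""
    return sum(len(label) + 1 for label in labels) + 1


def fits_in_lookup(domain: str) -> bool:
    labels = domain.rstrip(".").split(".")
    if any(not 1 <= len(label) <= MAX_LABEL_OCTETS for label in labels):
        return False
    return wire_length(labels + MOUNT_LABELS) <= MAX_NAME_OCTETS


print(wire_length(MOUNT_LABELS) - 1)                # octets consumed by the appended labels
print(fits_in_lookup("www.google.co.uk"))           # an everyday name
print(fits_in_lookup(".".join(["a"] * 115)))        # 115 one-letter labels
print(fits_in_lookup(".".join(["a"] * 116)))        # one label more
```

With one-letter labels, the octet limit is reached at 115 labels, so the encoded length tends to be the binding constraint rather than the label count.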

Privacy. It is clear that when querying a DNS service, the involved DNS service provider(s) will learn about the names queried. It would thus be a bad idea if a browser vendor decided to query our PSL service for cookie policing, as this would expose important aspects of their users’ browsing activities to SSE and deSEC (the operators of publicsuffix.zone) as well as other parties (when using a resolver). In other contexts, such as during certificate issuance checks, the concern is lessened and a DNS query may be acceptable: the subject names of newly issued TLS certificates are publicly logged anyway, and a DNS query leaks the same information. Privacy concerns thus depend on the specific use case and must be evaluated on a case-by-case basis. In any case, we do not operate the service to collect data (and in fact we do not keep query logs). We are open to providing full copies of the PSL zone to parties who would like to run a private on-site deployment of the service. Just get in touch!

Conclusion

Knowing the public suffix of a given domain name is an important piece of information in several widespread applications. However, keeping an up-to-date copy of the list and parsing it correctly is a challenging task from which many issues arise.

Looking up public suffixes by utilizing a DNS-based representation of the PSL solves these issues, at the same time demonstrating that there are still novel applications to the DNS. While it may be surprising to some, I would like to make the case that this is a great illustration of how DNS is exactly the right technology to tackle the issue of storing information that is closely related to domain names: decentralized, ubiquitous, and, with DNSSEC, authentic.

This service has been made possible through the continued contribution of SSE to the security-first, free, and open-source DNS hosting service deSEC. It was while working on deSEC that I first became aware of the need for ad-hoc PSL lookups. We are glad to have found a low-maintenance solution for our own needs that at the same time we can provide as a free, public service.

Dr. Peter Thomassen

Peter is passionate about creating security solutions for both individual enterprises and the Internet infrastructure in general. He has significant experience in designing Internet protocols and software systems.