Surveying public keys used on the Internet

by Daniel Lenski. Posted on Jan 23, 2025

“Draw me an abstract image of millions of cryptographic keys being gathered from across the Internet, all around the globe,” by DALL-E 4.

🎬 YouTube
A discussion with the author about these findings on SandboxAQ’s channel.

Introduction

Algorithms based on public key cryptography are fundamental to the security of modern networks, in which they are used for proving identity and authenticity, and for agreeing on secure end-to-end symmetric keys in cryptographic protocols such as TLS and SSH.

In order to establish identity and to authenticate peers communicating on the Internet, protocols like TLS and SSH use long-term pairs of public and private keys.¹ These long-term public keys are the subject of this survey. Modern versions of TLS and SSH still require these long-term key pairs for peer authentication, even though they do not use them for key agreement.²

The main asymmetric cryptographic algorithms used today with long-term keys are RSA (first introduced in 1978) and ECC (widely introduced in the late 1990s). Although neither is resistant to attacks by large quantum computers, in the absence of post-quantum cryptography standards both are considered to be highly secure when used correctly and with appropriate parameters and sufficient key lengths. The CNSA 1.0 suite advanced by the US government in 2016 specifies a minimum key size of 3072 bits for RSA, and ECC algorithms with a key size of 384 bits.

The security of RSA depends on the difficulty of factoring very large integers into the product of two prime numbers. The past 40 years have seen many advances in computational power generally as well as in general-purpose factorization algorithms such as GNFS. Additionally, techniques such as batch GCD and Fermat factorization have proved highly effective in cases where RSA primes are chosen with insufficient randomness. As of 2024, it’s possible to factor a 256-bit RSA key on a laptop in a minute or two using msieve, and to factor a 512-bit RSA key in a few hours and at a cost of less than US$100 using cloud computing services and tools such as eniac/faas.
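
To make the "insufficient randomness" failure mode concrete, here is a minimal Python sketch of Fermat factorization. It is an illustration only, not the tooling used in this survey; the function name `fermat_factor` and the `max_steps` cutoff are our own assumptions. The method succeeds quickly only when the two factors of the modulus lie close together, which is what happens when a generator picks the second prime too near the first.

```python
from math import isqrt


def fermat_factor(n, max_steps=1_000_000):
    """Try to split an odd n = p * q whose factors are close together.

    Returns (p, q) with p <= q, or None if no non-trivial split is found
    within max_steps iterations.
    """
    if n % 2 == 0:
        return (2, n // 2)
    # Start at ceil(sqrt(n)) and look for a with a^2 - n a perfect square.
    a = isqrt(n)
    if a * a < n:
        a += 1
    for _ in range(max_steps):
        b2 = a * a - n
        b = isqrt(b2)
        if b * b == b2:
            # n = a^2 - b^2 = (a - b) * (a + b)
            if a - b == 1:
                return None  # only the trivial split 1 * n exists
            return (a - b, a + b)
        a += 1
    return None


# Two nearby odd numbers standing in for poorly chosen RSA primes.
print(fermat_factor(1000003 * 1000033))
```

For properly generated RSA primes the gap between the factors is astronomically large, so the loop would never terminate in practice; the cutoff simply makes the sketch give up.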

Because of advances in both cryptanalysis and computing power in the past decades, recommendations for safe RSA key sizes have been steadily increasing. In 2002, RSA Labs recommended using key sizes of at least 1024 bits, as of 2015 NIST recommends at least 2048 bits, and as of 2022 CNSA requires 3072 bits.

Previous research on long-term keys used on the Internet (including Bernstein, Heninger, and Lange 2012, Heninger et al. 2012, Lenstra et al. 2012, Barbulescu et al. 2016, and Cryptosense 2016) found substantial numbers of RSA keys which could be factored easily using the methods described above. Heninger et al. (2012) largely attributed this to insufficient entropy and consequent poor random number generation, especially on small headless/embedded systems like home routers.

At SandboxAQ, we have undertaken a new survey of public keys used on the Internet in 2024.

How we did it

Over the course of several weeks, we gathered approximately 20 million unique public keys from multiple publicly available sources to observe their usage on the Internet:

- ~200K public keys from DNSKEY records of well-known domains
- ~600K public host keys from Internet-accessible SSH servers
  - Where servers had multiple host keys (e.g. Ed25519, RSA, and DSA), we attempted to record them all.
- ~2.7M public keys from the leaf certificates of Internet-accessible TLS servers (primarily on TLS-dedicated TCP ports such as 443 for HTTPS, but also including servers that require STARTTLS mechanisms)
  - Where servers primarily offer certificates with ECC keys, but will offer RSA to older clients, we attempted to record both.
- ~5M SSH public keys from users of GitHub as well as GitLab and other public servers using the GitLab API (using techniques similar to those in Lenstra et al. 2012 and Cryptosense 2016)
- ~11M public keys from certificates logged in recent Certificate Transparency logs and typically destined for use with TLS

There’s a large amount of double-counting of keys from TLS certificates, due to the common practice of reusing public keys when renewing certificates, and for related servers and domains.

We took care to record multiple keys associated with a given entity as described above, as well as to capture public keys even when they were found in obsolete or broken formats, and recorded them in a database that normalizes their format across sources. One of our key innovations, relative to scanning software such as zgrab2³, is that we emulated both modern and older clients in order to induce servers to offer keys and certificates associated with older versions and algorithms.

In order to check for matches against keys used elsewhere, we also loaded 791,515 RSA and DSA keys generated with insufficient randomness by OpenSSL and OpenSSH in Debian and its derivatives due to a 2006-2008 bug, and cataloged at hdm.io/tools/debian-openssl and github.com/HARICA-official/debian-weak-keys.

What we found

About 11.5M of the unique public keys we found were RSA keys, while the remaining 6M were primarily ECC keys (including Ed25519 and Ed448), with around 43,000 DSA keys.

Of the RSA keys, the overwhelming majority are 2048 bits long, while only around 120,000 keys are 1024 bits long or smaller. About 98% of the RSA keys use the de facto standard public exponent of 65537, while about 1.7% use the public exponent 35 (as generated by OpenSSH prior to version 5.4), 0.2% use the public exponent 37 (as generated by PuTTY prior to version 0.77), and very few use other values (see the chart below for further examples from the long tail of RSA keys with unusual exponents).
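
For readers who want to inspect these parameters in their own keys, the following Python sketch pulls the public exponent and modulus out of an OpenSSH-format `ssh-rsa` public key line, using the length-prefixed field layout of the SSH wire format. The function name `parse_ssh_rsa_public_key` is our own, and this is a simplified illustration rather than the normalization code used for the survey database.

```python
import base64
import struct


def parse_ssh_rsa_public_key(line):
    """Return (exponent, modulus) from an 'ssh-rsa AAAA... comment' line."""
    fields = line.split()
    if len(fields) < 2 or fields[0] != "ssh-rsa":
        raise ValueError("not an ssh-rsa public key line")
    blob = base64.b64decode(fields[1])

    def read_string(offset):
        # Each field is a 4-byte big-endian length followed by that many bytes.
        (length,) = struct.unpack_from(">I", blob, offset)
        start = offset + 4
        return blob[start:start + length], start + length

    key_type, pos = read_string(0)
    if key_type != b"ssh-rsa":
        raise ValueError("unexpected key type inside blob")
    e_bytes, pos = read_string(pos)  # public exponent
    n_bytes, pos = read_string(pos)  # modulus
    return int.from_bytes(e_bytes, "big"), int.from_bytes(n_bytes, "big")
```

Given such a line, `e, n = parse_ssh_rsa_public_key(line)` followed by `n.bit_length()` yields the two values discussed above: the public exponent and the key size in bits.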

Of the ECC keys, the overwhelming majority use either the NIST P-256 curve (via ECDSA in TLS certificates) or Curve25519 (via Ed25519/EdDSA in SSH host or user keys). A very small number use Ed448 in SSH host keys, or the brainpoolP256r1 curve in TLS certificate keys.

Of the DSA keys, more than 99% are of 1024 bit length, but around 15 out of 43,000 are of 512 or 640 bit length.

[Chart not reproduced. Legend categories: easily crackable key size and/or parameters that suggest improper key generation; Ed25519/Ed448; RSA; EC with NIST or Brainpool curves; DSA.]

Cryptographic weaknesses we observed

We used a Rust batch-GCD implementation to attempt factorization of the moduli of the millions of RSA public keys that we had gathered.

First, the good news! Unlike earlier research from 2012-2016, we found no RSA keys at all that could be easily factored into large primes (>40% of the total key length) using batch-GCD (excluding the infamous Debian weak keys which do in some cases share prime factors due to the insufficient entropy used in generating them).
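
For illustration, batch GCD can be sketched in a few lines of Python using the product-tree and remainder-tree formulation. This is not the Rust implementation used in the survey; the function name `batch_gcd` and the toy moduli are our own assumptions, and a production tool would need to handle memory and parallelism for millions of large moduli.

```python
from math import gcd, prod


def batch_gcd(moduli):
    """For each modulus n, return gcd(n, product of all the other moduli)."""
    # Product tree: level 0 is the input; each higher level multiplies pairs.
    tree = [list(moduli)]
    while len(tree[-1]) > 1:
        level = tree[-1]
        tree.append([prod(level[i:i + 2]) for i in range(0, len(level), 2)])

    # Remainder tree: push the full product P back down, reducing mod n^2.
    remainders = tree.pop()
    while tree:
        level = tree.pop()
        remainders = [remainders[i // 2] % (n * n) for i, n in enumerate(level)]

    # (P mod n^2) // n equals (P / n) mod n, so its gcd with n exposes
    # any prime factor that n shares with another modulus in the batch.
    return [gcd(r // n, n) for r, n in zip(remainders, moduli)]


# Toy moduli: the first two share the prime 101, the third shares nothing.
print(batch_gcd([101 * 103, 101 * 107, 109 * 113]))
```

A result of 1 means the modulus shares no factor with any other key in the batch. A result equal to the modulus itself means the same modulus appears more than once, so duplicates should be removed before running the batch.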

Secondly, the conception and deployment of Certificate Transparency, spearheaded by Google in the early 2010s, appears to have succeeded in its intended effect of motivating public Certificate Authorities to scrutinize the certificates that they sign. We found zero weaknesses or oddities in millions of keys from recent CT logs.

Debian weak keys

Although the bug that caused the Debian weak keys to be generated existed only in 2006-2008, and although there was a concerted effort to identify and eliminate usage of these keys, some of them are still found on the Internet in 2024!

We identified 5 users of Git-based hosting sites using Debian weak keys as their SSH user keys, including at least two highly active users. This means that their identities can be trivially forged on these platforms.

We also found 12 Internet-accessible SSH servers using Debian weak keys as their host keys. This means that these hosts’ identities could be trivially forged by a MITM attack between the client and the server, and traffic could be intercepted.

Finally, we found 14 Internet-accessible TLS servers using Debian weak keys in their TLS certificates. All of these are self-signed. As with the SSH servers, these servers’ identities could be trivially forged by a MITM attack, and traffic could be intercepted.

💡 In two of the cases of SSH server host keys, the servers have both a DSA key and an RSA key which are Debian weak keys, and which were generated using consecutive process IDs, thus very clearly demonstrating the insufficient and predictable entropy that caused this bug.

Keys that are so small that they are broken

We found 4 users of Git-based hosting sites using 256-bit RSA keys as their SSH user keys. These keys are so small that they can be trivially factored, and the private key recovered.

We also found approximately 500 cases of 512-bit RSA keys (as SSH user keys, as SSH server host keys, as TLS certificate keys, and as DNSKEYs) and 14 cases of 512-bit DSA keys. These keys are so small that they can be factored at a cost of less than US$100 each using cloud computing services and tools such as eniac/faas; a researcher recently gained administrative access to a UK-based energy provider by factoring its 512-bit RSA key for a cost of $70. In the case of the TLS servers, many of these appear to be abandoned, unpatched, or unmonitored servers, using long-obsolete versions of the TLS protocol. In the case of the DNSKEYs, a number of reputable companies, educational institutions, and government agencies from around the world are using 512-bit RSA DNSKEYs. This corroborates other recent research which found many 512-bit RSA keys used as DKIM keys (DKIM also stores public keys in DNS records, but for the purpose of email origin authentication).

We also found around 50 TLS and SSH servers with 768-bit RSA keys, and one 768-bit SSH user key. As in the case of the 512-bit keys, many of these TLS servers appear to be forgotten or abandoned. This RSA key size is possible to factor with current technology, but the time and expense are much higher than for 512-bit keys.

Keys that are not future-proof

While the largest general-form semiprime integer known to have been successfully factored is 829 bits in length (RSA-250), experts such as Bruce Schneier believe that it is plausible that it will be technically and economically feasible to factor 1024-bit integers – and thus to break 1024-bit RSA keys – within the coming years.

Despite this foreseeable insecurity, 1024-bit RSA (and DSA) keys remain extremely widespread in the real world. We found thousands of cases of 1024-bit RSA and DSA keys as SSH user keys and as SSH server host keys. We found thousands of cases of 1024-bit RSA keys as DNSKEYs (including for prominent government agencies in the USA and EU). We also found thousands of cases of 1024-bit RSA keys as TLS server keys; while many are self-signed certificates, we did find more than 200 websites from the Chrome Top Million list with 1024-bit RSA keys, including several which offer certificates with ECC keys to modern clients but will fall back to 1024-bit RSA keys for older clients.

Notably, we observed zero recent Certificate Transparency log entries for certificates with RSA keys of less than 2048 bits; as described above, CT appears to be doing a good job of motivating certificate authorities to set safe minimum standards and to scrutinize the certificates that they sign.

⚠️ Warning
OpenSSH continues to support the generation of new 1024-bit RSA keys as of 2024, and OpenSSL still supports the generation of new 512-bit RSA keys. Neither is the default, but both will do it without any clear warning messages.

Keys that were not generated using best practices

We observed hundreds of SSH user keys and DNSKEYs with RSA moduli having small factors, ranging from common cases of 3 or 5 or 13 up to one particular 2048-bit modulus with small prime factors of 2 (!), 3, 0x259, and 0x8f59, for a total of 27 bits of factors. The immediate reduction in security of the key is relatively minor (because the remaining unfactored 2021 bits may still be a product of two 1010-bit primes), but it does indicate that whatever tool generated this key did not do a good job of picking prime numbers: it evidently did not perform even the trial division that would have discovered the small factors.

Notably, we only observed these RSA keys with small factors as SSH user keys for GitLab and as DNSKEYs; we never observed them as TLS keys. After earlier work by Cryptosense highlighted the existence of user keys with small factors on GitHub, GitHub implemented checks for small RSA factors when a new SSH user key is enrolled; we were able to empirically verify that GitHub now rejects any user key whose modulus has a factor of 3 or 5, at least.

Importantly, in our use of batch-GCD we did not discover any cases where any of the millions of RSA keys we cataloged shared large prime factors of approximately half the total modulus length, which would fully break their security.
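
A check for small factors is cheap to implement. The Python sketch below strips small prime factors from a modulus by trial division; the function name `small_prime_factors` and the default bound of 2^16 are our own choices for illustration, and the example modulus is synthetic (the small factors quoted above multiplied by the Mersenne prime 2^127 - 1 as a stand-in cofactor), not a key from the survey.

```python
def small_prime_factors(n, bound=1 << 16):
    """Strip prime factors below `bound` from n by trial division.

    Returns (factors, cofactor); factors lists primes with multiplicity.
    """
    factors = []
    d = 2
    while d < bound and d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        # After 2, only odd candidates need testing. Composite candidates
        # never divide n, because their prime factors were already removed.
        d += 1 if d == 2 else 2
    if 1 < n < bound:
        # What remains has no divisor up to its square root, so it is prime.
        factors.append(n)
        n = 1
    return factors, n


# Synthetic modulus: the small factors from the text times a large prime.
factors, cofactor = small_prime_factors(2 * 3 * 0x259 * 0x8F59 * (2**127 - 1))
print([hex(f) for f in factors])
print((2 * 3 * 0x259 * 0x8F59).bit_length())  # size of the small part in bits
```

The product of the four small factors is a 27-bit number, which matches the "27 bits of factors" figure: a 2048-bit modulus of this shape leaves roughly 2021 bits unfactored.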

Metadata for some SSH user keys with small factors or non-standard sizes (e.g. 2047 bits) indicates that they were added to GitLab in recent months, leading us to wonder whether there is a current SSH key generator in widespread use which does not check for small factors when generating RSA keys. We tested recent versions of OpenSSH’s ssh-keygen on both Linux and Mac, and generated around 200,000 RSA test keys on both, but found zero cases where they generated RSA keys with small factors.

Leakage of identifying information

The major public Git-based hosting sites offer APIs that make it straightforward to determine the SSH public keys uploaded by arbitrary users.

Having gathered SSH public keys from millions of users, we find many cases where the same SSH public key is associated with a user on multiple sites. Most of these are innocuous and unsurprising: the user accounts linked in this way have similar-or-identical usernames, and they are not making any particular efforts to hide their real-world identities. However, among these tens of thousands of cases, we also find a few where the associated usernames are completely different (e.g. clark.kent on GitHub and sup3rman on GitLab) and where the affected users might be surprised or alarmed to learn that it is possible to link these real-world identities.

Additionally, we find approximately 200 cases where public keys are shared between SSH server host keys and SSH user keys enrolled on Git-based hosting sites. Because of this reuse, we can determine the real-world owners or operators of these servers, and often infer geographic and other information about those individuals from the IP addresses of the servers they operate. It’s unclear why anyone would intentionally reuse keys in this way; it seems quite possible that Git-based hosting site users have inadvertently uploaded their server keys when confronted with a form instructing them to enter the contents of a file named ssh-rsa.pub or similar. SSH typically uses “bare” keys and lacks a mechanism like the X.509 key usage extension which can be used to define the intended purpose of a key, and thus limit its inadvertent misuse for an unrelated purpose.
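
Detecting this kind of reuse requires nothing more than grouping observations by key. The Python sketch below is a simplified illustration under our own assumptions: the `(source, identity, key blob)` tuple layout and both function names are hypothetical, and the fingerprint follows the `SHA256:` style that OpenSSH prints (an unpadded base64 SHA-256 digest of the key blob).

```python
import base64
import hashlib
from collections import defaultdict


def fingerprint(key_blob_b64):
    """SHA-256 fingerprint of a base64-encoded public key blob."""
    digest = hashlib.sha256(base64.b64decode(key_blob_b64)).digest()
    return "SHA256:" + base64.b64encode(digest).decode().rstrip("=")


def find_reused_keys(observations):
    """Group (source, identity, key_blob_b64) tuples by key fingerprint.

    Returns only the fingerprints seen under more than one (source, identity).
    """
    seen = defaultdict(set)
    for source, identity, blob in observations:
        seen[fingerprint(blob)].add((source, identity))
    return {fp: ids for fp, ids in seen.items() if len(ids) > 1}
```

Feeding it one observation for clark.kent on GitHub and one for sup3rman on GitLab with the same key blob returns a single fingerprint mapped to both identities, which is exactly the linkage described above.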

As summarized above, the vast majority of public keys we observed fall into a small number of groupings in terms of their cryptographic parameters: 2048-bit RSA with public exponent of 65537, ECC with NIST P-256 curve, or Ed25519. There is, however, a very long tail of unique or nearly-unique cryptographic parameters:

- We observed exactly 93 distinct RSA public keys with a public exponent of 65337; this value appears to result from a typographical error or misunderstanding about the decimal representation of the common public exponent of 65537 (= 2^16 + 1). All of these keys were used as DNSKEYs; the vast majority of them are associated with .fi domain names (Finland), but they also include domain names of one US government agency. Because of the extreme rarity of this RSA public exponent, and the fact that it only appears in geographically-clustered DNSKEYs, there is a strong probability that some common key-generating codebase is shared by the DNSSEC infrastructure for these domains.
- Similarly, we observed exactly 32 distinct RSA public keys with a public exponent of 4294967297 (= 2^32 + 1, rather than the common 2^16 + 1). As in the previous case, these are only observed as DNSKEYs, and the vast majority are associated with .cz or .sk domains, though again there are a few US government agencies as well. Once again, it seems likely that the DNSSEC infrastructure for these domains shares a common codebase for key generation.
- We observed about 5 ECC TLS certificates using the brainpoolP256r1 curve. As of 2024, common TLS implementations do not support Brainpool curves by default, but the German federal government mandates or strongly encourages their use in some technical guidelines. Thus, one can reasonably infer the geographical location and industry affiliation of an Internet-facing server with this type of key, even if nothing else about it is publicly accessible.
- We observed about 200 SSH servers offering Ed448 host keys. This key type is extremely rare because the de facto standard implementation, OpenSSH, does not yet support it. LANCOM LCOS routers appear to be one of the only (if not the only) server implementations that support Ed448 host keys, so one can infer that any SSH server offering this key type in 2024 is almost certainly a LANCOM router.

None of the above examples represent cryptographic vulnerabilities, but they are cases where the use (and reuse) of particular cryptographic attributes leaks more identifying information than the users probably intended.

How AQtive Guard can help

SandboxAQ’s AQtive Guard is an end-to-end cryptographic management platform that provides a full inventory of existing cryptography use, including vulnerability and compliance analysis, and a path to centrally managed, robust, and agile cryptography.

By using AQtive Guard to manage and analyze cryptographic code and assets, customers can ensure that they are adhering to best practices for cryptographic parameters and usage, and thereby avoid many of the weaknesses that we have discovered in this survey of public keys on the Internet.

AQtive Guard, for instance, can identify the reuse of cryptographic keys and the use of deprecated or insecure key types (for example, 512-bit RSA keys), and provide customers with detailed accounting for their inventories of cryptographic assets such as certificates and keys.

¹ Long-term keys are distributed through mechanisms such as: public key infrastructure (PKI), the standard way to distribute and validate TLS server certificates on the public Internet; trust on first use (TOFU), the de facto standard for how SSH clients establish and validate their trust of a server’s public key; and OpenSSH’s authorized_keys file, the de facto standard for how SSH servers enroll a client’s public key for SSH public key authentication.
² For example, modern versions of TLS use ephemeral Diffie-Hellman key exchange (in either its finite-field or elliptic-curve forms). This involves generating an ephemeral or transient key-pair for each exchange, using it for one exchange, and then discarding it. While the security properties of these ephemeral keys are important, they are outside the scope of this survey.
³ The commercial Internet scanning service and database Censys appears to use zgrab2 directly. A similar service, Shodan, provides results that exhibit the same limitation of returning at most one key and certificate per port scanned.