Batch Me if You Can
Batch Signatures, Revisited
The Post-Quantum (PQ) signature schemes chosen for standardization by NIST all suffer from performance issues; they are computationally slower and consume much more bandwidth than the current standards we use today, such as ECDSA. Thus, for some applications and protocols such as TLS, switching to PQ signatures has the potential to severely increase latency and reduce throughput. In what follows, we explain an approach to mitigate these issues by signing messages in batches, rather than individually, and present experimental data showing the benefits of this approach when used within TLS.
This blog post is a short introductory version of the paper “Batch Signatures, Revisited” by Carlos Aguilar-Melchor, Martin R. Albrecht, Thomas Bailleux, Nina Bindel, James Howe, Andreas Hülsing, David Joseph, and Marc Manzano. It can be accessed on IACR ePrint and the results will be presented at the NIST PQC Seminar #7 scheduled for July 21 at 10am UTC-4.
Introduction
After 7 years of waiting, the Post-Quantum Cryptography (PQC) algorithms that NIST intends to standardize in 2024 have been selected (see our blog post for more details). In particular, the Key Encapsulation Mechanism (KEM) Kyber — to provide key sharing, and the three digital signature schemes Dilithium, Falcon, and SPHINCS+ — to provide authentication, have been selected. These two properties are fundamental to our digital lives as they allow us to securely communicate (enabled by sharing a secret to then encrypt messages) with the entity we want to communicate with (provided through authentication). Using these algorithms in applications should ensure security even in the presence of quantum adversaries.
Some of these applications require a huge amount of digital signatures, often in a short amount of time; for example, Hardware Security Modules (HSMs) generate many certificates. Unfortunately, Dilithium2 and Falcon-512 have roughly a 2.5x and 5x slower runtime in comparison to ECDSA-p256, respectively, whilst also having significantly larger signature sizes. Therefore, replacing ECDSA with Dilithium or Falcon will severely impact the scalability and functionality of systems in such high-throughput settings. HSMs with dedicated instructions to generate PQ signatures (similar to Intel’s ECCPSignDSA instruction for generating ECDSA signatures) will be created to relieve some of the stress on these applications in the near future. However, they will unlikely resolve the problem entirely. To relieve stress further, an approach is to use batch signatures. Our experiments show that using such batch signatures enables the same performance (with increased latency of less than 25% for all schemes) as using plain ECDSA-p256 as shown in the diagram below.
The idea of Batch Signing for TLS has been proposed in an RFC draft to solve existing scalability challenges of classical digital signature standards in a high-throughput TLS scenario in 2020. In this instance, RSA signing was described as “CPU-intensive compared to many other algorithms used in TLS”. At the heart of batch signing is the concept of a Merkle tree constructed from a batch (i.e. a collection) of messages. Instead of signing each message individually, we only need to sign the root of the Merkle tree. This effectively authenticates a large batch of messages with a single signature generation. A signature for a message then consists of the root signature and its respective, individual path in the Merkle tree. This means the overall signature cost is slightly increased to accommodate this path, e.g., batch signing 32 messages adds less than 100 bytes to each signature (totalling 2536 bytes for Dilithium2, see diagram), but the amortised cost reduces the 32 signatures previously required to just one signature, with the addition of a few cheap hash function computations. The throughput improvement we see in response is significant; with a 72% increase for Dilithium2, a 54% increase for Falcon-512, and a 467% increase for SPHINCS+-128. This means that (i) with only a small increase in communication costs we achieve a large increase in computation efficiency and (ii) in scenarios where only the path needs to be shared, but not the entire signature (such as large-scale container mesh networks described below), communication cost is as small as for ECDSA.
Unfortunately, the RFC draft has not been finalized and has been deprecated by now. In our paper we resurrect this idea, formally analyze the security and privacy guarantees of batch signing, and show its advantages particularly for the upcoming transition to post-quantum cryptography.
Batch signing use cases: TLS
Integrating PQ cryptography into TLS is known to be one of the bigger issues facing the community. In particular, some of the largest PQC experiments to-date were conducted on this topic at an early stage; the CECPQ1 and CECPQ2 experiments (taking place in 2016 and 2018, respectively) were conducted just after the PQC candidates were announced, both finishing before NIST’s second stage started. The experiments focused on the PQ key sharing components (instead of authentication using signatures), known as KEMs, which, when added with authentication, make up the core (asymmetric) components for creating secure communications on the Internet.
While the original RFC on batch signing focused mainly on improving the server’s performance during TLS connections, other TLS-related applications would benefit greatly from batch signing.
For example, using batch signatures it is possible to increase the maximal number of requests per second and gracefully decrease the latency of Load Balancers, which are often used as TLS termination proxy (meaning they decrypt, encrypt, and sign the incoming and outgoing HTTPS traffic) to relieve the backend servers from the cryptographic computations. Batch signatures can be used to avoid load balancers themselves becoming the performance bottle-neck of the TLS connection.
Another example is to reduce the bandwidth within large-scale container mesh networks that often use TLS certificates to secure connections between its proxies. If the certificates of all proxies are signed in the same batch, the major part of the certificate signature is the same for all proxies. This means it is sufficient to communicate the paths of size 98 bytes, instead of the full batch signature of 4693 bytes, between peers if Dilithium-V is used (see table above). As for the load balancer, generating the certificates benefits from batch signing as well. In summary, this approach would reduce the storage of all mesh certificates for, say a ten thousand container mesh using Dilithium, from hundreds of megabytes to hundreds of kilobytes.
There are numerous benefits to incorporating batch signatures into cryptographic protocols, security infrastructures, and more in order to reduce computational or communication costs. In particular, this will significantly reduce the expected performance hit when we migrate our systems to post-quantum cryptography. These systems can be as widespread as resurrecting batch signatures for the IETF, or as ad-hoc as embedded batch signatures in one’s own security systems or architectures.