The key ingredients to a better blockchain, Part VI: Privacy
Our understanding of what makes a blockchain successful is becoming clear. What will it take to succeed?
Wherever you go, there you're watched. Maintaining good privacy can sometimes feel like a losing battle. Fortunately, blockchain technology gives us some tools to fight back. Photo by the author.
[This is part of a multi-part series on the key ingredients to a better blockchain. I recommend starting with part one, Tech and Protocol. See the full list of articles in the series.]
Table of Contents
- Introduction
- Educate your users
- Privacy requires more than plausible deniability
- Avoid privacy theater
- Maximize the size of the anonymity set
- Maintain auditability
- Default to “on”
- Support standards
- Encourage good account hygiene
- Sidechains and state channels
- VM support
- Keep data private in transit
- Pseudonymity as a social norm
- Cost of deanonymization
- Be wary of trusted setups
- Invest in better UX
- Conclusion
Introduction
Most people would agree that privacy is important, but how many have reflected on what “privacy” actually means? Before talking about privacy, it’s useful to define it.
Let’s start with what it’s not: privacy does not mean total seclusion or isolation. In fact, it means freedom from unwanted observation or intrusion. In other words, privacy is the ability to decide when, and with whom, we interact or disclose information. Privacy means selective expression, and is closely related to ideas such as community and consent.
Up to now, the most popular blockchain platforms, such as Bitcoin and Ethereum, have deemphasized privacy in favor of more pressing concerns such as scalability and usability, and the market capitalization of these platforms suggests that the market agrees with this decision. Most of the progress that has been made on privacy technology has therefore happened at layer two of these platforms, as well as on privacy-focused platforms such as Monero and Zcash.
Regardless of how important you think privacy is relative to these other challenges, however, if blockchain is ever to deliver on its greatest promise—unique properties including censorship resistance and permissionlessness—privacy will have to become more of a priority. Put simply, if it’s easy to tell who is party to a transaction, then it’s relatively easy to censor transactions involving a particular actor or otherwise prevent that actor from participating in the network.
Bitcoin and blockchain more generally would not have been possible without privacy technology such as public-key cryptography, but the relationship goes both ways. Blockchain has reinvigorated interest and investment in, and development of, privacy technologies from messaging to private payments, to whistleblowing, to private browsing and data exchange. These are some of the most natural applications of blockchain: while the technology is still immature, its key properties such as decentralization can enable stronger privacy (vs., e.g., storing all data with a trusted third party). What’s more, users who highly value privacy are more likely to put up with the limitations of the technology today, such as high cost and poor usability—indeed, such users have been putting up with the slow speed of Tor for quite some time.
While public and private keys are used to sign and verify transactions on platforms like Bitcoin and Ethereum today, and to attest to ownership of assets and data that live on-chain, transactions and data on most blockchain platforms are in fact not encrypted and are therefore visible to everyone.
It’s possible to achieve a limited degree of privacy through correct use of accounts and keys (e.g., by never reusing the same wallet address in a UTXO-based system such as Bitcoin), and mixers (which help obfuscate transaction amounts and routing). However, these methods are convoluted and error prone, even for relatively technical users. (See these examples, which are both informative and indicative of how complex Bitcoin privacy can be.) And simply obfuscating some transaction data from passive observers isn’t enough.
As Zcash co-founder Ian Miers puts it, “blockchain privacy is not intuitive.” By default, your transactions are broadcast to everybody, and while it’s not that hard to obscure those data from passive observers, this is not enough to guarantee strong privacy since you _also need privacy from people you transact with._ In fact, we need to go one step further than simply hiding transaction participants and amounts: ideally, we want to hide the entire transaction graph, to make transaction traceability and linkability difficult or impossible.
To address this complexity and make privacy easier, more sophisticated privacy-preserving protocols have been developed at both layers one and two.
At layer one, blockchain platforms like Zcash, Monero, Beam, and Grin bake cutting-edge cryptographic technologies into the protocol so that, in many cases, all transactions have at least some degree of privacy. These technologies include zero-knowledge proofs, ring signatures, Confidential Transactions, CoinJoin, and Dandelion Routing to obfuscate transaction data and make it difficult or impossible to link transactions and transacting parties. At layer two, projects such as Aztec Protocol and Matter Labs have developed technologies that move transaction processing and validation off-chain, where they can be performed privately, such that only a zero-knowledge proof attesting to the correctness of a set of transactions needs to be published to the main blockchain. (Such technology has the added benefit of potentially increasing throughput, since many more transactions can be processed off-chain than on-chain, but that’s not the main focus of this article.)
It is a widely held belief that blockchain technology will probably fail to achieve mass adoption if it is unable to deliver strong privacy. After all, how many users want their entire transaction history visible to the whole world? Such data are a goldmine for data analytics firms, which perform sophisticated analysis on the transaction graph. In many cases, this allows them to deanonymize accounts and trace transactions. Chainalysis, one of the largest and oldest such firms, does millions of dollars’ worth of work a year for governments, including for agencies such as the IRS and ICE in the United States. A lack of even basic privacy allows authorities to observe cryptocurrency transactions and wealth in near-perfect fidelity. (Incidentally, I believe this is one reason governments have not done more to discourage adoption of Bitcoin.) This is bad enough in the free world and in places where trust in government is high; in countries that are less free, where blockchain stands to have the biggest positive impact on society, the dangers are greater still.
The best way to prevent such analysis, in theory, is the mix network (known as a “mixnet” for short). A mixnet routes messages through one or more intermediate nodes known as mixers. Each of these nodes is privy only to who sent it the message and where it should be sent next. A mixer may collect many inbound messages, then release them in batches, breaking the link between the sender and the recipient of a message, thus making it much harder to trace a message as it transits the network.
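To make the batching idea concrete, here is a minimal sketch of a single mix node in Python. It is a toy under stated assumptions, not a production design: the `BatchMixer` class, its `batch_size` parameter, and the `(next_hop, payload)` message shape are all hypothetical names chosen for illustration, and a real mixnet would also encrypt and pad every message.

```python
import random
from typing import List, Optional, Tuple

# (next_hop, payload): everything this toy mixer learns about a message.
Message = Tuple[str, str]


class BatchMixer:
    """Toy mix node: buffers messages and releases them in shuffled batches."""

    def __init__(self, batch_size: int, rng: Optional[random.Random] = None) -> None:
        if batch_size < 2:
            raise ValueError("a batch of one message hides nothing")
        self.batch_size = batch_size
        # SystemRandom draws from the OS entropy source; tests can pass a seeded RNG.
        self._rng = rng or random.SystemRandom()
        self._pool: List[Message] = []

    def receive(self, next_hop: str, payload: str) -> List[Message]:
        """Accept one message; return a shuffled batch once the pool is full."""
        self._pool.append((next_hop, payload))
        if len(self._pool) < self.batch_size:
            return []  # keep waiting, so nothing leaves in arrival order
        batch, self._pool = self._pool, []
        self._rng.shuffle(batch)  # output order is independent of arrival order
        return batch
```

The larger the batch, the harder it is to match an inbound message with an outbound one, but the longer each message waits in the pool.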
While the mixnet concept has inspired privacy technology such as Tor and anonymous remailers, the idea faces practical challenges, and production-ready mixnets remain elusive. However, there are some very real things that blockchain platforms and communities can do to promote strong privacy. Here are a few.
Educate your users
The most important thing you can do to promote privacy is education. No privacy system is perfect, so it’s essential that users understand the strengths and weaknesses of a given protocol: what information (e.g., transaction sender, recipient, amount, origin IP address) is being exposed, to whom, and what can or can’t be done about it. Otherwise, the user may leak information without realizing it. Education doesn’t require writing or testing code and therefore doesn’t involve technical risk. In spite of this, many projects underinvest in education, often because developers generally prefer writing code over writing documentation. Cryptography, and the way in which it relates to security and privacy, is also an incredibly complex and counterintuitive subject, which makes explaining it a daunting prospect even to experienced technical writers and educators.
Privacy requires more than plausible deniability
As Ian Miers explains, in spite of the fact that it was the “original model for privacy in Bitcoin” (“you can’t prove that transaction was mine”), plausible deniability is simply not enough in an age of machine learning, sophisticated transaction tracing techniques, and mass surveillance. It’s not going to “get you out of a search warrant.” We must rely on the far more sophisticated privacy technologies available today if we want stronger protections.
Avoid privacy theater
Privacy theater refers to listing privacy-related technologies out of context. Just because a protocol has impressive-sounding features, such as “cutthroughs, confidential transactions, zk-snarks, and Tor,” does not by itself mean the transactions on that network are private. As Miers points out, “the technique isn’t the solution.” Communicate about and evaluate protocols based not on their feature set, but on what they actually do: what they’re supposed to hide, how well they hide it, and how this impacts the rest of the functionality.
Maximize the size of the anonymity set
While perfect privacy remains elusive, a reasonable alternative is “lost in the crowd” privacy, as popularized by Tor. In the blockchain world, this is exemplified by the mixer: my transaction, and many other transactions, enter at one end. Once a sufficient number of transactions have flowed in, I can withdraw my deposit, using one or more additional transactions. If the mixer is well designed and there are sufficient transactions flowing into and out of it, it’s difficult to draw a line connecting a deposit into the mixer with a particular withdrawal. The tradeoff here is that adding more transactions into the mixer means longer processing times.
At the highest level, layer one privacy solutions work in a similar way: by “mixing” a number of transactions together into something called an “anonymity set,” i.e., a set of transactions aggregated together, and in theory indistinguishable from each other. An observer can tell that the sum of the inputs of the set equals the sum of the outputs, but cannot link specific inputs and outputs. In Monero, for instance, each transaction input is hidden in a ring of 11 possible sources: the output actually being spent plus 10 decoys, known in Monero as “mixins.”
This approach is not without some challenges. In Grin, the anonymity set includes all of the transactions in the same block, an approach which also has proven limitations and is further weakened by the fact that, as of publication, most blocks produced by the Grin network contain very few transactions. In Zcash, by contrast, the anonymity set includes all of the other shielded transactions in the same pool. While this is, in information theoretical terms, the largest possible anonymity set, in practice there are ways to reduce its size using heuristics and analysis.
While they’re better than nothing, mixers and other privacy systems based on anonymity sets are not foolproof and there are many known attack vectors.
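One common way to put a number on how much heuristics shrink an anonymity set is to measure the observer’s uncertainty rather than counting members. The sketch below assumes we can express the observer’s belief as a probability for each candidate; the function name `effective_anonymity_set` is my own, and the metric (two raised to the Shannon entropy) is one of several in the literature, not something any of the protocols above prescribes.

```python
import math
from typing import Sequence


def effective_anonymity_set(probabilities: Sequence[float]) -> float:
    """Return 2**H, where H is the Shannon entropy (in bits) of the observer's
    belief about which member of the set is the real one."""
    if not math.isclose(sum(probabilities), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # Zero-probability candidates contribute nothing to the entropy.
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return 2 ** entropy
```

If all eight members of a set look equally likely, the effective size is eight. If a heuristic lets the observer put 90% of the weight on one member, the effective size falls below two, even though the nominal set has not changed.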
Maintain auditability
One of the downsides to keeping transaction amounts private is that it makes auditability quite difficult: if balances are all private, then you can’t ever be sure the overall supply is what you think it is. Zcash ran into this problem with a recent counterfeiting vulnerability: while the network has policies in place that, in theory, allow detection of counterfeiting (which vaguely refer to “defensive measures” that “can be implemented” in case counterfeiting were detected), in practice we really have no choice but to rely on the core team when they tell us that, “We believe that no one else was aware of the vulnerability and that no counterfeiting occurred.” Privacy-vs-auditability is another fundamental tradeoff that we must contend with. However, all other things being equal, it should be as easy as possible for anyone to verify that certain fundamental constraints, such as the size of a particular pool of funds, have not been violated.
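To see how hidden amounts and public verification can coexist at all, consider the Pedersen commitment, the building block behind Confidential Transactions. The sketch below uses a deliberately tiny group so the arithmetic is easy to follow; the constants `P`, `Q`, `G`, and `H` and the function names are assumptions for illustration only and are nowhere near secure. Real systems use large elliptic curve groups and add range proofs so that nobody can commit to a negative amount.

```python
from typing import Iterable

# Toy parameters, far too small to be secure: P = 2*Q + 1 with P and Q prime.
P = 2039
Q = 1019
G = 4  # 2**2 mod P, generates the subgroup of order Q
H = 9  # 3**2 mod P; a real system needs an H whose discrete log base G is unknown


def commit(amount: int, blinding: int) -> int:
    """Pedersen commitment: G^amount * H^blinding mod P. Hides the amount."""
    return (pow(G, amount % Q, P) * pow(H, blinding % Q, P)) % P


def product(commitments: Iterable[int]) -> int:
    """Multiply commitments together; this adds the hidden amounts."""
    result = 1
    for c in commitments:
        result = (result * c) % P
    return result


def balances(inputs: Iterable[int], outputs: Iterable[int]) -> bool:
    """True if committed inputs and outputs combine to the same group element."""
    return product(inputs) == product(outputs)
```

For example, inputs `commit(30, 5)` and `commit(12, 7)` balance against outputs `commit(40, 3)` and `commit(2, 9)`, because both the amounts and the blinding factors sum to the same totals, yet an observer never sees 30, 12, 40, or 2. Note what this does and does not give you: anyone can check that an individual transaction creates no money, but the check is only as sound as the underlying cryptography, which is exactly where a counterfeiting bug can hide.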
Default to “on”
If privacy doesn’t default to “on,” the size of the anonymity set will be reduced, because many users won’t understand how to enable privacy, or will opt out of doing so due to the added complexity and/or cost. As an example, while all transactions are private by default on Monero, Zcash requires its users to explicitly choose to shield their transactions, something many users don’t understand. Even after more than three years, under 5% of the ZEC on the network is stored in shielded addresses.
Support standards
Standards such as BIP32 (hierarchical deterministic wallets) and BIP39 (mnemonic phrases for generating deterministic keys) have existed since the early Bitcoin days. Since then, many other such standards have emerged. Platforms should lean on these widely adopted standards to make it as easy as possible for their users to generate and manage many keys, and to ensure support in existing hardware and software wallets. Choosing an address format already supported by popular wallets makes everyone’s lives easier. The secp256k1 curve is probably the most widely supported elliptic curve used to generate blockchain keypairs, due to its support in Bitcoin, Ethereum, and many other platforms. Another curve that’s rising in popularity is ed25519, but key derivation is harder since the curve has a non-linear key space, and standards are still lacking (though there are proposals).
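To show why a standard like BIP32 makes it cheap to manage many keys, here is a minimal sketch of its hardened private-key derivation using only the Python standard library. It covers just that one path of the specification: there is no public-key derivation, no serialization, and no validity check on the master key, so treat it as an illustration and use a vetted wallet library for anything real. The function names are my own.

```python
import hashlib
import hmac
from typing import Tuple

# Order of the secp256k1 group.
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
HARDENED = 0x80000000  # BIP32 marks hardened children by setting the top index bit


def master_key(seed: bytes) -> Tuple[int, bytes]:
    """Derive (private key, chain code) from a seed, following BIP32."""
    digest = hmac.new(b"Bitcoin seed", seed, hashlib.sha512).digest()
    return int.from_bytes(digest[:32], "big"), digest[32:]


def hardened_child(parent_key: int, chain_code: bytes, index: int) -> Tuple[int, bytes]:
    """Derive hardened child number `index` from a parent private key."""
    if not 0 <= index < HARDENED:
        raise ValueError("index must fit in 31 bits")
    data = (
        b"\x00"
        + parent_key.to_bytes(32, "big")
        + (index | HARDENED).to_bytes(4, "big")
    )
    digest = hmac.new(chain_code, data, hashlib.sha512).digest()
    tweak = int.from_bytes(digest[:32], "big")
    child = (tweak + parent_key) % N
    if tweak >= N or child == 0:
        # Vanishingly unlikely; BIP32 says to move on to the next index.
        raise ValueError("invalid child key, try the next index")
    return child, digest[32:]
```

A single backed-up seed can then yield an unlimited number of unlinkable keys, for instance one per application: `key, code = master_key(seed)` followed by `hardened_child(key, code, 0)`, `hardened_child(key, code, 1)`, and so on.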
Encourage good account hygiene
The ability to generate many accounts is only one part of the puzzle. Users must understand how to generate and manage many addresses and accounts, and why it’s important to do so. They must also be encouraged to do so at every opportunity. Apps must also support this UX pattern.
As Decrypt recently demonstrated, even many high-profile members of the blockchain community practice poor account hygiene, buying and linking named domains to their primary accounts, making it trivially easy to identify and trace their transactions. Users must be educated about and encouraged to enforce separation of privilege in their use of accounts: one account per application, or per use case, at minimum. (This is obviously more of an issue for account-based systems, such as Ethereum, than it is for UTXO-based systems, such as Bitcoin, but the same general principle applies to both.) A user may also opt to store their funds in a multiple signature wallet (or “multisig”), which gives control of those funds to a set of addresses rather than a single address. UX patterns such as Universal Login take this one step further, allowing a single account or identity to be represented by a set of keys, each with its own privileges.
All of this can be facilitated by the protocol, for example, by making it easy for one “delegate” account to receive rewards or interest from staking on behalf of another. Tools for managing multiple accounts should also be improved, and applications should support easily switching among multiple accounts and addresses.
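As a small illustration of the kind of tooling that could help here, the sketch below flags accounts that have been used with more than one application. It assumes a wallet can produce a list of `(account, application)` pairs from its own history; that input format and the `shared_accounts` name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def shared_accounts(usage: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each account used with more than one application to those applications."""
    apps_by_account: Dict[str, Set[str]] = defaultdict(set)
    for account, application in usage:
        apps_by_account[account].add(application)
    # Only accounts that cross application boundaries are a hygiene problem.
    return {acct: apps for acct, apps in apps_by_account.items() if len(apps) > 1}
```

A wallet could run a check like this before signing and warn the user that, say, the account they use for a named domain is about to be linked to their exchange activity.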
Sidechains and state channels
Platforms such as Bitcoin that don’t support private transactions at the base layer can benefit from more private layer two technologies. While transactions on a sidechain or in a state channel are still visible to a subset of participants, and thus the privacy is imperfect, it’s an improvement over broadcasting every transaction to the entire network. In the Bitcoin world, transactions on the Lightning Network (other than channel opens and closes) don’t appear on the main Bitcoin blockchain. What’s more, onion routing means that intermediate nodes don’t know the sender, recipient, or amount of a transaction. As another example, transaction amounts in the Bitcoin Liquid Sidechain are obfuscated, although the sender and receiver are not.
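The onion routing idea can be sketched in a few lines. This toy is not Lightning’s actual packet format (Lightning uses a Sphinx-based construction with fixed-size packets), and the XOR “cipher” built from SHA-256 is a stand-in I chose only to avoid external dependencies; it must never be used for real encryption. The `wrap` and `peel` names, the 16-byte hop header, and the `EXIT` marker are all assumptions for illustration.

```python
import hashlib
from typing import List, Tuple

HEADER_SIZE = 16  # hypothetical fixed width for the next-hop name


def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode. NOT a vetted cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def _xor(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; applying it twice restores the input."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def wrap(route: List[Tuple[str, bytes]], payload: bytes) -> bytes:
    """Build an onion for `route`, a list of (node_name, shared_key) hops."""
    packet = payload
    next_hop = "EXIT"  # tells the final hop that it is the destination
    for name, key in reversed(route):
        header = next_hop.encode().ljust(HEADER_SIZE, b"\x00")
        packet = _xor(key, header + packet)  # add one encrypted layer
        next_hop = name
    return packet


def peel(key: bytes, packet: bytes) -> Tuple[str, bytes]:
    """Remove one layer; return (next_hop, inner_packet)."""
    plain = _xor(key, packet)
    return plain[:HEADER_SIZE].rstrip(b"\x00").decode(), plain[HEADER_SIZE:]
```

Each hop can remove only its own layer, so it learns where to forward the packet and nothing else. One thing this toy gets wrong on purpose, for brevity: the packet shrinks at every hop, which would reveal a node’s position in the route. Fixed-size packets exist to close that leak.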
VM support
Adding support for efficient, cheap cryptographic primitives to your VM makes it much easier and cheaper to develop privacy-preserving applications at layer two. As one positive example, Ethereum launched without such primitives but added support for elliptic curve arithmetic as part of the Byzantium upgrade in 2017, and then made those operations cheaper in the Istanbul upgrade in 2019. While this had the effect of reducing the gas cost for sophisticated layer two privacy applications, these applications are still quite expensive to run on-chain, and could likely become cheaper still with further research and optimization.
Keep data private in transit
You may wonder why, in networks like Zcash and Monero that blind the transaction sender, recipient, amount, etc. as described above, there’s any point in hiding a transaction in transit if the data in that transaction are already obfuscated. The answer has to do with metadata analysis. While someone who observes a transaction in transit across the network may not know the sender, recipient, or amount, they can still glean meaningful information from the patterns in the metadata. For example, they might be able to associate an IP address with the source of a transaction or set of transactions, opening a vector to an old-fashioned denial of service attack. They might be able to link multiple transactions and see who is transacting with a known account (say, of someone on a sanctions list).
Other things being equal, it is therefore better to obfuscate not only the contents of a transaction, but also metadata such as its origin IP address, when it was created and broadcast, the path it took as it traversed the network, etc. Transport layer encryption and technologies such as Dandelion Routing help, but, as described in the Mimblewimble linkability attack, they have their limits in a permissionless network—that is, one where anyone can operate many nodes and observe transactions as they traverse the network.
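The core of Dandelion-style routing is a two-phase relay: a transaction first travels quietly from node to node (the “stem”), and only then is broadcast widely (the “fluff”), so the node that starts the broadcast is usually not the node that created the transaction. The sketch below simulates just the stem phase. It simplifies the real protocol by picking a random peer at every hop, and the `stem_path` name, the `peers` adjacency map, and the `fluff_probability` parameter are my own assumptions.

```python
import random
from typing import Dict, List


def stem_path(
    peers: Dict[str, List[str]],
    origin: str,
    fluff_probability: float,
    rng: random.Random,
) -> List[str]:
    """Return the nodes a transaction visits during the stem phase.

    The last node in the path is the one that starts the broadcast (fluff) phase.
    """
    if not 0.0 < fluff_probability <= 1.0:
        raise ValueError("fluff_probability must be in (0, 1]")
    path = [origin]
    # At each hop, switch to fluff with the given probability; otherwise
    # forward the transaction to exactly one randomly chosen peer.
    while rng.random() >= fluff_probability:
        path.append(rng.choice(peers[path[-1]]))
    return path
```

An observer who sees the broadcast begin at the last node of the path cannot assume that node is the sender. An adversary who runs many of the nodes along the stem, however, can still piece the path together, which is the limit described above.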
Pseudonymity as a social norm
Privacy is about more than code and transactions. It’s also about organizational and community culture. In blockchain communities, and among teams building blockchain-based applications, it should be completely acceptable to contribute pseudonymously—whether to coding, governance, community, or anything else. Satoshi Nakamoto wasn’t the only pseudonymous cypherpunk: the founder of the Grin blockchain also chose to use a pseudonym in order to be “less of a victim” and to “keep egos in check.” I worked on a team with an active, pseudonymous contributor for some time, and while it took some getting used to, the team quickly adjusted and it had no impact on productivity. To the greatest extent allowed by law, it should also be possible to pay contributors in cryptocurrency, and KYC requirements should be kept to an absolute minimum. It should be possible to purchase tickets to community events anonymously, and to attend events without needing to show government ID or use one’s legal name.
For the record, this is about more than privacy. It’s also about inclusivity and self-expression, as some community members may have a different name or gender listed on their legal identification.
Cost of deanonymization
Privacy can be difficult to quantify, but one reasonable attempt to do so is the “cost of deanonymization.” As explained in Privacy Is a Feature, Not a Product, blockchains like Monero mix together transactions to increase the size of the anonymity set. As described above, this creates imperfect but useful “lost in the crowd” privacy for each transaction. However, in Monero, transactions are public, addresses are reused, and there’s nothing to prevent an attacker from participating in a large number of these transactions—other than the transaction cost.
Last year, a group of researchers proposed an attack called FloodXMR, claiming they could deanonymize 50% of Monero transactions over the course of a year for only $1,700. While the study made some naive assumptions and the math was a little off, the actual cost to perform such an attack against Monero is still “well within the budget of any 3 letter agency,” and the same class of attack would work against CoinJoin and many other “lost in the crowd” privacy systems. As this analysis suggests, the cost of deanonymization may in fact be several orders of magnitude higher for layer two privacy solutions, built on top of networks with more transactions and higher market caps, such as Bitcoin and Ethereum.
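The arithmetic behind a flooding attack is simple enough to sketch. The model below is a back-of-the-envelope simplification and not the FloodXMR paper’s method: it assumes every decoy is drawn independently and uniformly from all outputs, which real decoy selection does not do, and it says nothing about the dollar cost of acquiring a given share of outputs. All function names are hypothetical.

```python
def fraction_fully_traced(attacker_share: float, decoys: int) -> float:
    """Probability that every decoy in a ring belongs to the attacker,
    which leaves the real spend as the only candidate."""
    if not 0.0 <= attacker_share <= 1.0:
        raise ValueError("attacker_share must be between 0 and 1")
    return attacker_share ** decoys


def expected_effective_ring_size(attacker_share: float, decoys: int) -> float:
    """The real spend plus the decoys the attacker cannot rule out, on average."""
    return 1 + decoys * (1 - attacker_share)


def share_needed(target_fraction: float, decoys: int) -> float:
    """Share of all outputs the attacker must own to fully trace
    `target_fraction` of rings under this model."""
    return target_fraction ** (1.0 / decoys)
```

Under these assumptions, an attacker who owns half of all outputs fully traces only about 0.1% of rings with 10 decoys, but already cuts the average effective ring size from 11 to 6. Fully tracing half of all rings would take roughly a 93% share, which is why the price of creating outputs is the number that matters.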
Be wary of trusted setups
Many classes of popular privacy technology, including the zk-snarks that are the cornerstone for Zcash, require a “trusted setup,” where a group of actors perform a “ceremony” to securely generate a random number that ends up embedded in the protocol. Each participant is instructed to delete the intermediate data that they generate (Zcash cheekily calls this “toxic waste”). As long as there is at least one honest participant who faithfully deletes the data, then the outcome of the setup is secure; however, if, hypothetically, all of the participants were to collude and keep the data (or were all compromised), they (or an adversary) could use the data to forge transactions or secretly mint coins for themselves.
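The “one honest participant is enough” property can be illustrated with a toy multi-party computation. In the sketch below, each participant raises a running value to their own secret exponent, so the final parameter depends on the product of every secret, and reconstructing that product requires all of them. This is only meant to convey the shape of the idea: real ceremonies are considerably more involved, and the tiny group constants and function names here are assumptions for illustration, not any project’s actual procedure.

```python
import secrets
from typing import Iterable

# Toy group, far too small to be secure: P = 2*Q + 1, with G of prime order Q.
P, Q, G = 2039, 1019, 4


def fresh_secret() -> int:
    """Each participant's private contribution (the future "toxic waste")."""
    return secrets.randbelow(Q - 1) + 1  # uniform in [1, Q - 1]


def contribute(current: int, secret: int) -> int:
    """Mix one participant's secret into the running public parameter."""
    return pow(current, secret, P)


def ceremony(participant_secrets: Iterable[int]) -> int:
    """Run every contribution in turn, starting from the public generator."""
    value = G
    for secret in participant_secrets:
        value = contribute(value, secret)
    return value
```

With secrets 3, 5, and 7, the result equals `G` raised to 105. In a group of realistic size, someone holding only two of the three secrets cannot recover that exponent, which is why a single participant who truly deletes their secret protects everyone. The sketch also makes the single point of failure easy to see: every participant runs the same `contribute` code, so a backdoor there would compromise all of them at once.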
The cryptography community is divided on the security of trusted setups, but they’ve been used in multiple production systems (Aztec Protocol is another such example). The original Zcash ceremony included six participants. Zcash holds a new one for each network upgrade, and the most recent included over 90 participants. One possible single point of failure for such a ceremony is a backdoor in the code used to run the trusted setup, since all participants usually run identical code. One of the participants in the original Zcash ceremony, Peter Todd, later expressed concerns about this fact, and the fact that the ceremony took place without reproducible builds.
The trusted setup is one of the most difficult tradeoffs in privacy. Many find the ceremony idea distasteful and insecure, but, even so, it can lead to reduced computational complexity, smaller proof sizes, and therefore better UX down the road. Where a trusted setup is unavoidable, such as in the case of zk-snarks (which have many nice properties), the ceremony should include as many participants as possible, and should use reproducible builds for the setup code.
Invest in better UX
The best privacy technology in the world is useless if it’s too difficult, slow, or expensive to use. Poor UX is likely one of the reasons under 5% of Zcash is stored in shielded addresses. It used to take 37 seconds and more than 3 GB of memory to calculate a proof and generate a shielded transaction. (The most recent Zcash network update reduced that to two seconds and 40 MB.) All standard operations, including making and receiving a private payment, checking your balance, etc. should work on mobile, and these operations should not require downloading the entire block history. Blockchain applications already have enough UX challenges, from key management, to gas, to transaction finality. Privacy has the potential to make that even worse without careful thought and attention.
Conclusion
Privacy is one of the explicit goals described in the Bitcoin whitepaper—in fact, it got its own section! Ten years ago, the extent of blockchain privacy technology was, “a new key pair should be used for each transaction” (as Satoshi admonished). While perfect privacy remains elusive, we have far better and more powerful privacy tools at our disposal today. It’s encouraging to see these tools begin to make their way into production blockchain systems.
It’s incumbent upon us to keep the cypherpunk spirit alive and to continue to research, develop, and educate the public about these tools and why they’re so important. In an age of increasingly powerful, high-tech, mass surveillance, strong privacy is an essential bulwark against an all-knowing, unaccountable state. Blockchain technology can and should be a conduit for putting this power in the hands of everyday people everywhere.
This article is part of a multi-part series on the key ingredients to a better blockchain. Check out the other articles in the series:
- Part I: Tech and protocol
- Part II: Decentralization
- Part III: Community
- Part IV: Constitution
- Part V: Governance
- Part VI: Privacy
- Part VII: Economics
- Part VIII: Usability
- Part IX: Production Readiness
- Part X: Sustainability
Special thanks to Jaye Harrill, Yael Hoffman, Noam Nelke, and James Prestwich for invaluable pre-publication feedback and corrections.