So far, our series on the convergence of AI and blockchain technology has covered several important infrastructure-related topics: the potential benefits of AI and the limitations it must overcome, cross-chain interoperability, enhanced risk assessment, and the speculative future of quantum computing.
Yet there is one hotly contested issue we have yet to explore: privacy within open distributed ledgers. While the identity of a wallet holder is obscured behind an alphanumeric public address, block explorers and forensic analytics firms like Chainalysis specialize in de-anonymizing this information for AML compliance, fraud prevention, and on-chain analysis.
This article explores why privacy matters and how blockchains have incorporated privacy-preserving techniques such as tumbling and mixing of users' funds, CoinJoins, and zero-knowledge proofs. We then examine how AI offers enhanced privacy options through federated learning and homomorphic encryption before concluding with where privacy is headed in blockchain technology and AI.
Publicly accessible, immutable ledgers, where transactions are stored permanently on-chain, can be a double-edged sword when it comes to privacy. Transparency gives the blockchain its power, but once advanced forensic analytics enter the picture, it can run counter to a project's end goal. For instance, when blockchains are used to carry out transactions involving highly sensitive information, as in healthcare, doctor-patient confidentiality could be compromised.
Another example would be using blockchain technology to run elections. In a fair election, voters must be able to cast their ballots privately, without fear of repercussion.
Even for simple financial transactions, privacy remains a topic of immense interest. It is not enough to carry out censorship-resistant transactions if those transactions can ultimately be traced back to the individuals involved. Transaction histories can reveal where a person lives, how they are paid, or whether they are a high-net-worth individual. Those individuals could then be maliciously targeted in other areas of their lives, de-platformed from their legacy banks, or ostracised from participating in regular life.
Some of the earliest attempts at breaking the paper trail of crypto transactions used tumblers and mixers. Users send funds to a third party, which pools them and resends them to their end destinations in separate transactions. This obscures the link between a transaction's start and end points, allowing for a degree of privacy.
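To make the mechanics concrete, here is a minimal Python sketch of the idea, assuming a trusted custodian and placeholder addresses rather than any real mixer's API. Equal-sized deposits are pooled and then paid out in a shuffled order, so an outside observer cannot map deposits to payouts:

```python
import secrets

# Toy custodial mixer: equal-sized deposits are pooled, then paid out in a
# random order. Addresses are illustrative placeholders, not a real API.
deposits = [
    {"from": "alice_addr", "payout_to": "alice_fresh_addr", "amount": 1.0},
    {"from": "bob_addr",   "payout_to": "bob_fresh_addr",   "amount": 1.0},
    {"from": "carol_addr", "payout_to": "carol_fresh_addr", "amount": 1.0},
]

# The custodian now controls the entire pool -- this is the trust problem.
pool = sum(d["amount"] for d in deposits)

# Shuffle payouts so the deposit order reveals nothing about the payout order.
payouts = [(d["payout_to"], d["amount"]) for d in deposits]
secrets.SystemRandom().shuffle(payouts)

for address, amount in payouts:
    print(f"mixer -> {address}: {amount}")  # sent as separate transactions
```

Notice the `pool` variable: the custodian holds everyone's funds at once, and nothing in the protocol forces it to send them back. That is exactly the weakness described next.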
However, using mixers and tumblers changes the nature of a P2P transaction by requiring a middleman and reintroducing trust into the equation. The unregulated nature of tumblers and mixers makes it easy for fake tumblers and malicious actors to target users. Since users are at the mercy of the third party to follow through on resending their funds, and because transactions are final, there is no way to recoup losses if they occur. The other major issue is that, because they operate as a custodial service, tumblers and mixers are regulated as Money Service Businesses (MSBs) by federal agencies. Tornado Cash found this out the hard way when its founders Roman Storm and Roman Semenov were charged with money laundering and operating an unlicensed money-transmitting business.
CoinJoins operate on a similar pooling principle but use a non-custodial method to obfuscate users' funds and protect transaction privacy: multiple users' inputs are combined into a single, collaboratively signed transaction, so no third party ever takes custody of the funds. Being non-custodial has not stopped regulators from targeting them as MSBs, however. Samourai Wallet CEO Keonne Rodriguez and CTO William Lonergan Hill also found themselves in the crosshairs of federal regulators and the DOJ.
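A rough sketch shows what removing the custodian looks like. The field names below are illustrative, not the real Bitcoin transaction format; the core idea is many inputs, one jointly signed transaction, and uniform output denominations that make the input-to-output mapping ambiguous:

```python
import secrets

# Toy CoinJoin: several users' inputs are merged into ONE transaction with
# equal-value outputs, so observers can't tell which input paid which output.
# Each participant signs only if their own output is present -- no custodian.
participants = [
    {"utxo": ("utxo_a", 1.3), "fresh_addr": "addr_a_fresh"},
    {"utxo": ("utxo_b", 1.1), "fresh_addr": "addr_b_fresh"},
    {"utxo": ("utxo_c", 1.7), "fresh_addr": "addr_c_fresh"},
]

DENOM = 1.0  # a uniform denomination is what breaks linkability

inputs = [p["utxo"] for p in participants]
mixed_outputs = [(p["fresh_addr"], DENOM) for p in participants]
secrets.SystemRandom().shuffle(mixed_outputs)  # output order reveals nothing

# Anything above the denomination returns as change and remains linkable.
change_outputs = [(f"change_for_{p['utxo'][0]}", round(p["utxo"][1] - DENOM, 8))
                  for p in participants]

coinjoin_tx = {"inputs": inputs, "outputs": mixed_outputs + change_outputs}
print(coinjoin_tx)
```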
Zero-knowledge protocols have a deep and rich history dating back to the cypherpunks and cryptographers of the 1980s, whom we met in our inaugural piece on the next great industrial revolution. However, it wouldn't be until 2011 that zk-SNARKs were formally introduced in the academic cryptography literature and, from there, found their way into modern blockchain projects. Zero-knowledge proofs have continued to advance with the advent of zk-STARKs, which are quantum-resistant and more scalable than zk-SNARKs. Protocols like Plonky2 and Zcash's Halo2 are leading the charge in this field.
A simple way to think about zero-knowledge protocols is that they hide the specific information inside a transaction and reveal only the parameters necessary to validate it. A statement can be proven and verified without exposing the full scope of a user's identity or data. Consider how a borrower and lender interact on a large purchase such as a house. The lender needs to verify that the borrower has an established credit rating or a minimum balance for a down payment. With a zero-knowledge proof, the borrower could prove these facts without divulging their exact credit score or total account balance, and the transaction would be approved or denied based on the parameters outlined at the start.
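For a taste of how a proof can convince a verifier while revealing nothing, here is a minimal sketch of a Schnorr-style proof of knowledge of a discrete logarithm, made non-interactive with the Fiat-Shamir heuristic. This is not a production zk-SNARK, and the tiny group parameters are purely illustrative:

```python
import hashlib
import secrets

# Toy Schnorr proof: prove knowledge of x where y = g^x mod p, without
# revealing x. Real systems use 256-bit+ groups; these are demo-sized.
p = 2039                      # prime modulus (p = 2q + 1)
q = 1019                      # prime order of the subgroup
g = pow(2, (p - 1) // q, p)   # generator of the order-q subgroup

def challenge(*values) -> int:
    data = "|".join(str(v) for v in values).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def prove(secret_x: int):
    y = pow(g, secret_x, p)
    r = secrets.randbelow(q - 1) + 1   # one-time random nonce
    t = pow(g, r, p)                   # commitment
    c = challenge(g, y, t)             # Fiat-Shamir challenge
    s = (r + c * secret_x) % q         # response: s leaks nothing about x
    return y, (t, s)

def verify(y: int, proof) -> bool:
    t, s = proof
    c = challenge(g, y, t)
    # g^s == t * y^c holds exactly when the prover knows x behind y.
    return pow(g, s, p) == (t * pow(y, c, p)) % p

secret = 166                  # the prover's secret; it is never transmitted
y, proof = prove(secret)
print(verify(y, proof))       # True: the claim is verified, x stays hidden
```

The verifier learns only that the statement is true; the secret itself never crosses the wire. Production systems apply the same principle to far richer statements, such as "this balance exceeds the down payment threshold."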
Zero-knowledge protocols enhance the privacy and security of the entire process by preventing the unnecessary disclosure of specific information. They offer immense benefits in controlling the flow of information between parties and in proving or disproving the facts in question. This can be especially effective in elections, where a vote can be counted and verified without identifying the voter.
We have seen throughout this series how AI models rely on quality data sets for training. The higher the volume and the better the data quality, the more reliable the result. However, once data has been absorbed into a model, it can resurface in later outputs, inadvertently exposing private information to the public. This risk of leaking sensitive information through training data has forced many companies to ban employees from using AI tools altogether. Notably, Samsung cracked down on its employees in May 2023 after it was revealed that sensitive internal data had been shared with ChatGPT and could now surface in responses to public queries.
One strategy for maintaining privacy when training AI models is federated learning, where raw data sets are trained on locally in separate, modular settings. Each local node then passes only the learned parameters and vectors from its model to a central teacher model, which combines them into a single larger model and shares the result back with the local nodes. This way, no raw data is ever exchanged between the local models, yet each still benefits from the others' parameters through the combined teacher model, as the sketch below illustrates.
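Here is a minimal sketch of federated averaging (the FedAvg pattern) using NumPy, assuming three hypothetical clients fitting the same linear relation. Only learned weights cross the network; the raw (X, y) data never leaves a client:

```python
import numpy as np

# Minimal FedAvg sketch: each client trains a linear model on private data,
# the server averages the resulting weights into a global model.
rng = np.random.default_rng(0)

def local_train(X, y, global_w, lr=0.1, epochs=50):
    """Gradient descent on a local dataset, starting from the global weights."""
    w = global_w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Three clients with private datasets drawn from the same underlying relation.
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(100, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=100)
    clients.append((X, y))

global_w = np.zeros(2)
for _ in range(5):  # federated rounds
    local_weights = [local_train(X, y, global_w) for X, y in clients]
    global_w = np.mean(local_weights, axis=0)  # server sees parameters only

print(global_w)  # approaches [2.0, -1.0] without raw data leaving any client
```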
Federated learning models are ideal for blockchain projects that need to maintain privacy while still leveraging data to build models. Healthcare is a prime example: a federated model could be built to combat the spread of highly contagious diseases, with hospitals contributing model parameters derived from the raw data they have collected without giving away sensitive patient information.
The concept of homomorphic encryption has been around since the 1970s; however, it wasn't until 2009 that Craig Gentry developed the first fully homomorphic scheme. Unfortunately, the lattice-based encryption it relied on demanded immense computation, making it impossible to scale at the time. By Gentry's own estimation, a simple homomorphically encrypted Google search would require roughly a trillion times more compute than an unencrypted one.
The main concept of homomorphic encryption is that a user can encrypt their data behind a private key while still allowing other users to compute on it. Those users can process and manipulate the encrypted data as if they could see the specifics locked behind the key, without the specifics ever being revealed. Think of it as working in a lab with gloves that reach through a wall: you can handle and manipulate items on the other side without ever seeing them directly.
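The Paillier cryptosystem is a classic, partially homomorphic example of this: multiplying two ciphertexts yields an encryption of the sum of the plaintexts, so a third party can add encrypted numbers it cannot read. Below is a toy sketch with deliberately tiny primes; a real deployment would use 2048-bit moduli and a vetted library:

```python
import math
import secrets

# Toy Paillier cryptosystem (additively homomorphic). Tiny primes for
# readability only -- never use parameters like these in practice.
p, q = 293, 433
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)   # Carmichael function of n
mu = pow(lam, -1, n)           # modular inverse of lam mod n

def encrypt(m: int) -> int:
    r = secrets.randbelow(n - 2) + 2
    while math.gcd(r, n) != 1:          # blinding factor must be coprime to n
        r = secrets.randbelow(n - 2) + 2
    return (pow(1 + n, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    return ((pow(c, lam, n2) - 1) // n * mu) % n

a, b = 1500, 2700
c_sum = (encrypt(a) * encrypt(b)) % n2  # multiply ciphertexts => add plaintexts
print(decrypt(c_sum))                   # 4200, computed without seeing a or b
```

The party computing `c_sum` never decrypts anything; only the private-key holder can recover the result. Fully homomorphic schemes extend this from addition to arbitrary computation, which is where the heavy cost comes from.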
AI can assist homomorphic encryption by optimizing the encryption process itself. As we have already seen, AI excels at analyzing large volumes of data and can be used to streamline heavy computation, helping close the performance gap that has held homomorphic encryption back.
Individuals and companies are quickly learning about the risks of feeding sensitive private data to AI tools, just as they once had to learn what immutability meant in blockchain technology. Yet if the blockchain industry is any precursor to the public's understanding of privacy, most individuals will accept the trade-off of convenience over privacy.
There are additional privacy-preserving protocols beyond the ones mentioned here. Dedicated privacy blockchains like Monero and Zcash exist for this very reason, though they remain niche by comparison. AI platforms like Venice.ai store conversations locally in the user's browser rather than in any database the company controls. If a user or industry truly wants to maintain privacy, there are methods and protocols available, such as the ones outlined above.