Author: @Web3_Mario
Abstract: Recently, while searching for new project directions, I came across a technology stack I had not worked with before during product design, so I did some research and organized my learning notes to share with everyone. In short, zkTLS is a novel technology that combines Zero-Knowledge Proofs (ZKP) with TLS (the Transport Layer Security protocol). In the Web3 space, it is mainly used to let an on-chain virtual machine verify, without trusting any third party, the authenticity of off-chain HTTPS data. Authenticity here covers three aspects: the data really originates from the specified HTTPS resource, the returned data has not been tampered with, and the freshness of the data can be guaranteed. Through this cryptographic mechanism, on-chain smart contracts gain a reliable way to access off-chain Web2 HTTPS resources, breaking down data silos.
What is the TLS Protocol
To grasp the value of zkTLS more deeply, it helps to briefly review the TLS protocol. TLS (Transport Layer Security) provides encryption, authentication, and data integrity in network communications, ensuring secure data transmission between clients (such as browsers) and servers (such as websites). Even if you do not work in network development, you may have noticed that some URLs begin with HTTPS while others begin with HTTP. When you visit the latter, mainstream browsers typically warn that the connection is not secure; with the former, a site whose certificate is invalid triggers warnings such as "Your connection is not private" or HTTPS certificate errors. These alerts all come down to whether the TLS protocol is present and correctly configured.
Specifically, HTTPS layers the TLS protocol on top of HTTP to guarantee the privacy and integrity of transmitted information and to make the server's authenticity verifiable. HTTP by itself is a plaintext protocol with no way to verify the server, which leads to several security issues:
1. The information you transmit with the server may be intercepted by third parties, resulting in privacy leaks.
2. You cannot verify the authenticity of the server, meaning your requests could be hijacked by malicious nodes, returning harmful information.
3. You cannot verify the integrity of the returned information, meaning the data could be tampered with or corrupted in transit without detection.
The TLS protocol was designed to address these problems. Some readers may be more familiar with the SSL protocol; in fact, TLS grew directly out of SSL: TLS 1.0 was essentially SSL 3.1, renamed for commercial and trademark reasons, so in many contexts the two terms are used interchangeably.
The main approach of the TLS protocol to solve the above issues is:
1. Encrypted Communication: Using symmetric encryption (AES, ChaCha20) to protect data and prevent eavesdropping.
2. Identity Authentication: Verifying the server’s identity with a digital certificate issued by a trusted third party (such as an X.509 certificate) to prevent Man-in-the-Middle (MITM) attacks.
3. Data Integrity: Ensuring that data has not been tampered with using HMAC (Hash-based Message Authentication Code) or AEAD (Authenticated Encryption).
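The third mechanism, HMAC-based integrity protection, can be sketched with Python's standard library. This is a minimal illustration of the idea, not TLS's actual record-layer construction; the key and message values are made up for the example:

```python
import hmac
import hashlib

def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 authentication tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Constant-time comparison, which avoids timing side channels."""
    return hmac.compare_digest(make_tag(key, message), tag)

# Illustrative values only; in TLS the key comes from the handshake
key = b"session-key-negotiated-in-handshake"
msg = b"GET /api/price?symbol=BTC HTTP/1.1"

tag = make_tag(key, msg)
assert verify_tag(key, msg, tag)             # untampered message passes
assert not verify_tag(key, msg + b"x", tag)  # any modification is detected
```

Anyone holding the shared key can recompute the tag and detect tampering, which is exactly the property zkTLS later exploits to prove a response was not modified.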
Let’s briefly explain the technical details of the HTTPS protocol based on the TLS protocol during data exchange, which can be divided into two stages. The first stage is the Handshake phase, in which the client and server negotiate security parameters and establish an encrypted session. The second stage is the Data Transmission phase, where encrypted communication occurs using the session key. The specific process consists of four steps:
1. Client sends ClientHello:
The client (such as a browser) sends a ClientHello message to the server, which includes:
– Supported TLS versions (e.g., TLS 1.3)
– Supported encryption algorithms (Cipher Suites, such as AES-GCM, ChaCha20)
– Client random number (Client Random), used for key generation
– Key exchange parameters (e.g., ECDHE public key)
– SNI (Server Name Indication) (optional, for multi-domain HTTPS support)
The purpose is to inform the server of the client’s encryption capabilities and prepare the security parameters.
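The fields listed above can be pictured as a simple data structure. This sketch is purely illustrative: the names are made up for readability and do not reflect the actual TLS wire format:

```python
from dataclasses import dataclass
import os

@dataclass
class ClientHello:
    """Illustrative model of the ClientHello fields described above."""
    tls_versions: list    # supported protocol versions
    cipher_suites: list   # supported encryption algorithms
    client_random: bytes  # 32 random bytes, later used in key derivation
    key_share: bytes      # key exchange parameter, e.g. an ECDHE public key
    sni: str = ""         # optional Server Name Indication

hello = ClientHello(
    tls_versions=["TLS 1.3"],
    cipher_suites=["TLS_AES_128_GCM_SHA256", "TLS_CHACHA20_POLY1305_SHA256"],
    client_random=os.urandom(32),
    key_share=b"\x00" * 32,  # placeholder, not a real ECDHE public key
    sni="example.com",
)
```

Note that `client_random` is generated fresh per connection; its secrecy-adjacent role in key derivation is what later makes naive on-chain TLS impossible.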
2. Server sends ServerHello:
The server responds with a ServerHello message, which includes:
– Selected encryption algorithm
– Server random number (Server Random)
– Server’s certificate (X.509 certificate)
– Server’s key exchange parameters (e.g., ECDHE public key)
– Finished message (sent after the certificate messages, confirming the server's side of the handshake)
The purpose is to inform the client of the server’s identity and confirm the security parameters.
3. Client verifies the server:
The client performs the following actions:
– Verifies the server’s certificate: Ensures the certificate is issued by a trusted CA (Certificate Authority) while checking that it is not expired or revoked.
– Computes the shared key: Uses its own and the server’s ECDHE public keys to calculate the Session Key, which is used for subsequent symmetric encryption (e.g., AES-GCM).
– Sends Finished message: Proves the integrity of the handshake data to prevent Man-in-the-Middle (MITM) attacks.
The purpose is to ensure the server’s authenticity and generate the session key.
4. Starts encrypted communication:
The client and server now use the negotiated session key for encrypted communication.
– Using symmetric encryption (e.g., AES-GCM, ChaCha20) to encrypt data, enhancing speed and security.
– Data integrity protection: Using AEAD (e.g., AES-GCM) to prevent tampering.
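The key derivation mentioned in steps 3 and 4 can be sketched with HKDF (RFC 5869), the extract-and-expand construction that TLS 1.3's key schedule is built on. This is a simplified sketch: the real TLS 1.3 schedule chains several labeled Derive-Secret steps, and the input values below are placeholders:

```python
import hmac
import hashlib
import os

def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract (RFC 5869): condense input keying material into a PRK."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand (RFC 5869): stretch the PRK into `length` output bytes."""
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

# Toy stand-ins for the handshake values described above
client_random = os.urandom(32)
server_random = os.urandom(32)
ecdhe_shared_secret = os.urandom(32)  # placeholder for the real ECDHE output

prk = hkdf_extract(client_random + server_random, ecdhe_shared_secret)
session_key = hkdf_expand(prk, b"toy tls session key", 16)  # 128-bit AES key
```

Both sides compute the same `session_key` because both know the two randoms and can derive the same ECDHE shared secret; an eavesdropper who saw only the public handshake messages cannot.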
After these four steps, the weaknesses of plain HTTP are effectively resolved. However, this foundational Web2 technology creates difficulties for Web3 application development, particularly when on-chain smart contracts want to access off-chain data. To keep every piece of data traceable and the consensus mechanism secure, on-chain virtual machines deliberately provide no ability to call external data sources.
After a series of iterations, developers recognized that DApps still had a demand for off-chain data, leading to the emergence of various Oracle projects such as Chainlink and Pyth. These projects act as relay bridges between on-chain and off-chain data, breaking the phenomenon of data silos. To ensure the availability of relayed data, these Oracles generally implement a PoS consensus mechanism, making the cost of malicious behavior for relaying nodes higher than the rewards, so that economically, they do not provide incorrect information on-chain. For example, if we wish to access the weighted price of BTC on centralized exchanges like Binance or Coinbase within a smart contract, we rely on these Oracles to aggregate the data accessed off-chain and transmit it to be stored within the on-chain smart contract for use.
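The weighted-price aggregation an Oracle performs can be sketched as a volume-weighted average. The exchange names are from the example above, but the prices and volumes are hypothetical:

```python
def weighted_price(quotes: dict) -> float:
    """Volume-weighted average price across exchange quotes.

    `quotes` maps exchange name -> (price, volume).
    """
    total_volume = sum(vol for _, vol in quotes.values())
    if total_volume == 0:
        raise ValueError("no volume reported")
    return sum(price * vol for price, vol in quotes.values()) / total_volume

# Hypothetical BTC quotes; real Oracles aggregate many more sources
quotes = {
    "Binance": (97_000.0, 12.0),
    "Coinbase": (97_100.0, 8.0),
}
print(weighted_price(quotes))  # -> 97040.0
```

In a real Oracle network, many relay nodes compute this independently and reach PoS consensus on the result before it is written on-chain, and that redundancy is precisely where the costs discussed below come from.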
What Problem Does zkTLS Solve
However, this Oracle-based approach to data acquisition has two issues:
1. High Costs: For the relayed data to be authentic and tamper-free, the PoS consensus mechanism must guarantee it, and the security of PoS rests on the amount of staked funds, which carries ongoing maintenance costs. The PoS process also involves substantial redundant data interaction, because data must be repeatedly transmitted, computed, and aggregated across the network to reach consensus, further raising the cost of data usage. As a result, Oracle projects typically offer only the most mainstream feeds, such as the prices of major assets like BTC, for free; specialized needs usually require payment. This hinders application innovation, especially for long-tail and customized demands.
2. Low Efficiency: Reaching PoS consensus takes time, so on-chain data lags behind the real off-chain data, which is a disadvantage for high-frequency access scenarios.
zkTLS emerged to address these issues. Its core idea is to introduce zero-knowledge proofs (ZKP) so that an on-chain smart contract, acting as the verifying party, can directly check that the data supplied by a node really is the response returned by a specific HTTPS resource and has not been tampered with, avoiding the high usage costs that consensus algorithms impose on traditional Oracles.
Some may wonder why we do not simply give the on-chain VM the ability to call Web2 APIs directly. The answer is that this is infeasible. The chain must remain a closed data environment so that all data stays traceable: during consensus, every node needs a unified, objective rule for judging whether a given piece of data or execution result is correct, so that in a fully trustless environment the honest majority can use their redundant copies to evaluate a result's authenticity. Web2 data resists such a unified rule, because network latency means different nodes may receive different results when accessing the same HTTPS resource, which makes consensus hard to reach, especially for high-frequency data. A further key issue is privacy: the security of the TLS protocol underlying HTTPS depends on the client-generated random number (Client Random) and the key exchange parameters used to negotiate encryption keys with the server. The on-chain environment is open and transparent, so if a smart contract held these values, the critical secrets would be exposed and the confidentiality of the data compromised.
zkTLS therefore takes a different approach: it uses cryptography to replace the costly consensus-based data availability guarantees of traditional Oracles, analogous to how ZK-Rollups improve on OP-Rollups in Layer 2. Concretely, an off-chain relay node accesses an HTTPS resource and generates a zero-knowledge proof covering the request itself, the CA certificate verification, a proof of timeliness, and a data integrity proof based on HMAC or AEAD. The chain keeps only the necessary verification data and verification algorithm, so smart contracts can confirm the source, timeliness, and integrity of the data without any critical secret being exposed. The algorithmic details are beyond the scope of this article; interested readers can dig deeper on their own.
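The overall verification flow can be sketched with a hash commitment. To be clear, this is only a toy stand-in: a real zkTLS proof is a SNARK-style proof over the TLS transcript that hides the secrets entirely, whereas a bare commit-and-reveal hides nothing at reveal time. The sketch only shows the shape of the prover/verifier split; all values are hypothetical:

```python
import hashlib
import os

def commit(response: bytes, nonce: bytes) -> bytes:
    """Hash commitment to an HTTPS response (toy stand-in for a ZK proof)."""
    return hashlib.sha256(response + nonce).digest()

def verify_commitment(commitment: bytes, response: bytes, nonce: bytes) -> bool:
    """What an on-chain verifier would check against the stored commitment."""
    return commit(response, nonce) == commitment

# Off-chain relay node: fetches the HTTPS resource and commits to the response
response = b'{"symbol": "BTC", "price": 97040.0}'  # hypothetical payload
nonce = os.urandom(32)
c = commit(response, nonce)

# On-chain contract (drastically simplified): accepts only a matching reveal
assert verify_commitment(c, response, nonce)
assert not verify_commitment(c, b'{"symbol": "BTC", "price": 0}', nonce)
```

The real protocol replaces `verify_commitment` with a succinct proof verifier, so the contract learns that the response came from a valid TLS session without ever seeing the session secrets.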
The greatest benefit of this solution is that it lowers the cost of making Web2 HTTPS resources available on-chain. That unlocks many new demands: cheaper on-chain price feeds for long-tail assets, on-chain KYC backed by authoritative Web2 websites, and improved technical architectures for DID and Web3 games, among other areas. Of course, zkTLS also challenges existing Web3 businesses, especially today's mainstream Oracle projects. In response, industry giants such as Chainlink and Pyth are actively researching this direction, trying to hold their dominant position through the technological shift, and new business models are emerging along the way, such as moving from time-based to usage-based charging, or Compute as a Service. And, as with most ZK projects, the central challenge remains reducing computational cost to the point of commercial viability.
In summary, when designing products, you may also want to pay attention to the development dynamics of zkTLS and consider integrating this technology stack in appropriate areas, as it may reveal new directions for business innovation and technical architecture.