Let’s get one thing straight – zkML sounds a little nuts. The pitch is almost too good to believe: prove you ran a massive, compute-heavy machine learning model without revealing the weights, inputs, or intermediate steps, and do it fast enough to be useful. It’s the kind of thing that makes a cryptographer’s pulse quicken, but if you’ve ever tried to squeeze a billion-parameter model through a zero-knowledge proof, you’re probably shaking your head right now.
And you’d be right to be skeptical. Zero-knowledge proofs (ZKPs) are notorious for scaling like a rusty tractor on a mountain road. Every multiplication, every addition, every nonlinear twist and turn adds a fresh layer of complexity. Dense layers in models like GPT-4 or BERT are a perfect storm for ZKPs – massive, interconnected, and packed with operations that need to be proved succinctly if you want to keep your compute bill under control.
But the landscape is shifting. While trillion-parameter behemoths still hog the spotlight, a quiet revolution is happening around smaller, tighter models – the kind that fit in the palm of your hand, metaphorically speaking. These are the models being eyed for confidential smart contracts, private DeFi analytics, and agent-to-agent interactions in the burgeoning Web3 space. They don’t need to know the entire internet’s worth of information – they just need to prove that they can make smart, accurate decisions without giving away the game.
At the heart of this revolution is matmul – matrix multiplication, the unsung workhorse of machine learning. It’s the backbone of dense layers, attention mechanisms, and feed-forward networks, accounting for 70–90% of the total floating-point operation (FLOP) count in most neural nets. In other words, if you can make matrix multiplication sing in a ZKP context, you’ve already won half the battle.
For example, a single dense layer in a BERT model might involve multiplying a 4096×1024 weight matrix by a 1024-element input vector – millions of multiplications and additions in a single breath. If you tried to naively prove this in zero knowledge, you’d drown in constraints, buried under an avalanche of polynomial evaluations.
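Back-of-the-envelope, that single layer already implies millions of constraints before you even reach the activations:

```python
# Rough constraint count for naively proving one dense layer in zero
# knowledge (one constraint per arithmetic operation; sizes from the
# BERT-style example above).
rows, cols = 4096, 1024
mults = rows * cols          # one multiplication per weight
adds = rows * (cols - 1)     # accumulation additions across the 4096 outputs
print(f"{mults:,} multiplications + {adds:,} additions")
# -> 4,194,304 multiplications + 4,190,208 additions: ~8.4M constraints
```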
This is where the latest zkML tricks come into play. Protocols like GKR (Goldwasser–Kalai–Rothblum) [1], CCT (Custom Constrained Techniques), and batched sum-check are rewriting the rulebook on how matrix operations get verified. Instead of painstakingly checking each individual multiplication, they collapse entire matrix operations into tight, compressed relations, cutting prover time and memory overhead by orders of magnitude.
GKR is particularly well-suited to layered circuits: it verifies a multi-layer network step by step without exploding the number of constraints, treating a matrix multiplication as a layer-wise pattern of additions and multiplications that one sum-check per layer can handle. The upshot is that the prover’s extra work for a dense layer scales with the number of matrix entries rather than with the number of multiply-accumulate operations.
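To make that concrete, here is the identity the verifier actually checks, in one standard formulation of sum-check-based matmul (a tilde denotes a multilinear extension; C = A·B with n×n matrices):

$$
\widetilde{C}(x, y) \;=\; \sum_{z \in \{0,1\}^{\log n}} \widetilde{A}(x, z)\,\widetilde{B}(z, y)
$$

The verifier samples a random point (r_x, r_y) and runs one sum-check over z, reducing the claim about C~(r_x, r_y) to a single evaluation each of A~ and B~ – no per-multiplication constraints anywhere.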
Custom Constrained Techniques (CCT) take this further, collapsing the entire matrix multiply into a single compressed polynomial relation. This trick transforms what would be millions of individual multiplications into a single check, dramatically reducing the number of necessary constraints.
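The internals of CCT aren’t spelled out here, but the flavor of “one random check instead of millions of multiplications” is easiest to see in Freivalds’ classic test, which stands in for the idea: to check C = A·B, pick a random vector r and compare C·r against A·(B·r). A minimal sketch over a toy prime field:

```python
import random

P = 2**31 - 1  # toy prime field modulus

def freivalds_check(A, B, C, trials=10):
    """Probabilistically verify C == A*B over GF(P) with O(n^2) work per
    trial, instead of the O(n^3) cost of recomputing the full product."""
    n = len(A)
    for _ in range(trials):
        r = [random.randrange(P) for _ in range(n)]
        Br = [sum(B[i][j] * r[j] for j in range(n)) % P for i in range(n)]
        ABr = [sum(A[i][j] * Br[j] for j in range(n)) % P for i in range(n)]
        Cr = [sum(C[i][j] * r[j] for j in range(n)) % P for i in range(n)]
        if ABr != Cr:
            return False  # one random probe exposes a bad C w.h.p.
    return True
```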
Batched sum-check adds yet another layer of optimization, grouping many small polynomial checks into a single large one, reducing the verifier’s overhead and allowing more aggressive parallelization.
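Sum-check itself is compact enough to sketch end to end. Below is a toy prover/verifier for exactly the inner-product claim the matmul identity above leaves behind; batching then amounts to combining many such claims into one via a random linear combination before running a single instance. This is a minimal sketch over a toy field, with no commitments or Fiat–Shamir:

```python
import random

P = 2**31 - 1                 # toy prime field modulus
INV2 = pow(2, P - 2, P)       # inverse of 2 mod P

def fold(v, r):
    """Bind the top variable of a multilinear table to the value r."""
    h = len(v) // 2
    return [((1 - r) * v[i] + r * v[i + h]) % P for i in range(h)]

def sumcheck_inner_product(a, b, claim):
    """Verify claim == sum_k a[k]*b[k] (len(a) a power of two). Each round
    the prover sends a degree-2 polynomial as its values at t = 0, 1, 2."""
    while len(a) > 1:
        s = []
        for t in (0, 1, 2):
            fa, fb = fold(a, t), fold(b, t)
            s.append(sum(x * y for x, y in zip(fa, fb)) % P)
        assert (s[0] + s[1]) % P == claim, "round check failed"
        r = random.randrange(P)           # verifier's challenge
        # evaluate the round polynomial at r (Lagrange on {0, 1, 2})
        claim = (s[0] * (r - 1) * (r - 2) * INV2
                 - s[1] * r * (r - 2)
                 + s[2] * r * (r - 1) * INV2) % P
        a, b = fold(a, r), fold(b, r)
    # the final check costs one multiplication, not one per matrix entry
    assert claim == (a[0] * b[0]) % P, "final check failed"

a = [3, 1, 4, 1, 5, 9, 2, 6]   # e.g. A~(r_x, k) for k = 0..7
b = [2, 7, 1, 8, 2, 8, 1, 8]   # e.g. B~(k, r_y)
sumcheck_inner_product(a, b, sum(x * y for x, y in zip(a, b)) % P)
```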
Combined, these techniques can reduce the memory footprint of a prover by 1000× and cut prover time by 10× or more, pushing zkML into a practical range for real-world models.
But matmul isn’t the whole story. Real models also include non-linearities like ReLU, softmax, and batch normalization, which are notoriously expensive to verify. This is where lookups come in.
JOLT [2], a state-of-the-art zkVM, is designed to process lookup-heavy operations efficiently. Instead of directly proving each non-linear operation, JOLT uses lookup tables and batched lookup arguments to verify these steps with much less overhead. This approach is critical for high-performance zkML, as it cuts the cost of verifying activation functions, gating mechanisms, and other non-linearities that would otherwise dominate a full model proof.
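To get a feel for why lookups are cheap, here is a toy log-derivative membership check – the flavor behind several modern lookup arguments, though JOLT’s actual argument (Lasso) is considerably more sophisticated. The prover shows every witness value appears in a precomputed table, with one field inversion per entry instead of a circuit for the function itself:

```python
import random

P = 2**31 - 1  # toy prime field modulus

def lookup_holds(witness, table, challenge):
    """Log-derivative multiset check: for a random challenge this passes
    (w.h.p.) only if every witness value appears in the table."""
    inv = lambda x: pow(x % P, P - 2, P)
    lhs = sum(inv(challenge + w) for w in witness) % P
    rhs = sum(witness.count(t) * inv(challenge + t) for t in set(table)) % P
    return lhs == rhs

# A 4-bit range check: prove values stayed in [0, 16) with no comparison
# circuit. Activation tables work the same way, e.g. with (x, relu(x))
# pairs packed into single field elements.
table = list(range(16))
print(lookup_holds([3, 7, 7, 15], table, random.randrange(P)))  # True
print(lookup_holds([3, 99], table, random.randrange(P)))        # False (w.h.p.)
```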
One often overlooked aspect of efficient zkML is the role of sparsity. As models grow larger, they tend to exhibit natural sparsity, whether through explicit pruning or through the structure of certain activation layers. This sparsity can dramatically reduce a prover’s work, cutting the effective computational cost of matrix multiplications and other tensor operations.
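The payoff is easiest to see in a CSR-style layout, where only nonzeros are stored and therefore only nonzeros generate prover work – a hypothetical sketch:

```python
def sparse_matvec(rows, x):
    """rows[i] holds only the nonzeros of row i as (col, val) pairs, so
    the prover pays one multiply-accumulate per stored entry."""
    return [sum(v * x[j] for j, v in row) for row in rows]

# A 2x4 toy layer at 25% density: 2 MACs instead of 8. At BERT scale,
# 90% pruning turns ~4.2M MACs into ~420k.
rows = [[(1, 5)], [(0, 2)]]
print(sparse_matvec(rows, [1, 2, 3, 4]))  # [10, 2]
```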
This is where elliptic-curve (or lattice-based) commitments, together with protocols like GKR, batched sum-check, and CCT, can shine. Modern lookup arguments that leverage sparsity – and the reduced cost of small scalars in multi-scalar multiplication – are particularly interesting here [3]. The result is massive savings in both prover time and memory overhead, making zkML practical even for larger models.
One-hot encoding is a simple but powerful representation for categorical data. It transforms each category into a binary vector where only one bit is set to 1 (or true), and all others are set to 0. This is common in machine learning for representing classes or tokens in a fixed-size vocabulary.
For example, if you have four possible classes – Cat, Dog, Bird, Fish – a one-hot encoding might look like this:
Cat - [1, 0, 0, 0]
Dog - [0, 1, 0, 0]
Bird - [0, 0, 1, 0]
Fish - [0, 0, 0, 1]
This representation is efficient when the data is inherently sparse, meaning that most values are zero, which is common in natural language processing (NLP) and recommendation systems.
One-hot encodings are widely used in ML because they are simple to construct, impose no artificial ordering on categories, and keep categorical inputs compatible with matrix-based layers.
In zero-knowledge proofs, one-hot encodings are valuable for a slightly different reason: they let the prover represent sparse vectors without explicitly committing to every zero element, which reduces the cost of both commitments and polynomial evaluations in protocols like GKR, sum-check, and lookup arguments.
For example, in a zkML setting, if your model weights are sparse or structured (e.g., many zero entries due to pruning), one-hot encodings can allow the prover to skip over these zero values, cutting both time and memory overhead dramatically.
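One concrete instance of the saving: the multilinear extension of a one-hot vector collapses to the standard “equality” polynomial on the bits of the hot index, so the prover can evaluate it at a random point in O(log n) field operations without ever touching the n-1 zeros. A minimal sketch (toy field, low-bit-first variable ordering as an assumed convention):

```python
P = 2**31 - 1  # toy prime field modulus

def one_hot_mle(index, point):
    """Evaluate the MLE of the one-hot vector e_index at `point`:
    eq(bits(index), point) = prod_k (b_k*r_k + (1 - b_k)*(1 - r_k))."""
    acc = 1
    for k, r in enumerate(point):
        b = (index >> k) & 1
        acc = acc * ((r if b else 1 - r) % P) % P
    return acc

# Sanity check: at a Boolean point, the MLE recovers the vector entry.
print(one_hot_mle(5, [1, 0, 1]))  # 1 (index 5 = 0b101, low bit first)
print(one_hot_mle(4, [1, 0, 1]))  # 0
```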
One particularly powerful combination is one-hot encodings alongside elliptic curve-based multiplications. In a multi-scalar multiplication, zero scalars can be skipped outright and unit scalars need only a plain point addition, so committing to a one-hot vector amounts to picking out a single group element. Together, these ingredients can significantly reduce prover time and memory, making zkML not just possible, but practical at scale.
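A sketch of that shortcut behind a generic group interface – here plain integer addition stands in for an elliptic-curve group, since the shape of the saving doesn’t depend on the curve:

```python
def msm(scalars, points, add, scalar_mul):
    """Multi-scalar multiplication sum_i s_i * P_i that exploits small
    scalars: zeros cost nothing, ones cost a single group addition."""
    acc = None
    for s, p in zip(scalars, points):
        if s == 0:
            continue                          # skip zero terms entirely
        term = p if s == 1 else scalar_mul(s, p)
        acc = term if acc is None else add(acc, term)
    return acc

# Toy stand-in group: integers under addition. Committing to a one-hot
# vector degenerates to selecting a single basis "point" - no scalar
# multiplications at all.
points = [3, 1, 4, 1, 5]
one_hot = [0, 0, 1, 0, 0]
print(msm(one_hot, points, add=lambda a, b: a + b,
          scalar_mul=lambda s, p: s * p))  # 4
```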
Combine all of these ingredients – GKR, CCT, batched sum-check, efficient lookups, and sparse encodings – and you get something genuinely new. Together, they promise to enable scalable, privacy-preserving AI at the edge, transforming the way machine learning models are trained, deployed, and verified in decentralized environments.
As these technologies mature, zkML will become a foundational component of Web3, powering agent-to-agent workflows, confidential AI, and fully verifiable digital identities. It’s a promising future, but only if we can continue pushing the limits of what’s possible in zero-knowledge machine learning.
The bottom line? zkML might sound a little crazy, a little too good to be true. But with the right mix of tricks and optimizations, it’s not just possible – it’s inevitable. Stay tuned, because this is one revolution you don’t want to miss.
References:
[1] GKR: “Delegating Computation: Interactive Proofs for Muggles” – https://www.microsoft.com/en-us/research/wp-content/uploads/2016/12/2008-DelegatingComputation.pdf
[2] JOLT: “Jolt: SNARKs for Virtual Machines via Lookups” – https://eprint.iacr.org/2023/1217
[3] “Twist and Shout: Faster Memory Checking Arguments via One-Hot Addressing and Increments” – https://eprint.iacr.org/2025/105