In the rapidly advancing domain of zero-knowledge virtual machines (zkVMs), the correctness and efficiency of specifications are foundational. At Nexus, we’ve recently been experimenting with how AI tools can support the development of zkVM circuits — especially in detecting errors, proposing optimizations, and improving specification clarity.
This post outlines our findings from an initial series of experiments using AI systems to reason about zkVM specs and generate candidate circuit logic. While the outcomes were mixed, the process surfaced valuable insights into how AI can (and can’t yet) assist with formal systems design.
Download the full paper at the bottom of this post
Whole-spec analysis: Broad insights, limited depth
Our first approach involved running large language models over complete zkVM specification files. We asked these models to flag potential soundness issues, detect inconsistencies across trace columns and constraints, and suggest performance improvements.
While this produced some intriguing suggestions, the limitations became clear quickly. Soundness analysis lacked reliability; in some cases, AI appeared to base suggestions on internal comments intended for future work, rather than on formal logic. However, this approach did yield one useful outcome: identifying variables mentioned in comments but not defined in the actual specification — an easy-to-miss class of error in complex specs.
A shift toward targeted prompting
Recognizing the limits of whole-document analysis, we refined our approach. Instead of treating the spec as a monolith, we focused on individual circuits — beginning with the ADD instruction — and crafted targeted prompts that addressed specific constraints and edge cases (e.g., field size, program counter behavior, clock updates).
This circuit-level prompting proved far more effective. It enabled deeper insights into optimization opportunities (e.g., carry handling) and facilitated the generation of test cases that helped validate correctness. We expanded this approach across additional instructions — ADDI, SLTU, SLT, and shift operations like SLL — consistently using test cases to check validity and highlight redundancy in constraint sets.
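As a concrete illustration, here is a minimal Python sketch of the kind of ADD constraint we prompted about: a 32-bit sum with an explicit boolean carry, checked against wraparound edge cases. The variable names and constraint shapes are assumptions for this sketch, not taken from the Nexus zkVM spec.

```python
# Illustrative 32-bit ADD constraints with an explicit carry column.
# Names and constraint shapes are assumptions, not the Nexus spec.

WORD = 2**32

def add_constraints(rs1: int, rs2: int, rd: int, carry: int) -> list[int]:
    """Residuals of the ADD row constraints; a valid row makes them all zero."""
    return [
        rs1 + rs2 - rd - carry * WORD,  # rd holds the low 32 bits of the sum
        carry * (carry - 1),            # carry must be boolean (0 or 1)
    ]

def satisfied(rs1: int, rs2: int) -> bool:
    """Build the honest witness for rs1 + rs2 and check every constraint."""
    total = rs1 + rs2
    rd, carry = total % WORD, total // WORD
    return all(r == 0 for r in add_constraints(rs1, rs2, rd, carry))

# Edge cases: zeros, wraparound at the word boundary, both operands maximal.
assert satisfied(0, 0)
assert satisfied(WORD - 1, 1)
assert satisfied(WORD - 1, WORD - 1)
```

Edge cases like these, fed back as test vectors, were exactly where the circuit-level prompts earned their keep.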
Challenges in MUL circuit design
Among the more demanding tasks was using AI to synthesize logic for the MUL instruction from scratch. Accurate circuit generation required precise instructions around field sizes, limb management, and modular arithmetic. Initial outputs often failed on these fronts, but iterating on prompts led to more viable implementations.
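To make the limb-management issue concrete, here is a hedged sketch (again with hypothetical names, not the Nexus circuit) of computing the low 32-bit word of a product from 16-bit limbs, the decomposition a correct MUL circuit has to get right:

```python
# Hedged sketch of computing the low word of a 32-bit MUL from 16-bit limbs.
# The limb split and names are illustrative, not the Nexus implementation.
LIMB = 2**16
WORD = 2**32

def limbs(x: int) -> tuple[int, int]:
    """Split a 32-bit value into (low, high) 16-bit limbs."""
    return x % LIMB, x // LIMB

def mul_low(a: int, b: int) -> int:
    """Low 32 bits of a * b, assembled limb-by-limb as a circuit would."""
    a0, a1 = limbs(a)
    b0, b1 = limbs(b)
    # a1 * b1 only contributes at bit 32 and above, so it drops out of the
    # low word; only the low 16 bits of the cross terms survive the shift.
    return (a0 * b0 + ((a0 * b1 + a1 * b0) % LIMB) * LIMB) % WORD

assert mul_low(3, 5) == 15
assert mul_low(WORD - 1, WORD - 1) == ((WORD - 1) * (WORD - 1)) % WORD
```

Early AI outputs tended to mishandle exactly these cross-term and overflow details until the prompts spelled them out.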
The Nexus Verifiable AI Lab is dedicated to exploring the frontier of verifiability, economics, and artificial intelligence in order to expand the boundaries of human and machine cooperation.
Key takeaways
1. Large-scale analysis is not a substitute for formal reasoning
While it can highlight inconsistencies or surface novel framing, AI struggles with global soundness analysis. Manual review and formal verification remain critical.
2. Specificity unlocks utility
Precise prompting — down to variable behavior and edge conditions — made a significant difference. The more structured the question, the better the result.
3. Requirements must be explicit
In cases where behavior (like clock updates) was implicit but not formally defined, AI often missed critical functionality. Making functional expectations explicit remains essential.
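A small illustration of why explicitness matters: unless the spec states outright that the clock increments and the program counter advances, a generated circuit can simply omit those constraints. The sketch below uses hypothetical names and assumes a RISC-V-style pc step of 4 for non-branching instructions:

```python
# Illustrative pc/clk step constraints for a non-branching instruction.
# The "advance pc by 4, increment clk by 1" behavior is an assumption
# (RISC-V-style), stated explicitly rather than left implicit.
def step_constraints(pc: int, pc_next: int, clk: int, clk_next: int) -> list[int]:
    """Residuals that an honest trace row must make zero."""
    return [
        pc_next - pc - 4,    # straight-line instructions advance pc by 4
        clk_next - clk - 1,  # the clock increments on every step
    ]

assert step_constraints(0x1000, 0x1004, 7, 8) == [0, 0]
# A row that forgets to advance pc now fails a constraint instead of passing:
assert step_constraints(0x1000, 0x1000, 7, 8) != [0, 0]
```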
4. AI excels at repetitive validation
AI tools were particularly helpful for verifying the necessity of constraints and testing across multiple edge cases. They helped accelerate tedious — but essential — validation loops.
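One such validation loop can itself be sketched in code: treat a constraint as necessary only if dropping it admits a witness the full system rejects. The toy system below is illustrative, with a deliberately tiny word size so witnesses can be enumerated exhaustively; its third constraint is implied by the first two, the kind of redundancy these checks surfaced.

```python
# Toy necessity check for a constraint set: a constraint is necessary if
# dropping it admits a witness that the full system rejects. The system
# and the 3-bit word size are illustrative, chosen for exhaustive search.
import itertools

WORD = 8  # deliberately tiny word size

def constraints(a: int, b: int, out: int, carry: int) -> list[int]:
    return [
        a + b - out - carry * WORD,            # C0: sum decomposition
        carry * (carry - 1),                   # C1: carry is boolean
        (a + b - out) * (a + b - out - WORD),  # C2: implied by C0 and C1
    ]

def necessary(idx: int) -> bool:
    """Does removing constraint idx admit a witness the full system rejects?"""
    ranges = (range(WORD), range(WORD), range(WORD), range(3))
    for a, b, out, carry in itertools.product(*ranges):
        rs = constraints(a, b, out, carry)
        others_ok = all(r == 0 for i, r in enumerate(rs) if i != idx)
        if others_ok and rs[idx] != 0:
            return True
    return False

assert necessary(0)      # the sum constraint cannot be dropped
assert not necessary(2)  # C2 is redundant: C0 and C1 already imply it
```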
5. Circuit generation is iterative by nature
For complex logic like multiplication, success came not from a single prompt, but from successive iterations and prompt refinement.
What’s next
We’re expanding our exploration to include other AI systems — such as Gemini and Cursor — and further refining our prompt engineering strategies. We’re also interested in automating more advanced optimizations (e.g., Karatsuba multiplication) and applying these methods to division and other arithmetic instructions.
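For reference, the Karatsuba optimization we have in mind trades one limb multiplication for extra additions. A minimal sketch over two 16-bit limbs (plain arithmetic only, not a constraint system):

```python
# Sketch of Karatsuba over two 16-bit limbs: three limb multiplications
# instead of four, at the cost of extra additions and subtractions.
LIMB = 2**16

def karatsuba_mul(a: int, b: int) -> int:
    """Full 64-bit product of two 32-bit values from 3 limb products."""
    a0, a1 = a % LIMB, a // LIMB
    b0, b1 = b % LIMB, b // LIMB
    low = a0 * b0
    high = a1 * b1
    mid = (a0 + a1) * (b0 + b1) - low - high  # equals a0*b1 + a1*b0
    return low + mid * LIMB + high * LIMB * LIMB

assert karatsuba_mul(1234567, 7654321) == 1234567 * 7654321
```

Whether the saved multiplication pays off inside a circuit depends on the relative cost of multiplication and addition constraints, which is precisely the trade-off we would like the AI to reason about.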
From a tooling perspective, shared AI chat sessions across collaborators (without joint accounts) would be a significant enabler for team-based workflows. We’re keeping a close eye on developments in that space.
On a personal note
When we first started this effort, we were skeptical of using AI tools in scientific and specification-heavy workflows. Concerns about plagiarism, accuracy, and trustworthiness loomed large. But what we found was that — with careful prompting and a clear sense of boundaries — AI tools can be valuable allies. Especially for repetitive tasks like constraint checking or structural optimization, they can accelerate work without compromising rigor.
We’re still early in this journey at Nexus, but it’s clear that AI has a growing role to play in the zkVM development lifecycle.