One of the main reasons to use a zkVM is ease of development: instead of having to manually implement a custom circuit to prove the correctness of a computation, you can just write normal code, compile it down to a well-known ISA like RISC-V, and let the zkVM do the rest. However, in practice things aren’t quite that simple. You have to handle the distinction between private and public inputs, implement a host program that actually invokes the zkVM for executing and proving, and in the case of the Nexus zkVM your Rust code also needs to be no_std, since naively importing parts of the standard library can violate some of the strong verification guarantees that are needed for many applications.
With the rise of LLM-based coding agents, there is a real opportunity to explore to what extent they can smooth out some of these development edges. So in this post we’ll take a look at using the Cursor coding agent to implement a guest program for a particular application of so-called accountable algorithms: HIPAA safe harbor compliance.
As a quick overview, the Health Insurance Portability and Accountability Act (HIPAA), a domain-specific federal privacy law in the US, places heavy restrictions on the transfer of an individual’s Protected Health Information (or PHI). In recognition that such data can be very useful for research into public health, pharmaceutical, and other medical interventions, the act does however allow for transfer in some cases so long as the data has been suitably anonymized (to the chagrin of many computer scientists and data privacy experts). In particular, the act specifies a safe harbor: a specific set of information such that, if a controller processes the data to remove it, they gain the presumption that they are compliant with the anonymization requirement and therefore have significant liability protections.
This is a natural use case for zkVMs: by doing the anonymization using (human or machine) certified code within a verifiable computation, the data processor gets a concise, easily verifiable proof that any databases they transfer with anonymized PHI are compliant with the safe harbor provision. This proof can then be attached to the dataset or stored for future auditing, in either case providing a cryptographically validated assertion that they are protected by the safe harbor.
So with that background, let’s try to implement a simple HIPAA safe harbor-compliant anonymizer to run on the zkVM, all using a coding agent. For the specific task we’ll work with a well-known dataset from the accountable algorithms literature, the Pima Diabetes database. Now this dataset is already anonymized, but we can easily add some faked PHI to it in order to get a suitable test case.
A Simple Solution
To start, we use the zkVM CLI tooling to generate a host project:
$ cargo nexus host hipaa
Without doing anything manually, let’s jump straight into prompting Cursor with the expected schema and the basic outline of what we need, including a link to some official guidance on the safe harbor policy published by the Department of Health and Human Services. As part of our experiment, we’ll just use the automatic model selection from Cursor, instead of trying to choose the best available. We want to see if modern LLM-based agents are generally capable of near push-button implementations of this sort of functionality.
Rewrite src/guest/src/main.rs as a no_std Rust program that does as follows:
The program should take as input a vector of StringRecord objects from the Rust csv crate. The main function of the code should parse the database contained in the StringRecord as having the following set of columns:
- FirstName (string)
- MiddleInitials (string)
- LastName (string)
- EmailAddress (string)
- ZipCode (integer)
- TelephoneNumber(string)
- MedicalRecordNumber (integer)
- Pregnancies (integer)
- Glucose (integer)
- BloodPressure (integer)
- SkinThickness (integer)
- Insulin (integer)
- BMI (float)
- DiabetesPedigree (float)
- Age (int)
- Outcome (int)
For each StringRecord It should then apply the safe harbor deidentification rules as articulated in Section 164.514(a) of the HIPAA Privacy Rule (@https://www.hhs.gov/hipaa/for-professionals/special-topics/de-identification/index.html). That is, the code should either (a) delete any columns, or (b) default the values contained in any columns as required by the rule. The code should also modify any columns (such as zip code or age) as required by the rule. The code should document through comments which component of the rule is being applied, as well as which components of the rule are irrelevant for the dataset. Finally, the main function should return the modified vector of StringRecords.
Note that only the columns starting at “Pregnancies” exist in the dataset, so we’ll have to fake the data for the seven columns before it. What we get to start with from the agent seems pretty good.
#![cfg_attr(target_arch = "riscv32", no_std, no_main)]
#![feature(alloc)]
extern crate alloc;
use alloc::string::String;
use alloc::vec::Vec;
use csv::StringRecord;
// Structure to hold the parsed record data
#[derive(Debug)]
struct PatientRecord {
first_name: String,
middle_initials: String,
last_name: String,
email_address: String,
zip_code: String,
telephone_number: String,
medical_record_number: i32,
pregnancies: i32,
glucose: i32,
blood_pressure: i32,
skin_thickness: i32,
insulin: i32,
bmi: f32,
diabetes_pedigree: f32,
age: i32,
outcome: i32,
}
impl PatientRecord {
fn from_string_record(record: &StringRecord) -> Option<Self> {
Some(PatientRecord {
first_name: record.get(0)?.to_string(),
middle_initials: record.get(1)?.to_string(),
last_name: record.get(2)?.to_string(),
email_address: record.get(3)?.to_string(),
zip_code: record.get(4)?.to_string(),
telephone_number: record.get(5)?.to_string(),
medical_record_number: record.get(6)?.parse().ok()?,
pregnancies: record.get(7)?.parse().ok()?,
glucose: record.get(8)?.parse().ok()?,
blood_pressure: record.get(9)?.parse().ok()?,
skin_thickness: record.get(10)?.parse().ok()?,
insulin: record.get(11)?.parse().ok()?,
bmi: record.get(12)?.parse().ok()?,
diabetes_pedigree: record.get(13)?.parse().ok()?,
age: record.get(14)?.parse().ok()?,
outcome: record.get(15)?.parse().ok()?,
})
}
fn to_string_record(&self) -> StringRecord {
let mut record = StringRecord::new();
record.push_field(&self.first_name);
record.push_field(&self.middle_initials);
record.push_field(&self.last_name);
record.push_field(&self.email_address);
record.push_field(&self.zip_code);
record.push_field(&self.telephone_number);
record.push_field(&self.medical_record_number.to_string());
record.push_field(&self.pregnancies.to_string());
record.push_field(&self.glucose.to_string());
record.push_field(&self.blood_pressure.to_string());
record.push_field(&self.skin_thickness.to_string());
record.push_field(&self.insulin.to_string());
record.push_field(&self.bmi.to_string());
record.push_field(&self.diabetes_pedigree.to_string());
record.push_field(&self.age.to_string());
record.push_field(&self.outcome.to_string());
record
}
}
// Apply HIPAA Safe Harbor deidentification rules
fn apply_hipaa_deidentification(record: &mut PatientRecord) {
// 1. Names - Remove all names
record.first_name = String::from("REDACTED");
record.middle_initials = String::from("REDACTED");
record.last_name = String::from("REDACTED");
// 2. Geographic subdivisions smaller than a state
// For ZIP codes, we need to truncate to first 3 digits and set to 000 if population < 20,000
// Note: In a real implementation, we would need a database of ZIP code populations
// For this example, we'll just truncate to 3 digits
if record.zip_code.len() >= 3 {
record.zip_code = format!("{}000", &record.zip_code[..3]);
}
// 3. All elements of dates (except year) for dates directly related to an individual
// Not applicable in this dataset
// 4. Telephone numbers - Remove
record.telephone_number = String::from("REDACTED");
// 5. Fax numbers - Not present in dataset
// 6. Electronic mail addresses - Remove
record.email_address = String::from("REDACTED");
// 7. Social Security Numbers - Not present in dataset
// 8. Medical record numbers - Remove
record.medical_record_number = 0;
// 9. Health plan beneficiary numbers - Not present in dataset
// 10. Account numbers - Not present in dataset
// 11. Certificate/license numbers - Not present in dataset
// 12. Vehicle identifiers and serial numbers - Not present in dataset
// 13. Device identifiers and serial numbers - Not present in dataset
// 14. Web Universal Resource Locators (URLs) - Not present in dataset
// 15. Internet Protocol (IP) address numbers - Not present in dataset
// 16. Biometric identifiers - Not present in dataset
// 17. Full face photographic images - Not present in dataset
// 18. Any other unique identifying number, characteristic, or code
// Note: We're keeping the medical data fields as they are not considered identifiers
// under HIPAA Safe Harbor rules
}
#[nexus_rt::main]
fn main() {
// This function would be called with a vector of StringRecords
// For demonstration purposes, we'll create a sample record
let mut records: Vec<StringRecord> = Vec::new();
// Process each record
let mut deidentified_records: Vec<StringRecord> = Vec::new();
for record in records {
if let Some(mut patient_record) = PatientRecord::from_string_record(&record) {
apply_hipaa_deidentification(&mut patient_record);
deidentified_records.push(patient_record.to_string_record());
}
}
// Return the deidentified records
// Note: In a real implementation, this would return the vector
// For this example, we'll just print a message
nexus_rt::print!("Processed {} records\n", deidentified_records.len());
}
Overall, the guest code generated by the agent is pretty reasonable, but in a number of ways it’s not correct. For starters, it didn’t quite get the requested I/O (perhaps not all that surprising, as the main function in Rust taking inputs and returning outputs isn’t conventional behavior). Nor did it add the dependency for csv to Cargo.toml.
Much more importantly, it didn’t nail the ZIP code rewriting (it returns a six-digit code), and it missed the need to clamp the ages for anyone ninety or older:
(C) All elements of dates (except year) for dates that are directly related to an individual, including birth date, admission date, discharge date, death date, and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older
https://www.hhs.gov/hipaa/for-professionals/special-topics/de-identification/index.html
We also happen to know that for the dataset all integer values will be non-negative, so it would be great to use u32s rather than i32s as some extra validation. So let’s follow up with a quick prompt for some fixes.
That's a good start, but can we make the following changes?
- Can we have main take in a vector of StringRecords, operate over them, and return the modified vector of StringRecords? I know this is unusual for the main function in Rust, but I will be running the code in an unusual virtual machine environment where this will work correctly.
- For the ZIP code, the new output will be six digits long. Can we remove the final trailing 0 added?
- The safe harbor rules also require modifying the age to clamp all ages >= 90 down to just 90 (see @https://www.hhs.gov/hipaa/for-professionals/special-topics/de-identification/index.html, (C) All elements of dates (except year) for dates that are directly related to an individual, including birth date, admission date, discharge date, death date, and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older). Can we implement this?
- Can we make sure csv is added to the dependencies in src/guest/Cargo.toml in a manner that is no_std compatible?
- Can we use u32s instead of i32s for integer data?
- Can we count how many records fail input validation, and print (using nexus_rt:print! or nexus_rt::println) the count?
The agent responded well. It retyped all i32 to u32 as requested, and we also now have the updated checks:
// 2. Geographic subdivisions smaller than a state
// For ZIP codes, we need to truncate to first 3 digits and set to 000 if population < 20,000
// Note: In a real implementation, we would need a database of ZIP code populations
// For this example, we'll just truncate to 3 digits
if record.zip_code.len() >= 3 {
record.zip_code = format!("{}00", &record.zip_code[..3]);
}
// 3. All elements of dates (except year) for dates directly related to an individual
// For ages over 89, we need to clamp them to 90
if record.age > 89 {
record.age = 90;
}
And an updated main:
#[nexus_rt::main]
fn main(records: Vec<StringRecord>) -> Vec<StringRecord> {
let mut deidentified_records: Vec<StringRecord> = Vec::new();
let mut failed_records = 0;
// Process each record
for record in records {
if let Some(mut patient_record) = PatientRecord::from_string_record(&record) {
apply_hipaa_deidentification(&mut patient_record);
deidentified_records.push(patient_record.to_string_record());
} else {
failed_records += 1;
}
}
// Print the number of failed records
nexus_rt::print!("Failed to process {} records\n", failed_records);
// Return the deidentified records
deidentified_records
}
Notice that by default records will be a private input, but we could also decorate main with #[nexus_rt::private_input(records)], if so desired. The agent also added csv to Cargo.toml, as requested:
[dependencies]
nexus-rt = { git = "https://github.com/nexus-xyz/nexus-zkvm.git", tag = "0.3.3", version = "0.3.3" }
postcard = { version = "1.1.1", default-features = false, features = ["alloc"] }
csv = { version = "1.3.0", default-features = false, features = ["alloc"] }
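The agent's own comment concedes that a full treatment of the ZIP rule needs population data: the guidance only permits keeping the first three digits when the area they cover holds more than 20,000 people, and otherwise requires 000. As a rough sketch of what that could look like, the helper below takes a hypothetical caller-supplied list of low-population prefixes (`low_population_prefixes` is our own name, not part of any library) and, unlike the agent's version, returns the bare three-digit prefix rather than padding it with zeros. It is written against std for brevity; in the guest the `String` type would come from `alloc`.

```rust
/// Safe harbor treatment of a ZIP code: keep only the first three digits,
/// or return "000" when the three-digit area is too sparsely populated.
///
/// `low_population_prefixes` is a hypothetical, caller-supplied list of
/// three-digit prefixes whose combined area holds 20,000 or fewer people.
fn safe_harbor_zip(zip: &str, low_population_prefixes: &[&str]) -> String {
    // Anything that does not start with three ASCII digits is suppressed.
    let prefix = match zip.get(..3) {
        Some(p) if p.bytes().all(|b| b.is_ascii_digit()) => p,
        _ => return String::from("000"),
    };
    if low_population_prefixes.contains(&prefix) {
        String::from("000")
    } else {
        String::from(prefix)
    }
}

/// Aggregate every age of 90 or more into the single category 90.
fn safe_harbor_age(age: u32) -> u32 {
    age.min(90)
}
```

For example, `safe_harbor_zip("12345", &["036"])` yields `"123"`, while `safe_harbor_zip("03601", &["036"])` yields `"000"`. The prefix list here is a placeholder for illustration, not authoritative census data.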
Adding Some Complexity
Now let’s add an additional complication. The HIPAA safe harbor rule allows the data processing entity to insert an identifier that allows them to reidentify those records in the future, provided that identifier meets the following guidelines:
Re-identification
The implementation specifications further provide direction with respect to re-identification, specifically the assignment of a unique code to the set of de-identified health information to permit re-identification by the covered entity.
If a covered entity or business associate successfully undertook an effort to identify the subject of de-identified information it maintained, the health information now related to a specific individual would again be protected by the Privacy Rule, as it would meet the definition of PHI. Disclosure of a code or other means of record identification designed to enable coded or otherwise de-identified information to be re-identified is also considered a disclosure of PHI.
(c) Implementation specifications: re-identification. A covered entity may assign a code or other means of record identification to allow information de-identified under this section to be re-identified by the covered entity, provided that:
(1) Derivation. The code or other means of record identification is not derived from or related to information about the individual and is not otherwise capable of being translated so as to identify the individual; and
(2) Security. The covered entity does not use or disclose the code or other means of record identification for any other purpose, and does not disclose the mechanism for re-identification.
https://www.hhs.gov/hipaa/for-professionals/special-topics/de-identification/index.html
So we can ask the agent to add some functionality that would enable reidentification, where the identifier is a random uuid provided by the host. We’ll also ask the agent to add a debug print of the original record with its new identifier as structured data, which the data processor can store for future reidentification, but which will not be included in the public output.
All the changes look great. I have a new feature to add. As input to main, can we also take a vector of UUIDs. For the ith StringRecord, let's add the ith UUID as an entry to the record. If there are more StringRecords than UUIDs, then we should panic with a clear error message.
Also, for each record successfully deidentified, let's add code to print it out as structured JSON data -- with the new UUID included.
The agent seemingly nails this pretty much first try, with a new Cargo.toml dependency list:
[dependencies]
nexus-rt = { git = "https://github.com/nexus-xyz/nexus-zkvm.git", tag = "0.3.3", version = "0.3.3" }
postcard = { version = "1.1.1", default-features = false, features = ["alloc"] }
csv = { version = "1.3.0", default-features = false, features = ["alloc"] }
uuid = { version = "1.7.0", default-features = false, features = ["v4", "serde"] }
serde = { version = "1.0", default-features = false, features = ["alloc", "derive"] }
serde_json = { version = "1.0", default-features = false, features = ["alloc"] }
And an updated solution that includes adding the uuid to the record, as well as logging that now makes the main function look like:
#[nexus_rt::main]
#[nexus_rt::public_input(uuids)] // --> manually added
fn main(records: Vec<StringRecord>, uuids: Vec<Uuid>) -> Vec<StringRecord> {
if records.len() > uuids.len() {
panic!("Not enough UUIDs provided. Need {} UUIDs for {} records.", records.len(), uuids.len());
}
let mut deidentified_records: Vec<StringRecord> = Vec::new();
let mut failed_records = 0;
// Process each record
for (i, record) in records.into_iter().enumerate() {
if let Some(mut patient_record) = PatientRecord::from_string_record(&record, &uuids[i]) {
apply_hipaa_deidentification(&mut patient_record);
// Print the deidentified record as JSON
if let Ok(json) = serde_json::to_string(&patient_record) {
nexus_rt::print!("Deidentified record: {}\n", json);
}
deidentified_records.push(patient_record.to_string_record());
} else {
failed_records += 1;
}
}
// Print the number of failed records
nexus_rt::print!("Failed to process {} records\n", failed_records);
// Return the deidentified records
deidentified_records
}
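To make the intent of the re-identification feature concrete, here is a minimal standalone sketch of the data flow we are after: each record is paired with a code, the original is captured alongside that code for the processor's private log, and only the redacted copy goes to the output. Everything here is simplified and hypothetical: `Patient` has just two fields, `redact` stands in for the real de-identification step, and plain `String` codes stand in for `Uuid` values so the example needs no external crates.

```rust
/// Hypothetical, trimmed-down record: one identifying field, one clinical field.
#[derive(Clone, Debug, PartialEq)]
struct Patient {
    last_name: String,
    glucose: u32,
}

/// A record tagged with its re-identification code.
type Coded = (String, Patient);

/// Stand-in for the real de-identification step.
fn redact(p: &mut Patient) {
    p.last_name = String::from("REDACTED");
}

/// Pairs each record with a code and returns (private log, public output).
/// Panics if there are fewer codes than records.
fn deidentify_with_codes(records: Vec<Patient>, codes: &[String]) -> (Vec<Coded>, Vec<Coded>) {
    assert!(
        codes.len() >= records.len(),
        "need {} codes but only {} were provided",
        records.len(),
        codes.len()
    );
    let mut private_log = Vec::new();
    let mut public_out = Vec::new();
    for (record, code) in records.into_iter().zip(codes) {
        // Capture the original before redaction so the log can be used
        // to re-identify the record later.
        private_log.push((code.clone(), record.clone()));
        let mut redacted = record;
        redact(&mut redacted);
        public_out.push((code.clone(), redacted));
    }
    (private_log, public_out)
}
```

Returning the two views separately keeps it obvious which data is meant to stay with the processor and which is safe to publish; in the guest program the private view is emitted through debug logging instead.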
Getting it Running
Up till now we have not actually tested any of this code. So let’s go ahead and do so. Straight off the bat, trying to compile the guest code we have some small issues we can easily fix (a missing import or two, and an unnecessary attempt to use the v4 feature for uuid, which requires an interface to system randomness that the zkVM context does not have). However, there is a much bigger problem: despite what the agent tried, csv is not no_std compatible, and cannot be made as such. The alloc feature that the agent attempted to use to make csv work in a no_std environment doesn’t even exist for that package! So, let’s go ahead and ask the agent to try and fix this, by noting there is a no_std compatible csv alternative, csv-core.
It seems to me like csv requires the standard library and can't be made to work without it. Can you replace our dependency on it using csv-core?
The initial attempt by the agent implemented CSV parsing of the raw strings within the guest program.
// Custom record type to replace csv::StringRecord
struct Record {
fields: Vec<String>,
}
impl Record {
fn new() -> Self {
Record {
fields: Vec::new(),
}
}
fn push_field(&mut self, field: &str) {
self.fields.push(field.to_string());
}
fn get(&self, index: usize) -> Option<&str> {
self.fields.get(index).map(|s| s.as_str())
}
}
fn parse_csv_record(input: &str) -> Option<Record> {
// ... a bunch of code manually parsing strings of raw csv rows ...
}
Obviously, it’s easier to do that in the host. So I asked the agent to keep the Record definition, but to just assume the input to the main function was a vector of the resulting Record objects and work from there. Now we end up with the resulting main:
#[nexus_rt::main]
#[nexus_rt::public_input(uuids)] // --> manually added
fn main(records: Vec<Record>, uuids: Vec<Uuid>) -> Vec<Record> {
if records.len() > uuids.len() {
panic!("Not enough UUIDs provided. Need {} UUIDs for {} records.", records.len(), uuids.len());
}
let mut deidentified_records: Vec<Record> = Vec::new();
let mut failed_records = 0;
// Process each record
for (i, record) in records.into_iter().enumerate() {
if let Some(mut patient_record) = PatientRecord::from_record(&record, &uuids[i]) {
apply_hipaa_deidentification(&mut patient_record);
// Print the deidentified record as JSON
if let Ok(json) = serde_json::to_string(&patient_record) {
nexus_rt::print!("Deidentified record: {}\n", json);
}
deidentified_records.push(patient_record.to_record());
} else {
failed_records += 1;
}
}
// Print the number of failed records
nexus_rt::print!("Failed to process {} records\n", failed_records);
// Return the deidentified records
deidentified_records
}
After making sure Record implements serde::Serialize and serde::Deserialize using a simple derive macro and removing some now unnecessary imports, everything builds as desired.
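With Record reduced to an ordered list of string fields, all of the input validation lives in the conversion to typed fields. The sketch below illustrates that pattern on a hypothetical two-column slice of the schema (`Vitals`, `parse_vitals`, and `parse_all` are names made up for this example): a row is rejected if a column is missing or fails to parse, and because the integer column is a `u32`, a negative value is rejected too.

```rust
/// Same shape as the Record type above: an ordered list of string fields.
struct Record {
    fields: Vec<String>,
}

impl Record {
    fn get(&self, index: usize) -> Option<&str> {
        self.fields.get(index).map(|s| s.as_str())
    }
}

/// Hypothetical two-column slice of the schema, for illustration only.
#[derive(Debug, PartialEq)]
struct Vitals {
    glucose: u32,
    bmi: f32,
}

/// Returns None if a column is missing or does not parse. A negative
/// integer is rejected because u32 cannot represent it.
fn parse_vitals(record: &Record) -> Option<Vitals> {
    Some(Vitals {
        glucose: record.get(0)?.parse().ok()?,
        bmi: record.get(1)?.parse().ok()?,
    })
}

/// Splits the input into successfully parsed rows and a count of rejected rows.
fn parse_all(records: &[Record]) -> (Vec<Vitals>, usize) {
    let mut parsed = Vec::new();
    let mut failed = 0;
    for record in records {
        match parse_vitals(record) {
            Some(v) => parsed.push(v),
            None => failed += 1,
        }
    }
    (parsed, failed)
}
```

This mirrors the `failed_records` counter in main: rows that fail validation are dropped from the output and only counted, so a nonzero count is a signal to inspect the input rather than an error in itself.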
The Host Program
To switch things up, I decided to implement the host program myself. I extracted the Record struct and exposed it to the host program (as guest::Record) as well. I then wrote up the host program to load the dataset, execute the anonymizer in the zkVM, and prove the execution. For the evaluation, I downloaded the dataset, trimmed it from 768 rows to 16 rows (since our interest here isn’t evaluating performance/proving time, so much as usability), added some generic information for the PHI columns, and saved it as a .csv file.
use nexus_sdk::{
compile::{cargo::CargoPackager, Compile, Compiler},
stwo::seq::Stwo,
ByGuestCompilation, Local, Prover, Verifiable, Viewable,
};
use std::fs::File;
use uuid::Uuid;
const PACKAGE: &str = "guest";
fn main() {
println!("Compiling guest program...");
let mut prover_compiler = Compiler::<CargoPackager>::new(PACKAGE);
let prover: Stwo<Local> =
Stwo::compile(&mut prover_compiler).expect("failed to compile guest program");
let elf = prover.elf.clone(); // save elf for use with verification
let f = File::open("./src/diabetes.csv").expect("unable to find dataset");
let mut rdr = csv::Reader::from_reader(f);
let mut records: Vec<guest::Record> = Vec::new();
rdr.records().for_each(|maybe_record| {
if let Ok(record) = maybe_record {
let mut simple = guest::Record::new();
record.iter().for_each(|field| {
simple.push_field(field);
});
records.push(simple);
}
});
let uuids = (0..records.len()).map(|_| {
Uuid::new_v4()
}).collect();
println!("Proving execution of vm...");
let (view, proof) = prover.prove_with_input::<Vec<guest::Record>, Vec<Uuid>>(&records, &uuids).expect("failed to prove program");
println!(
">>>>> Logging\n{}<<<<<",
view.logs().expect("failed to retrieve debug logs").join("")
);
assert_eq!(
view.exit_code().expect("failed to retrieve exit code"),
nexus_sdk::KnownExitCodes::ExitSuccess as u32
);
print!("Verifying execution...");
#[rustfmt::skip]
proof
.verify_expected::<Vec<Uuid>, Vec<guest::Record>>(
&uuids,
nexus_sdk::KnownExitCodes::ExitSuccess as u32,
&view.public_output().expect("failed to retrieve output"),
&elf,
&[],
)
.expect("failed to verify proof");
println!(" Succeeded!");
}
Executing the host program, we can see that we missed one further issue in the guest program:
*[main][~/Code/hipaa]$ cargo run -r
Finished `release` profile [optimized] target(s) in 0.20s
Running `target/release/hipaa`
Compiling guest program...
Proving execution of vm...
>>>>> Logging
Deidentified record: {"uuid":"bd2dcc19-68d7-4e98-8079-8855d6536978","first_name":"REDACTED","middle_initials":"REDACTED","last_name":"REDACTED","email_address":"REDACTED","zip_code":"12300","telephone_number":"REDACTED","medical_record_number":0,"pregnancies":6,"glucose":148,"blood_pressure":72,"skin_thickness":35,"insulin":0,"bmi":33.6,"diabetes_pedigree":0.627,"age":50,"outcome":1}
Deidentified record: {"uuid":"e219742d-b955-4ccc-aa1e-3458e0e85dc7","first_name":"REDACTED","middle_initials":"REDACTED","last_name":"REDACTED","email_address":"REDACTED","zip_code":"12300","telephone_number":"REDACTED","medical_record_number":0,"pregnancies":1,"glucose":85,"blood_pressure":66,"skin_thickness":29,"insulin":0,"bmi":26.6,"diabetes_pedigree":0.351,"age":31,"outcome":0}
Deidentified record: {"uuid":"9a286ea8-adc4-4f18-b698-4c5faabcb854","first_name":"REDACTED","middle_initials":"REDACTED","last_name":"REDACTED","email_address":"REDACTED","zip_code":"12300","telephone_number":"REDACTED","medical_record_number":0,"pregnancies":8,"glucose":183,"blood_pressure":64,"skin_thickness":0,"insulin":0,"bmi":23.3,"diabetes_pedigree":0.672,"age":32,"outcome":1}
Deidentified record: {"uuid":"f4be1723-1a58-4f16-a542-26db1c1d62d9","first_name":"REDACTED","middle_initials":"REDACTED","last_name":"REDACTED","email_address":"REDACTED","zip_code":"12300","telephone_number":"REDACTED","medical_record_number":0,"pregnancies":1,"glucose":89,"blood_pressure":66,"skin_thickness":23,"insulin":94,"bmi":28.1,"diabetes_pedigree":0.167,"age":21,"outcome":0}
Deidentified record: {"uuid":"3f2b5ed8-f77a-4f2a-bb9c-3b30060cece0","first_name":"REDACTED","middle_initials":"REDACTED","last_name":"REDACTED","email_address":"REDACTED","zip_code":"12300","telephone_number":"REDACTED","medical_record_number":0,"pregnancies":0,"glucose":137,"blood_pressure":40,"skin_thickness":35,"insulin":168,"bmi":43.1,"diabetes_pedigree":2.288,"age":33,"outcome":1}
Deidentified record: {"uuid":"49c55a62-e437-4ae0-926c-d3458c69b968","first_name":"REDACTED","middle_initials":"REDACTED","last_name":"REDACTED","email_address":"REDACTED","zip_code":"12300","telephone_number":"REDACTED","medical_record_number":0,"pregnancies":5,"glucose":116,"blood_pressure":74,"skin_thickness":0,"insulin":0,"bmi":25.6,"diabetes_pedigree":0.201,"age":30,"outcome":0}
Deidentified record: {"uuid":"e6f105eb-8d84-4375-9f6d-761090cf26b9","first_name":"REDACTED","middle_initials":"REDACTED","last_name":"REDACTED","email_address":"REDACTED","zip_code":"12300","telephone_number":"REDACTED","medical_record_number":0,"pregnancies":3,"glucose":78,"blood_pressure":50,"skin_thickness":32,"insulin":88,"bmi":31.0,"diabetes_pedigree":0.248,"age":26,"outcome":1}
Deidentified record: {"uuid":"134b2140-3952-4efd-8912-abf43f5b092d","first_name":"REDACTED","middle_initials":"REDACTED","last_name":"REDACTED","email_address":"REDACTED","zip_code":"12300","telephone_number":"REDACTED","medical_record_number":0,"pregnancies":10,"glucose":115,"blood_pressure":0,"skin_thickness":0,"insulin":0,"bmi":35.3,"diabetes_pedigree":0.134,"age":29,"outcome":0}
Deidentified record: {"uuid":"26f1fdf5-42cb-4d5d-81a3-975d02b3f27a","first_name":"REDACTED","middle_initials":"REDACTED","last_name":"REDACTED","email_address":"REDACTED","zip_code":"12300","telephone_number":"REDACTED","medical_record_number":0,"pregnancies":2,"glucose":197,"blood_pressure":70,"skin_thickness":45,"insulin":543,"bmi":30.5,"diabetes_pedigree":0.158,"age":53,"outcome":1}
Deidentified record: {"uuid":"63931f67-964a-4be4-aff3-642d4ca35d12","first_name":"REDACTED","middle_initials":"REDACTED","last_name":"REDACTED","email_address":"REDACTED","zip_code":"12300","telephone_number":"REDACTED","medical_record_number":0,"pregnancies":8,"glucose":125,"blood_pressure":96,"skin_thickness":0,"insulin":0,"bmi":0.0,"diabetes_pedigree":0.232,"age":54,"outcome":1}
Deidentified record: {"uuid":"b39037b5-5280-4dcf-8fe7-d8ed00547921","first_name":"REDACTED","middle_initials":"REDACTED","last_name":"REDACTED","email_address":"REDACTED","zip_code":"12300","telephone_number":"REDACTED","medical_record_number":0,"pregnancies":4,"glucose":110,"blood_pressure":92,"skin_thickness":0,"insulin":0,"bmi":37.6,"diabetes_pedigree":0.191,"age":30,"outcome":0}
Deidentified record: {"uuid":"43900718-07d4-4201-88bd-c945730e2ea7","first_name":"REDACTED","middle_initials":"REDACTED","last_name":"REDACTED","email_address":"REDACTED","zip_code":"12300","telephone_number":"REDACTED","medical_record_number":0,"pregnancies":10,"glucose":168,"blood_pressure":74,"skin_thickness":0,"insulin":0,"bmi":38.0,"diabetes_pedigree":0.537,"age":34,"outcome":1}
Deidentified record: {"uuid":"ceeb835c-a6cf-4f0b-aac1-29c53148e0af","first_name":"REDACTED","middle_initials":"REDACTED","last_name":"REDACTED","email_address":"REDACTED","zip_code":"12300","telephone_number":"REDACTED","medical_record_number":0,"pregnancies":10,"glucose":139,"blood_pressure":80,"skin_thickness":0,"insulin":0,"bmi":27.1,"diabetes_pedigree":1.441,"age":57,"outcome":0}
Deidentified record: {"uuid":"eebb0dea-8228-4488-ae8d-9594c5fb603b","first_name":"REDACTED","middle_initials":"REDACTED","last_name":"REDACTED","email_address":"REDACTED","zip_code":"12300","telephone_number":"REDACTED","medical_record_number":0,"pregnancies":1,"glucose":189,"blood_pressure":60,"skin_thickness":23,"insulin":846,"bmi":30.1,"diabetes_pedigree":0.398,"age":59,"outcome":1}
Deidentified record: {"uuid":"495cbdc9-940a-4d61-8856-6b492af090b3","first_name":"REDACTED","middle_initials":"REDACTED","last_name":"REDACTED","email_address":"REDACTED","zip_code":"12300","telephone_number":"REDACTED","medical_record_number":0,"pregnancies":5,"glucose":166,"blood_pressure":72,"skin_thickness":19,"insulin":175,"bmi":25.8,"diabetes_pedigree":0.587,"age":51,"outcome":1}
Deidentified record: {"uuid":"61604697-6544-407e-aeb4-030041e11624","first_name":"REDACTED","middle_initials":"REDACTED","last_name":"REDACTED","email_address":"REDACTED","zip_code":"12300","telephone_number":"REDACTED","medical_record_number":0,"pregnancies":7,"glucose":100,"blood_pressure":0,"skin_thickness":0,"insulin":0,"bmi":30.0,"diabetes_pedigree":0.484,"age":32,"outcome":1}
Failed to process 0 records
<<<<<
Verifying execution... Succeeded!
It does the debug printing after deidentification, which of course defeats the purpose of that printing, which is for future reidentification. Note that this is pretty clearly a prompting issue: my prompt was poorly worded, and the agent did what was asked for. But at least in the process we can see that the deidentification works correctly. After just manually applying the fix of moving the debug printing before the call to apply_hipaa_deidentification, we get the expected output.
[main][~/Code/hipaa]$ time cargo run -r
Finished `release` profile [optimized] target(s) in 0.07s
Running `target/release/hipaa`
Compiling guest program...
Proving execution of vm...
>>>>> Logging
Deidentified record: {"uuid":"6fac663a-1f48-4f58-b1d1-016ef517412f","first_name":"Test","middle_initials":"T.","last_name":"Test","email_address":"test@test.com","zip_code":"12345","telephone_number":"555-555-5555","medical_record_number":99999,"pregnancies":6,"glucose":148,"blood_pressure":72,"skin_thickness":35,"insulin":0,"bmi":33.6,"diabetes_pedigree":0.627,"age":50,"outcome":1}
Deidentified record: {"uuid":"e4412069-30a2-42d1-9584-2bca80df8469","first_name":"Test","middle_initials":"T.","last_name":"Test","email_address":"test@test.com","zip_code":"12345","telephone_number":"555-555-5555","medical_record_number":99999,"pregnancies":1,"glucose":85,"blood_pressure":66,"skin_thickness":29,"insulin":0,"bmi":26.6,"diabetes_pedigree":0.351,"age":31,"outcome":0}
Deidentified record: {"uuid":"7e876134-fe9a-4267-b2e6-97c5f9f3561c","first_name":"Test","middle_initials":"T.","last_name":"Test","email_address":"test@test.com","zip_code":"12345","telephone_number":"555-555-5555","medical_record_number":99999,"pregnancies":8,"glucose":183,"blood_pressure":64,"skin_thickness":0,"insulin":0,"bmi":23.3,"diabetes_pedigree":0.672,"age":32,"outcome":1}
Deidentified record: {"uuid":"8295c840-dd5e-40a5-ac61-80c96fc8b2ab","first_name":"Test","middle_initials":"T.","last_name":"Test","email_address":"test@test.com","zip_code":"12345","telephone_number":"555-555-5555","medical_record_number":99999,"pregnancies":1,"glucose":89,"blood_pressure":66,"skin_thickness":23,"insulin":94,"bmi":28.1,"diabetes_pedigree":0.167,"age":21,"outcome":0}
Deidentified record: {"uuid":"1fc2c019-558e-4396-8e43-fd345beab7b4","first_name":"Test","middle_initials":"T.","last_name":"Test","email_address":"test@test.com","zip_code":"12345","telephone_number":"555-555-5555","medical_record_number":99999,"pregnancies":0,"glucose":137,"blood_pressure":40,"skin_thickness":35,"insulin":168,"bmi":43.1,"diabetes_pedigree":2.288,"age":33,"outcome":1}
Deidentified record: {"uuid":"5f5c15cd-7cf0-4677-b3bf-74a6acc7e6ea","first_name":"Test","middle_initials":"T.","last_name":"Test","email_address":"test@test.com","zip_code":"12345","telephone_number":"555-555-5555","medical_record_number":99999,"pregnancies":5,"glucose":116,"blood_pressure":74,"skin_thickness":0,"insulin":0,"bmi":25.6,"diabetes_pedigree":0.201,"age":30,"outcome":0}
Deidentified record: {"uuid":"8461c6db-6b08-4845-824c-ca0da0bda44b","first_name":"Test","middle_initials":"T.","last_name":"Test","email_address":"test@test.com","zip_code":"12345","telephone_number":"555-555-5555","medical_record_number":99999,"pregnancies":3,"glucose":78,"blood_pressure":50,"skin_thickness":32,"insulin":88,"bmi":31.0,"diabetes_pedigree":0.248,"age":26,"outcome":1}
Deidentified record: {"uuid":"15c28c1a-e7db-4d31-9894-0ea58b02b1fa","first_name":"Test","middle_initials":"T.","last_name":"Test","email_address":"test@test.com","zip_code":"12345","telephone_number":"555-555-5555","medical_record_number":99999,"pregnancies":10,"glucose":115,"blood_pressure":0,"skin_thickness":0,"insulin":0,"bmi":35.3,"diabetes_pedigree":0.134,"age":29,"outcome":0}
Deidentified record: {"uuid":"5bb2911d-1b0b-451f-815b-aa735ca04e33","first_name":"Test","middle_initials":"T.","last_name":"Test","email_address":"test@test.com","zip_code":"12345","telephone_number":"555-555-5555","medical_record_number":99999,"pregnancies":2,"glucose":197,"blood_pressure":70,"skin_thickness":45,"insulin":543,"bmi":30.5,"diabetes_pedigree":0.158,"age":53,"outcome":1}
Deidentified record: {"uuid":"673f769d-c399-4e0b-85bc-5d1bad31f366","first_name":"Test","middle_initials":"T.","last_name":"Test","email_address":"test@test.com","zip_code":"12345","telephone_number":"555-555-5555","medical_record_number":99999,"pregnancies":8,"glucose":125,"blood_pressure":96,"skin_thickness":0,"insulin":0,"bmi":0.0,"diabetes_pedigree":0.232,"age":54,"outcome":1}
Deidentified record: {"uuid":"03b4ef07-93f3-439f-854a-9cb21ad25e4a","first_name":"Test","middle_initials":"T.","last_name":"Test","email_address":"test@test.com","zip_code":"12345","telephone_number":"555-555-5555","medical_record_number":99999,"pregnancies":4,"glucose":110,"blood_pressure":92,"skin_thickness":0,"insulin":0,"bmi":37.6,"diabetes_pedigree":0.191,"age":30,"outcome":0}
Deidentified record: {"uuid":"64a4c8f9-2638-4ac7-aa5a-307e8a5d6856","first_name":"Test","middle_initials":"T.","last_name":"Test","email_address":"test@test.com","zip_code":"12345","telephone_number":"555-555-5555","medical_record_number":99999,"pregnancies":10,"glucose":168,"blood_pressure":74,"skin_thickness":0,"insulin":0,"bmi":38.0,"diabetes_pedigree":0.537,"age":34,"outcome":1}
Deidentified record: {"uuid":"6432fe51-93be-40f2-816c-61d0314979c5","first_name":"Test","middle_initials":"T.","last_name":"Test","email_address":"test@test.com","zip_code":"12345","telephone_number":"555-555-5555","medical_record_number":99999,"pregnancies":10,"glucose":139,"blood_pressure":80,"skin_thickness":0,"insulin":0,"bmi":27.1,"diabetes_pedigree":1.441,"age":57,"outcome":0}
Deidentified record: {"uuid":"8ba93969-3219-444c-9124-91c065b9efa1","first_name":"Test","middle_initials":"T.","last_name":"Test","email_address":"test@test.com","zip_code":"12345","telephone_number":"555-555-5555","medical_record_number":99999,"pregnancies":1,"glucose":189,"blood_pressure":60,"skin_thickness":23,"insulin":846,"bmi":30.1,"diabetes_pedigree":0.398,"age":59,"outcome":1}
Deidentified record: {"uuid":"d48493eb-e29c-42dc-aa86-6b3adc29b7d9","first_name":"Test","middle_initials":"T.","last_name":"Test","email_address":"test@test.com","zip_code":"12345","telephone_number":"555-555-5555","medical_record_number":99999,"pregnancies":5,"glucose":166,"blood_pressure":72,"skin_thickness":19,"insulin":175,"bmi":25.8,"diabetes_pedigree":0.587,"age":51,"outcome":1}
Deidentified record: {"uuid":"5591ffee-86e9-4a84-b2be-e2265a2cc17c","first_name":"Test","middle_initials":"T.","last_name":"Test","email_address":"test@test.com","zip_code":"12345","telephone_number":"555-555-5555","medical_record_number":99999,"pregnancies":7,"glucose":100,"blood_pressure":0,"skin_thickness":0,"insulin":0,"bmi":30.0,"diabetes_pedigree":0.484,"age":32,"outcome":1}
Failed to process 0 records
<<<<<
Verifying execution... Succeeded!
cargo run -r 102.05s user 24.12s system 45% cpu 4:36.17 total
And, just out of curiosity, you can see that I used the UNIX time command to get the wall-clock timing for the entire host program execution on my MacBook M3 Pro: about 4 minutes 36 seconds in total, including compiling the guest program, parsing the CSV file, generating the UUIDs, tracing the execution, proving, and verifying.
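The single wall-clock figure does not show how the time splits across those stages. If you wanted a per-stage breakdown, one option is to wrap each step of the host program in a small timing helper. The following is only a minimal sketch using the Rust standard library; `time_stage` and `report_line` are hypothetical helper names introduced here for illustration and are not part of the demo repository or the Nexus SDK.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `stage` and returns its result together
/// with the wall-clock time the call took.
fn time_stage<T>(stage: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = stage();
    (result, start.elapsed())
}

/// Hypothetical helper: formats one line of a timing report,
/// with the elapsed time in seconds to millisecond precision.
fn report_line(label: &str, elapsed: Duration) -> String {
    format!("{}: {:.3}s", label, elapsed.as_secs_f64())
}
```

In the host program, each step (for example the CSV parsing or the proving call) would be passed to `time_stage` as a closure, and the returned duration printed with `report_line`. Because this measures wall-clock time only, the numbers will not separate user and system CPU time the way the shell's time output does.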
To see the final code used in this blog post, you can check it out here: https://github.com/nexus-xyz/hipaa-demo
The Nexus Verifiable AI Lab is dedicated to exploring the frontier of verifiability, economics, and artificial intelligence in order to expand the boundaries of human and machine cooperation.