Engineering

Adversarial Testing Against the Compiler Chain

How the team tries to break the compiler and what those tests can and cannot prove about the formal system.

2026-03-24

The Question We Asked Ourselves

When you build a compiler that transpiles code between 10 input languages and 14 output targets, that certifies programs mathematically, and that compiles itself — there is one question that towers above everything else:

How do you know it actually works?

The industry standard answer is "we tested it pretty well." We looked at that answer and decided it was an embarrassment. "Pretty well" is not engineering. So we did something different.

What Are Abyssal Tests?

We call them abyssal tests because they go all the way down — to the absolute bottom of the system. These are not your typical integration tests that verify "the button works." These are tests designed to destroy our own compiler. Every atomic operation. Every value combination. Every backend. Every control flow pattern. We tried to break it. Systematically.

110,227 Tests Across 7 Categories

The tests span 7 categories: individual monomer operations, multi-family compositions, cross-target consistency, determinism verification, real execution with verified I/O, security and abuse resistance, and regression coverage. Every single test verifies a concrete, specific property. None of these are randomly generated. Each one exists because it targets a specific execution path that could fail — and we made sure it does not.

What We Tried to Break

Level 1: Individual Operations

Every monomer in the full catalog was tested with boundary values: 0, 1, 127, 128, 255, and every dangerous combination between them. ADD8(255, 1) must produce wrap-around: not a crash, not undefined behavior, wrap-around. DIV8(x, 0) must produce a controlled error, not a segfault. SHL(1, 7) must produce 128 under declared constraints. No "it depends." No platform-specific behavior.
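To make the boundary checks concrete, here is a minimal sketch in Python. It is not the compiler's implementation. It assumes unsigned 8-bit semantics, so that "wrap-around" means arithmetic modulo 256 and ADD8(255, 1) yields 0. The names add8, div8, shl, ControlledError, and check_boundaries are hypothetical stand-ins chosen for illustration.

```python
class ControlledError(Exception):
    """Hypothetical error type standing in for a controlled failure."""


def add8(a: int, b: int) -> int:
    # Assumed semantics: unsigned 8-bit addition that wraps modulo 256.
    return (a + b) & 0xFF


def div8(a: int, b: int) -> int:
    # Division by zero raises a controlled error instead of crashing.
    if b == 0:
        raise ControlledError("DIV8: division by zero")
    return (a // b) & 0xFF


def shl(a: int, n: int) -> int:
    # Left shift, truncated to 8 bits.
    return (a << n) & 0xFF


BOUNDARY = [0, 1, 127, 128, 255]


def check_boundaries() -> int:
    """Run every boundary pair through ADD8 and DIV8; return pairs checked."""
    checked = 0
    for a in BOUNDARY:
        for b in BOUNDARY:
            # ADD8 must always stay inside the 8-bit range.
            assert 0 <= add8(a, b) <= 255
            if b == 0:
                # DIV8 by zero must fail in a controlled way.
                try:
                    div8(a, b)
                except ControlledError:
                    pass
                else:
                    raise AssertionError("DIV8 by zero did not raise")
            else:
                assert div8(a, b) == a // b
            checked += 1
    return checked
```

Calling check_boundaries() walks all 25 pairs drawn from the five boundary values. A real harness would do the same for every monomer, not just the three sketched here.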

Level 2: Compositions

Here is where most compilers fall apart. An individual monomer can work perfectly and fail catastrophically when composed with another. We generated chains of 2, 3, 4, 5, and 6 operations mixing families: arithmetic with logic, logic with strings, strings with float, float with trigonometry. If ADD8 works and SIN works, does SIN(ADD8(1,2)) work?

Yes. In every single case. Every combination. Every permutation.
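A composition test of this kind can be sketched as follows. This is an illustration under stated assumptions, not the project's generator: the five-entry catalog OPS, its type signatures, and the helpers well_typed_chains and run_chain are all hypothetical, and the chains here are built from unary operations only to keep the example short.

```python
import itertools
import math

# Hypothetical mini-catalog: name -> (input type, output type, function).
OPS = {
    "INC8": (int, int, lambda x: (x + 1) & 0xFF),
    "NOT8": (int, int, lambda x: ~x & 0xFF),
    "TO_F": (int, float, float),
    "SIN": (float, float, math.sin),
    "NEG": (float, float, lambda x: -x),
}


def well_typed_chains(max_len: int):
    """Yield every chain of 2..max_len ops whose types line up end to end."""
    for n in range(2, max_len + 1):
        for names in itertools.product(OPS, repeat=n):
            # The first op must accept an 8-bit integer input.
            if OPS[names[0]][0] is not int:
                continue
            # Each op's output type must match the next op's input type.
            if all(OPS[a][1] is OPS[b][0] for a, b in zip(names, names[1:])):
                yield names


def run_chain(names, value):
    """Apply the ops in order, feeding each result into the next op."""
    for name in names:
        value = OPS[name][2](value)
    return value


def check_all_chains(max_len: int, inputs=(0, 1, 127, 128, 255)) -> int:
    """Run every well-typed chain on every input; return chains checked."""
    count = 0
    for names in well_typed_chains(max_len):
        out_type = OPS[names[-1]][1]
        for x in inputs:
            result = run_chain(names, x)
            assert isinstance(result, out_type)
            if out_type is int:
                assert 0 <= result <= 255
            else:
                assert math.isfinite(result)
        count += 1
    return count
```

With this toy catalog, run_chain(("INC8", "TO_F", "SIN"), 2) computes the sine of 3, which mirrors the SIN(ADD8(1,2)) question above. The point of the design is that the chain enumerator, not a human, decides which combinations get exercised.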

Level 3: Cross-Target

The same PCD program must produce correct code in JavaScript, Python, Rust, Go, C, C++, PHP, and Java. Each monomer generates idiomatic code in the target language — not a transliteration, but native semantics appropriate to that language. And here is the hard part: all backends must produce the same result for the same input. Identical outputs. Across languages.

2,864 tests verify this for monomer combinations alone. Every single one passes.
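The shape of a cross-target check can be sketched like this. It is a simplification: each backend is modeled as a plain Python callable, whereas a real harness would presumably build and run the emitted code with each target's own toolchain. The functions add8_mask, add8_mod, and find_divergences are hypothetical names for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Backend = Callable[[int, int], int]


# Hypothetical stand-ins for the same operation emitted to two targets.
def add8_mask(a: int, b: int) -> int:
    return (a + b) & 0xFF


def add8_mod(a: int, b: int) -> int:
    return (a + b) % 256


def find_divergences(
    backends: Dict[str, Backend],
    inputs: Iterable[Tuple[int, int]],
) -> List[Tuple[Tuple[int, int], Dict[str, int]]]:
    """Return every input on which the backends disagree, with their results."""
    divergences = []
    for a, b in inputs:
        results = {name: fn(a, b) for name, fn in backends.items()}
        # More than one distinct value means at least two backends differ.
        if len(set(results.values())) > 1:
            divergences.append(((a, b), results))
    return divergences
```

An empty list means every backend agreed on every input. Returning the disagreeing results, rather than a bare pass or fail, makes it easier to see which target drifted.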

Level 4: Determinism

This is the most important property of BRIK64: the same input produces the same output. Always. Not "usually." Not "in most cases." Always. No garbage collection pausing between two runs. No JIT optimizing differently the second time. No scheduler reordering operations behind your back.

Every program is compiled twice. Hashes are compared. If they differ by a single bit, the test fails. 600 determinism tests. Zero failures.
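The compile-twice-and-compare idea fits in a few lines. In this sketch, compile_program is a hypothetical stand-in for the real compiler (here it only normalizes whitespace), and SHA-256 is an assumed choice of hash; the article does not say which hash the real tests use.

```python
import hashlib
from typing import Callable


def compile_program(source: str) -> bytes:
    # Hypothetical stand-in for the compiler: a pure function of its input.
    lines = [line.strip() for line in source.splitlines() if line.strip()]
    return "\n".join(lines).encode("utf-8")


def is_deterministic(
    source: str,
    compiler: Callable[[str], bytes] = compile_program,
) -> bool:
    """Compile the same source twice and compare the output hashes."""
    first = hashlib.sha256(compiler(source)).hexdigest()
    second = hashlib.sha256(compiler(source)).hexdigest()
    # Any single differing bit in the output changes the digest.
    return first == second
```

Passing the compiler in as a parameter lets the same check run against any build of the compiler, including one deliberately made flaky to confirm that the check itself can fail.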

Level 5: Real Execution

Roughly the first 100,000 tests verified code generation: that the compiler produces valid, compilable code. The last 10,000 or so go further. They verify real execution: that the generated code, when actually run, produces the correct values. Not just valid syntax. Correct answers.

ADD8(1, 2) must not only generate code that compiles — it must produce 3 when executed. SIN(0) must produce 0.0. A loop that accumulates 10 times must produce exactly 10.

These tests execute the BIR (BRIK Intermediate Representation) with known input values and verify that the output is exactly what the mathematics predicts. Not approximately. Exactly.
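As a rough illustration of an execution test, the sketch below runs a tiny stack-based instruction list and checks the three examples above. The instruction format and the execute function are invented for this example and are not the actual BIR; ADD8 is again assumed to be unsigned 8-bit addition.

```python
import math


def execute(program):
    """Run a list of (opcode, *args) tuples on a stack; return the top value."""
    stack = []
    for instr in program:
        op = instr[0]
        if op == "PUSH":
            stack.append(instr[1])
        elif op == "ADD8":
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & 0xFF)
        elif op == "SIN":
            stack.append(math.sin(stack.pop()))
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()


# ADD8(1, 2) must produce 3 when executed.
add_program = [("PUSH", 1), ("PUSH", 2), ("ADD8",)]

# SIN(0) must produce 0.0.
sin_program = [("PUSH", 0), ("SIN",)]

# A loop that accumulates 1 ten times, unrolled, must produce exactly 10.
loop_program = [("PUSH", 0)] + [("PUSH", 1), ("ADD8",)] * 10
```

Each program pairs known inputs with a value that can be worked out by hand, so a test only has to compare execute(program) against that expected value.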

Level 6: Security and Abuse

What happens when someone deliberately tries to attack the compiler? SQL injection in a PCD variable name. XSS in a string literal. Path traversal in a filesystem argument. Unicode homoglyphs designed to confuse the parser. We threw everything we could think of at it.

484 regression and security tests verify that the system rejects or correctly handles every single malicious case. The compiler is not just correct — it is hostile to attackers.
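Two of these cases, hostile identifiers and path traversal, can be sketched with an allowlist approach. The rules below are assumptions made for illustration; the article does not spell out PCD's actual identifier grammar or path policy, and is_safe_identifier and is_safe_relative_path are hypothetical names.

```python
import re
from pathlib import PurePosixPath

# Assumed rule: identifiers are ASCII letters, digits, and underscores only.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_safe_identifier(name: str) -> bool:
    # fullmatch rejects anything outside the allowlist, including quotes,
    # spaces, and non-ASCII look-alike letters.
    return IDENT.fullmatch(name) is not None


def is_safe_relative_path(path: str) -> bool:
    # Reject absolute paths and any path that climbs out with "..".
    p = PurePosixPath(path)
    return not p.is_absolute() and ".." not in p.parts
```

An allowlist is used instead of a blocklist because it rejects inputs nobody thought of in advance: a Cyrillic "а" inside an otherwise Latin name fails the same check as an SQL fragment.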

Level 7: Regression

Every bug we found and fixed during development became a permanent, immortal test case. The array overflow that caused a segfault in ELF generation. The variable scoping in if blocks that did not propagate to the outer scope. The ENV function that did not exist as a monomer and returned garbage.

These bugs can never come back. Not tomorrow. Not next year. Not ever. Their tests are embedded in the artifact forever.

What We Did NOT Find

This is the part that matters most. After 110,227 deliberate, systematic attempts to break our own system:

0 failures in core operations. Every certified monomer, Phi C = 1. The mathematical certification holds under adversarial conditions.

0 determinism failures. Same input, same output. Always.

0 uncontrolled crashes in the compilation pipeline.

0 cross-target inconsistencies. All backends produce equivalent results. Write once, run anywhere, and get the same answer everywhere.

Why This Is Possible

The secret is not that we are better testers than everyone else. It is that the operation space is finite. And that changes everything.

A conventional program has a virtually infinite state space: any combination of calls to any function with any argument. Exhaustively verifying a 1,000-line Python program is computationally infeasible. Nobody will ever do it. It cannot be done.

A PCD program is composed of exactly 128 atomic operations. Each one has a known signature, a known domain, and a known range. You can verify every combination because the space is finite. This is not cleverness. This is architecture.

It is the same reason you can formally verify a digital circuit with 128 gates but you cannot formally verify a modern processor with a billion transistors. We made the deliberate architectural decision to keep the component space finite. And that decision is what makes exhaustive verification not just viable — but inevitable.
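The finiteness argument is easy to see for a single operation. A two-input operation over 8-bit values has 256 × 256 = 65,536 possible input pairs, so every one of them can be checked against a reference. The sketch below does that for an assumed unsigned 8-bit ADD8; add8, reference_add8, and exhaustive_check are hypothetical names, not the project's code.

```python
def add8(a: int, b: int) -> int:
    # Implementation under test: assumed unsigned 8-bit addition.
    return (a + b) & 0xFF


def reference_add8(a: int, b: int) -> int:
    # Independent reference written a different way: arithmetic modulo 256.
    return (a + b) % 256


def exhaustive_check(op, reference) -> int:
    """Compare op with reference on every 8-bit input pair; return the count."""
    checked = 0
    for a in range(256):
        for b in range(256):
            assert op(a, b) == reference(a, b), (a, b)
            checked += 1
    return checked
```

Calling exhaustive_check(add8, reference_add8) covers the entire input domain of the operation rather than a sample of it. That is the difference between testing a bounded component and testing an unbounded program.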

The Result

110,227 tests. 0 failures. This is not a marketing claim. It is not a rounded number. It is a verifiable fact. Every test is in the repository. Every one runs on every commit. Every one produces the same result today that it produced yesterday and will produce tomorrow and will produce a decade from now.

Because that is what "deterministic by construction" means. Not a promise. A mathematical property.

Run the Corpus

git clone https://github.com/brik64/brik64-demos.git
cd brik64-demos
./run_demo.sh adversarial-corpus

The abyssal tests cover the full monomer catalog, 14 backends, 10 input languages, control flow, multi-family compositions, determinism, real execution, security, and regression. The code and the tests are part of the same verifiable, immutable artifact. Run them yourself. The numbers do not change.
