We didn't just claim it works.
We benchmarked it.

Same prompts. Same model. The only variable was IQPROMPT. This was a controlled A/B benchmark, scored on seven quality dimensions by an automated judge and designed to isolate exactly what our layer adds.

Method: Controlled A/B · Same model, same task · Single isolated variable · 7-dimension automated scoring

Headline Results

The numbers that matter.

Prompts improved

83%

Of all benchmark prompts showed measurable improvement with IQPROMPT active.

Instruction adherence

+30%

The AI followed the original intent of the request significantly more reliably.

7.42 → 9.66

Actionability gain

+14%

Responses provided a clearer, more usable path forward for the person asking.

7.28 → 8.28

Overall output quality

9.44/10

Up from 8.91 raw. Quality held near the ceiling: gains do not come at a cost.

+0.53 points

Full Results

Raw vs. IQPROMPT side by side.

Each prompt was executed twice against the same model: once as raw input and once as IQPROMPT-enhanced input. The two biggest gains land on the two dimensions that matter most in real enterprise use: getting AI to give a usable answer, and getting it to actually follow the request.

Measure	Raw	With IQPROMPT	Gain
Overall output quality (out of 10)	8.91	9.44	+0.53
Actionability (clear path forward)	7.28	8.28	+14%
Instruction adherence (followed intent)	7.42	9.66	+30%
Share of prompts improved		83%
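
The percentage gains in the table are relative gains over the raw score, while the overall quality gain is an absolute difference in points. A minimal Python sketch of that arithmetic, using only the scores reported above (the function name `relative_gain_pct` is illustrative and not part of IQPROMPT):

```python
def relative_gain_pct(raw: float, enhanced: float) -> float:
    """Relative improvement of `enhanced` over `raw`, in percent."""
    return (enhanced - raw) / raw * 100


# Scores copied from the results table (scale: 0 to 10).
scores = {
    "Instruction adherence": (7.42, 9.66),
    "Actionability": (7.28, 8.28),
}

for measure, (raw, enhanced) in scores.items():
    print(f"{measure}: {relative_gain_pct(raw, enhanced):+.0f}%")

# Overall quality is reported as an absolute difference in points.
overall_gain = round(9.44 - 8.91, 2)
print(f"Overall output quality: {overall_gain:+.2f} points")
```

Rounded to whole percentages, this reproduces the +30% and +14% figures and the +0.53 point overall gain.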

Domain Breakdown

Where it helps most.

Gains are strongest where prompts are typically underspecified. Finance showed the smallest gain only because it started with the highest raw quality. That is a high-baseline effect, not a weakness of IQPROMPT.

Marketing: +1.07 points

Highest gain: prompts are most commonly underspecified in this domain

Legal: +0.82 points

Strong gain: complex intent benefits significantly from structured execution

Health: +0.65 points

Meaningful improvement: professional review is still required for regulated outputs

Finance: smallest gain

Started with the highest raw quality baseline: a high-baseline effect, not a weakness

Stated Limits

What this study does and does not claim.

A sophisticated investor trusts an isolated-variable experiment far more than a headline number with no method. Volunteering limits before being asked about them converts a potential concern into a credibility point.

One model; cross-model validation is next. This benchmark ran on a single production model. The architecture is model-agnostic by design. Cross-model validation is the next benchmark run and can be executed on any model a qualified investor wants to see.

Functional quality, not factual accuracy. The study measures functional response quality, not factual accuracy or regulatory compliance. Regulated-domain outputs still require professional review.

Single automated judge. Scoring used a single automated judge for this run. Multi-judge and human-panel validation are planned for subsequent benchmarks.

Why the Method Matters

Discipline is the signal.

A note for the sophisticated allocator

A sophisticated allocator trusts an isolated-variable experiment far more than a headline number with no method. Saying "same model, same task, only our layer changed" signals discipline. The benchmark was designed with one variable: the IQPROMPT layer. Everything else was held constant. That is how you isolate what the technology actually contributes, and that is the standard we hold ourselves to.
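
For readers who want to see what a single-variable design looks like in practice, here is a minimal Python sketch of a paired A/B loop. It is an illustration under stated assumptions, not the harness used for this benchmark: `model`, `enhance`, and `judge` are hypothetical callables supplied by the caller, and scoring each response against the original prompt is an assumption about how intent is judged.

```python
from statistics import mean


def run_ab(prompts, model, enhance, judge):
    """Run every prompt twice against the same model and return paired scores.

    model(text)             -> response text (hypothetical; same model for both arms)
    enhance(prompt)         -> enhanced prompt (hypothetical; the only variable)
    judge(prompt, response) -> numeric score (hypothetical automated judge)
    """
    pairs = []
    for prompt in prompts:
        raw_score = judge(prompt, model(prompt))
        # Assumption: the judge scores against the ORIGINAL prompt in both arms,
        # so both responses are measured against the same intent.
        enhanced_score = judge(prompt, model(enhance(prompt)))
        pairs.append((raw_score, enhanced_score))
    return pairs


def summarize(pairs):
    """Aggregate paired scores into mean scores and the share of prompts improved."""
    return {
        "raw_mean": mean(raw for raw, _ in pairs),
        "enhanced_mean": mean(enh for _, enh in pairs),
        "share_improved": sum(enh > raw for raw, enh in pairs) / len(pairs),
    }
```

Because both arms share the same prompts, model, and judge, any difference between `raw_mean` and `enhanced_mean` can only come from the `enhance` step.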

Next step

See it run on any model you choose.

Cross-model validation is the next benchmark. Request a walkthrough or submit your details to receive the full investor offering package.

Request investment details →

Book a walkthrough with Bill Nakulski