Field note

AI vs. construction math

Asking a chatbot "how many bags of concrete for this slab?" feels fast — until the truck shows up short. Independent testing shows today's leading AI models get everyday math right only 45–63% of the time. Here's why VibeCalc runs fixed, human-checked formulas instead — and exactly where AI still earns its place on the job.

What the research found

In 2025 the calculator company Omni Calculator published the ORCA benchmark (Omni Research on Calculation in AI): 500 real-world calculation problems spanning finance, physics, health, statistics, and engineering & construction. They ran leading models — including ChatGPT, Gemini, and Claude — against every problem and graded the answers. The result is not reassuring if you're ordering materials off a chatbot's number:

63%: the best model's accuracy. No model tested scored higher.

~35% of errors were rounding mistakes, the kind that quietly skew a takeoff.

~33% of errors were outright calculation errors in otherwise sensible-looking answers.
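
To make the rounding failure concrete, here is a minimal Python sketch. The pad dimensions (12 ft × 16 ft × 5 in) and the function name are hypothetical, chosen for illustration and not taken from the ORCA data. It compares keeping full precision until the end with rounding the thickness to one decimal place before multiplying.

```python
CUBIC_FEET_PER_CUBIC_YARD = 27  # 3 ft x 3 ft x 3 ft


def pad_volume_cu_yd(length_ft: float, width_ft: float, thickness_ft: float) -> float:
    """Volume of a rectangular pad in cubic yards (all inputs in feet)."""
    return length_ft * width_ft * thickness_ft / CUBIC_FEET_PER_CUBIC_YARD


# Hypothetical pad: 12 ft x 16 ft, 5 in thick.
thickness_exact_ft = 5 / 12                           # 0.41666...
thickness_rounded_ft = round(thickness_exact_ft, 1)   # 0.4, rounded too early

exact = pad_volume_cu_yd(12, 16, thickness_exact_ft)
early_rounded = pad_volume_cu_yd(12, 16, thickness_rounded_ft)
shortfall_pct = (1 - early_rounded / exact) * 100

print(f"full precision: {exact:.2f} cu yd")          # 2.96
print(f"rounded early:  {early_rounded:.2f} cu yd")  # 2.84
print(f"shortfall:      {shortfall_pct:.0f}%")       # 4
```

Rounding only the final figure, and rounding it up, avoids the quiet shortfall.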

A 2026 update added an instability measure: ask the same model the same question several times and you can get different answers. That's the opposite of what you want when two estimators on the same crew need to land on the same material order.

Source: Omni Calculator, "The ORCA Benchmark" (2025) and the ORCA V3 report (2026); preprint arXiv:2511.02589. VibeCalc cites this independent third-party research and has not re-run the study. Figures are approximate and will shift as models improve.

Why a few percent matters on a real job

On paper, "63% accurate" sounds like a passing grade. On a jobsite it means that even the best model tested gets more than one answer in three wrong, and a wrong quantity isn't a typo: it's a short pour, a second supply run, or a pallet of material you can't return. Take a simple slab:

Slab: 20 ft × 24 ft × 4 in thick

Right way: 20 × 24 × (4 ÷ 12) = 160 cu ft → 160 ÷ 27 = 5.93 cu yd
Common AI slip: treat 4 in as 0.4 ft → 20 × 24 × 0.4 = 192 cu ft → 7.1 cu yd

That single bad unit conversion over-orders concrete by about 20% on one slab — hundreds of dollars, or a half-empty truck you still pay for. The math itself is trivial; the failure is in reliably applying it every time. That's exactly what a fixed formula does and a language model doesn't guarantee.
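
The slab arithmetic above can be sketched as a fixed formula in a few lines of Python. This is an illustration only, not VibeCalc's actual code; the function and constant names are assumptions.

```python
CUBIC_FEET_PER_CUBIC_YARD = 27  # 3 ft x 3 ft x 3 ft
INCHES_PER_FOOT = 12


def slab_volume_cu_yd(length_ft: float, width_ft: float, thickness_in: float) -> float:
    """Slab volume in cubic yards; thickness is converted from inches to feet."""
    thickness_ft = thickness_in / INCHES_PER_FOOT
    volume_cu_ft = length_ft * width_ft * thickness_ft
    return volume_cu_ft / CUBIC_FEET_PER_CUBIC_YARD


correct = slab_volume_cu_yd(20, 24, 4)

# The slip: 4 in read as 0.4 ft, skipping the divide-by-12 step.
slipped = 20 * 24 * 0.4 / CUBIC_FEET_PER_CUBIC_YARD

print(f"correct: {correct:.2f} cu yd")                        # 5.93
print(f"slipped: {slipped:.2f} cu yd")                        # 7.11
print(f"over-order: {(slipped / correct - 1) * 100:.0f}%")    # 20
```

Because the inch-to-foot conversion lives inside the function, it cannot be skipped or improvised from one call to the next.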

How VibeCalc is different

Deterministic

Same inputs, same answer — every time, on every device. No token-by-token guessing, no run-to-run drift.

Shown, not hidden

The formula is printed on the calculator page with a worked example. You can check it against your own method before you trust it.

Unit-honest

Inches stay inches and millimetres stay millimetres. Conversions are handled in code, not improvised in prose.
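
One way to handle conversions in code is to keep every value attached to its unit and convert through a single exact table. The sketch below is a hypothetical helper, not VibeCalc's implementation; the `Length` class and `MM_PER_UNIT` table are names invented for this example. It relies only on the definition 1 in = 25.4 mm.

```python
from dataclasses import dataclass

# Millimetres per unit. 1 in = 25.4 mm exactly, so 1 ft = 304.8 mm.
MM_PER_UNIT = {"mm": 1.0, "in": 25.4, "ft": 304.8, "m": 1000.0}


@dataclass(frozen=True)
class Length:
    """A measurement that always travels with its unit (hypothetical helper)."""
    value: float
    unit: str

    def to(self, target_unit: str) -> "Length":
        """Convert via millimetres; reject units missing from the table."""
        if self.unit not in MM_PER_UNIT or target_unit not in MM_PER_UNIT:
            raise ValueError(f"unknown unit: {self.unit!r} or {target_unit!r}")
        millimetres = self.value * MM_PER_UNIT[self.unit]
        return Length(millimetres / MM_PER_UNIT[target_unit], target_unit)


thickness = Length(4, "in")
print(thickness.to("mm"))  # Length(value=101.6, unit='mm')
print(thickness.to("ft"))  # one third of a foot, not 0.4 ft
```

An unknown unit raises an error instead of producing a plausible-looking number.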

Where AI still earns its place

This isn't an anti-AI pitch. AI is genuinely useful on the trades — for explaining a code clause in plain language, drafting a client email, summarising a spec sheet, or talking through which calculator you even need. Use it for the words. Just don't let it be the last step before you place a material order — pull the final number from a calculator that shows its work.

Pull the number from a real calculator

Concrete Calculator

Slab and footing volume, plus 80 lb / 60 lb / 25 kg / 20 kg bag counts.

Wall Framing Calculator

Stud counts at 16″, 24″, 400 mm, and 600 mm on-center.

Rafter Length Calculator

Common rafter length from span and pitch, by Pythagoras.

Roof Pitch Calculator

Convert rise and run to pitch, angle in degrees, and grade.
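
As a rough illustration of the kind of fixed formula behind the rafter and pitch calculators, here is a short Python sketch. It is not VibeCalc's code. The function names are hypothetical, and it assumes a symmetric gable roof with no allowance for ridge-board thickness, overhang, or the birdsmouth cut.

```python
import math


def common_rafter_length_ft(span_ft: float, pitch_in_per_12: float) -> float:
    """Rafter length by Pythagoras: run is half the span, rise follows the pitch."""
    run_ft = span_ft / 2
    rise_ft = run_ft * pitch_in_per_12 / 12
    return math.hypot(run_ft, rise_ft)


def pitch_from_rise_run(rise: float, run: float) -> dict:
    """Express a rise/run pair (same unit for both) as pitch, angle, and grade."""
    return {
        "pitch_in_per_12": rise / run * 12,
        "angle_deg": math.degrees(math.atan2(rise, run)),
        "grade_pct": rise / run * 100,
    }


# 24 ft span at a 6-in-12 pitch: run 12 ft, rise 6 ft.
print(round(common_rafter_length_ft(24, 6), 2))  # 13.42

roof = pitch_from_rise_run(6, 12)
print(roof["pitch_in_per_12"], round(roof["angle_deg"], 2), roof["grade_pct"])
```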

Android App

Need jobsite math hands-free?

The Android app adds the jobsite workflow: voice input, offline use, material lists, PDF export. Free, no ads, no accounts.

Get the Free App. Built for Android phones. Donations welcome.

Saved you a mistake? Support VibeCalc

AI & construction math FAQ

Can ChatGPT or other AI chatbots do construction math reliably?

Not for the final number you buy against. Independent testing (the ORCA benchmark) found leading models answered everyday calculation problems correctly only about 45–63% of the time, with errors mostly from rounding and basic calculation slips. For estimating concrete, framing, or rafters — where a wrong number costs a return trip or wasted material — a fixed formula is safer.

What is the ORCA benchmark?

ORCA (Omni Research on Calculation in AI) is an independent study, published in 2025 and updated in 2026, that tested leading AI models on 500 real-world calculation problems across finance, physics, health, statistics, and engineering & construction. No model scored above 63%. The 2026 update added an "instability" measure showing that a model can give different answers to the same question on different tries.

Does VibeCalc use AI to calculate?

No. Every VibeCalc calculator runs a fixed, human-written formula directly in your browser. The same inputs always produce the same output, and the formula is printed on the calculator page so you can verify it yourself. There is no language model guessing in the loop.

Is an online calculator more accurate than an AI chatbot for measurements?

For a defined calculation — volume, count, length, area — yes. A calculator applies one deterministic formula every time. An AI chatbot generates an answer token by token and can mis-round, drop a unit conversion, or vary between runs. Use AI to explain a concept; use a calculator for the quantity you order.