How accuracy works
No single machine reading of nineteenth-century print can be trusted — so we never rely on one. This page explains, plainly, how a gazette page becomes searchable text, what the confidence ratings mean, and what we do when we're wrong.
Gazette pages are dense, mixed and worn: tight two-column setting, broken and battered type, pounds-shillings-pence, fractions like ½ and ¼, and tables that run to dozens of columns. Ordinary OCR makes two distinct kinds of mistake on them, and they need different cures.
Character errors turn Geelong into Geeiong — the reading is in the right place, but the letters are wrong, so a search for the right spelling finds nothing.
Order errors are worse. Most engines read a two-column page straight across the gutter, welding the columns into nonsense:
Read straight across the gutter (wrong): NOTICE is hereby given, that the TENDERS will be received at this Office partnership lately subsisting between until noon on Tuesday…
Read one column at a time (right): NOTICE is hereby given, that the partnership lately subsisting between William Brown and… and only then the next column.
A welded page can still look accurate — nearly every word is correct — but the sentences are shuffled and any quotation from it is wrong. Catching this kind of error is most of the work.
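To make the difference concrete, here is a minimal sketch of the two reading orders. It is an illustration only, not the Record's actual pipeline: the `Word` type, the function names and the fixed gutter position `gutter_x=50` are all assumptions made for the example, and a real page would need the gutter to be detected rather than given.

```python
from typing import NamedTuple

class Word(NamedTuple):
    text: str
    x: float  # left edge of the word (hypothetical page units)
    y: float  # top edge of the word's line (hypothetical page units)

def read_across(words):
    """Naive order: top to bottom, then left to right, ignoring columns."""
    return [w.text for w in sorted(words, key=lambda w: (w.y, w.x))]

def read_by_column(words, gutter_x):
    """Column-aware order: the whole left column first, then the right."""
    left = [w for w in words if w.x < gutter_x]
    right = [w for w in words if w.x >= gutter_x]
    return read_across(left) + read_across(right)

# Two short columns of two lines each; the gutter is assumed to sit at x = 50.
page = [
    Word("NOTICE", 0, 0), Word("is", 10, 0), Word("TENDERS", 60, 0), Word("will", 70, 0),
    Word("hereby", 0, 1), Word("given", 10, 1), Word("be", 60, 1), Word("received", 70, 1),
]

welded = " ".join(read_across(page))
correct = " ".join(read_by_column(page, gutter_x=50))
print(welded)   # NOTICE is TENDERS will hereby given be received
print(correct)  # NOTICE is hereby given TENDERS will be received
```

Both outputs contain exactly the same words, which is why a word-by-word accuracy figure cannot catch a welded page on its own.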
Every page is read twice, by two engines with opposite strengths, and each is trusted only for what it is good at.
The two readings are then aligned and compared, word by word, against the position of every word on the page. Where they agree, the word is kept. Where they disagree, the conflict is recorded — never silently guessed away — and the page's score reflects it.
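The comparison step can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the method the Record runs: it aligns the two readings by word sequence only, using Python's standard `difflib.SequenceMatcher`, and leaves out the word positions that the real comparison also uses. The function name `compare_readings`, the shape of the conflict records and the agreement formula are all hypothetical.

```python
from difflib import SequenceMatcher

def compare_readings(reading_a, reading_b):
    """Align two word lists; keep agreed words and record every conflict."""
    matcher = SequenceMatcher(a=reading_a, b=reading_b, autojunk=False)
    kept, conflicts = [], []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        if tag == "equal":
            kept.extend(reading_a[a0:a1])
        else:
            # A disagreement is recorded with both candidates, never resolved silently.
            conflicts.append({"at": a0, "a": reading_a[a0:a1], "b": reading_b[b0:b1]})
    total = max(len(reading_a), len(reading_b))
    # Assumed scoring: share of words on which the two readings agree.
    agreement = len(kept) / total if total else 1.0
    return kept, conflicts, agreement

a = "TENDERS will be received at Geelong until noon".split()
b = "TENDERS will be received at Geeiong until noon".split()
kept, conflicts, agreement = compare_readings(a, b)
print(conflicts)   # [{'at': 5, 'a': ['Geelong'], 'b': ['Geeiong']}]
print(agreement)   # 0.875
```

Keeping both candidates in the conflict record, rather than picking one, is what lets a reader later see exactly where the two engines parted ways.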
Each page also carries an honest agreement score from this comparison. A page where the two readings diverge isn't shipped as if all were well — it falls back to the safest available text and is rated accordingly.
Every notice carries a confidence rating, scored by what was actually measured — how strongly the readings agreed, and how cleanly the notice was located on the page.
The readings agree and the notice is cleanly located. Quote it — and the citation will take you to the original anyway.
Mostly agreed, with recorded conflicts — a worn page, a hard name, a smudged figure. Sound for discovery; check the scan before you rely on a detail.
The readings genuinely diverged, or the page resisted both engines. We say so, and show you exactly where to look.
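As a rough sketch of how the two measurements could combine into the three ratings above: the tier names (`"top"`, `"middle"`, `"bottom"`), the threshold values and the function `rate_notice` are all hypothetical, chosen only to illustrate the idea. The actual cut-offs are not stated on this page.

```python
def rate_notice(agreement, cleanly_located, high=0.98, low=0.85):
    """Map measured agreement and location quality to one of three tiers.

    `high` and `low` are illustrative thresholds, not the Record's real values.
    """
    if agreement >= high and cleanly_located:
        return "top"      # readings agree and the notice is cleanly located
    if agreement >= low:
        return "middle"   # mostly agreed, with recorded conflicts
    return "bottom"       # readings genuinely diverged

print(rate_notice(1.0, True))    # top
print(rate_notice(1.0, False))   # middle
print(rate_notice(0.5, True))    # bottom
```

In this sketch a notice with perfect agreement but an unclear location drops to the middle tier, on the assumption that a rating should never claim more than both measurements support.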
The same honesty applies to matching: a possible-but-unproven match is labelled as exactly that, in cautious language — never silently promoted to a fact.
Nothing in the Record floats free of its source. Every notice links to its issue, date, page, and the exact region of the original scan — the scan itself is never altered, and every claim we make can be checked against it. How to quote and cite what you find is set out at Cite & sources.
Machine reading is good and it is imperfect. Important findings should be verified against the original page — which is why we put it beside every transcription, not behind a request form.
If you find an error, tell us: office@colonialrecord.com.au. A corrected reading enters the Record the way everything else does — on the evidence of the original page. And if an unlock gives you the wrong record, one email inside seven days gets a refund or a re-credit; no forms, no questions.
See it for yourself — every search result shows its confidence rating and its source before you pay anything.