OCR Buddy — Faithful, 100% Local OCR for Chrome (Code, Formulas & Tables)

Why it's different

Classic OCR. Not generative OCR.

Big vision-language models top the benchmarks — then invent fluent, plausible, wrong text the moment the pixels get unclear. For code, numbers, IDs or prices, a confidently-wrong transcription is worse than none. OCR Buddy makes the opposite bet.

Generative OCR

Predicts the next likely token

Falls back on a language prior when the image is ambiguous
Writes something that reads well but isn't there
Far too heavy to run inside a browser tab

Hallucination here is architectural — not a bug you can prompt away.

OCR Buddy

Detection + CTC recognition

Has no language prior — it transcribes the glyphs that are actually present
When it can't read, it fails to blanks or low-confidence output, never a made-up sentence
Small and fast enough to run comfortably in the browser

In-browser and no-hallucination aren't a tradeoff — both constraints select the same stack: PP-OCRv5 on ONNX Runtime Web.
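
To make the "no language prior" point concrete, here is a minimal sketch of greedy CTC decoding. It is an illustration, not OCR Buddy's code: the name `ctcGreedyDecode`, the input shape (one row of class probabilities per time step) and the choice of class 0 as the blank are all assumptions.

```javascript
// Greedy CTC decoding sketch (hypothetical helper, not taken from OCR Buddy).
// probs: array of time steps, each an array of class probabilities.
// Class 0 is assumed to be the CTC blank; class i maps to charset[i - 1].
function ctcGreedyDecode(probs, charset) {
  const BLANK = 0;
  let text = "";
  const kept = [];
  let previous = BLANK;
  for (const step of probs) {
    // Pick the most likely class for this time step.
    let best = 0;
    for (let i = 1; i < step.length; i++) {
      if (step[i] > step[best]) best = i;
    }
    // Emit a character only if it is not blank and not a repeat of the last step.
    if (best !== BLANK && best !== previous) {
      text += charset[best - 1];
      kept.push(step[best]);
    }
    previous = best;
  }
  // Mean probability of the emitted characters; 0 when nothing was emitted.
  const confidence =
    kept.length === 0 ? 0 : kept.reduce((sum, p) => sum + p, 0) / kept.length;
  return { text, confidence };
}
```

Each output character comes from a time step where that glyph scored highest. When every step is blank, the result is an empty string: nothing in the decoder can fill in a likely word.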

How it works

The whole pipeline, on your device.

A region, the viewport, or a whole scrolling page is cropped cleanly and handed to a warm OCR engine running in an offscreen worker. Everything is coordinated on your device and never uploaded. Or skip OCR entirely and export the page as Markdown.

01

Pick a source

Drag a region, grab the whole viewport in one click, or scroll-capture an entire page. The overlay is passive — it never reads page content.

02

Captured cleanly

A composited screenshot is cropped on an offscreen canvas — even from a paused cross-origin video.

03

Recognized locally

PP-OCRv5 runs on ONNX Runtime Web — WebGPU when available, multi-threaded WASM as fallback.

04

Verify & copy

The crop sits beside the text; low-confidence words are flagged. Edit, then copy.
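
The backend choice in step 03 could look roughly like the sketch below. The helper names, the `nav` parameter (standing in for the worker's `navigator`) and the thread cap of 4 are assumptions for illustration, not the extension's actual code.

```javascript
// Hypothetical helper: ONNX Runtime Web execution providers in priority order.
// `nav` stands in for `navigator` so the function can be exercised anywhere.
// Note: `nav.gpu` existing does not guarantee an adapter, so "wasm" stays as fallback.
function pickExecutionProviders(nav) {
  const hasWebGPU = Boolean(nav && nav.gpu);
  return hasWebGPU ? ["webgpu", "wasm"] : ["wasm"];
}

// Hypothetical helper: thread count for the WASM backend.
// hardwareConcurrency may be missing, so default to a single thread.
function pickWasmThreads(nav, maxThreads = 4) {
  const cores = (nav && nav.hardwareConcurrency) || 1;
  return Math.max(1, Math.min(cores, maxThreads));
}
```

With ONNX Runtime Web, the provider list would typically be passed as `executionProviders` in the options of `ort.InferenceSession.create`, and the thread count assigned to `ort.env.wasm.numThreads` before the session is created.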

Three modes

Pick how a region should be read.

And change your mind after capturing — the “Read as” switcher re-runs a different mode on the same crop, no re-selecting.

Text / Code

Code, prose, or any text

Inter-word spacing and blank lines are reconstructed from box geometry — the recognizer emits no space token, so layout is rebuilt, not guessed. A Code view restores indentation and syntax-highlights it.

Code view

while (queue.length > maxInflight) {
  const chunk = queue.shift();
  device.submit(chunk);
}
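
Rebuilding spaces from box geometry can be sketched as follows. This is a simplified illustration under stated assumptions: the box shape `{ text, x, width }`, the name `joinLine` and the 0.5 gap ratio are hypothetical, and the real heuristic may differ.

```javascript
// Sketch: join recognized fragments on one line, inserting spaces from box gaps.
// Each box is { text, x, width } in pixels. gapRatio is an assumed threshold:
// a gap wider than gapRatio * average character width counts as whitespace.
function joinLine(boxes, gapRatio = 0.5) {
  const sorted = [...boxes].sort((a, b) => a.x - b.x);
  // Estimate the average character width on this line.
  const totalChars = sorted.reduce((n, b) => n + b.text.length, 0);
  const totalWidth = sorted.reduce((w, b) => w + b.width, 0);
  if (totalChars === 0 || totalWidth === 0) {
    return sorted.map((b) => b.text).join("");
  }
  const charWidth = totalWidth / totalChars;
  let line = "";
  let prevRight = null;
  for (const box of sorted) {
    if (prevRight !== null) {
      const gap = box.x - prevRight;
      if (gap > gapRatio * charWidth) {
        // Wider gaps become several spaces, so column alignment survives.
        line += " ".repeat(Math.max(1, Math.round(gap / charWidth)));
      }
    }
    line += box.text;
    prevRight = box.x + box.width;
  }
  return line;
}
```

Because the number of spaces is derived from measured gaps, the same idea extends to indentation: the distance from the left edge of the crop to the first box can be converted to leading spaces in the same way.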

Formula → LaTeX

One equation, into LaTeX

The one place a generative model is unavoidable — so the guardrail is visual. The LaTeX is rendered with KaTeX right beside the source crop; if it can't render, OCR Buddy abstains and shows the image.

Rendered · verify against crop

\operatorname{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
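
The render-or-abstain guardrail can be sketched like this. The function name `renderOrAbstain` and the injected `render` callback are assumptions; in a browser the callback might wrap `katex.renderToString`, which throws on invalid input when `throwOnError` is true.

```javascript
// Sketch: render LaTeX, or abstain so the caller can show the source crop instead.
// `render` is injected. In a browser it might be:
//   (tex) => katex.renderToString(tex, { throwOnError: true, displayMode: true })
function renderOrAbstain(latex, render) {
  try {
    return { ok: true, html: render(latex) };
  } catch (err) {
    // Any render failure means abstain; no partial or "best effort" LaTeX is shown.
    const reason = err && err.message ? err.message : String(err);
    return { ok: false, reason };
  }
}
```

Injecting the renderer keeps the guard independent of the rendering library, so the abstain path can be exercised without loading it.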

Table → Markdown

One table, into a grid

Rebuilt by pure geometry from the word boxes — rows by vertical position, columns from an x-coverage profile. Because it keys off alignment, not ruled lines, it handles borderless tables too.

Markdown table

| Model | Size | License |
| --- | --- | --- |
| PP-OCRv5 det | 4.7 MB | Apache-2.0 |
| Latin rec | 8 MB | Apache-2.0 |
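
The geometric rebuild described above (rows by vertical position, columns from an x-coverage profile) can be sketched as follows. All names, the box shape `{ text, x, y, w, h }` and the 8-pixel minimum column gap are assumptions for illustration, not the extension's implementation.

```javascript
// Word box: { text, x, y, w, h } in pixels, origin at the top-left of the crop.

// Group boxes into rows: a box joins the current row when its vertical
// center is within half its height of that row's center.
function groupRows(boxes) {
  const center = (b) => b.y + b.h / 2;
  const sorted = [...boxes].sort((a, b) => center(a) - center(b));
  const rows = [];
  for (const box of sorted) {
    const last = rows[rows.length - 1];
    if (last && Math.abs(center(box) - last.cy) < box.h / 2) last.boxes.push(box);
    else rows.push({ cy: center(box), boxes: [box] });
  }
  return rows.map((r) => r.boxes.sort((a, b) => a.x - b.x));
}

// x-coverage profile: mark every x covered by any box, then treat runs of
// covered pixels separated by at least minGap empty pixels as columns.
function findColumns(boxes, minGap = 8) {
  if (boxes.length === 0) return [];
  const maxX = Math.ceil(Math.max(...boxes.map((b) => b.x + b.w)));
  const covered = new Array(maxX).fill(false);
  for (const b of boxes) {
    for (let x = Math.max(0, Math.floor(b.x)); x < Math.ceil(b.x + b.w); x++) {
      covered[x] = true;
    }
  }
  const columns = [];
  let start = -1;
  let lastCovered = -1;
  for (let x = 0; x < maxX; x++) {
    if (!covered[x]) continue;
    if (start !== -1 && x - lastCovered - 1 >= minGap) {
      columns.push({ start, end: lastCovered + 1 });
      start = -1;
    }
    if (start === -1) start = x;
    lastCovered = x;
  }
  if (start !== -1) columns.push({ start, end: lastCovered + 1 });
  return columns;
}

// Assign each word to the column containing its horizontal center.
function tableToMarkdown(boxes, minGap = 8) {
  const columns = findColumns(boxes, minGap);
  const lines = groupRows(boxes).map((row) => {
    const cells = columns.map(() => []);
    for (const b of row) {
      const cx = b.x + b.w / 2;
      const i = columns.findIndex((c) => cx >= c.start && cx < c.end);
      if (i !== -1) cells[i].push(b.text);
    }
    return "| " + cells.map((c) => c.join(" ")).join(" | ") + " |";
  });
  if (lines.length > 0) {
    lines.splice(1, 0, "| " + columns.map(() => "---").join(" | ") + " |");
  }
  return lines.join("\n");
}
```

Nothing here looks for ruled lines. Columns fall out of where the words align, which is why this style of approach also copes with borderless tables.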

Faithful by design

You always see the source.

Anti-hallucination isn't a tagline; it's the feature set. The captured crop sits right next to the extracted text, the cheapest possible check. If the model isn't confident about a word, it says so instead of guessing.

Source crop shown beside the result, every time
Per-word confidence — low scores underlined, never silently trusted
A blank or ambiguous region yields empty output — never invented filler

Source · captured region

Captured region with the words maxInflight and WebGPU rendered ambiguously

Backpressure is applied when the queue exceeds maxInflight, so the consumer never overruns the WebGPU device.

2 words flagged low-confidence
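
Per-word flagging of this kind can be as simple as the sketch below. The word shape `{ text, confidence }`, the function name and the 0.8 threshold are assumptions, and the scores in the usage example are invented for illustration.

```javascript
// Sketch: mark words whose recognition confidence falls below a threshold.
// The 0.8 default is an assumed value, not the extension's actual setting.
function flagLowConfidence(words, threshold = 0.8) {
  return words.map((w) => ({ ...w, flagged: w.confidence < threshold }));
}

// Usage with made-up scores for the sentence shown in the card above.
const sample = flagLowConfidence([
  { text: "exceeds", confidence: 0.99 },
  { text: "maxInflight", confidence: 0.62 },
  { text: "the", confidence: 0.99 },
  { text: "WebGPU", confidence: 0.7 },
]);
const flaggedCount = sample.filter((w) => w.flagged).length;
```

The text itself is never altered. A flag only changes how a word is presented, so the reader decides whether to correct it against the crop.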

Built right

A small, honest tool — by choice.

A full page-layout “Document mode” and a heavyweight formula library were both shipped, then removed on purpose: layout models need a whole page of context and misread single crops, and the library corrupted the formula decode. Keeping them out is part of the design.

Column-aware reading order, so two-column papers don't interleave
Homoglyph fold maps stray look-alikes back to Latin — 4o0 → 400
Capture works on paused cross-origin video — no tainted-canvas failures

Extracted text

function flush(queue, maxInflight) {
  // backpressure: never overrun the device
  while (queue.length > maxInflight) {
    const chunk = queue.shift();
    device.submit(chunk);
  }
  return queue.length;
}
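
A homoglyph fold like the one listed above could be sketched as follows. The look-alike table is deliberately tiny and hypothetical, and the digit rule is an assumed reading of the `4o0 → 400` example. Code points are written as escapes so the look-alikes stay unambiguous.

```javascript
// Assumed, minimal look-alike table: Cyrillic and Greek letters to Latin.
const HOMOGLYPHS = new Map([
  ["\u0430", "a"], ["\u0435", "e"], ["\u043E", "o"], // Cyrillic a, ie, o
  ["\u0440", "p"], ["\u0441", "c"], ["\u0445", "x"], // Cyrillic er, es, ha
  ["\u0410", "A"], ["\u0415", "E"], ["\u041E", "O"], // Cyrillic capitals
  ["\u03BF", "o"], ["\u039F", "O"], ["\u0391", "A"], // Greek omicron, Alpha
]);

// Fold stray look-alikes to Latin, then turn an "o" sandwiched between
// digits into a zero (assumed rule, modelled on the 4o0 -> 400 example).
function foldHomoglyphs(text) {
  let out = "";
  for (const ch of text) {
    out += HOMOGLYPHS.get(ch) ?? ch;
  }
  return out.replace(/(?<=\d)[oO](?=\d)/g, "0");
}
```

A fold like this only makes sense when the active recognition language is Latin. With a Cyrillic or Greek pack selected, those letters are legitimate and should be left alone.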

Accuracy, honestly

Essentially perfect on what each mode is for.

Measured with the exact PP-OCRv5 config the extension ships, against ground truth on real academic pages.

99.9/100

character accuracy on a coherent text block — the normal “select a region” workflow

scripts/ocr-image-test.mjs · Node / CPU
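
For readers who want to reproduce a score like this, one common definition of character accuracy is 1 minus the edit distance divided by the ground-truth length. The sketch below uses that definition as an assumption; it is not confirmed to be the formula `scripts/ocr-image-test.mjs` uses.

```javascript
// Levenshtein edit distance over Unicode code points (two-row dynamic programming).
function editDistance(a, b) {
  const s = Array.from(a);
  const t = Array.from(b);
  let prev = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // substitution or match
      );
    }
    prev = curr;
  }
  return prev[t.length];
}

// Character accuracy on a 0-100 scale, relative to the ground-truth length.
// Assumed definition: (1 - editDistance / truthLength) * 100, floored at 0.
function characterAccuracy(truth, ocr) {
  const n = Array.from(truth).length;
  if (n === 0) return Array.from(ocr).length === 0 ? 100 : 0;
  return Math.max(0, 1 - editDistance(truth, ocr) / n) * 100;
}
```

Under this definition, a reading-order mix-up counts as many edits even when every character was recognized correctly.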

Clean prose is effectively verbatim

Sentences, citations like [22] and tokens like RoPE-2D, all correct.

Grab a paragraph + a table together and the score drops

That's reading-order interleaving, not misrecognition — the characters are right, the order isn't. Select one region to restore it.

Equations and tables aren't text

Use Formula and Table modes for those — Text/Code mode flattens them. No “100% OCR of anything” claims here.

Private by architecture

Nothing leaves your device.

There is no server. The OCR models are bundled in the extension and run in an offscreen worker, so even first-run inference is fully offline. The only network use is downloading the extension itself — and, only if you explicitly pick a non-Latin language pack, a one-time model download that's cached locally (no page content or image ever rides along).

No servers, no API calls, no telemetry
Models bundled — works fully offline (optional language packs cached after one download)
Screenshot permission requested explicitly, per-site, only when needed

On-device OCR

Detection & recognition in a local worker

ON

Network upload

No images or text ever sent out

OFF

Tracking & telemetry

No account, no analytics

OFF

Open & bundled

Built on excellent open-source work.

The default models ship inside the extension, optional language packs are downloaded on demand, and everything runs on-device. Permissive licenses throughout, with no copyleft anywhere in the stack.

| Model | Size | Role | License |
| --- | --- | --- | --- |
| `PP-OCRv5 mobile det` | ~4.7 MB | Text detection | Apache-2.0 |
| `latin PP-OCRv5 rec` | ~8 MB | Latin text recognition (CTC) | Apache-2.0 |
| `mfr_encoder / decoder` | ~53 MB | Formula → LaTeX (pix2text-mfr) | MIT |
| Language packs | ~8–17 MB each | On-demand recognition: zh+ja, Cyrillic, East Slavic, Greek, Korean, Thai, Devanagari, Tamil, Telugu; downloaded once, cached locally | Apache-2.0 |

Vite + CRXJS · Manifest V3 · ONNX Runtime Web · WebGPU / WASM · KaTeX · highlight.js · Chrome 124+

FAQ

Questions, answered plainly.

Does my data ever leave my device?

No. There's no server and no API calls. The OCR models are bundled in the extension and run entirely on your device — even the first run is fully offline. The only network use is downloading the extension itself — and, only if you explicitly pick a non-Latin language pack, a one-time model download that's cached locally. No page content or image ever rides along.

Is it really free?

Yes — free and open source under the MIT license. No account, no subscription, no telemetry.

Why not use a big AI OCR model?

Generative models predict the next likely token, so when the image is unclear they invent fluent but wrong text. OCR Buddy uses classic detection + CTC recognition, which transcribes the glyphs that are present and fails to blanks instead of fabricating a sentence.

What can it read?

Plain text and code (including code in a paused video or a PDF), single equations converted to LaTeX, and single tables converted to a Markdown grid — borderless tables included. Capture a region, the whole viewport in one click, or scroll-capture an entire page. It even reads coloured text on light backgrounds, like red error messages or blue links. Latin scripts are built in; on-demand packs add Chinese, Japanese, Cyrillic, East Slavic, Greek, Korean, Thai, Devanagari, Tamil and Telugu. You can also OCR any image directly — open, paste, drag-drop, or right-click it on a page.

Can it export a whole page as Markdown?

Yes. Page → Markdown turns the current page into clean Markdown built from its own structure — headings, lists, links, tables, code blocks — not from OCR, so it's faithful and ready to paste into an LLM. You get a preview to copy or download as a .md file, and it runs entirely on your device.

Does it work offline?

Yes. The bundled models ship inside the extension, so the default Latin experience works with no network connection at all. Optional non-Latin language packs download once when you select them and are cached locally — offline from then on.

Which browsers are supported?

Chrome 124 or newer (WebGPU in workers). On devices without WebGPU it falls back to multi-threaded WebAssembly with identical results.

Grab text from anything on screen.
