Image to Text (OCR) — Free Online Tool

Extract text from JPEG, PNG, WebP images using Tesseract OCR. Supports English and Vietnamese. 100% browser-based — images never leave your device.

English · Vietnamese · No upload · Tesseract.js · Editable output

📝

Drag & drop an image containing text here or browse

JPEG, PNG, WebP — max 20 MB

Why ImgTools?

🔒

100% Private

Images are processed entirely in your browser. Tesseract WASM runs locally — nothing is sent to Google Vision, AWS Textract, or any OCR server.

🌏

Vietnamese support

Recognises Vietnamese text with diacritics at 85-95% accuracy on clear printed images. Also supports simultaneous English + Vietnamese recognition.

✏️

Editable text

Results appear in a textarea you can edit directly — fix any OCR mistakes before copying or downloading as .txt.

What is Image to Text?

Image to Text (or OCR — Optical Character Recognition) is a computer vision technique that detects and extracts written text from images. Instead of manually re-typing the contents of a photograph, screenshot, or scanned document, OCR analyses the pixels, identifies characters, and outputs editable text — saving hours of data entry for tasks like digitising old records, copying from slides, or extracting data from invoices.

ImgTools' OCR uses Tesseract.js, the JavaScript/WebAssembly port of Tesseract, an OCR engine developed at HP from 1985, released as open source in 2005, and developed further under Google's sponsorship from 2006. Its recognition models ship pre-trained on large text corpora. The critical difference from commercial APIs is that the entire recognition pipeline runs inside your browser via WebAssembly. Your original image and the extracted text never leave your device. Cloud-based services such as Google Vision API, AWS Textract, or Microsoft Azure Computer Vision cannot offer that guarantee, since they require uploading your image to their servers.

Typical use cases include digitising printed documents into Word, copying content from screenshotted slides, extracting numbers from paper receipts into Excel, reading licence plates and ID cards, grabbing recipes from cookbook photos, or simply copying text from a social-media screenshot you can't highlight.

✓ Supports two major languages, English and Vietnamese (with diacritics), and can run both simultaneously for bilingual documents
✓ Accuracy ranges from 85% to 95% on clean printed text with common fonts; handwriting and blurry photos score lower
✓ Output appears in an editable textarea, so there is no need to open an external editor to fix mistakes
✓ Export results as UTF-8 .txt files compatible with Notepad, Word, Google Docs, or any text editor
✓ Unlimited usage, no watermark, no account required, no sign-in
✓ First run downloads the language model (~2 MB for English, ~10 MB for Vietnamese); subsequent runs hit the browser cache and are much faster
✓ Works on all modern browsers (Chrome, Firefox, Safari, Edge) on both desktop and mobile, with no app install needed
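
For readers curious how the language choice reaches the engine, here is a minimal sketch. Tesseract identifies languages by short codes ("eng", "vie") and accepts several joined with "+". The LanguageChoice type and both function names are hypothetical, and the model sizes are the approximate figures quoted above, not measured values.

```typescript
// Hypothetical UI choice; "eng" and "vie" are Tesseract's language codes.
type LanguageChoice = "english" | "vietnamese" | "both";

const LANG_CODES: Record<LanguageChoice, string[]> = {
  english: ["eng"],
  vietnamese: ["vie"],
  both: ["eng", "vie"],
};

// Approximate first-run download sizes in MB (figures quoted on this page).
const MODEL_SIZE_MB: Record<string, number> = { eng: 2, vie: 10 };

// Tesseract accepts multiple languages joined with "+", e.g. "eng+vie".
function toLangString(choice: LanguageChoice): string {
  return LANG_CODES[choice].join("+");
}

// Rough size of the one-time model download for a given choice.
function firstRunDownloadMb(choice: LanguageChoice): number {
  return LANG_CODES[choice].reduce(
    (sum, code) => sum + (MODEL_SIZE_MB[code] ?? 0),
    0,
  );
}
```

Choosing both languages therefore means fetching both models on the first run, roughly 12 MB in total by the figures above.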

How to use

1. Drop or browse an image containing text (JPEG/PNG/WebP).
2. Pick the recognition language: English, Vietnamese, or both.
3. Click Recognize Text and wait a few seconds for Tesseract to process.
4. Copy the output or download as a .txt file.
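
The steps above can be sketched in code roughly as follows. This is not ImgTools' actual source. The OcrWorker interface is a simplified assumption modelled on the recognize() and terminate() methods that Tesseract.js workers expose, and recognizeText and WorkerFactory are hypothetical names. The worker factory is passed in so the sketch does not depend on a particular Tesseract.js version.

```typescript
// Simplified, assumed shape of an OCR worker (not the full Tesseract.js API).
interface OcrWorker {
  recognize(image: string): Promise<{ data: { text: string } }>;
  terminate(): Promise<unknown>;
}

// Creates a worker for a language string such as "eng", "vie" or "eng+vie".
type WorkerFactory = (langs: string) => Promise<OcrWorker>;

// Runs OCR on one image (given here as a URL or data URL) and returns the text.
async function recognizeText(
  image: string,
  langs: string,
  makeWorker: WorkerFactory,
): Promise<string> {
  const worker = await makeWorker(langs);
  try {
    const result = await worker.recognize(image);
    return result.data.text.trim();
  } finally {
    // Always release the worker, even if recognition throws.
    await worker.terminate();
  }
}
```

In a browser, the factory would typically wrap the createWorker function from Tesseract.js, assuming a release in which createWorker accepts the language string directly. The returned text is what would be placed in the editable textarea.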

When to use OCR

Paper documents

Photograph books, contracts, or forms — OCR extracts text you can paste into Word/Google Docs for easy editing, searching, or translation.

Presentation slides

Missed a slide in a meeting? Photograph it and run OCR to get the full text for minutes or revision notes.

Website screenshots

Pages that block Ctrl+C or screenshots from apps that don't allow text selection — OCR extracts the text straight from the pixels.

Invoices & receipts

Pull amounts, invoice numbers, and dates from photographed receipts into a spreadsheet without retyping each digit.

IDs & licence plates

Extract ID numbers, passport details, or vehicle plates from photos — useful for admin, logistics, or security workflows.

Comics & manga

Quickly translate foreign-language comics by running OCR on each panel, then pasting into Google Translate or DeepL.

How it works under the hood

Tesseract is an open-source OCR engine developed at HP between 1985 and 1994, released as open source in 2005, and sponsored by Google from 2006. From version 4 onwards it uses an LSTM (Long Short-Term Memory) neural network to recognise text a line at a time instead of character by character, which produces markedly more accurate results for connected scripts and diacritic-heavy languages like Vietnamese. Each language ships as a .traineddata file containing the pre-trained model weights.

Tesseract.js is the browser port, compiled from the original C++ code to WebAssembly. When you click Recognize Text, the browser downloads the .traineddata file from the jsDelivr CDN (first run only), spins up a Web Worker running the WASM binary in a separate thread to keep the UI responsive, and pipes your image through the pipeline: preprocessing (grayscale, threshold), line and word segmentation, then character recognition via LSTM. Everything runs inside your browser tab. You can verify in Chrome DevTools → Network that no request contains your image data.
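
To make the preprocessing step concrete, here is a simplified sketch of grayscale conversion followed by thresholding. It is an illustration only: Tesseract's own binarisation is more sophisticated than a fixed cut-off, and the binarize function and its default threshold of 128 are assumptions made for this example. The input layout (four bytes per pixel, RGBA) matches what a canvas ImageData object provides.

```typescript
// Converts RGBA pixels to a binary image: 1 = dark (ink), 0 = light (background).
// A fixed threshold is a simplification; real engines choose it adaptively.
function binarize(rgba: Uint8ClampedArray, threshold = 128): Uint8Array {
  const out = new Uint8Array(rgba.length / 4);
  for (let i = 0; i < out.length; i++) {
    const r = rgba[i * 4];
    const g = rgba[i * 4 + 1];
    const b = rgba[i * 4 + 2];
    // Grayscale via ITU-R BT.601 luma weights; the alpha channel is ignored.
    const luma = 0.299 * r + 0.587 * g + 0.114 * b;
    out[i] = luma < threshold ? 1 : 0;
  }
  return out;
}
```

A fixed threshold like this fails on unevenly lit photos, which is one reason good lighting matters so much for OCR accuracy.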

Accuracy depends heavily on input quality. With black text on a clean white background, common fonts, character height above 30 pixels, and no blur or tilt, Tesseract reaches 90-95% for English and 85-92% for Vietnamese. On tilted, occluded, or low-resolution images, accuracy can drop to 60-70%. For best results, capture images with good lighting, keep the camera square to the document, and use ImgTools' Rotate & Flip tool to straighten tilted photos before OCR. Handwritten text is largely unsupported: only very neat printed-style handwriting (engineering drawings, block capitals) has any chance of being recognised.
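
The page does not say how these percentages are measured. One common convention is character accuracy, defined as 1 minus the edit distance between the OCR output and a hand-checked reference, divided by the reference length. The sketch below assumes that definition; editDistance and characterAccuracy are hypothetical helper names.

```typescript
// Levenshtein distance over Unicode code points, using two rolling rows.
function editDistance(a: string, b: string): number {
  const s = Array.from(a);
  const t = Array.from(b);
  let prev = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[t.length];
}

// Character accuracy in [0, 1]. Both strings are normalised to NFC so that
// composed and decomposed Vietnamese diacritics compare as equal.
function characterAccuracy(reference: string, ocrOutput: string): number {
  const ref = reference.normalize("NFC");
  const out = ocrOutput.normalize("NFC");
  const n = Array.from(ref).length;
  if (n === 0) return out.length === 0 ? 1 : 0;
  return Math.max(0, 1 - editDistance(ref, out) / n);
}
```

Under this definition, a single dropped diacritic in a ten-character phrase already lowers the score to 90%, which shows why Vietnamese figures tend to sit slightly below English ones.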

OCR Frequently Asked Questions

Is OCR 100% accurate?

No OCR engine — including Google Vision or AWS Textract — is 100% accurate. Our Tesseract implementation reaches 85-95% on clear printed images. Blurry photos, tilted pages, unusual fonts, and noisy backgrounds all reduce accuracy. That's why the output appears in an editable textarea — you should review and correct any mistakes before using the text for anything important.

Does it recognise handwriting?

Tesseract is trained on printed text, not cursive handwriting. Normal handwritten text is essentially unreadable to it. Very neat printed handwriting — block capitals like you'd find on engineering drawings — may be recognised at 40-60% accuracy. For genuine handwriting OCR, newer AI-based services like Google Cloud Vision or Azure Computer Vision perform much better, but require uploading your image to their servers.

Why is the first run slow?

On first run, the browser downloads the .traineddata language model from the CDN: ~10 MB for Vietnamese, ~2 MB for English. Download plus Web Worker initialisation takes 5-10 seconds depending on connection speed. Once cached, subsequent images are processed directly, usually in 3-5 seconds for an A4-sized image.
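
As a rough back-of-the-envelope check on the download part of that delay, the transfer time is the file size in megabits divided by the connection speed. The helper below is a hypothetical sketch that ignores latency, CDN throughput, and worker start-up, and it treats 1 MB as 8 megabits.

```typescript
// Estimated transfer time in seconds for a file of sizeMb megabytes
// over a link of bandwidthMbps megabits per second.
function downloadSeconds(sizeMb: number, bandwidthMbps: number): number {
  if (bandwidthMbps <= 0) {
    throw new RangeError("bandwidth must be positive");
  }
  return (sizeMb * 8) / bandwidthMbps;
}
```

For instance, downloadSeconds(10, 20) puts the ~10 MB Vietnamese model at about 4 seconds on a 20 Mbps connection, before any initialisation time is added.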

Are my images uploaded anywhere?

No. Tesseract.js runs 100% in your browser via WebAssembly. The only network request is fetching the public .traineddata file from the CDN — this file is identical for everyone, containing no user data. You can verify by opening Chrome DevTools → Network tab while running OCR: no request will contain your image payload.

What's the max image size?

We limit files to 20 MB per image. Larger images can slow down or crash the browser due to the memory needed to process all those pixels. If your original is over 20 MB, use ImgTools' Compress tool first: compressing at 80% quality barely affects text recognition.
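
The check described here, and the reason pixel count matters more than file size, can be sketched as below. The constant and function names are hypothetical, and whether the site counts a megabyte as 2^20 or 10^6 bytes is an assumption. The memory figure assumes the decoded image is held as 4 bytes per pixel (RGBA), as a browser canvas does.

```typescript
// Assumed limit: 20 MB counted as 20 * 2^20 bytes.
const MAX_BYTES = 20 * 1024 * 1024;
const ACCEPTED_TYPES = ["image/jpeg", "image/png", "image/webp"];

// Returns an error message, or null when the file is acceptable.
function validateImage(mimeType: string, sizeBytes: number): string | null {
  if (!ACCEPTED_TYPES.includes(mimeType)) {
    return "Unsupported format: use JPEG, PNG or WebP.";
  }
  if (sizeBytes > MAX_BYTES) {
    return "File is larger than 20 MB: compress it first.";
  }
  return null;
}

// Memory needed for the decoded pixels: 4 bytes (RGBA) per pixel.
function decodedBytes(width: number, height: number): number {
  return width * height * 4;
}
```

A 4000 × 3000 photo decodes to 48,000,000 bytes regardless of how small the compressed file is, so a heavily compressed but very high-resolution image can still strain memory.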

Does it support Japanese, Korean, Chinese, French, German?

The current version supports only English and Vietnamese. Tesseract itself can handle 100+ languages (including Japanese, Korean, Chinese, Russian, French, German) — we'll add more based on demand. If you need another language, please reach out via phanmemtonghop.com.

Does it preserve line breaks and layout?

Yes, Tesseract attempts to preserve paragraph and line structure in the output. However, tilted or curved pages (like photos of thick books) can produce scrambled layouts — you'll need to fix these manually in the textarea. Multi-column documents (like newspaper scans) are read column-by-column, which may need rearrangement after OCR.

Can I use this commercially?

Yes. ImgTools is 100% free for personal and commercial use — unlimited images, no watermark, no attribution required. Tesseract.js is released under the Apache 2.0 licence, so you could even self-host if needed. However, for very high volume (thousands of images per day), consider deploying a server-side Tesseract setup to leverage multi-core CPU parallelism.