Extract Text from a Scanned PDF (Free, No Software)
To extract text from a scanned PDF, you need an OCR tool that can read the image and convert it to selectable text — and you can do it for free in under a minute. This guide walks you through three methods: an online tool, the Google Drive trick, and the Microsoft OneNote workaround.
Why You Cannot Copy Text from a Scanned PDF
If you have ever tried to highlight text in a scanned PDF and gotten nothing, you are not alone. The reason is straightforward: a scanned PDF is not really a text document. It is a picture.
When you scan a paper document, the scanner takes a photograph of each page. That photograph gets wrapped inside a PDF file. Your PDF reader shows you something that looks like a text document, but underneath it is just pixels — the same as a JPEG or PNG image.
That is why Ctrl+C (or Cmd+C on a Mac) does nothing. There are no characters for your computer to grab. It sees only light and dark dots arranged in patterns that look like letters to your eyes but mean nothing to software.
The solution is OCR (optical character recognition). OCR software analyzes those pixel patterns, identifies letters and words, and outputs real, selectable text. Every method in this guide uses some form of OCR.
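To make "analyzes pixel patterns, identifies letters" a little more concrete, here is a toy sketch of the oldest OCR idea, template matching. The 3x5 bitmaps and the names `TEMPLATES`, `pixel_distance`, and `recognize` are invented for illustration. Real OCR engines, including whatever the tools in this guide run behind the scenes, are far more sophisticated.

```python
# Toy illustration only: match a tiny bitmap against known letter shapes.
# '#' is a dark pixel, '.' is a light one. All glyphs here are hypothetical.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}

def pixel_distance(a, b):
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))

def recognize(glyph):
    """Return the template letter that differs from the glyph in the fewest pixels."""
    return min(TEMPLATES, key=lambda letter: pixel_distance(TEMPLATES[letter], glyph))

# A slightly noisy "T": one stray dark pixel on the right of the middle row.
noisy_t = ["###", ".#.", ".##", ".#.", ".#."]
print(recognize(noisy_t))  # prints "T"
```

The noisy glyph still resolves to "T" because it differs from that template by a single pixel. The same closest-match idea is why stains and smudges cause misreads once enough pixels change.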
Method 1: Extract Text with PDFPulp (Fastest)
The quickest way to extract text from a scanned PDF is to use an online OCR tool. PDFPulp handles the conversion in seconds with no software to install and no account to create.
- Open the PDFPulp text extraction tool.
- Drop your scanned PDF into the upload zone (or tap to select it on mobile).
- Wait a few seconds while OCR processes your file.
- Copy the extracted text or download it as a plain text file.
That is the entire process. Four steps, no signup, no watermarks.
PDFPulp gives you 5 free operations per day with a 100MB file size limit. Files are automatically deleted after 24 hours, so your documents stay private. There are no ads on any page.
If your scanned PDF is large and slow to upload, you might want to compress it first to speed things up.
Method 2: Google Drive OCR Trick
Google Drive has a hidden OCR feature built into Google Docs. It is not obvious, but it works well for simple scanned documents.
- Go to Google Drive and sign in.
- Upload your scanned PDF to Drive.
- Right-click the uploaded file and choose Open with > Google Docs.
- Google automatically runs OCR on the PDF and opens an editable document.
- Copy the text you need from the Google Doc.
This method works best on single-page or short documents with clear, printed text. It can struggle with multi-column layouts, tables, and documents longer than 10 pages. The formatting often comes through messy — expect to spend time cleaning it up.
The main advantage is that you probably already have a Google account. The downside is that your document gets uploaded to Google's servers and stays in your Drive until you delete it.
Method 3: Microsoft OneNote
If you already use Microsoft Office, OneNote has a lesser-known OCR capability that works on scanned PDFs.
- Open Microsoft OneNote (the desktop version, not the Windows Store app).
- Go to Insert > File Printout and select your scanned PDF.
- OneNote imports each page as an image.
- Right-click on the imported image and select Copy Text from Picture.
- Paste the extracted text wherever you need it.
This method works entirely offline, which is useful if you are handling sensitive documents and prefer not to upload them. However, it requires a Microsoft Office installation, which means it is not free unless you already have a license.
OneNote handles single-page documents well but can be tedious for longer files since you need to right-click and copy from each page image individually.
How to Get the Best OCR Results
OCR accuracy depends heavily on the quality of your scan. Here are a few things that make a difference:
Scan resolution matters. A scan at 300 DPI or higher produces much better OCR results than a 150 DPI scan. If you control the scanning process, use at least 300 DPI.
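To put the resolution numbers in perspective, the short calculation below shows how many pixels the OCR engine gets to work with at each setting. The US Letter page size and the 10 pt font are assumptions chosen for illustration, and the function names are made up for this sketch.

```python
POINTS_PER_INCH = 72  # typographic points per inch

def page_pixels(width_in, height_in, dpi):
    """Pixel dimensions of a page scanned at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)

def text_height_px(font_pt, dpi):
    """Approximate pixel height of a font's nominal size at the given DPI."""
    return font_pt / POINTS_PER_INCH * dpi

for dpi in (150, 300):
    w, h = page_pixels(8.5, 11, dpi)  # US Letter, an assumed page size
    print(f"{dpi} DPI: {w} x {h} px, 10 pt text is about {text_height_px(10, dpi):.0f} px tall")
# 150 DPI: 1275 x 1650 px, 10 pt text is about 21 px tall
# 300 DPI: 2550 x 3300 px, 10 pt text is about 42 px tall
```

Doubling the DPI doubles the height of every letter in pixels, which gives the OCR engine four times as many pixels per character to distinguish similar shapes.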
Straighten the pages. Skewed or rotated pages confuse OCR engines. If your scan came out crooked, rotate the PDF before running text extraction.
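If you want a rough idea of how crooked a scan is, you can estimate the angle from two points on the same line of text. This is a minimal sketch, assuming you read the pixel coordinates off an image viewer yourself. The function name and the sample measurements are hypothetical.

```python
import math

def skew_angle_degrees(x1, y1, x2, y2):
    """Angle of the line through two baseline points, in degrees.

    Image coordinates are assumed, so y grows downward: a positive
    result means the text slopes down toward the right.
    """
    return math.degrees(math.atan2(y2 - y1, x2 - x1))

# Hypothetical measurement: the baseline drops 45 px across 2000 px of page width.
angle = skew_angle_degrees(100, 500, 2100, 545)
print(f"Skew of about {angle:.2f} degrees")
```

Rotating the page by the same amount in the opposite direction straightens the text. Which direction counts as positive depends on the tool you use to rotate.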
Clean scans beat messy ones. Coffee stains, pen marks, sticky notes, and faded ink all reduce accuracy. If possible, scan from a clean original.
Printed text works best. OCR handles standard printed fonts with near-perfect accuracy. Handwriting, decorative fonts, and very small text are much harder. Expect to proofread more heavily with these.
Single-column layouts are easiest. Multi-column documents, forms with checkboxes, and tables often produce garbled output. You may need to split the PDF into individual pages and process sections separately.
Scanned PDF vs. Native PDF: How to Tell the Difference
Not sure whether your PDF is scanned or native? Here is a quick test:
Open the PDF and try to click on any word. If you can highlight individual words and sentences, you have a native (text-based) PDF. Standard copy and paste will work fine — no OCR needed.
If clicking and dragging selects nothing, or if it selects the entire page as one big block, you have a scanned (image-based) PDF. That is when you need one of the methods above.
Some PDFs are a mix of both. A document might have native text on some pages and scanned images on others. This happens when someone scans part of a document and merges it with digital pages. In that case, OCR will handle the scanned pages while the native text pages stay intact.
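The click-and-drag test can also be expressed as a small script. The sketch below assumes you have already pulled the text layer of each page into a list of strings with a PDF library of your choice; that step is not shown. The names `classify_pages` and `summarize` and the 20-character threshold are illustrative assumptions, not part of any tool mentioned here.

```python
def classify_pages(page_texts, min_chars=20):
    """Label each page 'native' if its text layer has real content, else 'scanned'."""
    return [
        "native" if len(text.strip()) >= min_chars else "scanned"
        for text in page_texts
    ]

def summarize(labels):
    """Describe the whole document as 'native', 'scanned', 'mixed', or 'empty'."""
    if not labels:
        return "empty"
    kinds = set(labels)
    return kinds.pop() if len(kinds) == 1 else "mixed"

# Hypothetical input: page 1 has a text layer, page 2 is an image with none.
page_texts = ["Quarterly report for the finance committee, page one.", "  "]
labels = classify_pages(page_texts)
print(labels, summarize(labels))  # ['native', 'scanned'] mixed
```

Only the pages labeled "scanned" need OCR. The threshold guards against pages whose text layer holds nothing but a stray page number.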
If your PDF is actually native text and you just need to pull the content out, PDFPulp's extract text tool works on both types. You do not need to figure out which type you have before uploading.
When to Use Each Method
| Method | Best for | Drawbacks |
|---|---|---|
| PDFPulp | Quick jobs, no account needed, works on mobile | 5 free operations per day |
| Google Drive | Short documents when you already have Drive open | Messy formatting, uploads to Google servers |
| OneNote | Offline use with sensitive documents | Requires Office license, tedious for long files |
For most people, an online tool is the fastest option. You upload, wait a few seconds, and get your text. If you process PDFs regularly, it is worth bookmarking PDFPulp's extract text page so it is one click away.
If the extracted text is part of a larger PDF workflow — say you need to pull text from a report, then reduce the file size before emailing it — check out our guide on how to compress a PDF for email. And if you are curious about why your PDF files are so large in the first place, the PDF file size guide covers the most common causes.
Frequently Asked Questions
Why can't I copy text from a scanned PDF?
A scanned PDF is just a photograph of the page stored inside a PDF wrapper. Your computer sees pixels, not characters. You need OCR (optical character recognition) to convert those pixel patterns into actual text you can select and copy.
Is OCR accurate on scanned documents?
Modern OCR is very accurate on clean, high-resolution scans — typically 95-99% for printed text. Accuracy drops with handwriting, low-resolution scans, unusual fonts, or skewed pages. Always proofread the extracted text before using it.
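Those percentages sound high, so it helps to turn them into a count of mistakes. The sketch below assumes a page of roughly 2,000 characters, which is an illustrative figure and not a measurement. The `similarity` helper uses Python's standard `difflib` and gives a rough similarity score, not a formal character error rate.

```python
from difflib import SequenceMatcher

def expected_errors(char_count, accuracy):
    """Approximate number of wrong characters at a given accuracy (0 to 1)."""
    return round(char_count * (1 - accuracy))

def similarity(ocr_text, reference_text):
    """Rough 0-to-1 similarity between OCR output and a hand-checked reference."""
    return SequenceMatcher(None, ocr_text, reference_text).ratio()

for accuracy in (0.95, 0.99):
    print(f"{accuracy:.0%} accurate: about {expected_errors(2000, accuracy)} errors per page")
# 95% accurate: about 100 errors per page
# 99% accurate: about 20 errors per page
```

Even at the top of the range, a full page can contain a couple dozen wrong characters, which is why proofreading matters.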
Can I extract text from a scanned PDF for free?
Yes. PDFPulp lets you extract text from PDFs for free with no account required. You get 5 free operations per day and files up to 100MB. Google Drive also offers free OCR when you open a PDF with Google Docs.
What file types can OCR process?
OCR works on any image-based document — scanned PDFs, photographed documents, screenshots, and image files like PNG or JPEG. If the content started as a picture rather than typed text, OCR is the tool you need.
Does extracting text from a scanned PDF change the original file?
No. Text extraction reads the content and outputs it separately. Your original scanned PDF stays exactly as it was. PDFPulp auto-deletes uploaded files after 24 hours for privacy.
What is OCR and how does it work?
OCR stands for optical character recognition. It scans an image pixel by pixel, identifies shapes that match known letter and number patterns, and outputs the corresponding text. Modern OCR engines use machine learning to handle varied fonts, sizes, and page layouts with high accuracy.
Can I extract text from a handwritten PDF?
You can try, but results vary widely. OCR works best on neat, consistent handwriting with dark ink on a clean background. Cursive, messy handwriting, or low-contrast scans often produce garbled output. For handwritten documents, always proofread the extracted text carefully.
Why does my extracted text have errors?
OCR errors usually come from low scan quality, skewed pages, unusual fonts, or background noise like stains and watermarks. Scanning at 300 DPI or higher, straightening pages before processing, and using clean originals all reduce errors significantly.
Can I extract text from a password-protected PDF?
You need to remove the password protection first. OCR tools cannot process a PDF they cannot open. If you know the password, open the file in any PDF reader, remove the security settings, save it, and then run text extraction on the unprotected version.
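If you are not sure whether a file is protected, a quick heuristic is to look for the `/Encrypt` entry that encrypted PDFs carry in their trailer. This is a rough sketch and not a real PDF parser. It can give a false positive if that byte sequence happens to appear elsewhere in the file, and the function names are invented for this example.

```python
def looks_encrypted(pdf_bytes):
    """Heuristic: encrypted PDFs reference an /Encrypt dictionary in their trailer."""
    return b"/Encrypt" in pdf_bytes

def file_looks_encrypted(path):
    """Apply the same heuristic to a file on disk."""
    with open(path, "rb") as handle:
        return looks_encrypted(handle.read())

# Hypothetical trailer fragments, for illustration only.
print(looks_encrypted(b"trailer << /Root 1 0 R /Encrypt 5 0 R >>"))  # True
print(looks_encrypted(b"trailer << /Root 1 0 R >>"))                 # False
```

A `True` result suggests removing the protection first, as described above, before running text extraction.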
How do I extract text from a PDF in a language other than English?
Most modern OCR engines support dozens of languages, including those with non-Latin scripts like Chinese, Arabic, and Hindi. PDFPulp handles multilingual documents automatically. For best results, make sure the scan is high-resolution and the text is clearly printed.
Can I extract tables from a scanned PDF?
OCR can read the text inside table cells, but it does not preserve the table structure. The output is plain text, so rows and columns often merge into a single block. For structured table data, you may need to clean up the output manually or use a specialized table extraction tool.
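When the plain-text output still keeps some spacing between columns, a few lines of cleanup can recover the rows. This is a minimal sketch, assuming each table row landed on its own line with at least two spaces between cells. Many OCR outputs will not be that tidy, and the sample data is made up.

```python
import re

def split_columns(ocr_lines):
    """Split each non-empty line into cells wherever two or more spaces occur."""
    rows = []
    for line in ocr_lines:
        stripped = line.strip()
        if stripped:
            rows.append(re.split(r"\s{2,}", stripped))
    return rows

# Hypothetical OCR output for a small table.
ocr_lines = ["Item     Qty   Price", "", "Copy paper   2     4.50"]
for row in split_columns(ocr_lines):
    print(row)
# ['Item', 'Qty', 'Price']
# ['Copy paper', '2', '4.50']
```

Splitting on two or more spaces keeps single-spaced phrases like "Copy paper" together in one cell. If the columns were merged with single spaces, this approach will not separate them and manual cleanup is still needed.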
Ready to extract text from your scanned PDF? Drop your file into PDFPulp and get selectable text in seconds. It's free (5 operations per day), requires no account, and adds no watermarks.