Restructure PDF. Ready for AI.

PDF2Brain automates extraction, digitization, and restructuring of complex PDFs — reports, tables, and scans — into clean Markdown, Excel, and Word. Turn static documents into AI-ready data. No software install.

AI PDF Extraction — Markdown, Excel & OCR

Why PDF2Brain for PDF Data Liberation?

Frequently asked questions

What is OCR?
OCR (Optical Character Recognition) lets computers read and recognize characters from images, scanned documents, or PDFs. This tool uses AI to turn visual documents into editable text.

How do the processing modes differ in accuracy?
"Fast" matches the --super-fast preset (30 workers, 1024px images). "Balanced" is recommended (8 workers, 1536px). "Accurate" prioritizes quality (sequential, 2048px images, max-tokens 4500).

Are there file upload size limits?
PDF files up to 50MB are supported. Documents with 1–500 pages work best. Very large files can be split for better results.

What output format do I get?
Choose PDF → Markdown for .md output, PDF → Word for .docx, or Extract tables → Excel for .xlsx workbooks.

Is my uploaded data stored or deleted?
Your data is processed temporarily only. Source files are deleted after processing completes. We do not store or share your data.
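
The processing modes and upload limits described in the answers above can be written down as a small pre-flight check. This is only an illustrative sketch, not PDF2Brain's actual code or API: the names `Mode`, `MODES`, `fits_upload_limit`, and `split_page_ranges` are hypothetical, "sequential" is modeled as a single worker, 50MB is assumed to mean 50 × 1024 × 1024 bytes, and `max_tokens` is left as `None` for the modes where no value is stated.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Limits quoted in the FAQ. Treating "50MB" as binary megabytes is an assumption.
MAX_UPLOAD_BYTES = 50 * 1024 * 1024
MAX_PAGES_PER_JOB = 500  # upper end of the "1-500 pages work best" range


@dataclass(frozen=True)
class Mode:
    """Hypothetical container for the settings the FAQ lists per mode."""
    workers: int               # parallel workers; 1 stands in for "sequential"
    image_px: int              # image size in pixels
    max_tokens: Optional[int]  # None where the FAQ gives no value


MODES = {
    "fast": Mode(workers=30, image_px=1024, max_tokens=None),
    "balanced": Mode(workers=8, image_px=1536, max_tokens=None),
    "accurate": Mode(workers=1, image_px=2048, max_tokens=4500),
}


def fits_upload_limit(size_bytes: int) -> bool:
    """Return True if a file of this size is within the stated upload limit."""
    return 0 < size_bytes <= MAX_UPLOAD_BYTES


def split_page_ranges(page_count: int,
                      chunk: int = MAX_PAGES_PER_JOB) -> List[Tuple[int, int]]:
    """Split a document into 1-based, inclusive page ranges of at most `chunk` pages."""
    if page_count < 1 or chunk < 1:
        raise ValueError("page_count and chunk must be positive")
    return [(start, min(start + chunk - 1, page_count))
            for start in range(1, page_count + 1, chunk)]
```

For example, `split_page_ranges(1200)` returns `[(1, 500), (501, 1000), (1001, 1200)]`, so a 1,200-page document would be planned as three separate uploads that each stay within the recommended page range.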

PDF2Brain

Deep PDF data platform

Automate extraction, digitization, and restructuring of complex PDFs — reports, tables, and scans — into clean data formats. Turn static documents into sources humans and AI can use.

  • Markdown
  • Excel
  • Word
Explore tools

Two pillars, one platform

AI extraction when you need data. Lightning-fast utilities when you need files done.

Lean. Fast. Smart.

Automate extraction, digitization, and restructuring — from scanned reports to structured tables, ready for humans and AI.

Vision-language OCR

Understands full pages, not just character-by-character OCR.

Natural reading order

Multi-column layouts, tables, and side notes in human-like flow.

Math & handwriting

Formulas → LaTeX; works on scans and handwritten notes.

Affordable pay-per-use

Free credit on signup. Pay only for what you use.

See the difference

Real output from our OCR — a PDF with tables and math, converted to Markdown.

Before — PDF

magic.pdf

Original PDF with tables, multi-column layout, and mixed formatting.

After — Markdown

magic.md

Page 1

The magic of Prince

Prince is a computer program that converts XML and HTML into PDF. It is simple, yet very powerful, and it creates beautiful documents. The purpose of this small document is to showcase the formatting magic Prince can do. We have chosen to highlight eighteen of our favorite features. This document is written in HTML and converted to PDF by Prince. The source file is a compact 13k document, including the embedded CSS, SVG and MathML.

TABLE OF CONTENTS

Hyphenation .................... 1
Rounded borders ................ 1
Character substitution ......... 1
HTTP support ................... 1
Web fonts ...................... 1
Image resolution ............... 1
Columns ........................ 1
CMYK colors .................... 1
Counters ....................... 2
Crop and cross marks ........... 2
Cross-references ............... 2
Math ........................... 2
Footnotes ...................... 2
Leaders ........................ 2
SVG ............................ 2
Page folds ..................... 2
Headers and footers ............ 2
PDF bookmarks .................. 2

#1: Hyphenation

Prince 6 supports automatic hyphenation which can break words across several lines, adding a hyphen at the word break. Hyphenation is controlled with a set of experimental CSS properties, and hyphenation patterns for different languages can be supplied. Notice how the text in this document is hyphenated.

#2: Rounded borders

CSS3 introduces support for rounded borders. In the table below, some of the corners have been rounded. On purpose, the bottom right corner has an asymmetrical shape.

|        | fruit | computer |
|--------|-------|----------|
| apple  | yes   | yes      |
| orange | yes   | no       |

#3: Character substitution

It's sometimes convenient to replace one character with another without changing the source document. For example, the apostrophe character is easily found on keyboards, but in print it's common to replace it with a quotation character. Notice how Prince 6 has replaced the apostrophes in this paragraph.

#4: HTTP support

Prince 6 has built-in HTTP support and can fetch pages, images, DTDs and style sheets from the web. The image to the left was automatically fetched when the PDF version was generated.

#5: Web fonts

Prince 6 can fetch fonts from the web and use them without installing them on your system. The fonts used in the PDF version of this document are web fonts. We are grateful to Ray Larabie, Dieter Steffmann, and Red Hat for making high-quality fonts freely available.

#6: Image resolution

Sometimes images should be scaled to a certain resolution, rather than to an absolute size. In Prince, you can set the resolution of an image as a property in the style sheet. The smiley face in the previous section was scaled this way. The smiley also represents the challenging Acid2 test, which Prince 6 passes.

#7: Columns

Columns are commonly used a two-column layout used on paper and Prince supports multi-column layout between. The width of the gap and the style of the rule is set in the style sheet. This is for demonstration. Also, this section

#8: CMYK colors

Printers don't use RGB colors, they mostly use CMYK: cyan, magenta, yellow and black. Prince 6 can read CMYK colors and will use them, if present. The heading above this paragraph has both an RGB color (red) and a CMYK color (blush). Therefore, the text is red in browsers, but blue in the PDF version. This is for demon-

1

Page 2

The magic of Prince

www.princexml.com

stration purposes; normally the colors would be close to each other.

#9: Counters If you are reading the XHTML source code of this document, look for the h2 elements. You will notice that they contain the text of the headings, but not their number. The list item number, including the "#" and "—" are automatically generated by the style sheet. Generated content and counters are especially useful for complex documents. They are described in CSS level 2.1.

#10: Crop and cross marks In printing, crop marks are used to indicate where the printed paper should be cut. Cross marks are used to align prints of different colors to improve color reproduction. Prince 6 adds support for crop and cross marks, and the PDF version of this document includes both.

#11: Cross-references Prince can read hyperlinks inside a document and generate page numbers accordingly. For example, it will automatically find out which page Headers and Footers are discussed on (page 2). Cross-references are used to generate the Table of contents (page 1).

#12: Math Prince 6 provides experimental support for MathML. Here is an example:

$\text{maps to}$ $x \longrightarrow y = f_x(x) = \left(1 + \frac{1}{x}\right)^x$

#13: Footnotes Footnotes$^1$ are essential in printed documents and Prince knows how to generate them. Unlike what some people think, footnotes are not the place to put information you don't want to see. More often, footnotes will actually attract attention. 9 of 10 readers will read the footnotes before they read the text from where the footnotes are anchored$^2$.

#14: Leaders Leaders consist of dots or dashes in a row leading the eye across a page. For example, the Table of contents has leaders in it. The leaders are not found in the document itself, but rather in the style sheet.

#15: SVG Scalable Vector Graphics (SVG) is a language for describing two-dimensional graphics for the web. SVG images scale better than traditional bitmapmed images and are suitable for printing. The crown is generated by two SVG elements.

#16: Page floats On paged media, elements can be set float to the top or bottom of pages. The big URL at the top of this page comes after this paragraph in the source code, but is floated to the top by the style sheet.

#17: Headers and footers Printed documents often have page headers and footers. Printed documents often have page headers and footers. For example, page numbers are often printed at the bottom of the page, and the document title is shown at the top — except on title pages.

#18: PDF Bookmarks Prince will automatically generate PDF bookmarks from heading elements in HTML. The feature is set with a property in the style sheet, and can also be used with other markup languages.


  1. A footnote is a note placed at the bottom of a page of a book or manuscript that comments on or cites a reference for a designated part of the text.
  2. Often, the most interesting information is found in the footnotes.
<center>2</center>

LLM-ready Markdown with headings, tables, footnotes, and LaTeX math preserved.

Security & privacy

Your documents stay yours.

Temporary processing

Uploaded files are processed for your session only. Output files are removed after download — not stored long-term on our servers.

Your data is not used for training

We do not use your document content to train AI models.

No third-party sharing

We do not sell or share your files. Inference runs through our API provider solely to process your request.

Simple, transparent pricing

Pay per page for AI extraction — see full rates for every tool.

Cost-optimized extraction

Extract PDFs to Markdown & Word

Built to extract PDF data at the lowest practical cost — pay-as-you-go, no subscriptions, no GPU rental. Sign in with Google for trial credit.

Batch utilities: Free
  • Batch Merge
  • Compress PDF
  • Split PDF

AI PDF extraction: Per credit
  • PDF → Markdown / Word
  • Table Extraction → Excel
  • Batch Redact

View full pricing

FAQ

Quick answers before you start.

Ready to liberate your PDF data?

Sign in with Google and get free credit instantly.
