Does it work on scanned PDFs and books?

Yes. PDF2Brain uses vision-language OCR (olmOCR), which understands full pages, including scans, photos, tables, and handwriting.

Are my files stored on your servers?

No long-term storage. Files are processed temporarily and removed after your session completes or you download the result.

How does billing work?

Sign in with Google to receive free signup credit. Usage is billed per token and deducted from your balance as you process pages. You can view detailed spending on the Usage page.

What output formats are supported?

PDF → Markdown, PDF → Word (.docx), and Extract tables → Excel (.xlsx). More tools are coming soon.

Page 1

The magic of Prince

Prince is a computer program that converts XML and HTML into PDF. It is simple, yet very powerful, and it creates beautiful documents. The purpose of this small document is to showcase the formatting magic Prince can do. We have chosen to highlight eighteen of our favorite features. This document is written in HTML and converted to PDF by Prince. The source file is a compact 13k document, including the embedded CSS, SVG and MathML.

TABLE OF CONTENTS

Hyphenation .................. 1
Rounded borders .............. 1
Character substitution ....... 1
HTTP support ................. 1
Web fonts .................... 1
Image resolution ............. 1
Columns ...................... 1
CMYK colors .................. 1
Counters ..................... 2
Crop and cross marks ......... 2
Cross-references ............. 2
Math ......................... 2
Footnotes .................... 2
Leaders ...................... 2
SVG .......................... 2
Page floats .................. 2
Headers and footers .......... 2
PDF bookmarks ................ 2

#1: Hyphenation

Prince 6 supports automatic hyphenation, which breaks words at the end of a line and adds a hyphen at each break. Hyphenation is controlled with a set of experimental CSS properties, and hyphenation patterns for different languages can be supplied. Notice how the text in this document is hyphenated.
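The text does not say how Prince applies the supplied patterns internally. As a rough illustration of pattern-based hyphenation in general, here is a sketch of Frank Liang's algorithm (the approach TeX uses) with a tiny, hand-picked pattern set. Real pattern files contain thousands of entries, and the minimum fragment lengths `min_left` and `min_right` are assumptions.

```python
import re

# Toy pattern set in Liang/TeX notation (illustrative only, not a real language file).
# A digit between letters is a priority for breaking at that spot:
# odd = break allowed, even = break forbidden; the highest digit wins.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io"]


def parse_pattern(pattern: str) -> tuple[str, list[int]]:
    """Split e.g. 'hy3ph' into ('hyph', [0, 0, 3, 0, 0]).

    values[k] is the priority of the gap just before letters[k]."""
    letters = re.sub(r"\d", "", pattern)
    values = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            values[pos] = int(ch)
        else:
            pos += 1
    return letters, values


def hyphenate(word: str, patterns: list[str], min_left: int = 2, min_right: int = 2) -> list[str]:
    """Return the word split at the break points allowed by the patterns."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = f".{word.lower()}."  # '.' marks the word boundaries
    points = [0] * (len(padded) + 1)  # points[j]: priority of the gap before padded[j]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            values = table.get(padded[start:end])
            if values:
                for i, v in enumerate(values):
                    points[start + i] = max(points[start + i], v)
    # A break before word[i] corresponds to points[i + 1] (skipping the leading '.').
    pieces, last = [], 0
    for i in range(min_left, len(word) - min_right + 1):
        if points[i + 1] % 2 == 1:
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return pieces


print("-".join(hyphenate("hyphenation", PATTERNS)))  # hy-phen-ation
```

Taking the maximum digit at each gap is what lets a more specific pattern (here `hen5at`) override a more general one (`n2at`).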

#2: Rounded borders

CSS3 introduces support for rounded borders. In the table below, some of the corners have been rounded; the bottom-right corner is deliberately asymmetrical.

	fruit	computer
apple	yes	yes
orange	yes	no

#3: Character substitution

It's sometimes convenient to replace one character with another without changing the source document. For example, the apostrophe character is easily found on keyboards, but in print it's common to replace it with a quotation character. Notice how Prince 6 has replaced the apostrophes in this paragraph.
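In Prince the substitution is declared in the style sheet. The sketch below only shows the effect as a plain Python string transformation, assuming a one-entry table that maps the ASCII apostrophe (U+0027) to the right single quotation mark (U+2019); the table contents are an illustrative choice.

```python
# Substitution table applied at render time; the source text is never modified.
# Keys must be single characters for str.maketrans.
SUBSTITUTIONS = {"'": "\u2019"}  # ASCII apostrophe -> typographic apostrophe


def substitute(text: str, table: dict[str, str] = SUBSTITUTIONS) -> str:
    """Return a copy of `text` with each key character replaced by its value."""
    return text.translate(str.maketrans(table))


source = "It's sometimes convenient"
rendered = substitute(source)  # "It\u2019s sometimes convenient"; `source` is unchanged
```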

#4: HTTP support

Prince 6 has built-in HTTP support and can fetch pages, images, DTDs and style sheets from the web. The image to the left was automatically fetched when the PDF version was generated.

#5: Web fonts

Prince 6 can fetch fonts from the web and use them without installing them on your system. The fonts used in the PDF version of this document are web fonts. We are grateful to Ray Larabie, Dieter Steffmann, and Red Hat for making high-quality fonts freely available.

#6: Image resolution

Sometimes images should be scaled to a certain resolution, rather than to an absolute size. In Prince, you can set the resolution of an image as a property in the style sheet. The smiley face in the previous section was scaled this way. The smiley also represents the challenging Acid2 test, which Prince 6 passes.
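Scaling to a resolution means the printed size follows from the pixel count divided by the resolution, rather than being given directly. The text does not name the property Prince uses, so this sketch only does the arithmetic; the 300 × 150 pixel image is hypothetical, and the 96 px per inch ratio is the one CSS defines.

```python
CSS_PX_PER_INCH = 96  # CSS defines 1in = 96px


def placed_size(px_width: int, px_height: int, dpi: float) -> tuple[float, float]:
    """Physical size in inches of a bitmap placed at `dpi` pixels per inch."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return px_width / dpi, px_height / dpi


# Hypothetical 300 x 150 pixel image placed at 150 dpi:
w_in, h_in = placed_size(300, 150, 150)  # (2.0, 1.0) inches
w_px = w_in * CSS_PX_PER_INCH            # 192.0 CSS px
```

Doubling the resolution to 300 dpi halves the placed size to 1.0 × 0.5 inches, which is why a resolution setting suits print, where pixel counts alone say nothing about physical size.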

#7: Columns

Columns are commonly used on paper, and Prince supports multi-column layout. This section uses a two-column layout with a rule between the columns. The width of the gap and the style of the rule are set in the style sheet.
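When the style sheet fixes the number of columns and the gap, the column width follows from the available width. This sketch uses the formula from the CSS Multi-column Layout pseudo-algorithm for a fixed column count and places each rule in the middle of its gap; the 500pt block and 20pt gap are hypothetical values.

```python
def column_layout(available: float, count: int, gap: float) -> tuple[float, list[float]]:
    """Return (column width, x positions of the rules between columns).

    Width uses W = max(0, (U - (N - 1) * gap) / N) for N fixed columns.
    Each rule is drawn at the horizontal center of its gap.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    width = max(0.0, (available - (count - 1) * gap) / count)
    rules = [i * width + (i - 1) * gap + gap / 2 for i in range(1, count)]
    return width, rules


# Hypothetical 500pt-wide text block, two columns, 20pt gap:
width, rules = column_layout(500, 2, 20)  # width 240.0, one rule at x = 250.0
```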

#8: CMYK colors

Printers don't use RGB colors; they mostly use CMYK: cyan, magenta, yellow and black. Prince 6 can read CMYK colors and will use them if present. The heading above this paragraph has both an RGB color (red) and a CMYK color (blue). Therefore, the text is red in browsers, but blue in the PDF version. This is for demonstration purposes; normally the colors would be close to each other.

Page 2

www.princexml.com

#9: Counters

If you are reading the XHTML source code of this document, look for the h2 elements. You will notice that they contain the text of the headings, but not their numbers. The numbers, including the "#" sign and the separator after each number, are generated automatically by the style sheet. Generated content and counters, both described in CSS level 2.1, are especially useful for complex documents.
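The style sheet itself is not shown, so as an analogy only, this sketch mimics what a CSS counter does for the h2 headings: start at zero (`counter-reset`), add one per heading (`counter-increment`), and emit the number as generated text. The "#" prefix and ": " separator match the headings as rendered in this copy of the document.

```python
def number_headings(headings: list[str], prefix: str = "#", separator: str = ": ") -> list[str]:
    """Prefix each heading with a generated number, mimicking a CSS counter.

    The input holds only the heading text, as the h2 elements in the source do;
    the numbers are produced here, not stored in the document.
    """
    counter = 0  # like counter-reset: start at 0
    numbered = []
    for text in headings:
        counter += 1  # like counter-increment: once per h2
        numbered.append(f"{prefix}{counter}{separator}{text}")
    return numbered


print(number_headings(["Hyphenation", "Rounded borders"]))
# ['#1: Hyphenation', '#2: Rounded borders']
```

Because the numbers are derived rather than typed, inserting or reordering a section renumbers everything after it automatically.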

#10: Crop and cross marks

In printing, crop marks are used to indicate where the printed paper should be cut. Cross marks are used to align prints of different colors to improve color reproduction. Prince 6 adds support for crop and cross marks, and the PDF version of this document includes both.

#11: Cross-references

Prince can read hyperlinks inside a document and generate page numbers accordingly. For example, it will automatically find out which page Headers and Footers are discussed on (page 2). Cross-references are used to generate the Table of contents (page 1).
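Conceptually, a page reference is resolved in two passes: first find the page each link target landed on, then write that number into every link pointing at it. The sketch below assumes the content is already split into pages and that an empty `<a href="#id"></a>` marks a page reference; both are simplifications for illustration. In CSS such references are usually written with a generated-content function like `target-counter()`, which this code does not model. A real formatter may also need another layout pass, since inserting the numbers can reflow text.

```python
import re

ANCHOR = re.compile(r'id="([^"]+)"')
LINK = re.compile(r'<a href="#([^"]+)"></a>')


def resolve_page_refs(pages: list[str]) -> list[str]:
    """Replace each empty internal link with the page number of its target.

    `pages` holds already paginated content, one string per page (an assumption).
    """
    # Pass 1: record the first page on which each id appears.
    page_of: dict[str, int] = {}
    for number, html in enumerate(pages, start=1):
        for anchor in ANCHOR.findall(html):
            page_of.setdefault(anchor, number)

    # Pass 2: fill in the links; unknown targets are marked "??".
    def fill(match: re.Match) -> str:
        return f"page {page_of.get(match.group(1), '??')}"

    return [LINK.sub(fill, html) for html in pages]


pages = [
    '<h2 id="toc">Contents</h2> Headers and footers: <a href="#headers"></a>',
    '<h2 id="headers">Headers and footers</h2>',
]
resolved = resolve_page_refs(pages)
```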

#12: Math

Prince 6 provides experimental support for MathML. Here is an example:

$$x \xrightarrow{\text{maps to}} y = f_x(x) = \left(1 + \frac{1}{x}\right)^x$$

#13: Footnotes

Footnotes $^1$ are essential in printed documents, and Prince knows how to generate them. Contrary to what some people think, footnotes are not the place to put information you don't want readers to see. More often, footnotes actually attract attention: nine out of ten readers will read the footnotes before they read the text where the footnotes are anchored $^2$.

#14: Leaders

Leaders consist of dots or dashes in a row, leading the eye across a page. For example, the Table of contents has leaders in it. The leaders are not found in the document itself; they are generated by the style sheet.
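In Prince a leader is produced by the style sheet, typically through a `leader()` value in generated content, and it stretches to fill whatever space the line has left. As a character-based analogy, this sketch pads a table-of-contents entry with dots to a fixed width; the 40-character default width is an assumption.

```python
def leader_line(title: str, page: int, width: int = 40, fill: str = ".") -> str:
    """Format a table-of-contents entry padded with a dot leader to `width` characters.

    At least one fill character is always emitted, so a very long title
    overflows `width` instead of losing its leader.
    """
    number = str(page)
    gap = max(1, width - len(title) - len(number) - 2)  # 2 = spaces around the leader
    return f"{title} {fill * gap} {number}"


for title, page in [("Hyphenation", 1), ("Cross-references", 2), ("Math", 2)]:
    print(leader_line(title, page, width=30))
```

Every line printed above is exactly 30 characters, so the page numbers line up in a monospaced font; a typesetter does the same with measured glyph widths instead of character counts.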

#15: SVG

Scalable Vector Graphics (SVG) is a language for describing two-dimensional graphics for the web. SVG images scale better than traditional bitmap images and are suitable for printing. The crown is generated by two SVG elements.

#16: Page floats

On paged media, elements can be set to float to the top or bottom of pages. The big URL at the top of this page comes after this paragraph in the source code, but is floated to the top by the style sheet.

#17: Headers and footers

Printed documents often have page headers and footers. For example, page numbers are often printed at the bottom of the page, and the document title is shown at the top, except on title pages.

#18: PDF Bookmarks

Prince will automatically generate PDF bookmarks from heading elements in HTML. The feature is set with a property in the style sheet, and can also be used with other markup languages.
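The text does not describe how Prince builds its bookmark tree. One common approach, sketched here as an assumption, walks the headings in document order and uses a stack to attach each heading under the nearest preceding heading of a shallower level. The `Bookmark` class is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Bookmark:
    title: str
    level: int  # 1 for h1, 2 for h2, ...; must be >= 1 for real headings
    children: list["Bookmark"] = field(default_factory=list)


def build_outline(headings: list[tuple[int, str]]) -> list[Bookmark]:
    """Nest (level, title) pairs, given in document order, into a bookmark tree."""
    root = Bookmark("", 0)  # sentinel shallower than every real heading
    stack = [root]          # path from the root to the most recent bookmark
    for level, title in headings:
        node = Bookmark(title, level)
        # Climb back up until the top of the stack is a shallower heading.
        while stack[-1].level >= level:
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root.children


outline = build_outline([
    (1, "The magic of Prince"),
    (2, "#1: Hyphenation"),
    (2, "#2: Rounded borders"),
])
```

The stack only ever holds the current path, so skipped levels (an h3 directly under an h1) still nest under the closest shallower heading instead of failing.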

$^1$ A footnote is a note placed at the bottom of a page of a book or manuscript that comments on or cites a reference for a designated part of the text.
$^2$ Often, the most interesting information is found in the footnotes.

Restructure PDF.

Two pillars, one platform

AI Extraction Tools

Batch Processing Utilities

Lean. Fast. Smart.

See the difference


Security & privacy

Temporary processing

Your data is not used for training

No third-party sharing

Simple, transparent pricing

Extract PDFs to Markdown & Word

FAQ

Ready to liberate your PDF data?

