This article explores the benefits and drawbacks of Parseur’s three parsing engines: AI, OCR templates for PDFs and images, and text templates for emails and text documents.

This article assumes you’re already familiar with Parseur and have created an account. If not, you can learn more about Parseur and sign up for our free plan.

Overview of Parseur’s Parsing Engines

Parseur offers three distinct parsing engines:

AI-Based Engine - Works with all document types.
Template-Based OCR Engine - Designed specifically for PDFs and images.
Template-Based Text Engine - Primarily for emails and text documents but compatible with other formats.

To check which engines are available for a document, look in the upper right corner when viewing a document:

✅ Engine is supported.
❌ Engine is not supported.
⏸️ Engine is not enabled in this mailbox. You can activate it in the mailbox settings.

For example, for an email in a mailbox where AI parsing is disabled, only the text parsing engine is available: the OCR engine does not support emails, and the AI engine is not enabled.

Hover over each badge to get more information.

AI-Based Parsing Engine

How it works

To use the AI engine, enable it during mailbox creation or in the mailbox settings. Specify field names that describe the data you want extracted and add any additional instructions; the AI will then attempt to find the most relevant text.

Pros

Supports All Document Types: Works with PDFs, images, emails, HTML, and other text files.
Flexible Layout Parsing: Handles various document layouts and complex tables.
Easy Setup: Just name the fields you want and add instructions if needed.
Smart Analysis: Can perform tasks like summarization and translation using instructions.

Cons

Document Length Limit: Effective up to about 100 pages, though the exact limit varies with language and content density.
Accuracy May Decrease with Many Fields: As more fields are added, parsing accuracy can decrease.
Best Performance in English: Although all languages are supported, accuracy is highest for English documents.
Probabilistic Nature: Results may vary slightly, and debugging is limited.
No Mandatory Field Support: Currently, there is no option to mark a field as mandatory.

Use the AI engine if...

You get accurate results based on your field names and instructions.
Your documents have various layouts.
You need advanced data analysis.
100% accuracy isn’t necessary, or you have a quality-monitoring process.
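
Because the AI engine has no mandatory fields, a quality-monitoring process can include a check on your side that flags results with missing values. The following is a minimal sketch, assuming the parsed result has already been loaded as a Python dictionary. The field names and the `missing_fields` helper are hypothetical and are not part of Parseur.

```python
from typing import Any, Iterable, List, Mapping


def missing_fields(result: Mapping[str, Any], required: Iterable[str]) -> List[str]:
    """Return the required field names that are absent or empty in a parsed result."""
    missing = []
    for name in required:
        value = result.get(name)
        # Treat None, empty strings, and whitespace-only strings as missing.
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(name)
    return missing


# Hypothetical parsed result and field names, for illustration only.
parsed = {"invoice_number": "INV-001", "total": "", "due_date": None}
problems = missing_fields(parsed, ["invoice_number", "total", "due_date"])
if problems:
    print("Needs manual review, missing:", ", ".join(problems))
```

Documents flagged this way can be routed to a manual review step instead of being passed straight to downstream systems.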

Additional Resources

Using the AI Parsing Engine

OCR Template Engine for PDFs and Images

How it works

To use the OCR engine, create templates by drawing boxes over the text you want to extract. Set up one template per document layout, and Parseur will automatically select the best template.

Pros

No Page Limit: Handles documents of any length.
Flexible Field Count: Extract as many fields as needed.
Language Agnostic: Works with any language or alphabet.
Mandatory/Optional Fields: Define fields as mandatory or optional.
Template-Based Parsing: Deterministic results with debugging support.

Cons

One Template Per Layout: Requires separate templates for different layouts.
Limited Document Support: Works only with PDFs and images.
Limited Table Support: Only handles simple tables.

Use the OCR engine if...

You need high accuracy and data quality.
You’re working with a manageable number of PDF layouts.
The tables are simple, or you can post-process complex data.

Text Template Engine for Emails and Text Documents

How it works

Create templates by selecting the text segments you want to extract. Set up one template per document layout, and Parseur will choose the best match.

Pros

No Page Limit: Works with documents of any length.
Unlimited Fields: Extract as many fields as required.
Supports All Languages: Compatible with various languages and alphabets.
Complex Table Extraction: Handles complex tables with some configuration.

Cons

One Template Per Layout: Requires individual templates for each layout.
Mandatory Fields Only: All fields in a text template are mandatory; optional fields require multiple templates.
Email Client Sensitivity: Forwarded emails can vary in layout depending on the email client, so forward them with automated rules to keep the layout consistent.
Complex Table Extraction: May require regex adjustments.
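
To see why the "Mandatory Fields Only" limitation can multiply the number of text templates, note that each template matches one fixed set of fields. Assuming every optional field can be present or absent independently of the others (a worst case; real layouts often need fewer templates), the combinations can be enumerated as in the sketch below. The field names and the `template_field_sets` helper are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def template_field_sets(mandatory: Sequence[str],
                        optional: Sequence[str]) -> List[Tuple[str, ...]]:
    """List the field set of every template needed to cover all combinations
    of present/absent optional fields (worst case: 2 ** len(optional))."""
    field_sets = []
    for size in range(len(optional) + 1):
        for present in combinations(optional, size):
            field_sets.append(tuple(mandatory) + present)
    return field_sets


# Hypothetical field names, for illustration only.
sets = template_field_sets(["order_id", "total"], ["coupon", "gift_note"])
for fields in sets:
    print(fields)
print(len(sets), "templates in the worst case")
```

With two optional fields the worst case is four templates, and each additional independent optional field doubles that number.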

Use the text template engine if...

You need consistent, high-quality data.
You have a reasonable number of email layouts.
You can set up automated forwarding rules for email parsing.

Tips and more information

Recommendation: Use All Engines Together!

Parseur allows you to use all three parsing engines in the same mailbox for maximum flexibility.

How Parseur Prioritizes Engines

Text Template: Parseur first checks for a matching text template.
OCR Template: If no text template is found, it looks for an OCR template.
AI Engine: If no template matches, Parseur uses the AI engine, provided it is enabled in the mailbox.
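
The priority order above can be sketched as a small function. This is an illustration only, not Parseur's implementation: the `Engine` enum, the `pick_engine` function, and its boolean inputs are hypothetical, and returning `None` when nothing applies is an assumption made for the sketch.

```python
from enum import Enum
from typing import Optional


class Engine(Enum):
    TEXT_TEMPLATE = "text template"
    OCR_TEMPLATE = "OCR template"
    AI = "AI"


def pick_engine(text_template_matches: bool,
                ocr_template_matches: bool,
                ai_enabled: bool) -> Optional[Engine]:
    """Return the engine to use, following the priority order described above."""
    if text_template_matches:
        return Engine.TEXT_TEMPLATE
    if ocr_template_matches:
        return Engine.OCR_TEMPLATE
    if ai_enabled:
        return Engine.AI
    # Assumption for this sketch: no engine applies to the document.
    return None


# A PDF with a matching OCR template but no matching text template.
print(pick_engine(False, True, True))
```

In this model the AI engine acts as a fallback: it runs only when no template matches, so adding a template for a frequent layout moves those documents onto the deterministic template engines.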

By understanding the strengths of each parsing engine, you can maximize Parseur’s capabilities for accurate and efficient document processing.

