AI vs template parsing: pros and cons

Understand the benefits and drawbacks of Parseur’s three parsing engines

Updated over a month ago

This article explores the benefits and drawbacks of Parseur’s three parsing engines: AI, OCR for PDFs, and text templates for emails and documents.

This article assumes you’re already familiar with Parseur and have created an account. If not, you can learn more about Parseur and sign up for our free plan.

Overview of Parseur’s Parsing Engines

Parseur offers three distinct parsing engines:

  1. AI-Based Engine - Works with all document types.

  2. Template-Based OCR Engine - Designed specifically for PDFs and images.

  3. Template-Based Text Engine - Primarily for emails and text documents but compatible with other formats.

To check which engines are available for a document, look in the upper right corner when viewing a document:

  • ✅ Engine is supported.

  • ❌ Engine is not supported.

  • ⏸️ Engine is not enabled in this mailbox. You can activate it in the mailbox settings.

For example, when you view an email in a mailbox where AI parsing is disabled, only the text parsing engine is available: the OCR engine does not support emails, and the AI engine is not enabled.

Hover over each badge to get more information.
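
The three badge states can be summarized in a short Python sketch. This is an illustration only, not Parseur's code: the `Badge` enum, the `engine_badge` function, and its two boolean inputs are hypothetical names, and the rule that an unsupported engine shows ❌ even when it is also disabled is an assumption.

```python
from enum import Enum


class Badge(Enum):
    SUPPORTED = "✅"
    NOT_SUPPORTED = "❌"
    NOT_ENABLED = "⏸️"


def engine_badge(supports_document: bool, enabled_in_mailbox: bool) -> Badge:
    """Return the badge for one engine on one document (hypothetical helper)."""
    # Assumption: "not supported" takes precedence over "not enabled".
    if not supports_document:
        return Badge.NOT_SUPPORTED
    if not enabled_in_mailbox:
        return Badge.NOT_ENABLED
    return Badge.SUPPORTED


# An email in a mailbox where the AI engine is switched off:
print(engine_badge(supports_document=True, enabled_in_mailbox=False).value)   # AI engine
print(engine_badge(supports_document=False, enabled_in_mailbox=True).value)   # OCR engine
print(engine_badge(supports_document=True, enabled_in_mailbox=True).value)    # text engine
```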

AI-Based Parsing Engine

How it works

To use the AI engine, enable it during mailbox creation or in the mailbox settings. Specify field names that describe the data you want extracted and add any additional instructions. The AI will then attempt to find the most relevant text for each field.

Pros

  • Supports All Document Types: Works with PDFs, images, emails, HTML, and other text files.

  • Flexible Layout Parsing: Handles various document layouts and complex tables.

  • Easy Setup: Just name the fields you want and add instructions if needed.

  • Smart Analysis: Can perform tasks like summarization and translation using instructions.

Cons

  • Document Length Limit: Effective up to about 100 pages, though the exact limit varies with language and content density.

  • Accuracy May Decrease with Many Fields: As more fields are added, parsing accuracy can decrease.

  • Best Performance in English: Although all languages are supported, accuracy is highest for English documents.

  • Probabilistic Nature: Results may vary slightly, and debugging is limited.

  • No Mandatory Field Support: Currently, there is no option to mark a field as mandatory.

Use the AI engine if...

  • You get accurate results based on your field names and instructions.

  • Your documents have various layouts.

  • You need advanced data analysis.

  • 100% accuracy isn’t necessary, or you have a quality-monitoring process.
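
Because the AI engine has no mandatory fields, a quality-monitoring process could check each parsed result after it leaves Parseur. The Python sketch below shows one minimal way to do that. The function name `missing_fields`, the field names, and the dictionary shape are assumptions made for illustration and do not reflect Parseur's actual export format.

```python
def missing_fields(parsed: dict, required: list[str]) -> list[str]:
    """Return the required field names that are absent or empty in a parsed result."""
    missing = []
    for name in required:
        value = parsed.get(name)
        # Treat a missing key, None, or a blank string as "not extracted".
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(name)
    return missing


# Hypothetical parsed result for an invoice.
result = {"invoice_number": "INV-001", "total": "", "due_date": None}
print(missing_fields(result, ["invoice_number", "total", "due_date"]))
# prints ['total', 'due_date']
```

A result with missing fields could then be routed to manual review instead of being passed on automatically.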


OCR Template Engine for PDFs and Images

How it works

To use the OCR engine, create templates by drawing boxes over the text you want to extract. Set up one template per document layout, and Parseur will automatically select the best template.

Pros

  • No Page Limit: Handles documents of any length.

  • Flexible Field Count: Extract as many fields as needed.

  • Language Agnostic: Works with any language or alphabet.

  • Mandatory/Optional Fields: Define fields as mandatory or optional.

  • Template-Based Parsing: Deterministic results with debugging support.

Cons

  • One Template Per Layout: Requires separate templates for different layouts.

  • Limited Document Support: Limited to PDFs and images.

  • Limited Table Support: Only handles simple tables.

Use the OCR engine if...

  • You need high accuracy and data quality.

  • You’re working with a manageable number of PDF layouts.

  • The tables are simple, or you can post-process complex data.


Text Template Engine for Emails and Text Documents

How it works

Create templates by selecting the text segments you want to extract. Set up one template per document layout, and Parseur will choose the best match.

Pros

  • No Page Limit: Works with documents of any length.

  • Unlimited Fields: Extract as many fields as required.

  • Supports All Languages: Compatible with various languages and alphabets.

  • Complex Table Extraction: Handles complex tables with some configuration.

Cons

  • One Template Per Layout: Requires individual templates for each layout.

  • Mandatory Fields Only: All fields in a text template are mandatory; optional fields require multiple templates.

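  • Email Client Sensitivity: Manually forwarded emails can vary in layout depending on the email client, so forward emails with automated rules to keep layouts consistent.

  • Table Configuration Effort: Extracting complex tables may require regex adjustments.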
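
To see why an optional field needs more than one template, consider this conceptual Python sketch. It uses regular expressions as stand-ins for templates and is not how Parseur implements matching: the `TEMPLATES` list, the template names, and the `parse` function are hypothetical, and trying the most specific template first is an assumption about what "best match" could mean.

```python
import re

# Hypothetical "templates": every named group must be found for a template to match,
# which mirrors the rule that all fields in a text template are mandatory.
TEMPLATES = [
    ("with_phone", re.compile(r"Name: (?P<name>.+)\nPhone: (?P<phone>.+)")),
    ("without_phone", re.compile(r"Name: (?P<name>.+)")),
]


def parse(text: str):
    """Return (template_name, fields) for the first template that matches."""
    for template_name, pattern in TEMPLATES:
        match = pattern.search(text)
        if match:
            return template_name, match.groupdict()
    return None, {}


print(parse("Name: Ada\nPhone: 555-0100"))
print(parse("Name: Ada"))
```

The phone number is effectively optional only because a second template exists that does not require it.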

Use the text template engine if...

  • You need consistent, high-quality data.

  • You have a reasonable number of email layouts.

  • You can set up automated forwarding rules for email parsing.

Tips and more information

Recommendation: Use All Engines Together!

Parseur allows you to use all three parsing engines in the same mailbox for maximum flexibility.

How Parseur Prioritizes Engines

  1. Text Template: Parseur first checks for a matching text template.

  2. OCR Template: If no text template is found, it looks for an OCR template.

  3. AI Engine: If neither a text template nor an OCR template matches, Parseur uses the AI engine.
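
The priority order above amounts to a fallback chain, sketched here in Python. This is an illustration under stated assumptions rather than Parseur's implementation: `parse_with_priority` and the three engine callables are hypothetical, and each engine is assumed to return `None` when it cannot handle the document.

```python
from typing import Callable, Optional

# Assumption: an engine returns a dict of fields, or None when it has no match.
Engine = Callable[[str], Optional[dict]]


def parse_with_priority(document: str, text_template: Engine,
                        ocr_template: Engine, ai_engine: Engine):
    """Try the engines in the documented order and return the first result."""
    ordered = (
        ("text_template", text_template),
        ("ocr_template", ocr_template),
        ("ai", ai_engine),
    )
    for engine_name, engine in ordered:
        result = engine(document)
        if result is not None:
            return engine_name, result
    return None, None


# Stub engines: no template matches, so the AI engine handles the document.
no_match = lambda doc: None
ai_stub = lambda doc: {"summary": doc[:10]}
print(parse_with_priority("Order #123 shipped", no_match, no_match, ai_stub))
# prints ('ai', {'summary': 'Order #123'})
```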

By understanding the strengths of each parsing engine, you can maximize Parseur’s capabilities for accurate and efficient document processing.
