With Parseur’s AI engine, you can extract data from documents using only the field names and instructions defined in your mailbox. There are no templates to set up, and extraction works regardless of the document’s language or complexity.
Key AI Data Extraction Features:
Template-less extraction: No manual template creation or maintenance. The AI engine needs no template setup and extracts data from your documents automatically.
Flexible document layouts: Because it does not rely on templates, the AI engine can extract data from documents with varied or previously unseen layouts.
Multilingual proficiency: Parseur’s AI can understand and extract data from documents in most languages.
Limitations:
Keep the following limitation of the AI engine in mind:
Page count limitation: When extracting data from tables, the AI can handle documents of up to 100 pages. If your document is longer, use the Split PDF feature during upload to divide it into smaller parts.
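If you want to plan how a long PDF should be divided before uploading it, the arithmetic is simple: cut the page sequence into consecutive chunks of at most 100 pages. The sketch below is a hypothetical Python helper, not part of Parseur; it assumes you split purely by page count. Splitting at the boundaries of the individual documents inside a bundle is usually preferable, because a fixed page count can cut a table in half.

```python
def page_ranges(total_pages: int, max_pages: int = 100) -> list[tuple[int, int]]:
    """Hypothetical helper: split pages 1..total_pages into consecutive
    inclusive (start, end) ranges of at most max_pages pages each."""
    if total_pages < 0 or max_pages < 1:
        raise ValueError("total_pages must be >= 0 and max_pages >= 1")
    ranges = []
    start = 1
    while start <= total_pages:
        # The last chunk may be shorter than max_pages.
        end = min(start + max_pages - 1, total_pages)
        ranges.append((start, end))
        start = end + 1
    return ranges


print(page_ranges(250))  # [(1, 100), (101, 200), (201, 250)]
```

A 250-page file therefore needs three parts, the last one containing pages 201 to 250.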
How to parse documents using AI
Getting started with Parseur’s AI parsing feature is quick and intuitive:
Step 1: Select AI-assisted mailbox creation
Choose the AI-assisted mailbox creation mode to have Parseur set up a mailbox for you.
Alternatively, if using the Manual Creation mode, ensure the “Use AI” toggle is enabled after selecting your mailbox type.
For existing mailboxes, you can enable AI in the mailbox settings.
Step 2: Upload a Sample Document
Upload one or more sample documents that are representative of the data you want to extract.
Step 3: Review Suggested Fields
Parseur will analyze your document, suggest fields to extract, and proceed with data extraction. Click on a document name to review the extracted data.
If you need to make changes, click on the Fields tab and continue to Step 4.
Note: You can also access this list in the “Fields” section of your mailbox on the left menu.
Step 4 (optional): Refine Fields for Extraction
Option 1: Edit Existing Fields
Click the edit button next to a field to modify it.
This opens the field editor, where you can update the following attributes:
Field Name: The label for your data when you download or export it.
Output Format: The type of data to extract. Use this to further normalize your data. Refer to the field formats overview for more details.
Instructions: By default, AI uses field names to understand what to extract. If needed, provide more detailed instructions and context here. Think of instructions as a custom prompt describing what you want the AI to extract. Read more about using instructions.
Option 2: Create Fields
If needed, you can add fields. Simple fields extract a single value, whereas table fields extract repeating data, such as the line items of an invoice.
To create a simple field:
Click "New field" to add a field you want to extract.
Enter the field name, format, and instructions as detailed above.
To create a table field:
Click on "New Table", and enter the name and instructions.
Click on the "Add fields to <your field>" button to name the individual fields you want to extract from the table.
Repeat this process for each field in the table (e.g., quantity, description, SKU, price).
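To make the difference between the two field types concrete, here is a hypothetical Python model of the attributes described above. It is not Parseur's API or export format; the class names and the "text"/"number" format values are assumptions for illustration. A simple field carries a name, an output format, and optional instructions, while a table field is a named container whose columns are themselves simple fields.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleField:
    # Hypothetical model: one value per document.
    name: str
    output_format: str = "text"  # assumed format label, for illustration only
    instructions: str = ""


@dataclass
class TableField:
    # Hypothetical model: repeating rows, one value per column in each row.
    name: str
    instructions: str = ""
    columns: list[SimpleField] = field(default_factory=list)


line_items = TableField(
    name="LineItems",
    instructions="One row per product line on the invoice",
    columns=[
        SimpleField("Quantity", "number"),
        SimpleField("Description"),
        SimpleField("SKU"),
        SimpleField("Price", "number"),
    ],
)
print([column.name for column in line_items.columns])
```

Modeling the table as a container of simple fields mirrors the steps above: you name the table first, then add each column.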
Step 5: Reprocess Your Document and Review Results
After updating all the fields you want to extract, click the “Process” button to run the AI extraction again, then open the document to review the results.
Frequently Asked Questions (FAQ)
Q1: Parseur AI didn’t fetch the value I wanted for some fields. How can I improve the AI’s accuracy?
Tip #1: Use More Accurate Instructions
Edit your field and update the instructions to provide more context about the data you want the AI to extract. Note that the AI can only analyze data included in the document; it doesn’t have internet access to retrieve external data. Read more about using instructions.
Tip #2: Remove Unused or Duplicate Fields
Having too many fields can confuse the AI. If Tip #1 doesn’t help, try limiting the extracted fields to only the essential ones.
Tip #3: Consider Using a Template Engine for Some Layouts
AI is a probabilistic model and may not always achieve 100% accuracy. If you need consistently exact results for a specific layout, consider creating a template for it. Read more about the pros and cons of our AI parsing engine vs. template parsing engines.
Q2: Parseur only retrieved one data point from my documents. How do I extract all similar data points?
If data repeats within a page, use “Table fields” instead of single fields:
Go to the “Fields” tab when viewing a document.
Click New Table.
Give it a name the AI understands (e.g., “ContactList” for contact details).
Click Create.
Click Add Field and add each column, naming it the same way as the corresponding single field you used previously.
Delete the single fields to avoid confusing the AI.
Reprocess your documents and review the results.
For documents containing multiple individual documents (e.g., several invoices), use the Split PDF feature.
Q3: Can the AI extract data from long documents?
Yes, the AI engine can extract data from documents of up to 100 pages.
Note that long documents take longer to process.
If you have longer documents, consider these options:
Use the “Split Document” feature to separate a bundled PDF into individual documents.
Alternatively, consider using one of our template engines: the Text engine for emails and text documents, and the OCR engine for PDFs.
Q4: If I have both templates and the AI engine enabled in my mailbox, which one will be used?
Matching templates take priority over the AI engine. If no matching templates are found, Parseur will use the AI engine for data extraction.
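The priority rule can be expressed as a short fallback loop. This is a hypothetical Python sketch of the ordering only; `parse`, `templates`, and `ai_extract` are illustrative names, and it says nothing about how Parseur decides that a template matches. Each template returns extracted data when it matches and `None` otherwise.

```python
from typing import Callable, Optional

# Hypothetical type: a template returns extracted data if it matches, else None.
Template = Callable[[str], Optional[dict]]


def parse(document: str, templates: list[Template],
          ai_extract: Callable[[str], dict]) -> dict:
    # Matching templates take priority: return the first match found.
    for template in templates:
        result = template(document)
        if result is not None:
            return result
    # No template matched, so fall back to the AI engine.
    return ai_extract(document)


acme_template = lambda doc: {"engine": "template"} if "ACME" in doc else None
ai_engine = lambda doc: {"engine": "ai"}
print(parse("ACME invoice #12", [acme_template], ai_engine))  # {'engine': 'template'}
print(parse("Other vendor invoice", [acme_template], ai_engine))  # {'engine': 'ai'}
```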
Q5: How secure is my data when using the AI engine? Is my data shared to improve the AI model?
Parseur uses state-of-the-art AI models from Azure, Google, and AWS to process your data. Your data is processed in the European Union and remains your property; it is not reused or shared to improve AI models.
Q6: What is the difference between AI engine v1 and v2 in the mailbox settings?
AI v1 is our legacy AI engine, introduced in 2023. AI v2, introduced in July 2024, improves extraction accuracy and handles larger documents. We recommend using v2 by default and switching to v1 only if needed.