This article refers to the new version of the template editor for PDFs using OCR. Check out this page if you are looking for the older version for emails and text PDFs.
Note: Documents processed with an OCR template are charged 1 credit per page.
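To estimate usage before sending a batch, the per-page rate above can be turned into a quick calculation. This is only a sketch: the function name `ocr_credits` is hypothetical, and it assumes nothing beyond the stated rate of 1 credit per page.

```python
def ocr_credits(page_counts):
    """Estimate credits for documents processed with an OCR template.

    Assumes only the rate stated in this article: 1 credit per page.
    page_counts is a list with the number of pages of each document.
    """
    return sum(page_counts)


# Three documents of 2, 5 and 1 pages
print(ocr_credits([2, 5, 1]))  # 8
```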
This tutorial assumes that you already created a mailbox and sent your first email. If not, check out this article to get started.
Step 1: Open the template editor
A template needs to be created from one or several sample documents. You can create your first template in several ways: using the wizard, from the document view page or from the document list page.
Open the template editor from the Mailbox Wizard
If you have just created your mailbox, the Wizard will offer to create your first template.
Open the template editor from the document view page
From the document view page, click on the + Create Template button to open the editor.
If the document has already been processed, you can also use the " + " button at the top to open the template editor.
Open the template editor from the document list page
Head over to the Documents section in the left menu. Hover over the document you want to create a template from and click on + New Template.
Open the template editor from the document list page with multiple samples
The new version lets you create templates with multiple document samples so that you can work with optional fields and better test that your template works for all documents.
You can create a new template and include several samples:
Select the documents you want to use as samples by clicking on the checkbox
Then, click on the + New template button
Step 2: Get acquainted with the OCR template editor
When you create your first template, the Template Editor tutorial opens. You can revisit that tutorial by clicking on the "How to use this editor" link at the top right corner of the screen.
The Template Editor is where you will show which data points you want to retrieve from documents.
Let's go through each section of this screen:
Template Name: give your template a name. The name must be unique within a mailbox. We recommend that you always replace the default name with a meaningful one.
Contextual help: gives you some tips on what to do next or error messages, if any.
Sample list: you can attach several document samples to the template editor. This allows you to manage optional fields and check a template works against several documents.
View: leave this on image view for now. Other modes can be useful but are for advanced usage.
Content: shows the content of the current selected PDF sample. You can draw a box over it to tell Parseur which data to extract (see Step 3 below).
Fields tab: lists the fields used or available to use. As you haven't created any field yet, this list is empty.
Metadata tab: lists additional metadata fields you may want to add to your parsed results. See below for more information.
Static tab: allows you to create Static fields, which are fields you can set with custom values. See below for more information.
Settings tab: lists several advanced options like the action to take on matching documents.
Create buttons: you will use those buttons to create fields, labels and table fields. They will become active once you draw a box over the content. Read on for more information.
Step 3: Create your first field
In Parseur, a field represents a piece of information you want to extract.
The animation below shows you how to create your first template.
To create a field:
Draw a box over the text you want to extract. Make the box large enough to cover the largest area the text can occupy in any document. Parseur will only extract the text under the box.
Move or resize the box using the handles as appropriate
The "New Field" button becomes available
Click this button to open the field options section
Name your field and change options as appropriate
Click Save or draw a new field
When you create a field, Parseur will position it in absolute terms on the page by default: that means it will extract the text in all documents in that exact box location on that page. If the field can move horizontally or vertically, you can use labels to position the field dynamically. Check out our article on how to use labels and dynamic OCR for more information.
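To make absolute positioning concrete, here is a minimal sketch of the idea: keep only the words whose bounding boxes fall inside a fixed box on the page. It is an illustration, not Parseur's implementation. The `Word` and `Box` classes, the coordinate units, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR word with its bounding box (hypothetical structure)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


@dataclass
class Box:
    """A field box drawn at a fixed position on the page."""
    x: float
    y: float
    w: float
    h: float

    def contains(self, word: Word) -> bool:
        # True only if the word lies entirely inside the box
        return (
            self.x <= word.x
            and self.y <= word.y
            and word.x + word.w <= self.x + self.w
            and word.y + word.h <= self.y + self.h
        )


def extract(words, box):
    """Join the words inside the box, top to bottom then left to right."""
    inside = [w for w in words if box.contains(w)]
    inside.sort(key=lambda w: (w.y, w.x))
    return " ".join(w.text for w in inside)


words = [
    Word("Invoice", 50, 40, 60, 12),
    Word("INV-001", 400, 40, 55, 12),
    Word("Total", 50, 700, 40, 12),
]
field_box = Box(380, 30, 150, 30)
print(extract(words, field_box))  # INV-001
```

In this sketch, a value that sits lower on the page or runs past the edge of the box in another document is silently dropped, which is why the box should cover the full area the text can occupy.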
Step 4: Create all remaining fields and save
Repeat the steps described above for every field you want to capture.
Tips when creating a template:
As mentioned above, make sure each field covers the full zone where its text can appear, not only the area the text occupies in the current document.
On the right-hand side, some fields and labels appear in bold and some in regular text: fields in bold are required, and those in regular text are optional. You can change that setting by toggling the "Field presence is required" switch when editing a field or label.
As you can see from the screen capture, we created labels on top of the invoice supplier's name and the "Invoice" term. This helps Parseur select the right template when your mailbox contains templates for several suppliers. When searching for the best template, Parseur only considers the templates whose mandatory labels are all present in the document.
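The filtering on mandatory labels can be pictured with a small sketch. It is a simplification under stated assumptions: labels are compared as plain substrings of the document text, whereas the real matching works on the OCR result and label positions. The function `candidate_templates`, the template names, and the sample text are hypothetical.

```python
def candidate_templates(templates, document_text):
    """Keep the templates whose mandatory labels all appear in the text.

    templates maps a template name to its list of mandatory labels.
    Substring matching is an assumption made for this illustration.
    """
    return [
        name
        for name, labels in templates.items()
        if all(label in document_text for label in labels)
    ]


templates = {
    "Acme invoice": ["Acme Corp", "Invoice"],
    "Globex invoice": ["Globex", "Invoice"],
}
text = "Acme Corp\nInvoice INV-001\nTotal 120.00"
print(candidate_templates(templates, text))  # ['Acme invoice']
```

Both templates share the "Invoice" label, so the supplier name label is what tells them apart here.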
Step 5: Add metadata fields (optional)
You may want to extract additional metadata that is not present in the document body, such as a link to the original PDF document.
Head over to the Metadata tab next to the Fields tab.
For more information, check out our Using Metadata Fields article.
Step 6: Save the template
Once finished, click "Create".
You now see that your document has been processed.
Step 7: Check the results
Make sure that all the data were captured correctly.
On this screen you see:
At the top left, metadata info about the document, including the template that was used
At the top right, the action buttons (hover over them for more information)
On the left, the document content
On the right, the extracted data. You can switch between the table view and the JSON view according to your preference
If everything looks correct, congratulations! You have parsed your first document!
Now send more documents and verify that your data is correctly extracted. Create new templates as necessary.
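If you copy the output of the JSON view, a short script can flag fields that came back empty. This is a sketch only: it assumes the parsed result is a flat JSON object keyed by field name, and the field names used here (`InvoiceNumber`, `Total`, `Supplier`, `DueDate`) are hypothetical.

```python
import json


def missing_fields(parsed_json, required):
    """Return the required field names that are absent or empty.

    Assumes a flat JSON object keyed by field name (an assumption,
    check the JSON view of your own document for the actual shape).
    """
    data = json.loads(parsed_json)
    return [name for name in required if data.get(name) in (None, "", [])]


sample = '{"InvoiceNumber": "INV-001", "Total": "", "Supplier": "Acme Corp"}'
print(missing_fields(sample, ["InvoiceNumber", "Total", "DueDate"]))
# ['Total', 'DueDate']
```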
FAQ - Frequently Asked Questions
How can I split multipage PDFs into several documents?
You can set up your mailbox to split PDFs every X pages in the Mailbox Settings:
Open your mailbox
Click on Settings on the left menu
Click on the Processing tab
Check the "Split PDFs into individual documents" box
Enter the number of pages you want Parseur to split your PDFs on
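To see what splitting every X pages means for a given PDF, the resulting page ranges can be sketched as below. The function `split_ranges` is hypothetical, and the handling of leftover pages (the last document simply gets the remainder) is an assumption, not confirmed behavior.

```python
def split_ranges(total_pages, pages_per_document):
    """Return 1-based (first_page, last_page) ranges for each split document.

    Assumes leftover pages form a shorter final document.
    """
    if pages_per_document < 1:
        raise ValueError("pages_per_document must be at least 1")
    return [
        (start, min(start + pages_per_document - 1, total_pages))
        for start in range(1, total_pages + 1, pages_per_document)
    ]


# A 7-page PDF split every 3 pages
print(split_ranges(7, 3))  # [(1, 3), (4, 6), (7, 7)]
```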
How does Parseur prioritize templates?
OCR templates are prioritized using the same rules as other templates. Check out the following article to understand how Parseur picks a template.
I have another question
Please contact us using the chat at the bottom right corner.