This article refers to the new version of the template editor for PDFs using OCR. Check out this page if you are looking for the older version for emails and text PDFs.
This tutorial assumes that you have already created a mailbox and sent your first email. If not, check out this article to get started.
What is OCR?
OCR stands for Optical Character Recognition. It is the method we use to identify and extract text from PDFs.
Step 1: Open the template editor
A template needs to be created from one or several sample documents. You can create your first template in several ways: using the wizard, from the document view page, or from the document list page.
Open the template editor from the Mailbox Wizard
If you have just created your mailbox, the Wizard will offer to create your first template.
Open the template editor from the document view page
From the document view page, click on the + Create Template button to open the editor.
If the document has already been processed, you can also use the " + " button at the top to open the template editor.
Open the template editor from the document list page
Head over to the Documents section in the left menu. Hover over the document you want to create a template from and click on + New Template.
Open the template editor from the document list page with multiple samples
The new version lets you create templates with multiple document samples so that you can work with optional fields and better test whether your template works for all documents.
You can create a new template and include several samples:
Check the boxes next to the documents you want to use as samples.
Then, click on the + New Template button.
Step 2: Familiarize yourself with the OCR template editor
When you create your first template, the Template Editor tutorial opens. You can revisit that tutorial by clicking on the "Help and Tutorial" link at the top right corner of the screen.
The Template Editor is where you will show which data points you want to retrieve from documents.
Let's go through each section of this screen:
Template Name: give your template a name. The name must be unique in a mailbox. We recommend you always update the default name and give a meaningful one to each template.
Contextual help: gives you some tips on what to do next or error messages, if any.
Sample list: you can attach several document samples to the template editor. This allows you to manage optional fields and check that a template works against several documents.
View: leave this on image view for now. Other modes can be useful but are for advanced usage.
Content: shows the content of the currently selected PDF sample. You can draw a box over it to tell Parseur which data to extract (see Step 3 below).
Fields tab: lists the fields used or available to use. As you haven't created any field yet, this list is empty.
Metadata tab: lists additional metadata fields you may want to add to your parsed results. See below for more information.
Static tab: allows you to create Static fields, which are fields you can set with custom values. See below for more information.
Settings tab: lists several advanced options, like the action to take on matching documents.
Create buttons: you will use those buttons to create fields, labels, and table fields. They will become active once you draw a box over the content. Read on for more information.
Step 3: Create your first field
In Parseur, a field represents a piece of information you want to extract.
The animation below shows you how to create your first template.
To create a field:
Draw a box over the text you want to extract. Draw the box over the full size the text can possibly take in any document. Parseur will only extract the text under the box.
Move or resize the box using the handles as appropriate.
The "New Field" button becomes available.
Clicking this button opens the field options section.
Name your field and change options as appropriate.
Click Save or draw a new field.
When you create a field, Parseur will position it in absolute terms on the page by default, which means it will extract the text in all documents in that exact box location on that page. If the field can move horizontally or vertically, you can use labels to position the field dynamically. Check out our article on how to use labels and dynamic OCR for more information.
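To make absolute positioning more concrete, the sketch below shows the general idea of zonal extraction: keep only the OCR words whose bounding boxes fall inside a fixed rectangle on the page. It is an illustration only and not Parseur's actual implementation. The `Box` and `Word` types, the `extract_zone` helper, and all coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page coordinates (origin at top-left)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, other: "Box") -> bool:
        """True when `other` lies entirely inside this box."""
        return (
            self.left <= other.left
            and self.top <= other.top
            and other.right <= self.right
            and other.bottom <= self.bottom
        )


@dataclass(frozen=True)
class Word:
    """One OCR-recognized word and its position on the page."""
    text: str
    box: Box


def extract_zone(words: list[Word], zone: Box) -> str:
    """Join the words fully covered by `zone`, top to bottom, left to right."""
    inside = [w for w in words if zone.contains(w.box)]
    inside.sort(key=lambda w: (w.box.top, w.box.left))
    return " ".join(w.text for w in inside)


# Hypothetical OCR output for two invoices from the same supplier.
sample_a = [
    Word("Total", Box(50, 300, 90, 315)),
    Word("1,250.00", Box(400, 300, 470, 315)),
]
sample_b = [
    Word("Total", Box(50, 300, 90, 315)),
    Word("12,345,678.00", Box(340, 300, 470, 315)),  # longer amount
]

# A box drawn tightly around the amount in sample A...
tight_zone = Box(395, 295, 475, 320)
# ...and a wider box that leaves room for longer amounts.
wide_zone = Box(330, 295, 480, 320)

print(extract_zone(sample_a, tight_zone))  # 1,250.00
print(extract_zone(sample_b, tight_zone))  # (empty: the amount overflows the box)
print(extract_zone(sample_b, wide_zone))   # 12,345,678.00
```

The second call returns an empty string because the longer amount starts to the left of the tight box. This is why the box should cover the full area the text can occupy in any document.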
Step 4: Create all remaining fields and save
Repeat the steps described above for every field you want to capture.
Tips for creating a template:
As mentioned above, make sure each field's box covers the full zone where the text can appear in any document, not only where it appears in the current sample.
On the right-hand side, you see some fields and labels in bold and some in regular text: fields in bold text are required, and those in regular text are optional. You can change that setting by toggling the "Field presence is required" switch when editing a field or label.
As you can see from the screen capture, we created labels on top of the invoice supplier's name and the "Invoice" term. This will help Parseur select the right template in case your mailbox contains templates from several suppliers. When searching for the best template, Parseur will filter for the ones that contain all mandatory labels.
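The filtering rule described in the last tip can be pictured with a short sketch. It is a simplification and not Parseur's matching algorithm: a label is treated as a plain, case-insensitive substring of the OCR text, and label position is ignored. The `Template` type, the `candidate_templates` helper, and the supplier names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Template:
    """A template reduced to its name and its label texts."""
    name: str
    required_labels: list[str] = field(default_factory=list)
    optional_labels: list[str] = field(default_factory=list)


def candidate_templates(document_text: str, templates: list[Template]) -> list[Template]:
    """Keep only the templates whose required labels all appear in the text."""
    text = document_text.lower()
    return [
        t for t in templates
        if all(label.lower() in text for label in t.required_labels)
    ]


# Two hypothetical templates in the same mailbox, one per supplier.
templates = [
    Template("Acme invoice", required_labels=["Acme Corp", "Invoice"]),
    Template(
        "Globex invoice",
        required_labels=["Globex Inc", "Invoice"],
        optional_labels=["Purchase order"],
    ),
]

document_text = "Acme Corp\nInvoice INV-1042\nTotal 1,250.00"
matches = candidate_templates(document_text, templates)
print([t.name for t in matches])  # ['Acme invoice']
```

In this sketch, a template with no required labels would match every document, which illustrates why a mandatory label on the supplier's name helps tell templates apart.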
Step 5: Add metadata fields (optional)
You may want to extract additional metadata that is not present in the document body, such as a link to the original PDF document.
Head over to the Metadata tab next to the Fields tab.
For more information, check out our Using Metadata Fields article.
Step 6: Save the template
Once finished, click "Create". Make sure that the "Process Unprocessed Documents" option is selected in the dropdown.
You can now see that your document has been processed.
Step 7: Check the results
Make sure that all the data is captured correctly.
On this screen, you see:
At the top left, metadata information about the document, including the template that was used
At the top right, the action buttons (hover over them for more information)
On the left, the document content
On the right, the parsed data. You can switch between the table view and the JSON view according to your preference.
If everything looks correct, congratulations! You have parsed your first document!
Now send more documents and verify that your data has been correctly extracted. Create new templates as necessary.
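When you verify many documents, a small script can help flag results with missing values. The sketch below assumes you have copied a result from the JSON view. The field names (`InvoiceNumber`, `Supplier`, `Total`, `PurchaseOrder`) and the `missing_fields` helper are hypothetical, and your own results will contain the field names you chose in the template.

```python
import json

# Hypothetical parsed result, as it might be copied from the JSON view.
parsed_json = """
{
  "InvoiceNumber": "INV-1042",
  "Supplier": "Acme Corp",
  "Total": "1,250.00",
  "PurchaseOrder": ""
}
"""

# Fields you expect to be filled in for every document (an assumption).
REQUIRED_FIELDS = ["InvoiceNumber", "Supplier", "Total"]


def missing_fields(result: dict, required: list[str]) -> list[str]:
    """Return the required field names that are absent, null, or blank."""
    missing = []
    for name in required:
        value = result.get(name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(name)
    return missing


result = json.loads(parsed_json)
print(missing_fields(result, REQUIRED_FIELDS))                      # []
print(missing_fields(result, REQUIRED_FIELDS + ["PurchaseOrder"]))  # ['PurchaseOrder']
```

An empty list means every expected field has a value. A non-empty list points to fields whose boxes or labels may need adjusting in the template.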
FAQ - Frequently Asked Questions
Can I split multipage PDFs into several documents or apply a template to several pages?
Yes, check out our dedicated article on how to split bundled PDFs into separate documents.
How does Parseur prioritize templates?
Templates are prioritized following the usual rules. Check out the following article to understand how Parseur picks a template.
How can I parse checkboxes or radio buttons?
Parseur doesn't support extracting radio buttons or checked boxes at this time. You can upvote the following feature request to help us prioritize this feature: https://feedback.parseur.com/suggestions/358567/handle-checkboxes
I have another question
Please contact us on the chat at the bottom right corner.