This article refers to the new version of the template editor for PDFs using OCR. This tutorial assumes that you already know how to create simple OCR templates.
Understanding field positioning: absolute vs relative to label
When you create a field, Parseur will position it in absolute terms on the page by default: that means it will extract the text in all documents in that exact box location on that page.
This is called Zonal OCR and it works when the field is always at the same place in a document. But sometimes, a field can move up or down, left or right in a document.
This is where relative label positioning, also known as Dynamic OCR, comes in.
Let's take the example below: we want to extract the subtotal value under the table. However, the number of items in the table can vary from one document to the next. So the position of the subtotal value can move vertically across documents.
Subtotal value will move vertically on the page depending on the number of items in the table above
However, that value always sits at the same place to the right of the "Subtotal" text. So we'll create a Label over the "Subtotal" text. Then we'll tell Parseur that the Subtotal field will be relative to this label.
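To make the difference between the two modes concrete, here is a minimal Python sketch. It only illustrates the idea and is not Parseur's implementation: the `Box` and `RelativeField` classes and every coordinate are hypothetical. The field stores its offset from the label instead of a fixed page position, so it follows the label wherever the label is found.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A rectangle on the page (hypothetical units, origin at top-left)."""
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


@dataclass(frozen=True)
class RelativeField:
    """A field stored as an offset from a label's top-left corner."""
    dx: float
    dy: float
    width: float
    height: float

    def resolve(self, label: Box) -> Box:
        # Shift the field by the same amount the label has moved.
        return Box(label.x + self.dx, label.y + self.dy, self.width, self.height)


# Boxes drawn on the template document (assumed values).
template_label = Box(400, 500, 60, 12)   # the "Subtotal" text
template_field = Box(480, 500, 80, 12)   # the subtotal value to its right

relative = RelativeField(
    dx=template_field.x - template_label.x,
    dy=template_field.y - template_label.y,
    width=template_field.width,
    height=template_field.height,
)

# A new document has more table rows, so the label is found lower on the page.
label_in_new_doc = Box(400, 620, 60, 12)
resolved = relative.resolve(label_in_new_doc)
print(resolved)
```

With absolute positioning, the extraction box would stay at y=500 and miss the value. With the relative field, the box moves down with the label.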
How to create a dynamically positioned field with Labels?
Creating a field positioned relative to a label is quite straightforward:
1. Draw a box over the text that will serve as the anchor for the field (in our example, "Subtotal")
2. Click New Label
3. Wait for Parseur to identify the content of the label
4. Draw a box over the text you want to extract
5. Click New Field and enter the field name and other options like any normal field
6. Under Field position > Start relative to, select the label you just created from the dropdown
If your field has a fixed height and width, you only need to use the "Start relative to" option.
How to create fields with a dynamic height or width?
If your field has a variable height (typically tables with varying numbers of rows or comments with varying numbers of lines):
Perform steps 1 to 6 above
Create a second label below the field
Edit the field
Under Field position > End relative to, select that second label
This will tell Parseur to stop the field relative to that second label.
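The effect of combining a start label and an end label can be sketched as follows. This is an illustrative model only, not how Parseur computes boxes internally: the `field_span` function, the offsets, and the coordinates are assumptions made for the example.

```python
def field_span(start_label_y: float, end_label_y: float,
               start_offset: float = 0.0, end_offset: float = 0.0):
    """Return (top, height) of a field stretched between two labels.

    start_offset and end_offset are the distances, recorded on the template,
    between each label and the matching edge of the field.
    """
    top = start_label_y + start_offset
    bottom = end_label_y + end_offset
    if bottom <= top:
        raise ValueError("end label must be below the start of the field")
    return top, bottom - top


# Table header found at y=200 in both documents; the field starts 20 below it.
# The second label ("Subtotal") sits at a different height in each document,
# and the field ends 5 above it.
short_doc = field_span(200, 320, start_offset=20, end_offset=-5)
long_doc = field_span(200, 440, start_offset=20, end_offset=-5)
print(short_doc, long_doc)
```

The top of the field is the same in both documents, while its height grows with the distance between the two labels.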
I am getting a "No text found in the box" error. What can I do?
This typically happens if you try to create a label over an image (for example, a company logo or a screen capture embedded in the document). There are two ways to fix this:
Option 1: Find another label
Try to find another piece of text that can accurately position the field. This is the recommended option.
Option 2: Force OCR on images
If option 1 is not possible, you can force Parseur to detect text in images.
To do so:
Open your Mailbox Settings
Click on the Processing tab
Under Advanced Settings, check the "Force use of OCR on PDFs" option
Click Save
Re-upload your Documents
How are labels identified in a document?
Labels are identified using two data points:
The text content, for example "Subtotal" in our previous example
An occurrence number. In case the text content is found several times in the document, Parseur will use the occurrence number to select the right label.
How to constrain a template to a certain number of occurrences of a label?
In some cases, you may want Parseur to make sure that the total number of occurrences also matches. To do so:
Edit your label
Click on the Lock icon
Save the label and template
With this option, Parseur will not match a document to that template if the total number of occurrences in a document is different.
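The two data points that identify a label, together with the lock option, can be pictured with a short Python sketch. It is a simplified model and not Parseur's actual matching logic: the `find_label` function, its parameters, and the word list are hypothetical.

```python
from typing import List, Optional


def find_label(words: List[str], text: str, occurrence: int,
               locked_total: Optional[int] = None) -> Optional[int]:
    """Return the index of the n-th occurrence (1-based) of `text` in `words`.

    `words` is the document text in reading order, top to bottom.
    If `locked_total` is set, the document must contain exactly that many
    occurrences, otherwise the template is treated as not matching (None).
    """
    positions = [i for i, word in enumerate(words) if word == text]
    if locked_total is not None and len(positions) != locked_total:
        return None  # occurrence count differs: template does not match
    if not 1 <= occurrence <= len(positions):
        return None  # the requested occurrence does not exist
    return positions[occurrence - 1]


words = ["Invoice", "Total", "Item", "Total", "Thanks"]

second_total = find_label(words, "Total", occurrence=2)
locked_ok = find_label(words, "Total", occurrence=2, locked_total=2)
locked_mismatch = find_label(words, "Total", occurrence=2, locked_total=3)
print(second_total, locked_ok, locked_mismatch)
```

Without the lock, only the requested occurrence needs to exist. With the lock, a document containing a different number of occurrences is rejected even if the requested occurrence is present.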
How to pick the last label of a document?
Label occurrence is counted from the top of the document by default. Sometimes, however, you want to tell Parseur that the label should be located from the bottom of the document instead.
For example, you may want to always take the last occurrence of "Total" in a document, even if the total number of occurrences varies from one document to the next.
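As a rough illustration of why the counting direction matters, the sketch below picks an occurrence from either end of the page. It is a hypothetical model, not Parseur's code: `pick_occurrence` and the y-coordinates are invented for the example, with larger y values meaning lower on the page.

```python
from typing import List, Optional


def pick_occurrence(label_ys: List[float], occurrence: int,
                    from_bottom: bool = False) -> Optional[float]:
    """Return the y-position of the n-th occurrence (1-based) of a label.

    Counting starts at the top of the page by default, or at the bottom
    when from_bottom is True.
    """
    ordered = sorted(label_ys, reverse=from_bottom)
    if not 1 <= occurrence <= len(ordered):
        return None
    return ordered[occurrence - 1]


# "Total" appears twice in one document and three times in another.
doc_a = [120.0, 480.0]
doc_b = [120.0, 300.0, 610.0]

last_a = pick_occurrence(doc_a, 1, from_bottom=True)
last_b = pick_occurrence(doc_b, 1, from_bottom=True)
first_b = pick_occurrence(doc_b, 1)
print(last_a, last_b, first_b)
```

Counting from the top, the last "Total" would be occurrence 2 in the first document and occurrence 3 in the second. Counting from the bottom, it is occurrence 1 in both.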
You can change the direction in which occurrences are counted on the label edit screen.