This article refers to the new version of the template editor for PDFs using OCR. This tutorial assumes that you already know how to create simple OCR templates.
Understanding field positioning: absolute vs relative to label
When you create a field, Parseur positions it in absolute terms on the page by default: it extracts the text found in that exact box, on that page, in every document.
This is called Zonal OCR, and it works when the field is always in the same place in a document. But sometimes, a field can move up or down, left or right, in a document.
This is when you can use relative label positioning, also known as Dynamic OCR.
Let's take the example below: we want to extract the subtotal value under the table. However, the number of items in the table can vary from one document to the next. So the position of the subtotal value can move vertically across documents.
The subtotal value will move vertically on the page depending on the number of items in the table above
However, that value always appears in the same place relative to the Subtotal text: immediately to its right. So we'll create a Label over the Subtotal text. Then we'll tell Parseur that the Subtotal field is positioned relative to this label.
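To make the idea concrete, here is a minimal Python sketch of relative positioning. It is not Parseur's implementation: the Box type, the function names and the coordinates are all hypothetical, and it only illustrates the principle of storing the field as an offset from the label rather than as a fixed page position.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Box:
    """A rectangle on the page (hypothetical units, origin at top-left)."""
    x: float
    y: float
    width: float
    height: float


def relative_offset(label: Box, field: Box) -> Tuple[float, float]:
    """Offset of the field's top-left corner from the label's top-left corner."""
    return (field.x - label.x, field.y - label.y)


def resolve_field(label_in_doc: Box, offset: Tuple[float, float],
                  width: float, height: float) -> Box:
    """Rebuild the field box from wherever the label was found in a document."""
    dx, dy = offset
    return Box(label_in_doc.x + dx, label_in_doc.y + dy, width, height)


# Template document: the "Subtotal" label, with the value box to its right.
template_label = Box(x=350, y=400, width=60, height=12)
template_field = Box(x=450, y=400, width=80, height=12)
offset = relative_offset(template_label, template_field)

# Another document with more table rows: the label sits lower on the page.
new_label = Box(x=350, y=520, width=60, height=12)
field_box = resolve_field(new_label, offset,
                          template_field.width, template_field.height)
print(field_box)
```

In this sketch the field follows the label down the page, whereas an absolutely positioned field would keep reading the box at y=400.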
How to create a dynamically positioned field with Labels?
Creating a field positioned relative to a label is quite straightforward:
1. Draw a box over the text you want to use as the anchor for the field (in our example, "Subtotal")
2. Click New Label
3. Wait for Parseur to identify the content of the label
4. Draw a box over the text you want to extract
5. Click New Field and enter the field name and other options like any normal field
6. Under Field position > Start relative to, select the label you just created from the dropdown
If your field has a fixed height and width, you only need to use the "Start relative to" option.
How to create fields with a dynamic height or width?
If your field has a variable height (typically, tables with a varying number of rows or comments with a varying number of lines), you can also anchor the end of the field to a second label:
Perform steps 1 to 6 above
Create a second label below the field
Edit the field
Under Field position > End relative to, select that second label
This will tell Parseur to stop the field relative to that second label.
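As an illustration only, the sketch below shows how a field box could stretch between two labels. The Box type, the gap parameters and the coordinates are hypothetical, and the exact geometry Parseur applies is an assumption here: the top edge is tied to the first label and the bottom edge to the second.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """A rectangle given by its edges (hypothetical units, origin at top-left)."""
    left: float
    top: float
    right: float
    bottom: float


def resolve_variable_height(start_label: Box, end_label: Box,
                            top_gap: float, bottom_gap: float,
                            left: float, right: float) -> Box:
    """Field starts just below the start label and stops just above the end label."""
    top = start_label.bottom + top_gap
    bottom = end_label.top - bottom_gap
    if bottom <= top:
        raise ValueError("end label must sit below the start label")
    return Box(left, top, right, bottom)


# Start label: a table header. End label: the "Subtotal" text below the table.
header = Box(left=40, top=200, right=560, bottom=212)
subtotal_short = Box(left=350, top=400, right=410, bottom=412)  # few rows
subtotal_long = Box(left=350, top=520, right=410, bottom=532)   # many rows

box_a = resolve_variable_height(header, subtotal_short, 4, 6, 40, 560)
box_b = resolve_variable_height(header, subtotal_long, 4, 6, 40, 560)
print(box_a.bottom - box_a.top, box_b.bottom - box_b.top)
```

Both boxes start at the same place, but the second one is taller because its end label was found lower on the page.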
The label content recognized by Parseur is not the one I see on the document. What can I do?
When you create a label over a piece of text and the text recognized by Parseur isn't the one you see on the document, it usually means that your PDF was scanned or encoded with a bad OCR program.
When that happens, you can ask Parseur to redo the OCR by enabling the Force OCR option in your mailbox Settings > Processing. You will then need to delete and re-upload the document for the OCR to take place.
I am getting a "No text found in the box" error. What can I do?
This can typically happen if you try to create a label over an image (for example, a company logo or screen capture embedded in the document). There are two ways to fix this:
Option 1: Find another label
Try to find another piece of text that can accurately position the field. This is the recommended option.
Option 2: Force OCR on images
If option 1 is not possible, you can force Parseur to detect text in images.
To do so:
Open your Mailbox Settings
Click on the Processing tab
Under Advanced Settings, check the "Force use of OCR on PDFs" box
Re-upload your documents
How are labels identified in a document?
Labels are identified using two data points:
The text content (for instance, "Subtotal" in our previous example)
An occurrence number. If the text content is found several times in the document, Parseur will use the occurrence number to select the right label.
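The two data points can be pictured with a small Python sketch. It is a simplification under stated assumptions: the document is modeled as a list of text lines in reading order, matching is a plain case-sensitive substring test, and occurrences are numbered from 1. The find_label function is hypothetical and is not part of any Parseur API.

```python
from typing import List, Optional


def find_label(lines: List[str], text: str, occurrence: int = 1) -> Optional[int]:
    """Return the index of the nth line containing `text`, or None if absent."""
    matches = [i for i, line in enumerate(lines) if text in line]
    if 1 <= occurrence <= len(matches):
        return matches[occurrence - 1]
    return None


lines = [
    "Invoice #1042",
    "Total due: 120.00",
    "Widget A 100.00",
    "Subtotal 100.00",
    "Total 120.00",
]

first_total = find_label(lines, "Total", occurrence=1)
second_total = find_label(lines, "Total", occurrence=2)
missing = find_label(lines, "Total", occurrence=3)
print(first_total, second_total, missing)
```

Here "Total" appears twice, so the occurrence number is what tells the two candidates apart.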
How do I constrain a template to a certain number of occurrences of a label?
In some cases, you may want Parseur to make sure that the total number of occurrences also matches. To do so:
Edit your label
Click on the Lock icon
Save the label and template
With this option, Parseur will not match a document to that template if the total number of occurrences in a document is different.
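The effect of the lock can be sketched as a simple check on the total count. This is a hypothetical illustration, not Parseur's matching logic: documents are modeled as lists of text lines, and locked_total stands for the number of occurrences recorded when the label was locked.

```python
from typing import List, Optional


def count_occurrences(lines: List[str], text: str) -> int:
    """Number of lines containing the label text."""
    return sum(1 for line in lines if text in line)


def template_accepts(lines: List[str], text: str,
                     locked_total: Optional[int]) -> bool:
    """With no lock, any count is fine; with a lock, the count must be equal."""
    if locked_total is None:
        return True
    return count_occurrences(lines, text) == locked_total


two_totals = ["Total due: 120.00", "Widget A 100.00", "Total 120.00"]
three_totals = ["Total due: 150.00", "Total page 1 80.00", "Total 150.00"]

print(template_accepts(two_totals, "Total", locked_total=2))
print(template_accepts(three_totals, "Total", locked_total=2))
print(template_accepts(three_totals, "Total", locked_total=None))
```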
How do I pick the last label of a document?
By default, the document's top is the starting point for calculating label occurrence. Sometimes, however, you want to tell Parseur that the label should be located starting from the bottom of the document instead.
For example, you want to always take the last occurrence of "Total" in a document, even though the total number of occurrences varies from one document to the next.
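A short Python sketch shows why counting from the bottom helps in this case. The from_bottom flag and the find_label function are hypothetical names, and the document is again modeled as a list of text lines in reading order.

```python
from typing import List, Optional


def find_label(lines: List[str], text: str, occurrence: int = 1,
               from_bottom: bool = False) -> Optional[int]:
    """Return the index of the nth matching line, counted from the top or bottom."""
    matches = [i for i, line in enumerate(lines) if text in line]
    if from_bottom:
        matches.reverse()
    if 1 <= occurrence <= len(matches):
        return matches[occurrence - 1]
    return None


short_doc = ["Item A 50.00", "Total 50.00"]
long_doc = [
    "Item A 30.00",
    "Total page 1 30.00",
    "Item B 20.00",
    "Total page 2 20.00",
    "Total 50.00",
]

# Occurrence 1 from the bottom is the last "Total", however many there are.
last_in_short = find_label(short_doc, "Total", from_bottom=True)
last_in_long = find_label(long_doc, "Total", from_bottom=True)
first_in_long = find_label(long_doc, "Total")
print(last_in_short, last_in_long, first_in_long)
```

Counting from the top would need a different occurrence number for each document, while the first occurrence from the bottom stays valid for both.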
You can change the direction in which occurrences are counted on the label edit screen.