This article refers to the new version of the template editor for PDFs using OCR. Check out this page if you are looking for the older version for emails and text PDFs.

Notes:

  • Documents processed with an OCR template are charged 1 credit per page in the document

  • This feature is currently in beta. Contact us on the chat if you'd like to try it out.

This tutorial assumes that you have already created a mailbox and sent your first email. If not, check out this article to get started.

Step 1: Open the template editor

A template needs to be created from one or several sample documents. You can create your first template in several ways: using the wizard, from the document view page or from the document list page.

Open the template editor from the Mailbox Wizard

If you have just created your mailbox, the Wizard will offer to create your first template.

Click on Create Template from the Mailbox Wizard

Open the template editor from the document view page

From the document view page, click on the + Create Template button to open the editor.

If the document has already been processed, you can also use the " + " button at the top to open the template editor.

Click on Create Template from the document view page

Open the template editor from the document list page

Head over to the Documents section on the left menu. Hover over the document you want to create a template from and click on + New Template.

Click on New Template from the document list page

Open the template editor from the document list page with multiple samples

The new version lets you create templates with multiple document samples, so that you can work with optional fields and better test that your template works for all documents.

You can create a new template and include several samples:

  1. Select the documents you want to use as samples by clicking on the checkbox

  2. Then, click on the + New template button

Step 2: Get acquainted with the OCR template editor

When you create your first template, the Template Editor tutorial opens. You can revisit that tutorial at any time by clicking on the "How to use this editor" link at the top right corner of the screen.

The Template Editor is where you will show which data points you want to retrieve from documents.

Walk through the template editor screen

Let's go through each section of this screen:

  1. Template Name: give your template a name. The name must be unique within a mailbox. We recommend always replacing the default name with a meaningful one.

  2. Contextual help: gives you some tips on what to do next or error messages, if any.

  3. Sample list: you can attach several document samples to the template editor. This allows you to manage optional fields and check that a template works against several documents.

  4. View: leave this on Image view for now. Other modes can be useful but are intended for advanced usage.

  5. Content: shows the content of the currently selected PDF sample. You can draw a box over it to tell Parseur which data to extract (see Step 3 below).

  6. Fields tab: lists the fields used or available to use. As you haven't created any field yet, this list is empty.

  7. Metadata tab: lists additional metadata fields you may want to add to your parsed results. See below for more information.

  8. Static tab: allows you to create Static fields, which are fields you can set with custom values. See below for more information.

  9. Settings tab: lists several advanced options like the action to take on matching documents.

  10. Create buttons: you will use these buttons to create fields, labels and table fields. They become active once you draw a box over the content. Read on for more information.

Step 3: Create your first field

In Parseur, a field represents a piece of information you want to extract.

The animation below shows you how to create your first template.

To create a field:

  1. Draw a box over the text you want to extract. Make sure the box covers the full area the text could occupy in any document. Parseur will only extract the text under the box.

  2. Move or resize the box using the handles as appropriate

  3. The "New Field" button becomes available

  4. Click this button to open the field options section

  5. Name your field and change options as appropriate

  6. Click Save or draw a new field
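To see why the box should cover the full area the text can occupy, here is a minimal Python sketch of box-based extraction. It is an illustration only, not Parseur's implementation: the `Box`, `Word` and `extract_field` names, the coordinates, and the rule that a word must sit entirely inside the box are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, other: "Box") -> bool:
        # Assumption: a word counts only if it lies entirely inside the box.
        return (self.left <= other.left and self.top <= other.top
                and self.right >= other.right and self.bottom >= other.bottom)


@dataclass(frozen=True)
class Word:
    text: str
    box: Box


def extract_field(words: List[Word], field_box: Box) -> str:
    """Join the words found under the field box, in reading order."""
    inside = [w for w in words if field_box.contains(w.box)]
    inside.sort(key=lambda w: (w.box.top, w.box.left))
    return " ".join(w.text for w in inside)


# Hypothetical OCR output for one page.
words = [
    Word("Invoice", Box(50, 40, 110, 55)),
    Word("INV-001", Box(120, 40, 180, 55)),
    Word("Total", Box(50, 300, 90, 315)),
]

narrow_box = Box(115, 35, 150, 60)  # too small: cuts the invoice number
wide_box = Box(115, 35, 260, 60)    # leaves room for longer values

print(repr(extract_field(words, narrow_box)))
print(repr(extract_field(words, wide_box)))
```

Under these assumptions the narrow box captures nothing because the invoice number extends past its right edge, while the wider box captures it and leaves room for longer values in other documents.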

To create a table field:

  1. Draw a box over the table you want to extract.

  2. Move or resize the box using the handles as appropriate

  3. Click on the "New Table Field" button

  4. The preview becomes available

  5. Click in the table at the position where you want to split columns.

  6. The preview updates with the new columns

  7. Name the columns by clicking on their names in the table preview or in the right menu

  8. If the table has a variable number of rows, make sure to create a label to identify the end of the table and assign it to the "End relative to" field (see below for more information about the use of labels)

  9. If the table can span multiple pages and there are headers and footers that get in the way, move the red header and footer rulers vertically to tell Parseur to avoid the header and footer margins respectively
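The column splits from step 5 can be pictured as vertical cut positions: each word in a row falls into the column whose boundaries surround it. The Python sketch below illustrates that idea under stated assumptions; `split_row`, the `(x, text)` word format and the coordinates are hypothetical and do not describe how Parseur works internally.

```python
from bisect import bisect_right
from typing import List, Tuple


def split_row(words: List[Tuple[int, str]], column_splits: List[int]) -> List[str]:
    """Assign each word of a row to a column.

    words: (x, text) pairs, where x is the left edge of the word.
    column_splits: sorted x positions where the user clicked to split columns.
    """
    cells: List[List[str]] = [[] for _ in range(len(column_splits) + 1)]
    for x, text in sorted(words):
        # bisect_right counts how many split positions lie at or before x,
        # which is the index of the column the word belongs to.
        cells[bisect_right(column_splits, x)].append(text)
    return [" ".join(cell) for cell in cells]


row = [(40, "Blue"), (70, "widget"), (220, "2"), (300, "9.50")]
splits = [200, 280]  # two clicks give three columns

print(split_row(row, splits))  # ['Blue widget', '2', '9.50']
```

Adding another split position adds one more column, which is why the preview updates after each click.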

Step 4: Understanding field positioning: absolute vs relative to label

When you first create a field, Parseur will position it in absolute terms on the page by default: that means it will extract the text in all documents in that exact box location on that page.

Absolute positioning works when the field is always at the same place in a document. But sometimes a field can move up or down, left or right in a document. This is when you can use the relative label positioning.

Let's take the example below: we want to extract the subtotal value below the table. However, the number of items in the table can vary from one document to the next, so the position of the subtotal value can move vertically across documents.

The subtotal value moves vertically on the page depending on the number of items in the table above

However, that value is always at the same place to the right of the Subtotal text. So we'll create a Label over the Subtotal text, then tell Parseur that the Subtotal field is relative to this label.

If your field has a fixed height and width, you only need to use the "Start relative to" option. If your field has a variable height (typically tables with varying number of rows), use "Start relative to" in combination with "End relative to".
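The difference between the two options can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Parseur's actual algorithm: `Box`, `place_field` and all coordinates are hypothetical, and the sketch assumes that a field with an end label stops at the top edge of that label.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    left: int
    top: int
    right: int
    bottom: int


def place_field(sample_field: Box, sample_label: Box, new_label: Box,
                new_end_label: Optional[Box] = None) -> Box:
    """Position a field in a new document relative to where its label was found."""
    # Offset between the label and the field, measured on the sample document.
    dx = sample_field.left - sample_label.left
    dy = sample_field.top - sample_label.top
    width = sample_field.right - sample_field.left
    height = sample_field.bottom - sample_field.top

    left = new_label.left + dx
    top = new_label.top + dy
    # "Start relative to" only: keep the fixed height from the sample.
    # With "End relative to": stretch down to the end label (assumption: its top edge).
    bottom = top + height if new_end_label is None else new_end_label.top
    return Box(left, top, left + width, bottom)


# Sample document: "Subtotal" label and the value drawn to its right.
subtotal_label = Box(300, 400, 360, 415)
subtotal_field = Box(380, 398, 460, 417)

# New document with more table rows: the label was found 120 units lower.
moved_label = Box(300, 520, 360, 535)
print(place_field(subtotal_field, subtotal_label, moved_label))

# Table field: starts relative to the header label, ends at the Subtotal label.
header_label = Box(50, 180, 460, 195)
table_field = Box(50, 200, 460, 390)
print(place_field(table_field, header_label, header_label, moved_label))
```

In the first call the field keeps its size and simply follows the label down the page. In the second call the table box grows to reach the end label, which is what makes a variable number of rows possible.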

Table fields have a variable height, so you need to help Parseur understand where to start and, especially, where to stop. For this, you can use two relative labels: one for the start of the table and one for the end.

In the example below, the table field is relative to the Table Header label for the start and the Subtotal Label for the end:

Step 5: Create all remaining fields and save

Repeat the steps described above for every field you want to capture.

Tips when creating a template:

  1. As mentioned above, make sure each field covers the full zone where its text can appear, not only the zone where the text appears in the current document

  2. On the right-hand side, some fields and labels appear in bold and some in regular text: fields in bold are required, and the ones in regular text are optional. You can change that setting by toggling the "Field presence is required" switch when editing a field or label

  3. As you can see from the screen capture, we created labels on top of the invoice supplier name and the "Invoice" term. This will help Parseur select the right template in case your mailbox contains templates from several suppliers. When searching for the best template, Parseur will filter on the ones that contain all mandatory labels.
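The filtering described in the last tip can be pictured with the short Python sketch below. It is a simplified illustration, not Parseur's matching logic: `candidate_templates` is a hypothetical helper, the template names and label texts are made up, and matching a label by plain substring search is an assumption.

```python
from typing import Dict, List


def candidate_templates(templates: Dict[str, List[str]], document_text: str) -> List[str]:
    """Keep only the templates whose mandatory labels all appear in the document."""
    return [
        name
        for name, mandatory_labels in templates.items()
        if all(label in document_text for label in mandatory_labels)
    ]


# Hypothetical mailbox with one template per supplier.
templates = {
    "Acme invoice": ["Acme Corp", "Invoice"],
    "Globex invoice": ["Globex", "Invoice"],
}

document_text = "Globex Invoice 2024-017 Subtotal 120.00"

print(candidate_templates(templates, document_text))  # ['Globex invoice']
```

Without the supplier-name label, both templates would pass the filter because both documents contain the word "Invoice". How Parseur ranks the templates that remain is not covered by this sketch.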

Step 6: Add metadata fields (optional)

You may want to extract additional metadata that is not present in the document body, such as a link to the original PDF document.

Head over to the Metadata tab next to the Fields tab.

For more information check out our Using Metadata Fields article.

Step 7: Save the template

Once finished, click "Create".

You will now see that your document has been processed.

Step 8: Check results

Make sure that all the data was captured correctly.

In this screen you see:

  • At the top left, metadata info about the document, including the template that was used

  • At the top right, the action buttons (hover them for more information)

  • On the left, the document content

  • On the right, the parsed data extracted. You can switch between the table view and JSON view according to your preference

If everything looks correct, congratulations! You have parsed your first document!

Now send more documents and verify that your data is correctly extracted. Create new templates as necessary.

FAQ - Frequently Asked Questions

How can I split a multipage PDF into several documents?

You can set up your mailbox to split PDFs every X pages in the Mailbox Settings:

  • Open your mailbox

  • Click on Settings on the left menu

  • Click on the Processing tab

  • Check the "Split PDFs into individual documents" box

  • Enter the number of pages you want Parseur to split your PDFs on

  • Click Save
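To get a feel for what "split every X pages" means, the Python sketch below computes the page ranges such a split would produce. It is only an illustration: `split_page_ranges` is a hypothetical helper, and the assumption that leftover pages form a shorter final document has not been checked against Parseur's behaviour.

```python
from typing import List, Tuple


def split_page_ranges(total_pages: int, pages_per_document: int) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (first_page, last_page) ranges for each resulting document."""
    if pages_per_document < 1:
        raise ValueError("pages_per_document must be at least 1")
    return [
        # Assumption: leftover pages at the end form a shorter last document.
        (start, min(start + pages_per_document - 1, total_pages))
        for start in range(1, total_pages + 1, pages_per_document)
    ]


print(split_page_ranges(7, 3))  # [(1, 3), (4, 6), (7, 7)]
```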

How does Parseur prioritise templates?

Templates are prioritised following the usual rules. Check out the following article to understand how Parseur picks a template.

I have another question

Please contact us on the chat. Our OCR template is still in beta, and we'd love to get your feedback to improve this feature.

