After reading this article, you will have a better understanding of how Parseur's text parsing engine works and how to create reliable parsing templates.
Note: This article applies to text templates. Check out our OCR template article if you want to know more about using our OCR parsing engine.
A simple example
As a pet owner forum manager, you receive an email each time a user subscribes to your newsletter. In our example, emails are automatically generated from a form on your WordPress blog and the first email looks like this:
"Hi, my name is Anna, and I have 3 dogs."
That email is sent directly to your newly created Parseur mailbox, where it is processed automatically. This is just an example, and the way your emails arrive in the mailbox may vary.
Your initial template
You create a template based on the first email you sent:
Let's create some fields and capture some data:
So you see the name of the member (Anna) and her number of pets (3) have been recorded.
But how did that happen?
Understanding what happens when you create a template: introducing delimiters
Well, when you created the template, Parseur recorded your selection by saving the surrounding delimiters. After all, one can't rely on the field content itself since it's going to be different from one email to the next, right?
So here's what Parseur recorded:
So here, "is" and ", and" are the starting and ending delimiters for the field "Name", and "have" and "dogs" play the same role for the field "Number of pets".
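To make the idea of delimiters more concrete, here is a minimal Python sketch. It is not Parseur's actual implementation: the `TEMPLATE` structure and the `compile_template` and `parse` helpers are hypothetical names made up for this illustration, and the sketch assumes that all the text around the fields is treated as fixed delimiter text.

```python
import re

# Hypothetical template: fixed delimiter text alternates with named fields.
TEMPLATE = [
    ("text", "Hi, my name is "),
    ("field", "name"),
    ("text", ", and I have "),
    ("field", "number_of_pets"),
    ("text", " dogs."),
]


def compile_template(template):
    """Turn the template into a regular expression (illustrative only)."""
    parts = []
    for kind, value in template:
        if kind == "text":
            # Delimiters must appear literally in the email.
            parts.append(re.escape(value))
        else:
            # A field captures whatever sits between two delimiters.
            parts.append(f"(?P<{value}>.+?)")
    return re.compile("^" + "".join(parts) + "$", re.DOTALL)


def parse(template, email):
    """Return the captured fields, or None if a delimiter is missing."""
    match = compile_template(template).match(email)
    return match.groupdict() if match else None


print(parse(TEMPLATE, "Hi, my name is Anna, and I have 3 dogs."))
```

With the first email, the sketch captures "Anna" as the name and "3" as the number of pets, because every delimiter is found in the expected order.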
Sending a second email, let the Parseuring begin!
Now, let's see what happens when we receive another, similar, email that reads:
"Hi, my name is Bob, and I have 12 dogs."
So far so good: this email is processed automatically since all delimiters were found, and the data is extracted correctly, like this:
Sending a third email: the cold reality of data extraction
But, as is often the case, reality is a harsh mistress, and we receive the following email as Charlotte joins the forum:
"Hi, my name is Charlotte, and I have 2 cats."
Now we have a little problem: the "dogs" delimiter can't be found in this email, so the parsing fails and Parseur asks you to create a new template.
Here is what it looks like in Parseur:
Troubleshooting the issue using the Template debugger
To get a better view of what happened, you can launch the template debugger, like this:
The template debugger helps you see which part of the template is causing a mismatch. Here is the template debugger applied to our template as it tries to match our latest email:
A template needs to match 100% for Parseur to be able to use it.
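As a rough illustration of what "matching 100%" means, the Python sketch below looks for each delimiter in order and reports the first one that is missing. It is a simplified stand-in, not the actual template debugger: `first_mismatch` and `LITERALS` are hypothetical names, and the delimiters are assumed to be the full fixed text around the fields.

```python
def first_mismatch(literals, email):
    """Return the first delimiter that cannot be found in order, or None."""
    position = 0
    for literal in literals:
        index = email.find(literal, position)
        if index == -1:
            # This delimiter is missing, so the template cannot match.
            return literal
        # Keep searching after the delimiter we just found.
        position = index + len(literal)
    return None


# Assumed delimiters of the initial template (the text around the two fields).
LITERALS = ["Hi, my name is ", ", and I have ", " dogs."]

for email in (
    "Hi, my name is Bob, and I have 12 dogs.",
    "Hi, my name is Charlotte, and I have 2 cats.",
):
    print(repr(email), "->", repr(first_mismatch(LITERALS, email)))
```

For Bob's email every delimiter is found, so nothing is reported. For Charlotte's email the delimiter containing "dogs" is missing, and a single missing delimiter is enough to make the whole template fail.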
The solution: update the template and make it more reliable
There is a simple fix to this common issue: add a new field to your template to take the changing data into account, just like this:
It will then automatically process your latest email with a new field "Kind of pets". Now your new template is more robust and flexible.
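The effect of the extra field can be sketched in Python with a regular expression. This is only an illustration under the same assumptions as before, not how Parseur stores templates: `PATTERN` and `parse_email` are hypothetical names, and the word after the number is now captured as a field instead of being part of a delimiter.

```python
import re

# Sketch of the updated template: "dogs" is no longer fixed text,
# it is captured by a third field, kind_of_pets.
PATTERN = re.compile(
    r"^Hi, my name is (?P<name>.+?), and I have "
    r"(?P<number_of_pets>.+?) (?P<kind_of_pets>.+?)\.$"
)


def parse_email(email):
    """Return the three captured fields, or None if the email does not match."""
    match = PATTERN.match(email)
    return match.groupdict() if match else None


for email in (
    "Hi, my name is Anna, and I have 3 dogs.",
    "Hi, my name is Bob, and I have 12 dogs.",
    "Hi, my name is Charlotte, and I have 2 cats.",
):
    print(parse_email(email))
```

All three emails now match the same template, since the only fixed text left is the part that never changes from one email to the next.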
I hope this article helped you understand how Parseur works.
So, while creating your templates in the template editor, it's important to keep in mind that any word that can possibly change should be selected and put into a field. That will save you even more time and reduce the number of templates that you need.