Note: Field constraints are currently exclusive to Text templates and do not apply to OCR templates (for PDFs) or the AI engine. If you require this feature for your OCR templates or AI engine, please inform us via the chat, and we will explore how we can assist you.
What are field constraints?
Field constraints in Parseur are used for more precise data extraction when using Text templates, especially in complex scenarios. They allow you to restrict the data a field can contain by enforcing a specific pattern or criterion. This helps ensure that Parseur selects the correct template when multiple templates could match a single document.
For example, suppose you have templates for both pickup and delivery orders at your restaurant, but documents are matched to the wrong template because the layouts are similar. By setting an "Exact match" constraint on a field in the pickup orders template so that it only matches the word "pickup", you prevent incorrect matches with delivery orders.
When a document is processed, Parseur not only checks for a matching template but also verifies that any set field constraints are met. If a field does not meet its constraint, Parseur will discard the template as a match.
Note: Constraints are evaluated only once the field has been found in the document. To find a field in a text document, Parseur looks for the delimiters before and after each field. See the article "Understand how the text parsing engine works" for more information.
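The matching behavior described above can be pictured with a small Python sketch. This is not Parseur's actual implementation: the helper names (`extract_between`, `template_matches`, `pickup_only`) and the sample documents are hypothetical, and a real template has many fields and delimiters. The sketch only illustrates the order of operations: first locate the field between its delimiters, then check the constraint, and discard the template if the check fails.

```python
from typing import Callable, Optional


def extract_between(text: str, before: str, after: str) -> Optional[str]:
    """Return the text found between two delimiters, or None if either is missing."""
    start = text.find(before)
    if start == -1:
        return None
    start += len(before)
    end = text.find(after, start)
    if end == -1:
        return None
    return text[start:end]


def template_matches(text: str, before: str, after: str,
                     constraint: Callable[[str], bool]) -> bool:
    """Hypothetical single-field template: match only if the field is found
    AND its constraint is satisfied."""
    value = extract_between(text, before, after)
    if value is None:
        return False  # field not found, so the constraint is never evaluated
    return constraint(value)


def pickup_only(value: str) -> bool:
    """Stand-in for an "Exact match" constraint on the word "pickup"."""
    return value == "pickup"


pickup_doc = "Order type: pickup\nTotal: 12.50"
delivery_doc = "Order type: delivery\nTotal: 18.00"

print(template_matches(pickup_doc, "Order type: ", "\n", pickup_only))    # True
print(template_matches(delivery_doc, "Order type: ", "\n", pickup_only))  # False
```

In this sketch the delivery document has the same layout, so the field is found, but the constraint rejects it and the pickup template is not selected.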
How to set field constraints in a template?
To add a constraint to a field, follow these steps:
Open the template editor.
If the field hasn't been created yet, create it now.
Click on the Edit icon (usually represented by a pencil) next to the field.
Open the Advanced Options panel.
Set the field constraint to the option you need (for example, "Exact match").
Save the field.
Update the template to apply the changes.
What field constraints are available?
Parseur provides various predefined constraints, along with the option to create custom constraints using regular expressions, to fine-tune how data should be captured from your documents:
Auto-detect: This is the default setting. Parseur will automatically choose a constraint based on the initial field value from the document used to create the template. If the initial value contains HTML, it sets the "Text or HTML" constraint. If it contains only text without HTML, it sets the "Text only" constraint.
Text or HTML: This allows any value in the field, including HTML. Opt for this constraint if the base document's field contains text but might include HTML in other instances (e.g., a "comment" field with simple text that could expand into multiple lines with HTML breaks).
Text only (no HTML): The template will not match if the field, expected to contain only text, includes HTML. This ensures that the extracted data is free of HTML tags.
Exact match: This constraint requires the field to exactly match the value from the base document. It's useful for fields that should remain consistent across documents.
Custom: Enables the definition of a unique constraint using regular expressions. This is useful for highly specific or complex data patterns not covered by other constraints.
If you apply an "Exact match" or "Custom" constraint to a field in Parseur, the platform will display a small warning exclamation mark icon next to the field as a reminder of the constraint. You can hover over this icon at any time to view the details of the constraint applied. This feature helps you remember that there are specific conditions affecting how data is captured for that field.
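As a rough mental model, the constraint types listed above can be thought of as simple checks on the extracted value. The Python sketch below is an illustration under stated assumptions, not Parseur's code: the function names and the `kind` labels are hypothetical, HTML detection is approximated with a naive tag pattern, and the "Custom" check assumes the whole field value must match the regular expression.

```python
import re

# Naive HTML detector (assumption): anything that looks like <...> counts as a tag.
HTML_TAG = re.compile(r"<[^>]+>")


def contains_html(value: str) -> bool:
    return HTML_TAG.search(value) is not None


def auto_detect(initial_value: str) -> str:
    """Pick a constraint from the initial field value, as Auto-detect is described."""
    return "text_or_html" if contains_html(initial_value) else "text_only"


def check(kind: str, value: str, base_value: str = "", pattern: str = "") -> bool:
    """Return True if the value satisfies the (hypothetically named) constraint."""
    if kind == "text_or_html":
        return True                          # any value is accepted
    if kind == "text_only":
        return not contains_html(value)      # reject values containing HTML
    if kind == "exact":
        return value == base_value           # must equal the base document's value
    if kind == "custom":
        # Assumption: the entire field value must match the pattern.
        return re.fullmatch(pattern, value) is not None
    raise ValueError(f"unknown constraint kind: {kind}")


print(auto_detect("Leave at the door"))           # text_only
print(auto_detect("Line one<br>Line two"))        # text_or_html
print(check("text_only", "Line one<br>Line two")) # False
print(check("exact", "pickup", base_value="pickup"))        # True
print(check("custom", "INV-1234", pattern=r"INV-\d+"))      # True
```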
How to use regular expressions for constraints?
Regular expressions (regex) let you set highly specific patterns for the data fields in Parseur. They are an advanced feature that most use cases do not require, but they offer powerful customization for data extraction when the default constraints are insufficient.
Below is a basic guide to get you started. Check out the following article for an introduction to Python regular expressions and available pattern options.
. matches any character
+ matches the preceding expression 1 or more times and is greedy (i.e. it tries to capture as many characters as possible until it fails)
+? is the same as + except that it is non-greedy (i.e. it stops as soon as the following expression is found; in the case of Parseur, that is the closing delimiter of a field)
* / *? are the same as + / +? except that they match 0 or more times
? matches the previous expression 0 or 1 time
\ escapes special characters (permitting you to match characters like *, ?, and so forth), or signals a special sequence, see below
\d matches a decimal digit (between 0 and 9)
\w matches a word character (a letter, a digit, or _, but not a space); \w+ matches a whole word
\s matches a whitespace character, including new lines
[ ] is used to indicate a set of characters. Examples: [amk] will match 'a', 'm', or 'k'; [a-z] will match any lowercase ASCII letter; [^5] will match any character except '5'
A|B creates a regular expression that will match either A or B
(?i)order(?-i) makes the constraint case-insensitive for anything between (?i) and (?-i). For example, here, it will match "order", "Order", "ORDER", "oRdEr" etc.
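To experiment with these patterns outside Parseur, a short Python session with the standard `re` module can help. The sketch below is illustrative only: `field_matches` is a hypothetical helper that assumes the whole value must match the pattern, and the sample values are made up. Note that Python's standard `re` module writes a scoped case-insensitive group as `(?i:...)`, so the last example uses that form rather than the `(?i)...(?-i)` form shown above.

```python
import re

# Greedy vs. non-greedy: capture the text between <b> and </b>.
text = "Order: <b>A-42</b> <b>B-7</b>"
greedy = re.search(r"<b>(.+)</b>", text).group(1)
lazy = re.search(r"<b>(.+?)</b>", text).group(1)
print(greedy)  # A-42</b> <b>B-7   (runs to the LAST closing tag)
print(lazy)    # A-42              (stops at the FIRST closing tag)


def field_matches(pattern: str, value: str) -> bool:
    """Hypothetical helper: True if the entire value matches the pattern."""
    return re.fullmatch(pattern, value) is not None


examples = [
    (r"\d+", "2024"),                  # True: one or more digits
    (r"\w+", "order_42"),              # True: letters, digits and _ are word characters
    (r"\w+", "order 42"),              # False: the space is not a word character
    (r"colou?r", "color"),             # True: the "u" is optional
    (r"[amk]", "m"),                   # True: 'm' is in the set
    (r"[^5]", "5"),                    # False: '5' is excluded from the set
    (r"pickup|delivery", "delivery"),  # True: either alternative is accepted
    (r"\d+\.\d\d", "12.50"),           # True: \. matches a literal dot
    (r"(?i:order)-\d+", "OrDeR-17"),   # True: case-insensitive inside the group
]

results = [field_matches(pattern, value) for pattern, value in examples]
for (pattern, value), ok in zip(examples, results):
    print(f"{pattern!r} vs {value!r}: {ok}")
```

The greedy and non-greedy results show why the non-greedy forms are often the safer choice when a closing delimiter can appear more than once in a document.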