What are field constraints?

For the most advanced use cases, Parseur lets you specify constraints on your fields. As the name suggests, a field constraint restricts the content of a field so that it must follow a certain pattern. Field constraints help Parseur pick the right template when several templates match the same document.

Example: let's say you have created a template that matches pickup orders made at your restaurant. Unfortunately, that template also matches delivery orders because their layout is similar. You don't want that to happen because you need to extract other types of information for delivery orders. In this case, you can add an Exact match constraint on the field that contains the pickup value, so that the template only matches documents where that field has the pickup value.

When a new document comes in and Parseur finds a matching template, it also checks that every field constraint set on that template is satisfied. If any constraint fails, Parseur considers that the whole template doesn't match.

Note: Constraints are only looked at once the field has been found in the document. To find a field in a document, Parseur looks for delimiters before and after each field. Check the Understand how Parseur works article for more information about this.
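The matching behavior described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Parseur's actual implementation: the template dictionaries, the field names, and the functions template_matches and pick_template are hypothetical, and the sketch assumes that the first template whose constraints all pass is the one selected.

```python
import re

# Hypothetical model of a template: field name -> regex constraint (or None).
# These names are illustrative only and are not part of Parseur.
PICKUP_TEMPLATE = {"order_type": r"pickup", "customer": None}
DELIVERY_TEMPLATE = {"order_type": r"delivery", "customer": None}


def template_matches(template, extracted):
    """Return True only if every field was found and every constraint holds."""
    for field, constraint in template.items():
        value = extracted.get(field)
        if value is None:
            # Field not found in the document: constraints are never reached.
            return False
        if constraint is not None and re.fullmatch(constraint, value) is None:
            # One failed constraint rejects the whole template.
            return False
    return True


def pick_template(templates, extracted):
    """Return the name of the first template whose constraints all pass."""
    for name, template in templates.items():
        if template_matches(template, extracted):
            return name
    return None


templates = {"pickup": PICKUP_TEMPLATE, "delivery": DELIVERY_TEMPLATE}
chosen = pick_template(templates, {"order_type": "delivery", "customer": "Ann"})
print(chosen)  # delivery
```

Without the constraint on order_type, both templates would accept the delivery order, and the pickup template could be chosen by mistake.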

How to set field constraints in a template?

To add a constraint to a field:

  • Open the template editor

  • If you haven't created the field yet, create it now

  • Click on the Edit icon button next to the field

  • Open the Advanced Options panel

  • Set the field constraint to the option you need, for example "Exact match"

  • Save the field

  • Update the template

What field constraints are available?

Parseur gives you several predefined constraints as well as an option to set your own using regular expressions:

  • Auto-detect: Parseur will set the default constraint based on the initial field value of the document used as a base for the template. If the initial value contains some HTML, it will set the "Text or HTML" constraint. If the initial value contains only Text and no HTML, it will set the "Text only" constraint. This is the default mode.

  • Text or HTML: Parseur will accept any value in this field, including HTML. Use this constraint if the field in the base document contains only text but you know that the field can also contain HTML in other documents. A typical example is a "comment" field where the base document contained a single line of text, but you know there can be documents with comments spanning multiple lines, separated by <br> HTML tags.

  • Text only (no HTML): As the name suggests, the template will fail to match if the value of a Text-only field contains any HTML.

  • Exact match: Forces the field to match the value found in the base document used to create the template.

  • Custom: Allows you to specify your own constraint using regular expressions.

If you set an Exact match or Custom constraint on a field, Parseur will display a small warning icon (an exclamation mark) next to it to remind you. Hover over the icon to see the details.
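As a rough mental model, the constraints listed above could be written as the following Python checks. This is a sketch under stated assumptions, not Parseur's code: the functions contains_html, auto_detect and check are hypothetical, "contains HTML" is approximated by looking for something that resembles a tag, and the Custom constraint is assumed to apply to the whole field value.

```python
import re

# Assumption: "contains HTML" is approximated by looking for something tag-like.
HTML_TAG = re.compile(r"<[a-zA-Z/][^>]*>")


def contains_html(value):
    """Return True if the value contains something that looks like an HTML tag."""
    return HTML_TAG.search(value) is not None


def auto_detect(initial_value):
    """Pick the default constraint from the base document's field value."""
    return "Text or HTML" if contains_html(initial_value) else "Text only"


def check(constraint, value, base_value=None, pattern=None):
    """Return True if value satisfies the named constraint."""
    if constraint == "Text or HTML":
        return True  # any value is accepted
    if constraint == "Text only":
        return not contains_html(value)
    if constraint == "Exact match":
        return value == base_value
    if constraint == "Custom":
        # Assumption: the pattern must cover the whole field value.
        return re.fullmatch(pattern, value) is not None
    raise ValueError(f"unknown constraint: {constraint}")


print(auto_detect("Leave at the door"))        # Text only
print(check("Text only", "Line 1<br>Line 2"))  # False
```

The second call shows why a multi-line comment would make a Text-only template fail to match, which is the case where "Text or HTML" is the better choice.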

How to use regular expressions for constraints?

If none of the default constraints work for you, you can specify your own using regular expressions. A regular expression (shortened as regex or regexp) is a sequence of characters that defines a search pattern.

Regular expression constraints are quite advanced and you shouldn't need to use them in most cases.

Below is a quick cheat sheet about regular expressions. Check out the following article for an introduction to Python regular expressions and available pattern options.

  • . matches any character

  • + matches the preceding expression 1 or more times and is greedy (i.e. tries to capture as many characters as possible until it fails)

  • +? is the same as + except that it is non-greedy (i.e. stops as soon as the following expression is found; in the case of Parseur, that is the closing delimiter of a field)

  • * / *? are the same as + / +? except that they match 0 or more times

  • ? matches the previous expression 0 or 1 time

  • \ escapes special characters (permitting you to match characters like *, ?, and so forth), or signals a special sequence, see below

  • \d matches a decimal digit (between 0 and 9)

  • \w matches a word character (a letter, a digit or _, but not a space); use \w+ to match a whole word

  • \s matches a whitespace character, including spaces, tabs and new lines

  • [ ] is used to indicate a set of characters. Examples: [amk] will match 'a', 'm', or 'k', [a-z] will match any lowercase ASCII letter, [^5] will match any character except '5'

  • A|B creates a regular expression that will match either A or B

  • (?i)order(?-i) makes the constraint case-insensitive for anything between (?i) and (?-i). For example here, it will match "order", "Order", "ORDER", "oRdEr" etc.
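To try out the cheat sheet above, here is a short sketch using Python's standard re module. The sample strings are made up for illustration. One assumption to note: the standard re module may not accept a bare (?-i) in the middle of a pattern, so the case-insensitive example uses the scoped group form (?i:...) instead, which may differ from what you type into Parseur.

```python
import re

# Greedy vs. non-greedy: + grabs as much as possible, +? as little as possible.
print(re.match(r".+,", "a,b,c").group())   # a,b,
print(re.match(r".+?,", "a,b,c").group())  # a,

# ? makes the previous expression optional; \d is one digit.
print(bool(re.fullmatch(r"#?\d+", "#1042")))  # True
print(bool(re.fullmatch(r"#?\d+", "1042")))   # True

# \w is one word character (letter, digit or _); \s is one whitespace character.
print(re.findall(r"\w+", "order_42 is ready"))  # ['order_42', 'is', 'ready']
print(re.split(r"\s+", "two\nlines here"))      # ['two', 'lines', 'here']

# Sets and alternation.
print(bool(re.fullmatch(r"[A-Z][^5]", "A7")))            # True
print(bool(re.fullmatch(r"pickup|delivery", "pickup")))  # True

# \ escapes a special character so that it is matched literally.
print(bool(re.fullmatch(r"\d+\.\d\d", "12.50")))  # True

# Case-insensitive section, written as a scoped group for the re module.
print(bool(re.fullmatch(r"(?i:order) \d+", "ORDER 7")))  # True
```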
