Using the Post Processing module of your Parseur mailbox, you can perform advanced manipulations on your parsed data such as field merge, split, date and time calculations, regular expression match and even add your own business logic!

Use the Post Processing module if you need to perform data operations that standard field formats do not offer.

Disclaimers

Post Processing lets you write Python code. You should be comfortable with basic programming and ideally know a bit about the Python programming language to use it.
This module is only available on Pro plans and above.

Access the post processing module in Parseur

To access the post processing module:

open a mailbox
click on "Post Process" in the left menu.

The Post Processing screen is split into several sections:

At the top, use the Previous and Next buttons to change the base parsed data you want to use to test your code on
On the left, you have the original parsed data before the transformation
In the middle is where you write your post processing Python code.
On the right is what the data looks like after going through post processing

As you type your code, the results on the right will automatically update.

Create your first post processing code

Let's create your first code. This will add a custom field to the parsed result. This is something you can do already using Static Fields, but it will allow you to understand how Post Processing works.

Select the commented out examples and remove them
Type the following in the editor (you can copy and paste it too):

data["my_first_field"] = "Hello World!"

Wait for the results on the right to refresh
Check at the bottom of the transformed data: you now have a new my_first_field whose content is "Hello World!"

Congratulations, you have created your first field using post processing!

Now, save this code by clicking on the Save button (or with Ctrl+S)
Go to the document queue and reprocess a document: the new field is added to the result of that document
Open the logs: you now have two processed entries, the first with the original data extracted by the template and the second with the data after post processing.

Writing post processing code

Programming in Python

The Post Processing module lets you write Python 3.6 code. Python is an interpreted, high-level and general-purpose programming language. Python's design philosophy emphasizes code readability with its notable use of significant whitespace.

If you're new to Python, here are some useful links to get you started:

Examples in the section below will give you some ready-made snippets to copy & paste and adjust to your use case.

Post processing in action: manipulating the parsed `data` variable

When you write code in the post processing module, you have access to the data variable that lists all your parsed data from the document you are post processing.

The data variable is a Python dict:

each parsed field is an item of the dictionary.
modify the data in place to transform the fields.
alternatively you can save your transformed data in another variable, and return that variable instead at the end of the processing code

You also have access to the extra variable.

the extra variable is a dict that contains all metadata fields, even those not enabled in your mailbox settings
use this variable for example if you want to extract data from the document content by accessing extra["HtmlDocument"] or extra["TextDocument"]
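As a minimal sketch of how the two variables fit together, the snippet below defines sample data and extra dicts so it can run outside Parseur. Inside the module both variables are provided for you and no import is needed. The field names and the document text are made up for illustration.

```python
import re  # only needed outside Parseur, where re is already available

# Hypothetical sample values; in Parseur, data and extra are provided for you
data = {"first_name": "Ada", "last_name": "Lovelace"}
extra = {"TextDocument": "Order ref: AB-1234\nThank you for your order."}

# Modify data in place to add a field derived from existing ones
data["full_name"] = data["first_name"] + " " + data["last_name"]

# Read the raw document text from extra to pick up a value no field captured
ref_match = re.search(r"Order ref: (\S+)", extra.get("TextDocument", ""))
if ref_match:
    data["order_ref"] = ref_match.group(1)
```

Using extra.get() with an empty default keeps the search from failing when a document has no text version.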

Available functions

Here is the list of all accessible built-in functions:

abs: Returns the absolute value of a number
all: Returns True if all parameters evaluate to True, False otherwise
any: Returns True if any parameter evaluates to True, False if none does
bool: Converts the given parameter to a boolean
bytes: Converts the given str parameter to bytes
callable: Returns True if the given parameter can be called. False otherwise
chr: Returns the character that has the given Unicode code point
complex: Returns a complex number, given its real and imaginary parts
dict: Builds a dict from named parameters or a collection of couples
divmod: Returns the quotient and remainder of the given numbers' division
enumerate: Count from 0, for each element in a given collection
format_address(address_str) is a convenience method that takes an address formatted as a string and returns a geolocated address object. See our address format article for more information.
format_datetime(date_str, format) is a convenience method that takes a date formatted as a string and formats it according to Python datetime formats
filter: Keep or remove elements from a collection, according to a function
float: Converts the given parameter to a floating-point number (binary floating point is approximate; use decimal when exact precision matters)
hasattr: Returns True if an object has a given attribute, False otherwise
hash: Returns the hash value of an object as an integer (equal objects have equal hashes, but a hash is not guaranteed to be unique)
hex: Converts an integer into hexadecimal as a string starting with 0x
id: Returns an integer that is the identity of an object
int: Converts the given parameter to an integer
isinstance: True if 1st parameter is of the type given as the 2nd parameter
issubclass: True if 1st parameter is a subclass of a class given as the 2nd parameter
len: Returns the length of the collection given, as an integer
list: Builds a list from given parameters
max: Returns the largest element from a given collection
map: Apply a given function to each element of a given collection
min: Returns the smallest element from a given collection
next: Returns the next element from a given iterator
oct: Converts an integer into an octal number as a string starting with 0o
ord: Returns an integer representing the Unicode code point of a given character
pow: Returns the first argument raised to the second argument's power
range: Build a sequence of numbers
repr: Returns a string that is a representation of an object
reversed: Returns an iterator over a given collection, in reverse order
round: Given a number, returns the closest rounded integer
slice: Returns a slice object, that is a piece of a collection
sorted: Returns a sorted copy of a given collection
str: Converts the given object to a string
sum: Returns a sum of all values in a given collection
tuple: Builds a tuple from given parameters
zip: Aggregates elements from each of the collections given as parameters
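To give an idea of how a few of these built-ins combine, here is a small sketch that summarizes a table field. The items table and its columns are hypothetical sample data, defined here only so the snippet runs on its own.

```python
# Hypothetical table field; in Parseur, data is provided for you
data = {"items": [
    {"description": "Widget", "quantity": 2, "unit_price": 9.99},
    {"description": "Gadget", "quantity": 1, "unit_price": 24.5},
]}

# len() counts the rows, sum() and max() aggregate a column
data["item_count"] = len(data["items"])
data["total_quantity"] = sum(item["quantity"] for item in data["items"])
data["highest_unit_price"] = max(item["unit_price"] for item in data["items"])

# sorted() returns a new, alphabetically ordered list
data["descriptions"] = sorted(item["description"] for item in data["items"])

# round() trims floating-point noise to two decimal places
data["total"] = round(
    sum(item["quantity"] * item["unit_price"] for item in data["items"]), 2
)
```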

Available modules

In addition to the built-in functions above, and True, False and None, the following modules and functions are available:

datetime module for date, time and datetime manipulations
dateutil module for advanced date manipulations, including parsing strings to dates, calculating deltas and working with timezones
decimal module for manipulating floating point numbers where precision is important, for example when working with prices
re module for working with regular expressions
PostProcessError custom exception to raise error messages that will appear in your logs
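The sketch below shows two of these modules at work, using made-up field names. The import lines are there only so the snippet runs outside Parseur; inside the module, datetime and decimal are already available and import is not allowed.

```python
import datetime  # imports are only needed outside Parseur
import decimal

# Hypothetical sample values; in Parseur, data is provided for you
data = {"price": "19.99", "quantity": "3", "order_date": "2020-11-01"}

# decimal avoids binary floating-point rounding surprises on prices
total = decimal.Decimal(data["price"]) * int(data["quantity"])
data["total"] = str(total)

# datetime handles date arithmetic; here, a due date 30 days after the order
order_date = datetime.datetime.strptime(data["order_date"], "%Y-%m-%d")
due_date = order_date + datetime.timedelta(days=30)
data["due_date"] = due_date.strftime("%Y-%m-%d")
```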

Limitations of the Post Processing module

The post processing module should give you everything you need to perform the most advanced data manipulations.

Note however the following limitations, especially if you are already a Python expert:

Only a subset of Python standard built-ins is included. Trying to use a non-included built-in will result in a NameError exception.
You cannot use the import keyword to import additional modules. Trying to perform an import will result in an ImportError exception.
You cannot use the format() method on strings. Use the f"..." notation instead.
You cannot access internal object attributes (starting with a _). Trying to access an internal attribute will raise an exception
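For the format() limitation, an f-string can usually express the same thing. The sketch below uses hypothetical field names and assumes the module accepts standard format specifiers inside f-strings, as regular Python 3.6 does.

```python
# Hypothetical sample values; in Parseur, data is provided for you
data = {"customer": "Ada", "amount": 1234.5}

# Instead of "{}: {:,.2f} USD".format(...), embed the values in an f-string
data["summary"] = f"{data['customer']}: {data['amount']:,.2f} USD"
```

Note the single quotes around field names inside the f-string: Python 3.6 does not allow reusing the outer quote character inside the braces.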

Stopping the execution and preventing exports

If you want to stop the execution of a particular document during post processing, simply return None. This will mark the document as Skipped (post process) and won't trigger any export like Zaps or webhooks.

Handling errors and exceptions

Any exception raised during post processing will stop the post processing and mark the document as Post process failed.

When that happens, click on the magnifying glass icon to access the logs and get more details about the error.

You can log your own error message in the log by raising the PostProcessError exception. Example: raise PostProcessError("This user is not allowed").

Any exception raised while writing your code in the Post Processing module will prevent you from saving your code (except when raising PostProcessError exceptions).
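If you want to try the skip and error behaviors described in the last two sections on your own machine, one option is a small local harness like the sketch below. Everything in it is an assumption for illustration: the PostProcessError class is a local stand-in for the one Parseur provides, post_process wraps the code you would normally type directly in the editor, and the outcome labels are simplified versions of the document statuses.

```python
class PostProcessError(Exception):
    """Local stand-in for the exception Parseur provides."""


def post_process(data):
    # Hypothetical rule: documents flagged as spam are skipped
    if data.get("is_spam"):
        return None
    # Hypothetical rule: a missing customer is reported as an error
    if not data.get("customer"):
        raise PostProcessError("Customer is missing")
    data["customer"] = data["customer"].strip().title()
    return data


def run(data):
    """Mimic the three outcomes: processed, skipped, or failed."""
    try:
        result = post_process(data)
    except PostProcessError as error:
        return f"failed: {error}"
    return "skipped" if result is None else "processed"
```

Calling run({"is_spam": True}) gives "skipped", while run({}) gives "failed: Customer is missing".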

Useful keyboard shortcuts

The following shortcuts are useful to make writing your code more efficient and pleasant.

Ctrl+S to save
Ctrl+F to search
Ctrl+/ to comment out the current line or a block of code
Ctrl+D to delete the current line or block of code
Ctrl+L to go to line
Tab to indent
Shift+Tab to outdent

Note: replace Ctrl with Cmd if using macOS.

Examples of the most common post processing use cases

Merge two or more fields

Option 1: for simple field merging use the + notation

data["full_name"] = data["first_name"] + ' ' + data["last_name"]

Option 2: for more complex manipulations, use the f-string notation

data["description"] = f"Notes: status {data['status'].upper()}"
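Both options fail if one of the fields is missing (KeyError) or set to None (TypeError when concatenating with +). A possible way around this, sketched here with hypothetical field names, is to collect the parts with get() and join only the non-empty ones.

```python
# Hypothetical sample values; in Parseur, data is provided for you
data = {"first_name": "Ada", "middle_name": "", "last_name": "Lovelace"}

parts = [data.get("first_name"), data.get("middle_name"), data.get("last_name")]
# filter(None, ...) drops empty strings and missing (None) values
data["full_name"] = " ".join(filter(None, parts))
```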

Split a field into sub fields

Use the split() method to split a field into sub fields.

Let's say the field vehicle contains Kia, Stinger, GT (red Leather) and you want to split this field into make, model and variant. You can use the following code:

make, model, variant = data["vehicle"].split(", ")
data["make"] = make        # will store Kia
data["model"] = model      # will store Stinger
data["variant"] = variant  # will store GT (red Leather)

Optionally, you can limit the number of splits:

make, variant = data["vehicle"].split(", ", 1)
data["make"] = make        # will store Kia
data["variant"] = variant  # will store Stinger, GT (red Leather)
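Unpacking into a fixed number of variables raises a ValueError when the field has fewer or more parts than expected. The sketch below is one way to tolerate missing parts; the sample value is hypothetical and deliberately has no variant.

```python
# Hypothetical sample value with only two parts
data = {"vehicle": "Kia, Stinger"}

# Split at most twice, then strip the spaces around each part
parts = [part.strip() for part in data["vehicle"].split(",", 2)]
# Pad with None so there are always exactly three values to unpack
make, model, variant = parts + [None] * (3 - len(parts))
data["make"], data["model"], data["variant"] = make, model, variant
```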

Work with optional fields

Using the data["..."] notation will raise a KeyError error if the field is not present in the parsed data. This can be a problem if you have several templates, some of them with optional fields.

When this happens, you can use the data.get() method instead.

# the following will raise an error 
# if the field named "option" is not present
data["description"] = data["option"]

# Use the get() method instead
data["description"] = data.get("option")

# You can also supply a default value
data["description"] = data.get("option", "No options.")
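One subtlety worth knowing: the default passed to get() is only used when the key is missing, not when the field exists but is empty. The sketch below, with a hypothetical empty option field, contrasts that with the "or" idiom.

```python
# Hypothetical sample: the field exists but is empty
data = {"option": ""}

# get() only falls back to its default when the key is missing
with_default = data.get("option", "No options.")

# "or" also replaces empty strings and None values
with_or = data.get("option") or "No options."

data["description"] = with_or
```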

Iterate on a table field

Use a for-loop with enumerate() to walk through a table and return the current index and value.

Example: let's say you have an items table field with quantity, description and unit_price columns, and you want to add a new price value for each item as well as a grand_total field.

data["grand_total"] = 0
for index, item in enumerate(data["items"]):
    price = item["quantity"] * item["unit_price"]
    data["items"][index]["price"] = price
    data["grand_total"] = data["grand_total"] + price

Note: you need to make sure quantity and unit_price are numbers and not strings. You can use the Number format for that or perform the conversion in Python directly using int() or decimal.Decimal().
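As a sketch of the in-Python conversion mentioned in the note, the variant below assumes quantity and unit_price arrive as strings. The sample rows are hypothetical, and the results are stored as strings on the assumption that Decimal values should be converted before being returned. The import is only needed outside Parseur.

```python
import decimal  # only needed outside Parseur

# Hypothetical table field whose numeric columns were parsed as text
data = {"items": [
    {"quantity": "2", "description": "Widget", "unit_price": "9.99"},
    {"quantity": "1", "description": "Gadget", "unit_price": "24.50"},
]}

grand_total = decimal.Decimal("0")
for item in data["items"]:
    # Convert both columns before multiplying
    price = int(item["quantity"]) * decimal.Decimal(item["unit_price"])
    item["price"] = str(price)
    grand_total += price
data["grand_total"] = str(grand_total)
```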

Work with dates and times

For most use cases, you can use the date and time field formats. However, sometimes, you may need to compute a date from different fields. In those situations, you can use the datetime and dateutil modules and the format_datetime() function to manipulate dates.

Example 1: Parse date string and convert format

Let's say you have a start_date field formatted as Sunday 1 November 2020 and a start_time field formatted as 1PM, and you want to create a new start_datetime field formatted as YYYY-MM-DD HH:MM.

Use the format_datetime(date_str, format) convenience method to parse and format a date.

datetime_str = data["start_date"] + " " + data["start_time"]

data["start_datetime"] = format_datetime(datetime_str, "%Y-%m-%d %H:%M")

# start_datetime is now "2020-11-01 13:00"

For a list of available Python date and time formats, check out the date and time field format page (at the bottom).
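format_datetime() only exists inside Parseur, so the example above cannot be run elsewhere. For testing on your own machine, a rough standard-library equivalent is sketched below. It is not how format_datetime() is implemented: unlike that function, strptime() needs the input format spelled out, and this sketch assumes English day and month names.

```python
import datetime  # only needed outside Parseur

# Same sample values as in the example above
data = {"start_date": "Sunday 1 November 2020", "start_time": "1PM"}

datetime_str = data["start_date"] + " " + data["start_time"]
# %A weekday name, %d day, %B month name, %Y year, %I 12-hour clock, %p AM/PM
parsed = datetime.datetime.strptime(datetime_str, "%A %d %B %Y %I%p")
data["start_datetime"] = parsed.strftime("%Y-%m-%d %H:%M")
```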

Example 2: Compute dates

Let's say you want to store the date for the next day, in a new field named tomorrow.

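Use dateutil.relativedelta.relativedelta to compute a one day, positive delta, then add it to the current date and time provided by datetime.datetime.now(). Because returned data must be serializable in JSON, convert the result to a string before storing it.

tomorrow = (
    datetime.datetime.now()
    + dateutil.relativedelta.relativedelta(days=1)
)
# convert to a string so the result can be serialized in JSON
data["tomorrow"] = tomorrow.isoformat()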
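Another common computation is the number of days between two dates. The sketch below uses only the datetime module and two hypothetical fields, check_in and check_out, assumed to be formatted as YYYY-MM-DD.

```python
import datetime  # only needed outside Parseur

# Hypothetical sample values; in Parseur, data is provided for you
data = {"check_in": "2020-11-01", "check_out": "2020-11-05"}

check_in = datetime.datetime.strptime(data["check_in"], "%Y-%m-%d")
check_out = datetime.datetime.strptime(data["check_out"], "%Y-%m-%d")
# Subtracting two datetimes gives a timedelta; .days is the whole-day count
data["nights"] = (check_out - check_in).days
```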

Work with regular expressions

Python regular expressions are (complex but) powerful methods to search for patterns in fields and perform splits and replacements.

To learn more about regexps, check out this introductory article about regular expressions in Python.

Example 1: price to string (simple version)

This is a simple version that removes all text, spaces and $ symbol from a price field.

data["price"] = re.compile(r'[a-zA-Z $]+').sub('', data["price"])

Example 2: price to decimal (advanced version)

This is a full working example of how to use regular expressions to convert a price string including currency or text and potentially thousands separator into a decimal, including error management. This code creates a reusable function that can be called to convert fields at different locations in your post processing code.

REMOVE_FROM_DECIMAL = re.compile(r"[a-zA-Z\ \$£€\!\?]+")

def str_to_decimal(value):
    clean_value = REMOVE_FROM_DECIMAL.sub("", value)
    if clean_value == "":
        return None   
    if "," in clean_value:
        if re.search(r",\d{3}", clean_value):
            # Comma as thousand separator
            clean_value = clean_value.replace(",", "")
        else:
            # Comma as decimal separator
            clean_value = clean_value.replace(",", ".")
    try:
        return decimal.Decimal(clean_value)
    except decimal.InvalidOperation: 
        return None

Example 3: extract a value following a pattern

Use re.search() with a regular expression whose groups are identified in parentheses, then call group() on the resulting match object to extract the value.

# let's assume data["room_details"] = "4 rooms, 3 bedrooms, 84 sqm"
# and we want to extract the area

area_match = re.search(r"(\d+) sqm", data.get("room_details", ""))
if area_match:
    data["area"] = area_match.group(1)

# => "area" is "84" (a string; use int() if you need a number)
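When several values need to come out of the same field, named groups can keep the code readable. This is a sketch built on the same sample string; the group names rooms, bedrooms and area are chosen here for illustration and become the new field names.

```python
import re  # only needed outside Parseur

# Same sample value as in the example above
data = {"room_details": "4 rooms, 3 bedrooms, 84 sqm"}

pattern = r"(?P<rooms>\d+) rooms, (?P<bedrooms>\d+) bedrooms, (?P<area>\d+) sqm"
details_match = re.search(pattern, data.get("room_details", ""))
if details_match:
    # groupdict() maps each group name to the text it captured
    for name, value in details_match.groupdict().items():
        data[name] = int(value)
```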

Add business logic

Write conditional logic statements to trigger different behaviors depending on incoming parsed data.

Example: let's say we are a food delivery company using Parseur to manage incoming orders. Restaurants forward all orders to us but we are only interested in parsing delivery orders, not pickup orders. Also, we want to make sure to log an error if we can't determine the order type.

The type of order is extracted into the order_type field by our templates.

if "order_type" not in data:
    # something is wrong, the intern must have again forgotten
    # to add the order_type field when creating this template.
    # Let's stop here and log an error.
    raise PostProcessError(f"Error: order_type not found in parsed data. Please check template {extra['Template']}.")

if data["order_type"].lower() == "pickup":
    # all order types including pickup, Pickup or PICKUP are skipped
    return None

# From here, we know the order is valid and is a delivery.
# Write the rest of the post processing code here

Most common error messages and solutions

Python error messages are usually quite expressive for the seasoned programmer. When you get an error message, you will get the line number where the error was triggered, making it easier to spot the problem.

SyntaxError: invalid syntax at statement: [...]

This means the code you wrote doesn't follow the Python syntax. It can happen for various reasons:

missing parenthesis, bracket or quote
assignment issues
misspelling keywords

If you can't find the reason for the syntax error, check out the following article on fixing the most common Python syntax errors.

IndentationError: unexpected indent at statement: [...]

An indent is a specific number of spaces or tabs denoting that a line of code is part of a particular code block.

This error means one of your blocks of code is wrongly indented. Python is unusual in that whitespace is significant. Every statement in your main code block must start without any leading space, and every statement in a sub block (for example in an if block) must be indented with the same number of spaces.

To avoid this error, use the Tab key to indent your code consistently and the Shift+Tab key to outdent it.

KeyError: 'key'

This can happen when:

You try to access a key that doesn't exist in a dict (for example, a field name that doesn't exist in data - remember that Python is case-sensitive)
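You try to access an element in an array at an index that doesn't exist (for example you try to access items[10] but the items array only has 4 elements). In standard Python this case is reported as IndexError: list index out of range rather than KeyError.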

There are several ways to fix this error

if you are dealing with an optional field, use the data.get("field_name") method instead of data["field_name"]
If you want to test if a field is present before working on it, you can use:

if "field_name" in data:
    # ... do something with data["field_name"]

If you are working with array indexes, you can check the index is valid with:

if index < len(my_array):
    # ... do something with my_array[index]

NameError: name '...' is not defined

This means that the built-in, method or module you are trying to use is not available. Check out the lists of available functions and modules earlier on this page.

Invalid return, Error: <error_message>

This means the post process data you are returning is invalid. For a return to be valid, it must be serializable in JSON.

For example, you cannot return a Python datetime object, because JSON doesn't have a datetime format. You need to convert it to a string first, for example using the format_datetime() method.
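The sketch below reproduces the problem locally with the standard json module. It is an illustration only: json is imported here to run outside Parseur, and the created field is hypothetical.

```python
import datetime
import json  # used here only to reproduce the check locally

# Hypothetical result containing a datetime object
data = {"created": datetime.datetime(2020, 11, 1, 13, 0)}

# A datetime object cannot be serialized in JSON
try:
    json.dumps(data)
    serializable_before = True
except TypeError:
    serializable_before = False

# Converting it to a string first makes the result valid
data["created"] = data["created"].strftime("%Y-%m-%d %H:%M")
serialized = json.dumps(data)
```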
