In this article we'll see how to set up Parseur to extract tables from emails and documents and parse result into structured data.

Extracting tables from emails is very useful, for example, when you want to parse shopping orders coming in your mailbox. There is typically varying number of items in each email and you don't want to create a template for each different number of ordered items. For each ordered item, you want to extract several data points such as item name, quantity and price.

Note: this article assumes you're already familiar with Parseur basics. If not, click here to get started.

Prefer a video?

A typical example: an amazon order

For the purpose of this article, let's assume we want to extract details about book orders on Amazon. Whenever we order on Amazon, Amazon sends a confirmation email containing order detail. An Amazon confirmation email looks like this:

We want to create a template that can extract order details, including for each item the book title and the book price. We want to create a single template that will extract all data, irrespective of the number of books ordered.

Here is what you get after processing this email in Parseur:

  • Reference: 123-4567890-1234567 
  • Date: Sunday, October 7 
  • Customer Address: Jane Doe FAIRVIEW HEIGHTS VAN HOUTEN HILL LONDON, ZZ9 ABC United Kingdom 
  • Items[0].Title: Fear and Loathing in Las Vegas 
  • Items[0].Price: £12.99 
  • Items[1].Title: The Tempest 
  • Items[1].Price: £9.99 
  • Items[2].Title: The Hitchhiker's Guide to the Galaxy 
  • Items[2].Price: £14.99 Total £37.97 

And here is the same result, viewed in JSON format:

Step 1: Create a table field

  • In the template editor, select the whole table you want to capture.
  • Click New Table Field to create a new field for that selection.

The table option panel opens on the right-hand side and the table gets highlighted in green.

Note: for best results, make sure the document you use for the template contains at least 2 rows (i.e. in our example, 2 books).

Step 2: Set table name and options

Use the Table option panel on the right-hand side to customize your field:

  • Name: Enter a name for the table field
  • Output format: leave as is ("Table")
  • Input format: In most cases, leave as is ("Vertical HTML Table"). Parseur will assume your table rows are laid out vertically, top to bottom.If your table rows are laid out horizontally from left to right, select "Horizontal HTML Table". If your table is Text only, choose "Vertical Text Table" (or "Horizontal Text Table" respectively).
  • Table Header: If the first row of your table selection contains the column names and Parseur was able to correctly identify columns in the result preview, tick the box "Table headers included in selection". Otherwise, go to Step 3 below to teach Parseur how to identify your columns.

Step 3: Create table columns

At the bottom of your document, Parseur shows you a live preview of the table parsing.

You have several ways to customize Parseur table parsing to your needs.

Option 1: Use default parsing

If the table preview shows the data as you want it, great!

To name the columns:

  • Click on the column name
  • Select the column name from the list or type a new one

Option 2: Customize table parsing and columns

If Parseur doesn't get the table parsed the way you want it, you can improve the parsing result by telling Parseur which data you want to extract.

In this example, we are going to create column fields to capture the information we want to extract. And then, we're going to point those columns across multiple rows until the preview result is what we want.

To do this:

  1. Select a piece of text you want to extract on the first row of the table
  2. Click New Table Column
  3. Enter a name for that new column and hit Enter (or click the tick button)

Repeat steps 1. to 3. for each column field you want to extract.

Now we need to teach Parseur how to split rows:

  • Select similar fields on the following rows and assign it to the existing field.
  • Do this until the result preview suits what you need.

When the table preview at the bottom of the document shows the desired result you are done!

Advanced: Further customize table parsing with separators

If you need to do some advanced table parsing, you can set your own table row and cell separators.

Parseur uses regular expressions to split the table into rows and the rows into cells.

You will find the regular expressions generated by Parseur in the Advanced Options panel, under Row Separators and Cell Separators. Tick the Manually set separators box to customize the separators.

The separators are regular expressions that Parseur will use to slice the table into cells.

Examples of using separators:

  • To slice the table into rows based on the HTML </tr> attribute, enter </tr> in row separator
  • To slice the table into rows based on a new line, enter \n in row separator
  • To slice the rows into cells based on either </th> or </td>, enter </th>|</td> in cell separator (the pipe character "|" means "or" in regexps)
  • To slice the rows into cells for every 2 or more spaces (common for text-based tables), enter \s{2,} in cell separator (\s means any space character and {2,} means previous pattern needs to be matched 2 or more times)

You can get very far with regular expressions (including using lookaheads and lookbehinds). Check out the following link for the full reference: https://docs.python.org/3.6/library/re.html

Step 4 (optional): Filter table rows

Parseur can filter out unwanted rows. For example you may want to keep rows that only match a specific value in the certain column. You can do that by setting constraints on column fields.

To filter table rows:

  1. Click on the edit button of the column field you want to set a constraint on
  2. Open the Advanced options panel
  3. In the constraint option, enter what the column needs to match in order to be kept. The constraint input accepts regular expressions.
  4. To setup multi-column constraint, save this field and edit other columns constraints.

Examples:

  • To filter on rows with John Smith in the name column, enter John Smith in the constraint input of the name column.
  • To filter on rows where column ID ends with ABC, enter ABC$ in the constraint input of the ID column

Note: By default, if the filter returns no row, Parseur will set the document to "New Template Needed". If you want to allow tables with empty rows, check the Allow empty tables box when editing the table field.

Step 5: Save your changes

Click the Save Field button when you're done editing the table field.

Then, continue editing the template adding new fields and finally click Create Template when completed.

Step 6: Export your table data

You have several strategies available when it comes to exporting your table data. This section lists the different strategies depending on the type of export.

Download and Google Sheet

There are 2 options to export your table data:

  • The default export keeps the "1 document per sheet row" convention: table field rows will be added as columns.
  • The table export uses the "1 table field item per sheet row" convention: each table field item will be on its own sheet row and other fields extracted from the document will be repeated across rows.

Zapier

Send the table directly through Zapier, each row in this table will turn into a row (or item) in the target application (for example, a row in Google Spreadsheet). Non-table fields in the document are repeated for each row. This is the trigger you will most likely need.

There are other possible triggers you can use for tables with Zapier. Check out this article for more information about using Zapier.

Webhooks

There are 3 options to export your table fields via webhooks:

  • Document processed: this is the default option. JSON payload is a deep key/value object with table fields values sent as an array (equivalent to the JSON screen capture at the beginning of this article)
  • Document processed (flattened): JSON payload is a flat key/value object with keys named as in the default table export (e.g. "Items[1].Title")
  • Table item processed: JSON payload is an array of table item objects where other fields are repeated across table items (similar to the table export for Google Sheets).

Check out this article for more information about using Webhooks.

Microsoft Flow

Check out this article for more information about using MS Flow.

Did this answer your question?