This article describes how to use Parseur to parse a webpage from a link in an email or document. The steps differ slightly depending on whether you are using the AI engine or the template parsing engine.

AI Engine (Recommended):

Step 1: Create a Parseur mailbox

If you haven't already done so, create a mailbox and upload your first document.

Check out this page if you're unsure about how to get started.

Step 2: Create a Linked Document field to capture the email link

Once your email is in Parseur, create a new field under the Fields tab, using the Linked Document output format, with some instructions if necessary:

Once this is done, you can reprocess the document and the Linked Document field will pull the webpage across as a new document.

Step 3: Create fields for the fetched webpage

With the webpage pulled into the system as a new document, we can now create new fields, giving the AI some instructions if necessary to extract your data:

Step 4: Automate the process and profit

You can now set up automatic forwarding of your emails to Parseur, so that each linked webpage is scraped and parsed by our system.

Template Engine:

Step 1: Create a Parseur mailbox

If you haven't already done so, create a mailbox and upload your first document.

Check out this page if you're unsure about how to get started.

Step 2: Create a template to capture the email link

Once your email is in Parseur, create your template like any other template, which is as easy as Point & Click. If necessary, refer to this page for more information about creating templates.

Here, the only information we are interested in is the link, so let's select it and create a new field.

Note that if the URL you want to capture is inside an HTML link, you can switch to the Source (advanced) view to create your field and select only the URL inside the href attribute of the HTML link.
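
To illustrate what "the URL inside the href attribute" refers to, here is a minimal Python sketch that pulls href values out of an HTML snippet using the standard library. It is not how Parseur works internally; the names HrefCollector and extract_hrefs and the sample email body are hypothetical and only serve the example.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href value of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_hrefs(html):
    """Return the href URLs found in an HTML string, in document order."""
    collector = HrefCollector()
    collector.feed(html)
    return collector.hrefs


# Hypothetical email body: the visible text is "View report",
# but the part to capture is the URL stored in href.
sample = '<p>Your report is ready: <a href="https://example.com/report/123">View report</a></p>'
print(extract_hrefs(sample))
```

In the Source (advanced) view, the equivalent selection is the text between the quotes of href, not the visible link label.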

Now click the edit button to the right of the field name, and change the format to "Linked Document".

Click Save field, then Create, to save the template.

Two things are going to happen:

Your email is going to be parsed and the link will be extracted.
After a few seconds, you'll see the newly downloaded web page appear in the document queue.

Step 3: Create a template for the fetched web page

Now create a template for the web page by clicking the + (plus) button.

Click Create and...

Step 4: Watch Parseur parse a web page and profit!

Creating the template will parse your document and extract the relevant data.

Now, every time you send a similar email with a link, the web page will be fetched and, if it matches one of your existing templates, data from the web page will be parsed and extracted automatically.

Closing remarks

Parseur is not limited to extracting links from emails. Any field in a template with the format "Linked Document" will be used to download documents and extract data. That means that you can fetch web pages from email attachments as well as from other web pages!
Parseur charges you for the number of successfully processed documents, which means that fetching a web page from an email link and parsing that web page will count as 2 credits: one for the email and one for the web page. If all you need from the original email is the link and nothing else, you can set that template as a Skip template: Skip templates don't consume credits and don't trigger an export, but they will still download the document behind the link.
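
As a rough illustration of the credit arithmetic described above, here is a small Python sketch. It assumes every email links to exactly one web page and that every document is processed successfully; the function name credits_used and its parameters are hypothetical, not part of any Parseur API.

```python
def credits_used(emails, skip_email_template=False):
    """Estimate credits for a batch of emails that each link to one web page.

    Assumes every email and every fetched page is processed successfully.
    """
    # With a Skip template, the original email does not consume a credit.
    email_credits = 0 if skip_email_template else emails
    # Each fetched web page is parsed as its own document.
    page_credits = emails
    return email_credits + page_credits


print(credits_used(100))                            # email + page for each
print(credits_used(100, skip_email_template=True))  # page only
```

Under these assumptions, 100 emails use 200 credits with a regular template on the email, and 100 credits when the email template is a Skip template.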

Known limitations of the Linked Document feature:

Parseur can extract links only from HTML and plain text documents, not from PDF documents.
The document behind the URL needs to be publicly accessible without needing to enter a login and password to view it.
The webpage behind the URL cannot be a "Single Page Application" (i.e. a page where content is dynamically downloaded using JavaScript after the page has first loaded).
If you don't see the downloaded document in your document list, check the logs of the original document to get more information about why it couldn't be downloaded. Contact us if you have any questions.

I found an error in the document logs saying 'Recursive download counter exceeded a safe threshold'. What does this mean?

This means Parseur has reached a depth of 3 downloads from the original document and stopped there, to prevent an infinite loop of downloads. If this happens, modify your field so that it captures a URL that does not link back to itself.
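
To make the idea of a download depth limit concrete, here is a minimal Python sketch of a depth-limited link follower. It is only an illustration of the general technique, not Parseur's implementation: the limit of 3 comes from the explanation above, while follow_links, RecursiveDownloadError, the in-memory links mapping, and the exact way downloads are counted are all assumptions made for the example.

```python
MAX_DEPTH = 3  # depth mentioned in the article; the rest is illustrative


class RecursiveDownloadError(Exception):
    """Raised when a chain of linked documents grows past the allowed depth."""


def follow_links(start, links, max_depth=MAX_DEPTH):
    """Follow a chain of linked documents and return the downloaded ones.

    links maps a document name to the document that its Linked Document
    field points at. A document without an entry ends the chain.
    """
    downloaded = []
    current = start
    while current in links:
        if len(downloaded) >= max_depth:
            # Stop instead of looping forever on a self-referencing link.
            raise RecursiveDownloadError(
                "Recursive download counter exceeded a safe threshold"
            )
        current = links[current]
        downloaded.append(current)
    return downloaded


# An email linking to a page that links nowhere else: one download.
print(follow_links("email", {"email": "page"}))

# A page whose captured URL points back to itself never ends on its own,
# so the depth limit is what stops it.
try:
    follow_links("email", {"email": "page", "page": "page"})
except RecursiveDownloadError as error:
    print(error)
```

The second call shows why a field that captures a self-referencing URL triggers the error: the chain never ends, so only the depth limit stops it.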

