Working with email attachments
How to extract text from documents attached to emails
Updated over a week ago

Parseur can extract text from most text-based documents attached to emails, such as PDF, CSV, and Microsoft Word (.docx) files. See the complete list of supported documents.

Note: The rest of this article assumes you are familiar with Parseur basics. Click here to get started if not.

How to extract text and parse email attachments?

When choosing the type of mailbox, make sure to choose emails and attachments.

When you send an email with attachments, Parseur creates a separate document for each attachment. You can then create a template for those attachments as you would for any other document in Parseur.

How to access the attachment in its original (binary) format?

The Attachments metadata field lets you download attachments in their original format.

The Attachments field is not enabled by default. Enable the Attachments field in your Parseur mailbox options, under the Fields > Metadata Fields section:

Check attachments metadata field

Note: To add the Attachments field to already processed documents, reprocess those documents after enabling the field.

From now on, every document you process will contain a new Attachments entry. This entry is a list of objects, and each attachment object has 6 properties:

  • name: The name of the file

  • url: A public link to download the file content. Warning: anyone who has the link can access the file directly, without needing a password.

  • content_type: The type of content in MIME format. Parseur supports all content types

  • size: The size of the attachment, in bytes. Note that the total size of all inbound attachment files in an email may not exceed 35 MB (your email provider probably limits attachment size too).

  • content_id: A unique string of letters and digits identifying the attachment. It can appear in the email body to refer to the attachment, typically to embed an image inside the body.

  • is_inline: A boolean, true or false. True if the attachment actually appears in the email body (typically an embedded image), false otherwise.

From a technical standpoint, the list of attachments is represented as a JSON array. If you want to manage attachments with a custom webhook, the result looks like this:

{
  "Attachments": [
    {
      "name": "report.pdf",
      "url": "https://api.parseur.com/attachment/.../report.pdf",
      "content_type": "application/pdf",
      "size": 250000,
      "content_id": "ii_l10j8nkc1",
      "is_inline": false
    },
    {
      "name": "presentation.ppt",
      "url": "https://api.parseur.com/attachment/.../presentation.ppt",
      "content_type": "application/vnd.ms-powerpoint",
      "size": 123000,
      "content_id": "f_l10j861y0",
      "is_inline": false
    }
  ]
}
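
As an illustration of how a custom webhook receiver might consume this payload, here is a minimal Python sketch. It assumes the payload has already been decoded from JSON into a dictionary shaped like the example above. The helper names (regular_attachments, total_size, download) and the third, inline sample entry are hypothetical and only there for illustration; they are not part of Parseur.

```python
import os
import shutil
import urllib.request
from typing import Any

# Sample payload shaped like the example above. The third entry is a
# hypothetical inline image, added only to illustrate the is_inline flag.
SAMPLE_PAYLOAD: dict[str, Any] = {
    "Attachments": [
        {
            "name": "report.pdf",
            "url": "https://api.parseur.com/attachment/.../report.pdf",
            "content_type": "application/pdf",
            "size": 250000,
            "content_id": "ii_l10j8nkc1",
            "is_inline": False,
        },
        {
            "name": "presentation.ppt",
            "url": "https://api.parseur.com/attachment/.../presentation.ppt",
            "content_type": "application/vnd.ms-powerpoint",
            "size": 123000,
            "content_id": "f_l10j861y0",
            "is_inline": False,
        },
        {
            "name": "logo.png",
            "url": "https://api.parseur.com/attachment/.../logo.png",
            "content_type": "image/png",
            "size": 4000,
            "content_id": "ii_example",
            "is_inline": True,
        },
    ]
}


def regular_attachments(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the attachments that are not embedded in the email body."""
    attachments = payload.get("Attachments") or []
    return [a for a in attachments if not a.get("is_inline", False)]


def total_size(attachments: list[dict[str, Any]]) -> int:
    """Sum the size property (in bytes) of the given attachments."""
    return sum(int(a.get("size", 0)) for a in attachments)


def download(attachment: dict[str, Any], directory: str = ".") -> str:
    """Save one attachment to disk and return the path it was written to."""
    # basename() guards against a file name that contains path separators.
    target = os.path.join(directory, os.path.basename(attachment["name"]))
    with urllib.request.urlopen(attachment["url"]) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target


files = regular_attachments(SAMPLE_PAYLOAD)
print([a["name"] for a in files])  # ['report.pdf', 'presentation.ppt']
print(total_size(files))           # 373000
```

The download helper is not called here because the sample links are elided placeholders. With a real payload, you would call download(attachment) for each entry returned by regular_attachments.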

How to upload original attachments to your cloud storage or app?

Once the Attachments metadata field is enabled (see the previous section), you can use the attachment URL with any Zapier connector that supports files (such as Google Drive, Dropbox, etc.).

To do so, map the attachment URL to the file field in your Zap.

Zapier will download the attachment and upload it to your favorite app.

How to keep the relationship between emails and attachments?

Sometimes you need to extract text from both the email and its attachments and you want to make a link between those two sets of parsed data.

While Parseur processes every email and attachment document independently, it remembers which email each attachment belongs to. You can expose this link using the DocumentID and ParentID metadata fields.

An attachment's ParentID is the same as the DocumentID of the email it was attached to.

To enable the DocumentID and ParentID metadata fields:

  • Open your Parseur mailbox

  • Click on the Fields section

  • Check the DocumentID and ParentID fields in the Metadata panel

Check out the following article to learn more about using Metadata Fields in Parseur.
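
Once both fields are enabled, the relationship described above (an attachment's ParentID equals its email's DocumentID) can be used to regroup parsed results on your side. The Python sketch below is one possible approach, not an official Parseur recipe. It assumes each parsed document arrives as a dictionary containing DocumentID and ParentID keys, and that an email document has an empty or missing ParentID. The group_by_email helper, the sample IDs, and the other field names are hypothetical.

```python
from collections import defaultdict
from typing import Any


def group_by_email(documents: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Group parsed documents into emails and their attachment documents.

    Assumption: a document with an empty or missing ParentID is an email;
    any other document is an attachment of the email whose DocumentID
    matches its ParentID.
    """
    emails: dict[str, dict[str, Any]] = {}
    children: dict[str, list[dict[str, Any]]] = defaultdict(list)

    for doc in documents:
        parent_id = doc.get("ParentID")
        if parent_id:
            children[parent_id].append(doc)
        else:
            emails[doc["DocumentID"]] = doc

    return {
        doc_id: {"email": email, "attachments": children.get(doc_id, [])}
        for doc_id, email in emails.items()
    }


# Hypothetical parsed results: one email and the two documents attached to it.
parsed = [
    {"DocumentID": "email-1", "ParentID": None, "subject": "March invoice"},
    {"DocumentID": "att-1", "ParentID": "email-1", "invoice_total": "120.00"},
    {"DocumentID": "att-2", "ParentID": "email-1", "invoice_total": "80.00"},
]

grouped = group_by_email(parsed)
print(len(grouped["email-1"]["attachments"]))  # 2
```

Because attachment documents can reach your system before or after their email, the sketch collects everything first and joins at the end rather than relying on arrival order.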

How to disable attachment parsing?

By default, when you create a new mailbox, Parseur will also parse every email attachment.

If you would like to disable attachment parsing, go to your mailbox settings and uncheck the attachment parsing box.
