Working with email attachments

How to extract text from documents attached to emails


Parseur can extract text from most text-based documents attached to emails, such as PDF, CSV, and Microsoft Word (.docx) files. See the complete list of supported documents.

Note: The rest of this article assumes you are familiar with Parseur basics. If not, click here to get started.

How to extract text and parse email attachments?

When you send an email with attachments, Parseur creates a separate document for each attachment in a supported format by default. You can then create a template for those attachments as you would for any other document in Parseur.

How to access the attachment in its original (binary) format?

The Attachments Metadata Field allows downloading attachments in their original format.

The Attachments field is not enabled by default. Enable the Attachments field in your Parseur mailbox options, under the Fields > Metadata Fields section:

Check attachments metadata field

Note: To add the Attachments field to already processed documents, reprocess those documents after enabling the field.

From now on, all documents that you process will contain a new Attachments entry. The Attachments field is a list of objects, and each attachment object has 6 properties:

  • name: The name of the file

  • url: A public link to download the file content (Warning: anyone who has the link can access the file directly, without a password)

  • content_type: The type of content in MIME format. Parseur supports all content types

  • size: The size of the attachment, in bytes. Note that the combined size of all attachments in an inbound email may not exceed 35 MB (your email provider probably limits the attachment size too).

  • content_id: A unique string of letters and digits identifying the attachment. The email body can use it to refer to the attachment and embed it inline, typically for an image.

  • is_inline: A boolean, true or false. True if the attachment actually appears in the email body (for example, an embedded image). False otherwise.

From a technical standpoint, the list of attachments is represented as a JSON array. If you want to manage attachments with a custom webhook, the result looks like this:

{
  "Attachments": [
    {
      "name": "report.pdf",
      "url": "https://api.parseur.com/attachment/.../report.pdf",
      "content_type": "application/pdf",
      "size": 250000,
      "content_id": "ii_l10j8nkc1",
      "is_inline": false
    },
    {
      "name": "presentation.ppt",
      "url": "https://api.parseur.com/attachment/.../presentation.ppt",
      "content_type": "application/vnd.ms-powerpoint",
      "size": 123000,
      "content_id": "f_l10j861y0",
      "is_inline": false
    }
  ]
}
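If you receive this payload in your own webhook code, a minimal Python sketch for reading it could look like the following. It assumes the webhook body is a JSON object shaped like the example above. The function name downloadable_attachments and the second, inline attachment in the sample (logo.png and its content_id) are hypothetical and only illustrate the is_inline filter.

```python
import json


def downloadable_attachments(payload: str) -> list[dict]:
    """Return the non-inline attachments from a webhook JSON body."""
    data = json.loads(payload)
    # Assumption: the key may be missing when the Attachments field is not enabled.
    attachments = data.get("Attachments") or []
    # Skip images embedded in the email body; keep regular file attachments.
    return [a for a in attachments if not a.get("is_inline", False)]


# Sample body: the first entry mirrors the article, the second is hypothetical.
sample = """
{
  "Attachments": [
    {
      "name": "report.pdf",
      "url": "https://api.parseur.com/attachment/.../report.pdf",
      "content_type": "application/pdf",
      "size": 250000,
      "content_id": "ii_l10j8nkc1",
      "is_inline": false
    },
    {
      "name": "logo.png",
      "url": "https://api.parseur.com/attachment/.../logo.png",
      "content_type": "image/png",
      "size": 4000,
      "content_id": "hypothetical_inline_id",
      "is_inline": true
    }
  ]
}
"""

for attachment in downloadable_attachments(sample):
    # prints: report.pdf application/pdf 250000
    print(attachment["name"], attachment["content_type"], attachment["size"])
```

Filtering on is_inline is a design choice for this sketch: embedded images such as signature logos are usually not the files you want to store.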

How to upload original attachments to your cloud storage or app?

Once the Attachments metadata field is enabled (see the previous section), you can use the attachment url with any Zapier connector that supports files, such as Google Drive or Dropbox.

To do so, map the attachment URL to the file field in your Zap.

Zapier will download the attachment and upload it to your favorite app.
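If you handle attachments in your own code rather than through Zapier, the same url can be fetched directly. The sketch below is one possible approach using only the Python standard library. The function name download_attachment and the destination folder are hypothetical, and the attachment argument is assumed to be one object from the Attachments list.

```python
import shutil
import urllib.request
from pathlib import Path


def download_attachment(attachment: dict, dest_dir: Path) -> Path:
    """Save one attachment object into dest_dir and return the written path."""
    # Keep only the final path component so directory parts in the name are ignored.
    safe_name = Path(attachment["name"]).name
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / safe_name
    # The url is a public link, so no authentication is sent with the request.
    with urllib.request.urlopen(attachment["url"]) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

A call such as download_attachment(attachment, Path("downloads")) would write the file under a local downloads folder. From there you can pass it to the upload API of your storage provider.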

How to combine the parsed result of an email with those of its attachments?

Sometimes you need to extract text from both the email and its attachments, and you want to link those two sets of parsed data.

While Parseur processes every email and attachment document independently, it remembers the email every attachment belongs to, and you can expose the link using the Parseur DocumentID and ParentID metadata fields.

An attachment's ParentID is the same as the DocumentID of the email it was attached to.

To enable the DocumentID and ParentID metadata fields:

  • Open your Parseur mailbox

  • Click on the Fields section

  • Check the DocumentID and ParentID fields in the Metadata panel
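Once both fields are enabled, the parsed results can be regrouped on your side. The following Python sketch shows one way to do it, assuming each parsed result arrives as a dictionary with DocumentID and ParentID keys and that ParentID is empty or missing for the email itself. That last point, the function name group_by_email, and the sample field names (subject, total) and ID values are assumptions for illustration, not confirmed Parseur behavior.

```python
from collections import defaultdict


def group_by_email(results: list[dict]) -> dict[str, dict]:
    """Group results as {email DocumentID: {"email": ..., "attachments": [...]}}."""
    grouped = defaultdict(lambda: {"email": None, "attachments": []})
    for result in results:
        parent_id = result.get("ParentID")
        if parent_id:
            # Attachment: file it under the DocumentID of its parent email.
            grouped[parent_id]["attachments"].append(result)
        else:
            # Assumption: a result without a ParentID is the email itself.
            grouped[result["DocumentID"]]["email"] = result
    return dict(grouped)


# Hypothetical parsed results for one email and its two attachments.
results = [
    {"DocumentID": "doc-1", "ParentID": None, "subject": "Invoice March"},
    {"DocumentID": "doc-2", "ParentID": "doc-1", "total": "120.00"},
    {"DocumentID": "doc-3", "ParentID": "doc-1", "total": "80.00"},
]

merged = group_by_email(results)
print(len(merged["doc-1"]["attachments"]))  # prints: 2
```

Because every document is processed independently, an attachment result may arrive before its email. Grouping by ParentID as above works regardless of arrival order.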

Check out the following article to learn more about using Metadata Fields in Parseur.

How to disable attachment parsing or enable attachment-only parsing?

By default, when you create a new mailbox, Parseur parses every email and every attachment, and creates a separate document for each.

You can tell Parseur to only process emails or only process attachments in your mailbox settings, under the Processing tab. Check out the Mailbox options and settings article for more information.
