Parseur can extract text from most text-based documents, such as PDF, CSV, and Microsoft Word (docx) files.
Note: The rest of this article assumes you are familiar with Parseur basics. If you are not, start with the getting-started guide.
How to extract text and parse email attachments?
When choosing the type of mailbox, make sure to choose emails and attachments.
When you send an email with attachments, Parseur creates a separate document for each attachment. You can then create a template for those attachments, as you would for any document in Parseur.
What are the best practices to extract text from PDFs?
There are a few things you should know about creating templates for PDFs. Check out this article for more information.
How to parse and consolidate CSV and Excel attachments?
Parseur can automatically combine CSV and Excel files sent by email, without you having to create a template. Parseur combines the files based on their column headers.
All you have to do is send your spreadsheets as email attachments to your Parseur mailbox. Parseur stores the parsed result in the "Sheet" table field.
Because the result is in a table field, make sure to use the table field download option in the Export section or the "New Table Processed" trigger in Zapier. See the end of this article for more information about exporting table field data.
Note: If you don't want to use Parseur's default parsing method for CSVs, you can create your own template, which takes priority over the default parsing.
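To make "combining files based on their column headers" more concrete, here is a minimal Python sketch. It is not Parseur's implementation, which this article does not describe. The function name `combine_by_headers`, the sample data, and the choice to fill missing columns with empty strings are all assumptions made for illustration.

```python
import csv
import io


def combine_by_headers(csv_texts):
    """Merge several CSV documents into one list of rows keyed by header.

    Illustrative only: columns are matched by header name, and a row that
    lacks a column gets an empty string for it.
    """
    headers = []  # union of all headers, in first-seen order
    rows = []
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        for name in reader.fieldnames or []:
            if name not in headers:
                headers.append(name)
        rows.extend(reader)
    # Fill in the columns that some files did not have
    return headers, [{h: row.get(h, "") for h in headers} for row in rows]


# Hypothetical sample spreadsheets with overlapping columns in a different order
FIRST_CSV = "sku,qty\nA1,2\nB2,5\n"
SECOND_CSV = "qty,sku,price\n1,C3,9.99\n"

if __name__ == "__main__":
    headers, rows = combine_by_headers([FIRST_CSV, SECOND_CSV])
    print(headers)
    for row in rows:
        print(row)
```

Matching by header name rather than by column position means the two files can list their columns in different orders and still line up.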
How to access the attachment in its original (binary) format?
The Attachments Extra Field lets you download attachments in their original format.
The Attachments field is not enabled by default. Enable it in your Parseur mailbox options, under the Fields > Extra Fields section.
Note: To add the Attachments field to already processed documents, reprocess those documents after enabling the field.
From now on, all documents that you process will contain a new Attachments entry. The Attachments entry is a list of objects; each attachment object has four properties:
- name: The name of the file
- url: A public link to download the file content (warning: anyone who has the link can access the file directly, without a password)
- content_type: The type of content in MIME format. Parseur supports all content types.
- size: The size of the attachment, in bytes. Parseur can store files up to 35 MB in size. Note that your email provider probably limits the attachment size too.
From a technical standpoint, the list of attachments is represented as a JSON array. This is the form you receive if you manage attachments with a custom webhook.
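As a rough illustration of handling that JSON array in a custom webhook, the Python sketch below reads the four properties listed above from each attachment object. The sample payload is hypothetical: it assumes the list sits under an `Attachments` key, and the file names, sizes, and example.com URLs are placeholders rather than real Parseur output.

```python
import json

# Hypothetical webhook body; only the four property names come from this article
SAMPLE_PAYLOAD = """
{
  "Attachments": [
    {
      "name": "invoice.pdf",
      "url": "https://example.com/files/invoice.pdf",
      "content_type": "application/pdf",
      "size": 48213
    },
    {
      "name": "orders.csv",
      "url": "https://example.com/files/orders.csv",
      "content_type": "text/csv",
      "size": 1024
    }
  ]
}
"""


def list_attachments(payload_text):
    """Return a (name, url, content_type, size) tuple for each attachment."""
    payload = json.loads(payload_text)
    attachments = []
    # A document without attachments is assumed to have no list or an empty one
    for item in payload.get("Attachments", []):
        attachments.append(
            (item["name"], item["url"], item["content_type"], item["size"])
        )
    return attachments


if __name__ == "__main__":
    for name, url, content_type, size in list_attachments(SAMPLE_PAYLOAD):
        print(f"{name} ({content_type}, {size} bytes): {url}")
```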
How to upload original attachments to your cloud storage or app?
Once the Attachments extra field is enabled (see previous section), you can use the URL with any Zapier connector that supports files (such as Google Drive, Dropbox, etc.).
To do so, map the attachment URL to the file field in your Zap.
Zapier will download the attachment and upload it into your favorite app.
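If you are not using Zapier, the same download step can be done with any HTTP client, since the attachment URL is a plain public link. The Python sketch below is one possible approach using only the standard library. The function name `download_attachment` and the destination folder are hypothetical, and uploading the saved file to your storage service is left out because it depends on that service's own API.

```python
import shutil
import urllib.request
from pathlib import Path


def download_attachment(attachment, dest_dir):
    """Save one attachment object (with "name" and "url" keys) into dest_dir."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Keep only the final path component so a crafted name cannot escape dest_dir
    target = dest_dir / Path(attachment["name"]).name
    with urllib.request.urlopen(attachment["url"]) as response:
        with open(target, "wb") as out:
            # Stream the body to disk instead of loading it all into memory
            shutil.copyfileobj(response, out)
    return target
```

A call would look like `download_attachment(attachment, "downloads")`, where `attachment` is one object from the Attachments list.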
How to keep the relationship between emails and attachments?
Sometimes you need to extract text from both the email and its attachments, and you want to link those two sets of parsed data.
Parseur processes every email and attachment document independently, but it remembers which email each attachment belongs to. You can expose that link using Parseur's DocumentID and ParentID Extra Fields.
An attachment's ParentID is the same as the DocumentID of the email it was attached to.
To enable the DocumentID and ParentID extra fields:
- Open your Parseur mailbox
- Click on the Fields section
- Scroll down to the Extra Fields panel
- Check the DocumentID and ParentID fields
Check out the following article to learn more about using Extra Fields in Parseur.
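Once both fields are enabled, the parsed results can be regrouped on your side. The Python sketch below shows one way to do it, matching each attachment's ParentID to an email's DocumentID. The sample documents, the `subject` and `total` fields, and the assumption that an email document has no ParentID value are all hypothetical.

```python
from collections import defaultdict

# Hypothetical parsed results; only DocumentID and ParentID come from this article
SAMPLE_DOCUMENTS = [
    {"DocumentID": "email-1", "ParentID": None, "subject": "March invoices"},
    {"DocumentID": "att-1", "ParentID": "email-1", "total": "120.00"},
    {"DocumentID": "att-2", "ParentID": "email-1", "total": "75.50"},
]


def group_attachments_by_email(documents):
    """Map each email DocumentID to (email, list of its attachment documents)."""
    emails = {}
    children = defaultdict(list)
    for doc in documents:
        parent_id = doc.get("ParentID")
        if parent_id:
            # Attachment document: file it under the email it came from
            children[parent_id].append(doc)
        else:
            # Assumption: a document without a ParentID is the email itself
            emails[doc["DocumentID"]] = doc
    return {
        doc_id: (email, children.get(doc_id, []))
        for doc_id, email in emails.items()
    }


if __name__ == "__main__":
    for doc_id, (email, attachments) in group_attachments_by_email(SAMPLE_DOCUMENTS).items():
        print(doc_id, "has", len(attachments), "attachment document(s)")
```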
How to disable attachment parsing?
By default, when you create a new mailbox, Parseur will also parse every email attachment.
If you would like to disable attachment parsing, go to your mailbox settings and uncheck the attachment parsing box.
List of all document formats supported in Parseur
Parseur can extract text from most attachments, as long as they are in a text format.
Here is the list of supported document formats that you can extract text from:
- abw: AbiWord Document
- csv: Comma-Separated Values
- djvu: DjVu Document
- doc: Microsoft Word
- docm: Microsoft Office Open XML with Macros Enabled
- docx: Microsoft Office Open XML
- html: HTML Document
- htm: HTML Document
- lwp: Lotus Word Pro
- md: Markdown Documentation File
- odt: ODF Text Document
- pages: Pages Document
- pages.zip: Zipped Pages Document
- pdf: Portable Document Format (text-based only, not scanned)
- rst: reStructuredText
- rtf: Rich Text Format
- sdw: StarWriter 5.0
- tex: LaTeX Source Document
- txt: Text Document
- wpd: WordPerfect Document
- wps: Microsoft Works Document
- xls: Microsoft Excel Document
- xlsx: Microsoft Excel Document Open XML
- xlsm: Microsoft Excel Document Open XML with Macros Enabled
- zabw: Compressed AbiWord Document
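If you want to filter files before sending them to your mailbox, the list above can be turned into a simple extension check. The Python sketch below is an illustration only: the function name `is_supported` is hypothetical, and a matching extension does not guarantee that the content is text-based (a scanned PDF would still pass this check).

```python
# Extensions copied from the list above
SUPPORTED_EXTENSIONS = {
    "abw", "csv", "djvu", "doc", "docm", "docx", "html", "htm", "lwp",
    "md", "odt", "pages", "pages.zip", "pdf", "rst", "rtf", "sdw", "tex",
    "txt", "wpd", "wps", "xls", "xlsx", "xlsm", "zabw",
}


def is_supported(filename):
    """Return True if the file name ends with one of the listed extensions."""
    lower = filename.lower()
    # endswith handles the two-part "pages.zip" extension as well
    return any(lower.endswith("." + ext) for ext in SUPPORTED_EXTENSIONS)


if __name__ == "__main__":
    for name in ["Report.PDF", "notes.pages.zip", "archive.zip", "photo.png"]:
        print(name, is_supported(name))
```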