Send documents to Parseur using the API

How to send text documents directly to Parseur via its API


This article describes how to send documents to Parseur via its native REST HTTPS API.

Parseur API Authentication

The Parseur API uses token-based authentication.

You will find your API Token Key in your Account Overview.

For clients to authenticate, the token key should be included in the Authorization HTTP header. The key should be prefixed by the string literal "Token", with white space separating the two strings. For example:

Authorization: Token 1234d45678c90bcf1234fe123ddae4aabbc6abcd


Unauthenticated requests that are denied permission result in an HTTP 401 Unauthorized response with an appropriate WWW-Authenticate header. For example:

WWW-Authenticate: Token

The curl command-line tool may be useful for testing token-authenticated APIs. For example:

curl -X GET https://api.parseur.com/ -H "Authorization: Token <enter-your-token>"
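You can build the same header from your own code. Here is a minimal Python sketch using only the standard library. The helper name build_authenticated_request is illustrative and not part of any Parseur SDK. It prepares the request without sending it:

```python
import urllib.request

API_ROOT = "https://api.parseur.com/"


def build_authenticated_request(token: str, url: str = API_ROOT) -> urllib.request.Request:
    """Prepare a GET request carrying Parseur's token authentication header.

    Hypothetical helper for illustration; it does not send anything.
    """
    # The header value is the literal "Token", one space, then the API token key.
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Token {token}"},
        method="GET",
    )


if __name__ == "__main__":
    req = build_authenticated_request("1234d45678c90bcf1234fe123ddae4aabbc6abcd")
    print(req.get_header("Authorization"))
    # Passing req to urllib.request.urlopen() would actually send it.
```

Passing the request to urllib.request.urlopen sends it. An authentication failure then surfaces as a urllib.error.HTTPError carrying the response status code.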

Picking a way to send documents to Parseur

The Parseur API offers two endpoints to send new documents:

  • /parser/:id/upload, which is used to upload binary files (PDF, DOC, EML, ...) to the application. This endpoint is recommended if you want to send one or more arbitrary files using a programming library or the curl command.

  • /email, which is mostly used for documents originating from emails (most often HTML or text documents). This endpoint lets you send the document's text content directly, along with email-related metadata and headers.

Sending binary documents to Parseur

You can send a document to one of your mailboxes by issuing a POST HTTP request to https://api.parseur.com/parser/:id/upload, replacing :id with the ID of the mailbox you want to send the document to.

Files are sent using the standard HTTP form-based file upload mechanism (multipart/form-data) described in RFC 1867: https://datatracker.ietf.org/doc/html/rfc1867

To upload a file from your disk to your mailbox, you need two things:

  • Your API token (found here: https://app.parseur.com/account )

  • The numerical ID of the mailbox you want to upload the file to. You can find this ID in your browser's URL bar when browsing your mailbox.

You can then upload a file using the following curl command, on Linux and Mac OS X systems:

curl \
-X POST \
https://api.parseur.com/parser/<Your mailbox ID>/upload \
-H "Authorization: Token <Your API token>" \
-F 'file=@/path/to/file.pdf' \
--http1.1

Notes:

  • Make sure to replace the <Your API token> and <Your mailbox ID> placeholders with your own values.

  • The --http1.1 flag is needed to work around an incompatibility between the curl shipped with Mac OS X and Parseur's servers.
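If you prefer to upload from a program rather than curl, here is a minimal Python sketch that builds the same multipart/form-data request with the standard library. It sends a single form part named file, mirroring curl's -F 'file=@...'. The helper name build_upload_request, the mailbox ID 12345 and the sample bytes are hypothetical placeholders, not part of Parseur's API:

```python
import mimetypes
import urllib.request
import uuid


def build_upload_request(mailbox_id: int, token: str, filename: str, content: bytes) -> urllib.request.Request:
    """Prepare a multipart/form-data POST to /parser/:id/upload (hypothetical helper)."""
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    # One form part named "file", like curl's -F 'file=@...'.
    # Assumes the filename contains no double quotes.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        f"https://api.parseur.com/parser/{mailbox_id}/upload",
        data=head + content + tail,
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values: use your own mailbox ID, token and real file bytes.
    req = build_upload_request(12345, "<Your API token>", "file.pdf", b"%PDF-1.4 example")
    print(req.full_url)
    # urllib.request.urlopen(req) would send it; urllib already speaks HTTP/1.1.
```

Python's urllib uses HTTP/1.1 by default, so there is no equivalent of the --http1.1 curl flag to set here.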

A successful upload returns the document ID, as in the example response below. If you want to match the data extracted from that upload with the upload itself at a later stage, save that ID and add the DocumentID metadata field to the parsed result.

{
  "message": "OK",
  "attachments": [
    {
      "name": "file.pdf",
      "DocumentID": "1e2e34cba5c678a9012f3e456c789a0f"
    }
  ]
}
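To keep the document IDs for later matching, you can read them from the response body. A minimal Python sketch, assuming the response has the shape shown above (the helper name document_ids is hypothetical):

```python
import json


def document_ids(response_body: str) -> dict[str, str]:
    """Map each uploaded file name to the DocumentID returned by /upload."""
    payload = json.loads(response_body)
    # Assumes the "attachments" list shown in the example response.
    return {item["name"]: item["DocumentID"] for item in payload.get("attachments", [])}


if __name__ == "__main__":
    sample = (
        '{"message": "OK", "attachments": '
        '[{"name": "file.pdf", "DocumentID": "1e2e34cba5c678a9012f3e456c789a0f"}]}'
    )
    print(document_ids(sample))  # {'file.pdf': '1e2e34cba5c678a9012f3e456c789a0f'}
```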

A successful 200 return code and document ID from the Upload API only confirm that Parseur received the document. They do not guarantee that the document will appear in Parseur: the ingestion pipeline is asynchronous and includes several processing steps, and a document that fails a later step may be rejected and never appear in your mailbox.

Check out the list of document formats supported by Parseur to learn more about the types of documents you can send.

Sending emails and text documents to Parseur

Once authenticated, you can send a new document for Parseur to process by issuing a POST request to https://api.parseur.com/email with a JSON payload like the following:

{
  "subject": "The title of your document, or email subject",
  "from": "Sender Name <[email protected]>",
  "recipient": "[email protected]",
  "body_html": "<html><body>Document content as HTML. This one has priority over text content if both are present.</body></html>",
  "body_plain": "Document content as text. This one is only used if body_html is empty.",
  "message_headers": [
    ["Standard-SMTP-Header", "Any usual email header goes here"],
    ["X-Envelope-From", "<[email protected]>"]
  ]
}


For example, the following curl command should create a new document in your [email protected] mailbox (attachments are omitted and message_headers is left empty to keep the example simple, as both are optional):

curl \
-X POST \
https://api.parseur.com/email \
-d '{
"subject": "The title of your document, or email subject",
"from": "Sender Name <[email protected]>",
"recipient": "[email protected]",
"body_html": "<html><body>Document content as HTML. This one has priority over text content if both are present.</body></html>",
"body_plain": "Document content as text. This one is only used if body_html is empty.",
"message_headers": []
}' \
-H "Content-Type: application/json" \
-H "Authorization: Token <enter-your-token-here>"

Note: Make sure to replace the recipient email address and the token with your own values.
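The same /email call can be prepared from Python with the standard library. This is a minimal sketch, not an official client: the helper name build_email_request and the example addresses are hypothetical placeholders, and the request is only built, not sent:

```python
import json
import urllib.request


def build_email_request(token, subject, sender, recipient,
                        body_html="", body_plain="", message_headers=None):
    """Prepare a POST to https://api.parseur.com/email (hypothetical helper)."""
    payload = {
        "subject": subject,
        "from": sender,
        "recipient": recipient,
        "body_html": body_html,    # takes priority over body_plain when both are set
        "body_plain": body_plain,  # only used if body_html is empty
        # List of [header name, header value] pairs; optional, so default to empty.
        "message_headers": message_headers or [],
    }
    return urllib.request.Request(
        "https://api.parseur.com/email",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_email_request(
        "<enter-your-token-here>",
        "The title of your document, or email subject",
        "Sender Name <sender@example.com>",   # placeholder sender
        "your-mailbox@example.com",           # placeholder: use your Parseur mailbox address
        body_plain="Document content as text.",
    )
    print(req.data.decode())
```

Using json.dumps guarantees valid JSON (double quotes, no trailing commas), which is easy to get wrong when the payload is written by hand.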

Adding To, CC and BCC fields

You can add To, CC and BCC information by adding the respective keys to your payload:

  • to

  • cc

  • bcc

Important: When using those keys, make sure the recipient's email address (your Parseur mailbox's email address) is present in at least one of those three fields.

Example:

{
  "subject": "The title of your document, or email subject",
  "from": "Sender Name <[email protected]>",
  "recipient": "[email protected]",
  "to": "[email protected], Another Name <[email protected]>",
  "cc": "[email protected]",
  "bcc": "[email protected]",
  [... rest of the request...]
}
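Before sending, you may want to check the rule above on your side. Here is a minimal Python sketch using email.utils.getaddresses to parse the to, cc and bcc fields. The helper name recipient_is_listed is hypothetical, and comparing addresses case-insensitively is an assumption of this sketch:

```python
from email.utils import getaddresses


def recipient_is_listed(payload: dict) -> bool:
    """Return True if payload["recipient"] appears in to, cc or bcc (when any is used)."""
    fields = [payload[key] for key in ("to", "cc", "bcc") if payload.get(key)]
    if not fields:
        # The rule only applies when at least one of to/cc/bcc is present.
        return True
    # Assumption: addresses are compared case-insensitively.
    addresses = {addr.lower() for _name, addr in getaddresses(fields) if addr}
    return payload["recipient"].strip().lower() in addresses


if __name__ == "__main__":
    payload = {
        "recipient": "inbox@in.example.com",  # placeholder mailbox address
        "to": "someone@example.com, Another Name <other@example.com>",
        "cc": "inbox@in.example.com",
    }
    print(recipient_is_listed(payload))  # True
```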

Passing custom parameters

In the query string

When sending a document, you can append custom parameters to the URL, also called arguments or the query string. For example:

curl \
-X POST \
"https://api.parseur.com/parser/<Your mailbox ID>/upload?user.name=John&user.age=42" \
-H "Authorization: Token <Your API token>" \
-F 'file=@/path/to/file.pdf' \
--http1.1

This uploads file.pdf and, once the file is successfully processed, adds the keys user.name and user.age to its result. For example, if the initial extracted data looks like this:

{
"user.account_number": 123456,
"user.role": "admin"
}

Then the resulting data will be:

{
"user.account_number": 123456,
"user.role": "admin",
"user.name": "John",
"user.age": "42"
}

Query string parameters are available for both the /email and /upload endpoints.
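When building the URL in code, letting a library encode the query string avoids mistakes with spaces and special characters. A minimal Python sketch using urllib.parse.urlencode (the helper name upload_url_with_params is hypothetical):

```python
from urllib.parse import urlencode


def upload_url_with_params(mailbox_id: int, params: dict) -> str:
    """Build an /upload URL with custom parameters encoded in the query string."""
    # doseq=True turns list values into repeated keys (see "Custom parameters arrays").
    query = urlencode(params, doseq=True)
    return f"https://api.parseur.com/parser/{mailbox_id}/upload?{query}"


if __name__ == "__main__":
    print(upload_url_with_params(12345, {"user.name": "John", "user.age": 42}))
    # https://api.parseur.com/parser/12345/upload?user.name=John&user.age=42
```

By default urlencode encodes spaces as +, which servers conventionally decode back to spaces in a query string.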

URL-encoded form data in the payload of the POST request

For the /upload endpoint only, you can also send custom parameters in the body of the POST request, URL-encoded as form data. For example, name=John%20Doe&age=42 results in "name": "John Doe" and "age": "42" showing up in the result.
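Here is a short Python sketch of that encoding: it produces the example string and decodes it back, showing that every value arrives as a string. It only illustrates the encoding itself, not how Parseur reads the body:

```python
from urllib.parse import parse_qs, quote, urlencode

# Encode custom parameters as URL-encoded form data; quote_via=quote
# writes spaces as %20, matching the example string.
body = urlencode({"name": "John Doe", "age": 42}, quote_via=quote)
print(body)  # name=John%20Doe&age=42

# Decoding shows every value comes back as a string, as in the result.
decoded = {key: values[0] for key, values in parse_qs(body).items()}
print(decoded)  # {'name': 'John Doe', 'age': '42'}
```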

Expand custom parameters

If the mailbox's "Expand field names in JSON Result" feature is enabled, the result from the query string example above looks like this:

{
  "user": {
    "account_number": 123456,
    "role": "admin",
    "name": "John",
    "age": "42"
  }
}
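To get a feel for what the expansion does to dotted keys, here is a minimal Python sketch that reproduces the example above. It illustrates the visible effect only. It is not Parseur's implementation, and it assumes no key is both a value and a prefix of another key (for example "user" and "user.name"):

```python
def expand_field_names(flat: dict) -> dict:
    """Nest dotted keys, e.g. {"user.name": "John"} -> {"user": {"name": "John"}}."""
    nested: dict = {}
    for key, value in flat.items():
        *parents, leaf = key.split(".")
        node = nested
        # Walk (or create) one dictionary level per dotted segment.
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


if __name__ == "__main__":
    flat = {
        "user.account_number": 123456,
        "user.role": "admin",
        "user.name": "John",
        "user.age": "42",
    }
    print(expand_field_names(flat))
```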

Custom parameters arrays

If you specify the same parameter several times, its values are turned into an array. For example:

curl \
-X POST \
"https://api.parseur.com/parser/<Your mailbox ID>/upload?user.names=John&user.names=Jo&user.names=Johnny" \
-H "Authorization: Token <Your API token>" \
-F 'file=@/path/to/file.pdf' \
--http1.1

This will add the following to the document's result:

{
  "user": {
    "names": ["John", "Jo", "Johnny"]
  }
}
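The grouping of repeated keys can be reproduced with Python's urllib.parse.parse_qs. This sketch only illustrates the behavior described above and is not Parseur's server code. The helper name query_params is hypothetical:

```python
from urllib.parse import parse_qs, urlsplit


def query_params(url: str) -> dict:
    """Collect query parameters; keys that repeat become lists of values."""
    parsed = parse_qs(urlsplit(url).query)
    # parse_qs always returns lists; keep a list only when a key repeats.
    return {key: values if len(values) > 1 else values[0] for key, values in parsed.items()}


if __name__ == "__main__":
    url = ("https://api.parseur.com/parser/12345/upload"
           "?user.names=John&user.names=Jo&user.names=Johnny")
    print(query_params(url))  # {'user.names': ['John', 'Jo', 'Johnny']}
```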

Gotchas

Parameter keys are converted to lowercase, but values are not. For example, <url>?Name=John adds a name key with the value John.

If a parameter has the same key as a key/value pair in the extracted result (both keys must be lowercase to match), the value from the result overwrites the value from the parameter.

All parameter keys and values are strings. However, if the mailbox has a field with the same name that is not populated by parsing, the parameter value is formatted according to that field's format, when possible.
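The first two rules can be summarized in a few lines. Here is a minimal Python sketch modeling them: parameter keys are lowercased and values kept as strings, and result values win on key collisions. The field-format rule is not modeled, and the helper name merge_custom_parameters is hypothetical:

```python
def merge_custom_parameters(result: dict, params: dict) -> dict:
    """Illustrative merge of custom parameters into an extracted result."""
    # Parameter keys are lowercased; values stay as strings.
    merged = {key.lower(): str(value) for key, value in params.items()}
    # Values from the result overwrite parameters with the same key.
    merged.update(result)
    return merged


if __name__ == "__main__":
    result = {"user.account_number": 123456, "user.role": "admin"}
    params = {"User.Name": "John", "user.role": "guest"}
    print(merge_custom_parameters(result, params))
    # {'user.name': 'John', 'user.role': 'admin', 'user.account_number': 123456}
```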

API rate limits

Our API enforces the following rate limits:

  • Sending requests to the /login and /signup endpoints is limited to 30 requests per minute, per client IP address, with an initial burst allowance of 2 requests.

  • Sending requests to other endpoints is limited to 5 requests per second per IP, with an initial burst allowance of 50 requests.

Requests that exceed the rate limit receive an HTTP 429 (Too Many Requests) error.
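To stay under these limits, you can pace requests on the client side. Here is a minimal Python token-bucket sketch. It assumes a token bucket is a reasonable model of the limits above; Parseur's server-side algorithm is not documented here, so you should still handle 429 responses by waiting and retrying. The class name TokenBucket is hypothetical:

```python
import time


class TokenBucket:
    """Client-side pacing: `rate` requests per second, initial burst of `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full: the initial burst allowance
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        """Consume one token if available; False means the caller should wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


if __name__ == "__main__":
    general = TokenBucket(rate=5, capacity=50)   # most endpoints: 5 req/s, burst of 50
    login = TokenBucket(rate=0.5, capacity=2)    # /login and /signup: 30 req/min, burst of 2
    print(general.try_acquire(), login.try_acquire())
```

30 requests per minute corresponds to a refill rate of 0.5 requests per second.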
