This article describes how to send documents to Parseur via its native REST HTTPS API.
Parseur API Authentication
The Parseur API uses token-based authentication.
You will find your API Token Key in your Account Overview.
To authenticate, clients must include the token key in the Authorization HTTP header. The key must be prefixed by the string literal "Token", with whitespace separating the two strings. For example:
Authorization: Token 1234d45678c90bcf1234fe123ddae4aabbc6abcd
Unauthenticated requests are denied with an HTTP 403 response that carries an appropriate WWW-Authenticate header. For example:
WWW-Authenticate: Token
The curl command-line tool may be useful for testing token-authenticated APIs. For example:
curl -X GET https://api.parseur.com/ -H "Authorization: Token <enter-your-token>"
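The same header can be built from a script. The following is a minimal Python sketch using only the standard library; the helper names auth_headers and build_get_request are hypothetical and not part of any Parseur client. It only builds the request, so nothing is sent until you call urlopen yourself.

```python
import urllib.request

API_BASE = "https://api.parseur.com"  # base URL taken from the curl example above


def auth_headers(token: str) -> dict:
    """Return the Authorization header used for token-based authentication."""
    token = token.strip()
    if not token:
        raise ValueError("API token must not be empty")
    # The literal word "Token", a space, then the key itself.
    return {"Authorization": f"Token {token}"}


def build_get_request(token: str, path: str = "/") -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request."""
    return urllib.request.Request(
        API_BASE + path, headers=auth_headers(token), method="GET"
    )


# To actually send it (needs network access and a real token):
# with urllib.request.urlopen(build_get_request("<enter-your-token>")) as resp:
#     print(resp.status)
```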
Picking a way to send documents to Parseur
The Parseur API offers two endpoints to send new documents:
The /parser/:id/upload endpoint, which uploads binary files (PDF, DOC, EML, ...) to the application. This endpoint is recommended if you want to send one or more arbitrary files with the help of a programming library or the curl command.
The /email endpoint, which is mostly used to receive documents originating from emails (so most often HTML or text documents). This endpoint lets you send the document's text content directly. It also allows you to specify email-related metadata and headers.
Sending binary documents to Parseur
You can send a document to one of your mailboxes by issuing a POST HTTP request to the URL https://api.parseur.com/parser/:id/upload, replacing :id with the ID of the mailbox you want to send the document to.
Files are sent through the usual HTTP file upload mechanism (multipart/form-data), described here: https://datatracker.ietf.org/doc/html/rfc1867
In order to upload a file from your disk to your mailbox, you need 2 things:
Your API token (found here: https://app.parseur.com/account )
The numerical ID of the mailbox you want to upload the file to. You can find this ID in the URL bar of your browser when browsing your mailbox.
You can then upload a file using the following curl command on Linux and Mac OS X systems:
curl \
-X POST \
"https://api.parseur.com/parser/<Your mailbox ID>/upload" \
-H "Authorization: Token <Your API token>" \
-F 'file=@/path/to/file.pdf' \
--http1.1
Notes:
Make sure to replace the <Your API token> and <Your mailbox ID> placeholders with your own values.
The --http1.1 flag works around an incompatibility between Mac OS X's curl and Parseur's servers.
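If you prefer a programming library to curl, the upload can be sketched in Python with the standard library alone. This is an illustrative sketch, not an official client: the helper names encode_multipart_file and build_upload_request are hypothetical, and the code assumes the file is sent in a form field named file, as in the curl command above.

```python
import mimetypes
import urllib.request
import uuid


def encode_multipart_file(field: str, filename: str, content: bytes):
    """Encode one file as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex  # random separator that will not occur in the file
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def build_upload_request(mailbox_id, token: str, filename: str, content: bytes):
    """Build (but do not send) the POST request for the upload endpoint."""
    body, content_type = encode_multipart_file("file", filename, content)
    return urllib.request.Request(
        f"https://api.parseur.com/parser/{mailbox_id}/upload",
        data=body,
        headers={"Authorization": f"Token {token}", "Content-Type": content_type},
        method="POST",
    )


# Sending it (needs network access, a real token and a real mailbox ID):
# with open("/path/to/file.pdf", "rb") as fh:
#     req = build_upload_request("<Your mailbox ID>", "<Your API token>", "file.pdf", fh.read())
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```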
A successful upload returns the document ID, as in the response below. If you want to match the data extracted from that upload with the upload itself at a later stage, save that ID and add the DocumentID metadata field to the parsed result:
{
"message": "OK",
"attachments": [
{
"name": "file.pdf",
"DocumentID": "1e2e34cba5c678a9012f3e456c789a0f"
}
]
}
Getting a successful 200 return code and a document ID from the Upload API only confirms that Parseur received the document. It does not guarantee you will see your document in Parseur. The document ingestion pipeline is asynchronous and includes several processing steps. A document that fails to process in a later step may be rejected and not appear in your mailbox.
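To keep track of the returned IDs, you can read them from the response body. The sketch below assumes the response has the shape shown in the example above (an attachments list whose entries carry name and DocumentID); the function name document_ids is hypothetical. As noted, holding an ID only means the file was received, not that it was processed.

```python
import json


def document_ids(response_text: str) -> dict:
    """Map each uploaded file name to the DocumentID returned for it."""
    payload = json.loads(response_text)
    return {
        attachment["name"]: attachment["DocumentID"]
        for attachment in payload.get("attachments", [])
    }


# Response body copied from the example above.
sample = """
{
  "message": "OK",
  "attachments": [
    {"name": "file.pdf", "DocumentID": "1e2e34cba5c678a9012f3e456c789a0f"}
  ]
}
"""
ids = document_ids(sample)
```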
Check out the list of document formats supported by Parseur to learn more about the types of documents you can send.
Sending emails and text documents to Parseur
Once authenticated, you can send a new document for Parseur to process by issuing a POST request to https://api.parseur.com/email with the following JSON payload:
{
"subject": "The title of your document, or email subject",
"from": "Sender Name <[email protected]>",
"recipient": "[email protected]",
"body_html": "<html><body>Document content as HTML. This one has priority over text content if both are present.</body></html>",
"body_plain": "Document content as text. This one is only used if body_html is empty.",
"message_headers": [
["Standard-SMTP-Header", "Any usual email header goes here"],
["X-Envelope-From", "<[email protected]>"]
]
}
As an example, the following curl command should create a new document in your [email protected] mailbox. Attachments and message headers are left out to keep the example simple; both are optional.
curl \
-X POST \
https://api.parseur.com/email \
-d '{
"subject": "The title of your document, or email subject",
"from": "Sender Name <[email protected]>",
"recipient": "[email protected]",
"body_html": "<html><body>Document content as HTML. This one has priority over text content if both are present.</body></html>",
"body_plain": "Document content as text. This one is only used if body_html is empty.",
"message_headers": []
}' \
-H "Content-Type: application/json" \
-H "Authorization: Token <enter-your-token-here>"
Note: Make sure to replace the recipient email address and the token with your own values.
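The same request can be prepared from Python. This is a minimal sketch using the standard library; build_email_request is a hypothetical helper, and the example.com addresses in the usage comment are placeholders to replace with your own sender and mailbox addresses. The check that at least one body is present is a local convenience, not a documented API rule.

```python
import json
import urllib.request


def build_email_request(token, subject, sender, recipient,
                        body_html="", body_plain="", message_headers=None):
    """Build (but do not send) a POST request for the /email endpoint."""
    if not (body_html or body_plain):
        # Local sanity check only: a document with no content is rarely intended.
        raise ValueError("provide body_html or body_plain")
    payload = {
        "subject": subject,
        "from": sender,
        "recipient": recipient,
        "body_html": body_html,    # takes priority when both bodies are present
        "body_plain": body_plain,  # only used if body_html is empty
        "message_headers": message_headers or [],
    }
    return urllib.request.Request(
        "https://api.parseur.com/email",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {token}",
        },
        method="POST",
    )


# Usage (placeholder values; needs network access to send):
# req = build_email_request("<enter-your-token-here>", "Order 42",
#                           "Sender Name <sender@example.com>",
#                           "my.mailbox@example.com", body_plain="Hello")
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```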
Adding To, CC and BCC fields
You can add To, CC and BCC information by adding the respective keys to your payload:
to
cc
bcc
Important: When using those keys, make sure the recipient's email address (your Parseur mailbox's email address) is present in at least one of those three fields.
Example:
{
"subject": "The title of your document, or email subject",
"from": "Sender Name <[email protected]>",
"recipient": "[email protected]",
"to": "[email protected], Another Name <[email protected]>",
"cc": "[email protected]",
"bcc": "[email protected]",
[... rest of the request...]
}
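The rule above (the recipient must appear in to, cc or bcc when any of them is used) is easy to check before sending. The following Python sketch is one possible client-side check; recipient_is_listed is a hypothetical helper and the example.com addresses are placeholders.

```python
from email.utils import getaddresses, parseaddr


def recipient_is_listed(payload: dict) -> bool:
    """Return True if the payload satisfies the to/cc/bcc rule described above."""
    recipient = parseaddr(payload["recipient"])[1].lower()
    # Keep only the address fields that are present and non-empty.
    fields = [payload[key] for key in ("to", "cc", "bcc") if payload.get(key)]
    if not fields:
        return True  # the rule only applies when those keys are used
    listed = {address.lower() for _name, address in getaddresses(fields)}
    return recipient in listed


ok_payload = {
    "recipient": "my.mailbox@example.com",
    "to": "someone@example.com, Another Name <my.mailbox@example.com>",
    "cc": "colleague@example.com",
}
bad_payload = {
    "recipient": "my.mailbox@example.com",
    "to": "someone@example.com",
}
```

Running such a check before each request avoids sending a payload whose address fields leave out your mailbox.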
Passing custom parameters
In the query string
When sending a document, you can append custom parameters to the URL as a query string. Quote the URL so that the shell does not interpret the & character. For example:
curl \
-X POST \
"https://api.parseur.com/parser/<Your mailbox ID>/upload?user.name=John&user.age=42" \
-H "Authorization: Token <Your API token>" \
-F 'file=@/path/to/file.pdf' \
--http1.1
This will upload file.pdf and, once it is successfully processed, the keys user.name and user.age will be added to the result for this file. For example, if the initial extracted data looks like this:
{
"user.account_number": 123456,
"user.role": "admin"
}
Then the resulting data will be:
{
"user.account_number": 123456,
"user.role": "admin",
"user.name": "John",
"user.age": "42"
}
Query string parameters are available for both the /email and /upload endpoints.
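When building the URL from code, it is safer to let a library encode the parameters than to concatenate strings by hand. A minimal Python sketch follows; upload_url is a hypothetical helper name.

```python
from urllib.parse import urlencode


def upload_url(mailbox_id, params: dict) -> str:
    """Return the upload URL with custom parameters appended as a query string."""
    base = f"https://api.parseur.com/parser/{mailbox_id}/upload"
    # urlencode escapes reserved characters; doseq=True expands list values
    # into one key=value pair per item.
    query = urlencode(params, doseq=True)
    return f"{base}?{query}" if query else base


url = upload_url(123, {"user.name": "John", "user.age": 42})
# url == "https://api.parseur.com/parser/123/upload?user.name=John&user.age=42"
```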
URL-encoded form data in the payload of the POST request
For the /upload endpoint only, you can also send custom parameters in the body of the POST request, URL-encoded as form data. For example: name=John%20Doe&age=42. This will add "name": "John Doe" and "age": "42" to the result.
Expand custom parameters
With your mailbox's "Expand field names in JSON Result" feature enabled, the result from the query string example above will look like this:
{
"user": {
"account_number": 123456,
"role": "admin",
"name": "John",
"age": "42"
}
}
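To make the expansion concrete, here is a small Python sketch that reproduces the transformation locally by splitting each key on dots. It is only a model of the behavior described above, not Parseur's implementation, and expand_keys is a hypothetical name. It does not handle conflicts such as having both a user key and a user.name key.

```python
def expand_keys(flat: dict) -> dict:
    """Turn {"user.name": "John"} into {"user": {"name": "John"}}."""
    nested = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            # Walk down, creating intermediate objects as needed.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


flat_result = {
    "user.account_number": 123456,
    "user.role": "admin",
    "user.name": "John",
    "user.age": "42",
}
expanded = expand_keys(flat_result)
```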
Custom parameters arrays
If you specify the same parameter several times, its values will be combined into an array. For example:
curl \
-X POST \
"https://api.parseur.com/parser/<Your mailbox ID>/upload?user.names=John&user.names=Jo&user.names=Johnny" \
-H "Authorization: Token <Your API token>" \
-F 'file=@/path/to/file.pdf' \
--http1.1
This will add the following to the document's result:
{
"user": {
"names": ["John", "Jo", "Johnny"]
}
}
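The sketch below shows, in Python, how a repeated key can be produced from a list and how repeated keys collapse into an array when a query string is read back. The function names repeated_query and collect_params are hypothetical, and collect_params only models the behavior described above on the client side.

```python
from urllib.parse import parse_qs, urlencode


def repeated_query(key: str, values: list) -> str:
    """Encode a list as the same key repeated once per value."""
    return urlencode({key: values}, doseq=True)


def collect_params(query: str) -> dict:
    """Group repeated keys into lists; keys seen once stay plain strings."""
    collected = {}
    for key, values in parse_qs(query).items():
        collected[key] = values if len(values) > 1 else values[0]
    return collected


query = repeated_query("user.names", ["John", "Jo", "Johnny"])
# query == "user.names=John&user.names=Jo&user.names=Johnny"
params = collect_params(query + "&user.age=42")
```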
Gotchas
Parameter keys are converted to lowercase, but values are left unchanged. For example: <url>?Name=John will add a name key with the value John.
When a parameter has the same key name as an existing key in the result (both keys have to be lowercase for this to happen), the value from the result overwrites the value from the parameter.
All parameter keys and values are strings. However, if a field of the same name exists in the mailbox and is not populated by parsing, the parameter value will be formatted according to this field's format, if possible.
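These rules can be summarized in a few lines of Python. The sketch below is a simplified local model of the merge, assuming the behavior is exactly as described above; merge_parameters is a hypothetical name, and the field-format conversion mentioned in the last point is not modeled.

```python
def merge_parameters(parsed_result: dict, parameters: dict) -> dict:
    """Merge custom parameters into a parsed result following the rules above."""
    # Keys are lowercased; values are kept as strings, unchanged otherwise.
    merged = {key.lower(): str(value) for key, value in parameters.items()}
    # On a key collision, the value from the parsed result wins.
    merged.update(parsed_result)
    return merged


merged = merge_parameters(
    {"name": "Jane", "total": 10},   # data extracted by parsing
    {"Name": "John", "Age": 42},     # custom parameters sent with the document
)
# merged == {"name": "Jane", "age": "42", "total": 10}
```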
API rate limits
Sending documents is limited to 5 requests per second per IP address. Requests that exceed the rate limit return a 429 error code. We can accommodate higher rate limits as part of our Enterprise plan. Contact us to discuss.
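When sending many documents from a script, a simple client-side throttle helps you stay under the limit. The Python sketch below spaces calls at least 1/5 of a second apart; the Throttle class is a hypothetical helper, and the clock and sleep arguments exist only so the behavior can be tested without waiting. It does not replace handling 429 responses, which can still occur if several machines share one IP address.

```python
import time


class Throttle:
    """Space out calls so that at most max_per_second happen each second."""

    def __init__(self, max_per_second=5, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / max_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = 0.0  # earliest time the next call may proceed

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self.clock()
        delay = self.next_allowed - now
        if delay > 0:
            self.sleep(delay)
            now += delay
        self.next_allowed = now + self.interval
        return max(delay, 0.0)


# Usage: call throttle.wait() right before each upload request.
# throttle = Throttle(max_per_second=5)
# for path in paths:
#     throttle.wait()
#     ...send the request for path...
```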