Use Parseur document parsing API

How to use Parseur API to manage your documents and parsing workflow

Updated over a week ago

This article describes how to control your mailboxes via Parseur's native REST API over HTTPS.

Does Parseur have an API?

Yes, it does! This article covers the main API endpoints for managing your Parseur account.

Parseur API Authentication

The Parseur API uses token-based authentication.

You will find your API Token Key in your Account Overview.

For clients to authenticate, the token key should be included in the Authorization HTTP header. The key should be prefixed by the string literal "Token", with whitespace separating the two strings. For example:

Authorization: Token 1234d45678c90bcf1234fe123ddae4aabbc6abcd

Unauthenticated requests that are denied permission will result in an HTTP 401 Unauthorized response with an appropriate WWW-Authenticate header. For example:

WWW-Authenticate: Token

The curl command-line tool may be useful for testing token-authenticated APIs. For example:

curl -X GET https://api.parseur.com/ -H "Authorization: Token <enter-your-token-here>" --compressed

Don't forget to specify the --compressed flag to take advantage of on-the-fly compression of responses. Otherwise, some requests may transfer gigabytes of uncompressed data, which will be slow.
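
For readers scripting against the API in Python, the header format above can be sketched with the standard library alone. The helper names build_request and decode_body are illustrative and not part of any Parseur client. The Accept-Encoding header plays the role of curl's --compressed flag; urllib does not decompress responses on its own, so the sketch does it by hand.

```python
import gzip
import urllib.request

API_BASE = "https://api.parseur.com"


def build_request(path, token, method="GET"):
    """Return a urllib Request carrying the token header (hypothetical helper)."""
    req = urllib.request.Request(API_BASE + path, method=method)
    # "Token", one space, then the key from your Account Overview.
    req.add_header("Authorization", f"Token {token}")
    # Ask for a gzip-compressed response body.
    req.add_header("Accept-Encoding", "gzip")
    return req


def decode_body(raw, content_encoding):
    """Decompress the body when the server says it is gzip-encoded."""
    if content_encoding == "gzip":
        return gzip.decompress(raw)
    return raw


# Usage (performs a real network call, so it is left commented out):
# with urllib.request.urlopen(build_request("/parser", "<your-token>")) as resp:
#     body = decode_body(resp.read(), resp.headers.get("Content-Encoding"))
```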

Manage Mailboxes

List mailboxes

Mailbox objects are called parsers in the API.

To list all your mailboxes, make a GET request on /parser. The response is paginated.

Supported sorting keys (see the section below for more information):

  • name

  • document_count

  • template_count

  • PARSEDOK_count (number of documents processed)

  • PARSEDKO_count (number of documents not processed)

  • QUOTAEXC_count (number of documents in quota exceeded status)

  • EXPORTKO_count (number of documents in export failed status)

The search query parameter searches the following properties:

  • mailbox name

  • mailbox email prefix

Create a mailbox

To create a mailbox, make a POST request on /parser passing the following keys:

  • name

  • email_prefix (optional, if not present, will be derived from name key)

  • process_attachments: true / false (optional, defaults to true)

  • ai_engine: GCP_AI_1 / none (optional, defaults to none, which disables the AI parsing completely for this mailbox)

  • master_parser_slug: optional, set it if you want your mailbox to use one of our set of ready-made templates.

  • parser_object_set: optional, the list of fields you want to create (if you set a master_parser_slug, Parseur will create a default field set for you)

Possible values for master_parser_slug are automotive, contact-list, delivery-notes, event-ticketing, financial-statement, food-delivery, invoices, job-application, job-search, leads, property-bookings, real-estate, resume-cv, search-alerts, statements, transactions, travel, utility, work-order

Example for parser_object_set:

[
  { "name": "MyField", "format": "TEXT" },
  { "name": "MyAddress", "format": "ADDRESS" },
  { "name": "MyTableField", "format": "TABLE",
    "parser_object_set": [
      { "name": "MyTableColumn", "format": "TEXT", "type": "FIELD" }
      // ... more columns here ...
    ]
  }
  // ... more fields here ...
]

Available field formats are: ADDRESS, DATE, DATETIME, LINK (linked document, will download the page behind the link and create a new document), NAME (person's name), NUMBER, ONELINE (single line text), TABLE, TEXT (multi line text), TIME
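
As a rough illustration, the keys listed above can be assembled into a request body in Python. This sketch assumes the endpoint accepts a JSON body, which the article does not state explicitly; mailbox_payload is a hypothetical helper name and "Invoices inbox" is a made-up mailbox name.

```python
import json


def mailbox_payload(name, email_prefix=None, process_attachments=None,
                    ai_engine=None, master_parser_slug=None, fields=None):
    """Build a body for POST /parser, leaving out optional keys that are unset."""
    payload = {"name": name}
    optional = {
        "email_prefix": email_prefix,
        "process_attachments": process_attachments,
        "ai_engine": ai_engine,
        "master_parser_slug": master_parser_slug,
        "parser_object_set": fields,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


# A field list with one text field and one table holding a single column.
fields = [
    {"name": "MyField", "format": "TEXT"},
    {
        "name": "MyTableField",
        "format": "TABLE",
        "parser_object_set": [
            {"name": "MyTableColumn", "format": "TEXT", "type": "FIELD"},
        ],
    },
]

# Assumption: the body is sent as JSON.
body = json.dumps(mailbox_payload("Invoices inbox", fields=fields))
```

Leaving unset keys out of the payload lets the server apply its documented defaults (for example, process_attachments defaulting to true).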

Retrieve a mailbox

You can retrieve a mailbox with a GET request on /parser/:mailbox_id

Update a mailbox

You can update a mailbox with a PUT or POST request on /parser/:mailbox_id

Copy a mailbox

You can copy (duplicate) an existing mailbox with a POST on /parser/:mailbox_id/copy

Delete a mailbox

You can delete a mailbox with a DELETE request on /parser/:mailbox_id

Get the field structure (schema) of a mailbox

You can get the mailbox schema with a GET request on /parser/:mailbox_id/schema.

This is useful if you're planning to create a connector for Parseur.

Manage Documents in a mailbox

Send documents

To send a document via API, check out this article

List documents

You can list your documents in a given mailbox with a GET request on /parser/:mailbox_id/document_set. The response is paginated.

Supported sorting keys (see the section below for more information):

  • name

  • created (default - received date)

  • modified (processed date)

  • status

The search query parameter will search in the following properties:

  • document id (exact match)

  • document name

  • template name

  • from, to, cc and bcc email addresses

  • document metadata header

Filter documents by date with keys:

  • received_after=yyyy-mm-dd

  • received_before=yyyy-mm-dd

  • tz=timezone (optional, example: Asia%2FSingapore) to filter dates in the given timezone. If not present, timezone will be set to UTC

  • you can use either one or both of the date filters in a query

Get the parsed result for each document:

  • this endpoint no longer returns the results by default

  • add query parameter with_result=true to get the result string with each document
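
The date filters and the with_result flag can be combined in one query string. Here is a minimal sketch using Python's standard library; document_list_path is a hypothetical helper and the mailbox ID 123 is a placeholder.

```python
from urllib.parse import urlencode


def document_list_path(mailbox_id, received_after=None, received_before=None,
                       tz=None, with_result=False):
    """Build the path and query string for listing a mailbox's documents."""
    params = {}
    if received_after:
        params["received_after"] = received_after    # yyyy-mm-dd
    if received_before:
        params["received_before"] = received_before  # yyyy-mm-dd
    if tz:
        params["tz"] = tz  # urlencode escapes "/" as %2F
    if with_result:
        params["with_result"] = "true"
    path = f"/parser/{mailbox_id}/document_set"
    return f"{path}?{urlencode(params)}" if params else path


path = document_list_path(123, received_after="2023-09-01",
                          tz="Asia/Singapore", with_result=True)
```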

Retrieve a document

You can retrieve a document and its parsed results with a GET request on /document/:document_id

Update a document

You cannot update a document.

Reprocess a document

You can reprocess (parse) a document with a POST on /document/:document_id/process

Skip a document

You can set the Skipped status on a document with a POST on /document/:document_id/skip

Copy a document

You can copy a document to another mailbox with a POST on /document/:document_id/copy/:target_mailbox_id

Retrieve the logs for a document

You can access the activity logs of a document with a GET on /document/:document_id/log_set. Logs are paginated.

Delete a document

You can delete a document with a DELETE request on /document/:document_id
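
Since the document endpoints above all follow the same pattern, a small lookup table can keep client code tidy. This is only a sketch: the action names and helper functions are made up for illustration, while the methods and paths are the ones listed in this section.

```python
# HTTP method and path template for each single-document operation.
DOCUMENT_ACTIONS = {
    "retrieve": ("GET", "/document/{document_id}"),
    "reprocess": ("POST", "/document/{document_id}/process"),
    "skip": ("POST", "/document/{document_id}/skip"),
    "logs": ("GET", "/document/{document_id}/log_set"),
    "delete": ("DELETE", "/document/{document_id}"),
}


def document_call(action, document_id):
    """Return (method, path) for a named action on one document."""
    method, template = DOCUMENT_ACTIONS[action]
    return method, template.format(document_id=document_id)


def copy_document_call(document_id, target_mailbox_id):
    """Copying needs a second ID, so it gets its own helper."""
    return "POST", f"/document/{document_id}/copy/{target_mailbox_id}"
```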

Manage Templates in a mailbox

List templates

You can list your templates in a given mailbox with a GET request on /parser/:mailbox_id/template_set. The response is paginated.

Supported sorting keys (see the section below for more information):

  • name

  • created (creation date)

  • modified (default: last template update time or last time template was used)

  • last_activity (last time template was used)

  • status

  • document_count (number of documents matched by the template)

The search query parameter searches the following properties:

  • template name

Create a template

Retrieve a template

You can retrieve a template with a GET request on /template/:template_id

Copy a template

You can copy a template with a POST on /template/:template_id/copy/:target_mailbox_id

Delete a template

You can delete a template with a DELETE request on /template/:template_id

Manage Webhooks in a mailbox

List webhooks

You can list your webhooks in a given mailbox with a GET request on /parser/:mailbox_id

  • Enabled webhooks are under the webhook_set key

  • Paused webhooks are under the available_webhook_set key.

Create a webhook

You can create a new webhook with a POST request on /webhook passing the following keys:

  • event: must be one of document.processed, document.processed.flattened, document.template_needed or table.processed (see our webhook reference article for more information)

  • target: URL to send the data to, e.g. https://api.example.com/parseur

  • category: must be set to CUSTOM

  • parser: ID of the mailbox you want to add the webhook to, in numerical format

  • name: Custom name for the webhook. If omitted, it will use the target URL instead. Optional

  • headers: JSON object containing the HTTP headers you want to send along with the result data. Optional

  • parser_field: ID of a field or a table field you want the webhook to react to, in the "PF12345" format
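
The keys above could be put together as follows. This is a hedged sketch: webhook_payload is a hypothetical helper, the mailbox ID 12345 is a placeholder, and sending the result as a JSON body is an assumption rather than something this article confirms.

```python
ALLOWED_EVENTS = {
    "document.processed",
    "document.processed.flattened",
    "document.template_needed",
    "table.processed",
}


def webhook_payload(parser_id, target, event="document.processed",
                    name=None, headers=None, parser_field=None):
    """Build a body for POST /webhook; category is always CUSTOM."""
    if event not in ALLOWED_EVENTS:
        raise ValueError(f"unsupported event: {event}")
    payload = {
        "event": event,
        "target": target,
        "category": "CUSTOM",
        "parser": parser_id,  # numerical mailbox ID
    }
    if name is not None:
        payload["name"] = name
    if headers is not None:
        payload["headers"] = headers  # extra HTTP headers sent with the data
    if parser_field is not None:
        payload["parser_field"] = parser_field  # e.g. "PF12345"
    return payload


payload = webhook_payload(12345, "https://api.example.com/parseur",
                          headers={"X-Source": "parseur"})
```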

Enable a webhook

You can enable an existing webhook for a given mailbox with a POST request on /parser/:mailbox_id/webhook_set/:webhook_id

Pause a webhook

You can pause an existing webhook for a given mailbox with a DELETE request on /parser/:mailbox_id/webhook_set/:webhook_id

Getting parsed data

Using webhooks

Parseur can send parsed data in real-time to your server via its Webhook feature. Check out the webhook article to learn more.

Using download URLs

Using webhooks is the recommended way to get your data back to your servers. If that is not possible (for example, if you are not able to create a URL endpoint that listens for the data, or if your organization's security team doesn't allow you to open your firewall), you can use the download URLs provided when you retrieve a mailbox.

In a parser mailbox payload, you will find the following attributes:

  • csv_download. Download the data as a CSV. Example: /parser/<secret>/download/my.mailbox.csv

  • json_download. Download the data as JSON. Example: /parser/<secret>/download/my.mailbox.json

  • xls_download. Download the data as an XLSX. Example: /parser/<secret>/download/my.mailbox.xlsx

Filtering: You can filter the parsed data in the same way you do it in the app:

  • Add last_document_only=true HTTP query parameter to only retrieve the data of the last processed document

  • Add date=yyyy HTTP query parameter to retrieve data from year yyyy (e.g. date=2023)

  • Add date=yyyy-mm to retrieve data from year yyyy and month mm (e.g. date=2023-09)

  • Add date=yyyy-mm-dd to retrieve data from year yyyy, month mm and day dd (e.g. date=2023-09-05)

Notes:

  • You need to prefix those pathnames with https://api.parseur.com to get the full URL.

  • You don't need to add authentication headers to get the data. So make sure you keep those URLs private (for example, save the secret key as an environment variable and don't commit it to your code repository)

  • Date filtering is done based on UTC timezone
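
To illustrate the notes above, the following sketch turns a download pathname from a mailbox payload into a full URL with optional filters. download_url is a hypothetical helper, and the pathname used in the example contains a placeholder in place of a real secret.

```python
from urllib.parse import urlencode

API_BASE = "https://api.parseur.com"


def download_url(download_path, date=None, last_document_only=False):
    """Prefix a csv_download / json_download / xls_download path and add filters."""
    params = {}
    if date:
        params["date"] = date  # yyyy, yyyy-mm or yyyy-mm-dd, interpreted in UTC
    if last_document_only:
        params["last_document_only"] = "true"
    query = f"?{urlencode(params)}" if params else ""
    return API_BASE + download_path + query


# In real code, read the pathname from a private source such as an
# environment variable rather than hard-coding it.
url = download_url("/parser/SECRET/download/my.mailbox.csv", date="2023-09")
```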

Optional HTTP Query parameters

The following query parameters can be mixed and matched.

Pagination

All GET requests that return a list of documents, templates, or mailboxes support pagination: append a page query parameter to the URL. The default page size is 25. You can change the page size using the page_size query parameter.

For example: /parser?page=2&page_size=50 will list the second page of your mailboxes, each page containing 50 records.
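
Walking through every page can be sketched as a small generator. The shape of the paginated response is not described in this article, so the sketch takes a caller-supplied fetch_page function (hypothetical) that returns the list of records for one page, and it assumes that function returns an empty list for a page past the end.

```python
def iter_records(fetch_page, page_size=25):
    """Yield records page by page until a short (or empty) page is seen."""
    page = 1
    while True:
        records = fetch_page(page, page_size)
        yield from records
        # A page with fewer records than requested is taken to be the last one.
        if len(records) < page_size:
            break
        page += 1


# Stand-in for a real API call: slices an in-memory list of 60 records.
fake_data = list(range(60))


def fake_fetch(page, page_size):
    start = (page - 1) * page_size
    return fake_data[start:start + page_size]


all_records = list(iter_records(fake_fetch, page_size=25))
```

If the API reports an error instead of an empty page when asked for a page beyond the last, fetch_page would need to translate that error into an empty list.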

Searching

Some endpoints support searching via the search query parameter. The search value needs to be URL encoded.

For example, /parser?search=test%20mailbox will search for mailbox names containing "test mailbox"

Unless stated otherwise, search is not case sensitive and will retrieve all entities that partially match the search string. For example, a mailbox search for foo will return mailboxes named test.foo and FOO Mailbox 123.

Sorting

Some endpoints support sorting via the ordering query parameter.

  • to sort a list ascending on the foo key, use ?ordering=foo

  • to sort a list descending on the foo key, use ?ordering=-foo

For example, /parser?ordering=-document_count will list your mailboxes starting with the one with the most documents.
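
Because these query parameters can be mixed and matched, one helper can cover searching, sorting and pagination together. The sketch below uses a hypothetical list_path function; passing quote_via=quote makes spaces come out as %20, matching the search example given in this article.

```python
from urllib.parse import quote, urlencode


def list_path(resource_path, search=None, ordering=None, descending=False,
              page=None, page_size=None):
    """Append search, ordering and pagination parameters to a list endpoint."""
    params = {}
    if search:
        params["search"] = search
    if ordering:
        # A leading "-" requests descending order.
        params["ordering"] = ("-" if descending else "") + ordering
    if page is not None:
        params["page"] = page
    if page_size is not None:
        params["page_size"] = page_size
    if not params:
        return resource_path
    return f"{resource_path}?{urlencode(params, quote_via=quote)}"


path = list_path("/parser", search="test mailbox",
                 ordering="document_count", descending=True)
```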

API rate limits

Requests to the /login and /signup endpoints are strictly rate-limited.

Sending requests to other endpoints is limited to 5 requests per second per IP, with an initial burst allowance of 50 requests.

Requests that go over the rate limit will return a 429 error code. We can accommodate higher rate limits as part of our Enterprise plan. Contact us to discuss.
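
One common way for a client to cope with 429 responses is to retry with an increasing delay. The following is a sketch of that general technique, not a Parseur-recommended policy: call_with_retry and its parameters are hypothetical, send is any caller-supplied function returning a (status, body) pair, and the 0.2 second base delay is simply one fifth of a second, in line with the 5 requests per second limit.

```python
import time


def call_with_retry(send, max_attempts=5, base_delay=0.2, sleep=time.sleep):
    """Call send() until it returns something other than 429 or attempts run out."""
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            break
        if attempt < max_attempts - 1:
            # Exponential backoff: 0.2 s, 0.4 s, 0.8 s, ...
            sleep(base_delay * 2 ** attempt)
    return status, body


# Stand-in for a real request: rate-limited twice, then successful.
responses = iter([(429, ""), (429, ""), (200, "ok")])
delays = []
result = call_with_retry(lambda: next(responses), sleep=delays.append)
```

Injecting the sleep function keeps the sketch easy to test without actually waiting.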

Do more with the API

This article just lists the most common use cases for our API. There is more you can do; feel free to ask us for more details!
