Prerequisite

This article assumes you know how to use field formats to normalize and format your data in Parseur. Check out this article for more information.

Controlling Text format

Parseur does most of the text sanitizing automatically. However, it offers two variations for standard Text fields, depending on whether you want to keep new lines.

Text (Multi-line) format

This is the default format when creating a field.

The "Text" format will extract all visible text from your emails, including visible new lines.

When selecting "Text" output format, you can further tweak the format by selecting an input format:

  • HTML text (default): tells Parseur that the documents contain HTML. Parseur will use HTML markup to determine line breaks and then remove all HTML markup from the field result.
  • Plain Text: tells Parseur that the document is text-only. Parseur will keep line breaks and any HTML markup in the original value, but it will remove consecutive spaces.
  • Raw Text: tells Parseur to keep the original value as is.
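
To make the difference between the three input formats concrete, here is a small Python sketch that approximates them. It is only an illustration, not Parseur's actual implementation: the function names, the list of tags treated as line breaks, and the choice to collapse runs of spaces into a single space are all assumptions.

```python
import html
import re

def html_text(value: str) -> str:
    """Approximate 'HTML text': markup decides line breaks, then is removed."""
    # In HTML, whitespace in the source (including newlines) is not significant.
    value = re.sub(r"\s+", " ", value)
    # Assumption: <br> and a few closing block tags count as line breaks.
    value = re.sub(r"<br\s*/?>|</(?:p|div|li|tr|h[1-6])>", "\n", value, flags=re.IGNORECASE)
    # Drop every remaining tag and decode entities such as &amp;.
    value = html.unescape(re.sub(r"<[^>]+>", "", value))
    lines = [line.strip() for line in value.split("\n")]
    return "\n".join(line for line in lines if line)

def plain_text(value: str) -> str:
    """Approximate 'Plain Text': keep line breaks and markup, collapse repeated spaces."""
    return re.sub(r"[ \t]{2,}", " ", value)

def raw_text(value: str) -> str:
    """Approximate 'Raw Text': return the value untouched."""
    return value

sample = "Order <b>#1042</b><br>Ship to:   Jane Doe\nParis"

print(repr(html_text(sample)))   # 'Order #1042\nShip to: Jane Doe Paris'
print(repr(plain_text(sample)))  # 'Order <b>#1042</b><br>Ship to: Jane Doe\nParis'
print(repr(raw_text(sample)))    # identical to sample
```

Note how the same captured value produces a line break in a different place depending on the input format: HTML text breaks at the `<br>` tag, while Plain Text breaks at the newline character in the source.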

Text (Single-line) format

The "One-line Text" format will extract all visible text from your emails, excluding visible new lines. Like the "Text" format, it will also strip out any formatting and HTML elements and keep only the text.

Use the One-line Text format if you require the result field to be on a single line and exclude any line breaks.

As with the multi-line format, you can further tweak the format by selecting an input format:

  • HTML text (default): tells Parseur that the documents contain HTML. Parseur will remove all HTML markup from the field result.
  • Plain Text: tells Parseur that the document is text-only. Parseur will remove any consecutive spaces but keep any HTML markup from the original value.
  • Raw Text: tells Parseur to remove line breaks but otherwise keep the original value as is.
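
The single-line variants can be sketched the same way. Again, this is a rough approximation under stated assumptions rather than Parseur's real code: the function names are hypothetical, and replacing each line break or tag with a single space (instead of deleting it outright) is a guess made so that neighboring words do not merge.

```python
import html
import re

def one_line_html(value: str) -> str:
    """Approximate 'HTML text' on one line: remove markup, then flatten whitespace."""
    # Assumption: each tag becomes a space so words on either side stay separate.
    value = html.unescape(re.sub(r"<[^>]+>", " ", value))
    return re.sub(r"\s+", " ", value).strip()

def one_line_plain(value: str) -> str:
    """Approximate 'Plain Text' on one line: keep markup, collapse spaces and line breaks."""
    return re.sub(r"\s+", " ", value).strip()

def one_line_raw(value: str) -> str:
    """Approximate 'Raw Text' on one line: only line breaks are replaced."""
    return re.sub(r"\r?\n", " ", value)

sample = "Order <b>#1042</b><br>Ship to:   Jane Doe\nParis"

print(repr(one_line_html(sample)))   # 'Order #1042 Ship to: Jane Doe Paris'
print(repr(one_line_plain(sample)))  # 'Order <b>#1042</b><br>Ship to: Jane Doe Paris'
print(repr(one_line_raw(sample)))    # 'Order <b>#1042</b><br>Ship to:   Jane Doe Paris'
```

In this sketch only the Raw Text variant preserves the three spaces after "Ship to:", which matches the description of keeping the original value as is apart from line breaks.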