Prerequisite
This article assumes you know how to use field formats to normalize and format your data in Parseur. Check out this article for more information.
Controlling Text format
Parseur does most of the text sanitizing automatically. However, it offers two variations for standard Text fields, depending on whether you want to keep new lines.
Text (Multi-line) format
This is the default format when creating a field.
The "Text" format will extract all visible text from your emails, including visible new lines.
When selecting "Text" output format, you can further tweak the format by selecting an input format:
HTML text (default): tells Parseur that the document contains HTML. Parseur will use the HTML markup to determine line breaks and then remove all HTML markup from the field result.
Plain Text: tells Parseur that the document is text-only. Parseur will keep line breaks and any HTML markup in the original value, but it will collapse consecutive spaces.
Raw Text: tells Parseur to keep the original value as is.
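To make the difference between the three input formats concrete, here is a minimal Python sketch that mimics the behavior described above. It is only an illustration and not Parseur's actual implementation: the function name multiline_text, the format keys "html", "plain" and "raw", and the list of tags treated as line breaks are all assumptions made for this example.

```python
import re
from html.parser import HTMLParser

# Assumption: tags whose closing marks the end of a visible line.
BLOCK_TAGS = {"p", "div", "li", "tr"}


class TextExtractor(HTMLParser):
    """Collects visible text and turns markup line breaks into newlines."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def multiline_text(value: str, input_format: str = "html") -> str:
    """Hypothetical helper approximating the multi-line Text format."""
    if input_format == "raw":
        # Raw Text: keep the original value as is.
        return value
    if input_format == "plain":
        # Plain Text: keep line breaks and markup, collapse runs of spaces.
        return re.sub(r" {2,}", " ", value)
    # HTML text: use markup to find line breaks, then drop the markup.
    parser = TextExtractor()
    parser.feed(value)
    parser.close()
    return "".join(parser.parts).strip()


html_value = "<p>Order 42</p>Ship to:<br>Paris"
print(multiline_text(html_value, "html"))

plain_value = "Order   <b>42</b>\nShip to: Paris"
print(multiline_text(plain_value, "plain"))
print(multiline_text(plain_value, "raw"))
```

With the HTML input format the markup disappears and only the line breaks it implied remain, while Plain Text and Raw Text both leave the <b> tags in the result.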
Text (single-line) format
The "One-line Text" format will extract all visible text from your emails, excluding visible new lines. Like the Text format, it will also strip out any formatting and HTML elements and keep only the text.
Use the One-line Text format if you require the result field to be on a single line and exclude any line breaks.
As with the multi-line format, you can further tweak the format by selecting an input format:
HTML text (default): tells Parseur that the document contains HTML. Parseur will remove all HTML markup from the field result.
Plain Text: tells Parseur that the document is text-only. Parseur will collapse consecutive spaces but keep any HTML markup from the original value.
Raw Text: tells Parseur to remove line breaks but otherwise keep the original value as is.
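The single-line behavior can be approximated with a short Python sketch. This is an illustration under stated assumptions, not Parseur's actual implementation: the function name one_line_text and the format keys "html", "plain" and "raw" are hypothetical, and the sketch assumes that each removed line break is replaced by a single space so that neighboring words do not run together.

```python
import re
from html.parser import HTMLParser

# Assumption: tags whose closing marks the end of a visible line.
BLOCK_TAGS = {"p", "div", "li", "tr"}


class OneLineExtractor(HTMLParser):
    """Collects visible text and replaces markup line breaks with spaces."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self.parts.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def one_line_text(value: str, input_format: str = "html") -> str:
    """Hypothetical helper approximating the One-line Text format."""
    if input_format == "html":
        # HTML text: drop all markup, then put everything on one line.
        parser = OneLineExtractor()
        parser.feed(value)
        parser.close()
        # Assumption: whitespace runs collapse to one space in HTML mode.
        return " ".join("".join(parser.parts).split())
    # Assumption: each run of line breaks becomes a single space.
    value = re.sub(r"[\r\n]+", " ", value)
    if input_format == "plain":
        # Plain Text: keep markup, collapse runs of spaces.
        value = re.sub(r" {2,}", " ", value)
    # Raw Text: nothing else changes.
    return value


print(one_line_text("<p>Order 42</p>Ship to:<br>Paris", "html"))
print(one_line_text("Order   <b>42</b>\nShip to: Paris", "plain"))
print(one_line_text("Order   <b>42</b>\nShip to: Paris", "raw"))
```

In all three cases the result fits on one line. Only the HTML input format removes the markup, and only Raw Text preserves the repeated spaces.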