Extract text from plain-text documents

Tips and best practices to parse plain-text documents


Tips for parsing plain-text documents with layout

To preserve the layout, converted plain-text documents use space characters to separate blocks that sit on the same line. The number of spaces can vary from one document to the next.

When you create fields in a template from a text document with layout, it is recommended to include some of the spaces surrounding each value in the field you capture.

This makes Parseur more reliable when the number of spaces around blocks of text changes, because Parseur locates fields in a document using the delimiters around them (see the article on how Parseur works for more information).
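To illustrate why this helps, here is a minimal Python sketch of delimiter-based extraction. It is not Parseur's actual implementation: the `extract_between` helper, the sample lines, and the delimiters are all hypothetical and only model the general idea. When the spaces belong to the delimiter, a change in spacing breaks the match. When the spaces belong to the captured field, the delimiters are stable text and the match survives.

```python
# Hypothetical sample lines: same layout, different number of padding spaces.
doc_a = "Invoice: 1042" + " " * 6 + "Date: 2024-03-01"
doc_b = "Invoice: 1042" + " " * 3 + "Date: 2024-03-01"


def extract_between(text, before, after):
    """Return the text found between two delimiters, or None if either is missing.

    Illustrative helper only; it is an assumption, not Parseur's API.
    """
    start = text.find(before)
    if start == -1:
        return None
    start += len(before)
    end = text.find(after, start)
    if end == -1:
        return None
    # Trim the padding that was captured along with the value.
    return text[start:end].strip()


# Tight field: the six padding spaces end up inside the closing delimiter.
tight_after = " " * 6 + "Date:"
tight_a = extract_between(doc_a, "Invoice: ", tight_after)
tight_b = extract_between(doc_b, "Invoice: ", tight_after)  # spacing changed

# Loose field: the padding is captured with the value, so delimiters are plain text.
loose_a = extract_between(doc_a, "Invoice:", "Date:")
loose_b = extract_between(doc_b, "Invoice:", "Date:")

print(tight_a, tight_b)
print(loose_a, loose_b)
```

In this sketch the tight delimiter finds the value in the first document but not in the second, while the loose delimiters find it in both.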

What are those <!--psr-to TT123--> symbols in my plain-text templates about?

Parseur uses markers internally in the form of <!--TT psr-123 --> to locate the fields in a template. You can safely ignore those markers while working on your template.
