The Parseur extension for the Google Chrome browser lets you extract data from web pages and scrape them. Use this Chrome extension to send a web page to Parseur, which then parses the content of the page and extracts the data you need. What's more, with a little trick you can go one step further and automate the crawling and scraping of web pages.
Note: this article assumes you already have an account and are comfortable with Parseur basics. Check out this page if you're unsure about how to get started.
Step 1: Install the extension
First, install the Google Chrome web browser if you haven't already. Here is the Chrome installation page.
Then, go to the Chrome Web Store and search for "Parseur", or click here to directly access the Parseur extension page.
After you install it, the extension icon appears in the top right corner of your browser.
Step 2: Send web pages to your Parseur mailbox with the extension
Connect to your Parseur account
Make sure you already have a mailbox created (or create a new one if necessary)
Copy the mailbox address by clicking the "copy to clipboard" button, as shown below:
Go to a web page that you'd like to parse
Click the Parseur extension icon
Paste (or type) your mailbox address in the mailbox field
You may have to remove the @in.parseur.com part from your mailbox name. Don't worry, you'll only have to type this in once.
Then click send.
Your web page will show up as a new document in your mailbox.
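If you keep mailbox addresses in a script or spreadsheet and want to prepare the value for the extension's mailbox field, the "remove the @in.parseur.com part" step can be sketched as below. This is only an illustration of the string handling. The function name `mailbox_name` is hypothetical and is not part of Parseur or its extension.

```python
# Domain suffix mentioned in the step above.
PARSEUR_DOMAIN = "@in.parseur.com"


def mailbox_name(address: str) -> str:
    """Return the mailbox name without the @in.parseur.com suffix.

    Hypothetical helper for illustration; not part of any Parseur API.
    """
    cleaned = address.strip()
    # Compare case-insensitively so "@IN.PARSEUR.COM" is also handled.
    if cleaned.lower().endswith(PARSEUR_DOMAIN):
        cleaned = cleaned[: -len(PARSEUR_DOMAIN)]
    return cleaned
```

A value that is already a bare mailbox name is returned unchanged, apart from surrounding whitespace being trimmed.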
Note: unlike emails, web pages can't always be displayed exactly as they appear in your browser. The page may look bare, but all the data from the original web page should still be there.
Step 3: Create a template from the downloaded document
Create a template as you normally would for any email or document. If necessary, check out this page for more information about creating templates.
Step 4: Automate web crawling and scrape more pages
Sometimes, the data you need is spread across several pages, for instance in a large paginated table. In that case, you can automatically move from one page to the next using "Linked Document" fields.
A "Linked Document" field captures a URL, and Parseur then automatically downloads the page at that URL as a new document.
Check out this page to learn how to set up a Linked Document field.
If the downloaded document is in the same format as the original one, data will be automatically extracted using the template you created in step 3. If the document is in a different format, Parseur will ask you to create a new template.
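To make the idea of following a captured URL from page to page more concrete, here is a minimal conceptual sketch in Python. It is not how Parseur implements Linked Document fields. It assumes, purely for illustration, that each page marks its "next page" link with `rel="next"`, and it uses in-memory sample pages instead of real downloads. All names (`NextLinkParser`, `find_next_url`, `crawl`, `PAGES`) are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkParser(HTMLParser):
    """Records the href of the first <a> tag whose rel attribute contains "next"."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.next_href is not None:
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").split()
        if "next" in rel_values and attr_map.get("href"):
            self.next_href = attr_map["href"]


def find_next_url(page_url, html):
    """Return the absolute URL of the next page, or None if there is none."""
    parser = NextLinkParser()
    parser.feed(html)
    if parser.next_href is None:
        return None
    # Resolve relative links against the URL of the current page.
    return urljoin(page_url, parser.next_href)


def crawl(start_url, fetch):
    """Follow "next" links from start_url; fetch is a callable url -> html."""
    visited = []
    url = start_url
    # Stop when a page has no next link or when a URL repeats (loop guard).
    while url is not None and url not in visited:
        visited.append(url)
        url = find_next_url(url, fetch(url))
    return visited


# Assumed sample data standing in for a three-page paginated table.
PAGES = {
    "https://example.com/table?page=1": '<a rel="next" href="/table?page=2">Next</a>',
    "https://example.com/table?page=2": '<a rel="next" href="/table?page=3">Next</a>',
    "https://example.com/table?page=3": "<p>Last page</p>",
}

downloaded = crawl("https://example.com/table?page=1", PAGES.__getitem__)
```

Here `downloaded` ends up listing the three page URLs in order. Each one plays the role of a document that the same template could process, since all pages share one format.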