The Parseur extension for Google Chrome lets you extract data from web pages. Use it to send a web page to Parseur automatically; Parseur then parses the content of the page and extracts the data you need. What's more, with a little trick you can go one step further and automate the crawling and scraping of web pages.
Note: this article assumes you already have an account and are comfortable with Parseur basics. Check out this page if you're unsure about how to get started.
Step 1: Install the extension
First, install the Google Chrome web browser if you haven't already; here is the Chrome installation page.
Then, go to the Chrome Web Store and search for "Parseur", or click here to directly access the Parseur extension page.
Once it is installed, the extension icon appears in the top right corner of your browser.
Step 2: Send web pages to your Parseur mailbox with the extension
- Log in to your Parseur account
- Make sure you already have a mailbox created (or create a new one if necessary)
- Copy the mailbox address by clicking the "copy to clipboard" button, as shown below:
- Go to a web page that you'd like to parse
- Click the Parseur extension icon
- Paste (or type) your mailbox address in the mailbox field
You may have to remove the @in.parseur.com part from your mailbox name. Don't worry, you'll only have to type this in once.
Then click send.
Your web page will show up as a new document in your mailbox.
Note: unlike emails, web pages can't always be displayed exactly as they appear in your browser. The page may look bare, but all the data from the original web page should still be there.
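If you keep mailbox addresses in a script or a spreadsheet export, the "remove the @in.parseur.com part" step described above can be sketched in Python. This is only an illustration: `mailbox_name` is a hypothetical helper, not part of Parseur or its extension, and the example address is made up.

```python
# Domain suffix mentioned in this article for Parseur mailbox addresses.
PARSEUR_DOMAIN = "@in.parseur.com"


def mailbox_name(address: str) -> str:
    """Return the mailbox name without the @in.parseur.com suffix.

    Hypothetical helper for illustration only; addresses that do not
    end with the suffix are returned unchanged (apart from trimming).
    """
    address = address.strip()
    if address.lower().endswith(PARSEUR_DOMAIN):
        return address[: -len(PARSEUR_DOMAIN)]
    return address


# Made-up address used only to show the behaviour.
name = mailbox_name("  my.mailbox@in.parseur.com ")
```

Here `name` holds the part you would type in the extension's mailbox field if the full address is not accepted.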
Step 3: Create a template from the downloaded document
Create a template as you normally would for any email or document. If necessary, check out this page for more information about creating templates.
Step 4: Automate web crawling and scrape more pages
Sometimes, the data you need is spread across several pages, for instance in a large paginated table. In that case, you can automatically move from one page to the next using "Linked Document" fields.
A "Linked Document" field is a field that captures a URL and then automatically downloads the page at that URL as a new document in Parseur.
Check out this page to learn how to set up a Linked Document field.
If the downloaded document is in the same format as the original one, data will be automatically extracted using the template you created in step 3. If the document is in a different format, Parseur will ask you to create a new template.
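To make the idea of chaining pages more concrete, here is a minimal conceptual sketch in Python of what following a "next page" link from document to document looks like. It is not how Parseur is implemented and does not use any Parseur API. The `NextLinkParser` class, the `crawl` function, the `rel="next"` convention, and the example.com pages are all assumptions made for illustration, and the pages are held in memory so that nothing is downloaded.

```python
from html.parser import HTMLParser
from typing import Callable, List, Optional


class NextLinkParser(HTMLParser):
    """Captures the href of the first <a rel="next"> link on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.next_url: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Ignore everything except the first anchor marked rel="next".
        if tag != "a" or self.next_url is not None:
            return
        attributes = dict(attrs)
        if attributes.get("rel") == "next":
            self.next_url = attributes.get("href")


def crawl(start_url: str, fetch: Callable[[str], str], max_pages: int = 50) -> List[str]:
    """Follow "next" links from start_url and return the URLs in visit order.

    `fetch` is any function returning the HTML for a URL. The loop stops
    when there is no next link, when a URL repeats, or after max_pages.
    """
    visited: List[str] = []
    url: Optional[str] = start_url
    while url and url not in visited and len(visited) < max_pages:
        visited.append(url)
        parser = NextLinkParser()
        parser.feed(fetch(url))
        url = parser.next_url  # plays the role of the captured URL field
    return visited


# Hypothetical in-memory pages standing in for a paginated table.
PAGES = {
    "https://example.com/table?page=1":
        '<table></table><a rel="next" href="https://example.com/table?page=2">Next</a>',
    "https://example.com/table?page=2":
        '<table></table><a rel="next" href="https://example.com/table?page=3">Next</a>',
    "https://example.com/table?page=3":
        "<table></table>",  # last page: no next link, so the crawl stops
}

order = crawl("https://example.com/table?page=1", PAGES.__getitem__)
```

In this sketch, each visited page corresponds to one new document, and the captured `next_url` corresponds to the value a Linked Document field would hold. The repeat check and the `max_pages` limit are there so that a page linking back to itself cannot cause an endless loop.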