Crawl

Data Miner Crawl is for when you have a list of items on a webpage that you need to click into to see additional data. This is a two-step process that uses a combination of a List Recipe and a Detail Recipe.

Crawl Process Overview

1) This process requires two recipes. The first recipe is used on the search results page and extracts the detail page URL of every individual item.

2) You will then give this list of URLs to Data Miner. Data Miner will visit every URL and apply the second recipe, which scrapes the details.

3) Once the process is complete, you will have a file with the combined data from the list page and each detail page. Continue below for the complete documentation.
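
To make the idea concrete, here is a minimal Python sketch of the same two-step pattern that Data Miner automates for you through recipes. It is only an illustration, not how Data Miner works internally; the site URL, CSS selectors, and field names are made-up placeholders, and it assumes the requests and beautifulsoup4 packages are installed.

```python
# Minimal sketch of the two-step Crawl idea in plain Python (not Data Miner's
# own code). The site URL, CSS selectors, and field names are hypothetical.
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

LIST_PAGE = "https://example.com/search?page=1"   # hypothetical list page
ITEM_LINK_SELECTOR = "a.result-title"             # hypothetical "list recipe" selector
NAME_SELECTOR = "h1.product-name"                 # hypothetical "detail recipe" selectors
PRICE_SELECTOR = "span.price"

# Step 1 (list recipe): collect the detail-page URL of every item on the list page.
list_soup = BeautifulSoup(requests.get(LIST_PAGE).text, "html.parser")
detail_urls = [urljoin(LIST_PAGE, a["href"]) for a in list_soup.select(ITEM_LINK_SELECTOR)]

# Step 2 (detail recipe): visit each URL and scrape the fields you care about.
rows = []
for url in detail_urls:
    detail_soup = BeautifulSoup(requests.get(url).text, "html.parser")
    name = detail_soup.select_one(NAME_SELECTOR)
    price = detail_soup.select_one(PRICE_SELECTOR)
    rows.append({
        "url": url,
        "name": name.get_text(strip=True) if name else "",
        "price": price.get_text(strip=True) if price else "",
    })

print(rows)  # the combined output: one row per detail page
```

Part One of the instructions below corresponds to Step 1 of this sketch, and Part Two corresponds to Step 2.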

Crawl Tutorial Video

Step By Step Instructions:

Part One - Collecting the URLs

  1. Navigate to the search results page you want to scrape, launch Data Miner, and click Page Scrape from the left side menu.
  2. Click "Select a Recipe" from the top tabs in Data Miner, then choose a recipe that captures the URL of each search result item. You can choose from Public, Generic, or My Recipes. If no Public Recipes are available, you will have to make your own; you can learn how to do that here: Recipe Creator.
  3. Click "Select and Scrape" on the chosen recipe and confirm the URLs are correct. If they are not, choose another recipe.
  4. There may be additional pages you need to scrape. To scrape them, use "Scrape Page Again" to scrape manually, or use "Next Page Pagination" to scrape automatically.
  5. Use "Scrape Page Again" if you want to choose the pages yourself: visit each page and click "Scrape Page Again".
  6. If the page has a Next button and many pages at the bottom, use Next Page Pagination. Data Miner will automatically click the Next button and scrape the data for you.
  7. Once you have acquired all the URLs, continue to the last tab, Download. From here you can save the URLs or download them. For this example, we will use the Save option. If you download instead, you must later upload the URLs as a CSV (an example layout is sketched after this list).
  8. Give the file a unique name and click "Save As".
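
If you chose Download instead of Save, the file you upload in Part Two is just a CSV of URLs. The layout below is an assumed example (your actual export columns depend on the recipe you used); the snippet simply writes one URL per row so that the URL sits in column 1, the number you will enter in Part Two.

```python
# Hypothetical example of a URL file for the Crawl step: one detail-page URL
# per row, in column 1. The file name and URLs are placeholders.
import csv

detail_urls = [
    "https://example.com/item/101",
    "https://example.com/item/102",
    "https://example.com/item/103",
]

with open("search_result_urls.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for url in detail_urls:
        writer.writerow([url])   # URL in column 1
```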

Part Two - Running a Crawl

  1. Once you have a list of URLs, click Crawl Scrape from the left side menu.
  2. Click "Load/New Crawl" from the top tabs in Data Miner, and then from the center options, click "Create new Crawl".
  3. Next, tell Data Miner where the URLs will come from. This is done from the "Set URLs" tab. There are multiple options, which are covered in the advanced Crawl tutorials (coming soon). For this example, we will use "Saved Scrape Results".
  4. Click "Saved Scrape Results".
  5. From the drop-down menu labeled "Saved results name:", choose the file that was saved in Part One.
  6. For the second field, "Which column contains URLs", enter the column number where the URL is found. For example, if the URL is in column 1 of our output file, enter "1" (see the sketch after this list).
  7. Click "Confirm"; this checks that the URLs are valid. Handling invalid URLs is covered in the advanced Crawl tutorials (coming soon), and invalid URLs will not interfere with the Crawl process.
  8. Once the URLs are confirmed, move on to the Recipe tab. This is where you select the detail recipe that will be applied to each URL to scrape the data. If you do not have a detail recipe, you can make one by following our tutorials on how to create recipes.
  9. Once the recipe is selected, you will see a preview scrape to the right.
  10. If the data looks good, continue to the Crawl tab.
  11. From the Crawl tab, give the Crawl a name and name the output file. The additional settings are covered in the advanced Crawl tutorials (coming soon).
  12. Now click "Save and Start Crawl". Data Miner will visit each URL one by one, apply the detail recipe, and scrape the data. The scraped data will begin to accumulate in the Download tab after a few seconds.
  13. From the Download tab you can save the data, download it as a CSV or Excel file, or copy it to your clipboard.
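
As a rough mental model of what the "Which column contains URLs" field and the Confirm check are doing, here is a short Python sketch. It approximates the idea only and is not Data Miner's actual validation logic; the file name and the 1-based column number are assumptions carried over from the earlier examples.

```python
# Approximation of the Crawl setup: pick a 1-based column from the saved CSV,
# keep the values that look like valid URLs, and skip the rest.
import csv
from urllib.parse import urlparse

CSV_FILE = "search_result_urls.csv"   # hypothetical file saved in Part One
URL_COLUMN = 1                        # 1-based, as entered in "Which column contains URLs"

def looks_like_url(value: str) -> bool:
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

valid_urls, skipped = [], []
with open(CSV_FILE, newline="") as f:
    for row in csv.reader(f):
        cell = row[URL_COLUMN - 1] if len(row) >= URL_COLUMN else ""
        (valid_urls if looks_like_url(cell) else skipped).append(cell)

print(f"{len(valid_urls)} valid URLs, {len(skipped)} skipped")

# The crawl itself then visits each valid URL and applies the detail recipe,
# as in Step 2 of the earlier two-step sketch, writing each result to the
# named output file.
```

Data Miner performs all of this for you inside the extension; the sketch is only meant to show why the column number is needed and why invalid rows can safely be skipped.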

Try it on your own!

Using our Practice Sandbox, try running through the above steps on your own!

To continue learning, please visit our additional tutorials.

Don't see any Public or Generic Recipes? You can learn to make recipes yourself in our How to Write Recipes section.