Data Miner Crawl is for when you have a list of items on a webpage that you need to click into to see additional data. This is a two-step process that uses a combination of a List Recipe and a Detail Recipe.
Crawl Process Overview
This process requires two recipes. The first recipe is used on the search results page and extracts the detail page URLs from every individual item.
You will then take this list of URLs and upload it to Data Miner. Data Miner will then visit every URL and apply the second recipe, which is used to scrape the details.
Once the process completes, you will have a file with the combined data from the list page and each detail page. Continue below for the complete documentation.
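The two-stage flow described above can be sketched in code. This is an illustrative Python sketch only; the recipe functions and example URLs are made up and are not Data Miner's actual internals:

```python
# Illustrative sketch of the Crawl flow; the recipe functions below are
# stand-ins for real recipes, not Data Miner's implementation.

def list_recipe(list_page_html):
    """Stage one: extract a detail-page URL for each item on the list page."""
    # A real list recipe would parse the page; here we return two fake items.
    return ["https://example.com/item/1", "https://example.com/item/2"]

def detail_recipe(url):
    """Stage two: scrape the fields of interest from one detail page."""
    # A real detail recipe would fetch and parse the page at `url`.
    return {"url": url, "title": "..."}

# The crawl visits every URL from stage one and applies the detail recipe.
urls = list_recipe("<html>...</html>")
combined = [detail_recipe(u) for u in urls]
print(len(combined))  # one row of combined data per detail page
```

The key point is the shape of the process: one recipe produces a list of URLs, and a second recipe runs once per URL to produce the final rows.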
Crawl Tutorial Video
Step By Step Instructions:
Part One - Collecting the URLs
- Navigate to the search results page you want to scrape, launch Data Miner, and click List Scrape from the left side menu.
- Click "Select a Recipe" from the top tabs in Data Miner, then choose a recipe that captures the URL of each search result item. You can choose from Public, Generic, or My Recipes. If no Public Recipes are available, you will have to make your own. You can learn how to do that here: Recipe Creator
- Click "Select and Scrape" on the chosen recipe. Confirm the URLs are correct. If not, choose another recipe.
- There may be additional pages at the bottom that you need to scrape. To scrape these pages, continue to the next tab, "Next Page Pagination". From this tab you can tell Data Miner how many more pages to scrape, and it will add those URLs to your current list.
- Once you've acquired all the URLs, continue to the last tab, Download. From here you can save the URLs or download them. For this example, we will use the Save option.
- Give the file a unique name and click "Save As".
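Data Miner stores the saved results for you, but conceptually the output of Part One is just a table whose first column holds the detail-page URLs. A minimal sketch of that idea, with an assumed filename and example URLs:

```python
import csv

# Illustration only: "saved_urls.csv" and the URLs are assumed values,
# not files Data Miner creates. The point is that Part One produces a
# one-column table of detail-page URLs for Part Two to crawl.
urls = ["https://example.com/item/1", "https://example.com/item/2"]
with open("saved_urls.csv", "w", newline="") as f:
    csv.writer(f).writerows([u] for u in urls)
```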
Part Two - Running a Crawl
- Once you have a list of URLs, click Crawl Scrape from the left side menu.
- Click "Load/New Crawl" from the top tabs in Data Miner, then from the center options, click "Create new Crawl".
- Next, we will tell Data Miner where the URLs will come from. This is done from the "Set URLs" tab. There are multiple options, which are covered in the advanced Crawl tutorials (coming soon). For this example, we will use "Saved Scrape Results".
- Click "Saved Scrape Results"
- Now, from the drop-down menu labeled "Saved results name:", choose the file that was saved in Part One.
- For the second field, "Which column contains URLs", enter the column number where the URL is found. For example, if the URL is in column 1 of the output file, enter "1".
- Click "Confirm" to check that the URLs are valid. Handling invalid URLs is covered in the advanced Crawl tutorials (coming soon); invalid URLs will not interfere with the Crawl process.
- Once the URLs are confirmed, move on to the Recipe tab. This is where you will select the detail recipe that will be applied to each URL to scrape the data. If you do not have a detail recipe, you can make one by following our tutorials on how to create recipes.
- Once the recipe is selected, you will see a preview scrape to the right.
- If the data looks good, continue to the Crawl tab.
- From the Crawl tab, give the Crawl a name and name the output file. The additional settings are covered in the advanced Crawl tutorials (coming soon).
- Now click "Save and Start Crawl". Data Miner will now visit each URL one by one, apply the detail recipe, and scrape the data. The scraped data will begin to accumulate in the Download tab after a few seconds.
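The "Which column contains URLs" setting from the steps above uses 1-based column numbering. A short sketch of what that mapping means, with hypothetical saved-results data:

```python
import csv
import io

# Hypothetical saved results from Part One: the URL sits in column 1
# (1-based), so the crawl reads index 0 of each row.
saved = (
    "https://example.com/item/1,Item One\n"
    "https://example.com/item/2,Item Two\n"
)
url_column = 1  # the value entered in "Which column contains URLs"
urls = [row[url_column - 1] for row in csv.reader(io.StringIO(saved))]
print(urls)
```

If the URLs had been saved in the second column instead, you would enter "2", which maps to index 1.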
From the Download tab, you can save the data, download it as a CSV or Excel file, or copy it to your clipboard.
To continue learning, please visit our additional tutorials.
Do you not see any Public or Generic Recipes? You can learn to make recipes yourself from our How to Write Recipes section.