List Scrape is used when you want to scrape a page that has information for multiple people, places, or things. These pages usually show multiple rows or grid squares and may have links to additional pages at the bottom. List pages are most commonly called search results pages.
List pages are very important because they are also the first step in creating a Crawl. A Crawl is the process of scraping sub-level data that you have to click into from the list to reach. We will cover that topic in more detail in the Crawl section. In this tutorial, we will cover how to scrape a List page, with the option to automatically visit additional pages by having Data Miner click the next button and collect data as it goes.
List Scrape Workflow Video
Written List Scrape Instructions:
- Navigate to the page you want to scrape, open up Data Miner, click "Start Scraping".
- Select List Scrape from the left hand side.
- Choose your recipe. You can choose from Public, Generic, or Private. To test a recipe, simply click on the Recipe and a single scrape will run and data will display on the right hand side of Data Miner.
- If no data is extracted, continue trying additional Recipes. If no additional Recipes are available, scroll down and learn how to write your own List Recipe.
- Once you have a recipe that extracts the data you want, continue to the "Next Page Pagination" tab. This is where you can automatically scrape additional pages using the Next Page button.
- From the "Next Page Pagination" tab you can tell Data Miner how many pages to scrape and the amount of time to wait before each scrape. The minimum is 3 seconds. This is so the page has enough time to load the data. Click "Scrape".
- Data Miner will now automatically click the next page for you and scrape the data. You can view the progress from the summary table below, or you can click into the download tab to see the data accumulate. You also have the option to stop the pagination and clear the data if something needs to be adjusted with the page or recipe.
- Once you have finished scraping all the desired pages, continue to the download tab. From this tab you can download the data directly to your computer as a CSV file or Excel file, or you can copy it to your clipboard. Data Miner also has the option to save it directly in Data Miner. Just give the file a name and click "Save As". This data will be available from the Save Results folder on the left hand side. Once the data is saved or downloaded, you can close Data Miner.
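The pagination and download steps above can be pictured as a simple loop. The Python sketch below is only an illustration under stated assumptions, not Data Miner's actual implementation: `FAKE_SITE`, `scrape_page`, `paginate`, and `to_csv` are hypothetical names invented for this example. The only details taken from this tutorial are the page limit, the wait before each scrape (minimum 3 seconds), and the CSV download.

```python
import csv
import io
import time

MIN_WAIT_SECONDS = 3  # the tutorial states a 3-second minimum wait

# Hypothetical stand-in for a website: each "page" is a list of rows.
FAKE_SITE = {
    1: [{"name": "Alice", "city": "Austin"}, {"name": "Bob", "city": "Boston"}],
    2: [{"name": "Cara", "city": "Chicago"}],
}


def scrape_page(page_number):
    """Return the rows for one page, or None when there is no such page."""
    return FAKE_SITE.get(page_number)


def paginate(max_pages, wait_seconds, sleep=time.sleep):
    """Scrape up to max_pages pages, waiting before each scrape."""
    wait = max(wait_seconds, MIN_WAIT_SECONDS)  # enforce the minimum wait
    collected = []
    for page_number in range(1, max_pages + 1):
        sleep(wait)  # give the page time to load its data
        rows = scrape_page(page_number)
        if rows is None:  # no further page to move to, so stop early
            break
        collected.extend(rows)
    return collected


def to_csv(rows):
    """Serialize the collected rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "city"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # The sleep function is replaced here so the demo does not really wait.
    data = paginate(max_pages=5, wait_seconds=3, sleep=lambda seconds: None)
    print(to_csv(data))
```

The loop stops either when the requested number of pages is reached or when no further page exists, which mirrors setting a page count in the "Next Page Pagination" tab and letting the scrape run until it finishes.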
Writing your own List Recipe
In this section, we'll quickly cover how to write a List Recipe. For more in-depth information on Recipe Writing, please visit our Recipe Creator section.
- Navigate to a page you want to scrape. We will use this as a template to build the Recipe. Once the recipe is complete, you will be able to use the recipe on any page with the same style for this one website. A new recipe must be built for every new website.
- Once the data is visible, open Data Miner from the toolbar in the top right corner of your browser. Click the little pencil icon. This opens Recipe Creator and starts a new recipe.
- Click into the 2nd tab and choose your page Type. We are currently working on a List page. Choose List Page.
- The first step in creating a List Recipe is creating the Rows. These rows are the containers that tell Data Miner where the data is coming from and how to organize the output. The Rows created here are the rows in your final file.
- To begin selecting the row data, click on the Rows tab and click the Find button. This turns your mouse into a magic wand and allows you to select parts of the page. To select rows, hover your mouse over the area around one row of data. This will highlight your data. You only need this highlighting color to cover one row or grid square of data, not the entire page. Once the highlighting color covers the area of a single row, hit shift on your keyboard. (Helpful tip: if the mouse cannot cover an entire row, hit shift on a piece of data and then use the "Select Parent" button in Recipe Creator for more options.)
- There will now be a dotted line around one row. In Recipe Creator, there will be some suggested selectors. These selectors are pieces of HTML copied from the web page, and they are what Data Miner uses to locate the row later. Pick the class or HTML element that highlights the row the best. Do this by clicking the checkbox in Recipe Creator.
- Once the row is highlighted correctly, lock in the selector by clicking "confirm" at the bottom of Recipe Creator.
- To begin selecting the individual data, click on the Column tab and select “column 1”.
- Give the column a name and pick the extraction type you'd like to perform. By default it is text. What the different extraction types do will be covered in the Recipe Creator section.
- Next, click the "Find" button. This turns your mouse into a magic wand and allows you to select data. To select data, hover your mouse over the actual value you wish to scrape. Once the highlighting color covers the data, hit shift on your keyboard.
- Now in Recipe Creator, there will be some suggested selectors. These selectors are pieces of HTML copied from the web page, and they are what Data Miner uses to locate the data on the page so it can be scraped later. Pick the class or HTML element that highlights the data the best. Do this by clicking the checkbox in Recipe Creator.
- Once the data is highlighted correctly, lock in the selector by clicking "confirm" at the bottom of Recipe Creator and click the "Extracted" button in the corner to check your work.
- Continue creating columns by clicking "Add new Column".
- Once you have all your columns done, continue to the Nav tab. This is where you set up the Next Page Pagination, the process of Data Miner clicking the next button for you.
- Similar to Rows and Columns, you will use the hover and shift process. Click the Find button, hover over the "Next" button on the website, and then hit shift on your keyboard. Choose the best selector and then hit confirm. For more advanced Next Page selectors or tricks, please visit our Recipe Creator section.
- Once your Nav is done, finish by clicking the Save tab at the top. Give the recipe a name, click save, and then click run to scrape.
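The Rows, Columns, and Nav steps above can be sketched in code to show how the three kinds of selectors fit together. This Python sketch is an illustration only and makes several assumptions: the page markup, the `RECIPE` dictionary layout, the `extract` and `run_recipe` functions, and the "href" extraction type are all hypothetical, and it uses the limited XPath syntax of Python's standard `xml.etree.ElementTree` in place of the class or HTML element selectors that Recipe Creator suggests.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a search results page.
PAGE = """
<html><body>
  <div class="result"><a class="name" href="/p/1">Alice</a><span class="city">Austin</span></div>
  <div class="result"><a class="name" href="/p/2">Bob</a><span class="city">Boston</span></div>
  <a class="next" href="/search?page=2">Next</a>
</body></html>
"""

# Hypothetical recipe: one row selector, named columns, one nav selector.
RECIPE = {
    "rows": ".//div[@class='result']",
    "columns": {
        # column name: (selector relative to the row, extraction type)
        "Name": (".//a[@class='name']", "text"),
        "Profile": (".//a[@class='name']", "href"),
        "City": (".//span[@class='city']", "text"),
    },
    "nav": ".//a[@class='next']",
}


def extract(element, extraction_type):
    """Return the element's text, or the named attribute for other types."""
    if element is None:
        return ""
    if extraction_type == "text":
        return (element.text or "").strip()
    return element.get(extraction_type, "")


def run_recipe(page_source, recipe):
    """Apply the recipe once: one output record per matched row."""
    root = ET.fromstring(page_source.strip())
    table = []
    for row in root.findall(recipe["rows"]):
        record = {}
        for name, (selector, kind) in recipe["columns"].items():
            # Column selectors are evaluated inside the current row only.
            record[name] = extract(row.find(selector), kind)
        table.append(record)
    next_button = root.find(recipe["nav"])
    next_url = next_button.get("href") if next_button is not None else None
    return table, next_url


if __name__ == "__main__":
    rows, next_url = run_recipe(PAGE, RECIPE)
    for record in rows:
        print(record)
    print("next page:", next_url)
```

The point of the sketch is the ordering: the row selector decides how many records end up in the final file, each column selector is looked up inside a single row, and the Nav selector is looked up once per page to find where the Next button leads.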
Have you created a List Recipe, but want to scrape data after you click into a detail page? This process is covered in our Crawl tutorial.