Web Scraping with Puppeteer, NodeJS & Shopify

I really love Google’s Puppeteer. It’s a great utility for PDF generation, screenshots and web scraping. Today we look at how to scrape some public Shopify data using Puppeteer.

To do that, we need to combine many of the async/await techniques from the last few videos.
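
As a rough sketch of how those async/await pieces fit together, the function below loads a page, waits for a list to appear, and reads a heading from each list item. It is not the code from the video: the function name `scrapeHeadings` is made up for illustration, the URL is a placeholder, and the selectors `#ExpertsResults` and `h2` are borrowed from a reader comment further down. The `page` argument is assumed to be a Puppeteer `Page` created with `puppeteer.launch()` and `browser.newPage()`.

```javascript
// Sketch only. `page` is assumed to be a Puppeteer Page; the function name,
// URL and selectors are illustrative assumptions, not the video's code.
async function scrapeHeadings(page, url, listSelector, headingSelector) {
  await page.goto(url);
  // Wait until the list container exists before querying its items.
  await page.waitForSelector(listSelector);
  const items = await page.$$(`${listSelector} > li`);

  const names = [];
  for (const item of items) {
    // item.$ resolves to null when nothing matches, whereas item.$eval
    // throws "failed to find element matching selector".
    const heading = await item.$(headingSelector);
    if (!heading) continue;
    const text = await heading.evaluate((el) => el.innerText.trim());
    if (text) names.push(text);
  }
  return names;
}

// Possible usage (requires `npm install puppeteer`; placeholder URL):
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch();
//   const page = await browser.newPage();
//   const names = await scrapeHeadings(page, 'https://example.com/experts', '#ExpertsResults', 'h2');
//   await browser.close();
```

Passing `page` in as a parameter keeps the browser setup separate from the scraping logic, which also makes the loop easy to try out against a stub object.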

41 Comments

  1. How can I click a link in an 'a' tag? Ex: <a href='something'>Something</a>
    Thanks

  2. could anyone provide the code in gist or repo?

  3. This short tutorial is excellent. It has snippets required for many scraping scenarios. Explained very clearly and easy to understand.

  4. This is just awesome, could you make a video of how to use async/await in a chrome extension?

  5. Great video !
    Thanks for sharing!

  6. Fantastic video.. I am curious about how to scrape url like http://www.example.com because here http or https is unknown?

  7. How about storing the URL of each button and running through each of them instead of returning to the main page and clicking those buttons?

  8. Excellent tutorial, any on downloading images?

  9. Anyone experience a problem with headless set to true?

  10. Excellent! How would one use an external list of search keywords to send to the search box element?

  11. This is a great explanation on how we can use puppeteer. You're awesome! Thanks!

  12. Thanks for the interesting video. I attempted scraping with this code a couple of times, but frequently I get this error around "See our Photographers": "our error Error: Error: failed to find element matching selector "h2"". Only a few times did it successfully scrape everything by iterating through every section. Perhaps somewhere await page.waitForSelector doesn't wait long enough to find the h2 element? Does anybody have a similar issue?

  13. bloody awesome tutorial mate!!

  14. Very nice and instructive video! Thank you!

  15. I'm having a random error (on some iterations of the code only):

    "
    UnhandledPromiseRejectionWarning: Error: Error: failed to find element matching selector "h2";
    "

    for the code in:

    "
    await page.waitForSelector('#ExpertsResults');
    const lis = await page.$$('#ExpertsResults > li ');
    console.log('length: ', lis.length)

    //loop over each li on inner page
    for (let i = 0; i < lis.length; i++) {
      const name = await lis[i].$eval('h2', (h2) => h2 ? h2.innerText : '');
      if (name) {
        console.log({
          name: name
        })
      }
    }
    "

    I'd appreciate some input

    I'm attaching a link to the code I wrote from this video:

    https://github.com/Darkmift/scrapeAmazon/blob/master/testYT.js

    Your assistance is most appreciated

  16. Please consider sharing a bin with your code

  17. Excellent narration – well done!

  18. button.click();
    gives me an error
    Execution context was destroyed, most likely because of a navigation
    please help!!!

  19. AMAZING!!!!!

    Congrats my friend
    Thanks for sharing

  20. Great but with some duplicate coding… anyway it was awesome!!

  21. Helpful Toto
    Thank you very much

  22. Can someone help me work with popup windows? Basically, when I click a link and it opens a new popup window, I want to, say, click an element inside the new window. I've searched everywhere and all the samples that I tried are not working.

  23. Thank you so much for this! Great examples and common errors 👍🏻

  24. Thanks. It's amazing! ^^ I have a question.
    After completing the crawl, how do I print the results in an HTML document?

  25. Great video, especially the explanation of the problems/issues 🙂 just one question: couldn't you keep the original goto/waitForSelector as well to get the boundaries for the for loop?
    Thanks for the great tutorial! i'm actually using pupeetterererer(however) in my current project 😉

  26. Can I scrape LinkedIn's profile data using Puppeteer?

  27. First video I watched on your channel and I have to say the way you explain things is superb. It's so much clearer to me now. Thank you A LOT

  28. Hey, I followed the same steps you did, but I'm getting a blank web page. Did I make some mistake? Please advise.

  29. I've seen a lot of tech videos. You're number 1.
    Clear audio, video , professional , straight to the point. Wow.

  30. Thanks a lot for this awesome tutorial, but
    I tried to write a search worker and I can't get it working as expected.
    Here is the code:

    const puppeteer = require('puppeteer');

    (async () => {
      try {
        const browser = await puppeteer.launch({headless:true});
        const page = await browser.newPage();
        await page.goto('https://www.bing.com/');
        await page.waitForSelector('.b_searchbox')
        await page.$eval('.b_searchbox', el => el.value = 'intitle:hacked by');
        await page.click('.b_searchboxSubmit');
        await page.waitForSelector('.b_algo');

        const lis = await page.$$('.b_algo');

        for (const li of lis) {
          const title = await li.$('p');
          console.log(lis)
        }
      }
      catch(e) {
        console.log(e)
      }
    })();

  31. Hey, great job man. I was struggling with some of the async-await concepts and you talking through your example helped me with mine 🙂 happy scraping.

  32. Hi Alex, could you please help me? I'm trying to scrape a website for an interview task, and I'm required to use Node.js, which is not my strength. While looking for tools, I settled on Puppeteer as the robot, because the target site uses AJAX for every URL load and there is no <a> at all. I have written a very simple test script that reaches each page by clicking the element, but it doesn't run smoothly: while looping over clicks on the pagination button, I mostly get navigation timeout errors and "node detached from document" errors …

    This is my loop :
    await cat.click();
    page.waitForNavigation()

    Where cat is an ElementHandle and the
    default navigation timeout is already set to 900000

    Can you please explain the mechanism to me?

  33. Thank you for your puppeteer web scraping introduction. I am new to web development and wanted to use this framework to work on automation projects, this is a very good beginner's guide to how to handle a web page using puppeteer js.

  34. Wouldn't it make more sense to move everything into page.eval? Instead of doing await li.eval?

  35. Can you tell why this is happening?

    Error: Execution context was destroyed, most likely because of a navigation.

  36. I am getting the error below:

    (async function main() {
    ^^^^^^^^

    SyntaxError: Unexpected token function

    at ScriptTransformer._transformAndBuildScript (node_modules/jest-runtime/build/script_transformer.js:402:17)
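
Several comments above report "Execution context was destroyed, most likely because of a navigation" or navigation timeouts right after a click. One pattern that often helps, sketched here under assumptions, is to start `page.waitForNavigation()` before the click and await both together, and to re-query element handles after every navigation because handles from the previous page become stale. The helper names `clickAndWait` and `visitEach`, and the `onDetail` callback, are hypothetical; `page` is assumed to be a Puppeteer `Page`. Note that this only applies when the click triggers a real navigation; on a page that updates via AJAX, waiting for a selector with `page.waitForSelector` is likely the better fit.

```javascript
// Sketch only: helper names are hypothetical; `page` is a Puppeteer Page and
// `handle` an ElementHandle obtained from it.
async function clickAndWait(page, handle) {
  // Register the navigation wait before clicking, so a fast page load
  // cannot complete before anything is listening for it.
  const [response] = await Promise.all([
    page.waitForNavigation(),
    handle.click(),
  ]);
  return response;
}

// Visit every element matching buttonSelector on the list page, one at a time.
async function visitEach(page, listUrl, buttonSelector, onDetail) {
  // First load is only used to learn how many buttons there are.
  await page.goto(listUrl);
  await page.waitForSelector(buttonSelector);
  const count = (await page.$$(buttonSelector)).length;

  const results = [];
  for (let i = 0; i < count; i++) {
    // Handles from before a navigation are stale, so reload the list page
    // and query the buttons again on every iteration.
    await page.goto(listUrl);
    await page.waitForSelector(buttonSelector);
    const buttons = await page.$$(buttonSelector);
    await clickAndWait(page, buttons[i]);
    // onDetail is a caller-supplied async function that scrapes the new page.
    results.push(await onDetail(page));
  }
  return results;
}
```

Reloading the list page on each pass costs extra requests; collecting each button's URL up front and calling `page.goto` on those directly, as one comment suggests, avoids the repeated reloads when the buttons are real links.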
