How to do Web Scraping in JavaScript using Puppeteer: A Step-By-Step Tutorial

Banoo Gargari
5 min read · Apr 22, 2021

In this post, I explain web scraping in JavaScript using Puppeteer and show, step by step, how to build a simple tool that turns an Eventbrite event URL into a graphical flyer with a QR code. The flyer can be shared on social media or printed and distributed.

What Will We Build?

The following image shows the final event flyer generator. I have kept the HTML and CSS parts of this project simple so we can focus on the main concept: web scraping using Puppeteer.

As you can see, there is a simple form that takes an event URL from the Eventbrite website. When you click the “submit” button, the web scraper reads the event info, including:

  • Event Title
  • Event Date
  • Event Time

and puts them together in the form of a flyer. Other event info, such as the event image, presenter, or location, can easily be added as well.

Sample flyer with QR code automatically created using the event URL on Eventbrite website.

Access the Final Code on my GitHub

You can access the code for this project from the following GitHub link:

QRcode using qrcode.js + Node.js

In this project, we use Node.js with Puppeteer for the scraping and QRCode.js to create the QR code. Please make sure you have installed the latest version of Node.js.

You may also download QRCode.js using the following link:

What is Puppeteer?

Puppeteer is a Node library that provides a high-level API to control Chrome or Chromium. By default, it runs the browser headless, that is, without a graphical user interface, a mode mainly used for automated testing.

Think of Puppeteer as a browser that is controlled not by your mouse or keyboard but by the code you write. You can write code that interacts with this browser, for example to open a URL or click a link. Most things you can do manually in a browser can be done with Puppeteer, for example:

  • Generate screenshots and PDFs of pages.
  • Automate form submission, UI testing, keyboard input, etc.
  • Create an up-to-date, automated testing environment. Run your tests directly in the latest version of Chrome using the latest JavaScript and browser features.
  • Test Chrome Extensions.
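To make this concrete, here is a minimal sketch of Puppeteer opening a page, reading two of the meta tags this project is interested in, and saving a screenshot. It assumes Puppeteer has been installed with `npm install puppeteer`; `readEventMeta`, `META_SELECTORS`, and the `event.png` file name are illustrative names, not part of the original project.

```javascript
// Minimal Puppeteer sketch: open a page in headless Chrome, read meta tags,
// and save a screenshot. Assumes `npm install puppeteer` has been run.

// CSS selectors for the meta tags read by this sketch
// (the same tags the back-end later in this post reads with cheerio).
const META_SELECTORS = {
  title: 'meta[property="og:title"]',
  startTime: 'meta[property="event:start_time"]',
};

// Hypothetical helper: resolves to { title, startTime } for the given page URL.
async function readEventMeta(url, screenshotPath = 'event.png') {
  // Required inside the function so this file can be loaded without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch(); // headless by default
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    // page.$eval runs the callback inside the page on the first element
    // matching the selector, and throws if nothing matches.
    const title = await page.$eval(META_SELECTORS.title, el => el.getAttribute('content'));
    const startTime = await page.$eval(META_SELECTORS.startTime, el => el.getAttribute('content'));
    await page.screenshot({ path: screenshotPath });
    return { title, startTime };
  } finally {
    await browser.close(); // always release the browser process
  }
}

// Usage (needs network access and the Chrome build Puppeteer downloads):
// readEventMeta('https://eventbrite.co.uk/e/131463864959').then(console.log);
```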

A Simple Front-End Form + Event Listener

The first step is to create a front-end folder. Put your HTML file (index.html), qrcode.min.js, and a JavaScript file in that folder.

Here is sample code for your HTML file that creates an input box to get a URL from the user.

<form class="banner-form">
<input style="width:100%" type="url" id="eventLinkInput" placeholder="Event URL" value="https://eventbrite.co.uk/e/131463864959" />
<input type="submit" value="submit" />
</form>
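The `type="url"` input only checks that the value is some URL. If you also want to reject links that are not Eventbrite event pages before calling the back-end, a small check like the one below could run in the submit handler. It is a sketch: `isEventbriteEventUrl` is a hypothetical helper, and both the accepted hostnames and the `/e/` path prefix are assumptions based on the sample URL `https://eventbrite.co.uk/e/131463864959`.

```javascript
// Hypothetical client-side check: accept only URLs that look like Eventbrite
// event pages, e.g. https://eventbrite.co.uk/e/131463864959
function isEventbriteEventUrl(value) {
  let parsed;
  try {
    parsed = new URL(value);
  } catch {
    return false; // not a valid absolute URL
  }
  // Assumed hostname shapes: eventbrite.com, www.eventbrite.co.uk, eventbrite.com.au, ...
  const hostOk = /(^|\.)eventbrite\.[a-z]{2,3}(\.[a-z]{2})?$/.test(parsed.hostname);
  // Assumed path shape: event pages live under /e/<id>
  const pathOk = parsed.pathname.startsWith('/e/');
  const protocolOk = parsed.protocol === 'https:' || parsed.protocol === 'http:';
  return protocolOk && hostOk && pathOk;
}
```

For example, the submit handler could return early (and show a message) when `isEventbriteEventUrl(form.eventLinkInput.value)` is false.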

Next, add the following simple div, which holds all the event info laid out as a flyer.

<div id="banner-design">
<p id="joinUs">Join us for an exciting online event</p>
<p id="eventTitle"></p>
<p id="time"></p>
<div id="qrcode" style="align-content: center; text-align: center; margin-left: 161px;"></div>
</div>

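The full CSS file is available at the GitHub link above. Here are some of the CSS rules that control the flyer design above and set its fonts and background colors. Note that 'Ranchers' is a Google Font, so the page needs to load it; otherwise the browser falls back to the generic cursive font.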

#banner-design {
  background: rgb(255, 225, 225);
  padding: 20px;
  width: 450px;
  height: 450px;
  display: flex;
  flex-wrap: wrap;
  align-content: center;
  border-style: dashed;
}
#joinUs {
  text-align: center;
  flex: 0 0 100%;
  font-family: cursive;
}
#eventTitle {
  background: darkmagenta;
  color: azure;
  flex: 0 0 90%;
  font-family: 'Ranchers', cursive;
  font-size: 170%;
  text-align: center;
  border-radius: 5px;
  padding: 10px;
  border-style: solid;
}
#time {
  color: darkgoldenrod;
  flex: 0 0 100%;
  font-family: 'Ranchers', cursive;
  text-align: center;
}
#presenter {
  color: darkgreen;
  flex: 0 0 100%;
  text-align: center;
  font-family: 'Ranchers', cursive;
}
#summary {
  color: rgb(158, 28, 34);
  flex: 0 0 100%;
  text-align: center;
  font-family: 'Ranchers', cursive;
}

To submit the form and interact with the back-end, I created the following event listener in the front-end JavaScript file.

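// The form defined in index.html
const form = document.querySelector('.banner-form');

form.addEventListener('submit', e => {
  e.preventDefault(); // stop the browser from reloading the page
  const url = form.eventLinkInput.value;
  // Encode the event URL so any "?" or "&" inside it survive the query string
  const api_url = 'http://localhost:3000/event?url=' + encodeURIComponent(url);
  fetch(api_url)
    .then(res => {
      if (!res.ok) throw new Error('Request failed with status ' + res.status);
      return res.json();
    })
    .then(data => {
      document.getElementById('eventTitle').innerText = data.title;
      document.getElementById('time').innerText =
        'Start Date: ' + data.start_date + ' - Start Time: ' + data.start_time;
      // Clear any previous QR code, then draw one that encodes the event URL
      document.getElementById('qrcode').innerHTML = '';
      const qrcode = new QRCode(document.getElementById('qrcode'), { width: 128, height: 128 });
      qrcode.makeCode(url);
    })
    .catch(err => console.error(err));
});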
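If the event URL is concatenated into the back-end query string without encoding, a pasted link that carries its own query string (for example tracking parameters joined by `&`) is split by the server's query parser, and the back-end receives a truncated URL. A minimal sketch of a safer way to build the request URL, assuming the back-end listens at `http://localhost:3000/event`; `buildApiUrl` is a hypothetical helper name:

```javascript
// Builds the back-end request URL. URLSearchParams percent-encodes the event
// URL, so characters such as "?", "=" and "&" inside it do not break the query.
function buildApiUrl(eventUrl, apiBase = 'http://localhost:3000/event') {
  const apiUrl = new URL(apiBase);
  apiUrl.searchParams.set('url', eventUrl);
  return apiUrl.toString();
}
```

In the listener, `fetch(buildApiUrl(url))` would then replace the hand-built string; the back-end still reads the decoded value from `request.query.url`.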

Back-End Development

Once the front-end is finished, it's time to create the back-end of the application, which scrapes the event page, collects the data, and returns it.

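To set up the Express server, first run the following command in your back-end directory:

npm init -y

Then install Express together with the other packages that app.js requires (cors, cheerio, and node-fetch; version 2 of node-fetch is used because version 3 is an ES module and cannot be loaded with require):

npm install express cors cheerio node-fetch@2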

Then create an app.js file and put the following code in it. Note how the code reads the page's meta tags (og:title, og:image, event:start_time and event:end_time) to extract title, image, start_date, start_time, end_date and end_time; these values are then returned to the front-end, which uses them to generate the flyer.

const fetch = require('node-fetch'); // node-fetch v2, which supports require()
const express = require('express');
const cors = require('cors');
const cheerio = require('cheerio');

const app = express();
const port = 3000;

// Allow the front-end (served from a different origin) to call this API
app.use(cors());

// GET /event?url=<event URL> returns the event details as JSON
app.get('/event', async (request, response) => {
  const url = request.query.url;
  if (!url) {
    return response.status(400).json({ error: 'Missing url query parameter' });
  }
  try {
    const html = await fetch(url).then(r => r.text());
    const $ = cheerio.load(html);
    // Read the event details from the page's meta tags
    const title = $('meta[property="og:title"]').attr('content');
    const image = $('meta[property="og:image"]').attr('content');
    const start_datetime = $('meta[property="event:start_time"]').attr('content') || '';
    const end_datetime = $('meta[property="event:end_time"]').attr('content') || '';
    // "YYYY-MM-DDThh:mm:ss+00:00" -> date part and time part
    let [start_date, start_time = ''] = start_datetime.split('T');
    start_time = start_time.replace('+00:00', '');
    let [end_date, end_time = ''] = end_datetime.split('T');
    end_time = end_time.replace('+00:00', '');
    response.json({ title, image, start_date, start_time, end_date, end_time });
  } catch (err) {
    response.status(500).json({ error: err.message });
  }
});

app.listen(port, () => {
  const url = 'https://eventbrite.co.uk/e/131463864959';
  console.log(`Listening at http://lvh.me:${port} or http://localhost:${port}`);
  console.log('Sample usage:');
  console.log(`http://lvh.me:${port}/event?url=${url}`);
});
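Splitting on `T` and removing `+00:00` only cleans up timestamps that are in UTC. If an event's meta tag carries a different offset (for example `+01:00` during British Summer Time), that offset stays in `start_time`. A slightly more general helper is sketched below, assuming the meta tags hold ISO 8601 timestamps; `splitIsoDateTime` is a hypothetical name. It drops any trailing offset or `Z` rather than converting the time, so the result is the event's local time.

```javascript
// Splits an ISO 8601 timestamp such as "2021-01-28T19:00:00+00:00" into a
// date part and a time part, dropping any trailing UTC offset ("Z", "+01:00", ...).
function splitIsoDateTime(value) {
  if (typeof value !== 'string' || !value.includes('T')) {
    return { date: null, time: null }; // meta tag missing or unexpected format
  }
  const [date, rest] = value.split('T');
  // Remove the offset without converting: the time stays as written on the page
  const time = rest.replace(/(Z|[+-]\d{2}:?\d{2})$/, '');
  return { date, time };
}
```

In app.js, `const { date: start_date, time: start_time } = splitIsoDateTime(start_datetime);` could replace the manual split.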

Run the Application

When you have finished implementing this application (or have downloaded it from my GitHub using the link at the beginning of this post), follow these steps to run it and see the result:

  1. In your back-end directory, run the following command:
     node app.js
  2. In your front-end folder, open index.html with a live server (for example, the Live Server extension for Visual Studio Code).

Here is the result of these steps:

Please let me know how you would like to use web scraping in your projects. I am planning to create a new video tutorial on this subject: which website would you like to see scraped in it?

Banoo Gargari

Software Engineer and Web Developer