The Supervisor Programming Model

Prerequisites

npm install node-fetch@2 cheerio

Note: node-fetch v3 is published as an ES module and cannot be loaded with require(); install v2 as shown, or on Node 18+ skip node-fetch entirely and use the built-in global fetch.

Scraper Code

const fetch = require('node-fetch'); // node-fetch v2 (v3 is ESM-only)
const cheerio = require('cheerio');

const url = 'https://example.com'; // Replace with the target URL
const fileExtension = '.pdf'; // Replace with the desired file extension

async function scrapeLinks(url, extension) {
    try {
        // Fetch the HTML from the URL
        const response = await fetch(url);
        if (!response.ok) {
            throw new Error(`Failed to fetch the URL: ${response.statusText}`);
        }
        const html = await response.text();

        // Load the HTML into Cheerio
        const $ = cheerio.load(html);

        // Find all anchor tags and filter by the file extension
        const links = [];
        $('a').each((index, element) => {
            const href = $(element).attr('href');
            if (href && href.endsWith(extension)) {
                links.push(href);
            }
        });

        // Log the found links
        console.log(`Found ${links.length} links with the extension '${extension}':`);
        links.forEach(link => console.log(link));
    } catch (error) {
        console.error('Error:', error.message);
    }
}

// Run the scraper
scrapeLinks(url, fileExtension);

How It Works

  1. Fetch the HTML: The fetch function retrieves the HTML content from the specified URL.
  2. Load HTML with Cheerio: The HTML is then loaded into Cheerio, allowing you to use jQuery-like syntax to parse and manipulate it.
  3. Extract Links: The script looks for all <a> tags and filters the href attributes for links that end with the specified file extension.
  4. Output Links: Finally, it logs the found links to the console.
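One caveat to step 3: the href values collected there are often relative paths (e.g. /files/report.pdf), so logging them verbatim does not always give a usable URL. As a small sketch, Node's built-in WHATWG URL class can resolve each link against the page URL before output, with no extra dependency (the helper name resolveLink is an assumption, not part of the original script):

```javascript
// Resolve a possibly-relative href against the page URL using
// Node's built-in WHATWG URL class (no extra packages needed).
function resolveLink(href, baseUrl) {
    return new URL(href, baseUrl).href;
}

console.log(resolveLink('/files/report.pdf', 'https://example.com'));
// https://example.com/files/report.pdf
console.log(resolveLink('https://cdn.example.com/a.pdf', 'https://example.com'));
// https://cdn.example.com/a.pdf
```

Inside the scraper, you would push resolveLink(href, url) instead of the raw href.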

Running the Scraper

node your-script-name.js
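Because url and fileExtension are hardcoded, every new target means editing the script. A minimal sketch of a more flexible entry point, assuming a hypothetical parseArgs helper and the same defaults as the original constants, reads them from the command line instead:

```javascript
// Hypothetical sketch: read the target URL and file extension from
// the command line, falling back to defaults when omitted, e.g.
//   node your-script-name.js https://example.org .zip
function parseArgs(argv) {
    const [, , url, extension] = argv; // skip node binary and script path
    return {
        url: url || 'https://example.com',
        extension: extension || '.pdf',
    };
}

console.log(parseArgs(['node', 'script.js', 'https://example.org', '.zip']));
// { url: 'https://example.org', extension: '.zip' }
```

You would then call scrapeLinks with the parsed values: const { url, extension } = parseArgs(process.argv); scrapeLinks(url, extension);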

Note