PDFJS: Read all pages using async/await in NodeJS

PDFJS has an official example that, among other things, reads all pages from a PDF document. However, its promise-based approach is rather hard to understand and to write. Luckily, there is an easier way using async/await (which is supported starting from NodeJS 8.x).

I’m using the bereich library (bereich is German for range) to generate an array of page numbers (1..numPages). Install the required libraries using:

npm install pdfjs-dist bereich
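If you would rather not pull in a dependency just for the page numbers, the same 1..numPages array can be built with plain Array.from. This is only a minimal sketch; pageRange is a hypothetical helper name of my own and is not part of bereich or PDFJS.

```javascript
// Hypothetical helper: returns [1, 2, ..., numPages] without any dependency.
function pageRange(numPages) {
  // Array.from calls the mapping function with (value, index); index starts at 0,
  // so we add 1 because PDF page numbers are 1-based.
  return Array.from({ length: numPages }, (_, i) => i + 1);
}

console.log(pageRange(4)); // [ 1, 2, 3, 4 ]
```

The result can be mapped over exactly like an array generated by a range library.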

Here’s the source code example:

const pdfjs = require('pdfjs-dist');
const bereich = require('bereich');

async function readPDFPages() {
  // getDocument() returns a loading task; its .promise resolves to the document
  const pdf = await pdfjs.getDocument('mypdf.pdf').promise;
  const numPages = pdf.numPages;

  const pageNumbers = Array.from(bereich(1, numPages));
  // Start reading all pages 1...numPages
  const promises = pageNumbers.map(pageNo => pdf.getPage(pageNo));
  // Wait until all pages have been read
  const pages = await Promise.all(promises);
  // You can do something with pages here.
  return pages;
}

readPDFPages().then(pages => {
  console.log(pages);
}).catch(err => {
  console.error(`Error while reading PDF: ${err}`);
});
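A common next step is to pull the text out of each page. The following is a sketch under a few assumptions: readPDFText is a hypothetical function name, it expects an already loaded document object, and it relies on each page's getTextContent() resolving to an object whose items carry a str field. Joining the items with a single space is a simplification and will not reproduce the original layout.

```javascript
// Sketch: collect the plain text of every page of an already loaded document.
// `pdf` is assumed to behave like a PDFJS document (numPages, getPage()).
async function readPDFText(pdf) {
  const pageNumbers = Array.from({ length: pdf.numPages }, (_, i) => i + 1);
  // Load all pages in parallel, then request the text content of each page
  const pages = await Promise.all(pageNumbers.map(pageNo => pdf.getPage(pageNo)));
  const contents = await Promise.all(pages.map(page => page.getTextContent()));
  // Keep only items that actually carry a string and join them per page
  return contents.map(content =>
    content.items
      .filter(item => typeof item.str === 'string')
      .map(item => item.str)
      .join(' ')
  );
}
```

Inside an async function you would call it as readPDFText(await pdfjs.getDocument('mypdf.pdf').promise) and get back one string per page, in page order.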