PDFJS: Read all pages using async/await in NodeJS
PDFJS has an official example that, among other things, reads all pages from a PDF document.
However, its promise-based approach is rather hard to read and to write. Luckily, there is an easier way using async/await
(which NodeJS has supported natively since 7.6, so any 8.x or later release works).
I’m using the bereich library (Bereich is German for range) to generate an array of page numbers (1..numPages).
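If you would rather avoid the extra dependency, the same 1..numPages array can be built with plain Array.from. This is only a minimal sketch: the helper name pageNumbersUpTo is made up for illustration and is not part of PDFJS or bereich.

```javascript
// Build [1, 2, ..., numPages] with plain JavaScript, no extra dependency.
// pageNumbersUpTo is a hypothetical helper name chosen for this example.
function pageNumbersUpTo(numPages) {
    // Array.from calls the mapping function with (value, index).
    // The index starts at 0, so add 1 to get 1-based page numbers.
    return Array.from({ length: numPages }, (_, index) => index + 1);
}

console.log(pageNumbersUpTo(5)); // [ 1, 2, 3, 4, 5 ]
```

The result can be used in place of Array.from(bereich(1, numPages)) wherever a list of page numbers is needed.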
Install the required libraries using:
npm install pdfjs-dist bereich
Here’s the source code example:
const pdfjs = require('pdfjs-dist');
const bereich = require('bereich');

async function readPDFPages() {
    // getDocument() returns a loading task; its .promise resolves to the document
    const pdf = await pdfjs.getDocument('mypdf.pdf').promise;
    const numPages = pdf.numPages;
    const pageNumbers = Array.from(bereich(1, numPages));
    // Start reading all pages 1...numPages
    const promises = pageNumbers.map(pageNo => pdf.getPage(pageNo));
    // Wait until all pages have been read
    const pages = await Promise.all(promises);
    // You can do something with pages here.
    return pages;
}

readPDFPages().then(pages => {
    console.log(pages);
}).catch(err => {
    console.error(`Error while reading PDF: ${err}`);
});
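The core pattern (map every page number to a getPage() promise, then await them all at once) can be tried without a PDF file. The sketch below assumes nothing about PDFJS beyond what the example above uses, namely a numPages property and a getPage(pageNo) method returning a promise. The names readAllPages and fakePdf are hypothetical: fakePdf is only a stand-in object for demonstration, not a real PDFJS document.

```javascript
// Read every page of a PDF-like document concurrently.
// `pdf` only needs a `numPages` property and a `getPage(n)` method
// that returns a promise (as the loaded document in the example above does).
async function readAllPages(pdf) {
    const pageNumbers = Array.from({ length: pdf.numPages }, (_, i) => i + 1);
    // Start all reads first, then wait for every one of them to finish.
    return Promise.all(pageNumbers.map(pageNo => pdf.getPage(pageNo)));
}

// Hypothetical stand-in for a loaded document, for demonstration only.
const fakePdf = {
    numPages: 3,
    getPage: async pageNo => ({ pageNumber: pageNo }),
};

readAllPages(fakePdf).then(pages => {
    console.log(pages.map(page => page.pageNumber)); // [ 1, 2, 3 ]
});
```

Promise.all resolves to an array in the same order as its input, so pages[0] is always page 1, even if a later page finishes loading first.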