PDFJS: Read PDF from memory Buffer in NodeJS

Note: This post uses async/await and therefore requires NodeJS 8+.

This is how to read a PDF file from a file, e.g. mypdf.pdf:

pdfjs.getDocument('mypdf.pdf');

Full example:

const pdfjs = require('pdfjs-dist');

async function readPDF() {
    const pdf = await pdfjs.getDocument('mypdf.pdf');
    // ...
}

Here’s how you can read the PDF from a memory buffer:

pdfjs.getDocument({data: buffer});

Full example

const fs = require('mz/fs')
const pdfjs = require('pdfjs-dist');

async function readPDF() {
    // Read file into buffer
    const buffer = await fs.readFile('mypdf.pdf')
    // Parse PDF from buffer
    const pdf = await pdfjs.getDocument({data: buffer});
    // ...
}

Using mz/fs is not required, it’s just used as an utility library to be able to use await with files.

 

Convert pt (postscript/PDF unit) to inch or mm in Javascript

Here are some simple utility functions to convert the preprint unit pt (defined as 1/72 inch) into inches or mm.

function convertPtToInch(pt) { return pt / 72; }
function convertInchToMM(inch) { return inch * 25.4; }
function convertPtToMM(pt) {
  return convertInchToMM(convertPtToInch(pt)); }

// Example usage
console.log(convertPtToMM(595)) // Prints 209.90277777777777

Note that while this conversion is exact, there is some tolerance required when comparing these units:
An ISO A4 paper is defined as 210x297 mm – or 595x842 pt.

However, converting 595×842 pt into mm results in 209.902777 mm and 297.038888 mm respectively. Watch out for those tolerances if you try to compare paper sizes. I recommend a tolerance of at least 0.25 mm.

Extract PDF page sizes using PDFJS & NodeJS

Although most PDFs have some pages with only one page size (e.g. DIN A4 or Letter in portrait orientation), PDFs sometimes also have pages that have another size or orientation (which is treated just like another size) that other pages in the same document.

This post provides an easy-to-reuse example on how to use PDFJS in NodeJS (though it will be just as easy to do in the browser) to extract the PDF

It is based on this previous post on how to read all pages from a PDF document using PDFJS, so be sure to check that out first.

First install the required dependencies:

npm install bereich pdfjs-dist

then you can use this source code to read the page sizes of mypdf.pdf:

const pdfjs = require('pdfjs-dist');
const bereich = require('bereich');

class PageSize {
  constructor(width, height) {
    this.width = width;
    this.height = height
  }
}

function getPageSize (page) {
    const [x, y, w, h] = page.pageInfo.view;
    const width = w - x;
    const height = h - y;
    const rotate = page.pageInfo.rotate;
    // Consider rotation
    return (rotate === 90 || rotate === 270)
        ? new PageSize(height, width) : new PageSize(width, height);
}

async function readPDFPageSizes() {
  const pdf = await pdfjs.getDocument('mypdf.pdf');
  const numPages = pdf.numPages;

  const pageNumbers = Array.from(bereich(1, numPages));
  // Start reading all pages 1...numPages
  const promises = pageNumbers.map(pageNo => pdf.getPage(pageNo));
  // Wait until all pages have been read
  const pages = await Promise.all(promises);
  // You can do something with pages here.
  return pages.map(getPageSize);
}

readPDFPageSizes()
    .then(pageSizes => {console.log(pageSizes)})
    .catch(err => {console.error(`Error while reading PDF: ${err}`)})

Running this with a document having a single A4 page will result in

[ PageSize { width: 595, height: 842 } ]

Note that the width & height unit is pt (Points). One pt is defined as 1/72 inches. A DIN A4 page (portrait) is 595x842pt, therefore you see those values here.
See this TechOverflow post for code to convert pt to mm and inches.

PDFJS: Read all pages using async/await in NodeJS

PDFJS has an official example that – among other things, reads all pages from a PDF document.
However, their promise-based method is rather complex to understand and to write. Luckily, there is an easier way using async/await (which is supported starting from NodeJS 8.x).

I’m using the bereich library (bereich is german for range) in order to generate an array of page numbers (1..numPages).
Install the required libraries using

npm install pdfjs-dist bereich

Here’s the source code example:

const pdfjs = require('pdfjs-dist');
const bereich = require('bereich');

async function readPDFPages() {
  const pdf = await pdfjs.getDocument('mypdf.pdf');
  const numPages = pdf.numPages;

  const pageNumbers = Array.from(bereich(1, numPages));
  // Start reading all pages 1...numPages
  const promises = pageNumbers.map(pageNo => pdf.getPage(pageNo));
  // Wait until all pages have been read
  const pages = await Promise.all(promises);
  // You can do something with pages here.
  return pages;
}

readPDFPages().then(pages => {
    console.log(pages)
}).catch(err => {
    console.error(`Error while reading PDF: ${err}`)
})

 

How to read PDF creation & modification date in NodeJS

Problem:

You have a PDF file from which you want to know the creation and modification date: Not the dates stored in the file itself but those from the PDF metadata.

Solution:

This solution assumes you use NodeJS version 8+ which supports async/await.
You can use pdfjs to read these dates. First install it using

npm install pdfjs-dist

Then use this code to extract the dates.

const pdfjs = require('pdfjs-dist');

async function readPDFDates() {
  const pdf = await pdfjs.getDocument('mypdf.pdf');
  const metadata = await pdf.getMetadata();

  const modDate = new Date(metadata.metadata._metadata['xmp:modifydate']);
  const createDate = new Date(metadata.metadata._metadata['xmp:createdate']);
  return [modDate, createDate]
}

readPDFDates().then(([modDate, createDate]) => {
    console.log(`Creation date: ${createDate}`)
    console.log(`Modification date: ${modDate}`)
}).catch(err => {
    console.error(`Error while reading PDF: ${err}`)
})

 

The PDF files I’ve seen use ISO8601-style formatting, but without a timezone specification. The code therefore assumes that the times are in the local timezone.

Note: metadata is e.g. the following object (not all attributes are present for all PDFs):

{ info: 
   { PDFFormatVersion: '1.5',
     IsAcroFormPresent: false,
     IsXFAPresent: false,
     Title: 'Microsoft Word - mypdf',
     Author: 'uli',
     Creator: 'PScript5.dll Version 5.2.2',
     Producer: 'Acrobat Distiller 9.3.0 (Windows)',
     CreationDate: 'D:20100209100924+01\'00\'',
     ModDate: 'D:20100209100924+01\'00\'' },
  metadata: 
   Metadata {
     _metadata: 
      { 'dc:format': 'application/pdf',
        'dc:creator': 'peter',
        'dc:title': 'Microsoft Word - mypdf',
        'xmp:createdate': '2010-02-09T10:09:24+01:00',
        'xmp:creatortool': 'PScript5.dll Version 5.2.2',
        'xmp:modifydate': '2010-02-09T10:09:24+01:00',
        'pdf:producer': 'Acrobat Distiller 9.3.0 (Windows)',
        'xmpmm:documentid': 'uuid:2fd66f45-5f2a-4dd6-8cb0-297ce85ee9e1',
        'xmpmm:instanceid': 'uuid:f6e62218-4b40-47c7-837b-6cb1e6e90995' } },

 

How to add JS Drag-&-Drop file upload without any dependencies

Problem:

For your new web application, you want to add drag&drop file uploads without using any external library.

Solution:

You can use this set of functions, which you can adapt to your application.

/**
 * Initialize drag & drop event handling for a DOM element.
 * The DOM element does not have to be empty in order to do this.
 * @param elem The DOM element where files can be dragged & dropped
 * @param callback The callback(files) function that gets passed a list of files
 * when files are dragged and dropped.
 *
 * Basic usage example:
 *  var elem = document.getElementById('mydiv');
 *  initializeDragAndDrop(elem, function(files) {
 *      for (var i = 0; i < files.length; i++) {
 *          // Do something with files[i]...
 *          handleUploadedFile(files[i]); // Replace by your code
 *      }
 *  });
 */
function initializeDragAndDrop (elem, callback) {
    elem.addEventListener('drop', function (event) {
        _dragndrop_preventDefault(event);
        callback(event.dataTransfer.files);
    }, false);
    elem.addEventListener('dragover', _dragndrop_preventDefault, false);
    elem.addEventListener('dragdrop', _dragndrop_preventDefault, false);
    elem.addEventListener('dragenter', _dragndrop_preventDefault, false);
    elem.addEventListener('dragleave', _dragndrop_preventDefault, false);
}

/**
 * Internal utility function to prevent default
 * handling for a given event.
 */
function _dragndrop_preventDefault (event) {
    event.stopPropagation();
    event.preventDefault();
}

Usage example:

var elem = document.getElementById('mydiv');
initializeDragAndDrop(elem, function(files) {
    for (var i = 0; i < files.length; i++) {
        // Do something with files[i]...
        handleUploadedFile(files[i]); // Replace by your code
    }
});

Note that you need to run initializeDragAndDrop only after the respective DOM element (mydiv in this example) has been loaded. For example, you could call it like this if you use jQuery:

$(document).ready(function() {
    initializeDragAndDrop(/* ... */);
});

Pure Javascript (no jQuery):

// WARNING: This will replace any window.onload function
// that is currently set.
// Also, this will only fire after everything on the page
// has been loaded, which might not be the desired behaviour.
window.onload = function() {
    initializeDragAndDrop(/* ... */);
};

Also see this previous TechOverflow post on how to read the uploaded files into memory, if you need to.

Reading an uploaded file into memory using pure Javascript

You have a File object in Javascript (e.g. from a drag & drop upload or a <input type="file">) which you want to read into memory in the browser. You don’t want to use any library to do that but prefer a pure Javascript solution.

Solution:

Use this function:

/**
 * Utility function to read an entire file into memory.
 *
 * The handler function gets passed an array of objects:
 * {
 *     name: filename as string,
 *     size: size in bytes as number,
 *     type: MIME type as string,
 *     content: file content as Uint8Array
 * }
 * @param file The file to read
 * @param handler
 */
function readFileIntoMemory (file, callback) {
    var reader = new FileReader();
    reader.onload = function () {
        callback({
            name: file.name,
            size: file.size,
            type: file.type,
            content: new Uint8Array(this.result)
         });
    };
    reader.readAsArrayBuffer(file);
}

Usage example:

// Usage example
readFileIntoMemory(file, function(fileInfo) {
    console.info("Read file " + fileInfo.name + " of size " + fileInfo.size);
    // You can use fileInfo.content, which is a Uint8Array, here
});

Copying strings to the clipboard using pure Javascript

You want to copy a string to the system clipboard in the browser without using any dependency like clipboard.js .

Solution:

Use this function:

function copyStringToClipboard (str) {
   // Create new element
   var el = document.createElement('textarea');
   // Set value (string to be copied)
   el.value = str;
   // Set non-editable to avoid focus and move outside of view
   el.setAttribute('readonly', '');
   el.style = {position: 'absolute', left: '-9999px'};
   document.body.appendChild(el);
   // Select text inside element
   el.select();
   // Copy text to clipboard
   document.execCommand('copy');
   // Remove temporary element
   document.body.removeChild(el);
}

Note: If the user selected anything when you ran the function, this selection will be cleared. If you need to preserve the selection, see this Hackernoon article for a more elaborate solution..

You can use it like this:

// Usage example:
copyStringToClipboard("abc123");

It works by adding a temporary <textarea> element onto the DOM which is moved outside (credits to Angelos Charalis on Hackernoon for the original idea) the viewport in order to avoid wreaking havoc on screenreaders etc.

How to fix NodeJS request Error: Argument error, options.body

Problem:

You’re using request to send a POST request with a body that should be JSON-encoded, but you are encountering an error similar to this:

Error: Argument error, options.body.
    at setContentLength (/home/uli/myproj/node_modules/request/request.js:434:28)
    at Request.init (/home/uli/myproj/node_modules/request/request.js:439:5)
    at new Request (/home/uli/myproj/node_modules/request/request.js:128:8)
    at request (/home/uli/myproj/node_modules/request/index.js:53:10)
    at Function.post (/home/uli/myproj/node_modules/request/index.js:61:12)

and, additionally this stacktrace:

TypeError [ERR_INVALID_ARG_TYPE]: The first argument must be one of type string or Buffer
    at write_ (_http_outgoing.js:647:11)
    at ClientRequest.write (_http_outgoing.js:622:10)
    at Request.write (/home/uli/myproj/node_modules/request/request.js:1501:27)
    at end (/home/uli/myproj/node_modules/request/request.js:546:18)
    at Immediate.<anonymous> (/home/uli/myproj/node_modules/request/request.js:575:7)
    at runCallback (timers.js:763:18)
    at tryOnImmediate (timers.js:734:5)
    at processImmediate (timers.js:716:5)

Solution:

The error basically tells you that request can’t determine the length of the body, as you didn’t tell it how to encode the body and it isn’t a simple string or buffer. The solution is to tell request to use JSON body encoding by adding json: true to the options parameter (first argument to request.post). A valid options parameter looks like this:

const opts = {
    url: 'http://localhost:1234/api/myapi',
    body: {/* your body object */},
    json: true // <-- Add this line
};

How to fix KaTeX parse error: KaTeX doesn’t work in quirks mode

Problem:

You’re using KaTeX on a website, but every time you try to render, you encounter the following error message:

Uncaught Error: KaTeX parse error: KaTeX doesn't work in quirks mode.
    at new e (VM697 katex.min.js:1)
    at Object.l [as render] (VM697 katex.min.js:1)
[...]

Solution:

KaTeX requires a valid DOCTYPE declaration on your HTML page, so add this line at the top of your HTML file(s), above <html>

<!DOCTYPE html>

Thanks to @xymostech on GitHub for the original solution in the the KaTeX issue tracker:

How to fix Angular ‘TypeError: templateRef.createEmbeddedView is not a function’

Problem:

You encounter an error message like this:

ERROR TypeError: templateRef.createEmbeddedView is not a function
    at ViewContainerRef_.createEmbeddedView (core.js:11389)
    at NgIf._updateView (common.js:2843)
    at NgIf.set [as ngIfElse] (common.js:2815)
    at updateProp (core.js:12602)
    at checkAndUpdateDirectiveInline (core.js:12313)
    at checkAndUpdateNodeInline (core.js:13876)
    at checkAndUpdateNode (core.js:13819)
    at debugCheckAndUpdateNode (core.js:14712)
    at debugCheckDirectivesFn (core.js:14653)
    at Object.eval [as updateDirectives] (MyComponent.html:1)

in a component where you have a source code similar to this

<div *ngIf="myCondition ; else elseSection">
    <!-- ... -->
</div>
<div #elseSection>
    <!-- ... -->
</div>

Solution:

Whatever element you reference in the *ngIf else clause can’t be any arbitrary component, but it must be a ng-template.

In order to solve this, change <div #elseSection> to <ng-template #elseSection>. Note that using just <template> is deprecated since Angular4.

The resulting source code should look like this:

<div *ngIf="myCondition ; else elseSection">
    <!-- ... -->
</div>
<ng-template #elseSection>
    <!-- ... -->
</ng-template>

How to fix Angular ‘Cannot find control with unspecified name attribute’

Problem:

In your Angular2/4/5 application you see this error message:

Cannot find control with unspecified name attribute

Solution:

Look for a statement in the HTML angular template like this:

[formControl]="myCtrl"

The error message means that myCtrl can’t be found. Check if this variable is present in your class – it needs be a FormControl which you can import from @angular/forms:

import { FormControl } from '@angular/forms';

In my case, changing it to

[formControl]="myControl"

fixed the issue

How to fix Angular4/5 ‘No provider for ControlContainer’

Problem:

In your Angular2/4/5 application you’re getting the following error message:

No provider for ControlContainer ("<div class="recall-container mat-elevation-z8">

Solution:

You have not added the @angular/forms FormsModule to your module’s import list.

Go to your app.module.ts and add this line to the imports:

import { FormsModule } from '@angular/forms';

and look for a line like this in your module definition:

imports: [ /* several import modules may be listed here */ ],

and add FormsModule like this (if there are already imports, add FormsModule to the list):

imports: [ FormsModule ],

How to use query string parameters in NodeJS request

Problem:

You’re using the request library in order to make a HTTP GET request:

const request = require("request")

request.get("http://localhost:8000", function(err, response, body) {
    console.log(err, body);
})

Now you’re trying to add query parameters to the request. For this example, we’ll assume that you want to add one parameter: foo=bar

Solution:

You can use the qs parameter like this:

const request = require("request")

request.get({url: "http://localhost:8000", qs: {"foo": "bar"}}, function(err, response, body) {
    console.log(err, body);
})

Note that just adding a qs parameter to request.get won’t work, you need to have a dictionary as first argument that contains at least {"url": <your URL>, "qs": {<one or multiple query parameters>}}

Credits to Daniel at StackOverflow

Disabling SSL certificate checking in unirest (NodeJS)

Problem:

You want to make a HTTP request with unirest like this:

const unirest = require('unirest');
unirest.get("https://mydomain.net").end(console.log)

but you encounter the following error:

{ error: 
   { Error: unable to verify the first certificate
       at TLSSocket.<anonymous> (_tls_wrap.js:1088:38)
       at emitNone (events.js:86:13)
       at TLSSocket.emit (events.js:188:7)
       at TLSSocket._finishInit (_tls_wrap.js:610:8)
       at TLSWrap.ssl.onhandshakedone (_tls_wrap.js:440:38) code: 'UNABLE_TO_VERIFY_LEAF_SIGNATURE' } }

 Solution:

You can work around this problem by using strictSSL(false) like this:

const unirest = require('unirest');

unirest.get("https://mydomain.net")
.strictSSL(false)
.end(console.log)

Note however that this might have negative effects on the security of your application as this request will be vulnerable to man-in-the-middle attacks.