How to read PDF creation & modification date in NodeJS
Problem:
You have a PDF file and want to know its creation and modification dates: not the timestamps the filesystem stores for the file, but the dates from the PDF metadata.
Solution:
This solution assumes you use NodeJS version 8+, which supports async/await.
You can use pdfjs to read these dates. First, install it using:
npm install pdfjs-dist
Then use this code to extract the dates:
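const pdfjs = require('pdfjs-dist');

// Reads the XMP creation and modification dates from mypdf.pdf
async function readPDFDates() {
    // getDocument() returns a loading task; its .promise resolves to the document
    const pdf = await pdfjs.getDocument('mypdf.pdf').promise;
    const metadata = await pdf.getMetadata();
    // metadata.metadata is null for PDFs without XMP metadata;
    // the resulting TypeError ends up in the catch() handler below
    const modDate = new Date(metadata.metadata._metadata['xmp:modifydate']);
    const createDate = new Date(metadata.metadata._metadata['xmp:createdate']);
    return [modDate, createDate];
}

readPDFDates().then(([modDate, createDate]) => {
    console.log(`Creation date: ${createDate}`);
    console.log(`Modification date: ${modDate}`);
}).catch(err => {
    console.error(`Error while reading PDF: ${err}`);
});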
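The PDF files I’ve seen use ISO 8601-style formatting, but not always with a timezone specification. For a date-time string without an offset, new Date() assumes that the time is in the local timezone. If the string contains an offset such as +01:00, that offset is used instead.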
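If you would rather not depend on the timezone of the machine running the script, the timezone handling can be made explicit. The following is only a minimal sketch: parseXmpDate and its fallbackOffset parameter are hypothetical names and not part of pdfjs-dist. It assumes the XMP value is an ISO 8601-style string like the ones shown in this post.

```javascript
// Hypothetical helper, not part of pdfjs-dist.
// Parses an XMP date string. If the string has a time part but no UTC offset
// and a fallbackOffset (e.g. 'Z' or '+01:00') is given, that offset is appended.
// Without a fallbackOffset, the string is handed to new Date() unchanged, so
// offset-less date-times are interpreted in the local timezone.
function parseXmpDate(value, fallbackOffset) {
    const text = String(value).trim();
    const hasTime = text.includes('T');
    const hasOffset = /(Z|[+-]\d{2}:\d{2})$/.test(text);
    const needsFallback = hasTime && !hasOffset && Boolean(fallbackOffset);
    const date = new Date(needsFallback ? text + fallbackOffset : text);
    // Return null instead of an "Invalid Date" object for missing or broken values
    return Number.isNaN(date.getTime()) ? null : date;
}

// The offset in the string is honored
console.log(parseXmpDate('2010-02-09T10:09:24+01:00').toISOString());
// No offset in the string: treat the value as UTC instead of local time
console.log(parseXmpDate('2010-02-09T10:09:24', 'Z').toISOString());
// Attribute not present in the PDF: null
console.log(parseXmpDate(undefined));
```

Returning null for a missing attribute makes it easier to detect PDFs that lack one of the dates than checking for an Invalid Date object afterwards.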
Note: metadata is, for example, the following object (not all attributes are present for all PDFs):
{ info:
   { PDFFormatVersion: '1.5',
     IsAcroFormPresent: false,
     IsXFAPresent: false,
     Title: 'Microsoft Word - mypdf',
     Author: 'uli',
     Creator: 'PScript5.dll Version 5.2.2',
     Producer: 'Acrobat Distiller 9.3.0 (Windows)',
     CreationDate: 'D:20100209100924+01\'00\'',
     ModDate: 'D:20100209100924+01\'00\'' },
  metadata:
   Metadata {
     _metadata:
      { 'dc:format': 'application/pdf',
        'dc:creator': 'peter',
        'dc:title': 'Microsoft Word - mypdf',
        'xmp:createdate': '2010-02-09T10:09:24+01:00',
        'xmp:creatortool': 'PScript5.dll Version 5.2.2',
        'xmp:modifydate': '2010-02-09T10:09:24+01:00',
        'pdf:producer': 'Acrobat Distiller 9.3.0 (Windows)',
        'xmpmm:documentid': 'uuid:2fd66f45-5f2a-4dd6-8cb0-297ce85ee9e1',
        'xmpmm:instanceid': 'uuid:f6e62218-4b40-47c7-837b-6cb1e6e90995' } },
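Since not every PDF contains the xmp: attributes, the CreationDate and ModDate strings from the info object can serve as a fallback. They use the PDF date format (D:YYYYMMDDHHmmSS followed by an optional offset like +01'00') and cannot be passed to new Date() directly. The following is a rough sketch of a parser for that format: parsePdfInfoDate is a hypothetical helper, not a pdfjs-dist function, and it assumes that a string without any offset is in UTC.

```javascript
// Hypothetical helper, not part of pdfjs-dist.
// Parses a PDF info date such as "D:20100209100924+01'00'" into a Date.
// Everything after the year is optional. Assumption: if the string has
// no offset, the time is treated as UTC.
function parsePdfInfoDate(value) {
    const pattern = /^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?([Zz+-])?(\d{2})?'?(\d{2})?'?$/;
    const match = pattern.exec(String(value).trim());
    if (!match) {
        return null; // Missing attribute or not a PDF date string
    }
    // Unmatched optional groups are undefined, so the defaults apply
    const [, year, month = '01', day = '01', hour = '00', minute = '00',
        second = '00', sign = 'Z', offsetHours = '00', offsetMinutes = '00'] = match;
    // Interpret the fields as UTC first, then correct by the offset
    let timestamp = Date.UTC(Number(year), Number(month) - 1, Number(day),
        Number(hour), Number(minute), Number(second));
    const offsetMs = (Number(offsetHours) * 60 + Number(offsetMinutes)) * 60 * 1000;
    if (sign === '+') {
        timestamp -= offsetMs; // Local time is ahead of UTC
    } else if (sign === '-') {
        timestamp += offsetMs; // Local time is behind UTC
    }
    return new Date(timestamp);
}

// Example using the CreationDate value from the object above
console.log(parsePdfInfoDate("D:20100209100924+01'00'").toISOString());
```

With metadata from pdf.getMetadata(), you could call parsePdfInfoDate(metadata.info.CreationDate) whenever metadata.metadata is missing or lacks the xmp: dates.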