Problem:
You have a PDF file and want to know its creation and modification dates. You don't want the file system timestamps of the file; you want the dates stored in the PDF metadata.
Solution:
This solution assumes you use NodeJS version 8+, which supports async/await.
You can use pdf.js to read these dates. First, install it using:
```shell
npm install pdfjs-dist
```
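Then use this code to extract the dates from the XMP metadata:
```javascript
const pdfjs = require('pdfjs-dist');

async function readPDFDates(filename) {
    // getDocument() returns a loading task; its .promise resolves to the PDF document
    const pdf = await pdfjs.getDocument(filename).promise;
    const metadata = await pdf.getMetadata();
    // metadata.metadata holds the parsed XMP metadata, or null if the PDF has none
    const xmp = metadata.metadata;
    if (!xmp) {
        throw new Error('PDF has no XMP metadata');
    }
    // get() returns null for missing keys; keep null instead of creating a 1970 date
    const toDate = value => (value ? new Date(value) : null);
    const modDate = toDate(xmp.get('xmp:modifydate'));
    const createDate = toDate(xmp.get('xmp:createdate'));
    return [modDate, createDate];
}

readPDFDates('mypdf.pdf').then(([modDate, createDate]) => {
    console.log(`Creation date: ${createDate}`);
    console.log(`Modification date: ${modDate}`);
}).catch(err => {
    console.error(`Error while reading PDF: ${err}`);
});
```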
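The XMP dates use ISO8601-style formatting. In some PDF files I've seen, the timezone specification is missing; for those, new Date() interprets the time in the local timezone. If an offset such as +01:00 is present (as in the example below), new Date() takes it into account.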
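A small sketch of the difference, relying only on the built-in Date parsing of current Node.js versions. The two strings are modeled on the xmp:createdate value from the example below, and the variable names are purely illustrative:

```javascript
// An ISO 8601 date-time with an explicit offset, as in the XMP example
const withOffset = new Date('2010-02-09T10:09:24+01:00');
// The same date-time without an offset
const withoutOffset = new Date('2010-02-09T10:09:24');

// The offset is applied: 10:09:24 at +01:00 is 09:09:24 UTC
console.log(withOffset.toISOString()); // 2010-02-09T09:09:24.000Z

// Without an offset the value is taken as local time, so the local hour is 10,
// while the UTC value printed by toISOString() depends on the machine's timezone
console.log(withoutOffset.getHours()); // 10
console.log(withoutOffset.toISOString());
```

If you know that offset-less dates were written in a particular timezone, you would have to append that offset to the string yourself before parsing it.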
Note: for reference, this is an example of the metadata object returned by getMetadata(). Not all attributes are present in every PDF:
```
{
  info: {
    PDFFormatVersion: '1.5',
    IsAcroFormPresent: false,
    IsXFAPresent: false,
    Title: 'Microsoft Word - mypdf',
    Author: 'uli',
    Creator: 'PScript5.dll Version 5.2.2',
    Producer: 'Acrobat Distiller 9.3.0 (Windows)',
    CreationDate: 'D:20100209100924+01\'00\'',
    ModDate: 'D:20100209100924+01\'00\''
  },
  metadata: Metadata {
    _metadata: {
      'dc:format': 'application/pdf',
      'dc:creator': 'peter',
      'dc:title': 'Microsoft Word - mypdf',
      'xmp:createdate': '2010-02-09T10:09:24+01:00',
      'xmp:creatortool': 'PScript5.dll Version 5.2.2',
      'xmp:modifydate': '2010-02-09T10:09:24+01:00',
      'pdf:producer': 'Acrobat Distiller 9.3.0 (Windows)',
      'xmpmm:documentid': 'uuid:2fd66f45-5f2a-4dd6-8cb0-297ce85ee9e1',
      'xmpmm:instanceid': 'uuid:f6e62218-4b40-47c7-837b-6cb1e6e90995'
    }
  }
}
```
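The CreationDate and ModDate entries of the info dictionary use the PDF date format D:YYYYMMDDHHmmSSOHH'mm' rather than ISO8601, so they should not be passed to new Date() directly. If a PDF has no XMP metadata (metadata.metadata is null), those entries could serve as a fallback. Below is a minimal sketch of a hypothetical helper, parsePDFDate, which is not part of pdf.js. It assumes local time when no timezone is given, matching the behavior described above for ISO strings:

```javascript
// Hypothetical helper, not part of pdf.js: parses a PDF date string such as
// "D:20100209100924+01'00'" (format D:YYYYMMDDHHmmSSOHH'mm') into a Date.
// Returns null if the string does not match that format.
function parsePDFDate(str) {
    const match = /^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?(?:([Zz+-])(\d{2})?'?(\d{2})?'?)?$/.exec(str || '');
    if (!match) {
        return null;
    }
    // Fields after the year are optional; missing ones fall back to these defaults
    const [, year, month = '01', day = '01', hour = '00', minute = '00', second = '00',
        sign, offHours = '00', offMinutes = '00'] = match;
    const fields = [year, month, day, hour, minute, second].map(Number);
    fields[1] -= 1; // JavaScript months are 0-based

    if (sign === undefined) {
        // No timezone given: assume local time
        return new Date(...fields);
    }
    // 'Z' means UTC; otherwise the offset says how far local time is ahead of UTC
    const offsetMinutes = sign.toUpperCase() === 'Z'
        ? 0
        : (sign === '-' ? -1 : 1) * (Number(offHours) * 60 + Number(offMinutes));
    return new Date(Date.UTC(...fields) - offsetMinutes * 60 * 1000);
}

console.log(parsePDFDate("D:20100209100924+01'00'").toISOString()); // 2010-02-09T09:09:24.000Z
```

For example, parsePDFDate(metadata.info.CreationDate) could stand in for the XMP creation date in readPDFDates when metadata.metadata is null.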