Add getAttachments to PDFDocument #80

Open. Wants to merge 1 commit into base: master.
19 changes: 19 additions & 0 deletions README.md
@@ -71,6 +71,7 @@ Install with: `npm install @cantoo/pdf-lib`
- [Embed PDF Pages](#embed-pdf-pages)
- [Embed Font and Measure Text](#embed-font-and-measure-text)
- [Add Attachments](#add-attachments)
- [Extract Attachments](#extract-attachments)
- [Set Document Metadata](#set-document-metadata)
- [Read Document Metadata](#read-document-metadata)
- [Set Viewer Preferences](#set-viewer-preferences)
@@ -117,6 +118,7 @@ Install with: `npm install @cantoo/pdf-lib`
- Set viewer preferences
- Read viewer preferences
- Add attachments
- Extract attachments

## Motivation

@@ -765,6 +767,23 @@ const pdfBytes = await pdfDoc.save()
// • Rendered in an <iframe>
```

### Extract Attachments

If you load a PDF that has `cars.csv` as an attachment, you can use the
following to extract the attachments:

<!-- prettier-ignore -->
```js
const pdfDoc = await PDFDocument.load(...)
const attachments = pdfDoc.getAttachments()
const csv = attachments.find(({ name }) => name === 'cars.csv')
fs.writeFileSync(csv.name, csv.data)
```
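
`Array.prototype.find` returns `undefined` when nothing matches, so the snippet above would throw on `csv.name` for a PDF that has no `cars.csv` attachment. A minimal guard might look like the sketch below. `findAttachment` is a hypothetical helper, not part of this PR, and the `Attachment` shape assumes only the `name` and `data` fields that `getAttachments` returns in this diff.

```typescript
// Assumed shape of one entry returned by getAttachments() in this PR.
interface Attachment {
  name: string;
  data: Uint8Array;
}

// Hypothetical helper: look up one attachment by name and fail with a clear
// message instead of letting a later `match.name` access throw a TypeError.
function findAttachment(attachments: Attachment[], name: string): Attachment {
  const match = attachments.find((attachment) => attachment.name === name);
  if (match === undefined) {
    throw new Error(`No attachment named "${name}" in this document`);
  }
  return match;
}
```

With the README example, `findAttachment(pdfDoc.getAttachments(), 'cars.csv')` would take the place of the bare `find` call.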

> NOTE: If you are building a pdf file with this library, any attachments you've
> added won't be returned by this function until after you call `save` on the
> document.

Collaborator (on the NOTE above):

I see no reason why not? This might be confusing for users...

Collaborator:

Where are the attachments being stored before they are serialized? We can probably include that location when returning the attachments, no?

Author:

I agree that would be better. It's been a while since I've looked at this code, but I vaguely remember that being difficult. I see a comment on `PDFEmbeddedFile.embed` that says something to the effect of it not being embedded until save is called. I also don't see how to get this information out of `PDFEmbeddedFile`.

I won't be able to spend a lot of time on this soon, but I'm planning to come back at some point and give it another shot to see if what you're saying is possible.
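
The suggestion in this thread, returning attachments that have been added but not yet saved, can be pictured with the sketch below. It is a toy model under stated assumptions, not pdf-lib code: `AttachmentStore` and its fields are hypothetical stand-ins. The diff only shows that `attach` pushes onto `this.embeddedFiles`, and whether a file's name and bytes can be read back from `PDFEmbeddedFile` is still unresolved above.

```typescript
interface Attachment {
  name: string;
  data: Uint8Array;
}

// Hypothetical stand-in for a document that tracks attachments in two places.
class AttachmentStore {
  // Queued by attach(), not yet written into the document structure.
  private pending: Attachment[] = [];
  // What a save() has already serialized.
  private serialized: Attachment[] = [];

  attach(data: Uint8Array, name: string): void {
    this.pending.push({ name, data });
  }

  // Models serialization: pending entries become serialized entries.
  save(): void {
    this.serialized.push(...this.pending);
    this.pending = [];
  }

  // Returns both lists, so the result is the same before and after save().
  getAttachments(): Attachment[] {
    return [...this.serialized, ...this.pending];
  }
}
```

The design point is that callers would not need to know whether `save` has run. The cost is that the pending entries must keep their name and raw bytes available, which is the part the author describes as difficult.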

### Set Document Metadata

_This example produces [this PDF](assets/pdfs/examples/set_document_metadata.pdf)_.
45 changes: 45 additions & 0 deletions src/api/PDFDocument.ts
@@ -27,6 +27,10 @@ import {
  PDFCatalog,
  PDFContext,
  PDFDict,
  PDFArray,
  decodePDFRawStream,
  PDFStream,
  PDFRawStream,
  PDFHexString,
  PDFName,
  PDFObjectCopier,
@@ -940,6 +944,47 @@ export default class PDFDocument {
    this.embeddedFiles.push(embeddedFile);
  }

  private getRawAttachments() {
    if (!this.catalog.has(PDFName.of('Names'))) return [];
    const Names = this.catalog.lookup(PDFName.of('Names'), PDFDict);

    if (!Names.has(PDFName.of('EmbeddedFiles'))) return [];
    const EmbeddedFiles = Names.lookup(PDFName.of('EmbeddedFiles'), PDFDict);

    if (!EmbeddedFiles.has(PDFName.of('Names'))) return [];
    const EFNames = EmbeddedFiles.lookup(PDFName.of('Names'), PDFArray);

    const rawAttachments = [];
    for (let idx = 0, len = EFNames.size(); idx < len; idx += 2) {
      const fileName = EFNames.lookup(idx) as PDFHexString | PDFString;
      const fileSpec = EFNames.lookup(idx + 1, PDFDict);
      rawAttachments.push({ fileName, fileSpec });
    }

    return rawAttachments;
  }

  /**
   * Get all attachments that are embedded in this document.
   *
   * > **NOTE:** If you build a document with this library, this won't return
   * > anything until you call [[save]] on the document.
   *
   * @returns Array of attachments with name and data
   */
  getAttachments() {
    const rawAttachments = this.getRawAttachments();
    return rawAttachments.map(({ fileName, fileSpec }) => {
      const stream = fileSpec
        .lookup(PDFName.of('EF'), PDFDict)
        .lookup(PDFName.of('F'), PDFStream) as PDFRawStream;
      return {
        name: fileName.decodeText(),
        data: decodePDFRawStream(stream).decode(),
      };
    });
  }

  /**
   * Embed a font into this document. The input data can be provided in multiple
   * formats:

Collaborator (on `.lookup(PDFName.of('EF'), PDFDict)`):

Shouldn't we make sure "EF" and "F" key exist?

Author:

That's a good question. These `lookup` and `of` functions don't ever return undefined in their signatures, and I'm not sure what the default value is if they don't find anything.
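
The review question about missing `EF` and `F` keys comes down to whether each step of the chained lookup can fail quietly or must be checked. The sketch below shows one defensive shape for that chain, using a toy `Dict` class as a hypothetical stand-in for pdf-lib's `PDFDict`. As an assumption to verify against the pdf-lib source: the typed `lookup` overloads appear to throw when the entry is missing or has the wrong type, and `PDFDict` appears to offer a `lookupMaybe` variant that returns `undefined` for a missing entry. Neither is confirmed in this thread.

```typescript
// Toy stand-ins (hypothetical), not pdf-lib's real classes.
type Value = Dict | Uint8Array | string;

class Dict {
  constructor(private readonly entries: Map<string, Value> = new Map()) {}

  // Returns the entry only when it exists and is a Dict; otherwise undefined.
  lookupDict(key: string): Dict | undefined {
    const value = this.entries.get(key);
    return value instanceof Dict ? value : undefined;
  }

  // Returns the entry only when it exists and is raw bytes; otherwise undefined.
  lookupBytes(key: string): Uint8Array | undefined {
    const value = this.entries.get(key);
    return value instanceof Uint8Array ? value : undefined;
  }
}

// Resolve fileSpec -> EF -> F, yielding undefined when either key is missing
// or has an unexpected type, so the caller can skip that attachment.
function embeddedFileBytes(fileSpec: Dict): Uint8Array | undefined {
  return fileSpec.lookupDict('EF')?.lookupBytes('F');
}
```

A caller could then filter out file specifications for which `embeddedFileBytes` returns `undefined`, so one malformed entry does not abort the whole `getAttachments` call.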
142 changes: 142 additions & 0 deletions tests/api/PDFDocument.spec.ts
@@ -37,6 +37,9 @@ const normalPdfBytes = fs.readFileSync('assets/pdfs/normal.pdf');
const withViewerPrefsPdfBytes = fs.readFileSync(
  'assets/pdfs/with_viewer_prefs.pdf',
);
const hasAttachmentPdfBytes = fs.readFileSync(
  'assets/pdfs/examples/add_attachments.pdf',
);

describe(`PDFDocument`, () => {
  describe(`load() method`, () => {
@@ -573,4 +576,143 @@ describe(`PDFDocument`, () => {
      expect(pdfDoc.defaultWordBreaks).toEqual(srcDoc.defaultWordBreaks);
    });
  });

  describe(`attach() method`, () => {
    it(`Saves to the same value after attaching a file`, async () => {
      const pdfDoc1 = await PDFDocument.create({ updateMetadata: false });
      const pdfDoc2 = await PDFDocument.create({ updateMetadata: false });

      const jpgAttachmentBytes = fs.readFileSync(
        'assets/images/cat_riding_unicorn.jpg',
      );
      const pdfAttachmentBytes = fs.readFileSync(
        'assets/pdfs/us_constitution.pdf',
      );

      await pdfDoc1.attach(jpgAttachmentBytes, 'cat_riding_unicorn.jpg', {
        mimeType: 'image/jpeg',
        description: 'Cool cat riding a unicorn! 🦄🐈🕶️',
        creationDate: new Date('2019/12/01'),
        modificationDate: new Date('2020/04/19'),
      });

      await pdfDoc1.attach(pdfAttachmentBytes, 'us_constitution.pdf', {
        mimeType: 'application/pdf',
        description: 'Constitution of the United States 🇺🇸🦅',
        creationDate: new Date('1787/09/17'),
        modificationDate: new Date('1992/05/07'),
      });

      await pdfDoc2.attach(jpgAttachmentBytes, 'cat_riding_unicorn.jpg', {
        mimeType: 'image/jpeg',
        description: 'Cool cat riding a unicorn! 🦄🐈🕶️',
        creationDate: new Date('2019/12/01'),
        modificationDate: new Date('2020/04/19'),
      });

      await pdfDoc2.attach(pdfAttachmentBytes, 'us_constitution.pdf', {
        mimeType: 'application/pdf',
        description: 'Constitution of the United States 🇺🇸🦅',
        creationDate: new Date('1787/09/17'),
        modificationDate: new Date('1992/05/07'),
      });

      const savedDoc1 = await pdfDoc1.save();
      const savedDoc2 = await pdfDoc2.save();

      expect(savedDoc1).toEqual(savedDoc2);
    });
  });

  describe(`getAttachments() method`, () => {
    it(`Can read attachments from an existing pdf file`, async () => {
      const pdfDoc = await PDFDocument.load(hasAttachmentPdfBytes);
      const attachments = pdfDoc.getAttachments();
      expect(attachments.length).toEqual(2);
      const jpgAttachmentExtractedBytes = attachments.find(
        (attachment) => attachment.name === 'cat_riding_unicorn.jpg',
      )!;
      const pdfAttachmentExtractedBytes = attachments.find(
        (attachment) => attachment.name === 'us_constitution.pdf',
      )!;
      expect(pdfAttachmentExtractedBytes).toBeDefined();
      expect(jpgAttachmentExtractedBytes).toBeDefined();
      const jpgAttachmentBytes = fs.readFileSync(
        'assets/images/cat_riding_unicorn.jpg',
      );
      const pdfAttachmentBytes = fs.readFileSync(
        'assets/pdfs/us_constitution.pdf',
      );
      expect(jpgAttachmentBytes).toEqual(
        Buffer.from(jpgAttachmentExtractedBytes.data),
      );
      expect(pdfAttachmentBytes).toEqual(
        Buffer.from(pdfAttachmentExtractedBytes.data),
      );
    });

    it(`Saves to the same value after round tripping`, async () => {
      const pdfDoc1 = await PDFDocument.create({ updateMetadata: false });
      const pdfDoc2 = await PDFDocument.create({ updateMetadata: false });

      const jpgAttachmentBytes = fs.readFileSync(
        'assets/images/cat_riding_unicorn.jpg',
      );
      const pdfAttachmentBytes = fs.readFileSync(
        'assets/pdfs/us_constitution.pdf',
      );

      await pdfDoc1.attach(jpgAttachmentBytes, 'cat_riding_unicorn.jpg', {
        mimeType: 'image/jpeg',
        description: 'Cool cat riding a unicorn! 🦄🐈🕶️',
        creationDate: new Date('2019/12/01'),
        modificationDate: new Date('2020/04/19'),
      });

      await pdfDoc1.attach(pdfAttachmentBytes, 'us_constitution.pdf', {
        mimeType: 'application/pdf',
        description: 'Constitution of the United States 🇺🇸🦅',
        creationDate: new Date('1787/09/17'),
        modificationDate: new Date('1992/05/07'),
      });

      // This is the currently documented behavior before save has been called
      const noAttachments = pdfDoc1.getAttachments();
      expect(noAttachments).toEqual([]);

      const savedDoc1 = await pdfDoc1.save();
      const attachments = pdfDoc1.getAttachments();
      const jpgAttachmentExtractedBytes = attachments.find(
        (attachment) => attachment.name === 'cat_riding_unicorn.jpg',
      )!;
      const pdfAttachmentExtractedBytes = attachments.find(
        (attachment) => attachment.name === 'us_constitution.pdf',
      )!;

      await pdfDoc2.attach(
        jpgAttachmentExtractedBytes.data,
        'cat_riding_unicorn.jpg',
        {
          mimeType: 'image/jpeg',
          description: 'Cool cat riding a unicorn! 🦄🐈🕶️',
          creationDate: new Date('2019/12/01'),
          modificationDate: new Date('2020/04/19'),
        },
      );

      await pdfDoc2.attach(
        pdfAttachmentExtractedBytes.data,
        'us_constitution.pdf',
        {
          mimeType: 'application/pdf',
          description: 'Constitution of the United States 🇺🇸🦅',
          creationDate: new Date('1787/09/17'),
          modificationDate: new Date('1992/05/07'),
        },
      );

      const savedDoc2 = await pdfDoc2.save();
      expect(savedDoc1).toEqual(savedDoc2);
    });
  });
});