Process and strip markdown/HTML in OPF meta tags #583

Open: wants to merge 1 commit into base: master

arthurattwell
Member

As described by @jaycolmvar in #560, this strips markdown/HTML from meta tags in the OPF file. That markup may be present in meta.yml, since we allow it there so that a title or project name can include, say, italics.

I haven't tested this very thoroughly – only on one simple epub so far.

@LouiseSteward @bertuss @jaycolmvar can you think of any cases or reasons this might be a bad idea? Essentially, we need plain text that is valid XML in the package.opf file we build with the epub-package include.
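
To make "plain text that is valid XML" concrete, here is a minimal Python sketch of a check, not part of this PR. The helper name `is_plain_xml_text` is hypothetical, and wrapping the value in a `dc:title` element is only an assumption about how the value ends up in package.opf.

```python
import xml.etree.ElementTree as ET

# Dublin Core elements namespace, as used for dc:* elements in OPF metadata.
DC_NS = "http://purl.org/dc/elements/1.1/"


def is_plain_xml_text(value: str) -> bool:
    """Return True if value can sit inside a dc:title as plain, well-formed text."""
    snippet = f'<dc:title xmlns:dc="{DC_NS}">{value}</dc:title>'
    try:
        element = ET.fromstring(snippet)
    except ET.ParseError:
        # Raw ampersands, unclosed tags and undeclared entities all land here.
        return False
    # Well-formed, but child elements mean the value still contains markup.
    return len(element) == 0


for candidate in ["The Odyssey", "The <em>Odyssey</em>", "Fish & Chips", "Fish &amp; Chips"]:
    print(repr(candidate), is_plain_xml_text(candidate))
```

A check like this could run over each value in meta.yml before the epub is built, to flag the ones that need stripping or escaping.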

@jaycolmvar

I wasn't checking email for a while, so just saw this. Two comments:

First, looking at the Dublin Core metadata definitions (http://dublincore.org/specifications/dublin-core/dcmi-terms/), it seems that some properties might allow more than plain text. For example, the "description" property is defined as follows: "Description may include but is not limited to: an abstract, a table of contents, a graphical representation, or a free-text account of the resource." Maybe I'm missing something, but this seems to allow for the inclusion of, say, an IMG tag to reference a "graphical representation".

Second, what about things like HTML character entities? Although I can't find a reference, I presume that DC properties like "creator", "publisher", etc., can contain non-ASCII characters. What would happen if you put a line like this in _data/meta.yml:

title: All About &copy; and &reg;

Would the entities get replaced with the corresponding UTF-8 strings?
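
For reference, XML itself predefines only five entities (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`), so an HTML named entity such as `&copy;` left as-is in package.opf would not be well-formed unless it were declared. The Python sketch below shows one possible order of operations, under the assumption that any markdown has already been rendered to HTML. The function name `to_plain_xml_text` is hypothetical, the regex tag-stripper is a simplification, and none of this is what the PR's Liquid code actually does.

```python
import html
import re
from xml.sax.saxutils import escape

# Naive tag matcher; enough for short inline markup like <em> or <strong>.
TAG_RE = re.compile(r"<[^>]+>")


def to_plain_xml_text(value: str) -> str:
    """Strip tags, decode HTML entities to Unicode, then escape for XML."""
    without_tags = TAG_RE.sub("", value)
    # Decode after stripping, so an escaped "&lt;b&gt;" is kept as literal text.
    decoded = html.unescape(without_tags)
    # Re-escape only what XML requires: &, < and >.
    return escape(decoded)


print(to_plain_xml_text("All About &copy; and &reg;"))  # All About © and ®
print(to_plain_xml_text("<em>Fish</em> &amp; Chips"))   # Fish &amp; Chips
```

With this ordering, named entities come out as their Unicode characters, which are fine in a UTF-8 encoded OPF file, while a literal ampersand is still escaped.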

@arthurattwell
Member Author

@jaycolmvar Thanks so much for your input here. Sorry I haven't had a chance to test what you suggest. I'll get there!
