ICML writer: Differentiate hyperlinks and cross-references #5541

Closed
nathan-artist opened this issue Jun 1, 2019 · 12 comments

@nathan-artist

nathan-artist commented Jun 1, 2019

ICML differentiates between hyperlinks (external links) and cross-references (document-internal links). Pandoc's ICML writer does not correctly write document-internal links. I will summarize below the output changes that are needed to fix this; the full input and output files for the markup below can be found in the related topic on the pandoc-discuss list (see link-citations in ICML writer). It is likely not a simple one-line change but does not require a massive overhaul of the entire writer either, only the part that writes links. I don't know enough about Haskell to fix the writer myself, though I may try if nobody else wants to fix this.

There are essentially four things to change, enumerated below:

  1. A <CrossReferenceFormat> element should be present just before the <Story> element. The following example worked in my test:
    <CrossReferenceFormat Self="u1" Name="Text Anchor Name">
      <BuildingBlock Self="u1BuildingBlock0" BlockType="BookmarkNameBuildingBlock" CustomText="$ID/" AppliedDelimiter="$ID/" IncludeDelimiter="false" />
    </CrossReferenceFormat>
  2. Internal-link source points should use a <CrossReferenceSource> element, NOT a <HyperlinkTextSource> element (which is for external links only). The Self attribute of the <CrossReferenceFormat> tag (from number 1 above) should be referenced in the AppliedFormat attribute of <CrossReferenceSource> tags, and the Name attribute of <CrossReferenceSource> tags should have a relevant value. For example:
  • Current incorrect output of internal-link source point:
  <HyperlinkTextSource Self="htss-1" Name="" Hidden="false">
    <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Cite Link">
      <Content>2017</Content>
    </CharacterStyleRange>
  </HyperlinkTextSource>
  • Desired correct output of internal-link source point:
  <CrossReferenceSource Self="htss-1" AppliedFormat="u1" Name="2017" Hidden="false">
    <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Cite Link">
      <Content>2017</Content>
    </CharacterStyleRange>
  </CrossReferenceSource>
  3. Internal-link destination points should use a <HyperlinkTextDestination> element, NOT a <HyperlinkURLDestination> element (which is for external links only). <HyperlinkTextDestination> elements for internal-link destination points should be written at relevant points in the document, NOT at end of file, and the Name attribute of <HyperlinkTextDestination> tags should have a relevant value. For example:
  • Current incorrect output of internal-link destination point (written just before the <Hyperlink> element at end of file):
  <HyperlinkURLDestination Self="HyperlinkURLDestination/#ref-Citation1%3a2017" Name="link" DestinationURL="#ref-Citation1:2017" DestinationUniqueKey="1" />
  <Hyperlink Self="uf-1" Name="#ref-Citation1:2017" Source="htss-1" Visible="true" DestinationUniqueKey="1">
    <Properties>
      <BorderColor type="enumeration">Black</BorderColor>
      <Destination type="object">HyperlinkURLDestination/#ref-Citation1%3a2017</Destination>
    </Properties>
  </Hyperlink>
  • Desired correct output of internal-link destination point...

...in this example, written just before the relevant bibliography entry:

    <HyperlinkTextDestination Self="HyperlinkTextDestination/#ref-Citation1%3a2017" Name="#ref-Citation1:2017" Hidden="false" DestinationUniqueKey="9" />
    <Content>Last1, F. (2017). </Content>

...with the corresponding <Hyperlink> element written at end of file, and without the DestinationURL attribute (which is for external links only):

  <Hyperlink Self="uf-1" Name="#ref-Citation1:2017" Source="htss-1" Visible="true" DestinationUniqueKey="9">
    <Properties>
      <BorderColor type="enumeration">Black</BorderColor>
      <Destination type="object">HyperlinkTextDestination/#ref-Citation1%3a2017</Destination>
    </Properties>
  </Hyperlink>
  4. The DestinationUniqueKey attribute of each <Hyperlink> tag and its corresponding <HyperlinkURLDestination> or <HyperlinkTextDestination> tag (for external and internal links respectively) should be unique, NOT the same. For example:
  • Current incorrect output showing a list of <Hyperlink> tags with identical DestinationUniqueKey attributes:
  <HyperlinkURLDestination Self="HyperlinkURLDestination/https%3a//pandoc.org/MANUAL.html" Name="link" DestinationURL="https://pandoc.org/MANUAL.html" DestinationUniqueKey="1" />
  <Hyperlink Self="uf-9" Name="https://pandoc.org/MANUAL.html" Source="htss-9" Visible="true" DestinationUniqueKey="1">
    <Properties>
      <BorderColor type="enumeration">Black</BorderColor>
      <Destination type="object">HyperlinkURLDestination/https%3a//pandoc.org/MANUAL.html</Destination>
    </Properties>
  </Hyperlink>
  <HyperlinkURLDestination Self="HyperlinkURLDestination/https%3a//www.adobe.com/InDesign" Name="link" DestinationURL="https://www.adobe.com/InDesign" DestinationUniqueKey="1" />
  <Hyperlink Self="uf-8" Name="https://www.adobe.com/InDesign" Source="htss-8" Visible="true" DestinationUniqueKey="1">
    <Properties>
      <BorderColor type="enumeration">Black</BorderColor>
      <Destination type="object">HyperlinkURLDestination/https%3a//www.adobe.com/InDesign</Destination>
    </Properties>
  </Hyperlink>
  <HyperlinkURLDestination Self="HyperlinkURLDestination/#de-optimo-modo-percipiendi" Name="link" DestinationURL="#de-optimo-modo-percipiendi" DestinationUniqueKey="1" />
  <Hyperlink Self="uf-7" Name="#de-optimo-modo-percipiendi" Source="htss-7" Visible="true" DestinationUniqueKey="1">
    <Properties>
      <BorderColor type="enumeration">Black</BorderColor>
      <Destination type="object">HyperlinkURLDestination/#de-optimo-modo-percipiendi</Destination>
    </Properties>
  </Hyperlink>
  • Desired correct output showing a list of <Hyperlink> tags with unique DestinationUniqueKey attributes (and the last <Hyperlink> tag, because it is an internal link, correctly lacks a companion <HyperlinkURLDestination> element):
  <HyperlinkURLDestination Self="HyperlinkURLDestination/https%3a//pandoc.org/MANUAL.html" Name="https%3a//pandoc.org/MANUAL.html" DestinationURL="https://pandoc.org/MANUAL.html" DestinationUniqueKey="1" />
  <Hyperlink Self="uf-9" Name="https://pandoc.org/MANUAL.html" Source="htss-9" Visible="true" DestinationUniqueKey="1">
    <Properties>
      <BorderColor type="enumeration">Black</BorderColor>
      <Destination type="object">HyperlinkURLDestination/https%3a//pandoc.org/MANUAL.html</Destination>
    </Properties>
  </Hyperlink>
  <HyperlinkURLDestination Self="HyperlinkURLDestination/https%3a//www.adobe.com/InDesign" Name="https%3a//www.adobe.com/InDesign" DestinationURL="https://www.adobe.com/InDesign" DestinationUniqueKey="2" />
  <Hyperlink Self="uf-8" Name="https://www.adobe.com/InDesign" Source="htss-8" Visible="true" DestinationUniqueKey="2">
    <Properties>
      <BorderColor type="enumeration">Black</BorderColor>
      <Destination type="object">HyperlinkURLDestination/https%3a//www.adobe.com/InDesign</Destination>
    </Properties>
  </Hyperlink>
  <Hyperlink Self="uf-7" Name="#de-optimo-modo-percipiendi" Source="htss-7" Visible="true" DestinationUniqueKey="3">
    <Properties>
      <BorderColor type="enumeration">Black</BorderColor>
      <Destination type="object">HyperlinkTextDestination/#de-optimo-modo-percipiendi</Destination>
    </Properties>
  </Hyperlink>
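
For anyone who wants to prototype points 3 and 4 outside the writer, here is a minimal Python sketch of the bookkeeping. It is not pandoc's code (the writer is Haskell); the function name `plan_destinations` and the rule "a target starting with `#` is internal" are assumptions made for illustration. It only shows one way to pick the destination element and hand out a distinct DestinationUniqueKey per link.

```python
from itertools import count


def plan_destinations(targets):
    """Return one record per link target, in input order.

    Assumption for this sketch: a target that starts with '#' is a
    document-internal link; everything else is an external URL.
    Repeated targets are not deduplicated here.
    """
    keys = count(1)  # 1, 2, 3, ... so no two destinations share a key
    plan = []
    for target in targets:
        internal = target.startswith("#")
        plan.append({
            "name": target,
            # Internal links get a text destination, external ones a URL destination.
            "element": "HyperlinkTextDestination" if internal else "HyperlinkURLDestination",
            "key": next(keys),
        })
    return plan


plan = plan_destinations([
    "https://pandoc.org/MANUAL.html",
    "https://www.adobe.com/InDesign",
    "#de-optimo-modo-percipiendi",
])
for entry in plan:
    print(entry["key"], entry["element"], entry["name"])
```

The three targets are the ones from the example under point 4; the internal one ends up with a HyperlinkTextDestination and its own key, matching the desired output listed there.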
@jgm
Owner

jgm commented Jun 1, 2019

Thanks for the excellent bug report!

mb21 added a commit to mb21/pandoc that referenced this issue Jun 2, 2019
@mb21
Collaborator

mb21 commented Jun 2, 2019

Thanks a lot for your report, exactly the info I needed!

I've started implementing this... most of your points should be implemented, but I haven't looked at the tests yet. A few questions:

  1. Is there an equivalent to the CrossReferenceFormat thing for external links as well? Currently, we don't have that. Seems like it's used only for styling the link with the AppliedFormat attribute?

  2. <HyperlinkTextDestination> elements for internal-link destination points should be written at relevant points in the document, NOT at end of file

    What's the reason for this? Since the <HyperlinkTextDestination> is the analogue of the <HyperlinkURLDestination>, wouldn't it make sense to place them both at the end? (Or both right after the corresponding link source in the body text?) This is also easier to do in the current source code...

  3. the Name attribute of <CrossReferenceSource> tags should have a relevant value

    What is this used for? Is this only for the GUI somewhere? What's a good value? Does the same go for external links? Currently, it's set to the title of the link or the empty string. Try for example this markdown: [link text](http://pandoc.org "my link title"). I see I couldn't make my mind up about this: the Name attribute on the <Hyperlink> element is actually the url/href. So what's the name used for there?

@nathan-artist
Author

> 1. Is there an equivalent to the CrossReferenceFormat thing for external links as well? Currently, we don't have that. Seems like it's used only for styling the link with the AppliedFormat attribute?

<CrossReferenceFormat> is only for internal links (cross-references). InDesign usually uses this to automatically update certain cross-reference tags that it calls "building blocks" (hence the <BuildingBlock> element within the <CrossReferenceFormat> element); for example, this could include page numbers, paragraph numbers, or other variables that are automatically updated in the text. In InDesign's GUI, users can edit these building blocks. In my tests, the <CrossReferenceFormat> element (and corresponding AppliedFormat attribute of <CrossReferenceSource> tags) appeared to be necessary for cross-references to work, even though the content of <CrossReferenceSource> chosen here is not something that InDesign can automatically update.

When the markup chosen here (the "desired-output.icml" file that I provided on pandoc-discuss) is imported into InDesign, an "update" icon appears next to each cross-reference in the Hyperlinks pane. The InDesign reference manual (p. 434) explains: "An update icon indicates that the cross-reference destination text has changed or that the cross-reference source text has been edited." This is because in the file "desired-output.icml", Pandoc's ICML writer has "edited" (written) the cross-reference source text to be whatever is in the <Content> element of the relevant <CrossReferenceSource> element, instead of one of InDesign's building blocks. For example:

  <CrossReferenceSource Self="htss-1" AppliedFormat="u1" Name="2017" Hidden="false">
    <CharacterStyleRange AppliedCharacterStyle="CharacterStyle/Cite Link">
      <Content>2017</Content>
    </CharacterStyleRange>
  </CrossReferenceSource>

I can't see any better way to do this given InDesign's limitations, and it works well enough.

> 2. <HyperlinkTextDestination> elements for internal-link destination points should be written at relevant points in the document, NOT at end of file
>
> What's the reason for this? Since the <HyperlinkTextDestination> is the analogue of the <HyperlinkURLDestination>, wouldn't it make sense to place them both at the end? (Or both right after the corresponding link source in the body text?) This is also easier to do in the current source code...

This has a simple answer: <HyperlinkTextDestination> is the analogue of the <HyperlinkURLDestination>, but since <HyperlinkTextDestination> represents an internal destination point, it needs to be written at the point in the file at which the internal link points. <HyperlinkURLDestination> is written at end of file because it doesn't "point inside" the document; it points at an external URL.

Here's an analogy to HTML that comes to mind: the <HyperlinkTextDestination> tag is analogous to named anchor tags in HTML4, where <a href="#value">text</a> points to named anchor tag <a name="value" /> at some other point in the document. Pandoc's HTML and DOCX writers, for example, do this correctly, so there may be some clue about how to implement this in those writers.

There is another relevant way of doing internal links in ICML using <ParagraphDestination> instead of <HyperlinkTextDestination> but that would involve changing values of other elements, and I didn't want to complicate things by presenting a completely different option that would not be any easier to implement.

> 3. the Name attribute of <CrossReferenceSource> tags should have a relevant value
>
> What is this used for? Is this only for the GUI somewhere? What's a good value? Does the same go for external links? Currently, it's set to the title of the link or the empty string. Try for example this markdown: [link text](http://pandoc.org "my link title"). I see I couldn't make my mind up about this: the Name attribute on the <Hyperlink> element is actually the url/href. So what's the name used for there?

This value is used by InDesign in several link-related dialogue boxes ("Hyperlink Options...", "Cross-Reference Options...", etc.) in a drop-down menu that lists all the links.

It appears that the best value for the Name attribute of a <CrossReferenceSource> tag would be the same as the Name attribute of the corresponding <Hyperlink> tag. Several of the Name values that I provided in the file "desired-output.icml" are wrong; instead of <CrossReferenceSource Self="htss-1" AppliedFormat="u1" Name="2017" Hidden="false"> it would be <CrossReferenceSource Self="htss-1" AppliedFormat="u1" Name="#ref-Citation1:2017" Hidden="false">. So, yes, it would be the same as the value of the href attribute in HTML.
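
To make the corrected Name value concrete, here is a small Python sketch that builds the <CrossReferenceSource> element with the standard library's xml.etree.ElementTree. The helper name `cross_reference_source` and its default arguments are made up for this illustration; the attribute values are the ones from the example above, with Name set to the link target rather than the link text.

```python
import xml.etree.ElementTree as ET


def cross_reference_source(self_id, href, text, applied_format="u1",
                           char_style="CharacterStyle/Cite Link"):
    """Build a <CrossReferenceSource> whose Name mirrors the link target.

    Hypothetical helper: the defaults simply repeat the values used in
    this thread and are not taken from pandoc's writer.
    """
    src = ET.Element("CrossReferenceSource", {
        "Self": self_id,
        "AppliedFormat": applied_format,  # Self of the <CrossReferenceFormat>
        "Name": href,                     # same value as the <Hyperlink> Name
        "Hidden": "false",
    })
    rng = ET.SubElement(src, "CharacterStyleRange",
                        {"AppliedCharacterStyle": char_style})
    ET.SubElement(rng, "Content").text = text
    return src


elem = cross_reference_source("htss-1", "#ref-Citation1:2017", "2017")
print(ET.tostring(elem, encoding="unicode"))
```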

Thanks so much for working on this.

@mb21
Collaborator

mb21 commented Jun 2, 2019

> it needs to be written at the point in the file at which the internal link points.

Ah, that's what I was missing. Yes, that makes sense, but is somewhat tricky to implement, since a lot of elements can be linked to (every element that can have an id attribute). I'll take a look...

@nathan-artist
Author

A year after opening this issue, I just wanted to note that I think this issue is still worth working on, and I am willing to help in any way that I can!

@lrosenthol
Contributor

I just discovered this as well and would be willing to do what I can to help. Is anyone actively working on the ICML writer?

@mb21
Collaborator

mb21 commented Aug 3, 2020

I wrote most of the ICML writer a long time ago, and I haven't been actively working on it since... but pulls welcome! I'm also happy to answer any questions...

@lrosenthol
Contributor

Looked over the code last night...

One question @mb21, have you ever tried to use a Lua filter to write direct ICML? I've used them for other formats incl. docx and HTML, but wonder if it would work here as well because of the XML...

@mb21
Collaborator

mb21 commented Aug 3, 2020

not sure I understand your question... most of pandoc is written in Haskell, which is a much nicer programming language for big projects. to do a little bit of AST transformations, lua is great though.

btw. if you want a lua writer, you should look at https://pandoc.org/MANUAL.html#custom-writers

@lrosenthol
Contributor

Spent some time looking at this today, and there appear to be two possible approaches to getting internal links working in ICML:
1 - the cross-reference approach that @nathan-artist mentions above
2 - a simpler "text anchor" model, which maps fairly well to the HTML or Docx models of bookmarks/anchors.

#2 uses a combination of three things to make it operational

  • HyperlinkTextDestination element preceding the element that will be linked to (with appropriate name and unique ID)
  • HyperlinkTextSource element surrounding the text to be the link
  • Hyperlink element, outside the Story block that connects the two pieces

The first two could be done with a pandoc filter (I've done it for docx and could easily adjust for ICML), but since filters only allow changes to the content, it's not possible to do the third (the Hyperlink element outside the Story block). That has to be done in the writer itself. I looked at the writer code, and while I can read and understand it, Haskell is just quirky enough that I might break something.
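
As a rough illustration of how the three pieces of the text-anchor model refer to each other, here is a Python sketch using xml.etree.ElementTree. It is not writer code; the function `text_anchor_link` and its parameters are hypothetical, and the attribute layout just follows the ICML samples earlier in this thread. Percent-encoding of special characters in the Self value is not handled.

```python
import xml.etree.ElementTree as ET


def text_anchor_link(anchor_id, link_text, source_id, hyperlink_id, key,
                     char_style="CharacterStyle/Cite Link"):
    """Return (destination, source, hyperlink) elements for one internal link.

    Hypothetical helper; the caller decides where each element goes:
    destination before the linked element, source around the link text,
    hyperlink outside the Story block.
    """
    name = "#" + anchor_id
    dest_self = "HyperlinkTextDestination/" + name

    # 1. Destination marker, placed just before the linked-to element.
    destination = ET.Element("HyperlinkTextDestination", {
        "Self": dest_self,
        "Name": name,
        "Hidden": "false",
        "DestinationUniqueKey": str(key),
    })

    # 2. Source element wrapping the text that acts as the link.
    source = ET.Element("HyperlinkTextSource", {
        "Self": source_id,
        "Name": name,
        "Hidden": "false",
    })
    rng = ET.SubElement(source, "CharacterStyleRange",
                        {"AppliedCharacterStyle": char_style})
    ET.SubElement(rng, "Content").text = link_text

    # 3. Hyperlink element connecting source and destination.
    hyperlink = ET.Element("Hyperlink", {
        "Self": hyperlink_id,
        "Name": name,
        "Source": source_id,
        "Visible": "true",
        "DestinationUniqueKey": str(key),
    })
    props = ET.SubElement(hyperlink, "Properties")
    ET.SubElement(props, "Destination", {"type": "object"}).text = dest_self
    return destination, source, hyperlink


dest, src, link = text_anchor_link(
    "de-optimo-modo-percipiendi", "link text", "htss-7", "uf-7", 3)
for element in (dest, src, link):
    print(ET.tostring(element, encoding="unicode"))
```

The point of the sketch is the cross-referencing: the Hyperlink's Source matches the source's Self, and its Destination text and DestinationUniqueKey match the destination element.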

@mb21
Collaborator

mb21 commented Aug 5, 2020

thanks for the info! I probably won't have time to implement this anytime soon, but maybe someone else wants to give it a shot? Or a first step would be to figure out some example XML that we'd need to generate...

> Haskell is just quirky enough that I might break something.

the compiler and test-suite most probably will catch it :-)

@nathan-artist
Author

Thanks @lrosenthol, I am looking forward to testing this when the PR is accepted!
