Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement support for internal document links in ICML (#5541) #6606

Merged
merged 1 commit into from
Sep 10, 2020

Conversation

lrosenthol
Copy link
Contributor

This PR addresses #5541 by supporting internal links when generating ICML. I added two tests - one for testing that URL links aren't broken and one for testing the new inter-doc functionality.

This is my first PR for Pandoc and my first time writing Haskell, so all comments welcome!

@lrosenthol
Copy link
Contributor Author

I see that I forgot to update the standard ICML tests. Will do...

Comment on lines 546 to 563
makeContent :: Doc Text -> Doc Text
makeContent cont | isEmpty cont = empty
| length (show cont) < 25 = vcat [
selfClosingTag "HyperlinkTextDestination"
[("Self", "HyperlinkTextDestination/"<>makeDestName (render Nothing cont)), ("Name","Destination"), ("DestinationUniqueKey","1")]
, inTagsSimple "Content" $ flush cont
]
| otherwise = inTagsSimple "Content" $ flush cont
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm... not sure I understand what's going on here... why do add the HyperlinkTextDestination only if the content is less than 25 long?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment above the method says it all

-- | NOTE: since we have no easy way to get actual named dests, we just create them for any short content blocks

AFAICT, there isn't anything in the Pandoc AST for a "destination", "bookmark" or "anchor", which is what I need here. I know I could do it with a custom span attribute that I would read in filter, but here I need it in the "core" writer. Instead, as a bit of a "hack", I just make any short content fragment as a link destination.

If you have a suggestion about how to set these bookmarks in the source - that I can find in the writer - I'm all ears!!

Copy link
Collaborator

@mb21 mb21 Aug 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, for that, the pandoc AST follows the HTML/markdown approach where you just can have an id attribute on any element (that has Attr), then link to it with #. (and yes, for headers, the id are auto-generated..) e.g.

$ pandoc -t native
# my title

my [span]{#anchor}

link [to title](#my-title)

link [to span](#anchor)

^D

[Header 1 ("my-title",[],[]) [Str "my",Space,Str "title"]
,Para [Str "my",Space,Span ("anchor",[],[]) [Str "span"]]
,Para [Str "link",Space,Link ("",[],[]) [Str "to",Space,Str "title"] ("#my-title","")]
,Para [Str "link",Space,Link ("",[],[]) [Str "to",Space,Str "span"] ("#anchor","")]]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - that was 1/3 of what I needed. The second third was figuring out how that shows up inside the writer itself, and in looking at the HTML and DocX writers, I worked that out (first param to the element!).

Now I have to figure out how to pass it through to the actual content writer...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accomplished!

@lrosenthol
Copy link
Contributor Author

@mb21 I've implemented proper support for named destinations/links/anchors (whatever they are called) for most of the standard types. I didn't do it for images and special divs - and will file a separate issue for those.

Hopefully with these changes, we should be good to merge...

@lrosenthol
Copy link
Contributor Author

@mb21 @jgm What else do you need from me in order to get this PR merged? I'd like to move forward to addressing other ICML-related issues, and need this in main first (due to all the changes here). Thanks!

@mb21
Copy link
Collaborator

mb21 commented Aug 17, 2020

I'll try to take a closer look tomorrow... by just skimming the code so far I'm wondering:

  • we're trying to have a named destination for all elements for some reason? or only for those that haven id attribute in the pandoc AST?

  • do we need to make a distinction between what's called in HTML (and the pandoc AST) inline and block elements? (which roughly corresponds to what the ICML writer outputs as having character styles and paragraph styles applied). Because in pandoc, e.g. Image is an Inline element, but can still have an id set on it. I assume HyperlinkTextDestination can only occur inside CharacterStyleRange? If so, how would we handle pandoc -f html -t icml on this:

    <div id="blockId">
      inline text <span id="inlineId">element</span>
    </div>
    

    Could we just put in two HyperlinkTextDestination right behind each other?

  • just a thought: since we do have WriterState in the ICML writer, we could also think of moving things in there instead of passing in as function arguments (that's what I should have done with the opts already...). But yeah... in the end it doesn't really matter... the two are fundamentally equivalent, with passing in via arguments more verbose, but easier to understand for Haskell-newcomers...

@lrosenthol
Copy link
Contributor Author

Thanks a lot! Really appreciate your time!

we're trying to have a named destination for all elements for some reason? or only for those that haven id attribute in the pandoc AST?

only those that have an id attribute in the AST - but is that not all? I would think that you can apply an id to any element - both inline and block level.

If so, how would we handle pandoc -f html -t icml on this

That's a good question! I will investigate how InDesign writes that out...

since we do have WriterState in the ICML writer...

I don't know anything about the WriterState, so I just went with the extra arguments. If you give me some insight into WriterState - and think that is the better approach - I'll make the changes. You have more insight into Haskell and the "pandoc way" than I...

@lrosenthol
Copy link
Contributor Author

<div id="blockId">
  inline text <span id="inlineId">element</span>
</div>

would indeed end up with two HyperlinkTextDestination elements, one before inline text and one before element.

Which is what my code would currently output...

@mb21
Copy link
Collaborator

mb21 commented Aug 17, 2020

I would think that you can apply an id to any element - both inline and block level.

currently, the pandoc AST is somewhat limited: not all elements have Attr... but eventually we'll want to tackle this (#684).

Yeah, moving more stuff into the WriterState is probably best left to another pull request... since style and ident would both have to go in there... as it's things we collect to eventually output as part of the CharacterStyles...

@mb21
Copy link
Collaborator

mb21 commented Aug 17, 2020

and what about a more contrived example?

<div id="blockId">
  <div id="blockId2">
    <span id="inlineId">
      <img id="inlineId2" src="dummy.jpg" />
    </span>
  </div>
</div>

I guess we should just output four HyperlinkTextDestination elements at the same place?
If so, seems we need to pass around a list of identifiers [Text]... and while we're at it, maybe call the argument linkDests instead of ident?

@lrosenthol
Copy link
Contributor Author

@mb21 I added a test with your specific multilevel HTML. it only puts in the span IDs since destinations are tied to character ranges and not specific blocks (from what I can tell).

@lrosenthol
Copy link
Contributor Author

What else is necessary to get this merged?? @jgm @mb21

@mb21
Copy link
Collaborator

mb21 commented Sep 5, 2020

Okay, yeah... guess that will be a little surprising, but maybe nested ids are a rare enough use-case to not hold up this pull.

It's been quite some time since I worked with ICML, and when I wrote this writer, I was quite new to Haskell... so the longer I look at it, the more things I discover I would do different today... but yes, if you've tested this enough in InDesign, and feel people exporting to ICML could use this, then we should probably merge it :-)

@jgm
Copy link
Owner

jgm commented Sep 5, 2020

This looks good to me. I think @mb21 is right that using State to keep track of the style and identifier would be more elegant than threading through the parameters -- but of course it amounts to the same thing under the hood.

The one issue remaining is the overlong line in the commit message, flagged by CI above. But actually it's probably best if you rebase your changes into one commit with a nice message summing up the changes.

@lrosenthol
Copy link
Contributor Author

@jgm will do - thanks!

and I'll also add a new issue #6665 to track doing a code cleanup

@lrosenthol
Copy link
Contributor Author

Ugh! I just messed up the rebase... @jgm

Should I just start this PR over off another branch or are you OK with this?

@tarleb
Copy link
Collaborator

tarleb commented Sep 6, 2020

@lrosenthol it appears that there are git conflicts which would prevent merging, so I'm afraid these have to be resolved first. Can you rebase again?

@lrosenthol
Copy link
Contributor Author

OK - had to learn some new git commands, but I believe that everything is now good...watching the checks (definitely pass the commit length, so that's good - and no conflicts with base)

@tarleb
Copy link
Collaborator

tarleb commented Sep 7, 2020

Thanks! The macos CI failure is not related to the changes, it seems to affect all PRs.

@lrosenthol
Copy link
Contributor Author

@jgm back to you for merging...

@jgm
Copy link
Owner

jgm commented Sep 10, 2020

@tarleb I wish I could figure out that macos failure, which keeps happening.
I have been able to get past it by altering stack.yaml, which expires the cache.
But that's ugly.
Maybe we should switch to using cabal for the macos build, to see if that helps.

@jgm jgm merged commit ef4f514 into jgm:master Sep 10, 2020
@jgm
Copy link
Owner

jgm commented Sep 10, 2020

Many thanks and congrats on your first Haskell PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants