Escaping end of sentence characters #6613

Closed
Krasjet opened this issue Aug 13, 2020 · 9 comments

Krasjet commented Aug 13, 2020

Is there a way to escape end-of-sentence characters when converting to man pages?

Consider this example from OpenBSD's roff(7):

The RUNOFF typesetting system, whose input forms the basis for roff, was written in
MAD and FAP for the CTSS operating system by Jerome E. Saltzer in 1964? Doug McIlroy
rewrote it in BCPL in 1969, renaming it roff. Dennis M. Ritchie rewrote McIlroy's
roff in PDP-11 assembly for Version 1 AT&T UNIX, Joseph F. Ossanna improved roff and
renamed it nroff for Version 2 AT&T UNIX, then ported nroff to C as troff, which
Brian W. Kernighan released with Version 7 AT&T UNIX. In 1989, James Clarke
re-implemented troff in C++, naming it groff.

Pandoc incorrectly treats the periods after initials in names as sentence endings, so they get sentence spacing (two spaces):

$ pandoc -t man test.man | man --nh --nj -l -
The RUNOFF typesetting system, whose input forms the basis for
roff, was written in MAD and FAP for the CTSS operating system by
Jerome E.  Saltzer in 1964?  Doug McIlroy rewrote it in BCPL in
1969, renaming it roff.  Dennis M.  Ritchie rewrote McIlroy's
roff in PDP‐11 assembly for Version 1 AT&T UNIX, Joseph F.
Ossanna improved roff and renamed it nroff for Version 2 AT&T
UNIX, then ported nroff to C as troff, which Brian W.  Kernighan
released with Version 7 AT&T UNIX.  In 1989, James Clarke re‐
implemented troff in C++, naming it groff.

I'm currently using groff with man-db, but this problem should also affect mandoc on BSDs.

An abbreviations file should be a solution, but it doesn't seem to work for man pages.

Version

pandoc 2.9.2.1
Compiled with pandoc-types 1.20, texmath 0.12.0.2, skylighting 0.8.4
Default user data directory: /home/krasjet/.local/share/pandoc or /home/krasjet/.pandoc
Copyright (C) 2006-2020 John MacFarlane
Web:  https://pandoc.org
This is free software; see the source for copying conditions.
There is no warranty, not even for merchantability or fitness
for a particular purpose.

Krasjet (Author) commented Aug 13, 2020

pandoc(1) itself appears to be affected by this issue:

       pdfa   adds  to  the  preamble the setup necessary to generate PDF/A of
              the type specified, e.g.  1a:2005, 2a.  If no type is  specified
              (i.e.   the  value  is  set to True, by e.g.  --metadata=pdfa or
              pdfa: true in a YAML metadata block), 1b:2005 will  be  used  as
              default,  for reasons of backwards compatibility.  Using --vari‐
              able=pdfa without specified value is not supported.  To success‐
              fully  generate PDF/A the required ICC color profiles have to be
              available and the content and all included files  (such  as  im‐
              ages) have to be standard conforming.  The ICC profiles and out‐
              put intent may be specified using the  variables  pdfaiccprofile
              and pdfaintent.  See also ConTeXt PDFA for more details.

Note the two spaces that follow each "e.g.".

The issue is still there when justification and hyphenation are turned off:

       pdfa   adds to the preamble the setup necessary to generate PDF/A of
              the type specified, e.g.  1a:2005, 2a.  If no type is specified
              (i.e.  the value is set to True, by e.g.  --metadata=pdfa or
              pdfa: true in a YAML metadata block), 1b:2005 will be used as
              default, for reasons of backwards compatibility.  Using
              --variable=pdfa without specified value is not supported.  To
              successfully generate PDF/A the required ICC color profiles have
              to be available and the content and all included files (such as
              images) have to be standard conforming.  The ICC profiles and
              output intent may be specified using the variables
              pdfaiccprofile and pdfaintent.  See also ConTeXt PDFA for more
              details.

jgm (Owner) commented Aug 13, 2020

The pandoc man page is generated using pandoc -f markdown-smart -t man.
If you don't use -smart to disable the smart extension, pandoc will insert a Unicode nonbreaking space after 'e.g.' to distinguish it from a sentence-ending space. That gives you exactly the behavior you want, though you'll also get smart quotes, which you may not want.

I can't actually recall why this feature (inserting nonbreaking spaces after abbreviations) is tied to the 'smart' extension rather than operating by default.
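A rough model of the mechanism described above, as a Python sketch under stated assumptions: the text is split on whitespace, the single space after any token found in a given abbreviation set becomes U+00A0, and the man writer is assumed to emit U+00A0 as roff's unpaddable space (a backslash followed by a space), which groff will not stretch or break. The functions protect_abbreviations and to_roff are hypothetical; pandoc's actual rules and its abbreviation list are not reproduced here.

```python
import re
from typing import Collection


def protect_abbreviations(text: str, abbreviations: Collection[str]) -> str:
    """Replace a single space after a known abbreviation with U+00A0.

    Simplified model for illustration only; pandoc's real rules differ.
    """
    # re.split with a capturing group keeps the whitespace runs, so words
    # sit at even indices and the separators at odd indices.
    parts = re.split(r"(\s+)", text)
    for i in range(0, len(parts) - 2, 2):
        if parts[i] in abbreviations and parts[i + 1] == " ":
            parts[i + 1] = "\u00a0"  # nonbreaking space
    return "".join(parts)


def to_roff(text: str) -> str:
    """Render U+00A0 as roff's unpaddable space escape (backslash-space)."""
    return text.replace("\u00a0", "\\ ")


if __name__ == "__main__":
    sample = "by Jerome E. Saltzer in 1964? Doug McIlroy"
    print(to_roff(protect_abbreviations(sample, {"E.", "e.g.", "i.e."})))
```

Tokens outside the set, such as "1964?", keep their ordinary space, so genuine sentence endings are left alone.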

jgm (Owner) commented Aug 13, 2020

Here's what it looks like when you use -f markdown instead of -f markdown-smart:

       pdfa   adds to the preamble the setup necessary to  generate  PDF/A  of
              the  type  specified, e.g. 1a:2005, 2a.  If no type is specified
              (i.e. the value is set to True, by e.g. --metadata=pdfa or pdfa:
              true in a YAML metadata block), 1b:2005 will be used as default,
              for reasons of backwards compatibility.   Using  --variable=pdfa
              without  specified value is not supported.  To successfully gen-
              erate PDF/A the required ICC color profiles have to be available
              and  the content and all included files (such as images) have to
              be standard conforming.  The ICC profiles and output intent  may
              be  specified using the variables pdfaiccprofile and pdfaintent.
              See also ConTeXt PDFA for more details.

jgm (Owner) commented Aug 13, 2020

I can't see any reason not to use the smart extension in generating man pages, so I'll switch to doing that for pandoc, which will resolve the issue noted above.

Note that pandoc has a list of abbreviations it uses for this. In your example you have E., which isn't in this list. However, you can specify additional abbreviations (including initials) using --abbreviations.
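For a text full of names, the initials can be collected mechanically. The Python sketch below uses a hypothetical heuristic (a lone capital letter plus a period, followed by a capitalized word) and writes the results one per line, which is the format the pandoc manual describes for --abbreviations. My reading of the manual is that a file given with --abbreviations is read in place of the default list rather than added to it, so the optional base parameter lets you seed it with the output of pandoc --print-default-data-file=abbreviations; treat that as an assumption to check against your pandoc version. find_initials and write_abbreviations are illustrative names, not pandoc functions.

```python
import re
from pathlib import Path
from typing import Iterable, List

# Hypothetical heuristic: a lone capital letter plus a period, followed by
# whitespace and a capitalized word, is taken to be an initial ("E." in
# "Jerome E. Saltzer").
INITIAL = re.compile(r"\b[A-Z]\.(?=\s+[A-Z])")


def find_initials(text: str) -> List[str]:
    """Return the distinct initials that occur in text, sorted."""
    return sorted(set(INITIAL.findall(text)))


def write_abbreviations(path: Path, extra: Iterable[str], base: Iterable[str] = ()) -> None:
    """Write base entries, then extra entries, one per line, without duplicates."""
    entries = list(dict.fromkeys([*base, *extra]))
    path.write_text("".join(entry + "\n" for entry in entries), encoding="utf-8")


if __name__ == "__main__":
    sample = "by Jerome E. Saltzer in 1964? Doug McIlroy ... Dennis M. Ritchie"
    print(find_initials(sample))
```

Expect false positives, such as a sentence that really ends in a single capital letter, so review the list before passing it to pandoc.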

jgm (Owner) commented Aug 13, 2020

If you want to do this manually, in pandoc's markdown you can simply write
Jerome E.\ Saltzer, and the escaped space will be treated as a nonbreaking space.
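For a long document, the same escape can be applied as a preprocessing step. This Python sketch uses a hypothetical helper, escape_initials, with a simple heuristic: when a lone capital letter and a period are followed by one space and a capitalized word, that space is replaced with a backslash-escaped space in the Markdown source.

```python
import re

# Hypothetical heuristic: a lone capital letter plus a period, then exactly
# one space and a capitalized word, is treated as an initial inside a name.
_INITIAL_SPACE = re.compile(r"\b([A-Z])\. (?=[A-Z])")


def escape_initials(markdown: str) -> str:
    """Rewrite 'E. Saltzer' as 'E.\\ Saltzer' in pandoc Markdown source."""
    # In the replacement, \1 is the captured letter and \\ is one backslash.
    return _INITIAL_SPACE.sub(r"\1.\\ ", markdown)


if __name__ == "__main__":
    print(escape_initials("by Jerome E. Saltzer in 1964? Doug McIlroy"))
```

The helper knows nothing about code spans or code blocks, and it would also rewrite a sentence that genuinely ends in a single capital letter (for example "Plan B. Then"), so it is best run on prose you can review afterwards.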

Krasjet (Author) commented Aug 13, 2020

Somehow the abbreviations file does not work for the roff(7) example.

I'm using

Dennis M.
Jerome E.
Joseph F.
Brian W.

as the abbreviations file, and pandoc still breaks the names:

$ pandoc --abbreviations=abbr -f markdown -t man test.md | nroff
The  RUNOFF  typesetting  system, whose input forms the basis for
roff, was written in MAD and FAP for the CTSS operating system by
Jerome  E.   Saltzer in 1964?  Doug McIlroy rewrote it in BCPL in
1969, renaming it roff.  Dennis  M.   Ritchie  rewrote  McIlroy’s
roff  in  PDP‐11 assembly for Version 1 AT&T UNIX, Joseph F.  Os‐
sanna improved roff and renamed it nroff for Version 2 AT&T UNIX,
then  ported  nroff  to C as troff, which Brian W.  Kernighan re‐
leased with Version 7 AT&T UNIX.  In 1989, James Clarke re‐imple‐
mented troff in C++, naming it groff.

Krasjet (Author) commented Aug 13, 2020

Though I can confirm that the escaped space does work.

jgm (Owner) commented Aug 13, 2020

The abbreviation strings can't contain whitespace.
Try just

M.
E.
F.
W.
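Following that rule, an existing abbreviations file can be repaired mechanically. A minimal Python sketch (normalize_abbreviations is a hypothetical helper) keeps only the last whitespace-separated token of each entry, drops blank lines and duplicates, and reports which entries it shortened.

```python
from typing import List, Tuple


def normalize_abbreviations(lines: List[str]) -> Tuple[List[str], List[str]]:
    """Return (usable entries, original entries that had to be shortened).

    Abbreviation strings can't contain whitespace, so an entry such as
    "Dennis M." is reduced to its last token, "M.".
    """
    entries: List[str] = []
    rewritten: List[str] = []
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        if len(tokens) > 1:
            rewritten.append(line.strip())
        entries.append(tokens[-1])
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(entries)), rewritten


if __name__ == "__main__":
    print(normalize_abbreviations(["Dennis M.", "Jerome E.", "Joseph F.", "Brian W."]))
```

The trade-off is that a bare entry such as M. applies to every occurrence of M. in the document, not only to Dennis M. Ritchie.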

Krasjet (Author) commented Aug 13, 2020

Thanks, this solution does work. I'll close this issue now.
