TeX input "splitting up" \mathXX{foo} #2595

pkra · 2020-12-30T13:18:24Z

As per discussion with @zorkow, filing this here.

@davidmjones recently pointed out to me that (from a TeX perspective), things like \mathrm{foo} and \mathbf{foo} and \mathsf{foo}, etc., shouldn't be split up into individual characters.

Avoiding this would be a nice improvement for accessibility (and situations with missing glyphs and their shaping maybe as well).

dpvc · 2020-12-30T15:03:53Z

I'm not sure I understand. In LaTeX, \mathrm{x+y} is displayed with the proper spacing around the +, for example, and that can't be done unless the characters are interpreted as math. My understanding is that \mathbf changes some of the math fonts, but does not otherwise alter the way the math is processed. Of course \textrm{x+y} is treated as a single text item, and the plus will not have extra space around it. Perhaps that is what you want?

I also don't understand the comment about missing glyphs and their shaping. Sorry!

davidmjones · 2020-12-31T00:47:12Z

In my experience, your example is unrepresentative. (In my opinion, it's also perverse, but my argument doesn't depend on that.) In a typical use of \mathrm, \mathit, \mathbf, \mathsf, etc., it's clear that the argument to the macro is meant to be treated as a single token. If I write

\mathrm{area} = \mathrm{length} \times \mathrm{width}

I expect it to be vocalized as "area equals length times width", not as "ay ar ee ay equals...". If I wanted that second reading, I would write

\mathrm{a}\mathrm{r}\mathrm{e}\mathrm{a} = ....

I can provide more examples from real documents if that would be helpful.

dpvc · 2021-01-02T17:56:21Z

OK, @davidmjones, thanks for the additional information. I understand your viewpoint, and I believe you that many people use the macros in this way. I did some Googling and while the vast majority of uses are for a single letter where it doesn't matter, I do see this usage in the wild. But I also do see other usage, such as \mathbf {^{208}Pb}, \mathbf {Au+Au}, and \mathbf{v\times u} along with more reasonable things like \mathbf{\Gamma} and \mathbf{\hat x}. These all require processing the contents as math (as the name \mathbf indicates it will do). And while you may consider most of these to be "perverse", I can't think of another reasonable way to obtain \mathbf{\hat x} so that the hat is the proper bold one.

My suggestion, then, would be to make these macros check the contents, and if it is a string of letters (where what counts as a letter is configurable), then enclose it in an <mi>, otherwise process it as math as is currently done. That should mean that existing math will not break, but your use would produce the desired output.

In the meantime, you could use

\newcommand{\mathMi}[1]{\mmlToken{mi}[mathvariant=#1]}
\renewcommand{\mathrm}{\mathMi{normal}}
\renewcommand{\mathbf}{\mathMi{bold}}
\renewcommand{\mathsf}{\mathMi{sans-serif}}
...

or the equivalent in the macros list of the tex block in your MathJax configuration (or 'Macros' and 'TeX' in v2).
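As a rough sketch of that configuration-based equivalent, assuming MathJax v3 and its `tex.macros` option (where an array gives the definition and the number of arguments), the same redefinitions might look like the following. The variable name `mathFontMacros` is only for illustration, and since `\mmlToken` takes its argument literally, this is presumably only suitable when the arguments are plain identifiers.

```javascript
// Sketch only: a MathJax v3 configuration, to be set before MathJax loads.
// `mathFontMacros` is an illustrative name, not part of MathJax.
const mathFontMacros = {
  // [definition, number of arguments]
  mathMi: ['\\mmlToken{mi}[mathvariant=#1]', 1],
  mathrm: '\\mathMi{normal}',
  mathbf: '\\mathMi{bold}',
  mathsf: '\\mathMi{sans-serif}'
};

globalThis.MathJax = {
  tex: {macros: mathFontMacros}
};
```

With this in place, \mathrm{area} should expand to \mmlToken{mi}[mathvariant=normal]{area}, i.e. a single <mi> element.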

davidmjones · 2021-01-05T00:09:21Z

To my dismay, it's extremely hard to distill the point I want to make into a simple statement without getting bogged down in endless exceptions and special cases. So, in case it's not obvious already, let me state up front that I know there is no perfect solution when you try to assign semantics based on visual markup.

Nevertheless, I think there is a clear pattern of using these macros to encode identifiers based on natural language, and I think it's important to try to preserve those semantics where possible for the benefit of the text-to-speech engine. I also think there are some relatively simple heuristics that would capture most of those patterns without breaking anything.

These all require processing the contents as math (as the name \mathbf
indicates it will do).

Right. I never meant to imply otherwise. In TeX, these macros only affect letters and digits, so those are the only characters that require special handling. [Technically, it applies to any character whose math class is 7, which also includes Greek symbols like \alpha, \Gamma, etc., but I don't think those should be treated as letters.]

BTW, that's why something like \mathrm{x + y} strikes me as perverse: It has no effect on the +. An author might write \mathbf{x + y} expecting the + to become bold, but they would be disappointed.

My suggestion, then, would be to make these macros check the contents, and if it is a string of letters (where what counts as a letter is configurable), then enclose it in an <mi>, otherwise process it as math as is currently done.

That would be a huge improvement, but my suggestion would be to combine into a single <mi> element any sequence of math Ord atoms with the following properties:

1. They are contained in the argument of a single \mathrm, \mathit, \mathsf, \mathtt, or \mathbf.
2. The nucleus of each Ord atom consists of a Unicode letter.
3. The subscript and superscript fields are empty.

This doesn't apply to things like \mathcal, \mathfrak or \mathbb since those are explicitly used to access distinct Mathematical Alphanumeric Symbols (to use the Unicode terminology), not a specific text font face. I left out \mathbfit because in the sample I looked at it's too rare to draw any conclusions about.
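To make the three properties concrete, here is a minimal sketch in JavaScript. It assumes a hypothetical atom representation (`{type, nucleus, sub, sup}`) that is neither a MathJax nor a TeX data structure, and it assumes the function is only ever called on the atoms of one \mathrm, \mathit, \mathsf, \mathtt, or \mathbf argument, which covers the first property.

```javascript
// Hypothetical atom shape, for illustration only:
//   {type: 'Ord', nucleus: 'a', sub: null, sup: null}

// True when the string is a single Unicode letter.
const isLetter = (s) => /^\p{L}$/u.test(s);

// Merge maximal runs of letter Ord atoms that carry no sub- or superscript.
function mergeLetterOrds(atoms) {
  const out = [];
  let run = '';  // letters collected for the current identifier
  const flush = () => {
    if (run) out.push({type: 'Ord', nucleus: run, sub: null, sup: null});
    run = '';
  };
  for (const atom of atoms) {
    const mergeable = atom.type === 'Ord' && isLetter(atom.nucleus) &&
                      atom.sub == null && atom.sup == null;
    if (mergeable) {
      run += atom.nucleus;
    } else {
      flush();         // close the current identifier, if any
      out.push(atom);  // keep everything else untouched
    }
  }
  flush();
  return out;
}

// The atoms of "area" collapse into one identifier.
const area = Array.from('area', (c) => ({type: 'Ord', nucleus: c, sub: null, sup: null}));
console.log(mergeLetterOrds(area).map((a) => a.nucleus));
```

One consequence of the third property as literally stated: a letter that carries a superscript (the h in inch^3, say) stays outside the merged run.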

dpvc · 2021-01-19T21:44:06Z

Here is a configuration that implements the proposal I made above:

MathJax = {
  tex: {packages: {'[+]': ['math-fonts']}},
  startup: {
    ready() {
      //
      //  These would be replaced by import commands if you wanted to make
      //  a proper extension.
      //
      const {Configuration} = MathJax._.input.tex.Configuration;
      const {CommandMap} = MathJax._.input.tex.SymbolMap;
      const BaseMethods = MathJax._.input.tex.base.BaseMethods.default;
      const TexParser = MathJax._.input.tex.TexParser.default;

      //
      //  Remap \mathrm, etc. to be able to create single <mi> elements
      //
      new CommandMap('math-fonts', {
        mathrm: ['MathFont', 'normal'],
        mathbf: ['MathFont', 'bold'],
        mathit: ['MathFont', '-tex-mathit'],  // internal variant for text italic font
        mathsf: ['MathFont', 'sans-serif'],
        mathtt: ['MathFont', 'monospace']
      }, {
        MathFont(parser, name, variant) {
          const text = parser.GetArgument(name);
          //
          //  Check if the argument is a string of letters only
          //     Make a single <mi> of them if so, otherwise
          //     Parse the argument as normal.
          //
          if (text.match(/^[a-z]+$/i)) {
            parser.Push(parser.create('token', 'mi', {mathvariant: variant}, text));
          } else {
            let mml = new TexParser(text, {...parser.stack.env, font: variant}, parser.configuration).mml();
            if (mml.isKind('inferredMrow')) {
              mml = parser.create('node', 'mrow', mml.childNodes);
            }
            parser.Push(mml);
          }
        }
      });
      Configuration.create('math-fonts', {
        handler: {macro: ['math-fonts']}
      });

      MathJax.startup.defaultReady();
    }
  }
}

in case you want to try that out.
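For a quick check of which arguments would take the single-<mi> branch in that configuration, the letter test can be tried on its own. This is only a sketch reusing the same regular expression; `takesSingleMi` is a hypothetical helper name, not part of MathJax.

```javascript
// Same test as in MathFont: ASCII letters only, case-insensitive.
const LETTERS_ONLY = /^[a-z]+$/i;

// Hypothetical helper: true when the argument would become one <mi>.
function takesSingleMi(arg) {
  return LETTERS_ONLY.test(arg);
}

const samples = ['area', 'x+y', '\\hat x', 'x y', '^{208}Pb'];
for (const arg of samples) {
  console.log(JSON.stringify(arg), takesSingleMi(arg) ? 'single <mi>' : 'parsed as math');
}
```

Only the first sample passes the test; anything containing an operator, a space, a control sequence, or a script falls through to the normal math parsing.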

Technically, it applies to any character whose math class is 7, which also includes Greek symbols like \alpha, \Gamma, etc.,

[Actually, I don't think it applies to the lower-case Greek letters, only the upper-case ones. Because I know that this is how \mathbf works, and that + is not class 7, it makes sense to me to use \mathbf{x + y} and not expect the + to be in bold. But I understand that not everyone knows those details.]

my suggestion would be to combine into a single <mi> element any sequence of math Ord atoms with the following properties:...

MathJax's parsing does not produce TeX math lists, and so this characterization is not natural within MathJax. It would require significant changes to the parser to be able to accomplish it, and while one might suggest trying to combine nodes returned in the mml variable above, that would be a rather fragile approach, as there is no indication of where the nodes came from, or if there was any spacing, etc. So \mathbf{x y} would produce <mi>xy</mi> which seems inappropriate.

Can you give an example of where your algorithm would be needed (in place of the one I give above)?

This doesn't apply to things like \mathcal, \mathfrak or \mathbb ...

I'm a bit concerned about the inconsistency of having some of these macros combine characters into one <mi> and others not. I'm not sure I buy the argument about the Math Alphanumerics block, because MathML doesn't have separate text and math fonts. That is, <mi mathvariant="bold">A</mi> is supposed to be treated identically to <mi>𝐀</mi>, and so when \mathbf{ABC} produces <mi mathvariant="bold">ABC</mi>, it is also producing values in the Math Alphanumeric block. Why should that be different for any other characters in that block? Why shouldn't \mathbb{ABC} produce <mi mathvariant="double-struck">ABC</mi>, which is equivalent to <mi>𝔸𝔹ℂ</mi> (since double-struck C is in the Letterlike Symbols block, not the Math Alphanumerics)?
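The equivalence can be illustrated with a small sketch that maps ASCII letters to the corresponding Unicode code points. The function names are made up for this example, and the code is not how MathJax implements mathvariant internally; it only shows the character mapping being described, including the double-struck capitals that sit in the Letterlike Symbols block.

```javascript
// Starting code points in the Mathematical Alphanumeric Symbols block.
const BOLD_UPPER = 0x1D400;          // MATHEMATICAL BOLD CAPITAL A
const BOLD_LOWER = 0x1D41A;          // MATHEMATICAL BOLD SMALL A
const DOUBLE_STRUCK_UPPER = 0x1D538; // MATHEMATICAL DOUBLE-STRUCK CAPITAL A

// Double-struck capitals encoded in Letterlike Symbols instead.
const DOUBLE_STRUCK_EXCEPTIONS = {
  C: 0x2102, H: 0x210D, N: 0x2115, P: 0x2119, Q: 0x211A, R: 0x211D, Z: 0x2124
};

// Map ASCII letters to their bold counterparts; leave other characters alone.
function toBold(text) {
  return Array.from(text, (c) => {
    if (c >= 'A' && c <= 'Z') return String.fromCodePoint(BOLD_UPPER + c.charCodeAt(0) - 65);
    if (c >= 'a' && c <= 'z') return String.fromCodePoint(BOLD_LOWER + c.charCodeAt(0) - 97);
    return c;
  }).join('');
}

// Map ASCII capitals to double-struck, honoring the Letterlike exceptions.
function toDoubleStruck(text) {
  return Array.from(text, (c) => {
    if (c < 'A' || c > 'Z') return c;
    const cp = DOUBLE_STRUCK_EXCEPTIONS[c] ?? DOUBLE_STRUCK_UPPER + c.charCodeAt(0) - 65;
    return String.fromCodePoint(cp);
  }).join('');
}

console.log(toBold('ABC'), toDoubleStruck('ABC'));
```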

TeX doesn't have a separate "math bold" and "text bold" font (they are labeled "text fonts" in Appendix F of the TeXbook, so I guess Knuth considered them text fonts); the only distinction is between math italics (cmmi) and text italics (cmit). While Unicode does have a distinction (the text font being in the usual ASCII range and the math font in the Math Alphanumerics block), MathML doesn't give a natural means of accessing the text versions (as I describe above). So while TeX thinks of bold as a text font, MathML thinks of it as a math font.

Similarly, MathJax doesn't have separate text and math fonts, except for italics, where it uses a special internal math variant to handle the text italics. So when you use \mathbf{ABC} you will be getting the Math Alphanumeric versions.

davidmjones · 2021-01-20T22:25:49Z

Coincidentally, I've been rereading the unicode-math documentation, which I had forgotten spends quite a bit of space discussing exactly these issues. See especially sections 3.1 and 4.4, but also parts of section 5 in http://mirrors.ctan.org/macros/unicodetex/latex/unicode-math/unicode-math.pdf.

Technically, it applies to any character whose math class is 7, which also includes Greek symbols like \alpha, \Gamma, etc.,

[Actually, I don't think it applies to the lower-case Greek letters, only the upper-case ones.

Yes, listing \alpha was a mistake.

Can you give an example of where your algorithm would be needed (in place of the one I give above)?

Needed? No, as long as the user knows what they are doing and uses the commands carefully, I think your solution is probably sufficient.

This doesn't apply to things like \mathcal, \mathfrak or \mathbb ...

I'm a bit concerned about the inconsistency of having some of these macros combine characters into one <mi> and other not.

Yup. It's a mess. Damn Knuth for not anticipating Unicode when he designed TeX and the Computer Modern fonts in the mid 70s. :)

Similarly, MathJax doesn't have separate text and math fonts, except for italics, where it uses a special internal math variant to handle the text italics. So when you use \mathbf{ABC} you will be getting the Math Alphanumeric versions.

Interesting. I don't think I knew that. FWIW, here's what various alphabets give you by default:

\documentclass{article}

\usepackage{unicode-math}
\setmainfont{STIX Two Text}
\setmathfont{STIX Two Math}

\loggingoutput

\begin{document}

\textbf{a}          % STIXTwoText U+0061

$a$                 % STIXTwoMath U+1D44E

$\mathrm{a}$        % STIXTwoMath U+0061

$\mathbf{a}$        % cmbx10 U+0061 (surely a bug)

$\symbf{a}$         % STIXTwoMath U+1D41A

\end{document}

Like I said, it's a mess.

davidmjones · 2021-01-21T00:16:29Z

Can you give an example of where your algorithm would be needed (in place of the one I give above)?

Needed? No, as long as the user knows what they are doing and uses the commands carefully, I think your solution is probably sufficient.

I shouldn't have folded so quickly. Originally I had a couple of cases in mind. First, something like

\mathrm{area = length \times width}

That's easily taken care of by recoding it the way I originally coded it above.

Since I'm monolingual and the AMS publishes almost exclusively in English, I can't come up with any examples of the other use case, but I can imagine someone wanting to use an identifier with letters outside of the ASCII range. I think that's what @pkra had in mind when he mentioned missing glyphs and shaping at the top.

dpvc · 2021-01-21T14:44:01Z

$\mathbf{a}$ % cmbx10 U+0061 (surely a bug)

Of course, that was the one I really wanted to see. :-)

I've been rereading the unicode-math documentation

That link was very useful, thank you. I'm thinking about how best to incorporate that information into MathJax.

I can imagine someone wanting to use an identifier with letters outside of the ASCII range

Absolutely. I hard-coded the pattern in the example above, but it would be a configurable value if it is to be included in MathJax itself, so those using other languages could include the characters they need.
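As a sketch of what such a configurable pattern might look like, here is a version that uses a Unicode property escape in place of the ASCII-only pattern. The option name `identifierPattern` is purely hypothetical and is not an existing MathJax setting.

```javascript
// Hypothetical option, for illustration only; not an existing MathJax setting.
const options = {
  identifierPattern: /^\p{L}+$/u  // any run of Unicode letters
};

// True when the text should be treated as one multi-letter identifier.
function isIdentifier(text, pattern = options.identifierPattern) {
  return pattern.test(text);
}

console.log(isIdentifier('area'));                // ASCII letters
console.log(isIdentifier('Fl\u00E4che'));         // Latin letter outside ASCII
console.log(isIdentifier('x+y'));                 // contains an operator
console.log(isIdentifier('area', /^[a-z]+$/i));   // caller-supplied pattern
```

Note that \p{L} does not cover combining marks, so identifiers written with decomposed accents would presumably need \p{M} added to the pattern as well.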

davidmjones · 2021-01-21T20:41:46Z

$\mathbf{a}$ % cmbx10 U+0061 (surely a bug)

Of course, that was the one I really wanted to see. :-)

To be clear, the bug is that it was using cmbx10, not that it was mapping the character to U+0061; that's the expected behaviour from the documentation.

It did inspire me to make a more thorough catalog of the math alphabets supported by the unicode-math package, though. Here's the result: mathalpha.pdf. It makes for interesting if somewhat maddening reading.

I've been rereading the unicode-math documentation

That link was very useful, thank you. I'm thinking about how best to incorporate that information into MathJax.

@pkra and I have been working on a MathJax extension to support the unicode-math package. It's in a private repo at the moment, but we hope to make a beta version public soon. Maybe that would be a good place to experiment with the math alphabet support?

dpvc · 2021-01-28T22:11:39Z

Volker pointed out to me a suggestion for how to do something more like what you have suggested in terms of grouping multiple letters together. Here is an implementation for that:

MathJax = {
  tex: {packages: {'[+]': ['math-fonts']}},
  startup: {
    ready() {
      //
      //  These would be replaced by import commands if you wanted to make
      //  a proper extension.
      //
      const {Configuration} = MathJax._.input.tex.Configuration;
      const {CommandMap, RegExpMap} = MathJax._.input.tex.SymbolMap;
      const TexParser = MathJax._.input.tex.TexParser.default;
      const ParseMethods = MathJax._.input.tex.ParseMethods.default;

      new RegExpMap('multi-letter', function (parser, c) {
        if (parser.stack.env.multiLetterIdentifiers) {
          c = parser.string.substr(parser.i-1).match(/^[a-z]+/i)[0];
        }
        ParseMethods.variable(parser, c);
        parser.i += c.length - 1;
      }, /[a-z]/i);

      new CommandMap('math-fonts', {
        mathrm: ['MathFont', 'normal'],
        mathbf: ['MathFont', 'bold'],
        mathit: ['MathFont', '-tex-mathit'],  // internal variant for text italic font
        mathsf: ['MathFont', 'sans-serif'],
        mathtt: ['MathFont', 'monospace']
      }, {
        MathFont(parser, name, variant) {
          const text = parser.GetArgument(name);
          const old = parser.stack.env.multiLetterIdentifiers;
          parser.stack.env.multiLetterIdentifiers = true;
          let mml = new TexParser(text, {...parser.stack.env, font: variant}, parser.configuration).mml();
          if (!old) {
            delete parser.stack.env.multiLetterIdentifiers;
          }
          if (mml.isKind('inferredMrow')) {
            mml = parser.create('node', 'mrow', mml.childNodes);
          }
          parser.Push(mml);
        }
      });
      const mathFonts = Configuration.create('math-fonts', {
        handler: {
          character: ['multi-letter'],
          macro: ['math-fonts'],
        }
      });

      MathJax.startup.defaultReady();
    }
  }
}

This adds a character map that (conditionally) turns on multi-character identifiers, so that within \mathbf{} and the others, multiple letters will be combined into a single identifier, while still processing everything else as normal. So \mathrm{area = length \times width} would produce

<mi mathvariant="normal">area</mi>
<mo mathvariant="normal">=</mo>
<mi mathvariant="normal">length</mi>
<mo mathvariant="normal">&#xD7;</mo>
<mi mathvariant="normal">width</mi>

This is slightly different from what you suggest, in that \mathrm{inch^3} will produce

<msup>
  <mi mathvariant="normal">inch</mi>
  <mn>3</mn>
</msup>

rather than the

<mi mathvariant="normal">inc</mi>
<msup>
  <mi mathvariant="normal">h</mi>
  <mn>3</mn>
</msup>

that your algorithm would produce, and something like

\newcommand{\a}{a}
\mathbf{a\a}

will produce

<mi mathvariant="bold">a</mi>
<mi mathvariant="bold">a</mi>

rather than

<mi mathvariant="bold">aa</mi>

that your approach would produce.

Anyway, it turns out that your area example can be handled reasonably.
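The grouping behavior can be approximated outside MathJax with a small standalone tokenizer. This is only a rough sketch, not MathJax's parser: it recognizes simple control words such as \times, runs of ASCII letters, and single other characters, and `groupLetters` is a made-up name.

```javascript
// Control words first, then letter runs, then any other non-space character.
const TOKEN = /\\[a-z]+|[a-z]+|\S/gi;

// Split a TeX fragment into tokens, keeping letter runs together.
function groupLetters(tex) {
  return tex.match(TOKEN) ?? [];
}

console.log(groupLetters('area = length \\times width'));
// -> [ 'area', '=', 'length', '\\times', 'width' ]
console.log(groupLetters('inch^3'));
// -> [ 'inch', '^', '3' ]
```

Because the whole letter run is collected before the ^ is seen, the superscript ends up attached to the complete identifier, which matches the <msup> output shown for \mathrm{inch^3}.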

davidmjones · 2021-02-09T01:15:20Z

Thank you for this. I haven't had a chance to take a close look at it or try it out yet, but I wanted to comment on this part:

This is slightly different from what you suggest, in that \mathrm{inch^3} will produce
<msup>
  <mi mathvariant="normal">inch</mi>
  <mn>3</mn>
</msup>
rather than the
<mi mathvariant="normal">inc</mi>
<msup>
  <mi mathvariant="normal">h</mi>
  <mn>3</mn>
</msup>
that your algorithm would produce,

If that's what my algorithm would produce, my algorithm was clearly wrong.

and something like
\newcommand{\a}{a}
\mathbf{a\a}
will produce
<mi mathvariant="bold">a</mi>
<mi mathvariant="bold">a</mi>
rather than
<mi mathvariant="bold">aa</mi>
that your approach would produce.

Fair enough. That's weird enough that I'm not too worried about how it comes out.


dpvc · 2021-03-31T17:04:46Z

I've made a PR to implement the solution above, and added the remaining \math* and the \sym* macros. This allows easy access from TeX to all the MathML variants, which was not the case before.


dpvc added the Feature Request label Dec 30, 2020

dpvc added a commit to mathjax/MathJax-src that referenced this issue Mar 31, 2021

Add support for all \mathXYZ and \symXYZ macros, and have them produce multi-letter <mi> elements that are not auto-converted to OP elements. (mathjax/MathJax#2595)

608cbcf

dpvc mentioned this issue Mar 31, 2021

Add support for all \mathXYZ and \symXYZ macros using multi-letter <mi>. (mathjax/MathJax#2595) mathjax/MathJax-src#676

Merged

dpvc added the Accepted, Ready for Review, Test Needed, and v3 labels Mar 31, 2021

dpvc added this to the 3.1.3 milestone Mar 31, 2021

This was referenced Apr 1, 2021

support \mathsfit and \textsfit mathjax/MathJax-src#667

Closed

support \mathsfit and \textsfit mathjax/MathJax-src#666

Closed

dpvc added the Code Example label Apr 1, 2021

dpvc added a commit to mathjax/MathJax-src that referenced this issue Apr 20, 2021

Merge pull request #676 from mathjax/issue2595

3aeb542

Add support for all \mathXYZ and \symXYZ macros using multi-letter <mi>. (mathjax/MathJax#2595)

dpvc added the Merged label and removed the Ready for Review label Apr 20, 2021

pkra mentioned this issue Apr 22, 2021

Two-Character Unicode Support #2672

Open

This was referenced Apr 23, 2021

Bump mathjax-full from 3.1.2 to 3.1.4 tani/markdown-it-mathjax3#39

Closed

Bump mathjax-full from 3.1.2 to 3.1.4 elabftw/elabftw#2624

Closed

dependabot bot mentioned this issue Apr 26, 2021

chore(deps): bump mathjax-full from 3.1.2 to 3.1.4 uetchy/math-api#243

Closed

dpvc added the Fixed and v3.1 labels and removed the Merged label Apr 27, 2021

dpvc closed this as completed Apr 27, 2021

This was referenced May 7, 2021

Bump mathjax-full from 3.1.2 to 3.1.4 in /web bmybbs/bmybbs#190

Merged

Bump mathjax-full from 3.1.2 to 3.1.4 nschloe/purple-pi#23

Merged

mkuron mentioned this issue Jun 13, 2021

\mmlToken does not evaluate TeX in its argument #2706

Closed

dpvc mentioned this issue Sep 23, 2021

subtle? poor translation from TeX #2775

Closed
