TeX input "splitting up" \mathXX{foo} #2595
Comments
I'm not sure I understand. In LaTeX, I also don't understand the comment about missing glyphs and their shaping. Sorry! |
In my experience, your example is unrepresentative. (In my opinion, it's also perverse, but my argument doesn't depend on that.) In a typical use of
I expect it to be vocalized as "area equals length times width", not as "ay ar ee ay equals...". If I wanted that second reading, I would write
I can provide more examples from real documents if that would be helpful. |
OK, @davidmjones, thanks for the additional information. I understand your viewpoint, and I believe you that many people use the macros in this way. I did some Googling and while the vast majority of uses are for a single letter where it doesn't matter, I do see this usage in the wild. But I also do see other usage, such as My suggestion, then, would be to make these macros check the contents, and if it is a string of letters (where what counts as a letter is configurable), then enclose it in an In the meantime, you could use
or the equivalent in the |
To my dismay, it's extremely hard to distill the point I want to make into a simple statement without getting bogged down in endless exceptions and special cases. So, in case it's not obvious already, let me state up front that I know there is no perfect solution when you try to assign semantics based on visual markup. Nevertheless, I think there is a clear pattern of using these macros to encode identifiers based on natural language into, and I think it's important to try to preserve those semantics where possible for the benefit of the text-to-speech engine. I also think there are some relatively simple heuristics that would capture most of those patterns without breaking anything.
Right. I never meant to imply otherwise. In TeX, these macros only affect letters and digits, so those are the only characters that require special handling. [Technically, it applies to any character whose math class is 7, which also includes Greek symbols like BTW, that's why something like
That would be a huge improvement, but my suggestion would be to combine into a single
This doesn't apply to things like |
Here is a configuration that implements the proposal I made above:

```js
MathJax = {
  tex: {packages: {'[+]': ['math-fonts']}},
  startup: {
    ready() {
      //
      //  These would be replaced by import commands if you wanted to make
      //  a proper extension.
      //
      const {Configuration} = MathJax._.input.tex.Configuration;
      const {CommandMap} = MathJax._.input.tex.SymbolMap;
      const BaseMethods = MathJax._.input.tex.base.BaseMethods.default;
      const TexParser = MathJax._.input.tex.TexParser.default;
      //
      //  Remap \mathrm, etc. to be able to create single <mi> elements
      //
      new CommandMap('math-fonts', {
        mathrm: ['MathFont', 'normal'],
        mathbf: ['MathFont', 'bold'],
        mathit: ['MathFont', '-tex-mathit'],   // internal variant for text italic font
        mathsf: ['MathFont', 'sans-serif'],
        mathtt: ['MathFont', 'monospace']
      }, {
        MathFont(parser, name, variant) {
          const text = parser.GetArgument(name);
          //
          //  Check if the argument is a string of letters only
          //  Make a single <mi> of them if so, otherwise
          //  Parse the argument as normal.
          //
          if (text.match(/^[a-z]+$/i)) {
            parser.Push(parser.create('token', 'mi', {mathvariant: variant}, text));
          } else {
            let mml = new TexParser(text, {...parser.stack.env, font: variant}, parser.configuration).mml();
            if (mml.isKind('inferredMrow')) {
              mml = parser.create('node', 'mrow', mml.childNodes);
            }
            parser.Push(mml);
          }
        }
      });
      Configuration.create('math-fonts', {
        handler: {macro: ['math-fonts']}
      });
      MathJax.startup.defaultReady();
    }
  }
}
```

in case you want to try that out.
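For anyone who wants to see which arguments the letters-only check in the configuration above would treat as a single identifier, here is a minimal standalone sketch. It does not use MathJax at all; the function name `describeArgument` and the shape of the returned object are hypothetical and exist only for illustration. The regular expression is the same one used in the configuration.

```javascript
// Standalone illustration (no MathJax required).
// `describeArgument` is a hypothetical helper, not part of any MathJax API.
function describeArgument(text, variant) {
  // Same test as in the configuration: one or more ASCII letters, nothing else.
  if (/^[a-z]+$/i.test(text)) {
    // Would become a single <mi mathvariant="..."> holding the whole word.
    return {kind: 'mi', mathvariant: variant, text};
  }
  // Anything else (digits, operators, macros, spaces, empty argument)
  // would be handed to the normal parser with the font set.
  return {kind: 'parse', font: variant, text};
}

for (const arg of ['area', 'x', 'a+b', 'x_1', 'inch^3', '']) {
  console.log(JSON.stringify(arg), '->', describeArgument(arg, 'normal').kind);
}
```

Under this check only a bare run of letters is merged, so anything containing a superscript, an operator, or a space goes through the ordinary parser and is split up as before.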
[Actually, I don't think it applies to the lower-case Greek letters, only the upper-case ones. Because I knew that that is how
MathJax's parsing does not produce TeX math lists, and so this characterization is not natural within MathJax. It would require significant changes to the parser to be able to accomplish it, and while one might suggest trying to combine nodes returned in the Can you give an example of where your algorithm would be needed (in place of the one I give above)?
I'm a bit concerned about the inconsistency of having some of these macros combine characters into one TeX doesn't have a separate "math bold" and "text bold" font (they are labeled "text fonts" in Appendix F of the TeXbook, so I guess Knuth considered them text fonts); the only distinction is between math italics (cmmi) and text italics (cmit). While Unicode does have a distinction (the text font being in the usual ASCII range and the math font in the Math Alphanumerics block), MathML doesn't give a natural means of accessing the text versions (as I describe above). So while TeX thinks of bold as a text font, MathML thinks of it as a math font. Similarly, MathJax doesn't have separate text and math fonts, except for italics, where it uses a special internal math variant to handle the text italics. So when you use |
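To make the Unicode side of that distinction concrete, here is a small sketch that maps ASCII letters to their counterparts in the Mathematical Alphanumeric Symbols block for the bold alphabet (MATHEMATICAL BOLD CAPITAL A is U+1D400 and MATHEMATICAL BOLD SMALL A is U+1D41A). The helper name `toMathBold` is made up for this illustration, and the sketch is not a description of how MathJax selects glyphs internally.

```javascript
// Illustration only: ASCII "text" letters vs. Math Alphanumerics "math bold" letters.
const BOLD_UPPER_A = 0x1D400; // MATHEMATICAL BOLD CAPITAL A
const BOLD_LOWER_A = 0x1D41A; // MATHEMATICAL BOLD SMALL A

// `toMathBold` is a hypothetical helper name.
function toMathBold(text) {
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0);
    if (cp >= 0x41 && cp <= 0x5A) {          // A-Z
      return String.fromCodePoint(BOLD_UPPER_A + (cp - 0x41));
    }
    if (cp >= 0x61 && cp <= 0x7A) {          // a-z
      return String.fromCodePoint(BOLD_LOWER_A + (cp - 0x61));
    }
    return ch;                               // leave everything else alone
  }).join('');
}

console.log(toMathBold('area'));
```

The ASCII string and the mapped string are different characters as far as Unicode is concerned, which is why the choice between them matters for speech and search even when the rendered glyphs look alike.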
Coincidentally, I've been rereading the unicode-math documentation, which I had forgotten spends quite a bit of space discussing exactly these issues. See especially sections 3.1 and 4.4, but also parts of section 5 in http://mirrors.ctan.org/macros/unicodetex/latex/unicode-math/unicode-math.pdf.
Yes, listing
Needed? No, as long as the user knows what they are doing and uses the commands carefully, I think your solution is probably sufficient.
Yup. It's a mess. Damn Knuth for not anticipating Unicode when he designed TeX and the Computer Modern fonts in the mid 70s. :)
Interesting. I don't think I knew that. FWIW, here's what various alphabets give you by default:
Like I said, it's a mess. |
I shouldn't have folded so quickly. Originally I had a couple of cases in mind. First, something like
That's easily taken care of by recoding it the way I originally coded it above. Since I'm monolingual and the AMS publishes almost exclusively in English, I can't come up with any examples of the other use case, but I can imagine someone wanting to use an identifier with letters outside of the ASCII range. I think that's what @pkra had in mind when he mentioned missing glyphs and shaping at the top. |
Of course, that was the one I really wanted to see. :-)
That link was very useful, thank you. I'm thinking about how best to incorporate that information into MathJax.
Absolutely. I hard-coded the pattern in the example above, but it would be a configurable value if it is to be included in MathJax itself, so those using other languages could include the characters they need. |
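As a rough sketch of what a configurable pattern could look like, assuming the letter set is supplied as the inside of a regular-expression character class: the factory name `makeIdentifierTest` and its parameter are hypothetical, and no actual MathJax option name is implied. With the `u` flag, a class such as `\p{L}` accepts letters from any script.

```javascript
// Hypothetical sketch of a configurable "what counts as a letter" test.
// `makeIdentifierTest` and `letterClass` are illustrative names only.
function makeIdentifierTest(letterClass = 'a-zA-Z') {
  // The 'u' flag enables Unicode property escapes such as \p{L}.
  const pattern = new RegExp(`^[${letterClass}]+$`, 'u');
  return (text) => pattern.test(text);
}

const asciiOnly = makeIdentifierTest();          // the hard-coded behaviour
const anyLetter = makeIdentifierTest('\\p{L}');  // any Unicode letter

// 'Fl\u00e4che' is the German word for "area" with a precomposed a-umlaut.
console.log(asciiOnly('area'), asciiOnly('Fl\u00e4che'));
console.log(anyLetter('area'), anyLetter('Fl\u00e4che'));
```

One thing to watch with this approach is that combining marks are not in `\p{L}`, so decomposed input would need the class widened (for example with `\p{M}`) or normalized first.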
To be clear, the bug is that it was using cmbx10, not that it was mapping the character to U+0061; that's the expected behaviour from the documentation. It did inspire me to make a more thorough catalog of the math alphabets supported by the unicode-math package, though. Here's the result: mathalpha.pdf. It makes for interesting if somewhat maddening reading.
@pkra and I have been working on a MathJax extension to support the unicode-math package. It's in a private repo at the moment, but we hope to make a beta version public soon. Maybe that would be a good place to experiment with the math alphabet support? |
Volker pointed out to me a suggestion for how to do something more like what you have suggested in terms of grouping multiple letters together. Here is an implementation for that:

```js
MathJax = {
  tex: {packages: {'[+]': ['math-fonts']}},
  startup: {
    ready() {
      //
      //  These would be replaced by import commands if you wanted to make
      //  a proper extension.
      //
      const {Configuration} = MathJax._.input.tex.Configuration;
      const {CommandMap, RegExpMap} = MathJax._.input.tex.SymbolMap;
      const TexParser = MathJax._.input.tex.TexParser.default;
      const ParseMethods = MathJax._.input.tex.ParseMethods.default;
      new RegExpMap('multi-letter', function (parser, c) {
        if (parser.stack.env.multiLetterIdentifiers) {
          c = parser.string.substr(parser.i-1).match(/^[a-z]+/i)[0];
        }
        ParseMethods.variable(parser, c);
        parser.i += c.length - 1;
      }, /[a-z]/i);
      new CommandMap('math-fonts', {
        mathrm: ['MathFont', 'normal'],
        mathbf: ['MathFont', 'bold'],
        mathit: ['MathFont', '-tex-mathit'],   // internal variant for text italic font
        mathsf: ['MathFont', 'sans-serif'],
        mathtt: ['MathFont', 'monospace']
      }, {
        MathFont(parser, name, variant) {
          const text = parser.GetArgument(name);
          const old = parser.stack.env.multiLetterIdentifiers;
          parser.stack.env.multiLetterIdentifiers = true;
          let mml = new TexParser(text, {...parser.stack.env, font: variant}, parser.configuration).mml();
          if (!old) {
            delete parser.stack.env.multiLetterIdentifiers;
          }
          if (mml.isKind('inferredMrow')) {
            mml = parser.create('node', 'mrow', mml.childNodes);
          }
          parser.Push(mml);
        }
      });
      const mathFonts = Configuration.create('math-fonts', {
        handler: {
          character: ['multi-letter'],
          macro: ['math-fonts'],
        }
      });
      MathJax.startup.defaultReady();
    }
  }
}
```

This adds a character map that (conditionally) turns on multi-character identifiers, so that within

```html
<mi mathvariant="normal">area</mi>
<mo mathvariant="normal">=</mo>
<mi mathvariant="normal">length</mi>
<mo mathvariant="normal">×</mo>
<mi mathvariant="normal">width</mi>
```

This is slightly different from what you suggest, in that

```html
<msup>
  <mi mathvariant="normal">inch</mi>
  <mn>3</mn>
</msup>
```

rather than the

```html
<mi mathvariant="normal">inc</mi>
<msup>
  <mi mathvariant="normal">h</mi>
  <mn>3</mn>
</msup>
```

that your algorithm would produce, and something like

```latex
\newcommand{\a}{a}
\mathbf{a\a}
```

will produce

```html
<mi mathvariant="normal">a</mi>
<mi mathvariant="normal">a</mi>
```

rather than

```html
<mi mathvariant="normal">aa</mi>
```

that your approach would produce. Anyway, it turns out that your area example can be handled reasonably. |
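The grouping behaviour described in this comment comes from scanning the source string greedily for runs of letters before anything else is parsed. Here is a toy scanner that mimics just that step, as a sketch under simplifying assumptions: it is not MathJax's parser, `scanTokens` is a made-up name, and it knows nothing about macro expansion, braces, or MathML.

```javascript
// Toy scanner, for illustration only (not MathJax's parser).
// It groups consecutive ASCII letters in the *source text* into one token.
function scanTokens(tex) {
  const tokens = [];
  let i = 0;
  while (i < tex.length) {
    const rest = tex.slice(i);
    let m;
    if ((m = rest.match(/^\\(?:[a-z]+|.)/i))) {
      tokens.push(m[0]);            // control sequence such as \times or \a
    } else if ((m = rest.match(/^[a-z]+/i))) {
      tokens.push(m[0]);            // greedy run of letters -> one identifier
    } else if ((m = rest.match(/^\s+/))) {
      // whitespace is skipped in math mode
    } else {
      m = [rest[0]];
      tokens.push(rest[0]);         // any other single character
    }
    i += m[0].length;
  }
  return tokens;
}

console.log(scanTokens('area = length \\times width'));
console.log(scanTokens('inch^3'));
console.log(scanTokens('a\\a'));
```

Because the run of letters is collected from the source before the superscript is seen, `inch` stays whole and the `^3` attaches to the complete word; and because `\a` is a separate control sequence in the source, its expansion is never merged with the preceding `a`.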
Thank you for this. I haven't had a chance to take a close look at it or try it out yet, but I wanted to comment on this part:
If that's what my algorithm would produce, my algorithm was clearly wrong.
Fair enough. That's weird enough that I'm not too worried about how it comes out. |
…e multi-letter <mi> elements that are not auto-converted to OP elements. (mathjax/MathJax#2595)
I've made a PR to implement the solution above, and added the remaining |
Add support for all \mathXYZ and \symXYZ macros using multi-letter <mi>. (mathjax/MathJax#2595)
As per discussion with @zorkow, filing this here.
@davidmjones recently pointed out to me that (from a TeX perspective), things like \mathrm{foo} and \mathbf{foo} and \mathsf{foo}, etc., shouldn't be split up into individual characters.
Avoiding this would be a nice improvement for accessibility (and situations with missing glyphs and their shaping maybe as well).