[plugin] Idea: Support YAML front matter in Markdown files #2391
But that doesn't make the content Markdown. We also can't differentiate between the top of a file and the middle of a file, so contextual things like this are a bit outside our grasp.
Does it, really? All it does is color the date, no matter how it's included, YAML or just inline text... I don't think it has anything to do with the YAML. I think this would be better solved with a plugin designed specifically for YAML + Markdown. Take a look at: https://github.com/highlightjs/highlight.js/blob/master/docs/plugin-api.rst You'd detect if the file was YAML + markdown BEFORE highlighting, then split it yourself, highlight both chunks separately, then paste them back together and return that as the result. I think that would be possible with the new plugin support. All the glue might not be there, but if you wanted to work on a plug-in I'd look into the glue. I'd guess 20-30 lines of JavaScript, maybe.
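A minimal sketch of that detect/split/highlight/rejoin approach, assuming YAML front matter delimited by `---` lines at the very start of the text. `highlightWithFrontMatter` and the `highlight(code, language)` callback are hypothetical names; with highlight.js you would pass a small wrapper around `hljs.highlight` that returns its `.value` HTML (its argument order differs between v10 and v11).

```javascript
// Minimal sketch: split YAML front matter from the Markdown body,
// highlight each chunk separately, then paste the results back together.
// `highlight(code, language)` is a hypothetical callback returning HTML.
const FRONT_MATTER = /^(---\n)([\s\S]*?)(\n---)(\n|$)/;

function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function highlightWithFrontMatter(source, highlight) {
  const match = source.match(FRONT_MATTER);
  if (!match) {
    // No front matter detected: highlight everything as Markdown.
    return highlight(source, "markdown");
  }
  const [whole, open, yaml, close, newline] = match;
  const body = source.slice(whole.length);
  // The delimiter lines are emitted as escaped plain text.
  return (
    escapeHtml(open) +
    highlight(yaml, "yaml") +
    escapeHtml(close + newline) +
    highlight(body, "markdown")
  );
}
```

Keeping the highlighter as a parameter is only a design choice for the sketch: it keeps the splitting logic testable on its own, whichever highlight.js version the callback wraps.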
Okay, I thought that might be the case.
I suppose I meant that GitHub doesn't seem to highlight it incorrectly, at least for the small example I posted. I wasn't aware there was plugin support for highlight.js! I will take a look at writing a plugin for this. I've just had a quick look at the docs and noticed that
Now that would be a better question. That might be fixable. Does a header require a blank line before it? We could check for that but then the problem becomes I don't think JS has @allejo Does that sound right to you? Thoughts?
Ok, so you can't co-opt the process completely right now, but you can still do it. You need to hook before and after. You can effect changes with
My first thought would be to introduce front matter as a new language definition instead of trying to introduce complexity to the markdown syntax by detecting headers or the start of files. My thought would be to allow modifiers/combinations to languages. e.g.
Depending on the tool using the front matter files, the body of it can be in a number of languages, which is why I would be hesitant to introduce it just to markdown.
Thanks for helping me clarify why this rubbed me the wrong way. This (front matter) is really more of a "concept" than a "language" itself. It's a convention I've seen used with blogging software, but I assume it's also used elsewhere since it's a useful pattern. Also, front matter doesn't necessarily have to be limited to YAML either... really, if we had configurable grammars you could do this with something like:

```javascript
// You could alias it however you wanted; I just used "markdown" here to
// overwrite markdown, as per the original request (if, say, that was the
// ONLY context you were using HLJS in).
registerLanguage("markdown", frontmatter("yaml", { content: "html" }))
```

As I said before, I don't think you could do this purely with a grammar right now, but I think you could do it with a grammar + plugin. The grammar I think would be really light and would lean on the plugin for doing the actual work (so it can run parse code, split the content, etc). Honestly I see grammar + plugin as a way to do all sorts of crazy complex behavior that you can't do inside grammars themselves. Is this the best thing long-term? I dunno... It probably needs some thought so we can come up with a common pattern for 3rd party grammars who want to do this. Actually I think this might be a good way to START things, and then eventually you simply fold the plugins into the grammar itself... so a grammar would have a That would require adding hooks for
@gregives It'd be quite cool if you built a plugin for this. That'd probably motivate me to go in and actually add support for
Something like: (very raw) I need to think about how this works in light of highlightAuto (which would call your plugin a zillion times), but perhaps that is a problem for plugin authors? I also need to think about subLanguage and how this handles continuations... that's really why I didn't tackle it at first and went with the much simpler highlightBlock instead.
Thanks for the great conversation around this! So are we thinking a plugin is the best way forward for now? I can take a look at a plugin this weekend if so. Out of interest, how does highlighting work for other 'nested' languages? For example, does highlight.js use the JavaScript definition for highlighting JavaScript within some HTML, or is it duplicated within the HTML definition?
@gregives Do you see how you might go about this? Define a dummy grammar, and then some flag you store there enables the plug-in... So given the example earlier, perhaps:

```javascript
return {
  name: "markdown with yaml front matter",
  frontmatterIs: "yaml",
  bodyLanguage: "markdown",
  usePlugin: "frontmatter",
  // and perhaps long-term
  "before:highlight": function () {},
  "after:highlight": function () {},
}
```

When your plugin runs then it's looking for the
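A rough sketch of the opt-in check such a plugin might do, using the flag names from the dummy grammar above. `frontMatterSettings` is a hypothetical helper and `getLanguage` stands in for `hljs.getLanguage`; whether custom keys like `usePlugin` are kept on the registered definition is an assumption to verify against the version in use.

```javascript
// Hypothetical dispatch: the plugin only acts on languages whose
// definition opts in with `usePlugin: "frontmatter"`. `getLanguage`
// stands in for a registry lookup such as hljs.getLanguage.
function frontMatterSettings(languageName, getLanguage) {
  const definition = getLanguage(languageName);
  if (!definition || definition.usePlugin !== "frontmatter") {
    return null; // not opted in: leave the block alone
  }
  return {
    frontMatterLanguage: definition.frontmatterIs,
    bodyLanguage: definition.bodyLanguage,
  };
}
```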
Well, a plugin that could also be/provide/generate its own grammar. :-) I think over time the line will blur, as mentioned above. We're playing with the future here. :-) I kind of showed you how you might go about it in the previous message.
Both, but often we do it right and use And theoretically you could use If JS supported
Actually, one could add this type of functionality via a plugin, without even changing core. LOL.
I agree that this should be a plugin. My main reason is that this concept can be expanded to highlighting any embedded language without complicating or bloating a language grammar. Say we want to highlight code in markdown blocks just like GitHub does: # Hello World
I'm a markdown document and want to show some embedded HTML:
```html
<div>
<p data-testid="highlightjs">Hi from HTML!</p>
</div>
```
Unless this is already possible?
Well, I actually think that's a whole different case - and not comparable to the original example, but I have some very mixed feelings on that specifically. I think GitHub is only confusing the issue because it's actually RENDERING markdown, not highlighting it. To me highlighting markdown would mean that you'd have markup like: <div class="hljs-string">
```html
<div>
<p data-testid="highlightjs">Hi from HTML!</p>
</div>
```
</div>
Within the context of highlighting markdown that is really just a multi-line string... now when the markdown is actually rendered that string might be highlighted as code... but I see that as the job of the renderer... and we are not a renderer, we're a highlighter. This is also related to how some people expect us to handle markdown... some people expect us to actually have styling that makes bold parts bold and italic parts italic... but that's not our job. We're a highlighter, NOT a renderer. Even thinking about it makes my head hurt a little. :) Maybe I'm thinking about it wrong though - I know my text editor's syntax highlighting does colorize markdown code snippets...
You can't do it dynamically, but you could "guess" or trust the auto-detect... look at how XML handles script tags:

```javascript
{
  begin: '<script(?=\\s|>)', end: '>',
  keywords: {name: 'script'},
  contains: [TAG_INTERNALS],
  starts: {
    end: '\<\/script\>', returnEnd: true,
    subLanguage: ['actionscript', 'javascript', 'handlebars', 'xml']
  }
}
```
Or see
To me, this is still the job of the highlighter. See this example where I'm forcing the language within the GitHub highlighting:
```
<div>
<p data-testid="highlightjs">Hi from HTML!</p>
</div>
```
```html
<div>
<p data-testid="highlightjs">Hi from HTML!</p>
</div>
```
```python
<div>
<p data-testid="highlightjs">Hi from HTML!</p>
</div>
```
```go
<div>
<p data-testid="highlightjs">Hi from HTML!</p>
</div>
```
I saw this example as related to front matter because the behavior seemed to be the same to me:
Well, the big difference is that language snippets are actually a syntactic feature of the markdown language (or at least the GitHub variant)... whereas "front matter" isn't a feature of any language. It's a concept for front-loading metadata about any type of textual content. So it's the same kind of thing, but quite different conceptually. So if we had better support for this kind of thing I could potentially imagine the code snippet support being added to Markdown, but we wouldn't add front matter support... because, as pointed out earlier, that's not specific to Markdown... Whereas if we decided Markdown snippets should be highlighted as their declared code type, that would simply be an improvement to the existing Markdown highlighting, to better support the Markdown language.
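If Markdown's own code snippets were ever highlighted by their declared language, the first step would be reading the language name after the opening fence (the "info string" in CommonMark terms). A purely illustrative sketch, not how the highlight.js Markdown grammar works; `fencedBlocks` is a hypothetical helper and only simple three-backtick fences are handled.

```javascript
// Illustrative only: find backtick-fenced blocks and the language named in
// each opening fence's info string. Tilde fences, indented fences and
// fences longer than three backticks are not handled properly.
const TICKS = "`".repeat(3); // built at runtime to avoid a literal fence here

const FENCE = new RegExp(
  "^" + TICKS + "[ \\t]*([^\\s`]*)[^\\n]*\\n" + // opening fence + info string
    "([\\s\\S]*?)" + // block contents, matched lazily
    "^" + TICKS + "[ \\t]*$", // closing fence on its own line
  "gm"
);

function fencedBlocks(markdown) {
  const blocks = [];
  for (const match of markdown.matchAll(FENCE)) {
    // A fence with no info string yields language null.
    blocks.push({ language: match[1] || null, code: match[2] });
  }
  return blocks;
}
```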
Agree with this. Consider looking for the Also see how CommonMark writes its metadata header.
YAML is ridiculously complex. :-) Yet what we have seems to be working OK for most people. :-)
I mean we could tag the "..." as something, but I'm not sure it would really change anything since after the ... evidently you can have even more YAML... so it might already "just work". I've never seen an example like that before. |
As long as it starts with another
I was thinking a bit more about how to solve this problem, specifically YAML front matter in Markdown files, and I encountered the following in the docs.
As lookbehind matching is supported, albeit only by around 70% of browsers, we can use a fairly simple regular expression to match it:

```javascript
{
  begin: '(?<!\\n)^---\\n', end: '\\n---\\n',
  subLanguage: 'yaml',
  relevance: 0
}
```

However, I'm aware that other formats of front matter are available; for example, Hugo supports four formats for front matter. Would it be naive to add each format as I suggested, or would this be suitable for now? If this solution seems okay, I'd be happy to create a pull request.
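A quick check of why the lookbehind pins the match to the top of the file, assuming the pattern is compiled with the multiline flag (which, as far as I know, is how highlight.js compiles grammar regexes). With `m`, `^` alone matches after every newline, so a setext heading underline also matches; `(?<!\n)` rules out every position preceded by a newline, leaving only offset 0. `firstMatch` is just a helper for the demo.

```javascript
// The begin pattern proposed above, compiled with the multiline flag
// (assumed here to match how the grammar's regexes get compiled).
const withLookbehind = new RegExp("(?<!\\n)^---\\n", "m");
// The same pattern without the lookbehind, for comparison.
const withoutLookbehind = new RegExp("^---\\n", "m");

// Returns the offset of the first match, or -1 if there is none.
function firstMatch(re, text) {
  const m = re.exec(text);
  return m ? m.index : -1;
}

const frontMatterDoc = "---\ntitle: Hello\n---\n# Heading\n";
const setextDoc = "Title\n---\nBody\n";

console.log(firstMatch(withLookbehind, frontMatterDoc)); // 0: top of file
console.log(firstMatch(withLookbehind, setextDoc)); // -1: "---" follows a newline
console.log(firstMatch(withoutLookbehind, setextDoc)); // 6: plain ^ matches mid-file
```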
That is a neat trick. I think one 3rd party language author is using lookbehind (but they test for it and use an alternative regex if it's not available; in this case there are no alternatives), but I'm not sure how I feel about adding a feature to core that's only supported by 70% of green-field browsers. It seems very bad to me for Highlight.js to have different behavior in one modern browser than another.
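For reference, a runtime check of the kind a grammar author might use, as a sketch: `supportsLookbehind` and `FRONT_MATTER_BEGIN` are hypothetical names. Note that gating a rule this way is exactly what produces different behavior in different browsers.

```javascript
// Sketch: detect lookbehind support at runtime. Engines without it throw
// a SyntaxError when such a pattern is compiled, hence the try/catch.
function supportsLookbehind() {
  try {
    new RegExp("(?<!a)b");
    return true;
  } catch (e) {
    return false;
  }
}

// Hypothetical use: there is no equivalent pattern without lookbehind
// here, so the only fallback is to leave the rule out (null).
const FRONT_MATTER_BEGIN = supportsLookbehind() ? "(?<!\\n)^---\\n" : null;
```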
Couldn't auto-detect try to figure it out? Are they all enclosed the same?
I'm not sure what you're asking. As we discussed already, "frontmatter" isn't a concept unique to markdown, so I'm not sure where you're suggesting we add it. One could conceive of a
If you wrote a plugin that was small/simple enough, it could possibly be included in
I've had a quick go at making a plugin for this; feedback would be very much appreciated! The plugin revolves around a regular expression which has three matching groups: the opening delimiter, the front matter content, and the closing delimiter.
The plugin separates the front matter into these three parts, highlights the content of the front matter, and then concatenates them back together, along with the original content of the Markdown. By default, it works with front matter delimited by `---` or `+++`:

```javascript
class FrontMatterPlugin {
  constructor(options = {}) {
    // Default: front matter delimited by --- or +++ at the start of the block.
    // Group 1 includes its newline, so \n\1\n also requires a blank line
    // after the closing delimiter.
    this.regexp = options.regexp || /(^[-+]{3}\n)([\s\S]*?)(\n\1\n)/;
    this.language = options.language;
    this.subLanguage = options.subLanguage;
    this.frontMatterResult = null;
  }

  'before:highlightBlock'({block, language}) {
    // Reset per block so front matter from a previous block is never reused.
    this.frontMatterResult = null;
    if (this.language && this.language !== language) {
      return;
    }
    const content = block.innerText;
    const frontMatter = content.match(this.regexp);
    if (!frontMatter) {
      return; // no front matter: the block is highlighted normally
    }
    const frontMatterContent = frontMatter[2];
    const frontMatterResult = this.subLanguage
      ? hljs.highlight(this.subLanguage, frontMatterContent)
      : hljs.highlightAuto(frontMatterContent);
    this.frontMatterBegin = frontMatter[1];
    this.frontMatterResult = frontMatterResult.value;
    this.frontMatterEnd = frontMatter[3];
    block.innerText = content.replace(frontMatter[0], '');
  }

  'after:highlightBlock'({block, result}) {
    if (this.frontMatterResult !== null) {
      result.value = this.frontMatterBegin + this.frontMatterResult + this.frontMatterEnd + result.value;
    }
  }
}
```

There are definitely some things I haven't considered yet, for example,
Here's an example of how you'd use this plugin with AsciiDoc and JSON front matter (if that's even a thing):

```javascript
hljs.addPlugin(new FrontMatterPlugin({
  regexp: /(^)({\n[\s\S]*?\n})(\n)/,
  language: 'asciidoc',
  subLanguage: 'json'
}));
```

You can see in this case that the first matching group is just `(^)`, an empty match.
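A quick way to see what the three groups of the plugin's default regex capture, in plain JavaScript with no highlight.js involved (`frontMatterParts` is a hypothetical helper). Because group 1 includes its trailing newline, the closing `\n\1\n` only matches when a blank line follows the closing delimiter.

```javascript
// The plugin's default pattern, copied from the plugin code above.
const DEFAULT_FRONT_MATTER = /(^[-+]{3}\n)([\s\S]*?)(\n\1\n)/;

// Returns the three captured parts, or null when nothing matches.
function frontMatterParts(text) {
  const match = text.match(DEFAULT_FRONT_MATTER);
  if (!match) return null;
  const [, begin, content, end] = match;
  return { begin, content, end };
}

frontMatterParts("---\ntitle: Hi\n---\n\n# Heading");
// -> { begin: "---\n", content: "title: Hi", end: "\n---\n\n" }
frontMatterParts("---\ntitle: Hi\n---\n# Heading");
// -> null (no blank line after the closing ---)
```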
It would be possible in this plugin to check whether lookbehind matching is supported and simply change the grammar if it is, otherwise falling back to the plugin itself. Would there be any advantage or disadvantage in doing this?
Well, you probably wouldn't "change" anything, but you could do a check first and then decide whether to install a plugin at all or simply to auto-generate a language grammar with negative lookbehind and register it. But WHY would you do that? You're just making things twice as complex with no real upside, and creating the possibility of subtle differences in behavior between two different ways of doing the same thing. A single solution is best, IMHO. Grammars aren't necessarily better than plugins.
Did you mean "pass thru"? There is no need.
I'd think if you couldn't find the front matter you'd just highlight the whole content normally.
Not sure this is necessary but not difficult.
I'm a believer in clean code. I'd have broken your plugin down into smaller functions... you have a whole class, so take advantage of it to have some small helper functions that make your code easier to read. Just one example:

```javascript
'after:highlightBlock'({block, result}) {
  result.value = this.highlightedFrontMatter() + result.value;
}
```

`highlightedFrontMatter` returns the front portion, or a blank string... and you've pushed that complexity down a layer. The before callback could probably also be broken into 2 or 3 smaller, well-named functions.
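A sketch of the whole plugin broken into small helpers along those lines. This is one possible refactoring, not the author's code: the helper names and the extra `highlighter` constructor argument (an object with v10-style `highlight(language, code)` and `highlightAuto(code)`, both returning `{ value }`) are assumptions, the latter so the logic can be tested outside a browser. It also resets its state per block and skips blocks without front matter.

```javascript
// Sketch of the plugin broken into small, well-named helpers.
// Assumption: the highlighter is passed in (v10-style highlight(language, code)
// and highlightAuto(code), both returning { value }).
class FrontMatterPlugin {
  constructor(highlighter, options = {}) {
    this.highlighter = highlighter;
    this.regexp = options.regexp || /(^[-+]{3}\n)([\s\S]*?)(\n\1\n)/;
    this.language = options.language;
    this.subLanguage = options.subLanguage;
    this.parts = null; // front matter found in the current block, if any
  }

  appliesTo(language) {
    return !this.language || this.language === language;
  }

  highlightContent(code) {
    const result = this.subLanguage
      ? this.highlighter.highlight(this.subLanguage, code)
      : this.highlighter.highlightAuto(code);
    return result.value;
  }

  // The highlighted front matter, or '' if the current block has none.
  highlightedFrontMatter() {
    if (!this.parts) return '';
    return this.parts.begin + this.parts.content + this.parts.end;
  }

  'before:highlightBlock'({ block, language }) {
    this.parts = null; // never reuse front matter from a previous block
    if (!this.appliesTo(language)) return;
    const match = block.innerText.match(this.regexp);
    if (!match) return;
    this.parts = { begin: match[1], content: this.highlightContent(match[2]), end: match[3] };
    block.innerText = block.innerText.replace(match[0], '');
  }

  'after:highlightBlock'({ result }) {
    result.value = this.highlightedFrontMatter() + result.value;
  }
}
```

In a page, this version would be registered with something like `hljs.addPlugin(new FrontMatterPlugin(hljs, { language: 'markdown', subLanguage: 'yaml' }))`.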
That's how you'd do it with Although now I'm wondering if the callback system itself should protect from recursive plugins...
I agree, a plugin seems the way to go.
For example, if you knew that your front matter was going to be either YAML or TOML, it would be nice to pass through languageSubset where the plugin calls highlightAuto.
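A small sketch of passing a subset through, assuming the v10 call shapes `highlight(languageName, code)` and `highlightAuto(code, languageSubset)`; `highlightFrontMatter` and its options object are hypothetical names, and `hljsLike` stands in for `hljs`.

```javascript
// Sketch: forward an optional language subset to auto-detection.
// `hljsLike` stands in for hljs (v10-style call shapes assumed).
function highlightFrontMatter(hljsLike, code, { subLanguage, languageSubset } = {}) {
  if (subLanguage) {
    // An explicit language wins; no detection needed.
    return hljsLike.highlight(subLanguage, code).value;
  }
  // With a subset, auto-detection only considers the listed languages;
  // without one (undefined), it falls back to its usual candidates.
  return hljsLike.highlightAuto(code, languageSubset).value;
}
```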
Thanks for the feedback, I will refactor it a bit.
I've had a read of the other thread; in my opinion, recursive plugins seem like they might be useful, although I can't think of a use case off the top of my head. In the case of this plugin, if you have a Markdown file with YAML front matter, you could specify that the plugin only runs if the language is
You might have to invent some of that yourself since now you're inventing things that only have to do with your plugin, not Highlight.js itself. I'd probably use
I'm not sure it's useful for a plugin to be self-recursive. I mean there is no need - you could build it yourself inside your own plugin... and it's sure a pain for every plugin to add a check just to avoid recursion. And I think calling highlight from within a plugin might be a pretty common pattern. On the other hand multiple plugins can nest within each other (which seems very useful)... so that would "just work". The only issue would be the order the plugins were registered, and I don't know how you avoid that.
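For the self-recursive case, the per-plugin check can be fairly small. A minimal sketch, assuming hooks run synchronously; `withReentryGuard` is a hypothetical helper, not a highlight.js API.

```javascript
// Sketch of a reentrancy guard: if a hook triggers highlighting that fires
// the same hook again, the nested call is skipped instead of recursing.
function withReentryGuard(hook) {
  let running = false;
  return function guardedHook(...args) {
    if (running) return undefined; // nested call: do nothing
    running = true;
    try {
      return hook.apply(this, args);
    } finally {
      running = false; // reset even if the hook throws
    }
  };
}
```

If the callback system wrapped every hook in something similar, individual plugins wouldn't need the check at all, which is the trade-off being weighed here.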
Ah, true. I forgot you're using the actual original highlight for the "base" content... another way to do it would be to call But you're right that you avoid the issue the way you're doing it. :-)
Closing this as an issue since (as mentioned earlier) this is not an actual issue with HLJS or the Markdown grammar. More than happy to continue the plugin discussion.
It's common to include YAML front matter at the top of a Markdown file, for example when using Jekyll. Currently, highlight.js parses the last line of YAML as a second-level heading because of the three dashes (`---`) below it. Although YAML front matter isn't actually part of any Markdown specification, would it be possible to add this to the Markdown definition? GitHub highlights this correctly:
Highlight.js highlights this incorrectly: