-
-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Enhance Parser to Support # %% with Additional Comments #127
Conversation
Ensure the parser can correctly recognize lines that start with # %%, even when followed by additional comments or markdown annotations. This change includes using regular expressions to match the lines. Refactor the parser to use regex for identifying cells.
Ensure the parser can correctly recognize lines that start with # %%, even when followed by additional comments or markdown annotations. This change includes importing the re module to use regular expressions for matching the lines. Import the re module and use regex for identifying cells.
Thanks for the PR!
|
Thank you for your response! 1. What problem are you trying to solve? What are markdown annotations?Markdown annotation was a miscommunication on my part. What I meant is that I want to enable the code cell separator 2. Why add comments after the line separator? Isn't this not even standard Jupytext format?While this is not a standard Jupytext format, users, including myself, often add comments after Here are a few examples from my work: # %% [Train Pretrained Yolov8 Model]
from ultralytics import YOLO
import os
model = YOLO("yolov8s-seg.pt") # load a pretrained model (recommended for training)
# Set training parameters
current_path = os.path.dirname(os.path.abspath("__file__"))
params = { ... }
model.train(**params)
# %% [Load Model]
from ultralytics import YOLO
model = YOLO("runs/segment/train/weights/best.pt")
# %% [Prediction]
results = model.predict( ... ) 3. This function is called very often. Changing it to a regular expression might cause a significant performance drop on some machines. Since Jupynium is different from Jupytext because it requires parsing real-time changes, I want to keep it lightweight. The changes you made can be replaced by str.startswith.Thank you for pointing that out. Using regular expressions can indeed affect performance. I have reverted the changes to the original version except for the 4. The equivalent Lua implementation needs to be changed (possibly in cells.lua).Since the |
Ensure the parser can correctly recognize lines that start with # %%, even when followed by additional comments. This change includes modifying the line separator check to use `line.startswith("# %%")` instead of `line.strip() == "# %%"`. Revert other changes and only modify the line separator check for code cell recognition and conversion. This update also removes the use of regular expressions to prevent potential performance drops.
Okay, I don't mind supporting this feature. However, I'd like to point out that your current implementation is broken.
|
Thank you for pointing out the issue. I understand that the current implementation mistakenly treats magic commands like ChangesPython (
|
This commit corrects the cell separator detection that was broken in the previous commit. The previous commit allowed lines starting with "# %%" followed by additional comments to be treated as cell separators. However, this broke the functionality for magic commands like "# %%timeit", which should not be treated as cell separators. The fix involves checking whether there is a space after "# %%" before treating it as a cell separator. This is done by changing the line separator check in Lua files to `vim.fn.trim(string.sub(line, 1, 5)) == "# %%"`. Additionally, it was discovered that `line.startswith("# %%")` in Python files also misidentified magic commands as cell separators when syncing to Jupyter. Therefore, this has been corrected to `line[:5].strip() == "# %%"`. This change ensures that magic commands are correctly recognized and not treated as cell separators.
Okay, thank you for your work! To finalise the PR, I think one last step remains. Add commentsSince the implementation is a little awkward and hard to understand, I'd like a comment linking to this PR and a brief description. # Treat '# %% Cell Title' as a code cell
# But not magic commands such as '# %%timeit'.
# See: https://github.com/kiyoon/jupynium.nvim/pull/127 To both lua and the python files. Add testsIn For example, # below test_buffer_1()
def test_buffer_with_cell_title_1():
buffer = JupyniumBuffer(["a", "b", "c", "# %% Cell Title", "d", "e", "f"])
assert buffer.num_rows_per_cell == [3, 4]
assert buffer.cell_types == ["header", "code"]
# below test_magic_command_1()
def test_double_magic_command_1():
"""
Everything else except magic commands should be preserved after __init__() or fully_analysed_buf().
"""
lines = ["a", "b", "c", "# %% Cell Title", "# %%timeit", "e", "f"]
buffer = JupyniumBuffer(lines)
code_cell_content = buffer.get_cell_text(1)
assert code_cell_content == "%%timeit\ne\nf" I think this should be sufficient. |
Thank you for your feedback. I have added the requested comments in # Treat '# %% Cell Title' as a code cell
# But not magic commands such as # '%%timeit'.
# See: https://github.com/kiyoon/jupynium.nvim/pull/127 Additionally, I have added the following unit tests in
I have run the tests with Please review the changes and let me know if there are any further adjustments needed. Thank you for your guidance and support. |
This commit adds comments in `cells.lua` and `buffer.py` to explain the new feature that treats lines starting with '# %%' followed by any text (except magic commands) as code cells. See: kiyoon#127
This commit adds unit tests in `test_buffer.py` for the new feature in `buffer.py` and `cells.lua` that treats lines starting with '# %%' followed by any text (except magic commands) as code cells. The tests, named `test_buffer_with_cell_title_1` and `test_double_magic_command_1`, ensure that the buffer correctly identifies and handles these cells. The tests were run with `pytest -m .` to ensure the changes work as expected and do not break existing functionality. See: kiyoon#127
* refactor(parser): support # %% with additional comments * fix(parser): import re for regex matching * refactor(parser): support # %% with additional comments * fix: correct cell separator detection for magic commands * docs: add comments for '# %% Cell Title' feature * test: add tests for '# %% Cell Title' in test_buffer.py
Thank you for this feature! |
Summary
This pull request enhances the parser to correctly recognize lines that start with
# %%
, even when followed by additional commentsor markdown annotations. This improvement is achieved by using regular expressions.Detailed Description
# %%
and may include additional comments or markdown annotations.Changes
re
module to facilitate regex matching.Motivation
The existing parser did not handle lines starting with
# %%
when followed by additional comments or markdown annotations. By using regular expressions, the parser can now accurately identify such lines, improving its robustness and flexibility.Testing
# %%
followed by additional comments or markdown annotations.Thank you for considering this pull request. I look forward to your feedback.