feat: Enhance Parser to Support # %% with Additional Comments #127

ESSO0428 · 2024-07-12T17:06:30Z

Summary

This pull request enhances the parser to correctly recognize lines that start with # %%, even when followed by additional comments ~~or markdown annotations~~. This improvement is achieved by using regular expressions.

Detailed Description

Enhancement: Refactored the parser to use regex for identifying cells that start with # %% and may include additional comments or markdown annotations.

Changes

Added regex-based identification of cells to the parser.
Imported the re module to facilitate regex matching.

Motivation

The existing parser did not handle lines starting with # %% when followed by additional comments or markdown annotations. By using regular expressions, the parser can now accurately identify such lines, improving its robustness and flexibility.

Testing

Verified that the parser correctly identifies cells starting with # %% followed by additional comments or markdown annotations.
Ensured no existing functionality is broken by these changes.

Thank you for considering this pull request. I look forward to your feedback.

Ensure the parser can correctly recognize lines that start with # %%, even when followed by additional comments or markdown annotations. This change includes using regular expressions to match the lines. Refactor the parser to use regex for identifying cells.

Ensure the parser can correctly recognize lines that start with # %%, even when followed by additional comments or markdown annotations. This change includes importing the re module to use regular expressions for matching the lines. Import the re module and use regex for identifying cells.

kiyoon · 2024-07-13T02:53:27Z

Thanks for the PR!

What is the problem you actually want to solve? What are markdown annotations? According to the jupytext document, something like key="value"? I've never seen this being used so far so please let me know how this is useful.
I don't really see why you would add a comment to the line separator. It's not even a standard Jupytext format, right?
That function is called very often. Changing it to regex will likely cause a huge performance drop in some machines. Since Jupynium is different from Jupytext because it requires parsing in real-time changes, I want to keep it lightweight. The changes you made, can be replaced by str.startswith.
There is an equivalent lua implementation that you need to change (maybe in cells.lua).

ESSO0428 · 2024-07-13T07:24:21Z

Thanks for the PR!

What is the problem you actually want to solve? What are markdown annotations? According to the jupytext document, something like key="value"? I've never seen this being used so far so please let me know how this is useful.

I don't really see why you would add a comment to the line separator. It's not even a standard Jupytext format, right?

That function is called very often. Changing it to regex will likely cause a huge performance drop in some machines. Since Jupynium is different from Jupytext because it requires parsing in real-time changes, I want to keep it lightweight. The changes you made, can be replaced by str.startswith.

There is an equivalent lua implementation that you need to change (maybe in cells.lua).

Thank you for your response!

1. What problem are you trying to solve? What are markdown annotations?

Markdown annotation was a miscommunication on my part. What I meant is that # %% should allow adding comments.

I want to enable the code cell separator # %% to have comments following it, similar to the behavior in VS Code. This way, comments can describe the function of the code block, and when the code block is sent to the Python Interactive window, it is still recognized as a code cell. This helps users maintain Python scripts more clearly by understanding the main function of each code cell.

2. Why add comments after the line separator? Isn't this not even standard Jupytext format?

While this is not a standard Jupytext format, users, including myself, often add comments after # %% to improve readability and organization of the code. This helps when reviewing the original Python script to understand each code block better, making it easier to convert the developed features into formal Python scripts.

Here are a few examples from my work:

# %% [Train Pretrained Yolov8 Model]
from ultralytics import YOLO
import os

model = YOLO("yolov8s-seg.pt")  # load a pretrained model (recommended for training)

# Set training parameters
current_path = os.path.dirname(os.path.abspath("__file__"))
params = { ... }
model.train(**params)

# %% [Load Model]
from ultralytics import YOLO

model = YOLO("runs/segment/train/weights/best.pt")

# %% [Prediction]
results = model.predict( ... )

3. This function is called very often. Changing it to a regular expression might cause a significant performance drop on some machines. Since Jupynium is different from Jupytext because it requires parsing real-time changes, I want to keep it lightweight. The changes you made can be replaced by str.startswith.

Thank you for pointing that out. Using regular expressions can indeed affect performance. I have reverted the changes to the original version except for the line.strip() == "# %%" part, which I have replaced with line.startswith("# %%"), discarding the use of regular expressions.

4. The equivalent Lua implementation needs to be changed (possibly in cells.lua).

Since the # %% block conversion is written in buffer.py, implementing this through cells.lua might be challenging. However, I will further investigate whether it is possible to achieve this functionality by modifying cells.lua.

Ensure the parser can correctly recognize lines that start with # %%, even when followed by additional comments. This change includes modifying the line separator check to use `line.startswith("# %%")` instead of `line.strip() == "# %%"`. Revert other changes and only modify the line separator check for code cell recognition and conversion. This update also removes the use of regular expressions to prevent potential performance drops.

kiyoon · 2024-07-13T11:16:48Z

Okay, I don't mind supporting this feature.

However, I'd like to point out that your current implementation is broken.

# %%timeit is a magic command, and should not be treated as a cell separator. You need to check whether there is a space between the comment and the percent.
The line_type in cells.lua also needs a change accordingly.

https://github.com/kiyoon/jupynium.nvim/blob/1e572965813719c9ef123f777d79814e91d1a2cf/lua/jupynium/cells.lua#L6C12-L23

ESSO0428 · 2024-07-13T14:45:51Z

Okay, I don't mind supporting this feature.

However, I'd like to point out that your current implementation is broken.

# %%timeit is a magic command, and should not be treated as a cell separator. You need to check whether there is a space between the comment and the percent.

The line_type in cells.lua also needs a change accordingly.

https://github.com/kiyoon/jupynium.nvim/blob/1e572965813719c9ef123f777d79814e91d1a2cf/lua/jupynium/cells.lua#L6C12-L23

Thank you for pointing out the issue.

I understand that the current implementation mistakenly treats magic commands like # %%timeit as cell separators. To fix this, I have made the following changes:

Changes

Python (`buffer.py`)

I have modified the cell separator check from:

elif line.startswith("# %%"):

to

elif line[:5].strip() == "# %%":

This ensures that only lines starting with # %% followed by a space or no additional string are treated as cell separators when syncing to Jupyter.

Lua (`cells.lua`)

I have modified the line type detection from:

elseif vim.fn.trim(line) == "# %%" then

to

elseif vim.fn.trim(string.sub(line, 1, 5)) == "# %%" then

This ensures that only lines starting with # %% followed by a space or no additional string are treated as cell separators, avoiding the misidentification of magic commands when syncing to Jupyter.

These changes should correctly distinguish between cell separators and magic commands, maintaining the functionality while preventing performance issues by not using regular expressions.

Thank you for your guidance and support for this feature.

This commit corrects the cell separator detection that was broken in the previous commit. The previous commit allowed lines starting with "# %%" followed by additional comments to be treated as cell separators. However, this broke the functionality for magic commands like "# %%timeit", which should not be treated as cell separators. The fix involves checking whether there is a space after "# %%" before treating it as a cell separator. This is done by changing the line separator check in Lua files to `vim.fn.trim(string.sub(line, 1, 5)) == "# %%"`. Additionally, it was discovered that `line.startswith("# %%")` in Python files also misidentified magic commands as cell separators when syncing to Jupyter. Therefore, this has been corrected to `line[:5].strip() == "# %%"`. This change ensures that magic commands are correctly recognized and not treated as cell separators.

kiyoon · 2024-07-15T02:27:44Z

Okay, thank you for your work!

To finalise the PR, I think one last step remains.

Add comments

Since the implementation is a little awkward and hard to understand, I'd like a comment linking to this PR and a brief description.

# Treat '# %% Cell Title' as a code cell
# But not magic commands such as '# %%timeit'.
# See: https://github.com/kiyoon/jupynium.nvim/pull/127

To both lua and the python files.

Add tests

In tests/test_buffer.py, I'd like you to add a new test. You can copy an existing test, but do not modify the existing ones.

For example,

# below test_buffer_1()
def test_buffer_with_cell_title_1():
    buffer = JupyniumBuffer(["a", "b", "c", "# %% Cell Title", "d", "e", "f"])
    assert buffer.num_rows_per_cell == [3, 4]
    assert buffer.cell_types == ["header", "code"]

# below test_magic_command_1()
def test_double_magic_command_1():
    """
    Everything else except magic commands should be preserved after __init__() or fully_analysed_buf().
    """
    lines = ["a", "b", "c", "# %% Cell Title", "# %%timeit", "e", "f"]
    buffer = JupyniumBuffer(lines)
    code_cell_content = buffer.get_cell_text(1)
    assert code_cell_content == "%%timeit\ne\nf"

I think this should be sufficient.

ESSO0428 · 2024-07-15T16:53:41Z

Okay, thank you for your work!

To finalise the PR, I think one last step remains.

Add comments

Since the implementation is a little awkward and hard to understand, I'd like a comment linking to this PR and a brief description.
# Treat '# %% Cell Title' as a code cell
# But not magic commands such as '# %%timeit'.
# See: https://github.com/kiyoon/jupynium.nvim/pull/127
To both lua and the python files.

Add tests

In tests/test_buffer.py, I'd like you to add a new test. You can copy an existing test, but do not modify the existing ones.

For example,
# below test_buffer_1()
def test_buffer_with_cell_title_1():
    buffer = JupyniumBuffer(["a", "b", "c", "# %% Cell Title", "d", "e", "f"])
    assert buffer.num_rows_per_cell == [3, 4]
    assert buffer.cell_types == ["header", "code"]

# below test_magic_command_1()
def test_double_magic_command_1():
    """
    Everything else except magic commands should be preserved after __init__() or fully_analysed_buf().
    """
    lines = ["a", "b", "c", "# %% Cell Title", "# %%timeit", "e", "f"]
    buffer = JupyniumBuffer(lines)
    code_cell_content = buffer.get_cell_text(1)
    assert code_cell_content == "%%timeit\ne\nf"
I think this should be sufficient.

Thank you for your feedback.

I have added the requested comments in cells.lua and buffer.py to explain the new
feature that treats lines starting with # %% followed by any text (except magic commands) as code cells. This is done by ensuring there is a space after # %% to avoid misinterpreting magic commands.

# Treat '# %% Cell Title' as a code cell
# But not magic commands such as # '%%timeit'.
# See: https://github.com/kiyoon/jupynium.nvim/pull/127

Additionally, I have added the following unit tests in tests/test_buffer.py:

test_buffer_with_cell_title_1 : This test ensures that the buffer correctly handles lines starting with # %% Cell Title as code cells.
test_double_magic_command_1 : This test ensures that magic commands like # %%timeit are preserved correctly and not treated as cell separators.

I have run the tests with pytest -m . to ensure the changes work as expected and do not break existing functionality.

Please review the changes and let me know if there are any further adjustments needed.

Thank you for your guidance and support.

This commit adds comments in `cells.lua` and `buffer.py` to explain the new feature that treats lines starting with '# %%' followed by any text (except magic commands) as code cells. See: kiyoon#127

This commit adds unit tests in `test_buffer.py` for the new feature in `buffer.py` and `cells.lua` that treats lines starting with '# %%' followed by any text (except magic commands) as code cells. The tests, named `test_buffer_with_cell_title_1` and `test_double_magic_command_1`, ensure that the buffer correctly identifies and handles these cells. The tests were run with `pytest -m .` to ensure the changes work as expected and do not break existing functionality. See: kiyoon#127

* refactor(parser): support # %% with additional comments * fix(parser): import re for regex matching * refactor(parser): support # %% with additional comments * fix: correct cell separator detection for magic commands * docs: add comments for '# %% Cell Title' feature * test: add tests for '# %% Cell Title' in test_buffer.py

kiyoon · 2024-07-16T00:19:07Z

Thank you for this feature!

ESSO0428 added 2 commits July 12, 2024 19:19

ESSO0428 added 2 commits July 16, 2024 00:58

docs: add comments for '# %% Cell Title' feature

4a377b0

This commit adds comments in `cells.lua` and `buffer.py` to explain the new feature that treats lines starting with '# %%' followed by any text (except magic commands) as code cells. See: kiyoon#127

kiyoon merged commit d93164f into kiyoon:master Jul 16, 2024
10 checks passed

kiyoon changed the title ~~Enhance Parser to Support # %% with Additional Comments~~ feat: Enhance Parser to Support # %% with Additional Comments Jul 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Enhance Parser to Support # %% with Additional Comments #127

feat: Enhance Parser to Support # %% with Additional Comments #127

ESSO0428 commented Jul 12, 2024 •

edited

Loading

kiyoon commented Jul 13, 2024

ESSO0428 commented Jul 13, 2024 •

edited

Loading

kiyoon commented Jul 13, 2024 •

edited

Loading

ESSO0428 commented Jul 13, 2024 •

edited

Loading

kiyoon commented Jul 15, 2024

ESSO0428 commented Jul 15, 2024 •

edited

Loading

Add comments

Add tests

kiyoon commented Jul 16, 2024

feat: Enhance Parser to Support # %% with Additional Comments #127

feat: Enhance Parser to Support # %% with Additional Comments #127

Conversation

ESSO0428 commented Jul 12, 2024 • edited Loading

Summary

Detailed Description

Changes

Motivation

Testing

kiyoon commented Jul 13, 2024

ESSO0428 commented Jul 13, 2024 • edited Loading

1. What problem are you trying to solve? What are markdown annotations?

2. Why add comments after the line separator? Isn't this not even standard Jupytext format?

4. The equivalent Lua implementation needs to be changed (possibly in cells.lua).

kiyoon commented Jul 13, 2024 • edited Loading

ESSO0428 commented Jul 13, 2024 • edited Loading

Changes

Python (buffer.py)

Lua (cells.lua)

kiyoon commented Jul 15, 2024

Add comments

Add tests

ESSO0428 commented Jul 15, 2024 • edited Loading

Add comments

Add tests

kiyoon commented Jul 16, 2024

ESSO0428 commented Jul 12, 2024 •

edited

Loading

ESSO0428 commented Jul 13, 2024 •

edited

Loading

kiyoon commented Jul 13, 2024 •

edited

Loading

ESSO0428 commented Jul 13, 2024 •

edited

Loading

Python (`buffer.py`)

Lua (`cells.lua`)

ESSO0428 commented Jul 15, 2024 •

edited

Loading