https://github.com/jackdewinter/pymarkdown/issues/945

jackdewinter · Jan 26, 2024 · a2e5e26
1 parent 5087500
commit a2e5e26

Showing 50 changed files with 948 additions and 111 deletions.
diff --git a/changelog.md b/changelog.md
@@ -2,6 +2,20 @@
 
 ## Unversioned - In Main, Not Released
 
+### Added
+
+- None
+
+### Fixed
+
+- None
+
+### Changed
+
+- None
+
+## Version 0.9.16 - Date: 2024-01-20
+
 This release is going to focus on getting the feature list complete
 for a version 1.0 release in early 2024.  To a large extent, this
 involves adding the "fix" feature for some rules, and double checking

diff --git a/docs/rules/rule_md014.md b/docs/rules/rule_md014.md
@@ -33,7 +33,7 @@ commands provided.
 ### Failure Scenarios
 
 This rule triggers if every line within a Code Block element begins with
-the `$` indicator, after any leading whitespace has been removed.
+the `$` indicator, after any leading space characters have been removed.
 
 ````Markdown
 ```shell

diff --git a/docs/rules/rule_md025.md b/docs/rules/rule_md025.md
@@ -103,9 +103,13 @@ to check against for multiples.
 | Value Name | Type | Default | Description |
 | -- | -- | -- | -- |
 | `enabled` | `boolean` | `True` | Whether the plugin rule is enabled. |
-| `front_matter_title` | `string` | `title` | Name of the front-matter field that has the title associated with the document. |
+| `front_matter_title` | `string` | `title` | Name of the front-matter field that has the title associated with the document.** |
 | `level` | `integer` | `1` | Heading level to be considered as the top-level. |
 
+** Any leading or trailing space characters are removed from the `front_matter_title`
+during processing.  This value is expected not to have the `:` at the end. Therefore,
+a header value of `subject:` would be entered as `subject`.
+
 ## Origination of Rule
 
 This rule is largely inspired by the MarkdownLint rule

diff --git a/docs/rules/rule_md033.md b/docs/rules/rule_md033.md
@@ -67,15 +67,17 @@ image tags than the default `!--` (HTML comment) are strongly discouraged.
 | Value Name | Type | Default | Description |
 | -- | -- | -- | -- |
 | `enabled` | `boolean` | `True` | Whether the plugin rule is enabled. |
-| `allowed_elements` | `string` | `!--,![CDATA[,!DOCTYPE` | Comma separated list of tag starts that are allowable. |
+| `allowed_elements` | `string` | `!--,![CDATA[,!DOCTYPE` | Comma separated list of tag starts that are allowable.** |
 | `allow_first_image_element` | `boolean` | `True` | Whether to allow an image HTML block. |
 
-To be clear, if using the `allowed_elements` configuration value, the supplied
-value is a comma separated list of allowable element sequences.  Those
-element names are derived by taking the start of the tag and skipping
-over the start character `<`.
-From that point, the parser collects the contents of the tag up to one of the
-following:
+** The comma-separated list of items is a string with a format of `{item},...,{item}`.
+Any leading or trailing space characters surrounding the `{item}` are trimmed during
+processing.  Empty `{item}` values after this trimming has been applied will generate
+a configuration error.
+
+The element names in the list are derived by taking the start of the tag and skipping
+over the start character `<`.  From that point, the parser collects the contents
+of the tag up to one of the following:
 
 - the first whitespace character
 - the close HTML tag character (`/`)

diff --git a/docs/rules/rule_md035.md b/docs/rules/rule_md035.md
@@ -75,8 +75,10 @@ is made, so that the following example will not trigger this rule:
 | `enabled` | `boolean` | `True` | Whether the plugin rule is enabled. |
 | `style` | `string` | `consistent` | `consistent` for consistent, or a specific marker** |
 
-** If a specific marker is configured, it must be valid multiples (three or more) of either the
-`-` character, the `_` character, or the `*` character, with optional whitespace between them.
+** If a specific marker is configured, it must be valid multiples (three or more)
+of either the `-` character, the `_` character, or the `*` character, with optional
+whitespace between them. The specific marker cannot start or end with a space
+character.
 
 ## Origination of Rule
 

diff --git a/docs/rules/rule_md037.md b/docs/rules/rule_md037.md
@@ -31,7 +31,7 @@ such as `***` for combining an italics emphasis with a bold emphasis.
 ### Failure Scenarios
 
 This rule triggers if a pair of matching emphasis characters occur
-within the same paragraph with space around either of the emphasis
+within the same paragraph with unicode whitespace around either of the emphasis
 characters.
 
 ```Markdown

diff --git a/docs/rules/rule_md041.md b/docs/rules/rule_md041.md
@@ -103,7 +103,11 @@ document will not trigger this rule:
 | -- | -- | -- | -- |
 | `enabled` | `boolean` | `True` | Whether the plugin rule is enabled. |
 | `level` | `integer` | `1` | Level that is expected from the first heading (Atx or SetExt) in the document. |
-| `front_matter_title` | `string` | `title` | Name of the front-matter field that has the title associated with the document. |
+| `front_matter_title` | `string` | `title` | Name of the front-matter field that has the title associated with the document.** |
+
+** Any leading or trailing space characters are removed from the `front_matter_title`
+during processing.  This value is expected not to have the `:` at the end. Therefore,
+a header value of `subject:` would be entered as `subject`.
 
 ## Origination of Rule
 

diff --git a/docs/rules/rule_md043.md b/docs/rules/rule_md043.md
@@ -100,7 +100,12 @@ sequence is not followed by anything; it cannot be followed by any headings.
 | Value Name | Type | Default | Description |
 | -- | -- | -- | -- |
 | `enabled` | `boolean` | `True` | Whether the plugin rule is enabled. |
-| `required_headings` | `string` | `""` | Comma separated list of headings to require the document to have. |
+| `required_headings` | `string` | `""` | Comma separated list of headings to require the document to have.** |
+
+** The comma-separated list of items is a string with a format of `{item},...,{item}`.
+Any leading or trailing space characters surrounding the `{item}` are trimmed during
+processing.  Empty `{item}` values after this trimming has been applied will generate
+a configuration error.
 
 For the `required_headings` list, each element is expected to be in one
 of two forms.  The first form is that of a uncomplicated text Atx Heading, such as

diff --git a/docs/rules/rule_md044.md b/docs/rules/rule_md044.md
@@ -87,9 +87,14 @@ this is a reparagraph
 | Value Name | Type | Default | Description |
 | -- | -- | -- | -- |
 | `enabled` | `boolean` | `True` | Whether the plugin rule is enabled. |
-| `names`   | `string` | None | Comma-separated list of proper nouns to preserve capitalization on. |
+| `names`   | `string` | None | Comma-separated list of proper nouns to preserve capitalization on.** |
 | `code_blocks` | `boolean` | `True` | Search in Fenced Code Block elements and Indented Code Block elements. |
 
+** The comma-separated list of items is a string with a format of `{item},...,{item}`.
+Any leading or trailing space characters surrounding the `{item}` are trimmed during
+processing.  Empty `{item}` values after this trimming has been applied will generate
+a configuration error.
+
 ## Origination of Rule
 
 This rule is largely inspired by the MarkdownLint rule

diff --git a/docs/rules/rule_md045.md b/docs/rules/rule_md045.md
@@ -28,7 +28,9 @@ sight impaired people.
 ### Failure Scenarios
 
 This rule triggers when the link label for an image has no characters or only
-whitespace characters:
+whitespace characters.  As the focus of this rule is to provide text to help
+identify the image, the whitespace characters compared against are the set
+of Unicode whitespace characters.
 
 ````Markdown
 [](/url)

diff --git a/publish/coverage.json b/publish/coverage.json
@@ -2,12 +2,12 @@
     "projectName": "pymarkdown",
     "reportSource": "pytest",
     "branchLevel": {
-        "totalMeasured": 4783,
-        "totalCovered": 4783
+        "totalMeasured": 4785,
+        "totalCovered": 4785
     },
     "lineLevel": {
-        "totalMeasured": 19312,
-        "totalCovered": 19312
+        "totalMeasured": 19327,
+        "totalCovered": 19327
     }
 }
 
diff --git a/publish/test-results.json b/publish/test-results.json
@@ -236,23 +236,23 @@
         },
         {
             "name": "test.extensions.test_markdown_front_matter",
-            "totalTests": 29,
+            "totalTests": 33,
             "failedTests": 0,
             "errorTests": 0,
             "skippedTests": 0,
             "elapsedTimeInMilliseconds": 0
         },
         {
             "name": "test.extensions.test_markdown_pragma_parsing",
-            "totalTests": 12,
+            "totalTests": 14,
             "failedTests": 0,
             "errorTests": 0,
             "skippedTests": 0,
             "elapsedTimeInMilliseconds": 0
         },
         {
             "name": "test.extensions.test_markdown_pragmas",
-            "totalTests": 31,
+            "totalTests": 32,
             "failedTests": 0,
             "errorTests": 0,
             "skippedTests": 0,
@@ -332,7 +332,7 @@
         },
         {
             "name": "test.gfm.test_markdown_code_spans",
-            "totalTests": 37,
+            "totalTests": 38,
             "failedTests": 0,
             "errorTests": 0,
             "skippedTests": 0,
@@ -532,7 +532,7 @@
         },
         {
             "name": "test.gfm.test_markdown_list_blocks",
-            "totalTests": 133,
+            "totalTests": 135,
             "failedTests": 0,
             "errorTests": 0,
             "skippedTests": 4,
@@ -572,7 +572,7 @@
         },
         {
             "name": "test.gfm.test_markdown_reference_links",
-            "totalTests": 97,
+            "totalTests": 104,
             "failedTests": 0,
             "errorTests": 0,
             "skippedTests": 0,
@@ -1364,7 +1364,7 @@
         },
         {
             "name": "test.rules.test_md033",
-            "totalTests": 17,
+            "totalTests": 18,
             "failedTests": 0,
             "errorTests": 0,
             "skippedTests": 0,
@@ -1444,7 +1444,7 @@
         },
         {
             "name": "test.rules.test_md043",
-            "totalTests": 30,
+            "totalTests": 32,
             "failedTests": 0,
             "errorTests": 0,
             "skippedTests": 0,
@@ -1460,7 +1460,7 @@
         },
         {
             "name": "test.rules.test_md045",
-            "totalTests": 6,
+            "totalTests": 7,
             "failedTests": 0,
             "errorTests": 0,
             "skippedTests": 0,
@@ -1596,7 +1596,7 @@
         },
         {
             "name": "test.test_markdown_extra",
-            "totalTests": 110,
+            "totalTests": 114,
             "failedTests": 0,
             "errorTests": 0,
             "skippedTests": 0,

diff --git a/pymarkdown/block_quotes/block_quote_non_fenced_helper.py b/pymarkdown/block_quotes/block_quote_non_fenced_helper.py
@@ -6,6 +6,7 @@
 
 from pymarkdown.block_quotes.block_quote_count_helper import BlockQuoteCountHelper
 from pymarkdown.block_quotes.block_quote_data import BlockQuoteData
+from pymarkdown.general.constants import Constants
 from pymarkdown.general.parser_helper import ParserHelper
 from pymarkdown.general.parser_logger import ParserLogger
 from pymarkdown.general.parser_state import ParserState
@@ -195,7 +196,7 @@ def __handle_non_fenced_code_section_no_requeue(
         )
         POGGER.debug("text_removed_by_container=[$]", removed_text)
         POGGER.debug("removed_text=[$]", removed_text)
-        if line_to_parse.strip():
+        if line_to_parse.strip(Constants.ascii_whitespace):
             return (
                 line_to_parse,
                 start_index,
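
Note on the `strip()` changes in this commit: Python's argument-less `str.strip()` removes every character that Python treats as whitespace, including non-ASCII ones such as the no-break space, whereas passing an explicit character set restricts the trim to that set. The sketch below illustrates the difference. The value of `Constants.ascii_whitespace` is not shown in this diff, so `ASCII_WHITESPACE` here is a stand-in built from the standard library's `string.whitespace`, and `is_blank_ascii` is a hypothetical helper, not part of pymarkdown.

```python
import string

# Assumption: a stand-in for Constants.ascii_whitespace, whose real value
# is not visible in this diff. string.whitespace is ' \t\n\r\x0b\x0c'.
ASCII_WHITESPACE = string.whitespace


def is_blank_ascii(line: str) -> bool:
    """Hypothetical helper: True if the line holds only ASCII whitespace."""
    return not line.strip(ASCII_WHITESPACE)


nbsp_line = "\u00a0"  # a single no-break space

# Argument-less strip() treats the no-break space as whitespace...
print(repr(nbsp_line.strip()))
# ...while the explicit ASCII set leaves it in place.
print(repr(nbsp_line.strip(ASCII_WHITESPACE)))
print(is_blank_ascii(" \t"), is_blank_ascii(nbsp_line))
```

Under this assumption, a line containing only a no-break space is no longer considered blank once the explicit set is passed.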

diff --git a/pymarkdown/block_quotes/block_quote_processor.py b/pymarkdown/block_quotes/block_quote_processor.py
@@ -10,6 +10,7 @@
     BlockQuoteNonFencedHelper,
 )
 from pymarkdown.container_blocks.container_grab_bag import ContainerGrabBag
+from pymarkdown.general.constants import Constants
 from pymarkdown.general.parser_logger import ParserLogger
 from pymarkdown.general.parser_state import ParserState
 from pymarkdown.general.position_marker import PositionMarker
@@ -326,10 +327,9 @@ def __handle_block_quote_block_kludges(
                 POGGER.debug(
                     "token_stack[x]>$", parser_state.token_stack[adjusted_current_count]
                 )
-                if (
-                    parser_state.token_stack[adjusted_current_count].is_list
-                    and adjusted_text_to_parse.strip()
-                ):
+                if parser_state.token_stack[
+                    adjusted_current_count
+                ].is_list and adjusted_text_to_parse.strip(Constants.ascii_whitespace):
                     POGGER.debug("\n\nBOOM\n\n")
                     parser_state.nested_list_start = cast(
                         ListStackToken, parser_state.token_stack[adjusted_current_count]

diff --git a/pymarkdown/container_blocks/container_block_nested_processor.py b/pymarkdown/container_blocks/container_block_nested_processor.py
@@ -10,6 +10,7 @@
 from pymarkdown.block_quotes.block_quote_data import BlockQuoteData
 from pymarkdown.container_blocks.container_grab_bag import ContainerGrabBag
 from pymarkdown.container_blocks.container_indices import ContainerIndices
+from pymarkdown.general.constants import Constants
 from pymarkdown.general.parser_helper import ParserHelper
 from pymarkdown.general.parser_logger import ParserLogger
 from pymarkdown.general.parser_state import ParserState
@@ -428,7 +429,9 @@ def __check_for_nested_list_start(
             POGGER.debug(
                 "parser_state.token_document>>$<<", parser_state.token_document
             )
-            if parser_state.nested_list_start and grab_bag.adj_line_to_parse.strip():
+            if parser_state.nested_list_start and grab_bag.adj_line_to_parse.strip(
+                Constants.ascii_whitespace
+            ):
                 (
                     grab_bag.start_index,
                     indent_level,

diff --git a/pymarkdown/container_blocks/container_block_non_leaf_processor.py b/pymarkdown/container_blocks/container_block_non_leaf_processor.py
@@ -215,7 +215,9 @@ def __handle_trailing_indent_with_block_quote(
             if inner_token.is_block_quote_start:
                 block_quote_token = cast(BlockQuoteMarkdownToken, inner_token)
                 assert block_quote_token.bleading_spaces is not None
-                split_spaces = block_quote_token.bleading_spaces.split("\n")
+                split_spaces = block_quote_token.bleading_spaces.split(
+                    ParserHelper.newline_character
+                )
                 grab_bag.indent_already_processed = len(split_spaces[-1])
             else:
                 assert inner_token.is_list_start

diff --git a/pymarkdown/extensions/disallowed_raw_html.py b/pymarkdown/extensions/disallowed_raw_html.py
@@ -85,7 +85,7 @@ def apply_configuration(
         if modify_tag_names is not None:
             tag_config_name = f"extensions.{self.get_identifier()}.change_tag_names"
             for next_tag_part in modify_tag_names.split(","):
-                next_tag_part = next_tag_part.strip()
+                next_tag_part = next_tag_part.strip(" ")
                 if not next_tag_part:
                     raise ValueError(
                         f"Configuration item '{tag_config_name}' contains at least one empty string."
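
The loop above follows the same `{item},...,{item}` convention that the rule documentation in this commit describes: split on commas, trim space characters only, and reject empty items. A minimal standalone sketch of that behavior follows. The function name `split_config_list` and its parameters are hypothetical and do not appear in pymarkdown; only the `strip(" ")` call and the error message are taken from the hunk.

```python
from typing import List


def split_config_list(value: str, config_name: str) -> List[str]:
    """Hypothetical helper: parse a '{item},...,{item}' configuration string."""
    items: List[str] = []
    for next_part in value.split(","):
        # Trim only the space character, mirroring strip(" ") in the hunk;
        # tabs and other whitespace are deliberately left untouched.
        next_part = next_part.strip(" ")
        if not next_part:
            raise ValueError(
                f"Configuration item '{config_name}' contains at least one empty string."
            )
        items.append(next_part)
    return items


print(split_config_list(" a , b ", "example.names"))
```

With this approach, a value such as `a, ,b` raises `ValueError` because the middle item is empty after trimming, while an item consisting of a tab character is kept as-is.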

diff --git a/pymarkdown/extensions/front_matter_extension.py b/pymarkdown/extensions/front_matter_extension.py
@@ -14,6 +14,7 @@
 )
 from pymarkdown.extension_manager.parser_extension import ParserExtension
 from pymarkdown.extensions.front_matter_markdown_token import FrontMatterMarkdownToken
+from pymarkdown.general.constants import Constants
 from pymarkdown.general.parser_logger import ParserLogger
 from pymarkdown.general.position_marker import PositionMarker
 from pymarkdown.general.source_providers import SourceProvider
@@ -77,7 +78,7 @@ def process_header_if_present(
         Take care of processing eligibility and processing for front matter support.
         """
         start_char, extracted_index = ThematicLeafBlockProcessor.is_thematic_break(
-            first_line_in_document.rstrip(),
+            first_line_in_document.rstrip(Constants.ascii_whitespace),
             0,
             "",
             whitespace_allowed_between_characters=False,
@@ -111,23 +112,25 @@ def __handle_document_front_matter(
         Optional[str], Optional[FrontMatterMarkdownToken], int, Optional[List[str]]
     ]:
         starting_line = token_to_use
-        clean_starting_line = starting_line.rstrip()
+        clean_starting_line = starting_line.rstrip(Constants.ascii_whitespace)
         repeat_again = True
         have_closing = False
         collected_lines: List[str] = []
         POGGER.info("Metadata prefix detected, scanning for metadata header.")
         next_line = None
         while repeat_again:
             next_line = source_provider.get_next_line()
-            if next_line and next_line.rstrip():
+            if next_line and next_line.rstrip(Constants.ascii_whitespace):
                 start_char, _ = ThematicLeafBlockProcessor.is_thematic_break(
-                    next_line.rstrip(),
+                    next_line.rstrip(Constants.ascii_whitespace),
                     0,
                     "",
                     whitespace_allowed_between_characters=False,
                 )
-                have_closing = (
-                    bool(start_char) and clean_starting_line == next_line.rstrip()
+                have_closing = bool(
+                    start_char
+                ) and clean_starting_line == next_line.rstrip(
+                    Constants.ascii_whitespace
                 )
                 repeat_again = not have_closing
             elif not self.__allow_blank_lines: