
Commit a215a97

pawamoy, oprypin, and tvdboom authored
feat: Add Markdown anchors and aliases
Replaces-PR-#20: #20
Related-to-issue-#25: #25
Related-to-issue-#35: #35
Co-authored-by: Oleh Prypin <oleh@pryp.in>
Co-authored-by: tvdboom <m.524687@gmail.com>
1 parent 0c1781d commit a215a97

5 files changed: +291 -6 lines changed


README.md

Lines changed: 94 additions & 1 deletion
@@ -49,4 +49,97 @@ This works the same as [a normal link to that heading](../doc1.md#hello-world).

 Linking to a heading without needing to know the destination page can be useful if specifying that path is cumbersome, e.g. when the pages have deeply nested paths, are far apart, or are moved around frequently. And the issue is somewhat exacerbated by the fact that [MkDocs supports only *relative* links between pages](https://github.com/mkdocs/mkdocs/issues/1592).

-Note that this plugin's behavior is undefined when trying to link to a heading title that appears several times throughout the site. Currently it arbitrarily chooses one of the pages.
+Note that this plugin's behavior is undefined when trying to link to a heading title that appears several times throughout the site. Currently it arbitrarily chooses one of the pages. In such cases, use [Markdown anchors](#markdown-anchors) to add unique aliases to your headings.
+
+### Markdown anchors
+
+The autorefs plugin offers a feature called "Markdown anchors". Such anchors can be added anywhere in a document, and linked to from any other place.
+
+The syntax is:
+
+```md
+[](){#id-of-the-anchor}
+```
+
+If you look closely, it starts with the usual syntax for a link, `[]()`, except both the text value and URL of the link are empty. Then we see `{#id-of-the-anchor}`, which is the syntax supported by the [`attr_list`](https://python-markdown.github.io/extensions/attr_list/) extension. It sets an HTML id to the anchor element. The autorefs plugin simply gives a meaning to such anchors with ids. Note that raw HTML anchors like `<a id="foo"></a>` are not supported.
+
+The `attr_list` extension must be enabled for the Markdown anchors feature to work:
+
+```yaml
+# mkdocs.yml
+plugins:
+  - search
+  - autorefs
+
+markdown_extensions:
+  - attr_list
+```
+
+Now, you can add anchors to documents:
+
+```md
+Somewhere in a document.
+
+[](){#foobar-paragraph}
+
+Paragraph about foobar.
+```
+
+...making it possible to link to this anchor with our automatic links:
+
+```md
+In any document.
+
+Check out the [paragraph about foobar][foobar-paragraph].
+```
+
+If you add a Markdown anchor right above a heading, this anchor will redirect to the heading itself:
+
+```md
+[](){#foobar}
+## A verbose title about foobar
+```
+
+Linking to the `foobar` anchor will bring you directly to the heading, not the anchor itself, so the URL will show `#a-verbose-title-about-foobar` instead of `#foobar`. These anchors therefore act as "aliases" for headings. It is possible to define multiple aliases per heading:
+
+```md
+[](){#contributing}
+[](){#development-setup}
+## How to contribute to the project?
+```
+
+Such aliases are especially useful when the same headings appear in several different pages. Without aliases, linking to the heading is undefined behavior (it could lead to any one of the headings). With unique aliases above headings, you can make sure to link to the right heading.
+
+For example, consider the following setup. You have one document per operating system describing how to install a project with the OS package manager or from sources:
+
+```tree
+docs/
+  install/
+    arch.md
+    debian.md
+    gentoo.md
+```
+
+Each page has:
+
+```md
+## Install with package manager
+...
+
+## Install from sources
+...
+```
+
+You don't want to change headings and make them redundant, like `## Arch: Install with package manager` and `## Debian: Install with package manager` just to be able to reference the right one with autorefs. Instead you can do this:
+
+```md
+[](){#arch-install-pkg}
+## Install with package manager
+...
+
+[](){#arch-install-src}
+## Install from sources
+...
+```
+
+...replacing `arch` with `debian`, `gentoo`, etc. in the other pages.
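
The collision that the README describes, and how aliases avoid it, can be sketched in a few lines of plain Python. This is a simplified stand-in and not the plugin itself: `url_map` and the module-level `register_anchor` are hypothetical names that mirror only the one-line mapping rule from the `plugin.py` change in this commit, and the heading slug and page URLs are assumed for illustration.

```python
from typing import Optional

# Simplified stand-in for the plugin's identifier -> URL map (assumption, not the real class).
url_map: dict = {}


def register_anchor(page: str, identifier: str, anchor: Optional[str] = None) -> None:
    """Map an identifier to a URL, pointing at `anchor` when one is given."""
    url_map[identifier] = f"{page}#{anchor or identifier}"


# Assumed slug for the heading "## Install with package manager".
heading_id = "install-with-package-manager"

for distro in ("arch", "debian", "gentoo"):
    page = f"install/{distro}/"
    # Every page registers the same heading id, so each page overwrites the previous one.
    register_anchor(page, heading_id)
    # The alias is unique per page and resolves to that page's heading.
    register_anchor(page, f"{distro}-install-pkg", heading_id)

print(url_map[heading_id])            # install/gentoo/#install-with-package-manager
print(url_map["debian-install-pkg"])  # install/debian/#install-with-package-manager
```

In this sketch the shared heading id ends up pointing at whichever page was registered last, while each alias keeps pointing at its own page.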

mkdocs.yml

Lines changed: 2 additions & 0 deletions
@@ -95,6 +95,7 @@ markdown_extensions:
     permalink: "¤"

 plugins:
+- autorefs
 - search
 - markdown-exec
 - gen-files:
@@ -109,6 +110,7 @@ plugins:
         import:
         - https://docs.python.org/3/objects.inv
         - https://www.mkdocs.org/objects.inv
+        - https://python-markdown.github.io/objects.inv
         paths: [src]
         options:
           docstring_options:

src/mkdocs_autorefs/plugin.py

Lines changed: 7 additions & 4 deletions
@@ -18,7 +18,9 @@
 from typing import TYPE_CHECKING, Any, Callable, Sequence
 from urllib.parse import urlsplit

+from mkdocs.config.defaults import MkDocsConfig
 from mkdocs.plugins import BasePlugin
+from mkdocs.structure.pages import Page

 from mkdocs_autorefs.references import AutorefsExtension, fix_refs, relative_url

@@ -59,14 +61,14 @@ def __init__(self) -> None:
         self._abs_url_map: dict[str, str] = {}
         self.get_fallback_anchor: Callable[[str], tuple[str, ...]] | None = None

-    def register_anchor(self, page: str, identifier: str) -> None:
+    def register_anchor(self, page: str, identifier: str, anchor: str | None = None) -> None:
         """Register that an anchor corresponding to an identifier was encountered when rendering the page.

         Arguments:
             page: The relative URL of the current page. Examples: `'foo/bar/'`, `'foo/index.html'`
             identifier: The HTML anchor (without '#') as a string.
         """
-        self._url_map[identifier] = f"{page}#{identifier}"
+        self._url_map[identifier] = f"{page}#{anchor or identifier}"

     def register_url(self, identifier: str, url: str) -> None:
         """Register that the identifier should be turned into a link to this URL.
@@ -133,7 +135,7 @@ def on_config(self, config: MkDocsConfig) -> MkDocsConfig | None:
             The modified config.
         """
         log.debug("Adding AutorefsExtension to the list")
-        config["markdown_extensions"].append(AutorefsExtension())
+        config["markdown_extensions"].append(AutorefsExtension(self))
         return config

     def on_page_markdown(self, markdown: str, page: Page, **kwargs: Any) -> str:  # noqa: ARG002
@@ -145,7 +147,8 @@ def on_page_markdown(self, markdown: str, page: Page, **kwargs: Any) -> str: #
             kwargs: Additional arguments passed by MkDocs.

         Returns:
-            The same Markdown. We only use this hook to map anchors to URLs.
+            The same Markdown. We only use this hook to keep a reference to the current page URL,
+                used during Markdown conversion by the anchor scanner tree processor.
         """
         self.current_page = page.url
         return markdown

src/mkdocs_autorefs/references.py

Lines changed: 104 additions & 1 deletion
@@ -2,20 +2,33 @@

 from __future__ import annotations

+import logging
 import re
 from html import escape, unescape
-from typing import TYPE_CHECKING, Any, Callable, Match
+from typing import TYPE_CHECKING, Any, Callable, ClassVar, Match
 from urllib.parse import urlsplit
 from xml.etree.ElementTree import Element

 import markupsafe
+from markdown.core import Markdown
 from markdown.extensions import Extension
 from markdown.inlinepatterns import REFERENCE_RE, ReferenceInlineProcessor
+from markdown.treeprocessors import Treeprocessor
 from markdown.util import HTML_PLACEHOLDER_RE, INLINE_PLACEHOLDER_RE

 if TYPE_CHECKING:
     from markdown import Markdown

+    from mkdocs_autorefs.plugin import AutorefsPlugin
+
+try:
+    from mkdocs.plugins import get_plugin_logger
+
+    log = get_plugin_logger(__name__)
+except ImportError:
+    # TODO: remove once support for MkDocs <1.5 is dropped
+    log = logging.getLogger(f"mkdocs.plugins.{__name__}")  # type: ignore[assignment]
+
 _ATTR_VALUE = r'"[^"<>]+"|[^"<> ]+'  # Possibly with double quotes around
 AUTO_REF_RE = re.compile(
     rf"<span data-(?P<kind>autorefs-(?:identifier|optional|optional-hover))=(?P<identifier>{_ATTR_VALUE})"
@@ -208,13 +221,96 @@ def fix_refs(html: str, url_mapper: Callable[[str], str]) -> tuple[str, list[str
     return html, unmapped


+class AnchorScannerTreeProcessor(Treeprocessor):
+    """Tree processor to scan and register HTML anchors."""
+
+    _htags: ClassVar[set[str]] = {"h1", "h2", "h3", "h4", "h5", "h6"}
+
+    def __init__(self, plugin: AutorefsPlugin, md: Markdown | None = None) -> None:
+        """Initialize the tree processor.
+
+        Parameters:
+            plugin: A reference to the autorefs plugin, to use its `register_anchor` method.
+        """
+        super().__init__(md)
+        self.plugin = plugin
+
+    def run(self, root: Element) -> None:  # noqa: D102
+        if self.plugin.current_page is not None:
+            pending_anchors = _PendingAnchors(self.plugin, self.plugin.current_page)
+            self._scan_anchors(root, pending_anchors)
+            pending_anchors.flush()
+
+    def _scan_anchors(self, parent: Element, pending_anchors: _PendingAnchors) -> None:
+        for el in parent:
+            if el.tag == "a":
+                # We found an anchor. Record its id if it has one.
+                if anchor_id := el.get("id"):
+                    pending_anchors.append(anchor_id)
+                # If the element has text or a link, it's not an alias.
+                # Non-whitespace text after the element interrupts the chain, aliases can't apply.
+                if el.text or el.get("href") or (el.tail and el.tail.strip()):
+                    pending_anchors.flush()
+
+            elif el.tag == "p":
+                # A `p` tag is a no-op for our purposes, just recurse into it in the context
+                # of the current collection of anchors.
+                self._scan_anchors(el, pending_anchors)
+                # Non-whitespace text after the element interrupts the chain, aliases can't apply.
+                if el.tail and el.tail.strip():
+                    pending_anchors.flush()
+
+            elif el.tag in self._htags:
+                # If the element is a heading, that turns the pending anchors into aliases.
+                pending_anchors.flush(el.get("id"))
+
+            else:
+                # But if it's some other interruption, flush anchors anyway as non-aliases.
+                pending_anchors.flush()
+                # Recurse into sub-elements, in a *separate* context.
+                self.run(el)
+
+
+class _PendingAnchors:
+    """A collection of HTML anchors that may or may not become aliased to an upcoming heading."""
+
+    def __init__(self, plugin: AutorefsPlugin, current_page: str):
+        self.plugin = plugin
+        self.current_page = current_page
+        self.anchors: list[str] = []
+
+    def append(self, anchor: str) -> None:
+        self.anchors.append(anchor)
+
+    def flush(self, alias_to: str | None = None) -> None:
+        for anchor in self.anchors:
+            self.plugin.register_anchor(self.current_page, anchor, alias_to)
+        self.anchors.clear()
+
+
 class AutorefsExtension(Extension):
     """Extension that inserts auto-references in Markdown."""

+    def __init__(
+        self,
+        plugin: AutorefsPlugin | None = None,
+        **kwargs: Any,
+    ) -> None:
+        """Initialize the Markdown extension.
+
+        Parameters:
+            plugin: An optional reference to the autorefs plugin (to pass it to the anchor scanner tree processor).
+            **kwargs: Keyword arguments passed to the [base constructor][markdown.extensions.Extension].
+        """
+        super().__init__(**kwargs)
+        self.plugin = plugin
+
     def extendMarkdown(self, md: Markdown) -> None:  # noqa: N802 (casing: parent method's name)
         """Register the extension.

         Add an instance of our [`AutoRefInlineProcessor`][mkdocs_autorefs.references.AutoRefInlineProcessor] to the Markdown parser.
+        Also optionally add an instance of our [`AnchorScannerTreeProcessor`][mkdocs_autorefs.references.AnchorScannerTreeProcessor]
+        to the Markdown parser if a reference to the autorefs plugin was passed to this extension.

         Arguments:
             md: A `markdown.Markdown` instance.
@@ -224,3 +320,10 @@ def extendMarkdown(self, md: Markdown) -> None: # noqa: N802 (casing: parent me
             "mkdocs-autorefs",
             priority=168,  # Right after markdown.inlinepatterns.ReferenceInlineProcessor
         )
+        if self.plugin is not None and self.plugin.scan_toc and "attr_list" in md.treeprocessors:
+            log.debug("Enabling Markdown anchors feature")
+            md.treeprocessors.register(
+                AnchorScannerTreeProcessor(self.plugin, md),
+                "mkdocs-autorefs-anchors-scanner",
+                priority=0,
+            )
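
To make the scanning rules easier to follow, here is a minimal standard-library sketch of the same traversal. It is an approximation under stated assumptions rather than the committed implementation: the `scan` function and its nested helpers are hypothetical names, the plugin and `_PendingAnchors` are replaced by a plain dict and list, and the input is hand-written HTML that only resembles what Python-Markdown with `attr_list` might produce.

```python
from typing import Dict, List, Optional
from xml.etree.ElementTree import Element, fromstring

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def scan(root: Element, page: str) -> Dict[str, str]:
    """Return {identifier: url} for every `<a id=...>` found under `root`."""
    url_map: Dict[str, str] = {}

    def flush(pending: List[str], alias_to: Optional[str] = None) -> None:
        # Register pending anchors, pointing them at the heading id if one is given.
        for anchor in pending:
            url_map[anchor] = f"{page}#{alias_to or anchor}"
        pending.clear()

    def walk(parent: Element, pending: List[str]) -> None:
        for el in parent:
            if el.tag == "a":
                if anchor_id := el.get("id"):
                    pending.append(anchor_id)
                # Text, a real link target, or trailing text means "not an alias".
                if el.text or el.get("href") or (el.tail and el.tail.strip()):
                    flush(pending)
            elif el.tag == "p":
                # Paragraphs are transparent: keep collecting in the same context.
                walk(el, pending)
                if el.tail and el.tail.strip():
                    flush(pending)
            elif el.tag in HEADINGS:
                # A heading turns every pending anchor into an alias for it.
                flush(pending, el.get("id"))
            else:
                # Any other element interrupts the chain; scan it in a fresh context.
                flush(pending)
                inner: List[str] = []
                walk(el, inner)
                flush(inner)

    top: List[str] = []
    walk(root, top)
    flush(top)
    return url_map


# Hand-written sample markup (assumed shape, not captured plugin output).
html = """
<div>
  <p><a href="" id="contributing"></a>
  <a href="" id="development-setup"></a></p>
  <h2 id="how-to-contribute-to-the-project">How to contribute to the project?</h2>
  <p><a href="" id="foobar-paragraph"></a></p>
  <p>Paragraph about foobar.</p>
</div>
"""

result = scan(fromstring(html), "dev/")
for identifier, url in result.items():
    print(identifier, "->", url)
```

In this sample the two anchors above the heading resolve to the heading's id, while the anchor that is not followed by a heading keeps its own id.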
