Add auto-language extra for code block language detection (#361) #706

Open
Darkness1521 wants to merge 1 commit into trentm:master from Darkness1521:master

Darkness1521 commented May 18, 2026

Closes #361

Summary

  • Add auto-language extra that automatically detects the programming
    language of fenced code blocks without explicit language tags
  • Uses heuristic pattern matching (no new dependencies)
  • Supports 14 languages (counting C/C++ as one): Python, JavaScript,
    HTML, CSS, SQL, Bash, Java, Go, Rust, Ruby, PHP, JSON, YAML, C/C++
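
The detection rules themselves are not listed in this description. As a rough illustration of what dependency-free heuristic pattern matching can look like, here is a minimal sketch. The pattern table, the scoring rule, and the five languages covered are assumptions made for the example, not the rules shipped in this PR; only the function name detect_language is taken from the Tests section.

```python
import re

# Hypothetical pattern table: a few regexes that are characteristic of
# each language. Illustrative only; not the PR's actual rules.
PATTERNS = {
    "python": [r"^\s*def \w+\(.*\):", r"^\s*import \w+",
               r"^\s*from \w+ import ", r"\bprint\("],
    "javascript": [r"\b(const|let|var) \w+\s*=", r"\bfunction\s*\w*\(",
                   r"console\.log\(", r"=>"],
    "html": [r"<!DOCTYPE html", r"</?(html|div|span|p|body|head)\b"],
    "sql": [r"(?i)^\s*select\b.+\bfrom\b",
            r"(?i)^\s*(insert into|create table|update)\b"],
    "bash": [r"^#!/bin/(ba)?sh", r"^\s*echo ", r"\$\{?\w+\}?"],
}


def detect_language(code):
    """Return the language whose patterns match most often, or None."""
    scores = {}
    for lang, patterns in PATTERNS.items():
        # Count how many of this language's patterns occur in the snippet.
        hits = sum(1 for p in patterns if re.search(p, code, re.MULTILINE))
        if hits:
            scores[lang] = hits
    if not scores:
        return None  # no pattern matched: leave the block untagged
    # On a tie, the language listed first in PATTERNS wins.
    return max(scores, key=scores.get)
```

Counting pattern hits per language keeps the sketch simple; a fuller detector would likely need weighted patterns and explicit tie-breaking for languages that share syntax.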

How it works

The extra runs before fenced-code-blocks in the processing pipeline. When
a code block has no language tag, the extra analyzes the block's content
and inserts the detected language name as its tag. The existing
fenced-code-blocks extra and Pygments highlighting then process the block
normally.
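
The pre-processing step described above can be sketched as a single regex pass over the Markdown source. This is an assumption-laden illustration, not the code in this PR: add_language_tags, FENCE, and toy_detect are hypothetical names, only backtick fences are handled, and the detector is passed in as a plain callable.

```python
import re

FENCE_MARK = "`" * 3  # built programmatically to keep this example readable

# Matches any backtick-fenced block, tagged or not, so that the closing
# fence of a tagged block is never mistaken for an untagged opening fence.
FENCE = re.compile(
    r"^(?P<fence>`{3,})(?P<info>[^\n`]*)\n(?P<body>.*?)^(?P=fence)[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def add_language_tags(text, detect):
    """Insert a language tag into fenced blocks that have none.

    `detect` is any callable mapping a code string to a language name
    or None (hypothetical interface).
    """
    def repl(match):
        if match.group("info").strip():
            return match.group(0)  # already tagged: leave untouched
        lang = detect(match.group("body"))
        if lang is None:
            return match.group(0)  # nothing detected: leave untouched
        fence = match.group("fence")
        return f"{fence}{lang}\n{match.group('body')}{fence}"

    return FENCE.sub(repl, text)


def toy_detect(body):
    # Stand-in detector for the demo: only recognises Python functions.
    return "python" if re.search(r"^\s*def \w+\(", body, re.MULTILINE) else None


source = "\n".join(["Intro", FENCE_MARK, "def f(x):", "    return x", FENCE_MARK, ""])
tagged = add_language_tags(source, toy_detect)
print(tagged)
```

Because the tag is written into the opening fence before any fence handling runs, the later stages see an ordinary tagged block and need no changes.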

Usage

import markdown2
html = markdown2.markdown(text, extras=["fenced-code-blocks", "auto-language"])

Tests

On 24 short code snippets (the typical case for markdown code blocks):

| Method | Accuracy |
|---|---|
| Our heuristic (detect_language) | 24/24 = 100% |
| Pygments guess_lexer() | 6/24 = 25% |

The test suite passes with no regressions. All changes comply with the
project's contribution guidelines (PEP8, test coverage, docs updated).
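
For anyone who wants to reproduce a comparison like the one in the accuracy table, a small harness along these lines would do. The samples and the toy_detect stand-in below are hypothetical; the 24 snippets used in this PR are not reproduced here.

```python
def accuracy(detect, samples):
    """Return (correct, total) for a detector over (snippet, expected) pairs."""
    correct = sum(1 for snippet, expected in samples if detect(snippet) == expected)
    return correct, len(samples)


# Hypothetical labelled snippets, for illustration only.
SAMPLES = [
    ("def f(x):\n    return x", "python"),
    ("SELECT 1 FROM t;", "sql"),
    ("console.log('hi');", "javascript"),
]


def toy_detect(snippet):
    # Stand-in detector: recognises only two of the three sample languages.
    if snippet.lstrip().startswith("def "):
        return "python"
    if snippet.upper().startswith("SELECT"):
        return "sql"
    return None


correct, total = accuracy(toy_detect, SAMPLES)
print(f"{correct}/{total} = {correct / total:.0%}")  # prints "2/3 = 67%"
```

Any detector with the same call shape can be scored this way, so a baseline such as a wrapper around pygments.lexers.guess_lexer could be passed in as another `detect` callable.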

Result

with auto-language:
[screenshot: with auto-language]

without auto-language:
[screenshot: without auto-language]

Development

Successfully merging this pull request may close these issues.

[Feature request] language guess for code blocks
