Skip to content
Open
Show file tree
Hide file tree
Changes from 6 commits
Commits
Show all changes
26 commits
Select commit Hold shift + click to select a range
6ecfc28
bpo-54873: Backported namespaces prefixes support for xml.sax.expatre…
ukarroum Apr 26, 2024
e9ac454
bpo-54873: Added News entry
ukarroum Apr 26, 2024
9c365c2
Merge branch 'main' into fix-issue-54873
ukarroum Jun 9, 2024
2fc666a
Merge branch 'main' into fix-issue-54873
ukarroum Apr 29, 2025
fe9d82f
Merge branch 'main' into fix-issue-54873
ukarroum Nov 2, 2025
be878cb
Moved test_namespace_prefix to test_sax.py
ukarroum Nov 2, 2025
0e3f870
test_sax: Replaced namespace tests with one test on qualified names
ukarroum Nov 2, 2025
c88e7ed
ExpatParser: Add namespacePrefixesHandling
ukarroum Nov 2, 2025
27cb273
namespace prefixes: add link to w3.org
ukarroum Nov 2, 2025
91112b9
Merge branch 'main' into fix-issue-54873
ukarroum Nov 24, 2025
d3cbe5f
Merge branch 'main' into fix-issue-54873
ukarroum Apr 17, 2026
1d343f1
Update imports in Lib/test/test_sax.py
ukarroum Apr 17, 2026
256a97b
Approuved readibility of unit test
ukarroum Apr 17, 2026
7c0da60
Update pyXML spelling in News.d
ukarroum Apr 17, 2026
e219450
Update Lib/xml/sax/expatreader.py
ukarroum Apr 17, 2026
34af031
Addressed PR comments
ukarroum Apr 17, 2026
8075f8b
Merge branch 'fix-issue-54873' of github.com:ukarroum/cpython into fi…
ukarroum Apr 17, 2026
2004846
Removed unused import
ukarroum Apr 17, 2026
c3e00e2
Update news entry
ukarroum Apr 17, 2026
d3b08ec
Update Doc/whatsnew/3.15.rst
ukarroum Apr 17, 2026
e8bcb30
Update news entry
ukarroum Apr 17, 2026
dfff704
Merge branch 'fix-issue-54873' of github.com:ukarroum/cpython into fi…
ukarroum Apr 17, 2026
7de4ce5
Add a check on namespace feature when enabling namespace prefixes
ukarroum Apr 17, 2026
08d659e
Add test for feature_namespace_prefixes enabled while feature_namespa…
ukarroum Apr 17, 2026
7cdc402
Update Lib/test/test_sax.py
ukarroum Apr 17, 2026
cd7de73
Update test
ukarroum Apr 17, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions Doc/whatsnew/3.15.rst
Original file line number Diff line number Diff line change
Expand Up @@ -660,6 +660,9 @@ xml.parsers.expat

.. _billion laughs: https://en.wikipedia.org/wiki/Billion_laughs_attack

* Add support for namespace prefixes.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be also documented in the expat docs.

Copy link
Copy Markdown
Contributor Author

@ukarroum ukarroum Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean I should add a link to the C libexpat doc ?

EDIT: I think you probably meant to document this here: https://docs.python.org/3/library/pyexpat.html

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, to the specs

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

(Contributed by Yassir Karroum in :gh:`118317`.)

Comment on lines +1389 to +1391
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is adding to section xml.parsers.expat while the changes in here are patching code elsewhere. (I'm unsure when/if news files should be written to directly. I have only seen things go through news files myself.)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ukarroum did you see this one? ⬆️

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I will move it to a new section.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's adding it in two places now, it looks like this upper entry ⬇️ still needs to be dropped?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry should have reviewed the diffs one more time before tagging you ^^"
fixed


zlib
----
Expand Down
34 changes: 33 additions & 1 deletion Lib/test/test_sax.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# regression test for SAX 2.0

from xml.sax import make_parser, ContentHandler, \
SAXException, SAXReaderNotAvailable, SAXParseException
SAXException, SAXReaderNotAvailable, SAXParseException, handler
Comment thread
ukarroum marked this conversation as resolved.
Outdated
import unittest
from unittest import mock
try:
Expand Down Expand Up @@ -1307,6 +1307,38 @@ def test_expat_locator_withinfo_nonascii(self):
self.assertEqual(parser.getSystemId(), fname)
self.assertEqual(parser.getPublicId(), None)

def test_namespace_prefix(self):
parser = create_parser()
parser.setFeature(handler.feature_namespaces, 1)
parser.setFeature(handler.feature_namespace_prefixes, 1)

class Handler(handler.ContentHandler):
def startElementNS(self, name, qname, attrs):
self.qname = qname

h = Handler()

parser.setContentHandler(h)
parser.feed("<Q:E xmlns:Q='http://example.org/testuri'/>")
parser.close()
self.assertEqual(h.qname, "Q:E")

def test_default_namespace(self):
parser = create_parser()
parser.setFeature(handler.feature_namespaces, 1)

class Handler(handler.ContentHandler):
def startElementNS(self, name, qname, attrs):
self.qname = qname

h = Handler()

parser.setContentHandler(h)
parser.feed("<E xmlns='http://example.org/testuri'/>")
parser.close()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use parser.parse(..., True)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, but I didn't add the True, it seems that the parse method only accept one argument.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. Wait, @hartwork should we do feed() + close() or parse()?

Copy link
Copy Markdown
Contributor

@hartwork hartwork Nov 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@picnixz hi!

My understanding is that:

  • create_parser is xml.sax.expatreader.create_parser and it creates an xml.sax.xmlreader.XMLReader, first sentence in the docs
  • xml.sax.xmlreader.XMLReader provides .parse that parses in one single go, one single parameter, no boolean isFinal
  • xml.sax.xmlreader.XMLReader does not provide .feed or .close because that needs IncrementalParser, not a plain XMLReader.
  • (xml.sax.expatreader.create_parser creates xml.sax.expatreader.ExpatParser that is an instance of both IncrementalParser and XMLReader for me in practice.)
  • I therefore consider parser.parse("<E xmlns='http://example.org/testuri'/>") to be better suited here because availability of .feed and .close is not guaranteed on interface level.
  • (If you want to stick to .feed here or elsewhere maybe add self.assertIsInstance(parser, IncrementalParser) prior to a call to communicate that expectation and have the test fail meaningfully for regressions in the future.)

What do you think?

self.assertEqual(h.qname, "E")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't leave 3 blank lines, only 2 is sufficient.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed



# ===========================================================================
#
Expand Down
17 changes: 10 additions & 7 deletions Lib/xml/sax/expatreader.py
Original file line number Diff line number Diff line change
Expand Up @@ -91,6 +91,7 @@ def __init__(self, namespaceHandling=0, bufsize=2**16-20):
self._entity_stack = []
self._external_ges = 0
self._interning = None
self._namespace_prefixes = 1
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it 1 by default?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to use a new namespacePrefixesHandling argument.


# XMLReader methods

Expand Down Expand Up @@ -126,8 +127,9 @@ def getFeature(self, name):
return self._namespaces
elif name == feature_string_interning:
return self._interning is not None
elif name in (feature_validation, feature_external_pes,
feature_namespace_prefixes):
elif name == feature_namespace_prefixes:
return self._namespace_prefixes
elif name in (feature_validation, feature_external_pes):
return 0
elif name == feature_external_ges:
return self._external_ges
Expand All @@ -147,6 +149,8 @@ def setFeature(self, name, state):
self._interning = {}
else:
self._interning = None
elif name == feature_namespace_prefixes:
self._namespace_prefixes = state
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote for moving this one further down — right after elif name == feature_external_pes: — to match the left side of the diff.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

elif name == feature_validation:
if state:
raise SAXNotSupportedException(
Expand All @@ -155,10 +159,6 @@ def setFeature(self, name, state):
if state:
raise SAXNotSupportedException(
"expat does not read external parameter entities")
elif name == feature_namespace_prefixes:
if state:
raise SAXNotSupportedException(
"expat does not report namespace prefixes")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it a limitation from the C extension module? was it a Expat version limitation?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick git blame, and looks like this specific check was added in this commit: 18476a3 (from 2002).
and in this commit Expat version used is 1.95 (which should support namespace prefixes), so I would say the limitation was probably just in the python wrapper.

else:
raise SAXNotRecognizedException(
"Feature '%s' not recognized" % name)
Expand Down Expand Up @@ -347,11 +347,14 @@ def start_element_ns(self, name, attrs):
pair = name.split()
if len(pair) == 1:
# no namespace
elem_qname = name
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea whether this is correct or not here. Is there some specs that we could follow for the implementation?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you're referring specificaly to elem_qname, there is the spec: https://www.w3.org/TR/xml-names/#ns-qualnames.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also added another test to test the "else" branch.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@picnixz these lines are 1:1 from PyXML 0.8.4:

meld

pair = (None, name)
elif len(pair) == 3:
elem_qname = "%s:%s" % (pair[2], pair[1])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: This is the key value provider in this pull request. pair[2] was previously not exposed.

pair = pair[0], pair[1]
else:
# default namespace
elem_qname = pair[1]
pair = tuple(pair)

newattrs = {}
Expand All @@ -374,7 +377,7 @@ def start_element_ns(self, name, attrs):
newattrs[apair] = value
qnames[apair] = qname

self._cont_handler.startElementNS(pair, None,
self._cont_handler.startElementNS(pair, elem_qname,
Comment thread
ukarroum marked this conversation as resolved.
AttributesNSImpl(newattrs, qnames))

def end_element_ns(self, name):
Expand Down
Loading