Commit 0fd8fae

Draft for freezing the HTML simple repository API
Signed-off-by: William Woodruff <william@astral.sh> Lint Signed-off-by: William Woodruff <william@astral.sh> More lint Signed-off-by: William Woodruff <william@astral.sh> Deprecation -> freezing Signed-off-by: William Woodruff <william@astral.sh> Mention optimization problems Signed-off-by: William Woodruff <william@astral.sh> Tweak conneg language Signed-off-by: William Woodruff <william@astral.sh> Update spon/delegate Signed-off-by: William Woodruff <william@astral.sh> Precision Signed-off-by: William Woodruff <william@astral.sh>
1 parent d6bcdaf commit 0fd8fae

1 file changed

+233
-0
lines changed

peps/pep-9999.rst

Lines changed: 233 additions & 0 deletions
@@ -0,0 +1,233 @@
1+
PEP: 9999
2+
Title: Freezing the HTML simple repository API
3+
Author: William Woodruff <william@yossarian.net>
4+
Sponsor: Donald Stufft <donald@stufft.io>
5+
PEP-Delegate: Donald Stufft <donald@stufft.io>
6+
Discussions-To: Pending
7+
Status: Draft
8+
Type: Standards Track
9+
Topic: Packaging
10+
Created: 16-Apr-2026
11+
Post-History: `13-Apr-2026 <https://discuss.python.org/t/106959>`__
12+
13+
14+
Abstract
15+
========
16+
17+
This PEP proposes freezing the
18+
:ref:`standard HTML representation <packaging:simple-repository-html-serialization>`
19+
of the simple repository API, as originally specified in :pep:`503`
20+
and updated over subsequent PEPs.
21+
22+
In the context of this PEP, "freezing" means that the HTML representation
23+
is considered complete from the perspective of the standards process,
24+
and **SHOULD NOT** be updated by future PEPs. Future PEPs **SHOULD** instead
25+
target the
26+
:ref:`standard JSON representation <packaging:simple-repository-api-json>`,
27+
as originally specified in :pep:`691`.
28+
29+
Importantly, this PEP's freezing of the HTML representation does **not** stipulate
30+
that installers should remove support for the HTML representation, or that
31+
indices (like PyPI) will or should stop providing an HTML representation.
32+
33+
Rationale and Motivation
34+
========================
35+
36+
The use of an HTML representation for Python package indices predates
37+
efforts to standardize Python packaging. Consequently, the HTML representation
38+
standardized with :pep:`503` represents a *formalization* of
39+
existing practices (particularly those of PyPI), rather than a *design*.
40+
41+
The HTML representation of a Python package index has served the Python
42+
packaging ecosystem admirably: it has acted as the baseline representation
43+
that all indices and installers support, and has allowed PyPI to incrementally
44+
modernize its index presentation while maintaining backwards compatibility
45+
with installers and mirrors. :pep:`629`, :pep:`714`, :pep:`740`,
46+
:pep:`792`, and many others demonstrate the viability of this approach.
47+
48+
At the same time, the HTML representation has several limitations that
49+
have become increasingly apparent and salient as Python packaging as a whole
50+
has modernized:
51+
52+
- The HTML representation is *rigid*, for backwards compatibility reasons.
53+
This rigidity makes it difficult to represent new pieces of metadata,
54+
and PEPs that attempt to do so typically need to shoehorn their changes
55+
into ``<meta>`` tags or ``data-`` attributes to avoid interfering with
56+
assumptions that existing consumers make about the structure of the HTML.
57+
58+
This shoehorning process also requires PEPs that modify the HTML index
59+
to invent syntax for encoding structured data. For example, :pep:`792`
60+
adds meta tags named ``pypi:project-status`` and
61+
``pypi:project-status-reason``, effectively flattening an object
62+
representation that appears naturally in the JSON representation.
63+
64+
Similarly, the HTML representation's rigidity makes it an optimization
65+
barrier: :pep:`658` allows indices to serve distribution metadata via
66+
the simple repository API, but the absence of a straightforward and
67+
backwards-compatible way to encode that metadata within the HTML
68+
representation means that installers must incur an additional HTTP round-trip
69+
to fetch relatively small amounts of information. :pep:`740` adopts a
70+
similar approach, with similar overhead repercussions.
71+
72+
In practice, some index PEPs have chosen not to modify the HTML representation
73+
at all, and instead focus solely on the JSON representation. :pep:`700`,
74+
for example, introduces both per-distribution metadata *and* a top-level
75+
``versions`` key to the JSON representation, but does not modify the HTML
76+
representation. The original rationale for this was that HTML consumers
77+
would be unlikely to need the new metadata.
78+
79+
- Relatedly, third-party consumption of the HTML representation is often
80+
*brittle*: even syntactically valid, non-semantic changes to PyPI's HTML
81+
representation are
82+
`known to cause breakage <https://github.com/pypi/warehouse/issues/18275>`__
83+
due to unsound assumptions about the exact structure of the HTML, including
84+
its whitespace.
85+
86+
Consumption of the JSON representation, by contrast, is more robust to
87+
non-semantic changes thanks to the prevalence of robust JSON parsing
88+
libraries. Robust handling of HTML is naturally possible, but consumers
89+
are often *tempted* to avoid the perceived complexity and generality
90+
of HTML parsing in favor of brittle approaches involving regular expressions
91+
and similar ad-hoc parsing techniques.
92+
93+
- In practice, *adoption* of incremental improvements to the HTML representation
94+
is limited: PyPI itself typically adopts new features, but third-party
95+
indices (particularly those sold as corporate offerings) frequently provide
96+
only the absolute minimum representation originally defined in :pep:`503`.
97+
98+
As a result, *even when* the HTML representation is improved, many consumers
99+
do not benefit from those improvements.
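
As an illustration of the flattening and brittleness described in the list
above, the following minimal sketch rebuilds the :pep:`792` project status
object from the two ``<meta>`` tags using the standard library's
``html.parser`` rather than regular expressions. The class and function
names (``MetaCollector``, ``project_status_from_html``) and the sample input
are hypothetical and are not part of any specification.

.. code-block:: python

    from html.parser import HTMLParser


    class MetaCollector(HTMLParser):
        """Collect name/content pairs from <meta> tags (hypothetical helper)."""

        def __init__(self):
            super().__init__()
            self.meta = {}

        def handle_starttag(self, tag, attrs):
            # Attribute order and surrounding whitespace do not matter here,
            # which is what makes this more robust than a regular expression.
            if tag != "meta":
                return
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name is not None and content is not None:
                self.meta[name] = content


    def project_status_from_html(html):
        """Rebuild the nested status object from the flattened meta tags."""
        parser = MetaCollector()
        parser.feed(html)
        parser.close()
        status = parser.meta.get("pypi:project-status")
        if status is None:
            return None
        result = {"status": status}
        reason = parser.meta.get("pypi:project-status-reason")
        if reason is not None:
            result["reason"] = reason
        return result

In the JSON representation the same information is already a nested object,
so a consumer needs no reassembly step of this kind.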
100+
101+
Put together, these limitations mean that the HTML representation is (1)
102+
often difficult to extend in a robust way, and (2) *de facto* frozen with
103+
respect to how many consumers interact with Python packaging, even
104+
when standards processes work to modernize it.
105+
106+
The purpose of this PEP is to formalize this status quo.
107+
108+
Specification
109+
=============
110+
111+
The HTML representation of the simple repository API is frozen
112+
for the purposes of Python packaging standards processes. Future
113+
Python packaging PEPs **SHOULD NOT** modify the HTML representation of the
114+
simple repository API, and **MUST** instead modify the JSON representation.
115+
116+
This PEP does not alter the status of the HTML representation on PyPI
117+
and does not prescribe any behavioral changes for installers.
118+
119+
One functional consequence of this freeze is that future changes
120+
to the simple repository API will be
121+
:ref:`versioned <packaging:simple-repository-api-versioning>` as they are
122+
currently, but that only the JSON representation will receive changes
123+
to its versioning marker. For example, if a future PEP introduces
124+
version 1.5 of the simple repository API, the HTML representation will retain
125+
the following versioning marker:
126+
127+
.. code-block:: html
128+
129+
    <meta name="pypi:repository-version" content="1.4">
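
On the JSON side, the versioning marker is carried in the ``meta.api-version``
key. The sketch below shows one way a client might react to a JSON response
that advertises a newer version than it understands, under the assumption
that it follows the usual versioning guidance (fail on a newer major version,
warn on a newer minor version). ``SUPPORTED``, ``parse_api_version``, and
``check_json_response`` are hypothetical names, and the returned strings are
placeholders for whatever a real client would do.

.. code-block:: python

    import json

    # Hypothetical: the newest API version this client was written against.
    SUPPORTED = (1, 4)


    def parse_api_version(text):
        """Split a "MAJOR.MINOR" marker into a tuple of integers."""
        major, minor = text.split(".")
        return int(major), int(minor)


    def check_json_response(body):
        """Classify a JSON index response by its versioning marker."""
        document = json.loads(body)
        major, minor = parse_api_version(document["meta"]["api-version"])
        if major > SUPPORTED[0]:
            # A newer major version may be incompatible; refuse to continue.
            raise ValueError(f"unsupported API version {major}.{minor}")
        if major == SUPPORTED[0] and minor > SUPPORTED[1]:
            # A newer minor version is expected to be backwards compatible.
            return "newer-minor"
        return "ok"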
130+
131+
Future Considerations
132+
=====================
133+
134+
This PEP does not stipulate any changes to how indices and installers should
135+
handle the HTML representation.
136+
137+
As of April 2026, the prospect of *fully* removing support for the HTML
138+
representation from either indices or installers is unrealistic: it is simply
139+
too critical to the ecosystem, and efforts to remove it would be extremely
140+
and unreasonably disruptive.
141+
142+
However, it is not *inconceivable* that the HTML representation could be
143+
fully removed (or relegated to legacy/default-disabled flows) in the future.
144+
This PEP does not preclude such a future, but does not propose it either.
145+
146+
The Python packaging community has made several valuable observations
147+
around behaviors that make outright removal of the HTML representation
148+
difficult or infeasible, including:
149+
150+
- By virtue of being the default, the HTML representation is extremely
151+
easy to adopt internally: it doesn't require any (explicit) content
152+
negotiation, and can often be served trivially by a CDN or a minimal
153+
HTTP server (like ``python -m http.server``).
154+
155+
The JSON representation does not technically require content negotiation
156+
either, but in practice clients that consume it expect to perform
157+
explicit content negotiation due to the assumption that the same URL
158+
provides both representations. Consequently, any future efforts to remove the
159+
HTML representation will likely require a simpler adoption story for the JSON
160+
representation.
161+
162+
- The HTML representation is currently easier for installers like ``pip``
163+
to parse incrementally, as the Python standard library includes
164+
``html.parser`` for incremental HTML parsing. This helps mitigate
165+
the memory overhead of large HTML index responses, e.g. detail responses
166+
for packages that have hundreds or thousands of distributions.
167+
168+
By contrast, Python's standard library currently lacks an incremental
169+
JSON parser. Incremental JSON parsing is not impractical (and is strictly
170+
less complex than incremental HTML parsing), but the absence of a
171+
standard library solution presents an adoption barrier.
172+
Future efforts to remove the HTML representation will likely require a robust
173+
standard library (or acceptably vendorable third-party) solution for
174+
incremental JSON parsing within ``pip``.
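
The two observations above can be sketched briefly. ``ACCEPT`` shows the kind
of ``Accept`` header a client performing explicit content negotiation might
send, using the media types defined in :pep:`691` (the quality values are an
illustrative preference order). ``AnchorCollector`` shows incremental parsing
with ``html.parser``, where the response is fed to the parser chunk by chunk.
All names and the ``example.invalid`` URLs are hypothetical, and this is not
a description of how ``pip`` is implemented.

.. code-block:: python

    from html.parser import HTMLParser

    # Media types from PEP 691; the q-values are only an example ordering.
    ACCEPT = ", ".join(
        [
            "application/vnd.pypi.simple.v1+json",
            "application/vnd.pypi.simple.v1+html;q=0.2",
            "text/html;q=0.01",
        ]
    )


    class AnchorCollector(HTMLParser):
        """Record the href of each <a> tag as soon as it has been parsed."""

        def __init__(self):
            super().__init__()
            self.hrefs = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                href = dict(attrs).get("href")
                if href is not None:
                    self.hrefs.append(href)


    def collect_hrefs(chunks):
        """Feed an iterable of text chunks to the parser one at a time."""
        parser = AnchorCollector()
        for chunk in chunks:
            # feed() buffers incomplete markup, so a chunk boundary may fall
            # in the middle of a tag or attribute value.
            parser.feed(chunk)
        parser.close()
        return parser.hrefs

This sketch still accumulates every link in a list for simplicity. A consumer
concerned with memory could instead act on each link inside
``handle_starttag`` and discard it.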
175+
176+
Security Implications
177+
=====================
178+
179+
This PEP does not identify any positive or negative security implications
180+
associated with freezing the HTML representation of the simple repository
181+
API.
182+
183+
How to Teach This
184+
=================
185+
186+
Because this PEP only freezes the HTML representation of the simple repository
187+
API for the purposes of Python packaging standards processes, the end-user
188+
implications of this PEP are limited.
189+
190+
However, for third-party indices that wish to modernize their index
191+
representations, this PEP proposes the following if accepted:
192+
193+
- The authors of this PEP will coordinate with the maintainers
194+
of PyPI on appropriate public-facing documentation and communication,
195+
including an announcement on the `PyPI blog <https://blog.pypi.org>`_
196+
if deemed appropriate.
197+
198+
- The authors of this PEP will make appropriate changes to the
199+
:ref:`living standard <packaging:simple-repository-api>` for the simple
200+
repository API, including admonitions and callouts where appropriate
201+
to indicate that the HTML representation will not receive future updates.
202+
203+
Rejected Ideas
204+
==============
205+
206+
Doing nothing
207+
-------------
208+
209+
Doing nothing is always an option. Per above, this would be a continuation
210+
of the status quo, wherein the HTML representation is updated on paper
211+
(and on PyPI), but is frozen in practice in third-party settings.
212+
213+
The authors of this PEP believe that being explicit about the status
214+
of the HTML representation is valuable, and would benefit future standards
215+
efforts by diverting design effort away from shoehorning new features
216+
into the HTML representation.
217+
218+
219+
Aggressively removing the HTML representation
220+
---------------------------------------------
221+
222+
Encouraging indices and installers to aggressively remove support for the HTML
223+
representation is another option. However, as noted above, this is unrealistic
224+
in the near term, and would be disruptive to the ecosystem.
225+
226+
The authors of this PEP believe that freezing is a more gradual and
227+
pragmatic approach that better reflects the ecosystem's reality.
228+
229+
Copyright
230+
=========
231+
232+
This document is placed in the public domain or under the CC0-1.0-Universal
233+
license, whichever is more permissive.
