PEP: 9999
Title: Freezing the HTML simple repository API
Author: William Woodruff <william@yossarian.net>
Sponsor: Donald Stufft <donald@stufft.io>
PEP-Delegate: Donald Stufft <donald@stufft.io>
Discussions-To: Pending
Status: Draft
Type: Standards Track
Topic: Packaging
Created: 16-Apr-2026
Post-History: `13-Apr-2026 <https://discuss.python.org/t/106959>`__


Abstract
========

This PEP proposes freezing the
:ref:`standard HTML representation <packaging:simple-repository-html-serialization>`
of the simple repository API, as originally specified in :pep:`503`
and updated by subsequent PEPs.

In the context of this PEP, "freezing" means that the HTML representation
is considered complete from the perspective of the standards process,
and **SHOULD NOT** be updated by future PEPs. Future PEPs **SHOULD** instead
target the
:ref:`standard JSON representation <packaging:simple-repository-api-json>`,
as originally specified in :pep:`691`.

Freezing the HTML representation does **not** stipulate that installers
should remove support for it, or that indices (like PyPI) will or should
stop providing it.

Rationale and Motivation
========================

The use of an HTML representation for Python package indices predates
efforts to standardize Python packaging. Consequently, the HTML representation
standardized in :pep:`503` represents a *formalization* of
existing practices (particularly those of PyPI), rather than a *design*.

The HTML representation of a Python package index has served the Python
packaging ecosystem admirably. It has acted as the baseline representation
that all indices and installers support. It has also allowed PyPI to
incrementally modernize its index presentation while maintaining backwards
compatibility with installers and mirrors. :pep:`629`, :pep:`714`, :pep:`740`,
:pep:`792`, and many others demonstrate the viability of this approach.

At the same time, the HTML representation has several limitations that
have become increasingly apparent and salient as Python packaging as a whole
has modernized:

- The HTML representation is *rigid*, for backwards compatibility reasons.
  This rigidity makes it difficult to represent new pieces of metadata,
  and PEPs that attempt to do so typically need to shoehorn their changes
  into ``<meta>`` tags or ``data-`` attributes to avoid interfering with
  assumptions that existing consumers make about the structure of the HTML.

  This shoehorning process also requires PEPs that modify the HTML index
  to invent syntax for encoding structured data. For example, :pep:`792`
  adds meta tags named ``pypi:project-status`` and
  ``pypi:project-status-reason``, effectively flattening an object
  representation that appears naturally in the JSON representation.

  Similarly, the HTML representation's rigidity makes it an optimization
  barrier. :pep:`658` allows indices to serve distribution metadata via
  the simple repository API. However, there is no straightforward and
  backwards-compatible way to encode that metadata within the HTML
  representation, so installers must incur an additional HTTP round-trip
  to fetch relatively small amounts of information. :pep:`740` adopts a
  similar approach, with similar overhead repercussions.

  Some index PEPs have also chosen not to modify the HTML representation
  at all, and instead focus solely on the JSON representation. :pep:`700`,
  for example, introduces both per-distribution metadata *and* a top-level
  ``versions`` key to the JSON representation, but does not modify the HTML
  representation. The original rationale for this was that HTML consumers
  would be unlikely to need the new metadata.

- Relatedly, third-party consumption of the HTML representation is often
  *brittle*: even syntactically valid, non-semantic changes to PyPI's HTML
  representation are
  `known to cause breakage <https://github.com/pypi/warehouse/issues/18275>`__
  due to unsound assumptions about the exact structure of the HTML, including
  its whitespace.

  Consumption of the JSON representation, by contrast, is more robust to
  non-semantic changes thanks to the ubiquity of mature JSON parsing
  libraries. Robust HTML handling is of course possible, but consumers
  are often *tempted* to avoid the perceived complexity and generality
  of HTML parsing in favor of brittle approaches involving regular expressions
  and similar ad-hoc parsing techniques.

- In practice, *adoption* of incremental improvements to the HTML representation
  is limited. PyPI itself typically adopts new features, but third-party
  indices (particularly those sold as corporate offerings) frequently provide
  only the absolute minimum representation originally defined in :pep:`503`.

  As a result, *even when* the HTML representation is improved, many consumers
  do not benefit from those improvements.
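
As a worked example of the flattening described above, the following minimal sketch reads the two :pep:`792` meta tags. It is illustrative only and not part of this PEP's specification. It uses the standard library's ``html.parser`` to reassemble the tags into an object shaped like the JSON representation's ``project-status`` value. The sketch assumes that value is an object with a ``status`` key and an optional ``reason`` key. ``ProjectStatusParser`` and ``project_status_from_html`` are hypothetical names. Because the sketch uses an HTML parser rather than regular expressions, it is unaffected by attribute order or incidental whitespace.

.. code-block:: python

    from html.parser import HTMLParser
    from typing import Dict

    STATUS_TAG = "pypi:project-status"
    REASON_TAG = "pypi:project-status-reason"


    class ProjectStatusParser(HTMLParser):
        """Collect the PEP 792 project status meta tags from an HTML page."""

        def __init__(self) -> None:
            super().__init__()
            self.meta: Dict[str, str] = {}

        def handle_starttag(self, tag, attrs):
            # html.parser lowercases tag and attribute names and tolerates
            # arbitrary attribute order and whitespace, unlike a regex.
            if tag != "meta":
                return
            attr_map = dict(attrs)
            name = attr_map.get("name")
            if name in (STATUS_TAG, REASON_TAG):
                self.meta[name] = attr_map.get("content") or ""


    def project_status_from_html(page: str) -> Dict[str, str]:
        """Rebuild a JSON-style ``project-status`` object from the flattened tags."""
        parser = ProjectStatusParser()
        parser.feed(page)
        parser.close()
        status: Dict[str, str] = {}
        if STATUS_TAG in parser.meta:
            status["status"] = parser.meta[STATUS_TAG]
        if REASON_TAG in parser.meta:
            status["reason"] = parser.meta[REASON_TAG]
        return status


    page = (
        '<meta   content="archived" name="pypi:project-status">\n'
        '<meta name="pypi:project-status-reason" content="No longer maintained">\n'
    )
    print(project_status_from_html(page))
    # {'status': 'archived', 'reason': 'No longer maintained'}
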

Put together, these limitations mean that the HTML representation is (1)
often difficult to extend in a robust way, and (2) *de facto* frozen for
many consumers, even when standards processes work to modernize it.

The purpose of this PEP is to formalize this status quo.

Specification
=============

The HTML representation of the simple repository API is frozen
for the purposes of Python packaging standards processes. Future
Python packaging PEPs **SHOULD NOT** modify the HTML representation of the
simple repository API, and **MUST** instead modify the JSON representation.

This PEP does not alter the status of the HTML representation on PyPI
and does not prescribe any behavioral changes for installers.

One functional consequence of this freeze concerns versioning. Future changes
to the simple repository API will continue to be
:ref:`versioned <packaging:simple-repository-api-versioning>` as they are
today, but only the JSON representation's versioning marker will be
updated. For example, if a future PEP introduces
version 1.5 of the simple repository API, the HTML representation will retain
the following versioning marker:

.. code-block:: html

    <meta name="pypi:repository-version" content="1.4">
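
To illustrate this split, the following minimal sketch reads the versioning marker from each representation. In HTML, the marker is the ``pypi:repository-version`` meta tag. In JSON, it is the ``api-version`` key of the top-level ``meta`` object that :pep:`691` defines. The helper names and the truncated response bodies are hypothetical. The index is assumed to implement the future version 1.5 used in the example above.

.. code-block:: python

    import json
    from html.parser import HTMLParser
    from typing import Optional


    class RepositoryVersionParser(HTMLParser):
        """Record the content of ``<meta name="pypi:repository-version">``."""

        def __init__(self) -> None:
            super().__init__()
            self.version: Optional[str] = None

        def handle_starttag(self, tag, attrs):
            attr_map = dict(attrs)
            if tag == "meta" and attr_map.get("name") == "pypi:repository-version":
                self.version = attr_map.get("content")


    def html_api_version(page: str) -> Optional[str]:
        parser = RepositoryVersionParser()
        parser.feed(page)
        parser.close()
        return parser.version


    def json_api_version(body: str) -> str:
        # PEP 691 places the API version under the top-level "meta" object.
        return json.loads(body)["meta"]["api-version"]


    # Hypothetical, truncated responses from an index implementing version 1.5.
    html_page = '<html><head><meta name="pypi:repository-version" content="1.4"></head></html>'
    json_body = '{"meta": {"api-version": "1.5"}}'

    print(html_api_version(html_page))  # 1.4
    print(json_api_version(json_body))  # 1.5

Under this PEP, a client that relies on features introduced after version 1.4 would need to request the JSON representation to discover and use them.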

Future Considerations
=====================

This PEP does not stipulate any changes to how indices and installers should
handle the HTML representation.

As of April 2026, the prospect of *fully* removing support for the HTML
representation from either indices or installers is unrealistic: it is simply
too critical to the ecosystem, and efforts to remove it would be extremely
and unreasonably disruptive.

However, it is not *inconceivable* that the HTML representation could be
fully removed (or relegated to legacy/default-disabled flows) in the future.
This PEP does not preclude such a future, but does not propose it either.

The Python packaging community has made several valuable observations
about behaviors that make outright removal of the HTML representation
difficult or infeasible, including:

- By virtue of being the default, the HTML representation is extremely
  easy to adopt internally. It does not require any (explicit) content
  negotiation, and can often be served trivially by a CDN or a minimal
  HTTP server (like ``python -m http.server``).

  The JSON representation does not technically require content negotiation
  either. In practice, however, clients that consume it expect to perform
  explicit content negotiation, because they assume that the same URL
  provides both representations. Consequently, any future efforts to remove the
  HTML representation will likely require a simpler adoption story for the JSON
  representation.

- The HTML representation is currently easier for installers like ``pip``
  to parse incrementally, as the Python standard library includes
  ``html.parser`` for incremental HTML parsing. This helps mitigate
  the memory overhead of large HTML index responses, such as detail responses
  for packages that have hundreds or thousands of distributions.

  By contrast, Python's standard library currently lacks an incremental
  JSON parser. Incremental JSON parsing is not impractical (and is strictly
  less complex than incremental HTML parsing), but the absence of a
  standard library solution presents an adoption barrier.
  Future efforts to remove the HTML representation will likely require a robust
  standard library (or acceptably vendorable third-party) solution for
  incremental JSON parsing within ``pip``.
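
For illustration, the following minimal sketch shows the incremental parsing that ``html.parser`` allows. ``HTMLParser.feed()`` accepts arbitrary fragments of a document and buffers any incomplete tag until a later call completes it. A client can therefore process a detail response as it arrives, rather than buffering the whole body first. This is not ``pip``'s implementation. ``FileLinkParser``, ``parse_incrementally``, and the ``fake_stream`` generator (a stand-in for a streamed HTTP response body) are hypothetical. A real client would also need to handle decoding, relative URLs, and the ``data-`` attributes defined by later PEPs.

.. code-block:: python

    from html.parser import HTMLParser
    from typing import Iterable, Iterator, List


    class FileLinkParser(HTMLParser):
        """Collect the ``href`` of every anchor on a detail page."""

        def __init__(self) -> None:
            super().__init__()
            self.hrefs: List[str] = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                href = dict(attrs).get("href")
                if href:
                    self.hrefs.append(href)


    def parse_incrementally(chunks: Iterable[str]) -> List[str]:
        parser = FileLinkParser()
        for chunk in chunks:
            # A chunk may end mid-tag; HTMLParser keeps the incomplete tail
            # and finishes parsing it when the next feed() call supplies the rest.
            parser.feed(chunk)
        parser.close()
        return parser.hrefs


    def fake_stream(body: str, size: int) -> Iterator[str]:
        """Hypothetical stand-in for a streamed HTTP body, in fixed-size chunks."""
        for start in range(0, len(body), size):
            yield body[start:start + size]


    page = (
        "<html><body>"
        '<a href="https://files.example.com/example-1.0.tar.gz">example-1.0.tar.gz</a>'
        '<a href="https://files.example.com/example-1.0-py3-none-any.whl">'
        "example-1.0-py3-none-any.whl</a>"
        "</body></html>"
    )
    print(parse_incrementally(fake_stream(page, 16)))
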

Security Implications
=====================

This PEP does not identify any positive or negative security implications
associated with freezing the HTML representation of the simple repository
API.

How to Teach This
=================

Because this PEP only freezes the HTML representation of the simple repository
API for the purposes of Python packaging standards processes, its end-user
implications are limited.

However, to communicate the freeze (particularly to third-party indices that
wish to modernize their index representations), this PEP proposes the
following if accepted:

- The authors of this PEP will coordinate with the maintainers
  of PyPI on appropriate public-facing documentation and communication,
  including an announcement on the `PyPI blog <https://blog.pypi.org>`_
  if deemed appropriate.

- The authors of this PEP will make appropriate changes to the
  :ref:`living standard <packaging:simple-repository-api>` for the simple
  repository API, including admonitions and callouts where appropriate
  to indicate that the HTML representation will not receive future updates.

Rejected Ideas
==============

Doing nothing
-------------

Doing nothing is always an option. As described above, this would continue
the status quo, in which the HTML representation is updated on paper
(and on PyPI) but is frozen in practice in third-party settings.

The authors of this PEP believe that being explicit about the status
of the HTML representation is valuable, and would benefit future standards
efforts by diverting design effort away from shoehorning new features
into the HTML representation.

Aggressively removing the HTML representation
---------------------------------------------

Encouraging indices and installers to aggressively remove support for the HTML
representation is another option. However, as noted above, this is unrealistic
in the near term, and would be disruptive to the ecosystem.

The authors of this PEP believe that freezing is a more gradual and
pragmatic approach that better reflects the ecosystem's reality.

Copyright
=========

This document is placed in the public domain or under the CC0-1.0-Universal
license, whichever is more permissive.