Commit 9b79bf0

Remove more Python 2 language from docs
Change-Id: Icdddf85b3ddf5d3e7172e318c9e75b3c9a857314
1 parent 87c102c commit 9b79bf0

3 files changed

Lines changed: 31 additions & 118 deletions

doc/build/filtering.rst

Lines changed: 14 additions & 8 deletions
Original file line number | Diff line number | Diff line change
@@ -34,9 +34,15 @@ The built-in escape flags are:
3434
* ``trim`` : whitespace trimming, provided by ``string.strip()``
3535
* ``entity`` : produces HTML entity references for applicable
3636
strings, derived from ``htmlentitydefs``
37-
* ``unicode`` (``str`` on Python 3): produces a Python unicode
37+
* ``str`` : produces a Python unicode
3838
string (this function is applied by default)
39-
* ``decode.<some encoding>``: decode input into a Python
39+
* ``unicode`` : aliased to ``str`` above
40+
41+
.. versionchanged:: 1.2.0
42+
Prior versions applied the ``unicode`` built-in when running in Python 2;
43+
in 1.2.0 Mako applies the Python 3 ``str`` built-in.
44+
45+
* ``decode.<some encoding>`` : decode input into a Python
4046
unicode with the specified encoding
4147
* ``n`` : disable all default filtering; only filters specified
4248
in the local expression tag will be applied.
@@ -101,13 +107,13 @@ In addition to the ``expression_filter`` argument, the
101107
:class:`.TemplateLookup` can specify filtering for all expression tags
102108
at the programmatic level. This array-based argument, when given
103109
its default argument of ``None``, will be internally set to
104-
``["unicode"]`` (or ``["str"]`` on Python 3):
110+
``["str"]``:
105111

106112
.. sourcecode:: python
107113

108-
t = TemplateLookup(directories=['/tmp'], default_filters=['unicode'])
114+
t = TemplateLookup(directories=['/tmp'], default_filters=['str'])
109115

110-
To replace the usual ``unicode``/``str`` function with a
116+
To replace the usual ``str`` function with a
111117
specific encoding, the ``decode`` filter can be substituted:
112118

113119
.. sourcecode:: python
@@ -128,7 +134,7 @@ applied first.
128134

129135
.. sourcecode:: python
130136

131-
t = Template(templatetext, default_filters=['unicode', 'myfilter'])
137+
t = Template(templatetext, default_filters=['str', 'myfilter'])
132138

133139
To ease the usage of ``default_filters`` with custom filters,
134140
you can also add imports (or other code) to all templates using
@@ -137,7 +143,7 @@ the ``imports`` argument:
137143
.. sourcecode:: python
138144

139145
t = TemplateLookup(directories=['/tmp'],
140-
default_filters=['unicode', 'myfilter'],
146+
default_filters=['str', 'myfilter'],
141147
imports=['from mypackage import myfilter'])
142148

143149
The above will generate templates something like this:
@@ -148,7 +154,7 @@ The above will generate templates something like this:
148154
from mypackage import myfilter
149155

150156
def render_body(context):
151-
context.write(myfilter(unicode("some text")))
157+
context.write(myfilter(str("some text")))
152158

153159
.. _expression_filtering_nfilter:
154160
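The generated-code shape in the last hunk, ``myfilter(str("some text"))``, follows from applying the entries of ``default_filters`` left to right, so the first entry ends up innermost. A minimal pure-Python sketch of that ordering, without importing Mako; ``apply_filters`` and this ``myfilter`` are hypothetical names used for illustration, not part of Mako's API:

```python
from functools import reduce


def myfilter(text):
    """Hypothetical custom filter: wrap the text in square brackets."""
    return "[" + text + "]"


def apply_filters(value, filters):
    """Apply each filter in list order, so the first filter is the innermost call."""
    return reduce(lambda acc, f: f(acc), filters, value)


# Equivalent to myfilter(str(42)), mirroring default_filters=['str', 'myfilter'].
result = apply_filters(42, [str, myfilter])
print(result)
```

Because ``str`` runs first, ``myfilter`` can rely on always receiving text, even when the expression evaluates to an integer or another non-string object.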

doc/build/unicode.rst

Lines changed: 13 additions & 110 deletions
Original file line number | Diff line number | Diff line change
@@ -4,74 +4,12 @@
44
The Unicode Chapter
55
===================
66

7-
.. note:: this chapter was written many years ago and is very Python-2
8-
centric. As of Mako 1.1.3, the default template encoding is ``utf-8``.
9-
10-
The Python language supports two ways of representing what we
11-
know as "strings", i.e. series of characters. In Python 2, the
12-
two types are ``string`` and ``unicode``, and in Python 3 they are
13-
``bytes`` and ``string``. A key aspect of the Python 2 ``string`` and
14-
Python 3 ``bytes`` types are that they contain no information
15-
regarding what **encoding** the data is stored in. For this
16-
reason they were commonly referred to as **byte strings** on
17-
Python 2, and Python 3 makes this name more explicit. The
18-
origins of this come from Python's background of being developed
19-
before the Unicode standard was even available, back when
20-
strings were C-style strings and were just that, a series of
21-
bytes. Strings that had only values below 128 just happened to
22-
be **ASCII** strings and were printable on the console, whereas
23-
strings with values above 128 would produce all kinds of
24-
graphical characters and bells.
25-
26-
Contrast the "byte-string" type with the "unicode/string" type.
27-
Objects of this latter type are created whenever you say something like
28-
``u"hello world"`` (or in Python 3, just ``"hello world"``). In this
29-
case, Python represents each character in the string internally
30-
using multiple bytes per character (something similar to
31-
UTF-16). What's important is that when using the
32-
``unicode``/``string`` type to store strings, Python knows the
33-
data's encoding; it's in its own internal format. Whereas when
34-
using the ``string``/``bytes`` type, it does not.
35-
36-
When Python 2 attempts to treat a byte-string as a string, which
37-
means it's attempting to compare/parse its characters, to coerce
38-
it into another encoding, or to decode it to a unicode object,
39-
it has to guess what the encoding is. In this case, it will
40-
pretty much always guess the encoding as ``ascii``... and if the
41-
byte-string contains bytes above value 128, you'll get an error.
42-
Python 3 eliminates much of this confusion by just raising an
43-
error unconditionally if a byte-string is used in a
44-
character-aware context.
45-
46-
There is one operation that Python *can* do with a non-ASCII
47-
byte-string, and it's a great source of confusion: it can dump the
48-
byte-string straight out to a stream or a file, with nary a care
49-
what the encoding is. To Python, this is pretty much like
50-
dumping any other kind of binary data (like an image) to a
51-
stream somewhere. In Python 2, it is common to see programs that
52-
embed all kinds of international characters and encodings into
53-
plain byte-strings (i.e. using ``"hello world"`` style literals)
54-
can fly right through their run, sending reams of strings out to
55-
wherever they are going, and the programmer, seeing the same
56-
output as was expressed in the input, is now under the illusion
57-
that his or her program is Unicode-compliant. In fact, their
58-
program has no unicode awareness whatsoever, and similarly has
59-
no ability to interact with libraries that *are* unicode aware.
60-
Python 3 makes this much less likely by defaulting to unicode as
61-
the storage format for strings.
62-
63-
The "pass through encoded data" scheme is what template
64-
languages like Cheetah and earlier versions of Myghty do by
65-
default. In Python 3 Mako only allows
66-
usage of native, unicode strings.
67-
687
In normal Mako operation, all parsed template constructs and
69-
output streams are handled internally as Python ``unicode``
70-
objects. It's only at the point of :meth:`~.Template.render` that this unicode
71-
stream may be rendered into whatever the desired output encoding
8+
output streams are handled internally as Python 3 ``str`` (Unicode)
9+
objects. It's only at the point of :meth:`~.Template.render` that this stream of Unicode objects may be rendered into whatever the desired output encoding
7210
is. The implication here is that the template developer must
7311
ensure that :ref:`the encoding of all non-ASCII templates is explicit
74-
<set_template_file_encoding>` (still required in Python 3),
12+
<set_template_file_encoding>` (still required in Python 3, although Mako defaults to ``utf-8``),
7513
that :ref:`all non-ASCII-encoded expressions are in one way or another
7614
converted to unicode <handling_non_ascii_expressions>`
7715
(not much of a burden in Python 3), and that :ref:`the output stream of the
@@ -127,61 +65,44 @@ this:
12765

12866
looks something like this:
12967

130-
.. sourcecode:: python
131-
132-
context.write(unicode("hello world"))
133-
134-
In Python 3, it's just:
135-
13668
.. sourcecode:: python
13769

13870
context.write(str("hello world"))
13971

14072
That is, **the output of all expressions is run through the
141-
``unicode`` built-in**. This is the default setting, and can be
142-
modified to expect various encodings. The ``unicode`` step serves
73+
``str`` built-in**. This is the default setting, and can be
74+
modified to expect various encodings. The ``str`` step serves
14375
both the purpose of rendering non-string expressions into
14476
strings (such as integers or objects which contain ``__str__()``
14577
methods), and to ensure that the final output stream is
146-
constructed as a unicode object. The main implication of this is
78+
constructed as a Unicode object. The main implication of this is
14779
that **any raw byte-strings that contain an encoding other than
148-
ASCII must first be decoded to a Python unicode object**. It
149-
means you can't say this in Python 2:
150-
151-
.. sourcecode:: mako
152-
153-
${"voix m’a réveillé."} ## error in Python 2!
154-
155-
You must instead say this:
156-
157-
.. sourcecode:: mako
158-
159-
${u"voix m’a réveillé."} ## OK !
80+
ASCII must first be decoded to a Python unicode object**.
16081

16182
Similarly, if you are reading data from a file that is streaming
16283
bytes, or returning data from some object that is returning a
16384
Python byte-string containing a non-ASCII encoding, you have to
164-
explicitly decode to unicode first, such as:
85+
explicitly decode to Unicode first, such as:
16586

16687
.. sourcecode:: mako
16788

16889
${call_my_object().decode('utf-8')}
16990

17091
Note that filehandles acquired by ``open()`` in Python 3 default
171-
to returning "text", that is the decoding is done for you. See
92+
to returning "text": that is, the decoding is done for you. See
17293
Python 3's documentation for the ``open()`` built-in for details on
17394
this.
17495

17596
If you want a certain encoding applied to *all* expressions,
176-
override the ``unicode`` builtin with the ``decode`` built-in at the
97+
override the ``str`` builtin with the ``decode`` built-in at the
17798
:class:`.Template` or :class:`.TemplateLookup` level:
17899

179100
.. sourcecode:: python
180101

181102
t = Template(templatetext, default_filters=['decode.utf8'])
182103

183104
Note that the built-in ``decode`` object is slower than the
184-
``unicode`` function, since unlike ``unicode`` it's not a Python
105+
``str`` function, since unlike ``str`` it's not a Python
185106
built-in, and it also checks the type of the incoming data to
186107
determine if string conversion is needed first.
187108

@@ -194,7 +115,7 @@ in :ref:`filtering_default_filters`.
194115
Defining Output Encoding
195116
========================
196117

197-
Now that we have a template which produces a pure unicode output
118+
Now that we have a template which produces a pure Unicode output
198119
stream, all the hard work is done. We can take the output and do
199120
anything with it.
200121

@@ -218,7 +139,7 @@ encoding is specified. By default it performs no encoding and
218139
returns a native string.
219140

220141
:meth:`~.Template.render_unicode` will return the template output as a Python
221-
``unicode`` object (or ``string`` in Python 3):
142+
``str`` object:
222143

223144
.. sourcecode:: python
224145

@@ -230,21 +151,3 @@ you can encode yourself by saying:
230151
.. sourcecode:: python
231152

232153
print(mytemplate.render_unicode().encode('utf-8', 'replace'))
233-
234-
Buffer Selection
235-
----------------
236-
237-
Mako does play some games with the style of buffering used
238-
internally, to maximize performance. Since the buffer is by far
239-
the most heavily used object in a render operation, it's
240-
important!
241-
242-
When calling :meth:`~.Template.render` on a template that does not specify any
243-
output encoding (i.e. it's ``ascii``), Python's ``cStringIO`` module,
244-
which cannot handle encoding of non-ASCII ``unicode`` objects
245-
(even though it can send raw byte-strings through), is used for
246-
buffering. Otherwise, a custom Mako class called
247-
``FastEncodingBuffer`` is used, which essentially is a super
248-
dumbed-down version of ``StringIO`` that gathers all strings into
249-
a list and uses ``''.join(elements)`` to produce the final output
250-
-- it's markedly faster than ``StringIO``.
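The rule kept by this diff, that byte-strings must be decoded before they reach the ``str`` filter, matters in Python 3 because ``str()`` on a ``bytes`` object does not decode it; under default interpreter settings it returns the ``repr`` of the bytes instead. A short sketch using only built-ins (the sample text is an assumption chosen for illustration):

```python
# Bytes as they might arrive from a socket or a file opened in binary mode.
raw = "r\u00e9veill\u00e9".encode("utf-8")

# str() does not decode: the result is text that starts with b' and
# contains escaped byte values, which is rarely what a template should emit.
as_repr = str(raw)

# Explicit decoding recovers the intended text.
decoded = raw.decode("utf-8")

# Encoding on output, as in render_unicode().encode(..., 'replace'):
# characters the target encoding cannot represent become '?'.
encoded = decoded.encode("ascii", "replace")

print(as_repr)
print(decoded)
print(encoded)
```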
Lines changed: 4 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,4 @@
1+
.. change::
2+
:tags: py3k
3+
4+
With the removal of Python 2's ``cStringIO``, Mako now uses its own internal ``FastEncodingBuffer`` exclusively.
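The removed "Buffer Selection" text describes ``FastEncodingBuffer`` as gathering strings into a list and producing output with ``''.join(elements)``. The sketch below illustrates that list-and-join technique only; ``ListJoinBuffer`` is a hypothetical class written for this example and is not Mako's source, and its constructor arguments are assumptions:

```python
class ListJoinBuffer:
    """Hypothetical write buffer: collect text chunks, join once at the end."""

    def __init__(self, encoding=None, errors="strict"):
        self.parts = []
        self.encoding = encoding
        self.errors = errors

    def write(self, text):
        # Appending to a list avoids building a new string on every write.
        self.parts.append(text)

    def getvalue(self):
        joined = "".join(self.parts)
        # Encode only if an output encoding was requested; otherwise return str.
        if self.encoding:
            return joined.encode(self.encoding, self.errors)
        return joined


buf = ListJoinBuffer(encoding="utf-8")
buf.write("voix m\u2019a ")
buf.write("r\u00e9veill\u00e9.")
data = buf.getvalue()
print(data)
```

Joining once at the end keeps each ``write`` call cheap, which is the property the removed paragraph credits for the speed advantage over ``StringIO``.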
