.. note:: this chapter was written many years ago and is very Python-2
8
-
centric. As of Mako 1.1.3, the default template encoding is ``utf-8``.
9
-
10
-
The Python language supports two ways of representing what we
11
-
know as "strings", i.e. series of characters. In Python 2, the
12
-
two types are ``string`` and ``unicode``, and in Python 3 they are
13
-
``bytes`` and ``string``. A key aspect of the Python 2 ``string`` and
14
-
Python 3 ``bytes`` types is that they contain no information
15
-
regarding what **encoding** the data is stored in. For this
16
-
reason they were commonly referred to as **byte strings** on
17
-
Python 2, and Python 3 makes this name more explicit. The
18
-
origins of this come from Python's background of being developed
19
-
before the Unicode standard was even available, back when
20
-
strings were C-style strings and were just that, a series of
21
-
bytes. Strings that had only values below 128 just happened to
22
-
be **ASCII** strings and were printable on the console, whereas
23
-
strings with values above 128 would produce all kinds of
24
-
graphical characters and bells.
25
-
26
-
Contrast the "byte-string" type with the "unicode/string" type.
27
-
Objects of this latter type are created whenever you say something like
28
-
``u"hello world"`` (or in Python 3, just ``"hello world"``). In this
29
-
case, Python represents each character in the string internally
30
-
using multiple bytes per character (something similar to
31
-
UTF-16). What's important is that when using the
32
-
``unicode``/``string`` type to store strings, Python knows the
33
-
data's encoding; it's in its own internal format. Whereas when
34
-
using the ``string``/``bytes`` type, it does not.
35
-
36
-
When Python 2 attempts to treat a byte-string as a string, which
37
-
means it's attempting to compare/parse its characters, to coerce
38
-
it into another encoding, or to decode it to a unicode object,
39
-
it has to guess what the encoding is. In this case, it will
40
-
pretty much always guess the encoding as ``ascii``... and if the
41
-
byte-string contains bytes above value 128, you'll get an error.
42
-
Python 3 eliminates much of this confusion by just raising an
43
-
error unconditionally if a byte-string is used in a
44
-
character-aware context.
45
-
46
-
There is one operation that Python *can* do with a non-ASCII
47
-
byte-string, and it's a great source of confusion: it can dump the
48
-
byte-string straight out to a stream or a file, with nary a care
49
-
what the encoding is. To Python, this is pretty much like
50
-
dumping any other kind of binary data (like an image) to a
51
-
stream somewhere. In Python 2, it is common to see programs that
52
-
embed all kinds of international characters and encodings into
53
-
plain byte-strings (i.e. using ``"hello world"`` style literals)
54
-
can fly right through their run, sending reams of strings out to
55
-
wherever they are going, and the programmer, seeing the same
56
-
output as was expressed in the input, is now under the illusion
57
-
that his or her program is Unicode-compliant. In fact, their
58
-
program has no unicode awareness whatsoever, and similarly has
59
-
no ability to interact with libraries that *are* unicode aware.
60
-
Python 3 makes this much less likely by defaulting to unicode as
61
-
the storage format for strings.
62
-
63
-
The "pass through encoded data" scheme is what template
64
-
languages like Cheetah and earlier versions of Myghty do by
65
-
default. In Python 3 Mako only allows
66
-
usage of native, unicode strings.
67
-
68
7
In normal Mako operation, all parsed template constructs and
69
-
output streams are handled internally as Python ``unicode``
70
-
objects. It's only at the point of :meth:`~.Template.render` that this unicode
71
-
stream may be rendered into whatever the desired output encoding
8
+
output streams are handled internally as Python 3 ``str`` (Unicode)
9
+
objects. It's only at the point of :meth:`~.Template.render` that this stream of Unicode objects may be rendered into whatever the desired output encoding
72
10
is. The implication here is that the template developer must
73
11
ensure that :ref:`the encoding of all non-ASCII templates is explicit
74
-
<set_template_file_encoding>` (still required in Python 3),
12
+
<set_template_file_encoding>` (still required in Python 3, although Mako defaults to ``utf-8``),
75
13
that :ref:`all non-ASCII-encoded expressions are in one way or another
76
14
converted to unicode <handling_non_ascii_expressions>`
77
15
(not much of a burden in Python 3), and that :ref:`the output stream of the
@@ -127,61 +65,44 @@ this:
127
65
128
66
looks something like this:
129
67
130
-
.. sourcecode:: python
131
-
132
-
context.write(unicode("hello world"))
133
-
134
-
In Python 3, it's just:
135
-
136
68
.. sourcecode:: python
137
69
138
70
context.write(str("hello world"))
139
71
140
72
That is, **the output of all expressions is run through the
141
-
``unicode`` built-in**. This is the default setting, and can be
142
-
modified to expect various encodings. The ``unicode`` step serves
73
+
``str`` built-in**. This is the default setting, and can be
74
+
modified to expect various encodings. The ``str`` step serves
143
75
both the purpose of rendering non-string expressions into
144
76
strings (such as integers or objects which contain ``__str__()``
145
77
methods), and to ensure that the final output stream is
146
-
constructed as a unicode object. The main implication of this is
78
+
constructed as a Unicode object. The main implication of this is
147
79
that **any raw byte-strings that contain an encoding other than
148
-
ASCII must first be decoded to a Python unicode object**. It
149
-
means you can't say this in Python 2:
150
-
151
-
.. sourcecode:: mako
152
-
153
-
${"voix m’a réveillé."} ## error in Python 2!
154
-
155
-
You must instead say this:
156
-
157
-
.. sourcecode:: mako
158
-
159
-
${u"voix m’a réveillé."} ## OK !
80
+
ASCII must first be decoded to a Python Unicode object**.
160
81
161
82
Similarly, if you are reading data from a file that is streaming
162
83
bytes, or returning data from some object that is returning a
163
84
Python byte-string containing a non-ASCII encoding, you have to
164
-
explicitly decode to unicode first, such as:
85
+
explicitly decode to Unicode first, such as:
165
86
166
87
.. sourcecode:: mako
167
88
168
89
${call_my_object().decode('utf-8')}
169
90
170
91
Note that filehandles acquired by ``open()`` in Python 3 default
171
-
to returning "text", that is the decoding is done for you. See
92
+
to returning "text": that is, the decoding is done for you. See
172
93
Python 3's documentation for the ``open()`` built-in for details on
173
94
this.
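
The difference between passing bytes through ``str`` and decoding them explicitly can be seen in plain Python 3, without Mako. The following is a minimal sketch; the sample text and the temporary file name ``sample.txt`` are chosen only for illustration:

```python
import os
import tempfile

# A UTF-8 encoded byte-string, as might come from a socket or a binary file.
raw = "voix m’a réveillé.".encode("utf-8")

# str() does not decode bytes; it returns the repr of the bytes object.
as_repr = str(raw)

# An explicit decode produces the intended text.
as_text = raw.decode("utf-8")

# open() in text mode performs the decoding while reading.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as f:
        f.write(raw)
    with open(path, encoding="utf-8") as f:
        from_file = f.read()

print(as_repr.startswith("b'"))
print(as_text == from_file)
```

Passing ``encoding`` to ``open()`` explicitly avoids depending on the platform's default text encoding.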
174
95
175
96
If you want a certain encoding applied to *all* expressions,
176
-
override the ``unicode`` builtin with the ``decode`` built-in at the
97
+
override the ``str`` built-in with the ``decode`` built-in at the
177
98
:class:`.Template` or :class:`.TemplateLookup` level:
178
99
179
100
.. sourcecode:: python
180
101
181
102
t = Template(templatetext, default_filters=['decode.utf8'])
182
103
183
104
Note that the built-in ``decode`` object is slower than the
184
-
``unicode`` function, since unlike ``unicode`` it's not a Python
105
+
``str`` function, since unlike ``str`` it's not a Python
185
106
built-in, and it also checks the type of the incoming data to
186
107
determine if string conversion is needed first.
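
The type check described above can be pictured roughly as follows. This is only a sketch of the idea, not Mako's actual implementation, and the function name ``decode_utf8`` is hypothetical:

```python
def decode_utf8(value):
    """Hypothetical stand-in for a ``decode.utf8``-style filter."""
    # Text is already Unicode, so return it unchanged.
    if isinstance(value, str):
        return value
    # Byte-strings are decoded with the configured encoding.
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode("utf-8")
    # Anything else (integers, arbitrary objects) is converted with str().
    return str(value)


print(decode_utf8("réveillé".encode("utf-8")))
print(decode_utf8(42))
```

The extra ``isinstance`` checks are the kind of per-expression work that makes such a filter slower than calling ``str`` directly.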
187
108
@@ -194,7 +115,7 @@ in :ref:`filtering_default_filters`.
194
115
Defining Output Encoding
195
116
========================
196
117
197
-
Now that we have a template which produces a pure unicode output
118
+
Now that we have a template which produces a pure Unicode output
198
119
stream, all the hard work is done. We can take the output and do
199
120
anything with it.
200
121
@@ -218,7 +139,7 @@ encoding is specified. By default it performs no encoding and
218
139
returns a native string.
219
140
220
141
:meth:`~.Template.render_unicode` will return the template output as a Python
221
-
``unicode`` object (or ``string`` in Python 3):
142
+
``str`` object:
222
143
223
144
.. sourcecode:: python
224
145
@@ -230,21 +151,3 @@ you can encode yourself by saying: