@@ -1080,34 +1080,19 @@ Functions
10801080
10811081 Return the string obtained by replacing the leftmost non-overlapping occurrences
10821082 of *pattern* in *string* by the replacement *repl*. If the pattern isn't found,
1083- *string* is returned unchanged. *repl* can be a string or a function; if it is
1084- a string, any backslash escapes in it are processed. That is, ``\n`` is
1085- converted to a single newline character, ``\r`` is converted to a carriage return, and
1086- so forth. Unknown escapes of ASCII letters are reserved for future use and
1087- treated as errors. Other unknown escapes such as ``\&`` are left alone.
1088- Backreferences, such
1089- as ``\6``, are replaced with the substring matched by group 6 in the pattern.
1090- For example::
1091-
1092- >>> re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):',
1093- ... r'static PyObject*\npy_\1(void)\n{',
1094- ... 'def myfunc():')
1095- 'static PyObject*\npy_myfunc(void)\n{'
1096-
1097- If *repl* is a function, it is called for every non-overlapping occurrence of
1098- *pattern*. The function takes a single :class:`~re.Match` argument, and returns
1099- the replacement string. For example::
1083+ *string* is returned unchanged.
1084+ The pattern may be a string or a :class:`~re.Pattern`.
1085+ A string pattern's behaviour may be modified by specifying a *flags* value,
1086+ which can be any of the `flags`_ variables, combined using bitwise OR
1087+ (the ``|`` operator).
11001088
1101- >>> def dashrepl(matchobj):
1102- ... if matchobj.group(0) == '-': return ' '
1103- ... else: return '-'
1104- ...
1105- >>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
1106- 'pro--gram files'
1107- >>> re.sub(r'\sAND\s', ' & ', 'Baked Beans And Spam', flags=re.IGNORECASE)
1108- 'Baked Beans & Spam'
1089+ >>> re.sub(r'(and)', r'*\1*', 'Contraband Andalusian Beans AND Spam',
1090+ ...        flags=re.IGNORECASE)
1091+ 'Contrab*and* *And*alusian Beans *AND* Spam'
11091092
1110- The pattern may be a string or a :class:`~re.Pattern`.
1093+ >>> pattern = re.compile(r'(and)', flags=re.IGNORECASE)
1094+ >>> re.sub(pattern, r'*\1*', 'Contraband Andalusian Beans AND Spam')
1095+ 'Contrab*and* *And*alusian Beans *AND* Spam'
11111096
11121097 The optional argument *count* is the maximum number of pattern occurrences to be
11131098 replaced; *count* must be a non-negative integer. If omitted or zero, all
@@ -1118,21 +1103,51 @@ Functions
11181103 As a result, ``sub('x*', '-', 'abxd')`` returns ``'-a-b--d-'``
11191104 instead of ``'-a-b-d-'``.
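
The empty-match behaviour described above can be checked with a minimal sketch. The variable names are illustrative only, and the result for the `count` call assumes that each empty match counts as one replacement:

```python
import re

# 'x*' matches the empty string at every position where no 'x' is present,
# and an empty match is also allowed right after the non-empty match 'x'.
result = re.sub('x*', '-', 'abxd')
print(result)   # -a-b--d-

# count limits the number of replacements; empty matches count as well,
# so only the first two (both empty) matches are replaced here.
limited = re.sub('x*', '-', 'abxd', count=2)
print(limited)  # -a-bxd
```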
11201105
1121- .. index:: single: \g; in regular expressions
1122-
1123- In string-type *repl* arguments, in addition to the character escapes and
1124- backreferences described above,
1125- ``\g<name>`` will use the substring matched by the group named ``name``, as
1126- defined by the ``(?P<name>...)`` syntax. ``\g<number>`` uses the corresponding
1127- group number; ``\g<2>`` is therefore equivalent to ``\2``, but isn't ambiguous
1128- in a replacement such as ``\g<2>0``. ``\20`` would be interpreted as a
1129- reference to group 20, not a reference to group 2 followed by the literal
1130- character ``'0'``. The backreference ``\g<0>`` substitutes in the entire
1131- substring matched by the RE.
1132-
1133- The expression's behaviour can be modified by specifying a *flags* value.
1134- Values can be any of the `flags`_ variables, combined using bitwise OR
1135- (the ``|`` operator).
1106+ *repl* can be a string template or a function:
1107+
1108+ * If it is callable, it is called for every non-overlapping occurrence of
1109+ *pattern*. The function takes a single :class:`~re.Match` argument, and
1110+ returns the replacement string. For example::
1111+
1112+ >>> def dashrepl(matchobj):
1113+ ... if matchobj.group(0) == '-': return ' '
1114+ ... else: return '-'
1115+ ...
1116+ >>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
1117+ 'pro--gram files'
1118+
1119+ * If *repl* is a string, it is processed as a template based on backslash escapes:
1120+
1121+ .. index:: single: \g; in regular expressions
1122+
1123+ - ``\1`` .. ``\99`` are replaced by the substring matched by the corresponding
1124+ ``(...)`` groups in the pattern.
1125+ - However, other ``\numbers`` are interpreted as *octal* character literals.
1126+ - ``\g<name>`` is replaced by the substring matched by the group named
1127+ ``name``, as defined by the ``(?P<name>...)`` syntax.
1128+ - ``\g<number>`` is another way to refer to numbered groups.
1129+ ``\g<2>0`` inserts group 2 followed by the literal character ``'0'``,
1130+ whereas ``\20`` can only express a reference to group 20. ``\g<100>`` etc.
1131+ can refer to groups higher than 99, and the backreference ``\g<0>``
1132+ substitutes in the entire substring matched by the RE.
1133+ - ``\\`` is converted to a single backslash.
1134+ - The basic escapes ``\n\r\t\v\f\a\b`` work as in Python string literals.
1135+ That is, ``\n`` is converted to a single newline character, and so forth.
1136+ - Unknown escapes of ASCII letters are reserved for future use and
1137+ treated as errors. This includes ``\x..``, ``\u...``, ``\U...`` and
1138+ ``\N{...}``, which are not presently supported.
1139+ - Other unknown escapes such as ``\&`` are left alone.
1140+
1141+ For example::
1142+
1143+ >>> re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):',
1144+ ... r'static PyObject*\npy_\1(void)\n{',
1145+ ... 'def myfunc():')
1146+ 'static PyObject*\npy_myfunc(void)\n{'
1147+
1148+ (Note the use of raw string notation for *repl* as well. Otherwise you would have
1149+ to write ``'\\1'`` so that Python parses it into ``\1``, which is then replaced by
1150+ ``myfunc`` at substitution time.)
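
The template escapes listed above can be exercised with a short sketch. The sample strings and variable names are illustrative and not taken from the documentation; the last case assumes a Python version in which unknown ASCII-letter escapes in *repl* raise `re.error`:

```python
import re

# \g<name> inserts the text matched by the named group (?P<name>...).
swapped = re.sub(r'(?P<first>\w+) (?P<second>\w+)',
                 r'\g<second> \g<first>', 'hello world')
print(swapped)    # world hello

# \g<1>0 is group 1 followed by a literal '0'; \10 would refer to group 10.
padded = re.sub(r'(\d)', r'\g<1>0', 'a1b2')
print(padded)     # a10b20

# \g<0> inserts the entire match.
bracketed = re.sub(r'\d+', r'[\g<0>]', 'room 101')
print(bracketed)  # room [101]

# \\ in the template produces a single backslash in the result.
slashed = re.sub(r'/', r'\\', 'a/b')
print(slashed)    # a\b

# Unknown non-letter escapes such as \& are left alone (backslash included).
kept = re.sub(r'x', r'\&', 'x')
print(kept)       # \&

# Unknown escapes of ASCII letters, such as \q, are rejected.
unknown_escape_error = None
try:
    re.sub(r'x', r'\q', 'x')
except re.error as exc:
    unknown_escape_error = str(exc)
print(unknown_escape_error)
```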
11361151
11371152 .. versionchanged:: 3.1
11381153 Added the optional flags argument.