mirror of
https://github.com/fofolee/uTools-Manuals.git
synced 2025-06-08 06:55:36 +08:00
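<div class="body" role="main"><div class="section" id="module-re"><h1><span class="yiyi-st" id="yiyi-10">6.2. <a class="reference internal" href="#module-re" title="re: Regular expression operations."><code class="xref py py-mod docutils literal"><span class="pre">re</span></code></a>: Regular expression operations</span></h1><p><span class="yiyi-st" id="yiyi-11"><strong>Source code:</strong> <a class="reference external" href="https://hg.python.org/cpython/file/3.5/Lib/re.py">Lib/re.py</a></span></p><p><span class="yiyi-st" id="yiyi-12">This module provides regular expression matching operations similar to those found in Perl.</span></p><p><span class="yiyi-st" id="yiyi-13">Both patterns and strings to be searched can be Unicode strings as well as 8-bit strings. </span><span class="yiyi-st" id="yiyi-14">However, Unicode strings and 8-bit strings cannot be mixed: you cannot match a Unicode string with a byte pattern or vice versa; similarly, when asking for a substitution, the replacement string must be of the same type as both the pattern and the search string.</span></p><p><span class="yiyi-st" id="yiyi-15">Regular expressions use the backslash character (<code class="docutils literal"><span class="pre">'\'</span></code>) to indicate special forms or to allow special characters to be used without invoking their special meaning. </span><span class="yiyi-st" id="yiyi-16">This collides with Python's usage of the same character for the same purpose in string literals; for example, to match a literal backslash, one might have to write <code class="docutils literal"><span class="pre">'\\\\'</span></code> as the pattern string, because the regular expression must be <code class="docutils literal"><span class="pre">\\</span></code>, and each backslash must be expressed as <code class="docutils literal"><span class="pre">\\</span></code> inside a regular Python string literal.</span></p><p><span class="yiyi-st" id="yiyi-17">The solution to this tedium is to use Python's raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with <code class="docutils literal"><span class="pre">'r'</span></code>. </span><span class="yiyi-st" id="yiyi-18">So <code class="docutils literal"><span class="pre">r"\n"</span></code> is a two-character string containing <code class="docutils literal"><span class="pre">'\'</span></code> and <code class="docutils literal"><span class="pre">'n'</span></code>, while <code class="docutils literal"><span class="pre">"\n"</span></code> is a one-character string containing a newline. </span><span class="yiyi-st" id="yiyi-19">Usually patterns will be expressed in Python code using this raw string notation.</span></p><p><span class="yiyi-st" id="yiyi-20">It is important to note that most regular expression operations are available both as module-level functions and as methods on <a class="reference internal" 
href="#re-objects"><span>compiled regular expressions</span></a>. </span><span class="yiyi-st" id="yiyi-21">The functions are shortcuts that don't require you to compile a regex object first, but miss some fine-tuning parameters.</span></p><div class="section" id="regular-expression-syntax"><h2><span class="yiyi-st" id="yiyi-22">6.2.1. </span><span class="yiyi-st" id="yiyi-23">Regular Expression Syntax</span></h2><p><span class="yiyi-st" id="yiyi-24">A regular expression (or RE) specifies a set of strings that matches it; the functions in this module let you check if a particular string matches a given regular expression (or if a given regular expression matches a particular string, which comes down to the same thing).</span></p><p><span class="yiyi-st" id="yiyi-25">Regular expressions can be concatenated to form new regular expressions; if <em>A</em> and <em>B</em> are both regular expressions, then <em>AB</em> is also a regular expression. </span><span class="yiyi-st" id="yiyi-26">In general, if a string <em>p</em> matches <em>A</em> and another string <em>q</em> matches <em>B</em>, the string <em>pq</em> will match AB. </span><span class="yiyi-st" id="yiyi-27">This holds unless <em>A</em> or <em>B</em> contain low precedence operations, there are boundary conditions between <em>A</em> and <em>B</em>, or there are numbered group references. </span><span class="yiyi-st" id="yiyi-28">Thus, complex expressions can easily be constructed from simpler primitive expressions like the ones described here. </span><span class="yiyi-st" id="yiyi-29">For details of the theory and implementation of regular expressions, consult the Friedl book referenced above, or almost any textbook about compiler construction.</span></p><p><span class="yiyi-st" id="yiyi-30">A brief explanation of the format of regular expressions follows. </span><span class="yiyi-st" id="yiyi-31">For further information and a gentler presentation, consult the <a class="reference internal" href="../howto/regex.html#regex-howto"><span>Regular Expression HOWTO</span></a>.</span></p><p><span class="yiyi-st" id="yiyi-32">Regular expressions can contain both special and ordinary characters. </span><span class="yiyi-st" id="yiyi-33">Most ordinary characters, like <code class="docutils literal"><span class="pre">'A'</span></code>, <code class="docutils literal"><span class="pre">'a'</span></code>, or <code class="docutils literal"><span class="pre">'0'</span></code>, are the simplest regular expressions; they simply match themselves. </span><span class="yiyi-st" id="yiyi-34">You can concatenate ordinary characters, so <code class="docutils literal"><span class="pre">last</span></code> matches the string <code class="docutils literal"><span class="pre">'last'</span></code>. </span><span class="yiyi-st" id="yiyi-35">(In the rest of this section, we'll write REs in <code class="docutils literal"><span class="pre">this</span> <span class="pre">special</span> <span class="pre">style</span></code>, usually without quotes, and strings to be matched <code class="docutils literal"><span 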
class="pre">'in</span> <span class="pre">single</span> <span class="pre">quotes'</span></code>.)</span></p><p><span class="yiyi-st" id="yiyi-36">Some characters, like <code class="docutils literal"><span class="pre">'|'</span></code> or <code class="docutils literal"><span class="pre">'('</span></code>, are special. </span><span class="yiyi-st" id="yiyi-37">Special characters either stand for classes of ordinary characters, or affect how the regular expressions around them are interpreted. </span><span class="yiyi-st" id="yiyi-38">Regular expression pattern strings may not contain null bytes, but can specify the null byte using a <code class="docutils literal"><span class="pre">\number</span></code> notation such as <code class="docutils literal"><span class="pre">'\x00'</span></code>.</span></p><p><span class="yiyi-st" id="yiyi-39">The special characters are:</span></p><dl class="docutils"><dt><span class="yiyi-st" id="yiyi-40"><code class="docutils literal"><span class="pre">'.'</span></code></span></dt><dd><span class="yiyi-st" id="yiyi-41">(Dot.) </span><span class="yiyi-st" id="yiyi-42">In the default mode, this matches any character except a newline. </span><span class="yiyi-st" id="yiyi-43">If the <a class="reference internal" href="#re.DOTALL" title="re.DOTALL"><code class="xref py py-const docutils literal"><span class="pre">DOTALL</span></code></a> flag has been specified, this matches any character including a newline.</span></dd><dt><span class="yiyi-st" id="yiyi-44"><code class="docutils literal"><span class="pre">'^'</span></code></span></dt><dd><span class="yiyi-st" id="yiyi-45">(Caret.) </span><span class="yiyi-st" id="yiyi-46">Matches the start of the string, and in <a class="reference internal" href="#re.MULTILINE" title="re.MULTILINE"><code class="xref py py-const docutils literal"><span class="pre">MULTILINE</span></code></a> mode also matches immediately after each newline.</span></dd><dt><span class="yiyi-st" id="yiyi-47"><code class="docutils literal"><span class="pre">'$'</span></code></span></dt><dd><span class="yiyi-st" id="yiyi-48">Matches the end of the string or just before the newline at the end of the string, and in <a class="reference internal" href="#re.MULTILINE" title="re.MULTILINE"><code class="xref py py-const docutils literal"><span class="pre">MULTILINE</span></code></a> mode also matches before a newline. </span><span class="yiyi-st" id="yiyi-49"><code class="docutils literal"><span 
class="pre">foo</span></code> matches both 'foo' and 'foobar', while the regular expression <code class="docutils literal"><span class="pre">foo$</span></code> matches only 'foo'. </span><span class="yiyi-st" id="yiyi-50">More interestingly, searching for <code class="docutils literal"><span class="pre">foo.$</span></code> in <code class="docutils literal"><span class="pre">'foo1\nfoo2\n'</span></code> matches 'foo2' normally, but 'foo1' in <a class="reference internal" href="#re.MULTILINE" title="re.MULTILINE"><code class="xref py py-const docutils literal"><span class="pre">MULTILINE</span></code></a> mode; searching for a single <code class="docutils literal"><span class="pre">$</span></code> in <code class="docutils literal"><span class="pre">'foo\n'</span></code> will find two (empty) matches: one just before the newline, and one at the end of the string.</span></dd><dt><span class="yiyi-st" id="yiyi-51"><code class="docutils literal"><span class="pre">'*'</span></code></span></dt><dd><span class="yiyi-st" id="yiyi-52">Causes the resulting RE to match 0 or more repetitions of the preceding RE, as many repetitions as are possible. </span><span class="yiyi-st" id="yiyi-53"><code class="docutils literal"><span class="pre">ab*</span></code> will match 'a', 'ab', or 'a' followed by any number of 'b's.</span></dd><dt><span class="yiyi-st" id="yiyi-54"><code class="docutils literal"><span class="pre">'+'</span></code></span></dt><dd><span class="yiyi-st" id="yiyi-55">Causes the resulting RE to match 1 or more repetitions of the preceding RE, as many repetitions as are possible. </span><span class="yiyi-st" id="yiyi-56"><code class="docutils literal"><span class="pre">ab+</span></code> will match 'a' followed by any non-zero number of 'b's; it will not match just 'a'.</span></dd><dt><span class="yiyi-st" id="yiyi-57"><code class="docutils literal"><span class="pre">'?'</span></code></span></dt><dd><span class="yiyi-st" id="yiyi-58">Causes the resulting RE to match 0 or 1 repetitions of the preceding RE. </span><span class="yiyi-st" id="yiyi-59"><code class="docutils literal"><span class="pre">ab?</span></code></span><span class="yiyi-st" id="yiyi-60"> will match either 'a' or 'ab'.</span></dd><dt><span class="yiyi-st" id="yiyi-61"><code class="docutils literal"><span class="pre">*?</span></code>, <code class="docutils literal"><span 
class="pre">+?</span></code>, <code class="docutils literal"><span class="pre">??</span></code></span></dt><dd><span class="yiyi-st" id="yiyi-62">The <code class="docutils literal"><span class="pre">'*'</span></code>, <code class="docutils literal"><span class="pre">'+'</span></code>, and <code class="docutils literal"><span class="pre">'?'</span></code></span><span class="yiyi-st" id="yiyi-63"> qualifiers are all <em class="dfn">greedy</em>; they match as much text as possible. </span><span class="yiyi-st" id="yiyi-64">Sometimes this behaviour isn't desired; if the RE <code class="docutils literal"><span class="pre"><.*></span></code> is matched against <code class="docutils literal"><span class="pre"><a></span> <span class="pre">b</span> <span class="pre"><c></span></code>, it will match the entire string, and not just <code class="docutils literal"><span class="pre"><a></span></code>. </span><span class="yiyi-st" id="yiyi-65">Adding <code class="docutils literal"><span class="pre">?</span></code></span><span class="yiyi-st" id="yiyi-66"> after the qualifier makes it perform the match in <em class="dfn">non-greedy</em> or <em class="dfn">minimal</em> fashion; as <em>few</em> characters as possible will be matched. </span><span class="yiyi-st" id="yiyi-67">Using the RE <code class="docutils literal"><span class="pre"><.*?></span></code> will match only <code class="docutils literal"><span class="pre"><a></span></code>.</span></dd><dt><span class="yiyi-st" id="yiyi-68"><code class="docutils literal"><span class="pre">{m}</span></code></span></dt><dd><span class="yiyi-st" id="yiyi-69">Specifies that exactly <em>m</em> copies of the previous RE should be matched; fewer matches cause the entire RE not to match. </span><span class="yiyi-st" id="yiyi-70">For example, <code class="docutils literal"><span class="pre">a{6}</span></code> will match exactly six <code class="docutils literal"><span class="pre">'a'</span></code> characters, but not five.</span></dd><dt><span class="yiyi-st" id="yiyi-71"><code class="docutils literal"><span class="pre">{m,n}</span></code></span></dt><dd><span class="yiyi-st" id="yiyi-72">Causes the resulting RE to match from <em>m</em> to <em>n</em> repetitions of the preceding RE, attempting to match as many repetitions as possible. </span><span class="yiyi-st" id="yiyi-73">For example, <code class="docutils literal"><span class="pre">a{3,5}</span></code> will match from 3 to 5 <code class="docutils literal"><span 
class="pre">'a'</span></code> characters. </span><span class="yiyi-st" id="yiyi-74">Omitting <em>m</em> specifies a lower bound of zero, and omitting <em>n</em> specifies an infinite upper bound. </span><span class="yiyi-st" id="yiyi-75">As an example, <code class="docutils literal"><span class="pre">a{4,}b</span></code> will match <code class="docutils literal"><span class="pre">aaaab</span></code> or a thousand <code class="docutils literal"><span class="pre">'a'</span></code> characters followed by a <code class="docutils literal"><span class="pre">b</span></code>, but not <code class="docutils literal"><span class="pre">aaab</span></code>. </span><span class="yiyi-st" id="yiyi-76">The comma may not be omitted, or the modifier would be confused with the previously described form.</span></dd><dt><span class="yiyi-st" id="yiyi-77"><code class="docutils literal"><span class="pre">{m,n}?</span></code></span></dt><dd><span class="yiyi-st" id="yiyi-78">Causes the resulting RE to match from <em>m</em> to <em>n</em> repetitions of the preceding RE, attempting to match as <em>few</em> repetitions as possible. </span><span class="yiyi-st" id="yiyi-79">This is the non-greedy version of the previous qualifier. </span><span class="yiyi-st" id="yiyi-80">For example, on the 6-character string <code class="docutils literal"><span class="pre">'aaaaaa'</span></code>, <code class="docutils literal"><span class="pre">a{3,5}</span></code> will match 5 <code class="docutils literal"><span class="pre">'a'</span></code> characters, while <code class="docutils literal"><span class="pre">a{3,5}?</span></code></span><span class="yiyi-st" id="yiyi-81"> will only match 3 characters.</span></dd><dt><span class="yiyi-st" id="yiyi-82"><code class="docutils literal"><span class="pre">'\'</span></code></span></dt><dd><p class="first"><span class="yiyi-st" id="yiyi-83">Either escapes special characters (permitting you to match characters like <code class="docutils literal"><span class="pre">'*'</span></code>, <code class="docutils literal"><span class="pre">'?'</span></code>, and so forth), or signals a special sequence; special sequences are discussed below.</span></p><p class="last"><span class="yiyi-st" id="yiyi-84">If you're not using a raw string to express the pattern, remember that Python also uses the backslash as an escape sequence in string literals; if the escape sequence isn't recognized by Python's parser, the backslash and subsequent character are included in the resulting string. </span><span class="yiyi-st" id="yiyi-85">However, if Python would recognize the resulting sequence, the backslash should be repeated twice. </span><span class="yiyi-st" id="yiyi-86">This is complicated and hard to understand, so it's highly recommended that you use raw strings for all but the simplest expressions.</span></p></dd><dt><span class="yiyi-st" id="yiyi-87"><code 
class="docutils literal"><span class="pre">[]</span></code></span></dt><dd><p class="first"><span class="yiyi-st" id="yiyi-88">Used to indicate a set of characters. </span><span class="yiyi-st" id="yiyi-89">In a set:</span></p><ul class="last simple"><li><span class="yiyi-st" id="yiyi-90">Characters can be listed individually, e.g. </span><span class="yiyi-st" id="yiyi-91"><code class="docutils literal"><span class="pre">[amk]</span></code> will match <code class="docutils literal"><span class="pre">'a'</span></code>, <code class="docutils literal"><span class="pre">'m'</span></code>, or <code class="docutils literal"><span class="pre">'k'</span></code>.</span></li><li><span class="yiyi-st" id="yiyi-92">Ranges of characters can be indicated by giving two characters and separating them by a <code class="docutils literal"><span class="pre">'-'</span></code>; for example, <code class="docutils literal"><span class="pre">[a-z]</span></code> will match any lowercase ASCII letter, <code class="docutils literal"><span class="pre">[0-5][0-9]</span></code> will match all the two-digit numbers from <code class="docutils literal"><span class="pre">00</span></code> to <code class="docutils literal"><span class="pre">59</span></code>, and <code class="docutils literal"><span class="pre">[0-9A-Fa-f]</span></code> will match any hexadecimal digit. </span><span class="yiyi-st" id="yiyi-93">If <code class="docutils literal"><span class="pre">-</span></code> is escaped (e.g. </span><span class="yiyi-st" id="yiyi-94"><code class="docutils literal"><span class="pre">[a\-z]</span></code>) or if it is placed as the first or last character (e.g. </span><span class="yiyi-st" id="yiyi-95"><code class="docutils literal"><span class="pre">[a-]</span></code>), it will match a literal <code class="docutils literal"><span class="pre">'-'</span></code>.</span></li><li><span class="yiyi-st" id="yiyi-96">Special characters lose their special meaning inside sets. </span><span class="yiyi-st" id="yiyi-97">For example, <code class="docutils literal"><span class="pre">[(+*)]</span></code> will match any of the literal characters <code class="docutils literal"><span class="pre">'('</span></code>, <code class="docutils literal"><span class="pre">'+'</span></code>, <code class="docutils literal"><span class="pre">'*'</span></code>, or <code class="docutils 
literal"><span class="pre">')'</span></code>.</span></li><li><span class="yiyi-st" id="yiyi-98">Character classes such as <code class="docutils literal"><span class="pre">\w</span></code> or <code class="docutils literal"><span class="pre">\S</span></code> are also accepted inside a set (they keep their special meaning there), although the characters they match depend on whether <a class="reference internal" href="#re.ASCII" title="re.ASCII"><code class="xref py py-const docutils literal"><span class="pre">ASCII</span></code></a> or <a class="reference internal" href="#re.LOCALE" title="re.LOCALE"><code class="xref py py-const docutils literal"><span class="pre">LOCALE</span></code></a> mode is in force.</span></li><li><span class="yiyi-st" id="yiyi-99">Characters that are not within a range can be matched by <em class="dfn">complementing</em> the set. </span><span class="yiyi-st" id="yiyi-100">If the first character of the set is <code class="docutils literal"><span class="pre">'^'</span></code>, all the characters that are <em>not</em> in the set will be matched. </span><span class="yiyi-st" id="yiyi-101">For example, <code class="docutils literal"><span class="pre">[^5]</span></code> will match any character except <code class="docutils literal"><span class="pre">'5'</span></code>, and <code class="docutils literal"><span class="pre">[^^]</span></code> will match any character except <code class="docutils literal"><span class="pre">'^'</span></code>. </span><span class="yiyi-st" id="yiyi-102"><code class="docutils literal"><span class="pre">^</span></code> has no special meaning if it is not the first character in the set.</span></li><li><span class="yiyi-st" id="yiyi-103">To match a literal <code class="docutils literal"><span class="pre">']'</span></code> inside a set, precede it with a backslash, or place it at the beginning of the set. </span><span class="yiyi-st" id="yiyi-104">For example, <code class="docutils literal"><span class="pre">[()[\]{}]</span></code> and <code class="docutils literal"><span class="pre">[]()[{}]</span></code> will each match a single parenthesis, bracket, or brace character.</span></li></ul></dd><dt><span class="yiyi-st" id="yiyi-105"><code class="docutils literal"><span class="pre">'|'</span></code></span></dt><dd><span class="yiyi-st" id="yiyi-106"><code class="docutils literal"><span class="pre">A|B</span></code>, where A and B can be arbitrary REs, creates a regular expression that 
will match either A or B. </span><span class="yiyi-st" id="yiyi-107">An arbitrary number of REs can be separated by the <code class="docutils literal"><span class="pre">'|'</span></code> in this way. </span><span class="yiyi-st" id="yiyi-108">This can be used inside groups as well. </span><span class="yiyi-st" id="yiyi-109">As the target string is scanned, REs separated by <code class="docutils literal"><span class="pre">'|'</span></code> are tried from left to right. </span><span class="yiyi-st" id="yiyi-110">When one pattern completely matches, that branch is accepted. </span><span class="yiyi-st" id="yiyi-111">This means that once <code class="docutils literal"><span class="pre">A</span></code> matches, <code class="docutils literal"><span class="pre">B</span></code> will not be tested further, even if it would produce a longer overall match. </span><span class="yiyi-st" id="yiyi-112">In other words, the <code class="docutils literal"><span class="pre">'|'</span></code> operator is never greedy. </span><span class="yiyi-st" id="yiyi-113">To match a literal <code class="docutils literal"><span class="pre">'|'</span></code>, use <code class="docutils literal"><span class="pre">\|</span></code>, or enclose it inside a character class, as in <code class="docutils literal"><span class="pre">[|]</span></code>.</span></dd><dt><span class="yiyi-st" id="yiyi-114"><code class="docutils literal"><span class="pre">(...)</span></code></span></dt><dd><span class="yiyi-st" id="yiyi-115">Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group; the contents of a group can be retrieved after a match has been performed, and can be matched later in the string with the <code class="docutils literal"><span class="pre">\number</span></code> special sequence, described below. </span><span class="yiyi-st" id="yiyi-116">To match the literals <code class="docutils literal"><span class="pre">'('</span></code> or <code class="docutils literal"><span class="pre">')'</span></code>, use <code class="docutils literal"><span class="pre">\(</span></code> or <code class="docutils literal"><span class="pre">\)</span></code>, or enclose them inside a character class: <code class="docutils literal"><span class="pre">[(]</span> <span class="pre">[)]</span></code>.</span></dd><dt><span class="yiyi-st" id="yiyi-117"><code class="docutils literal"><span class="pre">(?...)</span></code></span></dt><dd><span class="yiyi-st" id="yiyi-118">This is an extension notation (a <code class="docutils literal"><span 
class="pre">'?'</span></code></span><span class="yiyi-st" id="yiyi-119"> following a <code class="docutils literal"><span class="pre">'('</span></code> is not meaningful otherwise). </span><span class="yiyi-st" id="yiyi-120">The first character after the <code class="docutils literal"><span class="pre">'?'</span></code></span><span class="yiyi-st" id="yiyi-121"> determines what the meaning and further syntax of the construct is. </span><span class="yiyi-st" id="yiyi-122">Extensions usually do not create a new group; <code class="docutils literal"><span class="pre">(?P<name>...)</span></code> is the only exception to this rule. </span><span class="yiyi-st" id="yiyi-123">Following are the currently supported extensions.</span></dd><dt><span class="yiyi-st" id="yiyi-124"><code class="docutils literal"><span class="pre">(?aiLmsux)</span></code></span></dt><dd><p class="first"><span class="yiyi-st" id="yiyi-125">(One or more letters from the set <code class="docutils literal"><span class="pre">'a'</span></code>, <code class="docutils literal"><span class="pre">'i'</span></code>, <code class="docutils literal"><span class="pre">'L'</span></code>, <code class="docutils literal"><span class="pre">'m'</span></code>, <code class="docutils literal"><span class="pre">'s'</span></code>, <code class="docutils literal"><span class="pre">'u'</span></code>, <code class="docutils literal"><span class="pre">'x'</span></code>.) </span><span class="yiyi-st" id="yiyi-126">The group matches the empty string; the letters set the corresponding flags: <a class="reference internal" href="#re.A" title="re.A"><code class="xref py py-const docutils literal"><span class="pre">re.A</span></code></a> (ASCII-only matching), <a class="reference internal" href="#re.I" title="re.I"><code class="xref py py-const docutils literal"><span class="pre">re.I</span></code></a> (ignore case), <a class="reference internal" href="#re.L" title="re.L"><code class="xref py py-const docutils literal"><span class="pre">re.L</span></code></a> (locale dependent), <a class="reference internal" href="#re.M" title="re.M"><code class="xref py py-const docutils literal"><span class="pre">re.M</span></code></a> (multi-line), <a class="reference internal" href="#re.S" title="re.S"><code class="xref py py-const 
docutils literal"><span class="pre">re.S</span></code></a> (dot matches all), and <a class="reference internal" href="#re.X" title="re.X"><code class="xref py py-const docutils literal"><span class="pre">re.X</span></code></a> (verbose), for the entire regular expression. </span><span class="yiyi-st" id="yiyi-127">(The flags are described in <a class="reference internal" href="#contents-of-module-re"><span>Module Contents</span></a>.) </span><span class="yiyi-st" id="yiyi-128">This is useful if you wish to include the flags as part of the regular expression, instead of passing a <em>flag</em> argument to the <a class="reference internal" href="#re.compile" title="re.compile"><code class="xref py py-func docutils literal"><span class="pre">re.compile()</span></code></a> function.</span></p><p class="last"><span class="yiyi-st" id="yiyi-129">Note that the <code class="docutils literal"><span class="pre">(?x)</span></code> flag changes how the expression is parsed. </span><span class="yiyi-st" id="yiyi-130">It should be used first in the expression string, or after one or more whitespace characters. </span><span class="yiyi-st" id="yiyi-131">If there are non-whitespace characters before the flag, the results are undefined.</span></p></dd><dt><span class="yiyi-st" id="yiyi-132"><code class="docutils literal"><span class="pre">(?:...)</span></code></span></dt><dd><span class="yiyi-st" id="yiyi-133">A non-capturing version of regular parentheses. </span><span class="yiyi-st" id="yiyi-134">Matches whatever regular expression is inside the parentheses, but the substring matched by the group <em>cannot</em> be retrieved after performing a match or referenced later in the pattern.</span></dd><dt><span class="yiyi-st" id="yiyi-135"><code class="docutils literal"><span class="pre">(?P<name>...)</span></code></span></dt><dd><p class="first"><span class="yiyi-st" id="yiyi-136">Similar to regular parentheses, but the substring matched by the group is accessible via the symbolic group name <em>name</em>. </span><span class="yiyi-st" id="yiyi-137">Group names must be valid Python identifiers, and each group name must be defined only once within a regular expression. </span><span class="yiyi-st" id="yiyi-138">A symbolic group is also a numbered group, just as if the group were not named; the name is an additional alias for its number.</span></p><p><span class="yiyi-st" id="yiyi-139">Named groups can be referenced in three contexts. </span><span class="yiyi-st" id="yiyi-140">If the pattern is <code class="docutils literal"><span class="pre">(?P<quote>['"]).*?(?P=quote)</span></code> (i.e. </span><span class="yiyi-st" id="yiyi-141">matching a string quoted with either single or double quotes):</span></p><table border="1" class="last docutils"><thead 
valign="bottom"><tr class="row-odd"><th class="head"><span class="yiyi-st" id="yiyi-142">Context of reference to group "quote"</span></th><th class="head"><span class="yiyi-st" id="yiyi-143">Ways to reference it</span></th></tr></thead><tbody valign="top"><tr class="row-even"><td><span class="yiyi-st" id="yiyi-144">in the same pattern itself</span></td><td><ul class="first last simple"><li><span class="yiyi-st" id="yiyi-145"><code class="docutils literal"><span class="pre">(?P=quote)</span></code> (as shown)</span></li><li><span class="yiyi-st" id="yiyi-146"><code class="docutils literal"><span class="pre">\1</span></code></span></li></ul></td></tr><tr class="row-odd"><td><span class="yiyi-st" id="yiyi-147">when processing match object <code class="docutils literal"><span class="pre">m</span></code></span></td><td><ul class="first last simple"><li><span class="yiyi-st" id="yiyi-148"><code class="docutils literal"><span class="pre">m.group('quote')</span></code></span></li><li><span class="yiyi-st" id="yiyi-149"><code class="docutils literal"><span class="pre">m.end('quote')</span></code> (etc.)</span></li></ul></td></tr><tr class="row-even"><td><span class="yiyi-st" id="yiyi-150">in a string passed to the <code class="docutils literal"><span class="pre">repl</span></code> argument of <code class="docutils literal"><span class="pre">re.sub()</span></code></span></td><td><ul class="first last simple"><li><span class="yiyi-st" id="yiyi-151"><code class="docutils literal"><span class="pre">\g<quote></span></code></span></li><li><span class="yiyi-st" id="yiyi-152"><code class="docutils literal"><span class="pre">\g<1></span></code></span></li><li><span class="yiyi-st" id="yiyi-153"><code class="docutils literal"><span class="pre">\1</span></code></span></li></ul></td></tr></tbody></table></dd><dt><span class="yiyi-st" id="yiyi-154"><code class="docutils literal"><span class="pre">(?P=name)</span></code></span></dt><dd><span class="yiyi-st" id="yiyi-155">A backreference to a named group; it matches whatever text was matched by the earlier group named <em>name</em>.</span></dd><dt><span class="yiyi-st" id="yiyi-156"><code class="docutils literal"><span 
class="pre">(?#...)</span></code></span></dt><dd><span class="yiyi-st" id="yiyi-157">A comment; the contents of the parentheses are simply ignored.</span></dd><dt><span class="yiyi-st" id="yiyi-158"><code class="docutils literal"><span class="pre">(?=...)</span></code></span></dt><dd><span class="yiyi-st" id="yiyi-159">Matches if <code class="docutils literal"><span class="pre">...</span></code> matches next, but doesn't consume any of the string. </span><span class="yiyi-st" id="yiyi-160">This is called a lookahead assertion. </span><span class="yiyi-st" id="yiyi-161">For example, <code class="docutils literal"><span class="pre">Isaac</span> <span class="pre">(?=Asimov)</span></code> will match <code class="docutils literal"><span class="pre">'Isaac</span> <span class="pre">'</span></code> only if it is followed by <code class="docutils literal"><span class="pre">'Asimov'</span></code>.</span></dd><dt><span class="yiyi-st" id="yiyi-162"><code class="docutils literal"><span class="pre">(?!...)</span></code></span></dt><dd><span class="yiyi-st" id="yiyi-163">Matches if <code class="docutils literal"><span class="pre">...</span></code> doesn't match next. </span><span class="yiyi-st" id="yiyi-164">This is a negative lookahead assertion. </span><span class="yiyi-st" id="yiyi-165">For example, <code class="docutils literal"><span class="pre">Isaac</span> <span class="pre">(?!Asimov)</span></code> will match <code class="docutils literal"><span class="pre">'Isaac</span> <span class="pre">'</span></code> only if it is <em>not</em> followed by <code class="docutils literal"><span class="pre">'Asimov'</span></code>.</span></dd><dt><span class="yiyi-st" id="yiyi-166"><code class="docutils literal"><span class="pre">(?<=...)</span></code></span></dt><dd><p class="first"><span class="yiyi-st" id="yiyi-167">Matches if the current position in the string is preceded by a match for <code class="docutils literal"><span class="pre">...</span></code> that ends at the current position. </span><span class="yiyi-st" id="yiyi-168">This is called a <em class="dfn">positive lookbehind assertion</em>. </span><span class="yiyi-st" id="yiyi-169"><code class="docutils literal"><span class="pre">(?<=abc)def</span></code> will find a match in <code class="docutils literal"><span class="pre">abcdef</span></code>, since the lookbehind will 
back up 3 characters and check if the contained pattern matches. </span><span class="yiyi-st" id="yiyi-170">The contained pattern must only match strings of some fixed length, meaning that <code class="docutils literal"><span class="pre">abc</span></code> or <code class="docutils literal"><span class="pre">a|b</span></code> are allowed, but <code class="docutils literal"><span class="pre">a*</span></code> and <code class="docutils literal"><span class="pre">a{3,4}</span></code> are not. </span><span class="yiyi-st" id="yiyi-171">Note that patterns which start with positive lookbehind assertions will not match at the beginning of the string being searched; you will most likely want to use the <a class="reference internal" href="#re.search" title="re.search"><code class="xref py py-func docutils literal"><span class="pre">search()</span></code></a> function rather than the <a class="reference internal" href="#re.match" title="re.match"><code class="xref py py-func docutils literal"><span class="pre">match()</span></code></a> function:</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">re</span>
<span class="gp">>>> </span><span class="n">m</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s1">'(?<=abc)def'</span><span class="p">,</span> <span class="s1">'abcdef'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="go">'def'</span>
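# Added illustration, not part of the original manual; the name `anchored`
# is hypothetical. re.match() only tries position 0, where nothing precedes
# 'def', so the lookbehind cannot succeed there and no match object is returned.
import re
anchored = re.match(r'(?<=abc)def', 'abcdef')
print(anchored is None)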
</code></pre><p><span class="yiyi-st" id="yiyi-172">This example looks for a word following a hyphen:</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="n">m</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s1">r'(?<=-)\w+'</span><span class="p">,</span> <span class="s1">'spam-egg'</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="go">'egg'</span>
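# Added illustration, not part of the original manual; the name `compiled_ok`
# is hypothetical. A lookbehind body must have a fixed width, so the
# variable-width body a* is expected to be rejected when the pattern is compiled.
import re
try:
    re.compile(r'(?<=a*)b')
    compiled_ok = True
except re.error:
    compiled_ok = False
print(compiled_ok)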
</code></pre><div class="last versionchanged"><p><span class="yiyi-st" id="yiyi-173"><span class="versionmodified">Changed in version 3.5: </span>Added support for group references of fixed length.</span></p></div></dd><dt><span class="yiyi-st" id="yiyi-174"><code class="docutils literal"><span class="pre">(?<!...)</span></code></span></dt><dd><span class="yiyi-st" id="yiyi-175">Matches if the current position in the string is not preceded by a match for <code class="docutils literal"><span class="pre">...</span></code>. </span><span class="yiyi-st" id="yiyi-176">This is called a <em class="dfn">negative lookbehind assertion</em>. </span><span class="yiyi-st" id="yiyi-177">Similar to positive lookbehind assertions, the contained pattern must only match strings of some fixed length. </span><span class="yiyi-st" id="yiyi-178">Patterns which start with negative lookbehind assertions may match at the beginning of the string being searched.</span></dd><dt><span class="yiyi-st" id="yiyi-179"><code class="docutils literal"><span class="pre">(?(id/name)yes-pattern|no-pattern)</span></code></span></dt><dd><span class="yiyi-st" id="yiyi-180">Will try to match with <code class="docutils literal"><span class="pre">yes-pattern</span></code> if the group with the given <em>id</em> or <em>name</em> exists, and with <code class="docutils literal"><span class="pre">no-pattern</span></code> if it doesn't. </span><span class="yiyi-st" id="yiyi-181"><code class="docutils literal"><span class="pre">no-pattern</span></code> is optional and can be omitted. </span><span class="yiyi-st" id="yiyi-182">For example, <code class="docutils literal"><span class="pre">(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)</span></code> is a poor email matching pattern, which will match with <code class="docutils literal"><span class="pre">'<user@host.com>'</span></code> as well as <code class="docutils literal"><span class="pre">'user@host.com'</span></code>, but not with <code class="docutils literal"><span class="pre">'<user@host.com'</span></code> nor <code class="docutils literal"><span class="pre">'user@host.com>'</span></code>.</span></dd></dl><p><span class="yiyi-st" id="yiyi-183">The special sequences consist of <code class="docutils literal"><span class="pre">'\'</span></code> and a character from the list below. </span><span class="yiyi-st" id="yiyi-184">If the ordinary character is not on the list, then the resulting RE will match the second character. </span><span class="yiyi-st" id="yiyi-185">For example, <code class="docutils literal"><span 
class="pre">\$</span></code> matches the character <code class="docutils literal"><span class="pre">'$'</span></code>.</span></p><dl class="docutils"><dt><span class="yiyi-st" id="yiyi-186"><code class="docutils literal"><span class="pre">\number</span></code></span></dt><dd><span class="yiyi-st" id="yiyi-187">Matches the contents of the group of the same number. </span><span class="yiyi-st" id="yiyi-188">Groups are numbered starting from 1. </span><span class="yiyi-st" id="yiyi-189">For example, <code class="docutils literal"><span class="pre">(.+)</span> <span class="pre">\1</span></code> matches <code class="docutils literal"><span class="pre">'the</span> <span class="pre">the'</span></code> or <code class="docutils literal"><span class="pre">'55</span> <span class="pre">55'</span></code>, but not <code class="docutils literal"><span class="pre">'thethe'</span></code> (note the space after the group). </span><span class="yiyi-st" id="yiyi-190">This special sequence can only be used to match one of the first 99 groups. </span><span class="yiyi-st" id="yiyi-191">If the first digit of <em>number</em> is 0, or <em>number</em> is 3 octal digits long, it will not be interpreted as a group match, but as the character with octal value <em>number</em>. </span><span class="yiyi-st" id="yiyi-192">Inside the <code class="docutils literal"><span class="pre">'['</span></code> and <code class="docutils literal"><span class="pre">']'</span></code> of a character class, all numeric escapes are treated as characters.</span></dd><dt><span class="yiyi-st" id="yiyi-193"><code class="docutils literal"><span class="pre">\A</span></code></span></dt><dd><span class="yiyi-st" id="yiyi-194">Matches only at the start of the string.</span></dd><dt><span class="yiyi-st" id="yiyi-195"><code class="docutils literal"><span class="pre">\b</span></code></span></dt><dd><p class="first"><span class="yiyi-st" id="yiyi-196">Matches the empty string, but only at the beginning or end of a word. </span><span class="yiyi-st" id="yiyi-197">A word is defined as a sequence of Unicode alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore Unicode character. </span><span class="yiyi-st" id="yiyi-198">Note that formally, <code class="docutils literal"><span class="pre">\b</span></code> is defined as the boundary between a <code class="docutils literal"><span class="pre">\w</span></code> and a <code class="docutils literal"><span class="pre">\W</span></code> character (or vice versa), or between <code class="docutils literal"><span 
class="pre">\w</span></code> and the beginning/end of the string. </span><span class="yiyi-st" id="yiyi-199">This means that <code class="docutils literal"><span class="pre">r'\bfoo\b'</span></code> matches <code class="docutils literal"><span class="pre">'foo'</span></code>, <code class="docutils literal"><span class="pre">'foo.'</span></code>, <code class="docutils literal"><span class="pre">'(foo)'</span></code>, <code class="docutils literal"><span class="pre">'bar</span> <span class="pre">foo</span> <span class="pre">baz'</span></code> but not <code class="docutils literal"><span class="pre">'foobar'</span></code> or <code class="docutils literal"><span class="pre">'foo3'</span></code>.</span></p><p class="last"><span class="yiyi-st" id="yiyi-200">By default Unicode alphanumerics are the ones used, but this can be changed by using the <a class="reference internal" href="#re.ASCII" title="re.ASCII"><code class="xref py py-const docutils literal"><span class="pre">ASCII</span></code></a> flag. </span><span class="yiyi-st" id="yiyi-201">Inside a character range, <code class="docutils literal"><span class="pre">\b</span></code> represents the backspace character, for compatibility with Python's string literals.</span></p></dd><dt><span class="yiyi-st" id="yiyi-202"><code class="docutils literal"><span class="pre">\B</span></code></span></dt><dd><span class="yiyi-st" id="yiyi-203">Matches the empty string, but only when it is <em>not</em> at the beginning or end of a word. </span><span class="yiyi-st" id="yiyi-204">This means that <code class="docutils literal"><span class="pre">r'py\B'</span></code> matches <code class="docutils literal"><span class="pre">'python'</span></code>, <code class="docutils literal"><span class="pre">'py3'</span></code>, <code class="docutils literal"><span class="pre">'py2'</span></code>, but not <code class="docutils literal"><span class="pre">'py'</span></code>, <code class="docutils literal"><span class="pre">'py.'</span></code>, or <code class="docutils literal"><span class="pre">'py!'</span></code>. </span><span class="yiyi-st" id="yiyi-205"><code class="docutils literal"><span class="pre">\B</span></code> is just the opposite of <code class="docutils literal"><span class="pre">\b</span></code>, so word characters are Unicode alphanumerics or the underscore, although this can be changed by using the <a 
class="reference internal" href="#re.ASCII" title="re.ASCII"><code class="xref py py-const docutils literal"><span class="pre">ASCII</span></code></a> flag.</span></dd><dt><span class="yiyi-st" id="yiyi-206"><code class="docutils literal"><span class="pre">\d</span></code></span></dt><dd><dl class="first last docutils"><dt><span class="yiyi-st" id="yiyi-207">For Unicode (str) patterns:</span></dt><dd><span class="yiyi-st" id="yiyi-208">Matches any Unicode decimal digit (that is, any character in Unicode character category [Nd]). </span><span class="yiyi-st" id="yiyi-209">This includes <code class="docutils literal"><span class="pre">[0-9]</span></code>, and also many other digit characters. </span><span class="yiyi-st" id="yiyi-210">If the <a class="reference internal" href="#re.ASCII" title="re.ASCII"><code class="xref py py-const docutils literal"><span class="pre">ASCII</span></code></a> flag is used, only <code class="docutils literal"><span class="pre">[0-9]</span></code> is matched (but the flag affects the entire regular expression, so in such cases using an explicit <code class="docutils literal"><span class="pre">[0-9]</span></code> may be a better choice).</span></dd><dt><span class="yiyi-st" id="yiyi-211">For 8-bit (bytes) patterns:</span></dt><dd><span class="yiyi-st" id="yiyi-212">Matches any decimal digit; this is equivalent to <code class="docutils literal"><span class="pre">[0-9]</span></code>.</span></dd></dl></dd><dt><span class="yiyi-st" id="yiyi-213"><code class="docutils literal"><span class="pre">\D</span></code></span></dt><dd><span class="yiyi-st" id="yiyi-214">Matches any character which is not a Unicode decimal digit. </span><span class="yiyi-st" id="yiyi-215">This is the opposite of <code class="docutils literal"><span class="pre">\d</span></code>. </span><span class="yiyi-st" id="yiyi-216">If the <a class="reference internal" href="#re.ASCII" title="re.ASCII"><code class="xref py py-const docutils literal"><span class="pre">ASCII</span></code></a> flag is used, this becomes the equivalent of <code class="docutils literal"><span class="pre">[^0-9]</span></code> (but the flag affects the entire regular expression, so in such cases using an explicit <code class="docutils literal"><span class="pre">[^0-9]</span></code> may be a better choice).</span></dd><dt><span class="yiyi-st" id="yiyi-217"><code class="docutils literal"><span 
class="pre">\s</span></code></span></dt><dd><dl class="first last docutils"><dt><span class="yiyi-st" id="yiyi-218">For Unicode (str) patterns:</span></dt><dd><span class="yiyi-st" id="yiyi-219">Matches Unicode whitespace characters (which includes <code class="docutils literal"><span class="pre">[</span> <span class="pre">\t\n\r\f\v]</span></code>, and also many other characters, for example the non-breaking spaces mandated by typography rules in many languages). </span><span class="yiyi-st" id="yiyi-220">If the <a class="reference internal" href="#re.ASCII" title="re.ASCII"><code class="xref py py-const docutils literal"><span class="pre">ASCII</span></code></a> flag is used, only <code class="docutils literal"><span class="pre">[</span> <span class="pre">\t\n\r\f\v]</span></code> is matched (but the flag affects the entire regular expression, so in such cases using an explicit <code class="docutils literal"><span class="pre">[</span> <span class="pre">\t\n\r\f\v]</span></code> may be a better choice).</span></dd><dt><span class="yiyi-st" id="yiyi-221">For 8-bit (bytes) patterns:</span></dt><dd><span class="yiyi-st" id="yiyi-222">Matches characters considered whitespace in the ASCII character set; this is equivalent to <code class="docutils literal"><span class="pre">[</span> <span class="pre">\t\n\r\f\v]</span></code>.</span></dd></dl></dd><dt><span class="yiyi-st" id="yiyi-223"><code class="docutils literal"><span class="pre">\S</span></code></span></dt><dd><span class="yiyi-st" id="yiyi-224">Matches any character which is not a Unicode whitespace character. </span><span class="yiyi-st" id="yiyi-225">This is the opposite of <code class="docutils literal"><span class="pre">\s</span></code>. </span><span class="yiyi-st" id="yiyi-226">If the <a class="reference internal" href="#re.ASCII" title="re.ASCII"><code class="xref py py-const docutils literal"><span class="pre">ASCII</span></code></a> flag is used, this becomes the equivalent of <code class="docutils literal"><span class="pre">[^</span> <span class="pre">\t\n\r\f\v]</span></code> (but the flag affects the entire regular expression, so in such cases using an explicit <code class="docutils literal"><span class="pre">[^</span> <span class="pre">\t\n\r\f\v]</span></code> may be a better choice).</span></dd><dt><span class="yiyi-st" id="yiyi-227"><code class="docutils literal"><span class="pre">\w</span></code></span></dt><dd><dl class="first last docutils"><dt><span class="yiyi-st" 
id="yiyi-228">For Unicode (str) patterns:</span></dt><dd><span class="yiyi-st" id="yiyi-229">Matches Unicode word characters; this includes most characters that can be part of a word in any language, as well as numbers and the underscore. </span><span class="yiyi-st" id="yiyi-230">If the <a class="reference internal" href="#re.ASCII" title="re.ASCII"><code class="xref py py-const docutils literal"><span class="pre">ASCII</span></code></a> flag is used, only <code class="docutils literal"><span class="pre">[a-zA-Z0-9_]</span></code> is matched (but the flag affects the entire regular expression, so in such cases using an explicit <code class="docutils literal"><span class="pre">[a-zA-Z0-9_]</span></code> may be a better choice).</span></dd><dt><span class="yiyi-st" id="yiyi-231">For 8-bit (bytes) patterns:</span></dt><dd><span class="yiyi-st" id="yiyi-232">Matches characters considered alphanumeric in the ASCII character set; this is equivalent to <code class="docutils literal"><span class="pre">[a-zA-Z0-9_]</span></code>.</span></dd></dl></dd><dt><span class="yiyi-st" id="yiyi-233"><code class="docutils literal"><span class="pre">\W</span></code></span></dt><dd><span class="yiyi-st" id="yiyi-234">Matches any character which is not a Unicode word character. </span><span class="yiyi-st" id="yiyi-235">This is the opposite of <code class="docutils literal"><span class="pre">\w</span></code>. </span><span class="yiyi-st" id="yiyi-236">If the <a class="reference internal" href="#re.ASCII" title="re.ASCII"><code class="xref py py-const docutils literal"><span class="pre">ASCII</span></code></a> flag is used, this becomes the equivalent of <code class="docutils literal"><span class="pre">[^a-zA-Z0-9_]</span></code> (but the flag affects the entire regular expression, so in such cases using an explicit <code class="docutils literal"><span class="pre">[^a-zA-Z0-9_]</span></code> may be a better choice).</span></dd><dt><span class="yiyi-st" id="yiyi-237"><code class="docutils literal"><span class="pre">\Z</span></code></span></dt><dd><span class="yiyi-st" id="yiyi-238">Matches only at the end of the string.</span></dd></dl><p><span class="yiyi-st" id="yiyi-239">Most of the standard escapes supported by Python string literals are also accepted by the regular expression parser:</span></p><pre><code class="language-python"><span></span>\<span class="n">a</span> \<span class="n">b</span> \<span class="n">f</span> \<span class="n">n</span>
\<span class="n">r</span> \<span class="n">t</span> \<span class="n">u</span> \<span class="n">U</span>
\<span class="n">v</span> \<span class="n">x</span> \\
</code></pre><p><span class="yiyi-st" id="yiyi-240">(Note that <code class="docutils literal"><span class="pre">\b</span></code> is used to represent word boundaries, and means “backspace” only inside character classes.)</span></p><p><span class="yiyi-st" id="yiyi-241"><code class="docutils literal"><span class="pre">'\u'</span></code> and <code class="docutils literal"><span class="pre">'\U'</span></code> escape sequences are only recognized in Unicode patterns. </span><span class="yiyi-st" id="yiyi-242">In bytes patterns they are not treated specially.</span></p><p><span class="yiyi-st" id="yiyi-243">Octal escapes are included in a limited form. </span><span class="yiyi-st" id="yiyi-244">If the first digit is a 0, or if there are three octal digits, it is considered an octal escape. </span><span class="yiyi-st" id="yiyi-245">Otherwise, it is a group reference. </span><span class="yiyi-st" id="yiyi-246">As for string literals, octal escapes are always at most three digits in length.</span></p><div class="versionchanged"><p><span class="yiyi-st" id="yiyi-247"><span class="versionmodified">Changed in version 3.3: </span>The <code class="docutils literal"><span class="pre">'\u'</span></code> and <code class="docutils literal"><span class="pre">'\U'</span></code> escape sequences have been added.</span></p></div><div class="deprecated-removed"><p><span class="yiyi-st" id="yiyi-248"><span class="versionmodified">Deprecated since version 3.5, will be removed in version 3.6: </span>Unknown escapes consisting of <code class="docutils literal"><span class="pre">'\'</span></code> and an ASCII letter now raise a deprecation warning and will be forbidden in Python 3.6.</span></p></div><div class="admonition seealso"><p class="first admonition-title"><span class="yiyi-st" id="yiyi-249">See also</span></p><dl class="last docutils"><dt><span class="yiyi-st" id="yiyi-250">Mastering Regular Expressions</span></dt><dd><span class="yiyi-st" id="yiyi-251">Book on regular expressions by Jeffrey Friedl, published by O'Reilly. </span><span class="yiyi-st" id="yiyi-252">The second edition of the book no longer covers Python at all, but the first edition covered writing good regular expression patterns in great detail.</span></dd></dl></div></div><div class="section" id="module-contents"><h2><span class="yiyi-st" id="yiyi-253">6.2.2. </span><span class="yiyi-st" id="yiyi-254">Module Contents</span></h2><p><span class="yiyi-st" id="yiyi-255">The module defines several functions, constants, and an exception. </span><span class="yiyi-st" id="yiyi-256">Some of the functions are simplified versions of the full-featured methods for compiled regular expressions. </span><span 
class="yiyi-st" id="yiyi-257">Most non-trivial applications always use the compiled form.</span></p><dl class="function"><dt id="re.compile"><span class="yiyi-st" id="yiyi-258"><code class="descclassname">re.</code><code class="descname">compile</code><span class="sig-paren">(</span><em>pattern</em>, <em>flags=0</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-259">Compile a regular expression pattern into a regular expression object, which can be used for matching using its <a class="reference internal" href="#re.regex.match" title="re.regex.match"><code class="xref py py-func docutils literal"><span class="pre">match()</span></code></a> and <a class="reference internal" href="#re.regex.search" title="re.regex.search"><code class="xref py py-func docutils literal"><span class="pre">search()</span></code></a> methods, described below.</span></p><p><span class="yiyi-st" id="yiyi-260">The expression's behaviour can be modified by specifying a <em>flags</em> value. </span><span class="yiyi-st" id="yiyi-261">Values can be any of the following variables, combined using bitwise OR (the <code class="docutils literal"><span class="pre">|</span></code> operator).</span></p><p><span class="yiyi-st" id="yiyi-262">The sequence</span></p><pre><code class="language-python"><span></span><span class="n">prog</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="n">pattern</span><span class="p">)</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">prog</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="n">string</span><span class="p">)</span>
</code></pre><p><span class="yiyi-st" id="yiyi-263">is equivalent to</span></p><pre><code class="language-python"><span></span><span class="n">result</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="n">pattern</span><span class="p">,</span> <span class="n">string</span><span class="p">)</span>
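
# A hedged sketch, not part of the original documentation: compiling a pattern
# once and reusing the resulting object. The pattern and the sample strings are
# assumptions chosen for illustration only.
import re

word_re = re.compile(r"\bfoo\b")  # compiled once, reused for every sample
samples = ["foo", "foo.", "(foo)", "bar foo baz", "foobar", "foo3"]
matched = [s for s in samples if word_re.search(s)]
# matched == ['foo', 'foo.', '(foo)', 'bar foo baz']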
</code></pre><p><span class="yiyi-st" id="yiyi-264">but using <a class="reference internal" href="#re.compile" title="re.compile"><code class="xref py py-func docutils literal"><span class="pre">re.compile()</span></code></a> and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program.</span></p><div class="admonition note"><p class="first admonition-title"><span class="yiyi-st" id="yiyi-265">Note</span></p><p class="last"><span class="yiyi-st" id="yiyi-266">The compiled versions of the most recent patterns passed to <a class="reference internal" href="#re.compile" title="re.compile"><code class="xref py py-func docutils literal"><span class="pre">re.compile()</span></code></a> and the module-level matching functions are cached, so programs that use only a few regular expressions at a time needn't worry about compiling regular expressions.</span></p></div></dd></dl><dl class="data"><dt id="re.A"><span class="yiyi-st" id="yiyi-267"><code class="descclassname">re.</code><code class="descname">A</code></span></dt><dt id="re.ASCII"><span class="yiyi-st" id="yiyi-268"><code class="descclassname">re.</code><code class="descname">ASCII</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-269">Make <code class="docutils literal"><span class="pre">\w</span></code>, <code class="docutils literal"><span class="pre">\W</span></code>, <code class="docutils literal"><span class="pre">\b</span></code>, <code class="docutils literal"><span class="pre">\B</span></code>, <code class="docutils literal"><span class="pre">\d</span></code>, <code class="docutils literal"><span class="pre">\D</span></code>, <code class="docutils literal"><span class="pre">\s</span></code> and <code class="docutils literal"><span class="pre">\S</span></code> perform ASCII-only matching instead of full Unicode matching (translator's note: Unicode covers many scripts, so by default these classes also match characters outside the ASCII range). </span><span class="yiyi-st" id="yiyi-270">This is only meaningful for Unicode patterns, and is ignored for byte patterns.</span></p><p><span class="yiyi-st" id="yiyi-271">Note that for backward compatibility, the <code class="xref py py-const docutils literal"><span class="pre">re.U</span></code> flag still exists (as well as its synonym <code class="xref py py-const docutils literal"><span class="pre">re.UNICODE</span></code> and its embedded counterpart <code class="docutils literal"><span 
class="pre">(?u)</span></code>), but these are redundant in Python 3 since matches are Unicode by default for strings (and Unicode matching isn't allowed for bytes).</span></p></dd></dl><dl class="data"><dt id="re.DEBUG"><span class="yiyi-st" id="yiyi-272"><code class="descclassname">re.</code><code class="descname">DEBUG</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-273">Display debug information about the compiled expression.</span></p></dd></dl><dl class="data"><dt id="re.I"><span class="yiyi-st" id="yiyi-274"><code class="descclassname">re.</code><code class="descname">I</code></span></dt><dt id="re.IGNORECASE"><span class="yiyi-st" id="yiyi-275"><code class="descclassname">re.</code><code class="descname">IGNORECASE</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-276">Perform case-insensitive matching; expressions like <code class="docutils literal"><span class="pre">[A-Z]</span></code> will match lowercase letters, too. </span><span class="yiyi-st" id="yiyi-277">This is not affected by the current locale and works for Unicode characters as expected.</span></p></dd></dl><dl class="data"><dt id="re.L"><span class="yiyi-st" id="yiyi-278"><code class="descclassname">re.</code><code class="descname">L</code></span></dt><dt id="re.LOCALE"><span class="yiyi-st" id="yiyi-279"><code class="descclassname">re.</code><code class="descname">LOCALE</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-280">Make <code class="docutils literal"><span class="pre">\w</span></code>, <code class="docutils literal"><span class="pre">\W</span></code>, <code class="docutils literal"><span class="pre">\b</span></code>, <code class="docutils literal"><span class="pre">\B</span></code>, <code class="docutils literal"><span class="pre">\s</span></code> and <code class="docutils literal"><span class="pre">\S</span></code> dependent on the current locale. </span><span class="yiyi-st" id="yiyi-281">The use of this flag is discouraged as the locale mechanism is very unreliable, and it only handles one “culture” at a time anyway; you should use Unicode matching instead, which is the default in Python 3 for Unicode (str) patterns. </span><span class="yiyi-st" id="yiyi-282">This flag makes sense only with bytes patterns.</span></p><div class="deprecated-removed"><p><span class="yiyi-st" id="yiyi-283"><span class="versionmodified">Deprecated since version 3.5, will be removed in version 3.6: </span>Deprecated the use of <a class="reference internal" href="#re.LOCALE" title="re.LOCALE"><code class="xref py py-const docutils 
literal"><span class="pre">re.LOCALE</span></code></a> with string patterns or <a class="reference internal" href="#re.ASCII" title="re.ASCII"><code class="xref py py-const docutils literal"><span class="pre">re.ASCII</span></code></a>.</span></p></div></dd></dl><dl class="data"><dt id="re.M"><span class="yiyi-st" id="yiyi-284"><code class="descclassname">re.</code><code class="descname">M</code></span></dt><dt id="re.MULTILINE"><span class="yiyi-st" id="yiyi-285"><code class="descclassname">re.</code><code class="descname">MULTILINE</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-286">When specified, the pattern character <code class="docutils literal"><span class="pre">'^'</span></code> matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character <code class="docutils literal"><span class="pre">'$'</span></code> matches at the end of the string and at the end of each line (immediately preceding each newline). </span><span class="yiyi-st" id="yiyi-287">By default, <code class="docutils literal"><span class="pre">'^'</span></code> matches only at the beginning of the string, and <code class="docutils literal"><span class="pre">'$'</span></code> only at the end of the string and immediately before the newline (if any) at the end of the string.</span></p></dd></dl><dl class="data"><dt id="re.S"><span class="yiyi-st" id="yiyi-288"><code class="descclassname">re.</code><code class="descname">S</code></span></dt><dt id="re.DOTALL"><span class="yiyi-st" id="yiyi-289"><code class="descclassname">re.</code><code class="descname">DOTALL</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-290">Make the <code class="docutils literal"><span class="pre">'.'</span></code> </span><span class="yiyi-st" id="yiyi-291">special character match any character at all, including a newline; without this flag, <code class="docutils literal"><span class="pre">'.'</span></code> </span><span class="yiyi-st" id="yiyi-292">will match anything <em>except</em> a newline.</span></p></dd></dl><dl class="data"><dt id="re.X"><span class="yiyi-st" id="yiyi-293"><code class="descclassname">re.</code><code class="descname">X</code></span></dt><dt id="re.VERBOSE"><span class="yiyi-st" id="yiyi-294"><code class="descclassname">re.</code><code class="descname">VERBOSE</code></span></dt><dd><p><span class="yiyi-st" 
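id="yiyi-295">This flag allows you to write regular expressions that look nicer and are more readable by allowing you to visually separate logical sections of the pattern and add comments. </span><span class="yiyi-st" id="yiyi-296">Whitespace within the pattern is ignored, except when in a character class or when preceded by an unescaped backslash. </span><span class="yiyi-st" id="yiyi-297">When a line contains a <code class="docutils literal"><span class="pre">#</span></code> that is not in a character class and is not preceded by an unescaped backslash, all characters from the leftmost such <code class="docutils literal"><span class="pre">#</span></code> through the end of the line are ignored.</span></p><p><span class="yiyi-st" id="yiyi-298">This means that the two following regular expression objects that match a decimal number are functionally equal:</span></p><pre><code class="language-python"><span></span><span class="n">a</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="s2">r"""\d + # the integral part</span>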
<span class="s2"> \. # the decimal point</span>
<span class="s2"> \d * # some fractional digits"""</span><span class="p">,</span> <span class="n">re</span><span class="o">.</span><span class="n">X</span><span class="p">)</span>
<span class="n">b</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="s2">r"\d+\.\d*"</span><span class="p">)</span>
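
# A hedged check, not part of the original documentation: the verbose and the
# compact spelling should accept exactly the same inputs. The names `verbose`,
# `compact` and the sample strings are assumptions made for this illustration.
import re

verbose = re.compile(r"""\d +  # the integral part
                         \.    # the decimal point
                         \d *  # some fractional digits""", re.X)
compact = re.compile(r"\d+\.\d*")
samples = ["3.14", "10.", ".5", "abc"]
# One (verbose_result, compact_result) pair per sample string.
results = [(bool(verbose.fullmatch(s)), bool(compact.fullmatch(s))) for s in samples]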
</code></pre></dd></dl><dl class="function"><dt id="re.search"><span class="yiyi-st" id="yiyi-299"><code class="descclassname">re.</code><code class="descname">search</code><span class="sig-paren">(</span><em>pattern</em>, <em>string</em>, <em>flags=0</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-300">Scan through <em>string</em> looking for the first location where the regular expression <em>pattern</em> produces a match, and return a corresponding <a class="reference internal" href="#match-objects"><span>match object</span></a>. </span><span class="yiyi-st" id="yiyi-301">Return <code class="docutils literal"><span class="pre">None</span></code> if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.</span></p></dd></dl><dl class="function"><dt id="re.match"><span class="yiyi-st" id="yiyi-302"><code class="descclassname">re.</code><code class="descname">match</code><span class="sig-paren">(</span><em>pattern</em>, <em>string</em>, <em>flags=0</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-303">If zero or more characters at the beginning of <em>string</em> match the regular expression <em>pattern</em>, return a corresponding <a class="reference internal" href="#match-objects"><span>match object</span></a>. </span><span class="yiyi-st" id="yiyi-304">Return <code class="docutils literal"><span class="pre">None</span></code> if the string does not match the pattern; note that this is different from a zero-length match.</span></p><p><span class="yiyi-st" id="yiyi-305">Note that even in <a class="reference internal" href="#re.MULTILINE" title="re.MULTILINE"><code class="xref py py-const docutils literal"><span class="pre">MULTILINE</span></code></a> mode, <a class="reference internal" href="#re.match" title="re.match"><code class="xref py py-func docutils literal"><span class="pre">re.match()</span></code></a> will only match at the beginning of the string and not at the beginning of each line.</span></p><p><span class="yiyi-st" id="yiyi-306">If you want to locate a match anywhere in <em>string</em>, use <a class="reference internal" href="#re.search" title="re.search"><code class="xref py py-func docutils literal"><span class="pre">search()</span></code></a> instead (see also <a class="reference internal" 
href="#search-vs-match"><span>search() vs. match()</span></a>).</span></p></dd></dl><dl class="function"><dt id="re.fullmatch"><span class="yiyi-st" id="yiyi-307"><code class="descclassname">re.</code><code class="descname">fullmatch</code><span class="sig-paren">(</span><em>pattern</em>, <em>string</em>, <em>flags=0</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-308">If the whole <em>string</em> matches the regular expression <em>pattern</em>, return a corresponding <a class="reference internal" href="#match-objects"><span>match object</span></a>. </span><span class="yiyi-st" id="yiyi-309">Return <code class="docutils literal"><span class="pre">None</span></code> if the string does not match the pattern; note that this is different from a zero-length match.</span></p><div class="versionadded"><p><span class="yiyi-st" id="yiyi-310"><span class="versionmodified">New in version 3.4.</span></span></p></div></dd></dl><dl class="function"><dt id="re.split"><span class="yiyi-st" id="yiyi-311"><code class="descclassname">re.</code><code class="descname">split</code><span class="sig-paren">(</span><em>pattern</em>, <em>string</em>, <em>maxsplit=0</em>, <em>flags=0</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-312">Split <em>string</em> by the occurrences of <em>pattern</em>. </span><span class="yiyi-st" id="yiyi-313">If capturing parentheses are used in <em>pattern</em>, then the text of all groups in the pattern is also returned as part of the resulting list. </span><span class="yiyi-st" id="yiyi-314">If <em>maxsplit</em> is nonzero, at most <em>maxsplit</em> splits occur, and the remainder of the string is returned as the final element of the list.</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">r'\W+'</span><span class="p">,</span> <span class="s1">'Words, words, words.'</span><span class="p">)</span>
<span class="go">['Words', 'words', 'words', '']</span>
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">r'(\W+)'</span><span class="p">,</span> <span class="s1">'Words, words, words.'</span><span class="p">)</span>
<span class="go">['Words', ', ', 'words', ', ', 'words', '.', '']</span>
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">r'\W+'</span><span class="p">,</span> <span class="s1">'Words, words, words.'</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="go">['Words', 'words, words.']</span>
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">'[a-f]+'</span><span class="p">,</span> <span class="s1">'0a3B9'</span><span class="p">,</span> <span class="n">flags</span><span class="o">=</span><span class="n">re</span><span class="o">.</span><span class="n">IGNORECASE</span><span class="p">)</span>
<span class="go">['0', '3', '9']</span>
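
# A hedged sketch, not part of the original documentation: dropping the empty
# strings that appear when the pattern matches at the very start or end of the
# input. The variable name `fields` is an assumption used for illustration.
import re

fields = [f for f in re.split(r'\W+', 'Words, words, words.') if f]
# fields == ['Words', 'words', 'words']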
</code></pre><p><span class="yiyi-st" id="yiyi-315">If there are capturing groups in the separator and it matches at the start of the string, the result will start with an empty string. </span><span class="yiyi-st" id="yiyi-316">The same holds for the end of the string:</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">r'(\W+)'</span><span class="p">,</span> <span class="s1">'...words, words...'</span><span class="p">)</span>
<span class="go">['', '...', 'words', ', ', 'words', '...', '']</span>
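
# A hedged sketch, not part of the original documentation: with one capturing
# group in the separator, text pieces sit at even indices of the result and
# separators at odd indices. The names `parts`, `words` and `seps` are
# assumptions chosen for this illustration.
import re

parts = re.split(r'(\W+)', '...words, words...')
words = parts[0::2]  # ['', 'words', 'words', '']
seps = parts[1::2]   # ['...', ', ', '...']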
</code></pre><p><span class="yiyi-st" id="yiyi-317">That way, separator components are always found at the same relative indices within the result list.</span></p><div class="admonition note"><p class="first admonition-title"><span class="yiyi-st" id="yiyi-318">Note</span></p><p><span class="yiyi-st" id="yiyi-319"><a class="reference internal" href="#re.split" title="re.split"><code class="xref py py-func docutils literal"><span class="pre">split()</span></code></a> doesn’t currently split a string on an empty pattern match. </span><span class="yiyi-st" id="yiyi-320">For example:</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">'x*'</span><span class="p">,</span> <span class="s1">'axbc'</span><span class="p">)</span>
<span class="go">['a', 'bc']</span>
</code></pre><p><span class="yiyi-st" id="yiyi-321">Even though <code class="docutils literal"><span class="pre">'x*'</span></code> also matches zero 'x' characters before 'a', between 'b' and 'c', and after 'c', currently these matches are ignored. </span><span class="yiyi-st" id="yiyi-322">The correct behavior (i.e. </span><span class="yiyi-st" id="yiyi-323">splitting on empty matches too and returning <code class="docutils literal"><span class="pre">['',</span> <span class="pre">'a',</span> <span class="pre">'b',</span> <span class="pre">'c',</span> <span class="pre">'']</span></code>) will be implemented in future versions of Python, but since this is a backward incompatible change, a <a class="reference internal" href="exceptions.html#FutureWarning" title="FutureWarning"><code class="xref py py-exc docutils literal"><span class="pre">FutureWarning</span></code></a> will be raised in the meanwhile.</span></p><p><span class="yiyi-st" id="yiyi-324">Patterns that can only match empty strings currently never split the string. </span><span class="yiyi-st" id="yiyi-325">Since this doesn't match the expected behavior, a <a class="reference internal" href="exceptions.html#ValueError" title="ValueError"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code></a> will be raised starting from Python 3.5:</span></p><div class="last highlight-python3"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">"^$"</span><span class="p">,</span> <span class="s2">"foo</span><span class="se">\n\n</span><span class="s2">bar</span><span class="se">\n</span><span class="s2">"</span><span class="p">,</span> <span class="n">flags</span><span class="o">=</span><span class="n">re</span><span class="o">.</span><span class="n">M</span><span class="p">)</span>
<span class="gt">Traceback (most recent call last):</span>
  File <span class="nb">"&lt;stdin&gt;"</span>, line <span class="m">1</span>, in <span class="n">&lt;module&gt;</span>
<span class="c">...</span>
<span class="gr">ValueError</span>: <span class="n">split() requires a non-empty pattern match.</span>
</pre></div></div></div><div class="versionchanged"><p><span class="yiyi-st" id="yiyi-326"><span class="versionmodified">Changed in version 3.1: </span>Added the optional flags argument.</span></p></div><div class="versionchanged"><p><span class="yiyi-st" id="yiyi-327"><span class="versionmodified">Changed in version 3.5: </span>Splitting on a pattern that could match an empty string now raises a warning. </span><span class="yiyi-st" id="yiyi-328">Patterns that can only match empty strings are now rejected.</span></p></div></dd></dl><dl class="function"><dt id="re.findall"><span class="yiyi-st" id="yiyi-329"><code class="descclassname">re.</code><code class="descname">findall</code><span class="sig-paren">(</span><em>pattern</em>, <em>string</em>, <em>flags=0</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-330">Return all non-overlapping matches of <em>pattern</em> in <em>string</em>, as a list of strings. </span><span class="yiyi-st" id="yiyi-331">The <em>string</em> is scanned left-to-right, and matches are returned in the order found. </span><span class="yiyi-st" id="yiyi-332">If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. </span><span class="yiyi-st" id="yiyi-333">
Empty matches are included in the result unless they touch the beginning of another match.</span></p></dd></dl><dl class="function"><dt id="re.finditer"><span class="yiyi-st" id="yiyi-334"><code class="descclassname">re.</code><code class="descname">finditer</code><span class="sig-paren">(</span><em>pattern</em>, <em>string</em>, <em>flags=0</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-335">Return an <a class="reference internal" href="../glossary.html#term-iterator"><span class="xref std std-term">iterator</span></a> yielding <a class="reference internal" href="#match-objects"><span>match objects</span></a> over all non-overlapping matches for the RE <em>pattern</em> in <em>string</em>. </span><span class="yiyi-st" id="yiyi-336">The <em>string</em> is scanned left-to-right, and matches are returned in the order found. </span><span class="yiyi-st" id="yiyi-337">
Empty matches are included in the result unless they touch the beginning of another match.</span></p></dd></dl><dl class="function"><dt id="re.sub"><span class="yiyi-st" id="yiyi-338"><code class="descclassname">re.</code><code class="descname">sub</code><span class="sig-paren">(</span><em>pattern</em>, <em>repl</em>, <em>string</em>, <em>count=0</em>, <em>flags=0</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-339">Return the string obtained by replacing the leftmost non-overlapping occurrences of <em>pattern</em> in <em>string</em> by the replacement <em>repl</em>. </span><span class="yiyi-st" id="yiyi-340">If the pattern isn't found, <em>string</em> is returned unchanged. </span><span class="yiyi-st" id="yiyi-341"><em>repl</em> can be a string or a function; if it is a string, any backslash escapes in it are processed. </span><span class="yiyi-st" id="yiyi-342">That is, <code class="docutils literal"><span class="pre">\n</span></code> is converted to a single newline character, <code class="docutils literal"><span class="pre">\r</span></code> is converted to a carriage return, and so forth. </span><span class="yiyi-st" id="yiyi-343">Unknown escapes such as <code class="docutils literal"><span class="pre">\&amp;</span></code> are left alone. </span><span class="yiyi-st" id="yiyi-344">Backreferences, such as <code class="docutils literal"><span class="pre">\6</span></code>, are replaced with the substring matched by group 6 in the pattern. </span><span class="yiyi-st" id="yiyi-345">For example:</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">sub</span><span class="p">(</span><span class="s1">r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):'</span><span class="p">,</span>
<span class="gp">... </span> <span class="s1">r'static PyObject*\npy_\1(void)\n{'</span><span class="p">,</span>
<span class="gp">... </span> <span class="s1">'def myfunc():'</span><span class="p">)</span>
<span class="go">'static PyObject*\npy_myfunc(void)\n{'</span>
</code></pre><p><span class="yiyi-st" id="yiyi-346">If <em>repl</em> is a function, it is called for every non-overlapping occurrence of <em>pattern</em>. </span><span class="yiyi-st" id="yiyi-347">The function takes a single match object argument, and returns the replacement string. </span><span class="yiyi-st" id="yiyi-348">For example:</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="k">def</span> <span class="nf">dashrepl</span><span class="p">(</span><span class="n">matchobj</span><span class="p">):</span>
<span class="gp">... </span> <span class="k">if</span> <span class="n">matchobj</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="o">==</span> <span class="s1">'-'</span><span class="p">:</span> <span class="k">return</span> <span class="s1">' '</span>
<span class="gp">... </span> <span class="k">else</span><span class="p">:</span> <span class="k">return</span> <span class="s1">'-'</span>
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">sub</span><span class="p">(</span><span class="s1">'-{1,2}'</span><span class="p">,</span> <span class="n">dashrepl</span><span class="p">,</span> <span class="s1">'pro----gram-files'</span><span class="p">)</span>
<span class="go">'pro--gram files'</span>
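</code></pre><p>As a further illustration of a callable <em>repl</em>, the following is a minimal sketch rather than part of the reference text: the helper name <code>double_number</code> and the sample string are invented for this example. It doubles every integer found in a string and uses <em>count</em> to limit the number of replacements.</p>

```python
import re

def double_number(matchobj):
    # Hypothetical helper: called once per non-overlapping match,
    # it must return the replacement text as a string.
    return str(int(matchobj.group(0)) * 2)

text = "3 apples and 12 pears"

# Replace every run of digits with twice its value.
doubled = re.sub(r"\d+", double_number, text)

# count=1 stops after the first (leftmost) replacement.
first_only = re.sub(r"\d+", double_number, text, count=1)

print(doubled)
print(first_only)
```

<p>Since the callable receives the whole match object, it could equally inspect individual groups before deciding what to return.</p><pre><code class="language-python"><span></span>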
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">sub</span><span class="p">(</span><span class="s1">r'\sAND\s'</span><span class="p">,</span> <span class="s1">' & '</span><span class="p">,</span> <span class="s1">'Baked Beans And Spam'</span><span class="p">,</span> <span class="n">flags</span><span class="o">=</span><span class="n">re</span><span class="o">.</span><span class="n">IGNORECASE</span><span class="p">)</span>
<span class="go">'Baked Beans & Spam'</span>
</code></pre><p><span class="yiyi-st" id="yiyi-349">The pattern may be a string or an RE object.</span></p><p><span class="yiyi-st" id="yiyi-350">The optional argument <em>count</em> is the maximum number of pattern occurrences to be replaced; <em>count</em> must be a non-negative integer. </span><span class="yiyi-st" id="yiyi-351">If omitted or zero, all occurrences will be replaced. </span><span class="yiyi-st" id="yiyi-352">Empty matches for the pattern are replaced only when not adjacent to a previous match, so <code class="docutils literal"><span class="pre">sub('x*',</span> <span class="pre">'-',</span> <span class="pre">'abc')</span></code> returns <code class="docutils literal"><span class="pre">'-a-b-c-'</span></code>.</span></p><p><span class="yiyi-st" id="yiyi-353">In string-type <em>repl</em> arguments, in addition to the character escapes and backreferences described above, <code class="docutils literal"><span class="pre">\g<name></span></code> will use the substring matched by the group named <code class="docutils literal"><span class="pre">name</span></code>, as defined by the <code class="docutils literal"><span class="pre">(?P<name>...)</span></code> syntax. </span><span class="yiyi-st" id="yiyi-354"><code class="docutils literal"><span class="pre">\g<number></span></code> uses the corresponding group number; <code class="docutils literal"><span class="pre">\g<2></span></code> is therefore equivalent to <code class="docutils literal"><span class="pre">\2</span></code>, but isn't ambiguous in a replacement such as <code class="docutils literal"><span class="pre">\g<2>0</span></code>. </span><span class="yiyi-st" id="yiyi-355"><code class="docutils literal"><span class="pre">\20</span></code> would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character <code class="docutils literal"><span class="pre">'0'</span></code>. </span><span class="yiyi-st" id="yiyi-356">The backreference <code class="docutils literal"><span class="pre">\g<0></span></code> substitutes in the entire substring matched by the RE.</span></p><div class="versionchanged"><p><span class="yiyi-st" id="yiyi-357"><span class="versionmodified">Changed in version 3.1: </span>Added the optional flags argument.</span></p></div><div class="versionchanged"><p><span class="yiyi-st" id="yiyi-358"><span class="versionmodified">Changed in version 3.5: </span>Unmatched groups are replaced with an empty string.</span></p></div><div class="deprecated-removed"><p><span 
class="yiyi-st" id="yiyi-359"><span class="versionmodified">Deprecated since version 3.5, will be removed in version 3.6: </span>Unknown escapes consisting of <code class="docutils literal"><span class="pre">'\'</span></code> and an ASCII letter now raise a deprecation warning and will be forbidden in Python 3.6.</span></p></div></dd></dl><dl class="function"><dt id="re.subn"><span class="yiyi-st" id="yiyi-360"><code class="descclassname">re.</code><code class="descname">subn</code><span class="sig-paren">(</span><em>pattern</em>, <em>repl</em>, <em>string</em>, <em>count=0</em>, <em>flags=0</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-361">Perform the same operation as <a class="reference internal" href="#re.sub" title="re.sub"><code class="xref py py-func docutils literal"><span class="pre">sub()</span></code></a>, but return a tuple <code class="docutils literal"><span class="pre">(new_string,</span> <span class="pre">number_of_subs_made)</span></code>.</span></p><div class="versionchanged"><p><span class="yiyi-st" id="yiyi-362"><span class="versionmodified">Changed in version 3.1: </span>Added the optional flags argument.</span></p></div><div class="versionchanged"><p><span class="yiyi-st" id="yiyi-363"><span class="versionmodified">Changed in version 3.5: </span>Unmatched groups are replaced with an empty string.</span></p></div></dd></dl><dl class="function"><dt id="re.escape"><span class="yiyi-st" id="yiyi-364"><code class="descclassname">re.</code><code class="descname">escape</code><span class="sig-paren">(</span><em>string</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-365">Escape all the characters in the pattern except ASCII letters, numbers and <code class="docutils literal"><span class="pre">'_'</span></code>. </span><span class="yiyi-st" id="yiyi-366">This is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.</span></p><div class="versionchanged"><p><span class="yiyi-st" id="yiyi-367"><span class="versionmodified">Changed in version 3.3: </span>The <code class="docutils literal"><span class="pre">'_'</span></code> character is no longer escaped.</span></p></div></dd></dl><dl class="function"><dt id="re.purge"><span class="yiyi-st" id="yiyi-368"><code class="descclassname">re.</code><code class="descname">purge</code><span 
class="sig-paren">(</span><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-369">Clear the regular expression cache.</span></p></dd></dl><dl class="exception"><dt id="re.error"><span class="yiyi-st" id="yiyi-370"><em class="property">exception </em><code class="descclassname">re.</code><code class="descname">error</code><span class="sig-paren">(</span><em>msg</em>, <em>pattern=None</em>, <em>pos=None</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-371">Exception raised when a string passed to one of the functions here is not a valid regular expression (for example, it might contain unmatched parentheses) or when some other error occurs during compilation or matching. </span><span class="yiyi-st" id="yiyi-372">It is never an error if a string contains no match for a pattern. </span><span class="yiyi-st" id="yiyi-373">The error instance has the following additional attributes:</span></p><dl class="attribute"><dt id="re.error.msg"><span class="yiyi-st" id="yiyi-374"> <code class="descname">msg</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-375">The unformatted error message.</span></p></dd></dl><dl class="attribute"><dt id="re.error.pattern"><span class="yiyi-st" id="yiyi-376"><code class="descname">pattern</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-377">The regular expression pattern.</span></p></dd></dl><dl class="attribute"><dt id="re.error.pos"><span class="yiyi-st" id="yiyi-378"> <code class="descname">pos</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-379">The index of <em>pattern</em> where compilation failed.</span></p></dd></dl><dl class="attribute"><dt id="re.error.lineno"><span class="yiyi-st" id="yiyi-380"> <code class="descname">lineno</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-381">The line corresponding to <em>pos</em>.</span></p></dd></dl><dl class="attribute"><dt id="re.error.colno"><span class="yiyi-st" id="yiyi-382"> <code class="descname">colno</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-383">The column corresponding to <em>pos</em>.</span></p></dd></dl><div class="versionchanged"><p><span class="yiyi-st" id="yiyi-384"><span class="versionmodified">Changed in version 3.5: </span>Added additional attributes.</span></p></div></dd></dl></div><div class="section" id="regular-expression-objects"><h2><span class="yiyi-st" 
id="yiyi-385">6.2.3. </span><span class="yiyi-st" id="yiyi-386">Regular Expression Objects</span></h2><p><span class="yiyi-st" id="yiyi-387">Compiled regular expression objects support the following methods and attributes:</span></p><dl class="method"><dt id="re.regex.search"><span class="yiyi-st" id="yiyi-388"><code class="descclassname">regex.</code><code class="descname">search</code><span class="sig-paren">(</span><em>string</em><span class="optional">[</span>, <em>pos</em><span class="optional">[</span>, <em>endpos</em><span class="optional">]</span><span class="optional">]</span><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-389">Scan through <em>string</em> looking for the first location where this regular expression produces a match, and return a corresponding <a class="reference internal" href="#match-objects"><span>match object</span></a>. </span><span class="yiyi-st" id="yiyi-390">Return <code class="docutils literal"><span class="pre">None</span></code> if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.</span></p><p><span class="yiyi-st" id="yiyi-391">The optional second parameter <em>pos</em> gives an index in the string where the search is to start; it defaults to <code class="docutils literal"><span class="pre">0</span></code>. </span><span class="yiyi-st" id="yiyi-392">This is not completely equivalent to slicing the string; the <code class="docutils literal"><span class="pre">'^'</span></code> pattern character matches at the real beginning of the string and at positions just after a newline, but not necessarily at the index where the search is to start.</span></p><p><span class="yiyi-st" id="yiyi-393">The optional parameter <em>endpos</em> limits how far the string will be searched; it will be as if the string is <em>endpos</em> characters long, so only the characters from <em>pos</em> to <code class="docutils literal"><span class="pre">endpos</span> <span class="pre">-</span> <span class="pre">1</span></code> will be searched for a match. </span><span class="yiyi-st" id="yiyi-394">If <em>endpos</em> is less than <em>pos</em>, no match will be found; otherwise, if <em>rx</em> is a compiled regular expression object, <code class="docutils literal"><span class="pre">rx.search(string,</span> <span class="pre">0,</span> <span class="pre">50)</span></code> is equivalent to <code class="docutils literal"><span class="pre">rx.search(string[:50],</span> <span class="pre">0)</span></code>.</span></p><pre><code class="language-python"><span></span><span 
class="gp">>>> </span><span class="n">pattern</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="s2">"d"</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">pattern</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s2">"dog"</span><span class="p">)</span> <span class="c1"># Match at index 0</span>
<span class="go"><_sre.SRE_Match object; span=(0, 1), match='d'></span>
<span class="gp">>>> </span><span class="n">pattern</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s2">"dog"</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span class="c1"># No match; search doesn't include the "d"</span>
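</code></pre><p>The difference between passing <em>pos</em> and slicing the string can be sketched as follows. This is an illustrative example only; the pattern <code>^b</code>, the variable names and the sample string are assumptions made for the sketch, not part of the reference text.</p>

```python
import re

anchored = re.compile(r"^b")   # hypothetical pattern used only for this sketch
text = "ab"

# pos=1 starts the search at index 1, but '^' still means the real start
# of the string, so no match is found.
with_pos = anchored.search(text, 1)

# Slicing builds a new string that begins with "b", so '^' can match.
with_slice = anchored.search(text[1:])

# endpos=1 makes the search behave as if the string were just "a".
limited = re.compile("b").search(text, 0, 1)

print(with_pos, with_slice, limited)
```

<p>Only the sliced call finds a match; the other two calls return <code>None</code>.</p><pre><code class="language-python"><span></span>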
</code></pre></dd></dl><dl class="method"><dt id="re.regex.match"><span class="yiyi-st" id="yiyi-395"><code class="descclassname">regex.</code><code class="descname">match</code><span class="sig-paren">(</span><em>string</em><span class="optional">[</span>, <em>pos</em><span class="optional">[</span>, <em>endpos</em><span class="optional">]</span><span class="optional">]</span><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-396">If zero or more characters at the <em>beginning</em> of <em>string</em> match this regular expression, return a corresponding <a class="reference internal" href="#match-objects"><span>match object</span></a>. </span><span class="yiyi-st" id="yiyi-397">Return <code class="docutils literal"><span class="pre">None</span></code> if the string does not match the pattern; note that this is different from a zero-length match.</span></p><p><span class="yiyi-st" id="yiyi-398">The optional <em>pos</em> and <em>endpos</em> parameters have the same meaning as for the <a class="reference internal" href="#re.regex.search" title="re.regex.search"><code class="xref py py-meth docutils literal"><span class="pre">search()</span></code></a> method.</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="n">pattern</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="s2">"o"</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">pattern</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"dog"</span><span class="p">)</span> <span class="c1"># No match as "o" is not at the start of "dog".</span>
<span class="gp">>>> </span><span class="n">pattern</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"dog"</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span> <span class="c1"># Match as "o" is the 2nd character of "dog".</span>
<span class="go"><_sre.SRE_Match object; span=(1, 2), match='o'></span>
</code></pre><p><span class="yiyi-st" id="yiyi-399">If you want to locate a match anywhere in <em>string</em>, use <a class="reference internal" href="#re.regex.search" title="re.regex.search"><code class="xref py py-meth docutils literal"><span class="pre">search()</span></code></a> instead (see also <a class="reference internal" href="#search-vs-match"><span>search() vs. match()</span></a>).</span></p></dd></dl><dl class="method"><dt id="re.regex.fullmatch"><span class="yiyi-st" id="yiyi-400"><code class="descclassname">regex.</code><code class="descname">fullmatch</code><span class="sig-paren">(</span><em>string</em><span class="optional">[</span>, <em>pos</em><span class="optional">[</span>, <em>endpos</em><span class="optional">]</span><span class="optional">]</span><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-401">If the whole <em>string</em> matches this regular expression, return a corresponding <a class="reference internal" href="#match-objects"><span>match object</span></a>. </span><span class="yiyi-st" id="yiyi-402">Return <code class="docutils literal"><span class="pre">None</span></code> if the string does not match the pattern; note that this is different from a zero-length match.</span></p><p><span class="yiyi-st" id="yiyi-403">The optional <em>pos</em> and <em>endpos</em> parameters have the same meaning as for the <a class="reference internal" href="#re.regex.search" title="re.regex.search"><code class="xref py py-meth docutils literal"><span class="pre">search()</span></code></a> method.</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="n">pattern</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="s2">"o[gh]"</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">pattern</span><span class="o">.</span><span class="n">fullmatch</span><span class="p">(</span><span class="s2">"dog"</span><span class="p">)</span> <span class="c1"># No match as "o" is not at the start of "dog".</span>
<span class="gp">>>> </span><span class="n">pattern</span><span class="o">.</span><span class="n">fullmatch</span><span class="p">(</span><span class="s2">"ogre"</span><span class="p">)</span> <span class="c1"># No match as not the full string matches.</span>
<span class="gp">>>> </span><span class="n">pattern</span><span class="o">.</span><span class="n">fullmatch</span><span class="p">(</span><span class="s2">"doggie"</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> <span class="c1"># Matches within given limits.</span>
<span class="go"><_sre.SRE_Match object; span=(1, 3), match='og'></span>
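</code></pre><p>To contrast <code>search()</code>, <code>match()</code> and <code>fullmatch()</code> on the same compiled pattern, here is a small sketch. The pattern <code>\d+</code>, the variable names and the sample strings are assumptions chosen for illustration.</p>

```python
import re

digits = re.compile(r"\d+")   # hypothetical pattern for this sketch
text = "42abc"

anywhere = digits.search("abc42")       # finds "42" anywhere in the string
at_start = digits.match(text)           # "42" sits at the start of text
whole = digits.fullmatch(text)          # None: "abc" is left over
window = digits.fullmatch(text, 0, 2)   # only text[0:2] has to match

print(anywhere.group(), at_start.group(), whole, window.span())
```

<p>With <em>pos</em> and <em>endpos</em>, <code>fullmatch()</code> only requires the selected region to match in full.</p><pre><code class="language-python"><span></span>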
</code></pre><div class="versionadded"><p><span class="yiyi-st" id="yiyi-404"><span class="versionmodified">New in version 3.4.</span></span></p></div></dd></dl><dl class="method"><dt id="re.regex.split"><span class="yiyi-st" id="yiyi-405"><code class="descclassname">regex.</code><code class="descname">split</code><span class="sig-paren">(</span><em>string</em>, <em>maxsplit=0</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-406">Identical to the <a class="reference internal" href="#re.split" title="re.split"><code class="xref py py-func docutils literal"><span class="pre">split()</span></code></a> function, using the compiled pattern.</span></p></dd></dl><dl class="method"><dt id="re.regex.findall"><span class="yiyi-st" id="yiyi-407"><code class="descclassname">regex.</code><code class="descname">findall</code><span class="sig-paren">(</span><em>string</em><span class="optional">[</span>, <em>pos</em><span class="optional">[</span>, <em>endpos</em><span class="optional">]</span><span class="optional">]</span><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-408">Similar to the <a class="reference internal" href="#re.findall" title="re.findall"><code class="xref py py-func docutils literal"><span class="pre">findall()</span></code></a> function, using the compiled pattern, but also accepts optional <em>pos</em> and <em>endpos</em> parameters that limit the search region like for <a class="reference internal" href="#re.match" title="re.match"><code class="xref py py-meth docutils literal"><span class="pre">match()</span></code></a>.</span></p></dd></dl><dl class="method"><dt id="re.regex.finditer"><span class="yiyi-st" id="yiyi-409"><code class="descclassname">regex.</code><code class="descname">finditer</code><span class="sig-paren">(</span><em>string</em><span class="optional">[</span>, <em>pos</em><span class="optional">[</span>, <em>endpos</em><span class="optional">]</span><span class="optional">]</span><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-410">Similar to the <a class="reference internal" 
href="#re.finditer" title="re.finditer"><code class="xref py py-func docutils literal"><span class="pre">finditer()</span></code></a> function, using the compiled pattern, but also accepts optional <em>pos</em> and <em>endpos</em> parameters that limit the search region like for <a class="reference internal" href="#re.match" title="re.match"><code class="xref py py-meth docutils literal"><span class="pre">match()</span></code></a>.</span></p></dd></dl><dl class="method"><dt id="re.regex.sub"><span class="yiyi-st" id="yiyi-411"> <code class="descclassname">regex.</code><code class="descname">sub</code><span class="sig-paren">(</span><em>repl</em>, <em>string</em>, <em>count=0</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-412">Identical to the <a class="reference internal" href="#re.sub" title="re.sub"><code class="xref py py-func docutils literal"><span class="pre">sub()</span></code></a> function, using the compiled pattern.</span></p></dd></dl><dl class="method"><dt id="re.regex.subn"><span class="yiyi-st" id="yiyi-413"> <code class="descclassname">regex.</code><code class="descname">subn</code><span class="sig-paren">(</span><em>repl</em>, <em>string</em>, <em>count=0</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-414">Identical to the <a class="reference internal" href="#re.subn" title="re.subn"><code class="xref py py-func docutils literal"><span class="pre">subn()</span></code></a> function, using the compiled pattern.</span></p></dd></dl><dl class="attribute"><dt id="re.regex.flags"><span class="yiyi-st" id="yiyi-415"><code class="descclassname">regex.</code><code class="descname">flags</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-416">The regex matching flags. </span><span class="yiyi-st" id="yiyi-417">This is a combination of the flags given to <a class="reference internal" href="#re.compile" title="re.compile"><code class="xref py py-func docutils literal"><span class="pre">compile()</span></code></a>, any <code class="docutils literal"><span class="pre">(?...)</span></code> </span><span class="yiyi-st" id="yiyi-418">inline flags in the pattern, and implicit flags such as <code class="xref py py-data docutils literal"><span 
class="pre">UNICODE</span></code> if the pattern is a Unicode string.</span></p></dd></dl><dl class="attribute"><dt id="re.regex.groups"><span class="yiyi-st" id="yiyi-419"><code class="descclassname">regex.</code><code class="descname">groups</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-420">The number of capturing groups in the pattern.</span></p></dd></dl><dl class="attribute"><dt id="re.regex.groupindex"><span class="yiyi-st" id="yiyi-421"><code class="descclassname">regex.</code><code class="descname">groupindex</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-422">A dictionary mapping any symbolic group names defined by <code class="docutils literal"><span class="pre">(?P<id>)</span></code> to group numbers. </span><span class="yiyi-st" id="yiyi-423">The dictionary is empty if no symbolic groups were used in the pattern.</span></p></dd></dl><dl class="attribute"><dt id="re.regex.pattern"><span class="yiyi-st" id="yiyi-424"><code class="descclassname">regex.</code><code class="descname">pattern</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-425">The pattern string from which the RE object was compiled.</span></p></dd></dl></div><div class="section" id="match-objects"><h2><span class="yiyi-st" id="yiyi-426">6.2.4. </span><span class="yiyi-st" id="yiyi-427">Match Objects</span></h2><p><span class="yiyi-st" id="yiyi-428">Match objects always have a boolean value of <code class="docutils literal"><span class="pre">True</span></code>. </span><span class="yiyi-st" id="yiyi-429">Since <a class="reference internal" href="#re.regex.match" title="re.regex.match"><code class="xref py py-meth docutils literal"><span class="pre">match()</span></code></a> and <a class="reference internal" href="#re.regex.search" title="re.regex.search"><code class="xref py py-meth docutils literal"><span class="pre">search()</span></code></a> return <code class="docutils literal"><span class="pre">None</span></code> when there is no match, you can test whether there was a match with a simple <code class="docutils literal"><span class="pre">if</span></code> statement:</span></p><pre><code class="language-python"><span></span><span class="n">match</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span 
class="p">(</span><span class="n">pattern</span><span class="p">,</span> <span class="n">string</span><span class="p">)</span>
<span class="k">if</span> <span class="n">match</span><span class="p">:</span>
    <span class="n">process</span><span class="p">(</span><span class="n">match</span><span class="p">)</span>
</code></pre><p><span class="yiyi-st" id="yiyi-430">Match objects support the following methods and attributes:</span></p><dl class="method"><dt id="re.match.expand"><span class="yiyi-st" id="yiyi-431"><code class="descclassname">match.</code><code class="descname">expand</code><span class="sig-paren">(</span><em>template</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-432">Return the string obtained by doing backslash substitution on the template string <em>template</em>, as done by the <a class="reference internal" href="#re.regex.sub" title="re.regex.sub"><code class="xref py py-meth docutils literal"><span class="pre">sub()</span></code></a> method. </span><span class="yiyi-st" id="yiyi-433">Escapes such as <code class="docutils literal"><span class="pre">\n</span></code> are converted to the appropriate characters, and numeric backreferences (<code class="docutils literal"><span class="pre">\1</span></code>, <code class="docutils literal"><span class="pre">\2</span></code>) and named backreferences (<code class="docutils literal"><span class="pre">\g<1></span></code>, <code class="docutils literal"><span class="pre">\g<name></span></code>) are replaced by the contents of the corresponding group.</span></p><div class="versionchanged"><p><span class="yiyi-st" id="yiyi-434"><span class="versionmodified">Changed in version 3.5: </span>Unmatched groups are replaced with an empty string.</span></p></div></dd></dl><dl class="method"><dt id="re.match.group"><span class="yiyi-st" id="yiyi-435"><code class="descclassname">match.</code><code class="descname">group</code><span class="sig-paren">(</span><span class="optional">[</span><em>group1</em>, <em>...</em><span class="optional">]</span><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-436">Returns one or more subgroups of the match. </span><span class="yiyi-st" id="yiyi-437">If there is a single argument, the result is a single string; if there are multiple arguments, the result is a tuple with one item per argument. </span><span class="yiyi-st" id="yiyi-438">Without arguments, <em>group1</em> defaults to zero (the whole match is returned). </span><span class="yiyi-st" id="yiyi-439">If a <em>groupN</em> argument is zero, the corresponding return value is the entire matching string; if it is in the inclusive range [1..99], it is the string matching the corresponding parenthesized group. </span><span class="yiyi-st" id="yiyi-440">If a group number is negative or larger than the number of groups defined in the pattern, an <a class="reference internal" href="exceptions.html#IndexError" 
title="IndexError"><code class="xref py py-exc docutils literal"><span class="pre">IndexError</span></code></a> exception is raised. </span><span class="yiyi-st" id="yiyi-441">If a group is contained in a part of the pattern that did not match, the corresponding result is <code class="docutils literal"><span class="pre">None</span></code>. </span><span class="yiyi-st" id="yiyi-442">If a group is contained in a part of the pattern that matched multiple times, the last match is returned.</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="n">m</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">r"(\w+) (\w+)"</span><span class="p">,</span> <span class="s2">"Isaac Newton, physicist"</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="c1"># The entire match</span>
<span class="go">'Isaac Newton'</span>
<span class="gp">>>> </span><span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># The first parenthesized subgroup.</span>
<span class="go">'Isaac'</span>
<span class="gp">>>> </span><span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span> <span class="c1"># The second parenthesized subgroup.</span>
<span class="go">'Newton'</span>
<span class="gp">>>> </span><span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span> <span class="c1"># Multiple arguments give us a tuple.</span>
<span class="go">('Isaac', 'Newton')</span>
</code></pre><p><span class="yiyi-st" id="yiyi-443">If the regular expression uses the <code class="docutils literal"><span class="pre">(?P<name>...)</span></code> syntax, the <em>groupN</em> arguments may also be strings identifying groups by their group name. </span><span class="yiyi-st" id="yiyi-444">If a string argument is not used as a group name in the pattern, an <a class="reference internal" href="exceptions.html#IndexError" title="IndexError"><code class="xref py py-exc docutils literal"><span class="pre">IndexError</span></code></a> exception is raised.</span></p><p><span class="yiyi-st" id="yiyi-445">A moderately complicated example:</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="n">m</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">r"(?P<first_name>\w+) (?P<last_name>\w+)"</span><span class="p">,</span> <span class="s2">"Malcolm Reynolds"</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="s1">'first_name'</span><span class="p">)</span>
<span class="go">'Malcolm'</span>
<span class="gp">>>> </span><span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="s1">'last_name'</span><span class="p">)</span>
<span class="go">'Reynolds'</span>
</code></pre><p><span class="yiyi-st" id="yiyi-446">Named groups can also be referred to by their index:</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="go">'Malcolm'</span>
<span class="gp">>>> </span><span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
<span class="go">'Reynolds'</span>
</code></pre><p><span class="yiyi-st" id="yiyi-447">If a group matches multiple times, only the last match is accessible:</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="n">m</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">r"(..)+"</span><span class="p">,</span> <span class="s2">"a1b2c3"</span><span class="p">)</span> <span class="c1"># Matches 3 times.</span>
<span class="gp">>>> </span><span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># Returns only the last match.</span>
<span class="go">'c3'</span>
</code></pre></dd></dl><dl class="method"><dt id="re.match.groups"><span class="yiyi-st" id="yiyi-448"><code class="descclassname">match.</code><code class="descname">groups</code><span class="sig-paren">(</span><em>default=None</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-449">Return a tuple containing all the subgroups of the match, from 1 up to however many groups are in the pattern. </span><span class="yiyi-st" id="yiyi-450">The <em>default</em> argument is used for groups that did not participate in the match; it defaults to <code class="docutils literal"><span class="pre">None</span></code>.</span></p><p><span class="yiyi-st" id="yiyi-451">For example:</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="n">m</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">r"(\d+)\.(\d+)"</span><span class="p">,</span> <span class="s2">"24.1632"</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">m</span><span class="o">.</span><span class="n">groups</span><span class="p">()</span>
<span class="go">('24', '1632')</span>
</code></pre><p><span class="yiyi-st" id="yiyi-452">If we make the decimal place and everything after it optional, not all groups might participate in the match. </span><span class="yiyi-st" id="yiyi-453">These groups will default to <code class="docutils literal"><span class="pre">None</span></code> unless the <em>default</em> argument is given:</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="n">m</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">r"(\d+)\.?(\d+)?"</span><span class="p">,</span> <span class="s2">"24"</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">m</span><span class="o">.</span><span class="n">groups</span><span class="p">()</span> <span class="c1"># Second group defaults to None.</span>
<span class="go">('24', None)</span>
<span class="gp">>>> </span><span class="n">m</span><span class="o">.</span><span class="n">groups</span><span class="p">(</span><span class="s1">'0'</span><span class="p">)</span> <span class="c1"># Now, the second group defaults to '0'.</span>
<span class="go">('24', '0')</span>
</code></pre></dd></dl><dl class="method"><dt id="re.match.groupdict"><span class="yiyi-st" id="yiyi-454"><code class="descclassname">match.</code><code class="descname">groupdict</code><span class="sig-paren">(</span><em>default=None</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-455">Return a dictionary containing all the <em>named</em> subgroups of the match (unnamed subgroups are not included), keyed by the subgroup name. </span><span class="yiyi-st" id="yiyi-456">The <em>default</em> argument is used for groups that did not participate in the match; it defaults to <code class="docutils literal"><span class="pre">None</span></code>. </span><span class="yiyi-st" id="yiyi-457">For example:</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="n">m</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">r"(?P<first_name>\w+) (?P<last_name>\w+)"</span><span class="p">,</span> <span class="s2">"Malcolm Reynolds"</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">m</span><span class="o">.</span><span class="n">groupdict</span><span class="p">()</span>
<span class="go">{'first_name': 'Malcolm', 'last_name': 'Reynolds'}</span>
</code></pre></dd></dl><dl class="method"><dt id="re.match.start"><span class="yiyi-st" id="yiyi-458"> <code class="descclassname">match.</code><code class="descname">start</code><span class="sig-paren">(</span><span class="optional">[</span><em>group</em><span class="optional">]</span><span class="sig-paren">)</span></span></dt><dt id="re.match.end"><span class="yiyi-st" id="yiyi-459"> <code class="descclassname">match.</code><code class="descname">end</code><span class="sig-paren">(</span><span class="optional">[</span><em>group</em><span class="optional">]</span><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-460">Return the indices of the start and end of the substring matched by <em>group</em>; <em>group</em> defaults to zero (meaning the whole matched substring). </span><span class="yiyi-st" id="yiyi-461">Return <code class="docutils literal"><span class="pre">-1</span></code> if <em>group</em> exists but did not contribute to the match. </span><span class="yiyi-st" id="yiyi-462">For a match object <em>m</em>, and a group <em>g</em> that did contribute to the match, the substring matched by group <em>g</em> (equivalent to <code class="docutils literal"><span class="pre">m.group(g)</span></code>) is</span></p><pre><code class="language-python"><span></span><span class="n">m</span><span class="o">.</span><span class="n">string</span><span class="p">[</span><span class="n">m</span><span class="o">.</span><span class="n">start</span><span class="p">(</span><span class="n">g</span><span class="p">):</span><span class="n">m</span><span class="o">.</span><span class="n">end</span><span class="p">(</span><span class="n">g</span><span class="p">)]</span>
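</code></pre><p>The slicing identity above, and the <code>-1</code> result for a group that did not contribute, can be checked with a short sketch. The pattern <code>(a)|(b)</code> and the sample string are assumptions chosen only for illustration.</p>

```python
import re

# Hypothetical pattern: group 1 and group 2 are alternatives, so at most
# one of them can take part in any given match.
m = re.search(r"(a)|(b)", "xxb")

# Group 2 took part in the match, so slicing m.string reproduces it.
assert m.string[m.start(2):m.end(2)] == m.group(2)

# Group 1 exists but did not take part: start() and end() both give -1.
unmatched = (m.start(1), m.end(1))

# With no argument, start() and end() describe the whole match.
whole = (m.start(), m.end())

print(unmatched, whole)
```

<p>Note that the slice is only meaningful for groups that contributed; slicing with <code>-1</code> indices would silently produce an unrelated substring.</p><pre><code class="language-python"><span></span>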
</code></pre><p><span class="yiyi-st" id="yiyi-463">Note that <code class="docutils literal"><span class="pre">m.start(group)</span></code> will equal <code class="docutils literal"><span class="pre">m.end(group)</span></code> if <em>group</em> matched a null string. </span><span class="yiyi-st" id="yiyi-464">For example, after <code class="docutils literal"><span class="pre">m</span> <span class="pre">=</span> <span class="pre">re.search('b(c?)',</span> <span class="pre">'cba')</span></code>, <code class="docutils literal"><span class="pre">m.start(0)</span></code> is 1, <code class="docutils literal"><span class="pre">m.end(0)</span></code> is 2, <code class="docutils literal"><span class="pre">m.start(1)</span></code> and <code class="docutils literal"><span class="pre">m.end(1)</span></code> are both 2, and <code class="docutils literal"><span class="pre">m.start(2)</span></code> raises an <a class="reference internal" href="exceptions.html#IndexError" title="IndexError"><code class="xref py py-exc docutils literal"><span class="pre">IndexError</span></code></a> exception.</span></p><p><span class="yiyi-st" id="yiyi-465">An example that removes <em>remove_this</em> from an email address:</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="n">email</span> <span class="o">=</span> <span class="s2">"tony@tiremove_thisger.net"</span>
<span class="gp">>>> </span><span class="n">m</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s2">"remove_this"</span><span class="p">,</span> <span class="n">email</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">email</span><span class="p">[:</span><span class="n">m</span><span class="o">.</span><span class="n">start</span><span class="p">()]</span> <span class="o">+</span> <span class="n">email</span><span class="p">[</span><span class="n">m</span><span class="o">.</span><span class="n">end</span><span class="p">():]</span>
<span class="go">'tony@tiger.net'</span>
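</code></pre><p>The index rules above can be checked directly. The following is a minimal sketch rather than part of the original reference; the variable names (<code class="docutils literal"><span class="pre">whole</span></code>, <code class="docutils literal"><span class="pre">empty</span></code>, <code class="docutils literal"><span class="pre">unused</span></code>, <code class="docutils literal"><span class="pre">same</span></code>) and the alternation pattern are assumptions chosen purely for illustration:</p><pre><code class="language-python">import re

# Group 1 matches the empty string right after "b", so its start equals its end.
m = re.search(r"b(c?)", "cba")
whole = (m.start(0), m.end(0))      # (1, 2)
empty = (m.start(1), m.end(1))      # (2, 2)

# In an alternation only one branch participates; the other group reports -1.
m2 = re.match(r"(a)|(b)", "a")
unused = (m2.start(2), m2.end(2))   # (-1, -1)

# Slicing m.string with start()/end() reproduces group() for a participating group.
same = m.string[m.start(0):m.end(0)] == m.group(0)

print(whole, empty, unused, same)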
</code></pre></dd></dl><dl class="method"><dt id="re.match.span"><span class="yiyi-st" id="yiyi-466"> <code class="descclassname">match.</code><code class="descname">span</code><span class="sig-paren">(</span><span class="optional">[</span><em>group</em><span class="optional">]</span><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-467">For a match <em>m</em>, return the 2-tuple <code class="docutils literal"><span class="pre">(m.start(group),</span> <span class="pre">m.end(group))</span></code>. </span><span class="yiyi-st" id="yiyi-468">Note that if <em>group</em> did not contribute to the match, this is <code class="docutils literal"><span class="pre">(-1,</span> <span class="pre">-1)</span></code>. </span><span class="yiyi-st" id="yiyi-469"><em>group</em> defaults to zero, the entire match.</span></p></dd></dl><dl class="attribute"><dt id="re.match.pos"><span class="yiyi-st" id="yiyi-470"><code class="descclassname">match.</code><code class="descname">pos</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-471">The value of <em>pos</em> that was passed to the <a class="reference internal" href="#re.regex.search" title="re.regex.search"><code class="xref py py-meth docutils literal"><span class="pre">search()</span></code></a> or <a class="reference internal" href="#re.regex.match" title="re.regex.match"><code class="xref py py-meth docutils literal"><span class="pre">match()</span></code></a> method of a <a class="reference internal" href="#re-objects"><span>regex object</span></a>. </span><span class="yiyi-st" id="yiyi-472">This is the index into the string at which the RE engine started looking for a match.</span></p></dd></dl><dl class="attribute"><dt id="re.match.endpos"><span class="yiyi-st" id="yiyi-473"><code class="descclassname">match.</code><code class="descname">endpos</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-474">The value of <em>endpos</em> that was passed to the <a class="reference internal" href="#re.regex.search" title="re.regex.search"><code class="xref py py-meth docutils literal"><span class="pre">search()</span></code></a> or <a class="reference internal" 
href="#re.regex.match" title="re.regex.match"><code class="xref py py-meth docutils literal"><span class="pre">match()</span></code></a> method of a <a class="reference internal" href="#re-objects"><span>regex object</span></a>. </span><span class="yiyi-st" id="yiyi-475">This is the index into the string beyond which the RE engine will not go.</span></p></dd></dl><dl class="attribute"><dt id="re.match.lastindex"><span class="yiyi-st" id="yiyi-476"><code class="descclassname">match.</code><code class="descname">lastindex</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-477">The integer index of the last matched capturing group, or <code class="docutils literal"><span class="pre">None</span></code> if no group was matched at all. </span><span class="yiyi-st" id="yiyi-478">For example, the expressions <code class="docutils literal"><span class="pre">(a)b</span></code>, <code class="docutils literal"><span class="pre">((a)(b))</span></code>, and <code class="docutils literal"><span class="pre">((ab))</span></code> will have <code class="docutils literal"><span class="pre">lastindex</span> <span class="pre">==</span> <span class="pre">1</span></code> if applied to the string <code class="docutils literal"><span class="pre">'ab'</span></code>, while the expression <code class="docutils literal"><span class="pre">(a)(b)</span></code> will have <code class="docutils literal"><span class="pre">lastindex</span> <span class="pre">==</span> <span class="pre">2</span></code>, if applied to the same string.</span></p></dd></dl><dl class="attribute"><dt id="re.match.lastgroup"><span class="yiyi-st" id="yiyi-479"><code class="descclassname">match.</code><code class="descname">lastgroup</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-480">The name of the last matched capturing group, or <code class="docutils literal"><span class="pre">None</span></code> if the group has no name or if no group was matched at all.</span></p></dd></dl><dl class="attribute"><dt id="re.match.re"><span class="yiyi-st" id="yiyi-481"><code class="descclassname">match.</code><code class="descname">re</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-482">The regular expression object whose <a class="reference internal" href="#re.regex.match" title="re.regex.match"><code class="xref py 
py-meth docutils literal"><span class="pre">match()</span></code></a> or <a class="reference internal" href="#re.regex.search" title="re.regex.search"><code class="xref py py-meth docutils literal"><span class="pre">search()</span></code></a> method produced this match instance.</span></p></dd></dl><dl class="attribute"><dt id="re.match.string"><span class="yiyi-st" id="yiyi-483"><code class="descclassname">match.</code><code class="descname">string</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-484">The string passed to <a class="reference internal" href="#re.regex.match" title="re.regex.match"><code class="xref py py-meth docutils literal"><span class="pre">match()</span></code></a> or <a class="reference internal" href="#re.regex.search" title="re.regex.search"><code class="xref py py-meth docutils literal"><span class="pre">search()</span></code></a>.</span></p></dd></dl></div><div class="section" id="regular-expression-examples"><h2><span class="yiyi-st" id="yiyi-485">6.2.5.</span> <span class="yiyi-st" id="yiyi-486">Regular Expression Examples</span></h2><div class="section" id="checking-for-a-pair"><h3><span class="yiyi-st" id="yiyi-487">6.2.5.1.</span> <span class="yiyi-st" id="yiyi-488">Checking for a Pair</span></h3><p><span class="yiyi-st" id="yiyi-489">In this example, we'll use the following helper function to display match objects a little more gracefully:</span></p><pre><code class="language-python"><span></span><span class="k">def</span> <span class="nf">displaymatch</span><span class="p">(</span><span class="n">match</span><span class="p">):</span>
    <span class="k">if</span> <span class="n">match</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
        <span class="k">return</span> <span class="kc">None</span>
    <span class="k">return</span> <span class="s1">'<Match: </span><span class="si">%r</span><span class="s1">, groups=</span><span class="si">%r</span><span class="s1">>'</span> <span class="o">%</span> <span class="p">(</span><span class="n">match</span><span class="o">.</span><span class="n">group</span><span class="p">(),</span> <span class="n">match</span><span class="o">.</span><span class="n">groups</span><span class="p">())</span>
</code></pre><p><span class="yiyi-st" id="yiyi-490">Suppose you are writing a poker program where a player's hand is represented as a 5-character string, each character standing for one card: "a" for ace, "k" for king, "q" for queen, "j" for jack, "t" for 10, and "2" through "9" for the card with that value.</span></p><p><span class="yiyi-st" id="yiyi-491">To check whether a given string is a valid hand, one could do the following:</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="n">valid</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="s2">r"^[a2-9tjqk]</span><span class="si">{5}</span><span class="s2">$"</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">displaymatch</span><span class="p">(</span><span class="n">valid</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"akt5q"</span><span class="p">))</span> <span class="c1"># Valid.</span>
<span class="go">"<Match: 'akt5q', groups=()>"</span>
<span class="gp">>>> </span><span class="n">displaymatch</span><span class="p">(</span><span class="n">valid</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"akt5e"</span><span class="p">))</span> <span class="c1"># Invalid.</span>
<span class="gp">>>> </span><span class="n">displaymatch</span><span class="p">(</span><span class="n">valid</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"akt"</span><span class="p">))</span> <span class="c1"># Invalid.</span>
<span class="gp">>>> </span><span class="n">displaymatch</span><span class="p">(</span><span class="n">valid</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"727ak"</span><span class="p">))</span> <span class="c1"># Valid.</span>
<span class="go">"<Match: '727ak', groups=()>"</span>
</code></pre><p><span class="yiyi-st" id="yiyi-492">That last hand, <code class="docutils literal"><span class="pre">"727ak"</span></code>, contained a pair, or two of the same valued cards. </span><span class="yiyi-st" id="yiyi-493">To match this with a regular expression, one could use backreferences:</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="n">pair</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="s2">r".*(.).*\1"</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">displaymatch</span><span class="p">(</span><span class="n">pair</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"717ak"</span><span class="p">))</span> <span class="c1"># Pair of 7s.</span>
<span class="go">"<Match: '717', groups=('7',)>"</span>
<span class="gp">>>> </span><span class="n">displaymatch</span><span class="p">(</span><span class="n">pair</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"718ak"</span><span class="p">))</span> <span class="c1"># No pairs.</span>
<span class="gp">>>> </span><span class="n">displaymatch</span><span class="p">(</span><span class="n">pair</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"354aa"</span><span class="p">))</span> <span class="c1"># Pair of aces.</span>
<span class="go">"<Match: '354aa', groups=('a',)>"</span>
</code></pre><p><span class="yiyi-st" id="yiyi-494">To find out which card the pair consists of, one could use the <a class="reference internal" href="#re.match.group" title="re.match.group"><code class="xref py py-meth docutils literal"><span class="pre">group()</span></code></a> method of the match object as follows:</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="n">pair</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"717ak"</span><span class="p">)</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="go">'7'</span>
<span class="go"># Error because re.match() returns None, which doesn't have a group() method:</span>
<span class="gp">>>> </span><span class="n">pair</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"718ak"</span><span class="p">)</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="gt">Traceback (most recent call last):</span>
  File <span class="nb">"<pyshell#23>"</span>, line <span class="m">1</span>, in <span class="n"><module></span>
    <span class="n">pair</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"718ak"</span><span class="p">)</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="gr">AttributeError</span>: <span class="n">'NoneType' object has no attribute 'group'</span>
<span class="gp">>>> </span><span class="n">pair</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"354aa"</span><span class="p">)</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="go">'a'</span>
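</code></pre><p>Because <code class="docutils literal"><span class="pre">match()</span></code> returns <code class="docutils literal"><span class="pre">None</span></code> when there is no pair, the result is usually tested before <code class="docutils literal"><span class="pre">group()</span></code> is called. A minimal sketch of that check follows; the helper name <code class="docutils literal"><span class="pre">paired_card</span></code> is hypothetical and not part of the <code class="docutils literal"><span class="pre">re</span></code> module:</p><pre><code class="language-python">import re

pair = re.compile(r".*(.).*\1")

def paired_card(hand):
    """Return the repeated card in hand, or None when there is no pair."""
    m = pair.match(hand)
    # match() returns None on failure, so test the result before using group().
    return m.group(1) if m is not None else None

print(paired_card("717ak"))   # the pair of 7s
print(paired_card("718ak"))   # no pair, so None instead of an AttributeError
print(paired_card("354aa"))   # the pair of aces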
</code></pre></div><div class="section" id="simulating-scanf"><h3><span class="yiyi-st" id="yiyi-495">6.2.5.2.</span> <span class="yiyi-st" id="yiyi-496">Simulating scanf()</span></h3><p id="index-0"><span class="yiyi-st" id="yiyi-497">Python does not currently have an equivalent to <code class="xref c c-func docutils literal"><span class="pre">scanf()</span></code>. </span><span class="yiyi-st" id="yiyi-498">Regular expressions are generally more powerful, though also more verbose, than <code class="xref c c-func docutils literal"><span class="pre">scanf()</span></code> format strings. </span><span class="yiyi-st" id="yiyi-499">The table below offers some more-or-less equivalent mappings between <code class="xref c c-func docutils literal"><span class="pre">scanf()</span></code> format tokens and regular expressions.</span></p><table border="1" class="docutils"><thead valign="bottom"><tr class="row-odd"><th class="head"><span class="yiyi-st" id="yiyi-500"><code class="xref c c-func docutils literal"><span class="pre">scanf()</span></code> Token</span></th><th class="head"><span class="yiyi-st" id="yiyi-501">Regular Expression</span></th></tr></thead><tbody valign="top"><tr class="row-even"><td><span class="yiyi-st" id="yiyi-502"><code class="docutils literal"><span class="pre">%c</span></code></span></td><td><span class="yiyi-st" id="yiyi-503"><code class="docutils literal"><span class="pre">.</span></code></span></td></tr><tr class="row-odd"><td><span class="yiyi-st" id="yiyi-504"><code class="docutils literal"><span class="pre">%5c</span></code></span></td><td><span class="yiyi-st" id="yiyi-505"><code class="docutils literal"><span class="pre">.{5}</span></code></span></td></tr><tr class="row-even"><td><span class="yiyi-st" id="yiyi-506"><code class="docutils literal"><span class="pre">%d</span></code></span></td><td><span class="yiyi-st" id="yiyi-507"><code class="docutils literal"><span class="pre">[-+]?\d+</span></code></span></td></tr><tr class="row-odd"><td><span class="yiyi-st" id="yiyi-508"><code class="docutils literal"><span class="pre">%e</span></code>, <code class="docutils literal"><span class="pre">%E</span></code>, <code class="docutils literal"><span 
class="pre">%f</span></code>, <code class="docutils literal"><span class="pre">%g</span></code></span></td><td><span class="yiyi-st" id="yiyi-509"><code class="docutils literal"><span class="pre">[-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?</span></code></span></td></tr><tr class="row-even"><td><span class="yiyi-st" id="yiyi-510"><code class="docutils literal"><span class="pre">%i</span></code></span></td><td><span class="yiyi-st" id="yiyi-511"><code class="docutils literal"><span class="pre">[-+]?(0[xX][\dA-Fa-f]+|0[0-7]*|\d+)</span></code></span></td></tr><tr class="row-odd"><td><span class="yiyi-st" id="yiyi-512"><code class="docutils literal"><span class="pre">%o</span></code></span></td><td><span class="yiyi-st" id="yiyi-513"><code class="docutils literal"><span class="pre">[-+]?[0-7]+</span></code></span></td></tr><tr class="row-even"><td><span class="yiyi-st" id="yiyi-514"><code class="docutils literal"><span class="pre">%s</span></code></span></td><td><span class="yiyi-st" id="yiyi-515"><code class="docutils literal"><span class="pre">\S+</span></code></span></td></tr><tr class="row-odd"><td><span class="yiyi-st" id="yiyi-516"><code class="docutils literal"><span class="pre">%u</span></code></span></td><td><span class="yiyi-st" id="yiyi-517"><code class="docutils literal"><span class="pre">\d+</span></code></span></td></tr><tr class="row-even"><td><span class="yiyi-st" id="yiyi-518"><code class="docutils literal"><span class="pre">%x</span></code>, <code class="docutils literal"><span class="pre">%X</span></code></span></td><td><span class="yiyi-st" id="yiyi-519"><code class="docutils literal"><span class="pre">[-+]?(0[xX])?[\dA-Fa-f]+</span></code></span></td></tr></tbody></table><p><span class="yiyi-st" id="yiyi-520">To extract the filename and numbers from a string like</span></p><pre><code class="language-python"><span></span><span class="o">/</span><span class="n">usr</span><span class="o">/</span><span class="n">sbin</span><span class="o">/</span><span class="n">sendmail</span> <span 
class="o">-</span> <span class="mi">0</span> <span class="n">errors</span><span class="p">,</span> <span class="mi">4</span> <span class="n">warnings</span>
</code></pre><p><span class="yiyi-st" id="yiyi-521">you would use a <code class="xref c c-func docutils literal"><span class="pre">scanf()</span></code> format like</span></p><pre><code class="language-python"><span></span><span class="o">%</span><span class="n">s</span> <span class="o">-</span> <span class="o">%</span><span class="n">d</span> <span class="n">errors</span><span class="p">,</span> <span class="o">%</span><span class="n">d</span> <span class="n">warnings</span>
</code></pre><p><span class="yiyi-st" id="yiyi-522">The equivalent regular expression would be</span></p><pre><code class="language-python"><span></span><span class="p">(</span>\<span class="n">S</span><span class="o">+</span><span class="p">)</span> <span class="o">-</span> <span class="p">(</span>\<span class="n">d</span><span class="o">+</span><span class="p">)</span> <span class="n">errors</span><span class="p">,</span> <span class="p">(</span>\<span class="n">d</span><span class="o">+</span><span class="p">)</span> <span class="n">warnings</span>
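</code></pre><p>Applied to the sample line, that expression might be used as in the sketch below. The variable names (<code class="docutils literal"><span class="pre">line</span></code>, <code class="docutils literal"><span class="pre">name</span></code>, <code class="docutils literal"><span class="pre">n_errors</span></code>, <code class="docutils literal"><span class="pre">n_warnings</span></code>) are illustrative assumptions. Unlike <code class="docutils literal"><span class="pre">scanf()</span></code>, the groups come back as strings, so the numeric fields are converted explicitly:</p><pre><code class="language-python">import re

line = "/usr/sbin/sendmail - 0 errors, 4 warnings"

# One capturing group per scanf() token: %s, %d, %d.
m = re.match(r"(\S+) - (\d+) errors, (\d+) warnings", line)

# Groups are always strings; convert the numeric ones with int().
name = m.group(1)
n_errors = int(m.group(2))
n_warnings = int(m.group(3))

print(name, n_errors, n_warnings)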
</code></pre></div><div class="section" id="search-vs-match"><h3><span class="yiyi-st" id="yiyi-523">6.2.5.3. search() vs. match()</span></h3><p><span class="yiyi-st" id="yiyi-524">Python offers two different primitive operations based on regular expressions: <a class="reference internal" href="#re.match" title="re.match"><code class="xref py py-func docutils literal"><span class="pre">re.match()</span></code></a> checks for a match only at the beginning of the string, while <a class="reference internal" href="#re.search" title="re.search"><code class="xref py py-func docutils literal"><span class="pre">re.search()</span></code></a> checks for a match anywhere in the string (this is what Perl does by default).</span></p><p><span class="yiyi-st" id="yiyi-525">For example:</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"c"</span><span class="p">,</span> <span class="s2">"abcdef"</span><span class="p">)</span> <span class="c1"># No match</span>
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s2">"c"</span><span class="p">,</span> <span class="s2">"abcdef"</span><span class="p">)</span> <span class="c1"># Match</span>
<span class="go"><_sre.SRE_Match object; span=(2, 3), match='c'></span>
</code></pre><p><span class="yiyi-st" id="yiyi-526">Regular expressions beginning with <code class="docutils literal"><span class="pre">'^'</span></code> can be used with <a class="reference internal" href="#re.search" title="re.search"><code class="xref py py-func docutils literal"><span class="pre">search()</span></code></a> to restrict the match to the beginning of the string:</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"c"</span><span class="p">,</span> <span class="s2">"abcdef"</span><span class="p">)</span> <span class="c1"># No match</span>
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s2">"^c"</span><span class="p">,</span> <span class="s2">"abcdef"</span><span class="p">)</span> <span class="c1"># No match</span>
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s2">"^a"</span><span class="p">,</span> <span class="s2">"abcdef"</span><span class="p">)</span> <span class="c1"># Match</span>
<span class="go"><_sre.SRE_Match object; span=(0, 1), match='a'></span>
</code></pre><p><span class="yiyi-st" id="yiyi-527">Note however that in <a class="reference internal" href="#re.MULTILINE" title="re.MULTILINE"><code class="xref py py-const docutils literal"><span class="pre">MULTILINE</span></code></a> mode <a class="reference internal" href="#re.match" title="re.match"><code class="xref py py-func docutils literal"><span class="pre">match()</span></code></a> only matches at the beginning of the string, whereas using <a class="reference internal" href="#re.search" title="re.search"><code class="xref py py-func docutils literal"><span class="pre">search()</span></code></a> with a regular expression beginning with <code class="docutils literal"><span class="pre">'^'</span></code> will match at the beginning of each line.</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s1">'X'</span><span class="p">,</span> <span class="s1">'A</span><span class="se">\n</span><span class="s1">B</span><span class="se">\n</span><span class="s1">X'</span><span class="p">,</span> <span class="n">re</span><span class="o">.</span><span class="n">MULTILINE</span><span class="p">)</span> <span class="c1"># No match</span>
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">search</span><span class="p">(</span><span class="s1">'^X'</span><span class="p">,</span> <span class="s1">'A</span><span class="se">\n</span><span class="s1">B</span><span class="se">\n</span><span class="s1">X'</span><span class="p">,</span> <span class="n">re</span><span class="o">.</span><span class="n">MULTILINE</span><span class="p">)</span> <span class="c1"># Match</span>
<span class="go"><_sre.SRE_Match object; span=(4, 5), match='X'></span>
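</code></pre><p>The same per-line behaviour of <code class="docutils literal"><span class="pre">'^'</span></code> can be observed with <code class="docutils literal"><span class="pre">findall()</span></code>. This is a small sketch under the same input as above; the names <code class="docutils literal"><span class="pre">first_only</span></code> and <code class="docutils literal"><span class="pre">every_line</span></code> are illustrative assumptions:</p><pre><code class="language-python">import re

text = "A\nB\nX"

# Without MULTILINE, '^' anchors only at the very start of the string.
first_only = re.findall(r"^\w", text)

# With MULTILINE, '^' also anchors right after each newline.
every_line = re.findall(r"^\w", text, re.MULTILINE)

print(first_only)   # ['A']
print(every_line)   # ['A', 'B', 'X']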
</code></pre></div><div class="section" id="making-a-phonebook"><h3><span class="yiyi-st" id="yiyi-528">6.2.5.4.</span> <span class="yiyi-st" id="yiyi-529">Making a Phonebook</span></h3><p><span class="yiyi-st" id="yiyi-530"><a class="reference internal" href="#re.split" title="re.split"><code class="xref py py-func docutils literal"><span class="pre">split()</span></code></a> splits a string into a list delimited by the passed pattern. </span><span class="yiyi-st" id="yiyi-531">The method is invaluable for converting textual data into data structures that Python can easily read and modify, as the following phonebook example demonstrates.</span></p><p><span class="yiyi-st" id="yiyi-532">First, here is the input. </span><span class="yiyi-st" id="yiyi-533">Normally it would come from a file; here we use triple-quoted string syntax:</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="n">text</span> <span class="o">=</span> <span class="s2">"""Ross McFluff: 834.345.1254 155 Elm Street</span>
<span class="gp">...</span><span class="s2"></span>
<span class="gp">... </span><span class="s2">Ronald Heathmore: 892.345.3428 436 Finley Avenue</span>
<span class="gp">... </span><span class="s2">Frank Burger: 925.541.7625 662 South Dogwood Way</span>
<span class="gp">...</span><span class="s2"></span>
<span class="gp">...</span><span class="s2"></span>
<span class="gp">... </span><span class="s2">Heather Albrecht: 548.326.4584 919 Park Place"""</span>
</code></pre><p><span class="yiyi-st" id="yiyi-534">The entries are separated by one or more newlines. </span><span class="yiyi-st" id="yiyi-535">Now we convert the string into a list with each nonempty line having its own entry:</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="n">entries</span> <span class="o">=</span> <span class="n">re</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">"</span><span class="se">\n</span><span class="s2">+"</span><span class="p">,</span> <span class="n">text</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">entries</span>
<span class="go">['Ross McFluff: 834.345.1254 155 Elm Street',</span>
<span class="go">'Ronald Heathmore: 892.345.3428 436 Finley Avenue',</span>
<span class="go">'Frank Burger: 925.541.7625 662 South Dogwood Way',</span>
<span class="go">'Heather Albrecht: 548.326.4584 919 Park Place']</span>
</code></pre><p><span class="yiyi-st" id="yiyi-536">Finally, split each entry into a list with first name, last name, telephone number, and address. </span><span class="yiyi-st" id="yiyi-537">We use the <code class="docutils literal"><span class="pre">maxsplit</span></code> parameter of <a class="reference internal" href="#re.split" title="re.split"><code class="xref py py-func docutils literal"><span class="pre">split()</span></code></a> because the address contains spaces, which are our splitting pattern:</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="p">[</span><span class="n">re</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">":? "</span><span class="p">,</span> <span class="n">entry</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> <span class="k">for</span> <span class="n">entry</span> <span class="ow">in</span> <span class="n">entries</span><span class="p">]</span>
<span class="go">[['Ross', 'McFluff', '834.345.1254', '155 Elm Street'],</span>
<span class="go">['Ronald', 'Heathmore', '892.345.3428', '436 Finley Avenue'],</span>
<span class="go">['Frank', 'Burger', '925.541.7625', '662 South Dogwood Way'],</span>
<span class="go">['Heather', 'Albrecht', '548.326.4584', '919 Park Place']]</span>
</code></pre><p><span class="yiyi-st" id="yiyi-538">The <code class="docutils literal"><span class="pre">:?</span></code></span><span class="yiyi-st" id="yiyi-539"> pattern matches the colon after the last name, so that it does not appear in the result list. </span><span class="yiyi-st" id="yiyi-540">With a <code class="docutils literal"><span class="pre">maxsplit</span></code> of <code class="docutils literal"><span class="pre">4</span></code>, we could separate the house number from the street name:</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="p">[</span><span class="n">re</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">":? "</span><span class="p">,</span> <span class="n">entry</span><span class="p">,</span> <span class="mi">4</span><span class="p">)</span> <span class="k">for</span> <span class="n">entry</span> <span class="ow">in</span> <span class="n">entries</span><span class="p">]</span>
<span class="go">[['Ross', 'McFluff', '834.345.1254', '155', 'Elm Street'],</span>
<span class="go">['Ronald', 'Heathmore', '892.345.3428', '436', 'Finley Avenue'],</span>
<span class="go">['Frank', 'Burger', '925.541.7625', '662', 'South Dogwood Way'],</span>
<span class="go">['Heather', 'Albrecht', '548.326.4584', '919', 'Park Place']]</span>
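</code></pre><p>The split fields can also be given names. The sketch below pairs each field with a label to build one dictionary per entry; the field labels and the name <code class="docutils literal"><span class="pre">book</span></code> are assumptions made for illustration, the input is shortened to two entries, and <code class="docutils literal"><span class="pre">maxsplit</span></code> is passed by keyword only because that reads more clearly:</p><pre><code class="language-python">import re

text = """Ross McFluff: 834.345.1254 155 Elm Street

Ronald Heathmore: 892.345.3428 436 Finley Avenue"""

# Hypothetical labels for the four fields produced by maxsplit=3.
fields = ("first", "last", "phone", "address")

# Split on runs of newlines first, then split each entry into its fields.
book = [
    dict(zip(fields, re.split(r":? ", entry, maxsplit=3)))
    for entry in re.split(r"\n+", text)
]

print(book[0]["phone"])     # 834.345.1254
print(book[1]["address"])   # 436 Finley Avenue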
</code></pre></div><div class="section" id="text-munging"><h3><span class="yiyi-st" id="yiyi-541">6.2.5.5.</span> <span class="yiyi-st" id="yiyi-542">Text Munging</span></h3><p><span class="yiyi-st" id="yiyi-543"><a class="reference internal" href="#re.sub" title="re.sub"><code class="xref py py-func docutils literal"><span class="pre">sub()</span></code></a> replaces every occurrence of a pattern with a string or the result of a function. </span><span class="yiyi-st" id="yiyi-544">This example demonstrates using <a class="reference internal" href="#re.sub" title="re.sub"><code class="xref py py-func docutils literal"><span class="pre">sub()</span></code></a> with a function to "munge" text, that is, to randomize the order of all the characters in each word of a sentence except for the first and last characters:</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">random</span>
<span class="gp">>>> </span><span class="k">def</span> <span class="nf">repl</span><span class="p">(</span><span class="n">m</span><span class="p">):</span>
<span class="gp">... </span>    <span class="n">inner_word</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">2</span><span class="p">))</span>
<span class="gp">... </span>    <span class="n">random</span><span class="o">.</span><span class="n">shuffle</span><span class="p">(</span><span class="n">inner_word</span><span class="p">)</span>
<span class="gp">... </span>    <span class="k">return</span> <span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="o">+</span> <span class="s2">""</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">inner_word</span><span class="p">)</span> <span class="o">+</span> <span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">text</span> <span class="o">=</span> <span class="s2">"Professor Abdolmalek, please report your absences promptly."</span>
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">sub</span><span class="p">(</span><span class="s2">r"(\w)(\w+)(\w)"</span><span class="p">,</span> <span class="n">repl</span><span class="p">,</span> <span class="n">text</span><span class="p">)</span>
<span class="go">'Poefsrosr Aealmlobdk, pslaee reorpt your abnseces plmrptoy.'</span>
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">sub</span><span class="p">(</span><span class="s2">r"(\w)(\w+)(\w)"</span><span class="p">,</span> <span class="n">repl</span><span class="p">,</span> <span class="n">text</span><span class="p">)</span>
<span class="go">'Pofsroser Aodlambelk, plasee reoprt yuor asnebces potlmrpy.'</span>
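</code></pre><p>Since the output differs on every run, the munged text is easier to check by its invariants than by its exact value. The sketch below is one way to do that; the seeded <code class="docutils literal"><span class="pre">random.Random(0)</span></code> instance and the names <code class="docutils literal"><span class="pre">rng</span></code> and <code class="docutils literal"><span class="pre">munged</span></code> are assumptions added here so that a run is repeatable, not part of the original example:</p><pre><code class="language-python">import random
import re

rng = random.Random(0)   # seeded generator, assumed here only for repeatability

def repl(m):
    # Shuffle the inner letters; the first and last letters stay in place.
    inner_word = list(m.group(2))
    rng.shuffle(inner_word)
    return m.group(1) + "".join(inner_word) + m.group(3)

text = "Professor Abdolmalek, please report your absences promptly."
munged = re.sub(r"(\w)(\w+)(\w)", repl, text)

# Each word keeps its length, its first character and its set of characters.
for before, after in zip(text.split(), munged.split()):
    assert len(before) == len(after)
    assert before[0] == after[0]
    assert sorted(before) == sorted(after)

print(munged)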
</code></pre></div><div class="section" id="finding-all-adverbs"><h3><span class="yiyi-st" id="yiyi-545">6.2.5.6.</span> <span class="yiyi-st" id="yiyi-546">Finding all Adverbs</span></h3><p><span class="yiyi-st" id="yiyi-547"><a class="reference internal" href="#re.findall" title="re.findall"><code class="xref py py-func docutils literal"><span class="pre">findall()</span></code></a> matches <em>all</em> occurrences of a pattern, not just the first one as <a class="reference internal" href="#re.search" title="re.search"><code class="xref py py-func docutils literal"><span class="pre">search()</span></code></a> does. </span><span class="yiyi-st" id="yiyi-548">For example, a writer who wants to find all of the adverbs in some text might use <a class="reference internal" href="#re.findall" title="re.findall"><code class="xref py py-func docutils literal"><span class="pre">findall()</span></code></a> in the following manner:</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="n">text</span> <span class="o">=</span> <span class="s2">"He was carefully disguised but captured quickly by police."</span>
|
||
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">findall</span><span class="p">(</span><span class="s2">r"\w+ly"</span><span class="p">,</span> <span class="n">text</span><span class="p">)</span>
<span class="go">['carefully', 'quickly']</span>
</code></pre></div><div class="section" id="finding-all-adverbs-and-their-positions"><h3><span class="yiyi-st" id="yiyi-549">6.2.5.7.</span><span class="yiyi-st" id="yiyi-550"> Finding all Adverbs and their Positions</span></h3><p><span class="yiyi-st" id="yiyi-551">If one wants more information about all matches of a pattern than the matched text, <a class="reference internal" href="#re.finditer" title="re.finditer"><code class="xref py py-func docutils literal"><span class="pre">finditer()</span></code></a> is useful as it provides <a class="reference internal" href="#match-objects"><span>match objects</span></a> instead of strings. </span><span class="yiyi-st" id="yiyi-552">Continuing with the previous example, a writer who wanted to find all of the adverbs <em>and their positions</em> in some text would use <a class="reference internal" href="#re.finditer" title="re.finditer"><code class="xref py py-func docutils literal"><span class="pre">finditer()</span></code></a> in the following manner:</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="n">text</span> <span class="o">=</span> <span class="s2">"He was carefully disguised but captured quickly by police."</span>
<span class="gp">>>> </span><span class="k">for</span> <span class="n">m</span> <span class="ow">in</span> <span class="n">re</span><span class="o">.</span><span class="n">finditer</span><span class="p">(</span><span class="s2">r"\w+ly"</span><span class="p">,</span> <span class="n">text</span><span class="p">):</span>
<span class="gp">... </span>    <span class="nb">print</span><span class="p">(</span><span class="s1">'</span><span class="si">%02d</span><span class="s1">-</span><span class="si">%02d</span><span class="s1">: </span><span class="si">%s</span><span class="s1">'</span> <span class="o">%</span> <span class="p">(</span><span class="n">m</span><span class="o">.</span><span class="n">start</span><span class="p">(),</span> <span class="n">m</span><span class="o">.</span><span class="n">end</span><span class="p">(),</span> <span class="n">m</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="mi">0</span><span class="p">)))</span>
<span class="go">07-16: carefully</span>
<span class="go">40-47: quickly</span>
</code></pre></div><div class="section" id="raw-string-notation"><h3><span class="yiyi-st" id="yiyi-553">6.2.5.8.</span><span class="yiyi-st" id="yiyi-554"> Raw String Notation</span></h3><p><span class="yiyi-st" id="yiyi-555">Raw string notation (<code class="docutils literal"><span class="pre">r"text"</span></code>) keeps regular expressions sane. </span><span class="yiyi-st" id="yiyi-556">Without it, every backslash (<code class="docutils literal"><span class="pre">'\'</span></code>) in a regular expression would have to be prefixed with another one to escape it. </span><span class="yiyi-st" id="yiyi-557">For example, the two following lines of code are functionally identical:</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">r"\W(.)\1\W"</span><span class="p">,</span> <span class="s2">" ff "</span><span class="p">)</span>
<span class="go"><_sre.SRE_Match object; span=(0, 4), match=' ff '></span>
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"</span><span class="se">\\</span><span class="s2">W(.)</span><span class="se">\\</span><span class="s2">1</span><span class="se">\\</span><span class="s2">W"</span><span class="p">,</span> <span class="s2">" ff "</span><span class="p">)</span>
<span class="go"><_sre.SRE_Match object; span=(0, 4), match=' ff '></span>
</code></pre><p><span class="yiyi-st" id="yiyi-558">When one wants to match a literal backslash, it must be escaped in the regular expression. </span><span class="yiyi-st" id="yiyi-559">With raw string notation, this means <code class="docutils literal"><span class="pre">r"\\"</span></code>. </span><span class="yiyi-st" id="yiyi-560">Without raw string notation, one must use <code class="docutils literal"><span class="pre">"\\\\"</span></code>, making the following lines of code functionally identical:</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">r"</span><span class="se">\\</span><span class="s2">"</span><span class="p">,</span> <span class="s2">r"</span><span class="se">\\</span><span class="s2">"</span><span class="p">)</span>
<span class="go"><_sre.SRE_Match object; span=(0, 1), match='\\'></span>
<span class="gp">>>> </span><span class="n">re</span><span class="o">.</span><span class="n">match</span><span class="p">(</span><span class="s2">"</span><span class="se">\\\\</span><span class="s2">"</span><span class="p">,</span> <span class="s2">r"</span><span class="se">\\</span><span class="s2">"</span><span class="p">)</span>
<span class="go"><_sre.SRE_Match object; span=(0, 1), match='\\'></span>
</code></pre></div><div class="section" id="writing-a-tokenizer"><h3><span class="yiyi-st" id="yiyi-561">6.2.5.9.</span><span class="yiyi-st" id="yiyi-562"> Writing a Tokenizer</span></h3><p><span class="yiyi-st" id="yiyi-563">A <a class="reference external" href="https://en.wikipedia.org/wiki/Lexical_analysis">tokenizer or scanner</a> analyzes a string to categorize groups of characters. </span><span class="yiyi-st" id="yiyi-564">This is a useful first step in writing a compiler or interpreter. </span></p><p><span class="yiyi-st" id="yiyi-565">The text categories are specified with regular expressions. </span><span class="yiyi-st" id="yiyi-566">The technique is to combine those into a single master regular expression and to loop over successive matches:</span></p><pre><code class="language-python"><span></span><span class="kn">import</span> <span class="nn">collections</span>
<span class="kn">import</span> <span class="nn">re</span>

<span class="n">Token</span> <span class="o">=</span> <span class="n">collections</span><span class="o">.</span><span class="n">namedtuple</span><span class="p">(</span><span class="s1">'Token'</span><span class="p">,</span> <span class="p">[</span><span class="s1">'typ'</span><span class="p">,</span> <span class="s1">'value'</span><span class="p">,</span> <span class="s1">'line'</span><span class="p">,</span> <span class="s1">'column'</span><span class="p">])</span>

<span class="k">def</span> <span class="nf">tokenize</span><span class="p">(</span><span class="n">code</span><span class="p">):</span>
    <span class="n">keywords</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'IF'</span><span class="p">,</span> <span class="s1">'THEN'</span><span class="p">,</span> <span class="s1">'ENDIF'</span><span class="p">,</span> <span class="s1">'FOR'</span><span class="p">,</span> <span class="s1">'NEXT'</span><span class="p">,</span> <span class="s1">'GOSUB'</span><span class="p">,</span> <span class="s1">'RETURN'</span><span class="p">}</span>
    <span class="n">token_specification</span> <span class="o">=</span> <span class="p">[</span>
        <span class="p">(</span><span class="s1">'NUMBER'</span><span class="p">,</span> <span class="s1">r'\d+(\.\d*)?'</span><span class="p">),</span> <span class="c1"># Integer or decimal number</span>
        <span class="p">(</span><span class="s1">'ASSIGN'</span><span class="p">,</span> <span class="s1">r':='</span><span class="p">),</span> <span class="c1"># Assignment operator</span>
        <span class="p">(</span><span class="s1">'END'</span><span class="p">,</span> <span class="s1">r';'</span><span class="p">),</span> <span class="c1"># Statement terminator</span>
        <span class="p">(</span><span class="s1">'ID'</span><span class="p">,</span> <span class="s1">r'[A-Za-z]+'</span><span class="p">),</span> <span class="c1"># Identifiers</span>
        <span class="p">(</span><span class="s1">'OP'</span><span class="p">,</span> <span class="s1">r'[+\-*/]'</span><span class="p">),</span> <span class="c1"># Arithmetic operators</span>
        <span class="p">(</span><span class="s1">'NEWLINE'</span><span class="p">,</span> <span class="s1">r'\n'</span><span class="p">),</span> <span class="c1"># Line endings</span>
        <span class="p">(</span><span class="s1">'SKIP'</span><span class="p">,</span> <span class="s1">r'[ \t]+'</span><span class="p">),</span> <span class="c1"># Skip over spaces and tabs</span>
        <span class="p">(</span><span class="s1">'MISMATCH'</span><span class="p">,</span><span class="s1">r'.'</span><span class="p">),</span> <span class="c1"># Any other character</span>
    <span class="p">]</span>
    <span class="n">tok_regex</span> <span class="o">=</span> <span class="s1">'|'</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="s1">'(?P<</span><span class="si">%s</span><span class="s1">></span><span class="si">%s</span><span class="s1">)'</span> <span class="o">%</span> <span class="n">pair</span> <span class="k">for</span> <span class="n">pair</span> <span class="ow">in</span> <span class="n">token_specification</span><span class="p">)</span>
    <span class="n">line_num</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="n">line_start</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">mo</span> <span class="ow">in</span> <span class="n">re</span><span class="o">.</span><span class="n">finditer</span><span class="p">(</span><span class="n">tok_regex</span><span class="p">,</span> <span class="n">code</span><span class="p">):</span>
        <span class="n">kind</span> <span class="o">=</span> <span class="n">mo</span><span class="o">.</span><span class="n">lastgroup</span>
        <span class="n">value</span> <span class="o">=</span> <span class="n">mo</span><span class="o">.</span><span class="n">group</span><span class="p">(</span><span class="n">kind</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">kind</span> <span class="o">==</span> <span class="s1">'NEWLINE'</span><span class="p">:</span>
            <span class="n">line_start</span> <span class="o">=</span> <span class="n">mo</span><span class="o">.</span><span class="n">end</span><span class="p">()</span>
            <span class="n">line_num</span> <span class="o">+=</span> <span class="mi">1</span>
        <span class="k">elif</span> <span class="n">kind</span> <span class="o">==</span> <span class="s1">'SKIP'</span><span class="p">:</span>
            <span class="k">pass</span>
        <span class="k">elif</span> <span class="n">kind</span> <span class="o">==</span> <span class="s1">'MISMATCH'</span><span class="p">:</span>
            <span class="k">raise</span> <span class="ne">RuntimeError</span><span class="p">(</span><span class="s1">'</span><span class="si">%r</span><span class="s1"> unexpected on line </span><span class="si">%d</span><span class="s1">'</span> <span class="o">%</span> <span class="p">(</span><span class="n">value</span><span class="p">,</span> <span class="n">line_num</span><span class="p">))</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="k">if</span> <span class="n">kind</span> <span class="o">==</span> <span class="s1">'ID'</span> <span class="ow">and</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">keywords</span><span class="p">:</span>
                <span class="n">kind</span> <span class="o">=</span> <span class="n">value</span>
            <span class="n">column</span> <span class="o">=</span> <span class="n">mo</span><span class="o">.</span><span class="n">start</span><span class="p">()</span> <span class="o">-</span> <span class="n">line_start</span>
            <span class="k">yield</span> <span class="n">Token</span><span class="p">(</span><span class="n">kind</span><span class="p">,</span> <span class="n">value</span><span class="p">,</span> <span class="n">line_num</span><span class="p">,</span> <span class="n">column</span><span class="p">)</span>

<span class="n">statements</span> <span class="o">=</span> <span class="s1">'''</span>
<span class="s1">    IF quantity THEN</span>
<span class="s1">        total := total + price * quantity;</span>
<span class="s1">        tax := price * 0.05;</span>
<span class="s1">    ENDIF;</span>
<span class="s1">'''</span>

<span class="k">for</span> <span class="n">token</span> <span class="ow">in</span> <span class="n">tokenize</span><span class="p">(</span><span class="n">statements</span><span class="p">):</span>
    <span class="nb">print</span><span class="p">(</span><span class="n">token</span><span class="p">)</span>
</code></pre><p><span class="yiyi-st" id="yiyi-567">The tokenizer produces the following output:</span></p><pre><code class="language-python"><span></span><span class="n">Token</span><span class="p">(</span><span class="n">typ</span><span class="o">=</span><span class="s1">'IF'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">'IF'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span>
<span class="n">Token</span><span class="p">(</span><span class="n">typ</span><span class="o">=</span><span class="s1">'ID'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">'quantity'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">7</span><span class="p">)</span>
<span class="n">Token</span><span class="p">(</span><span class="n">typ</span><span class="o">=</span><span class="s1">'THEN'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">'THEN'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">16</span><span class="p">)</span>
<span class="n">Token</span><span class="p">(</span><span class="n">typ</span><span class="o">=</span><span class="s1">'ID'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">'total'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">8</span><span class="p">)</span>
<span class="n">Token</span><span class="p">(</span><span class="n">typ</span><span class="o">=</span><span class="s1">'ASSIGN'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">':='</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">14</span><span class="p">)</span>
<span class="n">Token</span><span class="p">(</span><span class="n">typ</span><span class="o">=</span><span class="s1">'ID'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">'total'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">17</span><span class="p">)</span>
<span class="n">Token</span><span class="p">(</span><span class="n">typ</span><span class="o">=</span><span class="s1">'OP'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">'+'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">23</span><span class="p">)</span>
<span class="n">Token</span><span class="p">(</span><span class="n">typ</span><span class="o">=</span><span class="s1">'ID'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">'price'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">25</span><span class="p">)</span>
<span class="n">Token</span><span class="p">(</span><span class="n">typ</span><span class="o">=</span><span class="s1">'OP'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">'*'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">31</span><span class="p">)</span>
<span class="n">Token</span><span class="p">(</span><span class="n">typ</span><span class="o">=</span><span class="s1">'ID'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">'quantity'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">33</span><span class="p">)</span>
<span class="n">Token</span><span class="p">(</span><span class="n">typ</span><span class="o">=</span><span class="s1">'END'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">';'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">41</span><span class="p">)</span>
<span class="n">Token</span><span class="p">(</span><span class="n">typ</span><span class="o">=</span><span class="s1">'ID'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">'tax'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">8</span><span class="p">)</span>
<span class="n">Token</span><span class="p">(</span><span class="n">typ</span><span class="o">=</span><span class="s1">'ASSIGN'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">':='</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">12</span><span class="p">)</span>
<span class="n">Token</span><span class="p">(</span><span class="n">typ</span><span class="o">=</span><span class="s1">'ID'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">'price'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">15</span><span class="p">)</span>
<span class="n">Token</span><span class="p">(</span><span class="n">typ</span><span class="o">=</span><span class="s1">'OP'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">'*'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">21</span><span class="p">)</span>
<span class="n">Token</span><span class="p">(</span><span class="n">typ</span><span class="o">=</span><span class="s1">'NUMBER'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">'0.05'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">23</span><span class="p">)</span>
<span class="n">Token</span><span class="p">(</span><span class="n">typ</span><span class="o">=</span><span class="s1">'END'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">';'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">27</span><span class="p">)</span>
<span class="n">Token</span><span class="p">(</span><span class="n">typ</span><span class="o">=</span><span class="s1">'ENDIF'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">'ENDIF'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span>
<span class="n">Token</span><span class="p">(</span><span class="n">typ</span><span class="o">=</span><span class="s1">'END'</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="s1">';'</span><span class="p">,</span> <span class="n">line</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">column</span><span class="o">=</span><span class="mi">9</span><span class="p">)</span>
</code></pre></div></div></div></div>
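<p>The master-pattern technique from the tokenizer section above can also be tried in isolation. The following is a minimal sketch, not part of the original example: the category list <code>SPEC</code>, the helper <code>kinds()</code> and the sample input are hypothetical names chosen for illustration. It assumes only that each category is wrapped in a named group, so that the match object's <code>lastgroup</code> attribute reports which alternative matched.</p>

```python
import re

# Hypothetical token categories for a tiny expression language (illustrative only).
# Order matters: earlier alternatives win, and MISMATCH is the catch-all.
SPEC = [
    ("NUMBER", r"\d+"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]

# Join every category into one master pattern of named groups.
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in SPEC))


def kinds(text):
    """Return a list of (category, matched text) pairs, dropping whitespace."""
    result = []
    for mo in MASTER.finditer(text):
        kind = mo.lastgroup  # name of the named group that matched
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise ValueError(
                "unexpected character %r at offset %d" % (mo.group(), mo.start())
            )
        result.append((kind, mo.group()))
    return result


print(kinds("x = 3 + 42"))
# prints [('NAME', 'x'), ('OP', '='), ('NUMBER', '3'), ('OP', '+'), ('NUMBER', '42')]
```

<p>Because the catch-all category comes last, any character that no earlier category accepts surfaces as an explicit error rather than being silently skipped by <code>finditer()</code>.</p>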