<div class="body" role="main"><div class="section" id="module-tokenize"><h1><span class="yiyi-st" id="yiyi-10">32.7. <a class="reference internal" href="#module-tokenize" title="tokenize: Lexical scanner for Python source code."><code class="xref py py-mod docutils literal"><span class="pre">tokenize</span></code></a> - Python源代码</span></h1><p><span class="yiyi-st" id="yiyi-11"><strong>源代码:</strong> <a class="reference external" href="https://hg.python.org/cpython/file/3.5/Lib/tokenize.py">Lib / tokenize.py</a></span></p><p><span class="yiyi-st" id="yiyi-12"><a class="reference internal" href="#module-tokenize" title="tokenize: Lexical scanner for Python source code."><code class="xref py py-mod docutils literal"><span class="pre">tokenize</span></code></a>模块为Python源代码提供了一个词法扫描器,在Python中实现。</span><span class="yiyi-st" id="yiyi-13">该模块中的扫描器也返回作为标记的注释,使得它对于实现“漂亮打印机”很有用,包括用于屏幕显示的着色器。</span></p><p><span class="yiyi-st" id="yiyi-14">为了简化令牌流处理,使用通用的<a class="reference internal" href="token.html#token.OP" title="token.OP"><code class="xref py py-data docutils literal"><span class="pre">token.OP</span></code></a>令牌类型返回所有<a class="reference internal" href="../reference/lexical_analysis.html#operators"><span>Operators</span></a>和<a class="reference internal" href="../reference/lexical_analysis.html#delimiters"><span>Delimiters</span></a>令牌。</span><span class="yiyi-st" id="yiyi-15">确切类型可以通过检查<a class="reference internal" href="#tokenize.tokenize" title="tokenize.tokenize"><code class="xref py py-func docutils literal"><span class="pre">tokenize.tokenize()</span></code></a>返回的<a class="reference internal" href="../glossary.html#term-named-tuple"><span class="xref std std-term">named tuple</span></a>上的<code class="docutils literal"><span class="pre">exact_type</span></code>属性来确定。</span></p><div class="section" id="tokenizing-input"><h2><span class="yiyi-st" id="yiyi-16">32.7.1. </span><span class="yiyi-st" id="yiyi-17">Tokenizing Input</span></h2><p><span class="yiyi-st" id="yiyi-18">主要入口点是<a class="reference internal" href="../glossary.html#term-generator"><span class="xref std std-term">generator</span></a>:</span></p><dl class="function"><dt id="tokenize.tokenize"><span class="yiyi-st" id="yiyi-19"> <code class="descclassname">tokenize.</code><code class="descname">tokenize</code><span class="sig-paren">(</span><em>readline</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-20"><a class="reference internal" href="#tokenize.tokenize" title="tokenize.tokenize"><code class="xref py py-func docutils literal"><span class="pre">tokenize()</span></code></a>生成器需要一个参数<em>readline</em>,它必须是一个可调用对象,它提供与<a class="reference internal" href="io.html#io.IOBase.readline" title="io.IOBase.readline"><code class="xref py py-meth docutils literal"><span class="pre">io.IOBase.readline()</span></code></a></span><span class="yiyi-st" id="yiyi-21">每个函数的调用都应该返回一行输入作为字节。</span></p><p><span class="yiyi-st" id="yiyi-22">The generator produces 5-tuples with these members: the token type; the token string; a 2-tuple <code class="docutils literal"><span class="pre">(srow,</span> <span class="pre">scol)</span></code> of ints specifying the row and column where the token begins in the source; a 2-tuple <code class="docutils literal"><span class="pre">(erow,</span> <span class="pre">ecol)</span></code> of ints specifying the row and column where the token ends in the source; and the line on which the token was found. 
</span><span class="yiyi-st" id="yiyi-23">传递的行(最后一个元组项)是<em>逻辑</em>行;连续线。</span><span class="yiyi-st" id="yiyi-24">5元组作为<a class="reference internal" href="../glossary.html#term-named-tuple"><span class="xref std std-term">named tuple</span></a>返回,其字段名称为:<code class="docutils literal"><span class="pre">type</span> <span class="pre">string</span> <span class="pre">start</span> <span class="pre"> end <span class="pre">行</span></span></code>。</span></p><p><span class="yiyi-st" id="yiyi-25">返回的<a class="reference internal" href="../glossary.html#term-named-tuple"><span class="xref std std-term">named tuple</span></a>有一个名为<code class="docutils literal"><span class="pre">exact_type</span></code>的附加属性,其中包含<a class="reference internal" href="token.html#token.OP" title="token.OP"><code class="xref py py-data docutils literal"><span class="pre">token.OP</span></code></a>令牌的确切操作符号类型。</span><span class="yiyi-st" id="yiyi-26">对于所有其他令牌类型,<code class="docutils literal"><span class="pre">exact_type</span></code>等于命名的元组<code class="docutils literal"><span class="pre">type</span></code>字段。</span></p><div class="versionchanged"><p><span class="yiyi-st" id="yiyi-27"><span class="versionmodified">在版本3.1中已更改:</span>添加了对命名元组的支持。</span></p></div><div class="versionchanged"><p><span class="yiyi-st" id="yiyi-28"><span class="versionmodified">在版本3.3中已更改:</span>添加对<code class="docutils literal"><span class="pre">exact_type</span></code>的支持。</span></p></div><p><span class="yiyi-st" id="yiyi-29"><a class="reference internal" href="#tokenize.tokenize" title="tokenize.tokenize"><code class="xref py py-func docutils literal"><span class="pre">tokenize()</span></code></a>根据<span class="target" id="index-0"></span> <a class="pep reference external" href="https://www.python.org/dev/peps/pep-0263"><strong>PEP 263</strong></a>,通过查找UTF-8 BOM或编码Cookie来确定文件的源编码。</span></p></dd></dl><p><span class="yiyi-st" id="yiyi-30"><a class="reference internal" href="token.html#module-token" title="token: Constants representing terminal nodes of the parse tree."><code class="xref py py-mod docutils literal"><span class="pre">token</span></code></a>模块中的所有常量也从<a class="reference internal" href="#module-tokenize" title="tokenize: Lexical scanner for Python source code."><code class="xref py py-mod docutils literal"><span class="pre">tokenize</span></code></a>导出,以及三个附加的令牌类型值:</span></p><dl class="data"><dt id="tokenize.COMMENT"><span class="yiyi-st" id="yiyi-31"> <code class="descclassname">tokenize.</code><code class="descname">COMMENT</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-32">用于表示注释的令牌值。</span></p></dd></dl><dl class="data"><dt id="tokenize.NL"><span class="yiyi-st" id="yiyi-33"> <code class="descclassname">tokenize.</code><code class="descname">NL</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-34">用于表示非终止换行符的令牌值。</span><span class="yiyi-st" id="yiyi-35">NEWLINE标记表示Python代码的逻辑行的结束;当逻辑代码行在多个物理线路上继续时,生成NL令牌。</span></p></dd></dl><dl class="data"><dt id="tokenize.ENCODING"><span class="yiyi-st" id="yiyi-36"> <code class="descclassname">tokenize.</code><code class="descname">ENCODING</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-37">表示用于将源字节解码为文本的编码的令牌值。</span><span class="yiyi-st" id="yiyi-38"><a class="reference internal" href="#tokenize.tokenize" title="tokenize.tokenize"><code class="xref py py-func docutils literal"><span class="pre">tokenize()</span></code></a>返回的第一个令牌将始终是ENCODING令牌。</span></p></dd></dl><p><span class="yiyi-st" id="yiyi-39">提供了另一个功能来反转令牌化过程。</span><span class="yiyi-st" 
id="yiyi-40">这对创建标记脚本,修改令牌流和回写修改的脚本的工具很有用。</span></p><dl class="function"><dt id="tokenize.untokenize"><span class="yiyi-st" id="yiyi-41"> <code class="descclassname">tokenize.</code><code class="descname">untokenize</code><span class="sig-paren">(</span><em>iterable</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-42">将令牌转换回Python源代码。</span><span class="yiyi-st" id="yiyi-43"><em>iterable</em>必须返回具有至少两个元素的序列,即令牌类型和令牌字符串。</span><span class="yiyi-st" id="yiyi-44">任何其他序列元素都将被忽略。</span></p><p><span class="yiyi-st" id="yiyi-45">重构的脚本作为单个字符串返回。</span><span class="yiyi-st" id="yiyi-46">结果保证将令牌化返回以匹配输入,使得转换是无损的并且确保往返行程。</span><span class="yiyi-st" id="yiyi-47">保证仅适用于令牌类型和令牌字符串,因为令牌之间的间隔(列位置)可能改变。</span></p><p><span class="yiyi-st" id="yiyi-48">它返回使用ENCODING令牌编码的字节,ENCODING令牌是<a class="reference internal" href="#tokenize.tokenize" title="tokenize.tokenize"><code class="xref py py-func docutils literal"><span class="pre">tokenize()</span></code></a>输出的第一个令牌序列。</span></p></dd></dl><p><span class="yiyi-st" id="yiyi-49"><a class="reference internal" href="#tokenize.tokenize" title="tokenize.tokenize"><code class="xref py py-func docutils literal"><span class="pre">tokenize()</span></code></a>需要检测其标记化的源文件的编码。</span><span class="yiyi-st" id="yiyi-50">它使用的功能可用:</span></p><dl class="function"><dt id="tokenize.detect_encoding"><span class="yiyi-st" id="yiyi-51"> <code class="descclassname">tokenize.</code><code class="descname">detect_encoding</code><span class="sig-paren">(</span><em>readline</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-52"><a class="reference internal" href="#tokenize.detect_encoding" title="tokenize.detect_encoding"><code class="xref py py-func docutils literal"><span class="pre">detect_encoding()</span></code></a>函数用于检测应用于解码Python源文件的编码。</span><span class="yiyi-st" id="yiyi-53">它需要一个参数readline,其方式与<a class="reference internal" href="#tokenize.tokenize" title="tokenize.tokenize"><code class="xref py py-func docutils literal"><span class="pre">tokenize()</span></code></a>生成器相同。</span></p><p><span class="yiyi-st" id="yiyi-54">它将调用readline最多两次,并返回所使用的编码(作为字符串),以及它已读入的任何行(未从字节解码)的列表。</span></p><p><span class="yiyi-st" id="yiyi-55">它根据<span class="target" id="index-1"></span> <a class="pep reference external" href="https://www.python.org/dev/peps/pep-0263"><strong>PEP 263</strong></a>中指定的UTF-8 BOM或编码Cookie的存在检测编码。</span><span class="yiyi-st" id="yiyi-56">如果BOM和cookie都存在,但不同意,则会引发一个SyntaxError。</span><span class="yiyi-st" id="yiyi-57">注意,如果找到BOM,则<code class="docutils literal"><span class="pre">'utf-8-sig'</span></code>将作为编码返回。</span></p><p><span class="yiyi-st" id="yiyi-58">如果未指定编码,则将返回默认值<code class="docutils literal"><span class="pre">'utf-8'</span></code>。</span></p><p><span class="yiyi-st" id="yiyi-59">使用<a class="reference internal" href="#tokenize.open" title="tokenize.open"><code class="xref py py-func docutils literal"><span class="pre">open()</span></code></a>打开Python源文件:它使用<a class="reference internal" href="#tokenize.detect_encoding" title="tokenize.detect_encoding"><code class="xref py py-func docutils literal"><span class="pre">detect_encoding()</span></code></a>检测文件编码。</span></p></dd></dl><dl class="function"><dt id="tokenize.open"><span class="yiyi-st" id="yiyi-60"> <code class="descclassname">tokenize.</code><code class="descname">open</code><span class="sig-paren">(</span><em>filename</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-61">使用由<a class="reference 
internal" href="#tokenize.detect_encoding" title="tokenize.detect_encoding"><code class="xref py py-func docutils literal"><span class="pre">detect_encoding()</span></code></a>检测到的编码以只读模式打开文件。</span></p><div class="versionadded"><p><span class="yiyi-st" id="yiyi-62"><span class="versionmodified">版本3.2中的新功能。</span></span></p></div></dd></dl><dl class="exception"><dt id="tokenize.TokenError"><span class="yiyi-st" id="yiyi-63"> <em class="property">exception </em><code class="descclassname">tokenize.</code><code class="descname">TokenError</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-64">在可能拆分为多行的docstring或表达式未在文件中的任何位置完成时触发,例如:</span></p><pre><code class="language-python"><span></span><span class="s2">"""Beginning of</span>
|
||
<span class="s2">docstring</span>
<p>Use <a class="reference internal" href="#tokenize.open" title="tokenize.open"><code class="xref py py-func docutils literal"><span class="pre">open()</span></code></a> to open Python source files: it uses <code class="xref py py-func docutils literal"><span class="pre">detect_encoding()</span></code> to detect the file encoding.</p><dl class="function"><dt id="tokenize.open"><code class="descclassname">tokenize.</code><code class="descname">open</code><span class="sig-paren">(</span><em>filename</em><span class="sig-paren">)</span></dt><dd><p>Open a file in read-only mode using the encoding detected by <code class="xref py py-func docutils literal"><span class="pre">detect_encoding()</span></code>.</p><div class="versionadded"><p><span class="versionmodified">New in version 3.2.</span></p></div></dd></dl><dl class="exception"><dt id="tokenize.TokenError"><em class="property">exception </em><code class="descclassname">tokenize.</code><code class="descname">TokenError</code></dt><dd><p>Raised when either a docstring or an expression that may be split over several lines is not completed anywhere in the file, for example:</p><pre><code class="language-python">"""Beginning of
docstring
</code></pre><p>or:</p><pre><code class="language-python">[1,
 2,
 3
</code></pre></dd></dl><p>Note that unclosed single-quoted strings do not cause an error to be raised. They are tokenized as <code class="docutils literal"><span class="pre">ERRORTOKEN</span></code>, followed by the tokenization of their contents.</p>
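<p>A short sketch of that behaviour (the expected tokens noted in the comment are indicative): the unterminated string below raises no exception; the lone quote becomes an ERRORTOKEN and its contents are then tokenized normally:</p><pre><code class="language-python">from io import BytesIO
from token import tok_name
from tokenize import tokenize

source = b"x = 'abc\n"  # unclosed single-quoted string

for tok in tokenize(BytesIO(source).readline):
    print(tok_name[tok.type], repr(tok.string))
# Among the output: ERRORTOKEN "'" followed by NAME 'abc'.
</code></pre>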
</div><div class="section" id="command-line-usage"><h2>32.7.2. Command-Line Usage</h2><div class="versionadded"><p><span class="versionmodified">New in version 3.3.</span></p></div><p>The <code class="xref py py-mod docutils literal"><span class="pre">tokenize</span></code> module can be executed as a script from the command line. It is as simple as:</p><div class="highlight-sh"><div class="highlight"><pre>python -m tokenize [-e] [filename.py]
</pre></div></div>
<p>The following options are accepted:</p><dl class="cmdoption"><dt id="cmdoption-tokenize-h"><span id="cmdoption-tokenize--help"></span><code class="descname">-h</code><code class="descclassname">, </code><code class="descname">--help</code></dt><dd><p>show this help message and exit</p></dd></dl><dl class="cmdoption"><dt id="cmdoption-tokenize-e"><span id="cmdoption-tokenize--exact"></span><code class="descname">-e</code><code class="descclassname">, </code><code class="descname">--exact</code></dt><dd><p>display token names using the exact type</p></dd></dl><p>If <code class="file docutils literal"><span class="pre">filename.py</span></code> is specified, its contents are tokenized to stdout. Otherwise, tokenization is performed on stdin.</p>
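<p>For instance (an illustrative invocation; the output has the same shape as the listings below), source can be piped straight to stdin:</p><div class="highlight-sh"><div class="highlight"><pre>$ echo "1 + 2" | python -m tokenize
</pre></div></div>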
</div><div class="section" id="examples"><h2>32.7.3. Examples</h2><p>Example of a script rewriter that transforms float literals into Decimal objects:</p><pre><code class="language-python">from tokenize import tokenize, untokenize, NUMBER, STRING, NAME, OP
from io import BytesIO

def decistmt(s):
    """Substitute Decimals for floats in a string of statements.

    >>> from decimal import Decimal
    >>> s = 'print(+21.3e-5*-.1234/81.7)'
    >>> decistmt(s)
    "print (+Decimal ('21.3e-5')*-Decimal ('.1234')/Decimal ('81.7'))"

    The format of the exponent is inherited from the platform C library.
    Known cases are "e-007" (Windows) and "e-07" (not Windows).  Since
    we're only showing 12 digits, and the 13th isn't close to 5, the
    rest of the output should be platform-independent.

    >>> exec(s) #doctest: +ELLIPSIS
    -3.21716034272e-0...7

    Output from calculations with Decimal should be identical across all
    platforms.

    >>> exec(decistmt(s))
    -3.217160342717258261933904529E-7
    """
    result = []
    g = tokenize(BytesIO(s.encode('utf-8')).readline)  # tokenize the string
    for toknum, tokval, _, _, _ in g:
        if toknum == NUMBER and '.' in tokval:  # replace NUMBER tokens
            result.extend([
                (NAME, 'Decimal'),
                (OP, '('),
                (STRING, repr(tokval)),
                (OP, ')')
            ])
        else:
            result.append((toknum, tokval))
    return untokenize(result).decode('utf-8')
</code></pre>
<p>Example of tokenizing from the command line. The script:</p><pre><code class="language-python">def say_hello():
    print("Hello, World!")

say_hello()
</code></pre>
<p>will be tokenized to the following output, where the first column is the range of line/column coordinates where the token is found, the second column is the name of the token, and the final column is the value of the token (if any):</p><div class="highlight-sh"><div class="highlight"><pre>$ python -m tokenize hello.py
0,0-0,0:            ENCODING       'utf-8'
1,0-1,3:            NAME           'def'
1,4-1,13:           NAME           'say_hello'
1,13-1,14:          OP             '('
1,14-1,15:          OP             ')'
1,15-1,16:          OP             ':'
1,16-1,17:          NEWLINE        '\n'
2,0-2,4:            INDENT         '    '
2,4-2,9:            NAME           'print'
2,9-2,10:           OP             '('
2,10-2,25:          STRING         '"Hello, World!"'
2,25-2,26:          OP             ')'
2,26-2,27:          NEWLINE        '\n'
3,0-3,1:            NL             '\n'
4,0-4,0:            DEDENT         ''
4,0-4,9:            NAME           'say_hello'
4,9-4,10:           OP             '('
4,10-4,11:          OP             ')'
4,11-4,12:          NEWLINE        '\n'
5,0-5,0:            ENDMARKER      ''
</pre></div></div>
<p>The exact token type names can be displayed using the <code class="docutils literal"><span class="pre">-e</span></code> option:</p><div class="highlight-sh"><div class="highlight"><pre>$ python -m tokenize -e hello.py
0,0-0,0:            ENCODING       'utf-8'
1,0-1,3:            NAME           'def'
1,4-1,13:           NAME           'say_hello'
1,13-1,14:          LPAR           '('
1,14-1,15:          RPAR           ')'
1,15-1,16:          COLON          ':'
1,16-1,17:          NEWLINE        '\n'
2,0-2,4:            INDENT         '    '
2,4-2,9:            NAME           'print'
2,9-2,10:           LPAR           '('
2,10-2,25:          STRING         '"Hello, World!"'
2,25-2,26:          RPAR           ')'
2,26-2,27:          NEWLINE        '\n'
3,0-3,1:            NL             '\n'
4,0-4,0:            DEDENT         ''
4,0-4,9:            NAME           'say_hello'
4,9-4,10:           LPAR           '('
4,10-4,11:          RPAR           ')'
4,11-4,12:          NEWLINE        '\n'
5,0-5,0:            ENDMARKER      ''
</pre></div></div></div></div></div>