<div class="body" role="main"><div class="section" id="module-tokenize"><h1><span class="yiyi-st" id="yiyi-10">32.7. <a class="reference internal" href="#module-tokenize" title="tokenize: Lexical scanner for Python source code."><code class="xref py py-mod docutils literal"><span class="pre">tokenize</span></code></a> - Python源代码</span></h1><p><span class="yiyi-st" id="yiyi-11"><strong>源代码:</strong> <a class="reference external" href="https://hg.python.org/cpython/file/3.5/Lib/tokenize.py">Lib / tokenize.py</a></span></p><p><span class="yiyi-st" id="yiyi-12"><a class="reference internal" href="#module-tokenize" title="tokenize: Lexical scanner for Python source code."><code class="xref py py-mod docutils literal"><span class="pre">tokenize</span></code></a>模块为Python源代码提供了一个词法扫描器在Python中实现。</span><span class="yiyi-st" id="yiyi-13">该模块中的扫描器也返回作为标记的注释,使得它对于实现“漂亮打印机”很有用,包括用于屏幕显示的着色器。</span></p><p><span class="yiyi-st" id="yiyi-14">为了简化令牌流处理,使用通用的<a class="reference internal" href="token.html#token.OP" title="token.OP"><code class="xref py py-data docutils literal"><span class="pre">token.OP</span></code></a>令牌类型返回所有​​<a class="reference internal" href="../reference/lexical_analysis.html#operators"><span>Operators</span></a><a class="reference internal" href="../reference/lexical_analysis.html#delimiters"><span>Delimiters</span></a>令牌。</span><span class="yiyi-st" id="yiyi-15">确切类型可以通过检查<a class="reference internal" href="#tokenize.tokenize" title="tokenize.tokenize"><code class="xref py py-func docutils literal"><span class="pre">tokenize.tokenize()</span></code></a>返回的<a class="reference internal" href="../glossary.html#term-named-tuple"><span class="xref std std-term">named tuple</span></a>上的<code class="docutils literal"><span class="pre">exact_type</span></code>属性来确定。</span></p><div class="section" id="tokenizing-input"><h2><span class="yiyi-st" id="yiyi-16">32.7.1. </span><span class="yiyi-st" id="yiyi-17">Tokenizing Input</span></h2><p><span class="yiyi-st" id="yiyi-18">主要入口点是<a class="reference internal" href="../glossary.html#term-generator"><span class="xref std std-term">generator</span></a></span></p><dl class="function"><dt id="tokenize.tokenize"><span class="yiyi-st" id="yiyi-19"> <code class="descclassname">tokenize.</code><code class="descname">tokenize</code><span class="sig-paren">(</span><em>readline</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-20"><a class="reference internal" href="#tokenize.tokenize" title="tokenize.tokenize"><code class="xref py py-func docutils literal"><span class="pre">tokenize()</span></code></a>生成器需要一个参数<em>readline</em>,它必须是一个可调用对象,它提供与<a class="reference internal" href="io.html#io.IOBase.readline" title="io.IOBase.readline"><code class="xref py py-meth docutils literal"><span class="pre">io.IOBase.readline()</span></code></a></span><span class="yiyi-st" id="yiyi-21">每个函数的调用都应该返回一行输入作为字节。</span></p><p><span class="yiyi-st" id="yiyi-22">The generator produces 5-tuples with these members: the token type; the token string; a 2-tuple <code class="docutils literal"><span class="pre">(srow,</span> <span class="pre">scol)</span></code> of ints specifying the row and column where the token begins in the source; a 2-tuple <code class="docutils literal"><span class="pre">(erow,</span> <span class="pre">ecol)</span></code> of ints specifying the row and column where the token ends in the source; and the line on which the token was found. 
</span><span class="yiyi-st" id="yiyi-23">传递的行(最后一个元组项)是<em>逻辑</em>行;连续线。</span><span class="yiyi-st" id="yiyi-24">5元组作为<a class="reference internal" href="../glossary.html#term-named-tuple"><span class="xref std std-term">named tuple</span></a>返回,其字段名称为:<code class="docutils literal"><span class="pre">type</span> <span class="pre">string</span> <span class="pre">start</span> <span class="pre"> end <span class="pre"></span></span></code></span></p><p><span class="yiyi-st" id="yiyi-25">返回的<a class="reference internal" href="../glossary.html#term-named-tuple"><span class="xref std std-term">named tuple</span></a>有一个名为<code class="docutils literal"><span class="pre">exact_type</span></code>的附加属性,其中包含<a class="reference internal" href="token.html#token.OP" title="token.OP"><code class="xref py py-data docutils literal"><span class="pre">token.OP</span></code></a>令牌的确切操作符号类型。</span><span class="yiyi-st" id="yiyi-26">对于所有其他令牌类型,<code class="docutils literal"><span class="pre">exact_type</span></code>等于命名的元组<code class="docutils literal"><span class="pre">type</span></code>字段。</span></p><div class="versionchanged"><p><span class="yiyi-st" id="yiyi-27"><span class="versionmodified">在版本3.1中已更改:</span>添加了对命名元组的支持。</span></p></div><div class="versionchanged"><p><span class="yiyi-st" id="yiyi-28"><span class="versionmodified">在版本3.3中已更改:</span>添加对<code class="docutils literal"><span class="pre">exact_type</span></code>的支持。</span></p></div><p><span class="yiyi-st" id="yiyi-29"><a class="reference internal" href="#tokenize.tokenize" title="tokenize.tokenize"><code class="xref py py-func docutils literal"><span class="pre">tokenize()</span></code></a>根据<span class="target" id="index-0"></span> <a class="pep reference external" href="https://www.python.org/dev/peps/pep-0263"><strong>PEP 263</strong></a>通过查找UTF-8 BOM或编码Cookie来确定文件的源编码。</span></p></dd></dl><p><span class="yiyi-st" id="yiyi-30"><a class="reference internal" href="token.html#module-token" title="token: Constants representing terminal nodes of the parse tree."><code class="xref py py-mod docutils literal"><span class="pre">token</span></code></a>模块中的所有常量也从<a class="reference internal" href="#module-tokenize" title="tokenize: Lexical scanner for Python source code."><code class="xref py py-mod docutils literal"><span class="pre">tokenize</span></code></a>导出,以及三个附加的令牌类型值:</span></p><dl class="data"><dt id="tokenize.COMMENT"><span class="yiyi-st" id="yiyi-31"> <code class="descclassname">tokenize.</code><code class="descname">COMMENT</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-32">用于表示注释的令牌值。</span></p></dd></dl><dl class="data"><dt id="tokenize.NL"><span class="yiyi-st" id="yiyi-33"> <code class="descclassname">tokenize.</code><code class="descname">NL</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-34">用于表示非终止换行符的令牌值。</span><span class="yiyi-st" id="yiyi-35">NEWLINE标记表示Python代码的逻辑行的结束当逻辑代码行在多个物理线路上继续时生成NL令牌。</span></p></dd></dl><dl class="data"><dt id="tokenize.ENCODING"><span class="yiyi-st" id="yiyi-36"> <code class="descclassname">tokenize.</code><code class="descname">ENCODING</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-37">表示用于将源字节解码为文本的编码的令牌值。</span><span class="yiyi-st" id="yiyi-38"><a class="reference internal" href="#tokenize.tokenize" title="tokenize.tokenize"><code class="xref py py-func docutils literal"><span class="pre">tokenize()</span></code></a>返回的第一个令牌将始终是ENCODING令牌。</span></p></dd></dl><p><span class="yiyi-st" id="yiyi-39">提供了另一个功能来反转令牌化过程。</span><span class="yiyi-st" 
id="yiyi-40">这对创建标记脚本,修改令牌流和回写修改的脚本的工具很有用。</span></p><dl class="function"><dt id="tokenize.untokenize"><span class="yiyi-st" id="yiyi-41"> <code class="descclassname">tokenize.</code><code class="descname">untokenize</code><span class="sig-paren">(</span><em>iterable</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-42">将令牌转换回Python源代码。</span><span class="yiyi-st" id="yiyi-43"><em>iterable</em>必须返回具有至少两个元素的序列,即令牌类型和令牌字符串。</span><span class="yiyi-st" id="yiyi-44">任何其他序列元素都将被忽略。</span></p><p><span class="yiyi-st" id="yiyi-45">重构的脚本作为单个字符串返回。</span><span class="yiyi-st" id="yiyi-46">结果保证将令牌化返回以匹配输入,使得转换是无损的并且确保往返行程。</span><span class="yiyi-st" id="yiyi-47">保证仅适用于令牌类型和令牌字符串,因为令牌之间的间隔(列位置)可能改变。</span></p><p><span class="yiyi-st" id="yiyi-48">它返回使用ENCODING令牌编码的字节ENCODING令牌是<a class="reference internal" href="#tokenize.tokenize" title="tokenize.tokenize"><code class="xref py py-func docutils literal"><span class="pre">tokenize()</span></code></a>输出的第一个令牌序列。</span></p></dd></dl><p><span class="yiyi-st" id="yiyi-49"><a class="reference internal" href="#tokenize.tokenize" title="tokenize.tokenize"><code class="xref py py-func docutils literal"><span class="pre">tokenize()</span></code></a>需要检测其标记化的源文件的编码。</span><span class="yiyi-st" id="yiyi-50">它使用的功能可用:</span></p><dl class="function"><dt id="tokenize.detect_encoding"><span class="yiyi-st" id="yiyi-51"> <code class="descclassname">tokenize.</code><code class="descname">detect_encoding</code><span class="sig-paren">(</span><em>readline</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-52"><a class="reference internal" href="#tokenize.detect_encoding" title="tokenize.detect_encoding"><code class="xref py py-func docutils literal"><span class="pre">detect_encoding()</span></code></a>函数用于检测应用于解码Python源文件的编码。</span><span class="yiyi-st" id="yiyi-53">它需要一个参数readline其方式与<a class="reference internal" href="#tokenize.tokenize" title="tokenize.tokenize"><code class="xref py py-func docutils literal"><span class="pre">tokenize()</span></code></a>生成器相同。</span></p><p><span class="yiyi-st" id="yiyi-54">它将调用readline最多两次并返回所使用的编码作为字符串以及它已读入的任何行未从字节解码的列表。</span></p><p><span class="yiyi-st" id="yiyi-55">它根据<span class="target" id="index-1"></span> <a class="pep reference external" href="https://www.python.org/dev/peps/pep-0263"><strong>PEP 263</strong></a>中指定的UTF-8 BOM或编码Cookie的存在检测编码。</span><span class="yiyi-st" id="yiyi-56">如果BOM和cookie都存在但不同意则会引发一个SyntaxError。</span><span class="yiyi-st" id="yiyi-57">注意如果找到BOM<code class="docutils literal"><span class="pre">'utf-8-sig'</span></code>将作为编码返回。</span></p><p><span class="yiyi-st" id="yiyi-58">如果未指定编码,则将返回默认值<code class="docutils literal"><span class="pre">'utf-8'</span></code></span></p><p><span class="yiyi-st" id="yiyi-59">使用<a class="reference internal" href="#tokenize.open" title="tokenize.open"><code class="xref py py-func docutils literal"><span class="pre">open()</span></code></a>打开Python源文件它使用<a class="reference internal" href="#tokenize.detect_encoding" title="tokenize.detect_encoding"><code class="xref py py-func docutils literal"><span class="pre">detect_encoding()</span></code></a>检测文件编码。</span></p></dd></dl><dl class="function"><dt id="tokenize.open"><span class="yiyi-st" id="yiyi-60"> <code class="descclassname">tokenize.</code><code class="descname">open</code><span class="sig-paren">(</span><em>filename</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-61">使用由<a class="reference internal" 
href="#tokenize.detect_encoding" title="tokenize.detect_encoding"><code class="xref py py-func docutils literal"><span class="pre">detect_encoding()</span></code></a>检测到的编码以只读模式打开文件。</span></p><div class="versionadded"><p><span class="yiyi-st" id="yiyi-62"><span class="versionmodified">版本3.2中的新功能。</span></span></p></div></dd></dl><dl class="exception"><dt id="tokenize.TokenError"><span class="yiyi-st" id="yiyi-63"> <em class="property">exception </em><code class="descclassname">tokenize.</code><code class="descname">TokenError</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-64">在可能拆分为多行的docstring或表达式未在文件中的任何位置完成时触发例如</span></p><pre><code class="language-python"><span></span><span class="s2">"""Beginning of</span>
<span class="s2">docstring</span>
</code></pre><p><span class="yiyi-st" id="yiyi-65">要么:</span></p><pre><code class="language-python"><span></span><span class="p">[</span><span class="mi">1</span><span class="p">,</span>
<span class="mi">2</span><span class="p">,</span>
<span class="mi">3</span>
</div>
<div class="section" id="command-line-usage"><h2>32.7.2. Command-Line Usage</h2>
<div class="versionadded"><p><span class="versionmodified">New in version 3.3.</span></p></div>
<p>The <code>tokenize</code> module can be executed as a script from the command line. It is as simple as:</p>
<pre><code class="language-sh">python -m tokenize [-e] [filename.py]
</code></pre>
<p>The following options are accepted:</p>
<dl class="cmdoption"><dt id="cmdoption-tokenize-h"><code class="descname">-h</code>, <code class="descname">--help</code></dt><dd><p>show this help message and exit</p></dd></dl>
<dl class="cmdoption"><dt id="cmdoption-tokenize-e"><code class="descname">-e</code>, <code class="descname">--exact</code></dt><dd><p>display token names using the exact type</p></dd></dl>
<p>If <code class="file docutils literal">filename.py</code> is specified, its contents are tokenized to stdout. Otherwise, tokenization is performed on stdin.</p>
</div>
<div class="section" id="examples"><h2>32.7.3. Examples</h2>
<p>Example of a script rewriter that transforms float literals into Decimal objects:</p>
<span class="kn">from</span> <span class="nn">io</span> <span class="k">import</span> <span class="n">BytesIO</span>
<span class="k">def</span> <span class="nf">decistmt</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
<span class="sd">"""Substitute Decimals for floats in a string of statements.</span>
<span class="sd"> &gt;&gt;&gt; from decimal import Decimal</span>
<span class="sd"> &gt;&gt;&gt; s = 'print(+21.3e-5*-.1234/81.7)'</span>
<span class="sd"> &gt;&gt;&gt; decistmt(s)</span>
<span class="sd"> "print (+Decimal ('21.3e-5')*-Decimal ('.1234')/Decimal ('81.7'))"</span>
<span class="sd"> The format of the exponent is inherited from the platform C library.</span>
<span class="sd"> Known cases are "e-007" (Windows) and "e-07" (not Windows). Since</span>
<span class="sd"> we're only showing 12 digits, and the 13th isn't close to 5, the</span>
<span class="sd"> rest of the output should be platform-independent.</span>
<span class="sd"> &gt;&gt;&gt; exec(s) #doctest: +ELLIPSIS</span>
<span class="sd"> -3.21716034272e-0...7</span>
<span class="sd"> Output from calculations with Decimal should be identical across all</span>
<span class="sd"> platforms.</span>
<span class="sd"> &gt;&gt;&gt; exec(decistmt(s))</span>
<span class="sd"> -3.217160342717258261933904529E-7</span>
<span class="sd"> """</span>
<span class="n">result</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">g</span> <span class="o">=</span> <span class="n">tokenize</span><span class="p">(</span><span class="n">BytesIO</span><span class="p">(</span><span class="n">s</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="s1">'utf-8'</span><span class="p">))</span><span class="o">.</span><span class="n">readline</span><span class="p">)</span> <span class="c1"># tokenize the string</span>
<span class="k">for</span> <span class="n">toknum</span><span class="p">,</span> <span class="n">tokval</span><span class="p">,</span> <span class="n">_</span><span class="p">,</span> <span class="n">_</span><span class="p">,</span> <span class="n">_</span> <span class="ow">in</span> <span class="n">g</span><span class="p">:</span>
<span class="k">if</span> <span class="n">toknum</span> <span class="o">==</span> <span class="n">NUMBER</span> <span class="ow">and</span> <span class="s1">'.'</span> <span class="ow">in</span> <span class="n">tokval</span><span class="p">:</span> <span class="c1"># replace NUMBER tokens</span>
<span class="n">result</span><span class="o">.</span><span class="n">extend</span><span class="p">([</span>
<span class="p">(</span><span class="n">NAME</span><span class="p">,</span> <span class="s1">'Decimal'</span><span class="p">),</span>
<span class="p">(</span><span class="n">OP</span><span class="p">,</span> <span class="s1">'('</span><span class="p">),</span>
<span class="p">(</span><span class="n">STRING</span><span class="p">,</span> <span class="nb">repr</span><span class="p">(</span><span class="n">tokval</span><span class="p">)),</span>
<span class="p">(</span><span class="n">OP</span><span class="p">,</span> <span class="s1">')'</span><span class="p">)</span>
<span class="p">])</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">result</span><span class="o">.</span><span class="n">append</span><span class="p">((</span><span class="n">toknum</span><span class="p">,</span> <span class="n">tokval</span><span class="p">))</span>
<span class="k">return</span> <span class="n">untokenize</span><span class="p">(</span><span class="n">result</span><span class="p">)</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="s1">'utf-8'</span><span class="p">)</span>
<p>Example of tokenizing from the command line. The script:</p>
<pre><code class="language-python">def say_hello():
    print("Hello, World!")

say_hello()
</code></pre>
<p>will be tokenized to the following output, where the first column is the range of the line/column coordinates where the token is found, the second column is the name of the token, and the final column is the value of the token (if any):</p>
<pre><code class="language-sh">$ python -m tokenize hello.py
0,0-0,0:            ENCODING       'utf-8'
1,0-1,3:            NAME           'def'
1,4-1,13:           NAME           'say_hello'
1,13-1,14:          OP             '('
1,14-1,15:          OP             ')'
1,15-1,16:          OP             ':'
1,16-1,17:          NEWLINE        '\n'
2,0-2,4:            INDENT         '    '
2,4-2,9:            NAME           'print'
2,9-2,10:           OP             '('
2,10-2,25:          STRING         '"Hello, World!"'
2,25-2,26:          OP             ')'
2,26-2,27:          NEWLINE        '\n'
3,0-3,1:            NL             '\n'
4,0-4,0:            DEDENT         ''
4,0-4,9:            NAME           'say_hello'
4,9-4,10:           OP             '('
4,10-4,11:          OP             ')'
4,11-4,12:          NEWLINE        '\n'
5,0-5,0:            ENDMARKER      ''
</code></pre>
<p>The exact token type names can be displayed using the <code>-e</code> option:</p>
<pre><code class="language-sh">$ python -m tokenize -e hello.py
0,0-0,0:            ENCODING       'utf-8'
1,0-1,3:            NAME           'def'
1,4-1,13:           NAME           'say_hello'
1,13-1,14:          LPAR           '('
1,14-1,15:          RPAR           ')'
1,15-1,16:          COLON          ':'
1,16-1,17:          NEWLINE        '\n'
2,0-2,4:            INDENT         '    '
2,4-2,9:            NAME           'print'
2,9-2,10:           LPAR           '('
2,10-2,25:          STRING         '"Hello, World!"'
2,25-2,26:          RPAR           ')'
2,26-2,27:          NEWLINE        '\n'
3,0-3,1:            NL             '\n'
4,0-4,0:            DEDENT         ''
4,0-4,9:            NAME           'say_hello'
4,9-4,10:           LPAR           '('
4,10-4,11:          RPAR           ')'
4,11-4,12:          NEWLINE        '\n'
5,0-5,0:            ENDMARKER      ''
</code></pre>
</div></div></div>