uTools-Manuals/docs/python/unicodedata.html
2019-04-21 11:50:48 +08:00

<div class="body" role="main"><div class="section" id="module-unicodedata"><h1><span class="yiyi-st" id="yiyi-10">6.5. <a class="reference internal" href="#module-unicodedata" title="unicodedata: Access the Unicode Database."><code class="xref py py-mod docutils literal"><span class="pre">unicodedata</span></code></a> - Unicode Database</span></h1><p><span class="yiyi-st" id="yiyi-11">This module provides access to the Unicode Character Database (UCD), which defines character properties for all Unicode characters.</span> <span class="yiyi-st" id="yiyi-12">The data contained in this database is compiled from <a class="reference external" href="http://www.unicode.org/Public/8.0.0/ucd">UCD version 8.0.0</a>.</span></p><p><span class="yiyi-st" id="yiyi-13">The module uses the same names and symbols as defined by Unicode Standard Annex #44, <a class="reference external" href="http://www.unicode.org/reports/tr44/tr44-6.html">“Unicode Character Database”</a>.</span> <span class="yiyi-st" id="yiyi-14">It defines the following functions:</span></p><dl class="function"><dt id="unicodedata.lookup"><span class="yiyi-st" id="yiyi-15"><code class="descclassname">unicodedata.</code><code class="descname">lookup</code><span class="sig-paren">(</span><em>name</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-16">Look up a character by name.</span> <span class="yiyi-st" id="yiyi-17">If a character with the given name is found, return the corresponding character.</span> <span class="yiyi-st" id="yiyi-18">If not found, <a class="reference internal" href="exceptions.html#KeyError" title="KeyError"><code class="xref py py-exc docutils literal"><span class="pre">KeyError</span></code></a> is raised.</span></p><div class="versionchanged"><p><span class="yiyi-st" id="yiyi-19"><span class="versionmodified">Changed in version 3.3:</span> Support for name aliases <a class="footnote-reference" href="#id3" id="id1">[1]</a> and named sequences <a class="footnote-reference" href="#id4" id="id2">[2]</a> has been added.</span></p></div></dd></dl><dl class="function"><dt id="unicodedata.name"><span class="yiyi-st" id="yiyi-20"><code class="descclassname">unicodedata.</code><code class="descname">name</code><span class="sig-paren">(</span><em>chr</em><span class="optional">[</span>, <em>default</em><span class="optional">]</span><span 
class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-21">Returns the name assigned to the character <em>chr</em> as a string; that is, it looks up a character's name from the character itself.</span> <span class="yiyi-st" id="yiyi-22">If no name is defined, <em>default</em> is returned, or, if not given, <a class="reference internal" href="exceptions.html#ValueError" title="ValueError"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code></a> is raised.</span></p></dd></dl><dl class="function"><dt id="unicodedata.decimal"><span class="yiyi-st" id="yiyi-23"><code class="descclassname">unicodedata.</code><code class="descname">decimal</code><span class="sig-paren">(</span><em>chr</em><span class="optional">[</span>, <em>default</em><span class="optional">]</span><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-24">Returns the decimal value assigned to the character <em>chr</em> as an integer.</span> <span class="yiyi-st" id="yiyi-25">If no such value is defined, <em>default</em> is returned, or, if not given, <a class="reference internal" href="exceptions.html#ValueError" title="ValueError"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code></a> is raised.</span></p></dd></dl><dl class="function"><dt id="unicodedata.digit"><span class="yiyi-st" id="yiyi-26"><code class="descclassname">unicodedata.</code><code class="descname">digit</code><span class="sig-paren">(</span><em>chr</em><span class="optional">[</span>, <em>default</em><span class="optional">]</span><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-27">Returns the digit value assigned to the character <em>chr</em> as an integer.</span> <span class="yiyi-st" id="yiyi-28">If no such value is defined, <em>default</em> is returned, or, if not given, <a class="reference internal" href="exceptions.html#ValueError" title="ValueError"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code></a> is raised.</span></p></dd></dl><dl class="function"><dt id="unicodedata.numeric"><span class="yiyi-st" id="yiyi-29"><code class="descclassname">unicodedata.</code><code class="descname">numeric</code><span class="sig-paren">(</span><em>chr</em><span 
class="optional">[</span>, <em>default</em><span class="optional">]</span><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-30">Returns the numeric value assigned to the character <em>chr</em> as a float.</span> <span class="yiyi-st" id="yiyi-31">If no such value is defined, <em>default</em> is returned, or, if not given, <a class="reference internal" href="exceptions.html#ValueError" title="ValueError"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code></a> is raised.</span></p></dd></dl><dl class="function"><dt id="unicodedata.category"><span class="yiyi-st" id="yiyi-32"><code class="descclassname">unicodedata.</code><code class="descname">category</code><span class="sig-paren">(</span><em>chr</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-33">Returns the general category assigned to the character <em>chr</em> as a string.</span></p></dd></dl><dl class="function"><dt id="unicodedata.bidirectional"><span class="yiyi-st" id="yiyi-34"><code class="descclassname">unicodedata.</code><code class="descname">bidirectional</code><span class="sig-paren">(</span><em>chr</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-35">Returns the bidirectional class assigned to the character <em>chr</em> as a string.</span> <span class="yiyi-st" id="yiyi-36">If no such value is defined, an empty string is returned.</span></p></dd></dl><dl class="function"><dt id="unicodedata.combining"><span class="yiyi-st" id="yiyi-37"><code class="descclassname">unicodedata.</code><code class="descname">combining</code><span class="sig-paren">(</span><em>chr</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-38">Returns the canonical combining class assigned to the character <em>chr</em> as an integer.</span> <span class="yiyi-st" id="yiyi-39">Returns <code class="docutils literal"><span class="pre">0</span></code> if no combining class is defined.</span></p></dd></dl><dl class="function"><dt id="unicodedata.east_asian_width"><span class="yiyi-st" id="yiyi-40"><code class="descclassname">unicodedata.</code><code class="descname">east_asian_width</code><span class="sig-paren">(</span><em>chr</em><span 
class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-41">Returns the East Asian width assigned to the character <em>chr</em> as a string.</span></p></dd></dl><dl class="function"><dt id="unicodedata.mirrored"><span class="yiyi-st" id="yiyi-42"><code class="descclassname">unicodedata.</code><code class="descname">mirrored</code><span class="sig-paren">(</span><em>chr</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-43">Returns the mirrored property assigned to the character <em>chr</em> as an integer.</span> <span class="yiyi-st" id="yiyi-44">Returns <code class="docutils literal"><span class="pre">1</span></code> if the character has been identified as a “mirrored” character in bidirectional text, <code class="docutils literal"><span class="pre">0</span></code> otherwise.</span></p></dd></dl><dl class="function"><dt id="unicodedata.decomposition"><span class="yiyi-st" id="yiyi-45"><code class="descclassname">unicodedata.</code><code class="descname">decomposition</code><span class="sig-paren">(</span><em>chr</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-46">Returns the character decomposition mapping assigned to the character <em>chr</em> as a string.</span> <span class="yiyi-st" id="yiyi-47">An empty string is returned if no such mapping is defined.</span></p></dd></dl><dl class="function"><dt id="unicodedata.normalize"><span class="yiyi-st" id="yiyi-48"><code class="descclassname">unicodedata.</code><code class="descname">normalize</code><span class="sig-paren">(</span><em>form</em>, <em>unistr</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-49">Return the normal form <em>form</em> for the Unicode string <em>unistr</em>.</span> <span class="yiyi-st" id="yiyi-50">Valid values for <em>form</em> are 'NFC', 'NFKC', 'NFD', and 'NFKD'.</span></p><p><span class="yiyi-st" id="yiyi-51">The Unicode standard defines various normalization forms of a Unicode string, based on the definition of canonical equivalence and compatibility equivalence.</span> <span class="yiyi-st" id="yiyi-52">In Unicode, several characters can be expressed in various ways.</span> <span class="yiyi-st" id="yiyi-53">For example, the character U+00C7 (LATIN CAPITAL LETTER C WITH CEDILLA) can also be expressed as the sequence U+0043 (LATIN CAPITAL LETTER C) U+0327 (COMBINING CEDILLA).</span></p><p><span class="yiyi-st" id="yiyi-54">For each character, there are two normal forms: normal form C and normal form D. Normal form D (NFD) is also known as canonical decomposition, and translates each character into its decomposed form.</span> <span class="yiyi-st" 
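id="yiyi-55">Normal form C (NFC) first applies a canonical decomposition, then composes pre-combined characters again.</span></p><p><span class="yiyi-st" id="yiyi-56">In addition to these two forms, there are two additional normal forms based on compatibility equivalence.</span> <span class="yiyi-st" id="yiyi-57">In Unicode, certain characters are supported which normally would be unified with other characters.</span> <span class="yiyi-st" id="yiyi-58">For example, U+2160 (ROMAN NUMERAL ONE) is really the same thing as U+0049 (LATIN CAPITAL LETTER I).</span> <span class="yiyi-st" id="yiyi-59">However, both are supported in Unicode for compatibility with existing character sets (for example,</span> <span class="yiyi-st" id="yiyi-60">gb2312).</span></p><p><span class="yiyi-st" id="yiyi-61">Normal form KD (NFKD) applies the compatibility decomposition;</span> <span class="yiyi-st" id="yiyi-62">that is, it replaces all compatibility characters with their equivalents.</span> <span class="yiyi-st" id="yiyi-63">Normal form KC (NFKC) first applies the compatibility decomposition, followed by the canonical composition.</span></p><p><span class="yiyi-st" id="yiyi-64">Even if two Unicode strings are normalized and look the same to a human reader, they may not compare equal if one has combining characters and the other does not.</span></p></dd></dl><p><span class="yiyi-st" id="yiyi-65">In addition, the module exposes the following constants:</span></p><dl class="data"><dt id="unicodedata.unidata_version"><span class="yiyi-st" id="yiyi-66"><code class="descclassname">unicodedata.</code><code class="descname">unidata_version</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-67">The version of the Unicode database used in this module.</span></p></dd></dl><dl class="data"><dt id="unicodedata.ucd_3_2_0"><span class="yiyi-st" id="yiyi-68"><code class="descclassname">unicodedata.</code><code class="descname">ucd_3_2_0</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-69">This is an object that has the same methods as the entire module, but uses Unicode database version 3.2 instead, for applications that require this specific version of the Unicode database (such as IDNA).</span></p></dd></dl><p><span class="yiyi-st" id="yiyi-70">Example:</span></p><pre><code class="language-python"><span></span><span class="gp">&gt;&gt;&gt; </span><span class="kn">import</span> <span class="nn">unicodedata</span>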
<span class="gp">&gt;&gt;&gt; </span><span class="n">unicodedata</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="s1">'LEFT CURLY BRACKET'</span><span class="p">)</span>
<span class="go">'{'</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">unicodedata</span><span class="o">.</span><span class="n">name</span><span class="p">(</span><span class="s1">'/'</span><span class="p">)</span>
<span class="go">'SOLIDUS'</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">unicodedata</span><span class="o">.</span><span class="n">decimal</span><span class="p">(</span><span class="s1">'9'</span><span class="p">)</span>
<span class="go">9</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">unicodedata</span><span class="o">.</span><span class="n">decimal</span><span class="p">(</span><span class="s1">'a'</span><span class="p">)</span>
<span class="gt">Traceback (most recent call last):</span>
  File <span class="nb">"&lt;stdin&gt;"</span>, line <span class="m">1</span>, in <span class="n">?</span>
<span class="gr">ValueError</span>: <span class="n">not a decimal</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">unicodedata</span><span class="o">.</span><span class="n">category</span><span class="p">(</span><span class="s1">'A'</span><span class="p">)</span> <span class="c1"># 'L'etter, 'u'ppercase</span>
<span class="go">'Lu'</span>
<span class="gp">&gt;&gt;&gt; </span><span class="n">unicodedata</span><span class="o">.</span><span class="n">bidirectional</span><span class="p">(</span><span class="s1">'</span><span class="se">\u0660</span><span class="s1">'</span><span class="p">)</span> <span class="c1"># 'A'rabic, 'N'umber</span>
<span class="go">'AN'</span>
</code></pre><p class="rubric"><span class="yiyi-st" id="yiyi-71">Footnotes</span></p><table class="docutils footnote" frame="void" id="id3" rules="none"><tbody valign="top"><tr><td class="label"><span class="yiyi-st" id="yiyi-72"><a class="fn-backref" href="#id1">[1]</a></span></td><td><span class="yiyi-st" id="yiyi-73"><a class="reference external" href="http://www.unicode.org/Public/8.0.0/ucd/NameAliases.txt">http://www.unicode.org/Public/8.0.0/ucd/NameAliases.txt</a></span></td></tr></tbody></table><table class="docutils footnote" frame="void" id="id4" rules="none"><tbody valign="top"><tr><td class="label"><span class="yiyi-st" id="yiyi-74"><a class="fn-backref" href="#id2">[2]</a></span></td><td><span class="yiyi-st" id="yiyi-75"><a class="reference external" href="http://www.unicode.org/Public/8.0.0/ucd/NamedSequences.txt">http://www.unicode.org/Public/8.0.0/ucd/NamedSequences.txt</a></span></td></tr></tbody></table></div></div>
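<p>The normalization forms described for <code>unicodedata.normalize</code> can be illustrated with a minimal sketch. It uses only <code>normalize</code> and the two character pairs mentioned in the text (U+00C7 versus U+0043 U+0327, and U+2160 versus U+0049). The helper <code>same_text</code> and the variable names are hypothetical, chosen here for illustration; they are not part of the module.</p>

```python
import unicodedata

# U+00C7 as a single precomposed code point.
composed = "\u00C7"
# The same letter written as U+0043 followed by U+0327 (COMBINING CEDILLA).
decomposed = "\u0043\u0327"


def same_text(a, b, form="NFC"):
    """Hypothetical helper: compare two strings after normalizing both to one form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The raw strings differ even though they render identically.
print(composed == decomposed)

# NFD splits the precomposed character; NFC recombines the sequence.
print(unicodedata.normalize("NFD", composed) == decomposed)
print(unicodedata.normalize("NFC", decomposed) == composed)

# Normalizing both sides to the same form makes the comparison succeed.
print(same_text(composed, decomposed))

# Compatibility forms go further: U+2160 (ROMAN NUMERAL ONE) maps to "I"
# under NFKC, while NFC leaves it unchanged.
roman_one = "\u2160"
print(unicodedata.normalize("NFKC", roman_one) == "I")
print(unicodedata.normalize("NFC", roman_one) == roman_one)
```

<p>Normalizing both operands to the same form before comparing is one common way to avoid the mismatch described above. Whether a canonical form (NFC, NFD) or a compatibility form (NFKC, NFKD) is appropriate depends on whether distinctions such as U+2160 versus U+0049 should be preserved.</p>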