mirror of
https://github.com/fofolee/uTools-Manuals.git
synced 2025-06-09 23:44:06 +08:00
16 lines
14 KiB
HTML
16 lines
14 KiB
HTML
<div class="body" role="main"><div class="section" id="module-unicodedata"><h1><span class="yiyi-st" id="yiyi-10">6.5. <a class="reference internal" href="#module-unicodedata" title="unicodedata: Access the Unicode Database."><code class="xref py py-mod docutils literal"><span class="pre">unicodedata</span></code></a> - Unicode数据库</span></h1><p><span class="yiyi-st" id="yiyi-11">此模块提供对Unicode字符数据库(UCD)的访问,该字符数据库定义所有Unicode字符的字符属性。</span><span class="yiyi-st" id="yiyi-12">此数据库中包含的数据是从<a class="reference external" href="http://www.unicode.org/Public/8.0.0/ucd">UCD版本8.0.0</a>编译的。</span></p><p><span class="yiyi-st" id="yiyi-13">该模块使用与Unicode标准附录#44,<a class="reference external" href="http://www.unicode.org/reports/tr44/tr44-6.html">“Unicode字符数据库”</a>定义的名称和符号相同的名称和符号。</span><span class="yiyi-st" id="yiyi-14">它定义了以下功能:</span></p><dl class="function"><dt id="unicodedata.lookup"><span class="yiyi-st" id="yiyi-15"><code class="descclassname">unicodedata.</code><code class="descname">lookup</code><span class="sig-paren">(</span><em>name</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-16">通过一个名称来查找字符。</span><span class="yiyi-st" id="yiyi-17">如果找到具有给定名称的字符,则返回相应的字符。</span><span class="yiyi-st" id="yiyi-18">如果未找到,则会引发<a class="reference internal" href="exceptions.html#KeyError" title="KeyError"><code class="xref py py-exc docutils literal"><span class="pre">KeyError</span></code></a>。</span></p><div class="versionchanged"><p><span class="yiyi-st" id="yiyi-19"><span class="versionmodified">在版本3.3中新增:</span>支持名称别名<a class="footnote-reference" href="#id3" id="id1">[1]</a>和命名序列<a class="footnote-reference" href="#id4" id="id2">[2]</a>。</span></p></div></dd></dl><dl class="function"><dt id="unicodedata.name"><span class="yiyi-st" id="yiyi-20"><code class="descclassname">unicodedata.</code><code class="descname">name</code><span class="sig-paren">(</span><em>chr</em><span class="optional">[</span>, <em>default</em><span class="optional">]</span><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-21">返回分配给字符<em>chr</em>的名称作为字符串。按字符来查找它的名称。</span><span class="yiyi-st" id="yiyi-22">如果未定义名称,则返回<em>默认</em>,如果未指定,则会引发<a class="reference internal" href="exceptions.html#ValueError" title="ValueError"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code></a>。</span></p></dd></dl><dl class="function"><dt id="unicodedata.decimal"><span class="yiyi-st" id="yiyi-23"><code class="descclassname">unicodedata.</code><code class="descname">decimal</code><span class="sig-paren">(</span><em>chr</em><span class="optional">[</span>, <em>default</em><span class="optional">]</span><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-24">将分配给字符<em>chr</em>的十进制值作为整数返回。</span><span class="yiyi-st" id="yiyi-25">如果没有定义这样的值,则返回<em>默认</em>,或者如果没有给出,则引发<a class="reference internal" href="exceptions.html#ValueError" title="ValueError"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code></a>。</span></p></dd></dl><dl class="function"><dt id="unicodedata.digit"><span class="yiyi-st" id="yiyi-26"><code class="descclassname">unicodedata.</code><code class="descname">digit</code><span class="sig-paren">(</span><em>chr</em><span class="optional">[</span>, <em>default</em><span class="optional">]</span><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-27">将分配给字符<em>chr</em>的数字值作为整数返回。</span><span class="yiyi-st" id="yiyi-28">如果没有定义这样的值,则返回<em>默认</em>,或者如果没有给出,则引发<a class="reference internal" href="exceptions.html#ValueError" title="ValueError"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code></a>。</span></p></dd></dl><dl class="function"><dt id="unicodedata.numeric"><span class="yiyi-st" id="yiyi-29"><code class="descclassname">unicodedata.</code><code class="descname">numeric</code><span class="sig-paren">(</span><em>chr</em><span class="optional">[</span>, <em>default</em><span class="optional">]</span><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-30">返回分配给字符<em>chr</em>的数值为float。</span><span class="yiyi-st" id="yiyi-31">如果没有定义这样的值,则返回<em>默认</em>,或者如果没有给出,则引发<a class="reference internal" href="exceptions.html#ValueError" title="ValueError"><code class="xref py py-exc docutils literal"><span class="pre">ValueError</span></code></a>。</span></p></dd></dl><dl class="function"><dt id="unicodedata.category"><span class="yiyi-st" id="yiyi-32"><code class="descclassname">unicodedata.</code><code class="descname">category</code><span class="sig-paren">(</span><em>chr</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-33">将分配给字符<em>chr</em>的一般类别返回为字符串。</span></p></dd></dl><dl class="function"><dt id="unicodedata.bidirectional"><span class="yiyi-st" id="yiyi-34"><code class="descclassname">unicodedata.</code><code class="descname">bidirectional</code><span class="sig-paren">(</span><em>chr</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-35">以字符串形式返回分配给字符<em>chr</em>的双向类。</span><span class="yiyi-st" id="yiyi-36">如果没有定义这样的值,则返回一个空字符串。</span></p></dd></dl><dl class="function"><dt id="unicodedata.combining"><span class="yiyi-st" id="yiyi-37"><code class="descclassname">unicodedata.</code><code class="descname">combining</code><span class="sig-paren">(</span><em>chr</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-38">返回分配给字符<em>chr</em>的规范组合类作为整数。</span><span class="yiyi-st" id="yiyi-39">如果未定义组合类,则返回<code class="docutils literal"><span class="pre">0</span></code>。</span></p></dd></dl><dl class="function"><dt id="unicodedata.east_asian_width"><span class="yiyi-st" id="yiyi-40"><code class="descclassname">unicodedata.</code><code class="descname">east_asian_width</code><span class="sig-paren">(</span><em>chr</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-41">将分配给字符<em>chr</em>的东亚宽度返回为字符串。</span></p></dd></dl><dl class="function"><dt id="unicodedata.mirrored"><span class="yiyi-st" id="yiyi-42"><code class="descclassname">unicodedata.</code><code class="descname">mirrored</code><span class="sig-paren">(</span><em>chr</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-43">将分配给字符<em>chr</em>的镜像属性返回为整数。</span><span class="yiyi-st" id="yiyi-44">如果字符在双向文本中被识别为“镜像”字符,则返回<code class="docutils literal"><span class="pre">1</span></code>,否则返回<code class="docutils literal"><span class="pre">0</span></code>。</span></p></dd></dl><dl class="function"><dt id="unicodedata.decomposition"><span class="yiyi-st" id="yiyi-45"><code class="descclassname">unicodedata.</code><code class="descname">decomposition</code><span class="sig-paren">(</span><em>chr</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-46">以字符串形式返回分配给字符<em>chr</em>的字符分解映射。</span><span class="yiyi-st" id="yiyi-47">如果未定义此类映射,则返回空字符串。</span></p></dd></dl><dl class="function"><dt id="unicodedata.normalize"><span class="yiyi-st" id="yiyi-48"><code class="descclassname">unicodedata.</code><code class="descname">normalize</code><span class="sig-paren">(</span><em>form</em>, <em>unistr</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-49">返回Unicode字符串<em>unistr</em>的<em>form</em>规范形式。</span><span class="yiyi-st" id="yiyi-50"><em>form</em>的有效值有“NFC”、“NFKC”、“NFD”和“NFKD”。</span></p><p><span class="yiyi-st" id="yiyi-51">Unicode标准基于标准性等价和兼容性等价的定义来定义Unicode字符串的各种规范化形式。</span><span class="yiyi-st" id="yiyi-52">在Unicode中,有几个字符可以用多种方式表示。</span><span class="yiyi-st" id="yiyi-53">例如,字符U+00C7(带有下变音符的大写拉丁字母C)也可以表示为序列U+0043(大写拉丁字母C)U+0327(和下变音符)。</span></p><p><span class="yiyi-st" id="yiyi-54">对于每个字符,有两种规范形式:规范形式C和规范形式D。规范形式D(NFD)也称为标准性分解,将每个字符转换为其分解形式。</span><span class="yiyi-st" id="yiyi-55">规范形式C(NFC)首先应用标准性分解,然后再次组合可以组合的字符。</span></p><p><span class="yiyi-st" id="yiyi-56">除了这两种形式,还有两种额外的规范形式,基于兼容性等价。</span><span class="yiyi-st" id="yiyi-57">在Unicode中,支持某些字符,通常与其他字符统一。</span><span class="yiyi-st" id="yiyi-58">例如,U+2160(罗马数字1)与U+0049(拉丁大写字母I)事实上一模一样。</span><span class="yiyi-st" id="yiyi-59">但是,Unicode都支持它们以兼容现有的字符集(例如</span><span class="yiyi-st" id="yiyi-60">gb2312)。</span></p><p><span class="yiyi-st" id="yiyi-61">规范形式KD(NFKD)将应用兼容性分解,即</span><span class="yiyi-st" id="yiyi-62">用兼容的等同字符替换所有字符。</span><span class="yiyi-st" id="yiyi-63">规范形式KC(NFKC)首先应用兼容性分解,随后应用标准性组合。</span></p><p><span class="yiyi-st" id="yiyi-64">即使两个unicode字符串被规范化并且对人类读者看起来相同,但是如果一个具有组合字符而另一个没有,它们也可能不会相等。</span></p></dd></dl><p><span class="yiyi-st" id="yiyi-65">此外,模块还会显示以下常量:</span></p><dl class="data"><dt id="unicodedata.unidata_version"><span class="yiyi-st" id="yiyi-66"><code class="descclassname">unicodedata.</code><code class="descname">unidata_version</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-67">此模块中使用的Unicode数据库的版本。</span></p></dd></dl><dl class="data"><dt id="unicodedata.ucd_3_2_0"><span class="yiyi-st" id="yiyi-68"><code class="descclassname">unicodedata.</code><code class="descname">ucd_3_2_0</code></span></dt><dd><p><span class="yiyi-st" id="yiyi-69">这是一个与整个模块具有相同方法的对象,但对于需要此特定版本的Unicode数据库(例如IDNA)的应用程序,则使用Unicode数据库版本3.2。</span></p></dd></dl><p><span class="yiyi-st" id="yiyi-70">例子:</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">unicodedata</span>
|
||
<span class="gp">>>> </span><span class="n">unicodedata</span><span class="o">.</span><span class="n">lookup</span><span class="p">(</span><span class="s1">'LEFT CURLY BRACKET'</span><span class="p">)</span>
|
||
<span class="go">'{'</span>
|
||
<span class="gp">>>> </span><span class="n">unicodedata</span><span class="o">.</span><span class="n">name</span><span class="p">(</span><span class="s1">'/'</span><span class="p">)</span>
|
||
<span class="go">'SOLIDUS'</span>
|
||
<span class="gp">>>> </span><span class="n">unicodedata</span><span class="o">.</span><span class="n">decimal</span><span class="p">(</span><span class="s1">'9'</span><span class="p">)</span>
|
||
<span class="go">9</span>
|
||
<span class="gp">>>> </span><span class="n">unicodedata</span><span class="o">.</span><span class="n">decimal</span><span class="p">(</span><span class="s1">'a'</span><span class="p">)</span>
|
||
<span class="gt">Traceback (most recent call last):</span>
|
||
File <span class="nb">"<stdin>"</span>, line <span class="m">1</span>, in <span class="n">?</span>
|
||
<span class="gr">ValueError</span>: <span class="n">not a decimal</span>
|
||
<span class="gp">>>> </span><span class="n">unicodedata</span><span class="o">.</span><span class="n">category</span><span class="p">(</span><span class="s1">'A'</span><span class="p">)</span> <span class="c1"># 'L'etter, 'u'ppercase</span>
|
||
<span class="go">'Lu'</span>
|
||
<span class="gp">>>> </span><span class="n">unicodedata</span><span class="o">.</span><span class="n">bidirectional</span><span class="p">(</span><span class="s1">'</span><span class="se">\u0660</span><span class="s1">'</span><span class="p">)</span> <span class="c1"># 'A'rabic, 'N'umber</span>
|
||
<span class="go">'AN'</span>
|
||
</code></pre><p class="rubric"><span class="yiyi-st" id="yiyi-71">脚注</span></p><table class="docutils footnote" frame="void" id="id3" rules="none"><tbody valign="top"><tr><td class="label"><span class="yiyi-st" id="yiyi-72"><a class="fn-backref" href="#id1">[1]</a></span></td><td><span class="yiyi-st" id="yiyi-73"><a class="reference external" href="http://www.unicode.org/Public/8.0.0/ucd/NameAliases.txt">http://www.unicode.org/Public/8.0.0/ucd/NameAliases.txt</a></span></td></tr></tbody></table><table class="docutils footnote" frame="void" id="id4" rules="none"><tbody valign="top"><tr><td class="label"><span class="yiyi-st" id="yiyi-74"><a class="fn-backref" href="#id2">[2]</a></span></td><td><span class="yiyi-st" id="yiyi-75"><a class="reference external" href="http://www.unicode.org/Public/8.0.0/ucd/NamedSequences.txt">http://www.unicode.org/Public/8.0.0/ucd/NamedSequences.txt</a></span></td></tr></tbody></table></div></div> |