<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<title>Calculate Levenshtein distance between two strings</title>
</head>
<body class="docs"><div id="layout">
<div id="layout-content"><div id="function.levenshtein" class="refentry">
<div class="refnamediv">
<h1 class="refname">levenshtein</h1>
<p class="verinfo">(PHP 4 &gt;= 4.0.1, PHP 5, PHP 7)</p><p class="refpurpose"><span class="refname">levenshtein</span> &mdash; <span class="dc-title">Calculate Levenshtein distance between two strings</span></p>
</div>
<div class="refsect1 description" id="refsect1-function.levenshtein-description">
<h3 class="title">Description</h3>
<div class="methodsynopsis dc-description">
<span class="methodname"><strong>levenshtein</strong></span>
( <span class="methodparam"><span class="type">string</span> <code class="parameter">$str1</code></span>
, <span class="methodparam"><span class="type">string</span> <code class="parameter">$str2</code></span>
) : <span class="type">int</span></div>
<div class="methodsynopsis dc-description">
<span class="methodname"><strong>levenshtein</strong></span>
( <span class="methodparam"><span class="type">string</span> <code class="parameter">$str1</code></span>
, <span class="methodparam"><span class="type">string</span> <code class="parameter">$str2</code></span>
, <span class="methodparam"><span class="type">int</span> <code class="parameter">$cost_ins</code></span>
, <span class="methodparam"><span class="type">int</span> <code class="parameter">$cost_rep</code></span>
, <span class="methodparam"><span class="type">int</span> <code class="parameter">$cost_del</code></span>
) : <span class="type">int</span></div>
<p class="para rdfs-comment">
The Levenshtein distance is defined as the minimal number of characters you have to replace, insert or delete to transform <code class="parameter">str1</code> into <code class="parameter">str2</code>.
The complexity of the algorithm is <em>O(m*n)</em>, where <em>n</em> and <em>m</em> are the lengths of <code class="parameter">str1</code> and <code class="parameter">str2</code> (rather good when compared to <span class="function"><a href="similar_text.html" class="function">similar_text()</a></span>, which is <em>O(max(n,m)**3)</em>, but still expensive).
</p>
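<p class="para">
The definition above can be illustrated with a minimal userland sketch of the usual dynamic-programming recurrence. This is not the engine's implementation; the function name <code>edit_distance_sketch()</code> is hypothetical, and the sketch assumes unit costs and byte-wise comparison. The two nested loops over the string lengths are where the <em>O(m*n)</em> running time comes from.
</p>

```php
<?php
// Illustrative sketch only; edit_distance_sketch() is a hypothetical name,
// not a PHP built-in. All three operations are assumed to cost 1.
function edit_distance_sketch(string $str1, string $str2): int
{
    $m = strlen($str1);
    $n = strlen($str2);

    // $prev[$j] is the distance between the previous prefix of $str1
    // and the first $j bytes of $str2. Row 0: build $str2 by insertions only.
    $prev = range(0, $n);

    for ($i = 1; $i <= $m; $i++) {
        // Column 0: turning the first $i bytes of $str1 into '' takes $i deletions.
        $curr = [$i];
        for ($j = 1; $j <= $n; $j++) {
            $replace = $prev[$j - 1] + ($str1[$i - 1] === $str2[$j - 1] ? 0 : 1);
            $delete  = $prev[$j] + 1;
            $insert  = $curr[$j - 1] + 1;
            $curr[$j] = min($replace, $delete, $insert);
        }
        $prev = $curr;
    }

    return $prev[$n];
}

echo edit_distance_sketch('kitten', 'sitting'), "\n"; // prints 3
```

<p class="para">
Only two rows of the table are kept at a time, so the sketch needs memory proportional to the length of <code class="parameter">str2</code> rather than to the full <em>m*n</em> table.
</p>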
<p class="para">
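In its simplest form the function takes only the two strings as parameters and calculates just the number of insert, replace and delete operations needed to transform <code class="parameter">str1</code> into <code class="parameter">str2</code>.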
</p>
<p class="para">
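A second variant takes three additional parameters that define the cost of insert, replace and delete operations. This variant is more general and adaptive than the first, but not as efficient.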
</p>
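<p class="para">
As a rough illustration of the two variants, the sketch below calls the function with and without the three cost arguments. The words and cost values are arbitrary and chosen only to make the effect of each cost visible; they are not recommended settings.
</p>

```php
<?php
// Two-argument variant: every operation costs 1.
$plain = levenshtein('cat', 'cart');                   // one insertion

// Five-argument variant: levenshtein($str1, $str2, $cost_ins, $cost_rep, $cost_del).
// The cost values below are arbitrary examples.
$costly_insert = levenshtein('cat', 'cart', 5, 1, 1);  // the single insertion now costs 5
$cheap_delete  = levenshtein('cart', 'cat', 5, 1, 1);  // the reverse direction needs one deletion
$no_replace    = levenshtein('cat', 'cut', 1, 10, 1);  // delete + insert (2) beats one replacement (10)

echo "$plain $costly_insert $cheap_delete $no_replace\n"; // prints 1 5 1 2
```

<p class="para">
With unequal insertion and deletion costs the result depends on the argument order, as the second and third calls show.
</p>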
</div>
<div class="refsect1 parameters" id="refsect1-function.levenshtein-parameters">
<h3 class="title">Parameters</h3>
<p class="para">
<dl>
<dt>
<code class="parameter">str1</code></dt>
<dd>
<p class="para">
One of the strings being evaluated for Levenshtein distance.
</p>
</dd>
<dt>
<code class="parameter">str2</code></dt>
<dd>
<p class="para">
The other string being evaluated for Levenshtein distance.
</p>
</dd>
<dt>
<code class="parameter">cost_ins</code></dt>
<dd>
<p class="para">
Defines the cost of an insertion.
</p>
</dd>
<dt>
<code class="parameter">cost_rep</code></dt>
<dd>
<p class="para">
Defines the cost of a replacement.
</p>
</dd>
<dt>
<code class="parameter">cost_del</code></dt>
<dd>
<p class="para">
Defines the cost of a deletion.
</p>
</dd>
</dl>
</p>
</div>
<div class="refsect1 returnvalues" id="refsect1-function.levenshtein-returnvalues">
<h3 class="title">Return Values</h3>
<p class="para">
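This function returns the Levenshtein distance between the two argument strings, or -1 if one of the argument strings is longer than the limit of 255 characters.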
</p>
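<p class="para">
Because -1 is not a valid distance, callers may want to rule out over-long input before comparing results. The sketch below is one possible guard, assuming PHP 7.1 or later for the nullable return type; the helper name <code>levenshtein_or_null()</code> is hypothetical. It checks the lengths itself instead of relying on the -1 return value, since the limit applies to the PHP versions this page covers and was lifted in PHP 8.0.
</p>

```php
<?php
// Hypothetical helper, not part of PHP: returns null instead of calling
// levenshtein() when either argument exceeds the 255-character limit.
function levenshtein_or_null(string $a, string $b): ?int
{
    if (strlen($a) > 255 || strlen($b) > 255) {
        return null; // too long for the limit described above
    }
    return levenshtein($a, $b);
}

var_dump(levenshtein_or_null('carrrot', 'carrot'));        // int(1)
var_dump(levenshtein_or_null(str_repeat('a', 256), 'a'));  // NULL
```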
</div>
<div class="refsect1 examples" id="refsect1-function.levenshtein-examples">
<h3 class="title">Examples</h3>
<p class="para">
<div class="example" id="example-5917">
<p><strong>Example #1 <span class="function"><strong>levenshtein()</strong></span> example</strong></p>
<div class="example-contents">
<div class="phpcode"><pre><span style="color: #000000">
<span style="color: #0000BB">&lt;?php<br /></span><span style="color: #FF8000">//&nbsp;input&nbsp;misspelled&nbsp;word<br /></span><span style="color: #0000BB">$input&nbsp;</span><span style="color: #007700">=&nbsp;</span><span style="color: #DD0000">'carrrot'</span><span style="color: #007700">;<br /><br /></span><span style="color: #FF8000">//&nbsp;array&nbsp;of&nbsp;words&nbsp;to&nbsp;check&nbsp;against<br /></span><span style="color: #0000BB">$words&nbsp;&nbsp;</span><span style="color: #007700">=&nbsp;array(</span><span style="color: #DD0000">'apple'</span><span style="color: #007700">,</span><span style="color: #DD0000">'pineapple'</span><span style="color: #007700">,</span><span style="color: #DD0000">'banana'</span><span style="color: #007700">,</span><span style="color: #DD0000">'orange'</span><span style="color: #007700">,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #DD0000">'radish'</span><span style="color: #007700">,</span><span style="color: #DD0000">'carrot'</span><span style="color: #007700">,</span><span style="color: #DD0000">'pea'</span><span style="color: #007700">,</span><span style="color: #DD0000">'bean'</span><span style="color: #007700">,</span><span style="color: #DD0000">'potato'</span><span style="color: #007700">);<br /><br /></span><span style="color: #FF8000">//&nbsp;no&nbsp;shortest&nbsp;distance&nbsp;found&nbsp;yet<br /></span><span style="color: #0000BB">$shortest&nbsp;</span><span style="color: #007700">=&nbsp;-</span><span style="color: #0000BB">1</span><span style="color: #007700">;<br /><br /></span><span style="color: #FF8000">//&nbsp;loop&nbsp;through&nbsp;the&nbsp;words&nbsp;to&nbsp;find&nbsp;the&nbsp;closest<br /></span><span style="color: #007700">foreach&nbsp;(</span><span style="color: #0000BB">$words&nbsp;</span><span style="color: #007700">as&nbsp;</span><span style="color: #0000BB">$word</span><span style="color: #007700">)&nbsp;{<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #FF8000">//&nbsp;calculate&nbsp;the&nbsp;distance&nbsp;between&nbsp;the&nbsp;input&nbsp;word<br />&nbsp;&nbsp;&nbsp;&nbsp;//&nbsp;and&nbsp;the&nbsp;current&nbsp;word<br />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: 
#0000BB">$lev&nbsp;</span><span style="color: #007700">=&nbsp;</span><span style="color: #0000BB">levenshtein</span><span style="color: #007700">(</span><span style="color: #0000BB">$input</span><span style="color: #007700">,&nbsp;</span><span style="color: #0000BB">$word</span><span style="color: #007700">);<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #FF8000">//&nbsp;check&nbsp;for&nbsp;an&nbsp;exact&nbsp;match<br />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #007700">if&nbsp;(</span><span style="color: #0000BB">$lev&nbsp;</span><span style="color: #007700">==&nbsp;</span><span style="color: #0000BB">0</span><span style="color: #007700">)&nbsp;{<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #FF8000">//&nbsp;the&nbsp;closest&nbsp;word&nbsp;is&nbsp;this&nbsp;one&nbsp;(exact&nbsp;match)<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000BB">$closest&nbsp;</span><span style="color: #007700">=&nbsp;</span><span style="color: #0000BB">$word</span><span style="color: #007700">;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000BB">$shortest&nbsp;</span><span style="color: #007700">=&nbsp;</span><span style="color: #0000BB">0</span><span style="color: #007700">;<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #FF8000">//&nbsp;break&nbsp;out&nbsp;of&nbsp;the&nbsp;loop;&nbsp;we&nbsp;have&nbsp;found&nbsp;an&nbsp;exact&nbsp;match<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #007700">break;<br />&nbsp;&nbsp;&nbsp;&nbsp;}<br /><br />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #FF8000">//&nbsp;if&nbsp;this&nbsp;distance&nbsp;is&nbsp;not&nbsp;longer&nbsp;than&nbsp;the&nbsp;shortest&nbsp;found&nbsp;so&nbsp;far,<br />&nbsp;&nbsp;&nbsp;&nbsp;//&nbsp;or&nbsp;if&nbsp;no&nbsp;close&nbsp;word&nbsp;has&nbsp;been&nbsp;found&nbsp;yet<br />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #007700">if&nbsp;(</span><span style="color: #0000BB">$lev&nbsp;</span><span style="color: #007700">&lt;=&nbsp;</span><span style="color: #0000BB">$shortest&nbsp;</span><span style="color: #007700">||&nbsp;</span><span style="color: #0000BB">$shortest&nbsp;</span><span style="color: #007700">&lt;&nbsp;</span><span style="color: 
#0000BB">0</span><span style="color: #007700">)&nbsp;{<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #FF8000">//&nbsp;set&nbsp;the&nbsp;closest&nbsp;match&nbsp;and&nbsp;the&nbsp;shortest&nbsp;distance<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000BB">$closest&nbsp;&nbsp;</span><span style="color: #007700">=&nbsp;</span><span style="color: #0000BB">$word</span><span style="color: #007700">;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000BB">$shortest&nbsp;</span><span style="color: #007700">=&nbsp;</span><span style="color: #0000BB">$lev</span><span style="color: #007700">;<br />&nbsp;&nbsp;&nbsp;&nbsp;}<br />}<br /><br />echo&nbsp;</span><span style="color: #DD0000">"Input&nbsp;word:&nbsp;</span><span style="color: #0000BB">$input</span><span style="color: #DD0000">\n"</span><span style="color: #007700">;<br />if&nbsp;(</span><span style="color: #0000BB">$shortest&nbsp;</span><span style="color: #007700">==&nbsp;</span><span style="color: #0000BB">0</span><span style="color: #007700">)&nbsp;{<br />&nbsp;&nbsp;&nbsp;&nbsp;echo&nbsp;</span><span style="color: #DD0000">"Exact&nbsp;match&nbsp;found:&nbsp;</span><span style="color: #0000BB">$closest</span><span style="color: #DD0000">\n"</span><span style="color: #007700">;<br />}&nbsp;else&nbsp;{<br />&nbsp;&nbsp;&nbsp;&nbsp;echo&nbsp;</span><span style="color: #DD0000">"Did&nbsp;you&nbsp;mean:&nbsp;</span><span style="color: #0000BB">$closest</span><span style="color: #DD0000">?\n"</span><span style="color: #007700">;<br />}<br /><br /></span><span style="color: #0000BB">?&gt;</span>
</span>
</pre></div>
</div>
<div class="example-contents"><p>The above example will output:</p></div>
<div class="example-contents screen">
<div class="cdata"><pre>
Input word: carrrot
Did you mean: carrot?
</pre></div>
</div>
</div>
</p>
</div>
<div class="refsect1 seealso" id="refsect1-function.levenshtein-seealso">
<h3 class="title">See Also</h3>
<p class="para">
<ul class="simplelist">
<li class="member"><span class="function"><a href="soundex.html" class="function" rel="rdfs-seeAlso">soundex()</a> - Calculate the soundex key of a string</span></li>
<li class="member"><span class="function"><a href="similar_text.html" class="function" rel="rdfs-seeAlso">similar_text()</a> - Calculate the similarity between two strings</span></li>
<li class="member"><span class="function"><a href="metaphone.html" class="function" rel="rdfs-seeAlso">metaphone()</a> - Calculate the metaphone key of a string</span></li>
</ul>
</p>
</div>
</div></div></div></body></html>