<div class="body" role="main"><div class="section" id="module-urllib.robotparser"><h1><span class="yiyi-st" id="yiyi-10">21.10. <a class="reference internal" href="#module-urllib.robotparser" title="urllib.robotparser: Load a robots.txt file and answer questions about fetchability of other URLs."><code class="xref py py-mod docutils literal"><span class="pre">urllib.robotparser</span></code></a> - Parser for robots.txt</span></h1><p><span class="yiyi-st" id="yiyi-11"><strong>Source code:</strong> <a class="reference external" href="https://hg.python.org/cpython/file/3.5/Lib/urllib/robotparser.py">Lib/urllib/robotparser.py</a></span></p><p><span class="yiyi-st" id="yiyi-12">This module provides a single class, <a class="reference internal" href="#urllib.robotparser.RobotFileParser" title="urllib.robotparser.RobotFileParser"><code class="xref py py-class docutils literal"><span class="pre">RobotFileParser</span></code></a>, which answers questions about whether a particular user agent may fetch a URL on the website that published the <code class="file docutils literal"><span class="pre">robots.txt</span></code> file. </span><span class="yiyi-st" id="yiyi-13">For more details on the structure of <code class="file docutils literal"><span class="pre">robots.txt</span></code> files, see <a class="reference external" href="http://www.robotstxt.org/orig.html">http://www.robotstxt.org/orig.html</a>.</span></p><dl class="class"><dt id="urllib.robotparser.RobotFileParser"><span class="yiyi-st" id="yiyi-14"> <em class="property">class </em><code class="descclassname">urllib.robotparser.</code><code class="descname">RobotFileParser</code><span class="sig-paren">(</span><em>url=''</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-15">This class provides methods to read, parse, and answer questions about the <code class="file docutils literal"><span class="pre">robots.txt</span></code> file at <em>url</em>.</span></p><dl class="method"><dt id="urllib.robotparser.RobotFileParser.set_url"><span class="yiyi-st" id="yiyi-16"> <code class="descname">set_url</code><span class="sig-paren">(</span><em>url</em><span class="sig-paren">)</span></span></dt><dd><p><span 
class="yiyi-st" id="yiyi-17">Sets the URL referring to a <code class="file docutils literal"><span class="pre">robots.txt</span></code> file.</span></p></dd></dl><dl class="method"><dt id="urllib.robotparser.RobotFileParser.read"><span class="yiyi-st" id="yiyi-18"> <code class="descname">read</code><span class="sig-paren">(</span><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-19">Reads the <code class="file docutils literal"><span class="pre">robots.txt</span></code> URL and feeds it to the parser.</span></p></dd></dl><dl class="method"><dt id="urllib.robotparser.RobotFileParser.parse"><span class="yiyi-st" id="yiyi-20"> <code class="descname">parse</code><span class="sig-paren">(</span><em>lines</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-21">Parses the <em>lines</em> argument.</span></p></dd></dl><dl class="method"><dt id="urllib.robotparser.RobotFileParser.can_fetch"><span class="yiyi-st" id="yiyi-22"> <code class="descname">can_fetch</code><span class="sig-paren">(</span><em>useragent</em>, <em>url</em><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-23">Returns <code class="docutils literal"><span class="pre">True</span></code> if <em>useragent</em> is allowed to fetch <em>url</em> according to the rules contained in the parsed <code class="file docutils literal"><span class="pre">robots.txt</span></code> file.</span></p></dd></dl><dl class="method"><dt id="urllib.robotparser.RobotFileParser.mtime"><span class="yiyi-st" id="yiyi-24"> <code class="descname">mtime</code><span class="sig-paren">(</span><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-25">Returns the time the <code class="docutils literal"><span class="pre">robots.txt</span></code> file was last fetched. </span><span class="yiyi-st" id="yiyi-26">This is useful for long-running web spiders that need to check for new <code class="docutils literal"><span class="pre">robots.txt</span></code> files periodically.</span></p></dd></dl><dl class="method"><dt id="urllib.robotparser.RobotFileParser.modified"><span class="yiyi-st" id="yiyi-27"> <code class="descname">modified</code><span 
class="sig-paren">(</span><span class="sig-paren">)</span></span></dt><dd><p><span class="yiyi-st" id="yiyi-28">Sets the time the <code class="docutils literal"><span class="pre">robots.txt</span></code> file was last fetched to the current time.</span></p></dd></dl></dd></dl><p><span class="yiyi-st" id="yiyi-29">The following example demonstrates basic use of the RobotFileParser class.</span></p><pre><code class="language-python"><span></span><span class="gp">>>> </span><span class="kn">import</span> <span class="nn">urllib.robotparser</span>
<span class="gp">>>> </span><span class="n">rp</span> <span class="o">=</span> <span class="n">urllib</span><span class="o">.</span><span class="n">robotparser</span><span class="o">.</span><span class="n">RobotFileParser</span><span class="p">()</span>
<span class="gp">>>> </span><span class="n">rp</span><span class="o">.</span><span class="n">set_url</span><span class="p">(</span><span class="s2">"http://www.musi-cal.com/robots.txt"</span><span class="p">)</span>
<span class="gp">>>> </span><span class="n">rp</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
<span class="gp">>>> </span><span class="n">rp</span><span class="o">.</span><span class="n">can_fetch</span><span class="p">(</span><span class="s2">"*"</span><span class="p">,</span> <span class="s2">"http://www.musi-cal.com/cgi-bin/search?city=San+Francisco"</span><span class="p">)</span>
<span class="go">False</span>
<span class="gp">>>> </span><span class="n">rp</span><span class="o">.</span><span class="n">can_fetch</span><span class="p">(</span><span class="s2">"*"</span><span class="p">,</span> <span class="s2">"http://www.musi-cal.com/"</span><span class="p">)</span>
<span class="go">True</span>
</code></pre></div></div>
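<p>The methods described above can also be exercised without network access by passing rules directly to <code>parse()</code>. The following is a minimal sketch, not part of the original documentation: the rule set in <code>ROBOTS_LINES</code>, the <code>ExampleBot</code> user agent, the <code>www.example.com</code> URLs, and the <code>is_stale()</code> helper are all hypothetical and chosen only for illustration.</p>

```python
import time
import urllib.robotparser

# Hypothetical robots.txt content, supplied as a list of lines
# instead of being fetched with set_url() and read().
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /cgi-bin/",
    "",
    "User-agent: ExampleBot",
    "Disallow: /private/",
]

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_LINES)

# Rules for the wildcard user agent.
allowed_root = rp.can_fetch("*", "http://www.example.com/")
blocked_cgi = rp.can_fetch(
    "*", "http://www.example.com/cgi-bin/search?city=San+Francisco"
)

# A user agent with its own entry is judged by that entry only,
# so the wildcard rule for /cgi-bin/ does not apply to ExampleBot.
bot_private = rp.can_fetch("ExampleBot", "http://www.example.com/private/data.html")
bot_cgi = rp.can_fetch("ExampleBot", "http://www.example.com/cgi-bin/search")


def is_stale(parser, max_age_seconds):
    """Hypothetical helper: report whether the rules are older than max_age_seconds."""
    return time.time() - parser.mtime() > max_age_seconds


print(allowed_root, blocked_cgi, bot_private, bot_cgi)
print("needs refresh:", is_stale(rp, 3600))

# After re-reading the rules, a long-running crawler can record the new time.
rp.modified()
```

<p>A long-running spider could call a check such as <code>is_stale()</code> before each batch of requests and re-run <code>read()</code> when it returns <code>True</code>; the one-hour threshold used here is an arbitrary assumption.</p>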