小编典典

Extract part of a regex match


I want a regular expression to extract the title from an HTML page. Currently I have this:

title = re.search('<title>.*</title>', html, re.IGNORECASE).group()
if title:
    title = title.replace('<title>', '').replace('</title>', '')

Is there a regular expression that extracts only the contents of <title>, so I don't have to remove the tags?

2022-06-29

1 answer

小编典典

Use ( ) in the regular expression to create a capturing group, and in Python call group(1) (https://docs.python.org/3.8/library/re.html#re.Match.group) to retrieve the captured string. re.search (https://docs.python.org/3.8/library/re.html#re.search) returns None when it finds no match, so do not call group() directly on its result:

title_search = re.search('<title>(.*)</title>', html, re.IGNORECASE)

if title_search:
    title = title_search.group(1)

2022-06-29
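For reference, the capturing-group approach from the answer can be wrapped in a small function. This is only a sketch under a few assumptions that are not in the original answer: the helper name extract_title is hypothetical, the pattern uses a non-greedy `.*?` so that matching stops at the first closing tag, and re.DOTALL is added so that a title spanning several lines still matches.

```python
import re
from typing import Optional

# Assumption: non-greedy match plus DOTALL, which the original answer does not use.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html: str) -> Optional[str]:
    """Return the text inside the first <title> element, or None if absent.

    extract_title is a hypothetical helper name used for illustration.
    """
    match = TITLE_RE.search(html)
    if match is None:
        # re.search found nothing, so there is no group to read.
        return None
    # group(1) is the text captured by the parentheses, without the tags.
    return match.group(1).strip()


if __name__ == "__main__":
    page = "<html><head><TITLE>Example page</TITLE></head><body></body></html>"
    print(extract_title(page))
    print(extract_title("<p>no title here</p>"))
```

Returning None instead of raising keeps the "check before calling group()" advice in one place, so callers only need a single `if title is not None` test. For HTML that is not well formed, a real parser such as the standard library's html.parser is likely to be more robust than a regular expression.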