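I have various HTML strings that I need to cut to 100 characters (counting the stripped text, not the raw markup), without stripping the tags and without breaking the HTML.
Original HTML string (288 characters):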
$content = "<div>With a <span class='spanClass'>span over here</span> and a <div class='divClass'>nested div over <div class='nestedDivClass'>there</div> </div> and a lot of other nested <strong><em>texts</em> and tags in the air <span>everywhere</span>, it's a HTML taggy kind of day.</strong></div>";
Standard trim: cutting at 100 characters breaks the HTML, and the stripped content comes to only about 40 characters:
$content = substr($content, 0, 100)."..."; /* output: <div>With a <span class='spanClass'>span over here</span> and a <div class='divClass'>nested div ove... */
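To make the gap between markup length and visible length concrete, here is a small sketch that measures both for the raw `substr()` cut. It only uses the sample string above plus the built-in `strip_tags()` and `strlen()`; the variable names `$cut` and `$visible` are just for illustration.

```php
<?php
// Illustration only: how much visible text survives a raw substr() cut?
$content = "<div>With a <span class='spanClass'>span over here</span> and a <div class='divClass'>nested div over <div class='nestedDivClass'>there</div> </div> and a lot of other nested <strong><em>texts</em> and tags in the air <span>everywhere</span>, it's a HTML taggy kind of day.</strong></div>";

$cut = substr($content, 0, 100);       // 100 characters of markup and text combined
$visible = strlen(strip_tags($cut));   // characters a reader would actually see

echo "markup: ", strlen($cut), ", visible: ", $visible, PHP_EOL;
```

Most of the 100-character budget is spent on tag names and attributes, which is why the visible text ends up so short.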
Stripped HTML: outputs the correct number of characters, but obviously loses the formatting:
$content = substr(strip_tags($content), 0, 100)."..."; /* output: With a span over here and a nested div over there and a lot of other nested texts and tags in the ai... */
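Partial solution: closing the tags with HTML Tidy or a purifier outputs clean HTML, but the 100 characters are still counted against the markup, not the displayed content: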
$content = substr($content, 0, 100)."..."; $tidy = new tidy; $tidy->parseString($content); $tidy->cleanRepair(); /* output: <div>With a <span class='spanClass'>span over here</span> and a <div class='divClass'>nested div ove</div></div>... */
Challenge: output clean HTML containing n characters of content (not counting the characters of the HTML elements):
$content = cutHTML($content, 100); /* output: <div>With a <span class='spanClass'>span over here</span> and a <div class='divClass'>nested div over <div class='nestedDivClass'>there</div> </div> and a lot of other nested <strong><em>texts</em> and tags in the ai</strong></div>... */
Not amazing, but it works.
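function html_cut($text, $max_length)
{
    // Limitation: void or self-closing tags (<br>, <img />) are not handled
    // and will confuse the tag stack.
    $tags = array();            // stack of currently open tag names
    $result = "";
    $is_open = false;           // inside an opening tag
    $grab_open = false;         // still reading the opening tag's name
    $is_close = false;          // inside a closing tag
    $in_double_quotes = false;  // inside a double-quoted attribute value
    $in_single_quotes = false;  // inside a single-quoted attribute value
    $tag = "";
    $i = 0;
    $stripped = 0;              // visible (non-tag) characters copied so far
    $length = strlen($text);
    $stripped_length = strlen(strip_tags($text));

    while ($i < $length && $stripped < $stripped_length && $stripped < $max_length) {
        $symbol = $text[$i];    // the $text{$i} syntax was removed in PHP 8
        $result .= $symbol;
        $in_tag = $is_open || $is_close;

        switch ($symbol) {
            case '<':
                $is_open = true;
                $grab_open = true;
                break;
            case '"':
                if ($in_tag) $in_double_quotes = !$in_double_quotes;
                else $stripped++;   // a quote in the text is a visible character
                break;
            case "'":
                if ($in_tag) $in_single_quotes = !$in_single_quotes;
                else $stripped++;   // e.g. the apostrophe in "it's"
                break;
            case '/':
                if ($is_open && !$in_double_quotes && !$in_single_quotes) {
                    $is_close = true;
                    $is_open = false;
                    $grab_open = false;
                } elseif (!$in_tag) {
                    $stripped++;    // a slash in the text is a visible character
                }
                break;
            case ' ':
                if ($is_open) $grab_open = false;   // the tag name has ended
                elseif (!$is_close) $stripped++;
                break;
            case '>':
                if ($is_open) {
                    $is_open = false;
                    $grab_open = false;
                    array_push($tags, $tag);
                    $tag = "";
                } elseif ($is_close) {
                    $is_close = false;
                    array_pop($tags);
                    $tag = "";
                }
                break;
            default:
                if ($grab_open || $is_close) $tag .= $symbol;
                if (!$in_tag) $stripped++;
        }
        $i++;
    }

    // Close whatever is still open, innermost tag first.
    while ($tags) $result .= "</" . array_pop($tags) . ">";
    return $result;
}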
Usage example:
$content = html_cut($content, 100);
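For comparison, the same idea can be sketched with PHP's DOM extension instead of a hand-written scanner: parse the fragment, walk the text nodes while counting characters, and drop everything past the limit so the serializer emits balanced tags. This is only a rough sketch under a few assumptions: the DOM extension is installed, the input is UTF-8 (the `<?xml encoding="UTF-8">` prefix is a common workaround to hint that to libxml), and the names `html_cut_dom` and `html_cut_trim` are made up for this example.

```php
<?php
// Hypothetical helper: remove text and nodes once $remaining characters are used up.
function html_cut_trim(DOMNode $node, int &$remaining): void
{
    // Copy the child list first, because nodes are removed while walking it.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($remaining <= 0) {
            $node->removeChild($child);              // past the limit: drop entirely
        } elseif ($child instanceof DOMText) {
            $take = min($child->length, $remaining);
            if ($take < $child->length) {
                $child->data = $child->substringData(0, $take);
            }
            $remaining -= $take;
        } else {
            html_cut_trim($child, $remaining);       // descend into elements
        }
    }
}

// Hypothetical helper: cut an HTML fragment to $max_length visible characters.
function html_cut_dom(string $html, int $max_length): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);    // silence parser warnings
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // loadHTML() wraps a fragment in <html><body>; work on the body's children.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $remaining = $max_length;
    html_cut_trim($body, $remaining);

    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo html_cut_dom("<div>With a <span class='spanClass'>span over here</span> and more</div>", 20), PHP_EOL;
```

Because DOMDocument re-serializes the markup, details such as attribute quoting may differ from the input, and elements that begin after the cut point are removed rather than kept empty.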