I am using the code below to fetch data from a specific location in a PDF. I want to get only the bold text at that location.
    Rectangle rect = new Rectangle(0, 0, 250, 250);
    RenderFilter filter = new RegionTextRenderFilter(rect);
    fontBasedTextExtractionStrategy strategy = new fontBasedTextExtractionStrategy();
    strategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), filter); // Throws an error.
First, would it help to create a new class called fontBasedTextExtractionStrategy and use it instead of the simple TextExtractionStrategy? Something like this:
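    import com.itextpdf.text.pdf.parser.ImageRenderInfo;
    import com.itextpdf.text.pdf.parser.TextExtractionStrategy;
    import com.itextpdf.text.pdf.parser.TextRenderInfo;

    public class fontBasedTextExtractionStrategy implements TextExtractionStrategy {
        // Collects every chunk; a plain String field would keep only the last one.
        private final StringBuilder text = new StringBuilder();

        @Override
        public void beginTextBlock() { }

        @Override
        public void renderText(TextRenderInfo renderInfo) {
            String chunk = renderInfo.getText();
            text.append(chunk);
            // Print the font type and the text of each chunk as it is rendered.
            System.out.println(renderInfo.getFont().getFontType());
            System.out.print(chunk);
        }

        @Override
        public void endTextBlock() { }

        @Override
        public void renderImage(ImageRenderInfo renderInfo) { }

        @Override
        public String getResultantText() {
            return text.toString();
        }
    }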
But then how do I call it correctly?
Please take a look at the ParseCustom example. In that example, we create a custom RenderFilter (not a TextExtractionStrategy):
    class FontRenderFilter extends RenderFilter {
        // Accept a text chunk only if its PostScript font name marks it as bold or oblique.
        @Override
        public boolean allowText(TextRenderInfo renderInfo) {
            String font = renderInfo.getFont().getPostscriptFontName();
            return font.endsWith("Bold") || font.endsWith("Oblique");
        }
    }
This filter lets through only text whose PostScript font name ends with Bold or Oblique; all other text is dropped.
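Note that the suffix test is only a heuristic. Some bold fonts have PostScript names such as Arial-BoldMT that do not end with Bold, and embedded subset fonts carry a six-letter tag such as ABCDEF+ in front of the name. Below is a minimal sketch of a more tolerant name check. FontNameStyle and its methods are hypothetical helpers of my own, not part of iText, and the check still assumes the font encodes its weight in its name at all.

```java
import java.util.Locale;

// Hypothetical helper, not part of iText: guesses boldness from a PostScript font name.
final class FontNameStyle {
    private FontNameStyle() { }

    // Drops a subset tag such as "ABCDEF+" (six characters followed by '+').
    static String stripSubsetTag(String postscriptName) {
        int plus = postscriptName.indexOf('+');
        return plus == 6 ? postscriptName.substring(7) : postscriptName;
    }

    // True when the name, without its subset tag, contains "bold" in any letter case.
    // Heuristic only: it also matches names such as "Semibold".
    static boolean looksBold(String postscriptName) {
        if (postscriptName == null) {
            return false;
        }
        String name = stripSubsetTag(postscriptName).toLowerCase(Locale.ROOT);
        return name.contains("bold");
    }
}
```

Inside allowText you could then return FontNameStyle.looksBold(renderInfo.getFont().getPostscriptFontName()) instead of the endsWith test.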
This is how you use the filter:
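    import java.io.IOException;
    import com.itextpdf.text.Rectangle;
    import com.itextpdf.text.pdf.PdfReader;
    import com.itextpdf.text.pdf.parser.*;

    public void parse(String filename) throws IOException {
        PdfReader reader = new PdfReader(filename);
        // Region of the page to extract text from.
        Rectangle rect = new Rectangle(36, 750, 559, 806);
        RenderFilter regionFilter = new RegionTextRenderFilter(rect);
        FontRenderFilter fontFilter = new FontRenderFilter();
        // Wrap the location-based strategy so it only sees text that passes both filters.
        TextExtractionStrategy strategy = new FilteredTextRenderListener(
                new LocationTextExtractionStrategy(), regionFilter, fontFilter);
        System.out.println(PdfTextExtractor.getTextFromPage(reader, 1, strategy));
        reader.close();
    }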
As you can see, we create a FilteredTextRenderListener that takes two filters: a RegionTextRenderFilter and our homemade font-based filter.
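As far as I understand it, the listener passes a chunk on to the wrapped strategy only when every filter accepts it, so the two filters combine as a logical AND. Here is a simplified, self-contained model of that behavior. Chunk and FilteredCollector are illustrative stand-ins I made up for this sketch; they are not iText classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Illustrative stand-in for a piece of rendered text (not an iText class).
final class Chunk {
    final String text;
    final float x;
    final String font;

    Chunk(String text, float x, String font) {
        this.text = text;
        this.x = x;
        this.font = font;
    }
}

// Simplified model of a filtered listener: keeps an item only if all filters accept it.
final class FilteredCollector<T> {
    private final List<Predicate<T>> filters = new ArrayList<>();
    private final List<T> accepted = new ArrayList<>();

    @SafeVarargs
    FilteredCollector(Predicate<T>... filters) {
        for (Predicate<T> filter : filters) {
            this.filters.add(filter);
        }
    }

    // Stops at the first filter that rejects the item; otherwise stores it.
    void offer(T item) {
        for (Predicate<T> filter : filters) {
            if (!filter.test(item)) {
                return;
            }
        }
        accepted.add(item);
    }

    List<T> accepted() {
        return accepted;
    }
}
```

In this model, a chunk that lies inside the region but uses a regular font is dropped, and so is a bold chunk outside the region.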