Converting a URL to an Image on the Backend with PhantomJS

Introduction to PhantomJS
Understanding rasterize.js
Usage

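Today I'd like to share how a Java backend can use PhantomJS to render the page at a URL into an image. Feel free to use it as a reference!

Introduction to PhantomJS

First, what is PhantomJS?

According to the official site:

PhantomJS is a command-line tool. More precisely, it is a headless WebKit browser that you drive with JavaScript scripts from the command line.

PhantomJS downloads:

Windows: phantomjs-2.1.1-windows.zip

Linux: phantomjs-2.1.1-linux-x86_64.tar.bz2; phantomjs-2.1.1-linux-i686.tar.bz2

macOS: phantomjs-2.1.1-macosx.zip

After unpacking the archive, the bin directory contains the executable (phantomjs.exe on Windows). Adding that directory to the PATH environment variable makes the phantomjs command available from anywhere.

There is also an examples directory containing many sample scripts. The list below summarizes what each one does, based on the descriptions on the official site: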

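1. Basic examples

arguments.js: shows the arguments passed to the script

countdown.js: prints a 10-second countdown

echoToFile.js: writes the command-line arguments to a file

fibo.js: lists the first few numbers in the Fibonacci sequence

hello.js: displays the famous message ("Hello, world!")

module.js and universe.js: demonstrate the use of the module system

outputEncoding.js: displays a string in various encodings

printenv.js: displays the system's environment variables

scandir.js: lists all files in a directory and its subdirectories

sleepsort.js: sorts integers and delays the display of each according to its value

version.js: prints the PhantomJS version number

page_events.js: prints page events as they fire, which helps in getting a better grasp of the page.on* callbacks

2. Rendering/rasterization

colorwheel.js: creates a color wheel using HTML5 canvas

rasterize.js: rasterizes a web page to an image or PDF

render_multi_url.js: renders multiple web pages to images

3. Page automation

injectme.js: injects itself into the web page context

phantomwebintro.js: uses jQuery to read the text of the .version element from phantomjs.org

unrandomize.js: modifies a global object at page initialization

waitfor.js: waits until a test condition is true or a timeout occurs

4. Network

detectsniff.js: detects whether a web page sniffs the user agent

loadspeed.js: calculates the loading speed of a website

netlog.js: dumps all network requests and responses

netsniff.js: captures network traffic in HAR format

post.js: sends an HTTP POST request to a test server

postserver.js: starts a web server and sends an HTTP POST request to it

server.js: starts a web server and sends an HTTP GET request to it

serverkeepalive.js: starts a web server that answers in plain text

simpleserver.js: starts a web server that answers in HTML

5. Testing

run-jasmine.js: runs Jasmine-based tests

run-qunit.js: runs QUnit-based tests

6. Browser

features.js: detects browser features using modernizr.js

useragent.js: changes the browser's user agent property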

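To generate an image from a page URL, we use rasterize.js, which rasterizes a web page to an image or PDF.

Understanding rasterize.js

Let's look at the contents of rasterize.js (the original file handles size incorrectly; that has been fixed here):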

"use strict";
var page = require('webpage').create(),
    system = require('system'),
    address, output, size;

if (system.args.length < 3 || system.args.length > 5) {
    console.log('Usage: rasterize.js URL filename [paperwidth*paperheight|paperformat] [zoom]');
    console.log('  paper (pdf output) examples: "5in*7.5in", "10cm*20cm", "A4", "Letter"');
    console.log('  image (png/jpg output) examples: "1920px" entire page, window width 1920px');
    console.log('                                   "800px*600px" window, clipped to 800x600');
    phantom.exit(1);
} else {
    address = system.args[1];
    output = system.args[2];
    page.viewportSize = { width: 800, height: 200 };
    if (system.args.length > 3 && system.args[2].substr(-4) === ".pdf") {
        size = system.args[3].split('*');
        page.paperSize = size.length === 2 ? { width: size[0], height: size[1], margin: '0px' }
                                           : { format: system.args[3], orientation: 'portrait', margin: '1cm' };
    } else if (system.args.length > 3 && system.args[3].substr(-2) === "px") {
        size = system.args[3].split('*');
        if (size.length === 2) {
            var pageWidth = parseInt(size[0].substr(0,size[0].indexOf("px")), 10);
            var pageHeight = parseInt(size[1].substr(0,size[1].indexOf("px")), 10);
            page.viewportSize = { width: pageWidth, height: pageHeight };
            page.clipRect = { top: 0, left: 0, width: pageWidth, height: pageHeight };
        } else {
            var pageWidth = parseInt(system.args[3].substr(0,system.args[3].indexOf("px")), 10);
            var pageHeight = parseInt(pageWidth * 3/4, 10); // it's as good an assumption as any
            page.viewportSize = { width: pageWidth, height: pageHeight };
        }
    }
    if (system.args.length > 4) {
        page.zoomFactor = system.args[4];
    }
    page.open(address, function (status) {
        if (status !== 'success') {
            console.log('Unable to load the address!');
            phantom.exit(1);
        } else {
            window.setTimeout(function () {
                page.render(output);
                phantom.exit();
            }, 200);
        }
    });
}

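Anyone who has written command-line scripts should find this code easy to follow, so I won't walk through it line by line. The rest of this article focuses on how to use it.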
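To make the size handling concrete, here is a minimal Java sketch that mirrors the px branch of the script: "800px*600px" sets both the viewport and the clip rectangle, while a single width such as "1920px" gets a viewport height of three quarters of the width. The class and method names (RasterizeSize, parse) are hypothetical and exist only for illustration; they are not part of PhantomJS, and the sketch rejects malformed input instead of copying the script's behavior for it.

```java
/** Hypothetical helper mirroring how rasterize.js interprets a pixel size argument. */
final class RasterizeSize {
    final int width;
    final int height;
    final boolean clipped; // true when the script would also set page.clipRect

    private RasterizeSize(int width, int height, boolean clipped) {
        this.width = width;
        this.height = height;
        this.clipped = clipped;
    }

    /** Accepts "WIDTHpx*HEIGHTpx" or "WIDTHpx". */
    static RasterizeSize parse(String arg) {
        if (arg == null) {
            throw new IllegalArgumentException("size must not be null");
        }
        String[] parts = arg.split("\\*");
        if (parts.length == 2) {
            // Both dimensions given: viewport and clip rectangle use the same values
            return new RasterizeSize(px(parts[0]), px(parts[1]), true);
        }
        // Width only: the script assumes a 4:3 viewport and renders the whole page
        int w = px(arg);
        return new RasterizeSize(w, w * 3 / 4, false);
    }

    /** Strips the trailing "px" and parses the number. */
    private static int px(String s) {
        String t = s.trim();
        if (!t.endsWith("px")) {
            throw new IllegalArgumentException("expected a value ending in px: " + s);
        }
        return Integer.parseInt(t.substring(0, t.length() - 2));
    }

    public static void main(String[] args) {
        RasterizeSize size = parse("1920px");
        System.out.println(size.width + "x" + size.height + " clipped=" + size.clipped);
    }
}
```

In other words, pass two dimensions when you want a fixed-size screenshot, and a single width when you want the full page height.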

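Usage

First, add the PhantomJS packages to the project under the resources directory. Because the packaged application has to run both on a local Windows development machine and on the Linux server, we include both the Windows and the Linux package.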
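With both packages bundled, the code needs some way to pick the right one at runtime. The sketch below shows one possible approach based on the os.name system property. It assumes the unpacked directory names match the archive names listed earlier (phantomjs-2.1.1-windows and phantomjs-2.1.1-linux-x86_64); the class PhantomPackage and its methods are hypothetical.

```java
import java.util.Locale;

/** Hypothetical helper that selects the bundled PhantomJS package for the current OS. */
final class PhantomPackage {

    /** Returns the package directory name for the given os.name value. */
    static String directoryFor(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("windows")) {
            return "phantomjs-2.1.1-windows";
        }
        if (os.contains("linux")) {
            return "phantomjs-2.1.1-linux-x86_64";
        }
        throw new IllegalStateException("No bundled PhantomJS package for OS: " + osName);
    }

    /** Returns the executable file name inside the package's bin directory. */
    static String executableFor(String osName) {
        return osName.toLowerCase(Locale.ROOT).startsWith("windows") ? "phantomjs.exe" : "phantomjs";
    }

    public static void main(String[] args) {
        String osName = System.getProperty("os.name");
        System.out.println(directoryFor(osName) + "/bin/" + executableFor(osName));
    }
}
```

On Linux the binary also needs execute permission, which can be lost when files are copied out of a packaged resources directory.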

Next, create the PhantomJS utility class PhantomTools.java:

package test;

import lombok.extern.slf4j.Slf4j;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;
import org.apache.commons.lang3.StringUtils;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.UUID;

/**
 * Title: utility class for converting web pages to images
 *
 * @author Ason(18078490)
 * @date 2020-08-01
 */
@Slf4j
public class PhantomTools {

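    /**
     * Path to the phantomjs executable
     */
    private final String phantomjsPath;
    /**
     * Path to the JS script that generates the snapshot
     */
    private final String rasterizePath;
    /**
     * Temporary image file prefix
     */
    private static final String FILE_PREFIX = "TIG-AE-";
    /**
     * Temporary image file suffix
     */
    private static final String FILE_SUFFIX = ".jpg";

    /**
     * Constructor.
     * Resolves the phantomjs paths under the classpath root.
     * Note: the paths below point at the Windows package; use the Linux package directory on a Linux server.
     */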
    public PhantomTools() {
        String bootPath = new File(this.getClass().getResource("/").getPath()).getPath();
        phantomjsPath = String.join(File.separator, bootPath, "phantomjs-2.1.1-windows", "bin", "phantomjs");
        rasterizePath = String.join(File.separator, bootPath, "phantomjs-2.1.1-windows", "examples", "rasterize.js");
    }

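    /**
     * Characters that need escaping in a URL
     * 1. +      represents a space in a URL: %2B
     * 2. space  can be written as + or encoded: %20
     * 3. /      separates directories and subdirectories: %2F
     * 4. ?      separates the actual URL from its parameters: %3F
     * 5. %      introduces an escaped character: %25
     * 6. #      marks a bookmark (fragment): %23
     * 7. &      separates the parameters in a URL: %26
     * 8. =      separates a parameter name from its value: %3D
     *
     * Only & is replaced by this method.
     *
     * @param url the URL to escape
     * @return the escaped URL
     */
    public String parseUrl(String url) {
        String parsedUrl = StringUtils.replace(url, "&", "%26");
        log.info("[Parsed URL: {}]", parsedUrl);
        return parsedUrl;
    }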

    /**
     * Renders the page at the given URL and returns the image as a byte array
     *
     * @param url request URL
     * @return image bytes, or an empty array on failure
     */
    public byte[] create(String url) {
        return create(url, null);
    }

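    /**
     * Renders the page at the given URL and returns the image as a byte array
     *
     * @param url  request URL
     * @param size image size, e.g. 1000px*800px; may be null to use the script's default
     * @return image bytes, or an empty array on failure
     */
    public byte[] create(String url, String size) {
        // Temporary output file; new File(dir, name) adds the separator that java.io.tmpdir may lack
        File file = new File(FileUtils.getTempDirectory(), FILE_PREFIX + UUID.randomUUID() + FILE_SUFFIX);
        try {
            // Pass the arguments as an array so paths containing spaces stay intact,
            // and leave out size when it is blank (String.join would append the text "null")
            String[] command = StringUtils.isBlank(size)
                    ? new String[]{phantomjsPath, rasterizePath, url, file.getPath()}
                    : new String[]{phantomjsPath, rasterizePath, url, file.getPath(), size};
            log.info("[Executing command: {}]", String.join(StringUtils.SPACE, command));
            // Run the command
            Process process = Runtime.getRuntime().exec(command);
            // Block until the child process finishes; exit code 0 means success
            if (process.waitFor() != 0) {
                log.error("[Local command failed] [Command: {}]", String.join(StringUtils.SPACE, command));
                return new byte[0];
            }
            // Check that the image was actually generated
            if (!file.exists()) {
                log.error("[Local file \"{}\" does not exist]", file.getName());
                return new byte[0];
            }
            // Read the snapshot into a byte array (the file is closed afterwards)
            byte[] bytes = FileUtils.readFileToByteArray(file);
            log.info("[Image generated] [Size: {}KB]", bytes.length / 1024);
            return bytes;
        } catch (IOException | InterruptedException e) {
            if (e instanceof InterruptedException) {
                // Restore the interrupt flag for callers
                Thread.currentThread().interrupt();
            }
            log.error("[Image generation failed]", e);
        } finally {
            FileUtils.deleteQuietly(file);
        }
        return new byte[0];
    }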
}

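The utility class above resolves the paths to the bundled binaries in its constructor and offers parseUrl() to replace any & characters in the URL. The core step, running the command, is done through a Process object, and the resulting image is written to a file in the temporary directory. That is the whole flow for generating an image of a web page with phantomjs.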
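The Javadoc of parseUrl() lists several reserved characters, but the method only replaces &. If you control how the target URL is assembled, one alternative is to percent-encode each query parameter value before appending it. The following is a minimal sketch using the JDK's URLEncoder; the class UrlParams, its methods, and the example.com URL are illustrative assumptions and not part of the original utility.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper that percent-encodes query parameters before they are added to a URL. */
final class UrlParams {

    /** Encodes one parameter name or value so reserved characters such as & = + survive. */
    static String encodeValue(String value) {
        try {
            // URLEncoder produces form encoding, which writes a space as '+'; switch to %20
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Appends name=value to baseUrl, choosing ? or & as the separator. */
    static String withParam(String baseUrl, String name, String value) {
        String separator = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + separator + encodeValue(name) + "=" + encodeValue(value);
    }

    public static void main(String[] args) {
        // prints https://example.com/page?q=a%26b%3Dc%20d%2Be
        System.out.println(withParam("https://example.com/page", "q", "a&b=c d+e"));
    }
}
```

Encoding each value separately keeps the separators between parameters intact, whereas replacing every & in a finished URL also rewrites the separators.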

Under the hood, Runtime.exec() simply delegates to ProcessBuilder:

public Process exec(String[] cmdarray, String[] envp, File dir)
    throws IOException {
    return new ProcessBuilder(cmdarray)
        .environment(envp)
        .directory(dir)
        .start();
}

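ProcessBuilder in turn relies on native methods in ProcessImpl to start the operating-system process. Loading the URL and rendering the image are done by the phantomjs process itself.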
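Because Runtime.exec() is a thin wrapper, the same call can be written against ProcessBuilder directly, which makes it easy to pass each argument separately and to merge stderr into stdout. The sketch below only builds the ProcessBuilder and does not start anything; the class PhantomCommand, the file names, and the example.com URL are placeholders chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that assembles the phantomjs command line as a ProcessBuilder. */
final class PhantomCommand {

    static ProcessBuilder build(String phantomjs, String script, String url, String output, String size) {
        List<String> args = new ArrayList<>();
        args.add(phantomjs);
        args.add(script);
        args.add(url);
        args.add(output);
        // The size argument is optional; skip it rather than passing an empty or "null" token
        if (size != null && !size.trim().isEmpty()) {
            args.add(size);
        }
        // Merge stderr into stdout so a single stream carries all of the child's output
        return new ProcessBuilder(args).redirectErrorStream(true);
    }

    public static void main(String[] unused) {
        ProcessBuilder pb = build("phantomjs", "rasterize.js", "https://example.com/", "out.jpg", null);
        System.out.println(pb.command());
    }
}
```

Calling start() on the result launches the process. Its output should then be read or redirected (for example with inheritIO()) so that the child cannot block on a full pipe.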

Test method:

public static void main(String[] arg) throws IOException {
    String url = "https://www.cnblogs.com/ason-wxs/";
    PhantomTools phantomTools = new PhantomTools();
    String parsedUrl = phantomTools.parseUrl(url);
    byte[] byteImg = phantomTools.create(parsedUrl);
    // The generated bytes are JPEG (see FILE_SUFFIX), so save them with a .jpg extension
    File descFile = new File(FileUtils.getTempDirectory(), "test.jpg");
    FileUtils.writeByteArrayToFile(descFile, byteImg);
}

I won't paste the result here; it simply saves an image of my blog's home page to the file test.jpg in the temporary directory.

I hope this introduction to PhantomJS helps you in your future work!

Original link: https://www.cnblogs.com/ason-wxs/p/13411271.html
