Data Pipeline 是一个Java的数据转换工具包,主要的功能包括:
典型的应用场景包括: 1. 读取 CSV 文件 2. 删除重复的记录 3. 添加计算列 4. 删除无用的列 5. 数据保存到数据库
代码示例:
DataReader reader = new CSVReader(new File("credit-balance.csv")) .setFieldNamesInFirstRow(true); // Use only the "Rating" and "CreditLimit" fields in duplicate test reader = new RemoveDuplicatesReader(reader, new FieldList("Rating", "CreditLimit")); // Add AvailableCredit field, remove "CreditLimit", "Balance" fields reader = new TransformingReader(reader) .add(new SetCalculatedField("AvailableCredit", "parseDouble(CreditLimit) - parseDouble(Balance)")) .add(new ExcludeFields("CreditLimit", "Balance")); DataWriter writer = new JdbcWriter(getJdbcConnection(), "dp_credit_balance") .setAutoCloseConnection(true); JobTemplate.DEFAULT.transfer(reader, writer);