I want to replace rare words in a JSON tree with _RARE_, using Java.
My list of rare words contains
late populate convicts
So for the following JSON
["S", ["PP", ["ADP", "In"], ["NP", ["DET", "the"], ["NP", ["ADJ", "late"], ["NOUN", "1700<s"]]]], ["S", ["NP", ["ADJ", "British"], ["NOUN", "convicts"]], ["S", ["VP", ["VERB", "were"], ["VP", ["VERB", "used"], ["S+VP", ["PRT", "to"], ["VP", ["VERB", "populate"], ["WHNP", ["DET", "which"], ["NOUN", "colony"]]]]]], [".", "?"]]]]
I should get
["S", ["PP", ["ADP", "In"], ["NP", ["DET", "the"], ["NP", ["ADJ", "_RARE_"], ["NOUN", "1700<s"]]]], ["S", ["NP", ["ADJ", "British"], ["NOUN", "_RARE_"]], ["S", ["VP", ["VERB", "were"], ["VP", ["VERB", "used"], ["S+VP", ["PRT", "to"], ["VP", ["VERB", "_RARE_"], ["WHNP", ["DET", "which"], ["NOUN", "colony"]]]]]], [".", "?"]]]]
Note how
["ADJ","late"]
is replaced by
["ADJ","_RARE_"]
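My code so far is below.
I traverse the tree recursively, and as soon as I find a rare word I create a new JSON array and try to replace the existing tree node with it. See the line marked "// this Doesn't work" in the code; that is where I am stuck. Outside this function, the tree remains unchanged.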
    public static void traverseTreeAndReplaceWithRare(JsonArray tree) {
        //System.out.println(tree.getAsJsonArray());
        for (int x = 0; x < tree.getAsJsonArray().size(); x++) {
            if (!tree.get(x).isJsonArray()) {
                if (tree.size() == 2) {
                    // beware it will get here twice for same word
                    String word = tree.get(1).toString();
                    word = word.replaceAll("\"", ""); // removing double quotes
                    if (rareWords.contains(word)) {
                        JsonParser parser = new JsonParser();
                        // This works perfectly
                        System.out.println("Orig:" + tree);
                        JsonElement jsonElement = parser.parse("[" + tree.get(0) + "," + "_RARE_" + "]");
                        JsonArray newRareArray = jsonElement.getAsJsonArray();
                        // This works perfectly
                        System.out.println("New:" + newRareArray);
                        tree = newRareArray; // this Doesn't work
                    }
                }
                continue;
            }
            traverseTreeAndReplaceWithRare(tree.get(x).getAsJsonArray());
        }
    }
The code that calls the above is shown here; I am using Google's Gson:
    JsonParser parser = new JsonParser();
    JsonElement jsonElement = parser.parse(strJSON);
    JsonArray tree = jsonElement.getAsJsonArray();
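The assignment `tree = newRareArray` only rebinds the method's local parameter. Java passes object references by value, so the parent array still points at the original child and the caller sees no change. One way around this is to mutate the existing node in place instead of building a replacement. The sketch below illustrates the idea with nested `java.util.List` objects as a stand-in for Gson's `JsonArray`, so that it runs without dependencies; the class and method names (`RareWordReplacer`, `replaceRare`, `node`) are hypothetical. With Gson the analogous call would be something like `tree.set(1, new JsonPrimitive("_RARE_"))`, assuming a Gson version whose `JsonArray` provides `set(int, JsonElement)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RareWordReplacer {
    static final String RARE = "_RARE_";

    // Recursively walks a tree of nested lists. A leaf is a two-element
    // list [tag, word] in which both elements are strings.
    static void replaceRare(List<Object> node, Set<String> rareWords) {
        boolean isLeaf = node.size() == 2
                && node.get(0) instanceof String
                && node.get(1) instanceof String;
        if (isLeaf) {
            if (rareWords.contains(node.get(1))) {
                // Mutate the existing list: the parent still references this
                // same object, so the change is visible to the caller.
                node.set(1, RARE);
            }
            return;
        }
        for (Object child : node) {
            if (child instanceof List) {
                @SuppressWarnings("unchecked")
                List<Object> sub = (List<Object>) child;
                replaceRare(sub, rareWords);
            }
        }
    }

    // Hypothetical helper that builds a mutable node from its elements.
    static List<Object> node(Object... items) {
        return new ArrayList<>(Arrays.asList(items));
    }

    public static void main(String[] args) {
        Set<String> rare = new HashSet<>(Arrays.asList("late", "populate", "convicts"));
        List<Object> tree = node("NP",
                node("DET", "the"),
                node("NP", node("ADJ", "late"), node("NOUN", "1700<s")));
        replaceRare(tree, rare);
        System.out.println(tree); // prints [NP, [DET, the], [NP, [ADJ, _RARE_], [NOUN, 1700<s]]]
    }
}
```

Only the word slot of a leaf is overwritten, so the tag at index 0 is never touched, and no node is visited twice for the same word.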
Here is a straightforward approach in C++:
    #include <fstream>
    #include <iostream> // std::cout
    #include <string>
    #include <vector>
    #include "JSON.hpp"
    #include <boost/algorithm/string/regex.hpp>
    #include <boost/range/adaptors.hpp>
    #include <boost/phoenix.hpp>

    // Reads one rare word per line from the word-list file.
    static std::vector<std::wstring> readRareWordList()
    {
        std::vector<std::wstring> result;
        std::wifstream ifs("testcases/rarewords.txt");
        std::wstring line;
        while (std::getline(ifs, line))
            result.push_back(std::move(line));
        return result;
    }

    struct RareWords : boost::static_visitor<>
    {
        /////////////////////////////////////
        // do nothing by default
        template <typename T>
        void operator()(T&&) const { /* leave all other things unchanged */ }

        /////////////////////////////////////
        // recurse arrays and objects
        void operator()(JSON::Object& obj) const
        {
            for (auto& v : obj.values) {
                //RareWords::operator()(v.first); /* to replace in field names (?!) */
                boost::apply_visitor(*this, v.second);
            }
        }

        void operator()(JSON::Array& arr) const
        {
            int i = 0;
            for (auto& v : arr.values) {
                if (i++) // skip the first element in all arrays
                    boost::apply_visitor(*this, v);
            }
        }

        /////////////////////////////////////
        // do replacements on strings
        void operator()(JSON::String& s) const
        {
            using namespace boost;
            const static std::vector<std::wstring> rareWords = readRareWordList();
            // the question asks for "_RARE_" as the replacement token
            const static std::wstring replacement = L"_RARE_";
            for (auto&& word : rareWords)
                if (word == s.value)
                    s.value = replacement;
        }
    };

    int main()
    {
        auto document = JSON::readFrom(std::ifstream("testcases/test3.json"));
        boost::apply_visitor(RareWords(), document);
        std::cout << document;
    }
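This assumes you want to replace all string values and match only whole strings. You could easily make this case-insensitive, match words inside strings, and so on, by changing the regex or the regex flags. The code was slightly adapted in response to the comments.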
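For readers working in Java, as the question does, the variations mentioned above (case-insensitive matching, matching words inside a longer string) could look roughly like this with `java.util.regex`. This is a sketch under assumptions, not a port of the C++ code: the class and method names (`RareWordPattern`, `compile`, `replaceInside`, `isWholeMatch`) are hypothetical, and the word list is assumed to be non-empty.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RareWordPattern {
    // Builds one pattern that matches any listed word as a whole word,
    // ignoring case. Pattern.quote guards against regex metacharacters.
    // Assumes rareWords is non-empty.
    static Pattern compile(List<String> rareWords) {
        String alternation = rareWords.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile("\\b(?:" + alternation + ")\\b",
                Pattern.CASE_INSENSITIVE);
    }

    // Replaces every rare word that occurs inside the text.
    static String replaceInside(String text, Pattern rare) {
        return rare.matcher(text).replaceAll("_RARE_");
    }

    // True only when the entire string is a rare word.
    static boolean isWholeMatch(String text, Pattern rare) {
        return rare.matcher(text).matches();
    }

    public static void main(String[] args) {
        Pattern rare = compile(Arrays.asList("late", "populate", "convicts"));
        System.out.println(replaceInside("Late convicts arrived", rare)); // prints _RARE_ _RARE_ arrived
        System.out.println(isWholeMatch("POPULATE", rare));               // prints true
        System.out.println(isWholeMatch("lately", rare));                 // prints false
    }
}
```

Dropping `Pattern.CASE_INSENSITIVE` restores exact-case matching, and using `isWholeMatch` instead of `replaceInside` keeps the whole-string behaviour described above.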
The full code, including JSON.hpp/cpp, is here: https://github.com/sehe/spirit-v2-json/tree/16093940