The first error I ran into when indexing my data in ES 5.1 concerned the completion suggester mapping, which contained an output field:
message [MapperParsingException[failed to parse]; nested: IllegalArgumentException[unknown field name [output], must be one of [input, weight, contexts]];]
So I removed that field, but now a lot of my autocomplete suggestions are wrong, because the suggester returns the matched input instead of a single output String.
After some googling I found this note in the ES breaking-changes documentation:
As suggestions are document-oriented, suggestion metadata (e.g. output) should now be specified as a field in the document. The support for specifying output when indexing suggestion entries has been removed. Now the suggestion result entry's text is always the un-analyzed value of the suggestion's input (same as not specifying output while indexing suggestions in pre-5.0 indices).
I found out that the original value is available through the _source field that is returned with each suggestion, but that is not really a solution for me, because the keys and the structure vary depending on the original object.
I could add an extra "output" field to the original object, but that is not a solution for me either, because in some cases I have a structure like this:
{ "id": "c2358e0c-7399-4665-ac2c-0bdd44597ac0", "synonyms": ["All available colours", "Colors"], "autoComplete": [{ "input": ["colours available all", "available colours all", "available all colours", "colours all available", "all available colours", "all colours available"] }, { "input": ["colors"] }] }
In ES 2.4 the structure looked like this:
{ "id": "c2358e0c-7399-4665-ac2c-0bdd44597ac0", "synonyms": ["All available colours", "Colors"], "SmartSynonym": [{ "input": ["colours available all", "available colours all", "available all colours", "colours all available", "all available colours", "all colours available"], "output": ["All available colours"] }, { "input": ["colors"], "output": ["Colors"] }] }
which worked without any problems, because the "output" field was present in every autocomplete object.
How can I get the original value ("All available colours") back in ES 5.1 in a simple way when asking for, for example, "all colours available", without doing a lot of manual lookups?
We eventually removed the custom plugin from the original answer, because it was hard to keep it working properly on Elastic Cloud. Instead, we simply created a separate document for the autocomplete data and removed it from all other documents.
The object:
public class Suggest {

    /*
     * Contains the actual value it needs to return:
     * "iphone 8 plus", "plus iphone 8", "8 plus iphone", ...
     * will all resolve to "iphone 8 plus", for example.
     */
    private String autocompleteOutput;

    /*
     * Contains the field and all the values of that field to autocomplete.
     */
    private Map<String, AutoComplete> autoComplete;

    @JsonCreator
    Suggest() {
    }

    public Suggest(String autocompleteOutput, Map<String, AutoComplete> autoComplete) {
        this.autocompleteOutput = autocompleteOutput;
        this.autoComplete = autoComplete;
    }

    public String getAutocompleteOutput() {
        return autocompleteOutput;
    }

    public void setAutocompleteOutput(String autocompleteOutput) {
        this.autocompleteOutput = autocompleteOutput;
    }

    public Map<String, AutoComplete> getAutoComplete() {
        return autoComplete;
    }

    public void setAutoComplete(Map<String, AutoComplete> autoComplete) {
        this.autoComplete = autoComplete;
    }
}

public class AutoComplete {

    /*
     * Contains the permutation values from the Lucene filter (see the original answer).
     */
    private String[] input;

    @JsonCreator
    AutoComplete() {
    }

    public AutoComplete(String[] input) {
        this.input = input;
    }

    public String[] getInput() {
        return input;
    }
}
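To illustrate, here is a minimal sketch of how such a document could be built. The key "product" is a hypothetical example, and in this variant the input permutations are generated application-side before indexing:

import java.util.HashMap;
import java.util.Map;

public class SuggestExample {

    // Builds the suggest document for "All available colours". The key "product" is
    // just an example; the permutations are generated application-side in this variant.
    static Suggest allAvailableColours() {
        Map<String, AutoComplete> autoComplete = new HashMap<>();
        autoComplete.put("product", new AutoComplete(new String[]{
                "all available colours", "all colours available", "available all colours",
                "available colours all", "colours all available", "colours available all"}));
        // autocompleteOutput is the single value every matching input should resolve to.
        return new Suggest("All available colours", autoComplete);
    }
}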
These objects are indexed with the following mapping:
{ "suggest": { "dynamic_templates": [ { "autocomplete": { "path_match": "autoComplete.*", "match_mapping_type": "*", "mapping": { "type": "completion", "analyzer": "lowercase_keyword_analyzer" } } } ], "properties": {} } }
This lets us use the autocompleteOutput field from _source.
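As an illustration, here is a minimal sketch of how the output can then be read back, assuming the 5.x TransportClient, an index called "suggest" and the hypothetical completion field autoComplete.product:

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.search.suggest.SuggestBuilder;
import org.elasticsearch.search.suggest.SuggestBuilders;
import org.elasticsearch.search.suggest.completion.CompletionSuggestion;

public class SuggestQueryExample {

    // Runs a completion suggestion and resolves every option to its autocompleteOutput.
    // The index name "suggest" and the field "autoComplete.product" are assumptions for this sketch.
    static void printOutputs(Client client, String userInput) {
        SuggestBuilder suggest = new SuggestBuilder()
                .addSuggestion("autocomplete",
                        SuggestBuilders.completionSuggestion("autoComplete.product").prefix(userInput));

        SearchResponse response = client.prepareSearch("suggest")
                .suggest(suggest)
                .get();

        CompletionSuggestion completion = response.getSuggest().getSuggestion("autocomplete");
        for (CompletionSuggestion.Entry entry : completion.getEntries()) {
            for (CompletionSuggestion.Entry.Option option : entry.getOptions()) {
                // Since 5.x every option carries its document, so the single output value
                // can be read from _source instead of the removed "output" field.
                System.out.println(option.getHit().getSource().get("autocompleteOutput"));
            }
        }
    }
}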
Original answer: after some research I ended up creating a new Elasticsearch 5.1.1 plugin.
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionLengthAttribute;

import java.io.IOException;
import java.util.*;

/**
 * Created by glenn on 13.01.17.
 */
public class PermutationTokenFilter extends TokenFilter {

    private final CharTermAttribute charTermAtt;
    private final PositionIncrementAttribute posIncrAtt;
    private final OffsetAttribute offsetAtt;
    private Iterator<String> permutations;
    private int origOffset;

    /**
     * Construct a token stream filtering the given input.
     *
     * @param input
     */
    protected PermutationTokenFilter(TokenStream input) {
        super(input);
        this.charTermAtt = addAttribute(CharTermAttribute.class);
        this.posIncrAtt = addAttribute(PositionIncrementAttribute.class);
        this.offsetAtt = addAttribute(OffsetAttribute.class);
    }

    @Override
    public final boolean incrementToken() throws IOException {
        while (true) {
            // see if the permutations have been created already
            if (permutations == null) {
                // see if more tokens are available
                if (!input.incrementToken()) {
                    return false;
                } else {
                    // get the value
                    String value = String.valueOf(charTermAtt);
                    // permute the buffered value and create an iterator
                    permutations = permutation(value).iterator();
                    origOffset = posIncrAtt.getPositionIncrement();
                }
            }
            // see if there are remaining permutations
            if (permutations.hasNext()) {
                // reset the attributes to the starting point
                clearAttributes();
                // use the next permutation
                String permutation = permutations.next();
                // add the permutation to the attributes and remove the old attributes
                charTermAtt.setEmpty().append(permutation);
                posIncrAtt.setPositionIncrement(origOffset);
                offsetAtt.setOffset(0, permutation.length());
                // remove the permutation from the iterator
                permutations.remove();
                origOffset = 0;
                return true;
            }
            permutations = null;
        }
    }

    /**
     * Changes the order of a multi-word keyword so the completion suggester still knows the original value,
     * without tokenizing it, if the user asks for the words in a different order.
     *
     * @param value unpermuted value, e.g. "Yellow Crazy Banana"
     * @return permuted values, e.g.:
     *         Yellow Crazy Banana,
     *         Yellow Banana Crazy,
     *         Crazy Yellow Banana,
     *         Crazy Banana Yellow,
     *         Banana Crazy Yellow,
     *         Banana Yellow Crazy
     */
    private Set<String> permutation(String value) {
        value = value.trim().replaceAll(" +", " ");
        // use a set to eliminate semantic duplicates ("a a b" is still "a a b" even if you switch
        // the two 'a's, in case one word occurs multiple times in a single value)
        Set<String> set = new HashSet<String>();
        String[] words = value.split(" ");
        // termination condition: there is only 1 permutation for an array of 1 word
        if (words.length == 1) {
            set.add(value);
        } else if (words.length <= 6) {
            // give each word a chance to be the first one in the permuted array
            for (int i = 0; i < words.length; i++) {
                // remove the word at index i from the array
                String pre = "";
                for (int j = 0; j < i; j++) {
                    pre += words[j] + " ";
                }
                String post = " ";
                for (int j = i + 1; j < words.length; j++) {
                    post += words[j] + " ";
                }
                String remaining = (pre + post).trim();
                // recurse to find all the permutations of the remaining words
                for (String permutation : permutation(remaining)) {
                    // concatenate the first word with the permutations of the remaining words
                    set.add(words[i] + " " + permutation);
                }
            }
        } else {
            // too many words to permute: fall back to the individual words plus the full value
            Collections.addAll(set, words);
            set.add(value);
        }
        return set;
    }
}
This filter takes an original input token such as "all available colours" and permutes it into all possible combinations (see the original question).
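A quick stand-alone sketch of what the filter does, assuming the snippet lives in the same package as PermutationTokenFilter (its constructor is protected):

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.KeywordTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

import java.io.StringReader;

public class PermutationTokenFilterDemo {

    public static void main(String[] args) throws Exception {
        // The keyword tokenizer emits the whole value as a single token,
        // just like the "keyword" tokenizer used in the analyzer settings below.
        KeywordTokenizer tokenizer = new KeywordTokenizer();
        tokenizer.setReader(new StringReader("all available colours"));

        TokenStream stream = new PermutationTokenFilter(tokenizer);
        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);

        stream.reset();
        while (stream.incrementToken()) {
            // prints the six permutations of "all available colours"
            System.out.println(term.toString());
        }
        stream.end();
        stream.close();
    }
}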
import org.apache.lucene.analysis.TokenStream;
import org.elasticsearch.index.analysis.AbstractTokenFilterFactory;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.env.Environment;
import org.elasticsearch.index.IndexSettings;

/**
 * Created by glenn on 16.01.17.
 */
public class PermutationTokenFilterFactory extends AbstractTokenFilterFactory {

    public PermutationTokenFilterFactory(IndexSettings indexSettings, Environment environment, String name, Settings settings) {
        super(indexSettings, name, settings);
    }

    public PermutationTokenFilter create(TokenStream input) {
        return new PermutationTokenFilter(input);
    }
}
This class is needed to provide the filter to the Elasticsearch plugin.
Follow this guide to set up the required configuration for an Elasticsearch plugin.
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>be.smartspoken</groupId>
    <artifactId>permutation-plugin</artifactId>
    <version>5.1.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>Plugin: Permutation</name>
    <description>Permutation plugin for elasticsearch</description>

    <properties>
        <lucene.version>6.3.0</lucene.version>
        <elasticsearch.version>5.1.1</elasticsearch.version>
        <java.version>1.8</java.version>
        <log4j2.version>2.7</log4j2.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-api</artifactId>
            <version>${log4j2.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <version>${log4j2.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-test-framework</artifactId>
            <version>${lucene.version}</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>${lucene.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-analyzers-common</artifactId>
            <version>${lucene.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
            <version>${elasticsearch.version}</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>

    <build>
        <resources>
            <resource>
                <directory>src/main/resources</directory>
                <filtering>false</filtering>
                <excludes>
                    <exclude>*.properties</exclude>
                </excludes>
            </resource>
        </resources>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.6</version>
                <configuration>
                    <appendAssemblyId>false</appendAssemblyId>
                    <outputDirectory>${project.build.directory}/releases/</outputDirectory>
                    <descriptors>
                        <descriptor>${basedir}/src/main/assemblies/plugin.xml</descriptor>
                    </descriptors>
                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.3</version>
                <configuration>
                    <source>${java.version}</source>
                    <target>${java.version}</target>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
Make sure you use the correct Elasticsearch, Lucene and Log4j 2 versions in the pom.xml file, and provide the correct configuration files.
import be.smartspoken.plugin.permutation.filter.PermutationTokenFilterFactory;
import org.elasticsearch.index.analysis.TokenFilterFactory;
import org.elasticsearch.indices.analysis.AnalysisModule;
import org.elasticsearch.plugins.AnalysisPlugin;
import org.elasticsearch.plugins.Plugin;

import java.util.HashMap;
import java.util.Map;

/**
 * Created by glenn on 13.01.17.
 */
public class PermutationPlugin extends Plugin implements AnalysisPlugin {

    @Override
    public Map<String, AnalysisModule.AnalysisProvider<TokenFilterFactory>> getTokenFilters() {
        Map<String, AnalysisModule.AnalysisProvider<TokenFilterFactory>> extra = new HashMap<>();
        extra.put("permutation", PermutationTokenFilterFactory::new);
        return extra;
    }
}
This registers the factory with the plugin, under the filter name "permutation".
After installing the new plugin you need to restart Elasticsearch.
Then add a new custom analyzer that mimics the 2.x behaviour:
// requires: import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;
Settings.builder()
        .put("number_of_shards", 2)
        .loadFromSource(jsonBuilder()
                .startObject()
                    .startObject("analysis")
                        .startObject("analyzer")
                            .startObject("permutation_analyzer")
                                .field("tokenizer", "keyword")
                                .field("filter", new String[]{"permutation", "lowercase"})
                            .endObject()
                        .endObject()
                    .endObject()
                .endObject().string())
        .loadFromSource(jsonBuilder()
                .startObject()
                    .startObject("analysis")
                        .startObject("analyzer")
                            .startObject("lowercase_keyword_analyzer")
                                .field("tokenizer", "keyword")
                                .field("filter", new String[]{"lowercase"})
                            .endObject()
                        .endObject()
                    .endObject()
                .endObject().string())
        .build();
Now all you have to do is provide the custom analyzers to the object mapping:
{ "my_object": { "dynamic_templates": [{ "autocomplete": { "path_match": "my.autocomplete.object.path", "match_mapping_type": "*", "mapping": { "type": "completion", "analyzer": "permutation_analyzer", /* custom analyzer */ "search_analyzer": "lowercase_keyword_analyzer" /* custom analyzer */ } } }], "properties": { /*your other properties*/ } } }
This also gives you a boost in performance, because you no longer have to wait for the permutations to be built.
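For completeness, a minimal sketch (assuming the 5.x TransportClient; the index name, type name and the settings/mapping variables are placeholders) of how the settings and mapping above might be applied when creating the index:

import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.Settings;

public class CreateIndexExample {

    // "my_index" and "my_object" are placeholder names; "settings" is the Settings object
    // built above and "mappingJson" is the mapping shown above as a JSON string
    // (with the /* ... */ comments removed, since they are not valid JSON).
    static void createIndex(Client client, Settings settings, String mappingJson) {
        client.admin().indices()
                .prepareCreate("my_index")
                .setSettings(settings)
                .addMapping("my_object", mappingJson)
                .get();
    }
}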