如何找到给定文本中给定单词的所有排列？

小编典典

如何找到给定文本中给定单词的所有排列？

algorithm

这是一个采访问题（电话屏幕）：编写一个函数（使用Java）以查找出现在给定文本中的给定单词的所有排列。例如，对于单词abc和文本，abcxyaxbcayxycab该函数应返回abc, bca, cab。

我将回答以下问题：

显然，我可以遍历给定单词的所有排列并使用标准substring函数。但是（现在对我来说）编写代码来生成所有单词排列可能很困难。
更容易遍历单词大小的所有文本子字符串，对每个子字符串进行排序，然后将其与“已排序”的给定单词进行比较。我可以立即编写这样的函数。
我可能可以修改一些子字符串搜索算法，但是我现在不记得这些算法了。

您将如何回答这个问题？

阅读 235

2020-07-28

共1个答案

小编典典

从算法上看，这可能不是最有效的解决方案，但是从类设计的角度来看，这是干净的。该解决方案采用比较“排序的”给定单词的方法。

如果一个单词包含相同数字的相同字母，我们可以说它是另一个单词的排列。这意味着您可以将单词从a转换String为a
Map<Character,Integer>。此类转换将具有复杂度O（n），其中n是的长度String，假设实现中的插入Map成本为O（1）。

Map将包含单词中找到的所有字符作为键，并包含字符频率的值。

实例。 abbc转换为[a->1, b->2, c->1]

bacb转换为[a->1, b->2, c->1]

因此，如果您必须知道两个单词是否是另一个单词的置换，可以将它们都转换为maps，然后调用Map.equals。

然后，您必须遍历文本字符串，并将转换应用于要查找的单词长度相同的所有子字符串。

Inerdial提出的改进

通过以“滚动”方式更新地图可以改进此方法。

也就是说，如果您要i=3在OP中的示例干草堆中的索引处（子字符串xya）进行匹配，则地图将为[a->1, x->1, y->1]。在大海捞针中前进时，请减少的字符计数haystack[i]，然后增加的计数haystack[i+needle.length()]。

（删除零以确保Map.equals()有效，或仅实施自定义比较。）

Max提出的改进

如果我们还引入matchedCharactersCnt变量怎么办？在大海捞针的开始0。每次将地图更改为所需值时，都会增加变量。每次将其更改为偏离所需值时，都会减小变量。每次迭代都检查变量是否等于针的长度。如果是-
您找到了匹配项。这比每次比较完整地图要快。

Max提供的伪代码：

needle = "abbc"
text = "abbcbbabbcaabbca"

needleSize = needle.length()
//Map of needle character counts
targetMap = [a->1, b->2, c->1]

matchedLength = 0
curMap = [a->0, b->0, c->0]
//Initial map initialization
for (int i=0;i<needle.length();i++) {
    if (curMap.contains(haystack[i])) {
        matchedLength++
        curMap[haystack[i]]++
    }
}

if (matchedLength == needleSize) {
    System.out.println("Match found at: 0");
}

//Search itself
for (int i=0;i<haystack.length()-needle.length();i++) {
    int targetValue1 = targetMap[haystack[i]]; //Reading from hashmap, O(1)
    int curValue1 = curMap[haystack[i]]; //Another read
    //If we are removing beneficial character
    if (targetValue1 > 0 && curValue1 > 0 && curValue1 <= targetValue1) {       
        matchedLength--;
    }
    curMap[haystack[i]] = curValue1 + 1; //Write to hashmap, O(1)


    int targetValue2 = targetMap[haystack[i+needle.length()]] //Read
    int curValue2 = curMap[haystack[i+needle.length()]] //Read
    //We are adding a beneficial character
    if (targetValue2 > 0 && curValue2 < targetValue2) { //If we don't need this letter at all, the amount of matched letters decreases
        matchedLength++;
    }
    curMap[haystack[i+needle.length()]] = curValue2 + 1; //Write

    if (matchedLength == needleSize) {
        System.out.println("Match found at: "+(i+1));
    }
}

//Basically with 4 reads and 2 writes which are 
//independent of the size of the needle,
//we get to the maximal possible performance: O(n)

2020-07-28