我有一个很大的List命名项(> = 1,000,000个项),并且用表示的某些条件选择要删除的项,并且对于列表中的许多(也许一半)项都是正确的。
我的目标是有效删除选择的项目并保留所有其他项目,可以修改源列表,可以创建新列表-应该考虑性能来选择最佳方法。
这是我的测试代码:
System.out.println("preparing items"); List<Integer> items = new ArrayList<Integer>(); // Integer is for demo for (int i = 0; i < 1000000; i++) { items.add(i * 3); // just for demo } System.out.println("deleting items"); long startMillis = System.currentTimeMillis(); items = removeMany(items); long endMillis = System.currentTimeMillis(); System.out.println("after remove: items.size=" + items.size() + " and it took " + (endMillis - startMillis) + " milli(s)");
和天真的实现:
public static <T> List<T> removeMany(List<T> items) { int i = 0; Iterator<T> iter = items.iterator(); while (iter.hasNext()) { T item = iter.next(); // <cond> goes here if (/*<cond>: */i % 2 == 0) { iter.remove(); } i++; } return items; }
如您所见,我使用项索引模2 == 0作为删除条件()-仅用于演示目的。
removeMany可以提供哪个更好的版本,为什么这个更好的版本实际上更好呢?
removeMany
好的,现在该是提出的方法的测试结果了。这里是我测试过的方法(每种方法的名称在我的资源中也是类名):
NaiveRemoveManyPerformer
ArrayList
BetterNaiveRemoveManyPerformer
LinkedRemoveManyPerformer
LinkedList
CreateNewRemoveManyPerformer
SmartCreateNewRemoveManyPerformer
FasterSmartCreateNewRemoveManyPerformer
items.get(idx)
MagicRemoveManyPerformer
ForwardInPlaceRemoveManyPerformer
GuavaArrayListRemoveManyPerformer
Iterables.removeIf
完整的源代码在此答案的末尾给出。
使用不同的列表大小(从10,000项到10,000,000项)和不同的删除因子(指定必须从列表中删除多少个项目)执行测试。
正如我在此处的评论中发布的其他答案一样,我认为将项目从复制ArrayList到秒ArrayList将比迭代LinkedList和删除项目要快。Sun的Java文档说,ArrayList与LinkedList实现相比,常数因子低,但是令人惊讶的是,我的问题并非如此。
实际上LinkedList,在大多数情况下,简单的迭代和删除具有最佳性能(此方法在中实现LinkedRemoveManyPerformer)。通常,只有MagicRemoveManyPerformer性能可与媲美LinkedRemoveManyPerformer,而其他方法则要慢得多。Google Guava GuavaArrayListRemoveManyPerformer比手工制作的类似代码慢(因为我的代码不会删除列表末尾的不必要项目)。
从1,000,000个源项目中删除500,000个项目的示例结果:
从1,000,000个源项目中删除1个项目的示例结果(删除第一个项目):
从1,000,000个源项目中删除333,334个项目的示例结果:
从1,000,000个源项目中删除1,000,000个(所有)项目的示例结果(所有项目都被删除,但经过一个接一个的处理,如果您先验地知道要删除所有项目,则应该简单地清除列表):
我的最终结论是:使用混合方法-如果处理LinkedList-最好进行简单的迭代和删除,如果处理ArrayList-取决于项目顺序是否重要- 然后使用ForwardInPlaceRemoveManyPerformer-如果项目顺序可能更改- 最佳选择是MagicRemoveManyPerformer。如果先验地知道移除因子(您知道要移除多少个项目还是保留多少个项目),那么在某些情况下可以采用更多条件选择效果更好的方法。但是已知的去除因子不是通常的情况… Google Guava Iterables.removeIf是一种混合解决方案,但是假设稍有不同(必须更改原始列表,无法创建新列表,并且始终按物料顺序排列)-这些是最常见的假设,因此removeIf最好大多数现实生活中的选择。
removeIf
还要注意,所有好的方法(天真的不好!)都足够好-在实际应用中,其中任何一个都做得很好,但是必须避免使用天真的方法。
最后-我的测试源代码。
package WildWezyrListRemovalTesting; import com.google.common.base.Predicate; import com.google.common.collect.Iterables; import java.util.ArrayList; import java.util.Iterator; import java.util.LinkedList; import java.util.List; public class RemoveManyFromList { public static abstract class BaseRemoveManyPerformer { protected String performerName() { return getClass().getSimpleName(); } protected void info(String msg) { System.out.println(performerName() + ": " + msg); } protected void populateList(List<Integer> items, int itemCnt) { for (int i = 0; i < itemCnt; i++) { items.add(i); } } protected boolean mustRemoveItem(Integer itemVal, int itemIdx, int removeFactor) { if (removeFactor == 0) { return false; } return itemIdx % removeFactor == 0; } protected abstract List<Integer> removeItems(List<Integer> items, int removeFactor); protected abstract List<Integer> createInitialList(); public void testMe(int itemCnt, int removeFactor) { List<Integer> items = createInitialList(); populateList(items, itemCnt); long startMillis = System.currentTimeMillis(); items = removeItems(items, removeFactor); long endMillis = System.currentTimeMillis(); int chksum = 0; for (Integer item : items) { chksum += item; } info("removing took " + (endMillis - startMillis) + " milli(s), itemCnt=" + itemCnt + ", removed items: " + (itemCnt - items.size()) + ", remaining items: " + items.size() + ", checksum: " + chksum); } } private List<BaseRemoveManyPerformer> rmps = new ArrayList<BaseRemoveManyPerformer>(); public void addPerformer(BaseRemoveManyPerformer rmp) { rmps.add(rmp); } private Runtime runtime = Runtime.getRuntime(); private void runGc() { for (int i = 0; i < 5; i++) { runtime.gc(); } } public void testAll(int itemCnt, int removeFactor) { runGc(); for (BaseRemoveManyPerformer rmp : rmps) { rmp.testMe(itemCnt, removeFactor); } runGc(); System.out.println("\n--------------------------\n"); } public static class NaiveRemoveManyPerformer extends BaseRemoveManyPerformer { @Override public List<Integer> removeItems(List<Integer> items, int removeFactor) { if (items.size() > 300000 && items instanceof ArrayList) { info("this removeItems is too slow, returning without processing"); return items; } int i = 0; Iterator<Integer> iter = items.iterator(); while (iter.hasNext()) { Integer item = iter.next(); if (mustRemoveItem(item, i, removeFactor)) { iter.remove(); } i++; } return items; } @Override public List<Integer> createInitialList() { return new ArrayList<Integer>(); } } public static class BetterNaiveRemoveManyPerformer extends NaiveRemoveManyPerformer { @Override public List<Integer> removeItems(List<Integer> items, int removeFactor) { // if (items.size() > 300000 && items instanceof ArrayList) { // info("this removeItems is too slow, returning without processing"); // return items; // } for (int i = items.size(); --i >= 0;) { Integer item = items.get(i); if (mustRemoveItem(item, i, removeFactor)) { items.remove(i); } } return items; } } public static class LinkedRemoveManyPerformer extends NaiveRemoveManyPerformer { @Override public List<Integer> createInitialList() { return new LinkedList<Integer>(); } } public static class CreateNewRemoveManyPerformer extends NaiveRemoveManyPerformer { @Override public List<Integer> removeItems(List<Integer> items, int removeFactor) { List<Integer> res = createResultList(items, removeFactor); int i = 0; for (Integer item : items) { if (mustRemoveItem(item, i, removeFactor)) { // no-op } else { res.add(item); } i++; } return res; } protected List<Integer> createResultList(List<Integer> items, int removeFactor) { return new ArrayList<Integer>(); } } public static class SmartCreateNewRemoveManyPerformer extends CreateNewRemoveManyPerformer { @Override protected List<Integer> createResultList(List<Integer> items, int removeFactor) { int newCapacity = removeFactor == 0 ? items.size() : (int) (items.size() * (removeFactor - 1L) / removeFactor + 1); //System.out.println("newCapacity=" + newCapacity); return new ArrayList<Integer>(newCapacity); } } public static class FasterSmartCreateNewRemoveManyPerformer extends SmartCreateNewRemoveManyPerformer { @Override public List<Integer> removeItems(List<Integer> items, int removeFactor) { List<Integer> res = createResultList(items, removeFactor); for (int i = 0; i < items.size(); i++) { Integer item = items.get(i); if (mustRemoveItem(item, i, removeFactor)) { // no-op } else { res.add(item); } } return res; } } public static class ForwardInPlaceRemoveManyPerformer extends NaiveRemoveManyPerformer { @Override public List<Integer> removeItems(List<Integer> items, int removeFactor) { int j = 0; // destination idx for (int i = 0; i < items.size(); i++) { Integer item = items.get(i); if (mustRemoveItem(item, i, removeFactor)) { // no-op } else { if (j < i) { items.set(j, item); } j++; } } return items.subList(0, j); } } public static class MagicRemoveManyPerformer extends NaiveRemoveManyPerformer { @Override public List<Integer> removeItems(List<Integer> items, int removeFactor) { for (int i = 0; i < items.size(); i++) { if (mustRemoveItem(items.get(i), i, removeFactor)) { Integer retainedItem = removeSomeFromEnd(items, removeFactor, i); if (retainedItem == null) { items.remove(i); break; } items.set(i, retainedItem); } } return items; } private Integer removeSomeFromEnd(List<Integer> items, int removeFactor, int lowerBound) { for (int i = items.size(); --i > lowerBound;) { Integer item = items.get(i); items.remove(i); if (!mustRemoveItem(item, i, removeFactor)) { return item; } } return null; } } public static class GuavaArrayListRemoveManyPerformer extends BaseRemoveManyPerformer { @Override protected List<Integer> removeItems(List<Integer> items, final int removeFactor) { Iterables.removeIf(items, new Predicate<Integer>() { public boolean apply(Integer input) { return mustRemoveItem(input, input, removeFactor); } }); return items; } @Override protected List<Integer> createInitialList() { return new ArrayList<Integer>(); } } public void testForOneItemCnt(int itemCnt) { testAll(itemCnt, 0); testAll(itemCnt, itemCnt); testAll(itemCnt, itemCnt - 1); testAll(itemCnt, 3); testAll(itemCnt, 2); testAll(itemCnt, 1); } public static void main(String[] args) { RemoveManyFromList t = new RemoveManyFromList(); t.addPerformer(new NaiveRemoveManyPerformer()); t.addPerformer(new BetterNaiveRemoveManyPerformer()); t.addPerformer(new LinkedRemoveManyPerformer()); t.addPerformer(new CreateNewRemoveManyPerformer()); t.addPerformer(new SmartCreateNewRemoveManyPerformer()); t.addPerformer(new FasterSmartCreateNewRemoveManyPerformer()); t.addPerformer(new MagicRemoveManyPerformer()); t.addPerformer(new ForwardInPlaceRemoveManyPerformer()); t.addPerformer(new GuavaArrayListRemoveManyPerformer()); t.testForOneItemCnt(1000); t.testForOneItemCnt(10000); t.testForOneItemCnt(100000); t.testForOneItemCnt(200000); t.testForOneItemCnt(300000); t.testForOneItemCnt(500000); t.testForOneItemCnt(1000000); t.testForOneItemCnt(10000000); } }