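I'm looking for a way to compress an ASCII-based string. Any help?
I also need to be able to decompress it. I tried zlib, but it didn't help.
How can I compress a string to a shorter length?
Code: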
```python
import zlib

from django.shortcuts import render_to_response
from django.template import RequestContext


def compress(request):
    if request.POST:
        data = request.POST.get('input')
        if is_ascii(data):  # is_ascii() is the asker's own helper, not shown in the question
            result = zlib.compress(data)
            return render_to_response('index.html', {'result': result, 'input': data},
                                      context_instance=RequestContext(request))
        else:
            result = "Error, the string is not ascii-based"
            return render_to_response('index.html', {'result': result},
                                      context_instance=RequestContext(request))
    else:
        return render_to_response('index.html', {},
                                  context_instance=RequestContext(request))
```
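`zlib.compress` returns arbitrary binary bytes, so the `result` passed to the template above is not printable text. The following is a minimal round-trip sketch. It assumes Python 3 and that the compressed value has to travel as ASCII text (for example, in an HTML page or a form field). The text is encoded to bytes, compressed, and then Base64-encoded, and decompression reverses those steps. The helper names `compress_text` and `decompress_text` are hypothetical. Base64 adds roughly a third to the size, so for short strings this makes the result even longer.

```python
import base64
import zlib


def compress_text(text: str) -> str:
    """Compress an ASCII string into a printable ASCII token (hypothetical helper)."""
    raw = text.encode("ascii")                       # zlib works on bytes, not str
    packed = zlib.compress(raw, 9)                   # level 9 = best compression
    return base64.b64encode(packed).decode("ascii")  # make the binary output printable


def decompress_text(token: str) -> str:
    """Reverse compress_text(): Base64-decode, inflate, then decode back to str."""
    packed = base64.b64decode(token)
    return zlib.decompress(packed).decode("ascii")


if __name__ == "__main__":
    original = "hello hello hello hello hello hello"
    token = compress_text(original)
    assert decompress_text(token) == original
    print(len(original), len(token))
```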
Using compression does not always reduce the length of a string!
Consider the following code:
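```python
from __future__ import print_function  # lets the same code run on Python 2 and 3

import bz2
import zlib


def comptest(s):
    data = s.encode('ascii')  # the compressors operate on bytes
    print('original length:', len(data))
    print('zlib compressed length:', len(zlib.compress(data)))
    print('bz2 compressed length:', len(bz2.compress(data)))
```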
Let's try it on an empty string:
```
In [15]: comptest('')
original length: 0
zlib compressed length: 8
bz2 compressed length: 14
```
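So from nothing, zlib produces 8 bytes of output and bz2 produces 14. Compression formats usually put a "header" in front of the compressed data for the decompressor to use, and zlib also appends a 4-byte checksum at the end. This framing adds to the length of the output.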
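The sketch below shows where those extra bytes come from. It assumes Python 3, and `zlib_parts` is a hypothetical helper name. Following the zlib format (RFC 1950), it splits `zlib.compress()` output into a 2-byte header, the DEFLATE body, and a 4-byte Adler-32 checksum trailer. For comparison, it also produces a raw DEFLATE stream by passing a negative `wbits`, which omits both. A bz2 stream likewise starts with a `BZh` signature followed by a block-size digit.

```python
import bz2
import zlib


def zlib_parts(data: bytes):
    """Split zlib.compress() output into header, DEFLATE body and trailer (hypothetical helper)."""
    wrapped = zlib.compress(data)
    # Negative wbits requests a raw DEFLATE stream: no zlib header, no checksum.
    co = zlib.compressobj(wbits=-zlib.MAX_WBITS)
    raw = co.compress(data) + co.flush()
    header, body, trailer = wrapped[:2], wrapped[2:-4], wrapped[-4:]
    return header, body, trailer, raw


header, body, trailer, raw = zlib_parts(b"")
print("zlib header:", header.hex(), "| body:", body.hex(), "| Adler-32 trailer:", trailer.hex())
print("raw DEFLATE without framing:", len(raw), "bytes")
print("bz2 stream starts with:", bz2.compress(b"")[:4])
```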
Let's test a single word:
```
In [16]: comptest('test')
original length: 4
zlib compressed length: 12
bz2 compressed length: 40
```
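Even after subtracting the length of the header, compression does not make the word shorter. That is because there is almost nothing to compress here: most characters in the string occur only once. Now a short sentence: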
```
In [17]: comptest('This is a compression test of a short sentence.')
original length: 47
zlib compressed length: 52
bz2 compressed length: 73
```
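Again, the compressed output is larger than the input text. Because the text is short, it contains little repetition, so compression does not work well.
You need a fairly long block of text before compression actually pays off.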
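One rough way to quantify "little repetition", assuming zlib's DEFLATE is the compressor, relies on a property of DEFLATE: it can only replace a repeat of at least 3 bytes with a back-reference (RFC 1951). Counting how many 3-character windows have already occurred earlier in the text therefore gives an illustrative measure of what it has to work with, not an exact prediction of the compressed size. The function name is hypothetical.

```python
def repeated_trigram_fraction(text: str) -> float:
    """Fraction of 3-character windows that already occurred earlier in text.

    DEFLATE can only encode repeats of 3 or more bytes as back-references,
    so this is a rough, illustrative proxy for how much it can exploit.
    """
    windows = len(text) - 2
    if windows <= 0:
        return 0.0
    seen = set()
    repeats = 0
    for i in range(windows):
        gram = text[i:i + 3]
        if gram in seen:
            repeats += 1
        else:
            seen.add(gram)
    return repeats / windows


if __name__ == "__main__":
    for sample in ["test", "This is a compression test of a short sentence."]:
        print(repr(sample), round(repeated_trigram_fraction(sample), 3))
```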
```
In [22]: rings = '''
   ....: Three Rings for the Elven-kings under the sky,
   ....: Seven for the Dwarf-lords in their halls of stone,
   ....: Nine for Mortal Men doomed to die,
   ....: One for the Dark Lord on his dark throne
   ....: In the Land of Mordor where the Shadows lie.
   ....: One Ring to rule them all, One Ring to find them,
   ....: One Ring to bring them all and in the darkness bind them
   ....: In the Land of Mordor where the Shadows lie.'''

In [23]: comptest(rings)
original length: 410
zlib compressed length: 205
bz2 compressed length: 248
```
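Compression can enlarge short inputs. A common workaround, sketched here on the assumption that you control both the encoder and the decoder, is to try compressing and keep whichever form is shorter. A one-byte flag records which case applies, so the stored value is never more than one byte longer than the input. The flag values and the `pack`/`unpack` names are hypothetical and are not part of zlib.

```python
import zlib

RAW = b"\x00"         # hypothetical flag: payload stored as-is
COMPRESSED = b"\x01"  # hypothetical flag: payload is zlib-compressed


def pack(text: str) -> bytes:
    """Return the shorter of the raw and zlib-compressed encodings, prefixed by a flag byte."""
    raw = text.encode("ascii")
    packed = zlib.compress(raw, 9)
    if len(packed) < len(raw):
        return COMPRESSED + packed
    return RAW + raw


def unpack(blob: bytes) -> str:
    """Inverse of pack(): read the flag byte and undo compression if needed."""
    flag, payload = blob[:1], blob[1:]
    if flag == COMPRESSED:
        payload = zlib.decompress(payload)
    elif flag != RAW:
        raise ValueError("unknown format flag")
    return payload.decode("ascii")
```

With this scheme, `'test'` is stored as 5 bytes instead of the 12 that plain zlib produced. Long repetitive text such as `rings` still gets compressed.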