smallgfw: a DFA-based module for detecting and masking sensitive words. Its usage is shown in the following doctests.
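To make the DFA idea concrete, here is a minimal sketch. It assumes the automaton is a character trie stored as nested dicts, which may differ from smallgfw's real data structure. Each sensitive word becomes one path of states, and a special key marks accepting states. `build_dfa`, `contains_sensitive`, and `END` are hypothetical names, not smallgfw's API.

```python
# Hypothetical sketch of a trie-shaped DFA; not smallgfw's actual code.
END = None  # marker key for accepting states; no input character can equal None


def build_dfa(words):
    """Build a DFA where each state is a dict mapping a character to the next state."""
    root = {}
    for word in words:
        state = root
        for ch in word:
            state = state.setdefault(ch, {})  # reuse or create the next state
        state[END] = word  # accepting state remembers which word ends here
    return root


def contains_sensitive(dfa, text):
    """Return True if any sensitive word occurs anywhere in text."""
    for start in range(len(text)):
        state = dfa
        for ch in text[start:]:
            state = state.get(ch)
            if state is None:  # no transition: this start position fails
                break
            if END in state:  # reached an accepting state
                return True
    return False
```

For example, `contains_sensitive(build_dfa(["abd", "defz", "bcz"]), "xabdabczabdxaadefz")` returns `True`. Words sharing a prefix (such as "sexy" and "shit") share states, so the scan from each position costs at most one step per character of the longest word.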
    >>> gfw = GFW()
    >>> gfw.set(["sexy", "girl", "love", "shit"])  # set the list of sensitive words
    >>> s = gfw.replace("shit!,Cherry is a sexy girl. She loves python.", "*")
    >>> print s  # the text after masking
    *!,Cherry is a * *. She *s python.
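Masking can be done with a single left-to-right scan over the DFA. This is a minimal sketch, not smallgfw's code: `build_dfa`, `longest_match`, `replace`, and `END` are hypothetical names. It assumes each matched word is replaced by one copy of the mask and that the longest word starting at a given position wins.

```python
# Hypothetical sketch of DFA-based masking; not smallgfw's actual code.
END = None  # marker key for accepting states; no input character can equal None


def build_dfa(words):
    """Build a trie-shaped DFA from the sensitive words."""
    root = {}
    for word in words:
        state = root
        for ch in word:
            state = state.setdefault(ch, {})
        state[END] = word
    return root


def longest_match(dfa, text, start):
    """Length of the longest sensitive word beginning at text[start], or 0."""
    state, best = dfa, 0
    for i in range(start, len(text)):
        state = state.get(text[i])
        if state is None:  # no transition: stop walking
            break
        if END in state:  # accepting state: remember this (longer) match
            best = i - start + 1
    return best


def replace(dfa, text, mask):
    """Replace each sensitive word in text with one copy of mask."""
    out, i = [], 0
    while i < len(text):
        n = longest_match(dfa, text, i)
        if n:
            out.append(mask)
            i += n  # resume scanning right after the matched word
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

For example, `replace(build_dfa(["sexy", "girl", "love", "shit"]), "shit!,Cherry is a sexy girl. She loves python.", "*")` returns `"*!,Cherry is a * *. She *s python."`. Matching is case-sensitive in this sketch; whether smallgfw folds case is not stated here.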
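    >>> gfw = GFW()
    >>> gfw.set(["abd", "defz", "bcz"])
    >>> print gfw.check("xabdabczabdxaadefz")  # where the sensitive words occur
    [(1, 3, 'abd'), (5, 3, 'bcz'), (8, 3, 'abd'), (14, 4, 'defz')]

Each tuple is (start index, length, word). For example, (5, 3, 'bcz') denotes the substring of length 3 starting at index 5.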
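Position reporting works the same way: from each index, walk the DFA as far as the text allows and remember the last accepting state reached. The sketch below assumes matches are reported left to right without overlap, preferring the longest word at each start. That policy reproduces the tuples above, but it is an assumption about smallgfw, and `build_dfa`, `check`, and `END` are hypothetical names.

```python
# Hypothetical sketch of DFA-based match reporting; not smallgfw's actual code.
END = None  # marker key for accepting states; no input character can equal None


def build_dfa(words):
    """Build a trie-shaped DFA from the sensitive words."""
    root = {}
    for word in words:
        state = root
        for ch in word:
            state = state.setdefault(ch, {})
        state[END] = word
    return root


def check(dfa, text):
    """Return (start, length, word) for each non-overlapping match, left to right."""
    hits, i = [], 0
    while i < len(text):
        state, found = dfa, None
        for j in range(i, len(text)):
            state = state.get(text[j])
            if state is None:  # no transition: stop walking from i
                break
            if END in state:  # keep the longest match seen so far
                found = (i, j - i + 1, state[END])
        if found:
            hits.append(found)
            i += found[1]  # skip past the matched word
        else:
            i += 1
    return hits
```

For example, `check(build_dfa(["abd", "defz", "bcz"]), "xabdabczabdxaadefz")` returns `[(1, 3, 'abd'), (5, 3, 'bcz'), (8, 3, 'abd'), (14, 4, 'defz')]`. Because the walk from each index stops as soon as a transition is missing, the cost is O(n·m) for text length n and longest word length m.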