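I have to draw a word cloud. "tweets.csv" is loaded into a pandas DataFrame that has a column named "text". The resulting figure, however, is not based on the most frequent words. How can the word size be tied to the frequency in the DataFrame?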
text = df_final.text.values

wordcloud = WordCloud(
    # mask=logomask,
    max_words=1000,
    width=600,
    height=400,
    # max_font_size=1000,
    # min_font_size=100,
    normalize_plurals=True,
    # scale=5,
    # relative_scaling=0,
    background_color='black',
    stopwords=STOPWORDS.union(stopwords)
).generate(str(text))

fig = plt.figure(
    figsize=(50, 40),
    facecolor='k',
    edgecolor='k')
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.tight_layout(pad=0)
plt.show()
My DataFrame looks like this:
0    RT @Pontifex_pt: Temos que descobrir as riquezezas ...
1    RT @Pontifex_pt: Todos estamos em viagem rumo ...
2    RT @Pontifex_pt: Unamos as forças, em todos ...
3    RT @GeneralMourao: #Segurançapública, preocupa ...
4    RT @FIFAcom: The Brasileirao U-17 final provided ...
See also DataCamp: Generating WordClouds in Python
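One likely reason the cloud in the question does not reflect word frequencies is the call generate(str(text)). Calling str() on a NumPy array returns the array's printed representation, which NumPy abbreviates with "..." once the array has more than 1000 elements by default. The sketch below uses made-up tweets (the `texts` array is an assumption standing in for df_final.text.values) to show the difference between str() and joining the strings:

```python
import numpy as np

# Hypothetical stand-in for df_final.text.values: 2000 short "tweets"
texts = np.array([f"tweet number {i}" for i in range(2000)], dtype=object)

# str() gives the printed form of the array, abbreviated in the middle
as_repr = str(texts)

# Joining the elements keeps every tweet in the resulting string
joined = " ".join(texts)

print(as_repr.count("tweet"), joined.count("tweet"))
```

If this is what happens with the real data, passing " ".join(text) to generate() instead of str(text) would let WordCloud see every tweet.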
import pandas as pd
df = pd.DataFrame({'word': ['how', 'are', 'you', 'doing', 'this', 'afternoon'],
                   'count': [7, 10, 4, 1, 20, 100]})
The word and count columns must be converted to a dict, because WordCloud().generate_from_frequencies() requires a dict.
data = dict(zip(df['word'].tolist(), df['count'].tolist()))
print(data)
{'how': 7, 'are': 10, 'you': 4, 'doing': 1, 'this': 20, 'afternoon': 100}
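The DataFrame in the question holds raw tweet text rather than word/count pairs, so the counts have to be computed first. A minimal sketch of one way to do that with the standard library follows; the helper name `word_frequencies`, the regex tokenizer, and the sample tweets are assumptions for illustration, not part of the wordcloud API:

```python
import re
from collections import Counter


def word_frequencies(texts, stopwords=frozenset()):
    """Count lowercase word tokens across an iterable of strings."""
    counts = Counter()
    for text in texts:
        # \w+ is Unicode-aware in Python 3, so accented words stay intact
        tokens = re.findall(r"\w+", text.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return dict(counts)


# Made-up sample in the spirit of the question's data
tweets = [
    "Todos estamos em viagem",
    "Unamos as forças, em todos",
]
tweet_counts = word_frequencies(tweets, stopwords={"em", "as"})
print(tweet_counts)
```

With the question's data this could be called as word_frequencies(df_final['text'].dropna(), stopwords=...). Tokens such as "rt" and account handles would be counted too unless they are added to the stopwords.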
Then pass the dict to .generate_from_frequencies, which has the signature:
generate_from_frequencies(frequencies, max_font_size=None)
from wordcloud import WordCloud
wc = WordCloud(width=800, height=400, max_words=200).generate_from_frequencies(data)
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 10))
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.show()
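import numpy as np
from PIL import Image

# Load the mask image as an array
twitter_mask = np.array(Image.open('twitter.png'))

# data_nyt is not defined in this post; presumably a word -> count dict built like data
wc = WordCloud(background_color='white', width=800, height=400,
               max_words=200, mask=twitter_mask).generate_from_frequencies(data_nyt)

# Show the masked word cloud
plt.figure(figsize=(10, 10))
plt.imshow(wc, interpolation='bilinear')
plt.axis("off")

# Show the mask itself for comparison
plt.figure()
plt.imshow(twitter_mask, cmap=plt.cm.gray, interpolation='bilinear')
plt.axis("off")
plt.show()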