
How to create a wordcloud from the word frequencies in a pandas dataframe

python

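I have to draw a wordcloud. "tweets.csv" is a pandas dataframe with a column named "text". The resulting plot, though, is not based on the most frequent words. How do I tie each word's size to its frequency in the dataframe?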

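import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS

# df_final: the tweets dataframe (from tweets.csv) with a 'text' column
# stopwords: your own extra words to exclude, merged with the built-in STOPWORDS
stopwords = set()

text = df_final.text.values
wordcloud = WordCloud(
    #mask = logomask,
    max_words = 1000,
    width = 600,
    height = 400,
    #max_font_size = 1000,
    #min_font_size = 100,
    normalize_plurals = True,
    #scale = 5,
    #relative_scaling = 0,
    background_color = 'black',
    stopwords = STOPWORDS.union(stopwords)
).generate(" ".join(text))  # join all tweets; str(text) truncates large arrays with "..."
fig = plt.figure(
    figsize = (50,40),
    facecolor = 'k',
    edgecolor = 'k')
plt.imshow(wordcloud, interpolation = 'bilinear')
plt.axis('off')
plt.tight_layout(pad=0)
plt.show()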
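Passing `str(text)` to `generate()` instead of a joined string can silently drop most of the tweets: NumPy summarizes any array with more than 1000 elements (its default print threshold), keeping only the first and last few items around a "...". A minimal sketch with hypothetical data standing in for `df_final.text.values`:

```python
import numpy as np

# Hypothetical stand-in for df_final.text.values: 2,000 short tweets
text = np.array([f"tweet number {i}" for i in range(2000)], dtype=object)

# str() on an array above NumPy's print threshold (1000 elements by default)
# renders only a few items from each end, separated by "..."
as_str = str(text)

# " ".join() concatenates every element, so no tweet is lost
joined = " ".join(text)

print("..." in as_str)         # the middle of the array was summarized away
print(joined.count("tweet"))   # every tweet is present
print(as_str.count("tweet"))   # only a handful survive
```

Word counts computed from `as_str` reflect just those few surviving tweets, which is why the sizes don't match the real frequencies.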

My dataframe looks like this:

0   RT @Pontifex_pt: Temos que descobrir as riquezezas ...
1   RT @Pontifex_pt: Todos estamos em viagem rumo ...
2   RT @Pontifex_pt: Unamos as forças, em todos ...
3   RT @GeneralMourao: #Segurançapública, preocupa ...
4   RT @FIFAcom: The Brasileirao U-17 final provided ...

2020-12-20

1 Answer


Set up a sample dataframe:

  • See also DataCamp: Generating WordClouds in Python

    import pandas as pd

    df = pd.DataFrame({'word': ['how', 'are', 'you', 'doing', 'this', 'afternoon'],
                       'count': [7, 10, 4, 1, 20, 100]})

Convert the 'word' and 'count' columns to a dict:

  • WordCloud().generate_from_frequencies() requires a dict

    data = dict(zip(df['word'].tolist(), df['count'].tolist()))

    print(data)

    {'how': 7, 'are': 10, 'you': 4, 'doing': 1, 'this': 20, 'afternoon': 100}
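The tweets in the question have no ready-made count column, so the dict has to be built from the raw text first. A minimal sketch using `collections.Counter`; the cleaning rules (lowercasing, dropping @mentions and URLs, splitting on word characters) and the helper name `word_frequencies` are hypothetical choices, not part of the wordcloud library:

```python
import re
from collections import Counter


def word_frequencies(texts, stopwords=frozenset()):
    """Hypothetical helper: count words across an iterable of strings (e.g. df['text']).

    Assumed cleaning rules, adjust to taste:
    - text is lowercased
    - @mentions and http(s) URLs are removed
    - tokens are runs of Unicode word characters, so accented words like 'forças' survive
    """
    counts = Counter()
    for text in texts:
        cleaned = re.sub(r"@\w+|https?://\S+", " ", str(text).lower())
        tokens = re.findall(r"\w+", cleaned)
        counts.update(t for t in tokens if t not in stopwords)
    return dict(counts)


tweets = [
    "RT @Pontifex_pt: Todos estamos em viagem",
    "RT @Pontifex_pt: Unamos as forças, em todos",
]
freqs = word_frequencies(tweets, stopwords={"rt", "as", "em"})
print(freqs)
```

The resulting dict has the same shape as `data` above, so it can be passed straight to `WordCloud().generate_from_frequencies(freqs)`, and font sizes then follow these counts.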

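Wordcloud:

from wordcloud import WordCloud

wc = WordCloud(width=800, height=400, max_words=200).generate_from_frequencies(data)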

Plot:

import matplotlib.pyplot as plt

plt.figure(figsize=(10, 10))
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.show()

With an image mask:

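import numpy as np
from PIL import Image
from wordcloud import WordCloud

# WordCloud leaves the pure-white areas of the mask empty; when a mask is given,
# its shape replaces width and height
twitter_mask = np.array(Image.open('twitter.png'))
wc = WordCloud(background_color='white', width=800, height=400, max_words=200,
               mask=twitter_mask).generate_from_frequencies(data)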

plt.figure(figsize=(10, 10))
plt.imshow(wc, interpolation='bilinear')
plt.axis("off")
plt.figure()
plt.imshow(twitter_mask, cmap=plt.cm.gray, interpolation='bilinear')
plt.axis("off")
plt.show()
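If no mask image is at hand, a mask can also be generated with NumPy. WordCloud treats pure-white (255) pixels as masked out and draws words only in the remaining area. A minimal sketch of a circular mask; the 300-pixel size and 130-pixel radius are arbitrary, hypothetical values:

```python
import numpy as np

# Hypothetical mask dimensions
size, radius = 300, 130

# Row and column index grids that broadcast to a (size, size) array
y, x = np.ogrid[:size, :size]

# True for pixels outside the circle centred in the image
outside = (x - size // 2) ** 2 + (y - size // 2) ** 2 > radius ** 2

# Outside pixels become white (255, masked out); inside pixels stay 0 (drawable)
circle_mask = (255 * outside).astype(np.uint8)

print(circle_mask.shape, circle_mask[size // 2, size // 2], circle_mask[0, 0])
```

It can be used in place of `twitter_mask`, e.g. `WordCloud(background_color='white', mask=circle_mask).generate_from_frequencies(data)`.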
2020-12-20