How to Make a Word Cloud

The image below shows a word cloud generated from the text of this post.


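First, install the wordcloud package. If your text is in Chinese, also install jieba, a Chinese word-segmentation package. The complete code at the end also uses requests, BeautifulSoup (package name beautifulsoup4), and the lxml parser to download the text.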

pip install jieba
pip install wordcloud
pip install requests beautifulsoup4 lxml
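To confirm the installation, here is a small standard-library sketch that checks whether each module can be found. The module list is an assumption based on the imports in the complete code (note that beautifulsoup4 is imported as `bs4`).

```python
import importlib.util

# Modules imported by the complete code (beautifulsoup4 is imported as "bs4")
REQUIRED_MODULES = ["jieba", "wordcloud", "requests", "bs4", "lxml"]


def missing_modules(names):
    """Return the top-level module names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All packages are installed.")
```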


The text for the word cloud above comes from this blog post: the page is downloaded with requests, and the text of its paragraphs is extracted with BeautifulSoup. Alternatively, you can read the text from a file or paste it directly into the program.
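For the text-file option, a minimal sketch is shown below. The helper name and the file name `blog.txt` are hypothetical, and the file is assumed to be UTF-8 encoded, which matters for Chinese text.

```python
from pathlib import Path


def read_text_file(path, encoding="utf-8"):
    """Read a whole text file into one string (hypothetical helper)."""
    return Path(path).read_text(encoding=encoding)
```

Usage: `blog_text = read_text_file("blog.txt")`, then continue with segmentation as below.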

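import requests
from bs4 import BeautifulSoup

url = 'https://blog.tennisatw.com/post/26/'
r = requests.get(url=url).text      # download the page HTML
soup = BeautifulSoup(r, 'lxml')     # parse it with the lxml parser
paragraphs = soup.find_all('p')     # all <p> elements

# Join the paragraph texts; the newline keeps the last word of one
# paragraph from running into the first word of the next
blog_text = ''
for text in paragraphs:
    blog_text += text.text + '\n'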

# Or paste the text directly into the program:
blog_text = 'your text here'
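To show what `soup.find_all('p')` followed by `.text` collects, here is a standard-library sketch that swaps BeautifulSoup for `html.parser.HTMLParser`. It is a simplified stand-in with hypothetical names: unlike BeautifulSoup with lxml, it does not repair malformed HTML or close a `<p>` that lacks an explicit end tag.

```python
from html.parser import HTMLParser


class ParagraphTextParser(HTMLParser):
    """Collect the text inside each <p> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []    # finished paragraph texts
        self._in_p = False      # True while inside a <p> element
        self._buffer = []       # text pieces of the current paragraph

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._in_p = False
            self.paragraphs.append("".join(self._buffer))

    def handle_data(self, data):
        # Text inside nested tags such as <b> or <a> is kept as well
        if self._in_p:
            self._buffer.append(data)


def extract_paragraph_text(html):
    """Return the text of all <p> elements, one paragraph per line."""
    parser = ParagraphTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.paragraphs)
```

Text outside `<p>` elements (headings, navigation, footers) is skipped, which is why the original code only picks up the body of the post.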


Chinese does not put spaces between words, while wordcloud splits text into words at spaces and punctuation. Chinese text therefore has to be segmented into words with jieba first:

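import jieba

ls = jieba.lcut(blog_text)   # segment the text into a list of words
text = ' '.join(ls)          # rejoin with spaces so wordcloud can split it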
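Why the spaces matter, and what happens to the words next, can be sketched with the standard library. This is a simplified illustration of the counting step, not wordcloud's algorithm: it splits on whitespace only and drops stopwords, whereas `WordCloud` tokenizes with a regular expression, merges plurals, and by default also counts two-word collocations. The segmented sentence is written by hand to mimic the space-separated output of `' '.join(jieba.lcut(...))`; it is not actual jieba output.

```python
from collections import Counter


def top_words(text, stopwords=frozenset(), max_words=80):
    """Count whitespace-separated words, skipping stopwords.

    A simplified stand-in for the word-counting step of a word cloud.
    """
    words = [w for w in text.split() if w not in stopwords]
    return Counter(words).most_common(max_words)


# Unsegmented Chinese text is one long "word"
print(top_words("我喜欢打网球"))  # → [('我喜欢打网球', 1)]

# Hand-segmented text: each word is counted separately, and "我" is dropped
print(top_words("我 喜欢 打 网球 我 喜欢 网球", stopwords={"我"}))
```

The more often a word is counted, the larger it is drawn; `max_words` plays the same role as the parameter of that name in the complete code.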


Below is the complete code:

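import requests
from bs4 import BeautifulSoup
import jieba
import wordcloud

# Download the blog post and extract the text of every <p> element
url = 'https://blog.tennisatw.com/post/26/'
r = requests.get(url=url).text
soup = BeautifulSoup(r, 'lxml')
paragraphs = soup.find_all('p')

blog_text = ''
for text in paragraphs:
    blog_text += text.text + '\n'

# Segment the Chinese text into words and join them with spaces
# (for English text, text = blog_text is enough)
ls = jieba.lcut(blog_text)
text = ' '.join(ls)

# Built-in English stopwords plus a few common Chinese function words
stopwords = wordcloud.STOPWORDS | {"的", "是", "了", "我"}

# font_path must point to a font with Chinese glyphs; "msyh.ttc" is
# Microsoft YaHei on Windows, so use a CJK font available on your system
wc = wordcloud.WordCloud(font_path="msyh.ttc",
                         width=800,
                         height=600,
                         background_color='white',
                         max_words=80,        # show at most 80 words
                         max_font_size=150,   # font size of the largest word
                         stopwords=stopwords)

wc.generate(text)             # count word frequencies and lay out the cloud
wc.to_file("wordcloud.png")   # save the image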
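`msyh.ttc` (Microsoft YaHei) ships with Windows only. On other systems, `font_path` needs to point to another font that contains Chinese glyphs, otherwise the words are drawn as empty boxes. The sketch below returns the first existing file from a list of candidates. The paths are assumed, commonly seen locations and may not exist on your system, and the helper name is hypothetical.

```python
from pathlib import Path

# Assumed example locations of fonts with Chinese glyphs; adjust for your system
CANDIDATE_FONTS = [
    "C:/Windows/Fonts/msyh.ttc",                               # Windows: Microsoft YaHei
    "/System/Library/Fonts/PingFang.ttc",                      # macOS: PingFang
    "/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc",  # Linux: Noto Sans CJK
]


def find_font(candidates=CANDIDATE_FONTS):
    """Return the first candidate path that is an existing file, or None."""
    for candidate in candidates:
        if Path(candidate).is_file():
            return str(candidate)
    return None
```

Usage: call `font = find_font()` and, if it is not `None`, pass `font_path=font` to `wordcloud.WordCloud` instead of `"msyh.ttc"`.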