Feeling a bit worn out lately, so just a little bit today. Today I'm learning to chop text into words and hand it to the machine.
import jieba
# Word segmentation
text = "我星期天早上想去爬山,我是台中人所以想去玩"
words = jieba.lcut(text)
print(words)
# Filter out unwanted words
with open('stopwords.txt', 'r', encoding='utf-8') as file:
    stopwords = set(word.strip() for line in file for word in line.split(','))
filtered_word = [word for word in words if word not in stopwords]
print(filtered_word)
Output without filtering: ['我', '星期天', '早上', '想', '去', '爬山', ',', '我', '是', '台中人', '所以', '想', '去', '玩']
Output with filtering: ['我', '星期天', '早上', '想', '去', '爬山', ',', '我', '台中人', '想', '去', '玩']
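One thing the filtered output above shows is that the comma token survived. A minimal sketch of the same filtering step, using the token list copied from the jieba output so it runs without jieba or a stopwords.txt file (the stopword and punctuation sets here are my own guesses, not the actual file contents):

```python
# Tokens copied from the jieba output above, so no jieba install is needed
words = ['我', '星期天', '早上', '想', '去', '爬山', ',',
         '我', '是', '台中人', '所以', '想', '去', '玩']

stopwords = {'是', '所以'}              # hypothetical contents of stopwords.txt
punctuation = {',', ',', '。', '!'}     # also drop punctuation tokens

# Same list comprehension as before, with punctuation added to the filter
filtered = [w for w in words if w not in stopwords and w not in punctuation]
print(filtered)
```

Adding punctuation to the filter set (or keeping a separate punctuation set, as here) is a common way to clean up that stray comma.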
But I think the hardest part is choosing the stopwords. I'm not really sure how to go about that. I'd really like to rest tomorrow q_q
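On picking stopwords: one common starting point (not the only way, just a heuristic) is to count word frequencies in your own corpus and review the most frequent tokens by hand, since short high-frequency function words are usually stopword candidates. A sketch using the sentence above as a toy corpus:

```python
from collections import Counter

# Toy "corpus": the segmented tokens from above, punctuation removed
words = ['我', '星期天', '早上', '想', '去', '爬山',
         '我', '是', '台中人', '所以', '想', '去', '玩']

counts = Counter(words)
# Inspect the most frequent tokens; frequent, short function-like words
# (我, 想, 去, ...) are typical stopword candidates for manual review
for word, freq in counts.most_common(5):
    print(word, freq)
```

With a real corpus of many sentences, the top of this list tends to stabilize on function words, which makes the manual review much less guesswork.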