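Summary: This article explains how to crawl the comments of a video and run sentiment analysis on them. A worked example and simple steps show how web scraping and sentiment analysis can be applied to data processing.
With the explosive growth of online content, video comments have become an important channel for user feedback and opinion. Crawling and analyzing these comments reveals users' preferences, attitudes, and emotional leanings, which can inform business decisions and content creation. This article uses Bilibili as the example site and walks through crawling the comments and analyzing their sentiment.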
```python
import requests
import pandas as pd

# Example: crawl the comments of a Bilibili video
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}
video_id = 'your video ID'
page = 1
comments = []
while True:
    url = f'https://api.bilibili.com/x/v2/reply/main?csrf=xxx&mode=3&next={page}&oid={video_id}&plat=1&type=1'
    response = requests.get(url, headers=headers, timeout=10)
    if response.status_code != 200:
        break
    data = response.json()
    # Stop when no replies are left; 'data' or 'replies' may be missing or null
    replies = (data.get('data') or {}).get('replies')
    if not replies:
        break
    for reply in replies:
        comments.append(reply['content']['message'])
    page += 1

# Save the collected comments to a CSV file
df = pd.DataFrame(comments, columns=['comment'])
df.to_csv('video_comments.csv', index=False)
```
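The response-handling step of the crawler above can be isolated into a small helper, which makes it easier to check without touching the network. This is a minimal sketch: the function name `extract_messages` and the sample payloads are hypothetical, and the nested layout (`data` → `replies` → `content` → `message`) is simply the one the crawler code already assumes.

```python
def extract_messages(payload):
    """Return the comment texts from one decoded reply response.

    Assumes the layout used by the crawler code:
    payload['data']['replies'][i]['content']['message'].
    Returns an empty list when any level is missing or null.
    """
    data = payload.get("data") or {}
    replies = data.get("replies") or []
    messages = []
    for reply in replies:
        content = reply.get("content") or {}
        message = content.get("message")
        if message:
            messages.append(message)
    return messages


# Hand-written sample payloads (not real API output) to show the behavior.
sample = {
    "code": 0,
    "data": {
        "replies": [
            {"content": {"message": "Nice video"}},
            {"content": {"message": "Too long"}},
        ]
    },
}
empty = {"code": 0, "data": {"replies": None}}

print(extract_messages(sample))
print(extract_messages(empty))
```

Inside the crawl loop, an empty return value would serve as the stop condition, replacing the direct dictionary lookups.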
```python
import jieba

# Example: segment each comment into words
comments_seg = [jieba.lcut(comment) for comment in df['comment']]
```
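Segmented comments usually still contain punctuation, whitespace, and very common function words. If the token lists are going to be inspected or counted, a cleanup pass like the one below may help. This is only a sketch: `clean_tokens`, the tiny `STOPWORDS` set, and the sample token list are hypothetical, and the sample is written by hand in the shape of `jieba.lcut` output rather than produced by jieba.

```python
import string

# Hypothetical stop-word set for illustration only; a real project would load a fuller list.
STOPWORDS = {"的", "了", "是", "我", "也"}
# ASCII punctuation plus a few common full-width marks.
PUNCTUATION = set(string.punctuation) | set(",。!?、;:“”‘’()…~")


def clean_tokens(tokens, stopwords=STOPWORDS):
    """Drop whitespace, punctuation-only tokens, and stop words from one segmented comment."""
    cleaned = []
    for token in tokens:
        token = token.strip()
        if not token:
            continue  # whitespace-only token
        if all(ch in PUNCTUATION for ch in token):
            continue  # token made only of punctuation
        if token in stopwords:
            continue
        cleaned.append(token)
    return cleaned


# Token list shaped like jieba.lcut output (hand-written, not produced by jieba).
sample_tokens = ["这个", "视频", "真的", "太", "好看", "了", "!", " "]
print(clean_tokens(sample_tokens))
```

Applied to the whole dataset, this would be `[clean_tokens(seg) for seg in comments_seg]`.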
```python
from snownlp import SnowNLP

sentiments = []
for comment_seg in comments_seg:
    s = ''.join(comment_seg)
    snow = SnowNLP(s)
    sentiments.append(snow.sentiments)  # sentiment score in [0, 1]; closer to 1 means more positive
df['sentiment'] = sentiments
df['sentiment_category'] = df['sentiment'].apply(lambda x: 'positive' if x > 0.5 else 'negative')
positive_count = df[df['sentiment_category'] == 'positive'].shape[0]
negative_count = df[df['sentiment_category'] == 'negative'].shape[0]
print(f'Positive comments: {positive