Processing Meituan Merchant Data with Python (Redis Example)
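Redis: Storing and Analyzing Meituan Merchant Information
- Process the Meituan merchant dataset, clean the data, and store it in a Redis database
- Read the data back and analyze it to surface insights that can inform operations strategy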
Install the Redis client module
pip install redis  # install the redis client package
Storing the data
Data initialization
import pandas as pd
df = pd.read_csv('温州美团店铺爬取.csv', encoding='utf-8')
df = df.dropna().drop(['id'], axis=1)  # drop rows with missing values, then remove the id column
df
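Note that `dropna()` with no arguments discards a row if any column is empty, including columns the analysis never uses. The following is a minimal sketch of a narrower cleaning step, assuming only the six columns written to Redis later need to be complete. The sample rows are hypothetical stand-ins for the CSV, and `REQUIRED` and `clean` are illustrative names, not part of the original code.

```python
import pandas as pd

# Hypothetical sample rows standing in for the CSV; the column names
# match the ones this article reads from the real file.
sample = pd.DataFrame({
    'id': [1, 2, 3],
    '店名': ['A', 'B', None],
    '店铺ID': [101, 102, 103],
    '评分': [4.8, None, 4.1],
    '地址': ['x', 'y', 'z'],
    '评论数': [3200, 15, 7],
    '平均价格': [58.0, 22.0, 35.0],
})

# Columns that are later written to Redis (assumed to be the only required ones).
REQUIRED = ['店名', '店铺ID', '评分', '地址', '评论数', '平均价格']


def clean(frame):
    """Drop rows missing a required field, then drop the id column if present."""
    cleaned = frame.dropna(subset=REQUIRED)
    cleaned = cleaned.drop(columns=['id'], errors='ignore')
    return cleaned.reset_index(drop=True)


cleaned = clean(sample)
print(cleaned)
```

Passing `errors='ignore'` keeps the step from failing if the file has no `id` column.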
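Database connection
import redis
# decode_responses=True makes the client return str instead of bytes
pool = redis.ConnectionPool(host='127.0.0.1', port=6379, decode_responses=True, encoding='UTF-8')
r = redis.StrictRedis(connection_pool=pool)  # connect to Redis
If needed, clear the database at this point. Warning: this deletes everything stored in the Redis database!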
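The command used for this step is not shown in the text. One possible sketch, assuming redis-py's `flushdb()` is what is wanted (it empties only the currently selected database, whereas `flushall()` empties every database on the server), wraps the call in an explicit confirmation so it cannot run by accident. The helper name `clear_database` is hypothetical.

```python
def clear_database(client, confirm=False):
    """Delete every key in the currently selected Redis database.

    `client` is any object exposing redis-py's `flushdb()` method, such as
    the `r` connection created above. Nothing is deleted unless
    confirm=True is passed explicitly.
    """
    if not confirm:
        return False
    client.flushdb()
    return True
```

Calling `clear_database(r, confirm=True)` before a fresh import avoids stale keys being picked up by the later `r.keys()` reads.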
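Writing the data to the database
Each merchant's information is written to Redis as a List, keyed by store name.
# Columns: store name, store ID, rating, address, review count, average price
for a in zip(df['店名'], df['店铺ID'], df['评分'], df['地址'], df['评论数'], df['平均价格']):
    # LPUSH inserts each value at the head, so the stored order is reversed:
    # [average price, review count, address, rating, store ID]
    r.lpush(a[0], a[1], a[2], a[3], a[4], a[5])
    print(r.lrange(a[0], 0, -1))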
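Storing the fields as a List means they can only be read back by position, and running the loop twice appends a second copy of the values under the same key. As an alternative (not the approach this article uses), here is a minimal sketch that stores each merchant as a Redis hash with named fields. It relies on redis-py's `hset(name, mapping=...)` and `hgetall(name)`. The helper names and the `FIELDS` labels are assumptions chosen to match the column names used later.

```python
# Field labels, in the same order as the values passed to the writer.
FIELDS = ('id', 'score', 'dir', 'num', 'price')


def store_merchant_as_hash(client, name, values):
    """Write one merchant as a Redis hash so fields can be read by name.

    `client` is a redis-py connection (or anything with a compatible
    `hset`); `values` holds the five fields in FIELDS order.
    """
    record = {field: str(value) for field, value in zip(FIELDS, values)}
    client.hset(name, mapping=record)
    return record


def load_merchant(client, name):
    """Read one merchant back as a dict of field name to value."""
    return client.hgetall(name)
```

Writing the same store name again overwrites its fields instead of appending. Keys written this way must be read with `hgetall`; the `lrange`-based reading code in this article would not work on them.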
Data analysis
Process the data to extract insights that can inform operations strategy
List the keys
keys = r.keys()
print(keys)
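`r.keys()` walks the whole keyspace in a single call, which is fine for a small local dataset but can stall a busy server. A hedged sketch of the incremental alternative, using redis-py's `scan_iter` (the helper name `list_keys` and the default batch size are assumptions):

```python
def list_keys(client, pattern='*', batch=500):
    """Collect key names with incremental SCAN instead of one KEYS call.

    `client` is a redis-py connection; `pattern` is a glob-style match and
    `batch` is only a hint to the server about how many keys to examine
    per iteration.
    """
    return sorted(client.scan_iter(match=pattern, count=batch))
```

With the connection from above, `list_keys(r)` returns the same names as `r.keys()`, sorted.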
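Read all records
keys = r.keys()
ids = []
scores = []
addresses = []
nums = []
prices = []
for key in keys:
    # LPUSH stored the fields in reverse order:
    # [price, review count, address, rating, store ID]
    key_list = r.lrange(key, 0, -1)
    ids.append(key_list[4])
    scores.append(key_list[3])
    addresses.append(key_list[2])
    nums.append(key_list[1])
    prices.append(key_list[0])
df = pd.DataFrame()
df['name'] = keys        # merchant name
df['id'] = ids           # merchant ID
df['score'] = scores     # merchant rating
df['dir'] = addresses    # merchant address
df['num'] = nums         # review count
df['price'] = prices     # average spend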
df
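The same read can be written as a single pass that unpacks each list into named variables and builds the DataFrame from rows. This is a sketch under the assumption that every key holds a list written by the loop above; keys with fewer than five values are skipped rather than raising an error. `load_merchants` and `COLUMNS` are illustrative names.

```python
import pandas as pd

COLUMNS = ['name', 'id', 'score', 'dir', 'num', 'price']


def load_merchants(client, keys):
    """Build a DataFrame from the per-merchant lists stored in Redis.

    Each list is expected in the order produced by LPUSH:
    [price, review count, address, rating, store ID].
    """
    rows = []
    for key in keys:
        values = client.lrange(key, 0, -1)
        if len(values) < 5:
            continue  # not a complete merchant record
        price, num, address, score, shop_id = values[:5]
        rows.append((key, shop_id, score, address, num, price))
    return pd.DataFrame(rows, columns=COLUMNS)
```

With the connection from above, `df = load_merchants(r, r.keys())` yields the same columns as the step-by-step version.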
High-rated merchants
Merchants with a rating above 4.5
df['num']=df['num'].astype('int')
df['price']=df['price'].astype('double')
df['score']=df['score'].astype('double')
df1=df[df['num']>=0]
df2=df1[df1['price']>0]
df3=df2[df2['score']>4.5]
df4=df3[df3['score']<=5]
df4.index = range(len(df4))
df4
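The chain of intermediate frames (`df1` through `df4`) can also be expressed as one boolean mask. The sketch below assumes the columns `num`, `price`, and `score` as built above. It uses `pd.to_numeric(..., errors='coerce')` so that values that cannot be parsed become missing and are dropped instead of raising. The function names and parameters are hypothetical.

```python
import pandas as pd


def to_numeric_columns(frame):
    """Convert num, price and score to numbers, dropping unparseable rows."""
    out = frame.copy()
    for col in ('num', 'price', 'score'):
        out[col] = pd.to_numeric(out[col], errors='coerce')
    return out.dropna(subset=['num', 'price', 'score'])


def select_merchants(frame, min_num=0, min_price=0.0, max_price=None, min_score=0.0):
    """Filter merchants with one combined mask.

    Bounds follow the article's filters: num >= min_num, price > min_price,
    price < max_price (if given), and min_score < score <= 5.
    """
    mask = (
        (frame['num'] >= min_num)
        & (frame['price'] > min_price)
        & (frame['score'] > min_score)
        & (frame['score'] <= 5)
    )
    if max_price is not None:
        mask &= frame['price'] < max_price
    return frame[mask].reset_index(drop=True)
```

For example, `select_merchants(to_numeric_columns(df), min_score=4.5)` corresponds to the high-rated filter above.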
Popular merchants
At least 3,000 reviews (num >= 3000)
df['num']=df['num'].astype('int')
df['price']=df['price'].astype('double')
df['score']=df['score'].astype('double')
df1=df[df['num']>=3000]
df2=df1[df1['price']>0]
df3=df2[df2['score']>0]
df4=df3[df3['score']<=5]
df4.index = range(len(df4))
df4
Top merchants
Meet both the popular and the high-rated criteria
df['num']=df['num'].astype('int')
df['price']=df['price'].astype('double')
df['score']=df['score'].astype('double')
df1=df[df['num']>=3000]
df2=df1[df1['price']>0]
df3=df2[df2['score']>4.5]
df4=df3[df3['score']<=5]
df4.index = range(len(df4))
df4
Affordable restaurants
Average spend per person between 5 and 50 yuan
df['num']=df['num'].astype('int')
df['price']=df['price'].astype('double')
df['score']=df['score'].astype('double')
df1=df[df['num']>=0]
df2=df1[df1['price']>5]
df3=df2[df2['score']>0]
df4=df3[df3['score']<=5]
df5=df4[df4['price']<50]
df5.index = range(len(df5))
df5
Premium restaurants
Average spend per person above 200 yuan
df['num']=df['num'].astype('int')
df['price']=df['price'].astype('double')
df['score']=df['score'].astype('double')
df1=df[df['num']>=0]
df2=df1[df1['price']>200]
df3=df2[df2['score']>0]
df4=df3[df3['score']<=5]
df4.index = range(len(df4))
df4
Premium select restaurants
Average spend above 200 yuan and a rating above 4.5
df['num']=df['num'].astype('int')
df['price']=df['price'].astype('double')
df['score']=df['score'].astype('double')
df1=df[df['num']>=0]
df2=df1[df1['price']>200]
df3=df2[df2['score']>4.5]
df4=df3[df3['score']<=5]
df4.index = range(len(df4))
df4
Perfect-score select merchants
A rating of 5.0 and at least 1,000 reviews
df['num']=df['num'].astype('int')
df['price']=df['price'].astype('double')
df['score']=df['score'].astype('double')
df1=df[df['num']>=1000]
df2=df1[df1['price']>0]
df3=df2[df2['score']==5]
df3.index = range(len(df3))
df3
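To compare the segments at a glance, their sizes can be counted in one place. This is a rough sketch that restates the thresholds from the sections above as boolean rules; it assumes `num`, `price`, and `score` are already numeric. The segment labels and the helper `segment_counts` are hypothetical names.

```python
import pandas as pd

# Segment rules restated from the filters in this article.
SEGMENTS = {
    'high_rated': lambda d: d['score'] > 4.5,
    'popular': lambda d: d['num'] >= 3000,
    'top': lambda d: (d['num'] >= 3000) & (d['score'] > 4.5),
    'affordable': lambda d: (d['price'] > 5) & (d['price'] < 50),
    'premium': lambda d: d['price'] > 200,
    'premium_select': lambda d: (d['price'] > 200) & (d['score'] > 4.5),
    'perfect_score': lambda d: (d['num'] >= 1000) & (d['score'] == 5),
}


def segment_counts(frame):
    """Count the merchants in each segment.

    Every segment also requires the sanity checks used throughout:
    num >= 0, price > 0 and 0 < score <= 5.
    """
    valid = (
        (frame['num'] >= 0)
        & (frame['price'] > 0)
        & (frame['score'] > 0)
        & (frame['score'] <= 5)
    )
    return pd.Series(
        {name: int((valid & rule(frame)).sum()) for name, rule in SEGMENTS.items()}
    )
```

Calling `segment_counts(df)` after the type conversions returns one count per segment.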
Reposted from: https://juejin.cn/post/6975417490666946567