Pandas的透视表


目录:

透视表

  • 透视表是一种可以对数据动态排布并且分类汇总的表格格式。或许大多数人都在Excel使用过数据透视表,也体会到它的强大功能,而在pandas中它被称作pivot_table。

  • 透视表的优点:

    • 灵活性高,可以随意定制你的分析计算要求
    • 脉络清晰易于理解数据
    • 操作性强,报表神器
# 使用pivot_table创建透视表
pivot_table(data,         # DataFrame
      values=None,        # 值
      index=None,         # 分类汇总依据
      columns=None,       # 列
      aggfunc='mean',     # 聚合函数
      fill_value=None,    # 对缺失值的填充
      margins=False,      # 是否启用总计行/列
      dropna=True,        # 删除缺失
      margins_name='All'  # 总计行/列的名称
      )

pivot_table有四个最重要的参数index、values、columns、aggfunc

  • index参数:分类汇总的分类条件

    • 每个pivot_table必须拥有一个index。如果想查看哈登对阵每个队伍的得分则需要对每一个队进行分类并计算其各类得分的平均值:
  • Columns:可以设置列层次字段

    • 对values字段进行分类

数据透视表

import pandas as pd
import numpy as np
from sqlalchemy import create_engine

engine = create_engine(
    'mysql+pymysql://root:root123@127.0.0.1:3306/testdb?charset=utf8')


detail = pd.read_sql_table('meal_order_detail1',con = engine)

detailPivot = pd.pivot_table(detail[[
      'order_id','counts','amounts']],
      index = 'order_id')
print('以order_id作为分组键创建的订单透视表为:\n',
       detailPivot.head())

detailPivot1 = pd.pivot_table(detail[[
      'order_id','counts','amounts']],
      index = 'order_id',aggfunc = np.sum)
print('以order_id作为分组键创建的订单销量与售价总和透视表为:\n',
       detailPivot1.head())

detailPivot2 = pd.pivot_table(detail[[
      'order_id','dishes_name',
      'counts','amounts']],
      index = ['order_id','dishes_name'],
      aggfunc = np.sum)
print('以order_id和dishes_name作为分组键创建的订单\
销量与售价总和透视表为:\n',detailPivot2.head())

detailPivot3 = pd.pivot_table(detail[[
      'order_id','dishes_name','counts','amounts']],
      index = 'order_id',
      columns = 'dishes_name',
      aggfunc = np.sum)
print('以order_id和dishes_name作为行列分组键创建的\
透视表前5行4列为:\n',detailPivot3.iloc[:5,:4])

detailPivot4 = pd.pivot_table(detail[[
      'order_id','dishes_name','counts','amounts']],
      index = 'order_id',
      values = 'counts',
      aggfunc = np.sum)
print('以order_id作为行分组键counts作为值创建的\
透视表前5行为:\n',detailPivot4.head())

detailPivot5 = pd.pivot_table(detail[[
      'order_id','dishes_name','counts','amounts']],
      index = 'order_id',
      columns = 'dishes_name',
      aggfunc = np.sum,fill_value = 0)
print('空值填0后以order_id和dishes_name为行列分组键\
创建透视表前5行4列为:\n',detailPivot5.iloc[:5,:4])

detailPivot6 = pd.pivot_table(detail[[
      'order_id','dishes_name','counts','amounts']],
      index = 'order_id',columns = 'dishes_name',
      aggfunc = np.sum,fill_value = 0,
      margins = True)
print('添加margins后以order_id和dishes_name为分组键\
的透视表前5行后4列为:\n',detailPivot6.iloc[:5,-4:])

detailCross = pd.crosstab(
      index=detail['order_id'],
      columns=detail['dishes_name'],
      values = detail['counts'],aggfunc = np.sum)
print('以order_id和dishes_name为分组键\
counts为值的透视表前5行5列为:\n',detailCross.iloc[:5,:5])


detail = pd.read_sql_table('meal_order_detail1',con = engine)
detail['place_order_time'] = pd.to_datetime(
      detail['place_order_time'])
detail['date'] = [i.date() for i in detail['place_order_time']]
PivotDetail = pd.pivot_table(detail[[
      'date','dishes_name','counts','amounts']],
      index ='date',aggfunc = np.sum,
      margins = True)
print('订单详情表单日菜品成交总额与总数透视表前5行5列为:\n',
      PivotDetail.head())

CrossDetail = pd.crosstab(
      index=detail['date'],columns=detail['dishes_name'],
      values = detail['amounts'],
      aggfunc = np.sum,margins = True)
print('订单详情表单日单个菜品成交总额交叉表后5行5列为:\n',
      CrossDetail.iloc[-5:,-5:])