Pandas: How to Group By Index and Perform Calculations

Site admin

April 9, 2024, 14:02

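You can use the following methods to group by one or more index columns in pandas and perform a calculation on each group.

Method 1: Group By One Index Column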

df.groupby('index1')['numeric_column'].max()

Method 2: Group By Multiple Index Columns

df.groupby(['index1', 'index2'])['numeric_column'].sum()

Method 3: Group By an Index Column and a Regular Column

df.groupby(['index1', 'numeric_column1'])['numeric_column2'].nunique()

The following examples show how to use each method with the pandas DataFrame below, which has a MultiIndex.

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'G', 'F', 'F', 'G', 'G', 'F', 'F', 'F'],
                   'points': [7, 7, 7, 19, 16, 9, 10, 10, 8, 8],
                   'rebounds': [8, 8, 8, 10, 11, 12, 13, 13, 15, 11]})

#set 'team' and 'position' columns to be the index columns
df.set_index(['team', 'position'], inplace=True)

#view DataFrame
df

               points  rebounds
team position                  
A    G              7         8
     G              7         8
     G              7         8
     F             19        10
     F             16        11
B    G              9        12
     G             10        13
     F             10        13
     F              8        15
     F              8        11
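Before grouping, it can help to confirm which names are index levels and which are regular columns, since groupby accepts both. The snippet below is a minimal sketch that rebuilds the same DataFrame so it runs on its own; it assigns the result of set_index instead of using inplace=True, which is only a style choice.

```python
import pandas as pd

# Same data as in the example above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'G', 'F', 'F', 'G', 'G', 'F', 'F', 'F'],
                   'points': [7, 7, 7, 19, 16, 9, 10, 10, 8, 8],
                   'rebounds': [8, 8, 8, 10, 11, 12, 13, 13, 15, 11]})
df = df.set_index(['team', 'position'])

# Names of the index levels (these can be passed to groupby)
index_names = list(df.index.names)

# Names of the regular columns (these can be passed to groupby too)
column_names = list(df.columns)

print(index_names)   # ['team', 'position']
print(column_names)  # ['points', 'rebounds']
```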

Method 1: Group By One Index Column

The following code shows how to find the maximum value of the 'points' column, grouped by the 'position' index column.

#find max value of 'points' grouped by 'position' index column
df.groupby('position')['points'].max()

position
F    19
G    10
Name: points, dtype: int64
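Passing 'position' as a string works because groupby resolves names against index levels as well as columns. The level keyword of groupby is a more explicit way to request an index level. The sketch below rebuilds the example DataFrame and checks that both spellings give the same result; the variable names by_name and by_level are only illustrative.

```python
import pandas as pd

# Same data as in the example above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'G', 'F', 'F', 'G', 'G', 'F', 'F', 'F'],
                   'points': [7, 7, 7, 19, 16, 9, 10, 10, 8, 8],
                   'rebounds': [8, 8, 8, 10, 11, 12, 13, 13, 15, 11]})
df = df.set_index(['team', 'position'])

# Group by the index level, referring to it by name
by_name = df.groupby('position')['points'].max()

# Group by the same index level, stated explicitly with level=
by_level = df.groupby(level='position')['points'].max()

# Both approaches produce the same Series
print(by_name.equals(by_level))
```

The explicit form may be preferable when a column and an index level could plausibly share a name, since it leaves no doubt about which one is meant.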

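Method 2: Group By Multiple Index Columns

The following code shows how to find the sum of the 'points' column, grouped by the 'team' and 'position' index columns.

#find sum of 'points' grouped by 'team' and 'position' index columns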
df.groupby(['team', 'position'])['points'].sum()

team  position
A     F           35
      G           21
B     F           26
      G           19
Name: points, dtype: int64
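If more than one statistic is needed per group, the same grouping can be combined with agg and a list of function names. This is a small sketch on the example data, not part of the original walkthrough; the variable name summary is illustrative.

```python
import pandas as pd

# Same data as in the example above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'G', 'F', 'F', 'G', 'G', 'F', 'F', 'F'],
                   'points': [7, 7, 7, 19, 16, 9, 10, 10, 8, 8],
                   'rebounds': [8, 8, 8, 10, 11, 12, 13, 13, 15, 11]})
df = df.set_index(['team', 'position'])

# Compute the sum and the max of 'points' for each (team, position) group
summary = df.groupby(['team', 'position'])['points'].agg(['sum', 'max'])

# The result is a DataFrame with one column per statistic
print(summary)
```

The 'sum' column of the result matches the output shown above, and the 'max' column sits next to it for each group.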

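Method 3: Group By an Index Column and a Regular Column

The following code shows how to count the unique values in the 'rebounds' column, grouped by the index column 'team' and the regular column 'points'.

#count unique values of 'rebounds' grouped by 'team' index column and 'points' column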
df.groupby(['team', 'points'])['rebounds'].nunique()

team  points
A     7         1
      16        1
      19        1
B     8         2
      9         1
      10        1
Name: rebounds, dtype: int64
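When mixing an index level with a regular column, the index level can also be named explicitly with pd.Grouper(level=...), which makes it clear that 'team' comes from the index and 'points' from the columns. The sketch below rebuilds the example data and compares both spellings; by_name and by_grouper are illustrative names.

```python
import pandas as pd

# Same data as in the example above
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'G', 'F', 'F', 'G', 'G', 'F', 'F', 'F'],
                   'points': [7, 7, 7, 19, 16, 9, 10, 10, 8, 8],
                   'rebounds': [8, 8, 8, 10, 11, 12, 13, 13, 15, 11]})
df = df.set_index(['team', 'position'])

# Plain strings: pandas looks up 'team' in the index and 'points' in the columns
by_name = df.groupby(['team', 'points'])['rebounds'].nunique()

# Explicit form: pd.Grouper marks 'team' as an index level
by_grouper = df.groupby([pd.Grouper(level='team'), 'points'])['rebounds'].nunique()

# Both spellings yield the same group counts
print(by_name.to_dict() == by_grouper.to_dict())
```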

Reposted from: https://juejin.cn/post/7023306397097918477