pandas pivot table exercises

Basic Data

The test file I use for this looks like this

name,company,amount,product
tim,ibm,22,sw
tim,ibm,22,sw
tim,ibm,22,hw
tim,gbm,35,sw
juliet,indeff,10,net
juliet,indeff,20,net
juliet,indeff,30,sw
juliet,indeff,40,hw
andrew,ibm,4,hw
andrew,gbm,6,hw
andrew,ba,200,sw

Load and make sure it all looks ok

import pandas as pd
import numpy as np

df=pd.read_csv('test.csv',sep=',')
df
name company amount product
0 tim ibm 22 sw
1 tim ibm 22 sw
2 tim ibm 22 hw
3 tim gbm 35 sw
4 juliet indeff 10 net
5 juliet indeff 20 net
6 juliet indeff 30 sw
7 juliet indeff 40 hw
8 andrew ibm 4 hw
9 andrew gbm 6 hw
10 andrew ba 200 sw

Basic Pivot

This provides the mean amount - not very useful

pd.pivot_table(df,index=["name"],values=["amount"])
amount
name
andrew 70.00
juliet 25.00
tim 25.25

Pivot with a sum

#Ah ha ... now we have the sum
pd.pivot_table(df,index=["name"],values=["amount"],aggfunc=np.sum)
amount
name
andrew 210
juliet 100
tim 101

Pivot with occurances

#Now with the number of occurances
pd.pivot_table(df,index=["name"],values=["amount"],aggfunc=np.count_nonzero)
amount
name
andrew 3
juliet 4
tim 4
#Amount per Company
pd.pivot_table(df,index=['company'],values=["amount"],aggfunc=np.sum)
amount
company
ba 200
gbm 41
ibm 70
indeff 100

Pivot per item with a sum

This is now more useful.

#Amount per Company sorted Desc
pd.pivot_table(df,index=["name"],values=["amount"],aggfunc=np.sum).sort_values(by=['amount'],ascending=False)
amount
name
andrew 210
tim 101
juliet 100
#Amount per Person per company
pd.pivot_table(df,index=["name","company"],values=["amount"],aggfunc=np.sum)
amount
name company
andrew ba 200
gbm 6
ibm 4
juliet indeff 100
tim gbm 35
ibm 66

Amount per person per company

Again more details and breakdown

#Lets try and show this another way
#This time with the company's going across the page
pd.pivot_table(df,index=["name"],values=["amount"],aggfunc=np.sum,columns=["company"])
amount
company ba gbm ibm indeff
name
andrew 200.0 6.0 4.0 NaN
juliet NaN NaN NaN 100.0
tim NaN 35.0 66.0 NaN

Per company per person V2

We wish to list all the companies across the top of the document.

The trick here is the columns setting.

#Lets try and show this another way
#This time with the company's going across the page
#This time lets remove the 0's
pd.pivot_table(df,index=["name"],values=["amount"],aggfunc=np.sum,columns=["company"],fill_value=0)
amount
company ba gbm ibm indeff
name
andrew 200 6 4 0
juliet 0 0 0 100
tim 0 35 66 0
#Lets try and show this another way
#This time with the company's going across the page
#This time lets remove the 0's
pd.pivot_table(df,index=["name"],values=["amount"],aggfunc=np.sum,columns=["company"],fill_value=0,margins=True)
amount
company ba gbm ibm indeff All
name
andrew 200.0 6.0 4.0 0.0 210.0
juliet 0.0 0.0 0.0 100.0 100.0
tim 0.0 35.0 66.0 0.0 101.0
All 200.0 41.0 70.0 100.0 411.0
pd.pivot_table(df,index=["name"],values=["amount"],
               aggfunc={"amount":"sum"},
               columns=["company"],fill_value=0,margins=True)
amount
company ba gbm ibm indeff All
name
andrew 200.0 6.0 4.0 0.0 210.0
juliet 0.0 0.0 0.0 100.0 100.0
tim 0.0 35.0 66.0 0.0 101.0
All 200.0 41.0 70.0 100.0 411.0

person and company but with a total

We have added the margins=True clause

#We now want to use the "product"
#We will get the Sum of the values per company/product
#and also display the count per person
pd.pivot_table(df,index=['company','product'],aggfunc={'name':'count',"amount":"sum"},columns=["name"],fill_value=0)
amount name
name andrew juliet tim andrew juliet tim
company product
ba sw 200 0 0 1 0 0
gbm hw 6 0 0 1 0 0
sw 0 0 35 0 0 1
ibm hw 4 0 22 1 0 1
sw 0 0 44 0 0 2
indeff hw 0 40 0 0 1 0
net 0 30 0 0 2 0
sw 0 30 0 0 1 0

Prepare the query and save as a DataFrame

We have build a so we can query it some more.

#Finally as this output is a DataFrame we can filter on this
#Using the above example
res=pd.pivot_table(df,index=['company','product'],aggfunc={'name':'count',"amount":"sum"},columns=["name"],fill_value=0)

Query for a Columns to be a set value

#You want only hw info
res.query('product == ["hw"]')
amount name
name andrew juliet tim andrew juliet tim
company product
gbm hw 6 0 0 1 0 0
ibm hw 4 0 22 1 0 1
indeff hw 0 40 0 0 1 0

Query for hw and sw

#HW and Sw ?
res.query('product == ["hw","sw"]')
amount name
name andrew juliet tim andrew juliet tim
company product
ba sw 200 0 0 1 0 0
gbm hw 6 0 0 1 0 0
sw 0 0 35 0 0 1
ibm hw 4 0 22 1 0 1
sw 0 0 44 0 0 2
indeff hw 0 40 0 0 1 0
sw 0 30 0 0 1 0

Query for a Company

#Just 1 company
res.query('company == ["ibm"]')
amount name
name andrew juliet tim andrew juliet tim
company product
ibm hw 4 0 22 1 0 1
sw 0 0 44 0 0 2