cd/gt03/Summary Table

To import the module

import pandas as pd
from mtbp3cd.util.gt03summary import crosstab_from_lists

Example - 1 Generate data summary tables

The output includes:

  1. Count of rows of input data by output-col-columns and output-row-columns

  2. Normalized counts by perct_within_index

  3. Report with format “count (%)”

To create a dataset:

data = {
    'A': ['foo','foo', 'foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'baz', 'baz', 'baz'],
    'B': ['one','one','one','one', 'one', 'two', 'two', 'one', 'one', 'two', 'two'],
    'C': ['y','x','x','x', 'y', 'x', 'y', 'x', 'y', 'x', 'y'],
    'D': ['apple','apple','apple','apple', 'banana', 'apple', 'banana', 'apple', 'banana', 'apple', 'banana'],
    'E': ['red','blue','red','red', 'red', 'blue', 'blue', 'red', 'blue', 'red', 'blue'],
    'value': [0,1,1,1, 2, 3, 4, 5, 6, 7, 8]
}
df = pd.DataFrame(data)

# Use crosstab_from_lists with 3-level multi-index for rows and columns
rows = ['A', 'B', 'C']
cols = ['D', 'E']

To create a frequency (count) table:

(crosstab_from_lists(df, rows, cols))['count']
D apple banana All
E blue red blue red
A B C
bar one x 0 1 0 0 1
two y 0 0 1 0 1
baz one y 0 0 1 0 1
two x 0 1 0 0 1
y 0 0 1 0 1
foo one x 1 2 0 0 3
y 0 1 0 1 2
two x 1 0 0 0 1
All 2 5 3 1 11

To create a table with row-wise percentage:

(crosstab_from_lists(df, rows, cols, rows, col_margin_perct=True))['report']
D apple banana All
E blue red blue red
A B C
bar one x 0 (0.0%) 1 (100.0%) 0 (0.0%) 0 (0.0%) 1
two y 0 (0.0%) 0 (0.0%) 1 (100.0%) 0 (0.0%) 1
baz one y 0 (0.0%) 0 (0.0%) 1 (100.0%) 0 (0.0%) 1
two x 0 (0.0%) 1 (100.0%) 0 (0.0%) 0 (0.0%) 1
y 0 (0.0%) 0 (0.0%) 1 (100.0%) 0 (0.0%) 1
foo one x 1 (33.3%) 2 (66.7%) 0 (0.0%) 0 (0.0%) 3
y 0 (0.0%) 1 (50.0%) 0 (0.0%) 1 (50.0%) 2
two x 1 (100.0%) 0 (0.0%) 0 (0.0%) 0 (0.0%) 1
All 2 (18.2%) 5 (45.5%) 3 (27.3%) 1 (9.1%) 11

To create a table with grouped row-wise percentage:

(crosstab_from_lists(df, rows, cols, ['A','B'], col_margin_perct=True))['report']
D apple banana All
E blue red blue red
A B C
bar one x 0 (0.0%) 1 (100.0%) 0 (0.0%) 0 (0.0%) 1
two y 0 (0.0%) 0 (0.0%) 1 (100.0%) 0 (0.0%) 1
baz one y 0 (0.0%) 0 (0.0%) 1 (100.0%) 0 (0.0%) 1
two x 0 (0.0%) 1 (50.0%) 0 (0.0%) 0 (0.0%) 1
y 0 (0.0%) 0 (0.0%) 1 (50.0%) 0 (0.0%) 1
foo one x 1 (20.0%) 2 (40.0%) 0 (0.0%) 0 (0.0%) 3
y 0 (0.0%) 1 (20.0%) 0 (0.0%) 1 (20.0%) 2
two x 1 (100.0%) 0 (0.0%) 0 (0.0%) 0 (0.0%) 1
All 2 (18.2%) 5 (45.5%) 3 (27.3%) 1 (9.1%) 11

To create a table with column-wise percentage:

(crosstab_from_lists(df, rows, cols, cols, row_margin_perct=True))['report']
D apple banana All
E blue red blue red
A B C
bar one x 0 (0.0%) 1 (20.0%) 0 (0.0%) 0 (0.0%) 1 (9.1%)
two y 0 (0.0%) 0 (0.0%) 1 (33.3%) 0 (0.0%) 1 (9.1%)
baz one y 0 (0.0%) 0 (0.0%) 1 (33.3%) 0 (0.0%) 1 (9.1%)
two x 0 (0.0%) 1 (20.0%) 0 (0.0%) 0 (0.0%) 1 (9.1%)
y 0 (0.0%) 0 (0.0%) 1 (33.3%) 0 (0.0%) 1 (9.1%)
foo one x 1 (50.0%) 2 (40.0%) 0 (0.0%) 0 (0.0%) 3 (27.3%)
y 0 (0.0%) 1 (20.0%) 0 (0.0%) 1 (100.0%) 2 (18.2%)
two x 1 (50.0%) 0 (0.0%) 0 (0.0%) 0 (0.0%) 1 (9.1%)
All 2 5 3 1 11

To create a table with grouped column-wise percentage:

(crosstab_from_lists(df, rows, cols, ['D'], row_margin_perct=True))['report']
D apple banana All
E blue red blue red
A B C
bar one x 0 (0.0%) 1 (14.3%) 0 (0.0%) 0 (0.0%) 1 (9.1%)
two y 0 (0.0%) 0 (0.0%) 1 (25.0%) 0 (0.0%) 1 (9.1%)
baz one y 0 (0.0%) 0 (0.0%) 1 (25.0%) 0 (0.0%) 1 (9.1%)
two x 0 (0.0%) 1 (14.3%) 0 (0.0%) 0 (0.0%) 1 (9.1%)
y 0 (0.0%) 0 (0.0%) 1 (25.0%) 0 (0.0%) 1 (9.1%)
foo one x 1 (14.3%) 2 (28.6%) 0 (0.0%) 0 (0.0%) 3 (27.3%)
y 0 (0.0%) 1 (14.3%) 0 (0.0%) 1 (25.0%) 2 (18.2%)
two x 1 (14.3%) 0 (0.0%) 0 (0.0%) 0 (0.0%) 1 (9.1%)
All 2 5 3 1 11

To create a table with grouped row-and-column-wise percentage:

(crosstab_from_lists(df, rows, cols, ['A', 'D']))['report']
D apple banana All
E blue red blue red
A B C
bar one x 0 (0.0%) 1 (100.0%) 0 (0.0%) 0 (0.0%) 1
two y 0 (0.0%) 0 (0.0%) 1 (100.0%) 0 (0.0%) 1
baz one y 0 (0.0%) 0 (0.0%) 1 (50.0%) 0 (0.0%) 1
two x 0 (0.0%) 1 (100.0%) 0 (0.0%) 0 (0.0%) 1
y 0 (0.0%) 0 (0.0%) 1 (50.0%) 0 (0.0%) 1
foo one x 1 (20.0%) 2 (40.0%) 0 (0.0%) 0 (0.0%) 3
y 0 (0.0%) 1 (20.0%) 0 (0.0%) 1 (100.0%) 2
two x 1 (20.0%) 0 (0.0%) 0 (0.0%) 0 (0.0%) 1
All 2 5 3 1 11

To create a table with grouped row-and-column-wise percentage:

(crosstab_from_lists(df, rows, cols, ['A', 'D'], row_margin_perct=True, col_margin_perct=True))['report']
D apple banana All
E blue red blue red
A B C
bar one x 0 (0.0%) 1 (100.0%) 0 (0.0%) 0 (0.0%) 1 (50.0%)
two y 0 (0.0%) 0 (0.0%) 1 (100.0%) 0 (0.0%) 1 (50.0%)
baz one y 0 (0.0%) 0 (0.0%) 1 (50.0%) 0 (0.0%) 1 (33.3%)
two x 0 (0.0%) 1 (100.0%) 0 (0.0%) 0 (0.0%) 1 (33.3%)
y 0 (0.0%) 0 (0.0%) 1 (50.0%) 0 (0.0%) 1 (33.3%)
foo one x 1 (20.0%) 2 (40.0%) 0 (0.0%) 0 (0.0%) 3 (50.0%)
y 0 (0.0%) 1 (20.0%) 0 (0.0%) 1 (100.0%) 2 (33.3%)
two x 1 (20.0%) 0 (0.0%) 0 (0.0%) 0 (0.0%) 1 (16.7%)
All 2 (28.6%) 5 (71.4%) 3 (75.0%) 1 (25.0%) 11 (100.0%)

To create a table with grouped row-wise percentage and total:

(crosstab_from_lists(df, rows, cols, ['A','B'], col_margin_perct=True, report_type=2))['report']
D apple banana All
E blue red blue red
A B C
bar one x 0/1 (0.0%) 1/1 (100.0%) 0/1 (0.0%) 0/1 (0.0%) 1
two y 0/1 (0.0%) 0/1 (0.0%) 1/1 (100.0%) 0/1 (0.0%) 1
baz one y 0/1 (0.0%) 0/1 (0.0%) 1/1 (100.0%) 0/1 (0.0%) 1
two x 0/2 (0.0%) 1/2 (50.0%) 0/2 (0.0%) 0/2 (0.0%) 1
y 0/2 (0.0%) 0/2 (0.0%) 1/2 (50.0%) 0/2 (0.0%) 1
foo one x 1/5 (20.0%) 2/5 (40.0%) 0/5 (0.0%) 0/5 (0.0%) 3
y 0/5 (0.0%) 1/5 (20.0%) 0/5 (0.0%) 1/5 (20.0%) 2
two x 1/1 (100.0%) 0/1 (0.0%) 0/1 (0.0%) 0/1 (0.0%) 1
All 2/11 (18.2%) 5/11 (45.5%) 3/11 (27.3%) 1/11 (9.1%) 11