cd/gt03/Summary Table
To import the module
import pandas as pd
from mtbp3cd.util.gt03summary import crosstab_from_lists
Example - 1 Generate data summary tables
The output includes:
Count of rows of input data by output-col-columns and output-row-columns
Normalized counts by perct_within_index
Report with format “count (%)”
To create a dataset:
data = {
'A': ['foo','foo', 'foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'baz', 'baz', 'baz'],
'B': ['one','one','one','one', 'one', 'two', 'two', 'one', 'one', 'two', 'two'],
'C': ['y','x','x','x', 'y', 'x', 'y', 'x', 'y', 'x', 'y'],
'D': ['apple','apple','apple','apple', 'banana', 'apple', 'banana', 'apple', 'banana', 'apple', 'banana'],
'E': ['red','blue','red','red', 'red', 'blue', 'blue', 'red', 'blue', 'red', 'blue'],
'value': [0,1,1,1, 2, 3, 4, 5, 6, 7, 8]
}
df = pd.DataFrame(data)
# Use crosstab_from_lists with 3-level multi-index for rows and columns
rows = ['A', 'B', 'C']
cols = ['D', 'E']
To create a frequency (count) table:
(crosstab_from_lists(df, rows, cols))['count']
| D | apple | banana | All | ||||
|---|---|---|---|---|---|---|---|
| E | blue | red | blue | red | |||
| A | B | C | |||||
| bar | one | x | 0 | 1 | 0 | 0 | 1 |
| two | y | 0 | 0 | 1 | 0 | 1 | |
| baz | one | y | 0 | 0 | 1 | 0 | 1 |
| two | x | 0 | 1 | 0 | 0 | 1 | |
| y | 0 | 0 | 1 | 0 | 1 | ||
| foo | one | x | 1 | 2 | 0 | 0 | 3 |
| y | 0 | 1 | 0 | 1 | 2 | ||
| two | x | 1 | 0 | 0 | 0 | 1 | |
| All | 2 | 5 | 3 | 1 | 11 | ||
To create a table with row-wise percentage:
(crosstab_from_lists(df, rows, cols, rows, col_margin_perct=True))['report']
| D | apple | banana | All | ||||
|---|---|---|---|---|---|---|---|
| E | blue | red | blue | red | |||
| A | B | C | |||||
| bar | one | x | 0 (0.0%) | 1 (100.0%) | 0 (0.0%) | 0 (0.0%) | 1 |
| two | y | 0 (0.0%) | 0 (0.0%) | 1 (100.0%) | 0 (0.0%) | 1 | |
| baz | one | y | 0 (0.0%) | 0 (0.0%) | 1 (100.0%) | 0 (0.0%) | 1 |
| two | x | 0 (0.0%) | 1 (100.0%) | 0 (0.0%) | 0 (0.0%) | 1 | |
| y | 0 (0.0%) | 0 (0.0%) | 1 (100.0%) | 0 (0.0%) | 1 | ||
| foo | one | x | 1 (33.3%) | 2 (66.7%) | 0 (0.0%) | 0 (0.0%) | 3 |
| y | 0 (0.0%) | 1 (50.0%) | 0 (0.0%) | 1 (50.0%) | 2 | ||
| two | x | 1 (100.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 1 | |
| All | 2 (18.2%) | 5 (45.5%) | 3 (27.3%) | 1 (9.1%) | 11 | ||
To create a table with grouped row-wise percentage:
(crosstab_from_lists(df, rows, cols, ['A','B'], col_margin_perct=True))['report']
| D | apple | banana | All | ||||
|---|---|---|---|---|---|---|---|
| E | blue | red | blue | red | |||
| A | B | C | |||||
| bar | one | x | 0 (0.0%) | 1 (100.0%) | 0 (0.0%) | 0 (0.0%) | 1 |
| two | y | 0 (0.0%) | 0 (0.0%) | 1 (100.0%) | 0 (0.0%) | 1 | |
| baz | one | y | 0 (0.0%) | 0 (0.0%) | 1 (100.0%) | 0 (0.0%) | 1 |
| two | x | 0 (0.0%) | 1 (50.0%) | 0 (0.0%) | 0 (0.0%) | 1 | |
| y | 0 (0.0%) | 0 (0.0%) | 1 (50.0%) | 0 (0.0%) | 1 | ||
| foo | one | x | 1 (20.0%) | 2 (40.0%) | 0 (0.0%) | 0 (0.0%) | 3 |
| y | 0 (0.0%) | 1 (20.0%) | 0 (0.0%) | 1 (20.0%) | 2 | ||
| two | x | 1 (100.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 1 | |
| All | 2 (18.2%) | 5 (45.5%) | 3 (27.3%) | 1 (9.1%) | 11 | ||
To create a table with column-wise percentage:
(crosstab_from_lists(df, rows, cols, cols, row_margin_perct=True))['report']
| D | apple | banana | All | ||||
|---|---|---|---|---|---|---|---|
| E | blue | red | blue | red | |||
| A | B | C | |||||
| bar | one | x | 0 (0.0%) | 1 (20.0%) | 0 (0.0%) | 0 (0.0%) | 1 (9.1%) |
| two | y | 0 (0.0%) | 0 (0.0%) | 1 (33.3%) | 0 (0.0%) | 1 (9.1%) | |
| baz | one | y | 0 (0.0%) | 0 (0.0%) | 1 (33.3%) | 0 (0.0%) | 1 (9.1%) |
| two | x | 0 (0.0%) | 1 (20.0%) | 0 (0.0%) | 0 (0.0%) | 1 (9.1%) | |
| y | 0 (0.0%) | 0 (0.0%) | 1 (33.3%) | 0 (0.0%) | 1 (9.1%) | ||
| foo | one | x | 1 (50.0%) | 2 (40.0%) | 0 (0.0%) | 0 (0.0%) | 3 (27.3%) |
| y | 0 (0.0%) | 1 (20.0%) | 0 (0.0%) | 1 (100.0%) | 2 (18.2%) | ||
| two | x | 1 (50.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 1 (9.1%) | |
| All | 2 | 5 | 3 | 1 | 11 | ||
To create a table with grouped column-wise percentage:
(crosstab_from_lists(df, rows, cols, ['D'], row_margin_perct=True))['report']
| D | apple | banana | All | ||||
|---|---|---|---|---|---|---|---|
| E | blue | red | blue | red | |||
| A | B | C | |||||
| bar | one | x | 0 (0.0%) | 1 (14.3%) | 0 (0.0%) | 0 (0.0%) | 1 (9.1%) |
| two | y | 0 (0.0%) | 0 (0.0%) | 1 (25.0%) | 0 (0.0%) | 1 (9.1%) | |
| baz | one | y | 0 (0.0%) | 0 (0.0%) | 1 (25.0%) | 0 (0.0%) | 1 (9.1%) |
| two | x | 0 (0.0%) | 1 (14.3%) | 0 (0.0%) | 0 (0.0%) | 1 (9.1%) | |
| y | 0 (0.0%) | 0 (0.0%) | 1 (25.0%) | 0 (0.0%) | 1 (9.1%) | ||
| foo | one | x | 1 (14.3%) | 2 (28.6%) | 0 (0.0%) | 0 (0.0%) | 3 (27.3%) |
| y | 0 (0.0%) | 1 (14.3%) | 0 (0.0%) | 1 (25.0%) | 2 (18.2%) | ||
| two | x | 1 (14.3%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 1 (9.1%) | |
| All | 2 | 5 | 3 | 1 | 11 | ||
To create a table with grouped row-and-column-wise percentage:
(crosstab_from_lists(df, rows, cols, ['A', 'D']))['report']
| D | apple | banana | All | ||||
|---|---|---|---|---|---|---|---|
| E | blue | red | blue | red | |||
| A | B | C | |||||
| bar | one | x | 0 (0.0%) | 1 (100.0%) | 0 (0.0%) | 0 (0.0%) | 1 |
| two | y | 0 (0.0%) | 0 (0.0%) | 1 (100.0%) | 0 (0.0%) | 1 | |
| baz | one | y | 0 (0.0%) | 0 (0.0%) | 1 (50.0%) | 0 (0.0%) | 1 |
| two | x | 0 (0.0%) | 1 (100.0%) | 0 (0.0%) | 0 (0.0%) | 1 | |
| y | 0 (0.0%) | 0 (0.0%) | 1 (50.0%) | 0 (0.0%) | 1 | ||
| foo | one | x | 1 (20.0%) | 2 (40.0%) | 0 (0.0%) | 0 (0.0%) | 3 |
| y | 0 (0.0%) | 1 (20.0%) | 0 (0.0%) | 1 (100.0%) | 2 | ||
| two | x | 1 (20.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 1 | |
| All | 2 | 5 | 3 | 1 | 11 | ||
To create a table with grouped row-and-column-wise percentage:
(crosstab_from_lists(df, rows, cols, ['A', 'D'], row_margin_perct=True, col_margin_perct=True))['report']
| D | apple | banana | All | ||||
|---|---|---|---|---|---|---|---|
| E | blue | red | blue | red | |||
| A | B | C | |||||
| bar | one | x | 0 (0.0%) | 1 (100.0%) | 0 (0.0%) | 0 (0.0%) | 1 (50.0%) |
| two | y | 0 (0.0%) | 0 (0.0%) | 1 (100.0%) | 0 (0.0%) | 1 (50.0%) | |
| baz | one | y | 0 (0.0%) | 0 (0.0%) | 1 (50.0%) | 0 (0.0%) | 1 (33.3%) |
| two | x | 0 (0.0%) | 1 (100.0%) | 0 (0.0%) | 0 (0.0%) | 1 (33.3%) | |
| y | 0 (0.0%) | 0 (0.0%) | 1 (50.0%) | 0 (0.0%) | 1 (33.3%) | ||
| foo | one | x | 1 (20.0%) | 2 (40.0%) | 0 (0.0%) | 0 (0.0%) | 3 (50.0%) |
| y | 0 (0.0%) | 1 (20.0%) | 0 (0.0%) | 1 (100.0%) | 2 (33.3%) | ||
| two | x | 1 (20.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 1 (16.7%) | |
| All | 2 (28.6%) | 5 (71.4%) | 3 (75.0%) | 1 (25.0%) | 11 (100.0%) | ||
To create a table with grouped row-wise percentage and total:
(crosstab_from_lists(df, rows, cols, ['A','B'], col_margin_perct=True, report_type=2))['report']
| D | apple | banana | All | ||||
|---|---|---|---|---|---|---|---|
| E | blue | red | blue | red | |||
| A | B | C | |||||
| bar | one | x | 0/1 (0.0%) | 1/1 (100.0%) | 0/1 (0.0%) | 0/1 (0.0%) | 1 |
| two | y | 0/1 (0.0%) | 0/1 (0.0%) | 1/1 (100.0%) | 0/1 (0.0%) | 1 | |
| baz | one | y | 0/1 (0.0%) | 0/1 (0.0%) | 1/1 (100.0%) | 0/1 (0.0%) | 1 |
| two | x | 0/2 (0.0%) | 1/2 (50.0%) | 0/2 (0.0%) | 0/2 (0.0%) | 1 | |
| y | 0/2 (0.0%) | 0/2 (0.0%) | 1/2 (50.0%) | 0/2 (0.0%) | 1 | ||
| foo | one | x | 1/5 (20.0%) | 2/5 (40.0%) | 0/5 (0.0%) | 0/5 (0.0%) | 3 |
| y | 0/5 (0.0%) | 1/5 (20.0%) | 0/5 (0.0%) | 1/5 (20.0%) | 2 | ||
| two | x | 1/1 (100.0%) | 0/1 (0.0%) | 0/1 (0.0%) | 0/1 (0.0%) | 1 | |
| All | 2/11 (18.2%) | 5/11 (45.5%) | 3/11 (27.3%) | 1/11 (9.1%) | 11 | ||