..
   # Copyright (C) 2023-2024 Y Hsu
   #
   # This program is free software: you can redistribute it and/or modify
   # it under the terms of the GNU General Public License as published by
   # the Free Software Foundation, either version 3 of the License, or
   # (at your option) any later version.
   #
   # This program is distributed in the hope that it will be useful,
   # but WITHOUT ANY WARRANTY; without even the implied warranty of
   # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
   # GNU General Public License for more details.
   #
   # You should have received a copy of the GNU General Public License
   # along with this program. If not, see <https://www.gnu.org/licenses/>.

.. role:: red-b
.. role:: red

#################################
Reliability/cate/Fleiss's Kappa
#################################

:red-b:`Disclaimer:` :red:`This page is provided only for studying and practicing. The author does not intend to promote or advocate any particular analysis method or software.`

**********
Background
**********

Fleiss's kappa (:math:`\kappa`) is a statistic used to describe the inter-rater reliability of multiple independent raters with categorical rating outcomes [1]_ [2]_.

********
Notation
********

Assume there are :math:`R+N_0` (:math:`\geq 2+N_0`) raters in total, and that each of :math:`n` samples was rated by :math:`R` randomly selected raters and not rated by the remaining :math:`N_0` raters. For a rating with :math:`J` categories, let :math:`Y_{r,i} \in \{v_0, v_1, v_2, \ldots, v_J\}` represent the rating from rater :math:`r = 1, 2, \ldots, R+N_0` for sample :math:`i = 1, \ldots, n`. Let :math:`N_{ij}` represent the total number of raters who gave rating :math:`v_j` to sample :math:`i`, where :math:`j \in \{0, 1, \ldots, J\}`. The value :math:`v_0` indicates that a rater did not rate sample :math:`i`, and :math:`N_{i0} = N_0` is a fixed number for all :math:`i`. Therefore, :math:`v_0` is not included in the discussion below.

.. list-table:: Count of Ratings
   :widths: 10 10 10 10 10 10
   :header-rows: 1
   :name: tbl_count1

   * -
     - :math:`v_1`
     - :math:`v_2`
     - :math:`\ldots`
     - :math:`v_J`
     - Row Total
   * - **Sample:** 1
     - :math:`N_{11}`
     - :math:`N_{12}`
     - :math:`\ldots`
     - :math:`N_{1J}`
     - :math:`R`
   * - **Sample:** 2
     - :math:`N_{21}`
     - :math:`N_{22}`
     - :math:`\ldots`
     - :math:`N_{2J}`
     - :math:`R`
   * - **Sample:** 3
     - :math:`N_{31}`
     - :math:`N_{32}`
     - :math:`\ldots`
     - :math:`N_{3J}`
     - :math:`R`
   * - :math:`\vdots`
     - :math:`\vdots`
     - :math:`\vdots`
     - :math:`\ddots`
     - :math:`\vdots`
     - :math:`\vdots`
   * - **Sample:** :math:`n`
     - :math:`N_{n1}`
     - :math:`N_{n2}`
     - :math:`\ldots`
     - :math:`N_{nJ}`
     - :math:`R`
   * - **Column total**
     - :math:`N_{\bullet 1}`
     - :math:`N_{\bullet 2}`
     - :math:`\ldots`
     - :math:`N_{\bullet J}`
     - :math:`nR`

The observed average agreement is calculated as

.. math::
   :label: eq_obs1

   \bar{p}_O = \frac{1}{n} \sum_{i=1}^n p_{O,i},

where :math:`p_{O,i} = \frac{1}{R(R-1)} \left(\sum_{j=1}^J N_{ij}(N_{ij}-1)\right) = \frac{1}{R(R-1)} \left(\sum_{j=1}^J N_{ij}^2 - R\right)`. The expected agreement is calculated as

.. math::
   :label: eq_exp1

   \bar{p}_E = \sum_{j=1}^J p_{E,j}^2,

where :math:`p_{E,j} = \frac{N_{\bullet j}}{nR}`. The Fleiss's :math:`\kappa` statistic is calculated from :eq:`eq_obs1` and :eq:`eq_exp1` as

.. math::
   :label: eq_kappa1

   \kappa = \frac{\bar{p}_O - \bar{p}_E}{1 - \bar{p}_E}.
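The computation in :eq:`eq_obs1`, :eq:`eq_exp1`, and :eq:`eq_kappa1` can be carried out directly on the count table. Below is a minimal NumPy sketch; the function name ``fleiss_kappa_from_counts`` is introduced here for illustration only and is not part of any package discussed on this page. It reproduces the two count tables shown in the next section.

.. code-block:: python

   import numpy as np

   def fleiss_kappa_from_counts(counts):
       """Fleiss's kappa from an (n samples) x (J categories) count matrix.

       Assumes every row sums to the same number of raters R.
       """
       counts = np.asarray(counts, dtype=float)
       n = counts.shape[0]
       R = counts[0].sum()  # raters per sample, constant by assumption

       # Observed average agreement, eq_obs1
       p_o_i = (np.sum(counts**2, axis=1) - R) / (R * (R - 1))
       p_o_bar = p_o_i.mean()

       # Expected agreement under chance, eq_exp1
       p_e_j = counts.sum(axis=0) / (n * R)
       p_e_bar = np.sum(p_e_j**2)

       # eq_kappa1
       return (p_o_bar - p_e_bar) / (1 - p_e_bar)

   # The two count tables of the next section:
   table1 = np.array([[12, 0, 0, 0],
                      [0, 12, 0, 0],
                      [0, 0, 12, 0],
                      [0, 0, 12, 0],
                      [0, 0, 0, 12]])
   table2 = np.full((5, 4), 3)
   print(fleiss_kappa_from_counts(table1))  # 1.0
   print(fleiss_kappa_from_counts(table2))  # -0.0909... = -1/11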
*****************
Example - Group-1
*****************

.. list-table:: Fleiss's :math:`\kappa = 1.0`
   :widths: 10 10 10 10 10
   :header-rows: 1

   * -
     - :math:`v_1`
     - :math:`v_2`
     - :math:`v_3`
     - :math:`v_4`
   * - **Sample 1**
     - 12
     - 0
     - 0
     - 0
   * - **Sample 2**
     - 0
     - 12
     - 0
     - 0
   * - **Sample 3**
     - 0
     - 0
     - 12
     - 0
   * - **Sample 4**
     - 0
     - 0
     - 12
     - 0
   * - **Sample 5**
     - 0
     - 0
     - 0
     - 12
   * - **Column Total**
     - 12
     - 12
     - 24
     - 12

.. list-table:: Fleiss's :math:`\kappa = -1/11 \approx -0.0909`
   :widths: 10 10 10 10 10
   :header-rows: 1

   * -
     - :math:`v_1`
     - :math:`v_2`
     - :math:`v_3`
     - :math:`v_4`
   * - **Sample 1**
     - 3
     - 3
     - 3
     - 3
   * - **Sample 2**
     - 3
     - 3
     - 3
     - 3
   * - **Sample 3**
     - 3
     - 3
     - 3
     - 3
   * - **Sample 4**
     - 3
     - 3
     - 3
     - 3
   * - **Sample 5**
     - 3
     - 3
     - 3
     - 3
   * - **Column Total**
     - 15
     - 15
     - 15
     - 15

******
How-to
******

To use both ``statsmodels.stats.inter_rater`` and ``mtbp3Lab.statlab``:

.. testcode::

   import statsmodels.stats.inter_rater as ir
   from mtbp3Lab.statlab import kappa

   # Each list holds one rater's ratings for 100 samples; 'NA' marks unrated samples.
   r1 = ['NA'] * 20 + ['B'] * 50 + ['A'] * 30
   r2 = ['A'] * 20 + ['NA'] * 20 + ['B'] * 60
   r3 = ['A'] * 40 + ['NA'] * 20 + ['B'] * 30 + ['C'] * 10
   r4 = ['B'] * 60 + ['NA'] * 20 + ['C'] * 10 + ['A'] * 10
   r5 = ['C'] * 60 + ['A'] * 10 + ['B'] * 10 + ['NA'] * 20
   data = [r1, r2, r3, r4, r5]

   calc = kappa.KappaCalculator(data, stringna='NA')

   print("Fleiss's kappa (statsmodels.stats.inter_rater): " + str(ir.fleiss_kappa(calc.y_count)))
   print("Fleiss's kappa (mtbp3Lab.statlab): " + str(calc.fleiss_kappa))
   print("Number of raters per sample: " + str(calc.n_rater))
   print("Number of rating categories: " + str(calc.n_category))
   print("Number of samples: " + str(calc.y_count.shape[0]))

Output:

.. testoutput::

   Fleiss's kappa (statsmodels.stats.inter_rater): -0.14989733059548255
   Fleiss's kappa (mtbp3Lab.statlab): -0.14989733059548255
   Number of raters per sample: 4.0
   Number of rating categories: 3
   Number of samples: 100

************
Lab Exercise
************

1. Find a bootstrap confidence interval for Fleiss's kappa (see the function for the Cohen's kappa CI); a minimal sketch is given below.
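For the exercise above, one common approach is a percentile bootstrap that resamples samples (rows of the count table) with replacement and recomputes kappa. The sketch below reuses the hypothetical ``fleiss_kappa_from_counts`` helper defined earlier on this page; both function names are ours, not part of ``mtbp3Lab`` or ``statsmodels``.

.. code-block:: python

   import numpy as np

   def fleiss_kappa_bootstrap_ci(counts, n_boot=2000, alpha=0.05, seed=0):
       """Percentile bootstrap CI for Fleiss's kappa, resampling rows."""
       counts = np.asarray(counts, dtype=float)
       rng = np.random.default_rng(seed)
       n = counts.shape[0]
       boot = np.empty(n_boot)
       for b in range(n_boot):
           idx = rng.integers(0, n, size=n)  # draw n rows with replacement
           boot[b] = fleiss_kappa_from_counts(counts[idx])
       return np.quantile(boot, [alpha / 2, 1 - alpha / 2])

   # Example: a 95% CI for the count table from the How-to section
   # lo, hi = fleiss_kappa_bootstrap_ci(calc.y_count)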
************
More Details
************

:eq:`eq_obs1` corresponds to the observed probability of agreement on a sample between two randomly selected raters, estimated from :numref:`Table %s <tbl_count1>`. :eq:`eq_exp1` corresponds to the expected probability of agreement on a sample between two randomly selected raters under the assumption of no agreement, which corresponds to the assumption :math:`(N_{i1}, \ldots, N_{iJ}) \sim multi(R, (p_1, \ldots, p_J))`, where :math:`R > 4`.

Let :math:`S_{p2} = \sum_j p_j^2`, :math:`S_{p3} = \sum_j p_j^3`, and :math:`S_{p4} = \sum_j p_j^4`. The equation :eq:`eq_kappa1` can be expressed as [2]_ :sup:`(Eq. 9)`

.. math::

   \kappa = \frac{\sum_{i=1}^{n}\sum_{j=1}^J N_{ij}^2 - nR\left(1 + (R-1) S_{p2} \right)}{nR(R-1)(1 - S_{p2})}.

Note that Fleiss (1971) assumed large :math:`n` and fixed :math:`p_j` while deriving the variance of kappa; please see Fleiss (1971) for more discussion. The variance of :math:`\kappa` under the assumption of no agreement beyond chance can be approximated as

.. math::
   :label: eq_kappa2_vk

   var(\kappa) = c(n, R, \{p_j\}) \, var\left(\sum_{j=1}^J N_{1j}^2 \right),

where

.. math::

   c(n, R, \{p_j\}) = n^{-1}\left(R(R-1)\left(1 - S_{p2}\right)\right)^{-2},

and

.. math::
   :label: eq_kappa2_vn2

   var\left(\sum_{j} N_{ij}^2 \right) =& E\left( \left(\sum_{j} N_{ij}^2\right)^2\right) - \left(E\left(\sum_{j} N_{ij}^2\right)\right)^2 \\
   =& E\left(\sum_{j} N_{ij}^4\right) + E\left(\sum_j\sum_{k \neq j} N_{ij}^2 N_{ik}^2 \right) - \left(E\left(\sum_{j} N_{ij}^2\right)\right)^2.

To calculate :eq:`eq_kappa2_vn2`, we can use the MGF, :math:`\left(\sum_{j} p_j e^{t_j}\right)^R`, to derive :math:`E\left(N_{ij}^2\right) = Rp_j + R(R-1)p_j^2` and :math:`E\left(N_{ij}^3\right) = Rp_j + 3R(R-1)p_j^2 + R(R-1)(R-2)p_j^3`.

The first element of :eq:`eq_kappa2_vn2` can be calculated as [2]_ :sup:`(Eq. 12)`

.. math::
   :label: eq_kappa2_vn3

   E\left(\sum_{j} N_{ij}^4\right) = R + 7R(R-1)S_{p2} + 6R(R-1)(R-2)S_{p3} + R(R-1)(R-2)(R-3)S_{p4}.

The third element of :eq:`eq_kappa2_vn2` can be calculated as [2]_ :sup:`(Eq. 14)`

.. math::
   :label: eq_kappa2_vn4

   \left(E\left(\sum_{j} N_{ij}^2\right)\right)^2 =& R^2\left(1 + (R-1)S_{p2} \right)^2 \\
   =& R^2 + R^2(R-1)\left(2 S_{p2} + (R-1)S_{p2}^2\right).

The second element of :eq:`eq_kappa2_vn2` can be calculated, using :math:`E\left( N_{ij}^2 N_{ik}^2 \right) = R(R-1)p_j(p_k + (R-2)p_k^2) + R(R-1)(R-2)p_j^2(p_k + (R-3)p_k^2)`, as

.. math::
   :label: eq_kappa2_vn5

   E\left( \sum_j\sum_{k \neq j} N_{ij}^2 N_{ik}^2 \right) =& R(R-1) + R(R-1)(2R-5)S_{p2} - 2R(R-1)(R-2)S_{p3} \\
   & - R(R-1)(R-2)(R-3)S_{p4} + R(R-1)(R-2)(R-3) S_{p2}^2.

Combining :eq:`eq_kappa2_vn3`, :eq:`eq_kappa2_vn4`, and :eq:`eq_kappa2_vn5`, :eq:`eq_kappa2_vn2` can be calculated as [2]_ :sup:`(Eq. 15)`

.. math::

   var\left(\sum_{j} N_{ij}^2 \right) = 2R(R-1)\left(S_{p2} - (2R-3)S_{p2}^2 + 2(R-2)S_{p3}\right).

Let :math:`s^2` be the estimated variance of :math:`\kappa` from :eq:`eq_kappa2_vk`. Under the hypothesis of no agreement beyond chance, the limiting distribution of :math:`\kappa/s` is the standard normal distribution. The value of :math:`\kappa/s` can therefore be used to assess whether the overall agreement is greater than would be expected by chance alone [2]_.

************
Lab Exercise
************

2. Find :math:`Cov(N_{i1}, N_{i2})` under the no-agreement assumption.

*********
Reference
*********

.. [1] Wikipedia. (n.d.). Fleiss' kappa. https://en.wikipedia.org/wiki/Fleiss%27_kappa
.. [2] Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378-382. https://doi.org/10.1037/h0031619