{ "cells": [ { "cell_type": "markdown", "id": "b3cb023c", "metadata": {}, "source": [ "## Survival analysis" ] }, { "cell_type": "markdown", "id": "aedae0d7", "metadata": {}, "source": [ "*sparsesurv* [1] operates on survival analysis data. Below, we quote the notation from the supplementary section of our manuscript to establish a shared notation and terminology.\n", "\n", "> In particular, survival analysis concerns the analysis and modeling of a positive random variable $T > 0$ that models the time until an event of interest occurs. In observational survival datasets, we let $T_i$ and $C_i$ denote the event and right-censoring times of patient $i$. In right-censored survival analysis, we observe triplets $(x_i, \\delta_i, O_i)$, where $O_i = \\min(T_i, C_i)$ and $\\delta_i = \\mathbf{1}(T_i \\leq C_i)$. Throughout, we assume conditionally independent and non-informative censoring. That is, $T \\perp\\!\\!\\!\\!\\perp C \\mid X$, and $C$ may not be a function of any of the parameters of $T$ \\citep{kalbfleisch2011statistical}. Further, let $\\lambda$ denote the hazard function, $\\Lambda$ the cumulative hazard function, and $S(t) = 1 - F(t)$ the survival function, where $F(t)$ denotes the cumulative distribution function of $T$. We let $\\tilde T$ be the set of unique death times in ascending order. $R(i)$ is the risk set at time $O_i$, that is, $R(i) = \\{j: O_j \\geq O_i\\}$. $D(i)$ denotes the death set at time $O_i$, $D(i) = \\{j: O_j = O_i \\land \\delta_j = 1\\}$.\n", "\n", "For now, *sparsesurv* operates solely on right-censored data, although we may consider extensions to other censoring and truncation schemes if there is interest. We now briefly show an example right-censored survival dataset available in *scikit-survival* [4], another Python package for survival analysis." 
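To make the notation concrete, here is a minimal NumPy sketch on hypothetical toy data: five made-up patients with latent event times `T` and censoring times `C`. The helper names `risk_set` and `death_set` are illustrative only and are not part of the *sparsesurv* API. The sketch derives $O_i$ and $\delta_i$, then evaluates the risk set $R(i) = \{j: O_j \geq O_i\}$, the death set $D(i)$ (taken here as the patients with $O_j = O_i$ and $\delta_j = 1$), and $\tilde T$.

```python
import numpy as np

# Hypothetical toy data: latent event times T_i and censoring times C_i.
# In a real dataset only O_i and delta_i would be observed.
T = np.array([5.0, 3.0, 8.0, 3.0, 10.0])
C = np.array([6.0, 4.0, 7.0, 9.0, 2.0])

# Observed time O_i = min(T_i, C_i) and event indicator delta_i = 1(T_i <= C_i).
O = np.minimum(T, C)
delta = (T <= C).astype(int)


def risk_set(i):
    """R(i) = {j : O_j >= O_i}: patients still at risk at time O_i."""
    return set(np.flatnonzero(O >= O[i]).tolist())


def death_set(i):
    """D(i) = {j : O_j = O_i and delta_j = 1}: events observed at time O_i."""
    return set(np.flatnonzero((O == O[i]) & (delta == 1)).tolist())


# tilde T: unique death times in ascending order (np.unique returns sorted values).
unique_death_times = np.unique(O[delta == 1])

print("O       =", O)
print("delta   =", delta)
print("tilde T =", unique_death_times)
for i in range(len(O)):
    print(i, sorted(risk_set(i)), sorted(death_set(i)))
```

In this toy example, patients 1 and 3 die at the same time ($O = 3$) and therefore share the death set $\{1, 3\}$, while a censored patient (e.g. patient 2) has an empty death set at its own observed time.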
] }, { "cell_type": "code", "execution_count": 1, "id": "abecf90f", "metadata": {}, "outputs": [], "source": [ "from sksurv.datasets import load_flchain\n", "X, y = load_flchain()" ] }, { "cell_type": "code", "execution_count": 2, "id": "e92dd399", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|      | age  | chapter     | creatinine | flc.grp | kappa | lambda | mgus | sample.yr | sex |
|------|------|-------------|------------|---------|-------|--------|------|-----------|-----|
| 0    | 97.0 | Circulatory | 1.7        | 10      | 5.700 | 4.860  | no   | 1997      | F   |
| 1    | 92.0 | Neoplasms   | 0.9        | 1       | 0.870 | 0.683  | no   | 2000      | F   |
| 2    | 94.0 | Circulatory | 1.4        | 10      | 4.360 | 3.850  | no   | 1997      | F   |
| 3    | 92.0 | Circulatory | 1.0        | 9       | 2.420 | 2.220  | no   | 1996      | F   |
| 4    | 93.0 | Circulatory | 1.1        | 6       | 1.320 | 1.690  | no   | 1996      | F   |
| ...  | ...  | ...         | ...        | ...     | ...   | ...    | ...  | ...       | ... |
| 7869 | 52.0 | NaN         | 1.0        | 6       | 1.210 | 1.610  | no   | 1995      | F   |
| 7870 | 52.0 | NaN         | 0.8        | 1       | 0.858 | 0.581  | no   | 1999      | F   |
| 7871 | 54.0 | NaN         | NaN        | 8       | 1.700 | 1.720  | no   | 2002      | F   |
| 7872 | 53.0 | NaN         | NaN        | 9       | 1.710 | 2.690  | no   | 1995      | F   |
| 7873 | 50.0 | NaN         | 0.7        | 4       | 1.190 | 1.250  | no   | 1998      | F   |
7874 rows × 9 columns
\n", "Pipeline(steps=[('standardscaler', StandardScaler()),\n", " ('coxphsurvivalanalysis', CoxPHSurvivalAnalysis())])