sparsesurv.utils module

Summary

Functions:

basic_cv_fold

Basic CV scoring function, which evaluates the given scoring function on the test fold alone [1].

basic_mse

Mean-squared error based CV scoring function.

difference_kernels

Compute pairwise differences of predictor values together with the Gaussian kernel and integrated Gaussian kernel matrices.

gaussian_integrated_kernel

Compute the integral of the Gaussian kernel.

gaussian_kernel

Evaluate the Gaussian kernel.

integrated_kernel

Compute pairwise differences of predictor values and apply the integrated Gaussian kernel.

inverse_transform_survival

Split a structured survival array into separate time and event arrays.

inverse_transform_survival_kd

Obtain survival times, censoring information and eta (e.g. y train) from a structured array.

kernel

Compute pairwise differences of predictor values and apply the Gaussian kernel.

linear_cv

CV score computation using linear predictors [1].

logaddexp

Compute log(exp(a) + exp(b)) with the log-sum-exp trick for numerical stability.

logsubstractexp

Compute log(exp(a) - exp(b)) with the log-sum-exp trick for numerical stability.

numba_logsumexp_stable

Compute log(sum(exp(a))) with the log-sum-exp trick.

transform_survival

Combine time and event arrays into a single structured array.

transform_survival_kd

Transform survival times, censoring information and eta (e.g. y train) into one array.

vvh_cv_fold

Verweij and Van Houwelingen CV scoring function [1, 2].

Reference

inverse_transform_survival(y)

Split a structured survival array into separate time and event arrays.

Parameters:

y (np.array) – Structured array containing time and censoring events.

Returns:

Survival time and event array.

Return type:

tuple[npt.NDArray[np.float64], npt.NDArray[np.float64]]

transform_survival(time, event)

Combine time and event arrays into a single structured array.

Parameters:
  • time (npt.NDArray[np.float64]) – Survival times.

  • event (npt.NDArray[np.float64]) – Censoring information.

Returns:

Structured array containing survival times and right-censored survival information.

Return type:

np.array
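A minimal sketch of the round trip between the two representations, assuming a scikit-survival-style structured dtype with a boolean "event" field and a float "time" field. The field names and the helpers `to_structured` and `from_structured` are hypothetical stand-ins for transform_survival and inverse_transform_survival, not the library's code.

```python
import numpy as np

# Hypothetical dtype: the field names "event" and "time" are assumptions.
SURV_DTYPE = [("event", "?"), ("time", "f8")]


def to_structured(time, event):
    """Pack survival times and event indicators into one structured array."""
    time = np.asarray(time, dtype=np.float64)
    event = np.asarray(event).astype(bool)
    y = np.empty(time.shape[0], dtype=SURV_DTYPE)
    y["time"] = time
    y["event"] = event
    return y


def from_structured(y):
    """Unpack a structured survival array into (time, event) float arrays."""
    return y["time"].astype(np.float64), y["event"].astype(np.float64)


time = np.array([5.0, 3.0, 8.0])
event = np.array([1.0, 0.0, 1.0])  # 1 = event observed, 0 = right-censored
y = to_structured(time, event)
t_back, e_back = from_structured(y)
```

Keeping both fields in one array presumably lets survival targets pass through scikit-learn-style `fit(X, y)` interfaces, which accept a single target argument.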

inverse_transform_survival_kd(y)

Obtain survival times, censoring information and eta (e.g. y train) from a structured array.

Parameters:

y (npt.NDArray[np.float64]) – Structured array containing survival times, censoring information and eta.

Returns:

Survival times, censoring information and eta.

Return type:

tuple[npt.NDArray[np.float64], npt.NDArray[np.int64], npt.NDArray[np.float64]]

transform_survival_kd(time, event, eta_hat)

Transform survival times, censoring information and eta (e.g. y train) into one array.

Parameters:
  • time (npt.NDArray[np.float64]) – Survival times.

  • event (npt.NDArray[np.float64]) – Censoring information.

  • eta_hat (npt.NDArray[np.float64]) – Estimated dependent variable.

Raises:

NotImplementedError – If the dimension check on the inputs fails.

Returns:

Structured array containing survival times, censoring information and eta.

Return type:

npt.NDArray
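The three-field variant can be sketched the same way. The field names, the integer event dtype (taken from the return type of inverse_transform_survival_kd) and the exact condition that triggers NotImplementedError are assumptions; `to_structured_kd` and `from_structured_kd` are hypothetical names.

```python
import numpy as np

# Hypothetical dtype: field names and dtypes are assumptions.
KD_DTYPE = [("time", "f8"), ("event", "i8"), ("eta_hat", "f8")]


def to_structured_kd(time, event, eta_hat):
    """Pack times, events and eta_hat into one structured array."""
    time, event, eta_hat = (np.asarray(a) for a in (time, event, eta_hat))
    if eta_hat.ndim != 1 or not (time.shape == event.shape == eta_hat.shape):
        # This sketch only handles one 1-D eta_hat aligned with time and event.
        raise NotImplementedError("Expected 1-D arrays of equal length.")
    y = np.empty(time.shape[0], dtype=KD_DTYPE)
    y["time"], y["event"], y["eta_hat"] = time, event, eta_hat
    return y


def from_structured_kd(y):
    """Unpack (time, event, eta_hat) from a structured array."""
    return (
        y["time"].astype(np.float64),
        y["event"].astype(np.int64),
        y["eta_hat"].astype(np.float64),
    )


y = to_structured_kd([2.0, 4.0], [1, 0], [0.3, -0.1])
time, event, eta = from_structured_kd(y)
```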

logsubstractexp(a, b)

Compute log(exp(a) - exp(b)) with the log-sum-exp trick for numerical stability.

Parameters:
  • a (float) – First operand of the subtraction, on the log scale.

  • b (float) – Second operand of the subtraction, on the log scale.

Returns:

Result of the subtraction, computed with the log-sum-exp trick.

Return type:

float
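Assuming a and b are log-scale values and the function returns log(exp(a) - exp(b)), the trick factors out the larger exponent so that nothing overflows. The sketch below uses the hypothetical name `log_sub_exp` and requires a >= b, since the difference is otherwise negative and has no real logarithm.

```python
import math


def log_sub_exp(a: float, b: float) -> float:
    """Return log(exp(a) - exp(b)) for log-scale inputs with a >= b."""
    if b > a:
        raise ValueError("exp(a) - exp(b) is negative when b > a")
    if b == a:
        return -math.inf  # exp(a) - exp(b) == 0, whose log is -inf
    # log(exp(a) * (1 - exp(b - a))) = a + log1p(-exp(b - a)); exp(b - a) < 1 cannot overflow.
    return a + math.log1p(-math.exp(b - a))


# exp(1000.0) overflows a float, but the factored form stays finite.
result = log_sub_exp(1000.0, 999.0)
```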

logaddexp(a, b)

Compute log(exp(a) + exp(b)) with the log-sum-exp trick for numerical stability.

Parameters:
  • a (float) – First operand of the addition, on the log scale.

  • b (float) – Second operand of the addition, on the log scale.

Returns:

Result of addition with log-sum-exp trick.

Return type:

float
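Assuming the function returns log(exp(a) + exp(b)), the addition case factors out max(a, b), which leaves one term equal to 1 and the other at most 1. `log_add_exp` is a hypothetical name; NumPy's built-in `np.logaddexp` serves as a reference in the check.

```python
import math

import numpy as np


def log_add_exp(a: float, b: float) -> float:
    """Return log(exp(a) + exp(b)) without overflow."""
    m = max(a, b)
    if m == -math.inf:
        return -math.inf  # both terms are exp(-inf) == 0
    # log(exp(a) + exp(b)) = m + log(1 + exp(-|a - b|)); the remaining exponent is <= 0.
    return m + math.log1p(math.exp(-abs(a - b)))


result = log_add_exp(1000.0, 1000.0)  # naive evaluation would overflow to inf
reference = float(np.logaddexp(0.3, -1.2))
```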

numba_logsumexp_stable(a)

Compute log(sum(exp(a))) with the log-sum-exp trick.

Parameters:
  • a (npt.NDArray[np.float64]) – Input array to which exp, the sum and then the log are applied.

Returns:

Result of log-sum-exp trick.

Return type:

float
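An array version of the trick subtracts the maximum before exponentiating, so every exponentiated term is at most 1. This sketch (hypothetical name `logsumexp_stable`) uses plain NumPy and omits the numba JIT compilation that the documented function's name suggests.

```python
import numpy as np


def logsumexp_stable(a) -> float:
    """Return log(sum(exp(a))) using the max-shift (log-sum-exp) trick."""
    a = np.asarray(a, dtype=np.float64)
    m = np.max(a)
    if not np.isfinite(m):
        return float(m)  # all -inf gives -inf; any +inf gives +inf; NaN propagates
    # exp(a - m) <= 1 for every entry, so the sum cannot overflow.
    return float(m + np.log(np.sum(np.exp(a - m))))


big = np.array([1000.0, 1000.0, 1000.0])
small = np.array([0.1, -2.0, 1.5])
```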

gaussian_integrated_kernel(x)

Compute the integral of the Gaussian kernel.

Parameters:

x (float) – Difference of hazard predictions.

Returns:

Integrated value of Gaussian kernel.

Return type:

float
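Assuming the Gaussian kernel is the standard normal density, its integral from negative infinity to x is the standard normal CDF, which can be written with the error function. `gaussian_cdf` is a hypothetical name.

```python
import math


def gaussian_cdf(x: float) -> float:
    """Integral of the standard Gaussian density from -inf to x, i.e. Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
```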

gaussian_kernel(x)

Evaluate the Gaussian kernel.

Parameters:

x (float) – Difference of hazard predictions.

Returns:

Value of Gaussian kernel.

Return type:

float
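Assuming the kernel is the standard normal density, it can be evaluated directly from its closed form. `gaussian_pdf` is a hypothetical name.

```python
import math


def gaussian_pdf(x: float) -> float:
    """Standard Gaussian density phi(x) = exp(-x^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
```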

kernel(a, b, bandwidth)

Compute pairwise differences of predictor values and apply the Gaussian kernel.

Parameters:
  • a (npt.NDArray[np.float64]) – First predictor value (hazard prediction).

  • b (npt.NDArray[np.float64]) – Second predictor value (hazard prediction).

  • bandwidth (float) – Fixed kernel bandwidth.

Returns:

Kernel matrix.

Return type:

npt.NDArray[np.float64]
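The kernel matrix can be sketched with NumPy broadcasting: entry (i, j) uses the difference a[i] - b[j] scaled by the bandwidth h. The sign convention a[i] - b[j] and the 1/h scaling of the density are assumptions (a common kernel-smoothing convention), and `pairwise_gaussian_kernel` is a hypothetical name.

```python
import numpy as np


def pairwise_gaussian_kernel(a, b, bandwidth: float) -> np.ndarray:
    """Return K[i, j] = phi((a[i] - b[j]) / h) / h for the standard Gaussian phi."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    # Broadcasting (n, 1) - (1, m) yields the (n, m) matrix of pairwise differences.
    u = (a[:, None] - b[None, :]) / bandwidth
    return np.exp(-0.5 * u**2) / (np.sqrt(2.0 * np.pi) * bandwidth)


K = pairwise_gaussian_kernel(np.array([0.0, 1.0]), np.array([0.0, 1.0, 2.0]), 0.5)
```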

integrated_kernel(a, b, bandwidth)

Compute pairwise differences of predictor values and apply the integrated Gaussian kernel.

Parameters:
  • a (npt.NDArray[np.float64]) – First predictor value (hazard prediction).

  • b (npt.NDArray[np.float64]) – Second predictor value (hazard prediction).

  • bandwidth (float) – Fixed kernel bandwidth.

Returns:

Integrated kernel matrix.

Return type:

npt.NDArray[np.float64]
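The integrated counterpart can be sketched by applying the Gaussian CDF to the scaled pairwise differences. This sketch applies no 1/h factor, since the CDF of the scaled difference already lies in [0, 1]; that choice, the sign convention a[i] - b[j] and the name `pairwise_integrated_kernel` are assumptions.

```python
import math

import numpy as np

# math.erf is scalar-only, so vectorize it for elementwise use on arrays.
_erf = np.vectorize(math.erf, otypes=[np.float64])


def pairwise_integrated_kernel(a, b, bandwidth: float) -> np.ndarray:
    """Return G[i, j] = Phi((a[i] - b[j]) / h) for the standard Gaussian CDF Phi."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    u = (a[:, None] - b[None, :]) / bandwidth
    return 0.5 * (1.0 + _erf(u / math.sqrt(2.0)))


G = pairwise_integrated_kernel([0.0, 1.0], [0.0, 1.0], 0.3)
```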

difference_kernels(a, b, bandwidth)

Compute pairwise differences of predictor values together with the Gaussian kernel and integrated Gaussian kernel matrices.

Parameters:
  • a (npt.NDArray[np.float64]) – First predictor value (hazard prediction).

  • b (npt.NDArray[np.float64]) – Second predictor value (hazard prediction).

  • bandwidth (float) – Fixed kernel bandwidth.

Returns:

Predictor differences, kernel matrix and integrated kernel matrix.

Return type:

Tuple[npt.NDArray[np.float64], npt.NDArray[np.float64], npt.NDArray[np.float64]]
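Because both kernels depend on the same differences, a combined function can compute the difference matrix once and reuse it, which is presumably why difference_kernels returns all three. The sketch below (hypothetical name `difference_and_kernels`) assumes differences a[i] - b[j], a 1/h-scaled Gaussian density and an unscaled Gaussian CDF.

```python
import math

import numpy as np

# math.erf is scalar-only, so vectorize it for elementwise use on arrays.
_erf = np.vectorize(math.erf, otypes=[np.float64])


def difference_and_kernels(a, b, bandwidth: float):
    """Return (differences, kernel matrix, integrated kernel matrix) in one pass."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    diff = a[:, None] - b[None, :]  # diff[i, j] = a[i] - b[j]
    u = diff / bandwidth            # scaled once, reused by both kernels
    kern = np.exp(-0.5 * u**2) / (math.sqrt(2.0 * math.pi) * bandwidth)
    integrated = 0.5 * (1.0 + _erf(u / math.sqrt(2.0)))
    return diff, kern, integrated


d, k, g = difference_and_kernels([0.0, 2.0], [1.0], 1.0)
```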

basic_cv_fold(test_linear_predictor, test_time, test_event, score_function, test_eta_hat=None, train_linear_predictor=None, train_time=None, train_event=None)

Basic CV scoring function, which evaluates the given scoring function on the test fold alone [1].

Parameters:
  • test_linear_predictor (np.array) – Linear predictors of a given test fold (X @ beta).

  • test_time (np.array) – Sorted time points of the test fold.

  • test_event (np.array) – Event indicator of the test fold.

  • test_eta_hat (np.array) – Predicted linear predictors of a given test fold.

  • train_linear_predictor (np.array) – Linear predictors of the training fold.

  • train_time (np.array) – Sorted time points of the training fold.

  • train_event (np.array) – Event indicator of the training fold.

  • score_function (Callable) – Scoring function used to compute the negative log-likelihood.

Returns:

Scalar value of the mean partial log-likelihood for a given test fold.

Return type:

float

Notes

Unused parameters are kept so that all CV scoring functions share the same signature.

References

[1] Dai, Biyue, and Patrick Breheny. “Cross validation approaches for penalized Cox regression.” arXiv preprint arXiv:1905.10432 (2019).
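In the basic approach of [1], the model fitted on the training folds is scored on the held-out fold alone. The sketch below assumes score_function takes (linear_predictor, time, event) and returns the summed negative log partial likelihood, and that the fold score is its negation divided by the fold size; `breslow_neg_log_lik` (Breslow handling of ties) and `basic_cv_fold_sketch` are hypothetical names, and the library's normalization may differ.

```python
import numpy as np


def breslow_neg_log_lik(linear_predictor, time, event) -> float:
    """Summed negative Cox log partial likelihood with Breslow handling of ties."""
    lp = np.asarray(linear_predictor, dtype=np.float64)
    time = np.asarray(time, dtype=np.float64)
    event = np.asarray(event, dtype=np.float64)
    # at_risk[i, j] is True when subject j is still at risk at time[i].
    at_risk = time[None, :] >= time[:, None]
    # Stable log of sum_{j at risk} exp(lp_j); each row contains subject i itself.
    log_risk = np.logaddexp.reduce(np.where(at_risk, lp[None, :], -np.inf), axis=1)
    return float(-np.sum(event * (lp - log_risk)))


def basic_cv_fold_sketch(test_linear_predictor, test_time, test_event, score_function,
                         test_eta_hat=None, train_linear_predictor=None,
                         train_time=None, train_event=None) -> float:
    """Mean log partial likelihood of the test fold alone (training arguments unused)."""
    n_test = len(test_time)
    return -score_function(test_linear_predictor, test_time, test_event) / n_test


score = basic_cv_fold_sketch(np.zeros(3), np.array([1.0, 2.0, 3.0]),
                             np.array([1.0, 1.0, 1.0]), breslow_neg_log_lik)
```

With all-zero linear predictors and three distinct event times, the risk sets have sizes 3, 2 and 1, so the score is -log(6)/3.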

basic_mse(test_linear_predictor, test_eta_hat, test_time=None, test_event=None, train_linear_predictor=None, train_time=None, train_event=None, score_function=None)

Mean-squared error based CV scoring function.

Parameters:
  • test_linear_predictor (np.array) – Linear predictors of a given test fold (X @ beta).

  • test_time (np.array) – Sorted time points of the test fold.

  • test_event (np.array) – Event indicator of the test fold.

  • test_eta_hat (np.array) – Predicted linear predictors of a given test fold.

  • train_linear_predictor (np.array) – Linear predictors of the training fold.

  • train_time (np.array) – Sorted time points of the training fold.

  • train_event (np.array) – Event indicator of the training fold.

  • score_function (Callable) – Scoring function used to compute the negative log-likelihood.

Returns:

Scalar mean-squared-error score for a given test fold.

Return type:

float

Notes

Unused parameters are kept so that all CV scoring functions share the same signature.
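A sketch of the mean-squared-error score, assuming it compares test_linear_predictor with the target test_eta_hat and ignores the remaining arguments. Whether the library returns the MSE itself or its negation (so that larger is better during model selection) is not stated here; this sketch (hypothetical name `basic_mse_sketch`) returns the plain MSE.

```python
import numpy as np


def basic_mse_sketch(test_linear_predictor, test_eta_hat, test_time=None,
                     test_event=None, train_linear_predictor=None,
                     train_time=None, train_event=None, score_function=None) -> float:
    """Mean squared error between the fold's linear predictor and eta_hat."""
    # Only the first two arguments are used; the rest mirror the shared CV signature.
    diff = np.asarray(test_linear_predictor, dtype=np.float64) - np.asarray(
        test_eta_hat, dtype=np.float64
    )
    return float(np.mean(diff**2))


mse = basic_mse_sketch([1.0, 2.0], [0.0, 0.0])
```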

vvh_cv_fold(test_linear_predictor, test_time, test_event, train_linear_predictor, train_time, train_event, score_function, test_eta_hat=None)

Verweij and Van Houwelingen CV scoring function [1, 2].

Parameters:
  • test_linear_predictor (np.array) – Linear predictors of a given test fold (X @ beta).

  • test_time (np.array) – Sorted time points of the test fold.

  • test_event (np.array) – Event indicator of the test fold.

  • test_eta_hat (np.array) – Predicted linear predictors of a given test fold.

  • train_linear_predictor (np.array) – Linear predictors of the training fold.

  • train_time (np.array) – Sorted time points of the training fold.

  • train_event (np.array) – Event indicator of the training fold.

  • score_function (Callable) – Scoring function used to compute the negative log-likelihood.

Returns:

Scalar value of the mean partial log-likelihood for a given test fold.

Return type:

float

Notes

Unused parameters are kept so that all CV scoring functions share the same signature.

References

[1] Verweij, Pierre JM, and Hans C. Van Houwelingen. “Cross-validation in survival analysis.” Statistics in Medicine 12.24 (1993): 2305-2314.

[2] Dai, Biyue, and Patrick Breheny. “Cross validation approaches for penalized Cox regression.” arXiv preprint arXiv:1905.10432 (2019).
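Following [1, 2], the Verweij and Van Houwelingen contribution of fold k is the log partial likelihood of the full data minus that of the training folds, both evaluated at the coefficients fitted without fold k. In terms of linear predictors, the first term pools training and test predictors. The sketch assumes score_function takes (linear_predictor, time, event) and returns the summed negative log partial likelihood, and it omits any normalization; `breslow_neg_log_lik` and `vvh_cv_fold_sketch` are hypothetical names.

```python
import numpy as np


def breslow_neg_log_lik(linear_predictor, time, event) -> float:
    """Summed negative Cox log partial likelihood with Breslow handling of ties."""
    lp = np.asarray(linear_predictor, dtype=np.float64)
    time = np.asarray(time, dtype=np.float64)
    event = np.asarray(event, dtype=np.float64)
    at_risk = time[None, :] >= time[:, None]  # subject j at risk at time[i]
    log_risk = np.logaddexp.reduce(np.where(at_risk, lp[None, :], -np.inf), axis=1)
    return float(-np.sum(event * (lp - log_risk)))


def vvh_cv_fold_sketch(test_linear_predictor, test_time, test_event,
                       train_linear_predictor, train_time, train_event,
                       score_function, test_eta_hat=None) -> float:
    """VVH fold contribution: l_full(beta_-k) - l_train(beta_-k)."""
    # Both sets of linear predictors come from the coefficients fitted without fold k.
    full_lp = np.concatenate([train_linear_predictor, test_linear_predictor])
    full_time = np.concatenate([train_time, test_time])
    full_event = np.concatenate([train_event, test_event])
    # score_function returns a negative log-likelihood, so the difference flips sign.
    nll_full = score_function(full_lp, full_time, full_event)
    nll_train = score_function(train_linear_predictor, train_time, train_event)
    return nll_train - nll_full


vvh = vvh_cv_fold_sketch(np.zeros(1), np.array([2.0]), np.array([1.0]),
                         np.zeros(2), np.array([1.0, 3.0]), np.array([1.0, 1.0]),
                         breslow_neg_log_lik)
```

Here the full-data term is -log(6) and the training term is -log(2), so the contribution is -log(3).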

linear_cv(test_linear_predictor, test_time, test_event, score_function, test_eta_hat=None, train_linear_predictor=None, train_time=None, train_event=None)

CV score computation using linear predictors [1].

Parameters:
  • test_linear_predictor (np.array) – Linear predictors of a given test fold (X @ beta).

  • test_time (np.array) – Sorted time points of the test fold.

  • test_event (np.array) – Event indicator of the test fold.

  • test_eta_hat (np.array) – Predicted linear predictors of a given test fold.

  • train_linear_predictor (np.array) – Linear predictors of the training fold.

  • train_time (np.array) – Sorted time points of the training fold.

  • train_event (np.array) – Event indicator of the training fold.

  • score_function (Callable) – Scoring function used to compute the negative log-likelihood.

Returns:

Scalar value of the mean partial log-likelihood for a given test fold.

Return type:

float

Notes

Unused parameters are kept so that all CV scoring functions share the same signature.

References

[1] Dai, Biyue, and Patrick Breheny. “Cross validation approaches for penalized Cox regression.” arXiv preprint arXiv:1905.10432 (2019).
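In the linear predictor approach of [1], the out-of-fold linear predictors from all folds are pooled and the partial likelihood is evaluated once on the pooled vector, rather than once per fold. This reference does not state whether linear_cv receives pooled or per-fold arrays, so the call pattern below is an assumption. The sketch also assumes score_function takes (linear_predictor, time, event) and returns the summed negative log partial likelihood; `breslow_neg_log_lik` and `linear_cv_sketch` are hypothetical names.

```python
import numpy as np


def breslow_neg_log_lik(linear_predictor, time, event) -> float:
    """Summed negative Cox log partial likelihood with Breslow handling of ties."""
    lp = np.asarray(linear_predictor, dtype=np.float64)
    time = np.asarray(time, dtype=np.float64)
    event = np.asarray(event, dtype=np.float64)
    at_risk = time[None, :] >= time[:, None]  # subject j at risk at time[i]
    log_risk = np.logaddexp.reduce(np.where(at_risk, lp[None, :], -np.inf), axis=1)
    return float(-np.sum(event * (lp - log_risk)))


def linear_cv_sketch(fold_linear_predictors, fold_times, fold_events, score_function) -> float:
    """Pool out-of-fold linear predictors from all folds and score them once."""
    lp = np.concatenate(fold_linear_predictors)
    time = np.concatenate(fold_times)
    event = np.concatenate(fold_events)
    return -score_function(lp, time, event) / lp.shape[0]


score = linear_cv_sketch([np.zeros(1), np.zeros(2)],
                         [np.array([1.0]), np.array([2.0, 3.0])],
                         [np.array([1.0]), np.array([1.0, 1.0])],
                         breslow_neg_log_lik)
```

Pooling matters because small test folds give tiny risk sets on their own; scoring the pooled predictors uses the full risk sets instead.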