sparsesurv.utils module

Summary

Functions:

`basic_cv_fold`	Basic CV scoring function that evaluates the scoring function on the test fold alone [1].
`basic_mse`	Mean-squared error based CV scoring function.
`difference_kernels`	Subtract predictor values from each other as well as calculate (integrated) Gaussian kernel.
`gaussian_integrated_kernel`	Evaluate the integral of the Gaussian kernel.
`gaussian_kernel`	Evaluate the Gaussian kernel.
`integrated_kernel`	Subtract predictor values from each other and calculate integrated Gaussian kernel.
`inverse_transform_survival`	Transform input variable into separate time and event arrays.
`inverse_transform_survival_kd`	Obtain survival times, censoring information and eta (e.g. y train) from a structured array.
`kernel`	Subtract predictor values from each other and calculate Gaussian kernel.
`linear_cv`	CV score computation using linear predictors [1, 2].
`logaddexp`	Apply the log-sum-exp trick when calculating the log addition for numerical stability.
`logsubstractexp`	Apply the log-sum-exp trick when calculating the log difference for numerical stability.
`numba_logsumexp_stable`	Apply the log-sum-exp trick.
`transform_survival`	Transform time and event variables into one variable.
`transform_survival_kd`	Transform survival times, censoring information and eta (e.g. y train) into one array.
`vvh_cv_fold`	Verweij and Van Houwelingen CV scoring function [1, 2].

Reference

inverse_transform_survival(y)

Transform input variable into separate time and event arrays.

Parameters:

y (np.array) – Structured array containing time and censoring events.

Returns:

Survival time and event array.

Return type:

tuple[npt.NDArray[np.float64], npt.NDArray[np.float64]]

transform_survival(time, event)

Transform time and event variables into one variable.

Parameters:

time (npt.NDArray[np.float64]) – Survival times.
event (npt.NDArray[np.float64]) – Censoring information.

Returns:

Structured array containing survival times and right-censoring (event) indicators.

Return type:

np.array
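A minimal sketch of the round trip between separate arrays and one structured array follows. It assumes a scikit-survival style layout with a boolean `event` field and a float `time` field; those field names and dtypes, and the hypothetical `_sketch` function names, are illustrative assumptions rather than what sparsesurv necessarily uses internally.

```python
import numpy as np
import numpy.typing as npt


def transform_survival_sketch(
    time: npt.NDArray[np.float64], event: npt.NDArray[np.float64]
) -> np.ndarray:
    """Pack times and event indicators into one structured array (layout assumed)."""
    # Assumed layout: a boolean "event" field followed by a float "time" field.
    y = np.empty(time.shape[0], dtype=[("event", bool), ("time", np.float64)])
    y["event"] = event.astype(bool)
    y["time"] = time
    return y


def inverse_transform_survival_sketch(
    y: np.ndarray,
) -> tuple[npt.NDArray[np.float64], npt.NDArray[np.float64]]:
    """Unpack the structured array into separate float time and event arrays."""
    time = y["time"].astype(np.float64)
    event = y["event"].astype(np.float64)
    return time, event


time = np.array([5.0, 2.0, 7.5])
event = np.array([1.0, 0.0, 1.0])
y = transform_survival_sketch(time, event)
t_back, e_back = inverse_transform_survival_sketch(y)
```

Because events are stored as booleans in this sketch, they come back as 0.0/1.0 floats, in line with the float64 return type documented for `inverse_transform_survival`.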

inverse_transform_survival_kd(y)

Obtain survival times, censoring information and eta (e.g. y train) from a structured array.

Parameters:

y (npt.NDArray[np.float64]) – Structured array containing survival times, censoring information and eta.

Returns:

Survival times, censoring information and eta.

Return type:

tuple[npt.NDArray[np.float64], npt.NDArray[np.int64], npt.NDArray[np.float64]]

transform_survival_kd(time, event, eta_hat)

Transform survival times, censoring information and eta (e.g. y train) into one array.

Parameters:

time (npt.NDArray[np.float64]) – Survival times.
event (npt.NDArray[np.float64]) – Censoring information.
eta_hat (npt.NDArray[np.float64]) – Estimated dependent variable.

Raises:

NotImplementedError – If the inputs fail the dimension check.

Returns:

Structured array containing survival times, censoring information and eta_hat.

Return type:

npt.NDArray
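The `_kd` pair can be sketched the same way, with a third field holding eta_hat. The field names, the int64 event dtype (taken from the documented return type of `inverse_transform_survival_kd`) and the exact condition that raises NotImplementedError are assumptions; both `_sketch` functions are hypothetical.

```python
import numpy as np
import numpy.typing as npt


def transform_survival_kd_sketch(
    time: npt.NDArray[np.float64],
    event: npt.NDArray[np.float64],
    eta_hat: npt.NDArray[np.float64],
) -> np.ndarray:
    """Pack times, event indicators and eta_hat into one structured array."""
    # Assumed dimension check: only 1-D inputs of equal length are supported.
    if not (time.ndim == event.ndim == eta_hat.ndim == 1):
        raise NotImplementedError("Only one-dimensional inputs are supported.")
    if not (time.shape[0] == event.shape[0] == eta_hat.shape[0]):
        raise NotImplementedError("All inputs must have the same length.")
    # Assumed field names; int64 events mirror the documented return type.
    y = np.empty(
        time.shape[0],
        dtype=[("time", np.float64), ("event", np.int64), ("eta_hat", np.float64)],
    )
    y["time"] = time
    y["event"] = event.astype(np.int64)
    y["eta_hat"] = eta_hat
    return y


def inverse_transform_survival_kd_sketch(
    y: np.ndarray,
) -> tuple[npt.NDArray[np.float64], npt.NDArray[np.int64], npt.NDArray[np.float64]]:
    """Split the structured array back into times, event indicators and eta_hat."""
    return (
        y["time"].astype(np.float64),
        y["event"].astype(np.int64),
        y["eta_hat"].astype(np.float64),
    )


time = np.array([3.0, 1.0, 4.0])
event = np.array([1.0, 0.0, 1.0])
eta_hat = np.array([0.2, -0.4, 0.9])
y = transform_survival_kd_sketch(time, event, eta_hat)
t_back, e_back, eta_back = inverse_transform_survival_kd_sketch(y)
```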

logsubstractexp(a, b)

Apply the log-sum-exp trick when calculating the log difference for numerical stability.

Parameters:

a (float) – First operand (minuend), on the log scale.
b (float) – Second operand (subtrahend), on the log scale.

Returns:

Result of the subtraction computed with the log-sum-exp trick.

Return type:

float
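The log difference log(exp(a) - exp(b)) can be computed without overflow by factoring out exp(a), which gives a + log1p(-exp(b - a)). The sketch below assumes log-scale inputs with a >= b; the function name is hypothetical, and raising ValueError for b > a is a choice made here, not documented behaviour.

```python
import math


def logsubstractexp_sketch(a: float, b: float) -> float:
    """Compute log(exp(a) - exp(b)) stably, assuming a >= b."""
    if b > a:
        raise ValueError("log(exp(a) - exp(b)) is undefined for b > a")
    if a == b:
        # exp(a) - exp(b) == 0, whose log is -inf (log1p(-1.0) would raise).
        return -math.inf
    # log(e^a * (1 - e^(b - a))) = a + log1p(-e^(b - a)); e^(b - a) < 1 cannot overflow.
    return a + math.log1p(-math.exp(b - a))


small = logsubstractexp_sketch(2.0, 1.0)
large = logsubstractexp_sketch(1000.0, 999.0)  # the naive form overflows here
```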

logaddexp(a, b)

Apply the log-sum-exp trick when calculating the log addition for numerical stability.

Parameters:

a (float) – First operand, on the log scale.
b (float) – Second operand, on the log scale.

Returns:

Result of the addition computed with the log-sum-exp trick.

Return type:

float
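For addition, factoring out the larger term gives log(exp(a) + exp(b)) = max(a, b) + log1p(exp(-|a - b|)). The sketch below uses that identity and can be checked against NumPy's own `np.logaddexp`; the function name is hypothetical.

```python
import math

import numpy as np


def logaddexp_sketch(a: float, b: float) -> float:
    """Compute log(exp(a) + exp(b)) without overflow."""
    m = max(a, b)
    if m == -math.inf:
        # Both terms are exp(-inf) = 0, so the log of their sum is -inf.
        return -math.inf
    # exp(-|a - b|) <= 1, so this exponential can never overflow.
    return m + math.log1p(math.exp(-abs(a - b)))


big = logaddexp_sketch(1000.0, 1000.0)  # math.exp(1000.0) alone would overflow
mixed = logaddexp_sketch(1.5, -2.0)
```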

numba_logsumexp_stable(a)

Apply the log-sum-exp trick.

Parameters:

a (npt.NDArray[np.float64]) – Input array to which the exponential, the sum and then the log will be applied.

Returns:

Result of the log-sum-exp trick.

Return type:

float
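The log-sum-exp trick shifts every entry by the maximum m, so log(sum(exp(a))) = m + log(sum(exp(a - m))) and no term of the sum exceeds 1. This sketch uses plain NumPy and does not reproduce any Numba JIT compilation the `numba_` prefix suggests; the function name is hypothetical.

```python
import numpy as np
import numpy.typing as npt


def logsumexp_stable_sketch(a: npt.NDArray[np.float64]) -> float:
    """Return log(sum(exp(a))) using a max shift for numerical stability."""
    m = np.max(a)
    if not np.isfinite(m):
        # m is -inf (all entries -inf), +inf or nan: the result is m itself.
        return float(m)
    # After the shift every exp(a - m) lies in (0, 1], so the sum cannot overflow.
    return float(m + np.log(np.sum(np.exp(a - m))))


shifted = logsumexp_stable_sketch(np.array([1000.0, 1000.0, 1000.0]))
ordinary = logsumexp_stable_sketch(np.array([0.1, -2.0, 3.3]))
```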

gaussian_integrated_kernel(x)

Evaluate the integral of the Gaussian kernel.

Parameters:

x (float) – Difference of hazard predictions.

Returns:

Integrated value of the Gaussian kernel.

Return type:

float
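Assuming the integrated Gaussian kernel is the integral of the standard normal density from minus infinity to x, it equals the standard normal CDF, Phi(x) = (1 + erf(x / sqrt(2))) / 2. The sketch evaluates that closed form with `math.erf`; whether sparsesurv uses exactly this normalisation is an assumption, and the function name is hypothetical.

```python
import math


def gaussian_integrated_kernel_sketch(x: float) -> float:
    """Standard normal CDF: the Gaussian kernel integrated from -inf to x."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


centre = gaussian_integrated_kernel_sketch(0.0)
upper = gaussian_integrated_kernel_sketch(1.96)
```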

gaussian_kernel(x)

Evaluate the Gaussian kernel.

Parameters:

x (float) – Difference of hazard predictions.

Returns:

Value of the Gaussian kernel.

Return type:

float
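A sketch of the Gaussian kernel, assuming it is the standard normal density phi(x) = exp(-x^2 / 2) / sqrt(2 * pi) with no extra scaling; the function name is hypothetical.

```python
import math


def gaussian_kernel_sketch(x: float) -> float:
    """Standard normal density evaluated at x."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


# As a density, it should integrate to 1; a fine Riemann sum over [-10, 10] checks this.
h = 1e-3
total = sum(gaussian_kernel_sketch(-10.0 + i * h) for i in range(20001)) * h
```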

kernel(a, b, bandwidth)

Subtract predictor values from each other and calculate Gaussian kernel.

Parameters:

a (npt.NDArray[np.float64]) – First predictor value (hazard prediction).
b (npt.NDArray[np.float64]) – Second predictor value (hazard prediction).
bandwidth (float) – Fixed kernel bandwidth.

Returns:

Kernel matrix.

Return type:

npt.NDArray[np.float64]
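The kernel matrix can be built with NumPy broadcasting instead of explicit loops. This sketch assumes entry (i, j) is phi((a[i] - b[j]) / bandwidth), with phi the standard normal density and no 1/bandwidth factor; the orientation of rows and columns and the missing normalisation are assumptions, and the function name is hypothetical.

```python
import numpy as np
import numpy.typing as npt


def kernel_sketch(
    a: npt.NDArray[np.float64], b: npt.NDArray[np.float64], bandwidth: float
) -> npt.NDArray[np.float64]:
    """Matrix with entries phi((a[i] - b[j]) / bandwidth)."""
    # Broadcasting (n, 1) against (1, m) yields all pairwise differences, shape (n, m).
    diff = (a[:, None] - b[None, :]) / bandwidth
    return np.exp(-0.5 * diff**2) / np.sqrt(2.0 * np.pi)


a = np.array([0.0, 1.0, 2.0])
b = np.array([0.0, 0.5])
K = kernel_sketch(a, b, 0.5)
```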

integrated_kernel(a, b, bandwidth)

Subtract predictor values from each other and calculate integrated Gaussian kernel.

Parameters:

a (npt.NDArray[np.float64]) – First predictor value (hazard prediction).
b (npt.NDArray[np.float64]) – Second predictor value (hazard prediction).
bandwidth (float) – Fixed kernel bandwidth.

Returns:

Integrated kernel matrix.

Return type:

npt.NDArray[np.float64]
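The integrated kernel matrix follows the same pattern, with the standard normal CDF in place of the density. NumPy has no built-in `erf`, so this sketch vectorises `math.erf`; entry (i, j) is assumed to be Phi((a[i] - b[j]) / bandwidth), and the function name is hypothetical.

```python
import math

import numpy as np
import numpy.typing as npt

# NumPy has no erf ufunc, so vectorise the scalar math.erf (convenient, not fast).
_erf = np.vectorize(math.erf, otypes=[np.float64])


def integrated_kernel_sketch(
    a: npt.NDArray[np.float64], b: npt.NDArray[np.float64], bandwidth: float
) -> npt.NDArray[np.float64]:
    """Matrix with entries Phi((a[i] - b[j]) / bandwidth), Phi the standard normal CDF."""
    diff = (a[:, None] - b[None, :]) / bandwidth
    return 0.5 * (1.0 + _erf(diff / np.sqrt(2.0)))


a = np.array([0.0, 1.0])
b = np.array([0.0, 1.0])
M = integrated_kernel_sketch(a, b, 1.0)
```

Since Phi(x) + Phi(-x) = 1, entries mirrored across the diagonal sum to 1 whenever a and b coincide.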

difference_kernels(a, b, bandwidth)

Subtract predictor values from each other as well as calculate (integrated) Gaussian kernel.

Parameters:

a (npt.NDArray[np.float64]) – First predictor value (hazard prediction).
b (npt.NDArray[np.float64]) – Second predictor value (hazard prediction).
bandwidth (float) – Fixed kernel bandwidth.

Returns:

Predictor difference, kernel matrix and integrated kernel matrix.

Return type:

Tuple[npt.NDArray[np.float64], npt.NDArray[np.float64], npt.NDArray[np.float64]]
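`difference_kernels` combines the two kernel computations so the pairwise differences are formed only once. In this sketch the returned difference is the raw a[i] - b[j], while both kernels are evaluated at the bandwidth-scaled difference; whether the original returns the scaled or unscaled difference is an assumption, and the function name is hypothetical.

```python
import math

import numpy as np
import numpy.typing as npt

# Scalar math.erf vectorised, since NumPy itself has no erf ufunc.
_erf = np.vectorize(math.erf, otypes=[np.float64])


def difference_kernels_sketch(
    a: npt.NDArray[np.float64], b: npt.NDArray[np.float64], bandwidth: float
) -> tuple[npt.NDArray[np.float64], npt.NDArray[np.float64], npt.NDArray[np.float64]]:
    """Return raw pairwise differences plus Gaussian and integrated Gaussian kernels."""
    # Raw differences a[i] - b[j], formed once and reused for both kernels.
    diff = a[:, None] - b[None, :]
    scaled = diff / bandwidth
    kernel_matrix = np.exp(-0.5 * scaled**2) / np.sqrt(2.0 * np.pi)
    integrated_kernel_matrix = 0.5 * (1.0 + _erf(scaled / np.sqrt(2.0)))
    return diff, kernel_matrix, integrated_kernel_matrix


a = np.array([0.0, 0.7, 1.5])
d, K, K_int = difference_kernels_sketch(a, a, 0.5)
```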

basic_cv_fold(test_linear_predictor, test_time, test_event, score_function, test_eta_hat=None, train_linear_predictor=None, train_time=None, train_event=None)

Basic CV scoring function that evaluates the scoring function on the test fold alone [1].

Parameters:

test_linear_predictor (np.array) – Linear predictors of a given test fold (X @ beta).
test_time (np.array) – Sorted time points of the test fold.
test_event (np.array) – Event indicator of the test fold.
test_eta_hat (np.array) – Predicted linear predictors of a given test fold.
train_linear_predictor (np.array) – Linear predictors of the training fold.
train_time (np.array) – Sorted time points of the training fold.
train_event (np.array) – Event indicator of the training fold.
score_function (Callable) – Scoring function used to compute the negative log-likelihood.

Returns:

Scalar value of the mean partial log-likelihood for a given test fold.

Return type:

float

Notes

All unused parameters are kept so that every CV scoring function shares the same signature.

References

[1] Dai, Biyue, and Patrick Breheny. “Cross validation approaches for penalized Cox regression.” arXiv preprint arXiv:1905.10432 (2019).
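In the basic approach of Dai and Breheny, each fold is scored on its own, using only the test samples' linear predictors. The sketch below pairs a hypothetical `basic_cv_fold_sketch` with a hypothetical score function, `breslow_nll_sketch`, which returns the mean negative Cox partial log-likelihood with Breslow handling of ties; the call convention `score_function(linear_predictor, time, event)` is an assumption.

```python
import numpy as np


def breslow_nll_sketch(linear_predictor, time, event):
    """Hypothetical score function: mean negative Cox partial log-likelihood (Breslow ties)."""
    # at_risk[i, j] = 1.0 when sample j is still at risk at time[i].
    at_risk = (time[None, :] >= time[:, None]).astype(np.float64)
    # Shift by the maximum before exponentiating to avoid overflow.
    m = np.max(linear_predictor)
    log_risk = m + np.log(at_risk @ np.exp(linear_predictor - m))
    return -float(np.sum(event * (linear_predictor - log_risk))) / time.shape[0]


def basic_cv_fold_sketch(
    test_linear_predictor,
    test_time,
    test_event,
    score_function,
    test_eta_hat=None,
    train_linear_predictor=None,
    train_time=None,
    train_event=None,
):
    # Only the test fold enters the score; the remaining arguments are unused
    # and exist only to keep a common signature across CV scoring functions.
    return score_function(test_linear_predictor, test_time, test_event)


test_lp = np.array([0.5, -0.2, 0.1, 0.0])
test_time = np.array([1.0, 2.0, 3.0, 4.0])
test_event = np.array([1.0, 0.0, 1.0, 1.0])
fold_score = basic_cv_fold_sketch(test_lp, test_time, test_event, breslow_nll_sketch)
```

Because the partial likelihood only compares linear predictors within risk sets, adding a constant to every predictor leaves the score unchanged.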

basic_mse(test_linear_predictor, test_eta_hat, test_time=None, test_event=None, train_linear_predictor=None, train_time=None, train_event=None, score_function=None)

Mean-squared error based CV scoring function.

Parameters:

test_linear_predictor (np.array) – Linear predictors of a given test fold (X @ beta).
test_time (np.array) – Sorted time points of the test fold.
test_event (np.array) – Event indicator of the test fold.
test_eta_hat (np.array) – Predicted linear predictors of a given test fold.
train_linear_predictor (np.array) – Linear predictors of the training fold.
train_time (np.array) – Sorted time points of the training fold.
train_event (np.array) – Event indicator of the training fold.
score_function (Callable) – Scoring function used to compute the negative log-likelihood.

Returns:

Scalar value of the mean squared error for a given test fold.

Return type:

float

Notes

All unused parameters are kept so that every CV scoring function shares the same signature.
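Assuming the score is the plain mean of squared differences between the fold's linear predictor and eta_hat (lower is better), the function reduces to one NumPy expression. The sign convention and the absence of any weighting are assumptions, and the function name is hypothetical.

```python
import numpy as np


def basic_mse_sketch(
    test_linear_predictor,
    test_eta_hat,
    test_time=None,
    test_event=None,
    train_linear_predictor=None,
    train_time=None,
    train_event=None,
    score_function=None,
):
    """Mean squared error between the fold's linear predictor and eta_hat."""
    # Survival times, events, training data and score_function are ignored here.
    return float(np.mean((test_linear_predictor - test_eta_hat) ** 2))


mse = basic_mse_sketch(np.array([1.0, 2.0, 3.0]), np.array([1.0, 0.0, 3.0]))
```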

vvh_cv_fold(test_linear_predictor, test_time, test_event, train_linear_predictor, train_time, train_event, score_function, test_eta_hat=None)

Verweij and Van Houwelingen CV scoring function [1, 2].

Parameters:

test_linear_predictor (np.array) – Linear predictors of a given test fold (X @ beta).
test_time (np.array) – Sorted time points of the test fold.
test_event (np.array) – Event indicator of the test fold.
test_eta_hat (np.array) – Predicted linear predictors of a given test fold.
train_linear_predictor (np.array) – Linear predictors of the training fold.
train_time (np.array) – Sorted time points of the training fold.
train_event (np.array) – Event indicator of the training fold.
score_function (Callable) – Scoring function used to compute the negative log-likelihood.

Returns:

Scalar value of the mean partial log-likelihood for a given test fold.

Return type:

float

Notes

All unused parameters are kept so that every CV scoring function shares the same signature.

References

[1] Verweij, Pierre JM, and Hans C. Van Houwelingen. “Cross‐validation in survival analysis.” Statistics in Medicine 12.24 (1993): 2305-2314.

[2] Dai, Biyue, and Patrick Breheny. “Cross validation approaches for penalized Cox regression.” arXiv preprint arXiv:1905.10432 (2019).
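Following Verweij and Van Houwelingen, a fold's contribution is the log partial likelihood of the pooled data at the training-fold fit minus the training-fold log partial likelihood at the same fit. The sketch returns the negated quantity, a difference of negative log-likelihoods, and assumes `score_function` returns a total rather than a mean, since a difference of means would not isolate the test fold. The Breslow score function and all `_sketch` names are hypothetical; risk sets are built by pairwise comparison, so the pooled arrays need not be sorted.

```python
import numpy as np


def cox_nll_sum_sketch(linear_predictor, time, event):
    """Hypothetical score function: total negative Cox partial log-likelihood (Breslow ties)."""
    # at_risk[i, j] = 1.0 when sample j is still at risk at time[i]; no sorting needed.
    at_risk = (time[None, :] >= time[:, None]).astype(np.float64)
    m = np.max(linear_predictor)
    log_risk = m + np.log(at_risk @ np.exp(linear_predictor - m))
    return -float(np.sum(event * (linear_predictor - log_risk)))


def vvh_cv_fold_sketch(
    test_linear_predictor,
    test_time,
    test_event,
    train_linear_predictor,
    train_time,
    train_event,
    score_function,
    test_eta_hat=None,
):
    # Pool training and test samples; all linear predictors come from the
    # coefficients fitted on the training fold.
    full_lp = np.concatenate([train_linear_predictor, test_linear_predictor])
    full_time = np.concatenate([train_time, test_time])
    full_event = np.concatenate([train_event, test_event])
    full_nll = score_function(full_lp, full_time, full_event)
    train_nll = score_function(train_linear_predictor, train_time, train_event)
    # -(l(beta_-k) - l_-k(beta_-k)): the fold's negated VVH contribution.
    return full_nll - train_nll


train_lp = np.array([0.3, -0.1, 0.2])
train_time = np.array([2.0, 5.0, 1.0])
train_event = np.array([1.0, 1.0, 0.0])
test_lp = np.array([0.0, 0.4])
test_time = np.array([3.0, 4.0])
test_event = np.array([1.0, 0.0])
vvh_score = vvh_cv_fold_sketch(
    test_lp, test_time, test_event, train_lp, train_time, train_event, cox_nll_sum_sketch
)
```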

linear_cv(test_linear_predictor, test_time, test_event, score_function, test_eta_hat=None, train_linear_predictor=None, train_time=None, train_event=None)

CV score computation using linear predictors [1, 2].

Parameters:

test_linear_predictor (np.array) – Linear predictors of a given test fold (X @ beta).
test_time (np.array) – Sorted time points of the test fold.
test_event (np.array) – Event indicator of the test fold.
test_eta_hat (np.array) – Predicted linear predictors of a given test fold.
train_linear_predictor (np.array) – Linear predictors of the training fold.
train_time (np.array) – Sorted time points of the training fold.
train_event (np.array) – Event indicator of the training fold.
score_function (Callable) – Scoring function used to compute the negative log-likelihood.

Returns:

Scalar value of the mean partial log-likelihood for a given test fold.

Return type:

float

Notes

All unused parameters are kept so that every CV scoring function shares the same signature.

References

[1] Dai, Biyue, and Patrick Breheny. “Cross validation approaches for penalized Cox regression.” arXiv preprint arXiv:1905.10432 (2019).
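In the linear-predictor approach discussed by Dai and Breheny, each sample's linear predictor comes from the model fitted without that sample's fold, and the partial likelihood is then computed once on all samples together. The sketch assumes `linear_cv` receives these pooled out-of-fold predictors through its test arguments; the data, the per-fold coefficients, the Breslow score function and all `_sketch` names are made up for illustration.

```python
import numpy as np


def cox_nll_mean_sketch(linear_predictor, time, event):
    """Hypothetical score function: mean negative Cox partial log-likelihood (Breslow ties)."""
    at_risk = (time[None, :] >= time[:, None]).astype(np.float64)
    m = np.max(linear_predictor)
    log_risk = m + np.log(at_risk @ np.exp(linear_predictor - m))
    return -float(np.sum(event * (linear_predictor - log_risk))) / time.shape[0]


def linear_cv_sketch(
    test_linear_predictor,
    test_time,
    test_event,
    score_function,
    test_eta_hat=None,
    train_linear_predictor=None,
    train_time=None,
    train_event=None,
):
    # Assumed calling convention: the "test" arrays hold the pooled out-of-fold
    # linear predictors of all folds, so the score is computed once on all data.
    return score_function(test_linear_predictor, test_time, test_event)


# Toy data: six samples, three folds.
X = np.array([[0.1, 1.0], [0.5, -0.3], [-1.2, 0.4], [0.7, 0.2], [0.0, -1.0], [1.5, 0.3]])
time = np.array([2.0, 5.0, 1.0, 4.0, 3.0, 6.0])
event = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 1.0])
folds = [np.array([0, 3]), np.array([1, 4]), np.array([2, 5])]
# Made-up coefficients standing in for fits that each left out one fold.
betas = [np.array([0.5, -0.2]), np.array([0.4, -0.1]), np.array([0.6, -0.3])]

# Each sample's linear predictor comes from the model that did not see it.
oof_lp = np.empty(X.shape[0])
for idx, beta in zip(folds, betas):
    oof_lp[idx] = X[idx] @ beta

score = linear_cv_sketch(oof_lp, time, event, cox_nll_mean_sketch)
```

Unlike the basic approach, the risk sets here span the whole data set, which keeps them large even when individual folds are small.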