sparsesurv.cv module
Summary
Classes:
- BaseKDSurv: Parent class to fit distilled sparse semi-parametric right-censored survival models using cross validation.
- KDAFTElasticNetCV: Child-class of BaseKDSurv to perform knowledge distillation specifically for semiparametric AFT models.
- KDEHMultiTaskLassoCV: Child-class of BaseKDSurv to perform knowledge distillation specifically for semiparametric EH models.
- KDPHElasticNetCV: Child-class of BaseKDSurv to perform knowledge distillation specifically for Cox PH models.
Reference
- class BaseKDSurv(l1_ratio=1.0, eps=0.001, n_alphas=100, max_iter=100, tol=0.0001, cv=5, verbose=0, max_epochs=50000, p0=10, prune=True, n_jobs=None, stratify_cv=True, seed=42, shuffle_cv=False, cv_score_method='linear_predictor', max_coef=inf, alpha_type='min')[source]
Bases: SurvivalMixin, ElasticNetCV
- Parent class to fit distilled sparse semi-parametric right-censored survival
models using cross validation.
Notes
This class is largely adapted from the ElasticNetCV implementations in sklearn and celer.
See also
sklearn.linear_model.ElasticNetCV celer.ElasticNetCV
- __init__(l1_ratio=1.0, eps=0.001, n_alphas=100, max_iter=100, tol=0.0001, cv=5, verbose=0, max_epochs=50000, p0=10, prune=True, n_jobs=None, stratify_cv=True, seed=42, shuffle_cv=False, cv_score_method='linear_predictor', max_coef=inf, alpha_type='min')[source]
Constructor.
- Parameters:
l1_ratio (Union[float, List[float]], optional) – Float between 0 and 1 passed to ElasticNet (scaling between L1 and L2 penalties). For l1_ratio = 0 the penalty is an L2 penalty. For l1_ratio = 1 it is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2. This parameter can be a list, in which case the different values are tested by cross-validation and the one giving the best prediction score is used. Note that a good choice of list of values for l1_ratio is often to put more values close to 1 (i.e. Lasso) and fewer close to 0 (i.e. Ridge), as in [.1, .5, .7, .9, .95, .99, 1]. Defaults to 1.0.
eps (float, optional) – Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3. Defaults to 1e-3.
n_alphas (int, optional) – Number of alphas along the regularization path, used for each l1_ratio. Defaults to 100.
max_iter (int, optional) – The maximum number of iterations. Defaults to 100.
tol (float, optional) – The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol. Defaults to 1e-4.
cv (int, optional) – Number of folds used to select hyperparameters. Defaults to 5. See also stratify_cv.
verbose (int, optional) – Degree of verbosity. Defaults to 0.
max_epochs (int, optional) – Maximum number of coordinate descent epochs when solving a subproblem. Defaults to 50000.
p0 (int, optional) – Number of features in the first working set. Defaults to 10.
prune (bool, optional) – Whether to use pruning when growing the working sets. Defaults to True.
n_jobs (Optional[int], optional) – Number of CPUs to use during the cross validation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details. Defaults to None.
stratify_cv (bool, optional) – Whether to stratify the cross-validation folds on the event indicator. Defaults to True.
seed (Optional[int], optional) – Random seed. Defaults to 42.
shuffle_cv (bool, optional) – Whether to perform shuffling to generate CV fold indices. Defaults to False.
cv_score_method (str, optional) – Which scoring method to use. Defaults to “linear_predictor”. Must be one of “linear_predictor”, “mse”, “basic” and “vvh”. See Notes.
max_coef (float, optional) – Maximum number of non-zero covariates to be selected with chosen optimal regularization hyperparameter. Defaults to np.inf. See Notes.
alpha_type (str, optional) – How to select the optimal regularization hyperparameter. Defaults to “min”. Must be one of “min”, “1se” and “pcvl”. See Notes.
Notes
- cv_score_method:
Decides how the score used to select the optimal regularization hyperparameter is computed. The basic approach may suffer from issues with small event sizes or for a large number of folds [1]. Meanwhile, the mse approach may yield good teacher-student fidelity, but suboptimal survival predictions.
- mse: Calculates the score as the mean squared error between the teacher predictions and the student predictions. The MSE is calculated per test fold and aggregated across folds using the arithmetic mean.
- linear_predictor: Calculates out-of-sample predictions for each test fold and caches them. Once out-of-sample predictions are produced for each sample, an appropriate survival loss between the student predictions and the observed time and censoring indicator is calculated once, using only the cached out-of-sample predictions. See [1].
- basic: Calculates the score as an appropriate survival loss between student predictions and observed time and event indicators in each test fold. The overall loss is obtained as an arithmetic mean across all folds.
- vvh: Calculates the test score for each test fold as the difference between the score across all samples and the score across only the training samples of that fold. The overall loss is obtained as an arithmetic mean across all folds. See [1, 2].
- max_coef:
Places an upper bound on the number of non-zero coefficients that the model selected after cross validation may have.
In particular, if max_coef=k, during scoring only models with a total number of non-zero coefficients less than k are considered.
Currently, we still calculate the solutions for these models; we simply disregard them at scoring time.
- alpha_type:
Decides how the regularization hyperparameter is selected. For a given cv_score_method and max_coef, we end up with a vector of length k > 1 that contains numeric scores, where lower is better (i.e., losses). alpha_type decides how we choose among the regularization hyperparameters corresponding to this loss vector.
- min: Selects the regularization hyperparameter that yields the minimum loss.
- 1se: Selects the highest regularization hyperparameter that is within one standard error of the mean loss of the regularization hyperparameter with minimum loss [3].
- pcvl: Selects a hyperparameter in between min and 1se via a penalization term. See [4].
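The min and 1se rules can be made concrete with a short NumPy sketch. This is an illustrative reimplementation under stated assumptions (a per-fold loss matrix and an alpha grid sorted from strongest to weakest regularization), not sparsesurv's internal code:

```python
import numpy as np

def select_alpha(alphas, losses, alpha_type="min"):
    """Pick a regularization strength from per-fold CV losses.

    alphas: (n_alphas,) array, sorted in decreasing order (strongest first).
    losses: (n_alphas, n_folds) array of CV losses (lower is better).
    """
    mean_loss = losses.mean(axis=1)
    i_min = int(np.argmin(mean_loss))
    if alpha_type == "min":
        return alphas[i_min]
    if alpha_type == "1se":
        # Standard error of the per-fold losses at the minimizing alpha.
        se = losses[i_min].std(ddof=1) / np.sqrt(losses.shape[1])
        # Sparsest (largest) alpha whose mean loss is within one SE of the best.
        within = np.flatnonzero(mean_loss <= mean_loss[i_min] + se)
        return alphas[within.min()]  # earliest index = largest alpha
    raise ValueError(f"unknown alpha_type: {alpha_type!r}")
```

The pcvl rule is omitted here; it interpolates between these two choices via an additional penalization term [4].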
References
[1] Dai, Biyue, and Patrick Breheny. “Cross validation approaches for penalized Cox regression.” arXiv preprint arXiv:1905.10432 (2019).
[2] Verweij, Pierre JM, and Hans C. Van Houwelingen. “Cross‐validation in survival analysis.” Statistics in medicine 12.24 (1993): 2305-2314.
[3] Hastie, Trevor, et al. The elements of statistical learning: data mining, inference, and prediction. Vol. 2. New York: Springer, 2009.
[4] Ternès, Nils, Federico Rotolo, and Stefan Michiels. “Empirical extensions of the lasso penalty to reduce the false discovery rate in high‐dimensional Cox regression models.” Statistics in medicine 35.15 (2016): 2561-2573.
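As an illustration of the vvh scheme described in the Notes, the sketch below computes Verweij-van Houwelingen-style fold scores. A squared-error loss and a linear model stand in for the survival loss and the student model (both are stand-ins for illustration; sparsesurv uses an appropriate survival likelihood, as in [1, 2]):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

def sq_loss(model, X, y):
    # Stand-in for a survival loss; lower is better.
    return float(np.mean((model.predict(X) - y) ** 2))

def vvh_cv_score(X, y, n_splits=5, seed=0):
    """For each fold, the test score is the loss over ALL samples minus
    the loss over the training samples only, both evaluated under the
    model fit without the test fold. The difference isolates the held-out
    contribution even for losses (like partial likelihoods) that do not
    decompose over individual samples."""
    scores = []
    for train_idx, _ in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        model = LinearRegression().fit(X[train_idx], y[train_idx])
        scores.append(sq_loss(model, X, y) - sq_loss(model, X[train_idx], y[train_idx]))
    return float(np.mean(scores))
```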
- fit(X, y, sample_weight=None)[source]
Fit a knowledge-distilled semi-parametric survival model to the given data.
- Parameters:
X (npt.NDArray[np.float64]) – Design matrix.
y (npt.NDArray[np.float64]) – Linear predictor as returned by the teacher.
sample_weight (npt.NDArray[np.float64], optional) – Sample weight used during model fitting. Currently unused and kept for sklearn compatibility. Defaults to None.
- Return type:
- predict(X)[source]
Calculate linear predictor corresponding to query design matrix X.
- Parameters:
X (npt.NDArray[np.float64]) – Query design matrix.
- Returns:
Linear predictor of the samples.
- Return type:
npt.NDArray[np.float64]
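The fit/predict pair above implements teacher-student distillation: the student is an elastic net regressed on the teacher's linear predictor rather than on the raw survival outcome. The shape of that workflow can be sketched with plain sklearn estimators standing in for both models (in sparsesurv the teacher would be a semi-parametric survival model and the student a BaseKDSurv subclass; everything below is an illustrative stand-in):

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV, Ridge

rng = np.random.default_rng(42)
n, p = 200, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [2.0, -1.5, 1.0, -1.0, 0.5]  # sparse ground truth
risk = X @ beta + rng.normal(scale=0.5, size=n)

# Teacher: a dense model fit on the (noisy) outcome.
teacher = Ridge(alpha=1.0).fit(X, risk)

# Student: sparse elastic net distilled on the teacher's linear predictor,
# mirroring BaseKDSurv.fit(X, y=teacher_linear_predictor) with the
# default l1_ratio=1.0 (pure L1).
student = ElasticNetCV(l1_ratio=1.0, n_alphas=100, cv=5).fit(X, teacher.predict(X))
```

Because the student is fit against the teacher's predictions, its coefficients concentrate on the directions the teacher actually uses, which is the mechanism behind the teacher-student fidelity discussed in the Notes.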
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → BaseKDSurv
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → BaseKDSurv
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- class KDPHElasticNetCV(tie_correction='efron', l1_ratio=1.0, eps=0.001, n_alphas=100, max_iter=100, tol=0.0001, cv=5, verbose=0, max_epochs=50000, p0=10, prune=True, n_jobs=None, stratify_cv=True, seed=42, shuffle_cv=False, cv_score_method='linear_predictor', max_coef=inf, alpha_type='min')[source]
Bases: BaseKDSurv
Child-class of BaseKDSurv to perform knowledge distillation specifically for Cox PH models.
- __init__(tie_correction='efron', l1_ratio=1.0, eps=0.001, n_alphas=100, max_iter=100, tol=0.0001, cv=5, verbose=0, max_epochs=50000, p0=10, prune=True, n_jobs=None, stratify_cv=True, seed=42, shuffle_cv=False, cv_score_method='linear_predictor', max_coef=inf, alpha_type='min')[source]
Constructor.
- Parameters:
tie_correction (str) – Which method to use to correct for ties in observed survival times. Must be one of "breslow" or "efron". Defaults to "efron".
l1_ratio (Union[float, List[float]], optional) – Float between 0 and 1 passed to ElasticNet (scaling between L1 and L2 penalties). For l1_ratio = 0 the penalty is an L2 penalty. For l1_ratio = 1 it is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2. This parameter can be a list, in which case the different values are tested by cross-validation and the one giving the best prediction score is used. Note that a good choice of list of values for l1_ratio is often to put more values close to 1 (i.e. Lasso) and fewer close to 0 (i.e. Ridge), as in [.1, .5, .7, .9, .95, .99, 1]. Defaults to 1.0.
eps (float, optional) – Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3. Defaults to 1e-3.
n_alphas (int, optional) – Number of alphas along the regularization path, used for each l1_ratio. Defaults to 100.
max_iter (int, optional) – The maximum number of iterations. Defaults to 100.
tol (float, optional) – The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol. Defaults to 1e-4.
cv (int, optional) – Number of folds used to select hyperparameters. Defaults to 5. See also stratify_cv.
verbose (int, optional) – Degree of verbosity. Defaults to 0.
max_epochs (int, optional) – Maximum number of coordinate descent epochs when solving a subproblem. Defaults to 50000.
p0 (int, optional) – Number of features in the first working set. Defaults to 10.
prune (bool, optional) – Whether to use pruning when growing the working sets. Defaults to True.
n_jobs (Optional[int], optional) – Number of CPUs to use during the cross validation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details. Defaults to None.
stratify_cv (bool, optional) – Whether to stratify the cross-validation folds on the event indicator. Defaults to True.
seed (Optional[int], optional) – Random seed. Defaults to 42.
shuffle_cv (bool, optional) – Whether to perform shuffling to generate CV fold indices. Defaults to False.
cv_score_method (str, optional) – Which scoring method to use. Defaults to “linear_predictor”. Must be one of “linear_predictor”, “mse”, “basic” and “vvh”. See Notes.
max_coef (float, optional) – Maximum number of non-zero covariates to be selected with chosen optimal regularization hyperparameter. Defaults to np.inf. See Notes.
alpha_type (str, optional) – How to select the optimal regularization hyperparameter. Defaults to “min”. Must be one of “min”, “1se” and “pcvl”. See Notes.
Notes
- cv_score_method:
Decides how the score used to select the optimal regularization hyperparameter is computed. The basic approach may suffer from issues with small event sizes or for a large number of folds [1]. Meanwhile, the mse approach may yield good teacher-student fidelity, but suboptimal survival predictions.
- mse: Calculates the score as the mean squared error between the teacher predictions and the student predictions. The MSE is calculated per test fold and aggregated across folds using the arithmetic mean.
- linear_predictor: Calculates out-of-sample predictions for each test fold and caches them. Once out-of-sample predictions are produced for each sample, an appropriate survival loss between the student predictions and the observed time and censoring indicator is calculated once, using only the cached out-of-sample predictions. See [1].
- basic: Calculates the score as an appropriate survival loss between student predictions and observed time and event indicators in each test fold. The overall loss is obtained as an arithmetic mean across all folds.
- vvh: Calculates the test score for each test fold as the difference between the score across all samples and the score across only the training samples of that fold. The overall loss is obtained as an arithmetic mean across all folds. See [1, 2].
- max_coef:
Places an upper bound on the number of non-zero coefficients that the model selected after cross validation may have.
In particular, if max_coef=k, during scoring only models with a total number of non-zero coefficients less than k are considered.
Currently, we still calculate the solutions for these models; we simply disregard them at scoring time.
- alpha_type:
Decides how the regularization hyperparameter is selected. For a given cv_score_method and max_coef, we end up with a vector of length k > 1 that contains numeric scores, where lower is better (i.e., losses). alpha_type decides how we choose among the regularization hyperparameters corresponding to this loss vector.
- min: Selects the regularization hyperparameter that yields the minimum loss.
- 1se: Selects the highest regularization hyperparameter that is within one standard error of the mean loss of the regularization hyperparameter with minimum loss [3].
- pcvl: Selects a hyperparameter in between min and 1se via a penalization term. See [4].
References
[1] Dai, Biyue, and Patrick Breheny. “Cross validation approaches for penalized Cox regression.” arXiv preprint arXiv:1905.10432 (2019).
[2] Verweij, Pierre JM, and Hans C. Van Houwelingen. “Cross‐validation in survival analysis.” Statistics in medicine 12.24 (1993): 2305-2314.
[3] Hastie, Trevor, et al. The elements of statistical learning: data mining, inference, and prediction. Vol. 2. New York: Springer, 2009.
[4] Ternès, Nils, Federico Rotolo, and Stefan Michiels. “Empirical extensions of the lasso penalty to reduce the false discovery rate in high‐dimensional Cox regression models.” Statistics in medicine 35.15 (2016): 2561-2573.
- predict_cumulative_hazard_function(X, time)[source]
Predict cumulative hazard function for patients in X at times time.
- Parameters:
X (npt.NDArray[np.float64]) – Query design matrix with u rows and p columns.
time (npt.NDArray[np.float64]) – Query times of dimension k. Assumed to be unique and ordered.
- Raises:
ValueError – If the event times are not unique and sorted in ascending order.
- Returns:
- Query cumulative hazard function for samples 1, …, u
and times 1, …, k. Thus, has u rows and k columns.
- Return type:
npt.NDArray[np.float64]
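For a Cox PH student, the cumulative hazard behind this method is typically built from a Breslow-type baseline estimate scaled by exp(linear predictor). The sketch below is an illustrative reimplementation, not sparsesurv's internal code; it assumes distinct event times and omits Efron tie handling:

```python
import numpy as np

def breslow_cumulative_hazard(time_train, event_train, lp_train, lp_query, time_query):
    """Breslow estimate of the cumulative hazard for query samples.

    time_train, event_train: observed times and event indicators (1 = event).
    lp_train, lp_query: linear predictors from the fitted Cox model.
    time_query: unique, ascending query times of dimension k.
    Returns a (u, k) array: rows are query samples, columns are query times.
    """
    order = np.argsort(time_train)
    t, d, lp = time_train[order], event_train[order], lp_train[order]
    exp_lp = np.exp(lp)
    # Risk-set denominator at each observed time: sum of exp(lp) over
    # subjects still at risk (reverse cumulative sum).
    risk = np.cumsum(exp_lp[::-1])[::-1]
    # Baseline hazard increments at event times (Breslow: 1 / risk).
    dH0 = np.where(d == 1, 1.0 / risk, 0.0)
    H0 = np.cumsum(dH0)
    # Step-function evaluation of the baseline at the query times.
    H0_query = np.concatenate([[0.0], H0])[np.searchsorted(t, time_query, side="right")]
    # PH scaling: each sample's hazard is exp(lp) times the baseline.
    return np.exp(lp_query)[:, None] * H0_query[None, :]
```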
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → KDPHElasticNetCV
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → KDPHElasticNetCV
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- class KDAFTElasticNetCV(bandwidth=None, l1_ratio=1.0, eps=0.001, n_alphas=100, max_iter=100, tol=0.0001, cv=5, verbose=0, max_epochs=50000, p0=10, prune=True, n_jobs=None, stratify_cv=True, seed=42, shuffle_cv=False, cv_score_method='linear_predictor', max_coef=inf, alpha_type='min')[source]
Bases: BaseKDSurv
Child-class of BaseKDSurv to perform knowledge distillation specifically for semiparametric AFT models.
- __init__(bandwidth=None, l1_ratio=1.0, eps=0.001, n_alphas=100, max_iter=100, tol=0.0001, cv=5, verbose=0, max_epochs=50000, p0=10, prune=True, n_jobs=None, stratify_cv=True, seed=42, shuffle_cv=False, cv_score_method='linear_predictor', max_coef=inf, alpha_type='min')[source]
Constructor.
- Parameters:
bandwidth (Optional[float]) – Bandwidth to use for kernel-smoothing the profile likelihood. If not provided, a theoretically motivated bandwidth is estimated from the data.
l1_ratio (Union[float, List[float]], optional) – Float between 0 and 1 passed to ElasticNet (scaling between L1 and L2 penalties). For l1_ratio = 0 the penalty is an L2 penalty. For l1_ratio = 1 it is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2. This parameter can be a list, in which case the different values are tested by cross-validation and the one giving the best prediction score is used. Note that a good choice of list of values for l1_ratio is often to put more values close to 1 (i.e. Lasso) and fewer close to 0 (i.e. Ridge), as in [.1, .5, .7, .9, .95, .99, 1]. Defaults to 1.0.
eps (float, optional) – Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3. Defaults to 1e-3.
n_alphas (int, optional) – Number of alphas along the regularization path, used for each l1_ratio. Defaults to 100.
max_iter (int, optional) – The maximum number of iterations. Defaults to 100.
tol (float, optional) – The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol. Defaults to 1e-4.
cv (int, optional) – Number of folds used to select hyperparameters. Defaults to 5. See also stratify_cv.
verbose (int, optional) – Degree of verbosity. Defaults to 0.
max_epochs (int, optional) – Maximum number of coordinate descent epochs when solving a subproblem. Defaults to 50000.
p0 (int, optional) – Number of features in the first working set. Defaults to 10.
prune (bool, optional) – Whether to use pruning when growing the working sets. Defaults to True.
n_jobs (Optional[int], optional) – Number of CPUs to use during the cross validation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details. Defaults to None.
stratify_cv (bool, optional) – Whether to stratify the cross-validation folds on the event indicator. Defaults to True.
seed (Optional[int], optional) – Random seed. Defaults to 42.
shuffle_cv (bool, optional) – Whether to perform shuffling to generate CV fold indices. Defaults to False.
cv_score_method (str, optional) – Which scoring method to use. Defaults to “linear_predictor”. Must be one of “linear_predictor”, “mse”, “basic” and “vvh”. See Notes.
max_coef (float, optional) – Maximum number of non-zero covariates to be selected with chosen optimal regularization hyperparameter. Defaults to np.inf. See Notes.
alpha_type (str, optional) – How to select the optimal regularization hyperparameter. Defaults to “min”. Must be one of “min”, “1se” and “pcvl”. See Notes.
Notes
- cv_score_method:
Decides how the score used to select the optimal regularization hyperparameter is computed. The basic approach may suffer from issues with small event sizes or for a large number of folds [1]. Meanwhile, the mse approach may yield good teacher-student fidelity, but suboptimal survival predictions.
- mse: Calculates the score as the mean squared error between the teacher predictions and the student predictions. The MSE is calculated per test fold and aggregated across folds using the arithmetic mean.
- linear_predictor: Calculates out-of-sample predictions for each test fold and caches them. Once out-of-sample predictions are produced for each sample, an appropriate survival loss between the student predictions and the observed time and censoring indicator is calculated once, using only the cached out-of-sample predictions. See [1].
- basic: Calculates the score as an appropriate survival loss between student predictions and observed time and event indicators in each test fold. The overall loss is obtained as an arithmetic mean across all folds.
- vvh: Calculates the test score for each test fold as the difference between the score across all samples and the score across only the training samples of that fold. The overall loss is obtained as an arithmetic mean across all folds. See [1, 2].
- max_coef:
Places an upper bound on the number of non-zero coefficients that the model selected after cross validation may have.
In particular, if max_coef=k, during scoring only models with a total number of non-zero coefficients less than k are considered.
Currently, we still calculate the solutions for these models; we simply disregard them at scoring time.
- alpha_type:
Decides how the regularization hyperparameter is selected. For a given cv_score_method and max_coef, we end up with a vector of length k > 1 that contains numeric scores, where lower is better (i.e., losses). alpha_type decides how we choose among the regularization hyperparameters corresponding to this loss vector.
- min: Selects the regularization hyperparameter that yields the minimum loss.
- 1se: Selects the highest regularization hyperparameter that is within one standard error of the mean loss of the regularization hyperparameter with minimum loss [3].
- pcvl: Selects a hyperparameter in between min and 1se via a penalization term. See [4].
References
[1] Dai, Biyue, and Patrick Breheny. “Cross validation approaches for penalized Cox regression.” arXiv preprint arXiv:1905.10432 (2019).
[2] Verweij, Pierre JM, and Hans C. Van Houwelingen. “Cross‐validation in survival analysis.” Statistics in medicine 12.24 (1993): 2305-2314.
[3] Hastie, Trevor, et al. The elements of statistical learning: data mining, inference, and prediction. Vol. 2. New York: Springer, 2009.
[4] Ternès, Nils, Federico Rotolo, and Stefan Michiels. “Empirical extensions of the lasso penalty to reduce the false discovery rate in high‐dimensional Cox regression models.” Statistics in medicine 35.15 (2016): 2561-2573.
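For intuition on the bandwidth parameter: when bandwidth=None, a data-driven rate sets the kernel scale for smoothing the profile likelihood. The exact rule sparsesurv uses is not documented here; as an assumption for illustration, a Silverman-style rule of thumb for a Gaussian kernel looks like this:

```python
import numpy as np

def silverman_bandwidth(residuals):
    """Silverman's rule-of-thumb bandwidth for a Gaussian kernel:
    0.9 * min(std, IQR / 1.34) * n ** (-1/5).
    Hypothetical stand-in for the data-driven default; not sparsesurv's code."""
    n = residuals.shape[0]
    iqr = np.subtract(*np.percentile(residuals, [75, 25]))
    scale = min(residuals.std(ddof=1), iqr / 1.34)
    return 0.9 * scale * n ** (-0.2)
```

The n ** (-1/5) rate shrinks the bandwidth as the sample grows, trading bias for variance in the smoothed likelihood.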
- predict_cumulative_hazard_function(X, time)[source]
Predict cumulative hazard function for patients in X at times time.
- Parameters:
X (npt.NDArray[np.float64]) – Query design matrix with u rows and p columns.
time (npt.NDArray[np.float64]) – Query times of dimension k. Assumed to be unique and ordered.
- Raises:
ValueError – If the event times are not unique and sorted in ascending order.
- Returns:
- Query cumulative hazard function for samples 1, …, u
and times 1, …, k. Thus, has u rows and k columns.
- Return type:
npt.NDArray[np.float64]
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → KDAFTElasticNetCV
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → KDAFTElasticNetCV
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- class KDEHMultiTaskLassoCV(bandwidth=None, eps=0.001, n_alphas=100, max_iter=100, tol=0.0001, cv=5, verbose=0, max_epochs=50000, p0=10, prune=True, n_jobs=None, stratify_cv=True, seed=42, shuffle_cv=False, cv_score_method='linear_predictor', max_coef=inf, alpha_type='min')[source]
Bases: BaseKDSurv
Child-class of BaseKDSurv to perform knowledge distillation specifically for semiparametric EH models.
- __init__(bandwidth=None, eps=0.001, n_alphas=100, max_iter=100, tol=0.0001, cv=5, verbose=0, max_epochs=50000, p0=10, prune=True, n_jobs=None, stratify_cv=True, seed=42, shuffle_cv=False, cv_score_method='linear_predictor', max_coef=inf, alpha_type='min')[source]
Constructor.
- Parameters:
bandwidth (Optional[float]) – Bandwidth to use for kernel-smoothing the profile likelihood. If not provided, a theoretically motivated bandwidth is estimated from the data.
l1_ratio (Union[float, List[float]], optional) – Float between 0 and 1 passed to ElasticNet (scaling between L1 and L2 penalties). For l1_ratio = 0 the penalty is an L2 penalty. For l1_ratio = 1 it is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2. This parameter can be a list, in which case the different values are tested by cross-validation and the one giving the best prediction score is used. Note that a good choice of list of values for l1_ratio is often to put more values close to 1 (i.e. Lasso) and fewer close to 0 (i.e. Ridge), as in [.1, .5, .7, .9, .95, .99, 1]. Defaults to 1.0.
eps (float, optional) – Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3. Defaults to 1e-3.
n_alphas (int, optional) – Number of alphas along the regularization path, used for each l1_ratio. Defaults to 100.
max_iter (int, optional) – The maximum number of iterations. Defaults to 100.
tol (float, optional) – The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol. Defaults to 1e-4.
cv (int, optional) – Number of folds used to select hyperparameters. Defaults to 5. See also stratify_cv.
verbose (int, optional) – Degree of verbosity. Defaults to 0.
max_epochs (int, optional) – Maximum number of coordinate descent epochs when solving a subproblem. Defaults to 50000.
p0 (int, optional) – Number of features in the first working set. Defaults to 10.
prune (bool, optional) – Whether to use pruning when growing the working sets. Defaults to True.
n_jobs (Optional[int], optional) – Number of CPUs to use during the cross validation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details. Defaults to None.
stratify_cv (bool, optional) – Whether to stratify the cross-validation folds on the event indicator. Defaults to True.
seed (Optional[int], optional) – Random seed. Defaults to 42.
shuffle_cv (bool, optional) – Whether to perform shuffling to generate CV fold indices. Defaults to False.
cv_score_method (str, optional) – Which scoring method to use. Defaults to “linear_predictor”. Must be one of “linear_predictor”, “mse”, “basic” and “vvh”. See Notes.
max_coef (float, optional) – Maximum number of non-zero covariates to be selected with chosen optimal regularization hyperparameter. Defaults to np.inf. See Notes.
alpha_type (str, optional) – How to select the optimal regularization hyperparameter. Defaults to “min”. Must be one of “min”, “1se” and “pcvl”. See Notes.
Notes
- cv_score_method:
Decides how the score used to select the optimal regularization hyperparameter is computed. The basic approach may suffer from issues with small event counts or large numbers of folds [1]. Meanwhile, the mse approach may yield good teacher-student fidelity, but suboptimal survival predictions.
- mse: Calculates the score as the mean squared error between the teacher predictions and the student predictions. The MSE is calculated per test fold and aggregated across folds using the arithmetic mean.
- linear_predictor: Calculates out-of-sample predictions for each test fold and caches them. Once out-of-sample predictions are produced for every sample, an appropriate survival loss between the student predictions and the observed time and censoring indicator is calculated once, using only the cached out-of-sample predictions. See [1].
- basic: Calculates the score as an appropriate survival loss between the student predictions and the observed time and event indicators in each test fold. The overall loss is obtained as an arithmetic mean across all folds.
- vvh: Calculates the test score for each fold as the difference between the score across all samples and the score across only the training samples of that fold. The overall loss is obtained as an arithmetic mean across all folds. See [1, 2].
- max_coef:
Places an upper bound on the number of non-zero coefficients that the model selected after cross validation may have. In particular, if max_coef=k, only models with a total number of non-zero coefficients less than k are considered during scoring. Currently, the solutions for the disqualified models are still computed; they are simply disregarded at scoring time.
- alpha_type:
Decides how the regularization hyperparameter is selected. For a given cv_score_method and max_coef, we end up with a vector of length k > 1 that contains numeric scores, where lower is better (i.e., losses). alpha_type decides how we choose among the regularization hyperparameters corresponding to this loss vector.
- min: Selects the regularization hyperparameter that yields the minimum loss.
- 1se: Selects the highest regularization hyperparameter whose loss is within one standard error of the mean loss of the regularization hyperparameter with minimum loss [3].
- pcvl: Selects a hyperparameter between min and 1se via a penalization term. See [4].
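As a concrete illustration of the 1se rule described above (a NumPy sketch under simplified assumptions, not sparsesurv's internal code): given the per-fold losses for each candidate alpha, choose the largest (most regularized) alpha whose mean cross-validation loss stays within one standard error of the best mean loss.

```python
import numpy as np


def select_alpha_1se(alphas, fold_losses):
    """Illustrative 1se rule. `fold_losses` has shape (n_alphas, n_folds).

    Selects the largest alpha whose mean CV loss is within one standard
    error of the minimum mean CV loss (lower loss is better).
    """
    alphas = np.asarray(alphas, dtype=float)
    fold_losses = np.asarray(fold_losses, dtype=float)
    mean_loss = fold_losses.mean(axis=1)
    best = np.argmin(mean_loss)
    # Standard error of the fold losses at the best alpha.
    se = fold_losses[best].std(ddof=1) / np.sqrt(fold_losses.shape[1])
    eligible = np.flatnonzero(mean_loss <= mean_loss[best] + se)
    # Among eligible alphas, return the strongest penalty.
    return alphas[eligible[np.argmax(alphas[eligible])]]
```

With alpha_type="min" the second line (np.argmin) alone would decide; the 1se variant trades a slightly worse loss for a sparser model.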
References
[1] Dai, Biyue, and Patrick Breheny. “Cross validation approaches for penalized Cox regression.” arXiv preprint arXiv:1905.10432 (2019).
[2] Verweij, Pierre JM, and Hans C. Van Houwelingen. “Cross‐validation in survival analysis.” Statistics in medicine 12.24 (1993): 2305-2314.
[3] Hastie, Trevor, et al. The elements of statistical learning: data mining, inference, and prediction. Vol. 2. New York: Springer, 2009.
[4] Ternès, Nils, Federico Rotolo, and Stefan Michiels. “Empirical extensions of the lasso penalty to reduce the false discovery rate in high‐dimensional Cox regression models.” Statistics in medicine 35.15 (2016): 2561-2573.
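The vvh scheme of [1, 2] can be sketched in plain Python (an illustration only, not the package's implementation; `loss` stands for any survival loss evaluated at fixed coefficients, including partial likelihoods that do not decompose over samples):

```python
import numpy as np


def vvh_fold_score(loss, beta_minus_k, X, y, train_idx):
    """Illustrative Verweij & van Houwelingen fold score.

    Evaluate the model fitted without fold k on the full data, then
    subtract its loss on the training portion alone; the difference is
    attributable to the held-out fold.
    """
    full = loss(beta_minus_k, X, y)
    train = loss(beta_minus_k, X[train_idx], y[train_idx])
    return full - train
```

For a loss that is a plain sum over samples this reduces exactly to the basic per-fold test loss; the construction matters for losses where held-out contributions cannot be isolated directly.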
- path(X, y, alphas, coef_init=None, **kwargs)[source]
- Compute the Lasso path with celer. Function taken as-is from celer for compatibility with the parent class.
- Return type:
See also
celer.homotopy.mtl_path celer.dropin_sklearn.MultiTaskLassoCV
- predict(X)[source]
Calculate linear predictor corresponding to query design matrix X.
- Parameters:
X (npt.NDArray[np.float64]) – Query design matrix.
- Returns:
Linear predictor of the samples.
- Return type:
npt.NDArray[np.float64]
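For a linear model such as this one, the linear predictor returned by predict is conceptually just the design matrix times the fitted coefficients. A minimal sketch (assuming a plain linear form; the actual method uses the fitted coef_ attribute):

```python
import numpy as np


def linear_predictor(X, coef, intercept=0.0):
    """Illustrative linear predictor: eta = X @ coef + intercept."""
    X = np.asarray(X, dtype=float)
    coef = np.asarray(coef, dtype=float)
    return X @ coef + intercept
```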
- _is_multitask()[source]
Return whether the model instance in question is a multitask model. Needed for scikit-learn/celer compatibility.
- Return type:
bool
- predict_cumulative_hazard_function(X, time)[source]
Predict cumulative hazard function for patients in X at times time.
- Parameters:
X (npt.NDArray[np.float64]) – Query design matrix with u rows and p columns.
time (npt.NDArray[np.float64]) – Query times of dimension k. Assumed to be unique and ordered.
- Raises:
ValueError – Raises ValueError when the event times are not unique and sorted in ascending order.
- Returns:
- Query cumulative hazard function for samples 1, …, u
and times 1, …, k. Thus, has u rows and k columns.
- Return type:
npt.NDArray[np.float64]
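Semiparametric models of this kind typically evaluate a baseline cumulative hazard, estimated at the observed event times, as a right-continuous step function and scale it by the exponentiated linear predictor. The sketch below illustrates this Cox-type shape under those assumptions (it is not sparsesurv's implementation):

```python
import numpy as np


def cumulative_hazard(query_time, event_times, baseline_chf, linear_pred):
    """Illustrative Cox-type prediction: H(t | x_i) = H0(t) * exp(eta_i).

    H0 is a step function over the sorted, unique event times; the
    result has one row per sample and one column per query time.
    """
    query_time = np.asarray(query_time, dtype=float)
    if np.any(np.diff(query_time) <= 0):
        raise ValueError("Query times must be unique and sorted ascending.")
    # Index of the last event time <= each query time (-1 means H0 = 0).
    idx = np.searchsorted(event_times, query_time, side="right") - 1
    h0 = np.where(idx >= 0, np.asarray(baseline_chf)[np.maximum(idx, 0)], 0.0)
    return np.exp(np.asarray(linear_pred))[:, None] * h0[None, :]
```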
- __doc__ = 'Child-class of BaseKDSurv to perform knowledge distillation specifically for semiparametric EH models.'
- __module__ = 'sparsesurv.cv'
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → KDEHMultiTaskLassoCV
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → KDEHMultiTaskLassoCV
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.