sparsesurv.cv module
Summary
Classes:
- BaseKDSurv: Parent class to fit distilled sparse semi-parametric right-censored survival models using cross validation.
- KDAFTElasticNetCV: Child-class of BaseKDSurv to perform knowledge distillation specifically for semiparametric AFT models.
- KDEHMultiTaskLassoCV: Child-class of BaseKDSurv to perform knowledge distillation specifically for semiparametric EH models.
- KDPHElasticNetCV: Child-class of BaseKDSurv to perform knowledge distillation specifically for Cox PH models.
Reference
- class BaseKDSurv(l1_ratio=1.0, eps=0.001, n_alphas=100, max_iter=100, tol=0.0001, cv=5, verbose=0, max_epochs=50000, p0=10, prune=True, n_jobs=None, stratify_cv=True, seed=42, shuffle_cv=False, cv_score_method='linear_predictor', max_coef=inf, alpha_type='min')[source]
Bases: SurvivalMixin, ElasticNetCV
- Parent class to fit distilled sparse semi-parametric right-censored survival
models using cross validation.
Notes
This class is largely adapted from the ElasticNetCV implementations in sklearn and celer.
See also
sklearn.linear_model.ElasticNetCV celer.ElasticNetCV
- __init__(l1_ratio=1.0, eps=0.001, n_alphas=100, max_iter=100, tol=0.0001, cv=5, verbose=0, max_epochs=50000, p0=10, prune=True, n_jobs=None, stratify_cv=True, seed=42, shuffle_cv=False, cv_score_method='linear_predictor', max_coef=inf, alpha_type='min')[source]
Constructor.
- Parameters:
l1_ratio (Union[float, List[float]], optional) – Float between 0 and 1 passed to ElasticNet (scaling between L1 and L2 penalties). For l1_ratio = 0 the penalty is an L2 penalty. For l1_ratio = 1 it is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2. This parameter can be a list, in which case the different values are tested by cross-validation and the one giving the best prediction score is used. Note that a good choice of list of values for l1_ratio is often to put more values close to 1 (i.e. Lasso) and fewer close to 0 (i.e. Ridge), as in [.1, .5, .7, .9, .95, .99, 1]. Defaults to 1.0.
eps (float, optional) – Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3. Defaults to 1e-3.
n_alphas (int, optional) – Number of alphas along the regularization path, used for each l1_ratio. Defaults to 100.
max_iter (int, optional) – The maximum number of iterations. Defaults to 100.
tol (float, optional) – The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol. Defaults to 1e-4.
cv (int, optional) – Number of folds used to select hyperparameters. Defaults to 5. See also stratify_cv.
verbose (int, optional) – Degree of verbosity. Defaults to 0.
max_epochs (int, optional) – Maximum number of coordinate descent epochs when solving a subproblem. Defaults to 50000.
p0 (int, optional) – Number of features in the first working set. Defaults to 10.
prune (bool, optional) – Whether to use pruning when growing the working sets. Defaults to True.
n_jobs (Optional[int], optional) – Number of CPUs to use during the cross validation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details. Defaults to None.
stratify_cv (bool, optional) – Whether to stratify the cross-validation folds on the event indicator. Defaults to True.
seed (Optional[int], optional) – Random seed. Defaults to 42.
shuffle_cv (bool, optional) – Whether to perform shuffling to generate CV fold indices. Defaults to False.
cv_score_method (str, optional) – Which scoring method to use. Defaults to “linear_predictor”. Must be one of “linear_predictor”, “mse”, “basic” and “vvh”. See Notes.
max_coef (float, optional) – Maximum number of non-zero covariates to be selected with chosen optimal regularization hyperparameter. Defaults to np.inf. See Notes.
alpha_type (str, optional) – How to select the optimal regularization hyperparameter. Defaults to “min”. Must be one of “min”, “1se” and “pcvl”. See Notes.
Notes
- cv_score_method:
Decides how the score used to select the optimal regularization hyperparameter is computed. The basic approach may suffer from issues with small event sizes or for a large number of folds [1]. Meanwhile, the mse approach may yield good teacher-student fidelity, but suboptimal survival predictions.
- mse: Calculates the score as the mean squared error between the teacher predictions and the student predictions. The MSE is calculated per test fold and aggregated across folds using the arithmetic mean.
- linear_predictor: Calculates out-of-sample predictions for each test fold and caches them. Once out-of-sample predictions are produced for each sample, an appropriate survival loss between the student predictions and the observed time and censoring indicator is calculated once, using only the cached out-of-sample predictions. See [1].
- basic: Calculates the score as an appropriate survival loss between student predictions and observed time and event indicators in each test fold. The overall loss is obtained as an arithmetic mean across all folds.
- vvh: Calculates the test score for each test fold as the difference between the score across all samples and the score across only the training samples of that fold. The overall loss is obtained as an arithmetic mean across all folds. See [1, 2].
- max_coef:
Places an upper bound on the number of non-zero coefficients that the model selected after cross validation may have.
In particular, if max_coef=k, during scoring only models with a total number of non-zero coefficients less than k are considered.
Currently, we still calculate the solutions for these models; we simply disregard them at scoring time.
- alpha_type:
Decides how the regularization hyperparameter is selected. For a given cv_score_method and max_coef, we end up with a vector of length k > 1 that contains numeric scores, where lower is better (i.e., losses). alpha_type decides how we choose among the regularization hyperparameters corresponding to this loss vector.
- min: Selects the regularization hyperparameter that yields the minimum loss.
- 1se: Selects the highest regularization hyperparameter that is within one standard error of the mean loss of the regularization hyperparameter with minimum loss [3].
- pcvl: Selects a hyperparameter in between min and 1se via a penalization term. See [4].
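The min and 1se rules can be made concrete with a short NumPy sketch. This is an illustrative reimplementation under stated assumptions (a per-fold loss matrix and an alpha grid sorted from strongest to weakest regularization), not sparsesurv's internal code:

```python
import numpy as np

def select_alpha(alphas, losses, alpha_type="min"):
    """Pick a regularization strength from per-fold CV losses.

    alphas: (n_alphas,) array, sorted in decreasing order (strongest first).
    losses: (n_alphas, n_folds) array of CV losses (lower is better).
    """
    mean_loss = losses.mean(axis=1)
    i_min = int(np.argmin(mean_loss))
    if alpha_type == "min":
        return alphas[i_min]
    if alpha_type == "1se":
        # Standard error of the per-fold losses at the minimizing alpha.
        se = losses[i_min].std(ddof=1) / np.sqrt(losses.shape[1])
        # Sparsest (largest) alpha whose mean loss is within one SE of the best.
        within = np.flatnonzero(mean_loss <= mean_loss[i_min] + se)
        return alphas[within.min()]  # earliest index = largest alpha
    raise ValueError(f"unknown alpha_type: {alpha_type!r}")
```

The pcvl rule is omitted here; it interpolates between these two choices via an additional penalization term [4].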
References
[1] Dai, Biyue, and Patrick Breheny. “Cross validation approaches for penalized Cox regression.” arXiv preprint arXiv:1905.10432 (2019).
[2] Verweij, Pierre JM, and Hans C. Van Houwelingen. “Cross‐validation in survival analysis.” Statistics in medicine 12.24 (1993): 2305-2314.
[3] Hastie, Trevor, et al. The elements of statistical learning: data mining, inference, and prediction. Vol. 2. New York: Springer, 2009.
[4] Ternès, Nils, Federico Rotolo, and Stefan Michiels. “Empirical extensions of the lasso penalty to reduce the false discovery rate in high‐dimensional Cox regression models.” Statistics in medicine 35.15 (2016): 2561-2573.
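As an illustration of the vvh scheme described in the Notes, the sketch below computes Verweij-van Houwelingen-style fold scores. A squared-error loss and a linear model stand in for the survival loss and the student model (both are stand-ins for illustration; sparsesurv uses an appropriate survival likelihood, as in [1, 2]):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

def sq_loss(model, X, y):
    # Stand-in for a survival loss; lower is better.
    return float(np.mean((model.predict(X) - y) ** 2))

def vvh_cv_score(X, y, n_splits=5, seed=0):
    """For each fold, the test score is the loss over ALL samples minus
    the loss over the training samples only, both evaluated under the
    model fit without the test fold. The difference isolates the held-out
    contribution even for losses (like partial likelihoods) that do not
    decompose over individual samples."""
    scores = []
    for train_idx, _ in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        model = LinearRegression().fit(X[train_idx], y[train_idx])
        scores.append(sq_loss(model, X, y) - sq_loss(model, X[train_idx], y[train_idx]))
    return float(np.mean(scores))
```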
- fit(X, y, sample_weight=None)[source]
Fit a knowledge-distilled semi-parametric survival model to the given data.
- Parameters:
X (npt.NDArray[np.float64]) – Design matrix.
y (npt.NDArray[np.float64]) – Linear predictor as returned by the teacher.
sample_weight (npt.NDArray[np.float64], optional) – Sample weight used during model fitting. Currently unused and kept for sklearn compatibility. Defaults to None.
- Return type:
- predict(X)[source]
Calculate linear predictor corresponding to query design matrix X.
- Parameters:
X (npt.NDArray[np.float64]) – Query design matrix.
- Returns:
Linear predictor of the samples.
- Return type:
npt.NDArray[np.float64]
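The fit/predict pair above implements teacher-student distillation: the student is an elastic net regressed on the teacher's linear predictor rather than on the raw survival outcome. The shape of that workflow can be sketched with plain sklearn estimators standing in for both models (in sparsesurv the teacher would be a semi-parametric survival model and the student a BaseKDSurv subclass; everything below is an illustrative stand-in):

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV, Ridge

rng = np.random.default_rng(42)
n, p = 200, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [2.0, -1.5, 1.0, -1.0, 0.5]  # sparse ground truth
risk = X @ beta + rng.normal(scale=0.5, size=n)

# Teacher: a dense model fit on the (noisy) outcome.
teacher = Ridge(alpha=1.0).fit(X, risk)

# Student: sparse elastic net distilled on the teacher's linear predictor,
# mirroring BaseKDSurv.fit(X, y=teacher_linear_predictor) with the
# default l1_ratio=1.0 (pure L1).
student = ElasticNetCV(l1_ratio=1.0, n_alphas=100, cv=5).fit(X, teacher.predict(X))
```

Because the student is fit against the teacher's predictions, its coefficients concentrate on the directions the teacher actually uses, which is the mechanism behind the teacher-student fidelity discussed in the Notes.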
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → BaseKDSurv
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → BaseKDSurv
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- class KDPHElasticNetCV(tie_correction='efron', l1_ratio=1.0, eps=0.001, n_alphas=100, max_iter=100, tol=0.0001, cv=5, verbose=0, max_epochs=50000, p0=10, prune=True, n_jobs=None, stratify_cv=True, seed=42, shuffle_cv=False, cv_score_method='linear_predictor', max_coef=inf, alpha_type='min')[source]
Bases: BaseKDSurv
Child-class of BaseKDSurv to perform knowledge distillation specifically for Cox PH models.
- __init__(tie_correction='efron', l1_ratio=1.0, eps=0.001, n_alphas=100, max_iter=100, tol=0.0001, cv=5, verbose=0, max_epochs=50000, p0=10, prune=True, n_jobs=None, stratify_cv=True, seed=42, shuffle_cv=False, cv_score_method='linear_predictor', max_coef=inf, alpha_type='min')[source]
Constructor.
- Parameters:
tie_correction (str) – Which method to use to correct for ties in observed survival times. Must be one of "breslow" or "efron". Defaults to "efron".
l1_ratio (Union[float, List[float]], optional) – Float between 0 and 1 passed to ElasticNet (scaling between L1 and L2 penalties). For l1_ratio = 0 the penalty is an L2 penalty. For l1_ratio = 1 it is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2. This parameter can be a list, in which case the different values are tested by cross-validation and the one giving the best prediction score is used. Note that a good choice of list of values for l1_ratio is often to put more values close to 1 (i.e. Lasso) and fewer close to 0 (i.e. Ridge), as in [.1, .5, .7, .9, .95, .99, 1]. Defaults to 1.0.
eps (float, optional) – Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3. Defaults to 1e-3.
n_alphas (int, optional) – Number of alphas along the regularization path, used for each l1_ratio. Defaults to 100.
max_iter (int, optional) – The maximum number of iterations. Defaults to 100.
tol (float, optional) – The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol. Defaults to 1e-4.
cv (int, optional) – Number of folds used to select hyperparameters. Defaults to 5. See also stratify_cv.
verbose (int, optional) – Degree of verbosity. Defaults to 0.
max_epochs (int, optional) – Maximum number of coordinate descent epochs when solving a subproblem. Defaults to 50000.
p0 (int, optional) – Number of features in the first working set. Defaults to 10.
prune (bool, optional) – Whether to use pruning when growing the working sets. Defaults to True.
n_jobs (Optional[int], optional) – Number of CPUs to use during the cross validation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details. Defaults to None.
stratify_cv (bool, optional) – Whether to stratify the cross-validation folds on the event indicator. Defaults to True.
seed (Optional[int], optional) – Random seed. Defaults to 42.
shuffle_cv (bool, optional) – Whether to perform shuffling to generate CV fold indices. Defaults to False.
cv_score_method (str, optional) – Which scoring method to use. Defaults to “linear_predictor”. Must be one of “linear_predictor”, “mse”, “basic” and “vvh”. See Notes.
max_coef (float, optional) – Maximum number of non-zero covariates to be selected with chosen optimal regularization hyperparameter. Defaults to np.inf. See Notes.
alpha_type (str, optional) – How to select the optimal regularization hyperparameter. Defaults to “min”. Must be one of “min”, “1se” and “pcvl”. See Notes.
Notes
- cv_score_method:
Decides how the score used to select the optimal regularization hyperparameter is computed. The basic approach may suffer from issues with small event sizes or for a large number of folds [1]. Meanwhile, the mse approach may yield good teacher-student fidelity, but suboptimal survival predictions.
- mse: Calculates the score as the mean squared error between the teacher predictions and the student predictions. The MSE is calculated per test fold and aggregated across folds using the arithmetic mean.
- linear_predictor: Calculates out-of-sample predictions for each test fold and caches them. Once out-of-sample predictions are produced for each sample, an appropriate survival loss between the student predictions and the observed time and censoring indicator is calculated once, using only the cached out-of-sample predictions. See [1].
- basic: Calculates the score as an appropriate survival loss between student predictions and observed time and event indicators in each test fold. The overall loss is obtained as an arithmetic mean across all folds.
- vvh: Calculates the test score for each test fold as the difference between the score across all samples and the score across only the training samples of that fold. The overall loss is obtained as an arithmetic mean across all folds. See [1, 2].
- max_coef:
Places an upper bound on the number of non-zero coefficients that the model selected after cross validation may have.
In particular, if max_coef=k, during scoring only models with a total number of non-zero coefficients less than k are considered.
Currently, we still calculate the solutions for these models; we simply disregard them at scoring time.
- alpha_type:
Decides how the regularization hyperparameter is selected. For a given cv_score_method and max_coef, we end up with a vector of length k > 1 that contains numeric scores, where lower is better (i.e., losses). alpha_type decides how we choose among the regularization hyperparameters corresponding to this loss vector.
- min: Selects the regularization hyperparameter that yields the minimum loss.
- 1se: Selects the highest regularization hyperparameter that is within one standard error of the mean loss of the regularization hyperparameter with minimum loss [3].
- pcvl: Selects a hyperparameter in between min and 1se via a penalization term. See [4].
References
[1] Dai, Biyue, and Patrick Breheny. “Cross validation approaches for penalized Cox regression.” arXiv preprint arXiv:1905.10432 (2019).
[2] Verweij, Pierre JM, and Hans C. Van Houwelingen. “Cross‐validation in survival analysis.” Statistics in medicine 12.24 (1993): 2305-2314.
[3] Hastie, Trevor, et al. The elements of statistical learning: data mining, inference, and prediction. Vol. 2. New York: Springer, 2009.
[4] Ternès, Nils, Federico Rotolo, and Stefan Michiels. “Empirical extensions of the lasso penalty to reduce the false discovery rate in high‐dimensional Cox regression models.” Statistics in medicine 35.15 (2016): 2561-2573.
- predict_cumulative_hazard_function(X, time)[source]
Predict cumulative hazard function for patients in X at times time.
- Parameters:
X (npt.NDArray[np.float64]) – Query design matrix with u rows and p columns.
time (npt.NDArray[np.float64]) – Query times of dimension k. Assumed to be unique and ordered.
- Raises:
ValueError – If the event times are not unique and sorted in ascending order.
- Returns:
- Query cumulative hazard function for samples 1, …, u
and times 1, …, k. Thus, has u rows and k columns.
- Return type:
npt.NDArray[np.float64]
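For a Cox PH student, the cumulative hazard behind this method is typically built from a Breslow-type baseline estimate scaled by exp(linear predictor). The sketch below is an illustrative reimplementation, not sparsesurv's internal code; it assumes distinct event times and omits Efron tie handling:

```python
import numpy as np

def breslow_cumulative_hazard(time_train, event_train, lp_train, lp_query, time_query):
    """Breslow estimate of the cumulative hazard for query samples.

    time_train, event_train: observed times and event indicators (1 = event).
    lp_train, lp_query: linear predictors from the fitted Cox model.
    time_query: unique, ascending query times of dimension k.
    Returns a (u, k) array: rows are query samples, columns are query times.
    """
    order = np.argsort(time_train)
    t, d, lp = time_train[order], event_train[order], lp_train[order]
    exp_lp = np.exp(lp)
    # Risk-set denominator at each observed time: sum of exp(lp) over
    # subjects still at risk (reverse cumulative sum).
    risk = np.cumsum(exp_lp[::-1])[::-1]
    # Baseline hazard increments at event times (Breslow: 1 / risk).
    dH0 = np.where(d == 1, 1.0 / risk, 0.0)
    H0 = np.cumsum(dH0)
    # Step-function evaluation of the baseline at the query times.
    H0_query = np.concatenate([[0.0], H0])[np.searchsorted(t, time_query, side="right")]
    # PH scaling: each sample's hazard is exp(lp) times the baseline.
    return np.exp(lp_query)[:, None] * H0_query[None, :]
```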
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → KDPHElasticNetCV
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → KDPHElasticNetCV
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- class KDAFTElasticNetCV(bandwidth=None, l1_ratio=1.0, eps=0.001, n_alphas=100, max_iter=100, tol=0.0001, cv=5, verbose=0, max_epochs=50000, p0=10, prune=True, n_jobs=None, stratify_cv=True, seed=42, shuffle_cv=False, cv_score_method='linear_predictor', max_coef=inf, alpha_type='min')[source]
Bases: BaseKDSurv
Child-class of BaseKDSurv to perform knowledge distillation specifically for semiparametric AFT models.
- __init__(bandwidth=None, l1_ratio=1.0, eps=0.001, n_alphas=100, max_iter=100, tol=0.0001, cv=5, verbose=0, max_epochs=50000, p0=10, prune=True, n_jobs=None, stratify_cv=True, seed=42, shuffle_cv=False, cv_score_method='linear_predictor', max_coef=inf, alpha_type='min')[source]
Constructor.
- Parameters:
bandwidth (Optional[float]) – Bandwidth to use for kernel-smoothing the profile likelihood. If not provided, a theoretically motivated bandwidth is estimated from the data.
l1_ratio (Union[float, List[float]], optional) – Float between 0 and 1 passed to ElasticNet (scaling between L1 and L2 penalties). For l1_ratio = 0 the penalty is an L2 penalty. For l1_ratio = 1 it is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2. This parameter can be a list, in which case the different values are tested by cross-validation and the one giving the best prediction score is used. Note that a good choice of list of values for l1_ratio is often to put more values close to 1 (i.e. Lasso) and fewer close to 0 (i.e. Ridge), as in [.1, .5, .7, .9, .95, .99, 1]. Defaults to 1.0.
eps (float, optional) – Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3. Defaults to 1e-3.
n_alphas (int, optional) – Number of alphas along the regularization path, used for each l1_ratio. Defaults to 100.
max_iter (int, optional) – The maximum number of iterations. Defaults to 100.
tol (float, optional) – The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol. Defaults to 1e-4.
cv (int, optional) – Number of folds used to select hyperparameters. Defaults to 5. See also stratify_cv.
verbose (int, optional) – Degree of verbosity. Defaults to 0.
max_epochs (int, optional) – Maximum number of coordinate descent epochs when solving a subproblem. Defaults to 50000.
p0 (int, optional) – Number of features in the first working set. Defaults to 10.
prune (bool, optional) – Whether to use pruning when growing the working sets. Defaults to True.
n_jobs (Optional[int], optional) – Number of CPUs to use during the cross validation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details. Defaults to None.
stratify_cv (bool, optional) – Whether to stratify the cross-validation folds on the event indicator. Defaults to True.
seed (Optional[int], optional) – Random seed. Defaults to 42.
shuffle_cv (bool, optional) – Whether to perform shuffling to generate CV fold indices. Defaults to False.
cv_score_method (str, optional) – Which scoring method to use. Defaults to “linear_predictor”. Must be one of “linear_predictor”, “mse”, “basic” and “vvh”. See Notes.
max_coef (float, optional) – Maximum number of non-zero covariates to be selected with chosen optimal regularization hyperparameter. Defaults to np.inf. See Notes.
alpha_type (str, optional) – How to select the optimal regularization hyperparameter. Defaults to “min”. Must be one of “min”, “1se” and “pcvl”. See Notes.
Notes
- cv_score_method:
Decides how the score used to select the optimal regularization hyperparameter is computed. The basic approach may suffer from issues with small event sizes or for a large number of folds [1]. Meanwhile, the mse approach may yield good teacher-student fidelity, but suboptimal survival predictions.
- mse: Calculates the score as the mean squared error between the teacher predictions and the student predictions. The MSE is calculated per test fold and aggregated across folds using the arithmetic mean.
- linear_predictor: Calculates out-of-sample predictions for each test fold and caches them. Once out-of-sample predictions are produced for each sample, an appropriate survival loss between the student predictions and the observed time and censoring indicator is calculated once, using only the cached out-of-sample predictions. See [1].
- basic: Calculates the score as an appropriate survival loss between student predictions and observed time and event indicators in each test fold. The overall loss is obtained as an arithmetic mean across all folds.
- vvh: Calculates the test score for each test fold as the difference between the score across all samples and the score across only the training samples of that fold. The overall loss is obtained as an arithmetic mean across all folds. See [1, 2].
- max_coef:
Places an upper bound on the number of non-zero coefficients that the model selected after cross validation may have.
In particular, if max_coef=k, during scoring only models with a total number of non-zero coefficients less than k are considered.
Currently, we still calculate the solutions for these models; we simply disregard them at scoring time.
- alpha_type:
Decides how the regularization hyperparameter is selected. For a given cv_score_method and max_coef, we end up with a vector of length k > 1 that contains numeric scores, where lower is better (i.e., losses). alpha_type decides how we choose among the regularization hyperparameters corresponding to this loss vector.
- min: Selects the regularization hyperparameter that yields the minimum loss.
- 1se: Selects the highest regularization hyperparameter that is within one standard error of the mean loss of the regularization hyperparameter with minimum loss [3].
- pcvl: Selects a hyperparameter in between min and 1se via a penalization term. See [4].
References
[1] Dai, Biyue, and Patrick Breheny. “Cross validation approaches for penalized Cox regression.” arXiv preprint arXiv:1905.10432 (2019).
[2] Verweij, Pierre JM, and Hans C. Van Houwelingen. “Cross‐validation in survival analysis.” Statistics in medicine 12.24 (1993): 2305-2314.
[3] Hastie, Trevor, et al. The elements of statistical learning: data mining, inference, and prediction. Vol. 2. New York: Springer, 2009.
[4] Ternès, Nils, Federico Rotolo, and Stefan Michiels. “Empirical extensions of the lasso penalty to reduce the false discovery rate in high‐dimensional Cox regression models.” Statistics in medicine 35.15 (2016): 2561-2573.
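For intuition on the bandwidth parameter: when bandwidth=None, a data-driven rate sets the kernel scale for smoothing the profile likelihood. The exact rule sparsesurv uses is not documented here; as an assumption for illustration, a Silverman-style rule of thumb for a Gaussian kernel looks like this:

```python
import numpy as np

def silverman_bandwidth(residuals):
    """Silverman's rule-of-thumb bandwidth for a Gaussian kernel:
    0.9 * min(std, IQR / 1.34) * n ** (-1/5).
    Hypothetical stand-in for the data-driven default; not sparsesurv's code."""
    n = residuals.shape[0]
    iqr = np.subtract(*np.percentile(residuals, [75, 25]))
    scale = min(residuals.std(ddof=1), iqr / 1.34)
    return 0.9 * scale * n ** (-0.2)
```

The n ** (-1/5) rate shrinks the bandwidth as the sample grows, trading bias for variance in the smoothed likelihood.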
- predict_cumulative_hazard_function(X, time)[source]
Predict cumulative hazard function for patients in X at times time.
- Parameters:
X (npt.NDArray[np.float64]) – Query design matrix with u rows and p columns.
time (npt.NDArray[np.float64]) – Query times of dimension k. Assumed to be unique and ordered.
- Raises:
ValueError – If the event times are not unique and sorted in ascending order.
- Returns:
- Query cumulative hazard function for samples 1, …, u
and times 1, …, k. Thus, has u rows and k columns.
- Return type:
npt.NDArray[np.float64]
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → KDAFTElasticNetCV
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → KDAFTElasticNetCV
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- class KDEHMultiTaskLassoCV(bandwidth=None, eps=0.001, n_alphas=100, max_iter=100, tol=0.0001, cv=5, verbose=0, max_epochs=50000, p0=10, prune=True, n_jobs=None, stratify_cv=True, seed=42, shuffle_cv=False, cv_score_method='linear_predictor', max_coef=inf, alpha_type='min')[source]
Bases: BaseKDSurv
Child-class of BaseKDSurv to perform knowledge distillation specifically for semiparametric EH models.
- __init__(bandwidth=None, eps=0.001, n_alphas=100, max_iter=100, tol=0.0001, cv=5, verbose=0, max_epochs=50000, p0=10, prune=True, n_jobs=None, stratify_cv=True, seed=42, shuffle_cv=False, cv_score_method='linear_predictor', max_coef=inf, alpha_type='min')[source]
Constructor.
- Parameters:
bandwidth (Optional[float]) – Bandwidth to use for kernel-smoothing the profile likelihood. If not provided, a theoretically motivated bandwidth is estimated from the data.
l1_ratio (Union[float, List[float]], optional) – Float between 0 and 1 passed to ElasticNet (scaling between L1 and L2 penalties). For l1_ratio = 0 the penalty is an L2 penalty. For l1_ratio = 1 it is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2. This parameter can be a list, in which case the different values are tested by cross-validation and the one giving the best prediction score is used. Note that a good choice of list of values for l1_ratio is often to put more values close to 1 (i.e. Lasso) and fewer close to 0 (i.e. Ridge), as in [.1, .5, .7, .9, .95, .99, 1]. Defaults to 1.0.
eps (float, optional) – Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3. Defaults to 1e-3.
n_alphas (int, optional) – Number of alphas along the regularization path, used for each l1_ratio. Defaults to 100.
max_iter (int, optional) – The maximum number of iterations. Defaults to 100.
tol (float, optional) – The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol. Defaults to 1e-4.
cv (int, optional) – Number of folds used to select hyperparameters. Defaults to 5. See also stratify_cv.
verbose (int, optional) – Degree of verbosity. Defaults to 0.
max_epochs (int, optional) – Maximum number of coordinate descent epochs when solving a subproblem. Defaults to 50000.
p0 (int, optional) – Number of features in the first working set. Defaults to 10.
prune (bool, optional) – Whether to use pruning when growing the working sets. Defaults to True.
n_jobs (Optional[int], optional) – Number of CPUs to use during the cross validation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details. Defaults to None.
stratify_cv (bool, optional) – Whether to stratify the cross-validation folds on the event indicator. Defaults to True.
seed (Optional[int], optional) – Random seed. Defaults to 42.
shuffle_cv (bool, optional) – Whether to perform shuffling to generate CV fold indices. Defaults to False.
cv_score_method (str, optional) – Which scoring method to use. Defaults to “linear_predictor”. Must be one of “linear_predictor”, “mse”, “basic” and “vvh”. See Notes.
max_coef (float, optional) – Maximum number of non-zero covariates to be selected with chosen optimal regularization hyperparameter. Defaults to np.inf. See Notes.
alpha_type (str, optional) – How to select the optimal regularization hyperparameter. Defaults to “min”. Must be one of “min”, “1se” and “pcvl”. See Notes.
Notes
- cv_score_method:
Decides how the score used to select the optimal regularization hyperparameter is computed. The basic approach may suffer from issues with small event counts or large numbers of folds [1]. Meanwhile, the mse approach may yield good teacher-student fidelity, but suboptimal survival predictions.
- mse: Calculates the score as the mean squared error between the teacher predictions and the student predictions. The MSE is calculated per test fold and aggregated across folds using the arithmetic mean.
- linear_predictor: Calculates out-of-sample predictions for each test fold and caches them. Once out-of-sample predictions are produced for every sample, an appropriate survival loss between the student predictions and the observed time and censoring indicator is calculated once, using only the cached out-of-sample predictions. See [1].
- basic: Calculates the score as an appropriate survival loss between the student predictions and the observed time and event indicators in each test fold. The overall loss is obtained as an arithmetic mean across all folds.
- vvh: Calculates the test score for each fold as the difference between the score across all samples and the score across only the training samples of that fold. The overall loss is obtained as an arithmetic mean across all folds. See [1, 2].
- max_coef:
Places an upper bound on the number of non-zero coefficients that the model selected after cross validation may have. In particular, if max_coef=k, only models with a total number of non-zero coefficients less than k are considered during scoring. Currently, the solutions for the disqualified models are still computed; they are simply disregarded at scoring time.
- alpha_type:
Decides how the regularization hyperparameter is selected. For a given cv_score_method and max_coef, we end up with a vector of length k > 1 that contains numeric scores, where lower is better (i.e., losses). alpha_type decides how we choose among the regularization hyperparameters corresponding to this loss vector.
- min: Selects the regularization hyperparameter that yields the minimum loss.
- 1se: Selects the highest regularization hyperparameter whose loss is within one standard error of the mean loss of the regularization hyperparameter with minimum loss [3].
- pcvl: Selects a hyperparameter between min and 1se via a penalization term. See [4].
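As a concrete illustration of the 1se rule described above (a NumPy sketch under simplified assumptions, not sparsesurv's internal code): given the per-fold losses for each candidate alpha, choose the largest (most regularized) alpha whose mean cross-validation loss stays within one standard error of the best mean loss.

```python
import numpy as np


def select_alpha_1se(alphas, fold_losses):
    """Illustrative 1se rule. `fold_losses` has shape (n_alphas, n_folds).

    Selects the largest alpha whose mean CV loss is within one standard
    error of the minimum mean CV loss (lower loss is better).
    """
    alphas = np.asarray(alphas, dtype=float)
    fold_losses = np.asarray(fold_losses, dtype=float)
    mean_loss = fold_losses.mean(axis=1)
    best = np.argmin(mean_loss)
    # Standard error of the fold losses at the best alpha.
    se = fold_losses[best].std(ddof=1) / np.sqrt(fold_losses.shape[1])
    eligible = np.flatnonzero(mean_loss <= mean_loss[best] + se)
    # Among eligible alphas, return the strongest penalty.
    return alphas[eligible[np.argmax(alphas[eligible])]]
```

With alpha_type="min" the second line (np.argmin) alone would decide; the 1se variant trades a slightly worse loss for a sparser model.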
References
[1] Dai, Biyue, and Patrick Breheny. “Cross validation approaches for penalized Cox regression.” arXiv preprint arXiv:1905.10432 (2019).
[2] Verweij, Pierre JM, and Hans C. Van Houwelingen. “Cross‐validation in survival analysis.” Statistics in medicine 12.24 (1993): 2305-2314.
[3] Hastie, Trevor, et al. The elements of statistical learning: data mining, inference, and prediction. Vol. 2. New York: Springer, 2009.
[4] Ternès, Nils, Federico Rotolo, and Stefan Michiels. “Empirical extensions of the lasso penalty to reduce the false discovery rate in high‐dimensional Cox regression models.” Statistics in medicine 35.15 (2016): 2561-2573.
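The vvh scheme of [1, 2] can be sketched in plain Python (an illustration only, not the package's implementation; `loss` stands for any survival loss evaluated at fixed coefficients, including partial likelihoods that do not decompose over samples):

```python
import numpy as np


def vvh_fold_score(loss, beta_minus_k, X, y, train_idx):
    """Illustrative Verweij & van Houwelingen fold score.

    Evaluate the model fitted without fold k on the full data, then
    subtract its loss on the training portion alone; the difference is
    attributable to the held-out fold.
    """
    full = loss(beta_minus_k, X, y)
    train = loss(beta_minus_k, X[train_idx], y[train_idx])
    return full - train
```

For a loss that is a plain sum over samples this reduces exactly to the basic per-fold test loss; the construction matters for losses where held-out contributions cannot be isolated directly.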
- path(X, y, alphas, coef_init=None, **kwargs)[source]
- Compute the Lasso path with celer. Function taken as-is from celer for compatibility with the parent class.
- Return type:
See also
celer.homotopy.mtl_path celer.dropin_sklearn.MultiTaskLassoCV
- predict(X)[source]
Calculate linear predictor corresponding to query design matrix X.
- Parameters:
X (npt.NDArray[np.float64]) – Query design matrix.
- Returns:
Linear predictor of the samples.
- Return type:
npt.NDArray[np.float64]
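For a linear model such as this one, the linear predictor returned by predict is conceptually just the design matrix times the fitted coefficients. A minimal sketch (assuming a plain linear form; the actual method uses the fitted coef_ attribute):

```python
import numpy as np


def linear_predictor(X, coef, intercept=0.0):
    """Illustrative linear predictor: eta = X @ coef + intercept."""
    X = np.asarray(X, dtype=float)
    coef = np.asarray(coef, dtype=float)
    return X @ coef + intercept
```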
- _is_multitask()[source]
Return whether the model instance in question is a multitask model. Needed for scikit-learn/celer compatibility.
- Return type:
bool
- predict_cumulative_hazard_function(X, time)[source]
Predict cumulative hazard function for patients in X at times time.
- Parameters:
X (npt.NDArray[np.float64]) – Query design matrix with u rows and p columns.
time (npt.NDArray[np.float64]) – Query times of dimension k. Assumed to be unique and ordered.
- Raises:
ValueError – Raises ValueError when the event times are not unique and sorted in ascending order.
- Returns:
- Query cumulative hazard function for samples 1, …, u
and times 1, …, k. Thus, has u rows and k columns.
- Return type:
npt.NDArray[np.float64]
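Semiparametric models of this kind typically evaluate a baseline cumulative hazard, estimated at the observed event times, as a right-continuous step function and scale it by the exponentiated linear predictor. The sketch below illustrates this Cox-type shape under those assumptions (it is not sparsesurv's implementation):

```python
import numpy as np


def cumulative_hazard(query_time, event_times, baseline_chf, linear_pred):
    """Illustrative Cox-type prediction: H(t | x_i) = H0(t) * exp(eta_i).

    H0 is a step function over the sorted, unique event times; the
    result has one row per sample and one column per query time.
    """
    query_time = np.asarray(query_time, dtype=float)
    if np.any(np.diff(query_time) <= 0):
        raise ValueError("Query times must be unique and sorted ascending.")
    # Index of the last event time <= each query time (-1 means H0 = 0).
    idx = np.searchsorted(event_times, query_time, side="right") - 1
    h0 = np.where(idx >= 0, np.asarray(baseline_chf)[np.maximum(idx, 0)], 0.0)
    return np.exp(np.asarray(linear_pred))[:, None] * h0[None, :]
```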
- __doc__ = 'Child-class of BaseKDSurv to perform knowledge distillation specifically for semiparametric EH models.'
- __module__ = 'sparsesurv.cv'
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → KDEHMultiTaskLassoCV
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → KDEHMultiTaskLassoCV
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.