sparsesurv.eh module
Summary
Classes:
EH: Linear Extended Hazards (EH) model based on kernel-smoothed PL [Tseng2011].
Reference
- class EH(bandwidth=None, tol=None, options=None)[source]
Bases: SurvivalMixin
Linear Extended Hazards (EH) model based on kernel-smoothed PL [Tseng2011].
Fits a linear EH model based on the kernel-smoothed profile likelihood proposed by [Tseng2011]. Optimization uses the trust-ncg implementation in scipy.optimize.minimize with a BFGS [Fletcher2000] quasi-Newton strategy. Gradients are JIT-compiled with numba for efficiency (see pcsurv.gradients).
References
Tseng, Yi-Kuan, and Ken-Ning Shu. “Efficient estimation for a semiparametric extended hazards model.” Communications in Statistics—Simulation and Computation® 40.2 (2011): 258-273.
Fletcher, Roger. Practical methods of optimization. John Wiley & Sons, 2000.
Sheather, Simon J., and Michael C. Jones. “A reliable data‐based bandwidth selection method for kernel density estimation.” Journal of the Royal Statistical Society: Series B (Methodological) 53.3 (1991): 683-690.
Zhong, Qixian, Jonas W. Mueller, and Jane-Ling Wang. “Deep extended hazard models for survival analysis.” Advances in Neural Information Processing Systems 34 (2021): 15111-15124.
- __init__(bandwidth=None, tol=None, options=None)[source]
Constructor.
- Parameters:
bandwidth (Optional[float], optional) – Bandwidth used for kernel smoothing of the profile likelihood. If left unspecified (i.e., None), an optimal bandwidth is estimated empirically, similar to previous work ([Sheather1991], [Zhong2021]). Defaults to None.
tol (float, optional) – Tolerance for terminating the trust-ncg algorithm in scipy. Defaults to None.
options (Dict[str, Union[bool, int, float]], optional) – Solver-specific configuration options of the trust-ncg solver in scipy. Defaults to None.
- init_coefs(X)[source]
Initializes the coefficients of the EH model at all zeros.
- Parameters:
X (npt.NDArray[np.float64]) – Training design matrix with n rows and p columns.
- Returns:
Initialized coefficients with p rows and 2 columns.
- Return type:
npt.NDArray[np.float64]
- fit(X, y, sample_weight=None)[source]
Fits the linear EH model using the trust-ncg implementation from scipy.
- Parameters:
X (npt.NDArray[np.float64]) – Design matrix.
y (np.array) – Structured array containing right-censored survival information.
sample_weight (npt.NDArray[np.float64], optional) – Sample weights. Currently unused and kept only for sklearn API compatibility. Defaults to None.
- Return type:
- predict(X)[source]
Calculate linear predictor for the EH model.
Note
Since the EH model has two coefficient vectors, we need a slightly different signature relative to the standard X @ beta approach.
- Parameters:
X (npt.NDArray[np.float64]) – Query design matrix with u rows and p columns.
- Returns:
Query linear predictor with u rows and 2 columns.
- Return type:
npt.NDArray[np.float64]
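The shapes described above can be illustrated with a minimal, dependency-free sketch. It assumes the two coefficient vectors are stored as the two columns of a p x 2 matrix, as the return shape of init_coefs suggests. The function names init_coefs_sketch and linear_predictor_sketch are hypothetical and are not part of sparsesurv; the library's actual implementation may differ.

```python
from typing import List

Matrix = List[List[float]]


def init_coefs_sketch(n_features: int) -> Matrix:
    """Return an all-zero coefficient matrix with p rows and 2 columns."""
    return [[0.0, 0.0] for _ in range(n_features)]


def linear_predictor_sketch(X: Matrix, coefs: Matrix) -> Matrix:
    """Multiply X (u x p) by coefs (p x 2), giving a u x 2 linear predictor."""
    p = len(coefs)
    out = []
    for row in X:
        if len(row) != p:
            raise ValueError("each row of X needs one entry per coefficient row")
        # One dot product per coefficient column (assumed layout: p x 2).
        out.append([sum(x * c[k] for x, c in zip(row, coefs)) for k in range(2)])
    return out


# Hypothetical query data: u = 2 samples, p = 3 features.
X_query = [[1.0, 2.0, 3.0], [0.5, -1.0, 0.0]]
coefs = [[0.1, 0.2], [0.0, -0.3], [1.0, 0.5]]
eta = linear_predictor_sketch(X_query, coefs)
```

Each row of eta holds one linear predictor per coefficient vector, which is why the output has 2 columns rather than the single column of a standard X @ beta product.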
- predict_cumulative_hazard_function(X, time)[source]
Predict cumulative hazard function for patients in X at times time.
- Parameters:
X (npt.NDArray[np.float64]) – Query design matrix with u rows and p columns.
time (npt.NDArray[np.float64]) – Query times of dimension k. Assumed to be unique and ordered.
- Raises:
ValueError – If the query times are not unique and sorted in ascending order.
- Returns:
Query cumulative hazard function for samples 1, …, u and times 1, …, k. Thus, has u rows and k columns.
- Return type:
npt.NDArray[np.float64]
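The requirement that time be unique and sorted in ascending order amounts to the values being strictly increasing. The following sketch shows one way such a check could be written before calling the method. The helper name check_query_times is hypothetical, and the check performed inside sparsesurv may be implemented differently.

```python
from typing import Sequence


def check_query_times(time: Sequence[float]) -> None:
    """Raise ValueError unless the times are strictly increasing."""
    for earlier, later in zip(time, time[1:]):
        # Equal neighbours violate uniqueness; a decrease violates ordering.
        if not earlier < later:
            raise ValueError(
                "Query times must be unique and sorted in ascending order."
            )


check_query_times([0.5, 1.0, 2.5])  # valid: passes silently
```

If the query times come from raw data, passing sorted(set(times)) yields a sequence that satisfies this condition.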
- set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> EH
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.