sparsesurv.eh module

Summary

Classes:

`EH`	Linear Extended Hazards (EH) model based on kernel-smoothed PL [Tseng2011].

Reference

class EH(bandwidth=None, tol=None, options=None)

Bases: SurvivalMixin

Linear Extended Hazards (EH) model based on kernel-smoothed PL [Tseng2011].

Fits a linear EH model based on the kernel-smoothed profile likelihood as proposed by [Tseng2011]. Uses the trust-ncg algorithm implementation from `scipy.optimize.minimize` for optimization, with a BFGS [Fletcher2000] quasi-Newton strategy. Gradients are JIT-compiled using numba and implemented efficiently (see pcsurv.gradients).
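The optimization strategy described above can be sketched with SciPy directly. This is a minimal, hypothetical example on a toy least-squares objective: `objective` and `gradient` are illustrative stand-ins, not sparsesurv functions, and the sketch assumes a SciPy version in which `method="trust-ncg"` accepts a `scipy.optimize.BFGS` Hessian update strategy through the `hess` argument.

```python
import numpy as np
from scipy.optimize import BFGS, minimize

# Hypothetical toy problem: minimize 0.5 * ||A x - b||^2 (exact solution x_true).
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true


def objective(x: np.ndarray) -> float:
    """Toy loss standing in for the (negative) smoothed profile likelihood."""
    r = A @ x - b
    return 0.5 * float(r @ r)


def gradient(x: np.ndarray) -> np.ndarray:
    """Analytic gradient of the toy loss (sparsesurv JIT-compiles its own with numba)."""
    return A.T @ (A @ x - b)


res = minimize(
    objective,
    x0=np.zeros(3),
    method="trust-ncg",
    jac=gradient,
    hess=BFGS(),  # quasi-Newton Hessian approximation instead of an exact Hessian
    tol=None,  # corresponds to the EH `tol` argument in this sketch
    options={"maxiter": 200},  # corresponds to the EH `options` argument in this sketch
)
print(res.success, np.round(res.x, 4))
```

Using a quasi-Newton update means only the objective and its gradient have to be coded, while trust-ncg still receives Hessian-vector products for its inner conjugate-gradient steps.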

References

[Tseng2011] Tseng, Yi-Kuan, and Ken-Ning Shu. “Efficient estimation for a semiparametric extended hazards model.” Communications in Statistics – Simulation and Computation 40.2 (2011): 258-273.

[Fletcher2000] Fletcher, Roger. Practical methods of optimization. John Wiley & Sons, 2000.

[Sheather1991] Sheather, Simon J., and Michael C. Jones. “A reliable data-based bandwidth selection method for kernel density estimation.” Journal of the Royal Statistical Society: Series B (Methodological) 53.3 (1991): 683-690.

[Zhong2021] Zhong, Qixian, Jonas W. Mueller, and Jane-Ling Wang. “Deep extended hazard models for survival analysis.” Advances in Neural Information Processing Systems 34 (2021): 15111-15124.

__init__(bandwidth=None, tol=None, options=None)

Constructor.

Parameters:

bandwidth (Optional[float], optional) – Bandwidth used for kernel smoothing of the profile likelihood. If left unspecified (i.e., None), an optimal bandwidth is estimated empirically, similar to previous work ([Sheather1991], [Zhong2021]). Defaults to None.
tol (Optional[float], optional) – Tolerance for terminating the trust-ncg algorithm in scipy. Defaults to None.
options (Dict[str, Union[bool, int, float]], optional) – Solver-specific configuration options of the trust-ncg solver in scipy. Defaults to None.
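When `bandwidth` is None, a data-driven value is estimated. The exact selector used by sparsesurv is not spelled out here, so the sketch below swaps in Silverman's rule of thumb, a simpler alternative to the Sheather-Jones plug-in selector, applied to a one-dimensional sample. The helper `silverman_bandwidth` is hypothetical and only illustrates what a rule-of-thumb bandwidth computation looks like.

```python
import numpy as np


def silverman_bandwidth(x: np.ndarray) -> float:
    """Hypothetical helper: Silverman's rule-of-thumb bandwidth.

    h = 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)
    """
    x = np.asarray(x, dtype=np.float64)
    n = x.size
    sd = np.std(x, ddof=1)  # sample standard deviation
    q75, q25 = np.percentile(x, [75, 25])
    spread = min(sd, (q75 - q25) / 1.34)  # robust scale estimate
    return 0.9 * spread * n ** (-0.2)


rng = np.random.default_rng(42)
sample = rng.normal(size=500)
print(silverman_bandwidth(sample))
```

The bandwidth shrinks at the rate n^(-1/5), so larger training sets lead to less smoothing of the profile likelihood under this rule.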

init_coefs(X)

Initializes the coefficients of the EH model to all zeros.

Parameters: X (npt.NDArray[np.float64]) – Training design matrix with n rows and p columns.
Returns: Initialized coefficients with p rows and 2 columns.
Return type: npt.NDArray[np.float64]
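Zero initialization as described yields a `(p, 2)` array, one column per coefficient vector. The function `init_coefs_sketch` below is a hypothetical stand-in for `init_coefs`; the flatten and reshape step reflects the assumption that the coefficients are handed to `scipy.optimize.minimize`, which expects a one-dimensional starting point.

```python
import numpy as np
import numpy.typing as npt


def init_coefs_sketch(X: npt.NDArray[np.float64]) -> npt.NDArray[np.float64]:
    """Hypothetical stand-in for EH.init_coefs: zeros with p rows and 2 columns."""
    return np.zeros((X.shape[1], 2), dtype=np.float64)


X = np.ones((10, 4))  # n = 10 samples, p = 4 features
coefs = init_coefs_sketch(X)
print(coefs.shape)  # (4, 2)

# scipy.optimize.minimize works on 1-D vectors, so a (p, 2) array would be
# flattened before optimization and reshaped afterwards (assumed workflow).
x0 = coefs.ravel()
restored = x0.reshape(X.shape[1], 2)
```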

fit(X, y, sample_weight=None)

Fits the linear EH model using the trust-ncg implementation from scipy.

Parameters:

X (npt.NDArray[np.float64]) – Design matrix.
y (np.array) – Structured array containing right-censored survival information.
sample_weight (npt.NDArray[np.float64]) – Sample weight used during model fitting. Currently unused and kept for sklearn compatibility. Defaults to None.

Return type:

None
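The target `y` is a structured array holding an event indicator and an observed time for each sample. The field names `event` and `time` below are an assumption (they match the defaults of scikit-survival's `Surv.from_arrays`); check which names sparsesurv expects before relying on them. The data values are made up for illustration.

```python
import numpy as np

# Hypothetical right-censored data: observed times and event indicators
# (True = event observed, False = censored).
time = np.array([5.0, 8.0, 3.5, 12.0])
event = np.array([True, False, True, True])

# Structured array with one record per sample (field names assumed).
y = np.empty(time.shape[0], dtype=[("event", bool), ("time", np.float64)])
y["event"] = event
y["time"] = time

# Times of the samples whose event was observed.
print(y["time"][y["event"]])
```

With a design matrix `X` of four rows, this `y` would then be passed as `EH().fit(X, y)`.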

predict(X)

Calculate linear predictor for the EH model.

Note

Since the EH model has two coefficient vectors, its linear predictor differs slightly from the standard X @ beta approach: it has one column per coefficient vector.

Parameters: X (npt.NDArray[np.float64]) – Query design matrix with u rows and p columns.
Returns: Query linear predictor with u rows and 2 columns.
Return type: npt.NDArray[np.float64]
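Because the coefficients form a `(p, 2)` matrix, the linear predictor is still a single matrix product, but it yields one column per coefficient vector. A minimal sketch with hypothetical coefficient values:

```python
import numpy as np

# Hypothetical fitted coefficients: p = 3 features, one column per coefficient vector.
coefs = np.array([
    [0.5, 0.2],
    [-1.0, 0.0],
    [0.0, 0.3],
])

# Query design matrix with u = 2 rows and p = 3 columns.
X_query = np.array([
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 1.0],
])

eta = X_query @ coefs  # shape (u, 2); eta ≈ [[0.5, 0.8], [-1.0, 0.3]]
print(eta)
```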

predict_cumulative_hazard_function(X, time)

Predict the cumulative hazard function for the patients (rows) in X at the query times given in time.

Parameters:

X (npt.NDArray[np.float64]) – Query design matrix with u rows and p columns.
time (npt.NDArray[np.float64]) – Query times of dimension k. Assumed to be unique and sorted in ascending order.

Raises:

ValueError – Raised when the query times are not unique or not sorted in ascending order.

Returns:

Query cumulative hazard function for samples 1, …, u and times 1, …, k, i.e., an array with u rows and k columns.

Return type:

npt.NDArray[np.float64]
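In the extended hazards model, the hazard is commonly written as λ(t | x) = λ₀(t·exp(η₁))·exp(η₂), where η₁ and η₂ are the two linear predictors; integrating gives Λ(t | x) = Λ₀(t·exp(η₁))·exp(η₂ − η₁). The sketch below evaluates this formula on a grid of query times. It is hypothetical in two respects: it assumes column 0 of the linear predictor scales time and column 1 scales the hazard, and it takes the baseline cumulative hazard Λ₀ as a user-supplied function (here a unit-rate exponential, Λ₀(t) = t) rather than the estimate sparsesurv derives from the training data.

```python
from typing import Callable

import numpy as np


def eh_cumulative_hazard(
    eta: np.ndarray,
    time: np.ndarray,
    baseline_cumhaz: Callable[[np.ndarray], np.ndarray],
) -> np.ndarray:
    """Hypothetical sketch of EH cumulative hazard prediction.

    eta: linear predictor of shape (u, 2); column 0 is assumed to scale time,
         column 1 is assumed to scale the hazard.
    time: query times of shape (k,), must be unique and ascending.
    Returns an array of shape (u, k).
    """
    time = np.asarray(time, dtype=np.float64)
    # Mirror the documented check: times must be unique and strictly increasing.
    if np.any(np.diff(time) <= 0):
        raise ValueError("Query times must be unique and sorted in ascending order.")
    eta_time, eta_hazard = eta[:, 0], eta[:, 1]
    # Lambda(t | x) = Lambda_0(t * exp(eta_time)) * exp(eta_hazard - eta_time)
    scaled_times = time[np.newaxis, :] * np.exp(eta_time)[:, np.newaxis]  # (u, k)
    return baseline_cumhaz(scaled_times) * np.exp(eta_hazard - eta_time)[:, np.newaxis]


# Hypothetical baseline: unit-rate exponential, Lambda_0(t) = t.
eta = np.array([[0.0, 0.0], [np.log(2.0), np.log(2.0)]])
times = np.array([1.0, 2.0, 3.0])
print(eh_cumulative_hazard(eta, times, lambda t: t))
```

With equal columns the model reduces to an accelerated failure time model, so in this example the second sample's cumulative hazard is twice the first at every query time.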


set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → EH

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the sample_weight parameter in fit.
Returns: self – The updated object.
Return type: object