Metrics

class scripts.metrics.corrs.AvgBulkScorer(std: bool)

Bases: object

A scorer that uses the average bulk gene expression (gex) of the signature genes, optionally standardized, as a proxy for the signature score.

score(bulk_values: DataFrame, metasig: ndarray) → Series

The main scoring function

Parameters:

bulk_values – a dataframe of size (n_samples, n_genes) with the bulk gene expression
metasig – an array of gene names making up the signature to score

Returns:

a series with the score for each patient
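The averaging idea described above can be illustrated with a minimal sketch. The class name AvgScorerSketch is hypothetical and is not the project's implementation; it assumes that standardization means a per-gene z-score across samples and that signature genes missing from the bulk matrix are simply skipped.

```python
import numpy as np
import pandas as pd


class AvgScorerSketch:
    """Hypothetical stand-in for an average-based bulk scorer."""

    def __init__(self, std: bool):
        self.std = std

    def score(self, bulk_values: pd.DataFrame, metasig: np.ndarray) -> pd.Series:
        # Assumption: signature genes absent from the bulk matrix are ignored.
        genes = [str(g) for g in metasig if g in bulk_values.columns]
        values = bulk_values[genes]
        if self.std:
            # Assumption: standardization is a per-gene z-score across samples.
            values = (values - values.mean(axis=0)) / values.std(axis=0)
        # One score per sample: the mean over the signature genes.
        return values.mean(axis=1)


# Toy bulk matrix: 3 samples (rows) x 3 genes (columns).
bulk = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [3.0, 5.0, 7.0], "C": [0.0, 0.0, 9.0]},
    index=["p1", "p2", "p3"],
)
sig = np.array(["A", "B"])

raw_scores = AvgScorerSketch(std=False).score(bulk, sig)
z_scores = AvgScorerSketch(std=True).score(bulk, sig)
```

With std=False, genes with a larger expression range dominate the average; standardizing first gives each signature gene equal weight, which is presumably why the flag exists.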

scripts.metrics.corrs.corr_signatures(df: DataFrame, gt_names: List[str], meta_sigs_names: List[str]) → DataFrame

scripts.metrics.corrs.get_adata(data_path: Path) → AnnData

scripts.metrics.corrs.get_all_scores(bulk_values: DataFrame, metasignatures: DataFrame, truesignatures: DataFrame) → DataFrame

Computes the scores for all metasignatures and true signatures using the appropriate scorer function.

Parameters:

bulk_values – df of bulk gex of size (n_samples, n_genes)
purity – series with the purity information per patient + cancer type of size (n_samples, 2)
metasignatures – df with n_metasignatures columns, containing in each column the list of genes that constitute the metasignature
truesignatures – df with n_true signatures columns, containing in each column the list of genes that constitute the true signature
std – only used if the scorer name is average; if True, the bulk gex is standardized before computing the average
sample_norm_method – only used if the scorer name is ssgsea; the sample normalization method to use in ssgsea (see the ssgsea documentation for more information)

Returns:

a dataframe of size (n_samples, n_metasignatures + n_true signatures + 1 + 1) containing all scores on the metasignatures and the true signatures, the purity information, and the TCGA cancer type the scoring was performed on
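The column-wise layout of the signature dataframes described above (one signature per column, genes listed down the rows) suggests a simple loop. The sketch below is a hypothetical illustration, not the project's code: mean_score and all_scores_sketch are made-up names, the scorer is a plain mean, shorter signatures are assumed to be padded with missing values, and the purity and cancer-type columns mentioned in the return description are omitted.

```python
import pandas as pd


def mean_score(bulk_values: pd.DataFrame, genes) -> pd.Series:
    # Hypothetical scorer: plain mean over the signature genes found in the bulk matrix.
    present = [g for g in genes if g in bulk_values.columns]
    return bulk_values[present].mean(axis=1)


def all_scores_sketch(
    bulk_values: pd.DataFrame,
    metasignatures: pd.DataFrame,
    truesignatures: pd.DataFrame,
) -> pd.DataFrame:
    scores = {}
    for sigs in (metasignatures, truesignatures):
        for name in sigs.columns:
            # Assumption: shorter signatures are padded with missing values,
            # so dropna() recovers the actual gene list of each column.
            scores[name] = mean_score(bulk_values, sigs[name].dropna())
    # One row per sample, one column per scored signature.
    return pd.DataFrame(scores)


bulk = pd.DataFrame(
    {"A": [1.0, 3.0], "B": [3.0, 5.0], "C": [10.0, 20.0]},
    index=["p1", "p2"],
)
metasignatures = pd.DataFrame({"meta_0": ["A", "B"]})
truesignatures = pd.DataFrame({"true_0": ["C", None]})

result = all_scores_sketch(bulk, metasignatures, truesignatures)
```

Keeping every score in a single dataframe indexed by sample makes it straightforward to correlate metasignature scores against true-signature scores afterwards.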

scripts.metrics.corrs.get_args() → Namespace

scripts.metrics.corrs.get_scores(adata: AnnData, gt_sigs: DataFrame, meta_sigs: DataFrame)

scripts.metrics.corrs.main() → None

scripts.metrics.corrs.score_dataset(bulk_data: DataFrame, metasignature: DataFrame, truesignature: DataFrame) → Tuple[List[str], List[str], DataFrame]

Main function; computes the bulk scores for the metasignatures and the reference signatures.

Parameters:

bulk_file – path to file with the bulk gex
metasignature_file – path to the file with the metasignature genes
truesignature_file – path to the file with the true signature genes
scorer_name – which scoring to use
std – only used if the scorer name is average; if True, the bulk gex is standardized before computing the average
sample_norm_method – only used if the scorer name is ssgsea; the sample normalization method to use in ssgsea (see the ssgsea documentation for more information)

Returns:

a dataframe of size (n_samples, n_metasignatures + n_true signatures + 1 + 1) containing all scores on the metasignatures and the true signatures, the purity information, and the TCGA cancer type the scoring was performed on

scripts.metrics.corrs.score_signature(adata, sigs: DataFrame) → List[str]

scripts.metrics.eval.get_args() → Namespace

scripts.metrics.eval.get_score(corr: DataFrame, sig_gts: List[str], sig_names: List[str]) → float

scripts.metrics.eval.main() → None

scripts.metrics.marker_overlap.get_args() → Namespace

scripts.metrics.marker_overlap.get_overlap(gt_signatures: DataFrame, signatures: DataFrame, var_names: array) → DataFrame

scripts.metrics.marker_overlap.main() → None

class scripts.metrics.score_bulk.AvgBulkScorer(std: bool)

Bases: object

A scorer that uses the average bulk gene expression (gex) of the signature genes, optionally standardized, as a proxy for the signature score.

score(bulk_values: DataFrame, metasig: ndarray) → Series

The main scoring function

Parameters:

bulk_values – a dataframe of size (n_samples, n_genes) with the bulk gene expression
metasig – an array of gene names making up the signature to score

Returns:

a series with the score for each patient

scripts.metrics.score_bulk.get_all_scores(bulk_values: DataFrame, metasignatures: DataFrame, truesignatures: DataFrame) → DataFrame

Computes the scores for all metasignatures and true signatures using the appropriate scorer function.

Parameters:

bulk_values – df of bulk gex of size (n_samples, n_genes)
purity – series with the purity information per patient + cancer type of size (n_samples, 2)
metasignatures – df with n_metasignatures columns, containing in each column the list of genes that constitute the metasignature
truesignatures – df with n_true signatures columns, containing in each column the list of genes that constitute the true signature
std – only used if the scorer name is average; if True, the bulk gex is standardized before computing the average
sample_norm_method – only used if the scorer name is ssgsea; the sample normalization method to use in ssgsea (see the ssgsea documentation for more information)

Returns:

a dataframe of size (n_samples, n_metasignatures + n_true signatures + 1 + 1) containing all scores on the metasignatures and the true signatures, the purity information, and the TCGA cancer type the scoring was performed on

scripts.metrics.score_bulk.get_args()

scripts.metrics.score_bulk.get_data(bulk_file: Path, metasignature_file: Path, truesignature_file: Path) → Tuple[DataFrame, DataFrame, DataFrame]

Helper function to download files

Parameters:

bulk_file – path to file with the bulk gex
purity_file – path to the file with the purity info
metasignature_file – path to the file with the metasignature genes
truesignature_file – path to the file with the true signature genes

Returns:

a scorer object

scripts.metrics.score_bulk.main()

scripts.metrics.score_bulk.score_dataset(bulk_file: Path, metasignature_file: Path, truesignature_file: Path) → DataFrame

Main function; computes the bulk scores for the metasignatures and the reference signatures.

Parameters:

bulk_file – path to file with the bulk gex
metasignature_file – path to the file with the metasignature genes
truesignature_file – path to the file with the true signature genes
scorer_name – which scoring to use
std – only used if the scorer name is average; if True, the bulk gex is standardized before computing the average
sample_norm_method – only used if the scorer name is ssgsea; the sample normalization method to use in ssgsea (see the ssgsea documentation for more information)

Returns:

a dataframe of size (n_samples, n_metasignatures + n_true signatures + 1 + 1) containing all scores on the metasignatures and the true signatures, the purity information, and the TCGA cancer type the scoring was performed on

scripts.metrics.eval_overlap.get_args() → Namespace

scripts.metrics.eval_overlap.get_score(overlap: DataFrame) → float

scripts.metrics.eval_overlap.main() → None

scripts.metrics.aggregate_signatures.get_args()

scripts.metrics.aggregate_signatures.main()

scripts.metrics.aggregate_methods.get_args()

scripts.metrics.aggregate_methods.main()
