Metrics

class scripts.metrics.corrs.AvgBulkScorer(std: bool)

Bases: object

A scorer that uses the average bulk gene expression (gex) of a signature's genes, optionally standardized, as a proxy for the signature score.

score(bulk_values: DataFrame, metasig: ndarray) → Series

The main scoring function

Parameters:
  • bulk_values – a dataframe of size (n_samples, n_genes) with the bulk gene expression

  • metasig – an array of gene names representing the signature to score

Returns:

a series with one score per sample (patient)
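
The averaging described above can be sketched as follows. This is an illustrative stand-in, not the project's implementation: the class name AvgScorerSketch is hypothetical, and two behaviours are assumptions the docstring does not confirm, namely that standardization is applied per gene across samples and that signature genes missing from the bulk matrix are ignored.

```python
import numpy as np
import pandas as pd


class AvgScorerSketch:
    """Hypothetical stand-in for AvgBulkScorer; not the project's code."""

    def __init__(self, std: bool):
        self.std = std

    def score(self, bulk_values: pd.DataFrame, metasig: np.ndarray) -> pd.Series:
        # Assumption: signature genes absent from the bulk matrix are skipped.
        genes = [g for g in metasig if g in bulk_values.columns]
        values = bulk_values[genes]
        if self.std:
            # Assumption: z-score each gene (column) across samples.
            values = (values - values.mean(axis=0)) / values.std(axis=0)
        # One score per sample: the mean over the signature genes.
        return values.mean(axis=1)


# Toy bulk matrix: 3 samples (rows) x 3 genes (columns).
bulk = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [2.0, 4.0, 6.0], "C": [5.0, 5.0, 5.0]},
    index=["p1", "p2", "p3"],
)
signature = np.array(["A", "B"])

raw_scores = AvgScorerSketch(std=False).score(bulk, signature)
std_scores = AvgScorerSketch(std=True).score(bulk, signature)
```

On this toy input the raw scores are 1.5, 3.0 and 4.5. The standardized scores are -1, 0 and 1, because genes A and B have identical z-scores across the three samples.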

scripts.metrics.corrs.corr_signatures(df: DataFrame, gt_names: List[str], meta_sigs_names: List[str]) → DataFrame
scripts.metrics.corrs.get_adata(data_path: Path) → AnnData
scripts.metrics.corrs.get_all_scores(bulk_values: DataFrame, metasignatures: DataFrame, truesignatures: DataFrame) → DataFrame

Computes the scores for all metasignatures and true signatures using the appropriate scorer function.

Parameters:
  • bulk_values – df of bulk gex of size (n_samples, n_genes)

  • purity – series with the purity information per patient + cancer type of size (n_samples, 2)

  • metasignatures – df with n_metasignatures columns, containing in each column the list of genes that constitute the metasignature

  • truesignatures – df with n_true signatures columns, containing in each column the list of genes that constitute the true signature

  • std – only used if scorer name is average, if True the bulk gex will be standardized before computing the average

  • sample_norm_method – only used if scorer name is ssgsea, what method to use for sample norm in ssgsea (see ssgsea doc for more info)

Returns:

a dataframe of size (n_samples, n_metasignatures + n_true signatures + 1 + 1), containing all scores on the metasignatures and the true signatures, the purity information, and the TCGA cancer type the scoring was performed on
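
A minimal sketch of the loop this describes, assuming each signature dataframe holds one gene list per column, padded with missing values when lists differ in length. The names mean_score and all_scores_sketch are hypothetical, the scorer is a plain average, and the purity and cancer type columns mentioned above are left out because their source is not shown here.

```python
import pandas as pd


def mean_score(bulk_values: pd.DataFrame, genes) -> pd.Series:
    # Hypothetical scorer: plain average over the signature genes
    # that are present in the bulk matrix.
    present = [g for g in genes if g in bulk_values.columns]
    return bulk_values[present].mean(axis=1)


def all_scores_sketch(
    bulk_values: pd.DataFrame,
    metasignatures: pd.DataFrame,
    truesignatures: pd.DataFrame,
) -> pd.DataFrame:
    # Assumption: column names are unique across both signature tables.
    scores = {}
    for sigs in (metasignatures, truesignatures):
        for name in sigs.columns:
            # Drop padding values before scoring.
            scores[name] = mean_score(bulk_values, sigs[name].dropna())
    return pd.DataFrame(scores, index=bulk_values.index)


bulk = pd.DataFrame(
    {"A": [1.0, 3.0], "B": [3.0, 5.0], "C": [10.0, 20.0]},
    index=["p1", "p2"],
)
meta = pd.DataFrame({"meta_0": ["A", "B"], "meta_1": ["C", None]})
true = pd.DataFrame({"true_0": ["A", "C"]})

result = all_scores_sketch(bulk, meta, true)
```

The result has one row per sample and one column per signature, here meta_0, meta_1 and true_0.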

scripts.metrics.corrs.get_args() → Namespace
scripts.metrics.corrs.get_scores(adata: AnnData, gt_sigs: DataFrame, meta_sigs: DataFrame)
scripts.metrics.corrs.main() → None
scripts.metrics.corrs.score_dataset(bulk_data: DataFrame, metasignature: DataFrame, truesignature: DataFrame) → Tuple[List[str], List[str], DataFrame]

Main function, computes the bulk score for metasignatures and reference signatures

Parameters:
  • bulk_file – path to file with the bulk gex

  • metasignature_file – path to the file with the metasignature genes

  • truesignature_file – path to the file with the true signature genes

  • scorer_name – which scoring to use

  • std – only used if scorer name is average, if True the bulk gex will be standardized before computing the average

  • sample_norm_method – only used if scorer name is ssgsea, what method to use for sample norm in ssgsea (see ssgsea doc for more info)

Returns:

a dataframe of size (n_samples, n_metasignatures + n_true signatures + 1 + 1), containing all scores on the metasignatures and the true signatures, the purity information, and the TCGA cancer type the scoring was performed on

scripts.metrics.corrs.score_signature(adata, sigs: DataFrame) → List[str]
scripts.metrics.eval.get_args() → Namespace
scripts.metrics.eval.get_score(corr: DataFrame, sig_gts: List[str], sig_names: List[str]) → float
scripts.metrics.eval.main() → None
scripts.metrics.marker_overlap.get_args() → Namespace
scripts.metrics.marker_overlap.get_overlap(gt_signatures: DataFrame, signatures: DataFrame, var_names: array) → DataFrame
scripts.metrics.marker_overlap.main() → None
class scripts.metrics.score_bulk.AvgBulkScorer(std: bool)

Bases: object

A scorer that uses the average bulk gene expression (gex) of a signature's genes, optionally standardized, as a proxy for the signature score.

score(bulk_values: DataFrame, metasig: ndarray) → Series

The main scoring function

Parameters:
  • bulk_values – a dataframe of size (n_samples, n_genes) with the bulk gene expression

  • metasig – an array of gene names representing the signature to score

Returns:

a series with one score per sample (patient)
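
Written out, the score of sample i for a signature G could take the form below. The docstring does not say how the standardization is done, so the per-gene z-score across samples is an assumption, as are the symbols: x_{ig} is the bulk expression of gene g in sample i, and \mu_g and \sigma_g are the mean and standard deviation of gene g over all samples.

```latex
s_i = \frac{1}{|G|} \sum_{g \in G} \tilde{x}_{ig},
\qquad
\tilde{x}_{ig} =
\begin{cases}
\dfrac{x_{ig} - \mu_g}{\sigma_g} & \text{if std is True} \\[6pt]
x_{ig} & \text{otherwise}
\end{cases}
```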

scripts.metrics.score_bulk.get_all_scores(bulk_values: DataFrame, metasignatures: DataFrame, truesignatures: DataFrame) → DataFrame

Computes the scores for all metasignatures and true signatures using the appropriate scorer function.

Parameters:
  • bulk_values – df of bulk gex of size (n_samples, n_genes)

  • purity – series with the purity information per patient + cancer type of size (n_samples, 2)

  • metasignatures – df with n_metasignatures columns, containing in each column the list of genes that constitute the metasignature

  • truesignatures – df with n_true signatures columns, containing in each column the list of genes that constitute the true signature

  • std – only used if scorer name is average, if True the bulk gex will be standardized before computing the average

  • sample_norm_method – only used if scorer name is ssgsea, what method to use for sample norm in ssgsea (see ssgsea doc for more info)

Returns:

a dataframe of size (n_samples, n_metasignatures + n_true signatures + 1 + 1), containing all scores on the metasignatures and the true signatures, the purity information, and the TCGA cancer type the scoring was performed on

scripts.metrics.score_bulk.get_args()
scripts.metrics.score_bulk.get_data(bulk_file: Path, metasignature_file: Path, truesignature_file: Path) → Tuple[DataFrame, DataFrame, DataFrame]

Helper function to download files

Parameters:
  • bulk_file – path to file with the bulk gex

  • purity_file – path to the file with the purity info

  • metasignature_file – path to the file with the metasignature genes

  • truesignature_file – path to the file with the true signature genes

Returns:

a tuple of three dataframes, matching the three input files

scripts.metrics.score_bulk.main()
scripts.metrics.score_bulk.score_dataset(bulk_file: Path, metasignature_file: Path, truesignature_file: Path) → DataFrame

Main function, computes the bulk score for metasignatures and reference signatures

Parameters:
  • bulk_file – path to file with the bulk gex

  • metasignature_file – path to the file with the metasignature genes

  • truesignature_file – path to the file with the true signature genes

  • scorer_name – which scoring to use

  • std – only used if scorer name is average, if True the bulk gex will be standardized before computing the average

  • sample_norm_method – only used if scorer name is ssgsea, what method to use for sample norm in ssgsea (see ssgsea doc for more info)

Returns:

a dataframe of size (n_samples, n_metasignatures + n_true signatures + 1 + 1), containing all scores on the metasignatures and the true signatures, the purity information, and the TCGA cancer type the scoring was performed on

scripts.metrics.eval_overlap.get_args() → Namespace
scripts.metrics.eval_overlap.get_score(overlap: DataFrame) → float
scripts.metrics.eval_overlap.main() → None
scripts.metrics.aggregate_signatures.get_args()
scripts.metrics.aggregate_signatures.main()
scripts.metrics.aggregate_methods.get_args()
scripts.metrics.aggregate_methods.main()