Metrics
- class scripts.metrics.corrs.AvgBulkScorer(std: bool)
Bases: object
Scorer that averages the bulk gene expression (gex) of a signature's genes, standardized or not depending on std, as a proxy for the signature score.
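A minimal sketch of the averaging idea, assuming the bulk gex is a DataFrame with samples as rows and genes as columns, and that a signature is a plain list of gene names. The class `AvgBulkScorerSketch` and its `score` method are hypothetical; the real methods of AvgBulkScorer are not documented here, and z-scoring each gene across samples is only an assumption about what `std=True` does.

```python
import pandas as pd


class AvgBulkScorerSketch:
    """Hypothetical stand-in for AvgBulkScorer (interface assumed)."""

    def __init__(self, std: bool):
        # Assumption: if True, z-score each gene across samples before averaging
        self.std = std

    def score(self, bulk: pd.DataFrame, genes: list) -> pd.Series:
        # Keep only the signature genes that are present in the bulk matrix
        present = [g for g in genes if g in bulk.columns]
        values = bulk[present]
        if self.std:
            # Per-gene standardization; a zero-variance gene would yield NaN here
            values = (values - values.mean()) / values.std(ddof=0)
        # Per-sample average over the signature genes is the signature score
        return values.mean(axis=1)


bulk = pd.DataFrame(
    {"G1": [1.0, 3.0], "G2": [2.0, 4.0], "G3": [10.0, 0.0]},
    index=["s1", "s2"],
)
print(AvgBulkScorerSketch(std=False).score(bulk, ["G1", "G2", "MISSING"]).tolist())  # [1.5, 3.5]
```

Genes absent from the bulk matrix are silently skipped in this sketch; the actual scorer may handle them differently.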
- scripts.metrics.corrs.corr_signatures(df: DataFrame, gt_names: List[str], meta_sigs_names: List[str]) → DataFrame
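From the signature alone, one plausible reading is that corr_signatures correlates each metasignature score column of df with each ground-truth score column. The sketch below assumes Pearson correlation and a metasignatures × ground truths layout; both choices, and the name `corr_signatures_sketch`, are hypothetical.

```python
import pandas as pd


def corr_signatures_sketch(df, gt_names, meta_sigs_names):
    """Hypothetical: Pearson correlation of each metasignature score with each ground-truth score."""
    meta, gt = list(meta_sigs_names), list(gt_names)
    # Correlate all relevant score columns at once, then keep the meta-vs-gt block
    full = df[meta + gt].corr(method="pearson")
    return full.loc[meta, gt]


scores = pd.DataFrame({
    "meta_0": [1.0, 2.0, 3.0, 4.0],
    "meta_1": [4.0, 3.0, 2.0, 1.0],
    "gt_A": [2.0, 4.0, 6.0, 8.0],
})
result = corr_signatures_sketch(scores, ["gt_A"], ["meta_0", "meta_1"])
print(result)
```

Here meta_0 rises with gt_A and meta_1 falls, so their correlations come out at about +1 and -1 respectively.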
- scripts.metrics.corrs.get_all_scores(bulk_values: DataFrame, metasignatures: DataFrame, truesignatures: DataFrame) → DataFrame
Computes the scores for all metasignatures and true signatures using the appropriate scorer function.
- Parameters:
bulk_values – df of bulk gex of size (n_samples, n_genes)
purity – purity information and cancer type per patient, of size (n_samples, 2)
metasignatures – df with n_metasignatures columns, each containing the list of genes that constitute one metasignature
truesignatures – df with n_true_signatures columns, each containing the list of genes that constitute one true signature
std – only used if the scorer name is average; if True, the bulk gex is standardized before the average is computed
sample_norm_method – only used if the scorer name is ssgsea; the sample normalization method to use in ssGSEA (see the ssgsea documentation for details)
- Returns:
- the dataframe of size (n_samples, n_metasignatures + n_true_signatures + 1 + 1), containing the scores on all metasignatures and true signatures, the purity information, and the TCGA cancer type on which the scoring was performed
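A minimal sketch of how scores for every signature could be assembled with an averaging scorer, assuming each signature DataFrame holds one signature per column with gene names as values (shorter columns padded with NaN). The function name `get_all_scores_sketch` is hypothetical, per-gene z-scoring for `std` is an assumption, and the purity and cancer-type columns of the documented return value are left out.

```python
import pandas as pd


def get_all_scores_sketch(bulk_values, metasignatures, truesignatures, std=False):
    """Hypothetical: one average-score column per signature column."""
    if std:
        # Assumption: standardize each gene across samples
        bulk_values = (bulk_values - bulk_values.mean()) / bulk_values.std(ddof=0)
    scores = {}
    for sigs in (metasignatures, truesignatures):
        for name in sigs.columns:
            # Drop NaN padding and genes that are not measured in the bulk data
            genes = [g for g in sigs[name].dropna() if g in bulk_values.columns]
            scores[name] = bulk_values[genes].mean(axis=1)
    return pd.DataFrame(scores, index=bulk_values.index)


bulk = pd.DataFrame(
    {"G1": [1.0, 2.0], "G2": [3.0, 6.0], "G3": [5.0, 0.0]},
    index=["s1", "s2"],
)
meta = pd.DataFrame({"meta_0": ["G1", "G2"], "meta_1": ["G3", None]})
true = pd.DataFrame({"true_A": ["G2", "G3"]})
out = get_all_scores_sketch(bulk, meta, true)
print(out)
```

Metasignature columns come first, followed by the true signatures, matching the column order described in the return value.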
- scripts.metrics.corrs.score_dataset(bulk_data: DataFrame, metasignature: DataFrame, truesignature: DataFrame) → Tuple[List[str], List[str], DataFrame]
Main function: computes the bulk scores for the metasignatures and the reference signatures.
- Parameters:
bulk_file – path to the file with the bulk gex
metasignature_file – path to the file with the metasignature genes
truesignature_file – path to the file with the true signature genes
scorer_name – the scoring method to use
std – only used if the scorer name is average; if True, the bulk gex is standardized before the average is computed
sample_norm_method – only used if the scorer name is ssgsea; the sample normalization method to use in ssGSEA (see the ssgsea documentation for details)
- Returns:
- the dataframe of size (n_samples, n_metasignatures + n_true_signatures + 1 + 1), containing the scores on all metasignatures and true signatures, the purity information, and the TCGA cancer type on which the scoring was performed
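The return annotation Tuple[List[str], List[str], DataFrame] suggests the function hands back the metasignature names, the true signature names, and a score table. The sketch below is built on that reading, which is an assumption; `score_dataset_sketch` is hypothetical and uses plain averaging. It also shows how the two name lists can select columns for a downstream correlation.

```python
import pandas as pd


def score_dataset_sketch(bulk_data, metasignature, truesignature):
    """Hypothetical: average-score every signature and also return the name lists."""

    def average_scores(sig_df):
        # One score per signature column; NaN padding and unknown genes are dropped
        out = {}
        for name in sig_df.columns:
            genes = [g for g in sig_df[name].dropna() if g in bulk_data.columns]
            out[name] = bulk_data[genes].mean(axis=1)
        return out

    meta_names = list(metasignature.columns)
    true_names = list(truesignature.columns)
    scores = pd.DataFrame(
        {**average_scores(metasignature), **average_scores(truesignature)},
        index=bulk_data.index,
    )
    return meta_names, true_names, scores


bulk = pd.DataFrame(
    {"G1": [1.0, 2.0, 3.0], "G2": [2.0, 4.0, 6.0], "G3": [3.0, 2.0, 1.0]},
    index=["s1", "s2", "s3"],
)
meta = pd.DataFrame({"meta_0": ["G1"], "meta_1": ["G3"]})
true = pd.DataFrame({"true_A": ["G2"]})

meta_names, true_names, scores = score_dataset_sketch(bulk, meta, true)
# The name lists select score columns, e.g. for a metasignature-vs-truth correlation
corr = scores[meta_names + true_names].corr().loc[meta_names, true_names]
print(corr)
```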
- scripts.metrics.eval.get_score(corr: DataFrame, sig_gts: List[str], sig_names: List[str]) → float
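get_score reduces a correlation table to a single float. How it aggregates is not documented; the sketch below assumes one common choice, taking for each ground-truth signature the best correlation reached by any candidate signature and then averaging. The aggregation rule, the row/column layout, and the name `get_score_sketch` are all assumptions.

```python
import pandas as pd


def get_score_sketch(corr, sig_gts, sig_names):
    """Hypothetical: mean over ground truths of the best-matching signature's correlation."""
    # Assumed layout: rows are candidate signatures, columns are ground truths
    block = corr.loc[sig_names, sig_gts]
    # For each ground truth, keep the highest correlation reached by any signature
    return float(block.max(axis=0).mean())


corr = pd.DataFrame(
    [[0.9, 0.1], [0.2, 0.5]],
    index=["meta_0", "meta_1"],
    columns=["gt_A", "gt_B"],
)
value = get_score_sketch(corr, ["gt_A", "gt_B"], ["meta_0", "meta_1"])
print(value)
```

With this rule, gt_A is best matched by meta_0 (0.9) and gt_B by meta_1 (0.5), giving about 0.7.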
- scripts.metrics.marker_overlap.get_overlap(gt_signatures: DataFrame, signatures: DataFrame, var_names: array) → DataFrame
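A sketch of one way to measure marker overlap, assuming both signature tables hold one gene list per column and that var_names is the set of measured genes used to filter them. The Jaccard index used here is a swapped-in choice; the actual overlap measure is not documented, and `get_overlap_sketch` is hypothetical.

```python
import numpy as np
import pandas as pd


def get_overlap_sketch(gt_signatures, signatures, var_names):
    """Hypothetical: Jaccard overlap between every (signature, ground truth) pair."""
    universe = set(np.asarray(var_names).tolist())

    def gene_sets(df):
        # One gene set per column, restricted to genes listed in var_names
        return {c: set(df[c].dropna()) & universe for c in df.columns}

    gt_sets, sig_sets = gene_sets(gt_signatures), gene_sets(signatures)
    out = pd.DataFrame(index=list(sig_sets), columns=list(gt_sets), dtype=float)
    for s, s_genes in sig_sets.items():
        for g, g_genes in gt_sets.items():
            union = s_genes | g_genes
            # Jaccard index: shared genes divided by all genes in either set
            out.loc[s, g] = len(s_genes & g_genes) / len(union) if union else 0.0
    return out


gt = pd.DataFrame({"gt_A": ["G1", "G2", "G3"]})
sigs = pd.DataFrame({"meta_0": ["G1", "G2", "G9"], "meta_1": ["G4", None, None]})
var_names = np.array(["G1", "G2", "G3", "G4"])
overlap = get_overlap_sketch(gt, sigs, var_names)
print(overlap)
```

G9 is not in var_names, so meta_0 is reduced to {G1, G2} and shares 2 of 3 genes with gt_A.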
- class scripts.metrics.score_bulk.AvgBulkScorer(std: bool)
Bases: object
Scorer that averages the bulk gene expression (gex) of a signature's genes, standardized or not depending on std, as a proxy for the signature score.
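The std flag matters when genes are expressed on very different scales: without standardization, a highly expressed gene can dominate the average. The sketch below uses a hypothetical helper `average_score` and assumes standardization means z-scoring each gene across samples; the toy data are made up for illustration.

```python
import pandas as pd


def average_score(bulk: pd.DataFrame, genes: list, std: bool) -> pd.Series:
    """Hypothetical helper mirroring the documented std option."""
    values = bulk[genes]
    if std:
        # Assumption: z-score each gene across samples (population std, ddof=0)
        values = (values - values.mean()) / values.std(ddof=0)
    # Per-sample mean over the signature genes
    return values.mean(axis=1)


# Toy data: "HIGH" is expressed on a much larger scale than "LOW"
bulk = pd.DataFrame(
    {"HIGH": [1000.0, 1010.0, 1005.0], "LOW": [0.0, 1.0, 5.0]},
    index=["s1", "s2", "s3"],
)
raw = average_score(bulk, ["HIGH", "LOW"], std=False)
scaled = average_score(bulk, ["HIGH", "LOW"], std=True)
print(raw.idxmax(), scaled.idxmax())  # s2 s3
```

The raw average follows HIGH's larger absolute swings and ranks s2 first; after z-scoring, both genes count equally and LOW's jump in s3 puts s3 first.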
- scripts.metrics.score_bulk.get_all_scores(bulk_values: DataFrame, metasignatures: DataFrame, truesignatures: DataFrame) → DataFrame
Computes the scores for all metasignatures and true signatures using the appropriate scorer function.
- Parameters:
bulk_values – df of bulk gex of size (n_samples, n_genes)
purity – purity information and cancer type per patient, of size (n_samples, 2)
metasignatures – df with n_metasignatures columns, each containing the list of genes that constitute one metasignature
truesignatures – df with n_true_signatures columns, each containing the list of genes that constitute one true signature
std – only used if the scorer name is average; if True, the bulk gex is standardized before the average is computed
sample_norm_method – only used if the scorer name is ssgsea; the sample normalization method to use in ssGSEA (see the ssgsea documentation for details)
- Returns:
- the dataframe of size (n_samples, n_metasignatures + n_true_signatures + 1 + 1), containing the scores on all metasignatures and true signatures, the purity information, and the TCGA cancer type on which the scoring was performed
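The two extra columns in the return size (the "+ 1 + 1") are the purity and the cancer type. A minimal sketch of attaching them to a score table by sample ID, assuming both tables are indexed by sample; the helper `attach_purity` and the column names purity and cancer_type are hypothetical.

```python
import pandas as pd


def attach_purity(scores: pd.DataFrame, purity: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical: append purity and cancer-type columns to the score table."""
    # Index-aligned join, so each sample gets its own purity and cancer type
    return scores.join(purity, how="left")


scores = pd.DataFrame(
    {"meta_0": [0.2, 0.8], "meta_1": [1.1, 0.3], "true_A": [0.5, 0.4]},
    index=["s1", "s2"],
)
# Rows deliberately in a different order to show the alignment
purity = pd.DataFrame(
    {"purity": [0.7, 0.4], "cancer_type": ["BRCA", "BRCA"]},
    index=["s2", "s1"],
)
out = attach_purity(scores, purity)
print(out.shape)  # (2, 5): 2 metasignatures + 1 true signature + purity + cancer type
```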
- scripts.metrics.score_bulk.get_data(bulk_file: Path, metasignature_file: Path, truesignature_file: Path) → Tuple[DataFrame, DataFrame, DataFrame]
Helper function to load the input files.
- Parameters:
bulk_file – path to the file with the bulk gex
purity_file – path to the file with the purity info
metasignature_file – path to the file with the metasignature genes
truesignature_file – path to the file with the true signature genes
- Returns:
a tuple of three DataFrames loaded from bulk_file, metasignature_file, and truesignature_file
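A sketch of what such a loader could look like, assuming all three inputs are CSV files: bulk gex with sample IDs in the first column, and one column of gene names per signature. The file formats and the name `get_data_sketch` are assumptions, not documented behavior.

```python
import tempfile
from pathlib import Path
from typing import Tuple

import pandas as pd


def get_data_sketch(
    bulk_file: Path, metasignature_file: Path, truesignature_file: Path
) -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:
    """Hypothetical loader; assumes CSV inputs."""
    # Bulk gex: samples as rows, genes as columns, sample IDs in the first column
    bulk = pd.read_csv(bulk_file, index_col=0)
    # Signature files: one column per signature; empty cells become NaN
    meta = pd.read_csv(metasignature_file)
    true = pd.read_csv(truesignature_file)
    return bulk, meta, true


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    (tmp_dir / "bulk.csv").write_text("sample,G1,G2\ns1,1.0,2.0\ns2,3.0,4.0\n")
    (tmp_dir / "meta.csv").write_text("meta_0,meta_1\nG1,G2\nG2,\n")
    (tmp_dir / "true.csv").write_text("true_A\nG2\n")
    bulk, meta, true = get_data_sketch(
        tmp_dir / "bulk.csv", tmp_dir / "meta.csv", tmp_dir / "true.csv"
    )

print(bulk.shape, meta["meta_1"].dropna().tolist())  # (2, 2) ['G2']
```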
- scripts.metrics.score_bulk.score_dataset(bulk_file: Path, metasignature_file: Path, truesignature_file: Path) → DataFrame
Main function: computes the bulk scores for the metasignatures and the reference signatures.
- Parameters:
bulk_file – path to the file with the bulk gex
metasignature_file – path to the file with the metasignature genes
truesignature_file – path to the file with the true signature genes
scorer_name – the scoring method to use
std – only used if the scorer name is average; if True, the bulk gex is standardized before the average is computed
sample_norm_method – only used if the scorer name is ssgsea; the sample normalization method to use in ssGSEA (see the ssgsea documentation for details)
- Returns:
- the dataframe of size (n_samples, n_metasignatures + n_true_signatures + 1 + 1), containing the scores on all metasignatures and true signatures, the purity information, and the TCGA cancer type on which the scoring was performed