API
cisTarget
DEM
- class pycistarget.motif_enrichment_dem.DEM(dem_db, region_sets: Dict[str, PyRanges], specie: str, subset_motifs: List[str] | None = None, contrasts: str | List | None = 'Other', name: str | None = 'DEM', max_bg_regions: int | None = None, adjpval_thr: float | None = 0.05, log2fc_thr: float | None = 1, mean_fg_thr: float | None = 0, motif_hit_thr: float | None = None, n_cpu: int | None = 1, fraction_overlap: float = 0.4, cluster_buster_path: str | None = None, path_to_genome_fasta: str | None = None, path_to_motifs: str | None = None, genome_annotation: PyRanges | None = None, promoter_space: int = 1000, path_to_motif_annotations: str | None = None, annotation_version: str = 'v9', motif_annotation: list = ['Direct_annot', 'Motif_similarity_annot', 'Orthology_annot', 'Motif_similarity_and_Orthology_annot'], motif_similarity_fdr: float = 0.001, orthologous_identity_threshold: float = 0.0, tmp_dir: int | None = None, **kwargs)[source]
DEM class.
DEM contains the DEM method for motif enrichment analysis on sets of regions.
- regions_to_db
A dataframe containing the mapping between query regions and regions in the database.
- Type:
pd.DataFrame
- region_sets
A dictionary of PyRanges containing region coordinates for the regions to be analyzed.
- Type:
Dict
- specie
Species from which the genomic coordinates come.
- Type:
str
- subset_motifs
List of motifs to disregard in the analysis. Default: None
- Type:
List, optional
- contrasts
List of contrasts to perform. Default: ‘Other’ (each group versus all the rest)
- Type:
List, optional
- name
Analysis name
- Type:
str
- max_bg_regions
Maximum number of regions to use as background. Default: None (All)
- Type:
int, optional
- adjpval_thr
Adjusted p-value threshold to consider a motif enriched. Default: 0.05
- Type:
float, optional
- log2fc_thr
Log2 Fold-change threshold to consider a motif enriched. Default: 1
- Type:
float, optional
- mean_fg_thr
Minimal mean signal in the foreground to consider a motif enriched. Default: 0
- Type:
float, optional
- motif_hit_thr
Minimal CRM score to consider a region enriched for a motif. Default: None (It will be automatically calculated based on precision-recall).
- Type:
float, optional
- n_cpu
Number of cores to use. Default: 1
- Type:
int, optional
- fraction_overlap
Minimal overlap between query and regions in the database for the mapping.
- Type:
float, optional
- cluster_buster_path
Path to cluster buster bin. Only required if using a shuffled background. Default: None
- Type:
str, optional
- path_to_genome_fasta
Path to genome fasta file. Only required if using a shuffled background. Default: None
- Type:
str, optional.
- path_to_motifs
Path to motif collection folder (in .cb format). Only required if using a shuffled background. Default: None
- Type:
str, optional.
- genome_annotation
Pyranges containing genome annotation (e.g. biomart). Only required if using promoter balance. Default: None
- Type:
pr.PyRanges, optional.
- promoter_space
Space around TSS to consider a region promoter. Only used if using promoter balance. Default: 1000
- Type:
int, optional
- path_to_motif_annotations
Path to motif annotations. If not provided, they will be downloaded from https://resources.aertslab.org based on the species name provided (only possible for mus_musculus, homo_sapiens and drosophila_melanogaster). Default: None
- Type:
str, optional
- motif_similarity_fdr
Minimal motif similarity value to consider two motifs similar. Default: 0.001
- Type:
float, optional
- orthologous_identity_threshold
Minimal orthology value for considering two TFs orthologous. Default: 0.0
- Type:
float, optional
- motifs_to_use
A subset of motifs to use for the analysis. Default: None (All)
- Type:
List, optional
- tmp_dir
Temp directory to use if running cluster_buster. Default: None (/tmp)
- Type:
str, optional
- motif_enrichment
A dataframe containing motif enrichment results
- Type:
pd.DataFrame
- motif_hits
A dictionary containing regions that are considered enriched for each motif.
- Type:
Dict
- cistromes
A dictionary containing TF cistromes. Cistromes with no extension contain regions linked to directly annotated motifs, while ‘_extended’ cistromes can contain regions linked to motifs annotated by similarity or orthology.
- Type:
Dict
Methods
DEM_results([name]) – Print motif enrichment table as HTML
add_motif_annotation_dem([add_logo]) – Add motif annotation
run(dem_db_scores, **kwargs) – Run DEM
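As a rough illustration of how the three enrichment thresholds (adjpval_thr, log2fc_thr and mean_fg_thr) could interact, the sketch below filters a small hand-made results table. It assumes that a motif has to pass all three cut-offs at once; the record layout, the key names and the function filter_enriched are hypothetical and not part of pycistarget.

```python
from typing import List


def filter_enriched(
    results: List[dict],
    adjpval_thr: float = 0.05,
    log2fc_thr: float = 1.0,
    mean_fg_thr: float = 0.0,
) -> List[str]:
    """Return the motifs that pass all three thresholds (assumed to be combined with AND)."""
    kept = []
    for row in results:
        passes = (
            row["adj_pval"] <= adjpval_thr      # significant after multiple-testing correction
            and row["log2fc"] >= log2fc_thr     # strong enough foreground vs background change
            and row["mean_fg"] >= mean_fg_thr   # enough signal in the foreground
        )
        if passes:
            kept.append(row["motif"])
    return kept


# Toy results table (made-up numbers, for illustration only).
results = [
    {"motif": "motif_a", "adj_pval": 0.001, "log2fc": 2.3, "mean_fg": 4.1},
    {"motif": "motif_b", "adj_pval": 0.2, "log2fc": 3.0, "mean_fg": 5.0},
    {"motif": "motif_c", "adj_pval": 0.01, "log2fc": 0.4, "mean_fg": 2.0},
]
enriched = filter_enriched(results)
```

With the default values only motif_a is kept: motif_b fails the adjusted p-value cut-off and motif_c fails the log2 fold-change cut-off.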
- class pycistarget.motif_enrichment_dem.DEMDatabase(fname: str, region_sets: Dict[str, PyRanges] | None = None, name: str | None = None, fraction_overlap: float = 0.4)[source]
DEM Database class.
DEMDatabase contains a dataframe with motifs as rows, regions as columns and CRM scores as values. In addition, it contains a slot to map query regions to regions in the database. For more information on how to generate databases, please visit: https://github.com/aertslab/create_cisTarget_databases
- regions_to_db
A dataframe containing the mapping between query regions and regions in the database.
- Type:
pd.DataFrame
- db_scores
A dataframe with motifs as rows, regions as columns and CRM scores as values.
- Type:
pd.DataFrame
- total_regions
Total number of regions in the database
- Type:
int
Methods
load_db(fname[, region_sets, name, ...]) – Load DEMDatabase
- load_db(fname: str, region_sets: Dict[str, PyRanges] | None = None, name: str | None = None, fraction_overlap: float = 0.4)[source]
Load DEMDatabase
- Parameters:
fname (str) – Path to feather file containing the DEM database (regions_vs_motifs)
region_sets (Dict or pr.PyRanges, optional) – Dictionary or pr.PyRanges that are going to be analyzed with DEM. Default: None.
name (str, optional) – Name for the DEM database. Default: None
fraction_overlap (float, optional) – Minimal overlap between query and regions in the database for the mapping.
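To make the role of fraction_overlap more concrete, here is a minimal sketch of mapping query regions to database regions by overlap. It assumes that the overlap is measured as a fraction of the query region length and that coordinates are half-open; the library may define the fraction differently. The helper names overlap_fraction and map_regions are hypothetical.

```python
def overlap_fraction(query, target):
    """Fraction of the query interval covered by the target interval.

    Intervals are (chrom, start, end) tuples with half-open coordinates.
    """
    q_chrom, q_start, q_end = query
    t_chrom, t_start, t_end = target
    if q_chrom != t_chrom:
        return 0.0
    overlap = min(q_end, t_end) - max(q_start, t_start)
    return max(overlap, 0) / (q_end - q_start)


def map_regions(queries, db_regions, fraction_overlap=0.4):
    """Pair each query region with every database region that overlaps it enough."""
    pairs = []
    for query in queries:
        for target in db_regions:
            if overlap_fraction(query, target) >= fraction_overlap:
                pairs.append((query, target))
    return pairs


queries = [("chr1", 100, 200), ("chr1", 500, 600)]
db_regions = [("chr1", 150, 400), ("chr1", 590, 900)]
mapping = map_regions(queries, db_regions)
```

Here the first query overlaps a database region over half of its length and is mapped, while the second query overlaps by only a tenth of its length and is left unmapped at the 0.4 cut-off.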
- pycistarget.motif_enrichment_dem.DEM_internal(dem_db_scores: DataFrame, region_group: List[List[str]], contrast_name: str, adjpval_thr: float | None = 0.05, log2fc_thr: float | None = 1, mean_fg_thr: float | None = 0, motif_hit_thr: float | None = None)[source]
Internal operations for DEM.
- pycistarget.motif_enrichment_dem.create_groups(contrast: list, region_sets_names: list, max_bg_regions: int, path_to_genome_fasta: str, path_to_regions_fasta: str, cbust_path: str, path_to_motifs: str, annotation: PyRanges | None = None, promoter_space: int = 1000, motifs: list | None = None, n_cpu: int = 1, **kwargs)[source]
Format contrast groups
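The "each group versus all the rest" behaviour described for contrasts can be pictured with the small sketch below. It assumes that a contrast is represented as a pair of lists, foreground names first and background names second, similar in shape to the region_group argument of DEM_internal; the function one_vs_rest_contrasts and the region set names are hypothetical.

```python
def one_vs_rest_contrasts(region_set_names):
    """Build one [foreground, background] pair per region set."""
    contrasts = []
    for name in region_set_names:
        # Every other region set serves as background for this one.
        background = [other for other in region_set_names if other != name]
        contrasts.append([[name], background])
    return contrasts


contrasts = one_vs_rest_contrasts(["B_cells", "T_cells", "Monocytes"])
```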
- pycistarget.motif_enrichment_dem.get_motif_hits(scores, regions, labels, optimal_threshold=None)[source]
Determine optimal score threshold based on precision-recall.
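One way to picture a precision-recall based threshold is the sketch below, which scans candidate thresholds and keeps the one with the highest F1 score. The F1 criterion is an assumption made for illustration; the source only states that the threshold is based on precision-recall. The function best_threshold_by_f1 and the toy data are hypothetical.

```python
def best_threshold_by_f1(scores, labels):
    """Return the score threshold that maximises F1.

    scores: CRM scores, one per region.
    labels: 1 for foreground regions, 0 for background regions.
    """
    total_positive = sum(labels)
    best_thr, best_f1 = None, -1.0
    for thr in sorted(set(scores)):
        predicted = [score >= thr for score in scores]
        tp = sum(1 for hit, label in zip(predicted, labels) if hit and label == 1)
        fp = sum(1 for hit, label in zip(predicted, labels) if hit and label == 0)
        if tp == 0:
            continue  # precision and recall are both zero, F1 is undefined
        precision = tp / (tp + fp)
        recall = tp / total_positive
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_thr, best_f1 = thr, f1
    return best_thr


regions = ["r1", "r2", "r3", "r4", "r5", "r6"]
scores = [0.5, 1.0, 3.0, 4.0, 5.0, 0.2]
labels = [0, 0, 1, 1, 1, 0]

threshold = best_threshold_by_f1(scores, labels)
# Regions scoring at or above the threshold are treated as motif hits.
hits = [region for region, score in zip(regions, scores) if score >= threshold]
```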
Homer
- class pycistarget.motif_enrichment_homer.Homer(homer_path: str, bed_path: str, name: str, outdir: str, genome: str, size: str = 'given', mask: bool = True, denovo: bool = False, length: str = '8,10,12', meme_path: str | None = None, meme_collection_path: str | None = None, path_to_motif_annotations: str | None = None, annotation_version: str = 'v9', cistrome_annotation: List[str] = ['Direct_annot', 'Motif_similarity_annot', 'Orthology_annot', 'Motif_similarity_and_Orthology_annot'], motif_similarity_fdr: float = 0.001, orthologous_identity_threshold: float = 0.0)[source]
Homer class.
Homer contains Homer for motif enrichment analysis on sets of regions.
- homer_path
Path to Homer bin folder.
- Type:
str
- bed_path
Path to bed file containing region set to be analyzed with Homer.
- Type:
str
- name
Analysis name.
- Type:
str
- outdir
Path to folder to output Homer results.
- Type:
str
- genome
Homer genome label to use.
- Type:
str
- size
Fragment size to use for motif finding. Default: ‘given’ [uses the exact regions you give it]
- Type:
str, optional
- mask
Whether to mask repeats or not. Default: True
- Type:
bool, optional
- denovo
Whether to infer overrepresented motifs de novo. Default: False
- Type:
bool, optional
- length
Motif length values. Default: 8,10,12
- Type:
str, optional
- meme_path
Path to meme bin folder. Meme will be used if given for motif annotation. Default: None
- Type:
str, optional
- meme_collection_path
Path to motif collection (in .cb format) to compare homer motifs with. Default: None
- Type:
str, optional
- path_to_motif_annotations
Path to motif annotations. If not provided, they will be downloaded from https://resources.aertslab.org based on the species name provided (only possible for mus_musculus, homo_sapiens and drosophila_melanogaster). Default: None
- Type:
str, optional
- annotation_version
Motif collection version. Default: v9
- Type:
str, optional
- cistrome_annotation
Annotation to use for forming cistromes. It can be ‘Direct_annot’ (direct evidence that the motif is linked to that TF), ‘Motif_similarity_annot’ (based on tomtom motif similarity), ‘Orthology_annot’ (based on orthology with a TF that is directly linked to that motif) or ‘Motif_similarity_and_Orthology_annot’. Default: [‘Direct_annot’, ‘Motif_similarity_annot’, ‘Orthology_annot’, ‘Motif_similarity_and_Orthology_annot’]
- Type:
List, optional
- motif_similarity_fdr
Minimal motif similarity value to consider two motifs similar. Default: 0.001
- Type:
float, optional
- orthologous_identity_threshold
Minimal orthology value for considering two TFs orthologous. Default: 0.0
- Type:
float, optional
- known_motifs
A dataframe containing known motif enrichment results.
- Type:
pd.DataFrame
- denovo_motifs
A dataframe containing de novo motif enrichment results.
- Type:
pd.DataFrame
- known_motif_hits
A dictionary containing regions with motif hits for each known motif.
- Type:
Dict
- denovo_motif_hits
A dictionary containing regions with motif hits for each de novo motif.
- Type:
Dict
- known_cistromes
A dictionary containing regions with motif hits for each TF found with known motifs.
- Type:
Dict
- denovo_cistromes
A dictionary containing regions with motif hits for each TF found de novo.
- Type:
Dict
References
Heinz S, Benner C, Spann N, Bertolino E et al. Simple Combinations of Lineage-Determining Transcription Factors Prime cis-Regulatory Elements Required for Macrophage and B Cell Identities. Mol Cell 2010 May 28;38(4):576-589. PMID: 20513432
Methods
add_motif_annotation_homer() – Add motif annotations (based on Homer, cisTarget and meme if specified)
find_motif_hits([n_cpu]) – Find motif hits with homer2 find
get_cistromes([annotation]) – Format cistromes per TF
Load de novo motif enrichment results from file.
Load known motif enrichment results from file.
run() – Run Homer
- add_motif_annotation_homer()[source]
Add motif annotations (based on Homer, cisTarget and meme if specified)
- find_motif_hits(n_cpu=1)[source]
Find motif hits with homer2 find
- Parameters:
n_cpu (int) – Number of cores to use.
- get_cistromes(annotation: List[str] = ['Direct_annot', 'Motif_similarity_annot', 'Orthology_annot', 'Motif_similarity_and_Orthology_annot'])[source]
Format cistromes per TF
- Parameters:
annotation (List, optional) – Annotation to use for forming cistromes. It can be ‘Direct_annot’ (direct evidence that the motif is linked to that TF), ‘Motif_similarity_annot’ (based on tomtom motif similarity), ‘Orthology_annot’ (based on orthology with a TF that is directly linked to that motif) or ‘Motif_similarity_and_Orthology_annot’. Default: [‘Direct_annot’, ‘Motif_similarity_annot’, ‘Orthology_annot’, ‘Motif_similarity_and_Orthology_annot’]
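The idea of forming cistromes per TF from motif hits and motif annotations can be sketched as follows. This is a simplified illustration, not the library's implementation: it assumes that regions from directly annotated motifs go into a cistrome named after the TF, and that an ‘_extended’ cistrome collects regions from all annotation types, direct ones included. The function build_cistromes, the input layouts and the example names are hypothetical.

```python
def build_cistromes(motif_hits, motif_to_tfs):
    """Group regions with motif hits per TF.

    motif_hits: {motif_name: [region_name, ...]}
    motif_to_tfs: {motif_name: {annotation_type: [tf_name, ...]}}
    """
    cistromes = {}
    for motif, regions in motif_hits.items():
        for annotation_type, tfs in motif_to_tfs.get(motif, {}).items():
            for tf in tfs:
                # Assumption: the extended cistrome takes hits from any annotation type.
                cistromes.setdefault(tf + "_extended", set()).update(regions)
                if annotation_type == "Direct_annot":
                    cistromes.setdefault(tf, set()).update(regions)
    # Sort the regions so that the output is deterministic.
    return {name: sorted(regions) for name, regions in cistromes.items()}


motif_hits = {
    "m1": ["chr1:100-200", "chr1:500-600"],
    "m2": ["chr2:10-50"],
}
motif_to_tfs = {
    "m1": {"Direct_annot": ["GATA1"]},
    "m2": {"Orthology_annot": ["GATA1"]},
}
cistromes = build_cistromes(motif_hits, motif_to_tfs)
```

In this toy case the GATA1 cistrome holds only the two regions hit by the directly annotated motif, while GATA1_extended also includes the region contributed by the orthology-based annotation.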
- pycistarget.motif_enrichment_homer.homer_results(homer_dict, name, results='known')[source]
A function to show Homer results in jupyter notebooks.
- Parameters:
homer_dict (Dict) – A dictionary with one Homer object per slot.
name (str) – Dictionary key of the analysis result to show. Default: None (All)
results (str) – Whether to show known or de novo results. Default: ‘known’
- pycistarget.motif_enrichment_homer.run_homer(homer_path: str, region_sets: Dict[str, PyRanges], outdir: str, genome: str, size: str = 'given', mask: bool = True, denovo: bool = False, length: str = '8,10,12', n_cpu: int = 1, meme_path: str | None = None, meme_collection_path: str | None = None, path_to_motif_annotations: str | None = None, annotation_version: str = 'v9', cistrome_annotation: List[str] = ['Direct_annot', 'Motif_similarity_annot', 'Orthology_annot', 'Motif_similarity_and_Orthology_annot'], motif_similarity_fdr: float = 0.001, orthologous_identity_threshold: float = 0.0, **kwargs)[source]
Run Homer
- Parameters:
homer_path (str) – Path to Homer bin folder.
region_sets (Dict) – A dictionary of PyRanges containing region coordinates for the region sets to be analyzed.
outdir (str) – Path to folder to output Homer results.
genome (str) – Homer genome label to use.
size (str, optional) – Fragment size to use for motif finding. Default: ‘given’ [uses the exact regions you give it]
mask (bool, optional) – Whether to mask repeats or not. Default: True
denovo (bool, optional) – Whether to infer overrepresented motifs de novo. Default: False
length (str, optional) – Motif length values. Default: 8,10,12
n_cpu (int) – Number of cores to use.
meme_path (str, optional) – Path to meme bin folder. Meme will be used if given for motif annotation. Default: None
meme_collection_path (str, optional) – Path to motif collection (in .cb format) to compare homer motifs with. Default: None
path_to_motif_annotations (str, optional) – Path to motif annotations. If not provided, they will be downloaded from https://resources.aertslab.org based on the species name provided (only possible for mus_musculus, homo_sapiens and drosophila_melanogaster). Default: None
annotation_version (str, optional) – Motif collection version. Default: v9
cistrome_annotation (List, optional) – Annotation to use for forming cistromes. It can be ‘Direct_annot’ (direct evidence that the motif is linked to that TF), ‘Motif_similarity_annot’ (based on tomtom motif similarity), ‘Orthology_annot’ (based on orthology with a TF that is directly linked to that motif) or ‘Motif_similarity_and_Orthology_annot’. Default: [‘Direct_annot’, ‘Motif_similarity_annot’, ‘Orthology_annot’, ‘Motif_similarity_and_Orthology_annot’]
motif_similarity_fdr (float, optional) – Minimal motif similarity value to consider two motifs similar. Default: 0.001
orthologous_identity_threshold (float, optional) – Minimal orthology value for considering two TFs orthologous. Default: 0.0
**kwargs – Extra parameters to pass to ray.init().
References
Heinz S, Benner C, Spann N, Bertolino E et al. Simple Combinations of Lineage-Determining Transcription Factors Prime cis-Regulatory Elements Required for Macrophage and B Cell Identities. Mol Cell 2010 May 28;38(4):576-589. PMID: 20513432
Cluster-Buster
- pycistarget.cluster_buster.cluster_buster(cbust_path: str, path_to_motifs: str, region_sets: Dict[str, PyRanges] | Dict[str, List] | None = None, path_to_genome_fasta: str | None = None, path_to_regions_fasta: str | None = None, n_cpu: int | None = 1, motifs: List[str] | None = None, verbose: bool | None = False, **kwargs)[source]
Run Cluster-Buster on the given region sets.
- Parameters:
cbust_path (str) – Path to cluster buster bin.
path_to_motifs (str, optional.) – Path to motif collection folder (in .cb format). Only required if using a shuffled background.
region_sets (Dict) – A dictionary of PyRanges containing region coordinates for the regions to be analyzed. Only required if path_to_regions_fasta is not provided.
path_to_genome_fasta (str, optional.) – Path to genome fasta file. Only required if path_to_regions_fasta is not provided. Default: None
path_to_regions_fasta (str, optional.) – Path to regions fasta file. Only required if path_to_genome_fasta is not provided. Default: None
n_cpu (int, optional) – Number of cores to use
motifs (List, optional) – Names of the motif files to use (from path_to_motifs). Default: None (All)
verbose (bool, optional) – Whether to print progress to screen
**kwargs – Additional parameters to pass to ray.init()
References
Frith, Martin C., Michael C. Li, and Zhiping Weng. “Cluster-Buster: Finding dense clusters of motifs in DNA sequences.” Nucleic acids research 31, no. 13 (2003): 3666-3668.
Utils
- pycistarget.utils.coord_to_region_names(coord: PyRanges)[source]
Convert coordinates to region names (UCSC format)
- pycistarget.utils.get_TF_list(motif_enrichment_table: DataFrame, annotation: List[str] = ['Direct_annot', 'Motif_similarity_annot', 'Orthology_annot', 'Motif_similarity_and_Orthology_annot'])[source]
Get TFs from motif enrichment tables
- pycistarget.utils.get_cistromes_per_region_set(motif_enrichment_region_set, motif_hits_regions_set, annotation: List[str] = ['Direct_annot', 'Motif_similarity_annot', 'Orthology_annot', 'Motif_similarity_and_Orthology_annot'])[source]
Get (direct/extended) cistromes for TFs
- pycistarget.utils.get_motifs_per_TF(motif_enrichment_table: DataFrame, tf: str, motif_column: str, annotation: List[str] = ['Direct_annot', 'Motif_similarity_annot', 'Orthology_annot', 'Motif_similarity_and_Orthology_annot'])[source]
Get motif annotated to each TF from a motif enrichment table
- pycistarget.utils.get_position_index(query_list, target_list)[source]
Get position of a query within a list
- pycistarget.utils.inplace_change(filename, old_string, new_string)[source]
Replace string in a file
- pycistarget.utils.load_motif_annotations(specie: str, version: str = 'v9', fname: str | None = None, column_names=('#motif_id', 'gene_name', 'motif_similarity_qvalue', 'orthologous_identity', 'description'), motif_similarity_fdr: float = 0.001, orthologous_identity_threshold: float = 0.0)[source]
Load motif annotations from a motif2TF snapshot.
- Parameters:
specie – Species to retrieve annotations for.
version – Motif collection version.
fname – The snapshot taken from motif2TF.
column_names – The names of the columns in the snapshot to load.
motif_similarity_fdr – The maximum False Discovery Rate to find factor annotations for enriched motifs.
orthologous_identity_threshold – The minimum orthologous identity to find factor annotations for enriched motifs.
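The two filtering parameters can be read as a maximum on the motif similarity q-value and a minimum on the orthologous identity. The sketch below applies both to a few hand-made annotation records that reuse the column names from the signature; the function filter_annotations and the example values are hypothetical, and the inclusive comparisons are an assumption.

```python
def filter_annotations(rows, motif_similarity_fdr=0.001, orthologous_identity_threshold=0.0):
    """Keep annotation records that satisfy both cut-offs."""
    return [
        row
        for row in rows
        if row["motif_similarity_qvalue"] <= motif_similarity_fdr
        and row["orthologous_identity"] >= orthologous_identity_threshold
    ]


# Toy records (made-up values, for illustration only).
rows = [
    {"#motif_id": "m1", "gene_name": "GATA1", "motif_similarity_qvalue": 0.0, "orthologous_identity": 1.0},
    {"#motif_id": "m2", "gene_name": "GATA2", "motif_similarity_qvalue": 0.01, "orthologous_identity": 1.0},
    {"#motif_id": "m3", "gene_name": "TAL1", "motif_similarity_qvalue": 0.0005, "orthologous_identity": 0.3},
]

default_genes = [row["gene_name"] for row in filter_annotations(rows)]
strict_genes = [
    row["gene_name"]
    for row in filter_annotations(rows, orthologous_identity_threshold=0.5)
]
```

With the defaults, the GATA2 record is dropped because its q-value exceeds 0.001; raising the orthology cut-off to 0.5 also drops the TAL1 record.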
- pycistarget.utils.region_names_to_coordinates(region_names: List)[source]
Convert region names (UCSC format) to coordinates (pd.DataFrame)
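For orientation, UCSC-style region names have the form chrom:start-end. The sketch below converts between such names and plain coordinate tuples using only the standard library; it does not use PyRanges or pandas as the functions above do, and the helper names coord_to_name and name_to_coord are hypothetical.

```python
def coord_to_name(chrom, start, end):
    """Format coordinates as a UCSC-style region name."""
    return f"{chrom}:{start}-{end}"


def name_to_coord(name):
    """Parse a UCSC-style region name into (chrom, start, end)."""
    # Split on the last colon so that contig names containing ':' still parse.
    chrom, span = name.rsplit(":", 1)
    start, end = span.split("-")
    return chrom, int(start), int(end)


region_name = coord_to_name("chr1", 100, 200)
coordinates = name_to_coord(region_name)
```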
- pycistarget.utils.region_sets_to_signature(region_set: list, region_set_name: str)[source]
Generates a gene signature object from a dict of PyRanges objects
- Parameters:
region_set – PyRanges object to be converted into a gene signature object
region_set_name – Name of the regions set