Note
This page is reference documentation. It explains only the class signature, not how to use the class. Please refer to the user guide for the big picture.
biolearn.data_library.GeoData
- class biolearn.data_library.GeoData(metadata, dnam=None, rna=None, protein_alamar=None, protein_olink=None)
Represents genomic data with a focus on metadata and methylation data.
GeoData organizes the metadata and methylation data and provides access to both.
- metadata
A pandas DataFrame where rows represent different samples and columns represent different data fields.
- Type:
DataFrame
- dnam
A pandas DataFrame where columns represent different samples and rows represent different methylation sites.
- Type:
DataFrame
- __init__(metadata, dnam=None, rna=None, protein_alamar=None, protein_olink=None)
Initializes the GeoData instance.
- Parameters:
metadata (DataFrame) – Metadata associated with genomic samples.
dnam (DataFrame) – Methylation data associated with genomic samples.
- copy()
Creates a deep copy of the GeoData instance.
- Returns:
A new instance of GeoData with copies of the metadata and dnam DataFrames.
- Return type:
GeoData
- quality_report(sites=None)
Generates a quality control report for the genomic data, optionally restricted to the specified methylation sites. The report includes a detailed section giving the missing percentage for each methylation site.
- Parameters:
sites (list, optional) – A list of methylation site identifiers to include in the report. If None, all sites are included.
- Returns:
An object containing detailed methylation data, a summary, and a detailed section of missing percentages per site.
- Return type:
QualityReport
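The per-site missing percentage mentioned above can be pictured as the share of samples that have no value at a given methylation site. The following is a minimal sketch in plain Python, not biolearn's implementation: `missing_percentage_per_site` is a hypothetical helper, the site identifiers and values are illustrative, and missing values are assumed to be represented as `None`.

```python
def missing_percentage_per_site(dnam, sites=None):
    """Hypothetical helper: percentage of missing values for each site.

    dnam maps a site identifier to a list of per-sample values,
    where None marks a missing measurement (an assumption of this sketch).
    If sites is None, all sites are included.
    """
    selected = dnam if sites is None else {site: dnam[site] for site in sites}
    return {
        site: 100.0 * sum(value is None for value in values) / len(values)
        for site, values in selected.items()
    }


# Illustrative data: two sites measured across four samples.
dnam = {
    "cg00000029": [0.52, None, 0.49, 0.55],
    "cg00000108": [0.91, 0.88, 0.93, 0.90],
}

report = missing_percentage_per_site(dnam)
filtered = missing_percentage_per_site(dnam, sites=["cg00000108"])
```

Passing `sites` restricts the result to the listed identifiers, mirroring the optional filter described for `quality_report`.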
- classmethod from_methylation_matrix(matrix)
Creates a GeoData instance from a methylation matrix. The matrix can be either a DataFrame or a path to a CSV file.
- save_csv(folder_path, name)
Saves the GeoData instance to CSV files according to the DNA Methylation Array Data Standard V-2410.
- classmethod load_csv(folder_path, name, series_part='all', validate=True)
Loads a GeoData instance from CSV files according to the DNA Methylation Array Data Standard V-2410.
- Parameters:
folder_path (str) – The directory where the files are located.
name (str) – The base name for the files.
series_part (str or int) – “all” to load all methylation parts and concatenate; otherwise, an integer specifying the part number to load.
validate (bool) – Whether to validate metadata-omics consistency. Default is True.
- Returns:
A GeoData instance with metadata, methylation data, RNA, and protein data loaded.
- Return type:
GeoData
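As a rough illustration of what a metadata-omics consistency check might involve, recall that metadata rows and dnam columns both represent samples, so their labels are expected to line up. The sketch below is an assumption about the general idea, not the validation logic that `validate=True` actually runs: `find_sample_mismatches` is a hypothetical helper and the sample IDs are made up.

```python
def find_sample_mismatches(metadata_ids, omics_ids):
    """Hypothetical helper: report sample IDs present on only one side."""
    metadata_set = set(metadata_ids)
    omics_set = set(omics_ids)
    return {
        "missing_from_omics": sorted(metadata_set - omics_set),
        "missing_from_metadata": sorted(omics_set - metadata_set),
    }


# Illustrative labels: metadata row labels vs. dnam column labels.
metadata_ids = ["GSM1", "GSM2", "GSM3"]
dnam_columns = ["GSM1", "GSM2", "GSM4"]

mismatches = find_sample_mismatches(metadata_ids, dnam_columns)
is_consistent = not any(mismatches.values())
```

Two empty lists would indicate that every sample in the metadata has a matching column in the omics matrix and vice versa.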