Note
This page is reference documentation. It only explains the class signature, not how to use it. Please refer to the user guide for the big picture.
biolearn.data_library.DataLibrary
- class biolearn.data_library.DataLibrary(library_file=None, cache=None)
Manages a collection of data sources for biomarker research.
The DataLibrary class is responsible for loading, storing, and retrieving data sources. Data sources are defined in a library file, and new sources can be added at runtime with load_sources. Currently, DNA methylation data from GEO (Gene Expression Omnibus) is supported.
- __init__(library_file=None, cache=None)
Initializes the DataLibrary instance with an optional library file and an optional cache.
- Parameters:
library_file (str, optional) – The path to the library file. If None, the default biolearn library file is loaded.
cache (object, optional) – An object that adheres to the caching interface used in the caching module. If None, the default cache is used. This cache is shared by all data sources returned by the library.
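Because one cache object is handed to every returned data source, repeated loads of the same source can be served from a single shared store. The sketch below illustrates that design with hypothetical stand-ins (DictCache, ToySource, ToyLibrary). None of them are biolearn classes, and their get/set methods are assumed for illustration rather than taken from biolearn's caching interface.

```python
from typing import Any, Dict, Optional


class DictCache:
    """Hypothetical in-memory cache with an assumed get/set interface."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def get(self, key: str) -> Optional[Any]:
        return self._store.get(key)

    def set(self, key: str, value: Any) -> None:
        self._store[key] = value


class ToySource:
    """Hypothetical data source that loads through whatever cache it is given."""

    def __init__(self, source_id: str, cache: DictCache) -> None:
        self.id = source_id
        self.cache = cache
        self.load_count = 0  # how many times the expensive load actually ran

    def load(self) -> str:
        cached = self.cache.get(self.id)
        if cached is not None:
            return cached
        self.load_count += 1
        data = f"data for {self.id}"  # stands in for a slow download and parse
        self.cache.set(self.id, data)
        return data


class ToyLibrary:
    """Hypothetical library that injects one shared cache into every source."""

    def __init__(self, cache: Optional[DictCache] = None) -> None:
        # Fall back to a default cache when none is supplied.
        self.cache = cache if cache is not None else DictCache()

    def get(self, source_id: str) -> ToySource:
        return ToySource(source_id, self.cache)


library = ToyLibrary()
first = library.get("GSE40279")   # example GEO series identifier
second = library.get("GSE40279")
first.load()
second.load()  # served from the shared cache, so no second load runs
```

In this sketch, injecting one cache at construction time is what lets the second source skip the load.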
- load_sources(library_file)
Loads data sources from the given library file, appending them to the current set of data sources.
- Parameters:
library_file (str) – The file path of the library file to load data sources from.
- get(source_id)
Retrieves a data source by its identifier.
- Parameters:
source_id (str) – The identifier of the data source to retrieve.
- Returns:
The data source with the given identifier if found, otherwise None.
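load_sources appends to the existing sources rather than replacing them, and get returns None for an unknown identifier instead of raising. A minimal stdlib sketch of those two behaviors follows. ToyRegistry is hypothetical, and it assumes a JSON library file holding a list of objects with an "id" key, which may differ from biolearn's actual library file format.

```python
import json
import os
import tempfile
from typing import Dict, List, Optional


class ToyRegistry:
    """Hypothetical registry mirroring the load_sources/get behavior described above."""

    def __init__(self) -> None:
        self.sources: List[Dict] = []

    def load_sources(self, library_file: str) -> None:
        # Assumed format: a JSON list of source definitions, each with an "id".
        with open(library_file, encoding="utf-8") as fh:
            # extend() appends to the current sources instead of replacing them.
            self.sources.extend(json.load(fh))

    def get(self, source_id: str) -> Optional[Dict]:
        # Return the first source with a matching id, or None if there is none.
        for source in self.sources:
            if source["id"] == source_id:
                return source
        return None


def write_library(entries: List[Dict]) -> str:
    """Write entries to a temporary JSON file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(entries, fh)
    return path


registry = ToyRegistry()
for entries in ([{"id": "GSE40279"}], [{"id": "GSE19711"}]):
    path = write_library(entries)
    registry.load_sources(path)  # each call adds to what is already loaded
    os.remove(path)
```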
- lookup_sources(organism=None, format=None)
Looks up data sources based on the specified organism and/or format.
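The signature leaves both filters optional. A plausible reading of "organism and/or format", assumed here rather than confirmed by this reference, is that a filter left as None is ignored and every supplied filter must match. The hypothetical function below applies that rule to plain dict records; the field names and the organism and format values are placeholders.

```python
from typing import Dict, List, Optional


def lookup_sources(
    sources: List[Dict],
    organism: Optional[str] = None,
    format: Optional[str] = None,  # shadows the builtin to mirror the documented signature
) -> List[Dict]:
    """Return the sources that match every filter that is not None."""
    matches = []
    for source in sources:
        if organism is not None and source.get("organism") != organism:
            continue
        if format is not None and source.get("format") != format:
            continue
        matches.append(source)
    return matches


sources = [
    {"id": "A", "organism": "human", "format": "Illumina450k"},
    {"id": "B", "organism": "human", "format": "Illumina27k"},
    {"id": "C", "organism": "mouse", "format": "Illumina450k"},
]
human_450k = lookup_sources(sources, organism="human", format="Illumina450k")
```

Passing only one filter narrows on that field alone, and passing neither returns every source.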
- search(**criteria)
Search and preview metadata across all available datasets without loading them.
This method lets you explore which datasets are available, and their metadata characteristics, before deciding which ones to load. It is particularly useful for discovering datasets that match specific criteria such as sex, age, or other metadata fields.
- Parameters:
criteria (keyword arguments) – Keyword arguments for filtering datasets. Common filters include:
- sex (str): Filter by sex (“male”, “female”, or “unknown”).
- min_age (float): Minimum age threshold.
- max_age (float): Maximum age threshold.
- Returns:
A DataFrame with columns including ‘series_id’ and available metadata fields for each matching dataset.
- Return type:
pandas.DataFrame
Examples
>>> from biolearn.data_library import DataLibrary
>>> # Find all datasets with female subjects
>>> library = DataLibrary()
>>> female_datasets = library.search(sex="female")

>>> # Find datasets with elderly subjects (70+ years)
>>> elderly_datasets = library.search(min_age=70)

>>> # Find male datasets with subjects over 50
>>> male_elderly = library.search(sex="male", min_age=50)

>>> # View available metadata fields
>>> all_datasets = library.search()
>>> print(all_datasets.columns.tolist())
Notes
Sex encoding follows the DNA Methylation Array Data Standard:
- 0 = female
- 1 = male
- NaN = unknown/missing
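search previews metadata without loading the datasets themselves, and its string sex filter has to be reconciled with the numeric encoding described in the Notes. The sketch below shows one way this could work over a hypothetical in-memory index of per-sample metadata (the series IDs GSE_A and GSE_B are placeholders). It assumes a dataset matches when at least one of its samples satisfies every filter, and it returns a list of dicts standing in for the DataFrame; neither detail is confirmed by this reference.

```python
import math
from typing import Dict, List, Optional

# Encoding from the Notes: 0 = female, 1 = male, NaN = unknown/missing.
SEX_CODES = {"female": 0, "male": 1}


def sex_matches(code: Optional[float], wanted: str) -> bool:
    """Compare a stored numeric sex code with a string filter value."""
    if wanted == "unknown":
        # Missing values may arrive as NaN or None.
        return code is None or math.isnan(code)
    if wanted not in SEX_CODES:
        raise ValueError(f"unsupported sex filter: {wanted!r}")
    return code == SEX_CODES[wanted]


def search(
    index: Dict[str, List[Dict[str, float]]],
    sex: Optional[str] = None,
    min_age: Optional[float] = None,
    max_age: Optional[float] = None,
) -> List[Dict]:
    """Return one summary row per dataset with at least one matching sample."""
    rows = []
    for series_id, samples in index.items():
        matching = [
            s for s in samples
            if (sex is None or sex_matches(s["sex"], sex))
            and (min_age is None or s["age"] >= min_age)
            and (max_age is None or s["age"] <= max_age)
        ]
        if matching:
            rows.append({"series_id": series_id, "matching_samples": len(matching)})
    return rows


# Hypothetical metadata index: per-sample records only, no methylation data loaded.
index = {
    "GSE_A": [{"sex": 0, "age": 72.0}, {"sex": 1, "age": 45.0}],
    "GSE_B": [{"sex": 1, "age": 55.0}, {"sex": math.nan, "age": 30.0}],
}
```

Passing the returned rows to pandas.DataFrame would give a table with a series_id column, loosely mirroring the documented return value.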