This page is reference documentation. It explains only the class signature, not how to use it. Please refer to the user guide for the big picture.


class biolearn.data_library.DataLibrary(library_file=None, cache=None)

Manages a collection of data sources for biomarker research.

The DataLibrary class is responsible for loading, storing, and retrieving data sources. Data sources are defined in a library file, and new sources can be added at runtime. Currently, DNA methylation data from GEO (Gene Expression Omnibus) is supported.

__init__(library_file=None, cache=None)

Initializes the DataLibrary instance with an optional library file and cache mechanism.

  • library_file (str, optional) – The path to the library file. If None, the default biolearn library file is loaded.

  • cache (object, optional) – An object that adheres to the caching interface used in the caching module. If None, the default cache is used. This cache will be used by all returned data sources.
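As a minimal sketch of the constructor described above, the snippet below wraps the documented signature in a small helper. The helper name make_library and the path shown in the comment are hypothetical and not part of biolearn. The import is deferred and guarded so that the snippet also runs where biolearn is not installed.

```python
import importlib.util


def make_library(library_file=None, cache=None):
    """Build a DataLibrary; None for either argument selects the documented default."""
    # Deferred import so that defining this helper does not require biolearn.
    from biolearn.data_library import DataLibrary

    return DataLibrary(library_file=library_file, cache=cache)


# Only attempt construction when the package is actually available.
if importlib.util.find_spec("biolearn") is not None:
    default_library = make_library()  # default library file, default cache
    # A custom library file would be passed like this (hypothetical path):
    # custom_library = make_library(library_file="path/to/library_file")
```

Passing the same cache object to the constructor means every data source returned by the library shares it, as stated in the parameter description.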


Loads data sources from a given library file, appending them to the current set of data sources.


library_file (str) – The file path of the library file to load data sources from.


Retrieves a data source by its identifier.


source_id (str) – The identifier of the data source to retrieve.


The data source with the given identifier if found, otherwise None.
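Because a missing identifier yields None rather than an exception, callers may want to check the result before using it. The sketch below shows one way to do that. ToyLibrary is a hypothetical stand-in written for illustration, not biolearn's implementation, and the method name get, the helper require_source, and the identifier "example_source" are all assumptions.

```python
class ToyLibrary:
    """Hypothetical stand-in for illustration only; not biolearn's implementation."""

    def __init__(self, sources):
        self._sources = dict(sources)

    def get(self, source_id):
        # Mirrors the documented contract: the source if found, otherwise None.
        return self._sources.get(source_id)


def require_source(library, source_id):
    """Return the data source, or raise KeyError instead of passing None along."""
    source = library.get(source_id)
    if source is None:
        raise KeyError(f"No data source with identifier {source_id!r}")
    return source


toy = ToyLibrary({"example_source": object()})
found = require_source(toy, "example_source")
```

Failing early with a clear message tends to be easier to debug than an AttributeError raised later on a None value.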

lookup_sources(organism=None, format=None)

Looks up data sources based on the specified organism and/or format.

  • organism (str, optional) – The organism to filter the data sources by.

  • format (str, optional) – The format to filter the data sources by.


A list of data sources that match the specified organism and format criteria.
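The filtering described above can be sketched in plain Python. This is an illustration of the documented behavior under stated assumptions, not biolearn's code: a criterion left as None is assumed to impose no restriction, both criteria are combined with a logical AND, and the ToySource record, its attribute names, and all sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToySource:
    """Hypothetical data source record; the attribute names are assumptions."""

    id: str
    organism: str
    format: str


def lookup_sources(sources, organism=None, format=None):
    """Return the sources matching every criterion that is not None."""
    return [
        s
        for s in sources
        if (organism is None or s.organism == organism)
        and (format is None or s.format == format)
    ]


sources = [
    ToySource("example_a", "human", "format_x"),
    ToySource("example_b", "human", "format_y"),
    ToySource("example_c", "mouse", "format_x"),
]

# Filter by organism only, by both criteria, and by neither.
human_ids = [s.id for s in lookup_sources(sources, organism="human")]
both_ids = [s.id for s in lookup_sources(sources, organism="human", format="format_x")]
all_ids = [s.id for s in lookup_sources(sources)]
```

Here the parameter name format shadows the Python built-in inside the function, which mirrors the documented signature and is harmless in this scope.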

Examples using biolearn.data_library.DataLibrary

Training an ElasticNet model