Note

This page is reference documentation. It explains only the class signature, not how to use the class. Please refer to the user guide for the big picture.

biolearn.data_library.DataSource

class biolearn.data_library.DataSource(source_definition, cache=None)

Represents a single data source in the DataLibrary.

This class encapsulates the details of a data source including metadata about the source and functionality to load the data.

Raises:

ValueError – If any of the required fields (‘id’, ‘path’, ‘parser’) are missing during initialization.

__init__(source_definition, cache=None)

Initializes the DataSource instance with configuration data and an optional cache mechanism. This method parses a dictionary typically loaded from a YAML configuration file for a data source. It checks for essential fields, sets up attributes, and configures a parser for data handling.

Parameters:
  • source_definition (dict) – A dictionary containing the data source’s properties. Must include the keys ‘id’, ‘path’, and ‘parser’; may optionally include ‘title’, ‘summary’, ‘format’, and ‘organism’.

  • cache (object, optional) – An object that adheres to the caching interface used in the caching module. If no cache is provided, a default NoCache instance is used.

Raises:

ValueError – If any of the required fields (‘id’, ‘path’, ‘parser’) are missing in the input data.
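The required-field check described above can be sketched without importing biolearn. The snippet below is a minimal stand-in, not the library's implementation: the helper `validate_source_definition` and every value in `example_definition` are hypothetical and chosen only for illustration. The error messages are copied from the documented `REQUIRED_FIELDS` attribute.

```python
# Illustrative stand-in for the documented check; not biolearn's own code.
# The messages mirror the REQUIRED_FIELDS attribute listed on this page.
REQUIRED_FIELDS = {
    "id": "'id' key is missing in item",
    "path": "'path' key is missing in item",
    "parser": "'parser' key is missing in item",
}


def validate_source_definition(source_definition):
    """Raise ValueError if a required key is absent (hypothetical helper)."""
    for field, message in REQUIRED_FIELDS.items():
        if field not in source_definition:
            raise ValueError(message)
    return source_definition


# Hypothetical definition, shaped like one entry loaded from a YAML file.
# The parser value is a placeholder; the real parser schema is not shown here.
example_definition = {
    "id": "example_source",
    "path": "https://example.org/data.csv",
    "parser": {"type": "hypothetical-parser"},
    "title": "Example data source",
}

# A complete definition passes through unchanged.
validate_source_definition(example_definition)

# Dropping a required key triggers the documented ValueError.
incomplete = {k: v for k, v in example_definition.items() if k != "parser"}
try:
    validate_source_definition(incomplete)
except ValueError as err:
    error_message = str(err)
```

According to the Raises entry above, passing a dictionary like `incomplete` as `source_definition` to `DataSource` should fail in the same way.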

REQUIRED_FIELDS = {'id': "'id' key is missing in item", 'parser': "'parser' key is missing in item", 'path': "'path' key is missing in item"}

load()

Loads the data from the source.

Returns:

An instance of the GeoData class containing the parsed data.

Return type:

GeoData

Examples using biolearn.data_library.DataSource

“Epigenetic Clocks” in GEO Data