Note
This page is reference documentation. It explains only the class signature, not how to use the class. Please refer to the user guide for the big picture.
biolearn.data_library.DataSource
- class biolearn.data_library.DataSource(source_definition, cache=None)
Represents a single data source in the DataLibrary.
This class encapsulates the details of a data source including metadata about the source and functionality to load the data.
- Raises:
ValueError – If any of the required fields (‘id’, ‘path’, ‘parser’) are missing during initialization.
- __init__(source_definition, cache=None)
Initializes the DataSource instance with configuration data and an optional cache mechanism. This method parses a dictionary typically loaded from a YAML configuration file for a data source. It checks for essential fields, sets up attributes, and configures a parser for data handling.
- Parameters:
source_definition (dict) – A dictionary containing the data source’s properties. Must include the keys ‘id’, ‘path’, and ‘parser’; may optionally include ‘title’, ‘summary’, ‘format’, and ‘organism’.
cache (object, optional) – An object that adheres to the caching interface used in the caching module. If no cache is provided, a default NoCache instance is used.
- Raises:
ValueError – If any of the required fields (‘id’, ‘path’, ‘parser’) are missing in the input data.
- REQUIRED_FIELDS = {'id': "'id' key is missing in item", 'parser': "'parser' key is missing in item", 'path': "'path' key is missing in item"}
- load()
Loads the data from the source.
- Returns:
An instance of the GeoData class containing the parsed data.
- Return type:
GeoData
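The required-field check described for `__init__` can be illustrated with a minimal, standalone sketch. It does not import biolearn and is not the library's implementation. The `REQUIRED_FIELDS` mapping is copied from the attribute listed above, while `OPTIONAL_FIELDS`, `validate_source_definition`, and all values in `example_definition` are hypothetical names and placeholders introduced only for illustration.

```python
# Standalone illustration; this does not import or reproduce biolearn itself.
# The mapping mirrors the REQUIRED_FIELDS attribute documented above.
REQUIRED_FIELDS = {
    "id": "'id' key is missing in item",
    "path": "'path' key is missing in item",
    "parser": "'parser' key is missing in item",
}

# Optional keys named in the parameter description (hypothetical constant name).
OPTIONAL_FIELDS = ("title", "summary", "format", "organism")


def validate_source_definition(source_definition):
    """Raise ValueError if a required key is missing, else return a normalized copy.

    Hypothetical helper: one way such a check could be written.
    """
    for field, message in REQUIRED_FIELDS.items():
        if field not in source_definition:
            raise ValueError(message)
    normalized = {field: source_definition[field] for field in REQUIRED_FIELDS}
    # Optional keys default to None when absent.
    for field in OPTIONAL_FIELDS:
        normalized[field] = source_definition.get(field)
    return normalized


# Placeholder values, as a dictionary loaded from a YAML file might look.
example_definition = {
    "id": "example-source",
    "path": "https://example.org/data.csv",
    "parser": {"type": "example"},
    "title": "Example source",
}

validated = validate_source_definition(example_definition)

# A definition without 'parser' fails with the corresponding message.
try:
    validate_source_definition({"id": "example-source", "path": "data.csv"})
except ValueError as error:
    missing_message = str(error)
```

Keeping the error messages in a single mapping means the validation loop and the documentation of required keys cannot drift apart. Whether biolearn reports only the first missing key or all of them is not stated on this page.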
Examples using biolearn.data_library.DataSource
“Epigenetic Clocks” in GEO Data