Note

This page is reference documentation. It explains only the function signature, not how to use it. Please refer to the user guide for the big picture.

biolearn.imputation.hybrid_impute

biolearn.imputation.hybrid_impute(dnam, cpg_source, required_cpgs, threshold=0.8)

Imputes missing values in a DNA methylation dataset using a per-site threshold. CpG sites whose fraction of non-missing values falls below threshold are filled from cpg_source. All other sites have their missing values replaced with the average of the existing values at that site.
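The strategy described above can be sketched in plain pandas. This is a hypothetical re-implementation for illustration, not biolearn's source code. The name hybrid_impute_sketch is made up. The sketch also makes three assumptions: "data below the threshold" means the fraction of samples with a non-missing value at a CpG site, a site exactly at the threshold keeps its own data, and "average" means the per-site mean across samples.

```python
import pandas as pd


def hybrid_impute_sketch(dnam, cpg_source, required_cpgs, threshold=0.8):
    """Hypothetical pandas sketch of threshold-based hybrid imputation.

    dnam: CpG sites as rows, samples as columns.
    cpg_source: reference values indexed by CpG site ID.
    """
    # Fraction of samples with a non-missing value at each CpG site (row).
    present_fraction = dnam.notna().mean(axis=1)

    # Assumption: sites at or above the threshold keep their own data.
    kept = dnam[present_fraction >= threshold]

    # Fill each retained site's gaps with that site's mean over observed samples.
    site_means = kept.mean(axis=1)
    filled = kept.apply(lambda column: column.fillna(site_means))

    # Required sites not retained above must come from cpg_source.
    needed = [cpg for cpg in required_cpgs if cpg not in filled.index]
    unavailable = [cpg for cpg in needed if cpg not in cpg_source.index]
    if unavailable:
        raise ValueError(
            f"CpG sites missing from both dnam and cpg_source: {unavailable}"
        )

    # Assigning a scalar to a new row label broadcasts it to every sample column.
    for cpg in needed:
        filled.loc[cpg] = cpg_source[cpg]
    return filled
```

Because the sketch compares a fraction rather than a raw count, the same threshold applies regardless of how many samples the dataset contains.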

Parameters:
  • dnam (pd.DataFrame) – DataFrame with samples as columns and CpG sites as rows.

  • cpg_source (pd.Series) – Series of reference values, indexed by CpG site, used to supply sites that are not imputed from dnam.

  • required_cpgs (list of str) – List of CpG sites that need to be in the final dataset.

  • threshold (float, optional) – Minimum fraction of non-missing values a CpG site needs in order to be imputed from its own average; sites below it are filled from cpg_source. Default is 0.8.
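The example below builds toy inputs in the shapes listed above. All values are made up. It also computes per-site coverage, meaning the fraction of samples with a non-missing value. This example assumes coverage is the quantity that threshold is compared against.

```python
import pandas as pd

# dnam: CpG sites as rows (index), samples as columns; None marks a missing value.
dnam = pd.DataFrame(
    [
        [0.25, 0.5, None, 0.75, 0.5],   # cg_a: 4 of 5 samples present
        [None, None, 0.5, None, 0.25],  # cg_b: 2 of 5 samples present
        [0.0, 0.25, 0.5, 0.25, 0.0],    # cg_c: all 5 samples present
    ],
    index=["cg_a", "cg_b", "cg_c"],
    columns=["s1", "s2", "s3", "s4", "s5"],
)

# cpg_source: reference values indexed by CpG site ID.
cpg_source = pd.Series({"cg_b": 0.4, "cg_d": 0.6})

# required_cpgs: a plain list of CpG site IDs.
required_cpgs = ["cg_a", "cg_b", "cg_d"]

# Per-site coverage, the quantity assumed to be compared with threshold.
threshold = 0.8
coverage = dnam.notna().mean(axis=1)
print(coverage.to_dict())
print("below threshold:", coverage[coverage < threshold].index.tolist())
```

With five samples, each missing value lowers a site's coverage by 0.2. Under this reading, the default of 0.8 therefore tolerates at most one missing sample per site here.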

Returns:

DataFrame with missing values filled.

Return type:

pd.DataFrame

Raises:

ValueError – If any CpG site in required_cpgs is missing from both the dataset and cpg_source.
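A hypothetical pre-flight helper can anticipate this error before hybrid_impute is called by listing the required sites that neither input can supply. The name unfillable_required_cpgs is illustrative and is not a biolearn function. The helper also makes an assumption based on the description at the top of this page: a site below threshold is filled from cpg_source, so it counts as unavailable from the dataset.

```python
import pandas as pd


def unfillable_required_cpgs(dnam, cpg_source, required_cpgs, threshold=0.8):
    """Hypothetical pre-flight check (not part of biolearn).

    Returns the required CpG sites that neither dnam (after the assumed
    threshold filter) nor cpg_source can supply.
    """
    # Fraction of samples with data at each CpG site (row of dnam).
    present_fraction = dnam.notna().mean(axis=1)
    # Assumption: only sites at or above the threshold count as available from dnam.
    usable = set(present_fraction[present_fraction >= threshold].index)
    return [
        cpg
        for cpg in required_cpgs
        if cpg not in usable and cpg not in cpg_source.index
    ]
```

Under these assumptions, an empty result means every required site has a source. Otherwise, the listed sites need to be added to cpg_source, or threshold needs to be lowered.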