ML Analysis MC4R #185

Open
frehburg opened this issue Oct 25, 2023 · 4 comments

@frehburg
Contributor

frehburg commented Oct 25, 2023

phenotype:

  • overweight < obesity class I < class II < class III
  • normal weight = obesity phenotype refuted
  • growth abnormality = not recorded (i.e. not represented in the phenopackets, PPs)

genotype:

  • interpretations -> diagnosis -> genomicInterpretations -> variantInterpretation -> variationDescriptor ->
    • expressions -> HGVS strings
    • allelicState -> GENO term
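
The path above can be walked on a phenopacket loaded as a plain dict (e.g. from JSON). This is only a minimal sketch: the function name `iter_variants` and the example variant are made up for illustration, and it assumes the Phenopacket Schema v2 JSON field names.

```python
from typing import Iterator


def iter_variants(phenopacket: dict) -> Iterator[dict]:
    """Yield one record per variant: its HGVS strings and its allelic state."""
    for interpretation in phenopacket.get("interpretations", []):
        diagnosis = interpretation.get("diagnosis", {})
        for gi in diagnosis.get("genomicInterpretations", []):
            descriptor = gi.get("variantInterpretation", {}).get("variationDescriptor", {})
            if not descriptor:
                continue  # e.g. a gene-level interpretation without a variant
            # Keep only expressions whose syntax is an HGVS flavour (hgvs.c, hgvs.g, ...)
            hgvs = [
                e["value"]
                for e in descriptor.get("expressions", [])
                if e.get("syntax", "").startswith("hgvs")
            ]
            allelic_state = descriptor.get("allelicState", {})
            yield {
                "hgvs": hgvs,
                "allelic_state_id": allelic_state.get("id"),
                "allelic_state_label": allelic_state.get("label"),
            }


# Made-up example; the variant is a placeholder, not a real patient finding.
example = {
    "interpretations": [{
        "diagnosis": {
            "genomicInterpretations": [{
                "variantInterpretation": {
                    "variationDescriptor": {
                        "expressions": [{"syntax": "hgvs.c", "value": "NM_005912.3:c.100A>G"}],
                        "allelicState": {"id": "GENO:0000135", "label": "heterozygous"},
                    }
                }
            }]
        }
    }]
}

records = list(iter_variants(example))
```

Each record can then become one row of the genotype table.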
@frehburg frehburg added the enhancement New feature or request label Oct 25, 2023
@frehburg frehburg added this to the Analyse MC4R milestone Oct 25, 2023
@frehburg frehburg self-assigned this Oct 25, 2023
@frehburg
Contributor Author

Genotyping:

Required manual validation and standardization.
ERKER form allowed mutations as free text or validated HGVS expressions.
Phenopackets required validated HGVS expressions.
Mutations and zygosity of MC4R-deficiency patients were captured in an Excel sheet.
Mutations were manually standardized and validated according to HGVS nomenclature.
Varsome was used to derive candidate mutations as HGVS expressions at the coding DNA level (c.HGVS), using the reference genome hg38.
ClinVar was used to verify the mutations proposed by Varsome.
Data for MC4R-deficiency genotyping was organized into a CSV file with one patient per row, including mutations, zygosity, OMIM codes, and clinical significance.
Manual pre-processing excluded certain mutations from analysis.
A total of 105 mutations from 98 patients were identified, with some excluded due to various reasons.
Data was imported into the REDCap project as a single CSV file, processed in a Jupyter notebook, and mapped according to the ERKER genetic findings.
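
A rough sketch of the one-patient-per-row CSV being reshaped to one mutation per row before mapping. The column names (`mutation_1`, `zygosity_1`, ...) and the variants are invented here, and the regex is only a loose syntactic check for c.HGVS strings; it does not replace the manual validation against Varsome and ClinVar.

```python
import io
import re

import pandas as pd

# Made-up stand-in for the genotyping CSV (one patient per row).
raw = io.StringIO(
    "patient_id,mutation_1,zygosity_1,mutation_2,zygosity_2\n"
    "P1,c.100A>G,heterozygous,,\n"
    "P2,c.200del,homozygous,c.300C>T,heterozygous\n"
)
wide = pd.read_csv(raw)

# Stack the numbered mutation/zygosity column pairs into long format.
parts = []
for i in (1, 2):
    part = wide[["patient_id", f"mutation_{i}", f"zygosity_{i}"]].copy()
    part.columns = ["patient_id", "mutation", "zygosity"]
    parts.append(part)
long_df = pd.concat(parts, ignore_index=True).dropna(subset=["mutation"])

# Loose shape check: optional RefSeq accession prefix, then "c." and a description.
C_HGVS = re.compile(r"^(?:[A-Z]{2}_\d+\.\d+:)?c\.\S+$")
long_df["looks_like_c_hgvs"] = long_df["mutation"].map(lambda s: bool(C_HGVS.match(s)))
```

Rows where `looks_like_c_hgvs` is False would be the ones to send back for manual standardization.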

Phenotyping:

Required Human Phenotype Ontology (HPO) annotated standardization for data transformation.
Date of determination and status of each phenotype were necessary for deep phenotyping.
The ERKER form allowed the status of each phenotype to be captured as confirmed, refuted, or not recorded.
Disease-specific phenotyping involved categorizing obesity into overweight and class I-III obesity.
The World Health Organization (WHO) BMI-for-age tables were converted to DataFrames for male and female patients.
MC4R-deficiency weight data were transferred to a CSV file and converted to a phenotyping DataFrame.
The date of determination was computed for each phenotype by adding the age in months to the date of birth using pandas datetime functions.
Differentiation between confirmed, refuted, and not recorded phenotypes was based on HPO terms.
A Python function was developed to annotate BMI data to the respective HPO terms.
Obesity class, date, and status for each examination were determined, and new columns were created in the phenotyping data.
The phenotyping data was organized and written to the ERKER_v1.7 CSV file.
Future pre-processing steps were defined to convert data to FHIR (Fast Healthcare Interoperability Resources) using the MII-KDS core information model and ERKERonFHIR project.
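
A sketch of the two central phenotyping steps described above: computing the date of determination and annotating each examination with an obesity class and a status. All column names are hypothetical and the cut-off values are invented placeholders, not the WHO BMI-for-age values; the mapping from class label to HPO term is left out.

```python
import pandas as pd

# Placeholder cut-offs per sex and age; NOT the real WHO BMI-for-age values.
cutoffs = pd.DataFrame({
    "sex": ["female", "female", "male", "male"],
    "age_months": [24, 36, 24, 36],
    "overweight": [18.0, 17.8, 18.2, 18.0],
    "class_1": [20.0, 19.8, 20.2, 20.0],
    "class_2": [22.0, 21.8, 22.2, 22.0],
    "class_3": [24.0, 23.8, 24.2, 24.0],
})
LEVELS = ["overweight", "class_1", "class_2", "class_3"]  # ascending severity

exams = pd.DataFrame({
    "patient_id": ["P1", "P1", "P2"],
    "sex": ["female", "female", "male"],
    "date_of_birth": pd.to_datetime(["2015-01-15", "2015-01-15", "2016-03-01"]),
    "age_months": [24, 36, 24],
    "bmi": [17.0, 21.0, None],
})

# Date of determination = date of birth + age in months.
exams["date_of_determination"] = exams.apply(
    lambda r: r["date_of_birth"] + pd.DateOffset(months=int(r["age_months"])), axis=1
)


def classify(row) -> tuple:
    """Return (obesity_class, status) for one examination."""
    if pd.isna(row["bmi"]):
        return (None, "not recorded")
    reached = None
    for level in LEVELS:  # keep the highest level whose cut-off is met
        if row["bmi"] >= row[level]:
            reached = level
    return (reached, "confirmed" if reached else "refuted")


merged = exams.merge(cutoffs, on=["sex", "age_months"], how="left")
results = [classify(row) for _, row in merged.iterrows()]
merged["obesity_class"] = [r[0] for r in results]
merged["status"] = [r[1] for r in results]
```

In this sketch a measured BMI below the overweight cut-off counts as refuted, and a missing BMI as not recorded.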

@frehburg
Contributor Author

Questions:

How probable is it, per genotype, to reach class 1, 2, or 3 obesity (before the x-th year of life)?

How long does it take to reach each class?

How likely is it to be diagnosed with obesity at all, per genotype?

Also run the tests in reverse.

@frehburg
Contributor Author

frehburg commented Dec 4, 2023

  1. Descriptively, the number of (single) heterozygous and homozygous patients and their distribution of c.HGVS mutations.
  2. The correlation between class 1-3 obesity up to the U9 preventive check-up and c.HGVS mutations, comparing the severity of obesity per mutation over time.
  3. The influence of the mutations and zygosity on the severity over time was displayed, determining the most severe zygosity and mutation.
  4. The combination of the most severe zygosity and mutation was analysed.
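
A possible starting point for points 1 and 2, as a sketch on made-up data. The column names are hypothetical, and the 64-month limit for U9 is an assumption that should be checked against the actual examination schedule.

```python
import pandas as pd

SEVERITY = ["none", "overweight", "class_1", "class_2", "class_3"]  # ascending
RANK = {name: i for i, name in enumerate(SEVERITY)}
U9_MAX_AGE_MONTHS = 64  # assumption: upper end of the U9 window

# Made-up examinations, one row per patient visit.
obs = pd.DataFrame({
    "patient_id": ["P1", "P1", "P2", "P2", "P3"],
    "mutation": ["c.100A>G", "c.100A>G", "c.200del", "c.200del", "c.100A>G"],
    "zygosity": ["heterozygous", "heterozygous", "homozygous", "homozygous", "homozygous"],
    "age_months": [24, 48, 24, 70, 36],
    "obesity_class": ["overweight", "class_1", "class_2", "class_3", "class_3"],
})

# Point 1: patients per mutation and zygosity (each patient counted once).
patients = obs.drop_duplicates("patient_id")
genotype_counts = pd.crosstab(patients["mutation"], patients["zygosity"])

# Point 2: worst obesity class reached per patient up to U9.
up_to_u9 = obs[obs["age_months"] <= U9_MAX_AGE_MONTHS].copy()
up_to_u9["rank"] = up_to_u9["obesity_class"].map(RANK)
worst = (
    up_to_u9.groupby(["patient_id", "mutation", "zygosity"])["rank"]
    .max()
    .reset_index()
)
worst["worst_class"] = worst["rank"].map(lambda r: SEVERITY[r])

# Severity distribution per mutation, as raw counts.
severity_by_mutation = pd.crosstab(worst["mutation"], worst["worst_class"])
```

Ranking the classes as integers keeps the ordering overweight < class I < class II < class III explicit, so the same ranks can be reused for any later statistical test.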

@frehburg
Contributor Author

frehburg commented Dec 5, 2023

New df: rows = mutations
Percentage zygosity
Percentage reach each obesity stage
Avg time to get there
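
Sketch of how that summary df could be built with a pandas groupby. The input layout (one row per patient, with the age in months at which each class was first reached, NaN if never) is an assumption, as are all column names and values; "time to get there" is taken here as age since birth.

```python
import pandas as pd

# Made-up per-patient table.
patients = pd.DataFrame({
    "patient_id": ["P1", "P2", "P3", "P4"],
    "mutation": ["c.100A>G", "c.100A>G", "c.200del", "c.200del"],
    "zygosity": ["heterozygous", "homozygous", "homozygous", "homozygous"],
    "age_class_1": [30.0, 20.0, 12.0, None],
    "age_class_2": [None, 40.0, 24.0, None],
    "age_class_3": [None, None, 36.0, None],
})
STAGES = ["class_1", "class_2", "class_3"]

grouped = patients.groupby("mutation")
summary = pd.DataFrame({"n_patients": grouped.size()})

# Percentage of homozygous patients per mutation.
summary["pct_homozygous"] = grouped["zygosity"].apply(
    lambda z: (z == "homozygous").mean() * 100
)

for stage in STAGES:
    col = f"age_{stage}"
    # Share of patients who ever reached the stage.
    summary[f"pct_{stage}"] = grouped[col].apply(lambda a: a.notna().mean() * 100)
    # Mean age at first reaching it, over the patients who did (NaN if none did).
    summary[f"avg_months_to_{stage}"] = grouped[col].mean()
```

The averages only include patients who reached the stage, so they should be read together with the percentages.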
