Statistics 4H Projects

Session 2021-2022

Statistical modelling of correlated count data from biological experiments (Epi B)

Brief Description of Project

With the advent of powerful high-throughput experimental technologies over the last two decades, it is becoming common for biology labs to generate massive datasets, in the form of long sequences of counts, in their quest to determine features of biological interest within the human genome. For instance, the methylation of DNA is a biological process that leads to certain parts of the DNA being better protected against mutations that could lead to loss of function or even the onset of cancer, and is thus of great interest to medical scientists. The study of DNA methylation often necessitates collecting measurements on the same locations of DNA from different people. Due to biological constraints on the genome, the data sequences measured on the same set of genomic locations are often highly correlated, making it more difficult to detect any hidden patterns.

Most current models in use for biological sequence data assume data sequences are independent, potentially leading to incorrect inference. This project will focus on data generated from a particular type of sequencing experiment, called bisulfite sequencing, designed to study patterns of DNA methylation in the human genome. Various types of models for correlated count data will be considered and assessed, to determine an appropriate set of models (and assumptions) for data generated in such experiments. Finally, statistical inferences from the model(s) will be validated by comparisons to current biological knowledge.

Key Questions of Interest

What kinds of statistical models are appropriate for data generated from experiments to study DNA methylation?

How can correlations between sequences of counts be incorporated into such models?

Can such models usefully differentiate regions having specific biological functions?

Analysis Summary

What level of difficulty do you think the project will have for the typical student?

Moderate/Difficult

Is any Programming/Simulation required? Yes

Coding and implementing probability functions for non-standard distributions in R.

Depending on the student, implementing techniques for either maximum likelihood estimation or Bayesian estimation (using MCMC) of model parameters.
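As a rough illustration of the two programming tasks above, the sketch below codes the log-probability function of a beta-binomial distribution and fits it by maximum likelihood. The project description does not name a specific model; the beta-binomial is assumed here only because it is a common choice for overdispersed read counts. The project calls for R, so the Python below is purely illustrative, and the function names, the (mu, rho) parametrisation, the grid search and the toy counts are all hypothetical.

```python
import math


def log_beta(a, b):
    """Log of the Beta function, computed via log-gamma for stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_logpmf(k, n, mu, rho):
    """Log-probability of k methylated reads out of n (beta-binomial).

    mu  : mean methylation proportion, 0 < mu < 1 (assumed parametrisation)
    rho : overdispersion / intra-class correlation, 0 < rho < 1
    """
    if not 0 <= k <= n:
        return -math.inf
    # Convert (mu, rho) to the usual shape parameters (alpha, beta),
    # using rho = 1 / (alpha + beta + 1).
    scale = (1.0 - rho) / rho
    alpha, beta = mu * scale, (1.0 - mu) * scale
    log_choose = (math.lgamma(n + 1) - math.lgamma(k + 1)
                  - math.lgamma(n - k + 1))
    return (log_choose + log_beta(k + alpha, n - k + beta)
            - log_beta(alpha, beta))


def log_likelihood(data, mu, rho):
    """Sum of log-probabilities over (k, n) pairs, treated as independent."""
    return sum(betabinom_logpmf(k, n, mu, rho) for k, n in data)


def grid_mle(data, steps=99):
    """Crude maximum likelihood estimate by exhaustive grid search.

    A numerical optimiser would normally replace this; the grid keeps the
    sketch dependency-free.
    """
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(((mu, rho) for mu in grid for rho in grid),
               key=lambda p: log_likelihood(data, p[0], p[1]))


if __name__ == "__main__":
    # Hypothetical (methylated reads, total reads) pairs, invented for
    # illustration only.
    toy_counts = [(3, 10), (8, 12), (0, 9), (5, 11), (10, 10), (2, 8)]
    mu_hat, rho_hat = grid_mle(toy_counts)
    print("mu_hat =", mu_hat, "rho_hat =", rho_hat)
```

This likelihood treats sites as independent, which is exactly the assumption the project sets out to relax. For the Bayesian route, the same log-likelihood function could be combined with priors on mu and rho inside an MCMC sampler.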

Please specify the statistical techniques which the project is likely to require, and any that are essential (combined and WP(5) may not have covered them, as they have options):

Probability and inference

Bayesian Statistics/Advanced Bayesian Methods
