Statistical Genetics (STATS4074)
1. Resistance to infection of humans by a certain virus is determined by a biallelic locus with alleles G and g. The allele g provides resistance whilst G does not, but G is dominant over g. 91% of individuals are not resistant.
(a) What is the proportion of g alleles in the population? What assumption do you
make? [1,1 MARKS]
(b) A father with resistance has both resistant and non-resistant children. What can
you say about the phenotype and genotype of the children’s mother?
[2 MARKS]
(c) Another father and mother have n (n ≥ 1) resistant children (and no non-resistant children). The father is resistant. What is the probability that the mother is resistant? Describe all the probability assumptions in your answer.
[4,3 MARKS]
(d) The value of the probability obtained in part (c) when n = 1 is equal to the allele frequency of g. Explain why. [2 MARKS]
(e) What is the limit of the probability in part (c) as n → ∞? Give an argument
leading to this value directly. [2 MARKS]
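
A numerical check on parts (c) to (e) can be sketched in a few lines of Python. This is an illustration only, under assumptions that are not stated in the question: Hardy–Weinberg proportions for the mother's genotype, random mating with respect to this locus, and independent Mendelian segregation for each child. The function name `prob_mother_resistant` is hypothetical.

```python
import math

def prob_mother_resistant(n, nonresistant_fraction=0.91):
    """Posterior probability that the mother is gg, given a gg father
    and n children who are all resistant (gg)."""
    # Assumption: Hardy-Weinberg, so the resistant fraction equals q^2.
    q = math.sqrt(1.0 - nonresistant_fraction)  # frequency of allele g
    p = 1.0 - q                                 # frequency of allele G
    prior_gg = q * q
    prior_Gg = 2.0 * p * q
    # A GG mother cannot have a gg child, so her term drops out.
    like_gg = 1.0        # gg x gg: every child is gg
    like_Gg = 0.5 ** n   # Gg x gg: each child is gg with probability 1/2
    return prior_gg * like_gg / (prior_gg * like_gg + prior_Gg * like_Gg)

for n in (1, 2, 5, 20):
    print(n, prob_mother_resistant(n))
```

Printing the value for increasing n shows how each additional resistant child shifts the posterior towards a resistant mother.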
2. A new locus with an unusual mapping between genotype and phenotype has been discovered. It has three alleles (A, B and C). The homozygotes have distinct phenotypes (A for AA, etc.), whilst the heterozygotes all have another phenotype (H, say), so there are four phenotypes in total.
A sample of N diploids is classified by phenotype, leading to $N_A$, $N_B$, $N_C$ and $N_H$ people in the respective phenotypes. The EM algorithm is to be used to attempt to find the maximum likelihood estimates of the proportions p, q and r of the alleles A, B and C, respectively.
(a) What are the ‘missing data’ in the problem? [1 MARK]
(b) Write down (without proof) expressions for the E-step update and the M-step
update, assuming that the locus is in Hardy–Weinberg equilibrium.
[3,3 MARKS]
(c) Suggest how you might initialize the parameters at the start of the algorithm.
[1 MARK]
(d) Propose a criterion to determine when to stop iterating the E- and M-steps.
[1 MARK]
(e) Suggest how you might reassure yourself that the algorithm, once applied to some
data, had not converged merely to a local maximum of the likelihood function.
[1 MARK]
(f) In a particular case, $N_A = 32$, $N_B = 41$, $N_C = 51$, $N_H = 126$. The EM algorithm
has been implemented and has produced the following estimates of the parameters: $\hat{p} = 0.2824$, $\hat{q} = 0.3333$, $\hat{r} = 0.3843$ (4 decimal places). Perform a likelihood-ratio test to show that there is strong evidence that the locus is not in Hardy–Weinberg equilibrium. [6 MARKS]
(g) By considering the pattern of the observed counts and the expected counts under
the null hypothesis in the test above, propose one explanation for this departure from Hardy–Weinberg equilibrium. [1 MARK]
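
The EM iteration and the likelihood-ratio statistic in this question can be sketched in Python as follows. This is one possible implementation, not the one used to produce the estimates quoted in part (f). It assumes a uniform starting point, a stopping rule based on the largest change in any allele frequency, and a chi-squared reference distribution with 1 degree of freedom (3 free phenotype proportions minus 2 free allele frequencies). The function names and the variable names p, q, r for the frequencies of A, B and C are illustrative.

```python
import math

def em_allele_freqs(n_a, n_b, n_c, n_h, start=(1/3, 1/3, 1/3),
                    tol=1e-12, max_iter=10_000):
    """EM estimates of the A, B, C allele frequencies, assuming HWE."""
    total_alleles = 2 * (n_a + n_b + n_c + n_h)
    p, q, r = start
    for _ in range(max_iter):
        # E-step: split the heterozygotes into expected AB, AC, BC counts.
        pq, pr, qr = p * q, p * r, q * r
        het = pq + pr + qr
        n_ab, n_ac, n_bc = (n_h * x / het for x in (pq, pr, qr))
        # M-step: allele counting on the completed data.
        new = ((2 * n_a + n_ab + n_ac) / total_alleles,
               (2 * n_b + n_ab + n_bc) / total_alleles,
               (2 * n_c + n_ac + n_bc) / total_alleles)
        converged = max(abs(x - y) for x, y in zip(new, (p, q, r))) < tol
        p, q, r = new
        if converged:
            break
    return p, q, r

def lrt_statistic(observed, freqs):
    """G = 2 * sum(O * ln(O / E)) over the phenotypes A, B, C, H."""
    n = sum(observed)
    p, q, r = freqs
    hom = (p * p, q * q, r * r)
    expected = [n * x for x in hom] + [n * (1.0 - sum(hom))]
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)

counts = (32, 41, 51, 126)          # N_A, N_B, N_C, N_H from part (f)
freqs = em_allele_freqs(*counts)
g = lrt_statistic(counts, freqs)
# Upper tail of a chi-squared distribution with 1 degree of freedom.
p_value = math.erfc(math.sqrt(g / 2.0))
print(freqs, g, p_value)
```

Rerunning `em_allele_freqs` with several different `start` values is one simple way to probe for local maxima, as asked in part (e).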
3. Consider a genetic locus evolving by the Wright–Fisher model in a population of N diploid individuals and mutating at a rate μ per generation via the infinite alleles model. Let $G_n$ be the homozygosity (the probability that two alleles chosen at random are the same) in generation n.
(a) Show, by considering the ways in which two random gene copies can be the same
in generation n, that $G_n$ satisfies the recurrence relation
$$G_n = (1 - \mu)^2 \left[ \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) G_{n-1} \right].$$
[4 MARKS]
(b) Show that, if μ is small so that terms of order μ² and higher can be neglected, and
N is large, so that terms in μ/N can also be neglected, the change, ΔH, in heterozygosity between generation n − 1 and generation n satisfies
$$\Delta H = -\frac{1}{2N} H_{n-1} + 2\mu\,(1 - H_{n-1}),$$
where $H_{n-1}$ is the heterozygosity in generation n − 1. [3 MARKS]
(c) Interpret the two sources of change of heterozygosity on the right-hand side of the equation in (b). [2 MARKS]
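
The behaviour of the recurrence in part (a) can be explored numerically with the short Python sketch below. The parameter values (N = 1000, μ = 10⁻⁴), the starting homozygosity of 1 and the function names are illustrative choices, not part of the question. The sketch iterates the recurrence, computes its exact fixed point, and prints both alongside the standard small-μ, large-N approximation 1/(1 + 4Nμ).

```python
def iterate_homozygosity(N, mu, g0=1.0, generations=50_000):
    """Iterate G_n = (1 - mu)^2 * [1/(2N) + (1 - 1/(2N)) * G_{n-1}]."""
    no_mutation = (1.0 - mu) ** 2   # neither sampled copy has mutated
    same_parent = 1.0 / (2 * N)     # both copies descend from one parental copy
    g = g0
    for _ in range(generations):
        g = no_mutation * (same_parent + (1.0 - same_parent) * g)
    return g

def equilibrium_homozygosity(N, mu):
    """Exact fixed point of the recurrence (set G_n = G_{n-1})."""
    no_mutation = (1.0 - mu) ** 2
    same_parent = 1.0 / (2 * N)
    return no_mutation * same_parent / (1.0 - no_mutation * (1.0 - same_parent))

N, mu = 1000, 1e-4                  # illustrative values only
g_iterated = iterate_homozygosity(N, mu)
g_exact = equilibrium_homozygosity(N, mu)
g_approx = 1.0 / (1.0 + 4 * N * mu)
print(g_iterated, g_exact, g_approx)
```

Setting ΔH = 0 in part (b) gives an equilibrium heterozygosity of 4Nμ/(1 + 4Nμ), which is one minus the approximation printed above.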
4. (a) Two loci each have two alleles, A and a at locus 1 and B and b at locus 2. A is dominant over a, and B is dominant over b. In some population, both loci are in Hardy–Weinberg equilibrium and there is linkage equilibrium between the loci.
i. Carefully explain the difference between Hardy–Weinberg and linkage equilibria.
[2 MARKS]
ii. The relative frequency of the phenotype corresponding to the a allele is 0.81 and that of the phenotype corresponding to b is 0.64. Calculate the proportion of the population that carries the two-locus genotype Ab/aB. [3 MARKS]
(b) Consider the following pedigree in which shaded individuals suffer from some disease. The genotypes of the individuals at an autosomal marker locus (with codominant alleles labelled 1 to 4) are shown.
[Figure: pedigree. Marker genotypes as extracted, row by row: 22, 14, 33, 44 / 34 / 14, 14, 24, 13, 13.]
i. Assuming that the normal allele d is dominant over the disease allele D, and neglecting the unaffected grandson (shown dotted), derive the likelihood of the recombination fraction θ between the disease and marker loci.
[5 MARKS]
ii. Still neglecting the grandson, perform a likelihood ratio test to test for linkage between the disease and marker loci. [4 MARKS]
iii. Write down the likelihood of θ when the grandson is included, and argue whether the maximum likelihood estimate is increased or decreased compared to the case before the grandson was included. [3,2 MARKS]
2022-05-03