EEEM030: Speech & Audio Processing & Recognition

FHEQ Level 7 Examination

LSA 2019/0

Section A

A1.

(a) Describe the masking effect in human hearing. Explain how the masking effect could be used for lossy compression of speech. [20 %]

(b) A vocal tract filter is described by the following difference equation:

S[n] = μ ∙ S[n − 2] + u[n],

where S[n] is the produced speech signal, u[n] is the excitation signal from the vocal cord, n is the discrete time index, and μ is a coefficient constant. Assuming μ in terms of Table 1,

Second rightmost digit of student number (URN)	μ	F
0	-0.36	8
1	-0.04	10
2	-0.25	22
3	-0.09	16
4	-0.16	32
5	0.25	12
6	0.49	18
7	-0.81	14
8	0.64	20
9	0.09	24

Table 1 The values of the parameters specified with respect to your student number (URN). Example: if your URN is “6789012”, the second rightmost digit is “1”, so you will use: μ = −0.04, and F = 10.

(i) Determine the transfer function of this filter, i.e. the z-transform H(z), and the causal impulse response h[n]. Show the working by which you reach your answer. [20 %]
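As a rough numerical cross-check for part (i), taking the z-transform of the difference equation gives S(z) = μ z^-2 S(z) + U(z), so H(z) = 1 / (1 − μ z^-2). The sketch below runs the recursion on a unit impulse and compares it with the closed form h[n] = μ^(n/2) for even n and 0 for odd n. It uses the Table 1 example row (μ = −0.04); the function names are illustrative only.

```python
def impulse_response(mu, length):
    """Run S[n] = mu * S[n-2] + u[n] with u[n] a unit impulse and zero initial state."""
    h = []
    for n in range(length):
        u = 1.0 if n == 0 else 0.0
        prev2 = h[n - 2] if n >= 2 else 0.0  # causal: nothing before n = 0
        h.append(mu * prev2 + u)
    return h


def closed_form(mu, n):
    """Closed-form causal impulse response: mu**(n/2) for even n, 0 for odd n."""
    return mu ** (n // 2) if n % 2 == 0 else 0.0


mu = -0.04  # Table 1 example row (second rightmost digit "1")
h = impulse_response(mu, 8)
for n, value in enumerate(h):
    print(n, value, closed_form(mu, n))
```

Only every second sample is non-zero because the feedback path has a delay of two samples.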

(ii) Sketch the pole-zero plot of H(z) and the magnitude spectrum |H(ω)|. [20 %]
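To support a sketch of the magnitude spectrum, one option is to evaluate H(z) = 1 / (1 − μ z^-2) on the unit circle at a few landmark frequencies. The snippet below is a minimal sketch using the Table 1 example value μ = −0.04; the helper name `magnitude` is an assumption made for illustration.

```python
import cmath
import math


def magnitude(mu, omega):
    """|H(e^{j*omega})| for H(z) = 1 / (1 - mu * z**-2)."""
    return 1.0 / abs(1.0 - mu * cmath.exp(-2j * omega))


mu = -0.04  # Table 1 example row
for label, omega in (("0", 0.0), ("pi/2", math.pi / 2), ("pi", math.pi)):
    print(f"omega = {label}: |H| = {magnitude(mu, omega):.4f}")
```

For a negative μ the largest of these three values occurs at ω = π/2, which is where the poles sit closest to the unit circle.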

(iii) Assuming the sampling rate is F kHz (take the value of F from Table 1 according to your URN), estimate the formant frequencies (in Hz) that correspond to the poles shown in the pole-zero plot above. Detail the calculation by which you reach your answer. [20 %]
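One way to check the arithmetic in part (iii) is to solve z² = μ for the poles and convert each pole angle θ to a frequency via f = θ · Fs / (2π). The sketch below does this for two Table 1 rows; the function name is hypothetical. For positive μ the poles are real, so the angles are 0 and π (0 Hz and the Nyquist frequency); whether those count as formants is left to the reader.

```python
import cmath
import math


def pole_frequencies_hz(mu, fs_hz):
    """Frequencies (Hz) corresponding to the angles of the poles of 1 / (1 - mu * z**-2)."""
    root = cmath.sqrt(mu)      # the poles solve z**2 = mu
    poles = (root, -root)
    return [abs(cmath.phase(p)) * fs_hz / (2 * math.pi) for p in poles]


# Table 1 example rows: digit "1" (mu = -0.04, F = 10 kHz) and digit "8" (mu = 0.64, F = 20 kHz)
print(pole_frequencies_hz(-0.04, 10_000.0))
print(pole_frequencies_hz(0.64, 20_000.0))
```

With a negative μ both poles lie on the imaginary axis, so the two conjugate poles describe a single resonance at a quarter of the sampling rate.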

(iv) Suppose the sign of μ is reversed, for example changing from μ = −0.04 to μ = 0.04 if the second rightmost digit of your URN is “1”, or changing from μ = 0.64 to μ = −0.64 if the second rightmost digit of your URN is “8”. Re-sketch the pole-zero plot of H(z) and the magnitude spectrum |H(ω)|, and compare them with the plots in (ii). Explain how you reach the answer. [20 %]
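For the comparison in part (iv), it may help to note that reversing the sign of μ is equivalent to shifting the frequency axis by π/2, because −μ e^(−j2ω) = μ e^(−j2(ω + π/2)). The sketch below checks this identity numerically on a coarse grid for the Table 1 example value; the helper name and grid size are arbitrary choices.

```python
import cmath
import math


def magnitude(mu, omega):
    """|H(e^{j*omega})| for H(z) = 1 / (1 - mu * z**-2)."""
    return 1.0 / abs(1.0 - mu * cmath.exp(-2j * omega))


mu = -0.04  # Table 1 example row; the sign-reversed filter uses -mu
grid = [k * math.pi / 16 for k in range(17)]  # 0 .. pi

# Largest difference between the sign-reversed spectrum and the shifted original.
max_gap = max(abs(magnitude(-mu, w) - magnitude(mu, w + math.pi / 2)) for w in grid)
print(max_gap)
```

In the z-plane the same change rotates the pole pair by 90 degrees, between the imaginary axis and the real axis, so spectral peaks and troughs swap places.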

(a) A signal containing a quasi-stationary segment of a vowel is band-limited using an ideal low-pass filter with a cut-off frequency at the Nyquist frequency, such that the power spectral density of the signal contains peaks at the first M harmonics of the vowel, and higher harmonics are cut off. Assume the sample rate is F kHz. Calculate the frequency difference between the first and the tenth harmonic, using the parameter values corresponding to your student number (URN), as given in Table 2. [30 %]

Second rightmost digit of student number (URN)	M	F
0	20	8
1	22	8
2	50	22
3	38	16
4	90	32
5	28	12
6	36	18
7	30	14
8	55	20
9	62	24

Table 2 The values of the parameters specified with respect to your student number (URN). Example: if your URN is “6789012”, the second rightmost digit is “1”, so you will use: M = 22, and F = 8.
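A possible way to bound the answer to the harmonic-spacing question: if exactly M harmonics survive below the Nyquist frequency F/2, then M·f0 ≤ F/2 < (M+1)·f0, and the first and tenth harmonics are 9·f0 apart. The sketch below computes the resulting range. This is one reading of the question, stated as an assumption; the exam may simply intend f0 = (F/2)/M, which is the upper bound returned here. The function name is hypothetical.

```python
def harmonic_gap_bounds_hz(m, fs_khz, first=1, last=10):
    """Bounds on the spacing between two harmonics when exactly m harmonics lie below Nyquist."""
    nyquist_hz = fs_khz * 1000.0 / 2.0
    f0_max = nyquist_hz / m          # m-th harmonic sits exactly at Nyquist
    f0_min = nyquist_hz / (m + 1)    # (m+1)-th harmonic would sit exactly at Nyquist
    span = last - first              # number of harmonic intervals between the two
    return span * f0_min, span * f0_max


# Table 2 example row (digit "1"): M = 22, F = 8 kHz
low, high = harmonic_gap_bounds_hz(22, 8)
print(f"{low:.1f} Hz < gap <= {high:.1f} Hz")
```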

(i) The real cepstrum of a quasi-stationary segment of a vowel of length N samples is computed, and prominent peaks of decreasing amplitude exist at quefrency bins τ = P, 2P, 3P, …. Assuming the sampling frequency is F kHz, what is the average pitch frequency of the vowel in Hz? The value of F can be found from Table 3 in terms of your student number (URN). Explain how you have reached your answer. [25 %]
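As a small worked check for the pitch question: cepstral peaks at multiples of P samples indicate a pitch period of P samples, so the pitch frequency is the sampling rate divided by P. The snippet below is a minimal sketch using the Table 3 example values (P = 45, F = 8 kHz); the function name is illustrative.

```python
def pitch_hz(period_samples, fs_khz):
    """Pitch frequency in Hz for a pitch period given in samples (quefrency bins)."""
    return fs_khz * 1000.0 / period_samples


# Table 3 example row (digit "1"): P = 45, F = 8 kHz
print(f"{pitch_hz(45, 8):.2f} Hz")
```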

(ii) Suppose the first four cepstral coefficients c[τ], τ = 0,1,2,3, derived from a frame of speech data, are specified in Table 3.

• What are the values of the final three cepstral coefficients c[N-3], c[N-2], and c[N-1]? Explain how you have reached your answer. [20 %]

• Derive a Fourier series expression for the resulting log-magnitude spectral envelope of the vocal tract that gives rise to this data. Explain how you have reached your answer. [25 %]

Second rightmost digit of your URN	P	F	c[0]	c[1]	c[2]	c[3]
0	40	8	6.0	-1.2	-1.8	2.5
1	45	8	-5.5	1.3	-2.4	0.9
2	90	22	4.9	2.6	1.8	-3.7
3	85	16	7.7	-2.2	-3.5	2.3
4	110	32	9.6	2.5	1.6	-3.5
5	60	12	-3.7	3.9	-2.7	2.9
6	80	18	-6.5	-4.6	1.4	-1.3
7	70	14	8.2	1.7	-2.2	2.4
8	90	20	-8.4	5.2	-3.1	-2.2
9	120	24	6.5	-4.4	1.8	-3.1

Table 3 The values of the parameters specified with respect to your student number (URN). Example: if your URN is “6789012”, the second rightmost digit is “1”, so you will use: P = 45, F = 8, c[0] = -5.5, c[1] = 1.3, c[2] = -2.4, c[3] = 0.9.
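The two cepstral-coefficient questions rest on the real cepstrum being an even sequence, c[N − k] = c[k], which turns its DFT into a cosine series: log|H(ω)| ≈ c[0] + 2 Σ c[k] cos(kω) for k = 1..3. The sketch below checks that numerically with the Table 3 example row. The frame length N = 16 is a hypothetical value chosen only to keep the example small, and all coefficients beyond c[3] (and their mirrors) are assumed to be zero.

```python
import cmath
import math


def mirrored_cepstrum(low, n):
    """Place c[0..K] at the start of a length-n frame and mirror them: c[n - k] = c[k]."""
    c = [0.0] * n
    for k, value in enumerate(low):
        c[k] = value
        if k > 0:
            c[n - k] = value
    return c


def envelope_from_cosines(low, omega):
    """Cosine-series log-magnitude envelope: c[0] + 2 * sum_k c[k] * cos(k * omega)."""
    return low[0] + 2.0 * sum(low[k] * math.cos(k * omega) for k in range(1, len(low)))


def envelope_from_dft(c, bin_index):
    """Real part of the DFT of the cepstrum at one frequency bin."""
    n = len(c)
    return sum(c[t] * cmath.exp(-2j * math.pi * bin_index * t / n) for t in range(n)).real


low = [-5.5, 1.3, -2.4, 0.9]  # Table 3 example row: c[0], c[1], c[2], c[3]
N = 16                        # hypothetical frame length, for illustration only
c = mirrored_cepstrum(low, N)
print("c[N-3], c[N-2], c[N-1] =", c[N - 3], c[N - 2], c[N - 1])
for k in range(N // 2 + 1):
    omega = 2 * math.pi * k / N
    print(k, round(envelope_from_dft(c, k), 6), round(envelope_from_cosines(low, omega), 6))
```

The two columns agree bin by bin, which is the numerical counterpart of the Fourier series expression the question asks for.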

Section B

B1.

(a) What dynamic programming method is typically employed for efficient decoding in automatic speech recognition systems, and how does it differ from that used in training? [10 %]

(b) A tracking system is built for a small delivery enterprise using an ergodic (fully-connected) 3-state HMM whose observations are 2D continuous features formed from noisy visual and GPS tracking data. The three states represent the presence of red ‘R’ (i=1), green ‘G’ (i=2) or blue ‘B’ (i=3) delivery vans in the company’s loading bay respectively.

(i) Draw a state topology diagram for the model, including null states for entry and exit, considering the state-transition matrix A in Table B1.1. [15 %]

(ii) Using the state-transition matrix A in Table B1.1 and the output probability densities b_i(t) in Table B1.2 for the given observations, show that the cumulative likelihoods δ_t(i) at time t=2 are δ_2(1)=0.0049 and δ_2(2)=0.0034. [25 %]
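The cumulative likelihoods in this question follow the usual Viterbi recursion, δ_t(j) = max_i [δ_{t−1}(i) · a_ij] · b_j(t), whereas the forward recursion used within Baum-Welch training replaces the max with a sum over i. The sketch below shows one step of each. Tables B1.1 and B1.2 are not reproduced in this section, so every number below is a hypothetical placeholder and will not reproduce δ_2(1) = 0.0049 or δ_2(2) = 0.0034; the function names are likewise illustrative.

```python
def viterbi_step(delta_prev, A, b_t):
    """One Viterbi step: best predecessor per state, then multiply by the output density."""
    n = len(delta_prev)
    delta, psi = [], []
    for j in range(n):
        best_i = max(range(n), key=lambda i: delta_prev[i] * A[i][j])
        delta.append(delta_prev[best_i] * A[best_i][j] * b_t[j])
        psi.append(best_i)  # back-pointer (0-based state index)
    return delta, psi


def forward_step(alpha_prev, A, b_t):
    """One forward-algorithm step: sum over predecessors instead of taking the max."""
    n = len(alpha_prev)
    return [sum(alpha_prev[i] * A[i][j] for i in range(n)) * b_t[j] for j in range(n)]


# Hypothetical placeholder values, NOT the values from Tables B1.1 and B1.2.
delta_1 = [0.02, 0.10, 0.05]   # likelihoods at t = 1 for states R, G, B
A = [[0.5, 0.3, 0.2],
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]          # A[i][j] = P(state j at t | state i at t-1)
b_2 = [0.2, 0.1, 0.4]          # output densities at t = 2

delta_2, psi_2 = viterbi_step(delta_1, A, b_2)
alpha_2 = forward_step(delta_1, A, b_2)
print("Viterbi delta_2:", delta_2, "back-pointers:", psi_2)
print("Forward alpha_2:", alpha_2)
```

Because a sum of non-negative terms is never smaller than their maximum, each forward value is at least as large as the matching Viterbi value.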