
EEEM030: Speech & Audio Processing & Recognition

FHEQ Level 7 Examination

Semester 1 2021/2

Q1.

Assume the relationship between the produced speech signal s(n) at the lips and the excitation signal u(n) from the vocal cords can be described by the vocal tract filter using the following difference equation:

$$s[n] = 2a\cos(\beta)\, s[n-1] - a^2 s[n-2] + b^N s[n-N] - 2a\, b^N \cos(\beta)\, s[n-N-1] + a^2 b^N s[n-N-2] + u[n] - b\, u[n-1]$$

where n is the discrete time index, and a, b, N, and β (in radians) are the coefficients that determine the shape of the filter. Denote S(z) as the Z-transform of the speech signal s(n) and U(z) as the Z-transform of the excitation signal u(n). Assume the values of the above coefficients and the sampling rate F (in kHz) are given in Table 1.1.

Second rightmost digit of student number (URN)	a	b	N	β	F (kHz)
0	0.4	0.8	4	π/6	10
1	-0.4	0.3	4	4π/5	6
2	-0.3	0.5	4	π/4	12
3	0.7	0.6	4	π/8	9
4	-0.6	0.4	4	π/5	8
5	-0.3	0.5	4	π/4	12
6	0.7	0.6	4	π/8	9
7	-0.4	0.3	4	4π/5	6
8	0.4	0.8	4	π/6	10
9	-0.6	0.4	4	π/5	8

Table 1.1. The values of the parameters specified with respect to your student number (URN). For example, if your URN is “6789012”, the second rightmost digit is “1”, so you will use: a = −0.4, b = 0.3, N = 4, β = 4π/5, and F = 6 kHz.
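As a minimal sketch (not a model answer), the difference equation above can be read off as coefficient arrays in powers of z^-1, here for the example row of Table 1.1 (a = −0.4, b = 0.3, N = 4, β = 4π/5). The helper name `vocal_tract_coeffs` is hypothetical, and the sketch assumes the garbled "(bN)" and "a2" in the equation stand for b^N and a^2. It also checks that the denominator equals the product of a two-pole resonator term and a comb term.

```python
import numpy as np

def vocal_tract_coeffs(a, b, N, beta):
    """Return (num, den) in ascending powers of z^-1 (hypothetical helper)."""
    num = np.array([1.0, -b])                      # u[n] - b u[n-1]
    den = np.zeros(N + 3)                          # s-terms moved to the left-hand side
    den[0] += 1.0                                  # s[n]
    den[1] += -2 * a * np.cos(beta)                # s[n-1]
    den[2] += a ** 2                               # s[n-2]
    den[N] += -(b ** N)                            # s[n-N]
    den[N + 1] += 2 * a * (b ** N) * np.cos(beta)  # s[n-N-1]
    den[N + 2] += -(a ** 2) * (b ** N)             # s[n-N-2]
    return num, den

# Example row from Table 1.1 (second rightmost URN digit = 1)
a, b, N, beta = -0.4, 0.3, 4, 4 * np.pi / 5
num, den = vocal_tract_coeffs(a, b, N, beta)

# Candidate factorisation: (1 - 2a cos(beta) z^-1 + a^2 z^-2) * (1 - b^N z^-N)
resonator = np.array([1.0, -2 * a * np.cos(beta), a ** 2])
comb = np.zeros(N + 1)
comb[0], comb[N] = 1.0, -(b ** N)
factored = np.convolve(resonator, comb)

print(np.allclose(den, factored))
```

Polynomial multiplication is done with `np.convolve`, so the comparison is a numerical check of the factorisation rather than a derivation.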

(a) Derive the transfer function of the vocal tract filter by using Z-transform:

H(z) = S(z)/U(z) [20 %]

(b) Calculate the poles and zeros of this filter. [20 %]
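One way to cross-check a hand calculation of the poles and zeros is to find the polynomial roots numerically. The sketch below assumes the transfer function has numerator 1 − b z^-1 and a denominator that factors into a resonator term and a comb term (an assumption based on the difference equation, with b^N and a^2 as the intended powers), and uses the example row of Table 1.1.

```python
import numpy as np

a, b, N, beta = -0.4, 0.3, 4, 4 * np.pi / 5   # example row, URN digit 1

# Multiply numerator and denominator by z^(N+2) so both become
# polynomials in z with coefficients in descending powers.
resonator = np.array([1.0, -2 * a * np.cos(beta), a ** 2])
comb = np.zeros(N + 1)
comb[0], comb[N] = 1.0, -(b ** N)
den = np.convolve(resonator, comb)            # degree N+2 in z

num = np.zeros(N + 3)
num[0], num[1] = 1.0, -b                      # z^(N+1) * (z - b)

zeros = np.roots(num)                         # includes N+1 zeros at the origin
poles = np.roots(den)

for p in poles:
    print(f"pole: magnitude {abs(p):.4f}, angle {np.angle(p):+.4f} rad")
for z in zeros:
    print(f"zero: magnitude {abs(z):.4f}, angle {np.angle(z):+.4f} rad")
```

Any pole that coincides with a zero cancels in the reduced transfer function, so the printed lists are worth comparing before plotting.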

(d) Estimate the formant frequencies (in Hz) that correspond to the poles displayed in the above pole-zero plot. [15 %]
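A pole at angle ω radians per sample corresponds to a frequency of ωF/(2π) Hz at sampling rate F. The sketch below applies this to the example row of Table 1.1, assuming the poles are a·e^{±jβ} and the N roots of z^N = b^N. The helper `pole_frequencies_hz` is hypothetical, and keeping only poles strictly in the upper half-plane (so ignoring real-axis poles at 0 Hz and at the Nyquist frequency) is a choice made here, not something the question prescribes.

```python
import numpy as np

def pole_frequencies_hz(poles, fs_hz):
    """Map complex poles in the upper half-plane to frequencies in Hz."""
    upper = [p for p in poles if p.imag > 1e-9]   # drop real and lower-half poles
    return sorted(float(np.angle(p)) * fs_hz / (2 * np.pi) for p in upper)

a, b, N, beta, fs_hz = -0.4, 0.3, 4, 4 * np.pi / 5, 6000.0

resonator_poles = [a * np.exp(1j * beta), a * np.exp(-1j * beta)]
comb_poles = [b * np.exp(2j * np.pi * k / N) for k in range(N)]

freqs = pole_frequencies_hz(resonator_poles + comb_poles, fs_hz)
print(freqs)
```

Note that a negative a flips the resonator pole angle by π, which is why the angle is taken from the complex pole itself rather than read directly from β.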

(e) Analyse the bounded-input bounded-output (BIBO) stability of the system. If the values of a and b are both doubled, analyse the stability of the system again. [15 %]
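For a causal linear time-invariant filter, BIBO stability requires every pole to lie strictly inside the unit circle. A minimal numerical check is sketched below, assuming the denominator factors into a resonator term and a comb term and that the two doubled coefficients are a and b. The function name `is_bibo_stable` is hypothetical.

```python
import numpy as np

def is_bibo_stable(a, b, N, beta):
    """True when all poles of the (unreduced) denominator have magnitude < 1."""
    resonator = np.array([1.0, -2 * a * np.cos(beta), a ** 2])
    comb = np.zeros(N + 1)
    comb[0], comb[N] = 1.0, -(b ** N)
    poles = np.roots(np.convolve(resonator, comb))
    # The zero at z = b cancels one comb pole, but for N > 1 the remaining
    # comb poles have the same magnitude |b|, so the verdict is unchanged.
    return bool(np.all(np.abs(poles) < 1.0))

a, b, N, beta = -0.4, 0.3, 4, 4 * np.pi / 5   # example row, URN digit 1
print(is_bibo_stable(a, b, N, beta))          # original coefficients
print(is_bibo_stable(2 * a, 2 * b, N, beta))  # a and b doubled
```

Since the pole magnitudes are |a| and |b|, the same conclusion can be reached by inspection; the code only confirms it.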

(f) Re-estimate the formant frequencies that correspond to the poles when N = 6. [15 %]

Q2.

(a) The Mel-frequency scale was designed as a perceptual scale of pitches judged by listeners to be equal in distance from one another:

$$f_{\text{mel}} = 1127.01048 \ln\left(1 + \frac{f}{700}\right)$$

A test is carried out in which a listener is asked to judge the intervals between three different sinusoidal tones. The tones f1, f2, and f3 are judged to be subjectively equidistant in frequency. Using the values of f1 and f2 in Table 2.1, estimate f3 (in Hz), assuming f3 is the highest of the three tones. [20 %]
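If the three tones are equally spaced on the Mel scale, then mel(f3) − mel(f2) = mel(f2) − mel(f1). The sketch below works this through for the example row of Table 2.1 (f1 = 1030 Hz, f2 = 1780 Hz), assuming the common form of the Mel formula with f/700 inside the logarithm. The function names are hypothetical.

```python
import math

MEL_SCALE = 1127.01048

def hz_to_mel(f_hz):
    """Mel value for a frequency in Hz (assumes the f/700 form)."""
    return MEL_SCALE * math.log(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (math.exp(m / MEL_SCALE) - 1.0)

f1, f2 = 1030.0, 1780.0                    # example row, URN digit 1
m3 = 2 * hz_to_mel(f2) - hz_to_mel(f1)     # equal spacing on the Mel scale
f3 = mel_to_hz(m3)
print(f"f3 = {f3:.1f} Hz")
```

Equivalently, 1 + f3/700 = (1 + f2/700)^2 / (1 + f1/700), which gives a quick closed-form check on the numerical result.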

(b) Humans perceive sounds to arrive from a certain location.

(i) Using about 100 words, explain the roles of the interaural phase difference (IPD) and interaural level difference (ILD) for human sound source localization over a broadband frequency range. [10 %]

(ii) The IPD can be calculated as Φ = 2πfr(θ + sin θ)/c. A sound source emitting a pure tone at frequency f is located at the angle θ from the median plane. Using the values of f and θ in Table 2.1, calculate the IPD for a listener with a head radius of r = 0.085 m, assuming that c = 344 m/s. Is the IPD cue reliable for source localization? [20 %]
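The IPD formula can be evaluated directly once θ is converted from degrees to radians. The sketch below uses the example row of Table 2.1 (f = 1400 Hz, θ = 30 degrees). The function name `ipd_radians` is hypothetical, and treating |Φ| > π as the point where the phase cue becomes ambiguous is a simplifying assumption used here to frame the reliability question.

```python
import math

def ipd_radians(f_hz, theta_deg, r_m=0.085, c_ms=344.0):
    """Interaural phase difference: 2*pi*f * r*(theta + sin(theta)) / c."""
    theta = math.radians(theta_deg)                 # formula needs theta in radians
    itd_s = r_m * (theta + math.sin(theta)) / c_ms  # interaural time difference
    return 2 * math.pi * f_hz * itd_s

phi = ipd_radians(1400.0, 30.0)                     # example row, URN digit 1
wrapped = (phi + math.pi) % (2 * math.pi) - math.pi # phase folded into [-pi, pi)
ambiguous = abs(phi) > math.pi                      # assumption: wrap implies ambiguity

print(f"IPD = {phi:.4f} rad, wrapped = {wrapped:.4f} rad, ambiguous = {ambiguous}")
```

When the unwrapped phase exceeds π, the wrapped value no longer identifies the source direction uniquely, which is the usual argument for why phase cues lose reliability at higher frequencies.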

Second rightmost digit of student number (URN)	f1 (Hz)	f2 (Hz)	f (Hz)	θ (degrees)
0	510	970	3320	40
1	1030	1780	1400	30
2	1340	2230	2660	50
3	1030	1780	1400	30
4	510	970	3320	40
5	1340	2230	2660	50
6	1030	1780	1400	30
7	510	970	3320	40
8	1340	2230	2660	50
9	510	970	3320	40

Table 2.1. The values of the parameters specified with respect to your student number (URN). For example, if your URN is “6789012”, the second rightmost digit is “1”, so you will use: f1 = 1030 Hz, f2 = 1780 Hz, f = 1400 Hz, and θ = 30 degrees.

A set of domain-specific language models is developed as part of a restaurant recommendation service in the USA. Based on thousands of utterances recorded during trials, the raw 1-gram and 2-gram counts have been used to obtain corresponding unigram and bigram models, L0(1) and L0(2), with the probabilities shown in Tables 2.2 and 2.3, respectively.

w	I	want	to	eat	Chinese	food	lunch	spend	<UNK>	<\s>
P(w)	0.0418	0.0153	0.0123	0.0398	0.0026	0.0180	0.0056	0.0046	0	0.1538

Table 2.2. Unigram language model L0(1) for the restaurant recommendation service based on raw counts. Key: <UNK> unknown, <\s> sentence end.

W	I	want	to	eat	Chinese	food	lunch	spend	<UNK>	<\s>
P(w|<s>)	0.2500	0.0153	0.0399	0.0123	0.0026	0.0180	0.0056	0.0046	0	0
P(w|I)	0.0020	0.3265	0	0.0036	0	0	0	0.0008	0	0.0122
P(w|want)	0.0022	0	0.6559	0.0011	0.0065	0.0065	0.0054	0.0011