
520.445/645 Audio Signal Processing

Fall 2023

PROJECT 2

(Due 11/14/23 at 11:59pm)

Part 1:

You are asked to design a communication channel using the LPC compression scheme.

You are given 10 audio samples to evaluate your system. Your system should include:

(a) Compression of the audio signal into as few coefficients as you can so that it can be transmitted over the communication channel. Your transmitter will estimate the prediction parameters a_k and gain G for the speech signal on a frame-by-frame basis (a sketch of one possible analysis appears after part (b) below). If you use a built-in function (in MATLAB or Python), please indicate that in your report and comment it in your code.

(b) Using the parameters estimated in part 1a, together with the voicing information provided, re-synthesize the original sentence; a matching synthesis sketch also appears below. Save your resulting synthesis in a file called sampleX_received.wav. For each of the 10 wavefiles, you are provided with a .txt file that contains an estimate of the fundamental frequency every 10 msec. Note, you do not have to use 10 msec as your frame rate.
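For part 1a, a minimal sketch of one standard approach (the autocorrelation method, with the normal equations solved by a Toeplitz solver) is shown below, assuming NumPy/SciPy. The function name, the Hamming window, and the frame handling are illustrative choices, not a required design:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_analyze(frame, order):
    """Estimate LPC coefficients a_1..a_p and gain G for one frame
    via the autocorrelation method (sketch only)."""
    frame = frame * np.hamming(len(frame))       # taper frame edges
    r = np.correlate(frame, frame, mode='full')  # full autocorrelation
    r = r[len(frame) - 1:len(frame) + order]     # keep lags 0..p
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])  # normal equations R a = r
    g = np.sqrt(r[0] - np.dot(a, r[1:]))         # prediction-error gain
    return a, g
```

With this sign convention the predictor is s[n] ≈ Σ a_k s[n−k], and G is the gain of the excitation that reproduces the frame's prediction-error energy.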
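For part 1b, a matching synthesis sketch: voiced frames are driven by an impulse train at the pitch period, unvoiced frames by white noise, both passed through the all-pole filter G / (1 − Σ a_k z^−k). For brevity this version restarts the impulse phase and the filter memory at every frame, which audibly degrades quality; pointer 2 below discusses carrying both across frames:

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)

def synth_frame(a, g, f0, fs, n):
    """Resynthesize one frame of length n; f0 == 0 marks
    unvoiced/silence. Sketch only: impulse phase and filter
    state are not carried across frames here."""
    if f0 > 0:                                    # voiced: impulse train
        excitation = np.zeros(n)
        excitation[::int(round(fs / f0))] = 1.0
    else:                                         # unvoiced: white noise
        excitation = rng.standard_normal(n)
    # all-pole filter G / (1 - sum_k a_k z^-k)
    return lfilter([g], np.concatenate(([1.0], -a)), excitation)
```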

The ultimate goal of this project is to achieve a compromise between the amount of compression and the quality of the resynthesized speech. Your final deliverables are reconstructed waveforms with the best possible quality, using the fewest coefficients (i.e. filter coefficients, voicing information, pitch estimates, etc.). Your report MUST include an estimate in bits/second or samples/second of how much information is required on average to reconstruct your signal. We will assess signal quality as well as signal compression.

Note: The files X.txt contain estimates of the voiced/unvoiced decision and pitch frequencies at a frame rate of 100 frames/sec. A zero value indicates that the speech frame is unvoiced or silence, and a non-zero value is the pitch frequency estimate (in Hz) when the frame is voiced. Note that these estimates are obtained from a correlation analysis of the signal. They are only approximations, and may not be accurate for every frame (you will examine this fact in Part 2). In Part 1, you can choose to work with the pitch estimates provided or with your own implementation from Part 2.
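One way to read these files, assuming one value per line at 100 frames/sec (the file name is a placeholder; check the actual layout of the provided files):

```python
import numpy as np

f0 = np.loadtxt('sample1.txt')    # 0 = unvoiced/silence, otherwise Hz
t = np.arange(len(f0)) * 0.010    # frame times in seconds (100 frames/sec)
```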

Part 2:

Implement your own pitch detector based on the signal’s autocorrelation, the LPC residual signal, or any other method. Experiment with the pitch detection method as well as with how often you estimate pitch (i.e. the frame rate). Does it improve the quality of your reconstructed signals? Your deliverable is a stand-alone script, which takes as input a sound vector, sampling rate, frame length, and frame overlap, and returns an array of pitch values for each frame, with zero for unvoiced and silence frames. Note that the pitch values given in Part 1 are just estimates and should not be taken as ground truth.
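A minimal autocorrelation-based sketch matching the required interface is shown below; the search range (fmin/fmax) and the voicing threshold are illustrative defaults you should tune, not prescribed values:

```python
import numpy as np

def detect_pitch(x, fs, frame_len, overlap, fmin=60.0, fmax=400.0, vthresh=0.3):
    """Return one pitch value per frame; 0 for unvoiced/silence.
    frame_len and overlap are in samples."""
    hop = frame_len - overlap
    pitches = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len] * np.hamming(frame_len)
        r = np.correlate(frame, frame, mode='full')[frame_len - 1:]
        if r[0] <= 0:                             # silent frame
            pitches.append(0.0)
            continue
        lo = int(fs / fmax)                       # shortest lag searched
        hi = min(int(fs / fmin), frame_len - 1)   # longest lag searched
        lag = lo + np.argmax(r[lo:hi + 1])        # strongest peak in range
        # call the frame voiced only if the peak is a large fraction
        # of the zero-lag energy
        pitches.append(fs / lag if r[lag] / r[0] > vthresh else 0.0)
    return np.array(pitches)
```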

Part 3:

Record your own voice saying the sentence “The synthesized signal is supposed to be of high standard”. Send this signal through the communication channel developed in Part 1 and listen to the synthesized signal. Now, see whether you can optimize the choice of parameters specifically for this sentence to improve compression and quality. Note that this sentence has a large number of fricatives, specifically the phoneme /s/. What do you need to change in your parameters to improve compression/quality? Discuss your choices in your report and save the improved synthesized sentence. In order to estimate the pitch/voicing information for this specific speech sentence, you need to use your pitch estimator from Part 2.

Report

Write a report (max 8 pages) describing how you programmed the LPC Vocoder (e.g., decisions made on frame length, excitation generation, frame overlaps, etc.), along with any graphics/plots. Do not include code in your report.

A few pointers:

1. Work on this project individually.

2. A number of factors may affect the quality of your synthesized speech. For instance, what goes on in a given frame is not independent of what happened in previous frames. As the pitch period changes, you will need to know where the last pitch impulse occurred in the previous frame in order to determine the location of the next impulse in the current frame (see the excitation sketch after this list). You should also examine the benefit of using different glottal pulse shapes for your voiced segments.

3. You can change the vocal tract filter once per frame, or you can interpolate between frames. The voicing information is provided to you every 10 msec, but you should explore the frame rate that works best.

4. Listen to your synthesized speech and see if you can isolate the main sources of distortion.

5. If you wish, you could use an automated quality metric to evaluate the quality of your synthesized signal. Documentation and Matlab code for this metric (PESQ: Perceptual Evaluation of Speech Quality) are included with the project documents. A Python implementation can be found here: https://pypi.org/project/pesq/ (a minimal usage example appears after this list).

You are not required to use this metric. It is up to you to decide how you evaluate improvements in signal quality as you resynthesize your audio.
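Regarding pointer 2, a sketch of carrying the impulse position across frames; the offset bookkeeping is the point, and the names are illustrative:

```python
import numpy as np

def impulse_train(n, period, offset):
    """Voiced excitation for one frame of length n. `offset` is the
    number of samples until the next impulse, carried over from the
    previous frame so the pulse train stays continuous as the pitch
    period changes (period and offset in samples)."""
    e = np.zeros(n)
    pos = offset
    while pos < n:
        e[pos] = 1.0
        pos += period
    return e, pos - n   # offset to pass into the next frame
```

Starting from offset = 0, each voiced frame would then call e, offset = impulse_train(n, int(round(fs / f0)), offset).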
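Regarding pointer 5, a minimal usage example for the pesq package, per its documentation. The file names are placeholders; the reference and degraded signals must be time-aligned and sampled at 8 kHz for narrowband ('nb') or 16 kHz for wideband ('wb') scoring:

```python
from scipy.io import wavfile
from pesq import pesq

fs, ref = wavfile.read('sample1.wav')            # original signal
fs, deg = wavfile.read('sample1_received.wav')   # your resynthesis
print(pesq(fs, ref, deg, 'wb'))                  # wideband MOS-like score
```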