
CS5489 - Assignment 2 - Game Music Tagging

Due date: see Assignment 2 on Canvas

Goal

In this assignment, the task is to annotate or tag a music clip with descriptive (semantic) keywords. This kind of content-based tagging system could be useful to musicians and sound engineers who want to automatically organize their sound library, or search for sound or music by keyword.

Dataset of 80s Game Music

This dataset contains video game music from an 80s-era programmable sound generator based on FM (frequency modulation). The music was for Sega and MSX PC games. Each song is annotated with emotion tags. The data is publicly available - please do not search for it and cheat.

Methodology

Semantic annotation is a multi-label classification problem: each label corresponds to one sound tag, and predicting each tag is a binary classification problem. The labels can co-occur (multiple labels can be assigned to the same sound), which makes it different from multi-class classification (where only one label can be assigned). Sound is a temporal process, so the important thing is how to define the feature space for representing the sound before learning the binary classifiers. You are free to choose appropriate methods (e.g., feature extraction, dimensionality reduction, and clustering methods) to help define a suitable feature space for sound annotation. You are free to use methods that were not introduced in class, as long as you present the details in the report. You can also consider the co-occurrence of the labels to help with the multi-label classification.
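As a minimal sketch of this one-vs-rest framing (assuming scikit-learn; X_train and X_test below are hypothetical fixed-length feature matrices that you would construct from the sound features), each tag gets its own binary classifier:

from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression

# convert the list of tag lists into a binary indicator matrix (one column per tag)
mlb = MultiLabelBinarizer()
Y_train = mlb.fit_transform(train_tags)            # shape: (n_sounds, n_tags)

# one independent binary classifier per tag; logistic regression is only a placeholder
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X_train, Y_train)                          # X_train: hypothetical feature matrix

# per-tag scores (needed for AUC evaluation), not hard 0/1 labels
Y_scores = clf.predict_proba(X_test)               # X_test: hypothetical test features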

Evaluation of Tagging

For evaluation, you will predict the presence/absence of tags for each test sound. The evaluation metric is "Mean column-wise AUC". AUC is the area under the ROC curve, which plots FPR vs TPR. "Mean column-wise" computes the average of the AUCs over the tags. To compute AUC, you will need to predict a score for each label (e.g., decision function value, probability, etc.) rather than the label itself.
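To make the metric concrete, here is a minimal sketch using scikit-learn (Y_true and Y_scores are hypothetical (n_sounds, n_tags) arrays of ground-truth labels and predicted scores):

from sklearn.metrics import roc_auc_score

# AUC per tag (column), then averaged over tags
aucs = [roc_auc_score(Y_true[:, j], Y_scores[:, j]) for j in range(Y_true.shape[1])]
mean_auc = sum(aucs) / len(aucs)

# equivalently, macro-averaging over labels in one call
mean_auc = roc_auc_score(Y_true, Y_scores, average='macro')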

Evaluation on Kaggle

You need to submit your test predictions to Kaggle for evaluation. 50% of the test data will be used to show your ranking on the live leaderboard. After the assignment deadline, the remaining 50% will be used to calculate your final ranking. The entry with the highest final ranking will win a prize! Also, the top-ranked entries will be asked to give a short 5-minute presentation on what they did.

To submit to Kaggle you need to create an account, and use the competition invitation that will be posted on Canvas. You must submit your Kaggle account name to the "Kaggle Username" assignment on Canvas 1 week before the Assignment 2 deadline. This is to prevent students from creating multiple Kaggle accounts to gain an unfair advantage.

Note: You can only submit 2 times per day to Kaggle!
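The exact submission format is specified on the Kaggle competition page; purely as a hedged sketch (the file name, column layout, and tag_names variable below are hypothetical), a score-per-tag CSV could be written like this:

import csv

# hypothetical layout: one row per test sound, one score column per tag
with open('my_submission.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['id'] + list(tag_names))            # tag_names: your tag list
    for info, scores in zip(test_info, Y_scores):
        writer.writerow([info['id']] + list(scores))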

What to hand in

You need to turn in the following things:

1. This ipynb file Assignment2.ipynb with your source code and documentation. You should write about all the various attempts that you make to find a good solution. You may also submit python scripts as source code, but your documentation must be in the ipynb file.

2. Your final csv submission file to Kaggle.

3. The ipynb file Assignment2-Final.ipynb, which contains the code that generates the final submission file that you submit to Kaggle. This code will be used to verify that your Kaggle submission is reproducible.

4. Your Kaggle username (submitted to the "Kaggle Username" assignment on Canvas 1 week before the Assignment 2 deadline).

Files should be uploaded to Assignment 2 on Canvas.

Grading

The marks of the assignment are distributed as follows:

· 45% - Results using various feature representations, dimensionality reduction methods, classification methods, etc.

· 30% - Trying out feature representations (e.g., adding additional features, combining features from different sources) or methods not used in the tutorials.

· 20% - Quality of the written report. More points for insightful observations and analysis.

· 5% - Final ranking on the Kaggle test data (private leaderboard). If a submission cannot be reproduced by the submitted code, it will not receive marks for ranking.

· Late Penalty: 25 marks will be subtracted for each day late.

Note: This is an individual assignment. Every student must turn in their own work!

In [1]:

%matplotlib inline
import matplotlib_inline  # set up output image format
matplotlib_inline.backend_inline.set_matplotlib_formats('svg')
import matplotlib.pyplot as plt
import matplotlib
from numpy import *
from sklearn import *
from scipy import stats
random.seed(100)
import csv
from scipy import io
import pickle
from IPython.display import Audio, display
import os.path

In [2]:

def showAudio(info):
    myfile = 'musicmp3/' + info['fname'] + '.mp3'
    if os.path.exists(myfile):
        display(Audio(myfile))
    else:
        print("*** mp3 file " + myfile + " could not be found ***")

def loadpickle(fname):
    f = open(fname, 'rb')
    out = pickle.load(f)
    f.close()
    return out

Load the Data

The training and test data are stored in various pickle files. Here we assume the data is stored in the musicdata directory. The code below loads the data, including tags and extracted features.

In [3]:

train_tags  = loadpickle('musicdata/train_tags.pickle3')
train_mfccs = loadpickle('musicdata/train_mfccs.pickle3')
train_mels  = loadpickle('musicdata/train_mels.pickle3')
train_info  = loadpickle('musicdata/train_info.pickle3')
test_mfccs  = loadpickle('musicdata/test_mfccs.pickle3')
test_mels   = loadpickle('musicdata/test_mels.pickle3')
test_info   = loadpickle('musicdata/test_info.pickle3')

Here are the things in the dataset:

· train_info - info about each sound in the training set.

· train_mels - the Mel-frequency spectrogram for each sound in the training set. Mel frequency is a logarithmically-transformed frequency with better perceptual distance. More details here (https://towardsdatascience.com/learning-from-audio-the-mel-scale-mel-spectrograms-and-mel-frequency-cepstral-coefficients-f5752b6324a8).

· train_mfccs - MFCCs (Mel-frequency cepstral coefficients) are a dimensionality-reduced version of the Mel-frequency spectrogram. Specifically, the log is applied to the magnitudes, and then a Discrete Cosine Transform is applied at each time step.

· train_tags - the descriptive tags for each sound in the training set.

· test_info - info about each sound in the test set.

· test_mels - the Mel spectrogram for each sound in the test set.

· test_mfccs - the MFCC features for each sound in the test set.

Here is one song from the training set, along with its tags and other info. To play the audio, we assume the mp3s are available in the musicmp3 directory.

In [4]:

ii = 4
showAudio(train_info[ii])
print(train_tags[ii])
print(train_info[ii])

['fluttered', 'calm']
{'id': 'eb7jboiu', 'tags': ['fluttered', 'calm'], 'top_tag': 'fluttered', 'fname': 'eb7jboiu'}

Here is the Mel-frequency spectrogram, which shows the frequency content over time. The spectrogram is stored as a T × B matrix, where T is the temporal length and B is the number of Mel bins (the code below transposes it for display, so time increases to the right). The left plot shows the original Mel spectrogram; the right plot shows the log magnitude, which better visualizes the differences. Here we use B = 128 Mel bins.

In [5]:

print(train_mels[ii].shape)
plt.figure(figsize=(7,3))
plt.subplot(121)
plt.imshow(train_mels[ii].T);
plt.xlabel('time')
plt.ylabel('mel bin')
plt.title('Mel spectrogram')
plt.subplot(122)
plt.imshow(log(train_mels[ii].T))
plt.xlabel('time')
plt.ylabel('mel bin')
plt.title('log Mel spectrogram')
plt.tight_layout()

(143, 128)



MFCCs are a dimensionality-reduced version of the Mel spectrogram. To get the MFCCs, the Discrete Cosine Transform (DCT) is applied to each 128-dim log-Mel bin vector. Here we use a 20-dimensional DCT, so the 128-dim vector is converted to 20-dim at each time step. The left plot shows the MFCCs as an image, while the right plots the individual dimensions over time.

In [6]:

print(train_mfccs[ii].shape)
plt.figure(figsize=(8,3))
plt.subplot(121)
plt.imshow(train_mfccs[ii].T)
plt.xlabel('time')
plt.ylabel('mfcc bin')
plt.subplot(122)
plt.plot(train_mfccs[ii])
plt.xlabel('time')
plt.ylabel('mfcc value')
plt.tight_layout()

(143, 20)



Data Pre-processing - Delta MFCCs

The first thing you might notice is that the MFCC vectors form a time series. One trick to include time-series information in a vector representation is to append the difference between two consecutive feature vectors. This way, we can include some relationship between two time steps in the representation.

In [7]:

# compute delta MFCCs
def compute_delta_mfccs(mfccs):
    dmfccs = []
    for m in mfccs:
        tmp = m[1:] - m[0:-1]        # difference between consecutive frames
        dm = hstack((m[0:-1], tmp))  # append the delta to the original features
        dmfccs.append(dm)
    return dmfccs

In [8]:

train_dmfccs = compute_delta_mfccs(train_mfccs)
test_dmfccs  = compute_delta_mfccs(test_mfccs)
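Since the (delta-)MFCC sequences have different lengths, most classifiers need fixed-length inputs. One common choice (an assumption, not something prescribed by the assignment) is to pool statistics over time, e.g.:

# summarize each variable-length (T, D) sequence as a fixed-length vector by
# concatenating the per-dimension mean and standard deviation over time
def pool_features(seqs):
    return vstack([hstack((m.mean(axis=0), m.std(axis=0))) for m in seqs])

X_train = pool_features(train_dmfccs)      # shape: (n_train, 2*D)
X_test  = pool_features(test_dmfccs)       # shape: (n_test, 2*D)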