DEPARTMENT OF AUTOMATIC CONTROL & SYSTEMS ENGINEERING
Autumn Semester 2020‒21
ACS6427 DATA MODELLING AND MACHINE INTELLIGENCE
1. a) Show that the least squares estimate for $a_0$ in the simple linear regression problem
$y = a_0 + \varepsilon$
from n observations of the response y, i.e., $y_1, \ldots, y_n$, is the average of these observations.
Here $\varepsilon$ is a zero-mean modelling error.
[5 marks]
b) Consider a linear regression problem where only one predictor $x_1$ is involved and the
relationship between the predictor and response y is
$y = c x_1 + \xi$
in which $\xi$ is a NONZERO mean modelling error.
A set of 5 observations of $x_1$ and y are in the table below.
i         1     2     3     4     5
y_i       3.08  4.09  5.01  6.09  7.06
x_{1i}    2     3     4     5     6
i) Find the least squares estimate of parameter c from the observed predictor and
response data.
[8 marks]
ii) Determine the Total Sum of Squares (TSS), Residual Sum of Squares (RSS),
Explained Sum of Squares (ESS) of the estimated linear regression model,
respectively.
[3 marks]
iii) Find the $R^2$ statistic of the estimated linear regression model and assess the
model accuracy using the $R^2$ statistic.
[4 marks]
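As a rough numerical cross-check for part b), the sketch below fits the tabulated data by ordinary least squares. It assumes that the nonzero mean of ξ is handled by adding an intercept term (called `a0` here, a name that does not appear in the paper), so that c is estimated from mean-centred data; other readings of the question are possible. All function and variable names are illustrative.

```python
# Observations from the table in part b).
x = [2, 3, 4, 5, 6]
y = [3.08, 4.09, 5.01, 6.09, 7.06]


def fit_line(x, y):
    """Least squares fit of y = c*x + a0 (a0 absorbs the nonzero error mean)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    c = s_xy / s_xx
    a0 = y_bar - c * x_bar
    return c, a0


def sums_of_squares(x, y, c, a0):
    """Return (TSS, RSS, ESS) for the fitted line."""
    y_bar = sum(y) / len(y)
    y_hat = [c * xi + a0 for xi in x]
    tss = sum((yi - y_bar) ** 2 for yi in y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ess = sum((fi - y_bar) ** 2 for fi in y_hat)
    return tss, rss, ess


c, a0 = fit_line(x, y)
tss, rss, ess = sums_of_squares(x, y, c, a0)
r2 = 1 - rss / tss

print(f"c = {c:.4f}, a0 = {a0:.4f}")
print(f"TSS = {tss:.5f}, RSS = {rss:.5f}, ESS = {ess:.5f}, R^2 = {r2:.5f}")
```

With an intercept in the model, TSS = ESS + RSS should hold up to rounding, which gives a quick consistency check on the three sums in part ii).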
2. a) A logistic function-based two class classifier has been determined as
$\hat{y} = \dfrac{1}{1 + e^{-(0.1 + 0.2x)}}$
i) Find the probability for classification result y = "1" estimated from this logistic
classifier when x = ‒3, ‒2, 0, 7, and 10, respectively.
[2 marks]
ii) Assume the true response y is as shown in the following table when x = ‒3, ‒2, 0,
7, and 10, respectively, and that the threshold T for the logistic classifier is T = 0.5.
x    ‒3   ‒2   0    7    10
y    0    0    1    0    1
Find the sensitivity, specificity, false negative rate, and false positive rate of the
classifier, respectively.
[4 marks]
iii) Show a sketch of the ROC curve of the classifier using the sensitivity and specificity
when T is chosen as T = 1, T = 0.5, and T = 0, respectively, and explain why the
AUC of the ROC curve is often needed to assess the performance of a classifier.
[4 marks]
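The quantities asked for in part a) can be checked numerically with a short sketch such as the one below. It assumes the classifier has the form ŷ = 1 / (1 + exp(−(0.1 + 0.2x))) and that an observation is labelled 1 when its estimated probability is greater than or equal to the threshold T; the tie-breaking rule and all function names are assumptions made for illustration.

```python
import math


def predict_proba(x):
    """Estimated probability of class 1, assuming y_hat = 1/(1 + exp(-(0.1 + 0.2x)))."""
    return 1.0 / (1.0 + math.exp(-(0.1 + 0.2 * x)))


def rates(xs, ys, threshold):
    """Return (sensitivity, specificity, FNR, FPR) for the rule p >= threshold -> 1."""
    preds = [1 if predict_proba(x) >= threshold else 0 for x in xs]
    tp = sum(1 for p, t in zip(preds, ys) if p == 1 and t == 1)
    fn = sum(1 for p, t in zip(preds, ys) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(preds, ys) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(preds, ys) if p == 1 and t == 0)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, 1 - sensitivity, 1 - specificity


xs = [-3, -2, 0, 7, 10]  # inputs from the table
ys = [0, 0, 1, 0, 1]     # true responses from the table

for x in xs:
    print(f"x = {x:>3}: P(y = 1) = {predict_proba(x):.4f}")

# Each threshold gives one (FPR, sensitivity) point on the ROC curve.
for T in (1.0, 0.5, 0.0):
    sens, spec, fnr, fpr = rates(xs, ys, T)
    print(f"T = {T}: sensitivity = {sens:.3f}, specificity = {spec:.3f}, "
          f"FNR = {fnr:.3f}, FPR = {fpr:.3f}")
```

The three thresholds give the two end points of the ROC curve plus one interior point, which is all that is needed for the sketch requested in part iii).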
b) A set of 10 observations of predictors $x_i = (x_{i1}, x_{i2})$, i = 1, …, 10, are collected and shown
in the following table. The same observations are plotted in Figure 2.1 (overleaf).
x_i      x1   x2   x3   x4   x5   x6   x7   x8   x9   x10
x_{i1}   1    1.5  2    2.5  2.5  2.7  3    4    5    6
x_{i2}   2    0.8  2.5  1    4    2    3.5  2.4  3.6  2
[Figure 2.1: plot of the 10 observations]
In order to apply K-means clustering with K = 2 to these observations, at the first step, two
centroids are randomly selected as (2, 1.5) and (4, 3.5), respectively.
(i) Apply the principle in the second step of K-means clustering to cluster the 10
observations into two subgroups C1 and C2 and show which of the 10 observations
are in C1 and which in C2.
[4 marks]
(ii) Find the centroids of C1 and C2 determined in part (i).
[2 marks]
(iii) Find the updated subgroups C1* and C2* using the centroids determined in part (ii)
and show which of the 10 observations are in C1* and which in C2*.
[4 marks]
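The assignment and update steps used in parts (i) to (iii) can be sketched as follows. This is a minimal illustration assuming Euclidean distance and the two initial centroids given in the question; the helper names are hypothetical and the code does not guard against an empty cluster.

```python
# The 10 observations (x_i1, x_i2) from the table, in order x1..x10.
points = [(1, 2), (1.5, 0.8), (2, 2.5), (2.5, 1), (2.5, 4),
          (2.7, 2), (3, 3.5), (4, 2.4), (5, 3.6), (6, 2)]


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def assign(points, centroids):
    """Assignment step: put each observation (1-based index) with its nearest centroid."""
    clusters = [[] for _ in centroids]
    for i, p in enumerate(points, start=1):
        nearest = min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
        clusters[nearest].append(i)
    return clusters


def update(points, clusters):
    """Update step: each centroid becomes the mean of its members (assumed non-empty)."""
    centroids = []
    for members in clusters:
        xs = [points[i - 1][0] for i in members]
        ys = [points[i - 1][1] for i in members]
        centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return centroids


initial = [(2, 1.5), (4, 3.5)]
c1, c2 = assign(points, initial)                 # part (i)
new_centroids = update(points, [c1, c2])         # part (ii)
c1_star, c2_star = assign(points, new_centroids) # part (iii)

print("C1 :", c1, " C2 :", c2)
print("centroids:", new_centroids)
print("C1*:", c1_star, " C2*:", c2_star)
```

If the subgroups do not change between two successive assignment steps, the algorithm has converged for this initialisation.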
3. a) As part of a project, you are performing a text mining experiment, looking for the term
“Karhunen–Loève”. This has involved investigating 5 different data sources, {,,,,}, of which the term appears only in document {}. Based on all documents
the TFIDF is 2.796. What is the term frequency for Document {}?
[2 marks]
b) In black-box modelling we rely on the data to help produce the models we will build.
What elements of this data must we consider as we begin to build our model? In this
context explain the requirement for cross-validation. Provide pseudo-code to outline the
process of k-fold cross validation, explaining each step and showing how the process
would change for different values of k.
[6 marks]
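One possible shape for the k-fold procedure asked about in part b) is sketched below in Python rather than pseudo-code. It is an illustration under stated assumptions, not a model answer: the helper names (`k_fold_splits`, `cross_validate`, `fit_mean`, `squared_error`), the fixed shuffle seed and the toy data are all hypothetical.

```python
import random


def k_fold_splits(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs so that each index is tested exactly once."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # shuffle to avoid ordering effects
    # Fold f takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        test_idx = folds[f]
        train_idx = [i for g in range(k) if g != f for i in folds[g]]
        yield train_idx, test_idx


def cross_validate(xs, ys, k, fit, loss):
    """Average held-out loss over k folds; `fit` returns a callable model."""
    fold_losses = []
    for train_idx, test_idx in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        errors = [loss(model(xs[i]), ys[i]) for i in test_idx]
        fold_losses.append(sum(errors) / len(errors))
    return sum(fold_losses) / k


def fit_mean(train_x, train_y):
    """Toy model: always predict the mean of the training responses."""
    mean = sum(train_y) / len(train_y)
    return lambda x: mean


def squared_error(prediction, target):
    return (prediction - target) ** 2


xs = list(range(10))
ys = [2 * x + 1 for x in xs]
for k in (2, 5, 10):  # k = n (here 10) is leave-one-out cross-validation
    print(k, cross_validate(xs, ys, k, fit_mean, squared_error))
```

Small k leaves less data for training in each round but needs fewer model fits; k equal to the number of observations trains on all but one point each time at the cost of n fits.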
c) Data modelling and machine learning algorithms are often deployed in complex and
challenging scenarios. The trained algorithms will often perform poorly, even though this
may not have been intended at the design stage. As part of a newly developed team
working for a small technology-focussed company, you have been asked to develop an
algorithm to identify and predict good candidates for interview from their submitted
Curriculum Vitae (CV). Discuss how you might tackle this problem in order to ensure that
the system you develop is robust to presentation of a variety of candidates.
[12 marks]
4. a) A dataset has been provided for you to analyse. You are concerned that the dataset may
not have been presented optimally and wish to investigate this further. The dataset has a
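Covariance Matrix,
$\begin{bmatrix} 11 & -5 & 2 \\ -5 & 9 & -3 \\ 2 & -3 & 8 \end{bmatrix}$
which produces an eigenvector matrix,
$\begin{bmatrix} -0.5384 & 0.6934 & 0.4789 \\ -0.0912 & -0.6129 & 0.7849 \\ 0.8378 & 0.3789 & 0.3932 \end{bmatrix}$
and set of eigenvalues,
$\begin{bmatrix} 7.0412 \\ 16.5121 \\ 4.4468 \end{bmatrix}$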
(i) Discuss the application of Principal Component Analysis (PCA) to this dataset, and
explain what the application of PCA would achieve. Explain the geometrical
relationship between the principal components.
[3 marks]
(ii) Which direction vector listed for this dataset gives the first principal component of the
data? Discuss why this is the case. How much variance is contained within each
principal component?
[5 marks]
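The link between eigenvalues and explained variance in part a) can be illustrated with a few lines of arithmetic. The sketch below assumes that the eigenvectors are stored as the columns of the eigenvector matrix, in the same order as the listed eigenvalues; that ordering convention and the variable names are assumptions, and the final check only confirms that the chosen column behaves like an eigenvector up to the rounding of the printed values.

```python
cov = [[11, -5, 2],
       [-5, 9, -3],
       [2, -3, 8]]
eigvecs = [[-0.5384, 0.6934, 0.4789],
           [-0.0912, -0.6129, 0.7849],
           [0.8378, 0.3789, 0.3932]]
eigvals = [7.0412, 16.5121, 4.4468]

# The total variance equals the sum of the eigenvalues (and the trace of cov).
total = sum(eigvals)
ratios = [lam / total for lam in eigvals]

# The first principal component is the direction with the largest eigenvalue.
first = max(range(len(eigvals)), key=lambda j: eigvals[j])
pc1 = [row[first] for row in eigvecs]  # column `first` of the eigenvector matrix

# Sanity check: cov @ pc1 should be close to eigvals[first] * pc1.
residual = max(
    abs(sum(cov[r][c] * pc1[c] for c in range(3)) - eigvals[first] * pc1[r])
    for r in range(3)
)

print("variance ratios:", [round(r, 4) for r in ratios])
print("first PC is column", first + 1, "->", pc1)
print("max eigen-equation residual:", residual)
```

The trace of the covariance matrix (11 + 9 + 8 = 28) agrees with the sum of the listed eigenvalues to within rounding, which is a useful check on the values before computing the proportions.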
b) As part of training a two input ({1, 2}) logistic classifier, you believe that the performance
is not fit for purpose.
(i) What steps would you take to incorporate non-linearity into the decision boundary
for the model? Show how a cubic function might be implemented.
[3 marks]
(ii) Describe the issues that may arise through the implementation of an arbitrary-shaped
decision boundary.
[2 marks]
c) A set of data relating pressure and flow rate in a mains water system has been provided
in Table 4-1. From basic theoretical considerations you have determined that the two
variables are linked by a non-linear model of the form = 12.
Table 4-1: Data for Q4(c)
Flow Rate, F (m^3/s)   0.436   0.586   0.614   0.659   0.764   0.9467  0.9854  1.07
Pressure, P (Pa)       75842   117211  137895  172369  275790  298705  356210  379212
(i) The model being considered for the problem is non-linear. Show that a linearisation
can take place, and provide the new variables for the linear model.
[2 marks]
(ii) Solve for optimal values of weights within this linearised model, and thus provide
values for model parameters that are to be estimated: 1 and 2.
[5 marks]
