MN3515 Business Data Analytics

70% Assignment 2 2022/23

You will use R to mine actual data for a problem of interest. These could be data from a problem in your current job if you have one, something of interest to the School of Management or College, data acquired from the web, etc. (there are suggestions as to places where you can find relevant data on the electronic reading list for this course). You will design the data mining task, mine the data, and describe your results. You will also research existing solutions to the problem, if any have been proposed or documented. Your own data and results need not be on a par with actual industry results; the goal is for you to get as realistic a hands-on experience as possible, given the constraints of what you have learned.

In writing up/presenting your research, think of yourself as an analyst employed by or retained by a company (large or small) or by a funding source (e.g., a venture capital (VC) firm or incubator), who wants to understand the state of the art for using data mining for the task in question. Review what has been done to date on your problem. Consider as an example predictive analytics for on-line advertising: a VC firm considering funding on-line ad networks or ad-tech start-ups would need to understand the state of the art in using data mining for targeting on-line advertising when considering an idea for applying data mining. Don’t worry too much about coming up with a novel idea. It is more important to develop the idea well (within the scope of what we’ve discussed in class).

You should use the CRISP-DM data mining process to structure your research and report. Keep in mind that it may be ineffective simply to proceed linearly through the steps, and this may need to be reflected in your analysis. You should interact with me from the preparation of your initial ideas through to the preparation of your report, as a consultant would interact with a firm or funding source in preparing a research report. Use your imagination, prior experience, or ask for help to fill in any gaps between the material available and what you would be able to find out if you actually could interact with the client firm.

This assignment will have a phased submission of work, as follows:

Submission 1: On Tuesday 28 February 2023, you will submit a proposal for your project via Moodle.  The bare minimum you may submit is:

•    Your selected data file or a URL which will take me straight to the right webpage to find it

•    A list of the analytical methods you are going to use to analyse your data file

•    The name of the dependent variable which you are trying to predict

If you wish you can give me more details and/or questions so that I can give you brief feedback as to whether your proposal is viable. You might include in your proposal your ideas about: What is the exact business problem? What precisely is the data mining problem?  Is it supervised or unsupervised?  What data will you be using and where will you obtain it? What is a data instance?  What might be the target variable?  What features/variables would be useful?  How exactly would it add business value? And so on.

Submission 2: On Monday 24 April 2023 you will submit your final report, which should be about 1500 words, plus any appendices you would like to include. Use external sources where appropriate and provide clear citations and a bibliography. You must also submit your data file and a working R script which I can run against it.

You will get the most out of the project if you interact with me during the development of your ideas. Please feel free to talk to me about your ideas as often as you’d like, either in workshops in the second half of term or in my online feedback and guidance hours. Or email me with specific questions/problems you are having; please include your complete R script file and data file (or a link to it) so that I can answer your question easily.

While we often learn coding by copying and editing code written by others, there is a limit to how much copying you can do for this assignment. You may copy code from any of the workshop notes. You may also copy code snippets (a line or two at a time) from elsewhere so long as you have to edit them in some way to refer to your dataset. You should not copy code from a source which is working with the same dataset that you are using. I will regard this as plagiarism, and you will then have to take the consequences.

Your report should include the information detailed below, in approximately the order given. Be as precise/specific as you can.

Business Understanding (take this seriously)

•    Identify, define, and motivate the business problem that you are addressing.

•    How (precisely) will a data mining solution address the business problem?

(NB: I’d like to see a good definition/motivation of the business problem and a precise statement of how a data mining solution will address the problem. It’s not so important that the hands-on results match perfectly. It’s more important that you have the experience of working through a realistic problem definition.)

Data Understanding

•    Identify and describe the data (and data sources) that will support data mining to address the business problem.  Include those aspects of the data that we talk about in class and/or in the quizzes. This should include some exploration of the data, such as:

o Summary statistics

o Visualisation using graphs/charts
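
As an illustration only, the kind of exploration listed above might look like the following minimal sketch in base R. It assumes the built-in mtcars data frame as a stand-in for your own data; the variable names (data, stats, missing_per_column) are hypothetical and not prescribed by this assignment.

```r
# Stand-in data: the built-in mtcars data frame.
# In your own project you would load your chosen data file instead.
data <- mtcars

# Structure of the data: one row per data instance, one column per variable
str(data)

# Summary statistics (min, quartiles, mean, max) for every variable
stats <- summary(data)
print(stats)

# Number of missing values in each column
missing_per_column <- colSums(is.na(data))
print(missing_per_column)

# Simple visualisations using base R graphics
hist(data$mpg,
     main = "Distribution of mpg",
     xlab = "Miles per gallon")
boxplot(mpg ~ cyl, data = data,
        main = "mpg by number of cylinders",
        xlab = "Cylinders", ylab = "Miles per gallon")
```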

Data Preparation

•    Specify how these data are integrated and prepared to produce the format required for data mining.

(NB: data preparation can be time consuming. Get started early. Talk to me if you need advice.)
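
A minimal sketch of some common preparation steps is given below, assuming the built-in airquality data frame as a stand-in and Ozone as a hypothetical target variable. The choices shown (dropping rows with a missing target, median imputation, a 70/30 split) are examples of what you might do, not requirements.

```r
# Stand-in data: built-in airquality (Ozone and Solar.R contain missing values)
raw <- airquality

# Drop rows where the (hypothetical) target variable is missing
prepared <- raw[!is.na(raw$Ozone), ]

# Impute missing values of a feature with its median
solar_median <- median(prepared$Solar.R, na.rm = TRUE)
prepared$Solar.R[is.na(prepared$Solar.R)] <- solar_median

# Treat Month as a categorical variable rather than a number
prepared$Month <- factor(prepared$Month)

# Split into training (70%) and test (30%) sets; the seed makes it repeatable
set.seed(42)
train_idx <- sample(nrow(prepared), size = floor(0.7 * nrow(prepared)))
train <- prepared[train_idx, ]
test  <- prepared[-train_idx, ]
```

Whatever steps you take, describe them and justify them in your report so that I can reproduce them from your R script.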

Modelling

•    Specify the type of model(s) built and/or patterns mined.

•    Discuss choices for data mining algorithm: what are alternatives, and what are the pros and cons? How did you evaluate each of the models you used? Which is the best-performing model?

•    Discuss why and how this model should “solve” the business problem (i.e., improve along some dimension of interest to the firm).
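
One way to compare candidate models is to fit each on the same training set and score each on the same held-out test set. The sketch below assumes the built-in mtcars data, mpg as a hypothetical dependent variable, two linear regression models, and RMSE as the evaluation measure; your own models and measure will depend on your problem.

```r
# Stand-in data and a repeatable train/test split
set.seed(1)
data <- mtcars
train_idx <- sample(nrow(data), size = 22)
train <- data[train_idx, ]
test  <- data[-train_idx, ]

# Root mean squared error: lower is better
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Two candidate models fitted on the training data only
model_simple <- lm(mpg ~ wt, data = train)
model_richer <- lm(mpg ~ wt + hp + qsec, data = train)

# Evaluate both on the held-out test data
results <- data.frame(
  model = c("mpg ~ wt", "mpg ~ wt + hp + qsec"),
  test_rmse = c(
    rmse(test$mpg, predict(model_simple, newdata = test)),
    rmse(test$mpg, predict(model_richer, newdata = test))
  )
)
print(results)

# The best-performing model on this measure
best <- results$model[which.min(results$test_rmse)]
print(best)
```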

Evaluation

•    Discuss how the result of the data mining is/should be evaluated. How should a business case be developed to project expected improvement? ROI? If this is impossible/very difficult, explain why and identify any viable alternatives.
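
As a rough illustration of how a business case might be projected, the sketch below combines a confusion matrix with a benefit/cost matrix to get an expected value per customer, compares it with a baseline of targeting everyone, and derives an ROI. Every number in it (the counts, the benefit of 50, the cost of 25, the customer base and the project cost) is invented for the example.

```r
# Hypothetical confusion matrix from a test set (counts are invented)
confusion <- matrix(c(40, 10,
                      20, 130),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(predicted = c("yes", "no"),
                                    actual    = c("yes", "no")))

# Hypothetical value of each outcome: targeting a responder earns 50,
# targeting a non-responder costs 25, not targeting earns or costs nothing
value <- matrix(c(50, -25,
                   0,   0),
                nrow = 2, byrow = TRUE,
                dimnames = dimnames(confusion))

# Expected value per customer when acting on the model's predictions
model_value <- sum(confusion * value) / sum(confusion)

# Baseline: target every customer regardless of the model
actual_totals  <- colSums(confusion)
baseline_value <- sum(actual_totals * value["yes", ]) / sum(confusion)

# Projected gain and ROI for an assumed customer base and project cost
n_customers  <- 10000
project_cost <- 20000
total_gain   <- (model_value - baseline_value) * n_customers
roi          <- (total_gain - project_cost) / project_cost

print(c(model = model_value, baseline = baseline_value, roi = roi))
```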

Deployment

•    Discuss how the result of the data mining will be deployed.

•    Discuss any issues the firm should be aware of regarding deployment.

•    Are there important ethical considerations?

•    Identify the risks associated with your proposed plan and how you would mitigate them.

MARKING CRITERIA

The submitted and assessed part of this coursework is a report together with R code and data files, rather than an academic essay.  Thus, the marking criteria are different from those usually required for an academic essay.  Your assignment will be assessed on the criteria shown in the rubric on the next page:

•    The percentage given in the leftmost cell of each row shows you the percentage of the final mark available for that criterion

•    The % shown in the topmost cell of each column shows you the range of final marks you would achieve if you were awarded marks in this column for all criteria

•    Your feedback will include a mark for each criterion enabling you to see exactly where you gained/lost marks