
Module code and Title: DTS303TC Big Data Security and Analytics

School Title: School of AI and Advanced Computing

Assignment Title: Assessment 2 – Project

Submission Deadline: Wednesday, November 1st, 2023, 23:59 (China Time, GMT + 8)

Final Word Count: N/A

DTS303TC Big Data Security and Analytics

Coursework 2 – Project

Submission deadline: 23:59, November 1st, 2023

Percentage in final mark: 60%

Learning outcomes assessed: C, D

Individual/Group: Individual

Length: Individual report of 2000 words (+/- 10%), plus an application with source code and a recorded individual presentation (not more than 5 minutes). Your report must not be longer than 15 pages. The assessment has a total of 100 marks (20 marks for Part I and 80 marks for Part II).

Late policy: 5% of the total marks available for the assessment shall be deducted from the assessment mark for each working day after the submission date, up to a maximum of five working days.

Risks:

•  Please read the coursework instructions and requirements carefully. Not following these instructions and requirements may result in loss of marks.

•  The formal procedure for submitting coursework at XJTLU is strictly followed. A submission link on Learning Mall will be provided in due course, and the submission timestamp on Learning Mall will be used to check for late submission.

PART I: Data Cryptography and Access Control (20%)

Cryptography includes a set of techniques for scrambling or disguising data so that it is available only to someone who can restore the data to its original form. In current computer systems, cryptography provides a strong, economical basis for keeping data secret and for verifying data integrity. Please answer the following questions:
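As a small illustration of the integrity side of this paragraph, the sketch below uses Python's standard hmac and hashlib modules to attach an HMAC-SHA256 tag to a message and to detect tampering. It is a sketch under assumptions: the key and messages are made up, the function names are hypothetical, and it demonstrates integrity only. Keeping data secret would additionally require encryption, which is not shown.

```python
import hashlib
import hmac

def make_tag(key: bytes, message: bytes) -> str:
    """Compute an HMAC-SHA256 tag (hex string) over the message with a shared secret key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify_tag(key: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag and compare in constant time; False means the message or key differs."""
    return hmac.compare_digest(make_tag(key, message), tag)

if __name__ == "__main__":
    key = b"hypothetical-shared-key"        # made-up key, for illustration only
    message = b"grant access to user 1024"  # made-up message
    tag = make_tag(key, message)
    print(verify_tag(key, message, tag))                       # True
    print(verify_tag(key, b"grant access to user 9999", tag))  # False (tampered)
```

Only a party holding the same key can produce a matching tag, so any change to the message in transit causes verification to fail.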

Question 1: (5 marks)

Perform some research and discuss the cryptosystems and encryption schemes used to secure the following applications.

(i) Privacy Enhanced Mail (PEM)

(ii) Secure Electronic Transactions (SET)

(iii) Secure Sockets Layer (SSL)

Note: Each answer only requires one or two sentences.

Question 2: (5 marks)

Perform some research and discuss the following criteria used to evaluate biometric data in access control systems.

(i) False reject rate

(ii) False accept rate

(iii) Crossover error rate

Note: Each answer only requires one or two sentences.
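The three criteria are linked through the decision threshold. A minimal sketch, assuming that higher scores mean a closer match and that an attempt is accepted when its score is at or above the threshold: the false accept rate (FAR) is the fraction of impostor attempts accepted, the false reject rate (FRR) is the fraction of genuine attempts rejected, and the crossover error rate is approximated here by the observed threshold at which the two are closest. The score lists are invented, the function names are hypothetical, and only the standard library is used.

```python
def far_frr(genuine: list[float], impostor: list[float], threshold: float) -> tuple[float, float]:
    """Return (FAR, FRR) at one threshold; a score >= threshold is treated as 'accept'."""
    false_accepts = sum(1 for s in impostor if s >= threshold)  # impostors wrongly let in
    false_rejects = sum(1 for s in genuine if s < threshold)    # genuine users wrongly refused
    return false_accepts / len(impostor), false_rejects / len(genuine)

def crossover_error_rate(genuine: list[float], impostor: list[float]) -> tuple[float, float, float]:
    """Sweep every observed score as a candidate threshold and return (threshold, FAR, FRR)
    where |FAR - FRR| is smallest, an approximation of the crossover (equal) error rate."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, far, frr)
    return best[1], best[2], best[3]

if __name__ == "__main__":
    # Invented similarity scores, for illustration only.
    genuine = [0.91, 0.85, 0.78, 0.66, 0.95, 0.58]
    impostor = [0.12, 0.35, 0.41, 0.62, 0.22, 0.70]
    t, far, frr = crossover_error_rate(genuine, impostor)
    print(f"threshold={t}, FAR={far:.3f}, FRR={frr:.3f}")
```

Raising the threshold lowers FAR but raises FRR, which is why a single crossover point is a convenient summary for comparing systems.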

Question 3: (5 marks)

Decipher the following ciphertext, which was encrypted with the Caesar cipher.

TEBKFKQEBZLROPBLCERJXKBSBKQP

What is the most likely plaintext? Show your reasoning on how you arrive at the answer.
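One way to show the reasoning is exhaustive key search: the Caesar cipher has only 26 possible shifts, so every candidate plaintext can be listed and inspected for readable English (ranking candidates by English letter frequency is a common refinement). The sketch below assumes an uppercase A-Z alphabet and leaves other characters unchanged; the function names are hypothetical, and it is checked on the textbook HELLO/KHOOR example rather than on the ciphertext above.

```python
import string

ALPHABET = string.ascii_uppercase  # assumes the 26-letter English alphabet

def caesar_shift(text: str, shift: int) -> str:
    """Shift each letter A-Z by `shift` positions (mod 26); other characters are unchanged."""
    result = []
    for ch in text.upper():
        if ch in ALPHABET:
            result.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            result.append(ch)
    return "".join(result)

def all_candidates(ciphertext: str) -> list[tuple[int, str]]:
    """Exhaustive key search: decrypt with every key k = 0..25 by shifting back by k."""
    return [(k, caesar_shift(ciphertext, -k)) for k in range(26)]

if __name__ == "__main__":
    # Print every candidate; the key whose output reads as English is the likely one.
    for k, candidate in all_candidates("KHOOR"):
        print(f"key={k:2d}  {candidate}")
```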

Question 4: (5 marks)

Decipher the following ciphertext, which was encrypted with the Vigenère cipher.

TSMVM MPPCW CZUGX HPECP RFAUE IOBQW PPIMS FXIPC TSQPK SZNUL

OPACR DDPKT SLVFW ELTKR GHIZS FNIDF ARMUE NOSKR GDIPH WSGVL

EDMCM SMWKP IYOJS TLVFA HPBJI RAQIW HLDGA IYOUX

What is the key and the most likely plaintext? Show your reasoning on how you arrive at the answer.
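A common line of reasoning is to estimate the key length first, for example with the index of coincidence (IoC). When the letters are split into as many columns as the key is long, each column is a Caesar cipher, so its IoC should approach the English value of roughly 0.066 rather than the roughly 0.038 of uniformly random text. Each column can then be solved like a Caesar cipher, and the recovered key is used to decrypt. The sketch below covers the IoC helpers and decryption with a known key; the function names are hypothetical, and it is checked on the well-known LEMON/ATTACKATDAWN example rather than on the ciphertext above.

```python
import string

ALPHABET = string.ascii_uppercase  # assumes the 26-letter English alphabet

def only_letters(text: str) -> str:
    """Uppercase and drop spaces/punctuation, e.g. the five-letter grouping of a ciphertext."""
    return "".join(ch for ch in text.upper() if ch in ALPHABET)

def vigenere_decrypt(ciphertext: str, key: str) -> str:
    """p_i = (c_i - k_(i mod m)) mod 26, where m is the key length."""
    letters, key = only_letters(ciphertext), only_letters(key)
    return "".join(
        ALPHABET[(ALPHABET.index(c) - ALPHABET.index(key[i % len(key)])) % 26]
        for i, c in enumerate(letters)
    )

def index_of_coincidence(text: str) -> float:
    """Probability that two letters drawn (without replacement) from text are identical."""
    n = len(text)
    if n < 2:
        return 0.0
    return sum(text.count(ch) * (text.count(ch) - 1) for ch in ALPHABET) / (n * (n - 1))

def average_ioc(ciphertext: str, key_length: int) -> float:
    """Split the letters into key_length columns (each a Caesar cipher) and average their IoC."""
    letters = only_letters(ciphertext)
    columns = [letters[i::key_length] for i in range(key_length)]
    return sum(index_of_coincidence(col) for col in columns) / key_length

def rank_key_lengths(ciphertext: str, max_length: int = 10) -> list[int]:
    """Candidate key lengths ordered by average IoC, highest first.
    Multiples of the true length also score high, so prefer the smallest strong candidate."""
    return sorted(range(1, max_length + 1),
                  key=lambda m: average_ioc(ciphertext, m), reverse=True)
```

For example, vigenere_decrypt("LXFOP VEFRN HR", "LEMON") returns "ATTACKATDAWN".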

PART II: Big Data Analytics for Information Security (80%)

Task Summary

Big data analytics for security is a rising trend that is helping security analysts and tool vendors do much more with data. Machine learning techniques can help security systems identify patterns and threats without prior definitions, rules or attack signatures, and often with higher accuracy. However, to be effective, machine learning needs very large volumes of data. The challenge is storing far more data than ever before, analyzing it in a timely manner, and extracting new insights. An organization that uses security and analytics tools can detect potential threats before they affect the company's assets and infrastructure. Access control, which gives access only to legitimate users, is an important tool for organizations to manage information security. In this section, we will focus on using biometrics for access control and information security.

Conduct a Big data science study in the security domain, for example, biometrics using fingerprint, face, iris or other modalities. Other examples in the security domain include fraud analytics and intrusion detection. Write an individual report on your Big data security and analytics project. The report should be written in a clear and concise manner (and be no more than 2000 words in length). You should start by exploring a biometric modality that interests you. You need to identify a compact dataset (structured or unstructured) with a reasonably large size and number of attributes/variables in your chosen modality or modalities, which can be used for the assessment. Your report should include the background of the chosen modality or modalities and the data analytics problem you attempt to solve, the aims and objectives, the significance of your study, and a description of your analytics approach, including the statistical method(s) and/or machine learning technique(s) you used to address the problem. You are required to submit an individual recorded video presentation to Mediasite or another platform, which will be announced before the submission date.

Context

In recent years, information security has taken center stage in the personal and professional lives of the majority of the global population. Data breaches are a daily occurrence, and intelligent adversaries target consumers, corporations, and governments with practically no fear of being detected or facing consequences for their actions. This is all occurring while the systems, networks, and applications that comprise the backbones of commerce and critical infrastructure are growing ever more complex, interconnected, and unwieldy. Defenses built solely on the elements of faith-based security (unaided intuition and “best” practices) are no longer sufficient. The rising trend is for organizations to adopt the proven tools and techniques being used in other disciplines to take an evolutionary step into Data-Driven Security.

This assessment has been designed to help you build the necessary skills to achieve the following learning objectives and fulfil the learning outcomes of this module. After completing this assessment, you should be able to:

•  Show proficiency with at least one data analytics software package; and

•  Demonstrate awareness of issues related to computer and data security.

By completing this assessment item, you will acquire knowledge of information security and data analytics, and the Python programming skills needed to analyse data from a security domain. You will also acquire the presentation skills necessary to present your analysis and results to your audience in your report and recorded video. This assessment will prepare you to address a Big data security and analytics/science problem in the real world.

Task Instructions

(1) Write a short individual project proposal to describe your Big data security and analytics project. Your project proposal should be written in a clear and concise manner (no more than 500 words, or one A4 page). Start by exploring an area or domain in biometrics that interests you. The project topic can be chosen from your target modality, e.g., fingerprint, iris, face or palm print. Show and discuss your proposal with the Teaching Assistant (TA) during the laboratory sessions. Please note that no marks will be given for this short proposal; however, it should serve as your first document for planning your Big data security and analytics project.

(2) Write a report on your Big data security and analytics project. The report should be written in a clear and concise manner (and be no more than 2000 words in length). Your final report should be detailed and address the following areas:

•  Clearly define the problem in your Big data security and analytics project.

•  Describe the significance of your Big data security and analytics project in the chosen domain or area.

•  Identify a compact dataset (structured or unstructured) with a reasonably large size and number of attributes/variables for your chosen modality or domain. Some examples are shown in the table below.

Note 1: Students aiming for “Excellent” or “Very Good” grades should pay attention to the complexity of the selected security dataset and use advanced approaches/steps to perform the analytics. For example, students could demonstrate individual modality performances for palm print and knuckle print, and then show that a combined multimodal (palm print and knuckle print) approach could give higher performance. In contrast, standard and/or conventional approaches/steps for a single-modality solution would likely be awarded an “Adequate”, “Competent” or “Comprehensive” grade.

Security Domain | Dataset
Fraud | https://www.kaggle.com/datasets/kartik2112/fraud-detection
Palm print and knuckle print | https://www.kaggle.com/datasets/michaelgoh/contactless-knuckle-palm-print-and-vein-dataset
Fingerprint | https://www.kaggle.com/datasets/ruizgara/socofing
Hand tremor | https://www.kaggle.com/datasets/hakmesyo/hand-tremor-dataset-for-biometric-recognition
Iris | https://www.kaggle.com/datasets/naureenmohammad/mmu-iris-dataset

•  Highlight the project aim and objectives.

•  Discuss the background of your chosen topic in the domain or area.

•  Describe the analytics approach used.

•  Describe how your analytics approach helped answer the problem, and the statistical method(s) and machine learning technique(s) you used.

•  Describe all the steps you took to analyse your data.

•  Discuss the results of the analysis.

•  Include evidence, such as tables, graphs and plots produced by your program code, to support your results.
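Note 1 suggests combining palm print and knuckle print. One common way to do this is score-level fusion, sketched below under stated assumptions: each modality produces a similarity score for the same ordered list of comparisons, the scores are min-max normalised per modality, and the fused score is a weighted sum (the sum rule). The function names and the 0.5 default weight are hypothetical, and in practice the min/max values and the weight would be estimated on training data only. Whether fusion actually improves performance has to be shown empirically, for example by comparing the crossover error rate of each modality with that of the fused scores. The sketch uses only the standard library; in the project itself, the brief requires the analysis to use the lab packages such as pyspark.

```python
def min_max_normalise(scores: list[float]) -> list[float]:
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def fuse_sum_rule(palm_scores: list[float], knuckle_scores: list[float],
                  w_palm: float = 0.5) -> list[float]:
    """Weighted-sum (sum rule) fusion of two modalities' scores for the same comparisons."""
    if len(palm_scores) != len(knuckle_scores):
        raise ValueError("score lists must be aligned comparison-by-comparison")
    palm = min_max_normalise(palm_scores)
    knuckle = min_max_normalise(knuckle_scores)
    return [w_palm * p + (1 - w_palm) * k for p, k in zip(palm, knuckle)]

if __name__ == "__main__":
    # Hypothetical raw scores on different scales for three comparisons.
    palm = [2, 4, 6]
    knuckle = [10, 30, 20]
    print(fuse_sum_rule(palm, knuckle))
```

Normalising first matters because the two matchers may produce scores on very different scales, and an unnormalised sum would let one modality dominate.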

(3) Prepare and record a short individual presentation (5 minutes) to introduce and explain your Big data security and analytics project and its significance. Your presentation should state the data science question or problem and describe your analytics approach and the statistical and/or machine learning method(s) you used to address it. Present and discuss the results of your analysis, and provide evidence (screenshots) from your program code to support the results. Your presentation should be clear, use no more than eight PowerPoint slides, and take no more than 5 minutes to go through. Your video presentation file must not exceed 50 MB.

Note: Students MUST use the tools and software packages from the lab sessions to support their data analytics involving practical scenarios.

Additionally, your final report should:

•  be clearly structured (with well-organised content); and

•  use the APA referencing style and include a reference list at the end.

For this assessment item, you are required to create programs in the Python programming language, using the software packages from your lab sessions, to analyse your data. You are also required to submit the program source code with the final report. Your source code should be:

•  written in the Python programming language;

•  based on the packages studied in the lab, e.g., pyspark for analysis, not external packages such as pandas, numpy, seaborn and sklearn;

•  free to use pure visualization tools, e.g., Excel or Matplotlib, for display, but not for analysis;

•  well commented, covering both the main program and each individual module, such as function modules; and

•  free of errors, such as syntax errors and runtime errors.

Report Format

•  Cover Page: This should include the Assessment Number, Assessment Title, Student Name, Student ID and Student Email.

•  Body of the report: This should include all the relevant section headings to address each aspect as indicated/highlighted in the question and the marking rubric.

•  References: Both your in-text citations and the entries in the ‘References’ section at the end of the report should adhere to the APA style.

•  Glossary (Optional): This should include any terms frequently used in the report.

The following points are a general guide for the presentation of assessment items:

•  Assessment items should be typed;

•  Use single spacing;

•  Use a wide left margin (as markers need space to be able to include their comments);

•  Use a standard 12-point font, such as Times New Roman, Calibri or Arial;

•  Left-justify body text;

•  Number your pages (except the cover page);

•  Insert a header or footer that details your name and student number on each page;

•  Always keep a copy (both hard and electronic) of your assessments; and

•  Most importantly, always run a spelling and grammar check; however, remember that such checks may not pick up all errors. You should still edit your work manually and carefully.