ISIT312 Big Data Management Assignment 1 Spring 2023
ISIT312 Big Data Management
Assignment 1
Spring 2023
Published on 24 July 2023
Scope
This assignment covers tasks related to the implementation of an HDFS application and the implementation of MapReduce applications.
This assignment is due on Saturday, 19 August 2023, 7:00pm (sharp).
This assignment is worth 10% of the total evaluation in the subject.
The assignment consists of 4 tasks and specification of each task starts from a new page.
Only electronic submission through Moodle at:
https://moodle.uowplatform.edu.au/login/index.php
will be accepted. A submission procedure is explained at the end of Assignment 1 specification.
A policy regarding late submissions is included in the subject outline.
Only one submission of Assignment 1 is allowed and only one submission per student is accepted.
A submission marked by Moodle as "late" is always treated as a late submission no matter how many seconds it is late.
A submission that contains an incorrect file attached is treated as a correct submission with all consequences coming from the evaluation of the file attached.
All files left on Moodle in the state "Draft (not submitted)" will not be evaluated.
A submission of compressed files (zipped, gzipped, rared, tarred, 7-zipped, lhzed, etc.) is not allowed. Compressed files will not be evaluated.
An implementation that does not compile or run because of one or more syntax and/or run-time errors scores no marks.
The first assignment is an individual assignment, and it is expected that all its tasks will be solved individually without any cooperation with other students. However, it is allowed to declare in the submission comments that a particular component or task of this assignment has been implemented in cooperation with another student. In such a case, the evaluation of the task or component may be shared with that student. In all other cases, plagiarism will result in a FAIL grade being recorded for the entire assignment. If you have any doubts or questions, please consult your lecturer or tutor during laboratory/tutorial classes or by e-mail.
Task 1 (1 mark)
Merging files in HDFS
Read and analyse the HDFS applications provided in the files FileSystemCat.java and FileSystemPut.java, available in the folder Resources attached to the specification of the laboratory class for Week 2 on Moodle.
Use the applications FileSystemCat.java and FileSystemPut.java to implement, in Java, an HDFS application that merges two files located in HDFS into one file also located in HDFS.
The application must have the following parameters.
(1) A path to, and a name of the first input file in HDFS.
(2) A path to, and a name of the second input file in HDFS.
(3) A path to, and a new name of an output file to be created in HDFS. The file is supposed to contain the contents of the first input file followed by the contents of the second input file.
Implement the application and save its source code in a file solution1.java.
Upload two files to HDFS. The contents, the names, and the locations of the files in HDFS are up to you.
When ready, compile the application, create a jar file, and process your application. Display the results created by the application.
Use Hadoop to provide evidence that the two files uploaded into HDFS have been successfully merged into one file in HDFS.
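The required ordering (all of the first input, then all of the second input) can be illustrated with plain Java streams. This is only a minimal sketch under stated assumptions, not the expected solution: the class MergeSketch and its methods are hypothetical names, and the sketch works on in-memory data instead of HDFS. In an HDFS application the streams would presumably come from the Hadoop FileSystem object, in the way shown in the laboratory examples.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; it is not part of FileSystemCat.java or FileSystemPut.java.
class MergeSketch {

    // Copies every byte of 'first' and then every byte of 'second' into 'out'.
    static void merge(InputStream first, InputStream second, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        for (InputStream in : new InputStream[] {first, second}) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        out.flush();
    }

    // Convenience wrapper over in-memory strings, used only for demonstration.
    static String mergeStrings(String a, String b) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            merge(new ByteArrayInputStream(a.getBytes(StandardCharsets.UTF_8)),
                  new ByteArrayInputStream(b.getBytes(StandardCharsets.UTF_8)),
                  out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Prints the first text followed by the second text.
        System.out.print(mergeStrings("contents of the first file\n", "contents of the second file\n"));
    }
}
```

The copy loop does not depend on where the streams come from, so the same ordering logic applies when the inputs and the output are files in HDFS.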
Deliverables
A file solution1.txt that contains a listing of the source code of your application, a report from compilation, creation of the jar file, uploading to HDFS of two small files for testing, a listing of both files in HDFS, processing of the application, and evidence that the two files uploaded into HDFS have been successfully merged into one file in HDFS. The file solution1.txt must be created through Copy/Paste of the contents of the Terminal window into a file solution1.txt. No screen dumps are allowed and no screen dumps will be evaluated.
Task 2 (2 marks)
Implementation of a simple MapReduce application
Read and analyse the MapReduce application provided in the file Filter.java, available in the folder Resources attached to the specification of the laboratory class for Week 3 on Moodle.
The application has the functionality equivalent to the functionality of the following SQL statement:
SELECT key, value
FROM sequence-of-key-value-pairs
WHERE value > given-value;
An objective of this task is to use the Java code provided in a file Filter.java to implement a MapReduce application Solution2 that has the functionality equivalent to the functionality of the following SQL statement:
SELECT item-name, price-per-unit * total-units
FROM sales.txt
WHERE price-per-unit * total-units > given-value;
A single line in an input data set sales.txt must have the following format.
item-name price-per-unit total-units
For example:
bolt 2 25
washer 3 8
screw 7 20
nail 5 10
screw 7 2
bolt 2 20
bolt 2 30
drill 10 5
washer 3 8
The contents of the file sales.txt are up to you as long as they are consistent with the format explained above.
A value of given-value must be passed through a parameter of your program.
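The selection logic of the SQL statement above can be tried out on a few lines of text before it is moved into a MapReduce application. The following is a minimal sketch in plain Java, assuming whitespace-separated fields and integer values; the class FilterSketch and the method filter are hypothetical names, and the sketch uses no Hadoop classes, so it is not the required Solution2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class that mimics only the per-line logic of the filter.
class FilterSketch {

    // Returns "item-name<TAB>total" for every line whose
    // price-per-unit * total-units is strictly greater than givenValue.
    static List<String> filter(List<String> lines, long givenValue) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length != 3) {
                continue; // assumption: lines that do not have three fields are skipped
            }
            long total = Long.parseLong(fields[1]) * Long.parseLong(fields[2]);
            if (total > givenValue) {
                result.add(fields[0] + "\t" + total);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // given-value is taken from the first program argument, 50 if none is given.
        long givenValue = args.length > 0 ? Long.parseLong(args[0]) : 50;
        List<String> sales = Arrays.asList(
                "bolt 2 25", "washer 3 8", "screw 7 20", "nail 5 10", "screw 7 2",
                "bolt 2 20", "bolt 2 30", "drill 10 5", "washer 3 8");
        for (String row : filter(sales, givenValue)) {
            System.out.println(row);
        }
    }
}
```

With the sample data and a given-value of 50, only the lines screw 7 20 (total 140) and bolt 2 30 (total 60) pass the condition, because the comparison is strict. In a MapReduce application this test would typically sit in the map phase.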
Save your solution in a file Solution2.java.
When ready, list Solution2.java in the Terminal window, compile, create a jar file, and process the application. List the input dataset sales.txt in the Terminal window and the results created by the application. When completed, Copy and Paste all messages from the Terminal screen into a file solution2.txt.
Deliverables
A file solution2.txt with a listing of source code of your application, report from compilation, creating jar file, processing the application, listing of a file sales.txt and listing of the results of processing of MapReduce application Solution2.java. A file solution2.txt must be created through Copy/Paste of the contents of Terminal window into a file solution2.txt. No screen dumps are allowed and no screen dumps will be evaluated.
Task 3 (3 marks)
Implementation of a simple MapReduce application
Read and analyse the MapReduce application provided in the file MinMax.java, available in the folder Resources attached to the specification of the laboratory class for Week 3 on Moodle.
The application has the functionality equivalent to the functionality of the following SQL statement.
SELECT key, MIN(value), MAX(value)
FROM sequence-of-key-value-pairs
GROUP BY key;
An objective of this task is to use the Java code provided in a file MinMax.java to implement a MapReduce application Solution3 that has the functionality equivalent to the functionality of the following SQL statement.
SELECT item-name, SUM(price-per-unit * total-units)
FROM sales.txt
GROUP BY item-name;
A single line in an input data set sales.txt must have the following format.
item-name price-per-unit total-units
For example:
bolt 2 25
washer 3 8
screw 7 20
nail 5 10
screw 7 2
bolt 2 20
bolt 2 30
drill 10 5
washer 3 8
The contents of the file sales.txt are up to you as long as they are consistent with the format explained above.
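The grouping and summation required by the SQL statement above can be checked on sample data with a small plain Java program. This is a sketch only, assuming whitespace-separated fields and integer values; the class GroupSumSketch and the method sumByItem are hypothetical names, and no Hadoop classes are used, so it is not the required Solution3.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class that mimics GROUP BY item-name with SUM(price-per-unit * total-units).
class GroupSumSketch {

    // Returns a map from item-name to the sum of price-per-unit * total-units.
    static Map<String, Long> sumByItem(List<String> lines) {
        Map<String, Long> totals = new TreeMap<>(); // sorted by item-name
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length != 3) {
                continue; // assumption: lines that do not have three fields are skipped
            }
            long value = Long.parseLong(fields[1]) * Long.parseLong(fields[2]);
            totals.merge(fields[0], value, Long::sum); // add value to the running total of the item
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> sales = Arrays.asList(
                "bolt 2 25", "washer 3 8", "screw 7 20", "nail 5 10", "screw 7 2",
                "bolt 2 20", "bolt 2 30", "drill 10 5", "washer 3 8");
        for (Map.Entry<String, Long> entry : sumByItem(sales).entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

For the sample data the totals are bolt 150, drill 50, nail 50, screw 154, and washer 48. In MapReduce terms, the product would be computed per line in the map phase with item-name as the key, and the addition would be done per key in the reduce phase.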
Save your solution in a file Solution3.java.
When ready, list Solution3.java in the Terminal window, compile, create a jar file, and process the application. List the input dataset sales.txt in the Terminal window and the results created by the application. When completed, Copy and Paste all messages from the Terminal screen into a file solution3.txt.
Deliverables
A file solution3.txt with a listing of source code of your application, report from compilation, creating jar file, processing the application, listing a file sales.txt and listing of the results of processing of MapReduce application Solution3.java. A file solution3.txt must be created through Copy/Paste of the contents of Terminal window into a file solution3.txt. No screen dumps are allowed and no screen dumps will be evaluated.
Task 4 (4 marks)
Implementation of MapReduce application
Assume that a bank records in a text file the withdrawals and deposits of certain amounts of money from the bank accounts. A single row in the file with the withdrawal/deposit records consists of an account number, the date when the withdrawal/deposit occurred, and the amount of money involved. Assume that the withdrawals are represented by negative numbers and the deposits are represented by positive numbers, and that each amount is a multiple of 50 (amount modulo 50 = 0). All values in a single record are always separated by a single blank.
An objective of this task is to implement MapReduce application Solution4 that finds the total amount of money deposited by each customer per year. For example, if a sample file with the withdrawals and deposits contains the following lines
1234567 12-DEC-2019 200
1234567 15-DEC-2019 50
9876543 25-JUL-2018 150
9876543 12-FEB-2018 -50
9876543 01-JAN-2019 150
1234567 21-OCT-2020 -250
9876543 22-OCT-2019 300
then your application is supposed to produce the following outputs.
1234567 2019 250
9876543 2018 150
9876543 2019 450
The order of the lines listed above is up to you.
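The expected output above can be reproduced with a short plain Java program, which may help to check the logic before writing the MapReduce version. This is a minimal sketch under stated assumptions: each record is read as account number, date, and amount separated by single blanks, the year is the part of the date after the last hyphen, and only positive amounts count as deposits. The class DepositSketch and the method depositsPerYear are hypothetical names, and no Hadoop classes are used, so this is not the required Solution4.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class that sums deposits per account number and year.
class DepositSketch {

    // Returns a map from "account year" to the total amount deposited.
    static Map<String, Long> depositsPerYear(List<String> records) {
        Map<String, Long> totals = new TreeMap<>();
        for (String record : records) {
            String[] fields = record.trim().split(" ");
            if (fields.length != 3) {
                continue; // assumption: records that do not have three fields are skipped
            }
            long amount = Long.parseLong(fields[2]);
            if (amount <= 0) {
                continue; // withdrawals are negative and are not part of the total
            }
            // The date has the form DD-MON-YYYY, so the year follows the last hyphen.
            String year = fields[1].substring(fields[1].lastIndexOf('-') + 1);
            totals.merge(fields[0] + " " + year, amount, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
                "1234567 12-DEC-2019 200", "1234567 15-DEC-2019 50",
                "9876543 25-JUL-2018 150", "9876543 12-FEB-2018 -50",
                "9876543 01-JAN-2019 150", "1234567 21-OCT-2020 -250",
                "9876543 22-OCT-2019 300");
        for (Map.Entry<String, Long> entry : depositsPerYear(records).entrySet()) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
    }
}
```

Account 1234567 has no line for 2020 because its only record in that year is a withdrawal. In a MapReduce application, the pair of account number and year would be a natural choice of key, with the deposits filtered in the map phase and summed in the reduce phase.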
Upload to a local file system a small file for the purpose of future testing. The file must contain the withdrawals and deposits and it must have an internal structure the same as it is explained and visualized above. A name of file and location of file in a local file system is up to you.
Save your solution in a file Solution4.java.
When ready, list Solution4.java in the Terminal window, compile, create a jar file, and process the application. List the input dataset with information about deposits and withdrawals in the Terminal window and the results created by the application. When completed, Copy and Paste all messages from the Terminal screen into a file solution4.txt.
Deliverables
A file solution4.txt with a listing of source code of your application, report from compilation, creating jar file, processing the application, listing a file with information about deposits and withdrawals and listing of the results of processing of MapReduce application Solution4.java. A file solution4.txt must be created through Copy/Paste of the contents of Terminal window into a file solution4.txt. No screen dumps are allowed and no screen dumps will be evaluated.
Submission of Assignment 1
Note that you have only one submission, so make absolutely sure that you submit the correct files with the correct contents. No other submission is possible!
Submit the files solution1.txt, solution2.txt, solution3.txt, and solution4.txt through Moodle in the following way:
(1) Access Moodle at http://moodle.uowplatform.edu.au/
(2) To log in, use the Login link located in the upper right corner of the Web page or in the middle of the bottom of the Web page
(3) When logged in, select the site ISIT312/912 (S223) Big Data Management
(4) Scroll down to a section Assessment items (Assignments)
(5) Click at In this place you can submit the outcomes of your work on the tasks included in Assignment 1 link.
(6) Click at a button Add Submission
(7) Move a file solution1.txt into an area You can drag and drop files here to add them. You can also use a link Add…
(8) Repeat step (7) for the remaining files solution2.txt, solution3.txt, and solution4.txt
(9) Click at a button Save changes
(10) Click at the checkbox with a text attached: By checking this box, I confirm that this submission is my own work, … in order to confirm the authorship of your submission.
(11) Click at a button Continue
(12) Check if Submission status is Submitted for grading.
2023-08-10