Big Data Analytics using Spark

Published: 2021-10-06


Course Overview


Welcome to Big Data Analytics using Spark! This course teaches you how to perform statistical analysis of very large datasets that do not fit on a single computer. You will learn some of the most popular tools for this type of analysis: Apache Spark, XGBoost and TensorFlow. You will use these tools through Jupyter notebooks and experience the power of combining narrative, code and graphics to create convincing analytical documents.


Instructor

Yoav Freund, Professor of Electrical and Computer Engineering, UC San Diego


Teaching Assistants

Litao Qiao, Graduate Student, CSE, UC San Diego


Prerequisites

The most important prerequisites for this course are:

● The ability to program in Python and to use Jupyter notebooks. This can be obtained by taking the course DSE 200x, Python for Data Science.

● Probability and statistics. This can be obtained by taking the course DSE 210x, Probability and Statistics using Python.

● Machine learning. This can be obtained by taking the course DSE 220x, Machine Learning Fundamentals.


Learning objectives

This course has two main goals. The first is to introduce large-scale data analysis frameworks (Spark, XGBoost and TensorFlow), including the underlying computer architecture and the programming abstractions. The second is to combine methods from statistics and machine learning to perform large-scale analysis, identify statistically significant patterns and visualize statistical summaries.


Course Outline

This is a ten-week course.


Topics

● Memory Hierarchy, latency vs. throughput.

● Spark Basics

● Dataframes and SQL

● PCA and weather analysis

● K-means and intrinsic dimensions

● Decision trees, boosting, and random forests

● Neural Networks and TensorFlow


Python notebooks

Jupyter notebooks are the foundation of this course. Most of the videos are overviews of a notebook, and we recommend that you follow along in the notebook while watching the video. The notebooks contain explanations, code and figures, as well as interactive widgets and pointers to additional material. When posting questions to the forum, please refer to specific locations in the notebooks.


Discussion forums

Discussion forums provide an opportunity for learners to discuss course materials with each other and with course staff.


Verified Certificate / Verification Deadline

The deadline to switch from the unverified track to the verified track is three days before the first assignment due date (May 30th, 2020 @ 18:00 UTC).


Assignments and exams

● Weekly programming assignments (50% of grade).

● Engagement (0%). This consists of simply checking the "mark as complete" button after viewing each video and associated materials.

● Poll questions (0%).

● Weekly comprehension quizzes (15% of grade). These are simple multiple-choice questions based on the week's videos and notebooks.

● Final exam (35% of grade). This consists of a dataset and a notebook with a set of questions. You are to use the notebook to analyze the data and answer the questions.


Time and grading policies for weekly assignments

Each assignment is due five weeks after it is given.

The two lowest comprehension quiz scores and the two lowest programming assignment scores will be dropped. This means, for instance, that it is possible to obtain a full score while skipping any two of the programming assignments and any two of the comprehension quizzes.
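As a rough illustration of the drop policy, the sketch below averages a list of scores after discarding the two lowest. The function name, the 0-100 scale, and the example of ten assignments are assumptions made for illustration; the syllabus does not state how many assignments there are or how the platform computes the average.

```python
def average_after_dropping(scores, n_drop=2):
    """Return the mean of `scores` after discarding the `n_drop` lowest values."""
    if len(scores) <= n_drop:
        raise ValueError("need more scores than the number being dropped")
    kept = sorted(scores)[n_drop:]  # ascending sort, so the lowest come first
    return sum(kept) / len(kept)


# Hypothetical learner: ten assignments, two of them skipped (scored 0).
assignment_scores = [100, 100, 0, 100, 100, 100, 0, 100, 100, 100]
print(average_after_dropping(assignment_scores))
```

Under these assumptions the two skipped assignments are the ones dropped, so the remaining eight scores alone determine the average.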


Verified Learners

Learners can earn a verified certificate for the course by enrolling as part of the verified track, completing identity verification, completing the proctored exam, and earning a passing grade.


Grading

Grades will be assigned based on final scores, according to the following rubric:

85-100%: A

65-85%: B

Less than 65%: F
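To make the arithmetic concrete, here is a minimal sketch that combines the component weights listed under "Assignments and exams" (50% programming, 15% quizzes, 35% final exam) with the rubric above. The function names and the example scores are hypothetical. The rubric lists 85 in both the A and B bands, so the sketch assumes that a score of exactly 85 earns an A; the course staff may apply a different rule.

```python
# Component weights taken from the syllabus; each score is on a 0-100 scale.
WEIGHTS = {"programming": 0.50, "quizzes": 0.15, "final_exam": 0.35}


def final_score(components):
    """Weighted sum of the component scores."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


def letter_grade(score):
    """Map a final score to a letter using the rubric's cutoffs."""
    # Assumption: exactly 85 counts as an A (the rubric's bands overlap at 85).
    if score >= 85:
        return "A"
    if score >= 65:
        return "B"
    return "F"


# Hypothetical learner, with component averages already computed.
example = {"programming": 90.0, "quizzes": 80.0, "final_exam": 70.0}
score = final_score(example)
print(f"{score:.1f}", letter_grade(score))
```

In this example the weighted sum is 45 + 12 + 24.5 = 81.5, which falls in the B band.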


Effort

The weekly effort for the course is intended to be roughly 10 to 20 hours.


Pace and deadlines

The course is instructor-paced. Every two weeks, two weeks' worth of the relevant material (videos and assignments) will be released, and it will remain online until the end of the course. We encourage learners to keep current with the videos and assignments; however, as described above, there is a five-week window for submitting each assignment.


Honor code

Beyond learning this important material, we hope learners will take the course seriously and respect fellow students. Please read and abide by the edX honor code pledge.


We value your feedback

This is a new online course. We are committed to making it as accessible and educational as possible, and would appreciate any feedback about how we might improve it.


Thank you!

Thank you very much for taking the course. We hope you enjoy it.