CSC108H Assignment 1


Deadline: Wednesday June 2 2021 by 5:00pm EST

Late policy: There are penalties for submitting the assignment after the due date. These penalties depend on how many hours late your submission is. Please see the syllabus on Quercus for more information.


Goals of this Assignment

Use the Function Design Recipe to plan, implement, and test functions.

Write function bodies using variables, numeric types, strings, and conditional statements. (You can do this whole assignment with only the concepts from Weeks 1, 2, and 3 of the course.)

Learn to use Python 3, Wing 101, provided starter code, a checker module, and other tools.


Tweet Analyser

This assignment is based on the social network company Twitter. Twitter allows users to read and post tweets that are between 1 and 280 characters long, inclusive. In this assignment, you will be writing functions that (we imagine) are part of the programs that manage Twitter feeds.

Here are some example tweets:

Standing ovation as Setsuko Thurlow is awarded a Doctor of Laws degree, honoris causa, by the University of Toronto @UofT for her tireless nuclear disarmament work and contributions to the Treaty on the Prohibition of Nuclear Weapons with @nuclearban ICAN

Congratulations to our class of 2021 #UofTGrad21

Looking for accurate #COVID19 information, including about vaccines? @Canada has a number of resources available in various languages here, including video: https://bit.ly/3epnTyy #cdnpoli


Some terminology

tweet: A message posted on Twitter. For our purposes, the message text is between 1 and MAX_TWEET_LENGTH characters long (inclusive). MAX_TWEET_LENGTH is a constant.

tweet word: A word in a tweet. For our purposes, a tweet word contains only alphanumeric characters and underscores. For example, pink_elephant is a tweet word, while bits&pieces is not. (In fact, bits&pieces is two tweet words, bits and pieces, with an ampersand (&) between them.)

hashtag: A word in a tweet that begins with the hash symbol. Twitter uses the number sign (#) as the hash symbol. For our assignment, we'll use the constant HASHTAG_SYMBOL to represent the hash symbol. Hashtags are used to label important words or terms in a tweet. A valid hashtag has the hash symbol as its first character and the rest of the characters form a tweet word. In other words, a hashtag begins with the hash symbol, and contains all alphanumeric characters and underscores up to (but not including) the first character that is neither alphanumeric nor an underscore (such as a space or punctuation) or the end of the tweet. A hashtag can appear anywhere in a tweet: at the beginning, in the middle, or at the end. A hashtag must contain at least one alphanumeric character.

#UofT, #csc108, and #COVID19 are three examples of hashtags on Twitter.

Note that a hashtag is not a tweet word, because it has the hash symbol as its first character.

mention: A word in a tweet that begins with the mention symbol. Twitter uses the at-sign (@) as the mention symbol. For our assignment, we'll use the constant MENTION_SYMBOL to represent the mention symbol. Mentions are used to direct a message at or about a particular Twitter user, so the word should be a Twitter username (but for the purposes of this assignment, we will not check whether the username is valid; we'll just assume it is). For our purposes, the definition of a mention is very similar to that of a hashtag. A valid mention has the mention symbol as its first character and the rest of the characters form a tweet word. In other words, a mention begins with the at-sign, and contains all alphanumeric characters and underscores up to (but not including) the first character that is neither alphanumeric nor an underscore (such as a space or punctuation) or the end of the tweet. A mention can appear anywhere in a tweet: at the beginning, in the middle, or at the end. A mention must contain at least one alphanumeric character.

@redcrosscanada, @UN_Women, and @UofTGrad2021 are three examples of Twitter mentions.

Note that a mention is not a tweet word, because it has the mention symbol as its first character.

Here are some more interesting examples of how we will treat tweet words, hashtags, and mentions in this assignment.

In the tweet

CTVNews on vaccination,#COVID19, watch @CTVNews!!    #Vaccination

we have four tweet words (CTVNews, on, vaccination, and watch), two hashtags (#COVID19 and #Vaccination), and one mention (@CTVNews). It is important to note that in this example there is no space between the first comma and the hashtag #COVID19, there is a comma immediately following the hashtag #COVID19, there are two exclamation marks immediately following the mention @CTVNews, and there is more than one space after the exclamation marks. All of these are valid in a tweet. Also note that the first occurrence of the word CTVNews is not considered to be a mention, because it does not have the mention symbol.

In the tweet

@UofT welcomes its 2021 graduates! #UofTGrad2021#graduation!

we have four tweet words (welcomes, its, 2021, and graduates), two hashtags (#UofTGrad2021 and #graduation), and one mention (@UofT). It is important to note that in this example there is no space between the hashtags #UofTGrad2021 and #graduation. This is also valid in a tweet.

Some more obscure yet valid examples:

In something#something_else, something is a tweet word and #something_else is a hashtag.

In no@spaces#whatsoever?!, no is a tweet word, @spaces is a mention, and #whatsoever is a hashtag.
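The classification rules above can be checked against the example tweets with a short sketch. This is only an illustration, not the intended solution: the names classify_tokens and TOKEN_PATTERN are hypothetical, the symbol values are taken from the definitions above, and the sketch uses the re module, which tweet.py itself must not import. It also does not enforce the rule that a hashtag or mention must contain at least one alphanumeric character.

```python
import re

# Symbol values as described in the terminology section.
HASHTAG_SYMBOL = '#'
MENTION_SYMBOL = '@'

# An optional leading hash or mention symbol, followed by a run of
# alphanumeric characters and underscores (\w matches exactly those).
TOKEN_PATTERN = re.compile(
    '[' + re.escape(HASHTAG_SYMBOL + MENTION_SYMBOL) + r']?\w+'
)


def classify_tokens(tweet: str) -> dict:
    """Return the tweet words, hashtags, and mentions found in tweet,
    each in order of appearance. (Hypothetical helper for illustration.)
    """
    result = {'words': [], 'hashtags': [], 'mentions': []}
    for token in TOKEN_PATTERN.findall(tweet):
        if token[0] == HASHTAG_SYMBOL:
            result['hashtags'].append(token)
        elif token[0] == MENTION_SYMBOL:
            result['mentions'].append(token)
        else:
            result['words'].append(token)
    return result


print(classify_tokens('no@spaces#whatsoever?!'))
# {'words': ['no'], 'hashtags': ['#whatsoever'], 'mentions': ['@spaces']}
```

Running the same function on the two longer example tweets gives four tweet words, two hashtags, and one mention each, matching the counts described above.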

For a complete list of Twitter terms, check out the Twitter glossary.


Starter code

For this assignment, we are giving you some files, including a Python starter code file. Please download the Assignment 1 Files and extract the zip archive.

Starter code: tweet.py

This file contains some constants, the header and the complete docstring (but not body) for the first function you are to write. Your job is to complete this file.

Checker: a1_checker.py

We have provided a checker program that you should use to check your code. See below for more information about a1_checker.py.


Constants

Constants are special variables whose values do not change once assigned. A different naming convention (uppercase pothole) is used for constants, so that programmers know not to change their values. For example, in the starter code, the constant MAX_TWEET_LENGTH is assigned the value 50 at the beginning of the module, and the value of MAX_TWEET_LENGTH should never change in your code. When writing your code, if you need to use the value of the maximum tweet length, you should use MAX_TWEET_LENGTH. The same goes for the other constant values.

Using constants simplifies code modifications and improves readability. If we later decide to use a different tweet length, we would only have to change the length in one place (the MAX_TWEET_LENGTH assignment statement), rather than throughout the program.
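As a minimal sketch of this idea, the function below refers to the constant by name instead of repeating the number 50. The function name fits_in_tweet is hypothetical and is not one of the functions you are asked to write; the value 50 is the one given above for the starter code.

```python
MAX_TWEET_LENGTH = 50  # value from the starter code, as described above


def fits_in_tweet(text: str) -> bool:
    """Return True if and only if text has between 1 and
    MAX_TWEET_LENGTH characters, inclusive.
    (Hypothetical function, for illustration only.)

    >>> fits_in_tweet('Hello Twitter!')
    True
    >>> fits_in_tweet('')
    False
    """
    # Using the constant here means a change to MAX_TWEET_LENGTH
    # is picked up automatically; no literal 50 appears in the body.
    return 1 <= len(text) <= MAX_TWEET_LENGTH
```

If the assignment statement for MAX_TWEET_LENGTH were changed to another value, the function body would not need to be edited.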


What to do

In the starter code file tweet.py, complete the following function definitions. Use the Function Design Recipe that you have been learning in this course. We have included the type contracts in the following table; please read through the table to understand how the functions will be used.

We will be evaluating your docstrings in addition to your code. Please include two examples in your docstrings. You will need to paraphrase the full descriptions of the functions to get an appropriate docstring description.
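To illustrate the expected shape of a function written with the recipe (header with type contract, a paraphrased description, and two examples), here is a sketch for a hypothetical function, count_hash_symbols, that is not part of the assignment. The value of HASHTAG_SYMBOL is assumed from the terminology section.

```python
HASHTAG_SYMBOL = '#'  # assumed value, per the terminology section


def count_hash_symbols(tweet: str) -> int:
    """Return the number of times HASHTAG_SYMBOL occurs in tweet.

    >>> count_hash_symbols('Congratulations to our class of 2021 #UofTGrad21')
    1
    >>> count_hash_symbols('no symbols here')
    0
    """
    return tweet.count(HASHTAG_SYMBOL)
```

Each docstring example can be typed into the Python shell to confirm that the call returns the value shown.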



Using Constants

As discussed in the Constants section above, your code should make use of the provided constants. If the value of one of those constants were changed, and your program rerun, your functions should work with those new values.

For example, if MAX_TWEET_LENGTH were changed, then your functions should work according to the new maximum tweet length.

Your docstring examples should reflect the given values of the constants in the provided starter code, and do not need to change.


No Input or Output

Your tweet.py file should contain the starter code, plus the function definitions specified above. tweet.py must not include any calls to the print and input functions. Do not add any import statements. Also, do not include any function calls or other code outside of the function definitions.


How should you test whether your code works

First, run the checker and review ALL output (you may need to scroll). You should also test each function individually by writing code to verify your functions in the Python shell. For example, after defining function compare_tweet_lengths, you might call it from the shell (e.g., compare_tweet_lengths('I love', 'programming')) to check whether it returns the right value (-1). One call usually isn't enough to thoroughly test the function. For example, we should also test compare_tweet_lengths('programming', 'is fun'), where it should return 1, and compare_tweet_lengths('this course', 'is for me!!'), where it should return 0.
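The three shell calls above can be sketched as follows. The function body here is only a stand-in inferred from those three examples (1 when the first tweet is longer, -1 when it is shorter, 0 when the lengths are equal); the parameter names are hypothetical, and the function table in the handout remains the authoritative specification.

```python
def compare_tweet_lengths(tweet1: str, tweet2: str) -> int:
    """Stand-in body inferred from the handout's three examples;
    check the function table for the actual specification.
    """
    if len(tweet1) > len(tweet2):
        return 1
    if len(tweet1) < len(tweet2):
        return -1
    return 0


# One call per distinct outcome, as suggested above.
print(compare_tweet_lengths('I love', 'programming'))       # -1
print(compare_tweet_lengths('programming', 'is fun'))       # 1
print(compare_tweet_lengths('this course', 'is for me!!'))  # 0
```

Choosing one test per possible return value is a simple way to make sure every branch of a conditional statement has been exercised.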


A1 Checker

We are providing a checker module (a1_checker.py) that tests two things:

whether your code follows the Python style guidelines, and

whether your functions are named correctly, have the correct number of parameters, and return the correct types.

To run the checker, open a1_checker.py and run it. Note: the checker file should be in the same directory as your tweet.py, as provided in the starter code zip file.

If the checker passes for both style and types:

Your code follows the style guidelines.

Your function names, number of parameters, and return types match the assignment specification. This does not mean that your code works correctly in all situations. We will run a different set of tests on your code once you hand it in, so be sure to thoroughly test your code yourself before submitting.

If the checker fails, carefully read the message provided:

It may have failed because your code did not follow the style guidelines. Review the error description(s) and fix the code style. Please see the PyTA documentation for more information about errors.

It may have failed because:

you are missing one or more functions,

one or more of your functions is misnamed,

one or more of your functions has the incorrect number or type of parameters, or

one or more of your function return types does not match the assignment specification.

Read the error message to identify the problematic function, review the function specification in the handout, and fix your code.

Make sure the checker passes before submitting. We have prepared a video walkthrough on how to run and use the checker.


Running the checker program on MarkUs

In addition to running the checker program on your own computer, run the checker on MarkUs as well. You will be able to run the checker program on MarkUs once every 24 hours. This can help to identify issues such as uploading the incorrect file.

Once you have submitted your work on MarkUs, click on the "Automated Testing" tab and then click on "Run Tests". Wait for a minute or so, then refresh the webpage. Once the tests have finished running, you'll see results for the Style Checker and Type Checker components of the checker program (see both the Automated Testing tab and the results files under the Submissions tab). Note that these are not actually marks, just the checker results. If there are errors, edit your code, run the checker program again on your own machine to check that the problems are resolved, resubmit your assignment on MarkUs, and (if time permits) after the 24-hour period has elapsed, rerun the checker on MarkUs.


Marking

These are the aspects of your work that may be marked for A1:

Coding style (20%):

Make sure that you follow the Python style guidelines that we have introduced and the Python coding conventions that we have been using throughout the semester. Although we don't provide an exhaustive list of style rules, the checker tests for style are complete, so if your code passes the checker, then it will earn full marks for coding style with one exception: docstrings may be evaluated separately. For each occurrence of a PyTA error, one mark (out of 20) will be deducted. For example, if a C0301 (line-too-long) error occurs 3 times, then 3 marks will be deducted.
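The deduction rule can be written out as a small calculation. This is only a sketch: the names are hypothetical, the floor at zero is an assumption that the handout does not state, and the separate evaluation of docstrings is not modelled.

```python
STYLE_MARKS_AVAILABLE = 20  # coding style is marked out of 20


def style_marks(pyta_error_occurrences: int) -> int:
    """Return the style marks left after deducting one mark per
    occurrence of a PyTA error.

    Assumption (not stated in the handout): marks cannot go below 0.
    """
    return max(0, STYLE_MARKS_AVAILABLE - pyta_error_occurrences)


# A C0301 (line-too-long) error occurring 3 times costs 3 marks.
print(style_marks(3))  # 17
```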

All functions, including helper functions, should have complete docstrings including preconditions when you think they are necessary.

Correctness (80%): Your functions should perform as specified. Correctness, as measured by our tests, will count for the largest single portion of your marks. Once your assignment is submitted, we will run additional tests not provided in the checker. Passing the checker does not mean that your code will earn full marks for correctness.


No Remark Requests

No remark requests will be accepted. A syntax error could result in a grade of 0 on the assignment. Before the deadline, you are responsible for running your code and the checker program to identify and resolve any errors that will prevent our tests from running.


What to Hand In

The very last thing you do before submitting should be to run the checker program one last time.

Otherwise, you could make a small error in your final changes before submitting that causes your code to receive zero for correctness.

Submit tweet.py on MarkUs by following the instructions on the MarkUs website. Remember that spelling of filenames, including case, counts: your file must be named exactly as above.