
COMSW3101 Introduction to Python

Problem 1 - Decrypting Government Data

Your job is to summarize this government data about oil consumption

The format of the file is rather bizarre - note that each line has data for two months, in two different years! (Plus I had to hand edit the file to make it parseable)

Fortunately, Python is great for untangling and manipulating data.

Write a generator that reads from the given url over the network, and produces a summary line for a year’s data on each ‘next’ call

remember that urllib.request returns ‘bytes arrays’, not strings
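A small illustration of the bytes/str distinction. This is only a sketch: `io.BytesIO` stands in for the object returned by `urllib.request.urlopen(url)` so the example runs without a network connection, and the UTF-8 encoding is an assumption about the file.

```python
import io

# Stand-in for the response from urllib.request.urlopen(url);
# iterating over either one produces bytes objects, one per line.
response = io.BytesIO(b'first line\nsecond line\n')

raw = next(response)                       # a bytes object, newline included
text = raw.decode('utf-8').rstrip('\n')    # decode to str, drop the newline

print(type(raw).__name__, type(text).__name__)
```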

The generator should read the lines of the oil2.txt file in a lazy fashion - it should only read 13 lines for every two years of output. Note a loop can have any number of ‘yield’ calls in it.
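The lazy-reading pattern described above can be sketched on toy data. Everything here is hypothetical: `pairs`, `block_size`, and the 'first:'/'last:' summaries are made up for illustration and do not reflect the actual layout of oil2.txt. The point is only that one pass through the loop reads a fixed number of lines and contains two `yield` statements.

```python
import io
from itertools import islice

def pairs(stream, block_size=3):
    """Read block_size lines at a time and yield two summaries per block."""
    while True:
        # islice pulls exactly block_size lines from the stream, no more
        block = [b.decode('utf-8').strip() for b in islice(stream, block_size)]
        if len(block) < block_size:
            return                      # ran out of complete blocks
        yield 'first: ' + block[0]      # first yield for this block
        yield 'last: ' + block[-1]      # second yield, no extra reading

# io.BytesIO stands in for the network response
stream = io.BytesIO(b'a\nb\nc\nd\ne\nf\n')
g = pairs(stream)
first = next(g)
pos_after_first = stream.tell()    # only one block has been consumed so far
second = next(g)
pos_after_second = stream.tell()   # unchanged: second yield reads nothing
```

Checking `stream.tell()` between `next` calls confirms that lines are read only when the generator needs a new block.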

Ignore the monthly data, just extract the yearly info

Drop the month column

In addition to the ‘oil’ generator function, my solution had a separate helper function, ‘def makeCSVLine(year, data):’
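One possible shape for such a helper, as a sketch only. The signature comes from the text above, but the body is an assumption: it supposes `data` is a sequence of already-extracted field values for that year.

```python
def makeCSVLine(year, data):
    """Join the year and its data fields into one comma-separated line.

    Assumes `data` is a sequence of field values (strings or numbers).
    """
    return ','.join([str(year)] + [str(d) for d in data])

line = makeCSVLine(2014, ['2700903', '-112867', '246409332',
                          '-26397845', '91.23', '-5.72'])
print(line)
```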

Here is the output, beginning with the first two years of data, 2014 and 2013

Year,Quantity,QuantityChange,Unknown,Unknown2,Price,PriceChange

2014,2700903,-112867,246409332,-26397845,91.23,-5.72

2013,2813770,-283638,272807177,-40367786,96.95,-4.15

2012,3097408,-224509,313174963,-18407090,101.11,1.29

2011,3321917,-55160,331582053,79421544,99.82,25.15

2010,3377077,62290,252160509,63448733,74.67,17.74

2009,3314787,-275841,188711776,-153200712,56.93,-38.29

2008,3590628,-99940,341912488,104700835,95.22,30.95

2007,3690568,-43658,237211653,20584322,64.28,6.26

2006,3734226,-20445,216627331,40871990,58.01,11.20

2005,3754671,-66308,175755341,44012676,46.81,12.33

2004,3820979,144974,131742665,32575492,34.48,7.50

2003,3676005,257983,99167173,21883842,26.98,4.37

2002,3418022,-53045,77283331,2990437,22.61,1.21

2001,3471067,71827,74292894,-15583539,21.40,-5.04

2000,3399240,171148,89876433,38986812,26.44,10.68

1999,3228092,-14620,50889621,13637399,15.76,4.28

1998,3242712,173281,37252222,-16973685,11.49,-6.18

1997,3069431,175785,54225907,-704950,17.67,-1.32

1996,2893646,126333,54930857,11181204,18.98,3.17

now that we have something that looks like a CSV file, we can do all kinds of things

could save it to a file, then

Excel or OpenOffice could read it

Python has a CSV reader

with a little juggling, can easily pump the data into a pandas DataFrame

Input:

with open('/tmp/oil.csv', 'w') as f:
    for l in oil(url):
        f.write(l + '\n')

o = oil(url)
ls = list(o)
s = '\n'.join(ls)

import pandas as pd
import io

# we will cover StringIO next week - kind of an 'in-memory' file
df = pd.read_csv(io.StringIO(s))
df

Output:

Year Quantity QuantityChange Unknown Unknown2 Price PriceChange

0 2014 2700903 -112867 246409332 -26397845 91.23 -5.72

1 2013 2813770 -283638 272807177 -40367786 96.95 -4.15

2 2012 3097408 -224509 313174963 -18407090 101.11 1.29

3 2011 3321917 -55160 331582053 79421544 99.82 25.15

4 2010 3377077 62290 252160509 63448733 74.67 17.74

5 2009 3314787 -275841 188711776 -153200712 56.93 -38.29

6 2008 3590628 -99940 341912488 104700835 95.22 30.95

7 2007 3690568 -43658 237211653 20584322 64.28 6.26

8 2006 3734226 -20445 216627331 40871990 58.01 11.20

9 2005 3754671 -66308 175755341 44012676 46.81 12.33

10 2004 3820979 144974 131742665 32575492 34.48 7.50

11 2003 3676005 257983 99167173 21883842 26.98 4.37

12 2002 3418022 -53045 77283331 2990437 22.61 1.21

13 2001 3471067 71827 74292894 -15583539 21.40 -5.04

14 2000 3399240 171148 89876433 38986812 26.44 10.68

15 1999 3228092 -14620 50889621 13637399 15.76 4.28

16 1998 3242712 173281 37252222 -16973685 11.49 -6.18

17 1997 3069431 175785 54225907 -704950 17.67 -1.32

18 1996 2893646 126333 54930857 11181204 18.98 3.17

19 1995 2767313 63116 43749653 5270236 15.81 1.58

20 1994 2704197 160822 38479417 10041 14.23 -0.90

21 1993 2543375 248805 38469376 -83679 15.13 -1.68

Input:

[df['Price'].mean(), df['Price'].min(), df['Price'].max()]

Output:

[46.63681818181818, 11.49, 101.11]
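The CSV reader mentioned earlier offers a route that does not need pandas. A brief sketch using the standard `csv` module on the header and the first two data lines shown above; the variable names `text`, `reader`, and `prices` are chosen for illustration.

```python
import csv
import io

# Header plus the first two years, copied from the listing above
text = ('Year,Quantity,QuantityChange,Unknown,Unknown2,Price,PriceChange\n'
        '2014,2700903,-112867,246409332,-26397845,91.23,-5.72\n'
        '2013,2813770,-283638,272807177,-40367786,96.95,-4.15\n')

# DictReader uses the first row as field names;
# every value arrives as a string, so convert explicitly
reader = csv.DictReader(io.StringIO(text))
prices = [float(row['Price']) for row in reader]
print(prices)
```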

Problem 2

suppose we want to convert between C (Celsius) and F (Fahrenheit), using the equation 9C = 5(F - 32)

could write functions ‘c2f’ and ‘f2c’

do all computation in floating point for this problem

Input:

def c2f(c):
    return (9. * c + 5. * 32.) / 5.

def f2c(f):
    return 5. * (f - 32) / 9.

[c2f(0), c2f(100), f2c(32), f2c(212)]

Output:

[32.0, 212.0, 0.0, 100.0]

to write f2c, we solved the equation for C, and made a function out of the other side of the equation

to write c2f, we solved for F, . . .

there is another way to think about this

rearrange the equation into a symmetric form 9 * C - 5 * F = -32 * 5

you can think of the equation above as a “constraint” between F and C. if you specify one variable, the other’s value is determined by the equation. in general, if we have c0 * x0 + c1 * x1 + … + cN * xN = total

cI are fixed coefficients

specifying any N of the (N + 1) x’s will determine the remaining x variable
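Solving the general equation for the one unknown xK gives xK = (total - sum of the other cI * xI) / cK. A minimal sketch of that step follows; the function name `solve_missing` and its argument layout are made up for illustration, with `None` marking the unknown.

```python
def solve_missing(coeffs, values, total):
    """Return the value of the single entry of `values` that is None."""
    missing = [i for i, v in enumerate(values) if v is None]
    if len(missing) != 1:
        raise ValueError('exactly one value must be None')
    k = missing[0]
    # sum of the known terms cI * xI
    known = sum(c * v for c, v in zip(coeffs, values) if v is not None)
    return (total - known) / coeffs[k]

# 9 * C - 5 * F = -32 * 5, with C = 100 and F unknown
f_value = solve_missing([9, -5], [100, None], -32 * 5)
# same constraint, with F = 32 and C unknown
c_value = solve_missing([9, -5], [None, 32], -32 * 5)
```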

define a class, ‘Constraint’, that will do ‘constraint satisfaction’

you may find ‘dotnone’ to be helpful

Input:

# regular dot product, except that if one or both values in a pair is None,
# that term is defined to contribute 0 to the sum
def dotnone(l1, l2):
    '''another dot product variant'''
    total = 0    # avoid shadowing the builtin sum()
    for e1, e2 in zip(l1, l2):
        if not (e1 is None or e2 is None):
            total += e1 * e2
    return total

[dotnone([1,2,3], [4,5,6]), dotnone([1,None,3], [4,5,6]), dotnone([None,1], [2,Non

Output:

[32, 22, 0]

Input:

# setup constraint btw C and F
# 1st arg is var names,
# 2nd arg is coefficients
# 3rd arg is total
c = Constraint('C F', [9, -5], -5 * 32)

# 1st arg - variable index or name
# 2nd arg - variable value
# setvar will fire when there is only one unset variable remaining
# it will print the variable values, return them in a list, and
# clear all variable values
c.setvar(0, 100)

Output:

C = 100.0

F = 212.0

[100.0, 212.0]
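One way the behaviour described in the comments above could be put together, offered as a sketch rather than the intended solution. The constructor arguments and `setvar` follow the usage shown; the internal attribute names, the conversion of values to float, and returning `None` when the constraint does not fire are assumptions.

```python
class Constraint:
    def __init__(self, names, coeffs, total):
        self.names = names.split()          # 'C F' -> ['C', 'F']
        self.coeffs = list(coeffs)
        self.total = total
        self.values = [None] * len(self.names)

    def setvar(self, var, value):
        # accept either a variable name or a variable index
        index = self.names.index(var) if isinstance(var, str) else var
        self.values[index] = float(value)
        unset = [i for i, v in enumerate(self.values) if v is None]
        if len(unset) != 1:
            return None                     # not ready to fire yet
        k = unset[0]
        known = sum(c * v for c, v in zip(self.coeffs, self.values)
                    if v is not None)
        self.values[k] = (self.total - known) / self.coeffs[k]
        result = list(self.values)
        for name, v in zip(self.names, result):
            print(f'{name} = {v}')
        self.values = [None] * len(self.names)   # clear for the next use
        return result

c = Constraint('C F', [9, -5], -5 * 32)
by_index = c.setvar(0, 100)    # set C by index
by_name = c.setvar('F', 32)    # set F by name
```

Because the values are cleared after each firing, the same `Constraint` object can be reused for conversions in either direction.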