DATA2001 – Data Science, Big Data, and Data Diversity


Practical Assignment: Bushfire Risk Analysis

– Assignment specification available in Canvas (Canvas: Modules –> Assignment)

– Worth 20% of the final grade in DATA2001/DATA2901

– Due on Friday of Week 12

– Python/SQL notebook; brief report; team demo in tutorials of Week 12/13

– Main idea:

– Calculate a 'risk' or 'impact' score per suburb in Greater Sydney with respect to bushfires

• Based on ABS data about population and bushfire prone areas in NSW

– Visualise and correlate with income data

– Goal: Practical experience with data variety, data analysis, and presentation

– Technologies as covered in this course: Python, Jupyter notebooks, Web APIs, and SQL

– Three tasks:

– Data import, integration and database generation

• We provide census data and spatial data from the NSW Rural Fire Service

• Both need to be loaded into a database and combined, e.g. via a spatial join

• Feel free to extend with own datasets

• Milestone 1: Integration of provided datasets to be ready in Week 11 tutes

– Bushfire Risk Analysis (Jupyter Notebook)

• Computation of a risk score per neighbourhood; an example formula is provided

• When adding other datasets, feel free to adjust formula

• Correlation analysis with the affluence of neighbourhoods

– Documentation and (brief) Report

– Additional tasks/options on web access and ML for teams in advanced stream
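
The second task can be sketched roughly as follows. This is only an illustration: the scoring formula (share of bushfire-prone land weighted by log population density), the field names, and all values are hypothetical placeholders, not the example formula from the assignment handout.

```python
import math

# Toy records; every name and value here is a hypothetical placeholder.
areas = [
    {"sa2": "A", "population": 12000, "land_km2": 8.0, "bfpl_ha": 40.0, "median_income": 52000},
    {"sa2": "B", "population": 3500, "land_km2": 25.0, "bfpl_ha": 900.0, "median_income": 61000},
    {"sa2": "C", "population": 20000, "land_km2": 5.0, "bfpl_ha": 5.0, "median_income": 48000},
    {"sa2": "D", "population": 800, "land_km2": 60.0, "bfpl_ha": 3000.0, "median_income": 70000},
]

def risk_score(area):
    """Hypothetical score: bushfire-prone share of land x log population density."""
    density = area["population"] / area["land_km2"]              # people per km^2
    prone_share = (area["bfpl_ha"] / 100.0) / area["land_km2"]   # 100 ha = 1 km^2
    return prone_share * math.log1p(density)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

scores = [risk_score(a) for a in areas]
incomes = [a["median_income"] for a in areas]
r = pearson(scores, incomes)

for area, score in zip(areas, scores):
    print(area["sa2"], round(score, 3))
print("correlation with income:", round(r, 3))
```

In the actual assignment the inputs would come from the integrated database rather than an inline list, and the formula would be the provided one or your adjusted variant.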


Provided Datasets (cf. Canvas)

– ABS Data

– Census data on neighbourhoods (SA2-level areas) in Greater Sydney+surrounds such as population, land area, number of dwellings

– Business statistics per SA2-area

– Income and rent statistics to check for correlation with the risk score

– NSW Rural Fire Service – Bush Fire Prone Land (BFPL)

– Locations and areas of bushfire prone land in NSW with 3 categories

– Note that SA2-level data from the ABS does not always match suburbs, and that the BFPL data is heavily simplified to just a GPS location and an area size; neither the ABS neighbourhoods nor the BFPL data contain actual shapes

– cf. tutorial this week on how to retrieve boundary data for neighbourhoods

– Adding more datasets from your side is explicitly encouraged.

– Try different types and forms, not just CSV…
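
As a small illustration of handling more than one data format, the sketch below reads a CSV table and a GeoJSON point collection using only the Python standard library. The inline strings stand in for real files, and all column names, property names, and values are hypothetical; the provided datasets may be laid out differently.

```python
import csv
import io
import json

# Inline stand-ins for files; column and property names are hypothetical.
census_csv = """area_id,area_name,population,dwellings
1001,Example North,12000,4300
1002,Example South,3500,1500
"""

bfpl_geojson = """{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [151.10, -33.70]},
     "properties": {"category": 1, "area_ha": 12.5}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [150.95, -33.85]},
     "properties": {"category": 2, "area_ha": 3.0}}
  ]
}"""

# CSV values arrive as strings, so numeric columns are converted explicitly.
census = [
    {**row, "population": int(row["population"]), "dwellings": int(row["dwellings"])}
    for row in csv.DictReader(io.StringIO(census_csv))
]

# GeoJSON stores point coordinates in (longitude, latitude) order.
points = []
for feature in json.loads(bfpl_geojson)["features"]:
    lon, lat = feature["geometry"]["coordinates"]
    points.append({"lon": lon, "lat": lat, **feature["properties"]})

print(census[0]["area_name"], census[0]["population"])
print(points[0])
```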


Assignment Rules

– Groupwork

– teams of 2 (unless odd-size class or other good reasons)

– All team members should be in the same tutorial

– Deliverables: Jupyter notebook with source code and a short report (PDF)

– See page 4 of the assignment handout

– Due on Friday of Week 12

– Submission page and marking rubric will be published in Canvas

– Only one member per team needs to submit for the whole group; they should submit both a ZIP archive under "Bushfire Risk Analysis Assignment" and the PDF of the team's report in the separate "TurnItIn Dropbox – Bushfire Risk Analysis"

– Late submissions: -20% of achieved mark per day late

– Demo in Weeks 12 and 13

– Each team gives a short demo to the tutors during the tutorials of the last two weeks

– Individual grades can be scaled based on participation in project or demo


Tip: PostGIS

– Spatial database extension for PostgreSQL supporting geographic objects (OGC)

– Geometry types for Points, LineStrings, Polygons, MultiPoints, etc.

• including import/export from standard formats such as GeoJSON or KML

– Support for spatial reference systems and transformations between them

– Spatial predicates on geometries using the 3x3 nine-intersection model

– Spatial operators for determining geospatial measurements like area, distance, length and perimeter, and geospatial set operations, like union, difference etc.

– R-Tree indexing (over GiST)

– Example:

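-- ST_MakePoint takes (x, y), i.e. (longitude, latitude); SRID 4326 is WGS84
INSERT INTO superhero VALUES ('Catwoman', ST_SetSRID(ST_MakePoint(-87.634, 41.87), 4326));

-- All superheroes whose location lies inside the polygon of the city 'Gotham'
SELECT superhero.name
  FROM city, superhero
 WHERE ST_Contains(city.geom, superhero.location)
   AND city.name = 'Gotham';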
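
To give an intuition for what a containment predicate such as ST_Contains decides, here is a minimal pure-Python sketch of the classic ray-casting point-in-polygon test. It is not how PostGIS implements the predicate, it ignores holes and boundary cases, and the coordinates and the second hero are made up for illustration.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: cast a ray from (x, y) to the right and count edge crossings.

    An odd number of crossings means the point is inside the simple polygon
    given by `ring`, a list of (x, y) vertices. Points exactly on the boundary
    are not handled specially.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical city outline and hero locations in arbitrary planar units.
gotham = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
heroes = {"Catwoman": (1.0, 1.0), "Batman": (5.0, 1.0)}

in_gotham = [name for name, (x, y) in heroes.items() if point_in_polygon(x, y, gotham)]
print(in_gotham)
```

In the assignment itself the database should do this work; a spatial index lets PostGIS avoid testing every point against every polygon.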


WGS84 versus Australian GDA94

– WGS84 is used by the GPS system

– The official geodetic datum (coordinate system) for Australia is GDA94 ("Geocentric Datum of Australia")

– Based on IERS Terrestrial Reference Frame (ITRF), but fixed to a number of reference points in Australia.

– ABS data will use GDA94

– Difference between WGS84 and GDA94:

– "The spheroids used for WGS84 and GDA94 are also almost identical, and both systems are geocentric. Thus for most mapping, exploration and GIS uses, WGS84 and GDA94 coordinates will be the same. […] For precise surveys, however, the difference between WGS84 and GDA94 may be significant, and changes slowly over time. […] The difference between GDA94 and WGS84 is approximately 45cms in 2000."
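
To put the quoted 45 cm into perspective, the following back-of-the-envelope sketch converts it into degrees of latitude. It assumes a spherical Earth with the mean radius given in the code, which is a simplification of both datums.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation (assumption)

# Length of one degree of latitude on that sphere, in metres.
metres_per_degree = math.pi * EARTH_RADIUS_M / 180.0

offset_m = 0.45  # WGS84 vs GDA94 difference quoted above (for the year 2000)
offset_deg = offset_m / metres_per_degree

print(f"one degree of latitude is about {metres_per_degree / 1000:.1f} km")
print(f"45 cm corresponds to about {offset_deg:.1e} degrees")
```

The offset is roughly four millionths of a degree, far below the resolution that matters when assigning bushfire-prone locations to neighbourhoods.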


OpenGIS Consortium (OGC) Data Model