Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

FIT5137 S2 2025 Assignment 3: PTV Assignment Scenario

(Weight = 35%)

Due date: Friday, 31 October 2025, 11:55 PM

General Information and Submission

●   This is an individual assignment.

●   Submission method: Submission is online through Moodle.

●   Penalty for late submission: 5% deduction for each day.

●   Assignment FAQ: There is an Assignment 3 FAQ page set up on the EdStem forum.

Assignment Background

 

You have been hired as a data analyst at Public Transport Victoria (PTV), the Victoria Government authority  responsible  for  public  transport  in  the  state.  Some  of your  duties  are  data  extraction, integration and analysis to provide good understanding regarding the public transportation condition in Victoria to the stakeholders.

After  the  COVID-19  restrictions  were  lifted,  most  companies  are  switching  the  workstyle  from work-from-home   to face-to-face. Therefore, transportation infrastructure and network is one of the most important aspects. While some people prefer to drive to work, some other people prefer to use the public transportation network as their main transportation mode. PTV as the sole provider for the public transportation network reduced their  services  during the  lockdown period. Now, PTV has restored the services to cover as many areas as possible in the whole region. However, some questions remained mysteries. How good is the current PTV coverage? Are there any uncovered spots? Which area has the best public transportation options?

Therefore, as a data analyst, your task is to evaluate the dataset and perform spatial analysis to assess bus accessibility specifically within the Greater Melbourne area (also referred to as the Melbourne Metropolitan area), and present your findings to the stakeholders at Public Transport Victoria (PTV). The data should be presented in an area level, such as municipality, suburbs or postcode. For example, you may present The number of bus services in Bundoora” .

PTV/GTFS dataset and Australian Boundary data

There are two datasets that you have to obtain in this assignment, which are the PTV/GTFS dataset and Australian Boundary data.

The  General  Transit  Feed  Specification  (GTFS)  is  a  data  specification  that  allows  public transit agencies to publish their transit data in a format that can be consumed by a wide variety of software applications. Today, the GTFS data format is used by thousands of public transport providers.

GTFS is  split into  a  schedule  component that  contains schedule, fare, and geographic transit information and a real-time component that contains arrival predictions, vehicle positions and service advisories. A GTFS feed is composed of a series of text files collected in a ZIP file. Each file models a particular aspect of transit information: stops, routes, trips, and other schedule data.

For more detailed information about GTFS, you can refer to the official documentation provided by  Google  athttps://developers.google.com/transit/gtfs.  Additionally,  You  can  read  further explanation   about   the   PTV-GTFS   data   fromhttps://transitfeeds.com/p/ptv/497  .    For   this assignment, we will be using the 17th March 2023 version of the dataset.

 

The GTFS data structure is shown below:

 

The Australian digital boundary is defined by the Australian Bureau of Statistics using the Australian Statistical Geography Standard (ASGS). The ASGS is a classification of Australia into a hierarchy of statistical areas. It is a social geography, developed to reflect the location of people and communities. It is used for the publication and analysis of official statistics and other data.  The  ASGS  is  updated  every  5  years  to  account  for  growth  and  change  in  Australia’s population, economy and infrastructure. For the 2021 release, the ASGS will be re-named to the Australian Statistical Geography Standard (ASGS) Edition 3.

The ASGS is split into two parts, the ABS and Non ABS Structures. The ABS Structures are geographies that the ABS designs  specifically  for the release and analysis of statistics. This means that the statistical areas are designed to meet the requirements of statistical collections as well as geographic concepts relevant to those statistics. This helps to ensure the confidentiality, accuracy and relevance of ABS data.The Non ABS Structures generally represent administrative regions which are not defined or maintained by the ABS, but for which the ABS is committed to directly providing a range of statistics.

The Main Structure is developed by the ABS and is used to release and analyse a broad range of social, demographic and economic statistics. It is a nested hierarchy of geographies, and each level directly aggregates to the next level. Mesh Blocks (MBs) are the smallest geographic areas defined by the ABS and form the building blocks for the larger regions of the ASGS. Most Mesh Blocks contain 30 to 60 dwellings.

Below is the simplified  ABS and Non ABS Structure. You can read further explanation about the structure here

https://www.abs.gov.au/statistics/standards/australian-statistical-geography-standard-asgs-edition -3/jul2021-jun2026#overview

 

The Digital boundary files that you have to get is the Mesh Blocks dataset. The Mesh Blocks dataset is available as Shape file. You can read further explanation about the Mesh Blocks dataset here

https://www.abs.gov.au/statistics/standards/australian-statistical-geography-standard-asgs-edition -3/jul2021-jun2026/access-and-downloads/digital-boundary-files

 

Allocation files are non-spatial representations of how each geography is aggregated from their building block geography. You can also read further explanation about the Allocation files dataset here

https://www.abs.gov.au/statistics/standards/australian-statistical-geography-standard-asgs-edition -3/jul2021-jun2026/access-and-downloads/allocation-files

 

Population information

According to the Australian Bureau of Statistics (ABS), the estimated entire population of Greater Melbourne was 4,917,750 as recorded in the 2021 Census (all persons). You can find more detailed information about the population by referring to the official sources.

●   https://www.abs.gov.au/census/find-census-data/quickstats/2021/2GMEL

●   If you  need  to  use  population  information,  please  refer  exclusively  to  the  information provided  above  (4,917,750).  Do  not  use  any  external  population  data  sources  for  this assignment.

Assignment Task list

Your assignment consists of several parts. Always read the instructions one by one. Do not move to the step without completing the previous step:

●  Task  1:  Data  Restoration  -  Restore the  data to the  database.  Monitor the success indicator to ensure successful restoration of the data.

●  Task 2: Data Preprocessing  - Perform necessary  structure maintenance and create result tables for further processing.

●  Task  3: Data Analytics and Visualization - Develop SQL queries to analyze the data and evaluate performance & Create visualizations to present the results of the data analytics.

   No data cleaning required for this assignment.

   For more information, see the FAQ for Assignment 3.

For  simplicity,  all the  data  required  for this  assignment is  readily  available  in the PostGIS Docker  container.  You  can  access  these  datasets  within  the  container  by  navigating  to  the /data/adata folder. If you don’t know how to do it, refer to the lab 10 activities.

 

Verify your data before the restoration process.

 

As a data analyst, it is your responsibility to understand and

explore these publicly available data.

Assignment Task

Task 1: Data Restoration

Before you can start the data analytic processes, the first thing you have to do is to restore the external data  to  your  database.  Make  sure  you  prepare  a  destination  schema  to  restore  your  data.  The destination schema for your assignment is “ptv”.

Note:

●   Before initiating the data restoration process, it is essential to thoroughly explore the dataset. This exploration involves identifying appropriate data types, determining field lengths, and making other relevant considerations that will inform the creation of the table structure.

   Ensure  that  you  restore  the  data  into  the  PTV  schema  using  regular  (local) tables. Do not utilise foreign tables, as the data must be stored directly within the PostgreSQL database.

   Ensure that all tables are successfully restored, including 8 tables from GTFS and 3 tables for MB_2021, LGA_2021 and SAL_2021 respectively.

The outputs ofthis task for Report are:

a)  Attach a screenshot of the results to include all the tables you restored in Task 1,

including the number of rows for each table you restored by using following code:

with tbl as

(select table_schema, TABLE_NAME from information_schema.tables

where table_schema in ('ptv'))

select table_schema, TABLE_NAME,

(xpath('/row/c/text()', query_to_xml(format('select count(*) as c from %I.%I', table_schema, TABLE_NAME), FALSE, TRUE, '')))[1]::text::int AS rows_n

from tbl

order by table_name;

Task 2: Data Preprocessing for Melbourne Metropolitan area

The purpose of this section is to manipulate the data into a suitable format for the following task analysis. This task has two parts: Mandatory requirement and Optional requirements.

Mandatory requirement

[You must meet the mandatory requirements described in this section.]

In this assignment, we aim to explore the BUS accessibility [Topic of report] of the Melbourne Metropolitan area exclusively [Scope of report]. The mb_2021 table contains mesh blocks for the entire country of Australia. To minimise query costs, ensure that you only use the mesh blocks within the Melbourne Metropolitan area for this assignment. The Melbourne Metropolitan’s mesh blocks can be identified from the gcc_name21. If the column contains “Greater Melbourne”, this mesh block is located in Melbourne Metropolitan.

   As a result, you need to create a table called "mb2021_mel" that contains ONLY the mesh blocks in Melbourne Metropolitan.

 

Moreover, the land area of each Mesh Block, measured in  square kilometers, is provided in the areasqkm21 column. For instance, the Mesh Block with ogc_fid 165709 has an area of 0.0677 km² .

 

The Stops table does not have any geometry column. You need to add a geometry column, using the latitude and longitude values available in the table. Make sure you use GDA2020 (SRID:7844) for this column.

And, the Stops table does not show direct information regarding the vehicle types, routes_short_name and  routes_long_name.   These   informations   are   stored   in  the  Routes  table.   To  explore  bus accessibility,  use  the  Routes  table  to   filter  for  records  where  the  route_type  equals  3,  which corresponds to bus services/stops. All other route types should be excluded from the analysis.

●   As a result, you need to create a table called "stops_routes_mel" to encompass the following attributes: stop_id, stop_name, coordinates, route number (derived from routes_short_name), route  name  (derived  from  routes_long_name),  and  vehicle  type.   This  data  set  should encompass all bus stops within the Melbourne Metropolitan area.

Use this figure as an example of expected result. (Note: Data value is for demonstration purposes only.)

 

Mesh  Blocks  reflect  the  dominant  land  use  where  possible   (see  Land  Use   section  for  more information). For example, residential areas are typically distinguished from commercial or industrial zones. In this assignment, we assume that each Mesh Block is associated with a single land use category for  simplicity. The mb_cat21 column is used to indicate the land use type of each Mesh Block.

 

。 For instance, the Mesh Block with ogc_fid 166,178 is classified as residential)

Focus area:

[You  are  encouraged  to  explore  and  manipulate  the  data  creatively,  provided  that  your  analysis focuses exclusively on bus accessibility within the Melbourne Metropolitan area. Additionally, you are required to address at least one of the specified perspectives in your analysis.]

Focus areas can be  selected based  on your  specific  data  analysis needs. Make  sure to include a detailed explanation of your rationale in the report for optional requirements you choose.

Question:

Am I required to include at least one of the perspectives listed below?

Answer:

Yes,  you  are  expected  to  explore  and  manipulate  the  data  from  one  or  more  of the provided perspectives, ensuring your analysis remains focused on bus accessibility within

the Melbourne Metropolitan area.


The following perspectives are provided to guide the data exploration and analysis of bus accessibility in the Melbourne Metropolitan area. You are required to choose at least one perspective for your assignment:

   Residential & Population Perspective

 Condition: mb_cat21 = 'Residential'

 Examines  how  bus   services  support  areas  based  on  the  above  population  data, focusing on accessibility for daily commuting and general mobility.

   Employment & Industrial Perspective

 Condition: mb_cat21 IN ('Industrial', 'Primary Production')

 Assesses how bus services support access to areas designated for industrial activity and primary production, emphasizing bus needs for workers and associated economic activities.

   Natural Environment Perspective

 Condition: mb_cat21 IN ('Parkland', 'Water')

 Analyzes bus accessibility to natural areas such as parklands and water bodies, with a focus on supporting recreation, environmental conservation, and tourism access.

   Community Services & Infrastructure Perspective

 Condition: mb_cat21 IN ('Commercial', 'Education', 'Hospital/Medical')

 Evaluates  how  bus  services  connect  people  to  essential  community  services  and infrastructure, including commercial areas, schools and medical facilities..

The outputs ofthis task for Report are:

b)  Attach  a  screenshot  of  SQL  script  for  creating  a  table    named  “mb2021_mel”  that contains ONLY the mesh blocks in Melbourne Metropolitan.

c)   Attach  a screenshot of SQL script for creating a table  named “stops_routes_mel” that contains only bus stops within the Melbourne metropolitan area.

d)  Provide  a  detailed  explanation  of  the   remaining  data  processing   steps  you  have conducted  for  focus  areas  sections,  including  screenshots  of the  SQL  scripts  and  the rationale behind your choices in the report.

Task 3: Data Analytics and Visualisation

In this  section, you  are required to perform data analysis on the restored tables, focusing on bus accessibility within Metropolitan Melbourne and the perspective(s) you have selected for your analysis. Use the techniques you have learned in the spatial database part to carry out your analysis and  ensure  that  your  analysis  relates  to  the  main  topic:  bus   accessibility  in   metropolitan Melbourne.

This could include exploring different  statistical measures or carrying out other relevant analyses. Present your findings clearly and concisely, demonstrating your understanding of the dataset and highlighting any notable observations or patterns.

As part of this data visualisation, you will also need to create at least one map-based heatmap using QGIS to present your findings related to the main topics. These visualisations will be used in the next section of the assignment, the summary report. To support your analysis, you can include screenshots of the visualisations directly in the report.

Be sure to include the script or code used for data analysis and data visualisation in the appendix of your report. The script should provide clear instructions on how the analysis was performed and any necessary calculations or transformations applied to the data. This will ensure that your analysis can be  reproduced  and  verified.  Remember  to  include  appropriate  labels,  titles,  and  legends  in  your visualisations to make them easy to understand. The visualisations should be of sufficient quality and clarity to effectively convey your analysis findings.


Note:

   Use SQL queries to investigate the restored tables.

●   Conduct a thorough descriptive analysis to uncover insights within the data.

●    Summarise and Visualise your findings clearly and concisely.

●   Highlight key observations and patterns discovered during the analysis.

   Ensure your findings reflect a deep understanding of the data.

The outputs ofthis task for Report are:

e)   Data analysis and visualisation, including the screenshot of SQL script and visualisation. For the visualisation, it must contain at least one map base figure.