Data-Led Executive Briefing

Due date: 13 January at 6pm GMT.

<Submission area>

A Jupyter notebook is to be submitted (in a zipped or gzipped format) at the start of Term 2. There are two parts to the notebook: an Executive Briefing and a Reproducible Analysis. The word limit for the Executive Briefing portion of the notebook is 2,500 words. There is no limit to the amount of code in the Reproducible Analysis portion of the notebook.

The Executive Briefing portion of the notebook will present an analysis of data from the Inside Airbnb website for London. Students may use data from more than one time period if they wish, but this is not required. It will be written as if to brief the Mayor of the Greater London Authority or the Chief Executive of an Investment Company on the challenges/opportunities relating to Airbnb's operations in London.

Students are free to select a narrower or different focus of their briefing but, for example, they may wish to develop the evidence either for/against the regulation of listings on Airbnb in London, or for/against investing in the Airbnb platform in London. The briefing document should reference existing policies, where relevant, and may make recommendations based on the analysis undertaken.

The Executive Briefing will begin with a short Executive Summary including Key Findings and, if desired, Recommendations; this will be followed by a brief review of the evidence both from London and elsewhere (similar to a Literature Review), and then an analysis supported by the data and presented using appropriately generated and selected tables, charts, and maps.

The Reproducible Analysis must be written in Python and may draw on concepts and methods covered in both Quantitative Methods and GIS. It should be possible to 'Restart Kernel and Run All' such that all graphs and tables used in the Markdown document are produced. For this reason, students are strongly encouraged to organise their notebook as follows (with the below sections as Level 2 Headers):

·    Front Matter: module name and number, your student id, the title of your briefing, and word count for the Executive Briefing.

·    Executive Summary: this does not need to include charts or tables and so does not require any code to have run. It does not count towards the word count but should not exceed approximately 1 page (formatted) in length.

·    Reproducible Analysis: all code, including data access, cleaning, and transformation. It should be possible for markers to select Kernel > Restart Kernel and Run All Cells... and be able to reproduce your entire analysis (e.g. data extraction, cleaning, transformation, clustering... charts, tables, etc.)

·    Executive Briefing: up to 2,500 words of analysis and supporting evidence (e.g. relevant literature and reviews of policy). Figures and tables should be generated by code cells included in the Briefing section. They may be generated by calls to functions defined earlier, or by inline Python code. Each figure or table counts for 250 words, and so students should give careful consideration to the trade-offs involved: more figures may serve to illustrate your points but leave you with much less space to synthesise and present an argument.

·    A figure with A/B elements will count as one figure, but only where the two parts are conceptually related (e.g. before/after; pre/post; non-spatial distribution/spatial distribution; type 1 and type 2; etc.). Figures with more than 2 elements will count as more than 1 figure. The only exception to this will be the output from PySAL's LISA analysis since that is formatted as 3 figures in one but they are all conceptually related. Similarly, Seaborn's jointplot would only be considered one plot even though it is technically three because the distribution plots in the margin are related to the scatter plot that is the focus of the plot.

·    In principle, a notebook with 10 figures would have no space for any writing or interpretation; this is deliberate because its purpose is to focus your attention on which charts and tables best communicate your findings. In practice, if you use A/B figure layouts you could include up to 20 separate figures before hitting the limit, though you would at this point be producing an infographic and not a briefing. Figures in the Reproducible Analysis will not be counted as part of your figure total, but you may not refer to them as part of your briefing. So you don't need to go through your reproducible code and delete the figures that you produced as part of your research process, but you shouldn't refer to them in the text either.
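The figure and word trade-off described above is simple arithmetic, and a minimal sketch of it follows. The function name `remaining_words` is hypothetical and used only for illustration; the 2,500-word limit and the 250-word cost per counted figure or table are taken from the brief.

```python
WORD_LIMIT = 2500        # Executive Briefing limit stated in the brief
WORDS_PER_FIGURE = 250   # each counted figure or table costs this many words


def remaining_words(n_counted_figures: int, words_written: int = 0) -> int:
    """Return the words still available once figures/tables and prose are counted.

    A negative result means the briefing is over the limit.
    """
    used = n_counted_figures * WORDS_PER_FIGURE + words_written
    return WORD_LIMIT - used


# Four counted figures/tables leave 1,500 words for prose.
print(remaining_words(4))    # 1500
# Ten counted figures/tables use up the whole allowance.
print(remaining_words(10))   # 0
```

Remember that an A/B figure with two conceptually related parts counts once, so `n_counted_figures` is the number of counted figures and tables, not the number of individual panels.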

The briefing may be written without substantially new modelling or coding by drawing on the code written in practicals to develop an analysis based on the judicious use of descriptive statistics (see, for instance, Housing and Inequality in London and The suburbanisation of poverty in British cities, 2004-16: extent, processes and nature), but it is likely that a better mark will be obtained by demonstrating the capacity to go beyond exactly what was taught by selectively deploying more advanced programming techniques.

The focus of this assessment is the student's ability to make use of concepts and methods covered in class as part of an analytical process to support decision-making in a non-academic context. It is not necessary that you employ every technique covered in class. It is necessary that you justify your choice of approach with reference to relevant academic and 'grey' literature, as well as the computational, statistical, and analytical objectives of your briefing paper. It is perfectly possible to complete this assessment without the use of advanced analytical topics (e.g. clustering, NLP, or global/local/LISA autocorrelation methods); however, it is unlikely that you would be able to complete this assessment to a high standard without some graphs and some maps chosen for their ability to advance your argument in place of 250 words of description or explanation.

Marking Scheme

The marking scheme for this submission has two parts:

1.   The Executive Briefing (60% of total mark for this submission) will be assessed as an essay incorporating analytical elements, with consideration given to the language, presentation, and content of the essay as befits a data-led briefing for a busy executive or policy-maker.

2.   The Reproducible Analysis will be assessed on the following criteria:

·    Reproducibility (20% of total mark): we are able to run the entire notebook without errors. Inability to reproduce the output of the notebook may affect our ability to evaluate the student submission in the other two areas.

·    Accuracy & Legibility (10% of total mark): the outputs of the notebook (figures and maps, primarily) used in the Executive Briefing are of a high quality in terms of clarity, colour, layout, fonts, labelling, etc.

·    Quality of Code (10% of total mark): a holistic view will be taken of the code in terms of its clarity, efficiency, and legibility.

Guidance for Notebook Submission

To simplify submission and replication of your work:

·    You should put the Reproducible Analysis first: this will enable us to run all of the analytical code needed to generate any figures or tables in your Executive Briefing. How you produce the figures and tables later in the notebook is up to you (e.g. saving temporary data to a local file and then reloading the data later, keeping the cleaned-up data ready for display in a temporary data frame, etc.)

·    Both the Reproducible Analysis and Executive Briefing sections should be level 1 headers (i.e. # Reproducible Analysis) so that they are easy to find in your notebook. They should be the only level 1 headers in your notebook. You should use level 2, 3, and 4 headers as needed to format your notebook and signpost to readers the structure of your submission.

·    We will assess reproducibility by selecting "Restart Kernel and Run All" using the sds:2020 Docker environment. If you have made use of another Docker image (e.g. sds:2020b) you must clearly signpost this at the start of your notebook so that we know to select a different image. We will not install libraries 'by-hand' in an ad hoc manner in order to test the reproducibility of your work.

·    It is also up to you to ensure that all relevant data sets are available via a valid URL: this could be a GitHub repo or a Dropbox link or some other resource. We may not be able to access resources placed on Chinese web servers, so please bear this in mind. As an alternative, small zipped data sets of up to 10MB each (50MB total) may be submitted along with your notebook.

·    You should zip up your notebook (and any zipped data sets) prior to submission and then submit this as a Zipfile (so any manually submitted data will be zipped up inside the Zip file) so that it is not corrupted by Moodle.

·    If you have used an Anaconda Environment and not Docker/Vagrant, then you must also include a 'dump' of your conda environment to assist us in replicating your work. Failure to include a conda environment file and our inability to reproduce your analysis will impact your grade. You can export a full copy of your working environment using the command: conda env export -n <env> > environment.yml (you would replace '<env>' with the name of your active environment and then include this file in the submitted Zip).
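One way to meet the guidance above on valid URLs and local temporary files is to download each data set once and reuse the local copy on later runs, so that 'Restart Kernel and Run All' works without repeated downloads. The following is a minimal sketch using only the Python standard library; the function name `cache_data`, the `data` directory, and the example URL are assumptions made for illustration and are not part of the brief.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve


def cache_data(url: str, dest_dir: str = "data") -> Path:
    """Download `url` into `dest_dir` unless a local copy already exists.

    Returns the path of the local file in either case.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Name the local copy after the last component of the URL path.
    local = dest / Path(urlparse(url).path).name
    if not local.exists():
        urlretrieve(url, str(local))
    return local


# Hypothetical usage (the URL is a placeholder, not a real resource):
# path = cache_data("https://example.org/data/listings.csv.gz")
```

The returned path can then be passed to whatever loading function you use. If the remote file changes, delete the local copy to force a fresh download.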

Models for the Executive Briefing

Although the following examples are all much longer than permitted under the assessment format, they are exemplary in their communication of the data and key findings in a manner that is clear, straightforward, and well-illustrated:

1.   Smith, D.A. (2010), Valuing housing and green spaces: Understanding local amenities, the built environment and house prices in London, GLA Economics; URL: https://www.london.gov.uk/sites/default/files/gla_migrate_files_destination/GLAE-wp-42.pdf.

2.   Travers, T., Sims, S. and Bosetti, N. (2016), Housing and Inequality in London, Centre for London; URL: https://www.centreforlondon.org/publication/housing-and-inequality-in-london/.

3.   Bivens, J. (2019), The economic costs and benefits of Airbnb, Economic Policy Institute; URL: https://www.epi.org/publication/the-economic-costs-and-benefits-of-airbnb-no-reason-for-local-policymakers-to-let-airbnb-bypass-tax-or-regulatory-obligations/.

4.   Wachsmuth, D., Chaney, D., Kerrigan, D., Shillolo, A. and Basalaev-Binder, R. (2018), The High Cost of Short-Term Rentals in New York City, Urban Politics and Governance research group, McGill University; URL: https://www.mcgill.ca/newsroom/files/newsroom/channels/attach/airbnb-report.pdf.

5.   Chapple, K. (2009), Mapping Susceptibility to Gentrification: The Early Warning Toolkit, Centre for Community Innovation; URL: https://communityinnovation.berkeley.edu/publications [A bit more 'academic' in tone but still intended to be very accessible to a lay-reader.]

Notice that they all follow a standard format that includes Key Findings, potentially some Recommendations, and 2 or more sections/chapters in which the evidence is developed. This format provides for more flexibility in style and presentation than a traditional essay (Introduction, Literature Review, Methodology, Results, Conclusion), though you will note that they all refer to a mix of academic and grey literature as well!

Some Possible Topics

These are indicative topics and you should feel free to strike out on your own if some other aspect of the topic and data interests you:

·    Impact of Airbnb on local area rental markets — this would require some assumptions about listings and lettings based on available data but as long as these are clearly stated this would be a strong approach; there are good examples of models used in other cities that it may be possible to draw on, or adapt to, London. You may want to consider things like the type of listing and the issues around the Short- and Long-Term Rental markets.

·    Impact of Airbnb on London's Tourism Economy — this would look at the distribution of London's tourism venues and, possibly, hotels alongside Airbnb listings in order to evaluate the extent to which tourism 'dollars' might be spent in ways that positively impact less tourist-oriented areas if we assume (again, detail the assumptions) that some percentage of a tourist's dollars are spent locally in an area. Again, there may be models developed elsewhere that could be adapted for the London context.

·    Opportunities and Risks arising from Covid-19 — it should/may be possible to assess the impact of Covid-19 on London's short- and long-term rental markets by looking at entry to/exit from the Airbnb marketplace by comparing more than one snapshot of London data. Again, this will require some reasonable assumptions to be drawn (are all flats withdrawn from Airbnb going back on to the Long-Term Rental Sector?) but these can be documented and justified.

·    Opportunities for Place- or Listing-Branding — identifying key terms and features/amenities used to market listings by area and using these to identify opportunities for investment or branding. This would benefit from the use of NLP approaches and, potentially, word embeddings to identify distinctive patterns of word use as well as, potentially, One-Hot encoding to identify specific amenities that appear associated in some way with particular areas.

·    The Challenge of Ghost Hotels — evaluating ways to automatically identify ghost hotels from the InsideAirbnb data and then, potentially, assessing their extent and impact on local areas where they dominate either 'proper' hotel provision or other types of listings. You will need to consider the way that Airbnb randomly shuffles listings to prevent exactly this type of application and textual similarity via NLP is an obvious application.

·    The Professionalisation of Airbnb — this could be treated either as a regulatory challenge (is Airbnb not benefiting locals) or an investment opportunity (is this a way to 'scale' or develop new service offers for small hosts) depending on your interests. You will need to consider the different types of hosts and evaluate ways of distinguishing between them (e.g. number of listings, spatial extent, etc.).

·    Impact Profiles — a geodemographic classification of London neighbourhoods based on how they have, or have not, been impacted by Airbnb. This would require you to think about how to develop a classification/clustering of London neighbourhoods and use data to develop 'pen portraits' of each so that policy-makers could better-understand the range of environments in which Airbnb operates and why a 1-size-fits-all regulatory approach may be insufficient. Again, this could be argued from either standpoint or even both simultaneously: these areas are already so heavily impacted that regulation is too little, too late, while these other areas are 'at risk'.
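To make the textual-similarity idea from the Ghost Hotels topic concrete, here is a minimal sketch that scores pairs of listing descriptions by the overlap of their word sets (Jaccard similarity). This is only one simple option and not a prescribed method: TF-IDF or word embeddings would likely be more robust. The function names, the 0.7 threshold, and the sample descriptions are all made up for illustration.

```python
import re
from itertools import combinations


def tokens(text: str) -> set:
    """Lower-case a description and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Share of distinct words that two texts have in common (0 to 1)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def similar_pairs(descriptions: dict, threshold: float = 0.7) -> list:
    """Return (id_a, id_b, score) for every pair scoring at or above `threshold`."""
    pairs = []
    for (id_a, text_a), (id_b, text_b) in combinations(descriptions.items(), 2):
        score = jaccard(text_a, text_b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return pairs


# Made-up descriptions keyed by a hypothetical listing id.
sample = {
    101: "Bright modern studio near the station, unit 1",
    102: "Bright modern studio near the station, unit 2",
    103: "Cosy family home with a large garden",
}
print(similar_pairs(sample))  # only the (101, 102) pair passes the threshold
```

Comparing every pair of listings grows quadratically with the number of listings, so for a full London snapshot you would probably want to compare only listings that are close together in space before scoring their text.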

You will also want to review the partial bibliography available here; this is by no means complete and you will likely find other relevant work 'out there' but you may find it useful for spurring your thinking on what to study and how to study it. You might also want to have a look at guidance for London:

·    KeyNest (2019), Understanding Airbnb regulations in London, KeyNest; URL: https://keynest.com/blog/airbnb-regulations-london

·    Airbnb (n.d.), I rent out my home in London. What short-term rental laws apply?, Airbnb; URL: https://www.airbnb.co.uk/help/article/1340/i-rent-out-my-home-in-london-what-shortterm-rental-laws-apply

·    Hostmaker (2018), Important Airbnb regulations and laws you should know about in London, Hostmaker; URL: https://hostmaker.com/blog/important-airbnb-regulations-country-laws-know-london/