ST207 - Databases: Project (MT2022)

Overall objective

This project is intended to i) assess your overall knowledge related to Databases, specifically the concepts and techniques discussed during our lectures and seminars, ii) give you the opportunity to design an entire database application on a topic/scenario of your choice, and iii) allow you to work as a data science team.

The overall objective is to design a database application with a focus on non-standard data, such as multimedia, audio, video, spatial, streaming, multimodal or NoSQL data.

Instructions

1. GROUPS: This is a workgroup project. As such, the group is expected to design a solid solution for a particular application or scenario. Everyone in the group is expected to engage and contribute to the final solution. Forming groups: if you need to find someone else to form a group with you, check our Excel sheet and contact your prospective teammates.

2. TOPIC: Choose a topic or scenario for which you want to design a database application based on non-standard data (i.e., multimedia, audio, video, spatial, streaming, multimodal or NoSQL data). See the References section for some ideas. Although this is not a classical relational database, make sure your topic/scenario has a good number of entities (objects from the real world) and relationships that you can map into your database application. The main point for you is to assess how easy/hard it is to design a database application for the chosen topic/scenario given i) your knowledge about the context, ii) the set of queries and other operations you can perform over the proposed database, and iii) available data (see item 3).

3. DATA: Make sure you can identify a consistent set of real data to use in your application. You can also generate synthetic data in case you don't find real data. Also, you can use any existing dataset(s) and import these data into your database application. Make sure you have a clear understanding of the data and that the data is of good quality/completeness. No need for playing with big/huge data, but make sure you have a good amount of data for each entity/object that allows for relevant queries and update operations.

4. DATA MODELLING: Make sure to clearly describe all entities and relationships from the chosen topic/scenario, as well as any constraints and business rules that are relevant to your application. For instance, if you decide to design a multimedia database, make sure to describe all data items (such as audio, video or other files) and their related attributes, how these data items relate to each other (for instance, if a given audio is linked to a video, how this link is expressed in the database) and all constraints and other aspects of your database application. If feasible, you can design an Entity-Relationship (ER) model capturing all the entities and relationships, primary and foreign keys, single and multivalued attributes, weak entities, partial and total relationships, and any other important aspect for understanding the context of your database application. You can choose any data modelling tool. If you opt for a NoSQL database, make sure you provide a conceptual description of the topic, the entities/objects and corresponding structure/attributes, any relationships among the objects, and any other specific aspects you are considering in your model. This conceptual description can be a diagram or text clearly presenting the required information.

5. DATABASE CREATION/DESIGN: Make sure to describe how the entities are created in the chosen database application, and how the data is populated into the database. You should demonstrate any commands used for defining entities/objects and their attributes, as well as for loading data into those entities/objects. If feasible, you can create any indexes, triggers and views as needed by the application rules of your topic/scenario. In case of a NoSQL database, specify all the creation commands needed for mapping your conceptual model into a set of objects (e.g., documents, graphs, etc.). This step should cover all the database definition tasks.

6. DATABASE USAGE: Based on the chosen topic/scenario, specify a set of queries and update operations. Make sure to cover all main operations (use cases) of the given database, such as i) retrieving general information from an entity/object, ii) filtering entities/objects by specific filters/conditions, iii) aggregating/summarising data based on aggregation operators, iv) joining entities/objects to address more complex queries, v) any update or delete operations. Also, make sure to consider whether you can use aggregation operations, subqueries, and any other type of more elaborate queries. You must provide a consistent textual description of each query in terms of i) what the query or update operation is about, ii) input parameters and conditions for filtering/matching data, and iii) expected outputs. The SQL/NoSQL code alone is not sufficient; you must provide the textual description explaining all the database operations.

7. DATABASE TECHNOLOGY: Feel free to play with any database software/tool and/or programming library. Make sure you justify your choice based on the chosen topic/scenario, proposed database operations, and available data. Make sure to provide all code for your database application along with instructions for reproducibility.
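As an illustration only of the kind of commands items 5 and 6 ask for, here is a minimal sketch in Python using the standard-library sqlite3 module. It assumes a hypothetical multimedia catalogue in which each audio track is linked to one video; the table, column, index, view and trigger names, the file paths and the synthetic rows are all made up for this example and are not part of the assignment. Your own project may use a completely different technology and model.

```python
import random
import sqlite3

random.seed(207)  # fixed seed so the synthetic durations are reproducible

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Item 5: define entities, an index, a trigger and a view (all names are hypothetical).
conn.executescript("""
CREATE TABLE video (
    video_id   INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    duration_s INTEGER NOT NULL CHECK (duration_s > 0)
);
CREATE TABLE audio (
    audio_id  INTEGER PRIMARY KEY,
    video_id  INTEGER NOT NULL REFERENCES video(video_id) ON DELETE CASCADE,
    language  TEXT NOT NULL,
    file_path TEXT NOT NULL
);
CREATE INDEX idx_audio_video ON audio(video_id);
CREATE TABLE audit_log (entry TEXT NOT NULL);
CREATE TRIGGER trg_video_delete AFTER DELETE ON video
BEGIN
    INSERT INTO audit_log (entry) VALUES ('deleted video ' || OLD.video_id);
END;
CREATE VIEW video_audio_count AS
SELECT v.video_id, v.title, COUNT(a.audio_id) AS n_tracks
FROM video v LEFT JOIN audio a ON a.video_id = v.video_id
GROUP BY v.video_id, v.title;
""")

# Item 5 (continued): load synthetic data. Odd-numbered videos get two
# audio tracks (en, pt); even-numbered videos get one (en).
videos = [(i, f"clip_{i}", random.randint(30, 600)) for i in range(1, 6)]
conn.executemany("INSERT INTO video VALUES (?, ?, ?)", videos)
tracks = [
    (vid, lang, f"audio/{vid}_{lang}.wav")
    for vid, _, _ in videos
    for lang in ("en", "pt")[: 1 + vid % 2]
]
conn.executemany(
    "INSERT INTO audio (video_id, language, file_path) VALUES (?, ?, ?)", tracks
)

# Item 6 ii): filter videos by a condition (input parameter: minimum duration).
long_videos = conn.execute(
    "SELECT title FROM video WHERE duration_s > ? ORDER BY video_id", (120,)
).fetchall()

# Item 6 iii): aggregate the number of audio tracks per language.
per_language = conn.execute(
    "SELECT language, COUNT(*) FROM audio GROUP BY language ORDER BY language"
).fetchall()

# Item 6 v): update one row, then delete one row. The delete cascades to the
# linked audio track and fires the audit trigger.
conn.execute("UPDATE video SET title = ? WHERE video_id = ?", ("intro_lecture", 1))
conn.execute("DELETE FROM video WHERE video_id = ?", (2,))
conn.commit()

# Item 6 iv): the join is wrapped in the view defined above.
tracks_per_video = conn.execute(
    "SELECT title, n_tracks FROM video_audio_count ORDER BY video_id"
).fetchall()
remaining_tracks = conn.execute("SELECT COUNT(*) FROM audio").fetchone()[0]
audit = conn.execute("SELECT entry FROM audit_log").fetchall()
```

In a report, each of these statements would still need the textual description required by item 6: what the operation is about, its input parameters and conditions, and its expected output.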

Deliverables

Your solution MUST contain:

a PDF document with i) LSE candidate numbers (LSE student IDs are fine in case you don't have your candidate numbers, but don't put your names), ii) description of the chosen topic/scenario (based on item 2 above), iii) description of your data (based on item 3 above), iv) justification of the database technology/tool (item 7) with necessary instructions for reproducibility (for instance, whether an account is necessary, download/installation of software, and other configuration steps), v) the outputs of your data modelling step (any ER/EER diagram or conceptual description of your NoSQL model), and vi) textual description of all the operations (queries and updates) with the corresponding outputs for each command.

any code files developed in your project (e.g., SQL commands, scripts, etc.).

any dataset or synthetic data used in your project.

make sure to upload all your solution files and data to the provided GitHub repository. In case of storage/space limitations, please provide links to alternative repositories where the data can be accessed.

Important dates

Assignment released: 09/12/2022

Submission of group/topic information (via Excel sheet) and project approval: 16/12/2022, 12 pm (London) firm deadline

Submission of solution: 09/01/2023, 5 pm (London) firm deadline

Feedback and grade (provisional): 27/02/2023, 10 pm (London)

Marking criteria

This assignment is worth 60% of the final grade.

IMPORTANT: according to the School policy, you must submit an answer to this assignment; otherwise, you will be graded 0 (zero).

Problem breakdown (max marks in square brackets)

(2) Topic/scenario [15] - relevance and complexity of the topic/scenario in terms of entities/objects, relationships, and usage operations. Clear description of the topic/scenario.

(3) Data [10] - data consistency and quality. Usage of real data and any criteria for subsampling. Generation of synthetic data and how it mimics a real scenario. Good description of the data. Amount of data used versus database usage operations.

(4) Database modelling [10] - model clarity and consistency (how close to the real scenario it is). Complete description of all entities/objects, relationships, keys and constraints.

(5) Database creation [15] - complete and correct set of commands for materialising the database model into a set of relations, documents, or nodes, and associated relationships. Use of indexes, views, and triggers according to the application rules (chosen topic/scenario).

(6) Database usage [30] - relevant and consistent set of queries and update operations. Rationale behind each database operation and to what extent the provided query/update commands have explored the available data. Usage of aggregation, subquery, join and other complex query/update structures. If available in the database, good exploration of indexes, views, and triggers. Clear documentation of all outputs.

(7) Database technology [10] - justification, adherence, and technical complexity involved in its use. Clear instructions for reproducibility purposes.

Documentation [10] - quality of the PDF report, code organisation and documentation.

TOTAL [100]

Feedback and grade

To be provided after your submission.

References

Wikipedia: Database application - some example databases.

10 Database Examples in Real Life

Database Applications Types and Examples

15 Database Software Examples 2022

Oracle: Application Express App Builder User's Guide

Medium: 10 Best Database Design Practices

Learn Com