In this project, you are asked to design and implement a sample database system. The general requirements are as follows.

.     The system should support a data model, which can be relational (as in MySQL), JSON (as in Firebase and MongoDB), or any other model of your choice. Users of the system will structure their data using the model provided by the system.

.     The system should have its own query language, which should be different from existing query languages, including SQL and the query interfaces provided by Firebase and MongoDB. It is fine for the language to resemble natural language, e.g., “find employees who are at least 25 years old”.

.     The query language should support projection (selecting a subset of columns), filtering (selecting rows), join (e.g., combining multiple data tables), grouping, aggregation, and ordering.

.     The system should also provide commands for inserting, deleting, and updating data. These commands may resemble those in existing database systems.

.     You are free to decide how you store the data in a data model (e.g., you may store a table in a file), and how you implement the data modification commands above.

.     Your system should not load the entire database into the memory and process queries and data modifications on the entire database. Instead, you should assume that the database may potentially handle a large amount of data that might not fit in the main memory.

.     You should implement an interactive command-line interface (similar to MySQL, MongoDB, sftp clients, etc.) for users to interact with the system, issue commands, and get results. As an example, if your database is called MyDB, your interface may look like:

o MyDB > create table person(a int);

o MyDB > Table created.

o MyDB > insert into ...

o MyDB > find employees who are at least 25 years old

o MyDB > ...

o MyDB > exit

.     You should show how to create a database using your system to store multiple real-world datasets, and how the queries and data modifications work on those datasets.

.     Your dataset should be an existing real-world dataset available on the Web. For example, Kaggle, Google, etc. are good places to find such datasets.

o https://datasetsearch.research.google.com/

o https://www.kaggle.com/
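
The query-language and memory requirements above can be illustrated together with a small Python sketch. It is only one possible design, not a prescribed one: the phrase table `COMPARATORS`, the pattern `QUERY_RE`, the `<table>.csv` file layout, and the `age` column are all assumptions made for illustration. The sketch maps an English-like query onto a filter and streams the table from disk one row at a time, so the whole file is never held in memory.

```python
import csv
import os
import re

# Hypothetical phrase-to-operator table for an English-like query language.
COMPARATORS = {
    "at least": lambda a, b: a >= b,
    "at most": lambda a, b: a <= b,
    "exactly": lambda a, b: a == b,
}

# Assumed query shape: "find <table> who are <phrase> <number> years old".
QUERY_RE = re.compile(
    r"find (?P<table>\w+) who are (?P<op>at least|at most|exactly) "
    r"(?P<value>\d+) years old$"
)


def parse_query(text):
    """Turn an English-like query into (table, comparator, value)."""
    match = QUERY_RE.match(text.strip().rstrip(";").lower())
    if match is None:
        raise ValueError(f"cannot parse query: {text!r}")
    return match["table"], COMPARATORS[match["op"]], int(match["value"])


def run_query(text, data_dir="."):
    """Yield matching rows one at a time; the file is never read whole."""
    table, compare, value = parse_query(text)
    # Assumption: each table lives in <data_dir>/<table>.csv with an "age" column.
    path = os.path.join(data_dir, f"{table}.csv")
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):  # streams one row at a time
            if compare(int(row["age"]), value):
                yield row
```

Because `run_query` is a generator, a command-line loop can print each row as it arrives, for example `for row in run_query("find employees who are at least 25 years old"): print(row)`. A fuller language would need more patterns (projection, grouping, ordering), but the keyword-to-operation mapping would follow the same idea.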

You also need to design two database systems: (1) one for relational data and (2) one for NoSQL data, each using a different dataset. In addition, (3) you are expected to build a web application that demonstrates the functions of your database systems.

There is no restriction on programming languages that can be used for the project.

Notice:

.    Dataset Size:

"Your system should not load the entire database into the memory and process queries and data modifications on the entire database. Instead, you should assume that the database may potentially handle a large amount of data that might not fit in the main memory."

.     Your dataset should demonstrate the above guideline. Students are free to choose the dataset dimensions from real-world datasets. You should assume a memory budget for your project, and the dataset should be much larger than that budget. For example, if a student assumes that 4MB of memory is available for executing queries, then the dataset should not be smaller than 20MB. (This size is only an example; you may choose a different size for your data.) Even when implementing a query like "select * from All_Students", you should ensure that you do not load the entire dataset into memory.

.    Usage of Pandas:  The Pandas library can be used for this project, but only in a limited way. All query execution must be performed within the limited memory, without loading the entire dataset. Built-in Pandas operations cannot be used to carry out the corresponding query operations; for example, "join" cannot be used to perform a join. Join algorithms and MapReduce techniques will be taught later in the course and will give you an idea of how to implement join methods.

.    Storage System:  Students are free to design their storage system. A dataset can be stored as a single file or can be split into multiple files. Processing should be done in chunks.

.    Query Language:  The language should be different from existing query languages. Design a query language that resembles human conversation, with a set of keywords, and map those keywords to query-processing logic written in your programming language.

.    Web Application:  The web application should provide a user interface for your project. Users should be able to fetch results using buttons. For example, the UI could display the entire table and provide buttons to add, delete, or modify data. These buttons in the frontend should make API calls to your database backend. This is only one example of a UI design; students are free to showcase their creativity in designing the UI.
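
As an illustration of the Pandas and storage notes above, a join can be written without any library join by processing one table in blocks. The Python sketch below is a block nested-loop join that builds a small in-memory index for each block; it is one possible approach under stated assumptions, not the method the course will teach. The function names, the CSV storage format, and the `block_rows` parameter are hypothetical.

```python
import csv
from itertools import islice


def read_blocks(path, block_rows):
    """Yield lists of at most block_rows rows, so memory use stays bounded."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        while True:
            block = list(islice(reader, block_rows))
            if not block:
                return
            yield block


def block_nested_loop_join(left_path, right_path, key, block_rows=1000):
    """Yield joined rows; only one left block plus one right row is in memory."""
    for block in read_blocks(left_path, block_rows):
        # Index the current left block by join key (values compared as strings).
        index = {}
        for row in block:
            index.setdefault(row[key], []).append(row)
        # Stream the right file once per left block.
        with open(right_path, newline="") as handle:
            for right_row in csv.DictReader(handle):
                for left_row in index.get(right_row[key], []):
                    yield {**left_row, **right_row}
```

The right file is re-read once for every left block, so a larger `block_rows` means fewer passes at the cost of more memory. Results are yielded one row at a time and can be written to disk as an intermediate file instead of being collected in a list.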

1. Usage of built-in libraries is very limited. Built-in libraries should not be used for performing operations like join, sort, etc. For example, pandas has .join() and .merge(), which should not be used for performing those operations.

2. The logic for those operations needs to be implemented by you, and you should be able to demonstrate it in code when asked.

3. To reiterate, the key challenge of this project is memory management. You should assume that you have limited memory to perform operations. If your data size is 5MB and your assumed memory is 1MB, then your dataset should be divided into 5 chunks, and operations should be performed on those chunks.

4. The best approach for this project is to store the data as chunks rather than reading it in chunks using built-in methods like read_csv(chunksize=...), so that you have control over how the data is processed in chunks. You can also store intermediate results on disk and construct the final results by reading those intermediate results back from the disk.
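
Points 3 and 4 above can be sketched as an external merge sort in Python: the data is split into sorted chunk files on disk (the intermediate results), and the final result is built by merging those files while holding only one row per chunk in memory. This is a minimal sketch under assumptions, not a required design. The function names, the `runs` directory, and the CSV format are hypothetical, and the per-chunk sort uses Python's built-in `list.sort` only for brevity; a hand-written sort can be substituted if the project rules require it.

```python
import csv
import os


def write_sorted_runs(src_path, run_dir, key, chunk_rows):
    """Split src_path into sorted run files of at most chunk_rows rows each."""
    os.makedirs(run_dir, exist_ok=True)
    run_paths = []
    with open(src_path, newline="") as src:
        reader = csv.DictReader(src)
        fields = reader.fieldnames
        chunk = []

        def flush():
            # Built-in sort for brevity; values are compared as strings.
            chunk.sort(key=lambda row: row[key])
            path = os.path.join(run_dir, f"run_{len(run_paths)}.csv")
            with open(path, "w", newline="") as out:
                writer = csv.DictWriter(out, fieldnames=fields)
                writer.writeheader()
                writer.writerows(chunk)
            run_paths.append(path)
            chunk.clear()

        for row in reader:
            chunk.append(row)
            if len(chunk) == chunk_rows:
                flush()
        if chunk:
            flush()
    return run_paths, fields


def merge_runs(run_paths, fields, out_path, key):
    """K-way merge: hold one row per run and repeatedly emit the smallest."""
    handles = [open(path, newline="") for path in run_paths]
    try:
        readers = [csv.DictReader(handle) for handle in handles]
        heads = [next(reader, None) for reader in readers]
        with open(out_path, "w", newline="") as out:
            writer = csv.DictWriter(out, fieldnames=fields)
            writer.writeheader()
            while any(head is not None for head in heads):
                live = [i for i, head in enumerate(heads) if head is not None]
                smallest = min(live, key=lambda i: heads[i][key])
                writer.writerow(heads[smallest])
                heads[smallest] = next(readers[smallest], None)
    finally:
        for handle in handles:
            handle.close()


def external_sort(src_path, out_path, key, chunk_rows, run_dir="runs"):
    """Sort a CSV file that may not fit in memory, using run files on disk."""
    run_paths, fields = write_sorted_runs(src_path, run_dir, key, chunk_rows)
    merge_runs(run_paths, fields, out_path, key)
```

Here `chunk_rows` plays the role of the memory budget: with a 5MB dataset and room for roughly 1MB of rows, the first phase would produce about five run files. Numeric columns would need a conversion such as `int(row[key])` in both comparison keys, since CSV values are read as strings.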

Some teams are asking whether they are allowed to use certain libraries. We cannot fully assist them unless they explain to what extent they are using the library in their implementation. It is ultimately the students' responsibility to make sure that all the requirements above and in the guidelines are satisfied, even when built-in libraries are used.