Homework 4
Due Date: 12/01/2019, 11:59PM

100 pts

Read the instructions carefully before starting the assignment. Make sure your code follows the stated guidelines to ensure full credit for your work.

Instructions:

- The work in this assignment must be completed alone and must be your own.

- Download the starter code file from the HW4 Assignment on Canvas. Do not change the function names or given started code on your script. Also, please do not change the string formats given! These will be useful for displaying the content in a convenient and consistent way.*

- You are responsible for debugging and testing your code with enough data, you can share testing code on Piazza.

- A doctest is provided as an example of code functionality. You are responsible for debugging and testing your code with enough data.

Goal:

In the Module 5 and 6 video lectures, we discussed the linked list and hash table data structure. This homework assignment will combine both concepts to create a (heavily) modified cache class. In a computer, a cache stores data so that any future accesses of that data can be completed very quickly. A practical example that you’ve probably noticed is your browser cache. This stores a lot of frequently used, useful information in your browser so it’s ready to access whenever you need it (frequent searches, passwords, usernames, etc.). If you’ve ever cleared your cache, you’d notice that all of this information is gone and needs to be repopulated). The implementation of the cache for this project will be very simplified, and ignore some of the features of a cache you might find online – so make sure to read this assignment document carefully.

Before jumping into some of the specifics, it’s important to understand how a cache is organized. Most caches on a computer are split into different levels. For this assignment, we will have three levels of caches, L1, L2, and L3. Usually, in a cache, if the computer is looking for data and it is unable to find it in one level, it’ll check another level to find it. A computer often keeps the most frequently used information in the L1 cache because it’s the fastest and first level it will check. We won’t be implementing this aspect of a cache for the project, but it’s useful information to understand how a cache works in general. The hierarchy of caches as different levels (L1, L2, L3) will be implemented as hash table in this assignment. In this implementation, you will create content and put it into either the L1, L2, or L3 cache based on one of the content’s attributes (more on this later).

At each level of the cache, you’ll have to actually implement a data structure to store the data. For this assignment, the cache at each level of the hierarchy will be implemented as a linked list.

Given this information, your cache can be visualized like this:

Cache Hash Table
Level
Cache Linked list
L1
Content -> content -> content -> …
L2
Content -> content -> content -> …
L3
Content -> content -> content -> …
Each level’s cache will have a size of 200. It cannot exceed this size. This is initialized for you in the code.
The content object has been created for you as well, and it stores the following information:
    • content_id
    • size
    • header
    • content
This is an example content object, formatted using the __str__ function given in the code: 
CONTENT ID: 1006
SIZE: 170
HEADER: Content-Type: 2
CONTENT:
In summary, you will take content objects and insert them into the appropriate cache (L1, L2, or L3). This directory of caches will be implemented using a hash table. To insert, evict, and find content within a cache you will need to implement it as a linked list. The next section will cover specifics about the implementations of the hash table and linked list.

The Cache Directory (Hash Table):

The class Cache() is the data structure you will use to store the three other caches (L1, L2, L3). It stores an array of 3 CacheLists, which are the linked list implementations of the cache (which will be explained later). Each CacheList has been already initialized to a size of 200. Do not change this. There are 3 functions you must implement yourself:

  • def hashFunc(self, contentHeader)
    • This function determines which cache level your content should go in. Create a hash function that sums the ASCII values of each content header, takes the modulus of that sum with the size of the Cache hash table, and accordingly assigns an index for that content – corresponding to the L1, L2, or L3 cache. So, for example, let’s assume the header “Content-Type: 2” sums to an ASCII value of 1334. 1334 % 3 = 2. So, that content would go in the L2 cache. You should notice a very obvious pattern in which content headers correspond to which caches. The hash function should be used by your insert and retrieveContent functions.
  • def insert(self, content, evictionPolicy)
    • Once a content object is created, call the cache directory’s insert function to insert it into the proper cache. This function should call the linked list’s implementation of the insert function to actually insert into the cache itself. The eviction policy will be a string – either ‘lru’ or ‘mru’. These eviction policies will be explained later.
    • This function should return a message including the attributes of the content object inserted into the cache. This is shown in the doctests.
  • def retrieveContent(self, content)
    • This function should take a content object, determine which level it is in, and then return the object if it is in the cache at that level. If not, it should return a message indicating that it was not found. This is known as a “cache miss”. Accordingly, finding content in a cache is known as a “cache hit”. The doctests show the format of the return statements.

The Cache Lists (Linked Lists):

As explained earlier, at each level within the cache directory, the actual cache is stored as a linked list object. You will use the CacheList() class to implement this linked list. Each cache has a maximum size of 200 (Please note that this does NOT mean there can be a maximum of 200 items in the cache. It means that the sum of the sizes of all content objects in a cache cannot exceed 200). Do not change this value. There are 5 functions you must implement yourself.

  • def put(self, content, evictionPolicy)
    • This function puts content into your cache. As explained earlier, caches should make information readily available to a user. Accordingly, you should always insert new content at the front of your list so it is easy to obtain.
    • You should be checking to make sure that your content is not bigger than the maximum size, in that case it should not be inserted.
    • Accordingly, you should also make sure there is enough space remaining in the cache to insert your content. If there is not enough space, you need to evict content currently in the cache. You must keep removing content until there is enough space to insert the new content (so long as the new content is not bigger than the maximum size). There are two ways to do this: LRU or MRU.
      • LRU: Least Recently Used
        • Remove the content that was last added to the list
      • MRU: Most Recently Used
        • Remove the content that was most recently added to the list
    • These eviction policies will be specified when the hashtable’s insert function is called. It will be passed as a string parameter ‘lru’ or ‘mru’.
  • def lruEvict(self)/def mruEvict(self)
    • As explained previously you will have to implement these specialized removal functions yourself to remove content from the list based on when it was used (in this case added).
    • This does not need to return a value.
  • def find(self)
    • This function should be able to find and return a content object from a cache. If it
      is not found, it should return None.
      is not found, it should return None.
  • def clear(self)
    • This function should remove every single item in a cache, and reset the cache’s size and number of items. When all items are removed, it should return a message indicating so. This is shown in the doctests.
As a general note, please keep all __str__ and __repr__ functions intact. They have been written to visualize the content in a consistent and easy to read way.
Doctests (please note that the does not need to show up in your code, it should just be a blank line)

>>> cache = Cache()
>>> content1 = ContentItem(1000, 10, "Content-Type: 0", "0xA")
>>> content2 = ContentItem(1003, 13, "Content-Type: 0", "0xD")
>>> content3 = ContentItem(1008, 242, "Content-Type: 0", "0xF2")
>>> content4 = ContentItem(1004, 50, "Content-Type: 1", "110010")
>>> content5 = ContentItem(1001, 51, "Content-Type: 1", "110011")
>>> content6 = ContentItem(1007, 155, "Content-Type: 1", "10011011")
>>> content7 = ContentItem(1005, 18, "Content-Type: 2", "

'CMPSC132'

")
>>> content8 = ContentItem(1002, 14, "Content-Type: 2", "

'PSU'

")
>>> content9 = ContentItem(1006, 170, "Content-Type: 2", "")
>>> cache.insert(content1, 'lru')
'INSERTED: CONTENT ID: 1000 SIZE: 10 HEADER: Content-Type: 0 CONTENT: 0xA'
>>> cache.insert(content2, 'lru')
'INSERTED: CONTENT ID: 1003 SIZE: 13 HEADER: Content-Type: 0 CONTENT: 0xD'
>>> cache.insert(content3, 'lru')
'Insertion not allowed. Content size is too large.'
>>> cache.insert(content4, 'lru')
'INSERTED: CONTENT ID: 1004 SIZE: 50 HEADER: Content-Type: 1 CONTENT: 110010'
>>> cache.insert(content5, 'lru')
'INSERTED: CONTENT ID: 1001 SIZE: 51 HEADER: Content-Type: 1 CONTENT: 110011'
>>> cache.insert(content6, 'lru')
'INSERTED: CONTENT ID: 1007 SIZE: 155 HEADER: Content-Type: 1 CONTENT: 10011011'
>>> cache.insert(content7, 'lru')
"INSERTED: CONTENT ID: 1005 SIZE: 18 HEADER: Content-Type: 2 CONTENT:

'CMPSC132'

"
>>> cache.insert(content8, 'lru')
"INSERTED: CONTENT ID: 1002 SIZE: 14 HEADER: Content-Type: 2 CONTENT:

'PSU'

"
>>> cache.insert(content9, 'lru')
"INSERTED: CONTENT ID: 1006 SIZE: 170 HEADER: Content-Type: 2 CONTENT:"
>>> cache
L1 CACHE:
REMAINING SPACE:177
ITEMS:2
LIST:
[CONTENT ID: 1003 SIZE: 13 HEADER: Content-Type: 0 CONTENT: 0xD]
[CONTENT ID: 1000 SIZE: 10 HEADER: Content-Type: 0 CONTENT: 0xA]
L2 CACHE:
REMAINING SPACE:45
ITEMS:1
LIST:
[CONTENT ID: 1007 SIZE: 155 HEADER: Content-Type: 1 CONTENT: 10011011]
L3 CACHE:
REMAINING SPACE:16
ITEMS:2
LIST:
[CONTENT ID: 1006 SIZE: 170 HEADER: Content-Type: 2 CONTENT:]
[CONTENT ID: 1002 SIZE: 14 HEADER: Content-Type: 2 CONTENT:

'PSU'

]
>>> cache.hierarchy[0].clear()
'Cleared cache!'
>>> cache.hierarchy[1].clear()
'Cleared cache!'
>>> cache.hierarchy[2].clear()
'Cleared cache!'
>>> cache
L1 CACHE:
REMAINING SPACE:200
ITEMS:0
LIST:
L2 CACHE:
REMAINING SPACE:200
ITEMS:0
LIST:
L3 CACHE:
REMAINING SPACE:200
ITEMS:0
LIST:
>>> cache.insert(content1, 'mru')
'INSERTED: CONTENT ID: 1000 SIZE: 10 HEADER: Content-Type: 0 CONTENT: 0xA'
>>> cache.insert(content2, 'mru')
'INSERTED: CONTENT ID: 1003 SIZE: 13 HEADER: Content-Type: 0 CONTENT: 0xD'
>>> cache.retrieveContent(content1)
CONTENT ID: 1000 SIZE: 10 HEADER: Content-Type: 0 CONTENT: 0xA
>>> cache.retrieveContent(content2)
CONTENT ID: 1003 SIZE: 13 HEADER: Content-Type: 0 CONTENT: 0xD
>>> cache.retrieveContent(content3)
'Cache miss!'
>>> cache.insert(content5, 'mru')
'INSERTED: CONTENT ID: 1001 SIZE: 51 HEADER: Content-Type: 1 CONTENT: 110011'
>>> cache.insert(content6, 'mru')
'INSERTED: CONTENT ID: 1007 SIZE: 155 HEADER: Content-Type: 1 CONTENT: 10011011'
>>> cache.insert(content4, 'mru')
'INSERTED: CONTENT ID: 1004 SIZE: 50 HEADER: Content-Type: 1 CONTENT: 110010'
>>> cache.insert(content7, 'mru')

"INSERTED: CONTENT ID: 1005 SIZE: 18 HEADER: Content-Type: 2 CONTENT:

'CMPSC132'

"
>>> cache.insert(content8, 'mru')
"INSERTED: CONTENT ID: 1002 SIZE: 14 HEADER: Content-Type: 2 CONTENT:

'PSU'

"
>>> cache.insert(content9, 'mru')
"INSERTED: CONTENT ID: 1006 SIZE: 170 HEADER: Content-Type: 2 CONTENT:"
>>> cache
L1 CACHE:
REMAINING SPACE:177
ITEMS:2
LIST:
[CONTENT ID: 1003 SIZE: 13 HEADER: Content-Type: 0 CONTENT: 0xD]
[CONTENT ID: 1000 SIZE: 10 HEADER: Content-Type: 0 CONTENT: 0xA]
L2 CACHE:
REMAINING SPACE:45
ITEMS:1
LIST:
[CONTENT ID: 1007 SIZE: 155 HEADER: Content-Type: 1 CONTENT: 10011011]
L3 CACHE:
REMAINING SPACE:12
ITEMS:2
LIST:
[CONTENT ID: 1006 SIZE: 170 HEADER: Content-Type: 2 CONTENT:]
[CONTENT ID: 1005 SIZE: 18 HEADER: Content-Type: 2 CONTENT:

'CMPSC132'

]

Deliverables

  • Submit to HW4 Gradescope assignment before the due date.