Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

Calculate Huffman Coding Length

In this problem, you are required to solve a problem related to huffman coding.

Recall: Huffman coding is a lossless data compression algorithm that assigns variable-length codes to characters based on their frequency of occurrence in the input. An efficient algorithm for generating it is related to the priority queue and covered in course slides.

The files attached to this problem are shown as follows. We provide a test to help you check your code.

1 .

2 └── attachments/

3    ├── tests/

4    │ └── test_huffman.cpp

5    └── huffman_calculator.hpp

NOTE:

1. You should NEVER use std::priority_queue , std::map , std::set , std::multimap , std::multiset or call heap related functions like std::make_heap() .

2. In the question, you should write a more optimal algorithm that brute-force Huffman tree building to pass all tests.

Requirements:

Your task is to complete the function size_t get_huffman_length(const std::vector> &data) in huffman_calculator.hpp .

Description

Assume that we have a sequence:

1   i a u e i y a a e a o o a a i e a

Count the occurrence of each character:

character             occurrence

a                         7

e                         3

i                          3

o                         2

u                         1

y                          1

Then we generate the huffman tree (a possible solution):

Hence, we can shorten the sequence into the following format (ignore the space of storing the huffman tree):

1      110010011111101000001110101101001101110

With Huffman Coding, the sequence can be shortened to length 39 (binary).

Parameter data

In this question, we help you to ignore the process of dealing the bunch of text .

Instead, we directly give you some pairs.

The first number (of type size_t , named occurrence) indicate a possible number of occurrence of the character.

Occurrence will always be a positive number.

The second number (of type size_t , named amount) indicate the number of characters satisfies the number of occurrences given by the first number.

Amount will always be a positive number.

In the example above, we have characters with occurrences (7, 3, 3, 2, 1, 1) .

So the data we pass to you in this example will be: [(7, 1), (3, 2), (2, 1), (1, 2)]

Return value

You need to return the length of the Huffman encoded sequence.

In this example, the return value is 39 .

Misc.

For your convenience, all the pairs we provide are guaranteed to have different occurrences.

We also guarantee that the sum of amount is larger than 1.

When iterating through the std::map , the order is NOT guaranteed to be sorted by any means.

Hint

The pair (10, 10) is somehow "equivalent" to (20, 5) .

You can calculate the return value in the process of building the huffman tree.