Calculate Maximum Information Gain LeetCode Solution

Problem – Calculate Maximum Information Gain LeetCode Solution

Given a group of values, the entropy of the group is defined as follows:

Entropy = - sum over all distinct values x of P(x) * log2(P(x))

where P(x) is the probability of appearance for the value x.

e.g.

the input group:  [1, 1, 2, 2]

the probability of value 1 is:  2/4 = 1/2
the probability of value 2 is:  2/4 = 1/2

Therefore, its entropy can be obtained by:  - (1/2) * log2(1/2) - (1/2) * log2(1/2) = 1/2 + 1/2 = 1
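As a quick sanity check, here is a minimal Python sketch of that calculation (the helper name entropy is ours, not part of the problem statement):

from collections import Counter
from math import log2

def entropy(group):
    # Probability of each distinct value, then the standard entropy sum.
    n = len(group)
    counts = Counter(group)
    return -sum((c / n) * log2(c / n) for c in counts.values())

print(entropy([1, 1, 2, 2]))  # 1.0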

This exercise, however, aims to calculate the maximum information gain that one can obtain by splitting a group into two subgroups. The information gain is the difference in entropy before and after the split.

For a group L, if we divide it into subgroups {L1, L2}, then the information gain is calculated as follows:

InfoGain(L -> {L1, L2}) = Entropy(L) - ( |L1|/|L| * Entropy(L1) + |L2|/|L| * Entropy(L2) )

The overall entropy of the split subgroups {L1, L2} is the sum of the entropy of each subgroup, weighted by its proportion with respect to the original group.
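Continuing the sketch above (info_gain is a hypothetical helper, introduced here only for illustration), the weighted-entropy formula translates directly into code:

def info_gain(group, left, right):
    # Entropy of the whole group minus the weighted entropy after the split.
    n = len(group)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(group) - weighted

print(info_gain([1, 1, 2, 2], [1, 1], [2, 2]))  # 1.0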

Problem Description


In this exercise, one can expect a list of samples of Iris flowers. Each sample is represented as a tuple of two values: <petal_length, species>, where the first attribute is the measured length of the petal for the sample, and the second attribute indicates the species of the sample. Here is an example: [(0.5, 'setosa'), (1.0, 'setosa'), (1.5, 'versicolor'), (2.3, 'versicolor')].

The task is to split the sample list into two sublists, while complying with the following two conditions:

  1. The petal_length of any sample in one sublist is less than or equal to that of any sample in the other sublist.
  2. The information gain on the species attribute of the sublists is maximal among all possible splits.

As output, one should just return the information gain.

In addition, one can expect that each value of petal_length is unique.  

In the above example, the optimal split would be L1 = [(0.5, ‘setosa’), (1.0, ‘setosa’)] and L2 = [(1.5, ‘versicolor’), (2.3, ‘versicolor’)]. According to the above formulas, the maximum information gain for this example would be 1.0.
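Reusing the entropy and info_gain helpers sketched earlier, this optimal split checks out:

samples = [(0.5, 'setosa'), (1.0, 'setosa'), (1.5, 'versicolor'), (2.3, 'versicolor')]
labels = [sp for _, sp in samples]
L1, L2 = labels[:2], labels[2:]            # split after the second sample
print(info_gain(labels, L1, L2))           # 1.0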

Note:  For certain languages (e.g. Java), there is no built-in tuple type. As a result, to make the input more general, we decompose the input into two lists, [petal_length] and [species], where petal_length is a float value and species is a string. The elements of the petal_length list and the species list are associated with each other one-to-one by position.
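For example, in Python the two parallel lists can be paired back into tuples with zip (the variable names below are ours, chosen to match the example above):

petal_length = [0.5, 1.0, 1.5, 2.3]
species = ['setosa', 'setosa', 'versicolor', 'versicolor']
samples = list(zip(petal_length, species))   # [(0.5, 'setosa'), (1.0, 'setosa'), ...]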

Calculate Maximum Information Gain LeetCode Solution in Python

from typing import List
from math import log2
import collections
import math


class Solution:
    def calculateMaxInfoGain(self, petal_length: List[float], species: List[str]) -> float:
        if not petal_length or not species:
            return 0.0
        n = len(petal_length)
        if n < 2:
            # A single sample cannot be split, so there is nothing to gain.
            return 0.0
        # Sort samples by petal_length so that every split index gives a valid
        # "left values <= right values" partition.
        mp = sorted(zip(petal_length, species))
        petal_length, species = zip(*mp)
        Hs = math.inf
        for i in range(1, n):
            # Weighted entropy of the two sublists for this split point.
            H1 = self.calculateEntropy(species[:i])
            H2 = self.calculateEntropy(species[i:])
            Hs = min(Hs, H1 * i / n + H2 * (n - i) / n)
        Hl = self.calculateEntropy(species)
        return Hl - Hs

    def calculateEntropy(self, species: List[str]) -> float:
        size = len(species)
        probs = collections.defaultdict(int)
        for sp in species:
            probs[sp] += 1 / size
        if len(probs) <= 1:
            # A homogeneous group has zero entropy.
            return 0
        return -sum(p * log2(p) for p in probs.values())
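A quick usage check against the worked example from the problem statement:

sol = Solution()
gain = sol.calculateMaxInfoGain([0.5, 1.0, 1.5, 2.3],
                                ['setosa', 'setosa', 'versicolor', 'versicolor'])
print(gain)  # 1.0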

Calculate Maximum Information Gain LeetCode Solution in C++

class Solution {
public:
    // Entropy of a group of species labels: -sum(p * log2(p)).
    double calEntropy(vector<string>& species)
    {
        unordered_map<string, int> shown;
        double total = species.size();
        for(const string& specie : species)
        {
            ++shown[specie];
        }
        
        double entropy = 0.0;
        for(auto& n : shown)
        {
            double p = n.second / total;
            entropy -= p * log(p) / log(2.0);   // log2(p) via change of base
        }
        return entropy;
    }
    
    double calculateMaxInfoGain(vector<double>& petal_length, vector<string>& species) {
        // Avoid edge case
        if(petal_length.empty() || species.empty()) return 0.0;
        
        int s = petal_length.size();
        double all_entropy = calEntropy(species);
        double information_gain = 0.0;
        
        // Try every sample value as the split threshold (O(n^2) overall)
        for(int i = 0; i < s; ++i)
        {
            // Partition the species labels around the chosen threshold
            double split = petal_length[i];
            vector<string> l1;
            vector<string> l2;
            for(int j = 0; j < s; ++j)
            {
                if(petal_length[j] < split)
                {
                    l1.push_back(species[j]);
                }
                else
                {
                    l2.push_back(species[j]);
                }
            }
            
            // Information gain = entropy before split - weighted entropy after split
            double l1_entropy = calEntropy(l1);
            double l2_entropy = calEntropy(l2);
            double p1 = (double)l1.size() / s;
            double p2 = (double)l2.size() / s;
            information_gain = max(information_gain, all_entropy - l1_entropy * p1 - l2_entropy * p2);
        }
    
        return information_gain;
    }
};

Calculate Maximum Information Gain LeetCode Solution Review:

We suggest you work through this Calculate Maximum Information Gain LeetCode Solution and pick up some new skills from professionals, completely free, and we assure you it will be worth it.

If you get stuck on any coding problem, just visit Queslers to get the Calculate Maximum Information Gain LeetCode Solution.

Find on LeetCode

Conclusion:

I hope this Calculate Maximum Information Gain LeetCode Solution helps you learn something new from this problem. If it helped you, then don’t forget to bookmark our site for more Coding Solutions.

This problem is intended for audiences of all experience levels who are interested in learning about Data Science in a business context; there are no prerequisites.

Keep Learning!

More Coding Solutions >>

LeetCode Solutions

Hacker Rank Solutions

CodeChef Solutions
