
_impurity

entropy_impurity(y)

Calculates the entropy of a given node

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | 2d ndarray | array of y labels | required |

Returns:

| Type | Description |
| --- | --- |
| float | entropy impurity of a given node |

Source code in mlproject/decision_tree/_impurity.py
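import numpy as np


def entropy_impurity(y):
    """Calculates the entropy of a given node

    Parameters
    ----------
    y : 2d ndarray
        array of y labels

    Returns
    -------
    float
        entropy impurity of a given node
    """
    # small constant added inside the log so that classes with a zero count
    # do not evaluate log2(0)
    epsilon = 1e-07
    # flatten the array only because np.bincount expects a 1 dimensional array
    y = y.flatten()
    # number of occurrences of each integer label 0..max(y)
    counts = np.bincount(y)
    N = np.sum(counts)
    # fraction of the node's samples that fall in each class
    p = counts / N
    return np.sum(-p * np.log2(p + epsilon))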
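
As a worked illustration of the quantity above: entropy in bits is H = sum over classes k of p_k * log2(1 / p_k), where p_k is the fraction of labels in class k. The sketch below is a standalone variant, not the project's code. The name `entropy_no_epsilon` is hypothetical, and it assumes `y` holds non-negative integer labels. Instead of adding `epsilon`, it drops classes with a zero count, so a single-class node scores exactly 0.

```python
import numpy as np


def entropy_no_epsilon(y):
    """Shannon entropy in bits of integer labels (hypothetical helper)."""
    # np.bincount needs a 1-D array of non-negative integers
    counts = np.bincount(np.asarray(y).ravel())
    # keep only classes that actually occur, so the log never sees 0
    p = counts[counts > 0] / counts.sum()
    # p * log2(1 / p) is the per-class contribution to the entropy
    return float(np.sum(p * np.log2(1.0 / p)))


balanced = np.array([[0], [1], [0], [1]])  # two classes, 50/50 split
pure = np.array([[2], [2], [2]])           # one class; labels 0 and 1 never occur

print(entropy_no_epsilon(balanced))  # 1.0
print(entropy_no_epsilon(pure))      # 0.0
```

With the `epsilon` version documented above, the same single-class node evaluates to a value slightly below zero (about -1.4e-07), because log2(1 + epsilon) is slightly positive. That difference is usually negligible when impurities are only compared against each other.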

gini_impurity(y)

Calculates the gini impurity of a given node

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | 2d ndarray | array of y labels | required |

Returns:

| Type | Description |
| --- | --- |
| float | gini impurity score for the node |

Source code in mlproject/decision_tree/_impurity.py
import numpy as np


def gini_impurity(y):
    """Calculates the gini impurity of a given node

    Parameters
    ----------
    y : 2d ndarray
        array of y labels

    Returns
    -------
    float
        gini impurity score for the node
    """
    # flatten the array only because np.bincount expects a 1 dimensional array
    y = y.flatten()
    # number of occurrences of each integer label 0..max(y)
    counts = np.bincount(y)
    N = np.sum(counts)
    # fraction of the node's samples that fall in each class
    p = counts / N
    # 1 minus the sum of squared class fractions
    return 1 - np.sum(p**2)
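
To make the formula 1 - sum(p_k^2) concrete, the sketch below evaluates it on a few small label arrays. It is a standalone illustration rather than the project's code: the name `gini_sketch` is hypothetical, and `y` is assumed to contain non-negative integer labels, as `np.bincount` requires.

```python
import numpy as np


def gini_sketch(y):
    """Gini impurity 1 - sum(p_k^2) of integer labels (hypothetical helper)."""
    # np.bincount needs a 1-D array of non-negative integers
    counts = np.bincount(np.asarray(y).ravel())
    # fraction of samples in each class; zero-count classes contribute 0
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))


pure = np.array([[1], [1], [1], [1]])      # a single class
balanced = np.array([[0], [1], [0], [1]])  # two classes, 50/50 split
three_way = np.array([[0], [1], [2]])      # three equally frequent classes

print(gini_sketch(pure))       # 0.0
print(gini_sketch(balanced))   # 0.5
print(gini_sketch(three_way))  # close to 2/3
```

A single-class node scores 0, and K equally frequent classes give 1 - 1/K, which is the largest value the formula can take for K classes.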