helpers

`accuracy_score(y_true, y_pred, normalize=True)`

Calculate the accuracy score from a given array of true labels and a given array of predicted labels.

Inspired by https://stackoverflow.com/a/64680660

Parameters:

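Name	Type	Description	Default
`y_true`	`2d ndarray`	array of shape (n_samples, 1) of true labels	required
`y_pred`	`2d ndarray`	array of shape (n_samples, 1) of predicted labels	required
`normalize`	`bool`	if True, return the fraction of correctly classified samples; otherwise return the number of correctly classified samples	`True`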

Returns:

Name	Type	Description
`accuracy_scores`	`float`	fraction of correct predictions if normalize is True, otherwise the number of correct predictions

Raises:

Type	Description
`ValueError`	if y_true and y_pred are not of the same shape
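
The effect of `normalize` can be illustrated with a small, self-contained sketch. The function `accuracy_score_sketch` below is a hypothetical stand-in written for this example only (it is not the package's implementation), and the label arrays are made up.

```python
import numpy as np


def accuracy_score_sketch(y_true, y_pred, normalize=True):
    """Hypothetical stand-in for accuracy_score, written for this example only."""
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    correct = y_true == y_pred  # boolean array, True where the labels match
    return float(np.mean(correct)) if normalize else int(np.sum(correct))


# Column vectors of shape (n_samples, 1), as described in the parameter table
y_true = np.array([[0], [1], [2], [2]])
y_pred = np.array([[0], [1], [1], [2]])

fraction = accuracy_score_sketch(y_true, y_pred)               # 3 of 4 labels match
count = accuracy_score_sketch(y_true, y_pred, normalize=False)
print(fraction, count)  # 0.75 3
```

With `normalize=True` the result is a fraction between 0 and 1; with `normalize=False` it is the raw count of matching labels.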

Source code in mlproject/helpers/_metrics.py

import numpy as np


def accuracy_score(y_true, y_pred, normalize=True):
    """Calculate the accuracy score from a given array of true labels
    and a given array of predicted labels.

    Inspired by [https://stackoverflow.com/a/64680660](https://stackoverflow.com/a/64680660)

    Parameters
    ----------
    y_true : 2d ndarray
        array of shape (n_samples, 1) of true labels
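    y_pred : 2d ndarray
        array of shape (n_samples, 1) of predicted labels
    normalize : bool, default=True
        if True, return the fraction of correctly classified samples,
        otherwise return the number of correctly classified samples

    Returns
    -------
    accuracy_scores : float
        fraction of correct predictions if normalize is True,
        otherwise the number of correct predictions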

    Raises
    ------
    ValueError
        if y_true and y_pred are not of the same shape
    """

    # Reject inputs whose shapes differ in any dimension
    if y_true.shape != y_pred.shape:
        raise ValueError(
            f"Shape of y_true: {y_true.shape} and y_pred: {y_pred.shape} should be the same!"
        )

    # Element-wise comparison: True where the prediction matches the true label
    correct = y_true == y_pred
    if normalize:
        # Fraction of correctly classified samples
        return np.mean(correct)
    # Number of correctly classified samples
    return int(np.sum(correct))

`data_loader(raw=True, scaled=False, pca=False)`

Loads the fashion_mnist training and test data from the data directory.

The function returns four numpy arrays containing the training and test data respectively.

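If specified, it can instead return the standard-scaled version of the data or the first 10 principal components of the data. Exactly one of `raw`, `scaled` and `pca` must be `True`; because `raw` defaults to `True`, pass `raw=False` when requesting the scaled or PCA data, otherwise a `ValueError` is raised.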

The dimensions of the returned data are listed below:

|              | Raw | Scaled | PCA |
|:------------:|:---:|:------:|:---:|
| **Training** |     |        |     |
| \(X\) | \((10{,}000 \times 784)\) | \((10{,}000 \times 784)\) | \((10{,}000 \times 10)\) |
| \(Y\) | \((10{,}000 \times 1)\) | \((10{,}000 \times 1)\) | \((10{,}000 \times 1)\) |
| **Test** |     |        |     |
| \(X\) | \((5{,}000 \times 784)\) | \((5{,}000 \times 784)\) | \((5{,}000 \times 10)\) |
| \(Y\) | \((5{,}000 \times 1)\) | \((5{,}000 \times 1)\) | \((5{,}000 \times 1)\) |

Returns:

Type	Description
`2d ndarrays`	numpy data arrays in the order X_train, X_test, y_train, y_test.
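
The source of this function accepts only calls in which exactly one of the three flags is true. A minimal sketch of that check follows; `select_variant` is a hypothetical helper written for illustration and is not part of the package. It shows why passing `scaled=True` alone is rejected: `raw` is still `True` by default.

```python
def select_variant(raw=True, scaled=False, pca=False):
    """Hypothetical helper mirroring data_loader's flag checks; not part of the package."""
    if raw and not scaled and not pca:
        return "raw"
    if scaled and not raw and not pca:
        return "scaled"
    if pca and not raw and not scaled:
        return "pca"
    raise ValueError("Exactly one of raw, scaled and pca must be True.")


print(select_variant())                        # raw
print(select_variant(raw=False, scaled=True))  # scaled
print(select_variant(raw=False, pca=True))     # pca

try:
    select_variant(scaled=True)  # raw is still True by default
except ValueError as err:
    print("rejected:", err)
```

In practice this means a call such as `data_loader(raw=False, scaled=True)` is needed to obtain the scaled data.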

Source code in mlproject/helpers/_data_loader.py

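import numpy as np

# ROOT_DIR (the project root path) is assumed to be defined elsewhere in the package


def data_loader(raw=True, scaled=False, pca=False):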
    r"""Loads the fashion_mnist training and test data from the data directory.

    The function returns four numpy arrays containing the training and test data
    respectively.

    If specified, it can instead return the standard-scaled version of the data
    or the first 10 principal components of the data. Exactly one of raw, scaled
    and pca must be True; otherwise a ValueError is raised.

    The dimensions of the returned data are listed below:

    |              |           Raw            |          Scaled          |           PCA           |
    |:------------:|:------------------------:|:------------------------:|:-----------------------:|
    | **Training** |                          |                          |                         |
    |      $X$     | $(10{,}000 \times 784)$  | $(10{,}000 \times 784)$  | $(10{,}000 \times 10)$  |
    |      $Y$     |  $(10{,}000 \times 1)$   |  $(10{,}000 \times 1)$   |  $(10{,}000 \times 1)$  |
    |   **Test**   |                          |                          |                         |
    |      $X$     |  $(5{,}000 \times 784)$  |  $(5{,}000 \times 784)$  |  $(5{,}000 \times 10)$  |
    |      $Y$     |   $(5{,}000 \times 1)$   |   $(5{,}000 \times 1)$   |   $(5{,}000 \times 1)$  |

    Returns
    -------
    2d ndarrays
        numpy data arrays in the order X_train, X_test, y_train, y_test.
    """
    if raw and not scaled and not pca:
        X_train, y_train = np.hsplit(
            np.load(f"{ROOT_DIR}/data/fashion_train.npy"), [-1]
        )
        X_test, y_test = np.hsplit(np.load(f"{ROOT_DIR}/data/fashion_test.npy"), [-1])

    elif scaled and not raw and not pca:
        X_train, y_train = np.hsplit(
            np.load(f"{ROOT_DIR}/data/fashion_train_scaled.npy"), [-1]
        )
        X_test, y_test = np.hsplit(
            np.load(f"{ROOT_DIR}/data/fashion_test_scaled.npy"), [-1]
        )
        # converting the y_labels back to integers from floats to avoid issues
        y_train, y_test = y_train.astype(int), y_test.astype(int)

    elif pca and not raw and not scaled:
        X_train, y_train = np.hsplit(
            np.load(f"{ROOT_DIR}/data/fashion_train_pca.npy"), [-1]
        )
        X_test, y_test = np.hsplit(
            np.load(f"{ROOT_DIR}/data/fashion_test_pca.npy"), [-1]
        )
        # converting the y_labels back to integers from floats to avoid issues
        y_train, y_test = y_train.astype(int), y_test.astype(int)
    else:
        raise ValueError(
            "Exactly one of raw, scaled and pca must be True; the others must be False."
        )

    return X_train, X_test, y_train, y_test
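
The loader relies on `np.hsplit(..., [-1])` to separate the feature columns from the label, which sits in the last column of each stored array. The sketch below reproduces that step on a small synthetic array; the array contents and sizes are made up for illustration and do not come from the actual data files.

```python
import numpy as np

# Synthetic stand-in for a stored data file: 6 samples, 4 features, label in the last column
rng = np.random.default_rng(0)
features = rng.normal(size=(6, 4))
labels = (np.arange(6) % 3).astype(float).reshape(-1, 1)  # labels stored as floats
stored = np.hstack([features, labels])

# Split before the last column: X keeps every column but the last, y is the last column
X, y = np.hsplit(stored, [-1])
y = y.astype(int)  # convert the float labels back to integers

print(X.shape, y.shape)  # (6, 4) (6, 1)
```

Because the split index is `-1`, the same call works regardless of the number of feature columns, which is why one pattern can serve the raw, scaled and PCA files alike.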
