Version: v1.0.0

ml.dataframes

`module` `ml.dataframes`

The dataframes module provides common operations for handling pandas DataFrames.

`function` `shuffle`

shuffle(df: DataFrame) → DataFrame

Shuffles the DataFrame and returns it

Args:

`df` (pd.DataFrame): The DataFrame that should have its records shuffled

Returns:

`pd.DataFrame`: The DataFrame that is shuffled
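
The behaviour described above can be approximated in plain pandas. The following is a minimal sketch, not the module's actual implementation: the name `shuffle_sketch` and the `seed` parameter are hypothetical and only serve to make the example reproducible.

```python
import pandas as pd


def shuffle_sketch(df: pd.DataFrame, seed=None) -> pd.DataFrame:
    """Hypothetical stand-in for `shuffle`, written with plain pandas."""
    # sample(frac=1) draws every row exactly once, in random order.
    return df.sample(frac=1, random_state=seed)


df = pd.DataFrame({"id": [1, 2, 3, 4], "label": ["a", "b", "a", "b"]})
shuffled = shuffle_sketch(df, seed=0)

# The same records are present; only their order may differ.
print(shuffled)
```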

`function` `one_hot_encode`

one_hot_encode(    df: DataFrame,    column_name: str,    drop_column: bool = True,    prefix: str = None) → DataFrame

Takes a categorical column and pivots the DataFrame to add a column (with value 0 or 1) for every category

Args:

`df` (pd.DataFrame): The DataFrame that contains the column to be encoded
`column_name` (str): The name of the column that contains the categorical values
`drop_column` (bool): Will remove the original column from the dataframe
`prefix` (str): The prefix of the new columns. By default the original column name will be taken

Returns:

`pd.DataFrame`: The DataFrame with the one hot encoded features
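
To illustrate what the parameters above do, here is a minimal sketch built on `pd.get_dummies`. It is an assumption about the behaviour, not the module's code: the function name `one_hot_encode_sketch` is hypothetical, and the `_` separator between prefix and category is the pandas default rather than something the documentation states.

```python
import pandas as pd


def one_hot_encode_sketch(df, column_name, drop_column=True, prefix=None):
    """Hypothetical stand-in for `one_hot_encode`, written with plain pandas."""
    # Fall back to the original column name when no prefix is given.
    prefix = column_name if prefix is None else prefix
    # One 0/1 column per category, named "<prefix>_<category>" (pandas default).
    dummies = pd.get_dummies(df[column_name], prefix=prefix, dtype=int)
    result = pd.concat([df, dummies], axis=1)
    if drop_column:
        result = result.drop(columns=[column_name])
    return result


df = pd.DataFrame({"color": ["red", "green", "red"], "size": [1, 2, 3]})
encoded = one_hot_encode_sketch(df, "color")
print(encoded)
```

With `drop_column=False`, the original `color` column would be kept alongside the new indicator columns.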

`function` `keep_numeric_features`

keep_numeric_features(df: DataFrame) → DataFrame

Takes the DataFrame and removes all non-numeric columns or features

Args:

`df` (pd.DataFrame): The DataFrame that should have its non-numerics removed

Returns:

`pd.DataFrame`: The DataFrame with only the numeric features
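
A comparable result can be obtained in plain pandas with `select_dtypes`. This is a minimal sketch under the assumption that "numeric" means integer and float dtypes; the name `keep_numeric_sketch` is hypothetical and the module may treat edge cases such as booleans differently.

```python
import pandas as pd


def keep_numeric_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for `keep_numeric_features`."""
    # include="number" keeps integer and float columns and drops the rest.
    return df.select_dtypes(include="number")


df = pd.DataFrame(
    {"age": [31, 45], "name": ["Ann", "Bob"], "score": [0.5, 0.9]}
)
numeric_only = keep_numeric_sketch(df)
print(numeric_only.columns.tolist())
```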

`function` `plot_features`

plot_features(    df: DataFrame,    column_names: np.array = None,    grid_shape=None,    fig_size=None)

Plots the distribution of the relevant columns of a DataFrame

Args:

`df` (pd.DataFrame): The DataFrame whose columns should be plotted
`column_names` (np.array): The columns that should be plotted. If None, all numeric columns will be taken
`grid_shape` (int, int): The shape of the plotting grid (rows, cols). If None, the grid will have at most 5 columns
`fig_size` (int, int): The size of the full plotting grid. If None, auto size will be applied

Returns:

`figure, axes (tuple)`: The figure of the plot and the axes of the plot will be returned for further tuning where needed
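
The documentation only states that the default grid has at most 5 columns. One plausible way to derive a `(rows, cols)` shape from the number of plotted features is sketched below; the helper `default_grid_shape` and its `max_cols` parameter are hypothetical and may not match how the module computes the grid.

```python
import math


def default_grid_shape(n_features: int, max_cols: int = 5):
    """Hypothetical helper: pick (rows, cols) with at most `max_cols` columns."""
    if n_features < 1:
        raise ValueError("at least one feature is needed to build a grid")
    # Use fewer columns when there are fewer features than max_cols.
    cols = min(n_features, max_cols)
    # Enough rows to fit every feature, rounding up for a partial last row.
    rows = math.ceil(n_features / cols)
    return rows, cols


shape_small = default_grid_shape(3)
shape_large = default_grid_shape(12)
print(shape_small, shape_large)
```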

`function` `to_timeseries`

to_timeseries(df: DataFrame, time_column: str) → DataFrame

This function is deprecated. Use the `timeseries.set_timeseries` function instead

`function` `distribute_class`

distribute_class(    df: DataFrame,    class_column: str,    class_size: int = None,    shuffle_result: bool = True)

Returns a DataFrame with an equal class distribution. For every class, the same number of samples is taken. The class size is the minimum of the passed `class_size` parameter and the size of the smallest class in the DataFrame.

Args:

`df` (pd.DataFrame): the DataFrame that contains all records
`class_column` (str): the name of the column that contains the class feature
`class_size` (int): the number of samples to take per class. Defaults to the size of the smallest class
`shuffle_result` (bool): indicates whether the DataFrame should be shuffled before returning. Defaults to True

Returns:

`pd.DataFrame`: the DataFrame that contains the records with the equal class distribution
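
The balancing rule described above (take the minimum of `class_size` and the smallest class, then sample that many records per class) can be sketched in plain pandas as follows. This is an illustration under stated assumptions, not the module's implementation: the name `distribute_class_sketch` and the `seed` parameter are hypothetical.

```python
import pandas as pd


def distribute_class_sketch(
    df, class_column, class_size=None, shuffle_result=True, seed=None
):
    """Hypothetical stand-in for `distribute_class`, written with plain pandas."""
    # Size of the smallest class present in the data.
    smallest = int(df[class_column].value_counts().min())
    # Never ask for more samples than the smallest class can provide.
    n = smallest if class_size is None else min(class_size, smallest)
    # Draw n records from every class without replacement.
    parts = [
        group.sample(n=n, random_state=seed)
        for _, group in df.groupby(class_column)
    ]
    result = pd.concat(parts)
    if shuffle_result:
        # Mix the classes so they do not appear in blocks.
        result = result.sample(frac=1, random_state=seed)
    return result


df = pd.DataFrame(
    {"label": ["a"] * 5 + ["b"] * 2 + ["c"] * 3, "value": list(range(10))}
)
balanced = distribute_class_sketch(df, "label", seed=0)
print(balanced["label"].value_counts().to_dict())
```

In this example the smallest class (`b`) has two records, so every class is reduced to two records.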

This file was automatically generated via lazydocs.
