Feature Modules

General Preprocessing

preprocess_trends

features.preprocessing.preprocess_trends(trends, points, device, feature_cfg)

Takes trend slopes and durations together with the corresponding time series data, and returns training, validation, and test sets sampled from that data

Parameters:

trends (tensor) – Data to extract trends from
points (tensor) – Tensor of data points
device (object) – Device to store data on
feature_cfg (dict) – Hyperparameters

Returns:

SplitData object containing the train, validation, and test splits of trend data

train_valid_test_split

features.preprocessing.train_valid_test_split(X, y=None, props=None, device=None)

Separates data into train, validation, and test sets

Parameters:

X (tensor) – Feature data
y (tensor) – Target data
props (list[int]) – Train/Val split proportions (Test split is inferred from these numbers)
device (object) – Device to store data on

Returns:

SplitData object containing train/val/test splits of the data
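The proportion-based split can be illustrated with a minimal sketch in plain Python. It is not the library's implementation: the helper name split_by_proportions is hypothetical, the props values are assumed to be fractions of the data, and the rows are assumed to be split in order without shuffling.

```python
def split_by_proportions(rows, props=(0.5, 0.25)):
    """Split rows into train/validation/test lists, preserving order.

    props holds the train and validation fractions (an assumption of this
    sketch); the test split is whatever remains after those two.
    """
    train_frac, valid_frac = props
    if train_frac < 0 or valid_frac < 0 or train_frac + valid_frac > 1:
        raise ValueError("props must be non-negative and sum to at most 1")
    n = len(rows)
    train_end = int(n * train_frac)
    valid_end = train_end + int(n * valid_frac)
    return rows[:train_end], rows[train_end:valid_end], rows[valid_end:]


train, valid, test = split_by_proportions(list(range(8)), props=(0.5, 0.25))
print(train, valid, test)  # [0, 1, 2, 3] [4, 5] [6, 7]
```

Keeping the rows in order is a common choice for time series, since shuffling would let later observations leak into the training set.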

convert_data_points

features.preprocessing.convert_data_points(data)

Converts dataframe of trend data to tensors and pads time series data

Parameters: data (DataFrame) – Dataframe containing the trend durations, slopes, and corresponding time series data point for the sequence
Returns: Tensors containing trend duration and slope and padded corresponding time series data points

pad_data

features.preprocessing.pad_data(data)

Pads rows of time series data with zeros to match the longest row

Parameters: data (DataFrame) – Data to pad
Returns: Pandas dataframe containing the padded data
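As an illustration of the padding step, the sketch below right-pads ragged rows with zeros using plain Python lists. The helper name pad_rows is hypothetical, and padding on the right is an assumption; the library itself works on a DataFrame.

```python
def pad_rows(rows, fill=0.0):
    """Right-pad every row with `fill` so all rows match the longest one."""
    width = max((len(row) for row in rows), default=0)
    return [list(row) + [fill] * (width - len(row)) for row in rows]


padded = pad_rows([[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]])
print(padded)  # [[1.0, 2.0, 3.0], [4.0, 0.0, 0.0], [5.0, 6.0, 0.0]]
```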

extract_data

features.preprocessing.extract_data(data, num_input, num_output)

Creates input sequences of num_input data points, each paired with the next num_output data points to predict

Parameters:

data (tensor) – Data to split into subsequences
num_input (int) – Number of data points in each input sequence
num_output (int) – Number of data points in each output sequence

Returns:

Two tensors containing the input and output data
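The windowing idea can be sketched as follows. This is an illustration only: the helper name make_windows is hypothetical, the windows are assumed to advance one step at a time, and plain lists stand in for tensors.

```python
def make_windows(series, num_input, num_output):
    """Pair each run of num_input points with the num_output points after it.

    A stride of 1 between consecutive windows is an assumption of this sketch.
    """
    if num_input < 1 or num_output < 1:
        raise ValueError("num_input and num_output must be positive")
    inputs, outputs = [], []
    last_start = len(series) - num_input - num_output
    for start in range(last_start + 1):
        split = start + num_input
        inputs.append(series[start:split])
        outputs.append(series[split:split + num_output])
    return inputs, outputs


X, y = make_windows([1, 2, 3, 4, 5], num_input=3, num_output=1)
print(X)  # [[1, 2, 3], [2, 3, 4]]
print(y)  # [[4], [5]]
```

A series shorter than num_input + num_output yields two empty lists.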

SplitData

class features.preprocessing.SplitData(data_labels=['X_train', 'y_train', 'X_valid', 'y_valid', 'X_test', 'y_test'])

Initializes a SplitData object that stores any number of different sets of data and can merge data with other SplitData objects with the same sets

Parameters: data_labels (list[str], optional) – List of labels corresponding to the names of each set
Returns: None

add(label, data)

Adds data to a specified set based on labels

Parameters:

label (string) – Data label type (e.g. “X_train”, “X_valid”)
data (tensor) – Data to add

Returns:

None

get(label)

Retrieves data from a specified set based on the label

Parameters: label (string) – Data label type (e.g. “X_train”, “X_valid”)
Returns: Tensor containing data from the specified set

merge(data)

Merges in the data from another SplitData object, combining the sets that share the same label

Parameters: data (SplitData object) – SplitData object to merge from
Returns: None
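The add/get/merge behavior described above might look roughly like the sketch below. The class name SplitDataSketch is hypothetical, plain lists stand in for tensors, and merging by concatenation is an assumption about how the sets are combined.

```python
class SplitDataSketch:
    """Illustrative stand-in for a labeled container of data splits."""

    DEFAULT_LABELS = ("X_train", "y_train", "X_valid", "y_valid", "X_test", "y_test")

    def __init__(self, data_labels=None):
        labels = self.DEFAULT_LABELS if data_labels is None else data_labels
        self._sets = {label: [] for label in labels}

    def add(self, label, data):
        # Append new rows to the set stored under this label.
        self._sets[label].extend(data)

    def get(self, label):
        return self._sets[label]

    def merge(self, other):
        # Only containers with identical labels can be merged.
        if set(other._sets) != set(self._sets):
            raise ValueError("both objects must hold the same set labels")
        for label, rows in other._sets.items():
            self._sets[label].extend(rows)


first = SplitDataSketch()
first.add("X_train", [[1, 2], [3, 4]])
second = SplitDataSketch()
second.add("X_train", [[5, 6]])
first.merge(second)
print(first.get("X_train"))  # [[1, 2], [3, 4], [5, 6]]
```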

Linear Approximation

LinearApproximation

class features.linear_approximation.LinearApproximation(max_error, min_segment_length, data=None, target_col=None, date_index=None)

Creates a LinearApproximation object that extracts trend sequence durations and slopes from raw time series data

Parameters:

max_error (float) – Maximum error allowed in each segment during linear approximation
min_segment_length (int) – Minimum number of data points in each segment during linear approximation
data (Pandas DataFrame) – Dataset to process
target_col (string or int) – Name or index of target column
date_index (string or int) – Name or index of date column

Returns:

None

add_data(data, target_col, date_index='date_index')

Loads the data to process, along with its date index column and target prediction column

Parameters:

data (Pandas DataFrame) – Dataframe to process
target_col (string or int) – Name or index of target column
date_index (string or int) – Name or index of date column

Returns:

None

best_line(i, upper_bound)

Calculates end index of current window in linear approximation algorithm

Parameters:

i (int) – Starting index of current window
upper_bound (int) – Maximum size of window

Returns:

Ending index of current window

Return type:

int

bottom_up(i, j)

Performs the bottom-up segmentation algorithm on data[i:j], as described in http://www.cs.ucr.edu/~eamonn/icdm-01.pdf, and returns a list of segments represented by their indices

Parameters:

i (int) – Starting index of current window
j (int) – Ending index of current window

Returns:

segments (2-D list): segments[i] = [starting index of segments[i], ending index of segments[i]]

Return type:

list[list]
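The general bottom-up idea (start from the finest segments, then repeatedly merge the cheapest adjacent pair until no merge stays within the error budget) can be sketched as below. This is not the class's implementation: the function names are hypothetical, the error measure is assumed to be the sum of squared residuals of a least-squares line, segments are half-open [start, end) pairs, and min_segment_length is ignored.

```python
def fit_error(values, start, end):
    """Sum of squared residuals of a least-squares line over values[start:end]."""
    n = end - start
    if n < 3:
        return 0.0  # a line passes exactly through one or two points
    ys = values[start:end]
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((y - (slope * x + intercept)) ** 2 for x, y in enumerate(ys))


def bottom_up_segments(values, max_error):
    """Return half-open [start, end) segments after greedy bottom-up merging."""
    # Finest segmentation: consecutive pairs of points.
    segments = [[i, min(i + 2, len(values))] for i in range(0, len(values), 2)]
    while len(segments) > 1:
        # Cost of merging each segment with its right-hand neighbour.
        costs = [
            fit_error(values, segments[k][0], segments[k + 1][1])
            for k in range(len(segments) - 1)
        ]
        best = min(range(len(costs)), key=costs.__getitem__)
        if costs[best] > max_error:
            break  # even the cheapest merge would exceed the error budget
        segments[best] = [segments[best][0], segments[best + 1][1]]
        del segments[best + 1]
    return segments


print(bottom_up_segments([0, 1, 2, 3, 3, 2, 1, 0], max_error=0.5))
# [[0, 4], [4, 8]]
```

Recomputing every merge cost on each pass keeps the sketch short; a production version would update only the costs next to the merged segment.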

process_data()

Transforms the original data into a Pandas DataFrame containing information about trends. Each row in the DataFrame (trends[i]) holds [trend_duration[i], trend_slope[i], original data points that make up trends[i]]

Returns: DataFrame of processed data
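To make the row layout concrete, the sketch below builds [duration, slope, points] rows from a list of segments. The helper name trend_rows is hypothetical, and two details are assumptions: duration is taken as the number of points in the segment, and slope is taken from the segment's endpoints (the library may use a fitted line instead).

```python
def trend_rows(values, segments):
    """Build [duration, slope, points] rows from half-open [start, end) segments."""
    rows = []
    for start, end in segments:
        points = values[start:end]
        duration = len(points)
        # Endpoint slope; defined as 0.0 for a single-point segment.
        slope = (points[-1] - points[0]) / (duration - 1) if duration > 1 else 0.0
        rows.append([duration, slope, points])
    return rows


rows = trend_rows([0, 1, 2, 3, 3, 2, 1, 0], [[0, 4], [4, 8]])
print(rows)  # [[4, 1.0, [0, 1, 2, 3]], [4, -1.0, [3, 2, 1, 0]]]
```

Because the point lists can differ in length from row to row, they need padding before being stacked into a tensor.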

save_to_csv(file_path)

Saves the transformed data to a CSV file

Parameters: file_path (string) – File path to save the CSV file to
Returns: Whether the CSV file was saved
Return type: boolean

Scaler

MultiScaler

class features.scaler.MultiScaler(num_sources)

Scales multiple sources of data using the Scaler class. See Scaler for more details.

Parameters: num_sources (int) – Number of different sources to scale
Returns: None

fit_transform(data)

Fits and scales all data sources. Use None to fill in missing data sources.

Parameters: data (list[tensor or dataframe or series]) – Data to fit and transform
Returns: List of all scaled data

inverse_transform(data)

Inverse-transforms scaled data. Use None to fill in missing data sources.

Parameters: data (list[tensor]) – Data to revert back to original values
Returns: List of tensors of inversely transformed values

transform(data)

Transforms all data sources according to the previously fitted scalers. Use None to fill in missing data sources.

Parameters: data (list[tensor or dataframe or series]) – Data to transform
Returns: List of all scaled data

Scaler

class features.scaler.Scaler

Scales data using scikit-learn’s MinMaxScaler. Generalized to accept tensors.

Returns: None

fit_transform(data)

Fits and scales data

Parameters: data (tensor or dataframe or series) – Data to fit and transform
Returns: Tensor of scaled data values

inverse_transform(data)

Inverse-transforms scaled data

Parameters: data (tensor or dataframe or series) – Data to revert back to original values
Returns: Tensor of inversely transformed values

transform(data)

Scales data according to the previously fitted scaler

Parameters: data (tensor or dataframe or series) – Data to transform
Returns: Tensor of scaled data values
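The fit/transform/inverse cycle of min-max scaling can be illustrated with a dependency-free sketch. The class name MinMaxSketch is hypothetical; it assumes a single feature, the default (0, 1) output range, and plain lists in place of tensors, and it does not wrap scikit-learn as the actual class does.

```python
class MinMaxSketch:
    """Illustrative single-feature min-max scaler with a (0, 1) output range."""

    def fit_transform(self, values):
        # Remember the bounds seen during fitting, then scale with them.
        self.low = min(values)
        self.high = max(values)
        return self.transform(values)

    def transform(self, values):
        span = self.high - self.low
        if span == 0:
            return [0.0 for _ in values]  # constant data maps to 0.0 in this sketch
        return [(v - self.low) / span for v in values]

    def inverse_transform(self, scaled):
        span = self.high - self.low
        return [s * span + self.low for s in scaled]


scaler = MinMaxSketch()
print(scaler.fit_transform([10, 20, 30]))           # [0.0, 0.5, 1.0]
print(scaler.transform([40]))                       # [1.5]
print(scaler.inverse_transform([0.0, 0.5, 1.0]))    # [10.0, 20.0, 30.0]
```

Values outside the fitted range, such as 40 here, scale outside [0, 1]; this is why the scaler is normally fitted on the training split and only applied to the validation and test splits.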