Feature Modules
General Preprocessing
preprocess_trends
- features.preprocessing.preprocess_trends(trends, points, device, feature_cfg)
Takes trend slopes and durations together with the corresponding time series data, and returns training, validation, and test sets sampled from that data
- Parameters:
trends (tensor) – Data to extract trends from
points (tensor) – Tensor of data points
device (object) – Device to store data on
feature_cfg (dict) – Hyperparameters
- Returns:
SplitData object containing the train, validation, and test splits of trend data
train_valid_test_split
- features.preprocessing.train_valid_test_split(X, y=None, props=None, device=None)
Separates data into train, validation, and test sets
- Parameters:
X (tensor) – Feature data
y (tensor) – Target data
props (list[int]) – Train and validation split proportions (the test proportion is inferred from these two numbers)
device (object) – Device to store data on
- Returns:
SplitData object containing train/val/test splits of the data
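As an illustration of how the test split can be inferred from the train and validation proportions, here is a minimal sketch on plain Python lists. The helper name split_by_proportions is hypothetical, and the sketch assumes that props holds integer percentages and that rows are split in order without shuffling; the actual function may use different units, shuffle the data, or handle X and y tensors differently.

```python
def split_by_proportions(rows, props=(70, 15)):
    """Hypothetical helper: split rows into train/valid/test parts.

    Assumes props = (train_percent, valid_percent); the test part is
    whatever remains after the first two parts are taken.
    """
    train_pct, valid_pct = props
    if train_pct < 0 or valid_pct < 0 or train_pct + valid_pct > 100:
        raise ValueError("props must be non-negative and sum to at most 100")

    n = len(rows)
    n_train = n * train_pct // 100
    n_valid = n * valid_pct // 100

    train = rows[:n_train]
    valid = rows[n_train:n_train + n_valid]
    test = rows[n_train + n_valid:]  # inferred remainder
    return train, valid, test


train, valid, test = split_by_proportions(list(range(10)), props=(70, 15))
print(len(train), len(valid), len(test))  # prints: 7 1 2
```

Because the counts are rounded down, any leftover rows end up in the test part.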
convert_data_points
- features.preprocessing.convert_data_points(data)
Converts dataframe of trend data to tensors and pads time series data
- Parameters:
data (DataFrame) – Dataframe containing the trend durations, slopes, and corresponding time series data points for each sequence
- Returns:
Tensors containing trend duration and slope and padded corresponding time series data points
pad_data
- features.preprocessing.pad_data(data)
Pads rows of time series data with zeros to match the length of the longest row
- Parameters:
data (DataFrame) – Data to pad
- Returns:
Pandas dataframe containing the padded data
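The padding step can be sketched on plain lists as follows. The helper name pad_rows is hypothetical, and the sketch assumes zeros are appended at the end of each row; the source does not say which side is padded, and the real function operates on a DataFrame rather than lists.

```python
def pad_rows(rows, fill=0.0):
    """Hypothetical helper: right-pad every row with `fill` to the longest row's length."""
    longest = max((len(row) for row in rows), default=0)
    return [list(row) + [fill] * (longest - len(row)) for row in rows]


padded = pad_rows([[1.0, 2.0, 3.0], [4.0], []])
print(padded)  # prints: [[1.0, 2.0, 3.0], [4.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
```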
extract_data
- features.preprocessing.extract_data(data, num_input, num_output)
Creates sequences of num_input data points, each paired with the next num_output data points to predict
- Parameters:
data (tensor) – Data to split into subsequences
num_input (int) – Number of input data points per sequence
num_output (int) – Number of output data points per sequence
- Returns:
Two tensors containing the input and output data
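The windowing described above can be sketched on a plain list. The helper name make_windows is hypothetical, and the sketch assumes the window advances one data point at a time; the real function works on tensors and may use a different stride.

```python
def make_windows(series, num_input, num_output):
    """Hypothetical helper: pair each run of num_input points with the next num_output points."""
    inputs, outputs = [], []
    last_start = len(series) - num_input - num_output
    for start in range(last_start + 1):  # empty when the series is too short
        split = start + num_input
        inputs.append(series[start:split])
        outputs.append(series[split:split + num_output])
    return inputs, outputs


X, y = make_windows([1, 2, 3, 4, 5], num_input=2, num_output=1)
print(X)  # prints: [[1, 2], [2, 3], [3, 4]]
print(y)  # prints: [[3], [4], [5]]
```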
SplitData
- class features.preprocessing.SplitData(data_labels=['X_train', 'y_train', 'X_valid', 'y_valid', 'X_test', 'y_test'])
Initializes a SplitData object that stores any number of different sets of data and can merge data with other SplitData objects with the same sets
- Parameters:
data_labels (list[str], optional) – List of labels corresponding to the names of each set
- Returns:
None
- add(label, data)
Adds data to a specified set based on labels
- Parameters:
label (string) – Data label type (ex: “X_train”, “X_valid” etc.)
data (tensor) – Data to add
- Returns:
None
- get(label)
Retrieves data from a specified set based on the label
- Parameters:
label (string) – Data label type (ex: “X_train”, “X_valid” etc.)
- Returns:
Tensor containing data from the specified set
- merge(data)
Merges in data from another SplitData object, combining the sets that share the same label
- Parameters:
data (SplitData object) – SplitData object to merge data from
- Returns:
None
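To make the add/get/merge behaviour concrete, here is a minimal sketch of a container with the same three methods. The class name SplitDataSketch is hypothetical, it stores plain lists rather than tensors, and it assumes that add and merge concatenate rows; the real class may combine tensors differently.

```python
class SplitDataSketch:
    """Hypothetical stand-in for SplitData that stores lists of rows per label."""

    DEFAULT_LABELS = ("X_train", "y_train", "X_valid", "y_valid", "X_test", "y_test")

    def __init__(self, data_labels=None):
        labels = self.DEFAULT_LABELS if data_labels is None else data_labels
        self._sets = {label: [] for label in labels}

    def add(self, label, data):
        # Assumption: new rows are appended to whatever the set already holds.
        self._sets[label].extend(data)

    def get(self, label):
        return self._sets[label]

    def merge(self, other):
        # Only objects with the same set labels can be merged.
        if set(other._sets) != set(self._sets):
            raise ValueError("both objects must use the same data labels")
        for label, rows in other._sets.items():
            self._sets[label].extend(rows)


first = SplitDataSketch(["X_train", "y_train"])
first.add("X_train", [[1], [2]])
second = SplitDataSketch(["X_train", "y_train"])
second.add("X_train", [[3]])
first.merge(second)
print(first.get("X_train"))  # prints: [[1], [2], [3]]
```

The sketch takes None as the default for data_labels and builds the label list inside the constructor, which avoids sharing one mutable default list between instances.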
Linear Approximation
LinearApproximation
- class features.linear_approximation.LinearApproximation(max_error, min_segment_length, data=None, target_col=None, date_index=None)
Creates a LinearApproximation object that extracts trend durations and slopes from raw time series data
- Parameters:
max_error (float) – Maximum error allowed in each segment during linear approximation
min_segment_length (int) – Minimum number of data points in each segment during linear approximation
data (Pandas DataFrame) – Dataset to process
target_col (string or int) – Name or index of target column
date_index (string or int) – Name or index of date column
- Returns:
None
- add_data(data, target_col, date_index='date_index')
Loads the data to process, along with its date index column and target prediction column
- Parameters:
data (Pandas DataFrame) – Dataframe to process
target_col (string or int) – Name or index of target column
date_index (string or int) – Name or index of date column
- Returns:
None
- best_line(i, upper_bound)
Calculates end index of current window in linear approximation algorithm
- Parameters:
i (int) – Starting index of current window
upper_bound (int) – Maximum size of window
- Returns:
Ending index of current window
- Return type:
int
- bottom_up(i, j)
Performs the bottom-up algorithm on data[i:j], as described in http://www.cs.ucr.edu/~eamonn/icdm-01.pdf, and returns a list of segments represented by indices
- Parameters:
i (int) – Starting index of current window
j (int) – Ending index of current window
- Returns:
- segments (2-D list)
segments[i] = [starting index of segments[i], ending index of segments[i]]
- Return type:
list[list]
- process_data()
Transforms the original data into a Pandas DataFrame containing information about trends. Each row in the DataFrame (trends[i]) corresponds to [trend_duration[i], trend_slope[i], original data points that make up trends[i]]
- Returns:
DataFrame of processed data
- save_to_csv(file_path)
Saves the transformed data to a CSV file
- Parameters:
file_path (string) – File path to save csv to
- Returns:
Whether the CSV file was saved
- Return type:
boolean
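The bottom-up segmentation that bottom_up refers to can be sketched in pure Python as follows. This is a simplified illustration under stated assumptions, not the class's implementation: the function names are hypothetical, segment end indices are treated as inclusive, the merge cost is the sum of squared residuals of a least-squares line, and neither min_segment_length nor the windowing done by best_line is modelled.

```python
def fit_error(values, start, end):
    """Sum of squared residuals of the least-squares line over values[start:end + 1]."""
    ys = values[start:end + 1]
    n = len(ys)
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # a single point is fitted exactly
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def bottom_up_segments(values, max_error):
    """Merge adjacent segments, cheapest first, while the merge cost stays below max_error."""
    # Start from the finest segmentation: consecutive pairs of points.
    last = len(values) - 1
    segments = [[i, min(i + 1, last)] for i in range(0, len(values), 2)]

    while len(segments) > 1:
        # Cost of merging each segment with its right-hand neighbour.
        # Recomputed every pass for clarity; a real implementation would
        # update only the costs next to the merged pair.
        costs = [
            fit_error(values, segments[k][0], segments[k + 1][1])
            for k in range(len(segments) - 1)
        ]
        best = min(range(len(costs)), key=costs.__getitem__)
        if costs[best] >= max_error:
            break
        segments[best] = [segments[best][0], segments[best + 1][1]]
        del segments[best + 1]
    return segments


# A series that rises for five points and then falls.
print(bottom_up_segments([0, 1, 2, 3, 4, 3, 2, 1, 0], max_error=0.5))
# prints: [[0, 3], [4, 8]]
```

Each resulting [start, end] pair has the same shape as the segments list described for bottom_up; a trend duration and slope could then be derived per segment, as process_data describes.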
Scaler
MultiScaler
- class features.scaler.MultiScaler(num_sources)
Scales multiple sources of data using the Scaler class. See Scaler for more details.
- Parameters:
num_sources (int) – Number of different sources to scale
- Returns:
None
- fit_transform(data)
Fits and scales all data sources. Use None to fill in missing data sources.
- Parameters:
data (list[tensor or dataframe or series]) – Data to fit and transform
- Returns:
List of all scaled data
- inverse_transform(data)
Inverse transform scaled data. Use None to fill in missing data sources.
- Parameters:
data (list[tensor]) – Data to revert back to original values
- Returns:
List of tensors of inversely transformed values
- transform(data)
Transforms all data sources using the previously fitted scalers. Use None to fill in missing data sources.
- Parameters:
data (list[tensor or dataframe or series]) – Data to transform
- Returns:
List of all scaled data
Scaler
- class features.scaler.Scaler
Scales data using scikit-learn’s MinMaxScaler. Generalized to accept tensors.
- Returns:
None
- fit_transform(data)
Fits and scales data
- Parameters:
data (tensor or dataframe or series) – Data to fit and transform
- Returns:
Tensor of scaled data values
- inverse_transform(data)
Inverse transform scaled data
- Parameters:
data (tensor or dataframe or series) – Data to revert back to original values
- Returns:
Tensor of inversely transformed values
- transform(data)
Scales data using the previously fitted scaler
- Parameters:
data (tensor or dataframe or series) – Data to transform
- Returns:
Tensor of scaled data values
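The fit/transform/inverse cycle of Scaler, and the way MultiScaler uses None for missing sources, can be illustrated with a dependency-free sketch. Both class names below are hypothetical; the sketch scales flat Python lists to the range [0, 1] by hand instead of calling MinMaxScaler, and it assumes that a None entry is passed through as None in the result.

```python
class MinMaxSketch:
    """Hypothetical stand-in for Scaler: min-max scaling of a flat list to [0, 1]."""

    def __init__(self):
        self.low = None
        self.high = None

    def fit_transform(self, values):
        # Learn the range from this data, then scale it.
        self.low, self.high = min(values), max(values)
        return self.transform(values)

    def transform(self, values):
        # Reuse the range learned by fit_transform; a constant series maps to 0.0.
        span = (self.high - self.low) or 1.0
        return [(v - self.low) / span for v in values]

    def inverse_transform(self, scaled):
        span = (self.high - self.low) or 1.0
        return [s * span + self.low for s in scaled]


class MultiSketch:
    """Hypothetical stand-in for MultiScaler: one independent scaler per source."""

    def __init__(self, num_sources):
        self.scalers = [MinMaxSketch() for _ in range(num_sources)]

    def _apply(self, method_name, data):
        if len(data) != len(self.scalers):
            raise ValueError("expected one entry (or None) per source")
        # Assumption: a missing source (None) is returned as None.
        return [
            None if source is None else getattr(scaler, method_name)(source)
            for scaler, source in zip(self.scalers, data)
        ]

    def fit_transform(self, data):
        return self._apply("fit_transform", data)

    def transform(self, data):
        return self._apply("transform", data)

    def inverse_transform(self, data):
        return self._apply("inverse_transform", data)


multi = MultiSketch(num_sources=2)
print(multi.fit_transform([[0.0, 5.0, 10.0], [100.0, 300.0]]))
# prints: [[0.0, 0.5, 1.0], [0.0, 1.0]]
print(multi.transform([None, [200.0]]))          # prints: [None, [0.5]]
print(multi.inverse_transform([[0.5], None]))    # prints: [[5.0], None]
```

Keeping one scaler per source means each source is scaled against its own range, so a later transform or inverse_transform call must list the sources in the same order used when fitting.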