Reinforcement learning is typically implemented in an online setting, where the agent starts off taking poor actions and learns through a feedback loop with the environment's reward system. As training continues, the agent becomes more capable.
This style of reinforcement learning is extremely well suited to episodic games: each episode is well defined, the game can be restarted on every iteration, and the agent can try again, refining its internal decision-making process according to whichever algorithm is used. For applications where episodes are difficult or impossible to simulate, this is not the case. We cannot simply restart the system in a new environment and generate a fresh set of actions and feedback for the agent to learn from.
On a similar note, in a purely online setting there is no way to make use of historical data collected by a separate agent or model. Naively training on historical action-reward pairs results in a biased training set, because we cannot guarantee that the other policy explored the action space effectively.
This can seemingly limit RL to well-contained, somewhat contrived applications, whereas most common problems in data science are muddy and unclear: environments and reward systems are not well defined, and deploying a completely naive model to learn in an online setting is risky and could prove costly.
One application that overcomes some of these limitations is using RL for the early classification of multivariate time series. This approach has been outlined in several papers by C. Martinez, G. Perrin, and E. Ramasso (Refs 1 and 2), and the general concept is covered in the review article by A. Gupta et al.
In these works they propose and implement a Double Deep Q-Network (DDQN) agent which, at each time step of the sequence, either classifies the sequence into one of $k$ pre-defined classes or delays the decision to the next time step. The agent's preference for early classification versus waiting until it is more certain can be tuned with hyperparameters.
Below I will show an implementation of this early classification with RL using PyTorch.
This differs from other applications of RL where a predefined environment is already set up behind a gym-style API and our primary job is to build an agent that chooses actions and receives rewards. Here we also need to set up the environment class ourselves. This environment class has to receive actions and return the corresponding rewards. The later paper by Martinez et al. describes two different approaches to the environment's reward function. The repo implements both approaches, but we will only discuss the shaped reward function here.
This time-series-specific reward function shapes the delay reward so that the longer you delay the classification, the larger the penalty you receive. This explicitly encodes the decay of the reward over time in the environment itself, and the paper suggests that the agent should not discount future rewards ($\gamma = 1$).
The shaped reward function is:
$$ \begin{aligned} a &= \text{action} \\ t &= \text{timestep in sequence} \\ T &= \text{total length of sequence} \\ r(a, t; T, \lambda, \kappa) &= \begin{cases} 1 & a \neq 0 \text{ and } a \text{ is correct}\\ -1 & a \neq 0 \text{ and } a \text{ is incorrect}\\ -\dfrac{\lambda}{\kappa^{T-t}} & a = 0 \end{cases} \end{aligned} $$
This is a simplified version of the reward suggested in the papers above. Here $\lambda$ and $\kappa$ are tunable hyperparameters which control whether the agent prioritizes quicker classification or accuracy (a short code sketch of this reward follows the definitions below):
* $\lambda$ : tunes the importance of earliness relative to accuracy; it linearly scales the magnitude of the delay penalty.
* $\kappa$ : tunes how sharply the delay penalty grows as the sequence progresses.
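As a minimal sketch of how this formula behaves (plain NumPy, separate from the repo's `_shaped_reward` method shown later; the $\lambda$ and $\kappa$ values here are illustrative only):

```python
import numpy as np

def shaped_reward(action, correct_action, t, T, lamb=3.0, kappa=1.7):
    """Simplified shaped reward from the formula above (action 0 = delay)."""
    if action != 0:  # any classification action terminates the episode
        return 1.0 if action == correct_action else -1.0
    return -lamb / np.power(kappa, T - t)  # delay penalty grows as t approaches T

# Delaying early in a 405-step sequence costs almost nothing;
# delaying on the final step costs the full -lambda.
print(shaped_reward(0, 1, t=1, T=405))    # effectively 0
print(shaped_reward(0, 1, t=405, T=405))  # -3.0
```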
One of the benefits of this application of RL is that we are not significantly affected by the usual downsides of offline RL. A common problem with offline RL is that we do not observe the counterfactual states that would have followed actions other than the ones actually taken, which can lead to poor coverage of the state-action space. For early classification, since every action except one is terminal, we can treat a training time series as just a series of 'delay' actions. Then, at every time step, we can calculate the reward of every other action from the true class label. This results in an exhaustive set of training tuples.
For example, consider a time series with 3 time steps (states) and 2 possible classes, a and b. Assume the true label is a and we use the discount reward function (no reward for delay). The training tuples generated from this sequence are listed in the table below, and a short code sketch that reproduces them follows the table.
State | Action | Reward | Next state |
---|---|---|---|
state 1 | delay | reward=0 | state 2 |
state 2 | delay | reward=0 | state 3 |
state 3 | delay | reward=0 | None |
state 1 | a | reward=1 | None |
state 2 | a | reward=1 | None |
state 3 | a | reward=1 | None |
state 1 | b | reward=-1 | None |
state 2 | b | reward=-1 | None |
state 3 | b | reward=-1 | None |
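As a quick check, here is a minimal, self-contained sketch (not code from the repo) that enumerates the same nine tuples for this toy example:

```python
# Toy enumeration of the exhaustive tuples for one 3-step sequence.
# Actions: 0 = delay, 1 = class a, 2 = class b; the true label is class a.
# Discount reward: delay gives 0, correct gives +1, incorrect gives -1.
states = ["state 1", "state 2", "state 3"]
true_action = 1

tuples = []
for t, state in enumerate(states):
    for action in (0, 1, 2):
        if action == 0:  # delay: move to the next state (None at the end of the sequence)
            reward, next_state = 0, (states[t + 1] if t + 1 < len(states) else None)
        else:            # classify: terminal, reward depends on the true label
            reward, next_state = (1 if action == true_action else -1), None
        tuples.append((state, action, reward, next_state))

for tpl in tuples:
    print(tpl)  # nine tuples, matching the nine rows of the table above
```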
For a dataset with dimensions [number of sequences, number of timesteps, number of features], an exhaustive training set is generated by the env class in the repo.

Multivariate datasets

A repository of open, multivariate datasets for time series classification can be found here. Each dataset contains sequences with a constant number of time steps and a fixed number of features at each timestep. Each sequence is labeled with a class.

Heart anomaly dataset

One of the datasets contains cardio measurements for several different patients over time. The provided description of the data is:
Training set
Sequences: 204
Time steps in sequence: 405
Classes: [normal, abnormal]
Class ratio: [0.28, 0.72]

Test set
Sequences: 205
Time steps in sequence: 405
Classes: [normal, abnormal]
Class ratio: [0.28, 0.72]
We can batch-generate the training and evaluation data using the env class method env.load_exhaustive_dataset(data_dict), where data_dict is a dictionary containing the data and metadata for the heartbeat dataset. This generates the exhaustive set of training tuples described above, which is then passed to the agent for training. Each tuple has the structure q_tpl = (state, action, reward, next_state, terminal, is_last); a placeholder next_state is used if the state is terminal.
For the evaluation data, the class method env.load_eval_data(data_dict, data_type='TEST') generates, for every timestep of every sequence, tuples with the structure (state, label, timestep, is_last).
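For reference, here is a sketch of the data_dict these methods expect, modelled on how the repo's main() builds it (the random arrays are stand-ins for the real Heartbeat data, not actual values):

```python
import numpy as np

# Stand-in arrays with the Heartbeat shapes: [sequences, timesteps, features].
train_data = np.random.randn(204, 405, 61)
train_labels = np.random.choice(['normal', 'abnormal'], size=204)

class_list = np.unique(train_labels)
# Integer actions map to classes; action 0 is reserved for "delay".
action_dict = {i + 1: c for i, c in enumerate(class_list)}
action_dict[0] = 'Delay'

train_data_dict = {
    'data': train_data,
    'labels': train_labels,
    'class_list': class_list,
    'action_dict': action_dict,
    'num_feats': train_data.shape[-1],
}

# With the env class defined further down:
# e = env(reward_type='shaped', lamb=3, kappa=1.7)
# train_q_data = e.load_exhaustive_dataset(data_dict=train_data_dict)
# eval_data = e.load_eval_data(data_dict=train_data_dict, data_type='TEST')
```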
def load_exhaustive_dataset(self, data_dict):
"""
Offline generation of training and validation training set. A benefit of this classification
or delay problem is that for each training sequence we have, for each timestep we can
calculate rewards and next_states for every action. This is because all actions except
"delay" terminates the sequence (No next state) and "delay" moves us to the next timestep in
the data. Therefore we can generate an exhaustive training and validation set.
Keyword Arguments:
data_dict (dictionary): dictionary with numpy dataset and metadata
"""
data = data_dict['data']
labels = data_dict['labels']
classes = data_dict['class_list']
action_dict = data_dict['action_dict']
num_feats = data_dict['num_feats']
# Define specific environment information. Instantiated as Nones.
self.num_feats = num_feats
self.class_list = classes
self.action_dict = action_dict
# Reverse action dictionary to get actions associated with true labels.
rev_action_dict = dict((v, k) for k, v in action_dict.items())
labels = [rev_action_dict[i] for i in labels]
# print(f"True labels: {labels}")
# print(data[0])
# Generate all possible training q tuples from offline data
data_tuples = []
num_actions = len(self.class_list ) + 1
exhaustive_data = []
# Iterate over sequences (first axis of data)
for i in range(data.shape[0]):
num_timesteps = data[i].shape[0]
# Iterate over each timestep of the sequence
for t in range(num_timesteps):
# Append state as contextual features of timestep.
state = data[i][t]
# Apply all actions to timestep.
for action in range(num_actions):
# print(action)
if t == num_timesteps-1:
is_last = True
else:
is_last = False
reward = self._gen_reward(label=labels[i], action=action, timestep=t+1, is_last=is_last, seq_length=num_timesteps)
# Get next state of sequence given action.
if action == 0:
if not is_last:
next_state = data[i][t+1]
terminal = 0
else:
next_state = np.zeros(num_feats)
terminal = 1
else:
# Placeholder terminal state. Multiplied out in pytorch.
next_state = np.zeros(num_feats)
terminal = 1
q_tpl = (state, action, reward, next_state, terminal, is_last)
# Append tuple to exhaustive data.
exhaustive_data.append(q_tpl)
return exhaustive_data
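The returned list of tuples is wrapped in the repo's `QDataset` (a thin `torch.utils.data.Dataset`, shown in the appendix code) so it can be batched. A short sketch, assuming `e`, `train_data_dict`, and `val_data_dict` from the earlier snippet (with `val_data_dict` built the same way from the test split):

```python
from torch.utils.data import DataLoader

# Wrap the exhaustive tuples so they can be batched by PyTorch.
train_q_data = e.load_exhaustive_dataset(data_dict=train_data_dict)
val_q_data = e.load_exhaustive_dataset(data_dict=val_data_dict)

train_dataset = QDataset(q_data=train_q_data)
val_dataset = QDataset(q_data=val_q_data)

# Each batch yields (state, action, reward, next_state, terminal, is_last) tensors.
loader = DataLoader(train_dataset, batch_size=128, shuffle=True)
```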
The RL agent uses a double deep Q-network (DDQN) to learn the Q(x, a) function. It is implemented in PyTorch. We update the target DQN after every 20 batches and at the end of every epoch.
We implemented the DDQN with the following parameters (a setup sketch follows this list):
* Two fully-connected hidden layers with ReLU activations.
* Both hidden layers have 128 neurons.
* Learning rate exponential decay scheduler: initial lr = 0.0001, decay factor = 0.9.
* Batch size = 128.
* Number of epochs = 50 (no early stopping).
* Update the target DQN with the local DQN parameters every 20 batches and at the end of every epoch.
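Putting those settings together, the agent is constructed roughly as in the repo's grid-search script; treat this as a hedged sketch (the `Agent` and `env` classes are listed in full further below):

```python
e = env(reward_type='shaped', lamb=3, kappa=1.7, terminal_delay_penalty=0)

agent = Agent(
    agent_name='heartbeat',
    state_size=61,          # features per timestep in the Heartbeat data
    action_size=3,          # delay + the two classes
    env=e,
    learning_rate=0.0001,   # ExponentialLR scheduler decays this by 0.9 each epoch
    gamma=1,                # no agent-side discounting with the shaped reward
    layer_1_size=128,
    layer_2_size=128,
)
```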
def train_offline(self, epochs, batch_size, train_dataset, val_dataset=None, target_update_type='batch', update_batches=20, tau=1e-3, save_model=None):
"""
Pretrain the model with historical data. This is essentially like copying all the data into
the experience replay and training on it.
Keyword Args:
epochs (int) : Number of epochs to train over.
batch_size (int): size of batch to train at once.
train_dataset (Pytorch dataset): Training Dataset containing tuples of
(state, action, reward, next_state, terminal, is_last).
val_dataset (Pytorch dataset): Validation Dataset containing tuples of
(state, action, reward, next_state, terminal, is_last).
target_update_type (str) : One of 'batch' or 'soft'. Determines how the target QNN
is updated.
update_batches (int) : number of batches between target model updates.
tau (float): interpolation parameter for soft update.
save_model (Boolean) : Whether to save the trained model's state dict to disk.
"""
# Load dataset into a loader to iterate.
loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
if val_dataset:
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=True)
# Track historical training loss
train_loss_hist = []
# Track historical validation loss
val_loss_hist = []
print("Training")
for e in range(epochs):
# Track epoch loss
train_epoch_loss= []
for i, batch in enumerate(loader):
state_batch, action_batch, reward_batch, next_state_batch, terminal_batch, _ = batch
state_batch, action_batch, reward_batch, next_state_batch, terminal_batch = \
state_batch.to(device), action_batch.to(device), reward_batch.to(device), next_state_batch.to(device), terminal_batch.to(device)
action_batch = action_batch.reshape((action_batch.shape[0], 1))
criterion = torch.nn.MSELoss()
# Set the local NN for training. This will update the weights each batch.
self.qnetwork_local.train()
# Set the target NN for evaluation
# Target model is one with which we need to get our target (next state Q values).
# So that when we do a forward pass with target model it does not calculate gradient.
# We will update target model weights with soft_update function
self.qnetwork_target.eval()
# Calculate Q values of current state for the action that is taken. This is done with
# a forward pass of our local model.
q_values = self.qnetwork_local(state_batch.float()).gather(1,action_batch)
# calculate the max Q values of the next state after the action with the target network.
# Choose the Q - value with the largest value (best action). Terminal state logic is
# implemented in the following step. Q values for the placeholder will be used instead.
with torch.no_grad():
next_q_values = self.qnetwork_target(next_state_batch.float()).detach().max(1)[0].unsqueeze(1)
# Generate the target from the reward and the Q - value of the next state. If this was a
# terminal state/action, only use the reward.
target = reward_batch + self.gamma*next_q_values.reshape(next_q_values.shape[0])*(1 - terminal_batch.float())
current = q_values.reshape(q_values.shape[0])
# Confirm dtype
target = target.to(torch.float32)
current = current.to(torch.float32)
# Loss between target and current to minimize.
batch_train_loss = criterion(current, target)
self.optimizer.zero_grad()
batch_train_loss.backward()
train_epoch_loss.append(batch_train_loss.item())
self.optimizer.step()
if target_update_type == 'soft':
# Soft update target QNN
self._soft_update_target(self.qnetwork_local,self.qnetwork_target,tau)
if target_update_type == 'batch' and i % update_batches == 0:
# print("Updating target nn")
self._batch_update_target(self.qnetwork_local,self.qnetwork_target)
# Copy final local qnn to target qnn
self._batch_update_target(self.qnetwork_local, self.qnetwork_target)
# Calculate epoch loss
train_epoch_loss = sum(train_epoch_loss)/len(train_epoch_loss)
train_loss_hist.append(train_epoch_loss)
if len(train_loss_hist) >1:
train_delta = train_loss_hist[-1] - train_loss_hist[-2]
train_perc_delta = (train_loss_hist[-1] - train_loss_hist[-2])/(train_loss_hist[-2])*100
else:
train_perc_delta = 0.0
train_delta = 0.0
if self.scheduler:
self.scheduler.step()
# Perform validation
# TODO: Add early stopping logic to prevent overfitting
if val_dataset:
val_epoch_loss = []
with torch.no_grad():
for i, batch in enumerate(val_loader):
state_batch, action_batch, reward_batch, next_state_batch, terminal_batch, _ = batch
state_batch, action_batch, reward_batch, next_state_batch, terminal_batch = \
state_batch.to(device), action_batch.to(device), reward_batch.to(device), next_state_batch.to(device), terminal_batch.to(device)
action_batch = action_batch.reshape((action_batch.shape[0], 1))
q_values = self.qnetwork_target(state_batch.float()).gather(1,action_batch)
next_q_values = self.qnetwork_target(next_state_batch.float()).detach().max(1)[0].unsqueeze(1)
target = reward_batch + self.gamma*next_q_values.reshape(next_q_values.shape[0])*(1 - terminal_batch.float())
current = q_values.reshape(q_values.shape[0])
batch_val_loss = criterion(current, target)
val_epoch_loss.append(batch_val_loss.item())
val_epoch_loss = sum(val_epoch_loss)/len(val_epoch_loss)
val_loss_hist.append(val_epoch_loss)
if len(val_loss_hist) > 1:
val_perc_delta = (val_loss_hist[-1] - val_loss_hist[-2])/(val_loss_hist[-2])*100
val_delta = (val_loss_hist[-1] - val_loss_hist[-2])
else:
val_perc_delta = 0.0
val_delta = 0.0
else:
val_epoch_loss = 0.0
val_perc_delta = 0.0
val_delta = 0.0
print(f"Epoch {e}: Train Loss: {train_epoch_loss:.3f} ({train_delta:.3f}, {train_perc_delta:.3f}%) \t Val Loss: {val_epoch_loss:.3f} ({val_delta:.3f}, {val_perc_delta:.3f} %) \t lr: {self.optimizer.param_groups[0]['lr']:.3g}")
if save_model:
filename = os.path.join(MODELS_DIR, f'{self.agent_name}.pt')
torch.save(self.qnetwork_target.state_dict(), filename)
return train_loss_hist, val_loss_hist
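In use, `train_offline` is called much as in the repo's grid search. A minimal sketch, assuming the `train_dataset`/`val_dataset` built earlier and the evaluation tuples from `env.load_eval_data` (the `delay_score` and `accuracy_score` helpers appear in the appendix code):

```python
train_loss_hist, val_loss_hist = agent.train_offline(
    epochs=50,
    batch_size=128,
    train_dataset=train_dataset,
    val_dataset=val_dataset,
    target_update_type='batch',   # copy local weights into the target net every 20 batches
    update_batches=20,
)

# Roll the trained policy over each test sequence until it classifies (or the sequence ends).
eval_dataset = QDataset(q_data=e.load_eval_data(data_dict=val_data_dict, data_type='TEST'))
result_dict = agent.eval_offline(dataset=eval_dataset)
time_metrics = delay_score(eval_dict=result_dict, class_list=[1, 2])  # the two classification actions
acc_metrics = accuracy_score(eval_dict=result_dict)
```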
Here is a plot of one of the best performing sets of hyperparameters: lambda = 3, kappa = 1.7. One thing we noted is that the RL agent does not generalize especially well to the test set (though it is better than the simple classifier below).
There are two overall metrics for judging the performance of the agent:
* Accuracy of classification. For this we used the weighted f1-score from the sklearn module.
* How quickly the classification occurred. For this we used the average classification time across all test samples.
To get a sense of how the shaped-reward hyperparameters, lambda and kappa, affect these metrics, we did a rough grid search. Large values of lambda (>100) gave nonsensical agents, as these large negative delay rewards severely outweigh the classification rewards (magnitude 1). Below is a plot of the total reward on the test set, the time score, and the f1-score against lambda for different values of kappa.
From these plots we can clearly see the effect of lambda on the time score. For low values of lambda, the agents are indecisive, waiting a long time to classify and in some cases never classifying. As lambda increases, delays get more and more expensive and the agents are incentivised to classify quickly, since the penalty for not classifying far exceeds the classification reward. This quickly converges to a point where the agent classifies immediately on the first timestep. The relationship between accuracy and lambda is less clear. This is understandable, as there is no direct relationship between a delay action and the accuracy of the prediction. However, we can draw some broad conclusions. Having no delay penalty (lambda = 0) gives very poor accuracy. This is most likely because in these cases there is no RL component and we are left with a simple classifier, as there are no "next Q values" for the target in the loss function. For most values of kappa, the best values of lambda are those comparable to the magnitude of the reward for correct classification (and the negative reward for incorrect classification), which is simply 1 in the above search. Intuitively, this reward structure keeps the rewards for correct and incorrect classification actions meaningful relative to the total reward, which allows for better learning.
We built some rough baseline 'agents' to give a comparison anchor for our RL agents. These baseline agents do not use any RL mechanics, but they interact with the same environment to classify or delay and record the rewards. Currently we have built two baseline agents: a simple random action selector, and a NN classifier with the same structure as our agent.
We built a really simple (perhaps a strawman...) NN model with the same hidden structure as the DQN for the RL agent. The major difference is that the output layer only has two nodes, corresponding to the two classes; that is, there is no delay action. We used a similar training process as above with the same hyperparameters. One major outcome is that this classifier does not generalize out of sample at all: there is no overall drop in the validation loss, which just bounces around. Despite this, we apply the model to the test set (validation set, but whatever). We implement two different approaches for classifying throughout the sequence, listed below.
* Threshold classification: if the softmax probability of either class exceeds some threshold, the model classifies; otherwise it delays, receiving some delay reward (default 0). The simple model is terrible at distinguishing the two classes, with probabilities rarely exceeding 0.6.
* Chance delay: delay or classify with some random chance; when it classifies, it takes the argmax of the classifier at that time step.
A rough sketch of the two decision rules is shown below, followed by a set of results for the simple classifier for both delay types at different thresholds.
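This is my paraphrase of the benchmark agent's decision rule, not a verbatim excerpt from the repo; `classifier` here stands for the two-output network described above:

```python
import torch
import torch.nn.functional as F

def baseline_action(classifier, state, strategy='delay_on_thresh',
                    threshold=0.7, delay_prob=0.5):
    """Return 0 (delay) or a class action (1 or 2) for a single timestep."""
    with torch.no_grad():
        probs = F.softmax(classifier(state.float()), dim=-1)
    predicted_class = int(probs.argmax()) + 1  # shift by one so 0 remains the delay action
    if strategy == 'delay_on_thresh':
        # Classify only once the classifier is confident enough, otherwise delay.
        return predicted_class if probs.max().item() >= threshold else 0
    # 'chance_delay': delay with some fixed probability, otherwise take the argmax class.
    return 0 if torch.rand(1).item() < delay_prob else predicted_class
```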
Note: thresholds <= 0.5 for "delay on thresh" are meaningless and correspond to immediate classification. The only interesting plots are the accuracy scores: for all thresholds, the f1-score sits between roughly 0.6 and 0.68. Our RL agent outperformed this, with f1-scores above 0.7 for certain hyperparameters.
Overall, while the RL agent's results are better than those of a simple classification agent, this does not show how it compares to other techniques specifically designed for time series classification. Running benchmarks against those would be more enlightening, but I am not too familiar with them and I needed to finish at some point.
From the above results we can see that the RL agent beat the simple classifier, which struggled to generalize from independent features with no concept of the time series. The RL agent demonstrated some capacity to learn and generalize, but not to a large degree. This could be due to the nature of the dataset itself, or it could be improved by testing different DQN structures for the agent. We can also see the effect that the lambda and kappa hyperparameters have on how the agent performs, which reinforces that the agent is learning correctly based on the reward structure of the environment.
Early classification of time series with RL is an interesting approach to the problem and could be a promising solution in some cases. More work is required in benchmarking the effectiveness compared to more traditional early classification methods.
from scipy.io import arff
import os
import pandas as pd
import numpy as np
import aeon
from aeon.datasets import load_from_tsfile
import torch
from torch.utils.data import DataLoader, Dataset
BASE_DIR = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
DATA_DIR = ""  # ENTER DATA PATH HERE
def load_data(data_name, data_type = 'TRAIN'):
print(data_name)
filename = os.path.join(os.path.join(DATA_DIR, data_name), f"{data_name}_{data_type}.ts")
print(filename)
x, y = load_from_tsfile(filename)
x = np.swapaxes(x, 1, 2)
return x, y
def strip_time_series(data, labels, ):
data_tmp = []
labels_tmp = []
for i in range(data.shape[0]):
data_tmp.append(data[i].squeeze())
labels_tmp.append(np.ones(data[i].shape[0])*labels[i])
data = np.concatenate(data_tmp, axis=0)
labels = np.concatenate(labels_tmp, axis=0)
return data, labels
class QDataset(Dataset):
def __init__(self, q_data):
self.q_data = q_data
def __len__(self):
return len(self.q_data)
def __getitem__(self, idx):
return self.q_data[idx]
class classDataset(Dataset):
def __init__(self, data, labels, eval=False):
if not eval:
data, labels = strip_time_series(data, labels)
self.data = torch.as_tensor(data, dtype=torch.float64)
self.labels = torch.as_tensor(labels, dtype=torch.long)
else:
# self.data = data
# self.labels = labels
self.data = torch.as_tensor(data, dtype=torch.float64)
self.labels = torch.as_tensor(labels, dtype=torch.long)
if len(self.data) != len(self.labels):
raise Exception("data and labels size does not match.")
def __len__(self):
return len(self.data)
def __getitem__(self, idx):
return self.data[idx], self.labels[idx]
from sklearn.metrics import f1_score, confusion_matrix
from itertools import compress
def delay_score(eval_dict, class_list):
"""
Calculates a score for how quickly the agent classifies a time series: the average timestep
at which classification occurs across the test sequences. Lower scores are better.
"""
time_metrics = {}
# Total number of unclassified
total_classified = len([i for i in eval_dict['final_classification'] if i != 0])
total = len(eval_dict['final_classification'])
classified_ratio = total_classified/total
time_metrics['classified_ratio'] = classified_ratio
# class number of unclassified
for c in class_list:
class_bool = [True if a==c else False for a in eval_dict['true_labels']]
class_actions = list(compress(eval_dict['final_classification'], class_bool))
class_classified = len([i for i in class_actions if i != 0])
class_correct_classified = class_actions.count(c)
class_total = len(class_actions)
time_metrics[c] = {}
time_metrics[c]['total_classified_ratio'] = class_classified/class_total
time_metrics[c]['correct_classified_ratio'] = class_correct_classified/class_total
# Calculate total delay score
total_time = sum(eval_dict['classification_time'])
total_sequences = len(eval_dict['classification_time'])
time_metrics['total_time_score'] = total_time/total_sequences
# Calculate class delay score
for c in class_list:
class_bool = [True if a==c else False for a in eval_dict['true_labels']]
class_times = list(compress(eval_dict['classification_time'], class_bool))
class_time = sum(class_times)
class_total = len(class_times)
time_metrics[c]['class_time_score'] = class_time/class_total
return time_metrics
def accuracy_score(eval_dict):
"""
Calculates accuracy metrics for the agent's final classifications: the weighted f1-score
and the confusion matrix.
"""
acc_metrics = {}
# Accuracy scores
score = f1_score(eval_dict['true_labels'], eval_dict['final_classification'], average='weighted')
# Confusion matrix
cm = confusion_matrix(eval_dict['true_labels'], eval_dict['final_classification'])
acc_metrics['f1_score'] = score
acc_metrics['cm'] = cm
return acc_metrics
import os
import sys
import numpy as np
BASE_DIR = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
SRC_DIR = os.path.join(BASE_DIR, 'src')
UTILS_DIR = os.path.join(SRC_DIR, 'utils')
sys.path.append(SRC_DIR)
from utils.load_data import load_data
class env():
"""RL environement which defines an action and reward structure for the early stopping problem.
Keyword Args:
num_feats (int) : Number of contextual features at each timestep for a sequence.
class_list (list) : List of possible classifications of the sequence.
action_dict (dict) : Dictionary mapping integer actions to their classifications and a delay
action. Delay is typically 0.
lamb (float) : Hyperparameter of the reward function. Tunes the importance of earliness compared
to accuracy. Linearly scales the magnitude of the delay penalty.
kappa (float) : Hyperparameter of the reward function. Tunes the penalty of the delay as the
sequence progresses.
reward_type (string) : Either "discount" or "shaped".
"discount" -> Standard discount reward structure. Positive rewards for correct classification
negative for incorrect. No reward for delay. Discounted indirectly by agent's gamma.
"shaped" -> Time series specific reward structure. Environment explicit penalizes delays
based on "lamb" and "kappa" hyper parameters in addition to Positive rewards for
correct classification negative for incorrect.
terminal_delay_penalty (float): Magnitude of penalty for finishing sequence with no
classification. Handled implicity by shaped reward structure.
"""
def __init__(self, reward_type='discount', lamb=10, kappa=10, terminal_delay_penalty=0):
super(env, self).__init__()
self.num_feats = None
self.class_list = None
self.action_dict = None
self.lamb = lamb
self.kappa = kappa
self.reward_type = reward_type
self.terminal_delay_penalty = terminal_delay_penalty
def step(self, action):
"""
Perform a step along the time series for a single longitudinal data point
"""
pass
def _transition(self, action):
"""
For a given action, return the next state of the environment. Can also terminate.
"""
pass
def _discount_reward(self, label, action, timestep, is_last, seq_length):
"""
Standard reward which is discounted by the agent and not by the environment.
Keyword Args:
label (int) : True label of the sequence.
action (int) : Action chosen by the agent.
timestep (int) : Timestep in sequence action was taken.
is_last (Boolean) : Whether this is the final timestep in sequence.
seq_length (int) : Total number of timesteps in sequence.
"""
if is_last and action == 0:
# Penalize no classifying by end of sequence
r = self.terminal_delay_penalty
elif action == label:
# Correct classification
r = 1
elif (action != 0) and (action != label):
# Incorrect classification
r = -1
elif action == 0:
# Delay classification
r = 0
return r
def _shaped_reward(self, label, action, timestep, is_last, seq_length):
"""
Reward shaped by timeseries.
Keyword Args:
label (int) : True label of the sequence.
action (int) : Action chosen by the agent.
timestep (int) : Timestep in sequence action was taken.
is_last (Boolean) : Whether this is the final timestep in sequence.
seq_length (int) : Total number of timesteps in sequence.
"""
if action == label:
# Correct classification
r = 1
elif (action != 0) and (action != label):
# Incorrect classification
r = -1
elif action == 0:
# r = (-1)*self.lamb*(np.power(self.kappa, timestep)/(np.power(self.kappa, seq_length) - 1)) # Paper reward
r = (-1)*self.lamb/(np.power(self.kappa, seq_length - timestep)) # Simplified reward
if r == (-1)*np.inf:
r = 0.0
return r
def _gen_reward(self, label, action, timestep, is_last, seq_length):
"""
generate the reward for a given action
"""
if self.reward_type == 'discount':
return self._discount_reward(label, action, timestep, is_last, seq_length)
elif self.reward_type == 'shaped':
return self._shaped_reward(label, action, timestep, is_last, seq_length)
def load_exhaustive_dataset(self, data_dict):
"""
Offline generation of training and validation training set. A benefit of this classification
or delay problem is that for each training sequence we have, for each timestep we can
calculate rewards and next_states for every action. This is because all actions except
"delay" terminates the sequence (No next state) and "delay" moves us to the next timestep in
the data. Therefore we can generate an exhaustive training and validation set.
Keyword Arguments:
data_dict (dictionary): dictionary with numpy dataset and metadata
"""
data = data_dict['data']
labels = data_dict['labels']
classes = data_dict['class_list']
action_dict = data_dict['action_dict']
num_feats = data_dict['num_feats']
# Define specific environment information. Instantiated as Nones.
self.num_feats = num_feats
self.class_list = classes
self.action_dict = action_dict
# Reverse action dictionary to get actions associated with true labels.
rev_action_dict = dict((v, k) for k, v in action_dict.items())
labels = [rev_action_dict[i] for i in labels]
# print(f"True labels: {labels}")
# print(data[0])
# Generate all possible training q tuples from offline data
data_tuples = []
num_actions = len(self.class_list ) + 1
exhaustive_data = []
# Iterate over sequences (first axis of data)
for i in range(data.shape[0]):
num_timesteps = data[i].shape[0]
# Iterate over each timestep of the sequence
for t in range(num_timesteps):
# Append state as contextual features of timestep.
state = data[i][t]
# Apply all actions to timestep.
for action in range(num_actions):
# print(action)
if t == num_timesteps-1:
is_last = True
else:
is_last = False
reward = self._gen_reward(label=labels[i], action=action, timestep=t+1, is_last=is_last, seq_length=num_timesteps)
# Get next state of sequence given action.
if action == 0:
if not is_last:
next_state = data[i][t+1]
terminal = 0
else:
next_state = np.zeros(num_feats)
terminal = 1
else:
# Placeholder terminal state. Multiplied out in pytorch.
next_state = np.zeros(num_feats)
terminal = 1
q_tpl = (state, action, reward, next_state, terminal, is_last)
# Append tuple to exhaustive data.
exhaustive_data.append(q_tpl)
return exhaustive_data
def load_eval_data(self, data_dict, data_type = 'TEST'):
"""
Generation of tuples for the test set. Each tuple consists of the state, label, timestep, and
whether it is the sequence's final state. To be passed to the agent.
Keyword Arguments:
data_dict (dictionary): dictionary with numpy dataset and metadata
"""
data = data_dict['data']
labels = data_dict['labels']
classes = data_dict['class_list']
action_dict = data_dict['action_dict']
num_feats = data_dict['num_feats']
rev_action_dict = dict((v, k) for k, v in action_dict.items())
labels = [rev_action_dict[i] for i in labels]
test_set = []
for i in range(data.shape[0]):
sample = []
num_timesteps = data[i].shape[0]
# print(num_timesteps)
for t in range(num_timesteps):
if t == num_timesteps-1:
is_last = True
else:
is_last = False
state = data[i][t]
label = labels[i]
timestep = t
sample.append((state, label, timestep, is_last))
test_set.append(sample)
return test_set
import torch
import torch.nn as nn
import torch.nn.functional as F
class QNetwork(nn.Module):
""" Actor (Policy) Model."""
def __init__(self, state_size, action_size, seed, fc1_unit=64, fc2_unit = 64):
"""
Initialize parameters and build model.
Params
=======
state_size (int): Dimension of each state
action_size (int): Dimension of each action
seed (int): Random seed
fc1_unit (int): Number of nodes in first hidden layer
fc2_unit (int): Number of nodes in second hidden layer
"""
super(QNetwork,self).__init__() ## calls __init__ method of nn.Module class
if seed:
self.seed = torch.manual_seed(seed)
self.fc1= nn.Linear(state_size,fc1_unit)
self.fc2 = nn.Linear(fc1_unit,fc2_unit)
self.fc3 = nn.Linear(fc2_unit,action_size)
def forward(self,x):
# x = state
"""
Build a network that maps state -> action values.
"""
x = F.relu(self.fc1(x))
x = F.relu(self.fc2(x))
return self.fc3(x)
import numpy as np
import random
import os
import sys
import torch.optim as optim
import torch.optim.lr_scheduler as lr_scheduler
import torch
BASE_DIR = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
SRC_DIR = os.path.join(BASE_DIR, 'src')
UTILS_DIR = os.path.join(SRC_DIR, 'utils')
MODELS_DIR = os.path.join(SRC_DIR, 'trained_models')
sys.path.append(SRC_DIR)
from models.agent_nn import QNetwork
from torch.utils.data import DataLoader, Dataset
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
class Agent():
"""RL environement which defines an action and reward structure for the early stopping problem.
Keyword Args:
agent_name (string) : Name of the agent to save/load a model.
state_size (int) : Number of features defining each state.
action_size (int) : Number of possible classifications and the delay action.
env (env class) : Environment class.
learning_rate (float) : Update parameter for DQN
gamma (float) : Discount factor of future rewards.
tau (float) : Interpolation parameter for soft updating target DQN model with local DQN model.
"""
def __init__(self, agent_name, state_size, action_size, env, learning_rate=5e-4, gamma=0.9, tau=1e-3, layer_1_size=64, layer_2_size=64,seed=None, lr_sched='exponential'):
super(Agent, self).__init__()
self.agent_name = agent_name
self.state_size = state_size
self.action_size = action_size
self.seed = random.seed(seed)
self.env = env
self.layer_1_size = layer_1_size
self.layer_2_size = layer_2_size
# Hyperparameters
self.learning_rate = learning_rate
self.gamma = gamma
self.tau = tau
#Set up both the local and target networks. They must have identical structures
self.qnetwork_local = QNetwork(
state_size=self.state_size,
action_size=self.action_size,
seed=self.seed,
fc1_unit=self.layer_1_size,
fc2_unit=self.layer_2_size,
)
self.qnetwork_target = QNetwork(
state_size=self.state_size,
action_size=self.action_size,
seed=self.seed,
fc1_unit=self.layer_1_size,
fc2_unit=self.layer_2_size,
)
# Send networks to device
self.qnetwork_local.to(device)
self.qnetwork_target.to(device)
# self.load_model()
# Define the NN optimizer for the trainable network (local)
self.optimizer = optim.Adam(self.qnetwork_local.parameters(), lr=self.learning_rate)
if lr_sched == 'exponential':
self.scheduler = lr_scheduler.ExponentialLR(self.optimizer, gamma=0.9)
def load_model(self, filename=None):
if filename:
saved_name = filename
else:
saved_name = f"{self.agent_name}.pt"
path = os.path.join(MODELS_DIR, saved_name)
# Load the model into both DQNS
self.qnetwork_local.load_state_dict(torch.load(path))
self.qnetwork_target.load_state_dict(torch.load(path))
def _soft_update_target(self, local_model, target_model, tau):
"""
Update the target NN with the local parameters through interpolation every time step
Keyword Args:
local model (PyTorch model): weights will be copied from.
target model (PyTorch model): weights will be copied to.
tau (float) : Interpolation parameter for soft updating target DQN model with local DQN model.
"""
for target_param, local_param in zip(target_model.parameters(), local_model.parameters()):
target_param.data.copy_(tau*local_param.data + (1-tau)*target_param.data)
def _batch_update_target(self, local_model, target_model):
"""
Update the target NN with the local parameters completely. Should be done after some number
of training batches on the local model.
Keyword Args:
local model (PyTorch model): weights will be copied from.
target model (PyTorch model): weights will be copied to.
"""
for target_param, local_param in zip(target_model.parameters(), local_model.parameters()):
target_param.data.copy_(local_param.data)
def train_offline(self, epochs, batch_size, train_dataset, val_dataset=None, target_update_type='batch', update_batches=20, tau=1e-3, save_model=None):
"""
Pretrain the model with historical data. This is essentially like copying all the data into
the experience replay and training on it.
Keyword Args:
epochs (int) : Number of epochs to train over.
batch_size (int): size of batch to train at once.
train_dataset (Pytorch dataset): Training Dataset containing tuples of
(state, action, reward, next_state, terminal, is_last).
val_dataset (Pytorch dataset): Validation Dataset containing tuples of
(state, action, reward, next_state, terminal, is_last).
target_update_type (str) : One of 'batch' or 'soft'. Determines how the target QNN
is updated.
update_batches (int) : number of batches between target model updates.
tau (float): interpolation parameter for soft update.
save_model (Boolean) : Whether to save the trained model's state dict to disk.
"""
# Load dataset into a loader to iterate.
loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
if val_dataset:
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=True)
# Track historical training loss
train_loss_hist = []
# Track historical validation loss
val_loss_hist = []
print("Training")
for e in range(epochs):
# Track epoch loss
train_epoch_loss= []
for i, batch in enumerate(loader):
state_batch, action_batch, reward_batch, next_state_batch, terminal_batch, _ = batch
state_batch, action_batch, reward_batch, next_state_batch, terminal_batch = \
state_batch.to(device), action_batch.to(device), reward_batch.to(device), next_state_batch.to(device), terminal_batch.to(device)
action_batch = action_batch.reshape((action_batch.shape[0], 1))
criterion = torch.nn.MSELoss()
# Set the local NN for training. This will update the weights each batch.
self.qnetwork_local.train()
# Set the target NN for evaluation
# Target model is one with which we need to get our target (next state Q values).
# So that when we do a forward pass with target model it does not calculate gradient.
# We will update target model weights with soft_update function
self.qnetwork_target.eval()
# Calculate Q values of current state for the action that is taken. This is done with
# a forward pass of our local model.
q_values = self.qnetwork_local(state_batch.float()).gather(1,action_batch)
# calculate the max Q values of the next state after the action with the target network.
# Choose the Q - value with the largest value (best action). Terminal state logic is
# implemented in the following step. Q values for the placeholder will be used instead.
with torch.no_grad():
next_q_values = self.qnetwork_target(next_state_batch.float()).detach().max(1)[0].unsqueeze(1)
# Generate the target from the reward and the Q - value of the next state. If this was a
# terminal state/action, only use the reward.
target = reward_batch + self.gamma*next_q_values.reshape(next_q_values.shape[0])*(1 - terminal_batch.float())
current = q_values.reshape(q_values.shape[0])
# Confirm dtype
target = target.to(torch.float32)
current = current.to(torch.float32)
# Loss between target and current to minimize.
batch_train_loss = criterion(current, target)
self.optimizer.zero_grad()
batch_train_loss.backward()
train_epoch_loss.append(batch_train_loss.item())
self.optimizer.step()
if target_update_type == 'soft':
# Soft update target QNN
self._soft_update_target(self.qnetwork_local,self.qnetwork_target,tau)
if target_update_type == 'batch' and i % update_batches == 0:
# print("Updating target nn")
self._batch_update_target(self.qnetwork_local,self.qnetwork_target)
# Copy final local qnn to target qnn
self._batch_update_target(self.qnetwork_local, self.qnetwork_target)
# Calculate epoch loss
train_epoch_loss = sum(train_epoch_loss)/len(train_epoch_loss)
train_loss_hist.append(train_epoch_loss)
if len(train_loss_hist) >1:
train_delta = train_loss_hist[-1] - train_loss_hist[-2]
train_perc_delta = (train_loss_hist[-1] - train_loss_hist[-2])/(train_loss_hist[-2])*100
else:
train_perc_delta = 0.0
train_delta = 0.0
if self.scheduler:
self.scheduler.step()
# Perform validation
# TODO: Add early stopping logic to prevent overfitting
if val_dataset:
val_epoch_loss = []
with torch.no_grad():
for i, batch in enumerate(val_loader):
state_batch, action_batch, reward_batch, next_state_batch, terminal_batch, _ = batch
state_batch, action_batch, reward_batch, next_state_batch, terminal_batch = \
state_batch.to(device), action_batch.to(device), reward_batch.to(device), next_state_batch.to(device), terminal_batch.to(device)
action_batch = action_batch.reshape((action_batch.shape[0], 1))
q_values = self.qnetwork_target(state_batch.float()).gather(1,action_batch)
next_q_values = self.qnetwork_target(next_state_batch.float()).detach().max(1)[0].unsqueeze(1)
target = reward_batch + self.gamma*next_q_values.reshape(next_q_values.shape[0])*(1 - terminal_batch.float())
current = q_values.reshape(q_values.shape[0])
batch_val_loss = criterion(current, target)
val_epoch_loss.append(batch_val_loss.item())
val_epoch_loss = sum(val_epoch_loss)/len(val_epoch_loss)
val_loss_hist.append(val_epoch_loss)
if len(val_loss_hist) > 1:
val_perc_delta = (val_loss_hist[-1] - val_loss_hist[-2])/(val_loss_hist[-2])*100
val_delta = (val_loss_hist[-1] - val_loss_hist[-2])
else:
val_perc_delta = 0.0
val_delta = 0.0
else:
val_epoch_loss = 0.0
val_perc_delta = 0.0
val_delta = 0.0
print(f"Epoch {e}: Train Loss: {train_epoch_loss:.3f} ({train_delta:.3f}, {train_perc_delta:.3f}%) \t Val Loss: {val_epoch_loss:.3f} ({val_delta:.3f}, {val_perc_delta:.3f} %) \t lr: {self.optimizer.param_groups[0]['lr']:.3g}")
if save_model:
filename = os.path.join(MODELS_DIR, f'{self.agent_name}.pt')
torch.save(self.qnetwork_target.state_dict(), filename)
return train_loss_hist, val_loss_hist
def eval_offline(self, dataset):
"""
Evaluate the trained model on the test dataset. Returns raw actions and rewards.
Keyword Args:
dataset (Pytorch dataset): Dataset containing only tuples of
(state, label, timestep, is_last), as produced by env.load_eval_data.
"""
eval_dict = {}
rewards = []
self.qnetwork_target.eval()
total_reward = 0
sample_reward_list = []
classification_time = []
action_list = []
final_classification = []
true_labels = []
for i in dataset:
sample_reward = 0
counter = 0
total_timesteps = len(i)
actions = []
final_action = 0
for tpl in i:
state, label, timestep, is_last = tpl
with torch.no_grad():
action = self.qnetwork_target(torch.tensor(state).float().to(device)).detach().argmax()
actions.append(action.item())
# print(action)
reward = self.env._gen_reward(label=label, action=action, timestep=timestep+1, is_last=is_last, seq_length=total_timesteps)
total_reward += reward
sample_reward += reward
counter += 1
if action != 0:
final_action = action.item()
break
true_labels.append(label)
final_classification.append(final_action)
classification_time.append(counter)
sample_reward_list.append(sample_reward)
action_list.append(actions)
eval_dict['true_labels'] = true_labels
eval_dict['total_reward'] = total_reward
eval_dict['sample_reward_list'] = sample_reward_list
eval_dict['classification_time'] = classification_time
eval_dict['action_list'] = action_list
eval_dict['final_classification'] = final_classification
return eval_dict
import os
import sys
import numpy as np
import pickle
BASE_DIR = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
DATA_DIR = os.path.join(BASE_DIR, 'data')
SRC_DIR = os.path.join(BASE_DIR, 'src')
MODELS_DIR = os.path.join(SRC_DIR, 'trained_models')
sys.path.append(SRC_DIR)
from agents.ts_agents import Agent
from utils.load_data import load_data, QDataset
from utils.metrics import delay_score, accuracy_score
from envs.environment import env
def run_grid_search(train_data_dict, val_data_dict, eval_dataset):
reward_type = ['shaped'] #['discount', 'shaped']
kappa = [1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2, 2.2, 2.5, 2.8, 5, 8]
lamb = [0, 0.1, 0.5, 1, 2, 3, 4, 5, 10, 15, 20, 30, 40, 80, 100, 200, 400, 1000, 2000, 4000, 10000]
gamma = [0, 0.1, 0.2, 0.5, 0.9, 0.99, 1]
terminal_penalty = [0, -1, -5, -10]
lr = 0.0001
param_list = []
class_list = train_data_dict['action_dict'].keys()
for rt in reward_type:
if rt == 'shaped':
g = 1
term=0
for k in kappa:
for l in lamb:
param_list.append((rt, lr, g, l, k, term))
elif rt == 'discount':
l = None
k = None
for g in gamma:
for term in terminal_penalty:
param_list.append((rt, lr, g, l, k, term))
gs_dict = {}
for tpl in param_list:
rt, lr, g, l, k, term = tpl
print(f"Discount type: {rt}, gamma: {g}, lambda: {l}, kappa:{k}, terminal pen:{term}")
e = env(
reward_type=rt,
lamb=l,
kappa=k,
terminal_delay_penalty=term
)
train_q_data = e.load_exhaustive_dataset(data_dict=train_data_dict)
val_q_data = e.load_exhaustive_dataset(data_dict=val_data_dict)
train_dataset = QDataset(q_data=train_q_data)
val_dataset = QDataset(q_data=val_q_data)
agent = Agent(
agent_name='heartbeat',
state_size=61,
action_size=3,
env = e,
learning_rate=lr,
gamma=g,
tau=1e-3,
seed=None,
layer_1_size=128,
layer_2_size=128,
)
train_loss_hist, val_loss_hist = agent.train_offline(
epochs= 50,
batch_size=128,
train_dataset=train_dataset,
val_dataset=val_dataset,
)
result_dict = agent.eval_offline(dataset=eval_dataset)
delay_dict = delay_score(eval_dict=result_dict, class_list=class_list)
acc_dict = accuracy_score(eval_dict=result_dict)
gs_dict[tpl] = {}
gs_dict[tpl]['results'] = result_dict
gs_dict[tpl]['time_metrics'] = delay_dict
gs_dict[tpl]['acc_metrics'] = acc_dict
gs_dict[tpl]['training'] = train_loss_hist
gs_dict[tpl]['validation'] = val_loss_hist
with open(os.path.join(MODELS_DIR, 'lr_grid_search_results.pkl'), 'wb') as handle:
pickle.dump(gs_dict, handle, protocol=pickle.HIGHEST_PROTOCOL)
def main():
print(BASE_DIR)
data_name = 'Heartbeat'
# Load training data
train_data, train_labels = load_data(data_name=data_name, data_type='TRAIN')
# Load testing data
test_data, test_labels = load_data(data_name=data_name, data_type='TEST')
class_list = np.unique(train_labels)
action_dict = {i+1: v for i, v in enumerate(class_list) }
action_dict[0] = 'Delay'
num_feats = train_data.shape[-1]
print(f"Classes: {class_list}")
print(f"Number of features {num_feats}")
print(f"Length of sequence {train_data.shape[1]}")
train_data_dict = {
'data' : train_data,
'labels' : train_labels,
'class_list' : class_list,
'action_dict' : action_dict,
'num_feats' : num_feats,
}
val_data_dict = {
'data' : test_data,
'labels' : test_labels,
'class_list' : class_list,
'action_dict' : action_dict,
'num_feats' : num_feats,
}
e = env(
reward_type='shaped', #'discount',
lamb=2,
kappa=2,
terminal_delay_penalty=0
)
train_q_data = e.load_exhaustive_dataset(data_dict=train_data_dict)
val_q_data = e.load_exhaustive_dataset(data_dict=val_data_dict)
train_dataset = QDataset(q_data=train_q_data)
val_dataset = QDataset(q_data=val_q_data)
agent = Agent(
agent_name='heartbeat',
state_size=num_feats,
action_size=len(action_dict.keys()),
env = e,
learning_rate=0.0001,
gamma=1,
tau=1e-3,
seed=None,
layer_1_size=128,
layer_2_size=128,
)
# Load model if exists
model_name = 'model' #'heartbeat.pt'
file_exists = os.path.exists(os.path.join(MODELS_DIR, model_name))
if file_exists:
agent.load_model(model_name)
print("Loaded pretrained model")
else:
agent.train_offline(
epochs= 30,
batch_size=128,
train_dataset=train_dataset,
val_dataset=val_dataset,
)
print("Trained new model")
eval_data = e.load_eval_data(data_dict=val_data_dict, data_type='TEST')
# # Truncate dataset for testing
eval_dataset = QDataset(q_data=eval_data)
run_grid_search(train_data_dict, val_data_dict, eval_dataset)
pass
if __name__ == "__main__":
main()
import os
import sys
import numpy as np
import torch.optim as optim
import torch.nn as nn
import torch
from torch.utils.data import DataLoader, Dataset
import pickle
BASE_DIR = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
DATA_DIR = os.path.join(BASE_DIR, 'data')
SRC_DIR = os.path.join(BASE_DIR, 'src')
MODELS_DIR = os.path.join(SRC_DIR, 'trained_models')
sys.path.append(SRC_DIR)
from models.classification_nn import Classifier
from utils.load_data import load_data, strip_time_series, classDataset
from utils.metrics import delay_score, accuracy_score
from agents.benchmarks import agentClassifier
def raw():
print(BASE_DIR)
data_name = 'EthanolConcentration'
train_data, train_labels = load_data(data_name=data_name, data_type='TRAIN')
class_list = np.unique(train_labels)
class_dict = {i: v for i, v in enumerate(class_list)}
rev_class_dict = dict((v, k) for k, v in class_dict.items())
train_labels = [rev_class_dict[i] for i in train_labels]
num_feats = train_data.shape[-1]
print(train_data)
print(train_labels)
#Put into dataset
train_dataset = classDataset(data=train_data, labels=train_labels)
test_data, test_labels = load_data(data_name=data_name, data_type='TEST')
test_labels = [rev_class_dict[i] for i in test_labels]
test_dataset = classDataset(data=test_data, labels=test_labels)
classifier = Classifier(
input_size=3,
class_size=4
)
print(classifier)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(classifier.parameters(), lr=0.00001, momentum=0.5)
# Create dataloader
train_loader = DataLoader(train_dataset, batch_size=24, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=24, shuffle=True)
for epoch in range(2): # loop over the dataset multiple times
running_loss = 0.0
for i, data in enumerate(train_loader, 0):
# get the inputs; data is a list of [inputs, labels]
inputs, labels = data
# zero the parameter gradients
optimizer.zero_grad()
# forward + backward + optimize
outputs = classifier(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
# print statistics
running_loss += loss.item()
if i % 2000 == 1999: # print every 2000 mini-batches
classifier.eval()
with torch.no_grad():
val_loss = 0
for j, batch in enumerate(test_loader):
inputs, labels = batch
outputs = classifier(inputs)
loss = criterion(outputs, labels)
val_loss += loss.item()
print(f'[{epoch + 1}, {i + 1:5d}] train_loss: {running_loss / 2000:.3f}, val_loss : {val_loss:.3f}')
running_loss = 0.0
val_loss = 0.0
classifier.train()
print('Finished Training')
def grid_search(agent, test_dataset):
delay_type = ['chance_delay', 'delay_on_thresh']
thresh = [0.1, 0.2, 0.3, 0.4, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.9]
param_list = []
gs_dict = {}
for dt in delay_type:
for t in thresh:
param_list.append((dt, t))
for tpl in param_list:
print(tpl)
eval_dict = agent.evaluate(
test_dataset,
action_strategy=tpl[0],
threshold=tpl[1]
)
delay_dict = delay_score(eval_dict=eval_dict, class_list=[0,1])
acc_dict = accuracy_score(eval_dict=eval_dict)
gs_dict[tpl] = {}
gs_dict[tpl]['results'] = eval_dict
gs_dict[tpl]['time_metrics'] = delay_dict
gs_dict[tpl]['acc_metrics'] = acc_dict
with open(os.path.join(MODELS_DIR, 'benchmark_classifier_grid_search_results.pkl'), 'wb') as handle:
pickle.dump(gs_dict, handle, protocol=pickle.HIGHEST_PROTOCOL)
def main():
data_name = 'Heartbeat'
# Load training data
train_data, train_labels = load_data(data_name=data_name, data_type='TRAIN')
print(f"Number of train sequences {len(train_labels)}.")
cnts = np.unique(train_labels, return_counts=True)
print(cnts)
print(f"Number of abnormal class: {cnts[1][0]}")
print(f"Number of normal class: {cnts[1][1]}")
lst = [cnts[1][1]/len(train_labels), cnts[1][0]/len(train_labels)]
print(f"Class ratio: {lst}")
# Load testing data
test_data, test_labels = load_data(data_name=data_name, data_type='TEST')
print(f"Number of test sequences {len(test_labels)}.")
cnts = np.unique(test_labels, return_counts=True)
print(f"Number of abnormal class: {cnts[1][0]}")
print(f"Number of normal class: {cnts[1][1]}")
lst = [cnts[1][0]/len(test_labels), cnts[1][1]/len(test_labels)]
print(f"Class ratio: {lst}")
class_list = np.unique(train_labels)
action_dict = {i+1: v for i, v in enumerate(class_list) }
action_dict[0] = 'Delay'
num_feats = train_data.shape[-1]
class_dict = {i: v for i, v in enumerate(class_list)}
rev_class_dict = dict((v, k) for k, v in class_dict.items())
train_labels = [rev_class_dict[i] for i in train_labels]
test_labels = [rev_class_dict[i] for i in test_labels]
train_dataset = classDataset(data=train_data, labels=train_labels)
val_dataset = classDataset(data=test_data, labels=test_labels)
print(train_dataset)
agent = agentClassifier(
num_feats=num_feats,
class_list=class_list,
layer_1_size=128,
layer_2_size=128,
lr=0.001
)
agent.train(
epochs=20,
batch_size=24,
train_dataset=train_dataset,
val_dataset=val_dataset,
)
print("Training finished")
test_data, test_labels = load_data(data_name=data_name, data_type='TEST')
test_labels = [rev_class_dict[i] for i in test_labels]
test_dataset = classDataset(data=test_data, labels=test_labels, eval=True)
eval_dict = agent.evaluate(
test_dataset,
action_strategy='delay_on_thresh',
threshold=0.7
)
delay_dict = delay_score(eval_dict=eval_dict, class_list=[0,1])
acc_dict = accuracy_score(eval_dict=eval_dict)
print(delay_dict)
print(acc_dict)
grid_search(agent, test_dataset)
if __name__ == "__main__":
main()