LSTM-RNN Categorical Crypto-Currency Prediction Model

View the Jupyter Notebook on Google Colab here:

https://colab.research.google.com/drive/1VUgIl87CL_S0Y5hBFGiQi97vSXkczyPm?usp=sharing

The purpose of this research was to see whether a classification model could help predict cryptocurrency prices. The simple classification approach aimed to predict 3 classes: an increase beyond a specified threshold, a decrease beyond that threshold, or no change. The goal was to predict (n) minutes into the future using the previous (n) minutes of recorded data. This was a semester project for my Deep Learning course.
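As a rough illustration of the three-class target described above, the sketch below labels each time step by its relative price change over a future horizon. The function name `label_future_move`, the label encoding (0 = decrease, 1 = no change, 2 = increase), and the sample prices are assumptions made for this example; they are not taken from the notebook.

```python
def label_future_move(prices, horizon, threshold):
    """Label each step by the relative price change `horizon` steps ahead.

    Assumed encoding: 2 = increase beyond threshold,
    0 = decrease beyond threshold, 1 = no change.
    """
    labels = []
    # The last `horizon` steps have no known future price, so they get no label.
    for t in range(len(prices) - horizon):
        change = (prices[t + horizon] - prices[t]) / prices[t]
        if change > threshold:
            labels.append(2)
        elif change < -threshold:
            labels.append(0)
        else:
            labels.append(1)
    return labels


# Made-up prices, a 1-step horizon, and a 0.5% threshold.
prices = [100.0, 100.1, 101.5, 99.0, 99.05, 100.9]
labels = label_future_move(prices, horizon=1, threshold=0.005)
print(labels)
```

Raising `threshold` pushes more samples into the no-change class, while lowering it pushes more into the increase and decrease classes.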

CONCLUSION:

Finding appropriate correlations between currencies to use as supporting data for prediction models is difficult. Recommendations from online sites such as Coinbase list LINK, LTC, and NEO as directly correlated with DASH, for example. Using such dependencies may have improved training results, although given the overall lackluster performance of all models, it is hard to distinguish any benefit.

Finding the best threshold to predict for was also a contributing factor in success. An even balance between the increase, decrease, and no-change classes was critical in balancing results. Certain thresholds left one class heavily over-represented, which resulted in significant data loss after balancing.
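The data loss mentioned above can be made concrete with a small sketch. It assumes balancing is done by downsampling every class to the size of the smallest one, which is one common approach and may differ from what the notebook does. The helper name `balance_by_downsampling` and the label counts are made up for illustration.

```python
import random
from collections import Counter


def balance_by_downsampling(samples, labels, seed=0):
    """Keep an equal number of samples per class by randomly dropping extras."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    # Every class is cut down to the size of the rarest class.
    smallest = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        for sample in rng.sample(group, smallest):
            balanced.append((sample, label))
    rng.shuffle(balanced)
    return balanced


# Hypothetical skewed labels: 8 no-change, 3 increase, 2 decrease.
labels = [1] * 8 + [2] * 3 + [0] * 2
samples = list(range(len(labels)))
balanced = balance_by_downsampling(samples, labels)
kept = Counter(label for _, label in balanced)
discarded = len(labels) - len(balanced)
print(kept, discarded)
```

In this toy case 7 of the 13 samples are discarded, which shows how a threshold that skews the class counts shrinks the usable training set.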

Predicting (n) minutes into the future was another factor that heavily affected the data and results. Tests showed 5 to 15 minutes to be ideal. However, given the many possible combinations of data, thresholds, and sequence lengths, a better combination may yet be found.

Most tests were run with a sequence length of 60 minutes, which seemed to give the best results. Again, the vast number of possible data combinations makes any length a possible candidate.
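To show what a sequence length means in practice, here is a minimal sketch of turning per-minute rows into fixed-length windows, each paired with the label of its final row. The name `make_sequences` and the toy inputs are assumptions for this example; a sequence length of 60 would be used the same way as the 3 shown here.

```python
def make_sequences(rows, labels, seq_len):
    """Pair each window of `seq_len` consecutive rows with the label of its last row."""
    sequences = []
    for end in range(seq_len, len(rows) + 1):
        target_index = end - 1
        # Rows near the end have no label because their future price is unknown.
        if target_index < len(labels):
            sequences.append((rows[end - seq_len:end], labels[target_index]))
    return sequences


# Toy data: 10 rows, with labels missing for the final 2 rows (a 2-step horizon).
rows = list(range(10))
labels = [0, 1, 2, 0, 1, 2, 0, 1]
sequences = make_sequences(rows, labels, seq_len=3)
print(len(sequences), sequences[0], sequences[-1])
```

Longer windows give the model more context but yield fewer training sequences from the same amount of data.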

Furthermore, errors in the data set may not be completely known, and finding large amounts of freely available data can be challenging. The data used here was sourced from https://www.cryptodatadownload.com/, which provides data from Binance, an Asian cryptocurrency exchange. Observations from tests did show a higher success rate with a larger amount of data.

Errors and bugs in normalization, data set combining, sequencing, randomization, Y-target creation, or any of the other processes involved may have contributed to the low prediction scores.

Analyzing the large number of models generated also proved tedious and leaves room for automation in the future. While most models ranged in accuracy from 25% to 35%, a select few reached 45% to 50% overall accuracy. Analyzing per-class metrics also showed that some models were better suited to specific tasks. For example, one model might predict increases with 47% accuracy but decreases with only 19%.
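The per-class comparison described above could be automated along these lines. This sketch treats "accuracy for a class" as the fraction of true samples of that class that were predicted correctly (per-class recall), which is an assumption about how the figures were measured. The function name and the example predictions are made up.

```python
from collections import defaultdict


def per_class_accuracy(y_true, y_pred):
    """Return, for each true class, the fraction of its samples predicted correctly."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}


# Made-up labels: 2 = increase, 0 = decrease, 1 = no change.
y_true = [2, 2, 2, 2, 0, 0, 0, 0, 1, 1]
y_pred = [2, 2, 1, 0, 0, 1, 2, 2, 1, 1]
scores = per_class_accuracy(y_true, y_pred)
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(scores, overall)
```

For reference, random guessing over three evenly balanced classes would score about 33% overall, so per-class figures are best read against that baseline.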

Overall, the strategy of using a 3-class classifier with an RNN is probably obsolete or irrelevant. The hypothesis was that with only 3 classes, a deep learning model might achieve greater accuracy or make predictions more easily. I think the results show this is not the case, with readily available linear regression models easily outperforming these models.