I just saved a couple of charts: the green line is the predicted output, the red is the actual values. The predicted outputs seem clamped with these settings:
../vowpalwabbit/vw -d training.txt -k -c -f btce.model --loss_function squared -b 25 --passes 20 -q ee --l2 0.0000005
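For reference, the predictions being graphed come from running the saved model back over held-out data in test-only mode, along these lines (test.txt and predictions.txt are just my file names):

```shell
# -t: test only (no learning), -i: load the trained model,
# -p: write one prediction per line to predictions.txt
../vowpalwabbit/vw -d test.txt -t -i btce.model -p predictions.txt
```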
No decimation (downsampling), ~20K datapoints:
Downsampled with a factor of 8 (~2.5K datapoints):
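The downsampling is scipy's decimate, which low-pass filters before subsampling (a naive `series[::8]` stride would alias any high-frequency noise into the chart). A minimal sketch of the factor-8 version, on a synthetic stand-in for the collected data:

```python
import numpy as np
from scipy import signal

# Synthetic stand-in for the ~20K collected datapoints.
series = np.sin(np.linspace(0, 50, 20000))

# decimate() applies an anti-aliasing filter, then keeps every 8th sample.
downsampled = signal.decimate(series, 8)
print(len(downsampled))  # ~20000/8 = 2500 points
```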
../vowpalwabbit/vw -d training.txt -k -c -f btce.model --loss_function squared --passes 20 --l2 0.0000005
This model worked better; looking at the chart closely, you can see:
And this is only working with about a fifth of the data collected so far. Crazy that it actually seems to work, sort of... in a muted sense.
Here's the graphing code for good measure:
#! /usr/bin/python2
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal

actual_values = []
predicted_values = []

with open('test.txt', 'r') as test_f:
    for line in test_f:
        # partition() returns a 3-tuple; the label is the piece before the "|"
        actual_values.append(float(line.partition("|")[0]))

with open('predictions.txt', 'r') as predictions_f:
    for line in predictions_f:
        predicted_values.append(float(line))

# Decimate the charts
# actual_values = signal.decimate(actual_values, 10)
# predicted_values = signal.decimate(predicted_values, 10)

data_len = len(actual_values)
print data_len
x = np.arange(0, data_len)
plt.plot(x, actual_values, 'r-', x, predicted_values, 'g-')
plt.show()
I'll probably switch to working with gzipped datasets/vw cache files exclusively as I move forward. I'm just worried about dealing with bigger datasets. Whee!
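Reading gzipped data back from Python is cheap with the stdlib, so the graphing script should keep working against compressed files with little more than gzip.open in place of open. A small Python 3-style sketch (file names and feature names here are just placeholders):

```python
import gzip

# Write a toy vw-format dataset, then stream it back the same way
# the graphing script reads its labels out of test.txt.
with gzip.open('training.txt.gz', 'wt') as f:
    f.write('0.5 |f price:1.2\n')
    f.write('0.7 |f price:1.3\n')

labels = []
with gzip.open('training.txt.gz', 'rt') as f:
    for line in f:
        labels.append(float(line.partition('|')[0]))
print(labels)  # [0.5, 0.7]
```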