
Charts

12/21/2014

I just saved a couple of charts: green is the predicted output, red is the actual values. The predicted outputs seem clamped with these settings:

../vowpalwabbit/vw -d training.txt -k -c -f btce.model --loss_function squared -b 25 --passes 20 -q ee --l2 0.0000005
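
For context, each line of training.txt is in vw's usual format: the target value goes before the "|", and the features come after it in a namespace starting with "e" (which is what -q ee pairs with itself). The feature names and numbers below are just placeholders for illustration, not the actual features I'm using:

402.5 |e price_lag1:398.2 price_lag2:395.7 volume:12.3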

No decimation (downsampling) ~20K datapoints:

Downsampled with a factor of 8 (~2.5K datapoints):

../vowpalwabbit/vw -d training.txt -k -c -f btce.model --loss_function squared --passes 20 --l2 0.0000005

This model worked better. Looking at it closely, you can see:

And this is only working with about a fifth of the data collected so far. Crazy that it actually seems to work, sort of... in a muted sense.
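
For reference, the predictions.txt that the graphing script below reads comes from running the saved model over a held-out file with something like this (the filenames are just what the script expects):

../vowpalwabbit/vw -d test.txt -t -i btce.model -p predictions.txt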

Here's the graphing code for good measure:

#! /usr/bin/python2

import numpy as np
import matplotlib.pyplot as plt

from scipy import signal

actual_values = []
predicted_values = []

# The test file is in vw format, so the actual value is whatever comes before the first "|"
with open('test.txt', 'r') as test_f:
    for line in test_f:
        actual_values.append(float(line.partition("|")[0]))

# vw writes one prediction per line
with open('predictions.txt', 'r') as predictions_f:
    for line in predictions_f:
        predicted_values.append(float(line))

# Decimate (downsample) the series before plotting
# actual_values = signal.decimate(actual_values, 10)
# predicted_values = signal.decimate(predicted_values, 10)

data_len = len(actual_values)
print data_len

x = np.arange(0, data_len)

# Actual values in red, predictions in green
plt.plot(x, actual_values, 'r-', x, predicted_values, 'g-')
plt.show()

I'll probably switch to working with gzipped datasets and vw cache files exclusively as I move forward; I'm just worried about dealing with bigger datasets. Whee!
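
If I go that route, the graphing script only needs a small tweak to read gzipped files directly with Python's gzip module. A minimal sketch, assuming the inputs are just gzipped copies of test.txt and predictions.txt:

import gzip

actual_values = []
predicted_values = []

# gzip.open returns a file-like object that can be iterated line by line,
# so the parsing stays the same as with plain files
with gzip.open('test.txt.gz', 'rb') as test_f:
    for line in test_f:
        actual_values.append(float(line.partition("|")[0]))

with gzip.open('predictions.txt.gz', 'rb') as predictions_f:
    for line in predictions_f:
        predicted_values.append(float(line))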

#python #machine learning #vw #matplotlib