raML - Near Goals

Big:
1. Model compilation
2. Validation
3. Optimizers

Small:
1. Lambda Layer
2. Data normalization as a layer (maybe?)

Progress so far: implemented model compilation. Now, creating a deep neural network is as easy as it is in Keras:

    model = Sequential([
        Dense(size=3, input_shape=X.shape),
        Dense(size=1, activation=Sigmoid)
    ])
    model.compile(cost=MSE(), metrics=[RMSE()])

Looks just like Keras, you say? Well, good, because Keras does model creation the right way! I've also added ReLU, but I'm still testing to make sure it's working right. This actually made me realize I should organize optimizers!

Update: After investigating, I found that the ReLU problem is most likely exploding gradients. Didn't expect those to show up this early!

Update 2: Oh, this is so cool! After running into exploding gradients in a relatively small network, I knew the culprit probably wasn't the learning rate (although making it smaller did help), but rather the weight initialization. That's actually worth a separate blog post, but in short: I used to sample weights from a uniform [0, 1] distribution, and it's much better to sample from a normal distribution centered at 0. (Note: that doesn't fully solve it; for best performance, you also need to account for the variance, which should depend on the layer's size.)
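Here's a rough sketch of that initialization change in plain NumPy; the function names and the exact fan-in scaling are my illustration, not necessarily what raML ends up doing:

    import numpy as np

    def init_uniform(n_in, n_out):
        # Old approach: uniform [0, 1) weights -- all positive and unscaled,
        # so activation magnitudes (and hence gradients) grow layer after layer.
        return np.random.rand(n_in, n_out)

    def init_scaled_normal(n_in, n_out):
        # New approach: zero-centered normal scaled by fan-in (Xavier-style),
        # which keeps the activation variance roughly constant across layers.
        return np.random.randn(n_in, n_out) * np.sqrt(1.0 / n_in)

    # Quick check: push random inputs through five linear layers
    # and watch how the activation scale evolves under each init.
    x = np.random.randn(32, 64)
    for init in (init_uniform, init_scaled_normal):
        h = x
        for _ in range(5):
            h = h @ init(h.shape[1], 64)
        print(init.__name__, round(float(h.std()), 3))

Stacking even a few layers makes the difference obvious: the activation scale explodes with the uniform init and stays roughly constant with the zero-centered, scaled one.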


Morning Goal

Let's see if I can stick to this new rubric. (Ramil from the future: "No")

raML:
1. Lambda Layer
2. Implement more datasets
3. Add more cost functions (RMSE)
4. Come up with a better DNN model creation procedure

Update. Progress so far: added more datasets, added metrics, and improved the tqdm output (the progress bar in the terminal that tracks training progress). Here we have the MSE loss and the RMSE metric tracked while training a model on the Swedish Auto Insurance dataset. So beautiful.
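For flavor, here's a minimal sketch of how an RMSE metric might be tracked alongside the loss in a tqdm progress bar on a toy linear model; the class and names are my illustration, not raML's actual API:

    import numpy as np
    from tqdm import trange

    class RMSE:
        # Root mean squared error, reported as a metric each epoch.
        name = "rmse"
        def __call__(self, y_true, y_pred):
            return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

    # Toy one-feature linear model trained with gradient descent on MSE.
    rng = np.random.default_rng(0)
    X = rng.normal(size=100)
    y = 3.0 * X + rng.normal(scale=0.1, size=100)
    w, b, lr, metric = 0.0, 0.0, 0.1, RMSE()

    bar = trange(200)
    for _ in bar:
        y_pred = w * X + b
        err = y_pred - y
        w -= lr * 2 * np.mean(err * X)   # d(MSE)/dw
        b -= lr * 2 * np.mean(err)       # d(MSE)/db
        # Show both the loss and the metric in the progress bar.
        bar.set_postfix(mse=float(np.mean(err ** 2)), rmse=metric(y, y_pred))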


Human Interactions are Hard!

It's never clear what the right thing to say is in the moment, especially when your words can and will be used against you. Stress, man, stress!

On a different note, the raML project (yep, what an awesome name) is going great! Here is a sigmoid trained with MSE. Yeah, yeah, I shouldn't use MSE for logistic regression, but that's not the point!

Update: Alright, fine! To all the (non-existent) haters, I've added the CrossEntropy loss. Kids, the demo below is why you should use an appropriate loss function: MSE after 100k epochs is only about as good as CrossEntropy after 10k! Whoa, that's cool! (Note to skeptics: yes, I've also compared both at 10k epochs, and MSE is much worse. Note to skeptics^2: yes, all initial conditions were the same, stop doubting!)
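For intuition on why cross-entropy wins here: with a sigmoid output, the MSE gradient picks up an extra sigma'(z) factor that vanishes when the unit is saturated but wrong, while the cross-entropy gradient is simply (prediction - target). A quick standalone check (plain NumPy, not raML code):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def mse_grad_wrt_z(z, y):
        # d/dz of 0.5 * (sigmoid(z) - y)^2: the p * (1 - p) factor
        # shrinks the gradient whenever the sigmoid saturates.
        p = sigmoid(z)
        return (p - y) * p * (1.0 - p)

    def cross_entropy_grad_wrt_z(z, y):
        # d/dz of -(y*log(p) + (1-y)*log(1-p)) with p = sigmoid(z):
        # the sigmoid-derivative factor cancels, leaving a clean error signal.
        return sigmoid(z) - y

    # A confidently wrong prediction (z = -8, true label 1):
    print(mse_grad_wrt_z(-8.0, 1.0))            # ~ -0.0003, learning stalls
    print(cross_entropy_grad_wrt_z(-8.0, 1.0))  # ~ -0.9997, strong correction

That tiny MSE gradient on badly wrong predictions is exactly why it needs roughly ten times as many epochs to catch up.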


Yay

The ML framework is coming along! I'm thinking deeply and carefully about all of the math and CS involved, and it's looking really good so far. I wrote the Dense layer, plus the logic for the forward pass and backpropagation. Next up: activation functions and better cost functions! Oh, and of course, since I have the generalized Dense layer coded, I'm gonna write the Sequential model wrapper. NO, I'M NOT STEALING FROM KERAS. Here's a screenshot of me training a simple linear regression.
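For a sense of what that involves, here's a minimal Dense-layer sketch with a forward pass and backpropagation; the method names and caching details are my own illustration, not necessarily how raML lays it out:

    import numpy as np

    class Dense:
        def __init__(self, n_in, n_out):
            # Small zero-centered random init.
            self.W = np.random.randn(n_in, n_out) * 0.01
            self.b = np.zeros(n_out)

        def forward(self, x):
            self.x = x                      # cache the input for the backward pass
            return x @ self.W + self.b

        def backward(self, grad_out, lr):
            # Gradients of the loss w.r.t. the parameters and the layer input.
            grad_W = self.x.T @ grad_out
            grad_b = grad_out.sum(axis=0)
            grad_in = grad_out @ self.W.T
            # Plain gradient-descent update.
            self.W -= lr * grad_W
            self.b -= lr * grad_b
            return grad_in                  # passed on to the previous layer

    # Tiny usage example: one step of linear regression with an MSE loss.
    layer = Dense(1, 1)
    X = np.array([[0.0], [1.0], [2.0]])
    y = np.array([[1.0], [3.0], [5.0]])
    pred = layer.forward(X)
    grad = 2 * (pred - y) / len(X)          # d(MSE)/d(pred)
    layer.backward(grad, lr=0.1)

A Sequential wrapper then just calls forward() over the layers in order and backward() in reverse, threading the returned gradient through.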
