New Hardware for Machine Learning

It’s been over six weeks since Christmas, but I’m waiting for a gift… I just ordered a new laptop to take my data science education to new spheres. After reading innumerable reviews about which laptop to buy for a data science/machine learning sandbox, I settled on an Eluktronics machine based on the universally glowing reviews. Based on specs, it’s certainly in a league far beyond any laptop I’ve owned before:

  • Intel quad-core i7
  • 32GB RAM
  • 4GB NVIDIA GeForce GTX
  • 256GB SSD
  • 2TB HD

I thought about going with a 6GB NVIDIA GeForce but that wasn’t in the budget for now. Of course, neither is a Tesla P100 GPU. But I hope to at least be able to tinker with TensorFlow using this new rig.

Also, after finally getting back to Linux, which I hadn’t played with in a few years (I just replaced Ubuntu 8 with 16.04 on another laptop), I am going to try to run multiple Linux VMs on this new machine and basically have my own distributed cluster of machines. So in case you haven’t noticed, I’m having a blast here.

 

Epilogue to ‘My First Neural Network’

Here are a few more (after)thoughts and lessons learned with regard to my last post.

Grokking activation functions

I initially tried to work with a step function (i.e. output 1 if the dot product of inputs x weights is > 0, else -1). But I ran into issues with np.array() and determining which condition was met. Two ways to implement a step function would be:

>>> output = np.array([[3],[5],[-7]])
>>> def step(x):
	return np.where(x > 0,1,-1)

>>> step(output)
array([[ 1],
       [ 1],
       [-1]])
>>>
>>> def step2(x):
	return 1 * (x > 0)

>>> step2(output)
array([[1],
       [1],
       [0]])
>>>

So using the above step2 function for activation and leaving out the use of the derivative, the output from my perceptron becomes:

>>> 
Predictions:  [[1]
 [1]
 [1]
 [0]
 [0]
 [0]]
Final weights:  [[-0.21595599]
 [ 0.42064899]]
>>>

A great overview of activation functions can be found on medium.com; it lists a number of other functions I haven’t even looked at yet.
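Just to get a feel for a couple of those other functions (this is my own quick experiment, not something from that overview), here are tanh and ReLU applied to the same output array from above:

import numpy as np

output = np.array([[3], [5], [-7]])

def relu(x):
    # ReLU: pass positive values through unchanged, clamp negatives to 0
    return np.maximum(0, x)

print(np.tanh(output))   # squashes into (-1, 1): roughly [[0.995], [0.9999], [-0.99999]]
print(relu(output))      # [[3], [5], [0]]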

Why should you normalize inputs for your network?

Because it helps the network understand what it’s supposed to be learning, especially when the data isn’t already in some numeric format (see this github post).

And because you want your input to fall into the range of values where your activation function has a significant gradient, so the network can learn. For the sigmoid, that’s somewhere between -3 and +3 (this is just me eyeballing the graph). For more information, see this StackExchange thread.
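To make that concrete, here’s a minimal sketch (my own made-up numbers, not data from the post) of min-max scaling inputs into [0, 1] so they land where the sigmoid still has a usable gradient:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

raw = np.array([[30.0], [50.0], [-70.0]])   # hypothetical inputs on an arbitrary scale

# for large |x| the sigmoid saturates: effectively [[1], [1], [0]], no useful gradient
print(sigmoid(raw))

# min-max normalize into [0, 1]
scaled = (raw - raw.min()) / (raw.max() - raw.min())
print(scaled.round(2))      # [[0.83], [1.], [0.]]
print(sigmoid(scaled))      # roughly 0.70, 0.73, 0.50 -- still on the steep part of the curve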

The above is another reason why use of the sigmoid function is so common: it has the advantage of being differentiable, so its derivative can be used to update the weights and minimize the error. This is called gradient descent. The process by which this is achieved in neural networks is what’s referred to as ‘backpropagation’: you take the error in an output and feed it back through the network to get better results. The actual weighting of inputs as they move forward through the network is referred to as ‘feedforward’. So backpropagation is really a mechanism for training your network. Once it’s trained, it operates in ‘feedforward’ mode.
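To tie those terms together, here’s a toy version of that loop for a single sigmoid neuron. The training data, learning rate, and iteration count are made up for illustration; this is not the code or data from my perceptron post:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_deriv(s):
    # derivative of the sigmoid written in terms of its output s = sigmoid(x)
    return s * (1 - s)

# six hypothetical samples with two features; labels 1 for the first half, 0 for the rest
X = np.array([[2, 1], [3, 2], [1, 3], [-2, -1], [-3, -2], [-1, -3]], dtype=float)
y = np.array([[1], [1], [1], [0], [0], [0]], dtype=float)

np.random.seed(1)
weights = 2 * np.random.random((2, 1)) - 1   # random start in [-1, 1]

for _ in range(1000):
    output = sigmoid(np.dot(X, weights))     # feedforward pass
    error = y - output                       # how far off each prediction is
    # backpropagation step: scale the error by the sigmoid's gradient
    # and push it back onto the weights (gradient descent on the error)
    weights += np.dot(X.T, error * sigmoid_deriv(output))

print("Predictions:\n", output.round())
print("Final weights:\n", weights)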

More about the Sigmoid

Another issue I ran into was with the sigmoid function: to use it with a numpy array as input, be sure to call numpy.exp(x) in the denominator. Trying math.exp(x) will get you a TypeError:

>>> # starting with same output array from above
>>> import math
>>> import numpy as np
>>> output
array([[ 3],
       [ 5],
       [-7]])
>>> math.exp(output)
Traceback (most recent call last):
  File "<pyshell#204>", line 1, in <module>
    math.exp(output)
TypeError: only length-1 arrays can be converted to Python scalars
>>> np.exp(output)
array([[  2.00855369e+01],
       [  1.48413159e+02],
       [  9.11881966e-04]])
>>>

Finally, you may hear mention of the logistic function, which turns out to be similar to the sigmoid in both behavior and use. That’s because the sigmoid is a special case of the logistic.

Logistic
f(x) = L / (1 + exp(-k(x - x0)))

Sigmoid
f(x) = 1 / (1 + exp(-x))
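A quick sanity check (my own snippet) that the sigmoid really is the logistic with L = 1, k = 1, and x0 = 0:

import numpy as np

def logistic(x, L=1.0, k=1.0, x0=0.0):
    return L / (1 + np.exp(-k * (x - x0)))

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([[3], [5], [-7]])               # same output array as above
print(np.allclose(logistic(x), sigmoid(x)))  # True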

Taking a Look at Data Science

Realizing that I’m not alone in this, I have to admit I have become very curious about Data Science. I realize it’s one of the Buzzword Trifecta (Data Science, Machine Learning, Big Data). But it sounds like both a fascinating field with huge untapped potential and an exciting space where sophisticated hunches meet statistical nuts and bolts (academia!) and where a hacker’s skillset is as useful as more conventional database and computing systems experience. It sounds like a cross-disciplinary breeding ground for great ideas and new insights.

It’s especially the notion of uncovering insights hidden from traditional analytical tools or techniques through the use of Machine Learning or Big Data technology that appeals to me. It’s the discovery of patterns you weren’t even looking for or didn’t know you should consider. On any new project or similar scoping effort, I like to speculate about what the questions might be that I’m not asking because I don’t know I’m supposed to ask them. The Rumsfeldian unknown unknowns. Data Science doesn’t provide all those questions and answers. But it can help with identifying some predictive factors.

Granted, you only have to spend a few minutes mulling over the examples of predictive analytics, say in e-commerce, to realize that we’re quickly moving into an age where ethics around capturing, ingesting, and interpreting data is going to be as important as the development of new algorithms. But there is so much potential for these insights helping us with energy efficiency, science and biotech research, agriculture… you name it, that I am hopeful.

I have a couple of books on my bookshelf that are introductions to the field: “Data Science for Business” by Foster Provost and Tom Fawcett, and “Machine Learning” by Peter Flach. There is no doubt which of the two is better suited for bedtime reading – Provost & Fawcett, a largely non-technical introduction that focuses on business applications – and which one falls more into the category of textbook – Flach, which is substantially more infused with formulas and their derivations.

But I think both are good investments. What strikes me is that the material they cover is very similar – various types of analytical models, types of data classification, segmentation – supervised (when classification has a target) versus unsupervised (where clustering occurs but you may not know why).

Provost & Fawcett provide an overview of what’s possible with these techniques, especially for business. Flach focuses on how to do it. So what’s really the difference between the two? Is Data Science the discipline and Machine Learning the toolkit?

Since I am always big on definitions, I looked up both Data Science and Machine Learning.

“Data science, also known as data-driven science, is an interdisciplinary field about scientific methods, processes, and systems to extract knowledge or insights from data in various forms, either structured or unstructured, similar to data mining.” (https://en.wikipedia.org/wiki/Data_science)

This definition does not seem to do justice to the recent hype about the sexiness of data science. What’s missing is an emphasis on the technology that allows doing things that simply weren’t possible until very recently, even though they might be based in math and statistics.

But then I’m reading on wikipedia that the term “data science” is actually 30 years old. So maybe it’s the buzz factor that is distorting my expectations.

I see Data Science at the intersection of science/math, coding/automation, and an understanding of business or other domain needs. It’s multi-disciplinary and implies wearing many hats. There was a great analogy to the tasks involved in preparing a meal on this page:

https://www.quora.com/What-is-data-science-and-what-is-it-not

It’s a combination of using various skills and knowledge to extract as much knowledge (and meaning?) from data as possible. (There are actually various good contributions on that same page, including a curriculum of what to study if you want to be a data scientist. But I’ll save that for another post.)

So how about Machine Learning? Is it just part of data science, one of the things a Data Scientist does, one of the main things? My butchered quote from Wikipedia…

“Machine learning […] gives ‘computers the ability to learn without being explicitly programmed. (Arthur Samuel)’ …evolved from the study of pattern recognition and computational learning theory in artificial intelligence, […] explores the study and construction of algorithms that can learn from and make predictions on data […] algorithms overcome following strictly static program instructions by making data-driven predictions or decisions […] is employed in a range of computing tasks where designing and programming explicit algorithms with good performance is difficult or infeasible […] ; example applications include email filtering, detection of network intruders” (https://en.wikipedia.org/wiki/Machine_learning)

Long story short, I think this is a very exciting field simply because it’s so vast and so loosely defined in the disciplines it covers. It heavily leverages the newest technologies and is based in science without being too academic. It’s all about learning and experimenting; it’s fail fast/learn fast; it’s to statistics what hacking is to programming (ok, now I’m getting poetic and muddying up more definitions).

I would love to find an opportunity to be part of a data science team. So if you’ve got an opening for someone with Python/SQL skills and never-fading curiosity, let me know.