In their book Computer Age Statistical Inference, Brad Efron and Trevor Hastie give a nice description of neural networks and deep learning.
The knee-jerk response [to neural networks] from statisticians was “What’s the big deal? A neural network is just a nonlinear model, not too different from many other generalizations of linear models.”
While this may be true, neural networks brought a new energy to the field. They could be scaled up and generalized in a variety of ways … And most importantly, they were able to solve problems on a scale far exceeding what the statistics community was used to. This was part computing scale expertise, part liberated thinking and creativity on the part of this computer science community.
After enjoying considerable popularity for a number of years, neural networks were somewhat sidelined by new inventions in the mid 1990’s. … Neural networks were passé. But then they re-emerged with a vengeance after 2010 … the reincarnation now being called deep learning.
I would like to add a bit about the colourful and controversial past of Neural Networks, which makes them certainly not neutral.
The single-layer variety of neural networks had been around since the 50’s, but it had the limitation of not being able to “learn” linearly-inseparable problems, e.g. XOR. The multi-layer variety (i.e. with an added intermediate – hidden – layer) could solve this kind of problem, but, as alleged in Minsky and Papert’s 1969 book “Perceptrons”, there was no known method for “training” it, i.e. finding the optimal set of weights. This was despite the fact that the ideas leading to what we today call “backpropagation” (the algorithm for training feed-forward neural networks) were around even before Minsky’s critique.
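To make the XOR point concrete, here is a toy sketch (modern Python/numpy, purely illustrative, nothing to do with my thesis): a single hidden layer trained with plain backpropagation fits XOR, which no single-layer perceptron can do.

import numpy as np

# XOR: the classic linearly-inseparable problem
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# one hidden layer of 4 sigmoid units, one sigmoid output
W1 = rng.normal(size=(2, 4)); b1 = np.zeros((1, 4))
W2 = rng.normal(size=(4, 1)); b2 = np.zeros((1, 1))

lr = 0.5
for _ in range(20000):
    h = sigmoid(X @ W1 + b1)              # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)   # backpropagate the squared error
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(np.round(out.ravel(), 2))  # should approach [0, 1, 1, 0]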
His co-author Seymour Papert, in 1988, drew a parallel between their critique and the story of Snow White – he and Minsky being the hunters who slew the perceptron, and the reward being Lord DARPA’s coffers, the USA defence grants.
In the mid-80s “backpropagation” was formalised, and the development of computer hardware and language abstraction enabled the revival and widespread use of neural networks in many diverse applications, including a Japanese rice-cooker and the Terminator’s (2?) “I have a neural-net chip in my head” (delivered in a thick Teutonic accent).
Meanwhile, during neural networks’ absence (1970–1980), the pompous proponents of top-down, classical AI managed to single-handedly lead the field to a catastrophic, humiliating and financially disastrous (for Lord DARPA) collapse, ending with the burial of the “Fifth Generation Project”.
My PhD in the 90’s was about how single feed-forward neural networks could be combined into something bigger that would not suffer from the scaling problems of monolithic neural networks (yes, scaling up feed-forward neural networks creates a lot of problems). The result is called Feed-Forward NN Entities and comes with its own “backpropagation” learning scheme.
As a personal opinion, not lacking in pompousness I admit, allow me to say:
In the end, “top-down” versus “bottom-up” are THE two schools of thought that have produced the most dichotomies in the history of science, philosophy and politics. Neural networks are only one aspect of what we call “Connectionism”: “the whole is greater than the sum of its parts”.
The value of Connectionism goes far beyond whether NNs can do regression or “deep learn” or whatever.
Andreas Hadjiprocopis
Yeah but the chapter is still pretty short!
@Andreas
Thanks for your insight. Care to share your PhD link, papers, blog, website?
P.S. I hope that our kind host would not mind that.
@analytics
Sure!
My thesis is at:
http://nfkb.scienceontheweb.net/various/andreas_hadjiprocopis_phd_FFNNEntities_2000.pdf
Chapter 2 talks a bit about Connectionism and the history of Neural Networks including what I mentioned in my earlier comment.
Some other info is at
http://nfkb.scienceontheweb.net/various
including my email.
A link to a paper relevant to FFNN Entities is:
http://link.springer.com/chapter/10.1007/BFb0032493
bw
Andreas
In the late ’80s and early ’90s I developed software for measurement instrumentation based on “novel physics” sensors. Physicists would typically present our engineering team with a working proof-of-concept lab-bench system that we’d need to reduce to a manufacturable prototype.
One such lab sensor included a PC stuffed with neural network accelerator boards developed by another group in the company. To meet its goals, the final instrument needed to be portable (ideally hand-held) and battery operated. My first task was to eliminate the PC and its NN boards.
My employer was only a few miles away from the UC San Diego labs of Robert Hecht-Nielsen and Bart Kosko, so I started attending their classes and colloquia to learn about ways to convert trained NNs into algorithms that could be run in real-time on embedded DSPs or microprocessors. It turned out there was very little work being done in this area, since the trained NN was considered to be an end-goal in and of itself.
I used graph analysis to see if a trained NN could be converted to forms friendlier to embedded real-time processing. Basically, I needed to eliminate well over 99.9% of the computation being performed by the NN hardware while achieving comparable results.
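(For illustration only – this is not the graph-analysis approach I actually used – here is a present-day toy sketch in Python/numpy of the scale of reduction involved: keep only the largest-magnitude 0.1% of a layer’s connections and skip the rest. Whether accuracy survives such a cut is, of course, a separate question that depends entirely on the network.)

import numpy as np

def prune_by_magnitude(W, keep_fraction=0.001):
    """Zero out all but the largest-magnitude entries of a weight matrix."""
    flat = np.abs(W).ravel()
    k = max(1, int(flat.size * keep_fraction))
    threshold = np.partition(flat, -k)[-k]   # k-th largest magnitude
    mask = np.abs(W) >= threshold
    return W * mask, mask

# a toy stand-in for one trained layer: 512 inputs -> 256 units
rng = np.random.default_rng(1)
W = rng.normal(size=(512, 256))
W_pruned, mask = prune_by_magnitude(W, keep_fraction=0.001)
print(f"connections kept: {mask.sum()} of {mask.size}")  # roughly 0.1%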
It turned out that another very popular topic of that same period, Fuzzy Logic, had some interesting overlaps with trained NNs while being significantly simpler to compute. At the time, many viewed FL and NNs as competitors, instead of potential collaborators.
I started down the rabbit hole of trying to find an algorithm to convert the connections, weights and node characteristics of a trained NN to a functionally equivalent FL system. Initial efforts using ‘toy’ multi-layer NNs were encouraging, but the resources needed to generalize and expand the work to the scope needed for our instrument would have massively overrun its development schedule and budget, so the project was canceled before I could make further progress.
I moved on to other projects, and before long embedded processors had become fast enough to support more complex statistical analysis (e.g., FFT, PCA, SVD), and I was never able to revisit what I called “neural net unwinding”.
Do neural nets represent the “leading edge of ignorance” in signal processing? That is, are trained NNs being chased into new domains because more conventional (less resource-intensive) methods have evolved to push them out of earlier domains?
I wonder if our good host would mind my replying to Bob Cunningham’s comment on converting neural networks into simpler forms.
My DPhil back in the 1990s was on obtaining an explanation of how a neural network worked. It happens that an explanation suitable for humans is also an easy one to compute. So this might be of interest. A copy is available from http://www.corbettclark.com (the DPhil thesis is towards the bottom).
(incidentally, my supervisor Lionel Tarassenko was responsible for the neural network in the microwave oven mentioned in the article!)
Could it be that contemporary NN systems are far more computationally efficient than those of 30 years ago, and that conversion to other forms would not gain much performance?
I hope to soon have an opportunity to play with TensorFlow (under Python), and will see if I can get at least one data point in that area.
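If I get that far, the experiment would look something like the following rough sketch (assuming a recent TensorFlow/Keras install, with an untrained toy model standing in for a trained one): run the library’s forward pass, run the same computation “unwound” into plain matrix algebra, and compare the timings.

import time
import numpy as np
import tensorflow as tf

# a small dense network; untrained, but the arithmetic is the same
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])

x = np.random.rand(10000, 32).astype("float32")
_ = model.predict(x, batch_size=10000, verbose=0)   # warm-up / build

t0 = time.perf_counter()
y_tf = model.predict(x, batch_size=10000, verbose=0)
t1 = time.perf_counter()

# the same forward pass written out by hand from the layer weights
(W1, b1), (W2, b2), (W3, b3) = [l.get_weights() for l in model.layers]
h = np.maximum(x @ W1 + b1, 0.0)
h = np.maximum(h @ W2 + b2, 0.0)
y_np = h @ W3 + b3
t2 = time.perf_counter()

print("Keras predict:", round(t1 - t0, 4), "s")
print("plain numpy  :", round(t2 - t1, 4), "s")
print("max |diff|   :", float(np.max(np.abs(y_tf - y_np))))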