For a long time some of us, myself included, have been looking for a "real" artificial intelligence (an intelligent program? You name it).
One that is able to learn.
So how do we learn?
As mentioned in a previous post, we can consider ourselves as a bundle of:
- sensors operating since our birth and even before that (eyes, ears, skin, ...)
- input interpretation services (brain)
- restitution services (memory)
- production services (voice, body)
- creation services (brain)
When we receive useful input we can do something with it. Otherwise the only thing we can do is... well, store it and see whether it becomes useful at some point (or throw it away immediately).
I think sensor input is useful in two situations.
In the first situation the input looks like something perceived before. Not necessarily identical, but similar. So we can associate the new input with something in our memory, and thus with everything related to it.
Is that learning? I think so, because the variations in the input improve the related "pattern recognizer" and enable generalization.
In a second situation, to make an unrecognized input useful, it should be associated with some context: an input from another kind of sensor. Something you see and the word for it (read or heard) or how to use it. Learning foreign languages works best that way.
Association seems to me the most important word here. If input is not associated with something it is just... isolated (prfest) and not useful (yet).
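To make the two situations a bit more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the `Memory` class, its `perceive` method and the `cutoff` value are hypothetical names and settings, and plain string similarity from the standard `difflib` module stands in for a real "pattern recognizer".

```python
from difflib import get_close_matches


class Memory:
    """Toy associative memory (hypothetical, for illustration only)."""

    def __init__(self, cutoff=0.7):
        self.cutoff = cutoff        # assumed similarity threshold, 0..1
        self.associations = {}      # known input -> set of associated items

    def perceive(self, item, context=None):
        """Return the remembered input this one was attached to, or None."""
        known = get_close_matches(
            item, list(self.associations), n=1, cutoff=self.cutoff
        )
        if known:
            # Situation 1: similar to something perceived before.
            # The variation itself is kept, which widens the "pattern".
            self.associations[known[0]].add(item)
            if context is not None:
                self.associations[known[0]].add(context)
            return known[0]
        # Situation 2: unrecognized input, stored together with whatever
        # context another sensor provided (or isolated if there is none).
        self.associations[item] = set() if context is None else {context}
        return None


memory = Memory()
memory.perceive("apple", context="image:red-fruit")  # new, stored with context
print(memory.perceive("aple"))                        # similar to "apple"
print(memory.associations["apple"])
```

An input without a match and without context ends up stored with an empty set of associations: isolated, and not useful yet.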
I know there is more to it, but this is enough to put you on track for the rest of this post.
In our digital world all we have are bits and pieces called bytes.
And I like to look at them as a flow.
A written text is nothing more than a flow of bytes, characters, words, sentences and paragraphs, depending on the scale at which you look at it.
Do you need an explanation for a sound track?
Even a still image can be seen as a flow of pixels one line after another.
The inputs in a stream are associated in a temporal way (one after another) and/or in a spatial way (one line of an image above the other). A flow of text? Time or space? Hard to say, but does it matter?
In a video with subtitles you have all of it together, of course. Except smell, at least for now.
Now imagine a flexible, adaptive pattern analyzer/recognizer on one of these streams. Breaking it into chunks that are similar to previously seen chunks. Or breaking it into chunks because the stream itself has patterns that return once in a while. Would you be able to make one for a flow of text?
Adaptive? Absolutely necessary. If your premises are wrong when configuring such a pattern analyzer, it will fail at some point. If its mission is "find large patterns" and its toolbox is good enough to try hard (e.g. change the scale of the scrutinizing window), it might still fail (if there is no pattern), but at least not because it has the wrong premises. (Large means: go beyond the 0's and 1's, that scale is not very useful.)
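As a small illustration of "the same flow at different scales", here is a Python sketch. The function names `views` and `image_as_flow` are made up for this post, and the sentence splitting is a deliberately naive assumption (split after `.`, `!` or `?`).

```python
import re


def views(text):
    """Return the same text as a flow at several scales."""
    return {
        "bytes": list(text.encode("utf-8")),
        "characters": list(text),
        "words": text.split(),
        # Naive assumption: a sentence ends at ., ! or ? followed by a space.
        "sentences": [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s],
    }


def image_as_flow(rows):
    """Flatten a still image (a list of pixel rows) into one flow, line by line."""
    return [pixel for row in rows for pixel in row]


for scale, flow in views("Hi there. Bye!").items():
    print(scale, flow)
print(image_as_flow([[1, 2], [3, 4]]))
```

In the flattened image the spatial association (one line above the other) becomes a positional one in the flow, which is why the time-or-space question may not matter much.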
Oh, one important instruction: never look back.
We will probably need only a small set of instructions to make a pattern analyzer (its toolbox). One that is also able to differentiate the various kinds of streams (text, sound, image, ...), though hints might help of course.
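How small could such a toolbox be for a flow of text? As a rough sketch (not the analyzer I have in mind, just a stand-in), the Python below borrows the dictionary-building idea of LZ78 compression: it reads the stream once, never looks back, and cuts a chunk as soon as the current piece is no longer something seen before. The function name `chunk_stream` is hypothetical.

```python
def chunk_stream(stream):
    """Cut a stream into chunks in a single pass (LZ78-style parsing).

    Each emitted chunk is a previously seen chunk extended by one new
    symbol, so chunks grow as patterns in the stream keep returning.
    """
    known = set()      # chunks seen so far
    current = ""       # chunk being built
    for symbol in stream:
        current += symbol
        if current not in known:
            known.add(current)
            yield current
            current = ""
    if current:
        # Leftover at the end of the stream: an already known chunk.
        yield current


print(list(chunk_stream("abababab")))
# → ['a', 'b', 'ab', 'aba', 'b']
```

It has no adjustable window and no notion of "large", so it only covers the "similar to previously seen chunks" half of the idea, but it shows that one pass and a growing memory are enough to get started.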
You can call these chunks neurons if you like.
They are associated with other chunks in the stream (positional dependence) and hopefully with "old" chunks seen before. And with chunks from other kinds of parallel streams (e.g. a spoken and a written word, or a written word and an image).
Each time associated chunks in a stream (or across parallel streams) find an echo in the old chunks, the link between these chunks is reinforced. We can imagine a decay function that weakens links over time and eliminates them (forgetting) if they are not reactivated. This will remove the childhood errors in the chunking of the pattern analyzer.
A pattern analyzer, preferably the same code but with different (learned) settings, can be used to discover patterns in associated chunks. Leading to higher level concepts. (Note the shift from chunk to concept.)
What's different compared to current machine learning techniques?
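Reinforcement and forgetting can be sketched in a few lines of Python. Everything here is an assumption for the sake of illustration: the `LinkStore` name, the additive `gain`, the exponential `decay` per time step and the `floor` below which a link is forgotten.

```python
class LinkStore:
    """Toy store of weighted links between chunks (hypothetical)."""

    def __init__(self, gain=1.0, decay=0.5, floor=0.2):
        self.gain = gain      # added to a link each time it is reactivated
        self.decay = decay    # factor applied to every link per time step
        self.floor = floor    # links weaker than this are forgotten
        self.weights = {}     # (chunk_a, chunk_b) -> link strength

    def reinforce(self, a, b):
        """Strengthen the (undirected) link between two chunks."""
        key = tuple(sorted((a, b)))
        self.weights[key] = self.weights.get(key, 0.0) + self.gain

    def tick(self):
        """Let one time step pass: weaken all links, drop the weak ones."""
        self.weights = {
            key: weight * self.decay
            for key, weight in self.weights.items()
            if weight * self.decay >= self.floor
        }


links = LinkStore()
links.reinforce("cat", "image:cat")   # seen together twice
links.reinforce("image:cat", "cat")
links.reinforce("cat", "dog")         # seen together once, an early mistake?
for _ in range(3):
    links.tick()
print(links.weights)
```

With these settings the link that was reactivated survives three time steps, while the one-off link drops below the floor and disappears.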
First we could ask if this is supervised or unsupervised learning. Supervised machine learning is applied when you have a labelled data set and the program finds the optimal parameters to match the data with the provided labels (optical character recognition, OCR, is a good example).
You could look at two parallel streams as one being the data and the other the labels. Only here you do not look for parameters to match both. They are matched (I should say associated) by definition.
So it doesn't really look like supervised learning.
And unsupervised? With this kind of machine learning you have (preferably a large amount of) data of a single kind (like in this Google experiment) and try to find the higher level concepts in it. One of the essential differences here is that we look at different kinds of things (written and spoken words).
Doesn't feel like a good match with unsupervised learning either.
Second, this is lifelong learning (and appropriate forgetting) from streams of data. Not just training on a fixed set and then applying the result to other data. Google's self-driving car does the same thing, but not all do.
I have the feeling (and not just a dream) that this is the right track to follow.
And you?