The Brain-Mind Institute (BMI) has been established to facilitate communication, education, and research on the science of our brains, including how each individual brain works and how groups of brains work together.
Historically, public acceptance of science was slow. For example, Charles Darwin waited about 20 years (from the 1830s to 1858) to publish his theory of evolution for fear of public reaction. About 20 years later (by the 1870s) the scientific community and much of the general public had accepted evolution as a fact. Of course, the debate on evolution still goes on today.
Is public acceptance of science faster in modern times? Not necessarily, even though we now have better and faster means of communication. The primary reason is the same, but much more severe: the remaining open scientific problems are more complex, and the knowledge they require goes beyond what a typical single person has.
For instance, network-like brain computation, i.e., connectionist computation (e.g., J. McClelland and D. Rumelhart, Parallel Distributed Processing, 1986), was long doubted and ignored by industry. Deep convolutional networks had appeared by 1980 at the latest (K. Fukushima). The max-pooling technique for deep convolutional networks was published by 1992 (J. Weng et al.). However, Apple, Baidu, Google, Microsoft, Samsung, and other major related companies did not show considerable interest until after 2012, a delay of about 20 years. The two techniques above are not very difficult to understand. However, these two suddenly hot techniques have already been rendered obsolete by the discovery of more fundamental and effective principles of the brain, six of which are intuitively explained below.
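The two techniques named above, convolution and max-pooling, fit in a few lines of NumPy. This is a minimal 1-D sketch under illustrative assumptions (a 3-tap kernel, non-overlapping pooling windows of width 4, and "valid" cross-correlation, which is what CNN convolution layers actually compute). It is not the 1980 or 1992 formulation. It also shows a property that matters for this article's argument: shifting the input pattern shifts the response map by the same amount, and after max-pooling the strongest pooled unit still reflects where the pattern was.

```python
import numpy as np

def conv1d(signal, kernel):
    # 'valid' cross-correlation, which is what CNN convolution layers compute
    return np.correlate(signal, kernel, mode="valid")

def max_pool1d(x, size):
    # Non-overlapping max-pooling; trailing samples that do not fill a window are dropped
    n = len(x) // size
    return x[: n * size].reshape(n, size).max(axis=1)

def pattern_at(pos, length=16):
    # A small "feature" [1, 2, 1] placed at index pos of an otherwise empty signal
    s = np.zeros(length)
    s[pos : pos + 3] = [1.0, 2.0, 1.0]
    return s

kernel = np.array([1.0, 2.0, 1.0])
for pos in (2, 10):
    response = conv1d(pattern_at(pos), kernel)
    pooled = max_pool1d(response, 4)
    print(pos, int(np.argmax(response)), int(np.argmax(pooled)))
```

This prints `2 2 0` and then `10 10 2`. The peak of the response map moves with the pattern, and the pooled peak moves from the first pooling window to the third, so which unit responds still encodes location.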
Industrial and academic interest has focused on a combination of two things: easily understandable tests (e.g., G. Hinton et al. NIPS 2012, congratulations!) and the involvement of major companies (e.g., Google, thanks!). We have read statements like “our results can be improved simply by waiting for faster GPUs and bigger datasets to become available” (G. Hinton et al. NIPS 2012). However, the newly known brain principles tell us that the way such tests are conducted (e.g., ImageNet) will yield only vanishing gains that do not lead to a human-like zero error rate, regardless of how long Moore’s Law continues and how many more static images are added to the training set. Why? All such tests use static images in which objects mix with the background. Such tests therefore keep participating groups from seriously considering autonomous object segmentation (segmentation free of handcrafted object models). Through synapse maintenance (Y. Wang et al. ICBM 2012), neurons in a human brain automatically cut off inputs from background pixels when those pixels match badly compared with the attended object pixels. Our babies spend much more time in the dynamic physical world than looking at static photos.
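The synapse-maintenance idea can be illustrated with a toy sketch. It is not the published algorithm of Y. Wang et al.; the running average of the per-synapse mismatch |x_i - w_i|, the rate, and the cut-off rule "keep a synapse only if its mismatch is at most keep_ratio times the neuron's average mismatch" are all assumptions made for illustration. In the example, the four "object" pixels always match the neuron's weights, while the four "background" pixels flicker between 0 and 1 from frame to frame.

```python
import numpy as np

def maintain_synapses(weights, mismatch, x, rate=0.1, keep_ratio=1.5):
    """Toy synapse maintenance for one neuron (illustrative assumption only).

    weights  -- the neuron's synaptic weights, one per input pixel
    mismatch -- running average of |x_i - w_i| for each synapse
    x        -- current input pixels
    Returns the updated mismatch and a boolean mask of synapses to keep.
    """
    mismatch = (1.0 - rate) * mismatch + rate * np.abs(x - weights)
    # Synapses whose inputs match much worse than average are cut off.
    keep = mismatch <= keep_ratio * mismatch.mean()
    return mismatch, keep

# Weights for an attended object: 4 object pixels (1.0), 4 background pixels (0.5).
weights = np.array([1.0] * 4 + [0.5] * 4)
mismatch = np.zeros_like(weights)
for t in range(50):
    background = np.full(4, float(t % 2))        # background flickers 0, 1, 0, 1, ...
    x = np.concatenate([np.ones(4), background])  # object pixels stay constant
    mismatch, keep = maintain_synapses(weights, mismatch, x)

print(keep)  # object synapses kept, background synapses cut
```

Because the object pixels never mismatch while the background pixels always do, the background synapses end up far above the average mismatch and are trimmed. No object model is handcrafted; the cut comes only from how well each input matches.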
Our industry should learn more powerful brain mechanisms that go beyond conventional, well-known, well-tested techniques. The following are some examples:
(1) Deep Learning Networks (e.g., J. Weng et al. IJCNN 1992, Y. LeCun et al. Proceedings of IEEE 1998, G. Hinton et al. NIPS 2012) are not only biologically implausible but also functionally weak. The brain uses a rich network of processing areas (e.g., Felleman & Van Essen, Cerebral Cortex 1991) in which connections are almost always two-way, not a cascade of modules like a Deep Learning Network. Such a Deep Learning Network cannot perform top-down attention in a cluttered scene (e.g., attention to location or type in J. Weng, Natural and Artificial Intelligence, 2012, or attention to more complex object shape as reported in L. B. Smith et al. Developmental Science 2005).
(2) Convolution (e.g., J. Weng et al. IJCNN 1992, Y. LeCun et al. Proceedings of IEEE 1998, G. Hinton et al. NIPS 2012) is not only biologically implausible but also computationally weak. Why? All feature neurons in the brain carry not only sensory information but also motor information (e.g., Felleman & Van Essen, Cerebral Cortex 1991), so that later-processing neurons become less concrete and more abstract, which is impossible to accomplish using shift-invariant convolution. That is, convolution is always location-concrete (even with max-pooling) and never location-abstract.
(3) Error back-propagation in neural networks (e.g., Y. LeCun et al. Proceedings of IEEE 1998, G. Hinton et al. NIPS 2012) is not only biologically implausible (e.g., no explicit error signal is supplied to a baby’s motor outputs) but also damaging to long-term memory, because it lacks the match-based competition for error causality found in SOM, LISSOM, and LCA (an optimal SOM). Even though the gradient vector identifies a neuron that can reduce the current error, the current error is none of that neuron’s business, and the neuron must keep its own long-term memory unchanged. That is why error back-propagation is well known to be bad for incremental learning and why it requires research assistants to try many guesses of initial weights (i.e., to use the test set as the training set!). Let us not be blinded by artificially low error rates.
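The contrast between match-based competition and gradient-driven updates can be sketched with a top-1 competitive layer. This is a deliberately simplified stand-in, not the SOM, LISSOM, or LCA algorithms themselves: the class name `TopOneLayer`, the cosine (normalized inner product) match, and the 1/n incremental-average learning rate are assumptions. What it shows is that only the neuron whose weights best match the input is updated, so the long-term memory stored in every other neuron stays untouched, whereas back-propagation adjusts every weight that has a nonzero gradient.

```python
import numpy as np

class TopOneLayer:
    """Hypothetical top-1 competitive layer: only the best-matching neuron learns."""

    def __init__(self, weights):
        self.w = np.array(weights, dtype=float)   # one row of weights per neuron
        self.age = np.zeros(len(self.w))          # how many times each neuron has won

    def learn(self, x):
        x = np.asarray(x, dtype=float)
        x = x / np.linalg.norm(x)
        # Match: normalized inner product between the input and each neuron's weights
        match = (self.w / np.linalg.norm(self.w, axis=1, keepdims=True)) @ x
        winner = int(np.argmax(match))
        # Only the winner updates, as an incremental average of the inputs it has won
        self.age[winner] += 1
        rate = 1.0 / self.age[winner]
        self.w[winner] = (1.0 - rate) * self.w[winner] + rate * x
        return winner

layer = TopOneLayer([[1.0, 0.0], [0.0, 1.0]])
print(layer.learn([1.0, 0.1]))   # neuron 0 wins and learns
print(layer.w[1])                # neuron 1's memory is unchanged
```

The competition decides which neuron is responsible for the current input before any learning happens; a neuron that loses the competition is simply not modified.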
Do our industry and public need another 20 years?
On the other hand, neuroscience and neuropsychology have made many advances by providing experimental data (e.g., Felleman & Van Essen, Cerebral Cortex 1991). However, it is well recognized that these disciplines are data-rich and theory-poor. The phenomena of brain circuits and brain behavior are extremely rich. Many researchers in these areas use only local tools (e.g., attractors, whose dynamics can only settle into local extrema) and consequently have been overwhelmed by the richness of brain phenomena. A fundamental reason is that they lack the guidance of the global automata theory of computer science, although earlier automata do not emerge. For example, X.-J. Wang et al. Nature 2013 stated correctly that neurons of mixed selectivity have been widely observed but rarely analyzed. However, mixed selectivity has already been explained, as a special case, by the new Emergent Turing Machine in Developmental Networks, in a theoretically complete way. The traditional Universal Turing Machine is a theoretical model of how modern computers work, but it does not emerge. The mixed selectivity of neurons in this new kind of Turing Machine is caused by emergent and beautiful brain circuits, yet each neuron still uses a simple inner-product similarity in its high-dimensional and dynamic input space.
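How a neuron with a plain inner-product match can show mixed selectivity is easy to sketch. Assume, hypothetically, that a neuron's input is the concatenation of a normalized bottom-up vector (e.g., which place is sensed) and a normalized top-down vector (e.g., which goal is active), and that its pre-response is the inner product with an equally weighted weight vector. The equal weighting, the one-hot codes, and the function names are assumptions for illustration, not the Developmental Network's exact area function.

```python
import numpy as np

def unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def pre_response(w_bottom, w_top, bottom_up, top_down):
    """Inner-product match of one hypothetical neuron over [bottom-up, top-down] input."""
    w = np.concatenate([unit(w_bottom), unit(w_top)])
    x = np.concatenate([unit(bottom_up), unit(top_down)])
    # Both vectors have norm sqrt(2), so dividing by 2 gives their cosine, in [-1, 1]
    return float(w @ x) / 2.0

place_a, place_b = [1.0, 0.0], [0.0, 1.0]
goal_g, goal_h = [1.0, 0.0], [0.0, 1.0]

# A neuron tuned to "place A while pursuing goal G"
for place, goal, label in [(place_a, goal_g, "A,G"), (place_a, goal_h, "A,H"),
                           (place_b, goal_g, "B,G"), (place_b, goal_h, "B,H")]:
    print(label, pre_response(place_a, goal_g, place, goal))
```

The neuron responds fully (1.0) only to the combination (A, G), half as much (0.5) when only the place or only the goal matches, and not at all (0.0) when neither matches. Its firing therefore cannot be attributed to the bottom-up input alone, even though the computation is a single inner product.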
In October 2011, a highly respected multidisciplinary professor kindly wrote: “I tell these students that they can work on brains and do good science, or work on robots and do good engineering. But if they try to do both at once, the result will be neither good science nor good engineering.” How long will it take for industry and the public to accept that this pessimistic view of the brain was no longer true even then?
The brain principles that have already been discovered could bring fundamental changes in the way humans live, the way countries and societies are organized, our industry, our economy, and the way humans treat one another.
The known brain principles tell us that the brain of anybody, regardless of education and experience, is fundamentally shortsighted in both space and time. Prof. Jonathan Haidt documented such shortsightedness well in his book “The Righteous Mind: Why Good People Are Divided by Politics and Religion”, although not in terms of brain computation.
In terms of brain computation, the circuits in your brain wire themselves beautifully and precisely according to your real-time experience (the genome only regulates this process), and the various invariance properties required for abstraction also largely depend on experience. Serotonin (triggered by, e.g., threats), dopamine (triggered by, e.g., praise), and other neurotransmitters quickly bias these circuits so that neurons representing longer-term thoughts lose the competition to fire. Furthermore, such bias has a long-term effect. Therefore, you make long-term mistakes but still feel that you are right. Everybody is like that. Depending on experience, the subject matter in which a person is shortsighted varies.
Traditionally, many domain experts think that computers and brains use very different principles. The naturally emerging Turing Machine in Developmental Networks, which has been mathematically proved (see J. Weng, Brain as an Emergent Finite Automaton: A Theory and Three Theorems, IJIS, 2015), should change this intuition.
The new results propose the following six brain principles:
- The developmental program (genome-like, task-nonspecific) regulates the development (i.e., lifetime learning) of a task-nonspecific “brain-like” network: the Developmental Network. The Developmental Network is general-purpose: in principle, it can learn any task its body is capable of, not only pattern recognition.
- The brain’s images are naturally sensed images of cluttered scenes where many objects mix. In typical machine training (e.g., G. Hinton et al. NIPS 2012), each training image has a bounding box drawn around each object to be learned, which is not the case for a human baby. Neurons in the Developmental Network automatically learn object segmentation through synapse maintenance.
- The brain’s muscles have multiple subareas, each of which represents either declarative knowledge (e.g., abstract concepts such as location, type, and scale) or non-declarative knowledge (e.g., driving a car or riding a bicycle). These are not just discrete class labels as in global classification.
- Each brain in the physical world is at least a Super Turing Machine in a Developmental Network. Every area in the network emerges (it does not statically exist; see M. Sur et al. Nature 2000 and P. Voss, Frontiers in Psychology 2013) using a unified area function whose feature development is nonlinear but, contrary to engineering intuition, free of local minima. It uses neither convolution nor error back-propagation.
- The brain’s Developmental Network learns incrementally, taking one pair of sensory and motor patterns at a time to update the “brain” and discarding the pair immediately afterward. After all, a real brain has only one pair of stereoscopic retinas, which cannot store more than one pair of images. Batch learning (i.e., learning before testing) is not scalable: without making a mistake in an early test, a student cannot learn how to correct that mistake later.
- The brain’s Developmental Network is always optimal: each network update in real time computes the maximum likelihood estimate of the “brain”, conditioned on the limited computational resources and the limited learning experience in its “life” so far. One should not use the test set as a training set, i.e., try many networks on the test set and report only the best one.
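The incremental, sample-discarding style of learning in the last two principles can be illustrated with the simplest case we know of: the running mean. The sample mean is the maximum likelihood estimate of the mean of a Gaussian, and it can be updated one observation at a time without storing past observations, so after every update it equals the batch estimate over all data seen so far. This is only an analogy chosen for illustration; the Developmental Network's actual update of neuron weights under limited resources is not reproduced here, and the class name `RunningMean` is hypothetical.

```python
class RunningMean:
    """Hypothetical example: a running mean that never stores past samples."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim

    def update(self, x):
        # Learn from one sample; the caller may discard it afterward.
        self.n += 1
        rate = 1.0 / self.n
        self.mean = [(1.0 - rate) * m + rate * xi for m, xi in zip(self.mean, x)]
        return self.mean

stream = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
est = RunningMean(dim=2)
for sample in stream:          # each sample is seen only once
    est.update(sample)
print(est.mean)                # equals the batch mean [3.0, 5.0] up to rounding
```

The rate 1/n is what makes each update exact rather than approximate: the new estimate gives the latest sample exactly its 1/n share of the total.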
The logical completeness of a brain is partially (not fully) understood through a Universal Turing Machine in a Developmental Network. This emergent automaton brain model proposes that each brain is an automaton, but one very different from all traditional symbolic automata because it programs itself: it emerges. No traditional Turing Machine can program itself, but a brain Turing Machine does.
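What “learning an automaton from experience” means can be made concrete with a symbolic toy. The sketch is not the Developmental Network, which represents states with neurons rather than a lookup table. It only shows a finite automaton whose transition function is acquired one observed transition at a time instead of being programmed by hand. The class name `ExperiencedAutomaton`, the parity task, and the rule that an unseen (state, input) pair leaves the state unchanged are all assumptions.

```python
class ExperiencedAutomaton:
    """Hypothetical finite automaton whose transitions are learned, not hand-coded."""

    def __init__(self, start):
        self.start = start
        self.table = {}                     # (state, symbol) -> next state

    def observe(self, state, symbol, next_state):
        # One experienced transition is recorded; nothing else is stored.
        self.table[(state, symbol)] = next_state

    def run(self, symbols):
        state = self.start
        for s in symbols:
            # Assumption: an unseen (state, symbol) pair keeps the current state.
            state = self.table.get((state, s), state)
        return state

fa = ExperiencedAutomaton(start="even")
for state, symbol, nxt in [("even", "1", "odd"), ("odd", "1", "even"),
                           ("even", "0", "even"), ("odd", "0", "odd")]:
    fa.observe(state, symbol, nxt)
print(fa.run("1101"))   # prints "odd": the string contains three 1s
```

After four observed transitions, the automaton tracks the parity of any bit string, although nobody wrote a parity program for it.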
The automaton brain model predicted that brain circuits dynamically and precisely record the statistics of experience, roughly consistent with neural anatomy (e.g., Felleman & Van Essen, Cerebral Cortex, 1991). In particular, the model predicted that “shifting attention between ‘humans’ and ‘vehicles’ dramatically changes brain representation of all categories” (J. Gallant et al. Nature Neuroscience, 2013) and that human attention “can regulate the activity of their neurons in the medial temporal lobe” (C. Koch et al. Nature, 2010). The work on “place” cells honored by the 2014 Nobel Prize in Physiology or Medicine implies that neurons encode exclusively bottom-up information (place). The automaton brain model challenges this view: neurons represent a combination of bottom-up information (e.g., place) and top-down context (e.g., goal), as reported by Koch et al. and Gallant et al.
Unfortunately, the automaton brain model implies that no neuroscientist or neural network researcher can understand the brain they study without rigorous training in automata theory. For example, traditional models of nervous systems and neural networks focus on pattern recognition and lack the capabilities of a grounded symbol system (e.g., “rulefully combining and recombining,” Stevan Harnad, Physica D, 1990). Automata theory deals with such capabilities. Does this new knowledge stun our students and researchers, or guide them so that their time is better spent?
The Brain-Mind Institute aims to prepare everybody for the upcoming new brain era, so that no one falls behind, regardless of where in the world they are located.
(Updated March 24, 2015)