
Cognition is the acquisition of knowledge by the mechanical process of information flow in a system. In animal cognition, input is received through the sensory modalities, and output may occur as a motor or other response. The sensory information is internally transformed into a set of representations, which forms the basis for cognitive processing. This contrasts with the traditional definition of cognition, which is based on mental processes and originates in metaphysical philosophy.

  • animal cognition
  • cognitive processes
  • physical processes
  • mental processes

1. Definition of Cognition

1.1. A Scientific Definition of Cognition

Dictionaries commonly define cognition as a mental process for acquiring knowledge. However, this view originates in the assignment of mental processes to the act of thought. These mental processes derive from a metaphysical description of cognition that includes the concepts of consciousness and intentionality[1][2]. This view also assumes that objects in nature are reflections of a true and determined form.

Instead, a material description of cognition is restricted to the physical processes available in nature. An example comes from a study of primate face recognition in which measurements of facial features serve as the basis for object recognition[3]. This perspective also excludes the concept of innate and prior knowledge of objects: cognition instead forms a representation of objects from their constituent parts[4][5], and the physical processes of cognition are not inherently and functionally deterministic.

1.2. Mechanical Perspective of Cognition

Scientific work generally accepts a mechanical description of information processing in the brain. However, a perspective based on the duality of physical and mental processing is retained in various academic disciplines. For example, there is a conjecture about the relationship between the human mind and a simulation of it[6], an idea based on assumptions about intentionality and the act of thought. Instead, a physical process of cognition is defined by the generation of action by neuronal cells, without dependence on non-material processes[7].

The physical limits on cognition are also observable in the intention to move a limb, such as a person reaching for an object on a table. Studies have replaced the assignment of intentionality with a material interpretation of this action, showing that the relevant neural activity occurs before perceptual awareness of the motor action[8].

Across the natural sciences, the neural system is studied at various biological scales, from the molecular level up to the higher level of information processing and synthesis[9][10]. At this higher-order perspective, neural systems are functionally analogous to the deep learning models of computer science[11][12], which allows a comparative approach for understanding cognitive processes. At the lower scale, however, the artificial neural system depends on an abstract model of neurons and the network, so at this scale the animal neural system is not likely comparable.

1.3. Scope of this Definition

The definition of cognition as used here is restricted to a set of mechanical processes. Cognitive processing is described from a broad perspective, with examples from the visual system and from the deep learning approaches of computer science.

2. Visual Cognition

2.1. Evolution and Probabilistic Processes

The visual cortical system occupies about one-half of the cerebral cortex. Along with language processing in humans, vision is a major source of sensory input and recognition of the outside world. The complexity of the sensory systems reveals an important aspect of the evolutionary process, as observed across cellular organisms and their countless forms and novelties. Evolution depends on physical processes, such as mutation and exponential population growth, along with geological time scales for building the biological complexity observed at all scales of life. These effects have formed and shaped the biosphere of the Earth.

Life's vast complexity is seen in the deconstruction of the camera eye. This novel form emerged over time from a simpler organ, such as an eye spot, through a sequence of adaptive changes[13][14]. These rare and unique events did not prevent the independent formation of the camera eye, which occurs in the lineages of both vertebrates and cephalopods. This is an example of evolution as a powerful generator of change in physical traits, although counterforces restrict the generation of an infinite number of varieties, including the finiteness of the genetic code and the constraints of physical laws on traits at all biological scales.

The evolution of cognition and neural systems is expected to occur by a probabilistic process similar to that theorized for the origin and design of the camera eye. Therefore, the neural systems are expected to show an efficiency of design, as in other biological systems, especially since the neural system coevolves and continually adapts for operation with the sensory organs. However, this efficiency is also constrained by the limits of molecular and cellular processes.

2.2. Abstract Encoding of Sensory Input

"The biologically plausible proximate mechanism of cognition originates from the high-dimensional information from the outside world. In the case of vision, the sensory data consist of reflected light rays that are absorbed across the 2-dimensional surface, the retinal cells, of the eye. These light rays may range across the electromagnetic spectra, but the retinal cells are specific to a small subset of these light rays"[1].

Figure 1 shows the above view, in abstract form, as a sheet of neuronal cells that receive sensory input from the outside world. The input is processed by cell surface receptors and communicated downstream for processing by the neural system. The sensory neurons and their receptors can be imagined as a set of activation values changing over time, abstractly described as a dynamical system.


Figure 1. An abstract representation of information that is received by a sensory organ, such as the light rays absorbed by neuronal cells across the retinal surface of the camera eye[1].

The information processing at the sensory organs is tractable to study, but the downstream cognitive processes are less understood at a proximate level. The cognitive processes include the generalization of knowledge, also referred to as transfer learning, a higher level of organization constructed from the sensory input[5][15][16]. Transfer learning depends on segmentation of the sensory world and identification of sensory objects with robustness to changes in viewpoint, or perspective (Figure 2)[17]. In computer science, there is a model[4] designed for segmentation and robust recognition of objects. This approach includes sampling the sensory input and the parts of sensory objects, and encoding the information in an abstract form for presentation to downstream neural processes. The encoding scheme is expected to include a set of discrete representational levels of unlabeled objects, and then to employ a probabilistic approach for matching these representations to known objects in memory. Without the potential for a labeled memory that describes an object, there is no opportunity for knowledge of the object, or a basis for knowledge in general.
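The probabilistic matching step described above can be sketched in a few lines. This is an illustrative toy, not the model from the cited work: the part-based encodings and the labels are invented for the example, and a softmax over distances stands in for whatever matching process the neural system actually uses.

```python
import math

# Hypothetical part-based encodings: each remembered object is a small
# vector of feature activations (values are illustrative only).
memory = {
    "face":  [0.9, 0.1, 0.8, 0.2],
    "hand":  [0.2, 0.9, 0.1, 0.7],
    "digit": [0.1, 0.2, 0.9, 0.9],
}

def match(observation, memory):
    """Match an unlabeled encoding to labeled memories with a softmax
    over negative squared distances: a probabilistic lookup rather than
    a deterministic one."""
    scores = {}
    for label, proto in memory.items():
        dist = sum((o - p) ** 2 for o, p in zip(observation, proto))
        scores[label] = math.exp(-dist)
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}

probs = match([0.85, 0.15, 0.75, 0.25], memory)
best = max(probs, key=probs.get)
```

The output is a distribution over labels rather than a single hard answer, which matches the text's point that recognition is probabilistic matching against memory.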


Figure 2. The first panel is a drawing of the digit nine (9) while the next panel is the same digit as transformed by rotation[1].

3. General Cognition

3.1. Algorithmic Description

Experts have investigated the question of whether there is an algorithm that explains brain computation[18]. They concluded that this is an unsolved problem, even though natural processes are inherently representable by a quantitative model. Information flow in the brain is the product of a non-linear dynamical system, a complex phenomenon analogous to the physics of fluid flow, and its complexity may exceed the limits of computational work. Such systems are highly complex and not easily mirrored by simple mathematical descriptions[18][19], so an empirical approach is recommended for disentangling them.

An artificial neural system, such as a deep learning architecture, has strong potential for testing the elements of natural cognition. The reason is that engineered systems are built from parts and relationships that are known, whereas in nature the origin and history of the system are obscured by time and a large number of events, and scientific knowledge requires extensive experimentation that is often confounded by error from both known and unknown sources.

3.2. Encoding of Knowledge

It is possible to hypothesize about a model of object representation in the brain, and its artificial analog from deep learning. First, these cognitive systems are expected to encode objects by their parts, the basic elements of an object[3][4][5]. Second, it is expected that the process is stochastic as in other natural processes.

The neural network system is a programmable system[20], encoded with weight values along the connections in the network, and with activation values at the nodes. It is expected that the brain functions analogously at the level of information processing, since these systems are both based on non-linear dynamic principles of a network and the distributed representations of objects[5][18][21][22]. The encoding schemes are likely abstract and generated by probabilistic processes, and therefore, it follows that the coding is not generated by top-down deterministic processes.
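The idea of a programmable network, with weight values along the connections and activation values at the nodes, can be made concrete with a minimal sketch. The weights below are arbitrary, chosen only for illustration:

```python
import math

# A minimal two-layer network: weights along connections, activations
# at nodes (all values are illustrative only).
W1 = [[0.5, -0.3], [0.8, 0.2]]   # input -> hidden connection weights
W2 = [0.7, -0.4]                  # hidden -> output connection weights

def forward(x):
    # The tanh non-linearity at each hidden node is what gives the
    # network its non-linear dynamics; representations are distributed
    # across all nodes rather than stored in any single one.
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)))
              for row in W1]
    return sum(w * h for w, h in zip(W2, hidden))

y = forward([1.0, 0.5])
```

Changing any single weight changes the whole input-output mapping, which is the sense in which the network is "programmed" by its weights.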

Moreover, a physical interpretation of cognition requires the matching of patterns for generalization of knowledge. This is consistent with a view of cognition as a statistical machine with a reliance on sampling for robust information processing. With advancements in deep learning methods, such as the invention of the transformer architecture[5][23], it is possible to sample and search for exceedingly complex patterns in a sequence of information, including in the case of object detection in a visual scene[24].
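As an illustration of this pattern-sampling view, a single attention step, the core operation of the transformer, can be written in plain Python. Learned projections and multiple heads are omitted, so this is a sketch of the mechanism rather than a full transformer layer:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(seq):
    """Single-head attention with identity projections: each position
    re-weights the whole sequence by similarity, which is how the
    transformer searches for patterns anywhere in the input."""
    out = []
    d = len(seq[0])
    for q in seq:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, seq))
                    for j in range(d)])
    return out

seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
attended = self_attention(seq)
```

In the toy sequence, position 0 attends most strongly to the positions that resemble it, so its output is pulled toward those vectors, a miniature version of pattern matching across the whole sequence.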

This sampling of the world depends on sensory modalities, such as vision and speech, which provide the information for processing and for robust formation of the internal representations[25].

3.3. Representation of Concepts

Microsoft Research released a deep learning method, based on the transformer architecture, along with formal inclusion of curated and structured data, to achieve parity with humans in common sense reasoning[26]. Their example of this kind of reasoning is described by a question on what people do while playing a guitar. The common sense answer is for people to sing. This association is not a naive one since the concept of singing is not a property of a guitar.

They designed exams consisting of questions with multiple-choice answers. Their achievement of parity with humans was made possible by the addition of the curated and structured data and its concatenation with the conventionally processed data from online, free-formed documents.

Their finding showed that an online corpus is insufficient for a full knowledge of concepts. The conventional transformer architecture is dependent on, and limited by, the information inherent in a sequence of data for representation of conceptual knowledge. In their case, the missing component was the curated and structured data, and the results show a competitive capability for building concepts from representations derived from the input data.

The use of a large sample of representations corresponding to an abstract or non-abstract object, or an event, is expected to increase robustness in models of cognition[27]. Knowledge of concepts is expected to form in the same manner. If parts of a concept are missing, then a person would have difficulty forming the whole concept and applying it during problem solving.

3.4. Future Directions in Cognitive Science

3.4.1. Dynamics of Cognition

Is animal cognition as interpretable as a deep learning system? This question arises from the difficulty of disentangling the mechanisms of the animal neural system, whereas it is possible to record the changing states of an artificial system since its design is known. If the artificial system is analogous, then it is possible to gain insight into animal cognition[5][28]. However, the assumptions behind the analogy may not hold. For example, the mammalian brain is known to be highly dynamic, such as in the rates of sensory input and the downstream activation of internal representations[18]. These dynamic properties are typically not feasible to model in deep learning systems, a constraint of hardware design and efficiency[18]. This has been an impediment to designing an artificial system that approximates animal cognition, although there are concepts for modeling these dynamics, such as an architecture that includes “fast weights” and provides a form of true recursion in the network[5][18]. Recent studies on this architecture have overcome the performance problem[29][30].
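The "fast weights" idea can be sketched as a fast associative matrix that decays over time and is updated from each hidden state, layered on top of the ordinary slow weights. This is a schematic of the mechanism only; the decay and learning-rate constants are illustrative:

```python
# Minimal fast-weights sketch: alongside the slowly learned weights,
# a fast matrix A decays each step and binds in the current hidden
# state via its outer product, giving a short-term memory of recent
# activity. Constants are illustrative, not tuned values.
decay, rate = 0.9, 0.5

def step(A, h):
    """Decay the fast-weight matrix and bind in hidden state h."""
    n = len(h)
    return [[decay * A[i][j] + rate * h[i] * h[j]
             for j in range(n)] for i in range(n)]

def recall(A, h):
    """Reading through A retrieves a decayed blend of recently
    stored states that resemble the probe h."""
    return [sum(A[i][j] * h[j] for j in range(len(h)))
            for i in range(len(A))]

A = [[0.0, 0.0], [0.0, 0.0]]
A = step(A, [1.0, 0.0])   # store one state
A = step(A, [0.0, 1.0])   # store another; the first decays
r = recall(A, [1.0, 0.0]) # probe with the first state
```

Probing with an earlier state retrieves it at reduced strength, which is the recency-weighted short-term memory the architecture is meant to provide.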

Artificial neural networks continue to scale in size and efficiency. This work is accompanied by empirical approaches for exploring sources of error in these systems, an effort dependent on a thorough understanding of how the models are constructed. One avenue for increasing the robustness of the output is combining many sources of sensory data, such as from the visual and language domains. Another is to establish unbiased measures of the reliability of model output[24]. Likewise, information processing in animals is not resistant to bias, as seen in human cognition with documented biases in speech perception[31].

These approaches can be integrated to emulate the modularity and breadth of function across an animal brain. For this aim, meta-learning methods can create a formal, modular[32], and structured framework for combining disparate sources of data. This scalable approach would lead to complex information systems that encompass the general processes of cognition[33][34].

3.4.2. Generalization of Knowledge

Another area of importance is the property of generalization in a model of cognition. This property may be approached by processing the levels of representation formed from sensory input[4][35][36]. In an abstract context, generalizability is based on the premise that information about the outside world is compressible, such as in the repeatability of patterns of sensory information, so that it is possible to classify objects and obtain knowledge of the world.
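The compressibility premise can be checked directly: a stream with repeated patterns admits a much shorter description than one with no repeats. A small demonstration with a general-purpose compressor (the byte strings are invented stand-ins for structured versus unstructured sensory input):

```python
import zlib

# Two streams of equal length: one built from a repeating pattern,
# one with no repeated substrings.
patterned = b"edge" * 64          # 256 bytes of a repeating pattern
unrepeated = bytes(range(256))    # 256 bytes with no repetition

# Compression ratio: smaller means a shorter description suffices.
ratio_patterned = len(zlib.compress(patterned)) / len(patterned)
ratio_unrepeated = len(zlib.compress(unrepeated)) / len(unrepeated)
```

The patterned stream compresses to a small fraction of its size while the unrepeated one barely compresses at all, which is the sense in which repeatable structure is what makes classification and generalization possible.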

There is also the question of how to reuse knowledge outside the environment where it is learned, "being able to factorize knowledge into pieces which can easily be recombined in a sequence of computational steps, and being able to manipulate abstract variables, types, and instances"[5]. Therefore, it is relevant to have a model of cognition that includes the higher level representations of the parts of an object, whether derived from sensory input or internal to the neural network. However, the dynamic and varied states of the internal representations are also likely contributors to the processes of reasoning.

3.4.3. Embodiment in Cognition

Lastly, there is uncertainty about the dependence of animal cognition on the outside world. This dependence has been characterized as the phenomenon of embodiment, so animal cognition is also an embodied cognition, even where the world is a machine simulation[18][37][38]. In essence, this is a property of a robotic and mechanical system, whose functions are fully dependent on input and output from the world. Although a natural system receives input, produces output, and learns at a time scale constrained by the physical world, an artificial system is not as constrained, such as in reinforcement learning[38][39][40], a method that can also reconstruct the sensorimotor functions of animals.

Deepmind[38] developed artificial agents in a 3-dimensional space that learn in a continually changing world. The method employs deep reinforcement learning in conjunction with dynamic generation of environments, leading to a unique arrangement of each world. Each world contains artificial agents that learn to handle tasks and receive rewards for completing specified objectives. An agent observes a pixel image of an environment along with a "text description of their goal"[38]. Task experience is sufficiently generalizable that the agents are capable of adapting to tasks not known from prior experience. This reflects an animal that is embodied in a world and learns interactively by performing physical tasks. Animals are known to navigate and learn from the world around them, so the above approach is a meaningful experiment in a virtual world. However, the approach remains fragile for tasks outside its learned distribution.

4. Abstract Reasoning

4.1. Abstract Reasoning as a Cognitive Process

Abstract reasoning is often associated with a process of thought, but the elements of the process are ideally represented by, and restricted to, physical processes. This restriction constrains the explanations for the emergence of abstract reasoning, as in the formation of new concepts in an abstract world. Moreover, a process of abstract reasoning may be compared against the more intuitive forms of cognition in vision and speech, as formed by sensory input to the neural network. Without sensory input, the layers of the neural system are not expected to encode new information along a pathway, as expected in the recognition of visual objects. Therefore, any information system is expected to depend on external input for learning, the essential process for knowledge by experience.

It follows that abstract reasoning is formed from an input source received by the neural system. If there is no input relevant to a pathway of abstract reasoning, then the system is not expected to encode that pathway. This also leads to the hypothesis of whether abstract reasoning comprises one or more pathways, and of the contribution of other, unrelated pathways in cognition. It is probable that there is no sharp division between abstract reasoning and the other types of reasoning, and it is likely that there is more than one form of abstract reasoning, such as in solving puzzles that require manipulation of objects in the visual world.

Another hypothesis concerns whether the main source of abstract objects is the internal representations. If true, then a model of abstract reasoning would involve the true forms of abstract objects, in contrast to the recognition of an object by reconstruction from sensory input to the neural network.

Since abstract reasoning depends on an input source, there is an expectation that deep learning methods, which model the non-linear dynamics, are sufficient to model one or more pathways involved in abstract reasoning. This reasoning involves recognition of objects that are not necessarily sensory objects, yet have definable properties and relationships. As with the training process for learning sensory objects, a training process is expected for learning the forms and properties of abstract objects. This class of problem is of interest since the universe of abstract objects is boundless, and their properties and interrelationships are not constrained by the limits of the physical world.

4.2. Models of Abstract Reasoning

A model of higher level cognition includes abstract reasoning[5]. This is a pathway, or pathways, expected to learn the higher level representations of sensory objects, such as from vision or speech, where the input is processed and generates a generalizable rule set. This may include a single rule or a sequence of rules. One model is for the deep learning system to learn the rule set, as in the case of puzzles solvable by a logical operation[41]. This is likely the basis for a person playing chess: by memorizing prior patterns of information and events on the game board, the player builds general knowledge of the game system as a world model.

Similarly, another kind of visual puzzle is the Rubik's Cube. In this case, however, the final state is known: each face of the cube will share a single, unique color. Likewise, if there is a detectable rule set, then there must be patterns of information that allow construction of a generalized rule set.

The pathway to a solution can include the repeated testing of potential rule sets against an intermediate or final state of the puzzle. This iterative process is approachable by a heuristic search algorithm[5]. However, these puzzles are typically low-dimensional compared with abstract verbal problems, such as those found in inductive reasoning. The acquisition of rule sets for verbal reasoning requires a search for patterns in a high-dimensional space. In either case of pattern searching, whether complex or simple, success depends on the detection of patterns that represent sets of rules.
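The repeated testing of candidate rules against a goal state can be sketched as a search over rule applications. The toy below uses a plain breadth-first search over a number puzzle as a stand-in for heuristic search over cube or board states; the rules and goal are invented for the example:

```python
from collections import deque

# Hypothetical rule set for a toy puzzle: transform the number 1
# into 9 by repeatedly applying the available rules.
rules = {
    "double": lambda n: n * 2,
    "add_one": lambda n: n + 1,
}

def solve(start, goal, limit=10):
    """Test sequences of rule applications against the goal state,
    shortest sequences first (an exhaustive stand-in for the
    heuristic search described in the text)."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path            # shortest sequence of rules found
        if len(path) < limit:
            for name, rule in rules.items():
                nxt = rule(state)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [name]))
    return None

solution = solve(1, 9)
```

The returned path is itself a small rule set: a sequence of operations that, once discovered, can be reapplied to any state that shares the same structure.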

It is simpler to imagine a logical operation as the pattern that offers a solution, but inductive reasoning is expected to involve high-dimensional representations rather than an operator that combines boolean values. It is also probable that these representations are dynamic, so that it is possible to sample many valid representations.

4.3. Future Directions in Abstract Reasoning

4.3.1. Embodiment in a Virtual and Abstract World

While the phenomenon of embodiment refers to an occupant of the three-dimensional world, this is not necessarily a complete model for reasoning on abstract concepts. However, it is plausible that at least some abstract concepts are approachable in a virtual three-dimensional world; indeed, Deepmind[38] showed a solution to visual problems across a generated set of three-dimensional worlds.

A population and distribution of tasks are also elements in Deepmind's approach. They show that learning a task distribution leads to knowledge for solving tasks outside the prior task distribution[38][39]. This leads to the potential for generalizability in solving tasks, along with the promise that increased complexity in the worlds would lead to a further expansion in task knowledge.

However, the problem of abstract concepts extends beyond the conventional sensory representations as formed by cognition. Examples include visual puzzles with solutions that are abstract and require the association of patterns that extend beyond the visual realm, along with the symbolic representations from areas of mathematics[42][43].

Combining these two approaches, it is possible to construct a world that is not a reflection of the 3-dimensional space inhabited by animals, but instead a virtual world of abstract objects and sets of tasks[39]. The visual and symbolic puzzles, such as chess and related boardgames[40], are solvable by deep learning approaches, but the machine reasoning does not generalize across a space of abstract environments and objects. The question is whether the abstract patterns used to solve chess are also useful in solving other kinds of puzzles. It is a valid hypothesis that there is at least overlap between the abstract reasoning used in these visual puzzles and the synthesis of knowledge from other abstract objects and their interactions[38], such as in solving problems by the use of mathematical symbols and their operators[43][44]. Since humans are capable of abstract thought, it is plausible that generating a distribution of general abstract tasks would lead to a working system for solving a wide set of abstract problems.

If, instead of a dynamic generation of three-dimensional worlds and objects, there is a vast and dynamic generation of abstract puzzles, then the deep reinforcement learning approach could train on solving these problems and acquire knowledge of these tasks[38]. The open questions are whether the distribution of these tasks generalizes to an unknown set of problems, those unrelated to the original task distribution, and whether the space of tasks is compressible.

4.3.2. Reinforcement Learning and Generalizability

Google Research showed that an unmodified reinforcement learning approach is not necessarily robust for acquiring knowledge of tasks outside the trained task distribution[39]. Therefore, they introduced an approach that incorporates a measurement of similarity among the worlds generated by the reinforcement learning procedure. This similarity is estimated by behavioral similarity, corresponding to the salient features by which an agent finds success in any given world. Given that these salient features are shared among the worlds, the agents have a path for generalizing knowledge for success in worlds outside their experience. Procedurally, the salient features are acquired by a contrastive learning procedure, which embeds these values of behavioral similarity in the neural network itself[45].
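The contrastive step can be sketched as follows. This is a generic InfoNCE-style loss over invented state embeddings, not the exact objective from the cited paper; it only illustrates how behaviorally similar states are drawn together in embedding space while dissimilar ones are pushed apart:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contrastive_loss(anchor, positive, negatives, temp=0.5):
    """InfoNCE-style loss: low when the anchor embedding is close to
    its behaviorally similar positive and far from the negatives."""
    pos = math.exp(cosine(anchor, positive) / temp)
    neg = sum(math.exp(cosine(anchor, n) / temp) for n in negatives)
    return -math.log(pos / (pos + neg))

# Hypothetical state embeddings (illustrative values only).
anchor   = [1.0, 0.2]
positive = [0.9, 0.3]   # state with the same optimal behavior
negative = [-0.8, 1.0]  # state with different optimal behavior

loss_good = contrastive_loss(anchor, positive, [negative])
loss_bad  = contrastive_loss(anchor, negative, [positive])
```

Minimizing this loss during training is what drives the network to embed behavioral similarity directly in its representation, the property the text describes.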

The above reinforcement learning approach depends on both a deep learning framework and an input data source. The source of input is typically a two- or three-dimensional environment in which an agent learns to accomplish tasks within the confines of the world and its rules[38][39]. One approach is to represent the salient features of the tasks and worlds in a neural network. As Google Research showed[39], the process required an additional step of extracting the salient information to create better models of the tasks and worlds, and they found this method more robust for generalizing across tasks. Similarly, in animal cognition, it is expected that the salient features for generalizing a task are likewise stored in a neuronal network.

Therefore, a naive input of visual data from a two-dimensional environment is not an efficient means of coding tasks that consistently generalize across environments. To capture the high-dimensional information in a set of related tasks, Google Research extended the reinforcement learning approach to better capture the task distribution[39], an approach that similar methods may reproduce. These task distributions provide structured data for representing the dynamics of tasks among worlds, and therefore generalize and encode the high-dimensional, dynamic features in a low-dimensional form.

It is difficult to imagine the relationship between two different environments; a game of checkers and a game of chess appear to be different game systems. Yet encoding the dynamics of each in a deep learning framework may show that they relate in an unintuitive and abstract way[38]. This concept is expressed in the article above[39]: short segments of a larger pathway may provide the salient and generalizable features. In the case of boardgames, the salient features may not correspond to a naive perception of visual relatedness. Likewise, our natural form of abstract reasoning shows that patterns are captured in these boardgames, and that these patterns are not fully described by a single ruleset at the level of our awareness, but instead are likely represented at a high-dimensional level in the neural network itself.

For emulating a process of reasoning, extracting its salient features from a pixel image is a complex problem, and the pathway may involve many sources of error. Converting images to a low-dimensional form, particularly for the salient subtasks, allows a greater expectation of generalization and repeatability in the patterns of objects and events. Where it is difficult to extract the salient features of a system, it is possible to translate and reduce the objects and events of the system to text-based descriptions, a sequence of tokens, a process that has been studied and is interpretable[46][47][48][49].
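
A toy sketch of such a translation, with an invented grid encoding (integer object codes) chosen only for illustration: a two-dimensional scene is reduced to a short, interpretable token sequence.

```python
def describe_state(grid):
    """Translate a 2-D grid of object codes into a low-dimensional,
    text-based description: one token triple (object, row, col) per object."""
    names = {1: "agent", 2: "goal", 3: "wall"}
    tokens = []
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if cell in names:
                tokens += [names[cell], str(r), str(c)]
    return tokens

grid = [
    [1, 0, 0],
    [0, 3, 0],
    [0, 0, 2],
]
tokens = describe_state(grid)
# tokens -> ['agent', '0', '0', 'wall', '1', '1', 'goal', '2', '2']
```

The nine pixels of the grid reduce to a sequence naming only the salient objects and their positions, the kind of representation a sequence model can consume directly.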

Lastly, since the advanced cognitive processes in animals involve widespread use of dynamic representations, it is plausible that tasks are not merely generalizable, but originate in varied sensory and memory systems. The tasks would therefore be expressed in different sensory forms, although the low-dimensional representations are more generalizable, providing a better substrate for recognizing patterns and an essential basis for a general process of abstract reasoning.

5. Conceptual Knowledge

5.1. Knowledge by Pattern Combination and Recognition

In the 18th century, the philosopher Immanuel Kant suggested that synthesis of prior knowledge leads to new knowledge[50]. This theory of knowledge extends the concept of objects from a set of perfect forms to a recombination of those forms, leading to a boundless number of representations. This supplied the missing concept for explaining the act of knowing without resorting to an immeasurable cognitive ether. The forces of knowledge are therefore no longer dependent on descriptions outside the realm of matter, or on hypotheses based on an unbounded complexity of material interactions.

It is possible to divide these objects and forms of knowledge into two categories: sensory and abstract. Sensory objects are ideally constructed from sensory input, although this assumption is not universal; instead, perception may refer to the construction of these sensory objects, along with any error occurring in their associated pathways. In comparison, an abstract object is ideally a true form. An ideal example is a mathematical symbol, such as an operator for the addition of numbers[43]. However, an abstract object may coincide with sensory objects, as with an animal and its taxonomic relationship to other forms of animals.

Therefore, one hypothesis is that the objects of knowledge are instead a single category, but that the input to form the object is from at least two sources, including sensation of the outside world and the representation of objects as stored in memory.

A hypothetical example comes from chess. A person is not able to calculate every game piece and position given all events on the board. Instead, decision-making depends largely on boardgame patterns of piece and position. However, the number of observed patterns is strongly bounded compared to all possible patterns. One solution is the hypothesis that patterns also exist as internal representations, synthesized and formed into new patterns not yet observed. Evidence for this hypothesis lies in the predictive coding of sensory input: this compensatory action allows a person to perceive elements of a visual scene or of speech a short time before their occurrence. The same predictive coding pathway may apply to internal representations, such as the chess gameboard patterns, and to the ability to recombine prior objects of knowledge. The process of creating new forms and patterns would allow a person to greatly expand upon the number of observable patterns in a world.
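
The predictive element of this argument can be illustrated with a minimal predictive-coding loop, in which an internal estimate is repeatedly corrected by its prediction error rather than re-encoded from the input. The scalar signal and learning rate below are illustrative assumptions, not a model of any specific neural pathway.

```python
def predictive_update(prior, observation, learning_rate=0.2, steps=30):
    """Minimal predictive-coding loop: an internal representation is
    repeatedly corrected by the error between its prediction and the
    sensory input, rather than re-encoding the input from scratch."""
    estimate = prior
    for _ in range(steps):
        error = observation - estimate     # prediction error signal
        estimate += learning_rate * error  # correction toward the input
    return estimate

# The internal model converges toward the observed value.
final = predictive_update(prior=0.0, observation=1.0)
```

Only the error term is propagated at each step, which is the economy usually claimed for predictive coding; applied to internal representations rather than sensory input, the same loop would drive stored patterns toward novel recombinations.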

To summarize further, the process of predictive coding of sensory information should also apply to the reformation of internal representations. This is a force of recombination that is expected to lead to a very large number of forms in memory, and to the detection of objects and forms not yet observed. Knowledge by synthesis of priors has the potential to generate a magnitude of forms consistent with the extent of human thought. In this case, a cognitive ether of immeasurability or incomputability is not necessary for explaining the advanced forms of cognition.

5.2. Models of Generalized Knowledge

Evidence is mounting in support of a deep learning model, the transformer, for sampling data and constructing high-dimensional representations[34][43][47][48][49][51][52][53]. A study by Google Research employed a decision transformer architecture to show transfer learning across tasks that occur in a fixed and controlled setting (Atari)[47]. This work supports the concept that generalized patterns occur in an environment, with the potential for resampling those patterns in other environments. The experimental control of the environmental properties is somewhat analogous to cognitive processes that originate in a single embodied source[37][38]. Altogether, the sampling of patterns is from the population of all possible patterns that occur in the system; a sufficiently large sampling of tasks is expected to lead to knowledge of the system. The system may be thought of as a physical system, in this case a visual space of two dimensions.
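
The trajectory format consumed by decision-transformer-style models can be sketched as follows; the tuple encoding is a simplification of the actual tokenization in the cited work[46][47], shown only to convey how a reinforcement learning trajectory becomes a sequence modeling problem.

```python
def to_decision_sequence(states, actions, rewards):
    """Arrange a trajectory as the (return-to-go, state, action) token
    triples consumed by a decision-transformer-style sequence model."""
    # Return-to-go at step t is the sum of rewards from t onward.
    rtg, total = [], 0.0
    for r in reversed(rewards):
        total += r
        rtg.append(total)
    rtg.reverse()
    sequence = []
    for g, s, a in zip(rtg, states, actions):
        sequence += [("rtg", g), ("state", s), ("action", a)]
    return sequence

seq = to_decision_sequence(states=["s0", "s1", "s2"],
                           actions=["a0", "a1", "a2"],
                           rewards=[1.0, 0.0, 2.0])
# First token: ("rtg", 3.0) -- the total return the agent is conditioned on.
```

Framing control as next-token prediction over such sequences is what lets a single transformer resample patterns across many games from the same controlled setting.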

In another study, DeepMind asked whether task-based learning can occur across multiple embodied sources, such as patterns derived from torque-based tasks (a robot arm) and those from a set of captioned images[48]. Their results showed evidence of transfer learning across heterogeneous sources, and their model is expected to scale in power with increases in data and model size.

These studies are complemented by Data Distributional Properties Drive Emergent In-Context Learning in Transformers[51]. This article shows convincing evidence of the transformer architecture's superior performance in modeling sequence data. The authors further showed the importance of the distributional qualities and dynamics of the training data set, and their relationship to the properties of natural language data[51].
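
The skewed, language-like class statistics discussed in that work[51] can be imitated with a Zipf-style sampler; the exponent and class count below are arbitrary choices for illustration, not values from the study.

```python
import random

def zipfian_sample(rng, n_classes, exponent, n_draws):
    """Draw class labels from a Zipf-like distribution: a few classes
    dominate while a long tail appears rarely, echoing the skewed
    statistics of natural language data."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, n_classes + 1)]
    return rng.choices(range(n_classes), weights=weights, k=n_draws)

rng = random.Random(0)
draws = zipfian_sample(rng, n_classes=100, exponent=1.5, n_draws=10000)
head_share = sum(1 for d in draws if d < 10) / len(draws)
# The ten highest-ranked classes account for the bulk of the samples.
```

Training data drawn this way has a large head of frequent classes and a long tail of rare ones, the distributional property the article associates with the emergence of in-context learning.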

The above computational studies show strong evidence that model performance continues to scale with model size[47][48]. These models of generalized task learning operate in a particular setting, and it is possible to consider the setting as a physical system, such as a particular simulation or our physical world[38][53][54][55]. With a robust sampling of tasks in a controlled physical system, it is possible to learn the system and transfer knowledge from known tasks to unknown ones[38][53][54][55]. This is a form of pattern sampling that robustly represents the population of all patterns occurring in the system. DeepMind has searched for these patterns by deep reinforcement learning, while optimizing the approach by simultaneously searching for the shortest path toward learning the system[43][54]. This method amounts to learning a world model.

Since images with text descriptions are leading to generalized task learning[48][53], video with text description[52] is expected to enhance the model with a temporal dimension and to reflect tasks that are dynamic in time[55]. OpenAI has developed a deep learning method that receives video data as input, with only a minimal number of associated text labels, and is as capable as a human at learning tasks in a world model (Minecraft)[55]. There is also a question about the difference between simple and complex tasks; the tasks may be decomposed into their parts and patterns, and OpenAI's reinforcement learning system is achieving this aim without manual curation[55].
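
The labeling strategy behind this method, an inverse dynamics model trained on a small labeled set and then used to pseudo-label unlabeled video[55], can be caricatured in a few lines. The lookup-table "model" and integer "frames" below are stand-ins for the actual network and pixel data.

```python
def fit_inverse_dynamics(labeled):
    """Learn a toy inverse-dynamics model from a small labeled set of
    (frame_t, frame_t1, action) triples: here, a lookup from frame
    differences to actions stands in for the trained network."""
    return {frame_t1 - frame_t: action for frame_t, frame_t1, action in labeled}

def pseudo_label(model, video):
    """Label the transitions of an unlabeled video with inferred actions,
    producing behavior-cloning data without manual annotation."""
    return [model[b - a] for a, b in zip(video, video[1:])]

# A handful of labeled transitions...
labeled = [(0, 1, "right"), (5, 4, "left"), (7, 7, "stay")]
idm = fit_inverse_dynamics(labeled)
# ...is enough to annotate a long unlabeled recording.
actions = pseudo_label(idm, video=[0, 1, 2, 2, 1])
# actions -> ['right', 'right', 'stay', 'left']
```

The economy of the approach is that the expensive labels are spent on the inverse dynamics model, after which arbitrarily large volumes of unlabeled video become training data.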


  1. Friedman, R. Cognition as a Mechanical Process. NeuroSci 2021, 2, 141-150.
  2. Vlastos, G. Parmenides Theory of Knowledge. In Transactions and Proceedings of the American Philological Association, The Johns Hopkins University Press: Baltimore, MD, USA, 1946, pp. 66-77.
  3. Chang, L., Tsao, D.Y. The code for facial identity in the primate brain. Cell 2017, 169, 1013-1028.
  4. Hinton, G. How to represent part-whole hierarchies in a neural network. 2021, arXiv:2102.12627.
  5. Bengio, Y., LeCun, Y., Hinton G. Deep Learning for AI. Communications of the ACM 2021, 64, 58-65.
  6. Searle, J.R., Willis, S. Intentionality: An essay in the philosophy of mind. Cambridge University Press, Cambridge, UK, 1983.
  7. Huxley, T.H. Evidence as to Man's Place in Nature. Williams and Norgate, London, UK, 1863.
  8. Haggard, P. Sense of agency in the human brain. Nature Reviews Neuroscience 2017, 18, 196-207.
  9. Ramón y Cajal, S. Textura del Sistema Nervioso del Hombre y de los Vertebrados. Nicolás Moya, Madrid, Spain, 1899.
  10. Kriegeskorte, N., Kievit, R.A. Representational geometry: integrating cognition, computation, and the brain. Trends in Cognitive Sciences 2013, 17, 401-412.
  11. Hinton, G.E. Connectionist learning procedures. Artificial Intelligence 1989, 40, 185-234.
  12. Schmidhuber, J. Deep learning in neural networks: An overview. Neural Networks 2015, 61, 85-117.
  13. Paley, W. Natural Theology: or, Evidences of the Existence and Attributes of the Deity, 12th ed., London, UK, 1809.
  14. Darwin, C. On the Origin of Species. John Murray, London, UK, 1859.
  15. Goyal, A., Didolkar, A., Ke, N.R., Blundell, C., Beaudoin, P., Heess, N., et al. Neural Production Systems. 2021, arXiv:2103.01937.
  16. Scholkopf, B., Locatello, F., Bauer, S., Ke, N.R., Kalchbrenner, N., Goyal, A., Bengio, Y. Toward Causal Representation Learning. In Proceedings of the IEEE, 2021.
  17. Wallis, G., Rolls, E.T. Invariant face and object recognition in the visual system. Progress in Neurobiology 1997, 51, 167-194.
  18. Rina Panigrahy (Chair), Conceptual Understanding of Deep Learning Workshop. Conference and Panel Discussion at Google Research, May 17, 2021. Panelists: Blum, L., Gallant, J., Hinton, G., Liang, P., Yu, B.
  19. Gibbs, J.W. Elementary Principles in Statistical Mechanics. Charles Scribner's Sons, New York, NY, 1902.
  20. Schmidhuber, J., 1990. Making the World Differentiable: On Using Self-Supervised Fully Recurrent Neural Networks for Dynamic Reinforcement Learning and Planning in Non-Stationary Environments. Technical Report FKI-126-90, Tech. Univ. Munich, 1990.
  21. Griffiths, T.L., Chater, N., Kemp, C., Perfors, A, Tenenbaum, J.B. Probabilistic models of cognition: Exploring representations and inductive biases. Trends in Cognitive Sciences 2010, 14, 357-364.
  22. Hinton, G.E., McClelland, J.L., Rumelhart, D.E. Distributed representations. In Parallel distributed processing: explorations in the microstructure of cognition; Rumelhart, D.E., McClelland, J.L., PDP Research Group, Eds., Bradford Books: Cambridge, Mass, 1986.
  23. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., et al. Attention Is All You Need. 2017, arXiv:1706.03762.
  24. Chen, T., Saxena, S., Li, L., Fleet, D.J. and Hinton, G. Pix2seq: A language modeling framework for object detection. 2021, arXiv:2109.10852.
  25. Hu, R., Singh, A. UniT: Multimodal Multitask Learning with a Unified Transformer. 2021, arXiv:2102.10772.
  26. Xu, Y., Zhu, C., Wang, S., Sun, S., Cheng, H., Liu, X., et al. Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention. 2021. arXiv:2112.03254.
  27. Zeng, A., Wong, A., Welker, S., Choromanski, K., Tombari, F., Purohit, A., et al. Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language. 2022, arXiv:2204.00598.
  28. Chaabouni, R., Kharitonov, E., Dupoux, E., Baroni, M. Communicating artificial neural networks develop efficient color-naming systems. Proceedings of the National Academy of Sciences 2021, 118.
  29. Irie, K., Schlag, I., Csordás, R. and Schmidhuber, J. A Modern Self-Referential Weight Matrix That Learns to Modify Itself. 2022, arXiv:2202.05780.
  30. Schlag, I., Irie, K. and Schmidhuber, J. Linear transformers are secretly fast weight programmers. In International Conference on Machine Learning (pp. 9355-9366). PMLR, July 2021.
  31. Petty, R.E., Cacioppo, J.T. The elaboration likelihood model of persuasion. In Communication and Persuasion, Springer: New York, NY, 1986, pp. 1-24.
  32. Mittal, S., Bengio, Y., Lajoie, G. Is a Modular Architecture Enough? 2022, arXiv:2206.02713.
  33. Ha, D., Tang, Y. Collective Intelligence for Deep Learning: A Survey of Recent Developments. 2021, arXiv:2111.14377.
  34. Mustafa, B., Riquelme, C., Puigcerver, J., Jenatton, R., Houlsby, N. Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts. 2022, arXiv:2206.02770.
  35. Chase, W.G., Simon, H.A. Perception in chess. Cognitive Psychology 1973, 4, 55-81.
  36. Pang, R., Lansdell, B.J., Fairhall, A.L. Dimensionality reduction in neuroscience. Current Biology 2016, 26, R656-R660.
  37. Deng, E., Mutlu, B., Mataric, M. Embodiment in socially interactive robots. 2019, arXiv:1912.00312.
  38. Team, E.L., Stooke, A., Mahajan, A., Barros, C., Deck, C., Bauer, J., et al. Open-ended learning leads to generally capable agents. 2021, arXiv:2107.12808.
  39. Agarwal, R., Machado, M.C., Castro, P.S. and Bellemare, M.G. Contrastive behavioral similarity embeddings for generalization in reinforcement learning. 2021, arXiv:2101.05265.
  40. Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 2018, 362, 1140-1144.
  41. Barrett, D., Hill, F., Santoro, A., Morcos, A., Lillicrap, T. Measuring abstract reasoning in neural networks. In International Conference on Machine Learning, PMLR, 2018.
  42. Schuster, T., Kalyan, A., Polozov, O. and Kalai, A.T. Programming Puzzles. 2021, arXiv:2106.05784.
  43. Lewkowycz, A., Andreassen, A., Dohan, D., Dyer, E., Michalewski, H., Ramasesh, V., et al. Solving Quantitative Reasoning Problems with Language Models, 2022, arXiv:2206.14858.
  44. Drori, I., Zhang, S., Shuttleworth, R., Tang, L., Lu, A., Ke, E., et al. A Neural Network Solves, Explains, and Generates University Math Problems by Program Synthesis and Few-Shot Learning at Human Level. 2021, arXiv:2112.15594.
  45. Chen, T., Kornblith, S., Norouzi, M. and Hinton, G. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning, PMLR, 2020.
  46. Chen, L., Lu, K., Rajeswaran, A., Lee, K., Grover, A., Laskin, M., et al. Decision Transformer: Reinforcement Learning via Sequence Modeling. Advances in Neural Information Processing Systems 2021, 34.
  47. Lee, K.H., Nachum, O., Yang, M., Lee, L., Freeman, D., Xu, W., et al. Multi-Game Decision Transformers. 2022, arXiv:2205.15241.
  48. Reed, S., Zolna, K., Parisotto, E., Colmenarejo, S.G., Novikov, A., Barth-Maron, G., et al. A Generalist Agent. 2022, arXiv:2205.06175.
  49. Fei, N., Lu, Z., Gao, Y., Yang, G., Huo, Y., Wen, J., et al. Towards artificial general intelligence via a multimodal foundation model. Nature Communications 2022, 13, 1-13.
  50. Kant, I., Smith, N.K. Immanuel Kant's Critique of Pure Reason. Translated by Norman Kemp Smith. Macmillan & Co, London, UK, 1929.
  51. Chan, S.C., Santoro, A., Lampinen, A.K., Wang, J.X., Singh, A., Richemond, P.H., et al. Data Distributional Properties Drive Emergent In-Context Learning in Transformers. 2022, arXiv:2205.05055.
  52. Seo, P.H., Nagrani, A., Arnab, A., Schmid, C. End-to-end Generative Pretraining for Multimodal Video Captioning. 2022, arXiv:2201.08264.
  53. Yan C., Carnevale F., Georgiev P., Santoro A., Guy A., Muldal A., et al. Intra-agent speech permits zero-shot task acquisition. 2022, arXiv:2206.03139.
  54. Guo, Z.D., Thakoor, S., Pîslar, M., Pires, B.A., Altché, F., Tallec, C., et al. BYOL-Explore: Exploration by Bootstrapped Prediction. arXiv, 2022, arXiv:2206.08332.
  55. Baker, B., Akkaya, I., Zhokhov, P., Huizinga, J., Tang, J., Ecoffet, A., et al. Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos. 2022, arXiv:2206.11795v1.
    Friedman, R. Cognition. Encyclopedia. Available online: (accessed on 09 August 2022).