The motivation for this article comes from an outburst of mine at a RANZCOG Multiple Choice Question (MCQ) examination workshop regarding the use of ChatGPT for setting exam questions. I felt uniquely justified in my criticism, as I had returned to university in my fifties to obtain a computer science degree and a PhD in artificial intelligence. My research papers relate specifically to cognition, exploring knowledge representation and inference in artificial neural networks1,2,3.
Unfortunately, there is no useful definition of intelligence, artificial or otherwise, and no practical test to detect it. Most of the information ‘we know’ about intelligence is assumed from human cognition—we know that humans have cognition because we are sentient.
In the 21st century, the literature on artificial intelligence has exploded; its rate of publication probably outstrips that of all of medicine. I have therefore limited this article to a single theme: the complementary nature of language and understanding (knowledge representation and inference) as a guide to changes in the domain of artificial intelligence. I propose that although language is a component in the expression of intelligence, its presence alone is not sufficient to guarantee intelligence.
A historical perspective
It is necessary to present this theme from a historical perspective. The ‘new’ science of computing arose in the 20th century amid a revolution of doubt that overturned many scientific beliefs: in biology, Darwin’s ‘On the Origin of Species’ (1859); in physics, Einstein’s ‘Theories of Relativity’ (1905, 1915); and in mathematics, after two and a half thousand years of certainty, Gödel’s ‘Incompleteness Theorems’ (1930s) challenged the very philosophy of logical proof. These theorems demonstrated that even within a closed mathematical system, there are truths that cannot be proven.
Alan Turing’s universal computing machine (1936) was engineered around a language—a grammar with defined syntax—and the algorithms required for performing specific tasksa,b. This type of machine is intuitively understood by today’s society. Coded instructions and data from a human programmer are input into the machine, whose central processor follows the algorithm, analyses the data and produces an output. The machine would check that the language was syntactically correct; however, it could not check the meaning or correctness of the algorithm. There is, in general, no way to decide in advance whether a program will ever finish (the Halting Problem), and entire categories of tasks were identified for which no efficient exact solutions are known (NP-hard problems). Consequently, doubt was incorporated into the very foundation of the new computing dogma: the “Church–Turing Hypothesis”. Turing’s informal test for intelligence sought only to distinguish an artificial response on the basis of language; it avoided any implicit definition of understanding. At the time, semantic understanding was thought to be beyond the reach of computing.
The concept of a linguistic machine and its algorithmic task has remained hugely pervasive in our society. Within the computing domain, however, this symbolically coded model of a computer was gradually abandoned. Barely 50 years after its inception, the new science of computing began to turn away from discrete engineering, refusing to accept that practical solutions to NP-hard tasks were unattainable. Instead, solutions were sought through approximation methods inspired by biologyc.
Artificial neural networks imitate human neurobiology. They are characterised by distributed processing involving multiple simple, locally connected processing units—the equivalent of human neurons. The networks are not built to rely on individual nodes but on the collective of internodal connections. These networks are not symbolically programmed to perform a task; they are constructed to learn. The networks are given a task and learn its solution. Their ability to represent knowledge lies in the strength (pattern) of their internodal connection weights. This concept of learning and knowledge representation, derived from local synaptic information, was first proposed by Hebb in the 1950s4 (Hebbian Learning), a concept familiar to all medical students.
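For readers who like to see the mechanics, the sketch below illustrates the Hebbian idea in a few lines of Python: two units that are active together have the connection between them strengthened, and the ‘knowledge’ accumulates in the pattern of weights rather than in any single node. The network size, pattern and learning rate are purely illustrative.

```python
import numpy as np

def hebbian_update(weights, activations, eta=0.1):
    """One Hebbian step: strengthen the connection between any two
    units that are active together ('fire together, wire together')."""
    delta = eta * np.outer(activations, activations)  # delta_w[i, j] = eta * x[i] * x[j]
    np.fill_diagonal(delta, 0.0)                      # no self-connections
    return weights + delta

# Toy usage: the 'knowledge' ends up in the weight matrix, not in any one node.
n_units = 5
weights = np.zeros((n_units, n_units))
pattern = np.array([1.0, 0.0, 1.0, 1.0, 0.0])         # a repeatedly co-activated pattern
for _ in range(10):
    weights = hebbian_update(weights, pattern)
print(weights.round(2))
```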
Personally, it is an enduringly novel realisation that a neural network, which can learn and almost certainly has the ability to understand and reason, can be virtually programmed within the shell of Turing’s symbolic machine—a machine initially believed to be inadequate for such higher purposes. The concept of the neural network is now widely accepted in computing philosophy, and many personal computers are now marketed around a dedicated ‘neural chip’ (a neural processing unit) working alongside the traditional central processor.
Knowledge acquisition
Today there are many different kinds of learning machines, not just neural networks. In general, these machines implement well-researched and mathematically proven methods of statistical approximation5. Although the tasks given to these machines have so far been discrete and specific, they have also been complex. The results have been extraordinarily successful, particularly across a wide variety of classification and predictive tasks—including medical diagnosis, interpretation of radiological images, voice recognition, facial recognition, autonomous driving, and autonomous flight.
It might seem like a fairy tale, but on the shores of Lake Geneva in Switzerland there is an evolutionary robotics centre. Here, small insect-like robots (with dual wing motors, forward-facing visual sensors, and a tiny neural network of less than 200 nodes) have been genetically evolved—not programmed, not taught, but virtually bred—to fly. They can take off, land, and navigate three-dimensional environments without remote control. Their chromosomes are expressions of their internodal weights. Moreover, these flying robotic insects can communicate with each other and perform tasks cooperatively as a swarm6.
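To give a flavour of how ‘breeding’ a network might work, the toy Python sketch below evolves a population of flattened weight vectors (the ‘chromosomes’) by selection, crossover and mutation. The fitness function here is a stand-in; in the robotics work, fitness would be scored from simulated or real flight behaviour, and all the sizes and rates are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N_WEIGHTS, POP_SIZE, GENERATIONS = 50, 20, 100   # a tiny controller, flattened into a 'chromosome'

def fitness(chromosome):
    # Placeholder fitness: distance to an arbitrary target weight vector.
    # In evolutionary robotics, fitness is measured from flight behaviour instead.
    target = np.linspace(-1.0, 1.0, N_WEIGHTS)
    return -np.sum((chromosome - target) ** 2)

population = rng.normal(size=(POP_SIZE, N_WEIGHTS))
for _ in range(GENERATIONS):
    scores = np.array([fitness(c) for c in population])
    parents = population[np.argsort(scores)[-POP_SIZE // 2:]]     # keep the fittest half
    children = []
    for _ in range(POP_SIZE - len(parents)):
        a, b = parents[rng.integers(len(parents), size=2)]        # pick two parents
        cut = rng.integers(1, N_WEIGHTS)                          # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        child += rng.normal(scale=0.05, size=N_WEIGHTS)           # mutation
        children.append(child)
    population = np.vstack([parents, *children])

best = population[np.argmax([fitness(c) for c in population])]    # the evolved weight vector
```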
How do machines learn? I would need another 2,000 pages to describe all the approaches availabled,e. In general, these machines learn like a human child, by trial and error. Some degree of pre-programming or genetic wiring is often included in the solution to any problem. A machine builds on this foundation, learning in successive attempts to find a solution. Similar to (but far faster than) a human child, the machine learns from failure.
Consider the most common approach, error-directed learning: the machine calculates a delta—the difference between the expected (desired) outcome and the actual outcome for this particular attempt at the solution. This error delta is distributed appropriately across the network weights in various ways: by backpropagation in feed-forward networks, or by a two-phase comparison of cross-firing statistics in recurrent networks. In both cases, the most common formula governing the final local weight adaptation is a compromise between current knowledge and past experience:
∆Wt = η(∆Error)current + µ(∆Wt−1)past
Here, typically, the response to the current information is depreciated (learning rate: η ~ 0.2) in exchange for past experience (momentum: µ ~ 0.8). This is an approach medical researchers should note when criticising older clinicians for ‘inertia’. Momentum (inertia) is a very desirable characteristic: it protects any (human) machine from inappropriate oscillations in learning and helps it escape ‘potholes’ in the problem surface.
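As a concrete illustration, the short Python sketch below applies this update to a single weight over a handful of attempts. The error values are invented, and η and µ take the typical values quoted above.

```python
def weight_step(error_delta, previous_step, eta=0.2, mu=0.8):
    """One local weight adaptation: a compromise between the current error
    information (weighted by the learning rate) and the previous change
    (weighted by momentum), as in the formula above."""
    return eta * error_delta + mu * previous_step

# Toy usage: one weight nudged over successive attempts at a solution.
weight, step = 0.0, 0.0
for error in [1.0, 0.9, -0.3, 0.8, 0.7]:   # invented error deltas
    step = weight_step(error, step)
    weight += step
print(round(weight, 3))
```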
The threshold of intelligence
Utilising biological methods, computing science is approaching the threshold of true machine intelligence: a reasoning machine that can generalise its thoughts or solutions to a variety of tasks. This is a concern expressed by Geoffrey Hinton, a pioneer in artificial intelligence.
Hinton’s Boltzmann machine is a recurrent network, currently the most biologically plausible mechanism for human cognition1. It consolidates its memory during batched resting phases (the equivalent of sleep). It adjusts its internodal weights (memory) using synaptically local Hebbian information1. The Boltzmann machine has been modified to form the basis of much larger networks (deep learning), which are at the frontier of current research.
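In code, the two-phase idea reduces to a strikingly simple, purely local rule: measure the cross-firing statistics while the network is clamped to data (the ‘wake’ phase), measure them again while it runs freely (the ‘resting’ phase), and move each weight toward the first and away from the second. The Python sketch below is a minimal illustration of that update only; it omits the sampling machinery a real Boltzmann machine requires, and the sampled states here are random placeholders.

```python
import numpy as np

def co_firing(states):
    """Average cross-firing statistics <s_i s_j> over a batch of sampled states.
    'states' has shape (n_samples, n_units), entries 0 or 1."""
    states = np.asarray(states, dtype=float)
    return states.T @ states / len(states)

def boltzmann_weight_update(clamped_states, free_states, eta=0.01):
    """Two-phase Hebbian update: move each weight toward the correlations seen
    with data clamped ('wake') and away from those the free-running network
    produces on its own ('resting' / consolidation phase)."""
    return eta * (co_firing(clamped_states) - co_firing(free_states))

# Toy usage with random stand-ins for sampled unit states.
rng = np.random.default_rng(0)
clamped = rng.integers(0, 2, size=(100, 6))
free = rng.integers(0, 2, size=(100, 6))
delta_w = boltzmann_weight_update(clamped, free)
print(delta_w.round(3))
```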
Where is the evidence for any current expression of intelligence? I would like to return to the event that initiated my rant at the RANZCOG MCQ workshop. Let’s look at two current candidates for artificial intelligence to illustrate the complementary nature of language and understanding (knowledge representation and inference).
1. Language:
There are a host of applications currently catching the public imagination that could be viewed as a form of machine plagiarism. These applications create ‘new’ data files from existing sources of text, images, audio and programming code on the web. Whilst I have no proprietary knowledge of ChatGPT, it must rely on two main components: the web as the largest data repository in existence, combined with the speed of an online search engine. Even computer science students are taught to write such a program—a piece of code that can search and index terabytes of data in a second. The speed of modern computing machines is magical. Imagine a human reader finishing all the literary works of recorded human history in a minute. But is this intelligence? It may seem expedient to delegate writing exam questions to a machine that has libraries of information at its fingertips; however, the veracity of most of that data is uncertain. Moreover, ChatGPT is little more than a ‘typist’ and currently demonstrates no understanding of the subject beyond linguistic association.
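To be clear, the sketch below says nothing about how ChatGPT is actually built; it is simply the kind of student exercise referred to above: a toy inverted index in Python that maps words to the documents containing them, the same idea that, scaled up and sharded, lets an engine search terabytes in a fraction of a second. The example documents are invented.

```python
from collections import defaultdict

def build_index(documents):
    """Toy inverted index: map each word to the set of documents containing it.
    Web-scale engines shard and compress this, but the idea is the same."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the documents containing every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

documents = {                                   # invented example documents
    1: "postpartum haemorrhage management guideline",
    2: "guideline for management of pre-eclampsia",
    3: "neural networks and language models",
}
index = build_index(documents)
print(search(index, "management guideline"))    # {1, 2}
```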
2. Understanding:
By contrast, I would like to present the results of a philosophically elegant experiment. Frank et al.7 examined language comprehension in small, recurrent neural networks similar to the Boltzmann machine. They provided their machines with a language necessary to learn a tiny “micro-world”. The micro-world contained a few games (chess, hide and seek, soccer, a jigsaw puzzle, a toy) with various descriptors (game type, number of players, outcomes, playing styles), played by three children (name, gender) in four different places. In total, a lexicon of just 40 words. The machines were trained on more than 13,000 sentences in this language. Then they were given a test sentence and expected to output the correct state of the current micro-world, thereby demonstrating a knowledge representation. In a series of systematically graduated scenarios, Frank and his team degraded the language information provided to these machines for training but continued to test them against the full language including all the missing nouns and descriptors. This forced the machines to form inferences about the state of the micro-world based on information they had never seen before. Remarkably, even in the most deprived scenario, these machines maintained their knowledge representation and chose correct inferences in the face of uncertainty7. This may be a small example, but computer solutions can easily be scaled.
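The Python sketch below is not Frank’s architecture, merely a generic illustration of the shape of the task: a simple recurrent network reads a sentence one word at a time and outputs a ‘situation vector’ describing the inferred state of the micro-world. The lexicon, names and dimensions are placeholders, and the weights here are random rather than trained on the thousands of sentence/micro-world pairs used in the experiment.

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder lexicon and sizes; the real micro-world used a 40-word lexicon.
LEXICON = ["anna", "ben", "clara", "plays", "wins", "loses",
           "chess", "soccer", "hide_and_seek", "inside", "outside"]
WORD_ID = {word: i for i, word in enumerate(LEXICON)}
N_HIDDEN, N_SITUATION = 20, 15                 # situation vector = state of the micro-world

# Random weights for illustration; in the experiment these are learned from data.
W_in = rng.normal(scale=0.1, size=(N_HIDDEN, len(LEXICON)))
W_rec = rng.normal(scale=0.1, size=(N_HIDDEN, N_HIDDEN))
W_out = rng.normal(scale=0.1, size=(N_SITUATION, N_HIDDEN))

def comprehend(sentence):
    """Feed a sentence through a simple recurrent layer, one word at a time,
    and read out a situation vector describing the inferred micro-world state."""
    h = np.zeros(N_HIDDEN)
    for word in sentence.split():
        x = np.zeros(len(LEXICON))
        x[WORD_ID[word]] = 1.0                 # one-hot word input
        h = np.tanh(W_in @ x + W_rec @ h)      # recurrent state update
    return 1.0 / (1.0 + np.exp(-(W_out @ h)))  # situation vector in [0, 1]

print(comprehend("anna plays chess inside").round(2))
```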
In contrast to the ‘magical’ output of ChatGPT, the results of Frank’s experiment may seem trivial. However, they are profound. Frank et al. did not program their machines in the fashion of Turing’s linguistic machine, nor were their machines just taught a language. Even in the absence of a fully functional language, they were still able to learn—to form a systematic representation of their micro-world. They learned enough to correctly reconstruct the information of which they were deprived. Absolute proof of artificial intelligence will probably never be possible, a victim of Gödel’s theorems of incompleteness. Frank’s experiment is the best evidence we might ever have, until one day the development of artificial intelligence unwittingly crosses the unknown threshold to sentience.
Post-script:
Sentience is a topic on which the domain of artificial intelligence is uncharacteristically silent. Researchers have no idea what constitutes the threshold or nature of a sentient machine. Presumably, such a machine would have the ability to adapt3—to intentionally change the strength (pattern) of its own internodal network weights, nullifying any software ‘kill-switch’. As parents, most of us realise how complex and fragile a task it is to teach ethics, morals, compassion, beneficence… In the context of artificial intelligence, no effort is being made to examine such learning.
References
1. Blanchette G, Robins A, (Labuschagne W). “Modelling Supra-Classical Logic in a Boltzmann Neural Network: 1 Representation”. Journal of Logic and Computation, Oxford University Press Online, 2020, 1–45.
2. Blanchette G, Robins A, (Labuschagne W). “Modelling Supra-Classical Logic in a Boltzmann Neural Network: 2 Incongruence”. Journal of Logic and Computation, Oxford University Press Online, 2022, 1–45.
3. Blanchette G, Robins A, (Labuschagne W). “Modelling Supra-Classical Logic in a Boltzmann Neural Network: 3 Adaptation”. Journal of Logic and Computation, Oxford University Press Online, 2023, 1–43.
4. Hebb D. “Drives and the CNS”. The Psychological Review, 1955, 62, 243–254.
5. Pearl J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 2nd Edition, 1988.
6. Kumar V. “Robots that Fly and Co-operate”. TED Talks video, 2012. www.ted.com/talks/vijay_kumar_robots_that_fly_and_cooperate (another example of autonomous flying robots that swarm).
7. Frank S, Haselager W, van Rooij I. “Connectionist Semantic Systematicity”. Cognition, 2009, 358–379.
Bibliographic Selection of Introductory Textbooks
Symbolic:
A. The Theory of Computer Science: Languages and Machines. Sudkamp T. Addison-Wesley, 3rd Edition, 2005.
B. Introduction to Algorithms. Cormen T, Leiserson C, Rivest R, Stein C. MIT Press, 2nd Edition, 2005.
General:
C. Artificial Intelligence: A Modern Approach. Russell S, Norvig P. Prentice Hall, 2nd Edition, 2003.
Connectionist:
D. Neural Networks and Learning Machines. Haykin S. Prentice Hall, 3rd Edition, 2009.
E. Introduction to Machine Learning. Alpaydin E. MIT Press, 2nd Edition, 2010.