GPT3 Linguistics 101: Part 2, On Semantics, Meaning and Corpus

Part 2: On Semantics

We start with some definitions and the act of defining.

what is the what?
What is lexical?    
Lexical is a term used to describe the way in which words are stored in memory. The word "lexical" is derived from the Latin word "lexis" which means "word".
What is meaning? What is the meaning of life? What is the meaning of life? What is the meaning of life? What is the meaning of life? What is the meaning of life? What is the meaning of life?
Definition of meaning  
The meaning of a word is the set of all the things that it can be used to refer to. For example, the meaning of the word "dog" is the set of all dogs. The meaning of a word is often described in terms of its reference class
Definition of meaning (noun):  
The meaning of a word is the set of all the things that it can be used to refer to. The meaning of a word is the set of all the things that it can be used to refer to.
Definition of meaning (in the context of the present invention) is the ability to communicate with the user.  The present invention is directed to a method of communicating with a user, comprising the steps of:  (a) providing a plurality of symbols, each symbol having a meaning;  (b) providing a plurality of symbols
Definition of lexical (adjective):  
1. Of or pertaining to words; 2. Of or pertaining to the study of words; 3. Of or pertaining to the study of language; 4. Of or pertaining to the study of language in general; 5. Of or pertaining to the study of language in general
Definition of lexical (lexical)  
(n.) The part of speech which is the subject of a word.
(n.) The part of speech which is the subject of a word.
(n.) The part of speech which is the subject of a word.
lex·i·cal - /ˈleksək(ə)l/  
adjective 1. of, relating to, or being a system of rules or principles that govern the behavior of a group or society.

The Linguistic Stack

a rough schematic of origins of linguistic structure
  • Raw Data: the language corpus. All the words, sentences, and data, with all the expected noise and redundancy.
  • Trained Model: the BPE tokenizer and the various layers of the engines/models (the neural networks, etc.). This is where the probabilistic maps live.
  • Inference Context at run time: the input to the model, i.e. the API request and whatever data/words we send in. If we give no examples of the output we expect, this is “Zero Shot” classification.
  • The Target (Examples of Output Expectation): One and Few Shot examples that help the model condition its response.
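
To make the stack a little more concrete, here is a toy sketch in Python. It is not how GPT3 works internally: the corpus is a made-up sentence, the "trained model" is just a table of bigram counts, and the names `train_bigram_model` and `complete` are hypothetical helpers invented for this illustration. It only shows the flow from raw data, to a probabilistic map, to an inference context that decides where in the map we start.

```python
from collections import Counter, defaultdict

# Raw Data: a tiny stand-in corpus (made up for this sketch).
corpus = "the dog chased the cat . the cat chased the mouse . the dog barked ."


def train_bigram_model(text):
    """'Trained Model': a map from each word to counts of the words that follow it."""
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def complete(model, context, max_words=5):
    """Inference Context: the last word of the input picks where we start in the map."""
    words = context.split()
    for _ in range(max_words):
        followers = model.get(words[-1])
        if not followers:
            break  # the map has nothing for this word, so stop
        # Greedy choice: most frequent follower, ties broken alphabetically.
        words.append(max(sorted(followers), key=followers.get))
    return " ".join(words)


model = train_bigram_model(corpus)
print(complete(model, "the dog"))
```

A different context ("the dog" versus "mouse") lands in a different region of the same map, which is the intuition the rest of this part leans on.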

Corpus

This is one of the rare times where digging deep into the details will materially improve our ability to understand emergent higher-level behavior in GPT3. Knowing the corpus really well is extremely valuable for interacting with the model efficiently through language.

Table 2.2, page 9, “Language Models are Few-Shot Learners”, https://arxiv.org/pdf/2005.14165.pdf

Trained Models

There’s not much to add here beyond what was covered in Part 1 of this essay series and the links provided there to the GPT3 paper and to material on transformer networks.

Inference Context

This is roughly the input, or more precisely the “first part of the input”. Taken as a whole, the input is a way to prime the model to “know where to look in the model”. It helps to think of these models as giant maps.

GPT3 as a Map.
Meaning From Wikipedia, the free encyclopedia Meaning most commonly refers to:  Meaning (linguistics) Meaning (linguistics) is a term used in linguistics to refer to the meaning of a word or phrase.  Meaning (psychology) Meaning is a psychological concept that refers to the subjective interpretation of an event or object.
Meaning 
by Emily Dickinson

"I am the poet's friend,
I am the poet's friend,
I am the poet's friend,
I am the poet's friend."
"I am the poet's friend,
I am the poet's friend,
I am the poet's friend,
I am the poet's
Posted byu/Protobairus 
3 months ago
meaning
less, but I'm not sure if it's a good idea to use it as a title. I'm not sure if it's a good idea to use it as a title.
|word|part of speech|frequency|letter count| 
|meaning|noun|common|7|
|lexical|adjective|uncommon|7|
|word|part of speech|frequency|letter count|
|meaning|noun|common|7|
|lexical|adjective|uncommon|7|
|word|part of speech|frequency|letter count|
|meaning|noun|common|7|

The Target — One and Few Shot Examples

In machine learning we often describe a model’s learning capability as zero shot, one shot, or few shot. The terms indicate how much knowledge the model carries without prompting and how much context it needs to do the things we hope for. The ideal is often stated as zero shot: we would like our models to need the least possible just-in-time context to be useful. That is very hard to achieve, however, particularly for machine learning models that are very general.
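
As a rough sketch of the difference, the Python below assembles zero shot and one shot prompts as plain strings. The helper name `build_prompt` is hypothetical, and using `"""` as the separator is an assumption borrowed from the records shown further down, not a requirement of any API. The Beryllium record is copied from that same example.

```python
def build_prompt(examples, query, separator='"""'):
    """Join worked examples and the open-ended query, one separator line between blocks."""
    blocks = list(examples) + [query]
    return f"\n{separator}\n".join(blocks)


# One complete record, used as the single "shot".
beryllium = (
    "Beryllium\n"
    "CAS ID #: 7440-41-7\n"
    "Affected Organ Systems: Gastrointestinal (Digestive), "
    "Immunological (Immune System), Respiratory (From the Nose to the Lungs)\n"
    "Cancer Classification: EPA: Probable human carcinogen."
)

# The query is an unfinished record that the model is left to complete.
query = "Disulfoton\nCAS ID #:"

zero_shot = build_prompt([], query)          # no examples, only the query
one_shot = build_prompt([beryllium], query)  # one example, then the query

print(one_shot)
```

Few shot is the same call with more records in the list. The only thing that changes between the three regimes is how much of the expected output shape we show the model before asking.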

one shot example, with two “inference” components
math has enough structure to power robust completions at temp=0
now we’re getting a lot of sequences!
Beryllium 
CAS ID #: 7440-41-7
Affected Organ Systems: Gastrointestinal (Digestive), Immunological (Immune System), Respiratory (From the Nose to the Lungs)
Cancer Classification: EPA: Probable human carcinogen.
"""
Disulfoton
CAS ID #: 298-04-4
Affected Organ Systems: Developmental (effects during periods when organs are developing) , Neurological (Nervous System)
Cancer Classification: EPA: Evidence of noncarcinogenicity for humans.
"""
Gasoline, Automotive
CAS ID #: 8006-61-9
Affected Organ Systems: Dermal (Skin), Gastrointestinal (Stomach and Intestines, part of the digestive system), Neurological (Nervous System), Respiratory (From the Nose to the Lungs)
Cancer Classification: EPA: Not evaluated.
"""
Vinyl Chloride
CAS ID #: 75-01-4
Affected Organ Systems: Cardiovascular (Heart and Blood Vessels), Developmental (effects during periods when organs are developing) , Hepatic (Liver), Immunological (Immune System)
Cancer Classification: EPA: Known human carcinogen.
"""
Radium-226
CAS ID #: 7440-02-7
Affected Organ Systems: Gastrointestinal (Digestive), Respiratory (From the Nose to the Lungs)
Cancer Classification: EPA: Not evaluated.
"""
Tetrachloroethylene
CAS ID #: 76-01-6
Affected Organ Systems: Cardiovascular (Heart and Blood Vessels), Developmental (effects during periods when organs are developing) , Hepatic (Liver), Neurological (Nervous System), Respir
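
The temp=0 note above refers to sampling temperature. Here is a minimal sketch of what that knob usually means, assuming the common convention that next-token scores (logits) are divided by the temperature before a softmax, and that a temperature of 0 is treated as "always take the top-scoring token". The function name `sample_with_temperature` and the example logits are made up for illustration; how any particular API handles the temperature=0 case internally is not something this sketch claims to know.

```python
import math
import random


def sample_with_temperature(logits, temperature, rng=random):
    """Return the index of the chosen token given raw scores and a temperature."""
    if temperature <= 0:
        # Assumed convention: temperature 0 means greedy (argmax) decoding.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [score / temperature for score in logits]
    top = max(scaled)
    # Subtract the max before exponentiating for numerical stability.
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [1.0, 3.0, 2.0]  # hypothetical scores for three candidate tokens

print(sample_with_temperature(logits, 0))    # greedy: always the same index
print(sample_with_temperature(logits, 1.0))  # stochastic: varies between runs
```

At temperature 0 the same prompt gives the same continuation every time, which is why highly structured inputs like math or fixed-format records complete so consistently there.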

Semantics and Meaning, A Deeper Perspective

The semantic meaning of language is a big, hairy topic that will never settle down. The challenge of computational linguistics is that we sometimes trick ourselves into thinking that humans, mathematics, and our computer programs already make sense, make PERFECT sense, and can always make perfect sense.

Conclusion-ish

To close out Part 2, it’s worth reviewing some more advanced examples of input-output relations in GPT3 that the community has found:
