On robot judges: part 4.

Published on: 3 Aug 2023 · 4 min read

#notlegaladvice #LLM #AI

This article is part of a series. View related content below:

Photo credit: Google DeepMind; https://www.pexels.com/photo/an-artist-s-illustration-of-artificial-intelligence-ai-this-image-depicts-the-process-used-by-text-to-image-diffusion-models-it-was-created-by-linus-zoll-as-part-of-the-visualising-ai-18069158

On #robot judges: part 4.

In part 3,¹ I suggested that a Large Language Model (#LLM) is NOT a program with millions of lines of code listing the answers to every possible question that might ever be asked.

In that case, how do LLMs decide what strings of characters to display in response to human inputs?

Let's dive in.

---

LLMs "generate"² responses to inputs using:
a) data; and
b) training.

First, data.

To build an LLM, we first need a dataset. And since LLMs generate text, we need a dataset of text: books, newspapers, magazines, Wikipedia entries, websites, online forum postings... The list goes on.

Second, training.

The first iteration of an LLM might contain readable code along these lines:

> IF user input, THEN
>     Look for strings of characters in the dataset that match the user input
>     Look for the strings of characters that follow those matches; let's call these strings "possible_answers"
>     List all "possible_answers"
>     Break up each of the "possible_answers" into individual words or phrases
>     Remove repeated words or phrases; let's call the remaining words and phrases "possible_fragments"
>     PRINT a string of characters based on "possible_fragments"
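
To make this concrete, here's a minimal Python sketch of that naive lookup-and-stitch approach. The toy dataset, function name and fragment-shuffling step are all my own illustrative inventions, not actual LLM code:

```python
import random

# A toy "dataset". A real one would span billions of documents.
DATASET = [
    "the court held that the contract was void",
    "the court held that the claim was out of time",
    "the contract was void for uncertainty",
]

def naive_generate(user_input: str) -> str:
    # 1. Look for strings in the dataset that match the user input.
    matches = [s for s in DATASET if user_input in s]

    # 2. Take whatever follows each match: the "possible_answers".
    possible_answers = [s.split(user_input, 1)[1].strip() for s in matches]

    # 3. Break answers into words and drop repeats: the "possible_fragments".
    possible_fragments = []
    for answer in possible_answers:
        for word in answer.split():
            if word not in possible_fragments:
                possible_fragments.append(word)

    # 4. PRINT a string based on the fragments. At this stage: gibberish.
    random.shuffle(possible_fragments)
    return " ".join(possible_fragments)

print(naive_generate("the court held that"))
```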

Now, that initial string of characters is going to be gibberish.

But that's where training comes in. This may take the form of a human operator who:
a) looks at a few output strings and ranks them in order of "correctness";
b) tells the LLM to try generating something closer to the more "correct" strings; and
c) repeats the process, oh, a few million times maybe.
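
Here's a crude Python sketch of that loop. The ToyModel class and the stand-in "human" ranking function are hypothetical simplifications; real training adjusts billions of numerical weights, not a score table:

```python
import random

class ToyModel:
    """A 'model' that samples outputs in proportion to accumulated reward."""
    def __init__(self, outputs):
        self.scores = {o: 1.0 for o in outputs}

    def generate(self):
        outputs, weights = zip(*self.scores.items())
        return random.choices(outputs, weights=weights)[0]

    def reward(self, output, amount):
        self.scores[output] += amount

def human_rank(candidates):
    # Stand-in for a human rater; here, shorter answers count as "more correct".
    return sorted(candidates, key=len)

model = ToyModel(["void", "void for uncertainty", "gibberish gibberish gibberish"])
for _ in range(1000):  # step (c): in reality, a few million rounds
    candidates = [model.generate() for _ in range(3)]
    for position, output in enumerate(human_rank(candidates)):
        # Step (b): higher-ranked outputs earn a bigger reward.
        model.reward(output, amount=len(candidates) - position - 1)

print(max(model.scores, key=model.scores.get))  # the output "humans" preferred
```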

This doesn't necessarily mean that we require millions of man-hours. It's possible to, for example:
a) get human operators to repeat the ranking process hundreds or thousands of times;
b) use that data to train a "reward model", which is another model that "rewards" the base LLM for generating answers that a human operator would have ranked higher; then
c) use the "reward model" to train the base LLM.³
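
A rough Python sketch of step (b), fitting a "reward model" from pairwise human preferences. The word-weight scoring and the pairwise update rule are simplifications chosen for illustration, not the method any particular lab uses:

```python
import math

def score(weights, output):
    # The "reward": sum of learned weights for the words in an output.
    return sum(weights.get(word, 0.0) for word in output.split())

def train_reward_model(comparisons, lr=0.1, epochs=200):
    """Fit word weights so human-preferred outputs score higher.
    `comparisons` is a list of (preferred, rejected) pairs collected
    from step (a): hundreds or thousands of human rankings."""
    weights = {}
    for _ in range(epochs):
        for preferred, rejected in comparisons:
            margin = score(weights, preferred) - score(weights, rejected)
            # Probability the model currently assigns to the human's choice:
            p = 1.0 / (1.0 + math.exp(-margin))
            # Nudge weights so the preferred answer scores higher next time.
            for word in preferred.split():
                weights[word] = weights.get(word, 0.0) + lr * (1.0 - p)
            for word in rejected.split():
                weights[word] = weights.get(word, 0.0) - lr * (1.0 - p)
    return weights

comparisons = [("the contract was void", "banana banana banana")]
rm = train_reward_model(comparisons)
# Step (c): the base LLM can now be trained against `rm` instead of a human.
print(score(rm, "the contract was void") > score(rm, "banana banana"))  # True
```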

Crucially, the LLM doesn't consider the "reasons" why certain outputs are ranked higher. Rather, over time, it prefers certain outputs because it is told to prefer these outputs.⁴

So unlike "traditional" programming, in which the rules by which output is displayed are drafted and defined, the rules which an LLM applies to generate its output are machine-generated and undefined.
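
The contrast, in miniature (both classifiers below are hypothetical; the "learned" weights stand in for the billions of machine-generated parameters inside an LLM):

```python
# "Traditional" programming: a human drafts the rule and can read it back.
def classify_traditional(text: str) -> str:
    if "refund" in text:
        return "complaint"
    return "other"

# LLM-style: the "rule" is whatever numbers training left behind.
LEARNED_WEIGHTS = {"refund": 2.3, "thanks": -1.7}  # machine-generated, not drafted

def classify_learned(text: str) -> str:
    total = sum(LEARNED_WEIGHTS.get(word, 0.0) for word in text.split())
    return "complaint" if total > 0 else "other"

print(classify_traditional("please refund me"), classify_learned("please refund me"))
```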

---

Now some of you are probably wondering:

"Why should we care how an LLM generates answers? Isn't it good enough that LLMs are generating accurate and useful answers in response to inputs? And by extension, why shouldn't we use LLMs to judge disputes, if feeding in a question will get us that accurate answer?"

I suggest that understanding how LLMs generate answers is absolutely fundamental to the question of how we can, or should, use their answers.

In part 5, we'll explore issues with LLM-generated answers.

Disclaimer:

The content of this article is intended for informational and educational purposes only and does not constitute legal advice.

Footnotes:

¹ Part 1: https://www.linkedin.com/posts/khelvin-xu_robot-ai-llm-activity-7100325203108397056-Ghnn
Part 2: https://www.linkedin.com/posts/khelvin-xu_robot-llm-ai-activity-7102135406124548096-KPpB
Part 3: https://www.linkedin.com/posts/khelvin-xu_robot-llm-chatgpt-activity-7111997957616373760-vna5

² Which is why they're often referred to as a form of "generative" AI.

³ Further reading: https://towardsdatascience.com/different-ways-of-training-llms-c57885f388ed.

⁴ This brings to mind the apocryphal story of 5 monkeys in a cage who are sprayed with water every time they reach for a hanging banana. When 1 monkey is replaced, the other 4 monkeys pull back the newcomer every time it reaches for the banana. Over time, as the monkeys are replaced one by one, they continue to pull newcomers back from the banana - without even knowing the reason why. Here's one telling of the story: https://bigthink.com/articles/monkeys-flea-jars-crab-buckets-and-educational-risk-taking.

An LLM is like one of the subsequent monkeys - it has been trained to provide certain outputs, but it doesn't know why it's generating these outputs.
