Why we need a "thought" role in LLMs

In the march towards more useful AI agents, the biggest problem is dealing with hallucinations.

Both the number and severity of hallucinations have been steadily decreasing.

GPT-4o and similar models perform at a genuinely high level.

At Stubber we run long multi-turn interactions, and as the context grows, so does the propensity for hallucinations to occur.

LLMs cannot "think" without outputting tokens. This is why CoT (Chain-of-Thought) and other strategies have been so successful. Getting the model to output a plan and reasoning before answering is really helpful.

But I think current strategies are a bit "hacky".

Currently we have to tell the model to wrap parts of its "assistant" output in tags like <thought> and <reflection>, and then parse those parts out before returning a succinct answer to the user.
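
To make that concrete, here is a rough Python sketch of the parsing step (the tag names and the helper function are just placeholders for whatever your prompt asks the model to emit):

=====================

import re

# Pull <thought>/<reflection> blocks out of an assistant message, returning the
# cleaned reply plus the extracted inner dialog.
def strip_thoughts(assistant_text: str) -> tuple[str, list[str]]:
    pattern = re.compile(r"<(thought|reflection)>(.*?)</\1>", re.DOTALL)
    thoughts = [m.group(2).strip() for m in pattern.finditer(assistant_text)]
    cleaned = pattern.sub("", assistant_text).strip()
    return cleaned, thoughts

reply, thoughts = strip_thoughts(
    "<thought>The person has not confirmed intent yet.</thought>"
    "Welcome to the competition! Would you like to enter?"
)
# reply    -> "Welcome to the competition! Would you like to enter?"
# thoughts -> ["The person has not confirmed intent yet."]

=====================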

If we go back to the API and the concept of roles, we have essentially 4 roles (in the OpenAI API standard).

system - for giving high-level instructions

user - for the user input

assistant - for the LLM output (function calling also uses this role)

function - for returning data to the LLM

If we break these down, you can see we have 3 input roles (system, user, function) and only 1 output role (assistant).
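
For reference, here is roughly what those roles look like in an OpenAI-style messages array (shown in Python, using the older function-calling format; the tool name and values are illustrative):

=====================

# 3 input roles (system, user, function) and 1 output role (assistant).
messages = [
    {"role": "system", "content": "You assist in collecting competition entries."},
    {"role": "user", "content": "I'd like to enter. I'm Ada, ada@example.com."},
    # assistant output, here in its function-calling form
    {"role": "assistant", "content": None,
     "function_call": {"name": "enter_competition",
                       "arguments": '{"name": "Ada", "email": "ada@example.com"}'}},
    # data returned to the LLM from the tool
    {"role": "function", "name": "enter_competition", "content": '{"status": "entered"}'},
]

=====================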

I've been thinking about how valuable an additional output role might be, given that a model provider could train a model on its use.

The output role I'd love to see is called : thought

This idea has some shortcomings, but it might spark other methods along the way.

The goal is to allow the LLM to "think" and give it the freedom to do so.

Let us give the LLM a place to : make notes, have an inner dialog, devise a plan of action, and reflect on the conversation so far.

My feeling is that it would allow for better alignment of the model: we would only need to hold the assistant messages to standards of clarity, usefulness and adherence to norms, while letting the model "think out loud" in the thought role.

How do I imagine this working?

Here is an example :

API Call 1 :

=====================

system : You assist in collecting details from people interested in entering a competition. You should collect all the information required and first confirm the information before entering them in the competition by calling the enter_competition tool. Greet the person and ask if they'd like to enter.

=====================

API Response 1 :

=====================

thought : I should greet the person in a friendly excited manner, first getting them to confirm their intent to enter.

=====================

At this point the developer could inspect the thought process and decide to improve the system prompt, or just tag the thought onto the messages chain and resubmit to the API.

So then the next API call would look like this:

API Call 2 :

=====================

system : You assist in collecting details from people interested in entering a competition. You should collect all the information required and first confirm the information before entering them in the competition by calling the enter_competition tool. Greet the person and ask if they'd like to enter.

thought : I should greet the person in a friendly excited manner, first getting them to confirm their intent to enter.

=====================

API Response 2 :

=====================

assistant : Welcome to the competition! Would you like to enter? I can help.

=====================

As we can see, the thought role remains fully under the developer's control, so if the developer wanted to "inject thoughts" into the mind of the LLM, this now provides a clean way to do it.
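
As a hypothetical sketch (no current API accepts a thought role, so the role name here is exactly the thing being proposed), injecting a developer-authored thought before the next call could look like this in Python:

=====================

# The chain so far, plus a thought written by the developer rather than the model.
messages = [
    {"role": "system", "content": "You assist in collecting details from people "
                                  "interested in entering a competition. ..."},
    {"role": "assistant", "content": "Welcome to the competition! Would you like to enter? I can help."},
    {"role": "user", "content": "Yes please"},
    # Injected thought: steer the next turn without touching the system prompt.
    {"role": "thought", "content": "Collect full name, mobile number and email, "
                                   "asking for one item at a time."},
]
# response = client.chat.completions.create(model="...", messages=messages)  # hypothetical call

=====================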

Let's continue the example by imagining the user said yes, and then make another API call.

API Call 3 :

=====================

system : You assist in collecting details from people interested in entering a competition. You should collect all the information required and first confirm the information before entering them in the competition by calling the enter_competition tool. Greet the person and ask if they'd like to enter.

thought : I should greet the person in a friendly excited manner, first getting them to confirm their intent to enter.

assistant : Welcome to the competition! Would you like to enter? I can help.

user : Yes please

=====================

API Response 3 :

=====================

thought : I should probably collect a full name and some sort of contact details, like mobile number and email.

=====================

Again, we can see that this allows the developer to peek into the LLM's thinking and either improve the prompt (by specifying which details should be collected) or otherwise manipulate the thought role on the next round-trip to the LLM.

So the algorithm for the LLM is as follows :

output an assistant message only after a thought message

output a thought message after any system, user, assistant or function message
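
In code, a single user turn under those two rules would play out roughly like this. It's a hypothetical sketch: call_llm stands in for whichever client you use, and it's assumed to return one message dict per call.

=====================

# Hypothetical two-step turn: elicit a thought, then the user-facing reply.
def respond(messages: list[dict], call_llm) -> list[dict]:
    # The last message is system/user/assistant/function, so by the rules above
    # the next output must be a thought.
    thought = call_llm(messages)
    assert thought["role"] == "thought"
    messages.append(thought)  # the developer can inspect or edit it here

    # With a thought in place, the model may now produce an assistant message
    # for the user (or a function call).
    answer = call_llm(messages)
    assert answer["role"] == "assistant"
    messages.append(answer)
    return messages

=====================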

Shortcomings

This method has some shortcomings, as you can see from the examples.

Latency

It takes 2 API calls to get a response to the user, thus increasing latency.

Each assistant message only occurs after a preceding thought message. I imagine training the model to follow this pattern consistently is probably better than giving it the freedom to sometimes skip the thought step.

Token Usage

Having the LLM output thought tokens before responding each time will increase token usage. Given that the cost of tokens has been dropping significantly, though, I think this is a small price to pay for the increased visibility and steerability this method would offer.

Summary

We all know someone who speaks their mind constantly, and someone who is rather deliberate and thoughtful, engaging in the conversation only after considering a few factors.

What I'm proposing is that we work towards allowing LLMs to become more like the latter.

September 7, 2024
Werner Stucky
