Enhancing AI Agents: A Proposal for a "Thought" Role to Manage Hallucinations

In the march towards more useful AI Agents, the biggest problem is dealing with hallucinations.
Both the frequency and the severity of hallucinations have been steadily decreasing.

GPT-4o and similar models already operate at a remarkably high level.

At Stubber we run long multi-turn interactions, and as the context window grows, so does the propensity for hallucinations to occur.


The Token-Based Thinking Problem

LLMs cannot "think" without outputting tokens. This is why CoT (Chain-of-Thought) and other strategies have been so successful. Getting the model to output a plan and reasoning before answering is really helpful.

But I think current strategies are a bit "hacky".

Currently we have to tell the model to wrap parts of the "assistant" output in tags like <thought> and <reflection>, and then parse those sections out before returning a succinct answer to the user, as in the sketch below.
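
For example, today's workaround looks something like the following. The tag names and the helper function are illustrative, not part of any API; this is just the kind of post-processing the tag approach forces on us.

=====================
Python sketch (illustrative):
=====================
import re

# A raw assistant turn that mixes hidden reasoning with the actual reply.
raw = (
    "<thought>The user wants to enter, so I should ask for their full name.</thought>"
    "<reflection>Keep the tone friendly and excited.</reflection>"
    "Great! What's your full name?"
)

def strip_hidden_sections(text, tags=("thought", "reflection")):
    # Remove each <tag>...</tag> block so only the user-facing answer remains.
    for tag in tags:
        text = re.sub(rf"<{tag}>.*?</{tag}>", "", text, flags=re.DOTALL)
    return text.strip()

print(strip_hidden_sections(raw))  # -> Great! What's your full name?
=====================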


Current Role Limitations

If we go back to the API and the concept of roles, we have essentially 4 roles (in the OpenAI API standard):

Role      | Purpose                                | Type
----------|----------------------------------------|-------
system    | High level instructions                | Input
user      | User input                             | Input
assistant | LLM output (including function calls)  | Output
function  | Returning data to the LLM              | Input
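
For concreteness, here is how those four roles show up in a single request body. The function name and payloads are made up, and the shape follows the legacy function-calling format of the OpenAI Chat Completions API.

=====================
Python sketch (illustrative):
=====================
messages = [
    # Input: high level instructions
    {"role": "system", "content": "You assist people entering a competition."},
    # Input: what the user typed
    {"role": "user", "content": "Has the entry deadline passed?"},
    # Output: the model chose to call a function instead of answering directly
    {"role": "assistant", "content": None,
     "function_call": {"name": "get_deadline", "arguments": "{}"}},
    # Input: the function's result, returned to the LLM for its next turn
    {"role": "function", "name": "get_deadline",
     "content": '{"deadline": "2024-08-31"}'},
]
=====================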

Proposal: A Native "Thought" Role

I've been thinking about how valuable an additional output role might be, given that a model provider could train a model on its use. The output role I'd love to see is called: thought.

How It Would Work

Example Workflow

=====================
API Call 1:
=====================
system: You assist in collecting details from people interested in entering a competition...
=====================

=====================
API Response 1:
=====================
thought: I should greet the person in a friendly excited manner...
=====================

=====================
API Call 2 (with thought):
=====================
system: [Original prompt]
thought: [Previous thought]
=====================

=====================
API Response 2:
=====================
assistant: Welcome to the competition! Would you like to enter? I can help.
=====================

Continuing the conversation when the user says "Yes please":

=====================
API Call 3:
=====================
system: [Original prompt]
thought: [Previous thought] 
assistant: [Previous response]
user: Yes please
=====================

=====================
API Response 3:
=====================
thought: I should probably collect a full name and some sort of contact details...
=====================
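
Put together, a client loop against such an API might look like the sketch below. It assumes the proposed thought role exists and that the provider returns a thought when the last message is not one, and a user-facing reply once a thought has been appended. None of this exists in today's OpenAI API, so the code is illustrative rather than runnable.

=====================
Python sketch (hypothetical thought role):
=====================
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # illustrative model name

messages = [{"role": "system",
             "content": "You assist in collecting details from people entering a competition."}]

def next_turn(user_text=None):
    # One user-visible turn: first fetch a thought, then the assistant reply.
    if user_text is not None:
        messages.append({"role": "user", "content": user_text})

    # Call 1: the model emits only its private reasoning.
    # "thought" is the proposed role; today's API would reject this message.
    thought = client.chat.completions.create(model=MODEL, messages=messages)
    messages.append({"role": "thought",
                     "content": thought.choices[0].message.content})

    # Call 2: with the thought in context, the model answers the user.
    reply = client.chat.completions.create(model=MODEL, messages=messages)
    messages.append({"role": "assistant",
                     "content": reply.choices[0].message.content})
    return messages[-1]["content"]

print(next_turn())              # greeting turn
print(next_turn("Yes please"))  # asks for name and contact details
=====================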

Shortcomings

  1. Latency
    It takes two API calls (one for the thought, one for the user-facing reply) to get a response to the user, which increases latency.

  2. Token Usage
    Having the LLM output thought tokens before responding each time will increase token usage. Given that the cost of tokens has been dropping significantly, I think this is a small price to pay.


Summary

We all know a person who speaks their mind constantly and a person who is rather deliberate and thoughtful, engaging in the conversation only after considering a few factors. What I'm proposing is that we work towards allowing LLMs to become more like the latter.


Author: Werner Stucky
