Small Intent Models
26 Dec 2024

If we were to model humans as a (grossly oversimplified) system, then their inputs and outputs are:
- What they hear, say sound(t), where the notation x(t) denotes a function of time t,
- What they see, say sight(t), and
- What they speak, say speech(t)
I will collectively refer to these as stimuli(t).
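As a rough sketch (in Python, with entirely made-up field names), stimuli(t) is just a time-indexed bundle of these signals:

```python
from dataclasses import dataclass

@dataclass
class Stimuli:
    """One sample of stimuli(t): the signals observable at time t.

    The field names here are illustrative placeholders, not a real schema.
    """
    t: float      # timestamp
    sound: bytes  # what the person hears, e.g. an audio frame
    sight: bytes  # what the person sees, e.g. a video frame
    speech: str   # what the person says, e.g. a transcript chunk
```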
The hidden/internal variables (again grossly oversimplified) that cannot be easily captured or recorded are:
- What they feel, say feel(t)
- What they intend, say intent(t)
- and so on
These variables also depend on stimuli(t).
In a simple scenario, humans convey their intent to LLMs using a text prompt, say prompt(t), which is a function of intent(t) and stimuli(t).
Today, we cannot easily capture intent(t). But we can record stimuli(t) and obtain an estimate of intent(t), from which we can in turn estimate prompt(t). Essentially, the inputs to the new system are stimuli(t) and the output is prompt(t), which internally captures intent(t).
The process of generating intent(t) from stimuli(t) involves the human brain. The process of generating prompt(t) from intent(t) again involves the human brain. Using this prompt(t), the LLM generates an answer(t) which is then read and interpreted by the human brain.
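Here is a minimal sketch of that pipeline. Every function below is a hypothetical stand-in (the first two for the human brain, the third for an LLM call), not a real API:

```python
def estimate_intent(stimuli: str) -> str:
    # Stand-in for the human brain mapping stimuli(t) to intent(t).
    return f"figure out: {stimuli}"

def generate_prompt(intent: str, stimuli: str) -> str:
    # Stand-in for the human brain mapping intent(t) to prompt(t).
    return f"Given that I just experienced '{stimuli}', help me {intent}"

def llm(prompt: str) -> str:
    # Stand-in for a real LLM call.
    return f"answer({prompt})"

def one_turn(stimuli: str) -> str:
    intent = estimate_intent(stimuli)          # brain: stimuli(t) -> intent(t)
    prompt = generate_prompt(intent, stimuli)  # brain: intent(t) -> prompt(t)
    return llm(prompt)                         # LLM:   prompt(t) -> answer(t)
```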
Oftentimes, the answer(t) provided by the LLM for prompt(t) is unsatisfactory, because it does not serve the intent(t). The user then generates a prompt_1(t) to obtain answer_1(t), and this process continues, say, n times, until the user obtains a satisfactory answer for intent(t). While prompt_n(t) is not significantly different from prompt(t), the user's satisfaction is different.
Users would like to arrive at prompt_n(t) and answer_n(t) as soon as possible. Ideally, n should be one.
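The loop might look like this sketch, where llm and satisfies_intent are toy stand-ins for a real model and the user's private judgment:

```python
def refine_until_satisfied(stimuli: str, max_turns: int = 5):
    """The prompt(t) -> answer(t) -> prompt_1(t) -> ... -> prompt_n(t) loop."""
    def llm(prompt: str) -> str:                # stand-in for a real LLM call
        return f"answer({prompt})"

    def satisfies_intent(answer: str) -> bool:  # the user's private judgment
        return "step by step" in answer         # toy stand-in criterion

    prompt = f"question about: {stimuli}"       # prompt(t), the first attempt
    answer = llm(prompt)
    for n in range(1, max_turns + 1):
        if satisfies_intent(answer):
            return prompt, answer, n            # prompt_n(t), answer_n(t)
        prompt += ", explained step by step"    # a small rewording
        answer = llm(prompt)
    return prompt, answer, max_turns
```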
If there were an “intent model” which takes stimuli(t) as input and directly generates prompt_n(t), allowing LLMs to generate answer_n(t) that satisfies intent(t), it would (see the sketch after the list below):
- Help users save time and effort
- Provide users the best answer for their intent(t)
- Reduce the cost of inference for obtaining the best answer, since users do not iterate on (prompt, answer) pairs
- Allow LLMs to iterate without RLHF, as the intent model can serve as feedback
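With such a model, the whole loop above collapses to a single call. A minimal sketch, with intent_model and llm as hypothetical stand-ins:

```python
def one_shot(stimuli: str) -> str:
    """With an intent model, the refinement loop collapses to one turn."""
    def intent_model(stimuli: str) -> str:    # the hypothetical small model
        return f"best prompt for: {stimuli}"  # emits prompt_n(t) directly

    def llm(prompt: str) -> str:              # stand-in for a real LLM call
        return f"answer({prompt})"

    return llm(intent_model(stimuli))         # answer_n(t), with n = 1
```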
While this appears similar to prompt engineering, there is a critical distinction: this model attempts to understand the user's intent, given their stimuli, and generate the best prompt for those stimuli and that intent. All the improvements in prompt engineering would be necessary, but not sufficient.
The way users would interact with this intent model is by communicating whether the prompt(t) generated by the model best represents their intent. This can be done easily by showing the user (say) three candidate prompts they might send to the LLM, given the stimuli they received in the past (say) 10 minutes. The user clicking on a particular prompt signals that it best represents their intent.
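A minimal sketch of that interaction, assuming a hypothetical candidate_prompts helper standing in for the intent model's top choices:

```python
def collect_feedback(stimuli: str, k: int = 3) -> str:
    """Show k candidate prompts for recent stimuli and record the click."""
    def candidate_prompts(stimuli: str, k: int) -> list[str]:
        # Hypothetical: the intent model's top-k prompts for the
        # stimuli received over the past ~10 minutes.
        return [f"candidate {i}: ask about {stimuli}" for i in range(k)]

    candidates = candidate_prompts(stimuli, k)
    for i, c in enumerate(candidates):
        print(f"[{i}] {c}")
    choice = int(input("Pick the prompt that matches your intent: "))
    # The (stimuli, candidates, choice) triple is exactly the feedback
    # signal the intent model can be trained on.
    return candidates[choice]
```

Each click yields a preference label, which is the feedback the intent model learns from.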
Depending on how frequently a user may choose to interact with this model, the model should be able to digest the stimuli(t) and quickly provide prompt_n(t), helping the user query for almost every intent(t). This requires the “intent model” to be small. Moreover, smartphones can capture the stimuli(t) and would have enough compute to run the “small intent model” to directly generate prompt_n(t), which can be sent to an LLM to obtain answer_n(t).