I read an interesting paper recently that highlights the utility of a literature review coupled with the application of mathematics to generalise and simplify a large body of work.
The paper is entitled "LLM-based Agentic Reasoning Frameworks: A Survey from Methods to Scenarios" by Zhao et al., and can be found here.
For example, when considering the types of approaches, the survey paper (essentially a literature review) describes (most?) agentic workflows as a configuration of agents following one of three design variations: Single-Agent, Tool-based, or Multi-Agent. Presumably, hybrid configurations are plausible too.
Single-agent designs work single-handedly, incorporating technologies and strategies to optimise their own cognition, reasoning and general decision-making. Tool-based agents strategically select and utilise external tooling, while multi-agent designs base their operation on sharing and communication.
All types of agentic AI workflow model reasoning as the basis of their operation, while some use more specific operations depending on how the workflow is composed. For example, some use task decomposition, tool selection, strategic selection of actions (decision-making), or intermediate outputs that are then refined to produce the final output; others use memory, reflection, planning, and so on.
If we ignore the specific technologies used within the reasoning process, agentic AI applications can be concisely represented in a hierarchical, tree-like structure:
- Single Agents (only care about producing optimal reasoning results themselves)
    - Approaches:
        - Prompt engineering and in-context learning - few-shot examples and Chain of Thought (CoT)
        - Self-improvement - reflection/experience
        - Interactive optimisation - refining outputs
        - Interactive learning - environmental feedback
- Tool-based agents (use external tools)
    - Approaches:
        - Tool selection - autonomous, rule-based, or learned selection
        - Tool integration - API, plugin, middleware/RAG
        - Tool utilisation - sequential, parallel, iterative reuse
- Multi-Agents (share/communicate results, goals, etc.)
    - Organisation architecture:
        - Centralised - a coordinating agent with subordinate task-worker agents
        - Decentralised - peer-to-peer communication
        - Hierarchical - task decomposition to subordinate workers with task aggregation at higher levels
    - Interaction dynamics:
        - Competition - optimise individual agents' goals
        - Co-operation - optimise a shared goal
        - Negotiation - compromise between individual and shared goals
This is the bird's-eye view afforded by a literature review, which in itself is very helpful: you need not formulate it yourself by going through the same papers the authors did, effectively reusing their labour and results (thanks!).
This, however, can be further simplified.
For example, an agentic workflow can be seen as a multi-step process over \(k\) steps that does not stop until the terminating condition \(Q\) is met, where each step generally consists of two operations: 1) producing an output via an action \(a_k\), and 2) updating the context based on that output via \(a^{\prime}_k\).
It should be noted that the context \(C_k\) is maintained over time and augmented with the output of the action \(a_k\) at each reasoning step. The context initially consists of the input from the user. For example, if the basis of reasoning within the agentic AI workflow is a Large Language Model (LLM), this context would be the initial prompt. The initial context is represented as \(C_0\), and \(P_U\) represents the user prompt for the LLM:
\(C_0 \leftarrow Init(P_U)\)
Once the initial context is established, we can start the multi-step process:
while \(\neg Q(C_k, k)\) do:
\( \quad y_{k+1} = a_k(C_k, g, T) \) (1. generate output \(y_{k+1}\) at step \(k\))
\( \quad C_{k+1} = a^{\prime}_k(C_k, y_{k+1}, g, T) \) (2. generate context \(C_{k+1}\) at step \(k\))
\( \quad k \leftarrow k+1 \) (move to the next step)
end
Here \(y_{k+1}\) represents the output and \(a_k\) is the action at step \(k\) that produces it, given the current context \(C_k\), the goal \(g\) and the available tools \(T\).
The really useful thing about this is that it can be extended to express many agentic AI workflows (see below).
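To make this concrete, here is a minimal Python sketch of the generic loop. The names `act`, `update_context` and `terminated` are hypothetical stand-ins for \(a_k\), \(a^{\prime}_k\) and \(Q\) (in practice they would wrap LLM calls); `max_steps` is a practical safety bound, not part of the formalism.

```python
from typing import Callable

def agentic_loop(
    context: dict,
    goal: str,
    tools: list,
    act: Callable,             # a_k: produces the output y_{k+1}
    update_context: Callable,  # a'_k: folds the output back into the context
    terminated: Callable,      # Q: the terminating condition
    max_steps: int = 10,       # safety bound, not part of the formalism
) -> dict:
    """Run the generic loop: while not Q(C_k, k), act then update the context."""
    k = 0
    while not terminated(context, k) and k < max_steps:
        y = act(context, goal, tools)                      # 1. y_{k+1} = a_k(C_k, g, T)
        context = update_context(context, y, goal, tools)  # 2. C_{k+1} = a'_k(C_k, y_{k+1}, g, T)
        k += 1                                             # move to the next step
    return context
```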
Prompt engineering (single agent)
\[ C_0 \leftarrow Init(P_U, P^*) \tag{augment the initial input using prompt engineering (P*)} \]
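As a minimal sketch, assuming a hypothetical Chain-of-Thought-style template `P_STAR` in the role of \(P^*\):

```python
P_STAR = "Think step by step.\n{user_prompt}"  # hypothetical CoT-style template (P*)

def init_context(user_prompt: str) -> dict:
    """C_0 <- Init(P_U, P*): augment the user's prompt before the loop starts."""
    return {"prompt": P_STAR.format(user_prompt=user_prompt), "history": []}
```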
Iterative optimisation (single agent)
\[ C_0 \leftarrow Init(P_U, S), \qquad Q \equiv (y_k \models S) \]
Initialise the context with an optimisation target \(S\) (AKA the Standard).
This is used in iterative optimisation, where the loop continually checks the result \(y\) to see if it matches the Standard \(S\), terminating if it does and otherwise refining the output towards \(S\).
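A hedged Python sketch of this loop, where `meets_standard` and `refine` are hypothetical stand-ins for the check \(y \models S\) and the refinement action:

```python
from typing import Callable

def iterative_optimisation(
    draft: str,
    meets_standard: Callable[[str], bool],  # plays the role of y |= S
    refine: Callable[[str], str],           # one refinement step, e.g. an LLM call
    max_steps: int = 5,                     # safety bound
) -> str:
    """Refine the output y until it satisfies the standard S (Q = y |= S)."""
    y, k = draft, 0
    while not meets_standard(y) and k < max_steps:
        y = refine(y)  # keep refining the output towards S
        k += 1
    return y
```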
Autonomous tool-based selection
\[ y_{k+1} \leftarrow t_{k+1} = a_k(C_k, g, T), \qquad T = \{t_0, t_1, \dots, t_n\} \]
Here the selection of the tool \(t_{k+1}\) from the available tools in the set \(T\) is captured, irrespective of how that selection was made. For example, it could have been made by an LLM selecting the tool via prompt engineering.
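As a sketch under those assumptions, where `choose` is a hypothetical stand-in for the autonomous selector (e.g. an LLM asked to name a tool) and `tools` plays the role of \(T\):

```python
from typing import Callable

def autonomous_tool_step(
    context: dict,
    goal: str,
    tools: dict[str, Callable[[dict], str]],        # the set T, keyed by name
    choose: Callable[[dict, str, list[str]], str],  # e.g. an LLM asked to name a tool
) -> str:
    """t_{k+1} = a_k(C_k, g, T); then the chosen tool produces y_{k+1}."""
    name = choose(context, goal, list(tools))  # autonomous selection from T
    return tools[name](context)                # y_{k+1} from applying t_{k+1}
```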
Rule-based tool selection
\[ t_{k+1} = a_k(C_k, g, T, R) \]
Here, the rules, \(R\), condition which tool is selected.
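A minimal sketch, representing \(R\) as an ordered list of (predicate, tool name) pairs where the first match wins; this encoding is my assumption, not the paper's:

```python
from typing import Callable

# R as an ordered list of (predicate, tool name) pairs; first match wins.
Rule = tuple[Callable[[dict, str], bool], str]

def rule_based_select(context: dict, goal: str, rules: list[Rule],
                      default: str = "search") -> str:
    """t_{k+1} = a_k(C_k, g, T, R): the rules R condition the tool choice."""
    for condition, tool_name in rules:
        if condition(context, goal):
            return tool_name
    return default  # fall back when no rule fires
```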
Context update via Reflection
\[ C_{k+1} = a_{reflect}(C_k, y_{k+1}, g, t) \]
This shows that a reflective action/process is used to generate the evolving context \(C_{k+1}\), considering the output \(y_{k+1}\) generated and the current goal and tool.
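As a sketch, with a hypothetical `critique` function standing in for the reflective part of \(a_{reflect}\):

```python
from typing import Callable

def reflective_update(
    context: dict,
    output: str,
    goal: str,
    tool: str,
    critique: Callable[[str, str], str],  # e.g. an LLM judging the output against the goal
) -> dict:
    """C_{k+1} = a_reflect(C_k, y_{k+1}, g, t): store the output together with a
    self-critique of it, so later steps can learn from the reflection."""
    note = critique(output, goal)
    return {**context, "history": context.get("history", []) + [(tool, output, note)]}
```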
Interactive learning through goal update/adaptation:
\[ g_{k+1} \leftarrow a_k(\{(C_i, y_i)\}^{k}_{i=1}, g_k, T)\]
This shows that the goal is updated based on the history of all contexts and outputs, given the current goal \(g_k\) and the tools \(T\).
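A thin Python sketch, where `revise` is a hypothetical stand-in for the goal-updating action \(a_k\):

```python
from typing import Callable

def adapt_goal(
    history: list[tuple[dict, str]],                # {(C_i, y_i)} for i = 1..k
    goal: str,                                      # g_k
    tools: list[str],                               # T
    revise: Callable[[list, str, list[str]], str],  # e.g. an LLM proposing a new goal
) -> str:
    """g_{k+1} <- a_k({(C_i, y_i)}, g_k, T): adapt the goal from the full history."""
    return revise(history, goal, tools)
```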
Multi-agent co-operative interaction:
\[ g^{ri} \leftarrow a^{ri}_{reflect}(C^{ri}_{k}, g^{ri}, G, t^{ri}) \]
This shows that the goal of the \(i^{th}\) role-based agent, \(g^{ri}\), is produced by a process of reflection over the agent's current context \(C^{ri}_{k}\) at step \(k\), its own goal \(g^{ri}\), a goal \(G\) shared among all agents, and its tool \(t^{ri}\). (A combined sketch of all three interaction dynamics follows the competition case below.)
Multi-agent negotiation:
\[ g^{ri} \leftarrow a^{ri}_{reflect}(C^{ri}_{k}, g^{ri}, Y_k, G, t^{ri}) \]
This shows that the agents' goals are updated based on all agents' outputs \(Y_k\), in addition to the shared goal \(G\).
Multi-agent competition:
\[ g^{ri} \leftarrow a^{ri}_{reflect}(C^{ri}_{k}, g^{ri}, Y_k, t^{ri}), \qquad Y_k = \{y^{r1}_k, y^{r2}_k, \dots, y^{rn}_k\} \]
This shows that each agent updates its goal based on all agents' outputs \(Y_k\), and, reasonably, there is no shared goal \(G\) with the other agents.
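The three interaction dynamics differ only in what each agent's reflection is allowed to see, so a single hedged Python sketch can cover all of them. Here `reflect` is a hypothetical stand-in for \(a^{ri}_{reflect}\) (in practice, an LLM call per agent):

```python
from typing import Callable, Optional

def update_agent_goal(
    agent_context: dict,                  # C^{ri}_k
    own_goal: str,                        # g^{ri}
    tool: str,                            # t^{ri}
    reflect: Callable[..., str],          # hypothetical stand-in for a^{ri}_reflect
    peer_outputs: Optional[list] = None,  # Y_k (negotiation, competition)
    shared_goal: Optional[str] = None,    # G   (co-operation, negotiation)
) -> str:
    """One function covering all three dynamics:
    co-operation:  reflect(C, g, G, t)
    negotiation:   reflect(C, g, Y_k, G, t)
    competition:   reflect(C, g, Y_k, t)   # no shared goal G
    """
    args: list = [agent_context, own_goal]
    if peer_outputs is not None:
        args.append(peer_outputs)
    if shared_goal is not None:
        args.append(shared_goal)
    args.append(tool)
    return reflect(*args)
```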
Role-based agent initialisation
\[ C_0 \leftarrow Init(P_U) \;\rightarrow\; C^{ri}_0 \leftarrow Init(P_U, r_i) \tag{role-based agent initialisation} \]
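As a final sketch, assuming plain strings for the roles \(r_i\) and a simple role-prefixed prompt (my choice of encoding, not the paper's):

```python
def init_role_contexts(user_prompt: str, roles: list[str]) -> dict[str, dict]:
    """C^{ri}_0 <- Init(P_U, r_i): each role-based agent starts from the shared
    user prompt combined with its own role description r_i."""
    return {
        role: {"prompt": f"You are the {role}.\n{user_prompt}", "history": []}
        for role in roles
    }

# e.g. init_role_contexts("Summarise this paper.", ["planner", "critic", "writer"])
```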