Published on 2026-06-10

How LLMs Actually Work, Part 3: Agents and the Future of AI

The final part of a practical deep dive into LLMs - reasoning, mixture of experts, multimodal models, and how LLMs become AI agents that plan, use tools, and execute.

Tags: Artificial Intelligence, Large Language Models, Generative AI, Machine Learning, Software Engineering, AI Agents

Introduction

This is the final post in a three-part series on how large language models actually work.

  • Part 1 covered tokens, embeddings, transformers, attention, and training
  • Part 2 covered inference, context windows, hallucinations, RAG, and fine-tuning

In this post, we look at where LLMs are heading: the reasoning debate, new architectures, and the shift from chatbots to AI agents.


Do LLMs Actually Reason?

The Intelligence Debate

This remains one of the biggest questions in AI.

Some researchers argue that LLMs genuinely reason.

Others believe they are sophisticated pattern-matching systems.

The truth is likely somewhere in between.

Chain Of Thought Reasoning

Modern models often solve problems by writing out intermediate reasoning steps before giving a final answer.

This tends to improve results on tasks involving:

  • Mathematics
  • Coding
  • Planning
  • Logic
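As a rough illustration of the idea, the sketch below builds a prompt that asks for intermediate steps and then pulls the final answer out of the reply. The function names, the prompt wording, and the `Answer:` convention are assumptions made for this example, and the completion is hand-written rather than produced by a model.

```python
import re
from typing import Optional


def build_cot_prompt(question: str) -> str:
    """Wrap a question so the model is asked to show intermediate steps."""
    return (
        "Solve the problem below. Think step by step, writing each step on "
        "its own line, then finish with a line of the form 'Answer: <value>'.\n\n"
        f"Problem: {question}"
    )


def extract_answer(completion: str) -> Optional[str]:
    """Return the text after the last 'Answer:' line, or None if there is none."""
    matches = re.findall(r"^Answer:\s*(.+)$", completion, flags=re.MULTILINE)
    return matches[-1].strip() if matches else None


# Hand-written stand-in for a model reply; no real model is called here.
sample_completion = (
    "Step 1: 3 boxes with 4 apples each is 3 * 4 = 12 apples.\n"
    "Step 2: Eating 5 leaves 12 - 5 = 7 apples.\n"
    "Answer: 7"
)

prompt = build_cot_prompt("I have 3 boxes of 4 apples and eat 5. How many are left?")
print(prompt)
print(extract_answer(sample_completion))  # prints 7
```

Keeping the final answer on a clearly marked line makes the reply easy to parse and score, whatever the steps above it look like.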

Even today, researchers are still discovering surprising capabilities that emerge at scale.


The Future Of LLM Architectures

Mixture Of Experts

Some modern models use Mixture of Experts (MoE) architectures.

Instead of activating the entire network for every input, a router activates only the few experts most relevant to it.

Figure: Mixture-of-experts router sends each input to a subset of expert networks whose outputs are merged

This lets total model capacity grow while the compute spent per input stays roughly constant.
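The routing step can be sketched in a few lines of plain Python. This is a toy version under stated assumptions: the experts are simple scaling functions, the router weights are made up, and top-2 routing with a softmax over the selected scores is one common choice rather than the only one. Real models learn both the router and the experts.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, router_weights, experts, top_k=2):
    """Route input vector x to the top_k experts and merge their outputs."""
    # Router: one score per expert (dot product of x with that expert's row).
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    # Keep only the top_k highest-scoring experts.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Turn the chosen scores into mixing weights that sum to 1.
    gates = softmax([scores[i] for i in chosen])
    # Only the chosen experts run; the others cost no compute for this input.
    output = [0.0] * len(x)
    for gate, i in zip(gates, chosen):
        expert_out = experts[i](x)
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output, chosen


# Four toy experts: each one just scales its input by a fixed factor.
experts = [lambda v, s=s: [s * a for a in v] for s in (1.0, 2.0, 3.0, 4.0)]
# Made-up router weights: one row per expert, one column per input feature.
router_weights = [[0.1, 0.0], [2.0, 0.0], [0.5, 0.0], [1.0, 0.0]]

output, chosen = moe_forward([1.0, 0.0], router_weights, experts, top_k=2)
print(chosen)  # prints [1, 3]
print(output)
```

With four experts and `top_k=2`, only half of the expert parameters are used for this input, which is where the efficiency gain comes from.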

Multimodal Models

Modern AI systems can process:

  • Text
  • Images
  • Audio
  • Video
  • Documents

Figure: Multimodal model fusing image, text, and audio pathways into one shared representation

This is why today's AI systems can analyze screenshots, listen to speech, and understand documents.
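One way to picture a shared representation is to project each modality's features into the same vector size and then join them into a single sequence. The sketch below does only that, with made-up feature vectors and hand-picked projection matrices standing in for learned encoders. It is an illustration of the idea, not a description of how any particular model is built.

```python
SHARED_DIM = 2


def project(vector, matrix):
    """Multiply a vector by a (SHARED_DIM x input_dim) matrix."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


# Hypothetical encoder outputs; each modality has a different feature size.
inputs = {
    "text": [[1.0, 0.0, 2.0]],            # one text token, 3 features
    "image": [[0.5, 0.5], [1.0, 0.0]],    # two image patches, 2 features
    "audio": [[2.0, 1.0, 0.0, 1.0]],      # one audio frame, 4 features
}

# Made-up projection matrices; in a real model these would be learned.
projections = {
    "text": [[1, 0, 0], [0, 0, 1]],
    "image": [[1, 0], [0, 1]],
    "audio": [[1, 0, 0, 0], [0, 0, 0, 1]],
}


def fuse(inputs, projections):
    """Project every vector to SHARED_DIM and join them into one sequence."""
    sequence = []
    for modality, vectors in inputs.items():
        for vec in vectors:
            sequence.append((modality, project(vec, projections[modality])))
    return sequence


fused = fuse(inputs, projections)
for modality, vec in fused:
    print(modality, vec)
```

Once every item has the same size, the rest of the model can treat the sequence uniformly, regardless of where each vector came from.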


From LLMs To AI Agents

Why LLMs Alone Are Not Enough

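An LLM is essentially a reasoning engine: it turns text into more text, but on its own it cannot call external systems or keep state between calls.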

Agents combine that reasoning engine with:

  • Tools
  • Memory
  • Planning
  • Execution

How Agents Work

Figure: Agent architecture where the LLM plans, picks tools, calls APIs, updates memory, and returns a final answer

This is where modern AI products are heading.
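The loop described above (plan, pick a tool, call it, update memory, return an answer) can be sketched as follows. Everything here is hypothetical: `fake_llm` is a scripted stand-in for a real model call, the JSON action format is an assumption for this example, and `calculator` is a toy tool.

```python
import json
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def calculator(expression: str) -> str:
    """Toy tool: evaluate 'a op b' with spaces around the operator."""
    left, op, right = expression.split()
    return str(OPS[op](float(left), float(right)))


def fake_llm(goal, memory):
    """Scripted stand-in for a model call; returns a JSON action string."""
    if not memory:
        # First step: decide to use a tool.
        return json.dumps({"action": "tool", "tool": "calculator", "input": "12 * 7"})
    # Later step: a tool result is in memory, so produce the final answer.
    return json.dumps({"action": "final", "answer": f"The result is {memory[-1]['observation']}"})


def run_agent(goal, llm, tools, max_steps=5):
    """Ask the model for an action, execute it, record the result, repeat."""
    memory = []
    for _ in range(max_steps):
        decision = json.loads(llm(goal, memory))
        if decision["action"] == "final":
            return decision["answer"], memory
        observation = tools[decision["tool"]](decision["input"])
        memory.append(
            {"tool": decision["tool"], "input": decision["input"], "observation": observation}
        )
    # The step limit keeps a confused model from looping forever.
    return "Stopped: step limit reached", memory


answer, trace = run_agent("What is 12 * 7?", fake_llm, {"calculator": calculator})
print(answer)  # prints The result is 84.0
```

The model never runs the tool itself. It only emits a structured request, and ordinary code decides whether and how to execute it.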

In practice, building reliable agent systems involves:

  • State management
  • Tool reliability
  • Memory design
  • Evaluation pipelines
  • Error recovery

Calling an LLM API is often the easiest part.
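To make tool reliability and error recovery a little more concrete, here is a minimal sketch of a retry wrapper with exponential backoff. The names `call_with_retry` and `FlakyTool` are invented for this example, and the flaky tool only simulates timeouts. A production system would likely also distinguish retryable errors from permanent ones.

```python
import time


def call_with_retry(tool, arg, attempts=3, base_delay=0.0, sleep=time.sleep):
    """Call tool(arg), retrying on failure with exponential backoff.

    Returns a result dict instead of raising, so the caller (for example an
    agent loop) can decide what to do when every attempt fails.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return {"ok": True, "value": tool(arg), "attempts": attempt + 1}
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Wait base_delay, then 2x, then 4x, and so on.
                sleep(base_delay * (2 ** attempt))
    return {"ok": False, "error": str(last_error), "attempts": attempts}


class FlakyTool:
    """Simulated tool that fails a fixed number of times before succeeding."""

    def __init__(self, failures):
        self.remaining = failures

    def __call__(self, arg):
        if self.remaining > 0:
            self.remaining -= 1
            raise TimeoutError("simulated timeout")
        return f"result for {arg}"


recovered = call_with_retry(FlakyTool(failures=2), "query")
gave_up = call_with_retry(FlakyTool(failures=5), "query")
print(recovered)
print(gave_up)
```

Returning a structured failure rather than raising lets the surrounding system record the error in its state and choose a fallback, which ties together several of the concerns listed above.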


Final Thoughts

The most fascinating thing about large language models is that they are fundamentally next-token prediction machines.

Yet from prediction emerges:

  • Conversation
  • Coding
  • Translation
  • Planning
  • Summarization
  • Creativity

Understanding concepts like tokens, embeddings, transformers, attention, RAG, memory, and agents helps transform AI from something magical into something engineerable.

And once you start seeing how all the pieces connect, you realize that modern AI is not one technology.

It is an ecosystem of technologies working together.

As software engineers, understanding that ecosystem is becoming one of the most valuable skills we can develop.


Series Recap

  1. Part 1: Tokens and Transformers - how text becomes predictions
  2. Part 2: Inference, Memory, and RAG - using models in production
  3. Part 3 (this post): reasoning, architectures, and agents