Published on 2026-06-10
How LLMs Actually Work, Part 3: Agents and the Future of AI
The final part of a practical deep dive into LLMs - reasoning, mixture of experts, multimodal models, and how LLMs become AI agents that plan, use tools, and execute.
Introduction
This is the final post in a three-part series on how large language models actually work.
- Part 1 covered tokens, embeddings, transformers, attention, and training
- Part 2 covered inference, context windows, hallucinations, RAG, and fine-tuning
In this post, we look at where LLMs are heading: the reasoning debate, new architectures, and the shift from chatbots to AI agents.
Do LLMs Actually Reason?
The Intelligence Debate
This remains one of the biggest questions in AI.
Some researchers argue that LLMs genuinely reason.
Others believe they are sophisticated pattern matching systems.
The truth is likely somewhere in between.
Chain Of Thought Reasoning
Modern models often solve problems by generating intermediate reasoning steps before committing to a final answer.
This tends to improve performance on:
- Mathematics
- Coding
- Planning
- Logic
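To make the idea of intermediate steps concrete, here is a minimal sketch of the difference between a direct prompt and a step-by-step prompt, plus a helper that pulls the final answer out of a step-by-step response. The function names, the prompt wording, and the "Answer:" convention are assumptions chosen for illustration, and the example response is hand-written rather than real model output.

```python
from typing import Optional


def build_direct_prompt(question: str) -> str:
    """Ask for the answer immediately, with no visible reasoning."""
    return f"Question: {question}\nAnswer:"


def build_cot_prompt(question: str) -> str:
    """Ask for intermediate steps first, then a clearly marked final answer."""
    return (
        f"Question: {question}\n"
        "Work through the problem step by step, then give the result "
        "on a final line starting with 'Answer:'.\n"
    )


def extract_final_answer(response: str) -> Optional[str]:
    """Return the text after the last 'Answer:' line, or None if there is none."""
    for line in reversed(response.splitlines()):
        stripped = line.strip()
        if stripped.startswith("Answer:"):
            return stripped[len("Answer:"):].strip()
    return None


# Hand-written stand-in for a model response (an assumption, not real output).
example_response = (
    "Step 1: 17 * 4 = 68\n"
    "Step 2: 68 + 5 = 73\n"
    "Answer: 73"
)

final_answer = extract_final_answer(example_response)
```

The design point is that the steps are ordinary generated text: the model conditions on its own earlier steps when producing later ones, which is one common explanation for why the technique helps on multi-step problems.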
Even today, researchers are still discovering surprising capabilities that emerge at scale.
The Future Of LLM Architectures
Mixture Of Experts
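Some modern models use Mixture of Experts (MoE) architectures.
Instead of activating the entire network for every token, a small router selects only a few expert sub-networks to run.
This increases total model capacity while keeping the compute spent per token well below that of a dense model of the same size.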
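The routing idea can be sketched in a few lines of plain Python. This is a toy under stated assumptions: the "experts" are simple scalar functions, the router scores are hard-coded rather than produced by a learned layer, and top-k selection with softmax-normalized weights is one common routing scheme, not necessarily what any particular model uses.

```python
import math
from typing import Callable, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits: List[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_layer(x: float,
              experts: List[Callable[[float], float]],
              router_logits: List[float],
              k: int = 2) -> Tuple[float, List[int]]:
    """Run only the selected experts and combine their outputs by router weight."""
    routes = top_k_route(router_logits, k)
    output = sum(weight * experts[idx](x) for idx, weight in routes)
    return output, [idx for idx, _ in routes]


# Four toy experts: expert i multiplies its input by (i + 1).
experts = [lambda x, scale=scale: scale * x for scale in (1.0, 2.0, 3.0, 4.0)]

# Hypothetical router scores for one token; a real router computes these
# from the token's hidden state.
router_logits = [0.1, 2.0, -1.0, 1.0]

output, used_experts = moe_layer(2.0, experts, router_logits, k=2)
```

Only two of the four experts run for this input, which is the source of the efficiency gain: the parameter count covers all experts, but the per-token compute covers only the selected ones.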
Multimodal Models
Modern AI systems can process:
- Text
- Images
- Audio
- Video
- Documents
This is why today's AI systems can analyze screenshots, listen to speech, and understand documents.
From LLMs To AI Agents
Why LLMs Alone Are Not Enough
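An LLM is essentially a reasoning engine: text goes in, text comes out.
On its own, it cannot call an API, remember a previous session, or act on its own output.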
Agents combine that reasoning engine with:
- Tools
- Memory
- Planning
- Execution
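The four ingredients above can be sketched as a small loop. Everything here is hypothetical: `scripted_model` is a hard-coded stand-in for a real LLM call, the tool registry holds two trivial functions, and the dictionary format for decisions is an assumption made for this example rather than any framework's actual protocol.

```python
from typing import Callable, Dict, List


def add(a: float, b: float) -> float:
    return a + b


def multiply(a: float, b: float) -> float:
    return a * b


# Tools: plain functions the agent is allowed to call, looked up by name.
TOOLS: Dict[str, Callable[..., float]] = {"add": add, "multiply": multiply}


def scripted_model(goal: str, memory: List[dict]) -> dict:
    """Stand-in for an LLM call: decides the next action from goal and memory."""
    if not memory:
        # Planning: nothing has been done yet, so request a tool call.
        return {"action": "tool", "name": "multiply", "args": {"a": 6, "b": 7}}
    # A result is available, so finish with an answer based on it.
    return {"action": "finish", "answer": f"The result is {memory[-1]['result']}"}


def run_agent(goal: str,
              model: Callable[[str, List[dict]], dict],
              tools: Dict[str, Callable[..., float]],
              max_steps: int = 5) -> str:
    memory: List[dict] = []  # Memory: record of what has happened so far
    for _ in range(max_steps):
        decision = model(goal, memory)          # Planning
        if decision["action"] == "finish":
            return decision["answer"]
        tool = tools[decision["name"]]          # Tool selection
        result = tool(**decision["args"])       # Execution
        memory.append({"tool": decision["name"],
                       "args": decision["args"],
                       "result": result})
    return "Stopped: step limit reached"


answer = run_agent("What is 6 times 7?", scripted_model, TOOLS)
print(answer)  # prints "The result is 42"
```

The `max_steps` cap is a deliberate design choice: a model that never decides to finish would otherwise loop forever.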
How Agents Work
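An agent runs the model in a loop: the model decides on the next step, the system executes the chosen tool, and the result is fed back to the model until the task is done.
This is where modern AI products are heading.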
In practice, building reliable agent systems involves:
- State management
- Tool reliability
- Memory design
- Evaluation pipelines
- Error recovery
Calling an LLM API is often the easiest part.
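As a rough illustration of why the surrounding engineering dominates, here is a sketch of two items from the list above, state management and error recovery, wrapped around a single tool call. The `AgentState` class, the `FlakyTool` class, and the retry policy are all assumptions made for this example; a production system would likely add timeouts, structured logging, and persistence.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class AgentState:
    """Explicit state: which step we are on and what happened at each attempt."""
    step: int = 0
    history: List[Dict[str, Any]] = field(default_factory=list)


def call_tool_with_retry(tool: Callable[..., Any],
                         args: Dict[str, Any],
                         state: AgentState,
                         max_attempts: int = 3,
                         base_delay: float = 0.0) -> Dict[str, Any]:
    """Call a tool, retrying on failure with exponential backoff.

    Every attempt is recorded in state.history so failures can be inspected later.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            result = tool(**args)
        except Exception as exc:  # broad on purpose: any tool failure is retried
            state.history.append({"step": state.step, "attempt": attempt,
                                  "ok": False, "error": str(exc)})
            if attempt < max_attempts:
                time.sleep(base_delay * 2 ** (attempt - 1))
            continue
        state.history.append({"step": state.step, "attempt": attempt,
                              "ok": True, "result": result})
        state.step += 1
        return {"ok": True, "result": result}
    # All attempts failed: report the last error instead of raising.
    state.step += 1
    return {"ok": False, "error": state.history[-1]["error"]}


class FlakyTool:
    """Hypothetical tool that fails a fixed number of times before succeeding."""

    def __init__(self, failures: int) -> None:
        self.remaining = failures

    def __call__(self, city: str) -> str:
        if self.remaining > 0:
            self.remaining -= 1
            raise TimeoutError("upstream timed out")
        return f"weather for {city}: sunny"


state = AgentState()
outcome = call_tool_with_retry(FlakyTool(failures=2), {"city": "Oslo"}, state)
```

Returning a failure record rather than raising lets the agent loop pass the error back to the model, which can then choose a different tool or give up gracefully.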
Final Thoughts
The most fascinating thing about large language models is that they are fundamentally next-token prediction machines.
Yet from prediction emerges:
- Conversation
- Coding
- Translation
- Planning
- Summarization
- Creativity
Understanding concepts like tokens, embeddings, transformers, attention, RAG, memory, and agents helps transform AI from something magical into something engineerable.
And once you start seeing how all the pieces connect, you realize that modern AI is not one technology.
It is an ecosystem of technologies working together.
As software engineers, understanding that ecosystem is becoming one of the most valuable skills we can develop.
Series Recap
- Part 1: Tokens and Transformers - how text becomes predictions
- Part 2: Inference, Memory, and RAG - using models in production
- Part 3 (this post): reasoning, architectures, and agents