A doc I created to help organize and simplify this "AI" Boom. The key part of "Artificial Intelligence" is the artificial part. It's not real.
LLMs are built from the work of Data Science/Machine Learning Engineering.
Building on LLMs does not require Machine Learning experience, but rather an understanding of the fundamental concepts detailed below.
- LLM
- Beyond LLMs
- Important LLM Understandings
- Further Research
A Large Language Model is built using an incredibly large Neural Network. These models predict what to say based upon extremely large training datasets and statistical relationships between words and phrases. This means its success is determined by its ability to generate high-quality, relevant, and coherent text that aligns with human-like language patterns. Its process is fundamentally about probability prediction, not true understanding, intent, or even alignment with reality: it predicts what someone would probably say in response to your input.
Language is the fundamental building block of humanity. Without it, we likely would not even recognize another sentient being. Language does not accurately represent anything; it is simply a way for us to agree on a representation. Language is not true, it is just an agreement. Food for thought. Anyway.
- ChatGPT
- Provider: OpenAI
- Gemini
- Provider: Google
- DeepSeek
- Provider: DeepSeek
- Llama (Open Source - downloadable weights you self-host rather than a hosted internet service)
- Provider: Meta
- Claude
- Provider: Anthropic
- BLOOM (Open Source - downloadable weights you self-host rather than a hosted internet service)
- Provider: HuggingFace
- Grok
- Provider: xAI
- Lumo
- Provider: Proton
This is a really important section. There are TWO inputs to an LLM:
- Context
- Additional information injected alongside the prompt to inform what the LLM "knows"
- Prompt
- The immediate question/command
The ability to provide further context to an LLM introduced sophisticated ways of interacting with it: injecting information to help it perform better and to dynamically change its prediction.
Note: See LLMs are Fundamentally Stateless
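The two inputs can be seen in the shape of a typical chat-completion request. A minimal sketch below, assuming a generic role/content message format (the exact field names vary by provider, and `build_messages` is an illustrative helper, not a real API):

```python
# Sketch: the TWO inputs to an LLM call, expressed as chat messages.
# The role/content dict shape mirrors common chat-completion APIs;
# exact field names vary by provider.

def build_messages(context: str, prompt: str) -> list[dict]:
    """Combine injected context and the user's prompt into one request."""
    return [
        {"role": "system", "content": context},  # Context: what the LLM "knows"
        {"role": "user", "content": prompt},     # Prompt: the immediate question/command
    ]

messages = build_messages(
    context="Company holidays: Dec 25, Jan 1.",
    prompt="Is the office open on Dec 25?",
)
```

The key point: the context is just more text placed in front of the prompt; the model sees one combined input.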
With the ability to inject contextual information with a prompt to an LLM, Retrieval-Augmented Generation (RAG) Modeling was created. This normally involves:
- The storage of relevant contextual information utilizing a Vector database.
- Middleware that injects the contextual information alongside a question/command
Basic workflow looks like this:
- User inputs a question/command
- Middleware intercepts question/command
- Parses the question/command and queries the Vector database to grab relevant contextual info surrounding question/command
- Middleware injects relevant contextual info as context to the LLM in addition to the User's prompt
- Middleware returns LLM's response
Rough Example:
- User inputs a prompt in a web page:
  "Who do I talk to about getting reimbursed for my travel meal?"
- Middleware takes the prompt and searches the Vector Database. Gets back info:
  "Reimbursement Doc: Please contact John in Accounting for any reimbursement requests"
- Middleware injects the contextual information alongside the prompt to an LLM:
  Context:
    Relevant Info:
    "Reimbursement Doc: Please contact John in Accounting for any reimbursement requests"
  Prompt:
    "Who do I talk to about getting reimbursed for my travel meal?"
- LLM outputs a better prediction:
  Output:
  "You're absolutely right! It is incredibly important to get reimbursed for expenses made while on the job.
  You should contact John in Accounting for any reimbursement requests!"
Vector Databases store and organize information in a similar way to LLMs, utilizing numerical representations (high-dimensional vectors) to represent how closely pieces of information relate to each other. When you query a Vector database, you provide information (such as text) and the database engine returns the closest vectors (the most relevant/related information).
Note: For those from the regular database world:
- A Vector is a record
- A Vector contains a chunk of text that is relevant/related to your input query
Vector databases predate the LLM boom by many years, so a lot of options are available.
Stand-Alone
- Pinecone
- Weaviate
- Milvus
- Qdrant
- Chroma
Platform/Product Specific
- MongoDB Atlas Vector Search
- Databricks Vector Database
- ElasticSearch - with vector enabled
- Postgres - with the pgvector extension
Cloud Specific
- Azure: Azure Cognitive Search
- AWS: Amazon Bedrock or Opensearch
- GCP: Vertex AI Matching Engine
The great success of LLMs, which provide a more accessible way of accessing agreed-upon information (through their massive training datasets, sourced from the internet), introduced a new building block for software applications. LLMs' complex "reasoning" abilities could then be "directed" through context injection, similar to someone trying to lead an elephant (if you think that sounds difficult, it's because it is). This led to the development of AI Agents, which is a fancy way of saying software built on LLMs and utilizing LLMs. AI Agents are RAG Modeling on STEROIDS.
Because LLMs are language-based, utilizing LLMs within software systems has broad applications. Most Agents are very specific to enable better goal achievement, but we are seeing the rise of general-purpose Agents as well. To be clear, these general-purpose Agents are a combination of smaller agents to enable access to broader functionality.
General Purpose Agents
- ChatGPT Agent Mode
- Combination of Agents that interact with your Operating System
Specific Agents
- Search Agents
- Code Agents
- Customer Service Agents
- Creative Agents
A limitation of LLMs is an inability to hold long-term "memory". LLMs have context within a conversation, but do not link conversations together and do not "learn" based upon interactions. Creating an Agent normally involves augmenting an LLM by injecting relevant contextual information such as:
- Previous Interactions and a success score
- Additional information relevant to help it achieve a better success score
Behind the scenes of Agents is advanced context injection to help the LLM perform a specific task. Agents enable LLMs to be stateful and goal-driven through advanced context injection.
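The "stateful through context injection" idea fits in a few lines. A sketch below, with `fake_llm` standing in for a real model and the memory format invented for illustration:

```python
# Sketch: an Agent keeps state OUTSIDE the stateless LLM and re-injects
# the whole history on every single call.

def fake_llm(full_prompt: str) -> str:
    # Stand-in for a real LLM call.
    return f"echo({len(full_prompt)} chars seen)"

class Agent:
    def __init__(self):
        self.memory: list[str] = []  # previous interactions, kept by the Agent

    def ask(self, prompt: str) -> str:
        # The LLM sees its entire "memory" every time; it remembers nothing itself.
        context = "\n".join(self.memory)
        response = fake_llm(f"Context:\n{context}\n\nPrompt:\n{prompt}")
        self.memory.append(f"User: {prompt} -> Agent: {response}")
        return response
```

Delete `self.memory` and the "agent" forgets everything, which is exactly what a bare LLM does between calls.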
The ability to inject contextual information into an LLM allows Agents to provide very specific information that an LLM "knows", rough conceptual example below:
Context:
You have access to a database, here are the tables: Table1, Table2, ...
To see the results of a table, output this format
SELECT
*
FROM Table1
The LLM can then output text like this:
SELECT
*
FROM Table1
And then the output is parsed by the Agent, an actual query is run on a database, and then another prompt is issued to the LLM like this:
Context:
You have access to a database, here are the tables: Table1, Table2, ...
To see the results of a table, output this format
SELECT
*
FROM Table1
You queried Table1 in our last interaction, here are results:
blah blah blah
This gives the LLM relevant contextual information, and "empowers" it to interact with systems and make better predictions by injecting live data and very specific, relevant, contextual information.
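The parse-execute-reinject loop described above can be sketched with the standard library's sqlite3. `fake_llm` is a stand-in for a real model that emits SQL when the context instructs it to; the table and data are invented for the demo:

```python
# Sketch of the query -> execute -> re-inject loop.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Table1 (id INTEGER, name TEXT)")
db.execute("INSERT INTO Table1 VALUES (1, 'alpha'), (2, 'beta')")

def fake_llm(context: str) -> str:
    # A real LLM would predict this output from the context's instructions.
    return "SELECT * FROM Table1"

def agent_step(context: str) -> str:
    output = fake_llm(context)
    if output.strip().upper().startswith("SELECT"):  # parse the LLM's output
        rows = db.execute(output).fetchall()         # run an actual query
        # Re-inject the results as context for the next prediction.
        return context + f"\nYou queried Table1 in our last interaction, here are results:\n{rows}"
    return context

new_context = agent_step("You have access to a database, here are the tables: Table1")
```

Note the LLM never touches the database; the Agent does, and the LLM only ever sees text about it.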
Note: LangChain is an Open Source AI Agent development framework that is a popular choice for building these context injection processes.
Model Context Protocol (MCP) is a programmatic way to control what tools an LLM can use and how it can use them.
- An MCP Server is responsible for receiving commands from the MCP Client, executing those commands (i.e., invoking the tools), and sending the results back to the client.
- An MCP Client is responsible for injecting the available tools and commands into the LLM's context, interpreting when the LLM issues a command intended for the MCP Server, and sending the command to the server on the LLM's behalf. It also injects the MCP Server's results back into the LLM's input context for further processing.
LlamaIndex is a popular open source library that helps developers easily create MCP Client code.
This design creates a feedback loop between real-world applications and the LLM, making it a fundamental component of AI agent development. It empowers the LLM to trigger commands that execute within software systems and receive results in real time! This continuous interaction gives the LLM relevant context to make better predictions.
Note: It is important to recognize that an LLM has absolutely NO IDEA what triggering a command through an MCP server will do except what is provided to it. It makes a prediction of what someone would probably do given the language context. That is all.
MCP Servers utilize the JSON-RPC 2.0 specification as their core messaging protocol. From my perspective, this is what I'd use:
- MCP Server
- MCP Client
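The JSON-RPC 2.0 framing itself is simple. A sketch below; the method and tool names ("tools/call", "get_weather") are illustrative rather than a complete rendering of the MCP spec:

```python
# Sketch of JSON-RPC 2.0 messages, the framing MCP builds on.
import json

def rpc_request(req_id: int, method: str, params: dict) -> str:
    """Client -> Server: ask the server to do something, tagged with an id."""
    return json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "method": method, "params": params})

def rpc_response(req_id: int, result) -> str:
    """Server -> Client: the result, matched back to the request by id."""
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})

# Client asks a tool to run; server replies with the tool's output.
request = rpc_request(1, "tools/call", {"name": "get_weather", "city": "Austin"})
response = rpc_response(1, {"temp_f": 98})
```

The `id` field is what lets the client pair a result with the command it sent, even when several are in flight.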
To simplify and summarize, an AI Agent:
- Intakes a prompt
- Parses the prompt for important contextual information
- Utilizes the gathered information to:
- Gather long-term relevant contextual information by:
- Triggering commands on the MCP Server
- Querying a Vector Database
- etc.
- Injects relevant long-term contextual information to the LLM alongside a new prompt to get a better prediction
- Loops through until a solution is provided or confirmed by the system
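The summarized loop as a sketch. Every callable here (`parse`, `gather`, `llm`, `is_solved`) is a stand-in; a real agent plugs in NLU, vector database queries, MCP calls, and a real model:

```python
# Sketch of the Agent loop summarized above, with toy stand-in functions.

def parse(prompt):
    return prompt.lower().split()                 # parse for important info

def gather(tokens):
    return f"found {len(tokens)} keywords"        # stand-in for Vector DB / MCP lookups

def llm(context, prompt):
    return f"answer using [{context.strip()}]"    # stand-in for the model call

def is_solved(answer):
    return "keywords" in answer                   # stand-in for a success check

def run_agent(prompt: str, max_steps: int = 3) -> str:
    context = ""
    for _ in range(max_steps):                    # loop until solved or out of steps
        info = gather(parse(prompt))              # intake + parse + gather context
        context += "\n" + info                    # accumulate long-term context
        answer = llm(context, prompt)             # inject context for a better prediction
        if is_solved(answer):
            return answer
    return answer

result = run_agent("Where is the reimbursement form?")
```

Each pass through the loop grows the context, which is why token limits (discussed below) bite so quickly in practice.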
It is important to correctly interpret a User's input to provide the best contextual information to the Agent. This requires applying Machine Learning, specifically Natural Language Understanding (NLU), to better understand the meaning of the input by identifying intents, entities, sentiment, and context. Good NLU enables the Agent to understand user goals and conditions accurately and empowers the entire process.
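A toy illustration of the intent and entity extraction step. Real systems use trained classifiers; the intents and patterns here are invented for the sketch:

```python
# Toy NLU pass: rule-based intent and entity extraction.
import re

INTENTS = {
    "reimbursement": re.compile(r"\breimburs", re.IGNORECASE),
    "it_support":    re.compile(r"\b(laptop|password|vpn)\b", re.IGNORECASE),
}

def understand(text: str) -> dict:
    """Classify intent and pull out simple entities (money amounts)."""
    intent = next((name for name, pat in INTENTS.items() if pat.search(text)),
                  "unknown")
    amounts = re.findall(r"\$\d+(?:\.\d{2})?", text)  # entity: dollar amounts
    return {"intent": intent, "entities": {"amounts": amounts}}

parsed = understand("Can I get reimbursed $45.50 for my travel meal?")
```

Once the intent is known, the Agent knows which data sources and tools are worth querying for context.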
To empower the LLM to make better predictions, we need to provide accurate, structured, and simple contextual information to the LLM through a context injection. This can include a lot of information, but needs to be condensed down to respect the LLM's token limit.
This involves properly breaking complex tasks down into sub tasks, setting guardrails, embedding external MCP Server tool invocation instructions, etc. This also involves re-structuring the prompt to enable the LLM to make a better prediction.
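Condensing context down to a token budget can be sketched simply. Token counting is approximated by word count here; a real system would use the model's actual tokenizer:

```python
# Sketch: keep the most relevant context chunks until the token budget is spent.

def fit_to_budget(chunks: list[str], budget: int) -> list[str]:
    """Keep chunks, in priority order, while the (approximate) token cost fits."""
    kept, used = [], 0
    for chunk in chunks:              # chunks assumed pre-sorted by relevance
        cost = len(chunk.split())     # crude token estimate (words, not real tokens)
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return kept

context = fit_to_budget(
    ["most relevant doc here", "second doc with more words in it", "least relevant"],
    budget=8,
)
```

Dropping low-relevance chunks first is the simplest policy; summarizing them instead of dropping them is the common refinement.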
Once we interpret a User's input, the Agent needs to orchestrate the movement and retrieval of contextual data to empower the LLM to complete its goal. There should also be data pipelines that are constantly updating the data sources that provide contextual information for the Agent to access and provide to the LLM.
LLMs predict the next token (word) to output based upon statistical relationships developed from language data. The success of their prediction is not based upon fundamental truth or alignment with reality. Because of this fact, LLMs can generate plausible but factually incorrect or fabricated information.
An AI Agent is only as good as the context that is injected into the LLM with the prompt. MCP servers help, but only if accurate contextual information is provided in how to correctly utilize the MCP server, what actions have already been tried, etc. This also means if relevant information is not provided, destructive actions or inadequate solutions could easily occur.
Since LLMs are trained on past data, this also incorporates any bias and inaccuracies within the data (such as fundamental or toxic misunderstandings that are culturally widespread). LLMs also do not know current information unless specifically trained on the data (like the current date and context).
LLMs do not have long-term memory. They are inherently stateless, which means an LLM cannot learn without proper contextual feedback, and that context must be provided to it OVER and OVER again to maintain continuity. It's useful to see LLMs as having short-term amnesia. Every time they wake up for the day you have to remind them where they are and why they are there. Every time. Or they cannot pick up where they left off.
There are limits on the context that can be injected into an LLM. This directly impacts the ability of an LLM to be effective in highly complex environments & situations. This is a key constraint in the current AI Engineering landscape.
LLMs are trained on MASSIVE datasets sourced from the internet AND most LLMs record and store the prompts given to them. This prompt data can also be utilized to further train the LLM model, which could expose prompt data inadvertently. And since LLMs change their prediction based upon context injection, this opens a way for attackers to maliciously extract sensitive information.
Note: Ollama helps address a large amount of these concerns by offering the ability to self-host Open Source LLMs. Think of it like a package manager for Open Source LLMs.

