Getting Started with AI Agents - Part II
In Part I, we learned how tools are provided to agents in the system prompt and how AI agents can reason, plan, and interact with their environment.
Now, we'll examine the AI agent workflow known as the Thought-Action-Observation cycle.
This tutorial is a summary of the Hugging Face Agents Course - Unit 1.
The Thought-Action-Observation Cycle
The Think, Act, and Observe components operate in a continuous loop, following the flow outlined below:
After the LLM outputs an action, the framework follows these steps in order (a minimal sketch follows the list below):
- Parse the action to identify the function(s) to call and the argument(s) to use.
- Execute the action.
- Append the result as an Observation.
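As a rough illustration of these three steps, here is what such a loop could look like in Python. The llm and tools arguments and the message format are assumptions for this sketch, not any specific framework's API, and the action is assumed to be emitted as the JSON blob used later in this post:

```python
import json

def run_agent(llm, tools, messages, max_steps=5):
    """Illustrative Thought-Action-Observation loop."""
    for _ in range(max_steps):
        reply = llm(messages)  # the model emits its Thought and, possibly, an Action
        messages.append({"role": "assistant", "content": reply})

        if "Action:" not in reply:  # no action requested: treat the reply as the final answer
            return reply

        # 1. Parse the action to identify the function to call and the arguments to use.
        action = json.loads(reply.split("Action:", 1)[1].strip())

        # 2. Execute the action.
        result = tools[action["action"]](**action["action_input"])

        # 3. Append the result as an Observation for the next cycle of thought.
        messages.append({"role": "user", "content": f"Observation: {result}"})

    return "Stopped after reaching the step limit."
```

In practice, frameworks also stop generation at the point where the Observation would begin, so the model does not invent the tool's result itself; this is the stop and parse idea discussed later in this post.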
In many agent frameworks, rules and guidelines are embedded directly into the system prompt. In the following system message example, we can see:
- The Agent’s behavior.
- The Tools our Agent has access to.
- The Thought-Action-Observation Cycle, which we insert into the LLM instructions.
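A minimal sketch of such a system message is shown below; the exact wording and the get_weather tool are illustrative assumptions, not the course's verbatim prompt:

```text
You are an AI assistant that answers user questions using the tools provided.

Tools available:
- get_weather(location): returns the current weather for the given location.

Follow this cycle until you can answer:
Thought: reason about what to do next.
Action: a JSON blob, e.g. {"action": "get_weather", "action_input": {"location": "..."}}
Observation: the tool's result, appended by the framework.

When you have gathered enough information, reply with the final answer.
```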
Let’s consider a practical example. Imagine we ask an agent about the temperature in Toronto. When the agent receives this question, it begins the initial step of the "Think" process. This "Think" step represents the agent’s internal reasoning and planning activities to solve the task at hand. The agent utilizes its LLM capabilities to analyze the information presented in its prompt.
During this process, the agent can break down complex problems into smaller, more manageable steps, reflect on past experiences, and continuously adjust its plans based on new information. Key components of this thought process include planning, analysis, decision-making, problem-solving, memory integration, self-reflection, goal-setting, and prioritization.
Note that for LLMs fine-tuned for function-calling, the thought process is optional. In our weather example, the agent's internal reasoning might look like this:
Thought: The user needs current weather information for Toronto. I have access to a tool that fetches weather data. First, I need to call the weather API to get up-to-date details.
This step shows the agent breaking the problem into steps: first, gathering the necessary data.
Based on its reasoning, and the fact that the Agent is aware of a get_weather tool, the Agent prepares a JSON-formatted command to call the weather API tool. For example, its first action could be:
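```json
{
  "action": "get_weather",
  "action_input": {"location": "Toronto"}
}
```

The framework would then execute this call and append the tool's result to the prompt as an Observation before the next Thought step.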
The ability to call external tools, such as a weather API, empowers the Agent to access real-time data, which is a critical capability for any effective AI agent. Each cycle prompts the Agent to integrate new information (observations) into its reasoning (thought process), ensuring that the final outcome is accurate and well-informed. This illustrates the core principle of the ReAct cycle: the dynamic interplay of Thought, Action, and Observation that enables AI agents to tackle complex tasks with precision and efficiency. By mastering these principles, you can design agents that not only reason through their tasks but also leverage external tools to achieve their objectives, continuously refining their outputs based on environmental feedback.
The ReAct Approach
Another technique is the ReAct approach, which combines “Reasoning” (Think) with “Acting” (Act). ReAct is a straightforward prompting method that appends step-by-step reasoning instructions (such as “Let's think step by step”) before letting the LLM decode the next tokens. Encouraging the model to think this way steers decoding towards tokens that build a plan, rather than jumping straight to a final solution, because the model is prompted to break the problem down into smaller sub-tasks. Examining sub-steps in this way generally results in fewer errors than attempting to produce the final solution all at once.

More recently, models such as DeepSeek R1 have been fine-tuned to "think before answering". These models are trained to always include explicit thinking sections, enclosed between <think> and </think> special tokens.
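Going back to plain ReAct prompting, a rough illustration of the behavior such a prompt nudges the model towards might be the following (the wording here is an assumption, not a prescribed template):

```text
User: What's the weather in Toronto?

Let's think step by step.
Thought: I should look up the current weather for Toronto before answering.
Action:
{"action": "get_weather", "action_input": {"location": "Toronto"}}
```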
Actions
Actions refer to the specific steps that an AI agent undertakes to engage with its surroundings. Whether it involves searching the internet for information or managing a physical device, every action is a purposeful task performed by the agent. For instance, an agent that aids in customer service could obtain customer information, provide support articles, or escalate problems to a human representative.
There are multiple types of Agents that take actions differently:
| Type of Agent | Description |
|---|---|
| JSON Agent | The Action to take is specified in JSON format. |
| Code Agent | The Agent writes a code block that is interpreted externally. |
| Function-calling Agent | A subcategory of the JSON Agent, fine-tuned to generate a new message for each action. |
Actions can serve several purposes:
- Information Gathering (e.g., web searches, database queries).
- Tool Usage (e.g., API calls, calculations).
- Environment Interaction (e.g., controlling devices).
- Communication (e.g., engaging with users).
An essential aspect of an agent is its ability to stop generating new tokens once an action is complete, and this applies across all formats (JSON, code, function-calling). This prevents unintended output and keeps the response well-defined. The LLM only handles text, and uses it to describe the action it wants to take and the parameters to supply to the tool.
One approach to implementing actions is known as the stop and parse approach. This method keeps the output structured and predictable: the agent generates the action in a clear format (JSON or code), halts further token generation once the action is complete, and an external parser then reads the output, determines which tool to call, and extracts the required parameters. For example, the model's output might look like:
Thought: I need to check the current weather.
Action:
{
"action": "get_weather",
"action_input": {"location": "Toronto"}
}
Function-calling agents operate similarly by structuring each action so that a designated function is invoked with the correct arguments.
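As a rough illustration, the same weather action expressed in a function-calling format might look like the message below. The schema loosely follows OpenAI-style chat APIs and is an assumption for this sketch, not something defined in the course; details vary by provider:

```json
{
  "role": "assistant",
  "tool_calls": [
    {
      "id": "call_1",
      "type": "function",
      "function": {
        "name": "get_weather",
        "arguments": "{\"location\": \"Toronto\"}"
      }
    }
  ]
}
```

The tool's result then comes back as a separate message (for example, one with role "tool"), which plays the same part as the Observation in the stop and parse flow.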
An alternative Action approach is using Code Agents. Instead of outputting a simple JSON object, a Code Agent generates an executable code block—typically in a high-level language like Python.
This approach offers several advantages:
-
Expressiveness: Code can naturally represent complex logic, including loops, conditionals, and nested functions, providing greater flexibility than JSON.
-
Modularity and Reusability: Generated code can include functions and modules that are reusable across different actions or tasks.
-
Enhanced Debuggability: With a well-defined programming syntax, code errors are often easier to detect and correct.
-
Direct Integration: Code Agents can integrate directly with external libraries and APIs, enabling more complex operations such as data processing or real-time decision making.
For example, a Code Agent tasked with fetching the weather might generate the following Python snippet:
# Code Agent Example: Retrieve Weather Information
import requests

def get_weather(city):
    api_url = f"https://api.weather.com/v1/location/{city}?apiKey=YOUR_API_KEY"
    response = requests.get(api_url)
    if response.status_code == 200:
        data = response.json()
        return data.get("weather", "No weather information available")
    else:
        return "Error: Unable to fetch weather data."

# Execute the function and prepare the final answer
result = get_weather("Toronto")
final_answer = f"The current weather in Toronto is: {result}"
print(final_answer)
Observation
Observations are how an Agent perceives the consequences of its actions. We can understand them as signals from the environment that guide the next cycle of thought.
In the observation phase, the agent:
- Collects Feedback: Receives confirmation of action success.
- Appends Results: Updates its memory with new information.
- Adapts Strategy: Refines future actions based on updated context.
This process of using feedback helps the agent stay on track with its goals. It allows the agent to learn and adjust continuously based on real-world results. Observation can also be seen as Tool “logs” that provide textual feedback of the Action execution.
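Continuing the weather example, the observation appended after the get_weather call might look like the following line (the reading itself is hypothetical):

```text
Observation: Current weather in Toronto: partly cloudy, 2°C, 65% humidity.
```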
Types of Observations and Examples
- System Feedback: Error messages, success notifications, or status codes.
- Data Changes: Updates in the database, modifications to the file system, or changes in state.
- Environmental Data: Readings from sensors, system metrics, or resource usage information.
- Response Analysis: Responses from APIs, query results, or outputs from computations.
- Time-based Events: Completion of scheduled tasks or milestones reached, such as deadlines.
Comments
Both tutorials may seem technical, but they provide an overview for understanding the potential of AI agents. In the next blog post, we will discuss an implementation in public transit.