

Deploy Machine Learning Models on the Hugging Face Hub using Gradio

In this tutorial, we will learn how to deploy models to Hugging Face Spaces. Hugging Face Spaces is a platform for creating, sharing, and deploying machine learning (ML) demo applications, which you can also call programmatically as an API.

Each Spaces environment is limited by default to 16 GB of RAM, 2 CPU cores, and 50 GB of non-persistent disk space, all available free of charge. If you require more resources, you can upgrade to better hardware.

The first step is to create an account on the Hugging Face website.


Next, we will create a new Space by clicking "Spaces" in the upper bar and then the +New Space button. We can select the free tier for the Space hardware and set the Space to "Public".

For the Space SDK, we select Gradio as our web tool.


This is the result after creating the Space.


We still need to add two files to our project: app.py and requirements.txt.

Let's create the requirements first:

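The screenshot of the file is not reproduced here; for this demo, requirements.txt plausibly only needs the libraries that app.py imports beyond Gradio itself (the Gradio SDK is preinstalled in a Gradio Space), so a minimal sketch might be:

transformers
torch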

After creating the "requirements.txt" file, the next step is to create an "app.py" file. To do this, we will copy the code below and paste it into a new file.

import gradio as gr
from transformers import pipeline

# load the image-to-text pipeline with the BLIP architecture
pipe = pipeline("image-to-text",
                model="Salesforce/blip-image-captioning-base")

# this function receives the input image, calls the pipeline,
# and returns the generated caption from the output
def launch(input):
    out = pipe(input)
    return out[0]['generated_text']

# gradio interface, input is an image and output text
iface = gr.Interface(launch,
                     inputs=gr.Image(type='pil'),
                     outputs="text")

# set share=True in launch() if you want a public share link
iface.launch()
Once you commit both files and click on the App tab, the Space will automatically install the requirements and build the app. When the build finishes, you will see the Gradio interface for interacting with the app.


Time to test our solution! Upload an image and the app returns a generated caption.


At the bottom of your app screen, you can click "Use via API" to see sample code for calling your model through an API.


To run the program locally, copy the generated code snippet to your machine and execute it. Remember to specify the path to your input image and to call the API using the "/predict" endpoint.

If you wish to keep your API private, you must pass your Hugging Face access token when making the call.
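Putting both points together, a local call might look like the sketch below. The Space id and image path are placeholders, the authoritative snippet is the one generated under "Use via API" for your own Space, and recent versions of gradio_client expect file inputs to be wrapped with handle_file:

from gradio_client import Client, handle_file

# placeholders: replace with your own Space id and local image path
# for a private Space, also pass hf_token="hf_xxx" to Client(...)
client = Client("your-username/your-space-name")

result = client.predict(
    handle_file("path/to/your/image.jpg"),  # the image input of the Gradio interface
    api_name="/predict",
)
print(result)  # the generated caption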

References

This tutorial is based on the DeepLearning.AI course "Open Source Models with Hugging Face."

Getting Started with AI Agents - Part II

In Part I, we learned how tools are provided to agents in the system prompt and how AI agents can reason, plan, and interact with their environment.

Now, we'll examine the AI Agent Workflow, known as Thought-Action-Observation.

This tutorial is a summary of the Hugging Face Course on Agents - Unit 1.

The Thought-Action-Observation Cycle

The Think, Act, and Observe components operate in a continuous loop: the agent thinks about what to do next, acts by calling a tool, observes the result, and feeds that observation back into its next thought.


After performing an action, the framework follows these steps in order:

  • Parse the action to identify the function(s) to call and the argument(s) to use.
  • Execute the action.
  • Append the result as an Observation.

In many agent frameworks, rules and guidelines are embedded directly into the system prompt. A typical system message defines:

  • The Agent’s behavior.

  • The Tools our Agent has access to.

  • The Thought-Action-Observation Cycle, which we insert into the LLM's instructions.

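The screenshot of the system message is not reproduced here, but a minimal sketch of such a prompt might look like the following (the wording and the get_weather tool are illustrative, not taken verbatim from the course):

system_prompt = """You are a helpful assistant that can check the weather.

You have access to the following tool:
- get_weather(location: str): returns the current weather for the given location.

Use the following cycle until you can answer the user:
Thought: reason about what to do next.
Action: a JSON blob, for example:
{"action": "get_weather", "action_input": {"location": "Toronto"}}
Observation: the result of the action (added by the framework).

When you know the answer, reply with:
Final Answer: <your answer to the user>
"""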

Let’s consider a practical example. Imagine we ask an agent about the temperature in Toronto. When the agent receives this question, it begins the initial step of the "Think" process. This "Think" step represents the agent’s internal reasoning and planning activities to solve the task at hand. The agent utilizes its LLM capabilities to analyze the information presented in its prompt.

During this process, the agent can break down complex problems into smaller, more manageable steps, reflect on past experiences, and continuously adjust its plans based on new information. Key components of this thought process include planning, analysis, decision-making, problem-solving, memory integration, self-reflection, goal-setting, and prioritization.

For LLMs that are fine-tuned for function-calling, the thought process is optional. In our Toronto example, the agent's first internal thought might read:

The user needs current weather information for Toronto. I have access to a tool that fetches weather data. First, I need to call the weather API to get up-to-date details.

This step shows the agent breaking the problem into steps: first, gathering the necessary data.

Based on its reasoning and the fact that the Agent is aware of a get_weather tool, the Agent prepares a JSON-formatted command to call the weather API tool. For example, its first action could be:

   {
     "action": "get_weather",
     "action_input": {
       "location": "Toronto"
     }
   }
The "Observation" step refers to the environment's response to an API call or the raw data received. This observation is then added to the prompt as additional context. Before the Agent formats and presents the final answer to the user, it returns to the "Think" step to update its internal reasoning. If the observation indicates an error or incomplete data, the Agent may re-enter the cycle to correct its approach.

The ability to call external tools, such as a weather API, empowers the Agent to access real-time data, which is a critical capability for any effective AI agent. Each cycle prompts the Agent to integrate new information (observations) into its reasoning (thought process), ensuring that the final outcome is accurate and well-informed. This illustrates the core principle of the ReAct cycle: the dynamic interplay of Thought, Action, and Observation that enables AI agents to tackle complex tasks with precision and efficiency. By mastering these principles, you can design agents that not only reason through their tasks but also leverage external tools to achieve their objectives, continuously refining their outputs based on environmental feedback.

The ReAct Approach

Another technique is the ReAct approach, which combines “Reasoning” (Think) with “Acting” (Act). ReAct is a simple prompting technique that adds step-by-step reasoning instructions before the LLM decodes the next tokens. Encouraging the model to think in this way steers decoding towards tokens that build a plan, rather than jumping straight to a final solution, because the model is prompted to break the problem down into smaller sub-tasks. This allows it to examine each sub-step more thoroughly, which generally results in fewer errors than producing the final solution in one shot. Models such as DeepSeek-R1 go a step further: they have been fine-tuned to "think before answering" and are trained to always emit an explicit thinking section (enclosed between the <think> and </think> special tokens). Unlike ReAct, this is not just a prompting technique but a training method, in which the model learns to generate these sections after analyzing thousands of examples that show the expected behavior.

Actions

Actions refer to the specific steps that an AI agent undertakes to engage with its surroundings. Whether it involves searching the internet for information or managing a physical device, every action is a purposeful task performed by the agent. For instance, an agent that aids in customer service could obtain customer information, provide support articles, or escalate problems to a human representative.

There are multiple types of Agents that take actions differently:

Type of Agent            Description
JSON Agent               The action to take is specified in JSON format.
Code Agent               The agent writes a code block that is interpreted externally.
Function-calling Agent   A subcategory of the JSON agent, fine-tuned to generate a new message for each action.

Actions can serve several purposes:

  1. Information Gathering (e.g., web searches, database queries).
  2. Tool Usage (e.g., API calls, calculations).
  3. Environment Interaction (e.g., controlling devices).
  4. Communication (e.g., engaging with users).

An essential aspect of an agent is its ability to stop generating new tokens once an action is complete; this applies across all formats (JSON, code, function-calling) and prevents unintended output. The LLM only handles text, which it uses to describe the action it wants to take and the parameters to supply to the tool.

One approach to implementing actions is known as the stop and parse approach. This method keeps the generated output structured (for example, JSON or code), stops generation as soon as the action is complete so no unnecessary tokens are produced, and lets the framework parse which tool to call and with which parameters. For example:

Thought: I need to check the current weather.
Action:
{
  "action": "get_weather",
  "action_input": {"location": "Toronto"}
}
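A framework applying stop and parse would typically stop generation at a marker such as "Observation:", extract the JSON blob, and execute the matching tool. A minimal sketch (the tool registry and parsing regex are illustrative, not a specific framework's implementation):

import json
import re

# hypothetical tool registry mapping tool names to Python callables
TOOLS = {
    "get_weather": lambda location: f"Sunny, 22°C in {location}",
}

def stop_and_parse(llm_output: str) -> str:
    """Extract the JSON action blob from the model output and execute the matching tool."""
    match = re.search(r"\{.*\}", llm_output, re.DOTALL)  # grab the {...} block in the output
    if match is None:
        raise ValueError("No action found in the model output")
    action = json.loads(match.group(0))
    tool = TOOLS[action["action"]]
    return tool(**action["action_input"])  # the return value becomes the Observation

llm_output = '''Thought: I need to check the current weather.
Action:
{
  "action": "get_weather",
  "action_input": {"location": "Toronto"}
}'''

print(stop_and_parse(llm_output))  # -> Sunny, 22°C in Toronto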

Function-calling agents operate similarly by structuring each action so that a designated function is invoked with the correct arguments.

An alternative Action approach is using Code Agents. Instead of outputting a simple JSON object, a Code Agent generates an executable code block—typically in a high-level language like Python.

alt text

This approach offers several advantages:

  • Expressiveness: Code can naturally represent complex logic, including loops, conditionals, and nested functions, providing greater flexibility than JSON.

  • Modularity and Reusability: Generated code can include functions and modules that are reusable across different actions or tasks.

  • Enhanced Debuggability: With a well-defined programming syntax, code errors are often easier to detect and correct.

  • Direct Integration: Code Agents can integrate directly with external libraries and APIs, enabling more complex operations such as data processing or real-time decision making.

For example, a Code Agent tasked with fetching the weather might generate the following Python snippet:

# Code Agent Example: Retrieve Weather Information
def get_weather(city):
    import requests
    api_url = f"https://api.weather.com/v1/location/{city}?apiKey=YOUR_API_KEY"
    response = requests.get(api_url)
    if response.status_code == 200:
        data = response.json()
        return data.get("weather", "No weather information available")
    else:
        return "Error: Unable to fetch weather data."

# Execute the function and prepare the final answer
result = get_weather("Toronto")
final_answer = f"The current weather in Toronto is: {result}"
print(final_answer)
This method also follows the stop and parse approach by clearly delimiting the code block and signaling when execution is complete by printing the final_answer.

Observation

Observations are how an Agent perceives the consequences of its actions. We can understand them as signals from the environment that guide the next cycle of thought.

In the observation phase, the agent:

  • Collects Feedback: Receives confirmation of action success.
  • Appends Results: Updates its memory with new information.
  • Adapts Strategy: Refines future actions based on updated context.

This process of using feedback helps the agent stay on track with its goals. It allows the agent to learn and adjust continuously based on real-world results. Observation can also be seen as Tool “logs” that provide textual feedback of the Action execution.
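In code, appending the result as an Observation often just means adding one more message to the running prompt before the next Think step. A minimal sketch (the message roles and wording are illustrative; frameworks differ in how they encode observations):

# hypothetical running conversation for the Toronto weather example
messages = [
    {"role": "system", "content": "You are a weather assistant with a get_weather tool."},
    {"role": "user", "content": "What is the temperature in Toronto?"},
    {"role": "assistant", "content": 'Thought: I need current weather data.\nAction: {"action": "get_weather", "action_input": {"location": "Toronto"}}'},
]

# raw result returned by the tool call
observation = "Sunny, 22°C"

# feed it back as context so the next Think step can use it
messages.append({"role": "user", "content": f"Observation: {observation}"})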

Type of Observation and Examples

  1. System Feedback: Error messages, success notifications, or status codes.
  2. Data Changes: Updates in the database, modifications to the file system, or changes in state.
  3. Environmental Data: Readings from sensors, system metrics, or resource usage information.
  4. Response Analysis: Responses from APIs, query results, or outputs from computations.
  5. Time-based Events: Completion of scheduled tasks or milestones reached, such as deadlines.

Both tutorials may seem technical, but they offer a useful overview of the potential of AI agents. In the next blog post, we will discuss an implementation in public transit.

Getting Started with AI Agents - Part I

It's hard to have a conversation about AI these days without bringing up tools like ChatGPT, DeepSeek, and the like, right? These tools are becoming key players in how we analyze and interact with public transport. Are you ready for this shift?

In this post, let's chat a bit about AI Agents and how we can tailor LLMs to answer the questions that matter most to us as Transit Data Scientists.

Before diving into more complex applications, let’s take a moment to introduce the topic. This way, we can get comfortable with the terms and concepts we'll be working with!

This tutorial is a summary of the Hugging Face Course on Agents - Unit 1.

What’s an AI Agent, Anyway?

An AI Agent is just a smart system that uses AI to interact with its surroundings and get stuff done. It thinks, plans, and takes action (sometimes using extra tools) to complete tasks.

AI Agents can do anything we set them up for, using Tools to carry out Actions.

The Two Main Parts of an AI Agent

  1. The Brain (AI Model) - This is where all the decision-making happens. The AI figures out what to do next. Examples include Large Language Models (LLMs) like GPT-4.

  2. The Body (Tools & Capabilities) - This is what the agent actually does. Its abilities depend on the tools it has access to.

Why Use LLMs?

LLMs (Large Language Models) are the go-to choice for AI Agents because they’re great at understanding and generating text. Popular ones include GPT-4, Llama, and Gemini.

There are two ways you can use an LLM:

  • Run Locally (if your computer is powerful enough).
  • Use a Cloud/API (e.g., via Hugging Face’s API; see the sketch after this list).
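For the second option, a call to a hosted model through Hugging Face's Inference API might look like the sketch below (the model name and token are placeholders, and serverless availability varies by model):

from huggingface_hub import InferenceClient

# placeholders: use any instruct model served by the Inference API and your own HF token
client = InferenceClient(model="HuggingFaceTB/SmolLM2-1.7B-Instruct", token="hf_xxx")

response = client.chat_completion(
    messages=[{"role": "user", "content": "In one sentence, what is an AI agent?"}],
    max_tokens=100,
)
print(response.choices[0].message.content)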

System Messages: Setting the Rules

System messages (or prompts) tell the AI how it should behave. They act as guiding instructions.

system_message = {
    "role": "system",
    "content": "You are a helpful customer service agent. Always be polite and clear."
}

These messages also define what tools the AI can use and how it should format its responses.

Conversations: How AI Talks to Users

A conversation is just back-and-forth messages between a user and the AI. Chat templates help keep things organized and make sure the AI remembers what’s going on.

Example:

conversation = [
    {"role": "user", "content": "I need help with my order"},
    {"role": "assistant", "content": "Sure! What’s your order number?"},
    {"role": "user", "content": "ORDER-123"},
]

Chat Templates: Keeping AI Conversations Structured

Chat templates make sure LLMs correctly process messages. There are two main types of AI models:

  • Base Models: Trained on raw text to predict the next word.
  • Instruct Models: Fine-tuned to follow instructions and have conversations.

We use ChatML, a structured format for messages. The transformers library takes care of this automatically:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")
messages = conversation  # reuse the conversation defined in the previous example
rendered_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
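For an instruct model whose template follows the ChatML convention, the rendered prompt will look roughly like this (the exact special tokens and any default system message depend on the model's template); add_generation_prompt=True appends the opening tag for the assistant's reply:

<|im_start|>user
I need help with my order<|im_end|>
<|im_start|>assistant
Sure! What’s your order number?<|im_end|>
<|im_start|>user
ORDER-123<|im_end|>
<|im_start|>assistant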

What are Tools?

Tools let AI Agents do more than just text generation. A Tool is basically a function the AI can call to get things done.

Tool               What It Does
Web Search         Fetches up-to-date info from the internet.
Image Generation   Creates images from text.
Retrieval          Pulls in data from other sources.
API Interface      Connects with external APIs like GitHub or Spotify.

Why Do AI Agents Need Tools?

LLMs have a limited knowledge base (they only know what they were trained on). Tools help by allowing:

  • Real-time data fetching (e.g., checking the weather).
  • Specialized tasks (e.g., doing math, calling APIs).

Building a Simple Tool: A Calculator

Let’s create a basic calculator tool that multiplies two numbers:

def calculator(a: int, b: int) -> int:
    """Multiply two numbers."""
    return a * b

This tool includes:

  • A clear name (calculator).
  • A description (via the docstring).
  • Input and output types.

To define it as a tool, we describe it like this:

Tool Name: calculator, Description: Multiplies two numbers., Arguments: a: int, b: int, Outputs: int

Automating Tool Descriptions

Instead of writing descriptions manually, we can use Python introspection to extract details automatically. The Tool class helps manage this info.

class Tool:
    """
    A class representing a reusable piece of code (Tool).

    Attributes:
        name (str): Name of the tool.
        description (str): A textual description of what the tool does.
        func (callable): The function this tool wraps.
        arguments (list): A list of (argument name, argument type) pairs.
        outputs (str or list): The return type(s) of the wrapped function.
    """
    def __init__(self, 
                 name: str, 
                 description: str, 
                 func: callable, 
                 arguments: list,
                 outputs: str):
        self.name = name
        self.description = description
        self.func = func
        self.arguments = arguments
        self.outputs = outputs

    def to_string(self) -> str:
        """
        Return a string representation of the tool, 
        including its name, description, arguments, and outputs.
        """
        args_str = ", ".join([
            f"{arg_name}: {arg_type}" for arg_name, arg_type in self.arguments
        ])

        return (
            f"Tool Name: {self.name},"
            f" Description: {self.description},"
            f" Arguments: {args_str},"
            f" Outputs: {self.outputs}"
        )

    def __call__(self, *args, **kwargs):
        """
        Invoke the underlying function (callable) with provided arguments.
        """
        return self.func(*args, **kwargs)

Now, we can create a tool instance:

calculator_tool = Tool(
    "calculator",                   # name
    "Multiply two integers.",       # description
    calculator,                     # function to call
    [("a", "int"), ("b", "int")],   # inputs (names and types)
    "int",                          # output
)

Using a Decorator to Define Tools

A decorator makes tool creation easier:

import inspect

def tool(func):
    """
    A decorator that creates a Tool instance from the given function.
    """
    # Get the function signature
    signature = inspect.signature(func)

    # Extract (param_name, param_annotation) pairs for inputs
    arguments = []
    for param in signature.parameters.values():
        annotation_name = (
            param.annotation.__name__ 
            if hasattr(param.annotation, '__name__') 
            else str(param.annotation)
        )
        arguments.append((param.name, annotation_name))

    # Determine the return annotation
    return_annotation = signature.return_annotation
    if return_annotation is inspect._empty:
        outputs = "No return annotation"
    else:
        outputs = (
            return_annotation.__name__ 
            if hasattr(return_annotation, '__name__') 
            else str(return_annotation)
        )

    # Use the function's docstring as the description (default if None)
    description = func.__doc__ or "No description provided."

    # The function name becomes the Tool name
    name = func.__name__

    # Return a new Tool instance
    return Tool(
        name=name, 
        description=description, 
        func=func, 
        arguments=arguments, 
        outputs=outputs
    )

Now, we can define tools like this:

@tool
def calculator(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b

print(calculator.to_string())
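The printed output matches the description format we wrote by hand earlier: Tool Name: calculator, Description: Multiply two integers., Arguments: a: int, b: int, Outputs: int.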

This makes it easy for AI Agents to recognize and use tools based on text input.

Recap

  • AI Agents use AI models to interact and make decisions.
  • LLMs handle language understanding and text generation.
  • System Messages define the agent’s behavior.
  • Tools extend an AI’s capabilities beyond text generation.
  • Chat Templates format conversations properly.
  • Tools help AI Agents fetch real-time data and execute tasks.

By combining all these pieces, you can build smart AI Agents that think, act, and assist like pros!

In the next tutorial, we will discuss the AI Agents workflow.