
This article shows how to build an AI agent on top of Gemini 3 from scratch. The core of an agent is surprisingly simple: an LLM, executable tools, context/memory, and a continuous loop. Step by step, it progresses from basic text generation to a fully functional CLI agent.

Practical Guide on how to build an Agent from scratch with Gemini 3

Building an agent seems complicated: when you watch an AI agent edit multiple files, run commands, handle errors, and iteratively solve a problem, it feels like magic. But it isn't. The secret to building an agent is that there is no secret.

The core of an Agent is surprisingly simple: It is a Large Language Model (LLM) running in a loop, equipped with tools it can choose to use.

If you can write a loop in Python, you can build an agent. This guide will walk you through the process, from a simple API call to a functioning CLI Agent.

What actually is an Agent?

Traditional software workflows are prescriptive and follow predefined paths (Step A -> Step B -> Step C). An agent, by contrast, is a system that uses an LLM to dynamically decide the control flow of an application to achieve a user goal.

An agent generally consists of these core components:

  1. The Model (Brain): The reasoning engine, in our case a Gemini model. It reasons through ambiguity, plans steps, and decides when it needs outside help.
  2. Tools (Hands and Eyes): Functions the agent can execute to interact with the outside world/environment (e.g., searching the web, reading a file, calling an API).
  3. Context/Memory (Workspace): The information the agent has access to at any moment. Managing this effectively is known as Context Engineering.
  4. The Loop (Life): A while loop that allows the model to: Observe → Think → Act → Observe again, until the task is complete.


"The Loop" of nearly every agent is an iterative process:

  1. Define Tools: You describe your available tools (e.g., get_weather) to the model using a structured JSON format.
  2. Call the LLM: You send the user's prompt and the tool definitions to the model.
  3. Model Decision: The model analyzes the request. If a tool is needed, it returns a structured tool-use request containing the tool name and arguments.
  4. Execute Tool (Client Responsibility): The client/application code intercepts this request, executes the actual code or API call, and captures the result.
  5. Respond and Iterate: You send the result (the tool response) back to the model. The model uses this new information to decide the next step, either calling another tool or generating the final response.
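The five steps above can be sketched as a minimal Python loop. To keep the sketch runnable without an API key, the LLM is mocked out (`mock_llm`) and `get_weather` is a stubbed tool; both names are illustrative, not part of any SDK.

```python
def mock_llm(history):
    # Stand-in for the model: request the weather tool on the first turn,
    # then produce a final answer once a tool result appears in the history.
    if any(msg["role"] == "tool" for msg in history):
        return {"type": "text", "text": "It is sunny in Berlin."}
    return {"type": "tool_call", "name": "get_weather", "args": {"city": "Berlin"}}

def get_weather(city: str) -> str:
    return f"sunny in {city}"  # stubbed tool implementation

tools = {"get_weather": get_weather}

def run_agent(prompt: str) -> str:
    history = [{"role": "user", "content": prompt}]     # 1. tools defined above
    while True:
        decision = mock_llm(history)                    # 2. call the LLM
        if decision["type"] == "tool_call":             # 3. model decision
            result = tools[decision["name"]](**decision["args"])  # 4. execute tool
            history.append({"role": "tool", "content": result})   # 5. respond & iterate
        else:
            return decision["text"]                     # task complete

print(run_agent("What's the weather in Berlin?"))
```

Swapping `mock_llm` for a real API call is essentially all the rest of this guide does.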

Building an Agent

Let's build an agent step-by-step, progressing from basic text generation to a functional CLI agent using Gemini 3 Pro and Python SDK.

Prerequisites: Install the SDK (pip install google-genai) and set your GEMINI_API_KEY environment variable (Get it in AI Studio).

Step 1: Basic Text Generation and Abstraction

The first step is to create a baseline interaction with the LLM, in our case Gemini 3 Pro. We are going to create a simple Agent class abstraction to structure our code, which we will extend throughout this guide. We start with a simple chatbot that maintains a conversation history.

from google import genai
from google.genai import types

class Agent:
    def __init__(self, model: str):
        self.model = model
        self.client = genai.Client()
        self.contents = []

    def run(self, contents: str):
        self.contents.append({"role": "user", "parts": [{"text": contents}]})

        response = self.client.models.generate_content(model=self.model, contents=self.contents)
        self.contents.append(response.candidates[0].content)

        return response

agent = Agent(model="gemini-3-pro-preview")

response1 = agent.run(
    contents="Hello, What are top 3 cities in Germany to visit? Only return the names of the cities.")
print(f"Model: {response1.text}")
# Output: Berlin, Munich, Cologne

response2 = agent.run(
    contents="Tell me something about the second city.")
print(f"Model: {response2.text}")
# Output: Munich is the capital of Bavaria and is known for its Oktoberfest.

This is not an agent yet. It is a standard chatbot: it maintains state but cannot take action; it has no "hands or eyes".

Step 2: Giving it Hands & Eyes (Tool Use)

To start turning this into an agent, we need Tool Use (also called Function Calling). We provide the agent with tools, which requires defining both the implementation (the Python code) and the definition (the schema the LLM sees). If the LLM believes a tool will help solve the user's prompt, it returns a structured request to call that function instead of plain text.

We are going to create three tools: read_file, write_file, and list_dir. A tool definition is a JSON schema that specifies the name, description, and parameters of the tool.

Best Practice: Use the description fields to explain when and how to use the tool. The model relies heavily on them, so be explicit and clear.

import os

read_file_definition = {
    "name": "read_file",
    "description": "Reads a file and returns its contents.",
    "parameters": {
        "type": "object",
        "properties": {
            "file_path": {
                "type": "string",
                "description": "Path to the file to read.",
            }
        },
        "required": ["file_path"],
    },
}

list_dir_definition = {
    "name": "list_dir",
    "description": "Lists the contents of a directory.",
    "parameters": {
        "type": "object",
        "properties": {
            "directory_path": {
                "type": "string",
                "description": "Path to the directory to list.",
            }
        },
        "required": ["directory_path"],
    },
}

write_file_definition = {
    "name": "write_file",
    "description": "Writes a file with the given contents.",
    "parameters": {
        "type": "object",
        "properties": {
            "file_path": {
                "type": "string",
                "description": "Path to the file to write.",
            },
            "contents": {
                "type": "string",
                "description": "Contents to write to the file.",
            },
        },
        "required": ["file_path", "contents"],
    },
}

def read_file(file_path: str) -> str:
    """Reads a file and returns its contents."""
    with open(file_path, "r") as f:
        return f.read()

def write_file(file_path: str, contents: str) -> bool:
    """Writes a file with the given contents."""
    with open(file_path, "w") as f:
        f.write(contents)
    return True

def list_dir(directory_path: str) -> list[str]:
    """Lists the contents of a directory."""
    full_path = os.path.expanduser(directory_path)
    return os.listdir(full_path)

file_tools = {
    "read_file": {"definition": read_file_definition, "function": read_file},
    "write_file": {"definition": write_file_definition, "function": write_file},
    "list_dir": {"definition": list_dir_definition, "function": list_dir},
}

Now we integrate the tools and function calls into our Agent class.

from google import genai
from google.genai import types

class Agent:
    def __init__(self, model: str, tools: dict):
        self.model = model
        self.client = genai.Client()
        self.contents = []
        self.tools = tools

    def run(self, contents: str):
        self.contents.append({"role": "user", "parts": [{"text": contents}]})

        config = types.GenerateContentConfig(
            tools=[types.Tool(function_declarations=[tool["definition"] for tool in self.tools.values()])],
        )

        response = self.client.models.generate_content(model=self.model, contents=self.contents, config=config)
        self.contents.append(response.candidates[0].content)

        return response

agent = Agent(model="gemini-3-pro-preview", tools=file_tools)

response = agent.run(
    contents="Can you list my files in the current directory?")
print(response.function_calls)
# Output: [FunctionCall(name='list_dir', arguments={'directory_path': '.'})]

Great! The model has successfully called the tool. Now we need to add the tool execution logic to our Agent class and loop the result back to the model.

Step 3: Closing the Loop (The Agent)

An agent isn't about generating one tool call. It is about generating a series of tool calls, returning the results to the model, generating the next call, and so on until the task is complete.

The Agent class handles the core loop: intercepting the FunctionCall, executing the tool on the client side, and sending back the FunctionResponse. We also add a system instruction to shape the model's behavior.

Note: Gemini 3 uses Thought signatures to maintain reasoning context across API calls. You must return these signatures back to the model in your request exactly as they were received.
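The safe pattern for this is to append the model's returned turn to your history verbatim, parts and all, rather than rebuilding it. A plain-dict sketch of the principle, with no API call; the signature string here is a made-up placeholder, not a real token:

```python
# The model's turn as received from the API, including an opaque signature
# attached to a part. Append it to history unchanged.
model_turn = {
    "role": "model",
    "parts": [
        {
            "function_call": {"name": "list_dir", "args": {"directory_path": "."}},
            "thought_signature": "opaque-token-from-the-api",  # hypothetical value
        }
    ],
}

history = [{"role": "user", "parts": [{"text": "List my files"}]}]
history.append(model_turn)  # verbatim: do not strip or rewrite the parts

# The signature survives for the next request:
print(history[-1]["parts"][0]["thought_signature"])
```

This is exactly what `self.contents.append(response.candidates[0].content)` achieves in our Agent class.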

# ... Code for the tools and tool definitions from Step 2 should be here ...

from google import genai
from google.genai import types

class Agent:
    def __init__(self, model: str, tools: dict, system_instruction: str = "You are a helpful assistant."):
        self.model = model
        self.client = genai.Client()
        self.contents = []
        self.tools = tools
        self.system_instruction = system_instruction

    def run(self, contents: str | list[dict]):
        if isinstance(contents, list):
            self.contents.append({"role": "user", "parts": contents})
        else:
            self.contents.append({"role": "user", "parts": [{"text": contents}]})

        config = types.GenerateContentConfig(
            system_instruction=self.system_instruction,
            tools=[types.Tool(function_declarations=[tool["definition"] for tool in self.tools.values()])],
        )

        response = self.client.models.generate_content(model=self.model, contents=self.contents, config=config)
        self.contents.append(response.candidates[0].content)

        if response.function_calls:
            functions_response_parts = []
            for tool_call in response.function_calls:
                print(f"[Function Call] {tool_call}")

                if tool_call.name in self.tools:
                    result = {"result": self.tools[tool_call.name]["function"](**tool_call.args)}
                else:
                    result = {"error": "Tool not found"}

                print(f"[Function Response] {result}")
                functions_response_parts.append({"functionResponse": {"name": tool_call.name, "response": result}})

            return self.run(functions_response_parts)

        return response

agent = Agent(
    model="gemini-3-pro-preview",
    tools=file_tools,
    system_instruction="You are a helpful Coding Assistant. Respond like you are Linus Torvalds.",
)

response = agent.run(
    contents="Can you list my files in the current directory?")
print(response.text)
# Output: [Function Call] id=None args={'directory_path': '.'} name='list_dir'
# [Function Response] {'result': ['.venv', ... ]}
# There. Your current directory contains: `LICENSE`,

Congratulations. You just built your first functioning agent.

Step 4: Multi-turn CLI Agent

Now we can run our agent in a simple CLI loop. It takes surprisingly little code to create highly capable behavior.

# ... Code for the Agent, tools and tool definitions from Step 3 should be here ...

agent = Agent(
    model="gemini-3-pro-preview",
    tools=file_tools,
    system_instruction="You are a helpful Coding Assistant. Respond like you are Linus Torvalds.",
)

print("Agent ready. Ask it to check files in this directory.")
while True:
    user_input = input("You: ")
    if user_input.lower() in ["exit", "quit"]:
        break

    response = agent.run(user_input)
    print(f"Linus: {response.text}\n")

Best Practices for Engineering Agents

Building the loop is easy; making it reliable, transparent, and controllable is hard. Here are key engineering principles derived from top industry practices, grouped by functional area.

1. Tool Definition & Ergonomics

Your tools are the interface for the model. Don't just wrap your existing internal APIs. If a tool is confusing to a human, it's confusing to the model:

  • Clear Naming: Use obvious names like search_customer_database rather than cust_db_v2_query.
  • Precise Descriptions: Gemini reads the function docstrings to understand when and how to use a tool. Spend time writing these carefully; it is essentially "prompt engineering" for tools.
  • Return Meaningful Errors: Don't return a 50-line Java stack trace. If a tool fails, return a clear string like Error: File not found. Did you mean 'data.csv'?. This allows the agent to self-correct.
  • Tolerate Fuzzy Inputs: If a model frequently guesses file paths wrong, update your tool to handle relative paths or fuzzy inputs rather than just erroring out.

2. Context Engineering

Models have a finite "attention budget." Managing what information enters the context is crucial for performance and cost.

  • Don't "Dump" Data: Don't have a tool that returns an entire 10MB database table. Instead of get_all_users(), create search_users(query: str).
  • Just-in-time Loading: Instead of pre-loading all data (traditional RAG), use just-in-time strategies. The agent should maintain lightweight identifiers (file paths, IDs) and use tools to dynamically load content only when needed.
  • Compression: For very long-running agents, summarize the history, remove old context, or start a new session.
  • Agentic Memory: Allow the agent to maintain notes or a scratchpad persisted outside the context window, pulling them back in only when relevant.
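The "search, don't dump" idea from the first bullet can be sketched in a few lines. The in-memory `USERS` table and the `search_users` helper are illustrative stand-ins for a real database-backed tool:

```python
# Instead of a get_all_users() tool that floods the context window,
# expose a narrow, bounded search.
USERS = [
    {"id": 1, "name": "Ada Lovelace", "email": "ada@example.com"},
    {"id": 2, "name": "Alan Turing", "email": "alan@example.com"},
    {"id": 3, "name": "Grace Hopper", "email": "grace@example.com"},
]

def search_users(query: str, limit: int = 3) -> list[dict]:
    """Return at most `limit` users whose name or email contains `query`."""
    q = query.lower()
    hits = [u for u in USERS if q in u["name"].lower() or q in u["email"].lower()]
    return hits[:limit]  # bounded output keeps the context small

print(search_users("ada"))
```

The `limit` parameter is the key design choice: the tool, not the model, enforces how much data can enter the context per call.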

3. Don't Over-Engineer

It's tempting to build complex multi-agent systems. Don't.

  • Maximize a Single Agent First: Don't immediately build complex multi-agent systems. Gemini is highly capable of handling dozens of tools in a single prompt.
  • Escape Hatches: Ensure loops can always terminate, e.g., with a max_iterations break (say, 15 turns).
  • Guardrails and System Instructions: Use the system_instruction to give the model hard rules (e.g., "You are strictly forbidden from offering refunds greater than $50"), or use an external classifier.
  • Human-in-the-loop: For sensitive actions (like send_email or execute_code), pause the loop and require user confirmation before the tool is actually executed.
  • Prioritize Transparency and Debugging: Log tool calls and parameters. Analyzing the model's reasoning helps you identify issues and improve the agent over time.
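Two of these guardrails, the escape hatch and the human-in-the-loop gate, can be sketched together. The model and tool executor are passed in as callables so the snippet runs standalone; names like `ask_user` are illustrative, not from any library.

```python
SENSITIVE_TOOLS = {"send_email", "execute_code"}
MAX_ITERATIONS = 15

def agent_loop(call_model, execute_tool, ask_user, prompt: str) -> str:
    history = [prompt]
    for _ in range(MAX_ITERATIONS):            # escape hatch: never loop forever
        decision = call_model(history)
        if decision["type"] == "final":
            return decision["text"]
        tool = decision["name"]
        if tool in SENSITIVE_TOOLS and not ask_user(f"Allow '{tool}'?"):
            history.append(f"User denied the '{tool}' call.")
            continue                           # let the model re-plan
        history.append(execute_tool(tool, decision.get("args", {})))
    return "Stopped: iteration limit reached."  # surface the cutoff explicitly
```

Note that a denied tool call is fed back into the history rather than raising an error, so the model can propose an alternative on its next turn.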

Conclusion

Building an agent is no longer magic; it is a practical engineering task. As we've shown, you can build a working prototype in under 100 lines of code. While understanding these fundamentals is key, don't get bogged down re-engineering the same pattern over and over. The AI community has created fantastic open-source libraries that can help you build more complex and robust agents faster.

