Introduction
Lately, I find myself using AI agents like Claude Code and Cursor frequently in my work. From knowledge gathered here and there, I had a rough high-level understanding of how these programs work, but I had never actually dug into the code itself. Most agents I’ve encountered are so feature-rich and mature that their codebases are unnecessarily complex for learning purposes.
Then I came across a very basic AI agent called gemini-writer, a tool for writing novels and short stories. The code was simpler than I expected, yet it seemed to capture the essence of what an AI agent is. So I decided to analyze it.
This article is a record of walking through that code, exploring how an AI agent actually communicates with an LLM and how it performs tasks step by step.
Tool-Using Capability
In one sentence, an AI agent can be defined as: A system that enables an LLM to use tools.
When the model decides “I need to create a file to complete this task,” a function that actually creates the file is called. When it thinks “I should add content here,” that content is actually written to a file.
Looking at the gemini-writer code, this structure becomes clear.
# Part of the main loop in writer.py
response = client.models.generate_content(
    model=MODEL_NAME,
    contents=contents,
    config=generate_config,
)

# The model's reply is the first candidate's content
model_content = response.candidates[0].content

# Extract function calls from the model response
function_calls_list = []
for part in model_content.parts:
    if hasattr(part, 'function_call') and part.function_call:
        fc = part.function_call
        function_calls_list.append({
            "name": fc.name,
            "args": dict(fc.args) if fc.args else {}
        })

# Execute detected tool calls
for fc in function_calls_list:
    func_name = fc["name"]
    args = fc["args"]
    tool_func = tool_map.get(func_name)
    result = tool_func(**args)
Let’s follow the flow:
- The client sends the current situation (contents) and the list of available tools (tools) to the LLM.
- The LLM assesses the situation and, if it concludes “I need to create a file now,” includes a function_call in its response.
- The agent parses this function_call and executes the corresponding Python function.
- The execution result is sent back to the LLM.
- The LLM decides the next action based on that result.
This flow is the basic operating principle of an agent.
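Step 4 deserves a closer look, since it is what closes the loop. Below is a minimal sketch of how a tool result might be packaged and appended to contents using the google-genai SDK; func_name and result come from the loop above, and the exact packaging code in gemini-writer may differ.

# A minimal sketch, assuming the google-genai SDK: wrap the tool result in
# a function-response part and append it to `contents` so the LLM sees it
# on the next iteration. gemini-writer's actual packaging may differ.
from google.genai import types

function_response_part = types.Part.from_function_response(
    name=func_name,               # the tool that was just executed
    response={"result": result},  # the string the tool returned
)

# Function responses travel back under the "user" role, so to the model
# they arrive like any other input.
contents.append(types.Content(role="user", parts=[function_response_part]))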
Tool Definitions
The interesting point is that the LLM doesn’t actually execute those functions. The LLM only receives a description saying “these tools are available” and decides when and what to use.
gemini-writer provides three tools:
# From utils.py
def get_tool_definitions() -> types.Tool:
    return types.Tool(
        function_declarations=[
            types.FunctionDeclaration(
                name="create_project",
                description="Creates a new project folder in the 'output' directory...",
                parameters=types.Schema(...)
            ),
            types.FunctionDeclaration(
                name="write_file",
                description="Writes content to a markdown file...",
                parameters=types.Schema(...)
            ),
            types.FunctionDeclaration(
                name="compress_context",
                description="INTERNAL TOOL - Automatically called when token limit approached...",
                parameters=types.Schema(...)
            )
        ]
    )
Each tool has a name, a description, and a specification of the parameters it accepts. When this definition is shown to the LLM, the model understands “Ah, if I want to write a file, I should call write_file and pass filename and content.”
It’s like handing the LLM a tool user manual.
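The elided parameters fields are where that manual gets specific. The real schemas in utils.py are not shown above, but a plausible reconstruction of the one for write_file, with fields matching the implementation we’ll see next, might look like this:

# A plausible reconstruction of the write_file parameter schema; the real
# one in utils.py is elided above. Field names match write_file_impl below.
parameters = types.Schema(
    type=types.Type.OBJECT,
    properties={
        "filename": types.Schema(
            type=types.Type.STRING,
            description="Name of the markdown file to write",
        ),
        "content": types.Schema(
            type=types.Type.STRING,
            description="Text content to write into the file",
        ),
        "mode": types.Schema(
            type=types.Type.STRING,
            enum=["create", "append", "overwrite"],
            description="create fails if the file exists; append adds to it; overwrite replaces it",
        ),
    },
    required=["filename", "content", "mode"],
)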
Actual Tool Implementation
Meanwhile, the functions that actually do the work exist separately.
# From tools/writer.py
import os
from typing import Literal

def write_file_impl(filename: str, content: str, mode: Literal["create", "append", "overwrite"]) -> str:
    project_folder = get_active_project_folder()
    if not project_folder:
        return "Error: No active project folder..."

    file_path = os.path.join(project_folder, filename)

    if mode == "create":
        if os.path.exists(file_path):
            return f"Error: File '{filename}' already exists..."
        with open(file_path, 'w', encoding='utf-8') as f:
            f.write(content)
        return f"Successfully created file '{filename}'..."
When the LLM requests a write_file call, the agent finds and executes this write_file_impl function. Then it converts the result to a string and sends it back to the LLM.
The important point here is that the LLM doesn’t actually touch the file system. The LLM only declares “I will create a file,” and the Python code handles the actual work.
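The link between the declared name write_file and the function write_file_impl is just a dictionary lookup. The dict’s exact contents aren’t shown in the loop excerpt, but given the three tools, tool_map plausibly looks like this (create_project_impl is my guess by analogy with the other two names, which do appear in the source):

# A sketch of the name-to-function dispatch table used by the main loop.
# create_project_impl is assumed by analogy; the other two names appear in
# the gemini-writer source.
tool_map = {
    "create_project": create_project_impl,
    "write_file": write_file_impl,
    "compress_context": compress_context_impl,
}

# Dispatch: look up the implementation the LLM asked for and call it with
# the arguments the LLM supplied.
tool_func = tool_map.get(func_name)
result = tool_func(**args) if tool_func else f"Error: unknown tool '{func_name}'"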
Think -> Act -> Observe
Looking at gemini-writer’s main loop, the working mechanism of an AI agent becomes clear at a glance.
1. Add user input to contents
2. Send contents and tools to LLM and receive response
3. Check if there are function calls in the response
4. If there are function calls:
- Execute the corresponding Python function
- Package the execution result as a function response
- Add the result to contents (as user input format)
5. If there are no function calls:
- Consider the task complete and exit the loop
6. Check token count, compress if necessary
7. Go back to step 2 and repeat
In gemini-writer, this loop repeats up to 300 times. Each pass through the loop, the agent thinking once and acting once, is called an iteration.
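Condensed into code, the loop’s skeleton looks roughly like the following. This is a simplified sketch of the seven steps above, not a verbatim excerpt; extract_function_calls, build_function_response, and compress are stand-ins for logic that writer.py spells out inline.

# A simplified sketch of the main loop, not a verbatim excerpt from
# writer.py. The three helpers are stand-ins for inline logic.
MAX_ITERATIONS = 300

contents.append(user_message)                          # step 1

for iteration in range(MAX_ITERATIONS):
    response = client.models.generate_content(         # step 2: think
        model=MODEL_NAME, contents=contents, config=generate_config,
    )

    function_calls = extract_function_calls(response)  # step 3

    if not function_calls:                             # step 5: done
        break

    for fc in function_calls:                          # step 4: act
        result = tool_map[fc["name"]](**fc["args"])
        contents.append(build_function_response(fc["name"], result))

    token_count = estimate_token_count(client, MODEL_NAME, contents)
    if token_count >= COMPRESSION_THRESHOLD:           # step 6: compress
        contents = compress(contents)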
Context Management
One of the most important aspects when designing an AI agent is context management. LLMs have a limit on the number of tokens they can process at once. gemini-writer uses the Gemini model, which supports up to 1 million tokens. That seems generous, but during long tasks, you’ll eventually hit the limit.
That’s why gemini-writer is equipped with an automatic compression feature.
# From writer.py
TOKEN_LIMIT = 1000000
COMPRESSION_THRESHOLD = 900000 # Start compression at 90%
# In the main loop
token_count = estimate_token_count(client, MODEL_NAME, contents)
if token_count >= COMPRESSION_THRESHOLD:
    compression_result = compress_context_impl(
        messages=simple_messages,
        client=client,
        model=MODEL_NAME,
        keep_recent=10
    )
When the token count exceeds 900,000, it summarizes and compresses past conversations. The method keeps the 10 most recent messages as-is and replaces everything before that with a single summary.
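Counting tokens does not require a full generation call. The helper itself isn’t shown in the excerpts, but a plausible implementation of estimate_token_count uses the SDK’s count_tokens endpoint:

# A plausible implementation of estimate_token_count; the real helper in
# writer.py is not shown. count_tokens returns a total without generating.
def estimate_token_count(client, model_name, contents) -> int:
    result = client.models.count_tokens(model=model_name, contents=contents)
    return result.total_tokens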
The compression function calls the LLM again to create the summary.
# From tools/compression.py
summary_prompt = """Please provide a comprehensive summary of the conversation history below. Include:
1. The main task or goal discussed
2. Key decisions made
3. Files created and their purposes
4. Progress made so far
5. Any important context for continuing the work
"""
summary_response = client.models.generate_content(
    model=model,
    contents=contents,
    config=types.GenerateContentConfig(temperature=0.7)
)
In other words, the LLM summarizes past conversations, and then continues the next task based on that summary.
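Once the summary comes back, the history is presumably reassembled as one synthetic message carrying the summary, followed by the untouched recent messages. A sketch, assuming that shape (the actual reassembly code in tools/compression.py is not shown):

# A sketch of the reassembly, assuming a summary-plus-recents shape; the
# actual code in tools/compression.py is not shown. keep_recent=10 matches
# the call in the main loop.
keep_recent = 10
recent_messages = messages[-keep_recent:]   # last 10 messages, kept verbatim
summary_text = summary_response.text        # the LLM-written summary

compressed = [
    {"role": "user", "content": f"[CONTEXT SUMMARY]\n\n{summary_text}"},
] + recent_messages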
Recovery Mode
gemini-writer also supports a recovery mode. If the run is interrupted or the token limit is reached, the agent saves its current state to a file.
./my_project/.context_summary_20260406_222702.md
This file contains a summary of the conversation up to that point. When restarted with this file next time, the agent can restore the previous state and continue the work.
# From writer.py
if args.recover:
    context = load_context_from_file(args.recover)
    initial_message = f"[RECOVERED CONTEXT]\n\n{context}\n\nPlease continue..."
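The loading side is presumably just a file read. A minimal sketch, assuming load_context_from_file does nothing beyond returning the saved markdown:

# A minimal sketch of load_context_from_file, assuming it simply returns
# the saved summary file's text; the real helper is not shown.
def load_context_from_file(path: str) -> str:
    with open(path, 'r', encoding='utf-8') as f:
        return f.read()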