AI & Machine Learning

Building Agentic Workflows with Anthropic's Claude: A Production Guide

How we use Anthropic's Claude API to power agentic workflows at Modelia.ai — from tool use and extended thinking to structured outputs and multi-turn agents. Practical patterns for building reliable AI agents with Claude.

Harsh Rastogi
Mar 10, 2026 · 16 min
Agentic AI · Anthropic · Claude · AI Systems · TypeScript · LLM

Why We Chose Anthropic's Claude for Agentic Workloads

When we evaluated LLM providers for Modelia.ai's agentic workflows, the decision came down to three factors: tool use reliability, instruction following, and safety. Anthropic's Claude consistently outperformed alternatives on all three, and after months of production usage, I can explain exactly why.

Claude's tool use implementation is exceptional. When you define tools with clear JSON schemas, Claude doesn't just call them — it reasons about *which* tool to use, *why*, and *what parameters* to pass. In our fashion workflow agents, Claude correctly selects between 12 different tools with 95%+ accuracy, compared to ~80% we saw with other models. The difference in production reliability is night and day.

But the real unlock for agentic systems is Claude's extended thinking capability. When you enable it, Claude shows its chain of thought before acting — allowing you to audit decisions, debug failures, and understand *why* the agent chose a particular path. For high-stakes workflows like automated quality control on AI-generated fashion images, this transparency is essential.

Setting Up Claude for Agentic Tool Use

Anthropic's API has first-class support for tool use. Here's how we set up our agent's backbone at Modelia.ai:

typescript
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

// Define tools with precise descriptions — Claude is very responsive to description quality
const tools: Anthropic.Tool[] = [
  {
    name: 'generate_fashion_image',
    description: 'Generate an AI fashion image with a specific model, outfit, pose, and background. Use this when the workflow requires a new image to be created. Returns an image URL and generation metadata.',
    input_schema: {
      type: 'object',
      properties: {
        model_id: {
          type: 'string',
          description: 'The AI model ID to use for the fashion shoot',
        },
        outfit_description: {
          type: 'string',
          description: 'Detailed description of the outfit including colors, fabrics, and style',
        },
        pose: {
          type: 'string',
          enum: ['standing', 'walking', 'seated', 'editorial', 'candid'],
          description: 'The pose style for the model',
        },
        background: {
          type: 'string',
          description: 'Background setting description (e.g., "minimalist white studio", "urban street")',
        },
        aspect_ratio: {
          type: 'string',
          enum: ['1:1', '4:5', '9:16', '16:9'],
          description: 'Output image aspect ratio',
        },
      },
      required: ['model_id', 'outfit_description', 'pose'],
    },
  },
  {
    name: 'evaluate_image_quality',
    description: 'Run quality checks on a generated image. Evaluates resolution, composition, brand guideline compliance, and visual artifacts. Returns a score (0-100) and specific feedback.',
    input_schema: {
      type: 'object',
      properties: {
        image_url: { type: 'string', description: 'URL of the image to evaluate' },
        brand_guidelines: { type: 'string', description: 'Brand style guidelines to check against' },
      },
      required: ['image_url'],
    },
  },
  {
    name: 'publish_to_catalog',
    description: 'Publish a finalized image to the product catalog. This is a high-stakes action — only use after quality checks pass. Returns the catalog entry ID.',
    input_schema: {
      type: 'object',
      properties: {
        image_url: { type: 'string' },
        product_id: { type: 'string' },
        metadata: {
          type: 'object',
          properties: {
            title: { type: 'string' },
            tags: { type: 'array', items: { type: 'string' } },
          },
        },
      },
      required: ['image_url', 'product_id'],
    },
  },
];

The Agentic Loop with Claude

Here's the core agent loop using Claude's Messages API with tool use:

typescript
interface AgentResult {
  success: boolean;
  output: string;
  turns: number;
  thinking?: string[];
}

interface AgentConfig {
  systemPrompt: string;
  tools: Anthropic.Tool[];
  maxTurns: number;
  model: string;
  enableThinking: boolean;
}

async function runClaudeAgent(
  goal: string,
  config: AgentConfig,
  toolExecutors: Map<string, (input: unknown) => Promise<unknown>>
): Promise<AgentResult> {
  const messages: Anthropic.MessageParam[] = [
    { role: 'user', content: goal },
  ];

  for (let turn = 0; turn < config.maxTurns; turn++) {
    const response = await anthropic.messages.create({
      model: config.model,
      max_tokens: 4096,
      system: config.systemPrompt,
      tools: config.tools,
      messages,
      ...(config.enableThinking && {
        thinking: { type: 'enabled', budget_tokens: 2048 },
      }),
    });

    // Check if Claude wants to use tools
    if (response.stop_reason === 'tool_use') {
      // Add Claude's response (which includes tool_use blocks)
      messages.push({ role: 'assistant', content: response.content });

      // Execute each tool call and collect results
      const toolResults: Anthropic.ToolResultBlockParam[] = [];

      for (const block of response.content) {
        if (block.type === 'tool_use') {
          const executor = toolExecutors.get(block.name);
          if (!executor) {
            toolResults.push({
              type: 'tool_result',
              tool_use_id: block.id,
              content: `Error: Unknown tool "${block.name}"`,
              is_error: true,
            });
            continue;
          }

          try {
            const result = await executor(block.input);
            toolResults.push({
              type: 'tool_result',
              tool_use_id: block.id,
              content: JSON.stringify(result),
            });
          } catch (error) {
            toolResults.push({
              type: 'tool_result',
              tool_use_id: block.id,
              content: `Tool execution failed: ${error instanceof Error ? error.message : String(error)}`,
              is_error: true,
            });
          }
        }
      }

      messages.push({ role: 'user', content: toolResults });
    } else {
      // Claude returned a final text response — agent is done
      const textContent = response.content
        .filter((b): b is Anthropic.TextBlock => b.type === 'text')
        .map(b => b.text)
        .join('\n');

      return {
        success: true,
        output: textContent,
        turns: turn + 1,
        thinking: response.content
          .filter((b): b is Anthropic.ThinkingBlock => b.type === 'thinking')
          .map(b => b.thinking),
      };
    }
  }

  return { success: false, output: 'Max turns reached', turns: config.maxTurns };
}
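The loop above takes a `toolExecutors` map that binds each tool name to an async handler. Here's a minimal sketch of that wiring — the handler bodies are illustrative stubs, not our production services:

```typescript
// Map tool names to async executors. In production these call real
// generation, QC, and catalog services; the bodies here are stubs.
const toolExecutors = new Map<string, (input: unknown) => Promise<unknown>>([
  [
    'generate_fashion_image',
    async (input) => {
      const { outfit_description } = input as { outfit_description: string };
      // Real implementation: call the image generation service.
      return { image_url: `https://cdn.example.com/${Date.now()}.png`, outfit: outfit_description };
    },
  ],
  [
    'evaluate_image_quality',
    async (input) => {
      const { image_url } = input as { image_url: string };
      // Real implementation: run resolution/composition/brand checks.
      return { score: 87, image_url, issues: [] };
    },
  ],
  [
    'publish_to_catalog',
    async (input) => {
      const { product_id } = input as { product_id: string };
      // Real implementation: write the catalog entry.
      return { catalog_entry_id: `cat_${product_id}` };
    },
  ],
]);
```

With the map in hand, a workflow run is a single call: `await runClaudeAgent('Shoot product SKU-123 in a white studio', config, toolExecutors)`.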

Extended Thinking: The Agent Debugger

One of Anthropic's most powerful features for agentic systems is extended thinking. When enabled, Claude explicitly reasons through its decisions before acting. This is invaluable for:

  • Debugging — When an agent makes a wrong tool call, the thinking trace shows exactly why
  • Auditing — For compliance-sensitive workflows, you have a reasoning trail
  • Improvement — Patterns in thinking traces reveal where your tools or prompts need refinement
typescript
// Enable extended thinking for complex decisions
const response = await anthropic.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 16000,
  thinking: {
    type: 'enabled',
    budget_tokens: 4096, // Give Claude room to think deeply
  },
  system: `You are a fashion quality control agent. Evaluate images against brand guidelines.
    Think carefully about composition, color accuracy, and model representation before scoring.`,
  tools: qualityCheckTools,
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'Evaluate this generated image against our luxury brand guidelines.' },
        { type: 'image', source: { type: 'url', url: imageUrl } },
      ],
    },
  ],
});

// Extract thinking for audit log
const thinkingBlocks = response.content.filter(
  (b): b is Anthropic.ThinkingBlock => b.type === 'thinking'
);
await auditLog.save({
  agentAction: 'quality_check',
  reasoning: thinkingBlocks.map(b => b.thinking).join('\n'),
  decision: response.content
    .filter((b): b is Anthropic.TextBlock => b.type === 'text')
    .map(b => b.text)
    .join('\n'),
  timestamp: new Date(),
});

Structured Outputs with Claude

For agentic workflows, you need structured, parseable outputs — not free-form text. Claude handles this elegantly with tool-based structured output:

typescript
// Force structured output by defining a "result" tool
const structuredOutputTool: Anthropic.Tool = {
  name: 'submit_evaluation',
  description: 'Submit the final quality evaluation. You MUST call this tool with your evaluation results.',
  input_schema: {
    type: 'object',
    properties: {
      overall_score: { type: 'number', minimum: 0, maximum: 100 },
      composition_score: { type: 'number', minimum: 0, maximum: 100 },
      brand_compliance: { type: 'boolean' },
      issues: {
        type: 'array',
        items: {
          type: 'object',
          properties: {
            severity: { type: 'string', enum: ['critical', 'major', 'minor'] },
            description: { type: 'string' },
            suggestion: { type: 'string' },
          },
          required: ['severity', 'description'],
        },
      },
      approved_for_catalog: { type: 'boolean' },
      reasoning: { type: 'string' },
    },
    required: ['overall_score', 'brand_compliance', 'approved_for_catalog', 'reasoning'],
  },
};
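To guarantee Claude actually calls this tool rather than replying in prose, pass `tool_choice` naming the tool in the request (note that a forced `tool_choice` can't be combined with extended thinking). The structured result then comes back as a `tool_use` block, which we read out with a small helper — `extractToolInput` is our own utility, not part of the SDK. A sketch, demonstrated against a mock response payload:

```typescript
// Forcing the tool in the request looks like:
//   tool_choice: { type: 'tool', name: 'submit_evaluation' }
// The evaluation then arrives as a tool_use block in response.content.

interface EvaluationResult {
  overall_score: number;
  brand_compliance: boolean;
  approved_for_catalog: boolean;
  reasoning: string;
}

// Our helper: find the named tool call and return its input, typed.
function extractToolInput<T>(
  content: Array<{ type: string; name?: string; input?: unknown }>,
  toolName: string
): T {
  const block = content.find(b => b.type === 'tool_use' && b.name === toolName);
  if (!block) throw new Error(`Expected a ${toolName} tool call in the response`);
  return block.input as T;
}

// Demonstrated on a mock response.content payload:
const mockContent = [
  {
    type: 'tool_use',
    name: 'submit_evaluation',
    input: {
      overall_score: 88,
      brand_compliance: true,
      approved_for_catalog: true,
      reasoning: 'Composition and palette match the guidelines.',
    },
  },
];
const evaluation = extractToolInput<EvaluationResult>(mockContent, 'submit_evaluation');
```

Because the schema is enforced at the tool boundary, downstream code can consume `evaluation` without defensive parsing of free-form text.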

Prompt Engineering for Agentic Claude

Claude responds exceptionally well to clear role definitions and explicit behavioral constraints. Here's the system prompt pattern we use:

typescript
const agentSystemPrompt = `You are a fashion workflow automation agent at Modelia.ai.

## Your Role
You orchestrate AI fashion image generation workflows. You plan shoots, generate images,
run quality checks, and iterate until the output meets brand standards.

## Rules
1. ALWAYS run quality checks before publishing to catalog
2. If quality score is below 80, regenerate with adjusted parameters — do NOT publish
3. Maximum 3 regeneration attempts per image. After 3 failures, escalate to human review
4. Use extended thinking to reason through creative decisions
5. When multiple images are needed, generate them in parallel when possible

## Tool Usage Guidelines
- generate_fashion_image: Use for creating new images. Be specific with outfit descriptions.
- evaluate_image_quality: Use AFTER every generation. Never skip quality checks.
- publish_to_catalog: ONLY after quality score >= 80 and brand_compliance is true.
- escalate_to_human: When you've exhausted retries or face ambiguous brand guidelines.

## Output Format
After completing a workflow, summarize: images generated, quality scores, published items,
and any escalations. Be concise.`;

Error Handling and Retry Patterns

Claude's API can return rate-limit, overload, or timeout errors. In an agentic loop, you need robust retry logic:

typescript
async function callClaudeWithRetry(
  params: Anthropic.MessageCreateParams,
  maxRetries: number = 3
): Promise<Anthropic.Message> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await anthropic.messages.create(params);
    } catch (error) {
      if (error instanceof Anthropic.RateLimitError) {
        const waitMs = Math.pow(2, attempt) * 1000 + Math.random() * 1000;
        console.log(`Rate limited. Retrying in ${waitMs}ms (attempt ${attempt + 1})`);
        await new Promise(resolve => setTimeout(resolve, waitMs));
        continue;
      }

      if (error instanceof Anthropic.APIError && error.status === 529) {
        // Overloaded — back off more aggressively
        const waitMs = Math.pow(2, attempt) * 5000;
        await new Promise(resolve => setTimeout(resolve, waitMs));
        continue;
      }

      throw error; // Non-retryable error
    }
  }

  throw new Error('Max retries exceeded for Claude API call');
}

Monitoring Agent Performance

We track these metrics for every Claude-powered agent run at Modelia.ai:

| Metric | What It Tells You |
| --- | --- |
| Tool call accuracy | % of tool calls that succeed without error |
| Turns per task | How efficiently the agent completes goals |
| Thinking token usage | Whether thinking budgets are adequate |
| Quality score distribution | Whether the agent is producing good outputs |
| Escalation rate | How often the agent needs human help |
| Cost per workflow | Total API cost (input + output + thinking tokens) |
typescript
interface AgentMetrics {
  workflowId: string;
  model: string;
  totalTurns: number;
  toolCallCount: number;
  toolErrorCount: number;
  thinkingTokensUsed: number;
  totalInputTokens: number;
  totalOutputTokens: number;
  estimatedCost: number;
  qualityScore: number | null;
  escalated: boolean;
  durationMs: number;
}

// Log after every agent run
async function logAgentMetrics(metrics: AgentMetrics) {
  await analytics.track('agent_workflow_complete', metrics);

  // Alert if cost is unexpectedly high
  if (metrics.estimatedCost > COST_THRESHOLD) {
    await alerting.notify(`Agent workflow ${metrics.workflowId} cost $${metrics.estimatedCost.toFixed(2)}`);
  }
}
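To populate `estimatedCost`, multiply the token counts from the API's `usage` field by per-token prices. A minimal sketch — the prices below are illustrative placeholders, not Anthropic's actual rates (check current pricing), and thinking tokens are billed as output tokens:

```typescript
// Illustrative per-million-token prices — NOT official Anthropic pricing.
const PRICE_PER_MTOK = { input: 3.0, output: 15.0 };

interface UsageTotals {
  totalInputTokens: number;
  totalOutputTokens: number; // thinking tokens are billed as output
}

// Convert accumulated token counts into an estimated dollar cost.
function estimateCost(usage: UsageTotals): number {
  const inputCost = (usage.totalInputTokens / 1_000_000) * PRICE_PER_MTOK.input;
  const outputCost = (usage.totalOutputTokens / 1_000_000) * PRICE_PER_MTOK.output;
  return inputCost + outputCost;
}

// e.g. a workflow that used 200k input and 40k output tokens:
const cost = estimateCost({ totalInputTokens: 200_000, totalOutputTokens: 40_000 });
```

Summing per-turn `response.usage.input_tokens` and `response.usage.output_tokens` across the agent loop gives you the totals to feed in.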

Key Takeaways

  • Claude's tool use is production-ready — With well-designed tool descriptions and schemas, Claude selects and uses tools with 95%+ accuracy in our workflows
  • Extended thinking is a game-changer for agents — It provides debuggability, auditability, and reveals exactly how the agent reasons about complex decisions
  • Structured outputs via tools — Define a "submit result" tool to force Claude into structured, parseable output formats
  • System prompts matter enormously — Clear role definitions, explicit rules, and tool usage guidelines dramatically improve agentic behavior
  • Budget your tokens wisely — Extended thinking tokens, tool call overhead, and multi-turn context add up fast. Monitor costs per workflow.
  • Always implement retry logic — Rate limits and overload errors are normal at scale. Exponential backoff with jitter is essential.
  • Anthropic's safety focus benefits agents — Claude is less likely to take harmful or nonsensical actions compared to other models, which is critical for autonomous systems.

Harsh Rastogi
Full Stack Engineer

Full Stack Engineer building production AI systems at Modelia. Previously at Asynq and Bharat Electronics Limited. Published researcher.