HumanOps

Tutorial: Build an AI Agent That Delegates Real-World Tasks to Humans

HumanOps Team
Feb 10, 2026 · 12 min read

One of the most powerful capabilities emerging in 2026 is the ability for AI agents to delegate physical-world tasks to verified human operators. Instead of being limited to what they can accomplish through digital APIs, AI agents can now request real-world actions like photographing a location, verifying a delivery, inspecting equipment, or collecting documents. This tutorial walks you through building a complete AI agent that does exactly this, from setup to production deployment.

We will build an example agent that monitors restaurant review platforms, detects potential issues like reports of a closed location or health code violations, and automatically dispatches a human operator to the physical location to verify the situation and submit photographic proof. The agent will then process the verification results and take appropriate action based on the findings. This represents a realistic use case for businesses that need ground truth verification of online information.

The tutorial covers two integration methods: the HumanOps REST API for maximum flexibility and language-agnostic integration, and the HumanOps MCP server for native integration with Claude, Cursor, and other MCP-compatible AI agents. Both methods provide the same core capabilities, and the choice between them depends on your agent's architecture. All code examples are in TypeScript, but the REST API concepts apply to any programming language.

By the end of this tutorial, you will have a working agent that can create tasks, assign them to human operators, receive proof submissions, check verification results, and handle the complete task lifecycle programmatically. You will also understand how to extend this foundation with webhook integrations, batched task creation, and priority handling for time-sensitive verifications.

Prerequisites and Setup

Before starting, you need a HumanOps account with an API key. Sign up at humanops.io, complete the onboarding process, and navigate to the API settings page to generate your first key. The key will be displayed once, so store it securely in an environment variable. For this tutorial, we will use the HUMANOPS_API_KEY environment variable.

You also need Node.js version 18 or later installed on your development machine. We will use TypeScript for type safety and a better developer experience, but the REST API examples can be adapted to any language that can make HTTP requests. Create a new project directory, initialize it with npm init, and install the dependencies we will need: typescript, and tsx for running TypeScript directly. Node 18 and later ship native fetch support, so node-fetch is only needed if you are stuck on an older runtime.

For MCP server integration, you will need an MCP-compatible host like Claude Desktop, Cursor, or any application that supports the Model Context Protocol. The HumanOps MCP server is available as an npm package that runs locally on your machine and connects to the HumanOps API using your API key. We will cover the MCP configuration in a dedicated section later in this tutorial.

The HumanOps test environment provides a sandbox where tasks resolve instantly with mock operators. This means you can develop and test your entire integration without spending real USDC or waiting for actual human operators to complete tasks. When you are ready for production, switch your API key from the test key to a production key and the same code will work with real operators.

REST API Fundamentals

The HumanOps REST API is hosted at api.humanops.io and accepts and returns JSON. Authentication is handled through the X-API-Key header, which must be included in every request. The API follows REST conventions: POST for creating resources, GET for reading them, PUT for updates, and standard HTTP status codes for success and error responses.

Let us start with the most fundamental operation: creating a task. A task requires a description that tells the operator what to do, a location specifying where the task should be completed, a reward amount in USDC, and a deadline by which the task must be completed. Here is a TypeScript function that creates a task through the REST API. The function takes the task parameters, sends a POST request to the tasks endpoint, and returns the created task object including its unique ID, status, and escrow transaction reference.
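A minimal sketch of that function follows. The endpoint path (/v1/tasks), the field names, and the Task response shape are assumptions based on the descriptions in this tutorial; consult the API reference for the exact contract.

```typescript
interface CreateTaskParams {
  description: string; // instructions for the human operator
  location: string;    // address or coordinates where the task happens
  reward_usdc: number; // reward amount in USDC
  deadline: string;    // ISO 8601 timestamp
}

interface Task {
  id: string;
  status: "POSTED" | "ESTIMATED" | "IN_PROGRESS" | "COMPLETED";
  escrow_tx_id: string;
  created_at: string;
}

const API_BASE = "https://api.humanops.io/v1"; // assumed base path

async function createTask(params: CreateTaskParams): Promise<Task> {
  const res = await fetch(`${API_BASE}/tasks`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-API-Key": process.env.HUMANOPS_API_KEY!,
    },
    body: JSON.stringify(params),
  });
  if (!res.ok) {
    throw new Error(`Task creation failed: ${res.status} ${await res.text()}`);
  }
  return res.json() as Promise<Task>;
}
```

Note that the API key comes from the HUMANOPS_API_KEY environment variable set up earlier, never from a hardcoded string.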

The task creation response includes several important fields. The task ID is used to reference this specific task in all subsequent API calls. The status field indicates the current state of the task, which starts as POSTED when the task is available for operators to browse. The escrow_tx_id references the ledger transaction that moved funds from your agent's deposit account to the escrow holding account, confirming that the task is fully funded. The created_at timestamp records exactly when the task was created.

Once a task is posted, operators can browse and claim it by submitting a time estimate. Your agent will be notified of the estimate through either polling the task endpoint or receiving a webhook notification. The estimate includes the operator's ID, their estimated completion time, and their trust tier. Your agent should approve or reject the estimate based on the operator's qualifications and the urgency of the task.

Building the Restaurant Monitor Agent

Here is the complete TypeScript implementation of our restaurant monitoring agent. This code demonstrates the full task lifecycle: creating a verification task, monitoring for operator estimates, approving an estimate, and retrieving the verification result.
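The sketch below illustrates the shape of that implementation. Endpoint paths, status names, the reward amount, and the estimate-approval route are assumptions drawn from the lifecycle described in this tutorial, not a verbatim copy of the platform's API.

```typescript
class HumanOpsClient {
  constructor(
    private apiKey: string,
    private baseUrl = "https://api.humanops.io/v1", // assumed base path
  ) {}

  private async request(path: string, init: RequestInit = {}): Promise<any> {
    const res = await fetch(`${this.baseUrl}${path}`, {
      ...init,
      headers: { "Content-Type": "application/json", "X-API-Key": this.apiKey },
    });
    if (!res.ok) throw new Error(`HumanOps API error ${res.status}: ${await res.text()}`);
    return res.json();
  }

  // Post a verification task with instructions precise enough for both the
  // operator and AI Guardian to judge success.
  createVerificationTask(restaurantName: string, address: string) {
    return this.request("/tasks", {
      method: "POST",
      body: JSON.stringify({
        description:
          `Visit ${restaurantName} and photograph the storefront, the posted hours, ` +
          `and any closure or health-inspection notices. Note whether the restaurant ` +
          `appears open and serving customers.`,
        location: address,
        reward_usdc: 15,
        deadline: new Date(Date.now() + 24 * 60 * 60 * 1000).toISOString(),
      }),
    });
  }

  // Poll every 30 seconds for the POSTED -> ESTIMATED transition.
  async waitForEstimate(taskId: string, timeoutMs = 30 * 60 * 1000) {
    const deadline = Date.now() + timeoutMs;
    while (Date.now() < deadline) {
      const task = await this.request(`/tasks/${taskId}`);
      if (task.status === "ESTIMATED") return task;
      await new Promise((r) => setTimeout(r, 30_000));
    }
    throw new Error(`No estimate for task ${taskId} within timeout`);
  }

  // Authorize the operator; the task moves to IN_PROGRESS.
  approveEstimate(taskId: string) {
    return this.request(`/tasks/${taskId}/estimate/approve`, { method: "POST" });
  }

  // Fetch the completed task with Guardian score and proof URL.
  getResult(taskId: string) {
    return this.request(`/tasks/${taskId}`);
  }
}
```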

The agent is structured around a HumanOpsClient class that encapsulates all API interactions. The createVerificationTask method posts a new task with a description tailored to restaurant verification, including specific instructions about what photos to take and what to look for. The description is critical because it tells both the operator and AI Guardian what constitutes successful completion. A well-written task description leads to better proof submissions and more accurate automated verification.

The waitForEstimate method demonstrates polling-based task monitoring. In production, you would typically use webhooks instead of polling, but polling is simpler for demonstration purposes and works well for agents that process a small number of tasks. The method checks the task status every thirty seconds, looking for the transition from POSTED to ESTIMATED, which indicates that an operator has submitted a time estimate and is ready to begin work.

The approveEstimate method authorizes the operator to begin the task. After approval, the task status changes to IN_PROGRESS and the operator has until the deadline to submit proof. The getResult method retrieves the completed task including the AI Guardian verification score, the proof URL, and the detailed verification breakdown. Your agent can use this information to make decisions about the restaurant, such as updating its status in your database or triggering follow-up actions.

Error handling is built into every API call. Network failures are caught and logged with meaningful messages. HTTP error responses are parsed to extract the error code and description from the API. Rate limit responses include the retry-after header value so your agent can back off appropriately. This defensive coding style is essential for production agents that need to operate reliably without human supervision.
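A sketch of that defensive pattern, assuming the rate-limit response carries a standard Retry-After header as described above:

```typescript
// Retry a request on network failure or rate limiting. The backoff
// constants are illustrative, not prescribed by the platform.
async function requestWithRetry(
  url: string,
  init: RequestInit,
  maxAttempts = 3,
): Promise<Response> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const res = await fetch(url, init);
      if (res.status === 429) {
        // Rate limited: honor the Retry-After header before trying again.
        const waitSec = Number(res.headers.get("retry-after") ?? "1");
        await new Promise((r) => setTimeout(r, waitSec * 1000));
        continue;
      }
      return res;
    } catch (err) {
      // Network failure: log with context, then back off linearly.
      console.error(`Attempt ${attempt} for ${url} failed:`, err);
      if (attempt === maxAttempts) throw err;
      await new Promise((r) => setTimeout(r, 1000 * attempt));
    }
  }
  throw new Error(`Gave up after ${maxAttempts} attempts: ${url}`);
}
```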

MCP Server Integration

The HumanOps MCP server provides an alternative integration path that is particularly powerful for AI agents running in MCP-compatible environments like Claude Desktop and Cursor. Instead of making HTTP requests and parsing JSON responses, your agent can call HumanOps operations as native tools, making the integration feel natural and reducing the amount of boilerplate code you need to write.

To set up the MCP server, install the humanops-mcp-server package globally or add it to your project dependencies. Then add the server configuration to your MCP host's configuration file. For Claude Desktop, this means adding an entry to the mcpServers section of your configuration with the command to start the server and the environment variable containing your API key. The complete configuration is just three fields: the command, the arguments array, and the env object with your HUMANOPS_API_KEY.
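For Claude Desktop, the entry might look like the following. The package name comes from this tutorial; the surrounding structure follows Claude Desktop's standard mcpServers format, and the npx invocation is one reasonable way to launch a locally installed server.

```json
{
  "mcpServers": {
    "humanops": {
      "command": "npx",
      "args": ["-y", "humanops-mcp-server"],
      "env": {
        "HUMANOPS_API_KEY": "your-api-key-here"
      }
    }
  }
}
```

Restart your MCP host after editing the file so it picks up the new server.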

Once configured, your AI agent gains access to four primary tools: post_task for creating new tasks with description, location, reward, and deadline parameters; approve_estimate for authorizing an operator to begin work on a task; get_task_result for retrieving the completed task with verification results; and check_verification_status for monitoring the progress of AI Guardian's analysis. Each tool accepts structured parameters and returns typed responses, eliminating the need for manual JSON parsing.

The MCP integration is especially powerful for conversational AI agents. An agent running in Claude can naturally decide during a conversation that it needs physical-world verification, call the post_task tool to create the task, and then later in the conversation call get_task_result to retrieve and analyze the verification. This creates a seamless workflow where the agent thinks about what needs to be done, delegates the physical work to a human, and processes the result, all within a single conversation thread.

Webhook Integration for Real-Time Updates

While polling works for simple agents, production deployments should use webhooks for real-time task lifecycle notifications. HumanOps webhooks deliver HMAC-SHA256 signed HTTP POST requests to your endpoint whenever a significant event occurs in the task lifecycle. This eliminates the latency and resource waste of periodic polling and ensures your agent responds to events as quickly as possible.

To set up webhooks, register an endpoint URL through the API or dashboard. You will receive a webhook secret that is used to sign all webhook deliveries. Your endpoint must verify the HMAC signature on every incoming request by computing the signature over the raw request body using the secret and comparing it to the value in the X-HumanOps-Signature header. Requests with invalid or missing signatures should be rejected immediately, as they may be spoofed.
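Signature verification can be implemented in a few lines with Node's built-in crypto module. A hex-encoded signature is assumed here; check your webhook settings for the exact encoding.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Compute HMAC-SHA256 over the raw request body with the webhook secret
// and compare it, in constant time, to the X-HumanOps-Signature header.
function verifyWebhookSignature(
  rawBody: string,
  signatureHeader: string,
  secret: string,
): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest("hex");
  const a = Buffer.from(expected);
  const b = Buffer.from(signatureHeader);
  // timingSafeEqual throws on length mismatch, so guard first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Always verify against the raw body bytes, not a re-serialized JSON object, since any difference in key order or whitespace changes the digest.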

Webhook events include task.estimated when an operator submits a time estimate, task.started when the operator begins work, task.proof_submitted when proof is uploaded, task.verified when AI Guardian completes its analysis, and task.completed when payment is released. Each event includes the task ID, the event type, a timestamp, and event-specific data. For example, the task.verified event includes the Guardian confidence score and the pass or fail determination, allowing your agent to process the result immediately without making a separate API call.
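A dispatcher over those event types might look like this. The event names come from the list above; the payload field names (event, task_id, data) are assumptions.

```typescript
interface WebhookEvent {
  event: string;
  task_id: string;
  timestamp: string;
  data: Record<string, unknown>;
}

function handleWebhookEvent(evt: WebhookEvent): string {
  switch (evt.event) {
    case "task.estimated":
      // An operator submitted a time estimate; decide whether to approve.
      return `review estimate for ${evt.task_id}`;
    case "task.started":
      return `operator began work on ${evt.task_id}`;
    case "task.proof_submitted":
      return `proof uploaded for ${evt.task_id}`;
    case "task.verified":
      // The Guardian result rides in the payload, so no follow-up API call.
      return `guardian score ${evt.data.score} for ${evt.task_id}`;
    case "task.completed":
      return `payment released for ${evt.task_id}`;
    default:
      // Acknowledge unknown events rather than failing the delivery.
      return `ignoring unknown event ${evt.event}`;
  }
}
```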

The webhook delivery system includes automatic retries with exponential backoff. If your endpoint returns a non-200 status code or does not respond within ten seconds, the delivery is retried up to five times with increasing delays between attempts. After all retries are exhausted, the event is placed in a dead letter queue where it can be replayed manually. This reliability guarantee ensures that your agent never misses an event, even if your server experiences temporary downtime.

Advanced Topics: Batching, Priority, and Scaling

For agents that need to create many tasks simultaneously, the batch creation endpoint accepts an array of task specifications and creates them all in a single API call. This is significantly more efficient than creating tasks one at a time because it reduces the number of HTTP round trips and allows the platform to optimize the escrow funding process. A restaurant monitoring agent that detects ten potential issues across a city can dispatch all ten verification tasks in a single API call, with each task funded from the same agent deposit account.
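A sketch of that call, assuming a /tasks/batch endpoint that wraps the specifications in a tasks array; the exact path and shape may differ from the real API.

```typescript
interface TaskSpec {
  description: string;
  location: string;
  reward_usdc: number;
  deadline: string;
}

// Create many tasks in one round trip, all funded from the same
// agent deposit account.
async function createTasksBatch(
  apiKey: string,
  tasks: TaskSpec[],
): Promise<Array<{ id: string; status: string }>> {
  const res = await fetch("https://api.humanops.io/v1/tasks/batch", {
    method: "POST",
    headers: { "Content-Type": "application/json", "X-API-Key": apiKey },
    body: JSON.stringify({ tasks }),
  });
  if (!res.ok) throw new Error(`Batch creation failed: ${res.status}`);
  return res.json();
}
```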

Priority handling allows agents to indicate the urgency of a task. Standard-priority tasks enter the general pool where operators browse and claim based on location proximity and personal preference. High-priority tasks are highlighted in the operator interface and may be eligible for expedited matching with nearby operators. For time-sensitive verifications, such as confirming that a restaurant has reopened after a reported closure, high priority ensures that the task is claimed and completed as quickly as possible.

Scaling your agent from a proof-of-concept to production involves several considerations. First, switch from polling to webhooks to reduce latency and resource consumption. Second, implement idempotent task creation by including a client-generated idempotency key with each task creation request, preventing duplicate tasks if your agent retries a request due to a network timeout. Third, implement proper error handling for rate limits, insufficient balance, and API downtime. Fourth, add logging and monitoring so you can track task creation rates, verification scores, and completion times across your agent fleet.
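The idempotency point can be sketched as follows: generate the key once per logical task and reuse it on every retry. The Idempotency-Key header name is an assumption, modeled on the common convention for payment-style APIs.

```typescript
import { randomUUID } from "node:crypto";

// Build a task-creation request carrying a client-generated idempotency
// key. Retrying with the SAME key lets the platform deduplicate; a fresh
// key would create a second task.
function buildCreateTaskRequest(params: object, idempotencyKey: string = randomUUID()) {
  return {
    url: "https://api.humanops.io/v1/tasks", // assumed endpoint
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X-API-Key": process.env.HUMANOPS_API_KEY ?? "",
        "Idempotency-Key": idempotencyKey, // reuse this exact value on retry
      },
      body: JSON.stringify(params),
    },
  };
}
```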

The HumanOps API supports pagination for listing tasks and filtering by status, date range, and location. For agents managing hundreds of concurrent tasks, these filtering capabilities are essential for maintaining an efficient workflow. You can query for all tasks in ESTIMATED status to process pending approvals, all tasks in IN_PROGRESS status to monitor for tasks approaching their deadlines, and all completed tasks within a date range for reporting and analysis.
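A query builder for those listings might look like this; the parameter names (status, created_after, page, per_page) are assumptions chosen to match common REST pagination conventions.

```typescript
// Build a filtered, paginated task-listing URL.
function buildTaskListUrl(filters: {
  status?: string;
  createdAfter?: string; // ISO 8601 lower bound on created_at
  page?: number;
  perPage?: number;
}): string {
  const params = new URLSearchParams();
  if (filters.status) params.set("status", filters.status);
  if (filters.createdAfter) params.set("created_after", filters.createdAfter);
  params.set("page", String(filters.page ?? 1));
  params.set("per_page", String(filters.perPage ?? 50));
  return `https://api.humanops.io/v1/tasks?${params}`;
}
```

For example, querying all pending approvals would use status ESTIMATED, while deadline monitoring would use status IN_PROGRESS.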

Putting It All Together

Our restaurant monitoring agent combines all of these components into a production-ready system. The agent runs continuously, monitoring review data for anomalies. When it detects a potential issue, it creates a verification task with a detailed description, location, reward calibrated to the task complexity, and a deadline based on urgency. The agent uses webhooks to receive real-time updates as operators claim tasks, submit proof, and receive verification results.

The verification results feed back into the agent's decision-making process. If AI Guardian confirms that a restaurant appears closed, the agent updates its database and may trigger notifications to downstream systems. If Guardian's score is in the manual review zone, the agent can wait for human review or flag the result for its own human operator to examine. If Guardian confirms that the restaurant is open and operating normally, the agent updates its records and closes the investigation.

This pattern of detect, delegate, verify, and decide is applicable far beyond restaurant monitoring. The same architecture works for delivery verification, property inspection, inventory counting, field service validation, and any other use case where an AI agent needs ground truth from the physical world. The HumanOps platform provides the infrastructure, and your agent provides the intelligence that decides what to verify and what to do with the results.

To start building, visit the HumanOps documentation for complete API reference, SDK installation guides, and additional code examples. The test environment is available immediately after signup with no credit card required. If you prefer the MCP integration path, the MCP server documentation includes configuration examples for Claude Desktop, Cursor, and other supported hosts. For operators interested in completing verification tasks and earning USDC, the operator guide covers the signup and verification process.