Runpod’s endpoint operations allow you to control the complete lifecycle of your Serverless workloads. This guide demonstrates how to submit, monitor, manage, and retrieve results from jobs running on your Serverless endpoints.
- `/run`: Submit an asynchronous job that processes in the background while you receive an immediate job ID.
- `/runsync`: Submit a synchronous job and wait for the complete results in a single response.
- `/status`: Check the current status, execution details, and results of a previously submitted job.
- `/stream`: Receive incremental results from a job as they become available.
- `/cancel`: Stop a job that is in progress or waiting in the queue.
- `/retry`: Requeue a failed or timed-out job using the same job ID and input parameters.
- `/purge-queue`: Clear all pending jobs from the queue without affecting jobs already in progress.
- `/health`: Monitor the operational status of your endpoint, including worker and job statistics.

Runpod offers two primary methods for submitting jobs, each suited to different use cases.
## Asynchronous jobs (`/run`)

Use asynchronous jobs for longer-running tasks that don't require immediate results. The endpoint returns immediately with a job ID while the job processes in the background. This is particularly useful for operations that need significant processing time, or when you want to manage multiple jobs concurrently.
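As a minimal sketch using only the Python standard library, an asynchronous submission might look like this. The endpoint ID, API key, and the `prompt` input field are placeholders; the `{"input": ...}` request envelope follows the pattern Runpod handlers expect.

```python
import json
import urllib.request

API_BASE = "https://api.runpod.ai/v2"
ENDPOINT_ID = "your_endpoint_id"  # placeholder: your Serverless endpoint ID
API_KEY = "your_api_key"          # placeholder: your Runpod API key

def op_url(operation: str) -> str:
    """Build the URL for an endpoint operation, e.g. op_url('run')."""
    return f"{API_BASE}/{ENDPOINT_ID}/{operation}"

def submit_job(payload: dict) -> dict:
    """POST job input to /run; returns immediately with the job ID and status."""
    req = urllib.request.Request(
        op_url("run"),
        data=json.dumps({"input": payload}).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Requires a live endpoint; the job typically starts in IN_QUEUE.
    job = submit_job({"prompt": "hello"})
    print(job["id"], job["status"])
```

Store the returned job ID; you will need it for the `/status`, `/stream`, `/cancel`, and `/retry` operations below.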
## Synchronous jobs (`/runsync`)

Use synchronous jobs for shorter tasks where you need immediate results. A synchronous job waits for completion and returns the complete result in a single response. This simplifies your code by eliminating status polling, and works best for quick operations (under 30 seconds).
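A sketch of a synchronous call, again with the standard library; the argument names are illustrative, and the client-side HTTP timeout is a local choice, not an API parameter.

```python
import json
import urllib.request

def build_body(payload: dict) -> bytes:
    """Wrap the handler input in the 'input' envelope the endpoint expects."""
    return json.dumps({"input": payload}).encode()

def run_sync(endpoint_id: str, api_key: str, payload: dict,
             timeout: float = 60.0) -> dict:
    """POST to /runsync and block until the complete result comes back."""
    req = urllib.request.Request(
        f"https://api.runpod.ai/v2/{endpoint_id}/runsync",
        data=build_body(payload),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    # Keep the client timeout comfortably above the expected job duration
    # for the short (<30 s) tasks /runsync is intended for.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```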
## Check job status (`/status`)

For asynchronous jobs, you can check the status at any time using the job ID. The status endpoint returns the current job state (`IN_QUEUE`, `IN_PROGRESS`, `COMPLETED`, `FAILED`, etc.), along with execution details and, once the job finishes, its results.

You can also use the `/status` operation to configure the time-to-live (TTL) for an individual job by appending a TTL parameter (in milliseconds) when checking the job's status. For example, `https://api.runpod.ai/v2/{endpoint_id}/status/{job_id}?ttl=6000` sets the TTL for the job to 6 seconds. Use this when you want the system to remove a job result sooner than the default retention time.
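A polling sketch built on the same pattern. The set of terminal states below is an assumption based on the statuses this guide mentions (`COMPLETED`, `FAILED`, `TIMED_OUT`, plus cancelled jobs); adjust it if your endpoint reports others.

```python
import json
import time
import urllib.request

# Assumed terminal states; extend if your jobs can end in other statuses.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}

def status_url(endpoint_id: str, job_id: str, ttl_ms=None) -> str:
    """Build the /status URL, optionally with a TTL parameter in milliseconds."""
    url = f"https://api.runpod.ai/v2/{endpoint_id}/status/{job_id}"
    return f"{url}?ttl={ttl_ms}" if ttl_ms is not None else url

def is_terminal(status: str) -> bool:
    return status in TERMINAL_STATES

def poll(endpoint_id: str, api_key: str, job_id: str,
         interval: float = 2.0) -> dict:
    """Poll /status until the job reaches a terminal state."""
    while True:
        req = urllib.request.Request(
            status_url(endpoint_id, job_id),
            headers={"Authorization": f"Bearer {api_key}"},
        )
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
        if is_terminal(body["status"]):
            return body
        time.sleep(interval)
```

Mind the `/status` rate limit (see the table below) when choosing the polling interval.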
## Stream results (`/stream`)

For jobs that generate output incrementally or produce very large outputs, use the stream endpoint to receive partial results as they become available, for example when streaming generated text token by token.

The maximum size for a single streamed payload chunk is 1 MB. Larger outputs are split across multiple chunks.
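A minimal sketch of fetching partial output from `/stream`. The exact shape of the response body depends on what your handler yields, so this helper just returns the decoded JSON for the caller to interpret; the function and argument names are illustrative.

```python
import json
import urllib.request

def stream_url(endpoint_id: str, job_id: str) -> str:
    """Build the /stream URL for a previously submitted job."""
    return f"https://api.runpod.ai/v2/{endpoint_id}/stream/{job_id}"

def read_stream(endpoint_id: str, api_key: str, job_id: str) -> dict:
    """Fetch whatever partial output /stream currently has for the job.

    Remember that a single streamed chunk is at most 1 MB; larger
    outputs arrive split across multiple chunks.
    """
    req = urllib.request.Request(
        stream_url(endpoint_id, job_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```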
## Check endpoint health (`/health`)

The health endpoint provides a quick overview of your endpoint's operational status. Use it to monitor worker availability, track job queue status, identify potential bottlenecks, and determine whether scaling adjustments are needed.
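A sketch of a health check plus a small derived metric. The `jobs.inQueue` and `workers.idle` keys used in the helper are assumptions about the health body's shape; verify them against an actual response from your endpoint before relying on them.

```python
import json
import urllib.request

def get_health(endpoint_id: str, api_key: str) -> dict:
    """GET /health for a snapshot of worker and job statistics."""
    req = urllib.request.Request(
        f"https://api.runpod.ai/v2/{endpoint_id}/health",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def queue_backlog(health: dict) -> int:
    """Hypothetical helper: queued jobs with no idle worker to pick them up.

    Assumes the health body exposes 'jobs.inQueue' and 'workers.idle'
    counters; adjust the keys to match what your endpoint actually returns.
    A persistently positive backlog suggests raising max workers.
    """
    return max(0, health["jobs"]["inQueue"] - health["workers"]["idle"])
```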
## Cancel a job (`/cancel`)

Cancel jobs that are no longer needed or are taking too long to complete. This operation stops jobs that are in progress, removes jobs that have not yet started from the queue, and returns immediately with the job's canceled status.
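A cancellation sketch; as elsewhere, the function names are illustrative and the call assumes a POST with an empty body is accepted.

```python
import json
import urllib.request

def cancel_url(endpoint_id: str, job_id: str) -> str:
    """Build the /cancel URL for a job."""
    return f"https://api.runpod.ai/v2/{endpoint_id}/cancel/{job_id}"

def cancel_job(endpoint_id: str, api_key: str, job_id: str) -> dict:
    """POST to /cancel; returns immediately with the job's updated status."""
    req = urllib.request.Request(
        cancel_url(endpoint_id, job_id),
        data=b"",  # empty POST body
        headers={"Authorization": f"Bearer {api_key}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```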
## Retry a job (`/retry`)

Retry jobs that have failed or timed out without submitting a new job request. This operation keeps the same job ID for tracking, requeues the job with the original input parameters, and removes the previous output (if any). It can only be used for jobs with a `FAILED` or `TIMED_OUT` status.
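A retry sketch that enforces the precondition client-side before calling the API; the guard mirrors the rule above that only `FAILED` or `TIMED_OUT` jobs can be retried.

```python
import json
import urllib.request

RETRYABLE = {"FAILED", "TIMED_OUT"}

def can_retry(status: str) -> bool:
    """/retry only applies to failed or timed-out jobs."""
    return status in RETRYABLE

def retry_job(endpoint_id: str, api_key: str, job_id: str,
              last_status: str) -> dict:
    """POST to /retry, requeueing the job under the same job ID."""
    if not can_retry(last_status):
        raise ValueError(f"job in state {last_status!r} cannot be retried")
    req = urllib.request.Request(
        f"https://api.runpod.ai/v2/{endpoint_id}/retry/{job_id}",
        data=b"",  # empty POST body
        headers={"Authorization": f"Bearer {api_key}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Check the job's status (and whether its result has expired, per the windows below) before retrying.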
Job results expire after a set period:

- Asynchronous jobs (`/run`): results available for 30 minutes.
- Synchronous jobs (`/runsync`): results available for 1 minute.

Once a job's result has expired, the job cannot be retried.
## Purge the queue (`/purge-queue`)

Clear all pending jobs from the queue when you need to reset or cancel multiple jobs at once. This is useful for error recovery, clearing outdated requests, resetting after configuration changes, and managing resource allocation.

The purge-queue operation only affects jobs waiting in the queue; jobs already in progress continue to run.
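A purge sketch following the same pattern as the other operations; note the especially tight rate limit on this endpoint (2 requests per 10 seconds, per the table below).

```python
import json
import urllib.request

def purge_url(endpoint_id: str) -> str:
    """Build the /purge-queue URL for an endpoint."""
    return f"https://api.runpod.ai/v2/{endpoint_id}/purge-queue"

def purge_queue(endpoint_id: str, api_key: str) -> dict:
    """POST to /purge-queue, dropping every job still waiting in the queue.

    Jobs already in progress are unaffected.
    """
    req = urllib.request.Request(
        purge_url(endpoint_id),
        data=b"",  # empty POST body
        headers={"Authorization": f"Bearer {api_key}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```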
## Rate limits

Runpod enforces rate limits to ensure fair platform usage. These limits apply per endpoint and operation:

| Operation | Method | Rate limit | Concurrent limit |
|---|---|---|---|
| `/run` | POST | 1000 requests per 10 seconds | 200 concurrent |
| `/runsync` | POST | 2000 requests per 10 seconds | 400 concurrent |
| `/status`, `/status-sync`, `/stream` | GET/POST | 2000 requests per 10 seconds | 400 concurrent |
| `/cancel` | POST | 100 requests per 10 seconds | 20 concurrent |
| `/purge-queue` | POST | 2 requests per 10 seconds | N/A |
| `/openai/*` | POST | 2000 requests per 10 seconds | 400 concurrent |
| `/requests` | GET | 10 requests per 10 seconds | 2 concurrent |
Requests receive a `429 (Too Many Requests)` status if an operation's rate limit is exceeded or the number of queued jobs exceeds `endpoint.WorkersMax * 500`. Implement retry logic with exponential backoff in your applications to handle rate limiting gracefully.
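One common backoff scheme is exponential delay with full jitter, sketched below; the cap and base values are arbitrary starting points, and `do_request` stands in for whatever function performs your HTTP call.

```python
import random
import time

def backoff_cap(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Upper bound (seconds) for the sleep before retry `attempt`:
    exponential growth from `base`, capped at `cap`."""
    return min(cap, base * (2 ** attempt))

def call_with_backoff(do_request, max_attempts: int = 5):
    """Retry `do_request` (a callable returning (status_code, body)) on 429.

    Sleeps a random amount up to the exponential cap ("full jitter")
    between attempts, then gives up after max_attempts.
    """
    for attempt in range(max_attempts):
        status, body = do_request()
        if status != 429:
            return status, body
        time.sleep(random.uniform(0, backoff_cap(attempt)))
    return status, body
```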
## Troubleshooting

| Issue | Possible causes | Solutions |
|---|---|---|
| Job stuck in queue | No available workers, max workers limit reached | Increase max workers, check endpoint health |
| Timeout errors | Job takes longer than execution timeout | Increase timeout in job policy, optimize job processing |
| Failed jobs | Worker errors, input validation issues | Check logs, verify input format, retry with fixed input |
| Rate limiting | Too many requests in a short time | Implement a backoff strategy, batch requests when possible |
| Missing results | Results expired | Retrieve results within the expiration window (30 min for async, 1 min for sync) |