
Running Laravel AI SDK in Production: The Complete Guide

Delaney Wright · Director, Delaney Industries · 11 June 2025 · 25 min read

Laravel AI SDK provides a unified, expressive API for integrating AI providers into your Laravel application.

Supported providers: OpenAI, Anthropic, Gemini, Groq, xAI, DeepSeek, Mistral, Ollama and OpenRouter.

Beyond text, the SDK handles image generation, text-to-speech, speech-to-text, embeddings, reranking, file management and vector storage across providers including ElevenLabs, Cohere, Jina and VoyageAI.

Calling the model is easy.

Running Laravel AI in production is where most teams struggle.

AI requests are slower, more failure-prone and more expensive than standard CRUD operations. If you treat Laravel AI like a normal HTTP call, your system will eventually break under load.

This guide covers what actually matters when deploying Laravel AI into production: agent architecture, queues, streaming, broadcasting, structured output, failover, retries, cost control, conversation persistence, middleware, security, testing and operational guardrails.

1) The Agent Architecture

The fundamental building block of the Laravel AI SDK is the Agent. Rather than scattering AI calls across controllers and services, you define behaviour in a dedicated PHP class:

bash
php artisan make:agent SalesCoach

Each agent implements the Agent contract and uses the Promptable trait. Agents encapsulate instructions, conversation context, tools and output schemas:

php
use Laravel\Ai\Contracts\Agent;
use Laravel\Ai\Promptable;

class SalesCoach implements Agent
{
    use Promptable;

    public function instructions(): string
    {
        return 'You are a sales coach...';
    }
}

Calling the agent is a single line:

php
$response = (new SalesCoach)->prompt('Analyse this sales transcript...');

Agents can be configured with PHP attributes for provider, model, timeout, token limits, temperature and step limits:

php
use Laravel\Ai\Attributes\Provider;
use Laravel\Ai\Attributes\Model;
use Laravel\Ai\Attributes\MaxSteps;
use Laravel\Ai\Attributes\MaxTokens;
use Laravel\Ai\Attributes\Temperature;
use Laravel\Ai\Attributes\Timeout;

#[Provider('anthropic')]
#[Model('claude-sonnet-4-5-20250929')]
#[MaxSteps(10)]
#[MaxTokens(4096)]
#[Temperature(0.7)]
#[Timeout(120)]
class SalesCoach implements Agent
{
    use Promptable;
}

For one-off calls without creating a class, the SDK provides anonymous agents:

php
use function Laravel\Ai\agent;

$response = agent(
    instructions: 'You are an expert at software development.',
)->prompt('Tell me about Laravel');

This agent-first architecture is the foundation of every production pattern that follows.

2) Request Timeouts in Production

In production environments, multiple timeout layers exist:

  • CDN or reverse proxy
  • Load balancer
  • Nginx or Apache
  • PHP-FPM worker limits
  • Livewire or Octane timeouts
  • Browser patience

Even if your local machine waits 60 seconds, production infrastructure often will not.

Laravel AI SDK supports explicit request timeouts through three mechanisms:

  1. The #[Timeout(120)] attribute on the agent class
  2. A timeout() method on the agent
  3. A timeout parameter passed directly to prompt() or stream()

The default timeout is 60 seconds. If an AI call is not guaranteed to complete within your infrastructure's timeout chain, it should never run inside a standard web request.

php
$response = (new SalesCoach)->prompt(
    'Analyse this transcript...',
    timeout: 120,
);

Anything long-running in Laravel AI should be queued.

3) Queues Are Essential for Laravel AI

Laravel AI SDK provides first-class queue support. Agents can be dispatched to a queue with a single method call:

php
use Laravel\Ai\Responses\AgentResponse;
use Throwable;

(new SalesCoach)
    ->queue('Analyse this sales transcript...')
    ->then(function (AgentResponse $response) {
        // Handle completed response
    })
    ->catch(function (Throwable $e) {
        // Handle failure
    });

Under the hood, queue() dispatches an InvokeAgent job that implements ShouldQueue. The returned QueuedAgentResponse wraps Laravel's PendingDispatch, giving you access to all standard queue controls:

php
(new SalesCoach)
    ->queue('Analyse this transcript...')
    ->onQueue('ai')
    ->delay(now()->addSeconds(5))
    ->then(function ($response) { /* handle the completed response */ });

Queueing is not limited to text generation. Images, audio, transcriptions and embeddings all support queued execution:

php
use Laravel\Ai\Image;

Image::of('A donut on the kitchen counter')
    ->portrait()
    ->queue()
    ->then(fn ($image) => $image->store());

That is the baseline.

For a production-ready Laravel AI system, you also need:

  • A persisted job record for auditing and recovery
  • Clear job status tracking beyond Laravel's default queue monitoring
  • Idempotency safeguards to prevent duplicate AI calls on retry
  • Controlled retry logic with backoff (see Section 8)

The SDK provides the queue dispatch, callbacks and error handling. Job persistence, status tracking and idempotency are application-level concerns you build on top.
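
A minimal sketch of that application layer, assuming a hypothetical AiJob Eloquent model backed by an ai_jobs table with status and idempotency_key columns (none of these names come from the SDK):

php
use App\Models\AiJob; // hypothetical model, not part of the SDK
use Laravel\Ai\Responses\AgentResponse;
use Throwable;

// Inside a controller action, given $user and $transcript.
// Key the record on an idempotency hash so a retried HTTP request
// never dispatches the same AI call twice.
$job = AiJob::firstOrCreate(
    ['idempotency_key' => hash('sha256', $user->id.'|'.$transcript)],
    ['status' => 'pending', 'user_id' => $user->id],
);

if ($job->wasRecentlyCreated) {
    (new SalesCoach)
        ->queue("Analyse this sales transcript: {$transcript}")
        ->onQueue('ai')
        ->then(fn (AgentResponse $response) => $job->update([
            'status' => 'completed',
            'result' => $response->text,
        ]))
        ->catch(fn (Throwable $e) => $job->update(['status' => 'failed']));
}

return response()->json(['job_id' => $job->id, 'status' => $job->status]);

The frontend polls or subscribes on the job ID, and a stuck pending record is recoverable from the table rather than lost in the queue.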

A resilient Laravel AI workflow typically looks like this:

  1. User submits AI prompt
  2. Application returns job ID immediately
  3. Queue worker performs the AI call
  4. UI updates when complete

This prevents AI workloads from blocking web workers and protects application stability.

Why a Dedicated AI Queue Matters

AI jobs are longer running, more memory intensive, more failure-prone, bursty under load and expensive per execution. If you mix them with email jobs, notification jobs, billing jobs, webhook handlers and standard background processing, you create a single point of contention.

During a provider slowdown:

  • AI jobs back up
  • Workers saturate
  • Other business-critical jobs starve

That's how you get cascading failures.

1. Separate Queue

Use a dedicated queue name:

php
(new SalesCoach)
    ->queue('Analyse this...')
    ->onQueue('ai');

Or configure a dedicated queue connection in config/queue.php.

2. Dedicated Workers

Run separate workers for AI workloads:

bash
php artisan queue:work --queue=ai --timeout=180 --memory=512

And separate workers for default jobs:

bash
php artisan queue:work --queue=default

That isolates AI latency from your core app behaviour.

3. Horizon Configuration

If using Horizon, define a supervisor specifically for AI jobs:

php
'supervisors' => [
    'ai-supervisor' => [
        'connection' => 'redis',
        'queue' => ['ai'],
        'balance' => 'auto',
        'maxProcesses' => 10,
        'timeout' => 180,
        'memory' => 512,
    ],
],

That lets you scale AI independently.

4. Timeout Alignment

AI jobs may need higher worker timeout, higher memory limit and lower concurrency. You don't want a 30-second worker timeout killing a 45-second generation job.

5. Rate-Limited Queues

If provider rate limits are aggressive, consider lower concurrency, Redis rate limiting or custom queue middleware.
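
For the custom queue middleware option, here is a sketch using Laravel's Redis::throttle limiter (a standard Laravel primitive; the limiter name and limits are illustrative, and this assumes the middleware is attached to a job class you control):

php
use Illuminate\Support\Facades\Redis;

class ThrottleAiProvider
{
    // Queue job middleware: let at most 10 AI jobs start per minute,
    // releasing anything over the limit back onto the queue.
    public function handle(object $job, callable $next): void
    {
        Redis::throttle('ai:provider')
            ->allow(10)
            ->every(60)
            ->then(
                fn () => $next($job),
                fn () => $job->release(30), // try again in 30 seconds
            );
    }
}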

Production Reality: Mixing AI jobs with core application queues is how production incidents start. Separate queues, dedicated workers and independent scaling are not optional; they are operational requirements for resilient AI systems.

4) Streaming Responses in Laravel AI

Laravel AI SDK supports streaming responses via Server-Sent Events (SSE):

php
Route::get('/coach', function () {
    return (new SalesCoach)->stream('Analyse this sales transcript...');
});

The stream() method returns a StreamableAgentResponse that implements Laravel's Responsable interface. It automatically sets the correct Content-Type: text/event-stream headers and streams events as they arrive from the provider.

You can process the completed stream with a callback:

php
use Laravel\Ai\Responses\StreamedAgentResponse;

(new SalesCoach)
    ->stream('Analyse this transcript...')
    ->then(function (StreamedAgentResponse $response) {
        // $response->text
        // $response->events
        // $response->usage
    });

Or iterate over events manually:

php
$stream = (new SalesCoach)->stream('Analyse this transcript...');

foreach ($stream as $event) {
    // Process each StreamEvent
}

The SDK emits typed streaming events including TextDelta, ToolCall, ToolResult, ReasoningDelta, Citation and Error, giving you granular control over the streaming experience.

Vercel AI SDK Protocol

If your frontend uses Next.js or the Vercel AI SDK, Laravel AI supports the Vercel data protocol natively:

php
Route::get('/coach', function () {
    return (new SalesCoach)
        ->stream('Analyse this transcript...')
        ->usingVercelDataProtocol();
});

This maps streaming events to the Vercel protocol format (text-delta, tool-input-available, tool-output-available, finish) with the appropriate x-vercel-ai-ui-message-stream header.

Streaming improves perceived performance, but it does not improve resilience. Streaming still suffers from provider stalls, mid-stream failures, network drops and infrastructure buffering issues. Streaming is a delivery optimisation. Queue architecture is the resilience foundation.

5) Built-In Broadcasting for Real-Time Interfaces

If you are queueing Laravel AI jobs, polling for completion wastes resources.

The SDK provides native broadcasting support. You do not need to wire this up yourself.

Synchronous Broadcasting

Stream events and broadcast them to WebSocket channels simultaneously:

php
use Illuminate\Broadcasting\Channel;

(new SalesCoach)->broadcast(
    'Analyse this transcript...',
    new Channel('analysis-results'),
);

For immediate dispatch without queuing the broadcast events:

php
(new SalesCoach)->broadcastNow(
    'Analyse this transcript...',
    new Channel('analysis-results'),
);

Queued Broadcasting

Run the AI call in a background job and broadcast each streaming event as it arrives:

php
(new SalesCoach)->broadcastOnQueue(
    'Analyse this transcript...',
    new Channel('analysis-results'),
);

This dispatches a BroadcastAgent job that streams the response and broadcasts each StreamEvent to the specified channels in real time. The frontend receives updates as the AI generates them, even though the work is running in a queue worker.

Broadcasting supports both public and private channels via Laravel's broadcasting system (Reverb, Pusher, Ably, or any compatible driver).

The production pattern is:

  1. Queue job runs via broadcastOnQueue()
  2. Worker streams the AI response
  3. Each streaming event broadcasts to the channel
  4. Frontend updates instantly via WebSocket

WebSockets improve user experience. They do not replace failover, retries or operational guardrails.

6) Structured Output and Validation

In production, free-form text responses are difficult to work with programmatically. Laravel AI SDK supports structured output via the HasStructuredOutput contract:

php
use Laravel\Ai\Contracts\HasStructuredOutput;
use Illuminate\Contracts\JsonSchema\JsonSchema;

class SalesCoach implements Agent, HasStructuredOutput
{
    use Promptable;

    public function schema(JsonSchema $schema): array
    {
        return [
            'feedback' => $schema->string()->required(),
            'score' => $schema->integer()->min(1)->max(10)->required(),
            'recommendations' => $schema->array()->items(
                $schema->string()
            ),
        ];
    }
}

The response is a StructuredAgentResponse accessible as a typed array:

php
$response = (new SalesCoach)->prompt('Analyse this transcript...');

$score = $response['score'];
$feedback = $response['feedback'];

How Schema Validation Actually Works

This is a critical distinction for production systems.

The SDK sends your schema to the AI provider as a constraint. For OpenAI, it enables strict mode by default ('schema' => ['strict' => true]). For Anthropic, it uses tool calling to enforce structure. The ObjectSchema also sets withoutAdditionalProperties() to prevent extra fields.

However, the SDK performs no server-side validation of the response. The StructuredAgentResponse stores the provider's decoded JSON directly in a $structured array. If the model returns malformed data, missing fields or incorrect types, the SDK will not catch it.

For production, you must validate structured output yourself:

php
$response = (new SalesCoach)->prompt('Analyse this transcript...');

$validated = validator($response->toArray(), [
    'feedback' => ['required', 'string'],
    'score' => ['required', 'integer', 'min:1', 'max:10'],
    'recommendations' => ['sometimes', 'array'],
    'recommendations.*' => ['string'],
])->validate();

Or use a Form Request, Data Transfer Object or similar pattern to enforce structure at the application boundary.

Tool Input Validation

The same applies to tool inputs. Tools define schemas via the schema() method, which constrains what the model sends. But the Tools\Request class that your handle() method receives is a plain array wrapper with no validation layer:

php
use Illuminate\Contracts\JsonSchema\JsonSchema;
use Laravel\Ai\Contracts\Tool;
use Laravel\Ai\Tools\Request;

class SearchKnowledgeBase implements Tool
{
    public function handle(Request $request): string
    {
        // $request['query'] comes straight from the model
        // The schema told the model to send a string, but validate anyway

        $query = $request['query'];

        if (! is_string($query) || strlen($query) > 500) {
            return 'Invalid search query.';
        }

        return Document::search($query)->take(5)->get()->toJson();
    }

    public function schema(JsonSchema $schema): array
    {
        return [
            'query' => $schema->string()->required(),
        ];
    }
}

Production Validation Strategy

For production systems, treat AI output the same way you treat user input:

  • Structured responses: Validate with Laravel's Validator, Form Requests or DTOs before passing data downstream
  • Tool inputs: Validate and sanitise within handle() before executing any logic
  • Text responses: Sanitise before rendering to prevent XSS if displaying raw AI text in HTML
  • Type safety: Never assume the response array matches your schema; check types before using values in calculations, database writes or API calls

Schemas reduce the likelihood of malformed output. Validation eliminates the risk.

7) Multi-Provider Failover

Laravel AI SDK supports automatic provider failover. Pass an array of providers and the SDK will cascade through them if a failure occurs:

php
$response = (new SalesCoach)->prompt(
    'Analyse this transcript...',
    provider: ['openai', 'anthropic'],
);

The failover mechanism works by catching FailoverableException instances. Two exception types trigger failover:

  • RateLimitedException: the provider returned a 429 rate limit response
  • ProviderOverloadedException: the provider returned a 5xx overload response

When a failover occurs, the SDK fires an AgentFailedOver event containing the agent, the failed provider, the model and the exception. This event is your hook for logging and alerting.

Failover works across all operations: text generation, image generation, audio synthesis, transcription, embeddings and reranking.

Failover strengthens resilience but introduces concerns you must account for:

  • Output consistency differences between providers
  • Structured schema compatibility variations
  • Model cost differences across providers
  • Different token limits and capabilities

Failover is a resilience mechanism, not a shortcut. Test your application against each provider in your failover chain.

8) Retries, Backoff and Circuit Breakers

AI providers fail in predictable ways:

  • 429 rate limits
  • 5xx outages
  • Slow timeouts

The Laravel AI SDK does not include built-in retry logic, exponential backoff or circuit breakers. The SDK provides failover (switching to a different provider) but does not retry the same provider.

For production systems, you need to build these patterns on top of the SDK.

Retries with Backoff

Use Laravel's retry() helper or implement retry logic in your queue jobs:

php
use Laravel\Ai\Exceptions\RateLimitedException; // namespace assumed
use Throwable;

// retry() is a global Laravel helper. The third argument computes the sleep
// between attempts; the optional fourth restricts retries to transient failures.
$response = retry(3, function () {
    return (new SalesCoach)->prompt('Analyse this...');
}, function (int $attempt, Throwable $exception) {
    return 1000 * 2 ** $attempt; // exponential backoff in milliseconds
}, function (Throwable $exception) {
    return $exception instanceof RateLimitedException;
});

For queued jobs, configure retries and backoff on the job class:

php
// In your queue job or via QueuedAgentResponse
(new SalesCoach)
    ->queue('Analyse this...')
    ->maxTries(3)
    ->backoff([10, 30, 60]);

Circuit Breakers

Implement circuit breaker protection using Laravel's cache to track provider health:

php
use Illuminate\Support\Facades\Cache;

$isHealthy = Cache::get('ai:provider:openai:healthy', true);

if (! $isHealthy) {
    // Route to fallback provider or return cached response
}

Listen for the AgentFailedOver event to update provider health status and implement half-open/closed circuit states.
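
A sketch of that listener, assuming the event exposes the failed provider's name (the event class namespace and property names are assumptions, not confirmed SDK API):

php
use Illuminate\Support\Facades\Cache;
use Laravel\Ai\Events\AgentFailedOver; // namespace assumed

class RecordProviderFailure
{
    public function handle(AgentFailedOver $event): void
    {
        $failures = Cache::increment("ai:provider:{$event->provider}:failures");

        // Open the circuit after five failures; it half-opens automatically
        // when the unhealthy flag expires two minutes later.
        if ($failures >= 5) {
            Cache::put(
                "ai:provider:{$event->provider}:healthy",
                false,
                now()->addMinutes(2),
            );
        }
    }
}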

What to Implement

A production-grade Laravel AI system should include:

  • Targeted retry logic that only retries on transient failures
  • Exponential backoff to avoid hammering a struggling provider
  • Circuit breaker protection to stop sending requests to a known-down provider
  • Provider health tracking via event listeners on AgentFailedOver and ProviderFailedOver

Without these, long-running AI requests can overwhelm your queue infrastructure during provider outages.

9) Conversation Persistence

Production chat applications need conversation history. The SDK provides the RemembersConversations trait for automatic database-backed persistence:

php
use Laravel\Ai\Concerns\RemembersConversations;
use Laravel\Ai\Contracts\Conversational;

class SalesCoach implements Agent, Conversational
{
    use Promptable, RemembersConversations;

    public function instructions(): string
    {
        return 'You are a sales coach...';
    }
}

Start a conversation for a user:

php
$response = (new SalesCoach)->forUser($user)->prompt('Hello!');
$conversationId = $response->conversationId;

Continue an existing conversation:

php
$response = (new SalesCoach)
    ->continue($conversationId, as: $user)
    ->prompt('Tell me more about that.');

The SDK automatically stores both user and assistant messages, including tool calls, tool results, usage data and metadata. Conversation titles are generated automatically using the provider's cheapest model.

For production, consider:

  • Message retention policies
  • Conversation archival
  • PII redaction before storage (see the sketch after this list)
  • Access controls on conversation retrieval
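
For redaction, a minimal helper you might run over message content before it reaches storage; the patterns are illustrative and deliberately conservative, not an exhaustive PII filter:

php
class RedactsPii
{
    // Replace common PII patterns with placeholders before persisting.
    public static function redact(string $text): string
    {
        $patterns = [
            '/[\w.+-]+@[\w-]+\.[\w.]+/'  => '[email]',
            '/\+?\d[\d\s().-]{8,}\d/'    => '[phone]',
            '/\b(?:\d[ -]?){13,19}\b/'   => '[card]',
        ];

        return preg_replace(array_keys($patterns), array_values($patterns), $text);
    }
}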

10) Middleware for Cross-Cutting Concerns

The SDK provides a middleware system for intercepting agent prompts and responses. This is the intended extension point for logging, cost tracking, rate limiting and security:

php
use Laravel\Ai\Contracts\HasMiddleware;

class SalesCoach implements Agent, HasMiddleware
{
    use Promptable;

    public function middleware(): array
    {
        return [
            new LogPrompts,
            new EnforceCostLimits,
            new SanitiseInput,
        ];
    }
}

A middleware receives the AgentPrompt and a $next closure:

php
use Closure;
use Illuminate\Support\Facades\Log;
use Laravel\Ai\Prompts\AgentPrompt;
use Laravel\Ai\Responses\AgentResponse;

class LogPrompts
{
    public function handle(AgentPrompt $prompt, Closure $next)
    {
        Log::info('Prompting agent', ['prompt' => $prompt->prompt]);

        return $next($prompt)->then(function (AgentResponse $response) {
            Log::info('Agent responded', [
                'text' => $response->text,
                'usage' => $response->usage->toArray(),
            ]);
        });
    }
}

Middleware is the right place to implement:

  • Request and response logging
  • Token usage tracking and alerting
  • Per-user rate limiting
  • Input sanitisation
  • Cost enforcement (a sketch follows this list)
  • Audit trails
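
As a sketch, the EnforceCostLimits middleware named above might track a per-user daily token budget in the cache (the budget figure and cache key scheme are illustrative):

php
use Closure;
use Illuminate\Support\Facades\Cache;
use Laravel\Ai\Prompts\AgentPrompt;
use Laravel\Ai\Responses\AgentResponse;
use RuntimeException;

class EnforceCostLimits
{
    public function handle(AgentPrompt $prompt, Closure $next)
    {
        $key = 'ai:tokens:'.auth()->id().':'.now()->format('Y-m-d');

        // Refuse the prompt once today's budget is spent.
        if (Cache::get($key, 0) > 500_000) {
            throw new RuntimeException('Daily AI token budget exceeded.');
        }

        // Record actual usage after the response arrives.
        return $next($prompt)->then(function (AgentResponse $response) use ($key) {
            Cache::increment(
                $key,
                $response->usage->promptTokens + $response->usage->completionTokens,
            );
        });
    }
}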

11) Tools, Function Calling and RAG

Agents can call tools inside your Laravel application via the HasTools contract:

php
use Laravel\Ai\Contracts\HasTools;

class SalesCoach implements Agent, HasTools
{
    use Promptable;

    public function tools(): iterable
    {
        return [
            new SearchKnowledgeBase,
            new GetCustomerHistory,
        ];
    }
}

Tools are created with Artisan and define their input schema using JsonSchema:

bash
php artisan make:tool SearchKnowledgeBase

php
use Illuminate\Contracts\JsonSchema\JsonSchema;
use Laravel\Ai\Contracts\Tool;
use Laravel\Ai\Tools\Request;

class SearchKnowledgeBase implements Tool
{
    public function description(): string
    {
        return 'Search the knowledge base for relevant documents.';
    }

    public function handle(Request $request): string
    {
        return Document::search($request['query'])->take(5)->get()->toJson();
    }

    public function schema(JsonSchema $schema): array
    {
        return [
            'query' => $schema->string()->required(),
        ];
    }
}

Provider Tools

The SDK includes built-in provider tools:

  • WebSearch: web search via Anthropic, OpenAI or Gemini
  • WebFetch: fetch web pages via Anthropic or Gemini
  • FileSearch: search uploaded files via OpenAI or Gemini
  • SimilaritySearch: vector similarity search for RAG

RAG with Vector Support

The SDK provides native vector storage on PostgreSQL with pgvector:

php
$documents = Document::query()
    ->whereVectorSimilarTo('embedding', 'best wineries in Napa Valley')
    ->limit(10)
    ->get();

The SimilaritySearch tool integrates directly with agents for retrieval-augmented generation:

php
use Laravel\Ai\Tools\SimilaritySearch;

public function tools(): iterable
{
    return [
        SimilaritySearch::usingModel(
            model: Document::class,
            column: 'embedding',
            minSimilarity: 0.7,
            limit: 10,
        ),
    ];
}

12) Prompt Injection and Tool Security

If you are using Laravel AI agents or tools:

  • Treat tool input as untrusted: the model controls what arguments are passed. The Tools\Request class is a plain array wrapper with no validation layer. Schema definitions constrain the model, but do not enforce constraints server-side (see Section 6)
  • Validate all arguments server-side: tool schemas tell the provider what to send, but your handle() method receives raw arguments. Validate types, lengths, ranges and business rules within the handler
  • Restrict tool execution scope: tools should never have broader permissions than the user who triggered the agent. Pass user context into tools and enforce authorisation
  • Log tool invocation events: the SDK fires InvokingTool (before execution) and ToolInvoked (after execution) events for every tool call. Listen to these for audit trails
  • Sanitise tool output: tool results are sent back to the model as context. If a tool returns user-generated content, it could influence the model's subsequent behaviour (indirect prompt injection)

Model output should never directly execute privileged operations. Sanitise, validate and authorise before acting on any tool request.

A secure tool implementation follows this pattern:

php
use App\Models\User;
use Illuminate\Contracts\JsonSchema\JsonSchema;
use Laravel\Ai\Contracts\Tool;
use Laravel\Ai\Tools\Request;

class TransferFunds implements Tool
{
    public function __construct(private User $user) {}

    public function handle(Request $request): string
    {
        // Validate input types and bounds
        $amount = $request['amount'];
        if (! is_numeric($amount) || $amount <= 0 || $amount > 10000) {
            return 'Invalid transfer amount.';
        }

        // Authorise the action against the current user
        if (! $this->user->can('transfer', $request['account_id'])) {
            return 'Unauthorised.';
        }

        // Execute with validated, authorised parameters
        return TransferService::execute(
            from: $this->user->account,
            to: $request['account_id'],
            amount: $amount,
        );
    }

    public function schema(JsonSchema $schema): array
    {
        return [
            'account_id' => $schema->string()->required(),
            'amount' => $schema->number()->min(0.01)->max(10000)->required(),
        ];
    }
}

The schema reduces the chance of bad input. The handle() method eliminates the risk.

13) Cost Control in Laravel AI Systems

The SDK exposes detailed token usage metrics on every response via the Usage class:

  • promptTokens: input tokens consumed
  • completionTokens: output tokens generated
  • cacheWriteInputTokens: tokens written to prompt cache
  • cacheReadInputTokens: tokens read from prompt cache
  • reasoningTokens: tokens used for extended thinking

php
$response = (new SalesCoach)->prompt('Analyse this...');

$usage = $response->usage;
$totalTokens = $usage->promptTokens + $usage->completionTokens;

Output Limits

Control output size with the #[MaxTokens] attribute:

php
#[MaxTokens(4096)]
class SalesCoach implements Agent { }

Control agent iteration depth with #[MaxSteps]:

php
#[MaxSteps(5)]
class SalesCoach implements Agent { }

Smart Model Selection

The SDK provides attributes to automatically select cost-appropriate models per provider:

php
#[UseCheapestModel]
class SimpleSummariser implements Agent { }

#[UseSmartestModel]
class ComplexReasoner implements Agent { }

Each provider defines its own cheapestTextModel(), defaultTextModel() and smartestTextModel(). This allows cost-tiered agent design without hardcoding model names.

What You Need to Build

The SDK provides the data and controls. Production cost governance requires application-level implementation:

  • Per-user or per-tenant usage caps
  • Token budgeting and alerting thresholds
  • Usage logging via middleware or event listeners on AgentPrompted
  • Guardrails against abuse (input length limits, request frequency caps)
  • Model selection based on workload cost sensitivity

AI systems without cost controls become financial risks.

14) Observability in Laravel AI Production

The SDK fires events throughout the AI lifecycle. These are your hooks for monitoring, logging and alerting:

  • PromptingAgent: before an agent prompt is sent
  • AgentPrompted: after a response is received (includes prompt, response and usage)
  • StreamingAgent: before a streaming prompt begins
  • AgentStreamed: after a stream completes
  • AgentFailedOver: when a provider fails and the SDK switches to the next
  • ProviderFailedOver: when any provider operation fails over
  • InvokingTool: before a tool is called
  • ToolInvoked: after a tool completes
  • GeneratingImage / ImageGenerated: before and after image generation
  • GeneratingAudio / AudioGenerated: before and after audio synthesis
  • GeneratingTranscription / TranscriptionGenerated: before and after transcription
  • GeneratingEmbeddings / EmbeddingsGenerated: before and after embeddings generation
  • Reranking / Reranked: before and after reranking

Use these events with Laravel's event listeners to track:

  • Provider and model per request
  • Latency (calculate from PromptingAgent to AgentPrompted)
  • Token usage per agent, user or tenant (see the listener sketch after this list)
  • Failure categories and failover frequency
  • Tool invocation patterns
  • Queue wait time and execution duration
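
For example, a minimal AgentPrompted listener that records usage per agent (the event class namespace and property names are assumptions based on the table above):

php
use Illuminate\Support\Facades\Log;
use Laravel\Ai\Events\AgentPrompted; // namespace assumed

class RecordAgentUsage
{
    public function handle(AgentPrompted $event): void
    {
        // Ship these to your metrics pipeline rather than logs in practice.
        Log::info('ai.agent.prompted', [
            'agent' => $event->agent::class,
            'usage' => $event->response->usage->toArray(),
        ]);
    }
}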

Without observability, production Laravel AI becomes guesswork.

15) Multimodal Capabilities

The SDK is not limited to text generation. Production applications can leverage:

Image Generation

php
use Laravel\Ai\Image;

$image = Image::of('A product photo with white background')
    ->quality('high')
    ->landscape()
    ->timeout(120)
    ->generate();

$path = $image->store();

Supported providers: OpenAI, Gemini, xAI. Supports reference images, queued generation and failover.

Audio Synthesis (Text-to-Speech)

php
use Laravel\Ai\Audio;

$audio = Audio::of('Welcome to our platform.')
    ->female()
    ->generate();

$path = $audio->store();

Supported providers: OpenAI, ElevenLabs. Supports custom voices and instructions.

Transcription (Speech-to-Text)

php
use Laravel\Ai\Transcription;

$transcript = Transcription::fromStorage('meeting.mp3')
    ->diarize()
    ->generate();

Supported providers: OpenAI, ElevenLabs, Mistral. Supports diarisation for speaker identification.

Embeddings

php
use Laravel\Ai\Embeddings;

$response = Embeddings::for([
    'First document content.',
    'Second document content.',
])->generate();

Supported providers: OpenAI, Gemini, Cohere, Mistral, Jina, VoyageAI. Supports caching with configurable store and TTL.

Reranking

php
use Laravel\Ai\Reranking;

$response = Reranking::of($searchResults)
    ->limit(5)
    ->rerank('PHP frameworks');

Supported providers: Cohere, Jina. Also supports Eloquent collection reranking.

All multimodal operations support queueing, failover and events.

16) Custom Endpoints and Proxy Configuration

In production environments you may need to route AI requests through a proxy, API gateway or centralised key management service.

The SDK supports custom base URLs per provider:

php
// config/ai.php
'providers' => [
    'openai' => [
        'driver' => 'openai',
        'key' => env('OPENAI_API_KEY'),
        'url' => env('OPENAI_BASE_URL'),
    ],
    'anthropic' => [
        'driver' => 'anthropic',
        'key' => env('ANTHROPIC_API_KEY'),
        'url' => env('ANTHROPIC_BASE_URL'),
    ],
],

Custom endpoints are supported for OpenAI, Anthropic, Gemini, Groq, Cohere, DeepSeek, xAI and OpenRouter.

17) Testing Laravel AI in Production CI/CD

The SDK provides comprehensive testing support. Every feature can be faked and asserted against:

Faking Agents

php
use App\Ai\Agents\SalesCoach;

SalesCoach::fake([
    'First response',
    'Second response',
]);

// Run your application code...

SalesCoach::assertPrompted('Analyse this...');
SalesCoach::assertQueued(fn ($prompt) => $prompt->contains('transcript'));
SalesCoach::assertNeverPrompted();

Faking Other Operations

php
Image::fake();
Audio::fake();
Transcription::fake();
Embeddings::fake();
Reranking::fake();
Files::fake();
Stores::fake();

Each fake supports custom responses, closures for dynamic responses, assertions for verification and preventStray*() methods to catch unexpected calls.

Production CI/CD pipelines should fake all AI operations to ensure tests are deterministic, fast and free from provider dependencies.

18) Security, Cost and Operational Guardrails

Reliable Laravel AI systems require governance across multiple dimensions.

Cost Control

Implement:

  • Per-user and per-tenant limits via middleware
  • Token caps using #[MaxTokens] and #[MaxSteps] attributes
  • Usage monitoring via AgentPrompted event listeners
  • Model selection using #[UseCheapestModel] for cost-sensitive workloads
  • Budget alerting when usage approaches thresholds

Data Governance

Define:

  • What AI conversations are stored (via RemembersConversations or custom storage)
  • Retention periods for conversation history
  • Redaction policies for PII in prompts and responses
  • Encryption and access controls on the agent_conversations and agent_conversation_messages tables

Boundary Rate Limiting

Protect your AI endpoints from abuse using Laravel's rate limiting and middleware:

  • Throttle AI endpoints at the HTTP layer (see the sketch after this list)
  • Implement per-user request limits in agent middleware
  • Use queue rate limiting to control provider API usage
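
At the HTTP layer, the first of these uses Laravel's named rate limiters (standard Laravel; the limits are illustrative and CoachController is a placeholder):

php
use Illuminate\Cache\RateLimiting\Limit;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\RateLimiter;
use Illuminate\Support\Facades\Route;

// In a service provider's boot() method:
RateLimiter::for('ai', function (Request $request) {
    return Limit::perMinute(10)->by($request->user()?->id ?: $request->ip());
});

// Then on the AI route:
Route::post('/coach', CoachController::class)->middleware('throttle:ai');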

Worker Scaling and Queue Health

Monitor:

  • Queue backlog growth
  • Worker concurrency
  • Memory usage per worker
  • Execution time distribution

AI jobs are longer-lived than typical background jobs. A text generation job may run for 30-120 seconds. Infrastructure must reflect that reality. Configure appropriate timeouts on your queue workers and consider dedicated queues for AI workloads.

Structured Output and Tool Input Validation

The SDK's schemas constrain provider output but perform no server-side validation (see Section 6). For production:

  • Validate all structured agent responses with Laravel's Validator before using the data
  • Validate all tool Request inputs inside handle() before executing logic
  • Sanitise text responses before rendering in HTML to prevent XSS
  • Never assume types match the schema; check before database writes, calculations or API calls
  • Log validation failures as a signal of model misbehaviour or prompt injection attempts

The Laravel AI Production Architecture

A resilient Laravel AI architecture uses what the SDK provides and supplements what it does not:

SDK provides:

  • Agent class architecture with Promptable trait
  • Queue dispatch via queue() and broadcastOnQueue()
  • Streaming via SSE and Vercel AI SDK protocol
  • Native WebSocket broadcasting
  • Provider failover on rate limits and overloads
  • Structured output with provider-side schema constraints
  • Conversation persistence with RemembersConversations
  • Middleware pipeline for cross-cutting concerns
  • Token usage metrics on every response
  • Event system across all operations
  • Comprehensive testing and faking support
  • Multimodal support across 13 providers

You build on top:

  • Retry logic with exponential backoff
  • Circuit breaker protection
  • Provider health tracking
  • Per-user cost caps and budgeting
  • Rate limiting at application and queue layers
  • Observability dashboards and alerting
  • Data retention and redaction policies
  • Input sanitisation and abuse prevention
  • Server-side validation of all structured output and tool inputs

Queues are table stakes. Resilience is the differentiator.

Building Production-Grade Laravel AI

Most teams stop at "it works locally".

Production Laravel AI requires engineering discipline across:

  • Agent architecture and configuration
  • Infrastructure and queue management
  • Cost control and model selection
  • Security, tool validation and input sanitisation
  • Server-side validation of structured output and tool inputs
  • Observability, logging and alerting
  • Operational scaling and worker management
  • Testing and CI/CD integration
  • Conversation persistence and data governance

The SDK gives you a strong foundation. Production readiness is what you build on top of it.

At Delaney Industries, we design and implement production-grade Laravel AI systems built for reliability, scalability and operational clarity. If you are integrating AI into a live Laravel application, get in touch.

Delaney Wright

Director, Delaney Industries

Delaney Wright is the Director of Delaney Industries, a software development company based in Sleaford, Lincolnshire, specialising in web development, AI integration, web applications and process automation for businesses across the UK.
