
Running Laravel AI SDK in Production: The Complete Guide

Delaney Wright · Director, Delaney Industries · 11 June 2025 · 25 min read

Laravel AI SDK provides a unified, expressive API for integrating AI providers into your Laravel application.

Supported providers: OpenAI, Anthropic, Gemini, Groq, xAI, DeepSeek, Mistral, Ollama and OpenRouter.

Beyond text, the SDK handles image generation, text-to-speech, speech-to-text, embeddings, reranking, file management and vector storage across providers including ElevenLabs, Cohere, Jina and VoyageAI.

Calling the model is easy.

Running Laravel AI in production is where most teams struggle.

AI requests are slower, more failure-prone and more expensive than standard CRUD operations. If you treat Laravel AI like a normal HTTP call, your system will eventually break under load.

This guide covers what actually matters when deploying Laravel AI into production: agent architecture, queues, streaming, broadcasting, structured output, failover, retries, cost control, conversation persistence, middleware, security, testing and operational guardrails.

1) The Agent Architecture

The fundamental building block of the Laravel AI SDK is the Agent. Rather than scattering AI calls across controllers and services, you define behaviour in a dedicated PHP class:

bash
php artisan make:agent SalesCoach

Each agent implements the Agent contract and uses the Promptable trait. Agents encapsulate instructions, conversation context, tools and output schemas:

php
use Laravel\Ai\Contracts\Agent;
use Laravel\Ai\Promptable;

class SalesCoach implements Agent
{
    use Promptable;

    public function instructions(): string
    {
        return 'You are a sales coach...';
    }
}

Calling the agent is a single line:

php
$response = (new SalesCoach)->prompt('Analyse this sales transcript...');

Agents can be configured with PHP attributes for provider, model, timeout, token limits, temperature and step limits:

php
use Laravel\Ai\Attributes\Provider;
use Laravel\Ai\Attributes\Model;
use Laravel\Ai\Attributes\MaxSteps;
use Laravel\Ai\Attributes\MaxTokens;
use Laravel\Ai\Attributes\Temperature;
use Laravel\Ai\Attributes\Timeout;

#[Provider('anthropic')]
#[Model('claude-sonnet-4-5-20250929')]
#[MaxSteps(10)]
#[MaxTokens(4096)]
#[Temperature(0.7)]
#[Timeout(120)]
class SalesCoach implements Agent
{
    use Promptable;
}

For one-off calls without creating a class, the SDK provides anonymous agents:

php
use function Laravel\Ai\agent;

$response = agent(
    instructions: 'You are an expert at software development.',
)->prompt('Tell me about Laravel');

This agent-first architecture is the foundation of every production pattern that follows.

2) Request Timeouts in Production

In production environments, multiple timeout layers exist:

  • CDN or reverse proxy
  • Load balancer
  • Nginx or Apache
  • PHP-FPM worker limits
  • Livewire or Octane timeouts
  • Browser patience

Even if your local machine waits 60 seconds, production infrastructure often will not.

Laravel AI SDK supports explicit request timeouts through three mechanisms:

  1. The #[Timeout(120)] attribute on the agent class
  2. A timeout() method on the agent
  3. A timeout parameter passed directly to prompt() or stream()

The default timeout is 60 seconds. If an AI call is not guaranteed to complete within your infrastructure's timeout chain, it should never run inside a standard web request.

php
$response = (new SalesCoach)->prompt(
    'Analyse this transcript...',
    timeout: 120,
);

Anything long-running in Laravel AI should be queued.

3) Queues Are Essential for Laravel AI

Laravel AI SDK provides first-class queue support. Agents can be dispatched to a queue with a single method call:

php
use Laravel\Ai\Responses\AgentResponse;
use Throwable;

(new SalesCoach)
    ->queue('Analyse this sales transcript...')
    ->then(function (AgentResponse $response) {
        // Handle completed response
    })
    ->catch(function (Throwable $e) {
        // Handle failure
    });

Under the hood, queue() dispatches an InvokeAgent job that implements ShouldQueue. The returned QueuedAgentResponse wraps Laravel's PendingDispatch, giving you access to all standard queue controls:

php
(new SalesCoach)
    ->queue('Analyse this transcript...')
    ->onQueue('ai')
    ->delay(now()->addSeconds(5))
    ->then(function ($response) { /* handle the completed response */ });

Queueing is not limited to text generation. Images, audio, transcriptions and embeddings all support queued execution:

php
use Laravel\Ai\Image;

Image::of('A donut on the kitchen counter')
    ->portrait()
    ->queue()
    ->then(fn ($image) => $image->store());

That is the baseline.

For a production-ready Laravel AI system, you also need:

  • A persisted job record for auditing and recovery
  • Clear job status tracking beyond Laravel's default queue monitoring
  • Idempotency safeguards to prevent duplicate AI calls on retry
  • Controlled retry logic with backoff (see Section 8)

The SDK provides the queue dispatch, callbacks and error handling. Job persistence, status tracking and idempotency are application-level concerns you build on top.
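
A minimal sketch of that application layer, assuming a hypothetical AiJob Eloquent model backed by an ai_jobs table with status and idempotency_key columns (none of these names come from the SDK):

php
use App\Models\AiJob; // hypothetical model, not part of the SDK
use Laravel\Ai\Responses\AgentResponse;
use Throwable;

// Inside a controller action, given $user and $transcript.
// Key the record on an idempotency hash so a retried HTTP request
// never dispatches the same AI call twice.
$job = AiJob::firstOrCreate(
    ['idempotency_key' => hash('sha256', $user->id.'|'.$transcript)],
    ['status' => 'pending', 'user_id' => $user->id],
);

if ($job->wasRecentlyCreated) {
    (new SalesCoach)
        ->queue("Analyse this sales transcript: {$transcript}")
        ->onQueue('ai')
        ->then(fn (AgentResponse $response) => $job->update([
            'status' => 'completed',
            'result' => $response->text,
        ]))
        ->catch(fn (Throwable $e) => $job->update(['status' => 'failed']));
}

return response()->json(['job_id' => $job->id, 'status' => $job->status]);

The frontend polls or subscribes on the job ID, and a stuck pending record is recoverable from the table rather than lost in the queue.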

A resilient Laravel AI workflow typically looks like this:

  1. User submits AI prompt
  2. Application returns job ID immediately
  3. Queue worker performs the AI call
  4. UI updates when complete

This prevents AI workloads from blocking web workers and protects application stability.

Why a Dedicated AI Queue Matters

AI jobs are longer running, more memory intensive, more failure-prone, bursty under load and expensive per execution. If you mix them with email jobs, notification jobs, billing jobs, webhook handlers and standard background processing, you create a single point of contention.

During a provider slowdown:

  • AI jobs back up
  • Workers saturate
  • Other business-critical jobs starve

That's how you get cascading failures.

1. Separate Queue

Use a dedicated queue name:

php
(new SalesCoach)
    ->queue('Analyse this...')
    ->onQueue('ai');

Or configure a dedicated queue connection in config/queue.php.

2. Dedicated Workers

Run separate workers for AI workloads:

bash
php artisan queue:work --queue=ai --timeout=180 --memory=512

And separate workers for default jobs:

bash
php artisan queue:work --queue=default

That isolates AI latency from your core app behaviour.

3. Horizon Configuration

If using Horizon, define a supervisor specifically for AI jobs:

php
'supervisors' => [
    'ai-supervisor' => [
        'connection' => 'redis',
        'queue' => ['ai'],
        'balance' => 'auto',
        'maxProcesses' => 10,
        'timeout' => 180,
        'memory' => 512,
    ],
],

That lets you scale AI independently.

4. Timeout Alignment

AI jobs may need higher worker timeout, higher memory limit and lower concurrency. You don't want a 30-second worker timeout killing a 45-second generation job.

5. Rate-Limited Queues

If provider rate limits are aggressive, consider lower concurrency, Redis rate limiting or custom queue middleware.
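
For the custom queue middleware option, here is a sketch using Laravel's Redis::throttle limiter (a standard Laravel primitive; the limiter name and limits are illustrative, and this assumes the middleware is attached to a job class you control):

php
use Illuminate\Support\Facades\Redis;

class ThrottleAiProvider
{
    // Queue job middleware: let at most 10 AI jobs start per minute,
    // releasing anything over the limit back onto the queue.
    public function handle(object $job, callable $next): void
    {
        Redis::throttle('ai:provider')
            ->allow(10)
            ->every(60)
            ->then(
                fn () => $next($job),
                fn () => $job->release(30), // try again in 30 seconds
            );
    }
}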

Production Reality: Mixing AI jobs with core application queues is how production incidents start. Separate queues, dedicated workers and independent scaling are not optional; they are operational requirements for resilient AI systems.

4) Streaming Responses in Laravel AI

Laravel AI SDK supports streaming responses via Server-Sent Events (SSE):

php
Route::get('/coach', function () {
    return (new SalesCoach)->stream('Analyse this sales transcript...');
});

The stream() method returns a StreamableAgentResponse that implements Laravel's Responsable interface. It automatically sets the correct Content-Type: text/event-stream headers and streams events as they arrive from the provider.

You can process the completed stream with a callback:

php
use Laravel\Ai\Responses\StreamedAgentResponse;

(new SalesCoach)
    ->stream('Analyse this transcript...')
    ->then(function (StreamedAgentResponse $response) {
        // $response->text
        // $response->events
        // $response->usage
    });

Or iterate over events manually:

php
$stream = (new SalesCoach)->stream('Analyse this transcript...');

foreach ($stream as $event) {
    // Process each StreamEvent
}

The SDK emits typed streaming events including TextDelta, ToolCall, ToolResult, ReasoningDelta, Citation and Error, giving you granular control over the streaming experience.

Vercel AI SDK Protocol

If your frontend uses Next.js or the Vercel AI SDK, Laravel AI supports the Vercel data protocol natively:

php
Route::get('/coach', function () {
    return (new SalesCoach)
        ->stream('Analyse this transcript...')
        ->usingVercelDataProtocol();
});

This maps streaming events to the Vercel protocol format (text-delta, tool-input-available, tool-output-available, finish) with the appropriate x-vercel-ai-ui-message-stream header.

Streaming improves perceived performance, but it does not improve resilience. Streaming still suffers from provider stalls, mid-stream failures, network drops and infrastructure buffering issues. Streaming is a delivery optimisation. Queue architecture is the resilience foundation.

5) Built-In Broadcasting for Real-Time Interfaces

If you are queueing Laravel AI jobs, polling for completion wastes resources.

The SDK provides native broadcasting support. You do not need to wire this up yourself.

Synchronous Broadcasting

Stream events and broadcast them to WebSocket channels simultaneously:

php
use Illuminate\Broadcasting\Channel;

(new SalesCoach)->broadcast(
    'Analyse this transcript...',
    new Channel('analysis-results'),
);

For immediate dispatch without queuing the broadcast events:

php
(new SalesCoach)->broadcastNow(
    'Analyse this transcript...',
    new Channel('analysis-results'),
);

Queued Broadcasting

Run the AI call in a background job and broadcast each streaming event as it arrives:

php
(new SalesCoach)->broadcastOnQueue(
    'Analyse this transcript...',
    new Channel('analysis-results'),
);

This dispatches a BroadcastAgent job that streams the response and broadcasts each StreamEvent to the specified channels in real time. The frontend receives updates as the AI generates them, even though the work is running in a queue worker.

Broadcasting supports both public and private channels via Laravel's broadcasting system (Reverb, Pusher, Ably, or any compatible driver).

The production pattern is:

  1. Queue job runs via broadcastOnQueue()
  2. Worker streams the AI response
  3. Each streaming event broadcasts to the channel
  4. Frontend updates instantly via WebSocket

WebSockets improve user experience. They do not replace failover, retries or operational guardrails.

6) Structured Output and Validation

In production, free-form text responses are difficult to work with programmatically. Laravel AI SDK supports structured output via the HasStructuredOutput contract:

php
use Laravel\Ai\Contracts\HasStructuredOutput;
use Illuminate\Contracts\JsonSchema\JsonSchema;

class SalesCoach implements Agent, HasStructuredOutput
{
    use Promptable;

    public function schema(JsonSchema $schema): array
    {
        return [
            'feedback' => $schema->string()->required(),
            'score' => $schema->integer()->min(1)->max(10)->required(),
            'recommendations' => $schema->array()->items(
                $schema->string()
            ),
        ];
    }
}

The response is a StructuredAgentResponse accessible as a typed array:

php
$response = (new SalesCoach)->prompt('Analyse this transcript...');

$score = $response['score'];
$feedback = $response['feedback'];

How Schema Validation Actually Works

This is a critical distinction for production systems.

The SDK sends your schema to the AI provider as a constraint. For OpenAI, it enables strict mode by default ('schema' => ['strict' => true]). For Anthropic, it uses tool calling to enforce structure. The ObjectSchema also sets withoutAdditionalProperties() to prevent extra fields.

However, the SDK performs no server-side validation of the response. The StructuredAgentResponse stores the provider's decoded JSON directly in a $structured array. If the model returns malformed data, missing fields or incorrect types, the SDK will not catch it.

For production, you must validate structured output yourself:

php
$response = (new SalesCoach)->prompt('Analyse this transcript...');

$validated = validator($response->toArray(), [
    'feedback' => ['required', 'string'],
    'score' => ['required', 'integer', 'min:1', 'max:10'],
    'recommendations' => ['sometimes', 'array'],
    'recommendations.*' => ['string'],
])->validate();

Or use a Form Request, Data Transfer Object or similar pattern to enforce structure at the application boundary.

Tool Input Validation

The same applies to tool inputs. Tools define schemas via the schema() method, which constrains what the model sends. But the Tools\Request class that your handle() method receives is a plain array wrapper with no validation layer:

php
use Illuminate\Contracts\JsonSchema\JsonSchema;
use Laravel\Ai\Contracts\Tool;
use Laravel\Ai\Tools\Request;

class SearchKnowledgeBase implements Tool
{
    public function handle(Request $request): string
    {
        // $request['query'] comes straight from the model
        // The schema told the model to send a string, but validate anyway

        $query = $request['query'];

        if (! is_string($query) || strlen($query) > 500) {
            return 'Invalid search query.';
        }

        return Document::search($query)->take(5)->get()->toJson();
    }

    public function schema(JsonSchema $schema): array
    {
        return [
            'query' => $schema->string()->required(),
        ];
    }
}

Production Validation Strategy

For production systems, treat AI output the same way you treat user input:

  • Structured responses: Validate with Laravel's Validator, Form Requests or DTOs before passing data downstream
  • Tool inputs: Validate and sanitise within handle() before executing any logic
  • Text responses: Sanitise before rendering to prevent XSS if displaying raw AI text in HTML
  • Type safety: Never assume the response array matches your schema; check types before using values in calculations, database writes or API calls

Schemas reduce the likelihood of malformed output. Validation eliminates the risk.

7) Multi-Provider Failover

Laravel AI SDK supports automatic provider failover. Pass an array of providers and the SDK will cascade through them if a failure occurs:

php
$response = (new SalesCoach)->prompt(
    'Analyse this transcript...',
    provider: ['openai', 'anthropic'],
);

The failover mechanism works by catching FailoverableException instances. Two exception types trigger failover:

  • RateLimitedException: the provider returned a 429 rate limit response
  • ProviderOverloadedException: the provider returned a 5xx overload response

When a failover occurs, the SDK fires an AgentFailedOver event containing the agent, the failed provider, the model and the exception. This event is your hook for logging and alerting.

Failover works across all operations: text generation, image generation, audio synthesis, transcription, embeddings and reranking.

Failover strengthens resilience but introduces concerns you must account for:

  • Output consistency differences between providers
  • Structured schema compatibility variations
  • Model cost differences across providers
  • Different token limits and capabilities

Failover is a resilience mechanism, not a shortcut. Test your application against each provider in your failover chain.

8) Retries, Backoff and Circuit Breakers

AI providers fail in predictable ways:

  • 429 rate limits
  • 5xx outages
  • Slow timeouts

The Laravel AI SDK does not include built-in retry logic, exponential backoff or circuit breakers. The SDK provides failover (switching to a different provider) but does not retry the same provider.

For production systems, you need to build these patterns on top of the SDK.

Retries with Backoff

Use Laravel's retry() helper or implement retry logic in your queue jobs:

php
use Laravel\Ai\Exceptions\RateLimitedException; // namespace assumed
use Throwable;

// retry() is a global Laravel helper. The third argument computes the sleep
// between attempts; the optional fourth restricts retries to transient failures.
$response = retry(3, function () {
    return (new SalesCoach)->prompt('Analyse this...');
}, function (int $attempt, Throwable $exception) {
    return 1000 * 2 ** $attempt; // exponential backoff in milliseconds
}, function (Throwable $exception) {
    return $exception instanceof RateLimitedException;
});

For queued jobs, configure retries and backoff on the job class:

php
// In your queue job or via QueuedAgentResponse
(new SalesCoach)
    ->queue('Analyse this...')
    ->maxTries(3)
    ->backoff([10, 30, 60]);

Circuit Breakers

Implement circuit breaker protection using Laravel's cache to track provider health:

php
use Illuminate\Support\Facades\Cache;

$isHealthy = Cache::get('ai:provider:openai:healthy', true);

if (! $isHealthy) {
    // Route to fallback provider or return cached response
}

Listen for the AgentFailedOver event to update provider health status and implement half-open/closed circuit states.
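
A sketch of that listener, assuming the event exposes the failed provider's name (the event class namespace and property names are assumptions, not confirmed SDK API):

php
use Illuminate\Support\Facades\Cache;
use Laravel\Ai\Events\AgentFailedOver; // namespace assumed

class RecordProviderFailure
{
    public function handle(AgentFailedOver $event): void
    {
        $failures = Cache::increment("ai:provider:{$event->provider}:failures");

        // Open the circuit after five failures; it half-opens automatically
        // when the unhealthy flag expires two minutes later.
        if ($failures >= 5) {
            Cache::put(
                "ai:provider:{$event->provider}:healthy",
                false,
                now()->addMinutes(2),
            );
        }
    }
}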

What to Implement

A production-grade Laravel AI system should include:

  • Targeted retry logic that only retries on transient failures
  • Exponential backoff to avoid hammering a struggling provider
  • Circuit breaker protection to stop sending requests to a known-down provider
  • Provider health tracking via event listeners on AgentFailedOver and ProviderFailedOver

Without these, long-running AI requests can overwhelm your queue infrastructure during provider outages.

9) Conversation Persistence

Production chat applications need conversation history. The SDK provides the RemembersConversations trait for automatic database-backed persistence:

php
use Laravel\Ai\Concerns\RemembersConversations;
use Laravel\Ai\Contracts\Conversational;

class SalesCoach implements Agent, Conversational
{
    use Promptable, RemembersConversations;

    public function instructions(): string
    {
        return 'You are a sales coach...';
    }
}

Start a conversation for a user:

php
$response = (new SalesCoach)->forUser($user)->prompt('Hello!');
$conversationId = $response->conversationId;

Continue an existing conversation:

php
$response = (new SalesCoach)
    ->continue($conversationId, as: $user)
    ->prompt('Tell me more about that.');

The SDK automatically stores both user and assistant messages, including tool calls, tool results, usage data and metadata. Conversation titles are generated automatically using the provider's cheapest model.

For production, consider:

  • Message retention policies
  • Conversation archival
  • PII redaction before storage (see the sketch after this list)
  • Access controls on conversation retrieval
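
For redaction, a minimal helper you might run over message content before it reaches storage; the patterns are illustrative and deliberately conservative, not an exhaustive PII filter:

php
class RedactsPii
{
    // Replace common PII patterns with placeholders before persisting.
    public static function redact(string $text): string
    {
        $patterns = [
            '/[\w.+-]+@[\w-]+\.[\w.]+/'  => '[email]',
            '/\+?\d[\d\s().-]{8,}\d/'    => '[phone]',
            '/\b(?:\d[ -]?){13,19}\b/'   => '[card]',
        ];

        return preg_replace(array_keys($patterns), array_values($patterns), $text);
    }
}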

10) Middleware for Cross-Cutting Concerns

The SDK provides a middleware system for intercepting agent prompts and responses. This is the intended extension point for logging, cost tracking, rate limiting and security:

php
use Laravel\Ai\Contracts\HasMiddleware;

class SalesCoach implements Agent, HasMiddleware
{
    use Promptable;

    public function middleware(): array
    {
        return [
            new LogPrompts,
            new EnforceCostLimits,
            new SanitiseInput,
        ];
    }
}

A middleware receives the AgentPrompt and a $next closure:

php
use Closure;
use Illuminate\Support\Facades\Log;
use Laravel\Ai\Prompts\AgentPrompt;
use Laravel\Ai\Responses\AgentResponse;

class LogPrompts
{
    public function handle(AgentPrompt $prompt, Closure $next)
    {
        Log::info('Prompting agent', ['prompt' => $prompt->prompt]);

        return $next($prompt)->then(function (AgentResponse $response) {
            Log::info('Agent responded', [
                'text' => $response->text,
                'usage' => $response->usage->toArray(),
            ]);
        });
    }
}

Middleware is the right place to implement:

  • Request and response logging
  • Token usage tracking and alerting
  • Per-user rate limiting
  • Input sanitisation
  • Cost enforcement (a sketch follows this list)
  • Audit trails
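
As a sketch, the EnforceCostLimits middleware named above might track a per-user daily token budget in the cache (the budget figure and cache key scheme are illustrative):

php
use Closure;
use Illuminate\Support\Facades\Cache;
use Laravel\Ai\Prompts\AgentPrompt;
use Laravel\Ai\Responses\AgentResponse;
use RuntimeException;

class EnforceCostLimits
{
    public function handle(AgentPrompt $prompt, Closure $next)
    {
        $key = 'ai:tokens:'.auth()->id().':'.now()->format('Y-m-d');

        // Refuse the prompt once today's budget is spent.
        if (Cache::get($key, 0) > 500_000) {
            throw new RuntimeException('Daily AI token budget exceeded.');
        }

        // Record actual usage after the response arrives.
        return $next($prompt)->then(function (AgentResponse $response) use ($key) {
            Cache::increment(
                $key,
                $response->usage->promptTokens + $response->usage->completionTokens,
            );
        });
    }
}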

11) Tools, Function Calling and RAG

Agents can call tools inside your Laravel application via the HasTools contract:

php
use Laravel\Ai\Contracts\HasTools;

class SalesCoach implements Agent, HasTools
{
    use Promptable;

    public function tools(): iterable
    {
        return [
            new SearchKnowledgeBase,
            new GetCustomerHistory,
        ];
    }
}

Tools are created with Artisan and define their input schema using JsonSchema:

bash
php artisan make:tool SearchKnowledgeBase

php
use Illuminate\Contracts\JsonSchema\JsonSchema;
use Laravel\Ai\Contracts\Tool;
use Laravel\Ai\Tools\Request;

class SearchKnowledgeBase implements Tool
{
    public function description(): string
    {
        return 'Search the knowledge base for relevant documents.';
    }

    public function handle(Request $request): string
    {
        return Document::search($request['query'])->take(5)->get()->toJson();
    }

    public function schema(JsonSchema $schema): array
    {
        return [
            'query' => $schema->string()->required(),
        ];
    }
}

Provider Tools

The SDK includes built-in provider tools:

  • WebSearch: web search via Anthropic, OpenAI or Gemini
  • WebFetch: fetch web pages via Anthropic or Gemini
  • FileSearch: search uploaded files via OpenAI or Gemini
  • SimilaritySearch: vector similarity search for RAG

RAG with Vector Support

The SDK provides native vector storage on PostgreSQL with pgvector:

php
$documents = Document::query()
    ->whereVectorSimilarTo('embedding', 'best wineries in Napa Valley')
    ->limit(10)
    ->get();

The SimilaritySearch tool integrates directly with agents for retrieval-augmented generation:

php
use Laravel\Ai\Tools\SimilaritySearch;

public function tools(): iterable
{
    return [
        SimilaritySearch::usingModel(
            model: Document::class,
            column: 'embedding',
            minSimilarity: 0.7,
            limit: 10,
        ),
    ];
}

12) Prompt Injection and Tool Security

If you are using Laravel AI agents or tools:

  • Treat tool input as untrusted: the model controls what arguments are passed. The Tools\Request class is a plain array wrapper with no validation layer. Schema definitions constrain the model, but do not enforce constraints server-side (see Section 6)
  • Validate all arguments server-side: tool schemas tell the provider what to send, but your handle() method receives raw arguments. Validate types, lengths, ranges and business rules within the handler
  • Restrict tool execution scope: tools should never have broader permissions than the user who triggered the agent. Pass user context into tools and enforce authorisation
  • Log tool invocation events: the SDK fires InvokingTool (before execution) and ToolInvoked (after execution) events for every tool call. Listen to these for audit trails
  • Sanitise tool output: tool results are sent back to the model as context. If a tool returns user-generated content, it could influence the model's subsequent behaviour (indirect prompt injection)

Model output should never directly execute privileged operations. Sanitise, validate and authorise before acting on any tool request.

A secure tool implementation follows this pattern:

php
use App\Models\User;
use Illuminate\Contracts\JsonSchema\JsonSchema;
use Laravel\Ai\Contracts\Tool;
use Laravel\Ai\Tools\Request;

class TransferFunds implements Tool
{
    public function __construct(private User $user) {}

    public function handle(Request $request): string
    {
        // Validate input types and bounds
        $amount = $request['amount'];
        if (! is_numeric($amount) || $amount <= 0 || $amount > 10000) {
            return 'Invalid transfer amount.';
        }

        // Authorise the action against the current user
        if (! $this->user->can('transfer', $request['account_id'])) {
            return 'Unauthorised.';
        }

        // Execute with validated, authorised parameters
        return TransferService::execute(
            from: $this->user->account,
            to: $request['account_id'],
            amount: $amount,
        );
    }

    public function schema(JsonSchema $schema): array
    {
        return [
            'account_id' => $schema->string()->required(),
            'amount' => $schema->number()->min(0.01)->max(10000)->required(),
        ];
    }
}

The schema reduces the chance of bad input. The handle() method eliminates the risk.

13) Cost Control in Laravel AI Systems

The SDK exposes detailed token usage metrics on every response via the Usage class:

  • promptTokens: input tokens consumed
  • completionTokens: output tokens generated
  • cacheWriteInputTokens: tokens written to prompt cache
  • cacheReadInputTokens: tokens read from prompt cache
  • reasoningTokens: tokens used for extended thinking

php
$response = (new SalesCoach)->prompt('Analyse this...');

$usage = $response->usage;
$totalTokens = $usage->promptTokens + $usage->completionTokens;

Output Limits

Control output size with the #[MaxTokens] attribute:

php
#[MaxTokens(4096)]
class SalesCoach implements Agent { }

Control agent iteration depth with #[MaxSteps]:

php
#[MaxSteps(5)]
class SalesCoach implements Agent { }

Smart Model Selection

The SDK provides attributes to automatically select cost-appropriate models per provider:

php
#[UseCheapestModel]
class SimpleSummariser implements Agent { }

#[UseSmartestModel]
class ComplexReasoner implements Agent { }

Each provider defines its own cheapestTextModel(), defaultTextModel() and smartestTextModel(). This allows cost-tiered agent design without hardcoding model names.

What You Need to Build

The SDK provides the data and controls. Production cost governance requires application-level implementation:

  • Per-user or per-tenant usage caps
  • Token budgeting and alerting thresholds
  • Usage logging via middleware or event listeners on AgentPrompted
  • Guardrails against abuse (input length limits, request frequency caps)
  • Model selection based on workload cost sensitivity

AI systems without cost controls become financial risks.

14) Observability in Laravel AI Production

The SDK fires events throughout the AI lifecycle. These are your hooks for monitoring, logging and alerting:

  • PromptingAgent: before an agent prompt is sent
  • AgentPrompted: after a response is received (includes prompt, response and usage)
  • StreamingAgent: before a streaming prompt begins
  • AgentStreamed: after a stream completes
  • AgentFailedOver: when a provider fails and the SDK switches to the next
  • ProviderFailedOver: when any provider operation fails over
  • InvokingTool: before a tool is called
  • ToolInvoked: after a tool completes
  • GeneratingImage / ImageGenerated: before and after image generation
  • GeneratingAudio / AudioGenerated: before and after audio synthesis
  • GeneratingTranscription / TranscriptionGenerated: before and after transcription
  • GeneratingEmbeddings / EmbeddingsGenerated: before and after embeddings generation
  • Reranking / Reranked: before and after reranking

Use these events with Laravel's event listeners to track:

  • Provider and model per request
  • Latency (calculate from PromptingAgent to AgentPrompted)
  • Token usage per agent, user or tenant (see the listener sketch after this list)
  • Failure categories and failover frequency
  • Tool invocation patterns
  • Queue wait time and execution duration
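
For example, a minimal AgentPrompted listener that records usage per agent (the event class namespace and property names are assumptions based on the table above):

php
use Illuminate\Support\Facades\Log;
use Laravel\Ai\Events\AgentPrompted; // namespace assumed

class RecordAgentUsage
{
    public function handle(AgentPrompted $event): void
    {
        // Ship these to your metrics pipeline rather than logs in practice.
        Log::info('ai.agent.prompted', [
            'agent' => $event->agent::class,
            'usage' => $event->response->usage->toArray(),
        ]);
    }
}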

Without observability, production Laravel AI becomes guesswork.

15) Multimodal Capabilities

The SDK is not limited to text generation. Production applications can leverage:

Image Generation

php
use Laravel\Ai\Image;

$image = Image::of('A product photo with white background')
    ->quality('high')
    ->landscape()
    ->timeout(120)
    ->generate();

$path = $image->store();

Supported providers: OpenAI, Gemini, xAI. Supports reference images, queued generation and failover.

Audio Synthesis (Text-to-Speech)

php
use Laravel\Ai\Audio;

$audio = Audio::of('Welcome to our platform.')
    ->female()
    ->generate();

$path = $audio->store();

Supported providers: OpenAI, ElevenLabs. Supports custom voices and instructions.

Transcription (Speech-to-Text)

php
use Laravel\Ai\Transcription;

$transcript = Transcription::fromStorage('meeting.mp3')
    ->diarize()
    ->generate();

Supported providers: OpenAI, ElevenLabs, Mistral. Supports diarisation for speaker identification.

Embeddings

php
use Laravel\Ai\Embeddings;

$response = Embeddings::for([
    'First document content.',
    'Second document content.',
])->generate();

Supported providers: OpenAI, Gemini, Cohere, Mistral, Jina, VoyageAI. Supports caching with configurable store and TTL.

Reranking

php
use Laravel\Ai\Reranking;

$response = Reranking::of($searchResults)
    ->limit(5)
    ->rerank('PHP frameworks');

Supported providers: Cohere, Jina. Also supports Eloquent collection reranking.

All multimodal operations support queueing, failover and events.

16) Custom Endpoints and Proxy Configuration

In production environments you may need to route AI requests through a proxy, API gateway or centralised key management service.

The SDK supports custom base URLs per provider:

php
// config/ai.php
'providers' => [
    'openai' => [
        'driver' => 'openai',
        'key' => env('OPENAI_API_KEY'),
        'url' => env('OPENAI_BASE_URL'),
    ],
    'anthropic' => [
        'driver' => 'anthropic',
        'key' => env('ANTHROPIC_API_KEY'),
        'url' => env('ANTHROPIC_BASE_URL'),
    ],
],

Custom endpoints are supported for OpenAI, Anthropic, Gemini, Groq, Cohere, DeepSeek, xAI and OpenRouter.

17) Testing Laravel AI in Production CI/CD

The SDK provides comprehensive testing support. Every feature can be faked and asserted against:

Faking Agents

php
use App\Ai\Agents\SalesCoach;

SalesCoach::fake([
    'First response',
    'Second response',
]);

// Run your application code...

SalesCoach::assertPrompted('Analyse this...');
SalesCoach::assertQueued(fn ($prompt) => $prompt->contains('transcript'));
SalesCoach::assertNeverPrompted();

Faking Other Operations

php
Image::fake();
Audio::fake();
Transcription::fake();
Embeddings::fake();
Reranking::fake();
Files::fake();
Stores::fake();

Each fake supports custom responses, closures for dynamic responses, assertions for verification and preventStray*() methods to catch unexpected calls.

Production CI/CD pipelines should fake all AI operations to ensure tests are deterministic, fast and free from provider dependencies.

18) Security, Cost and Operational Guardrails

Reliable Laravel AI systems require governance across multiple dimensions.

Cost Control

Implement:

  • Per-user and per-tenant limits via middleware
  • Token caps using #[MaxTokens] and #[MaxSteps] attributes
  • Usage monitoring via AgentPrompted event listeners
  • Model selection using #[UseCheapestModel] for cost-sensitive workloads
  • Budget alerting when usage approaches thresholds

Data Governance

Define:

  • What AI conversations are stored (via RemembersConversations or custom storage)
  • Retention periods for conversation history
  • Redaction policies for PII in prompts and responses
  • Encryption and access controls on the agent_conversations and agent_conversation_messages tables

Boundary Rate Limiting

Protect your AI endpoints from abuse using Laravel's rate limiting and middleware:

  • Throttle AI endpoints at the HTTP layer (see the sketch after this list)
  • Implement per-user request limits in agent middleware
  • Use queue rate limiting to control provider API usage
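
At the HTTP layer, the first of these uses Laravel's named rate limiters (standard Laravel; the limits are illustrative and CoachController is a placeholder):

php
use Illuminate\Cache\RateLimiting\Limit;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\RateLimiter;
use Illuminate\Support\Facades\Route;

// In a service provider's boot() method:
RateLimiter::for('ai', function (Request $request) {
    return Limit::perMinute(10)->by($request->user()?->id ?: $request->ip());
});

// Then on the AI route:
Route::post('/coach', CoachController::class)->middleware('throttle:ai');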

Worker Scaling and Queue Health

Monitor:

  • Queue backlog growth
  • Worker concurrency
  • Memory usage per worker
  • Execution time distribution

AI jobs are longer-lived than typical background jobs. A text generation job may run for 30-120 seconds. Infrastructure must reflect that reality. Configure appropriate timeouts on your queue workers and consider dedicated queues for AI workloads.

Structured Output and Tool Input Validation

The SDK's schemas constrain provider output but perform no server-side validation (see Section 6). For production:

  • Validate all structured agent responses with Laravel's Validator before using the data
  • Validate all tool Request inputs inside handle() before executing logic
  • Sanitise text responses before rendering in HTML to prevent XSS
  • Never assume types match the schema; check before database writes, calculations or API calls
  • Log validation failures as a signal of model misbehaviour or prompt injection attempts

The Laravel AI Production Architecture

A resilient Laravel AI architecture uses what the SDK provides and supplements what it does not:

SDK provides:

  • Agent class architecture with Promptable trait
  • Queue dispatch via queue() and broadcastOnQueue()
  • Streaming via SSE and Vercel AI SDK protocol
  • Native WebSocket broadcasting
  • Provider failover on rate limits and overloads
  • Structured output with provider-side schema constraints
  • Conversation persistence with RemembersConversations
  • Middleware pipeline for cross-cutting concerns
  • Token usage metrics on every response
  • Event system across all operations
  • Comprehensive testing and faking support
  • Multimodal support across 13 providers

You build on top:

  • Retry logic with exponential backoff
  • Circuit breaker protection
  • Provider health tracking
  • Per-user cost caps and budgeting
  • Rate limiting at application and queue layers
  • Observability dashboards and alerting
  • Data retention and redaction policies
  • Input sanitisation and abuse prevention
  • Server-side validation of all structured output and tool inputs

Queues are table stakes. Resilience is the differentiator.

Building Production-Grade Laravel AI

Most teams stop at "it works locally".

Production Laravel AI requires engineering discipline across:

  • Agent architecture and configuration
  • Infrastructure and queue management
  • Cost control and model selection
  • Security, tool validation and input sanitisation
  • Server-side validation of structured output and tool inputs
  • Observability, logging and alerting
  • Operational scaling and worker management
  • Testing and CI/CD integration
  • Conversation persistence and data governance

The SDK gives you a strong foundation. Production readiness is what you build on top of it.

At Delaney Industries, we design and implement production-grade Laravel AI systems built for reliability, scalability and operational clarity. If you are integrating AI into a live Laravel application, get in touch.

Delaney Wright

Director, Delaney Industries

Delaney Wright is the Director of Delaney Industries, a software development company based in Sleaford, Lincolnshire, specialising in web development, AI integration, web applications and process automation for businesses across the UK.
