Laravel AI SDK provides a unified, expressive API for integrating AI providers into your Laravel application.
Beyond text, the SDK handles image generation, text-to-speech, speech-to-text, embeddings, reranking, file management and vector storage across providers including ElevenLabs, Cohere, Jina and VoyageAI.
Calling the model is easy.
Running Laravel AI in production is where most teams struggle.
AI requests are slower, more failure-prone and more expensive than standard CRUD operations. If you treat Laravel AI like a normal HTTP call, your system will eventually break under load.
This guide covers what actually matters when deploying Laravel AI into production: agent architecture, queues, streaming, broadcasting, structured output, failover, retries, cost control, conversation persistence, middleware, security, testing and operational guardrails.
1) The Agent Architecture
The fundamental building block of the Laravel AI SDK is the Agent. Rather than scattering AI calls across controllers and services, you define behaviour in a dedicated PHP class:
```bash
php artisan make:agent SalesCoach
```

Each agent implements the Agent contract and uses the Promptable trait. Agents encapsulate instructions, conversation context, tools and output schemas:
```php
use Laravel\Ai\Contracts\Agent;
use Laravel\Ai\Promptable;

class SalesCoach implements Agent
{
    use Promptable;

    public function instructions(): string
    {
        return 'You are a sales coach...';
    }
}
```

Calling the agent is a single line:
```php
$response = (new SalesCoach)->prompt('Analyse this sales transcript...');
```

Agents can be configured with PHP attributes for provider, model, timeout, token limits, temperature and step limits:
```php
use Laravel\Ai\Attributes\Provider;
use Laravel\Ai\Attributes\Model;
use Laravel\Ai\Attributes\MaxSteps;
use Laravel\Ai\Attributes\MaxTokens;
use Laravel\Ai\Attributes\Temperature;
use Laravel\Ai\Attributes\Timeout;

#[Provider('anthropic')]
#[Model('claude-sonnet-4-5-20250929')]
#[MaxSteps(10)]
#[MaxTokens(4096)]
#[Temperature(0.7)]
#[Timeout(120)]
class SalesCoach implements Agent
{
    use Promptable;
}
```

For one-off calls without creating a class, the SDK provides anonymous agents:
```php
use function Laravel\Ai\agent;

$response = agent(
    instructions: 'You are an expert at software development.',
)->prompt('Tell me about Laravel');
```

This agent-first architecture is the foundation of every production pattern that follows.
2) Request Timeouts in Production
In production environments, multiple timeout layers exist:
- CDN or reverse proxy
- Load balancer
- Nginx or Apache
- PHP-FPM worker limits
- Livewire or Octane timeouts
- Browser patience
Even if your local machine waits 60 seconds, production infrastructure often will not.
Laravel AI SDK supports explicit request timeouts through three mechanisms:
- The #[Timeout(120)] attribute on the agent class
- A timeout() method on the agent
- A timeout parameter passed directly to prompt() or stream()
The default timeout is 60 seconds. If an AI call is not guaranteed to complete within your infrastructure's timeout chain, it should never run inside a standard web request.
```php
$response = (new SalesCoach)->prompt(
    'Analyse this transcript...',
    timeout: 120,
);
```

Anything long-running in Laravel AI should be queued.
3) Queues Are Essential for Laravel AI
Laravel AI SDK provides first-class queue support. Agents can be dispatched to a queue with a single method call:
```php
use Laravel\Ai\Responses\AgentResponse;
use Throwable;

(new SalesCoach)
    ->queue('Analyse this sales transcript...')
    ->then(function (AgentResponse $response) {
        // Handle completed response
    })
    ->catch(function (Throwable $e) {
        // Handle failure
    });
```

Under the hood, queue() dispatches an InvokeAgent job that implements ShouldQueue. The returned QueuedAgentResponse wraps Laravel's PendingDispatch, giving you access to all standard queue controls:
```php
(new SalesCoach)
    ->queue('Analyse this transcript...')
    ->onQueue('ai')
    ->delay(now()->addSeconds(5))
    ->then(function ($response) {
        // Handle completed response
    });
```

Queueing is not limited to text generation. Images, audio, transcriptions and embeddings all support queued execution:
```php
use Laravel\Ai\Image;

Image::of('A donut on the kitchen counter')
    ->portrait()
    ->queue()
    ->then(fn ($image) => $image->store());
```

That is the baseline.
For a production-ready Laravel AI system, you also need:
- A persisted job record for auditing and recovery
- Clear job status tracking beyond Laravel's default queue monitoring
- Idempotency safeguards to prevent duplicate AI calls on retry
- Controlled retry logic with backoff (see Section 8)
The SDK provides the queue dispatch, callbacks and error handling. Job persistence, status tracking and idempotency are application-level concerns you build on top.
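For example, a minimal idempotency guard can use an atomic cache lock keyed on the logical request, so a double-click or a retried request cannot dispatch the same prompt twice. This is a sketch; the key derivation and $transcriptId are illustrative:

```php
use Illuminate\Support\Facades\Cache;

// Cache::add() is atomic: it returns false if the key already exists,
// so only the first caller within the window dispatches the job.
$key = 'ai:job:'.hash('sha256', $user->id.'|'.$transcriptId);

if (Cache::add($key, true, now()->addMinutes(10))) {
    (new SalesCoach)
        ->queue('Analyse this sales transcript...')
        ->onQueue('ai');
}
```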
A resilient Laravel AI workflow typically looks like this:
- User submits AI prompt
- Application returns job ID immediately
- Queue worker performs the AI call
- UI updates when complete
This prevents AI workloads from blocking web workers and protects application stability.
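As a sketch, the first three steps of that workflow might look like this in a route. The Analysis model is hypothetical and stands in for whatever status record your UI polls or subscribes to:

```php
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Route;
use Illuminate\Support\Str;

Route::post('/analyses', function (Request $request) {
    $jobId = (string) Str::uuid();

    // Persist a record the UI can poll or subscribe to.
    Analysis::create(['id' => $jobId, 'status' => 'queued']);

    (new SalesCoach)
        ->queue($request->input('transcript'))
        ->onQueue('ai')
        ->then(fn ($response) => Analysis::whereKey($jobId)
            ->update(['status' => 'complete', 'result' => $response->text]))
        ->catch(fn ($e) => Analysis::whereKey($jobId)
            ->update(['status' => 'failed']));

    // Return immediately; the worker does the slow part.
    return response()->json(['job_id' => $jobId], 202);
});
```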
Why a Dedicated AI Queue Matters
AI jobs are longer running, more memory intensive, more failure-prone, bursty under load and expensive per execution. If you mix them with email jobs, notification jobs, billing jobs, webhook handlers and standard background processing, you create a single point of contention.
During a provider slowdown:
- AI jobs back up
- Workers saturate
- Other business-critical jobs starve
That's how you get cascading failures.
1. Separate Queue
Use a dedicated queue name:
```php
(new SalesCoach)
    ->queue('Analyse this...')
    ->onQueue('ai');
```

Or configure a dedicated queue connection in config/queue.php.
2. Dedicated Workers
Run separate workers for AI workloads:
```bash
php artisan queue:work --queue=ai --timeout=180 --memory=512
```

And separate workers for default jobs:
```bash
php artisan queue:work --queue=default
```

That isolates AI latency from your core app behaviour.
3. Horizon Configuration
If using Horizon, define a supervisor specifically for AI jobs:
```php
'supervisors' => [
    'ai-supervisor' => [
        'connection' => 'redis',
        'queue' => ['ai'],
        'balance' => 'auto',
        'maxProcesses' => 10,
        'timeout' => 180,
        'memory' => 512,
    ],
],
```

That lets you scale AI independently.
4. Timeout Alignment
AI jobs may need a higher worker timeout, a higher memory limit and lower concurrency. You don't want a 30-second worker timeout killing a 45-second generation job.
5. Rate-Limited Queues
If provider rate limits are aggressive, consider lower concurrency, Redis rate limiting or custom queue middleware.
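One sketch of the queue-middleware route: wrap the agent call in your own job so you can attach Laravel's RateLimited middleware. The RunSalesCoach class and the 'openai' limiter name are assumptions; the limiter would be defined with RateLimiter::for() in a service provider:

```php
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Queue\Middleware\RateLimited;

class RunSalesCoach implements ShouldQueue
{
    use Queueable;

    public function __construct(public string $prompt) {}

    public function middleware(): array
    {
        // Jobs exceeding the limiter are released back onto the queue.
        return [new RateLimited('openai')];
    }

    public function handle(): void
    {
        (new SalesCoach)->prompt($this->prompt);
    }
}
```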
Production Reality: Mixing AI jobs with core application queues is how production incidents start. Separate queues, dedicated workers and independent scaling are not optional; they are operational requirements for resilient AI systems.
4) Streaming Responses in Laravel AI
Laravel AI SDK supports streaming responses via Server-Sent Events (SSE):
```php
Route::get('/coach', function () {
    return (new SalesCoach)->stream('Analyse this sales transcript...');
});
```

The stream() method returns a StreamableAgentResponse that implements Laravel's Responsable interface. It automatically sets the correct Content-Type: text/event-stream header and streams events as they arrive from the provider.
You can process the completed stream with a callback:
```php
(new SalesCoach)
    ->stream('Analyse this transcript...')
    ->then(function (StreamedAgentResponse $response) {
        // $response->text
        // $response->events
        // $response->usage
    });
```

Or iterate over events manually:
```php
$stream = (new SalesCoach)->stream('Analyse this transcript...');

foreach ($stream as $event) {
    // Process each StreamEvent
}
```

The SDK emits typed streaming events including TextDelta, ToolCall, ToolResult, ReasoningDelta, Citation and Error, giving you granular control over the streaming experience.
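A sketch of dispatching on those event types inside the loop; the class names come from the list above, but their namespaces and payload properties are assumptions:

```php
$stream = (new SalesCoach)->stream('Analyse this transcript...');
$buffer = '';

foreach ($stream as $event) {
    // Event class imports omitted; exact namespaces depend on the SDK.
    if ($event instanceof TextDelta) {
        $buffer .= $event->text; // assumed payload property
    } elseif ($event instanceof ToolCall) {
        logger()->debug('Tool call requested', ['event' => $event]);
    } elseif ($event instanceof Error) {
        logger()->warning('Stream errored mid-response');
        break;
    }
}
```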
Vercel AI SDK Protocol
If your frontend uses Next.js or the Vercel AI SDK, Laravel AI supports the Vercel data protocol natively:
```php
Route::get('/coach', function () {
    return (new SalesCoach)
        ->stream('Analyse this transcript...')
        ->usingVercelDataProtocol();
});
```

This maps streaming events to the Vercel protocol format (text-delta, tool-input-available, tool-output-available, finish) with the appropriate x-vercel-ai-ui-message-stream header.
Streaming improves perceived performance, but it does not improve resilience. Streaming still suffers from provider stalls, mid-stream failures, network drops and infrastructure buffering issues. Streaming is a delivery optimisation. Queue architecture is the resilience foundation.
5) Built-In Broadcasting for Real-Time Interfaces
If you are queueing Laravel AI jobs, polling for completion wastes resources.
The SDK provides native broadcasting support. You do not need to wire this up yourself.
Synchronous Broadcasting
Stream events and broadcast them to WebSocket channels simultaneously:
```php
use Illuminate\Broadcasting\Channel;

(new SalesCoach)->broadcast(
    'Analyse this transcript...',
    new Channel('analysis-results'),
);
```

For immediate dispatch without queuing the broadcast events:
```php
(new SalesCoach)->broadcastNow(
    'Analyse this transcript...',
    new Channel('analysis-results'),
);
```

Queued Broadcasting
Run the AI call in a background job and broadcast each streaming event as it arrives:
```php
(new SalesCoach)->broadcastOnQueue(
    'Analyse this transcript...',
    new Channel('analysis-results'),
);
```

This dispatches a BroadcastAgent job that streams the response and broadcasts each StreamEvent to the specified channels in real time. The frontend receives updates as the AI generates them, even though the work is running in a queue worker.
Broadcasting supports both public and private channels via Laravel's broadcasting system (Reverb, Pusher, Ably, or any compatible driver).
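For per-user results, a private channel stops other users subscribing to the stream. The channel name here is illustrative and must match your channel authorisation callback:

```php
use Illuminate\Broadcasting\PrivateChannel;

// Scope the broadcast to the requesting user.
(new SalesCoach)->broadcastOnQueue(
    'Analyse this transcript...',
    new PrivateChannel('users.'.$user->id),
);
```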
The production pattern is:
- Queue job runs via broadcastOnQueue()
- Worker streams the AI response
- Each streaming event broadcasts to the channel
- Frontend updates instantly via WebSocket
WebSockets improve user experience. They do not replace failover, retries or operational guardrails.
6) Structured Output and Validation
In production, free-form text responses are difficult to work with programmatically. Laravel AI SDK supports structured output via the HasStructuredOutput contract:
```php
use Laravel\Ai\Contracts\HasStructuredOutput;
use Illuminate\Contracts\JsonSchema\JsonSchema;

class SalesCoach implements Agent, HasStructuredOutput
{
    use Promptable;

    public function schema(JsonSchema $schema): array
    {
        return [
            'feedback' => $schema->string()->required(),
            'score' => $schema->integer()->min(1)->max(10)->required(),
            'recommendations' => $schema->array()->items(
                $schema->string()
            ),
        ];
    }
}
```

The response is a StructuredAgentResponse accessible as a typed array:
```php
$response = (new SalesCoach)->prompt('Analyse this transcript...');

$score = $response['score'];
$feedback = $response['feedback'];
```

How Schema Validation Actually Works
This is a critical distinction for production systems.
The SDK sends your schema to the AI provider as a constraint. For OpenAI, it enables strict mode by default ('schema' => ['strict' => true]). For Anthropic, it uses tool calling to enforce structure. The ObjectSchema also sets withoutAdditionalProperties() to prevent extra fields.
However, the SDK performs no server-side validation of the response. The StructuredAgentResponse stores the provider's decoded JSON directly in a $structured array. If the model returns malformed data, missing fields or incorrect types, the SDK will not catch it.
For production, you must validate structured output yourself:
```php
$response = (new SalesCoach)->prompt('Analyse this transcript...');

$validated = validator($response->toArray(), [
    'feedback' => ['required', 'string'],
    'score' => ['required', 'integer', 'min:1', 'max:10'],
    'recommendations' => ['sometimes', 'array'],
    'recommendations.*' => ['string'],
])->validate();
```

Or use a Form Request, Data Transfer Object or similar pattern to enforce structure at the application boundary.
Tool Input Validation
The same applies to tool inputs. Tools define schemas via the schema() method, which constrains what the model sends. But the Tools\Request class that your handle() method receives is a plain array wrapper with no validation layer:
```php
class SearchKnowledgeBase implements Tool
{
    public function handle(Request $request): string
    {
        // $request['query'] comes straight from the model.
        // The schema told the model to send a string, but validate anyway.
        $query = $request['query'];

        if (! is_string($query) || strlen($query) > 500) {
            return 'Invalid search query.';
        }

        return Document::search($query)->take(5)->get()->toJson();
    }

    public function schema(JsonSchema $schema): array
    {
        return [
            'query' => $schema->string()->required(),
        ];
    }
}
```

Production Validation Strategy
For production systems, treat AI output the same way you treat user input:
- Structured responses: Validate with Laravel's Validator, Form Requests or DTOs before passing data downstream
- Tool inputs: Validate and sanitise within handle() before executing any logic
- Text responses: Sanitise before rendering to prevent XSS if displaying raw AI text in HTML
- Type safety: Never assume the response array matches your schema; check types before using values in calculations, database writes or API calls
Schemas reduce the likelihood of malformed output. Validation eliminates the risk.
7) Multi-Provider Failover
Laravel AI SDK supports automatic provider failover. Pass an array of providers and the SDK will cascade through them if a failure occurs:
```php
$response = (new SalesCoach)->prompt(
    'Analyse this transcript...',
    provider: ['openai', 'anthropic'],
);
```

The failover mechanism works by catching FailoverableException instances. Two exception types trigger failover:
- RateLimitedException: the provider returned a 429 rate limit response
- ProviderOverloadedException: the provider returned a 5xx overload response
When a failover occurs, the SDK fires an AgentFailedOver event containing the agent, the failed provider, the model and the exception. This event is your hook for logging and alerting.
Failover works across all operations: text generation, image generation, audio synthesis, transcription, embeddings and reranking.
Failover strengthens resilience but introduces concerns you must account for:
- Output consistency differences between providers
- Structured schema compatibility variations
- Model cost differences across providers
- Different token limits and capabilities
Failover is a resilience mechanism, not a shortcut. Test your application against each provider in your failover chain.
8) Retries, Backoff and Circuit Breakers
AI providers fail in predictable ways:
- 429 rate limits
- 5xx outages
- Slow timeouts
The Laravel AI SDK does not include built-in retry logic, exponential backoff or circuit breakers. The SDK provides failover (switching to a different provider) but does not retry the same provider.
For production systems, you need to build these patterns on top of the SDK.
Retries with Backoff
Use Laravel's retry() helper or implement retry logic in your queue jobs:
```php
// retry() is a global helper; no import is required.
$response = retry(3, function () {
    return (new SalesCoach)->prompt('Analyse this...');
}, function (int $attempt, Throwable $exception) {
    return 1000 * 2 ** ($attempt - 1); // Exponential backoff: 1s, 2s, 4s
});
```

For queued jobs, configure retries and backoff on the job class:
```php
// In your queue job or via QueuedAgentResponse
(new SalesCoach)
    ->queue('Analyse this...')
    ->maxTries(3)
    ->backoff([10, 30, 60]);
```

Circuit Breakers
Implement circuit breaker protection using Laravel's cache to track provider health:
```php
use Illuminate\Support\Facades\Cache;

$isHealthy = Cache::get('ai:provider:openai:healthy', true);

if (! $isHealthy) {
    // Route to fallback provider or return cached response
}
```

Listen for the AgentFailedOver event to update provider health status and implement half-open/closed circuit states.
What to Implement
A production-grade Laravel AI system should include:
- Targeted retry logic that only retries on transient failures
- Exponential backoff to avoid hammering a struggling provider
- Circuit breaker protection to stop sending requests to a known-down provider
- Provider health tracking via event listeners on AgentFailedOver and ProviderFailedOver
Without these, long-running AI requests can overwhelm your queue infrastructure during provider outages.
9) Conversation Persistence
Production chat applications need conversation history. The SDK provides the RemembersConversations trait for automatic database-backed persistence:
```php
use Laravel\Ai\Concerns\RemembersConversations;
use Laravel\Ai\Contracts\Conversational;

class SalesCoach implements Agent, Conversational
{
    use Promptable, RemembersConversations;

    public function instructions(): string
    {
        return 'You are a sales coach...';
    }
}
```

Start a conversation for a user:
```php
$response = (new SalesCoach)->forUser($user)->prompt('Hello!');

$conversationId = $response->conversationId;
```

Continue an existing conversation:
```php
$response = (new SalesCoach)
    ->continue($conversationId, as: $user)
    ->prompt('Tell me more about that.');
```

The SDK automatically stores both user and assistant messages, including tool calls, tool results, usage data and metadata. Conversation titles are generated automatically using the provider's cheapest model.
For production, consider:
- Message retention policies
- Conversation archival
- PII redaction before storage
- Access controls on conversation retrieval
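A retention policy can be as simple as a scheduled pruning job. The sketch below assumes the SDK's conversation tables (see Section 18) and a 90-day window; verify table and column names against your migrations:

```php
// routes/console.php
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Schedule;

Schedule::call(function () {
    // Prune messages older than the retention window.
    DB::table('agent_conversation_messages')
        ->where('created_at', '<', now()->subDays(90))
        ->delete();
})->daily();
```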
10) Middleware for Cross-Cutting Concerns
The SDK provides a middleware system for intercepting agent prompts and responses. This is the intended extension point for logging, cost tracking, rate limiting and security:
```php
use Laravel\Ai\Contracts\HasMiddleware;

class SalesCoach implements Agent, HasMiddleware
{
    use Promptable;

    public function middleware(): array
    {
        return [
            new LogPrompts,
            new EnforceCostLimits,
            new SanitiseInput,
        ];
    }
}
```

A middleware receives the AgentPrompt and a $next closure:
```php
use Closure;
use Illuminate\Support\Facades\Log;
use Laravel\Ai\Prompts\AgentPrompt;
use Laravel\Ai\Responses\AgentResponse;

class LogPrompts
{
    public function handle(AgentPrompt $prompt, Closure $next)
    {
        Log::info('Prompting agent', ['prompt' => $prompt->prompt]);

        return $next($prompt)->then(function (AgentResponse $response) {
            Log::info('Agent responded', [
                'text' => $response->text,
                'usage' => $response->usage->toArray(),
            ]);
        });
    }
}
```

Middleware is the right place to implement:
- Request and response logging
- Token usage tracking and alerting
- Per-user rate limiting
- Input sanitisation
- Cost enforcement
- Audit trails
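For instance, per-user rate limiting as agent middleware might look like this sketch; how you surface the failure (exception, queued retry, soft message) is an application decision:

```php
use Closure;
use Illuminate\Support\Facades\RateLimiter;
use Laravel\Ai\Prompts\AgentPrompt;
use RuntimeException;

class ThrottlePromptsPerUser
{
    public function handle(AgentPrompt $prompt, Closure $next)
    {
        // 30 prompts per user per hour; resolve the user however
        // fits your context (auth() is assumed here).
        $allowed = RateLimiter::attempt(
            key: 'ai:'.auth()->id(),
            maxAttempts: 30,
            callback: fn () => true,
            decaySeconds: 3600,
        );

        if (! $allowed) {
            throw new RuntimeException('AI rate limit exceeded for this user.');
        }

        return $next($prompt);
    }
}
```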
11) Tools, Function Calling and RAG
Agents can call tools inside your Laravel application via the HasTools contract:
```php
use Laravel\Ai\Contracts\HasTools;

class SalesCoach implements Agent, HasTools
{
    use Promptable;

    public function tools(): iterable
    {
        return [
            new SearchKnowledgeBase,
            new GetCustomerHistory,
        ];
    }
}
```

Tools are created with Artisan and define their input schema using JsonSchema:
```bash
php artisan make:tool SearchKnowledgeBase
```

```php
use Illuminate\Contracts\JsonSchema\JsonSchema;
use Laravel\Ai\Contracts\Tool;
use Laravel\Ai\Tools\Request;

class SearchKnowledgeBase implements Tool
{
    public function description(): string
    {
        return 'Search the knowledge base for relevant documents.';
    }

    public function handle(Request $request): string
    {
        return Document::search($request['query'])->take(5)->get()->toJson();
    }

    public function schema(JsonSchema $schema): array
    {
        return [
            'query' => $schema->string()->required(),
        ];
    }
}
```

Provider Tools
The SDK includes built-in provider tools:
- WebSearch: web search via Anthropic, OpenAI or Gemini
- WebFetch: fetch web pages via Anthropic or Gemini
- FileSearch: search uploaded files via OpenAI or Gemini
- SimilaritySearch: vector similarity search for RAG
RAG with Vector Support
The SDK provides native vector storage on PostgreSQL with pgvector:
```php
$documents = Document::query()
    ->whereVectorSimilarTo('embedding', 'best wineries in Napa Valley')
    ->limit(10)
    ->get();
```

The SimilaritySearch tool integrates directly with agents for retrieval-augmented generation:
```php
use Laravel\Ai\Tools\SimilaritySearch;

public function tools(): iterable
{
    return [
        SimilaritySearch::usingModel(
            model: Document::class,
            column: 'embedding',
            minSimilarity: 0.7,
            limit: 10,
        ),
    ];
}
```

12) Prompt Injection and Tool Security
If you are using Laravel AI agents or tools:
- Treat tool input as untrusted: the model controls what arguments are passed. The Tools\Request class is a plain array wrapper with no validation layer. Schema definitions constrain the model, but do not enforce constraints server-side (see Section 6)
- Validate all arguments server-side: tool schemas tell the provider what to send, but your handle() method receives raw arguments. Validate types, lengths, ranges and business rules within the handler
- Restrict tool execution scope: tools should never have broader permissions than the user who triggered the agent. Pass user context into tools and enforce authorisation
- Log tool invocation events: the SDK fires InvokingTool (before execution) and ToolInvoked (after execution) events for every tool call. Listen to these for audit trails
- Sanitise tool output: tool results are sent back to the model as context. If a tool returns user-generated content, it could influence the model's subsequent behaviour (indirect prompt injection)
Model output should never directly execute privileged operations. Sanitise, validate and authorise before acting on any tool request.
A secure tool implementation follows this pattern:
```php
class TransferFunds implements Tool
{
    public function __construct(private User $user) {}

    public function handle(Request $request): string
    {
        // Validate input types and bounds
        $amount = $request['amount'];

        if (! is_numeric($amount) || $amount <= 0 || $amount > 10000) {
            return 'Invalid transfer amount.';
        }

        // Authorise the action against the current user
        if (! $this->user->can('transfer', $request['account_id'])) {
            return 'Unauthorised.';
        }

        // Execute with validated, authorised parameters
        return TransferService::execute(
            from: $this->user->account,
            to: $request['account_id'],
            amount: $amount,
        );
    }

    public function schema(JsonSchema $schema): array
    {
        return [
            'account_id' => $schema->string()->required(),
            'amount' => $schema->number()->min(0.01)->max(10000)->required(),
        ];
    }
}
```

The schema reduces the chance of bad input. The handle() method eliminates the risk.
13) Cost Control in Laravel AI Systems
The SDK exposes detailed token usage metrics on every response via the Usage class:
- promptTokens: input tokens consumed
- completionTokens: output tokens generated
- cacheWriteInputTokens: tokens written to prompt cache
- cacheReadInputTokens: tokens read from prompt cache
- reasoningTokens: tokens used for extended thinking
```php
$response = (new SalesCoach)->prompt('Analyse this...');

$usage = $response->usage;
$totalTokens = $usage->promptTokens + $usage->completionTokens;
```

Output Limits
Control output size with the #[MaxTokens] attribute:
```php
#[MaxTokens(4096)]
class SalesCoach implements Agent { }
```

Control agent iteration depth with #[MaxSteps]:
```php
#[MaxSteps(5)]
class SalesCoach implements Agent { }
```

Smart Model Selection
The SDK provides attributes to automatically select cost-appropriate models per provider:
```php
#[UseCheapestModel]
class SimpleSummariser implements Agent { }

#[UseSmartestModel]
class ComplexReasoner implements Agent { }
```

Each provider defines its own cheapestTextModel(), defaultTextModel() and smartestTextModel(). This allows cost-tiered agent design without hardcoding model names.
What You Need to Build
The SDK provides the data and controls. Production cost governance requires application-level implementation:
- Per-user or per-tenant usage caps
- Token budgeting and alerting thresholds
- Usage logging via middleware or event listeners on AgentPrompted
- Guardrails against abuse (input length limits, request frequency caps)
- Model selection based on workload cost sensitivity
AI systems without cost controls become financial risks.
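As one sketch of usage metering, a listener on AgentPrompted could maintain a monthly per-user token counter; the event's payload shape is an assumption, and user resolution is application-specific:

```php
use Illuminate\Support\Facades\Cache;

class MeterTokenUsage
{
    public function handle(object $event): void
    {
        // The docs state AgentPrompted includes the response and usage;
        // the exact property path is assumed here.
        $usage = $event->response->usage;

        Cache::increment(
            'ai:tokens:'.auth()->id().':'.now()->format('Y-m'),
            $usage->promptTokens + $usage->completionTokens,
        );
    }
}
```

A budget check in agent middleware can then read this counter and refuse prompts once a user's monthly cap is reached.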
14) Observability in Laravel AI Production
The SDK fires events throughout the AI lifecycle. These are your hooks for monitoring, logging and alerting:
| Event | When It Fires |
|---|---|
| PromptingAgent | Before an agent prompt is sent |
| AgentPrompted | After a response is received (includes prompt, response, usage) |
| StreamingAgent | Before a streaming prompt begins |
| AgentStreamed | After a stream completes |
| AgentFailedOver | When a provider fails and the SDK switches to the next |
| ProviderFailedOver | When any provider operation fails over |
| InvokingTool | Before a tool is called |
| ToolInvoked | After a tool completes |
| GeneratingImage | Before image generation |
| ImageGenerated | After image generation |
| GeneratingAudio | Before audio synthesis |
| AudioGenerated | After audio synthesis |
| GeneratingTranscription | Before transcription |
| TranscriptionGenerated | After transcription |
| GeneratingEmbeddings | Before embeddings generation |
| EmbeddingsGenerated | After embeddings generation |
| Reranking / Reranked | Before and after reranking |
Use these events with Laravel's event listeners to track:
- Provider and model per request
- Latency (calculate from PromptingAgent to AgentPrompted)
- Token usage per agent, user or tenant
- Failure categories and failover frequency
- Tool invocation patterns
- Queue wait time and execution duration
Without observability, production Laravel AI becomes guesswork.
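For example, latency tracking from PromptingAgent to AgentPrompted could look like this per-worker sketch. Event namespaces and the agent property are assumptions, and it assumes one agent call in flight per process:

```php
use Illuminate\Support\Facades\Log;

// Register onPrompting for PromptingAgent and onPrompted for AgentPrompted.
class TrackAgentLatency
{
    private static array $startedAt = [];

    public function onPrompting(object $event): void
    {
        self::$startedAt[$event->agent::class] = microtime(true);
    }

    public function onPrompted(object $event): void
    {
        if ($start = self::$startedAt[$event->agent::class] ?? null) {
            Log::info('agent.latency', [
                'agent' => $event->agent::class,
                'seconds' => round(microtime(true) - $start, 3),
            ]);
        }
    }
}
```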
15) Multimodal Capabilities
The SDK is not limited to text generation. Production applications can leverage:
Image Generation
```php
use Laravel\Ai\Image;

$image = Image::of('A product photo with white background')
    ->quality('high')
    ->landscape()
    ->timeout(120)
    ->generate();

$path = $image->store();
```

Supported providers: OpenAI, Gemini, xAI. Supports reference images, queued generation and failover.
Audio Synthesis (Text-to-Speech)
```php
use Laravel\Ai\Audio;

$audio = Audio::of('Welcome to our platform.')
    ->female()
    ->generate();

$path = $audio->store();
```

Supported providers: OpenAI, ElevenLabs. Supports custom voices and instructions.
Transcription (Speech-to-Text)
```php
use Laravel\Ai\Transcription;

$transcript = Transcription::fromStorage('meeting.mp3')
    ->diarize()
    ->generate();
```

Supported providers: OpenAI, ElevenLabs, Mistral. Supports diarisation for speaker identification.
Embeddings
```php
use Laravel\Ai\Embeddings;

$response = Embeddings::for([
    'First document content.',
    'Second document content.',
])->generate();
```

Supported providers: OpenAI, Gemini, Cohere, Mistral, Jina, VoyageAI. Supports caching with configurable store and TTL.
Reranking
```php
use Laravel\Ai\Reranking;

$response = Reranking::of($searchResults)
    ->limit(5)
    ->rerank('PHP frameworks');
```

Supported providers: Cohere, Jina. Also supports Eloquent collection reranking.
All multimodal operations support queueing, failover and events.
16) Custom Endpoints and Proxy Configuration
In production environments you may need to route AI requests through a proxy, API gateway or centralised key management service.
The SDK supports custom base URLs per provider:
```php
// config/ai.php
'providers' => [
    'openai' => [
        'driver' => 'openai',
        'key' => env('OPENAI_API_KEY'),
        'url' => env('OPENAI_BASE_URL'),
    ],
    'anthropic' => [
        'driver' => 'anthropic',
        'key' => env('ANTHROPIC_API_KEY'),
        'url' => env('ANTHROPIC_BASE_URL'),
    ],
],
```

Custom endpoints are supported for OpenAI, Anthropic, Gemini, Groq, Cohere, DeepSeek, xAI and OpenRouter.
17) Testing Laravel AI in Production CI/CD
The SDK provides comprehensive testing support. Every feature can be faked and asserted against:
Faking Agents
```php
use App\Ai\Agents\SalesCoach;

SalesCoach::fake([
    'First response',
    'Second response',
]);

// Run your application code...

SalesCoach::assertPrompted('Analyse this...');
SalesCoach::assertQueued(fn ($prompt) => $prompt->contains('transcript'));
SalesCoach::assertNeverPrompted();
```

Faking Other Operations
```php
Image::fake();
Audio::fake();
Transcription::fake();
Embeddings::fake();
Reranking::fake();
Files::fake();
Stores::fake();
```

Each fake supports custom responses, closures for dynamic responses, assertions for verification and preventStray*() methods to catch unexpected calls.
Production CI/CD pipelines should fake all AI operations to ensure tests are deterministic, fast and free from provider dependencies.
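A common pattern is a base TestCase that fakes every AI surface up front, so no individual test can accidentally reach a provider; the agent and fake classes below follow the examples above:

```php
use App\Ai\Agents\SalesCoach;
use Laravel\Ai\Audio;
use Laravel\Ai\Embeddings;
use Laravel\Ai\Image;
use Laravel\Ai\Transcription;

abstract class TestCase extends \Illuminate\Foundation\Testing\TestCase
{
    protected function setUp(): void
    {
        parent::setUp();

        // Every AI call in the suite hits a fake, never a provider.
        SalesCoach::fake(['Stubbed response']);
        Image::fake();
        Audio::fake();
        Transcription::fake();
        Embeddings::fake();
    }
}
```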
18) Security, Cost and Operational Guardrails
Reliable Laravel AI systems require governance across multiple dimensions.
Cost Control
Implement:
- Per-user and per-tenant limits via middleware
- Token caps using #[MaxTokens] and #[MaxSteps] attributes
- Usage monitoring via AgentPrompted event listeners
- Model selection using #[UseCheapestModel] for cost-sensitive workloads
- Budget alerting when usage approaches thresholds
Data Governance
Define:
- What AI conversations are stored (via RemembersConversations or custom storage)
- Retention periods for conversation history
- Redaction policies for PII in prompts and responses
- Encryption and access controls on the agent_conversations and agent_conversation_messages tables
Boundary Rate Limiting
Protect your AI endpoints from abuse using Laravel's rate limiting and middleware:
- Throttle AI endpoints at the HTTP layer
- Implement per-user request limits in agent middleware
- Use queue rate limiting to control provider API usage
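The HTTP-layer throttle might look like this; the 'ai' limiter name and CoachController are illustrative:

```php
use Illuminate\Cache\RateLimiting\Limit;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\RateLimiter;
use Illuminate\Support\Facades\Route;

// In a service provider: define the limiter...
RateLimiter::for('ai', function (Request $request) {
    return Limit::perMinute(10)->by($request->user()?->id ?: $request->ip());
});

// ...then throttle the endpoint in your routes file.
Route::post('/coach', CoachController::class)->middleware('throttle:ai');
```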
Worker Scaling and Queue Health
Monitor:
- Queue backlog growth
- Worker concurrency
- Memory usage per worker
- Execution time distribution
AI jobs are longer-lived than typical background jobs. A text generation job may run for 30-120 seconds. Infrastructure must reflect that reality. Configure appropriate timeouts on your queue workers and consider dedicated queues for AI workloads.
Structured Output and Tool Input Validation
The SDK's schemas constrain provider output but perform no server-side validation (see Section 6). For production:
- Validate all structured agent responses with Laravel's Validator before using the data
- Validate all tool Request inputs inside handle() before executing logic
- Sanitise text responses before rendering in HTML to prevent XSS
- Never assume types match the schema; check before database writes, calculations or API calls
- Log validation failures as a signal of model misbehaviour or prompt injection attempts
The Laravel AI Production Architecture
A resilient Laravel AI architecture uses what the SDK provides and supplements what it does not:
SDK provides:
- Agent class architecture with Promptable trait
- Queue dispatch via queue() and broadcastOnQueue()
- Streaming via SSE and Vercel AI SDK protocol
- Native WebSocket broadcasting
- Provider failover on rate limits and overloads
- Structured output with provider-side schema constraints
- Conversation persistence with RemembersConversations
- Middleware pipeline for cross-cutting concerns
- Token usage metrics on every response
- Event system across all operations
- Comprehensive testing and faking support
- Multimodal support across 13 providers
You build on top:
- Retry logic with exponential backoff
- Circuit breaker protection
- Provider health tracking
- Per-user cost caps and budgeting
- Rate limiting at application and queue layers
- Observability dashboards and alerting
- Data retention and redaction policies
- Input sanitisation and abuse prevention
- Server-side validation of all structured output and tool inputs
Queues are table stakes. Resilience is the differentiator.
Building Production-Grade Laravel AI
Most teams stop at "it works locally".
Production Laravel AI requires engineering discipline across:
- Agent architecture and configuration
- Infrastructure and queue management
- Cost control and model selection
- Security, tool validation and input sanitisation
- Server-side validation of structured output and tool inputs
- Observability, logging and alerting
- Operational scaling and worker management
- Testing and CI/CD integration
- Conversation persistence and data governance
The SDK gives you a strong foundation. Production readiness is what you build on top of it.
At Delaney Industries, we design and implement production-grade Laravel AI systems built for reliability, scalability and operational clarity.
If you are integrating AI into a live Laravel application, get in touch.
Delaney Wright
Director, Delaney Industries
Delaney Wright is the Director of Delaney Industries, a software development company based in Sleaford, Lincolnshire, specialising in web development, AI integration, web applications and process automation for businesses across the UK.
