▸ Blog
Notes from the trenches.
Laravel internals, AI engineering with Claude + MCP, RAG pipelines, Shopify Plus, DevOps. Production lessons from shipped work — every post earns its slot. Featured = hand-picked or most recent.
▸ 24 posts · last updated May 18, 2026
Latest drop.
▸ Most recent · or pinned by hand
Why agentic commerce will change the way you build Shopify stores
AI agents don't browse — they query. The shift from human-centric Shopify themes to agent-ready infrastructure: Shopify Catalog, UCP, Agentic Storefronts, MCP servers, and why structured data is the new CSS.
Read post →
More posts.
▸ 9 of 23 · paginated below
- AI · 7 mistakes you're making with your production RAG stack (and how to fix them)
Naive chunking, no reranker, embedding drift, latency blowups, vibe-checking — the seven structural mistakes that turn a slick RAG demo into a production nightmare, and the fixes that actually ship.
Read post →
- Architecture · Scaling with RabbitMQ: why message brokers matter
Synchronous controllers are how monoliths die. RabbitMQ basics, exchanges and queues, the strangler pattern for going async, idempotent workers, and the Laravel queue setup I use to absorb 100k-row spikes without breaking the login page.
Read post →
- Architecture · Caching for speed: Redis and semantic layers in RAG
Stop paying for the same LLM call twice. Two-tier caching — exact-match Redis keys plus semantic vector lookups via RedisVL — that cuts RAG latency from seconds to milliseconds and slashes API spend by up to 80%. With tenant isolation, TTL tiers, and the precision metrics that keep it honest.
Read post →
- Architecture · Scaling on demand: smart auto-scaling for modern AI apps
CPU autoscaling is a lie for GPU workloads. Why queue depth, KV-cache pressure, and TTFT beat CPU as scaling triggers — KEDA-driven patterns, ARIMA forecasting, and composite metrics that scale your AI SaaS before users hit the spinner.
Read post →
- Architecture · GPU-aware load balancing: managing AI compute like a pro
Round-robin is a relic when LLM requests span 50 tokens to 50,000. Prefill vs decode disaggregation, KV-cache-aware routing, prefix matching, and the four metrics that matter — how to route AI traffic so your P99 stops bleeding.
Read post →
- Architecture · Circuit breakers: preventing cascading failures in your vector DB
A slow vector DB kills SaaS faster than a dead one. The circuit-breaker pattern for AI infrastructure — closed/open/half-open states, fallback tiers, semantic caches, LLM-only mode, and Laravel-friendly wiring to keep production from melting under one bad dependency.
Read post →
- Architecture · Message queues: handling the heavy lifting of document processing
Stop running embeddings inside the request-response cycle. A production-grade document ingestion pipeline — staged workers, exponential backoff, dead-letter quarantines, batched embeddings, and queue-depth autoscaling that keeps your AI app from melting under a 500-page PDF.
Read post →
- Architecture · Rate limiting: protecting your AI wallet
One runaway agent loop = $5,000 OpenAI bill. Why request-per-second limits lie for LLM apps, how to architect hierarchical token-bucket limits across global / tenant / user layers, and adaptive throttling patterns that protect margins without breaking UX.
Read post →
- Architecture · API Gateway: the front door of your AI stack
Stop exposing LLM providers directly to the frontend. The gateway pattern for AI apps — JWT-scoped tenant isolation, model aliases, denial-of-wallet rate limiting, streaming-safe timeouts, and the wallet-saving guardrails every senior engineer needs.
Read post →