#mobile-menu-toggle { display: none !important; }

Hire me Resume (PDF)

▸ Tag · #redis

Posts tagged #redis.

3 posts with this tag.

← Back to all posts

Architecture May 26, 2026

Caching for speed: Redis and semantic layers in RAG

Stop paying for the same LLM call twice. Two-tier caching — exact-match Redis keys plus semantic vector lookups via RedisVL — that cuts RAG latency from seconds to milliseconds and slashes API spend by up to 80%. With tenant isolation, TTL tiers, and the precision metrics that keep it honest.

Read post →
Architecture May 22, 2026

Message queues: handling the heavy lifting of document processing

Stop running embeddings inside the request-response cycle. A production-grade document ingestion pipeline — staged workers, exponential backoff, dead-letter quarantines, batched embeddings, and queue-depth autoscaling that keeps your AI app from melting under a 500-page PDF.

Read post →
Architecture May 21, 2026

Rate limiting: protecting your AI wallet

One runaway agent loop = $5,000 OpenAI bill. Why request-per-second limits lie for LLM apps, how to architect hierarchical token-bucket limits across global / tenant / user layers, and adaptive throttling patterns that protect margins without breaking UX.

Read post →