Posts tagged #rate-limiting.

2 posts with this tag.

Architecture · May 21, 2026

Rate limiting: protecting your AI wallet

One runaway agent loop can mean a $5,000 OpenAI bill. Why requests-per-second limits mislead for LLM apps, how to architect hierarchical token-bucket limits across global / tenant / user layers, and adaptive throttling patterns that protect margins without breaking UX.

Architecture · May 20, 2026

API Gateway: the front door of your AI stack

Stop exposing LLM providers directly to the frontend. The gateway pattern for AI apps: JWT-scoped tenant isolation, model aliases, denial-of-wallet rate limiting, streaming-safe timeouts, and the wallet-saving guardrails every senior engineer needs.
