System Design
The architecture follows a decoupled gateway-platform model. LiteLLM acts as the API gateway and handles all model requests. The platform, a full-stack Next.js application, manages users, balances, API keys, and orchestration. The gateway and the platform communicate through API calls for real-time interactions and through webhooks for asynchronous usage reporting. Frequently accessed gateway data is cached selectively. A daily settlement job reconciles webhook events that failed or were missed, and a parallel processing pipeline handles large-scale log reconciliation. PostgreSQL, hosted on Supabase, is the primary data store, and a custom PostgreSQL-based queue manages background processing on Vercel.
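
The interplay between webhook usage reporting and the daily settlement job can be illustrated with a minimal in-memory sketch. It assumes that every gateway request carries a unique request ID that can serve as an idempotency key, so that replaying the gateway's log during settlement only applies the events the webhook path missed. All names here (UsageEvent, Ledger, settle, the field names) are hypothetical and are not taken from LiteLLM or from the platform's actual schema; a real implementation would persist this state in PostgreSQL rather than in memory.

```typescript
// Hypothetical event shape; field names are assumptions for illustration.
interface UsageEvent {
  requestId: string; // assumed unique per gateway request (idempotency key)
  userId: string;
  costCents: number; // integer cents, to avoid floating-point drift
}

// In-memory stand-in for the balances and applied-events tables.
class Ledger {
  private applied = new Set<string>();
  private balances = new Map<string, number>();

  // Adds funds to a user's balance.
  credit(userId: string, cents: number): void {
    this.balances.set(userId, this.balanceOf(userId) + cents);
  }

  // Applies a usage event at most once; returns true if the balance changed.
  apply(event: UsageEvent): boolean {
    if (this.applied.has(event.requestId)) return false;
    this.applied.add(event.requestId);
    this.balances.set(event.userId, this.balanceOf(event.userId) - event.costCents);
    return true;
  }

  balanceOf(userId: string): number {
    return this.balances.get(userId) ?? 0;
  }
}

// Settlement pass: replay the gateway's log; events already applied through
// the webhook path are skipped, so only missed events take effect.
function settle(ledger: Ledger, gatewayLog: UsageEvent[]): number {
  let recovered = 0;
  for (const event of gatewayLog) {
    if (ledger.apply(event)) recovered += 1;
  }
  return recovered;
}

// Usage: the webhook for r1 arrived, the webhook for r2 was lost.
const ledger = new Ledger();
ledger.credit("alice", 1000);

const r1: UsageEvent = { requestId: "r1", userId: "alice", costCents: 120 };
const r2: UsageEvent = { requestId: "r2", userId: "alice", costCents: 30 };
const gatewayLog: UsageEvent[] = [r1, r2];

ledger.apply(r1); // webhook path
const recovered = settle(ledger, gatewayLog); // settlement recovers r2 only
console.log(recovered, ledger.balanceOf("alice"));
```

Keying on the request ID makes the settlement pass safe to rerun: a second run over the same log changes nothing, which matters if the job itself is retried by the background queue.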