
There’s a number floating around that should make every CFO pay attention. Visa — a payments company, not an AI lab — is now consuming 1.9 trillion AI tokens per month. That’s double what they were using just one month earlier. Eighty-nine per cent of their employees are active AI users, and nearly half qualify as power users, averaging 25 prompts a day.
Now zoom out. If a payments company is scaling AI this aggressively, what’s happening across financial services, fintech, and the thousands of companies building on open banking infrastructure? The answer: AI spend is exploding — and most organisations have no framework for controlling it. They have cloud budgets. They have procurement processes for SaaS tools. But AI consumption doesn’t fit neatly into either category. It’s usage-based, unpredictable, and growing faster than anyone forecasted.
This article is for the people who are staring at a cloud invoice that’s 40% higher than last quarter and trying to figure out what happened — and more importantly, what to do about it. Not the theoretical “AI governance framework” stuff. The practical stuff. What actually works.
The Problem: AI Costs Don’t Behave Like Other Software Costs
When you buy a SaaS product, you know what it costs. Ten seats at $50 per month. Done. Predictable. AI doesn’t work this way. API-based AI services charge per token, usually at one rate for input tokens and a higher rate for output tokens. The cost of a single API call depends on which model you use, how long the prompt is, and how long the response is. A simple classification task might cost fractions of a cent. A complex document analysis task using a frontier model could cost several dollars per request.
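To make the per-token arithmetic concrete, here is a minimal sketch. The `ModelPrice` type, the `request_cost` function, and both price cards are illustrative assumptions, not any provider's real rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical price card, in US dollars per million tokens."""
    input_per_million: float
    output_per_million: float


def request_cost(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost of one API call: input and output tokens are billed at separate rates."""
    return (input_tokens * price.input_per_million
            + output_tokens * price.output_per_million) / 1_000_000


# Illustrative prices only (assumed for this example).
SMALL = ModelPrice(input_per_million=0.20, output_per_million=0.80)
FRONTIER = ModelPrice(input_per_million=15.00, output_per_million=75.00)

# A short ticket classification on the small model.
classification = request_cost(SMALL, input_tokens=300, output_tokens=5)

# A long document analysis on the frontier model.
document_analysis = request_cost(FRONTIER, input_tokens=200_000, output_tokens=4_000)

print(f"classification:    ${classification:.6f}")
print(f"document analysis: ${document_analysis:.2f}")
```

Under these assumed prices the classification call costs a small fraction of a cent and the document analysis costs a few dollars, which is the spread described above.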
This means your AI bill is a function of adoption — and adoption is the one thing your organisation is actively trying to increase. Every time a team builds a new automation, connects a new workflow, or rolls out an AI feature to customers, consumption goes up. The more successful your AI strategy is, the higher the bill gets. That’s the paradox finance teams are wrestling with right now.
The AI Spend Reality Check
Visa: 1.9 trillion tokens/month. Meta: 60 trillion tokens/month internally. A single developer at AI coding startup Cognition crossed 1 trillion tokens in cumulative usage. These are early indicators of where enterprise AI consumption is heading. If your company is using AI seriously, your cloud and API bills are going to grow — the question is whether that growth is managed or chaotic.
Seven Practical Ways to Reduce Your AI Spend
These aren’t hypothetical. These are the levers that actually move the number.
1. Use the Right Model for the Right Task
This is the single biggest cost lever most companies ignore. Not every task requires a frontier model. Classifying a support ticket? A smaller, cheaper model handles that just fine. Summarising a short email? Same. Generating a complex financial analysis from raw data? That’s where you want the expensive model. The mistake most teams make is defaulting everything to the most powerful (and most expensive) model because it was the first one they integrated. Build a routing layer that sends simple tasks to cheap models and complex tasks to premium models. The cost difference can be 10–50x per request.
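A routing layer can start out very simple. The sketch below is one possible shape, assuming each request is already tagged with a task type; the model names, the `SIMPLE_TASKS` set, and the length threshold are all hypothetical placeholders.

```python
from dataclasses import dataclass

# Hypothetical model identifiers; substitute whatever your provider offers.
CHEAP_MODEL = "small-model"
PREMIUM_MODEL = "frontier-model"

# Task types assumed to be safe for a smaller model.
SIMPLE_TASKS = {"classification", "short_summary", "extraction"}


@dataclass(frozen=True)
class Task:
    kind: str
    prompt: str


def choose_model(task: Task, max_simple_chars: int = 4_000) -> str:
    """Send short, routine tasks to the cheap model and everything else to the premium one."""
    if task.kind in SIMPLE_TASKS and len(task.prompt) <= max_simple_chars:
        return CHEAP_MODEL
    return PREMIUM_MODEL


print(choose_model(Task("classification", "Customer asks for a refund.")))
print(choose_model(Task("financial_analysis", "Analyse Q3 cash flow from the raw ledger.")))
```

Defaulting unknown task types to the premium model is a deliberate choice here: a misrouted request costs more money, but it does not degrade quality.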
2. Cache Everything You Can
If the same prompt arrives twice and you call the model both times, you’re paying twice for the same answer. This is surprisingly common in production AI systems, especially in customer support, document processing, and data enrichment pipelines where the same types of queries repeat frequently. Implement a semantic caching layer that stores responses and serves them again when a sufficiently similar prompt arrives. Some teams report 20–40% cost reductions from caching alone.
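The idea behind a semantic cache can be sketched in a few lines. This is a toy version under loud assumptions: `toy_embed` is a word-count stand-in for a real embedding model, the 0.8 similarity threshold is arbitrary, and the linear scan would be replaced by a vector index in production.

```python
import math
import re
from collections import Counter
from typing import Callable, Optional


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # assumed value; tune on real traffic
        self.entries: list[tuple[Counter, str]] = []
        self.hits = 0
        self.misses = 0

    def lookup(self, prompt: str) -> Optional[str]:
        """Return the stored response for the most similar prompt, if similar enough."""
        query = toy_embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def get_or_call(self, prompt: str, call_model: Callable[[str], str]) -> str:
        """Serve from cache when possible; otherwise pay for one model call and store it."""
        cached = self.lookup(prompt)
        if cached is not None:
            self.hits += 1
            return cached
        self.misses += 1
        response = call_model(prompt)
        self.entries.append((toy_embed(prompt), response))
        return response
```

Pass your real model call as `call_model`. The `hits` and `misses` counters give you the hit rate, which is the number that tells you whether caching is worth keeping for a given workload.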
3. Optimise Your Prompts
Longer prompts cost more, both on the input side and typically on the output side too, since verbose prompts tend to produce verbose responses. Review your production prompts and strip out unnecessary context, redundant instructions, and padding. A well-engineered prompt that’s 40% shorter cuts input-token cost by about 40% with no loss of output quality; the saving on the total bill is smaller, because output tokens are billed separately. This sounds trivial, but across millions of API calls per month the savings compound fast.
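A quick calculation shows how prompt trimming adds up at volume. The token counts, call volume, and price below are made-up inputs, and the sketch covers the input side only; output tokens are billed on top.

```python
def monthly_input_cost(prompt_tokens: int, calls_per_month: int,
                       usd_per_million_input: float) -> float:
    """Monthly spend on input tokens for one production prompt."""
    return prompt_tokens * calls_per_month * usd_per_million_input / 1_000_000


# Assumed workload: 5 million calls a month at $3 per million input tokens.
before = monthly_input_cost(prompt_tokens=1_500, calls_per_month=5_000_000,
                            usd_per_million_input=3.00)
# The same prompt after trimming it by 40%.
after = monthly_input_cost(prompt_tokens=900, calls_per_month=5_000_000,
                           usd_per_million_input=3.00)

print(f"before: ${before:,.0f}  after: ${after:,.0f}  saved: ${before - after:,.0f}")
```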
4. Use Batch Processing Where Latency Allows
Most AI providers offer batch APIs at significantly reduced rates — sometimes 50% cheaper than real-time inference. If your workload doesn’t need an instant response (think: overnight document processing, daily report generation, weekly data classification), batch it. The results are identical; you just wait a few hours instead of a few seconds. For back-office financial operations, this is often a trivial change that cuts the bill in half for those specific workloads.
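One way to find batch candidates is to list each workload with how long the business can wait for its results. The sketch below assumes a flat 50% batch discount and a 24-hour batch turnaround; both are placeholders to replace with your provider's actual terms, and the `Workload` type is hypothetical.

```python
from dataclasses import dataclass

BATCH_DISCOUNT = 0.50  # assumption; check your provider's batch pricing


@dataclass(frozen=True)
class Workload:
    name: str
    realtime_cost_usd: float  # monthly cost at real-time rates
    max_wait_hours: float     # how long results can be delayed


def plan(workloads: list[Workload], batch_turnaround_hours: float = 24.0):
    """Split workloads into batch and real-time, and estimate the resulting monthly cost."""
    batch = [w for w in workloads if w.max_wait_hours >= batch_turnaround_hours]
    realtime = [w for w in workloads if w.max_wait_hours < batch_turnaround_hours]
    cost = (sum(w.realtime_cost_usd for w in realtime)
            + sum(w.realtime_cost_usd * (1 - BATCH_DISCOUNT) for w in batch))
    return batch, realtime, cost


workloads = [
    Workload("customer chat", 10_000.0, max_wait_hours=0),
    Workload("overnight document processing", 6_000.0, max_wait_hours=24),
    Workload("weekly data classification", 2_000.0, max_wait_hours=168),
]
batch, realtime, cost = plan(workloads)
print([w.name for w in batch], f"${cost:,.0f}")
```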
5. Set Hard Spend Limits and Alerts
This sounds obvious but a shocking number of organisations don’t do it. Most major AI and cloud providers let you set budget alerts, and many also support hard monthly spend caps. Where a hard cap isn’t available, enforce one in your own gateway or application code. Set them per team, per project, and per environment. A runaway loop in a development environment or an unanticipated spike in production traffic can burn through thousands of dollars in hours. The alert doesn’t reduce your spend, but it stops a manageable bill from becoming an emergency.
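Limits can also be enforced in your own code, in front of the provider. The following is a minimal in-process sketch, not a provider feature: `BudgetGuard`, `BudgetExceeded`, and the 80% alert threshold are hypothetical names and values, and a real deployment would keep the running total in shared storage rather than in memory.

```python
from typing import Callable


class BudgetExceeded(RuntimeError):
    """Raised when a call would push spend past the hard limit."""


class BudgetGuard:
    def __init__(self, monthly_limit_usd: float, alert_fraction: float = 0.8,
                 on_alert: Callable[[str], None] = print):
        self.limit = monthly_limit_usd
        self.alert_fraction = alert_fraction
        self.on_alert = on_alert
        self.spent = 0.0
        self.alerted = False

    def record(self, cost_usd: float) -> None:
        """Block spend that would exceed the cap; alert once at the soft threshold."""
        if self.spent + cost_usd > self.limit:
            raise BudgetExceeded(
                f"would spend {self.spent + cost_usd:.2f} of {self.limit:.2f} USD")
        self.spent += cost_usd
        if not self.alerted and self.spent >= self.alert_fraction * self.limit:
            self.alerted = True
            self.on_alert(f"AI spend at {self.spent:.2f} of {self.limit:.2f} USD")
```

Create one guard per team, project, and environment, and call `record` with the estimated cost before each model call, so that a runaway loop in development fails fast instead of running up the bill.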
6. Audit Your Unused Credits and Commitments
Here’s something that doesn’t get talked about enough: many companies are sitting on cloud and AI credits they’re never going to use. Startup grants from Google Cloud, Azure, or AWS. Enterprise commitments that were over-provisioned. Promotional credits from provider partnerships. These credits expire — and when they do, the value is gone. If your organisation has unused Google Cloud capacity, you can sell Google Cloud credits through brokers who connect you with buyers looking for discounted capacity. It’s a straightforward way to recover cash from credits you’ll never consume instead of watching them expire on your balance sheet.
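To see whether a credit balance is at risk, compare it with what you will realistically consume before the expiry date. This is a rough sketch that assumes a constant daily burn rate; the function name and the figures are illustrative only.

```python
from datetime import date


def credits_at_risk(balance_usd: float, daily_burn_usd: float,
                    today: date, expires: date) -> float:
    """Estimate how much of a credit balance will expire unused at the current burn rate."""
    days_left = max(0, (expires - today).days)
    usable = min(balance_usd, daily_burn_usd * days_left)
    return balance_usd - usable


# Assumed example: $100,000 in credits, $200 a day of usage, 100 days to expiry.
at_risk = credits_at_risk(100_000.0, 200.0, today=date(2025, 1, 1), expires=date(2025, 4, 11))
print(f"${at_risk:,.0f} likely to expire unused")
```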
7. Negotiate — Or Find Alternative Supply
If your monthly AI spend exceeds $5,000 with any single provider, you almost certainly have room to negotiate. Contact your account representative and ask about volume pricing, committed use discounts, or custom rate cards. Most providers have these programmes but won’t proactively offer them unless you ask. And if your current provider won’t negotiate, explore alternatives — the model landscape is increasingly competitive and switching costs are lower than most teams assume.
Where the Money Actually Goes: AI Spend by Category
Most organisations don’t have clear visibility into what’s driving their AI costs. Here’s a typical breakdown for a mid-sized company running production AI workloads:
| Cost Category | Typical Share | Primary Cost Driver |
|---|---|---|
| API Inference (Production) | 35–45% | Real-time model calls in customer-facing features |
| Cloud Compute (Training & Fine-tuning) | 15–25% | GPU hours for model customisation and experimentation |
| Data Processing & Embeddings | 10–15% | Vector databases, RAG pipelines, search indexing |
| Internal Tools & Copilots | 10–20% | Employee-facing AI assistants, code generation, analysis |
| Development & Testing | 5–15% | Non-production model calls during development |
The insight here is that production inference is usually the largest line item — but internal tools and development environments are often the fastest-growing categories, because they scale with employee adoption rather than customer demand. Visa’s numbers bear this out: their 44% power-user rate means almost half the company is generating significant token consumption every working day.
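You can produce the same kind of breakdown for your own organisation once every cost record carries a category tag. A minimal sketch, assuming records arrive as `(category, cost_usd)` pairs; the category names and amounts are hypothetical.

```python
from collections import defaultdict


def spend_shares(records: list[tuple[str, float]]) -> dict[str, float]:
    """Sum cost per category and return each category's share of total spend."""
    totals: dict[str, float] = defaultdict(float)
    for category, cost_usd in records:
        totals[category] += cost_usd
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {category: cost / grand_total for category, cost in totals.items()}


# Made-up monthly records for illustration.
records = [
    ("production_inference", 400.0),
    ("training_and_fine_tuning", 200.0),
    ("data_and_embeddings", 100.0),
    ("internal_tools", 150.0),
    ("development_and_testing", 100.0),
    ("production_inference", 50.0),
]
for category, share in sorted(spend_shares(records).items(), key=lambda kv: -kv[1]):
    print(f"{category:<28}{share:.0%}")
```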
The Uncomfortable Truth: AI Spend Is a Feature, Not a Bug
Let’s be honest about something. If your AI spend is going up, it probably means your organisation is doing the right thing. The companies that will struggle in two years aren’t the ones spending too much on AI — they’re the ones spending too little. Visa isn’t tracking token consumption because they want to cut it. They’re tracking it because they want to reward the teams that are using it most effectively.
The goal isn’t to minimise AI costs in absolute terms. The goal is to maximise the value you get per dollar spent. That means eliminating waste (duplicate calls, oversized models for simple tasks, uncached repeated queries) while continuing to invest in the AI capabilities that actually move your business forward.
For finance and operations leaders in financial services and fintech, this is a familiar challenge dressed in new clothing. You’ve been optimising infrastructure spend for years — cloud migration, API gateway costs, payment processing fees. AI is just the newest line item. The tools are different but the discipline is the same: measure it, allocate it, and relentlessly cut the waste while protecting the investments that generate returns.
The Bottom Line
Your AI bill is going to keep growing. The question is whether you’re paying full price for everything or being smart about it. Route tasks to the right models, cache repeated queries, batch what you can, and don’t let cloud credits expire unused. The companies that treat AI spend as a strategic cost centre — not an uncontrolled line item — will have a structural advantage over those that don’t.
Frequently Asked Questions: AI Spend Management
AI costs are usage-based, not subscription-based. Every API call, every model inference, every embedding query costs money. As more teams adopt AI tools and more workflows get automated, consumption grows proportionally. Unlike a SaaS seat licence, there’s no natural ceiling — which is why active cost management is essential from the start.
Route simple tasks to smaller, cheaper models instead of sending everything to the most expensive frontier model. This single change typically reduces costs by 30–60% with no measurable impact on output quality for routine tasks like classification, summarisation, and data extraction.
Costs vary enormously by provider and model tier. Small, fast models can cost $0.10–0.25 per million input tokens. Large frontier models can cost $10–75 per million input tokens. That’s a gap of roughly 40x to 750x depending on which models you compare, which is why model selection is the most important cost lever available to engineering teams.
Semantic caching stores the responses to AI queries and serves them again when a sufficiently similar prompt is received, avoiding a duplicate API call. Unlike exact-match caching, semantic caching uses embeddings to identify prompts that are similar in meaning even if the wording differs. Teams running high-volume, repetitive AI workloads typically see 20–40% cost reductions from caching.
Yes, wherever latency permits. Batch APIs are typically 50% cheaper than real-time inference. Any workload that doesn’t need an instant response — overnight processing, daily reports, weekly data classification, bulk document analysis — should be batched. The output quality is identical; you simply wait hours instead of seconds.
Check your billing dashboard on each cloud and AI provider you use — Azure, AWS, Google Cloud, OpenAI, Anthropic. Look for prepaid credit balances, startup grants, promotional credits, or committed use allocations that are under-utilised. Many companies have credits they’ve forgotten about or that were provisioned by a previous team. These credits expire if unused.
Yes. There are brokers that connect sellers with buyers who want discounted cloud and AI credits. The process is confidential and the credits are transferred through legitimate mechanisms. If your organisation has credits that will expire before you can use them, selling them recovers real cash value instead of letting the credits disappear.
There’s no universal benchmark. It depends on your industry, the maturity of your AI adoption, and the specific workloads you’re running. What matters more than the absolute number is the ratio of AI spend to value generated. Track the cost per automated task, cost per model inference, and cost per business outcome — then optimise those ratios rather than targeting an arbitrary budget number.
Absolutely. If you’re spending more than $5,000 per month with any single provider, you likely qualify for volume discounts, committed use pricing, or custom rate cards. Most providers have these programmes but won’t offer them unless you ask. The competitive landscape between providers also gives you leverage — if one won’t negotiate, another probably will.
At minimum: total AI spend by provider, spend by team or department, cost per API call by model, token consumption trends over time, and the ratio of production vs development/testing spend. Set monthly budgets, configure spend alerts, and review costs weekly during periods of rapid adoption. Treat it with the same rigour you apply to cloud infrastructure or payment processing costs.
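Most of these metrics fall out of a single pass over a call log. The sketch below assumes each logged call records its team, model, environment, and cost; the `Call` type and its field names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    team: str
    model: str
    environment: str  # "production" or "development"
    cost_usd: float


def summarise(calls: list[Call]) -> dict:
    """Compute total spend, spend by team, cost per call by model, and production share."""
    by_team: dict[str, float] = defaultdict(float)
    cost_by_model: dict[str, float] = defaultdict(float)
    count_by_model: dict[str, int] = defaultdict(int)
    production = 0.0
    for call in calls:
        by_team[call.team] += call.cost_usd
        cost_by_model[call.model] += call.cost_usd
        count_by_model[call.model] += 1
        if call.environment == "production":
            production += call.cost_usd
    total = sum(by_team.values())
    return {
        "total_usd": total,
        "spend_by_team": dict(by_team),
        "cost_per_call_by_model": {
            model: cost_by_model[model] / count_by_model[model] for model in cost_by_model
        },
        "production_share": production / total if total else 0.0,
    }
```

Run it weekly over the previous week's log and compare against the prior run to get the trend over time.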
