
How it works, hidden costs, and practical optimization levers (Batch, PT, caching, Agents, KBs) to lower GenAI spend.
Bedrock
Amazon Bedrock is AWS’s managed layer that gives you a single API to many models (Anthropic, Meta, Mistral, Amazon’s Nova and Titan), plus orchestration features like Guardrails, Knowledge Bases (RAG), Agents/AgentCore, and Flows, all inside the AWS security/compliance perimeter and with no training on your prompts by default. For most enterprises, it’s the fastest path to ship GenAI without building and babysitting LLM infra.
Nova family quick note: Amazon’s own models include Nova Micro, Lite, Pro, and Premier, each trading off accuracy, latency, and cost. That breadth is handy for a cost-aware “route to the cheapest model that works” strategy.
How Bedrock charges you

On-Demand & Batch. Text models are priced per token (input + output), embeddings on input tokens, and images per image. Batch is available for select models at 50% off On-Demand rates, which makes it ideal for non-interactive jobs. Cross-region inference is supported for some models with no extra Bedrock fee; pricing uses the source region.
Provisioned Throughput (PT). Buy model units (MUs) hourly with 1- or 6-month commitments for guaranteed throughput (tokens/min). Great when traffic is high and steady, and latency/SLOs matter.
Custom Model Import (on-demand). Bring your own weights with no import fee; you’re billed per active model copy in 5-minute windows. Useful if you already trained or fine-tuned elsewhere.
Marketplace models. Some third-party models run on endpoints you size (instances/hours) instead of pure tokens; watch that different meter.
“Platform” meters to remember.
- Guardrails: content filters are $0.15 per 1,000 text units after AWS’s late-2024 price cut.
- Flows: $0.035 per 1,000 node transitions, metered daily, billed monthly (from Feb 1, 2025).
- AgentCore: consumption-based (per-second vCPU & GB-hours + Gateway op fees). Network data transfer for Runtime/Gateway/Code Interpreter/Browser is billed at standard EC2 rates starting Nov 1, 2025.
- Knowledge Bases (RAG): the Bedrock “KB” itself isn’t a separate line item, but your vector store is. OpenSearch Serverless has a real minimum footprint; AWS’s own guidance pegs ~$350/mo for small workloads (four half-OCUs). Plan for that floor.
Snapshot of pricing
- Titan Text Lite (on-demand): $0.0003 / 1K input tokens; $0.0004 / 1K output (example from the AWS pricing page).
- Claude 3.5 Sonnet v2 (on-demand): $0.006 / 1K input, $0.03 / 1K output; Batch halves that; Prompt caching shows $0.0075 / 1K cache-write and $0.0006 / 1K cache-read.
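To make those per-1K rates concrete, here is back-of-envelope math using the Claude 3.5 Sonnet v2 on-demand rates quoted above; the 2,000-in / 500-out token split is an illustrative assumption for a typical chat turn.

```python
# Cost of one on-demand call at the rates quoted above
# ($0.006 / 1K input tokens, $0.03 / 1K output tokens).

def request_cost(input_tokens: int, output_tokens: int,
                 in_rate: float = 0.006, out_rate: float = 0.03) -> float:
    """Dollar cost of one call; rates are $ per 1K tokens."""
    return input_tokens / 1000 * in_rate + output_tokens / 1000 * out_rate

# Assumed chat turn: 2,000 prompt tokens in, 500 tokens out.
cost = request_cost(2000, 500)   # 0.012 + 0.015 = $0.027 per call
monthly = cost * 100_000         # 100K such calls ~= $2,700/month
print(f"${cost:.3f} per call, ${monthly:,.0f}/month at 100K calls")
```

Run the same numbers at Batch’s 50% discount and the monthly figure halves, which is why the batching lever below matters.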
TL;DR which mode, when?
- On-Demand: spiky or low-volume interactive traffic; pay per token.
- Batch: anything that can wait; ~50% off On-Demand.
- Provisioned Throughput: high, steady traffic where latency/throughput SLOs matter.
- Custom Model Import: weights you trained elsewhere; per-copy, 5-minute-window billing.

Optimization levers that move the needle
1) Right-size models
Default to small/cheap models (Nova Micro/Lite, Titan Lite, or similar) for easy prompts; escalate to larger models (Nova Pro/Premier, Claude Sonnet) only when needed. This “intelligent prompt routing” pattern preserves quality while cutting your blended per-token cost. Re-evaluate weekly; Bedrock’s model breadth makes swapping feasible.
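A minimal sketch of that routing pattern. The model IDs and the length/keyword heuristic are illustrative assumptions, not an AWS API; production routers typically use a classifier or Bedrock’s built-in prompt routing instead.

```python
# Route each prompt to the cheapest tier that plausibly handles it,
# escalating on crude difficulty signals. All thresholds are assumptions.

CHEAP = "amazon.nova-micro-v1:0"   # assumed model IDs for each tier
MID = "amazon.nova-lite-v1:0"
BIG = "anthropic.claude-3-5-sonnet-20241022-v2:0"

HARD_HINTS = ("analyze", "multi-step", "code review", "legal")

def route(prompt: str) -> str:
    """Pick a model tier from crude prompt features; escalate only when needed."""
    lowered = prompt.lower()
    if any(hint in lowered for hint in HARD_HINTS) or len(prompt) > 4000:
        return BIG
    if len(prompt) > 800:
        return MID
    return CHEAP

print(route("What are your support hours?"))  # short FAQ -> cheap tier
```

Whatever heuristic you use, log the chosen model per request so you can audit the blended cost and the escalation rate weekly.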
2) Batch anything that can wait
Nightly summarization, bulk embeddings, backfills. Batch is ~50% cheaper and doesn’t block your UX. Keep interactive paths on on-demand.
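Batch jobs read newline-delimited JSON records from S3. A sketch of building that input file is below; the `recordId`/`modelInput` envelope is Bedrock’s batch-inference record shape, while the Anthropic-style message body inside `modelInput` is an assumption you should match to whichever model the job targets.

```python
# Build the JSONL input for a Bedrock batch inference job
# (CreateModelInvocationJob). Each line: {"recordId": ..., "modelInput": ...}.
import json

def to_batch_jsonl(docs: dict[str, str], max_tokens: int = 300) -> str:
    """Turn {record_id: text} into the JSONL a batch job consumes."""
    lines = []
    for record_id, text in docs.items():
        lines.append(json.dumps({
            "recordId": record_id,
            "modelInput": {  # Anthropic-style body -- adjust per model
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": max_tokens,
                "messages": [
                    {"role": "user", "content": f"Summarize:\n{text}"},
                ],
            },
        }))
    return "\n".join(lines)

jsonl = to_batch_jsonl({"ticket-001": "Customer cannot reset password..."})
```

Upload the result to S3, point `create_model_invocation_job` (boto3 `bedrock` client) at the input and output URIs, and the whole run bills at roughly half the on-demand token rate.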
3) Engineer prompts for cost + turn on prompt caching
Shorten system prompts, cap max_output_tokens, and push repetitive context into tools/KB. Prompt caching can trim ~90%+ of input-token cost on repeated prefixes (depends on model; see cache read/write prices for Claude 3.5 Sonnet v2). Track CacheReadInputTokens/CacheWriteInputTokens to validate.
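A sketch of wiring caching into a Converse-API request: the `cachePoint` block marks everything before it as a cacheable prefix on supported models. The model ID and prompt are placeholders; verify your model supports caching before counting on the discount.

```python
# Build a Converse-API request that caches a long, stable system prompt.
# The cachePoint marker tells Bedrock the prefix above it is reusable.

LONG_SYSTEM_PROMPT = "You are a support copilot. Policy: ..." * 50  # stable prefix

def converse_request(user_msg: str) -> dict:
    return {
        "modelId": "anthropic.claude-3-5-sonnet-20241022-v2:0",  # assumed ID
        "system": [
            {"text": LONG_SYSTEM_PROMPT},
            {"cachePoint": {"type": "default"}},  # everything above is cacheable
        ],
        "messages": [
            {"role": "user", "content": [{"text": user_msg}]},
        ],
        "inferenceConfig": {"maxTokens": 400},  # cap output tokens too
    }

req = converse_request("Where is my order?")
```

Pass the dict to the boto3 `bedrock-runtime` client’s `converse(**req)`, then confirm `CacheReadInputTokens` climbs in the usage metrics; if it stays at zero, the prefix isn’t stable enough between calls.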
4) Keep RAG lean (vector cost floor is real)
The OpenSearch Serverless vector engine has a minimum monthly OCU footprint; architect for small, relevant indexes, strict filters, and chunking tuned to retrieval rather than maximalist ingestion. Consider alternative backends for tiny datasets.
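The floor in numbers, using the ~$350/month figure cited earlier: four half-OCUs (2 OCUs total, indexing plus search) running around the clock. The per-OCU-hour rate below is approximate; check current regional pricing.

```python
# OpenSearch Serverless minimum footprint math for a small KB workload.
OCU_PRICE_PER_HOUR = 0.24  # approximate $/OCU-hour; check your region
MIN_OCUS = 2               # four half-OCUs (indexing + search)
HOURS_PER_MONTH = 730

floor = MIN_OCUS * OCU_PRICE_PER_HOUR * HOURS_PER_MONTH
print(f"~${floor:.0f}/month before you store a single vector")
```

That fixed cost dwarfs token spend for small RAG apps, which is why backend choice matters more than chunking strategy at the low end.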
5) Agents: meter the orchestration, not just the LLM
AgentCore charges per-second vCPU/GB plus Gateway ops; logs land in CloudWatch (billed). Consolidate tool calls, throttle steps, and sample telemetry. Also budget for network egress under the Nov 1, 2025 change.
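To budget that, multiply resource-seconds by the published rates. The per-second prices below are HYPOTHETICAL placeholders purely to show the shape of the math; substitute the real vCPU-second and GB-second rates for your region, and remember Gateway ops and CloudWatch bill separately.

```python
# Back-of-envelope AgentCore runtime cost from resource-seconds.
VCPU_PER_SEC = 0.00009  # HYPOTHETICAL $/vCPU-second -- use published rates
GB_PER_SEC = 0.00001    # HYPOTHETICAL $/GB-second

def agent_run_cost(vcpus: float, mem_gb: float, seconds: float) -> float:
    """Compute runtime cost of one agent invocation."""
    return vcpus * seconds * VCPU_PER_SEC + mem_gb * seconds * GB_PER_SEC

per_run = agent_run_cost(vcpus=1, mem_gb=2, seconds=60)  # one 60s invocation
print(f"${per_run:.4f}/run, ${per_run * 1_000_000:,.0f} per 1M runs")
```

Note the lever: cutting agent wall-clock time (fewer tool hops, tighter step limits) reduces this meter linearly, independent of token spend.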
6) Flows: every transition counts
At $0.035 / 1K transitions, over-chatty graphs cost real money. Flatten where possible, and batch sub-steps.
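Why flattening pays, in numbers: node count multiplies directly into monthly cost. The 12-hop vs 5-hop graphs and the 1M runs/month volume are illustrative assumptions.

```python
# Flows transition math at the $0.035 per 1,000 transitions rate.
RATE_PER_TRANSITION = 0.035 / 1000  # $ per node transition

def flow_cost(transitions_per_run: int, runs_per_month: int) -> float:
    """Monthly Flows cost for a graph with a given hop count."""
    return transitions_per_run * runs_per_month * RATE_PER_TRANSITION

chatty = flow_cost(12, 1_000_000)  # 12-hop graph: $420/month
flat = flow_cost(5, 1_000_000)     # flattened to 5 hops: $175/month
print(f"${chatty:.0f} vs ${flat:.0f} -> save ${chatty - flat:.0f}/month")
```

Small money per run, but it scales with traffic while adding zero model quality, so it is usually the easiest meter to shrink.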
7) Regional discipline
Co-locate Bedrock, S3, KBs, and app to avoid data-transfer surprises; cross-region inference itself isn’t extra, but your data can still incur standard transfer.
8) Governance from day one
Tag by env/team/model/version; allocate costs; set budgets and anomaly alerts. Finout’s Bedrock guides are handy for allocation gotchas and dashboards.
Use cases
- Support copilot (chat + RAG + Guardrails): Route simple intents to a small model; cache long system prompts; index only high-value docs; apply Guardrails for safety ($0.15 / 1K text units).
- Ops digest (batch summarization): Dump tickets/logs to S3 and run Batch every night; escalate a minority of summaries to bigger models.
- Agentic workflows: Keep tools minimal; collapse redundant tool hops; monitor Gateway ops (ListTools/InvokeTool) and plan for CloudWatch + egress.
Quick math founders actually ask for
“Is PT cheaper for us?”
Only if your utilization is consistently high and you need the SLOs. Otherwise, On-Demand + Batch + caching + routing often wins on blended cost.
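A sketch of the break-even comparison. The model-unit hourly rate below is a deliberately HYPOTHETICAL placeholder (real MU pricing varies by model and commitment term); plug in your quoted MU rate and the on-demand token rates from the snapshot above.

```python
# PT vs On-Demand break-even, with an assumed MU price.
MU_PER_HOUR = 40.0   # HYPOTHETICAL $/hour for one model unit
HOURS_PER_MONTH = 730
IN_RATE, OUT_RATE = 0.006, 0.03  # $/1K tokens (on-demand rates quoted above)

def on_demand_monthly(in_tokens_m: float, out_tokens_m: float) -> float:
    """Monthly on-demand cost; arguments are millions of tokens per month."""
    return in_tokens_m * 1000 * IN_RATE + out_tokens_m * 1000 * OUT_RATE

pt_monthly = MU_PER_HOUR * HOURS_PER_MONTH          # fixed, utilization-independent
usage = on_demand_monthly(2000, 500)                # 2B in / 500M out per month
print(f"PT ${pt_monthly:,.0f}/mo vs on-demand ${usage:,.0f}/mo")
```

The pattern to notice: PT is a fixed cost regardless of utilization, so even heavy traffic (2B input tokens/month here) can come in under the commitment; PT wins only when sustained throughput, plus the SLO guarantee, pushes past that fixed line.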
“Where are the hidden costs?”
- KBs (vector store floor),
- AgentCore (CPU/GB-seconds, Gateway ops, egress from Nov 1, 2025),
- Flows (transitions),
- Guardrails (text units),
- Marketplace endpoints (instance hours).
Is Bedrock the cheapest? The balanced take
Not automatically. But for teams that value security, privacy, and native AWS integration, Bedrock can be operationally the most efficient — and cost-competitive if you use the levers: right-size + route, Batch, prompt caching, lean RAG, and governance. That’s how you get speed without blank-check spending.
Cloudshim AWS FinOps
If you want a handy reference to keep costs sane, we’re putting together a lightweight FinOps quick guide at aws.cloudshim.com: bookmarkable links to pricing/cost/usage pages, gotchas, and a simple Q&A. We’re rolling this out across the top AWS services; the Bedrock page is here: https://aws.cloudshim.com/aws-top-services/amazon-bedrock.