
AWS Bedrock Pricing: Hidden Costs, Real Savings

How it works, hidden costs, and practical optimization levers (Batch, PT, caching, Agents, KBs) to lower GenAI spend.




Amazon Bedrock is AWS’s managed layer that gives you a single API to many models (Anthropic, Meta, Mistral, Amazon’s Nova and Titan), plus orchestration features like Guardrails, Knowledge Bases (RAG), Agents/AgentCore, and Flows, all inside the AWS security/compliance perimeter and with no training on your prompts by default. For most enterprises, it’s the fastest path to ship GenAI without building and babysitting LLM infra.


Nova family quick note: Amazon’s own models include Nova Micro, Lite, Pro, and Premier, each trading off accuracy, latency, and cost differently. That breadth is handy for a cost-aware “route to the cheapest model that works” strategy.


How Bedrock charges you


On-Demand & Batch. Token-based pricing for text models (input + output); embeddings price on input tokens; images price per image. Batch is available for select models at 50% off on-demand rates, perfect for non-interactive jobs. Cross-region inference is supported for some models with no extra Bedrock fee; pricing uses the source region.


Provisioned Throughput (PT). Buy model units (MUs), billed hourly, with 1- or 6-month commitments for guaranteed throughput (tokens per minute). Great when traffic is high and steady and latency/SLOs matter.


Custom Model Import (on-demand). Bring your own weights; no import fee. You’re billed per active model copy in 5-minute windows, useful if you already trained/fine-tuned elsewhere.


Marketplace models. Some third-party models run on endpoints you size (instances/hours) instead of pure tokens; watch that different meter.


“Platform” meters to remember.


  • Guardrails: content filters are $0.15 per 1,000 text units after AWS’s late-2024 price cut.
  • Flows: $0.035 per 1,000 node transitions, metered daily, billed monthly (from Feb 1, 2025).
  • AgentCore: consumption-based (per-second vCPU & GB-hours + Gateway op fees). Network data transfer for Runtime/Gateway/Code Interpreter/Browser is billed at standard EC2 rates starting Nov 1, 2025.
  • Knowledge Bases (RAG): the Bedrock “KB” itself isn’t a separate line item, but your vector store is. OpenSearch Serverless has a real minimum footprint; AWS’s own guidance pegs ~$350/mo for small workloads (four half-OCUs). Plan for that floor.
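The KB vector-store floor above is easy to sanity-check. A minimal sketch, assuming OpenSearch Serverless bills around $0.24 per OCU-hour (verify the current rate for your region); the assumed rate reproduces the ~$350/mo figure AWS cites for four half-OCUs:

```python
# Minimum OpenSearch Serverless footprint behind a Bedrock Knowledge Base.
# OCU_HOURLY is an assumed rate (~$0.24/OCU-hour); check your region's pricing.
# The floor is four half-OCUs = 2 full OCUs running 24/7, before storage/queries.

OCU_HOURLY = 0.24        # USD per OCU-hour (assumption; verify on AWS pricing)
MIN_OCUS = 2             # four half-OCUs
HOURS_PER_MONTH = 730

monthly_floor = OCU_HOURLY * MIN_OCUS * HOURS_PER_MONTH
print(f"~${monthly_floor:.0f}/mo before any storage or query load")
```

At the assumed rate this lands at roughly $350/mo, which is why tiny RAG datasets deserve a second look at cheaper backends.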


Snapshot of pricing


  • Titan Text Lite (on-demand): $0.0003 / 1K input tokens; $0.0004 / 1K output (example from the AWS pricing page).
  • Claude 3.5 Sonnet v2 (on-demand): $0.006 / 1K input, $0.03 / 1K output; Batch halves that; Prompt caching shows $0.0075 / 1K cache-write and $0.0006 / 1K cache-read.
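To turn the snapshot into a monthly number, here is a minimal estimator using the Claude 3.5 Sonnet v2 rates listed above; swap in your own model’s prices from the AWS pricing page:

```python
# Rough cost estimator for token-based Bedrock pricing.
# Per-1K rates below come from the snapshot above (Claude 3.5 Sonnet v2,
# on-demand); Batch is billed at ~50% of on-demand for eligible models.

def token_cost(input_tokens, output_tokens, in_per_1k, out_per_1k, batch=False):
    """Return estimated USD cost for a token workload."""
    cost = (input_tokens / 1000) * in_per_1k + (output_tokens / 1000) * out_per_1k
    return cost * 0.5 if batch else cost

# Example: 10M input + 2M output tokens per month.
on_demand = token_cost(10_000_000, 2_000_000, 0.006, 0.03)              # $120
batched = token_cost(10_000_000, 2_000_000, 0.006, 0.03, batch=True)    # $60
```

Same workload, half the bill: that is the whole argument for moving non-interactive traffic to Batch.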


TL;DR which mode, when?


  • On-Demand: spiky, low, or unpredictable traffic; interactive paths.
  • Batch: anything that can wait; ~50% off on-demand for eligible models.
  • Provisioned Throughput: high, steady traffic where latency/SLOs matter.
  • Custom Model Import: you already have weights; billed per active model copy.
  • Marketplace models: endpoint instance hours; size carefully and watch utilization.


Optimization levers that move the needle


1) Right-size models

Default to small/cheap models (Nova Micro/Lite, Titan Lite, or similar) for easy prompts; escalate to larger models (Nova Pro/Premier, Claude Sonnet) only when needed. This “intelligent prompt routing” pattern preserves quality while cutting blended CPM. Re-evaluate weekly; Bedrock’s model breadth makes swapping feasible.
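The routing pattern is simple enough to sketch. The model IDs and the length heuristic below are illustrative assumptions, not exact Bedrock identifiers; in production you would use a classifier or Bedrock’s intelligent prompt routing:

```python
# Minimal sketch of cost-aware model routing: send easy prompts to a small,
# cheap model and escalate only when heuristics flag complexity.
# Model IDs are illustrative placeholders, not real Bedrock model IDs.

CHEAP_MODEL = "amazon.nova-micro"        # placeholder ID
BIG_MODEL = "anthropic.claude-sonnet"    # placeholder ID

def pick_model(prompt: str, needs_reasoning: bool = False) -> str:
    """Crude router: escalate on an explicit complexity flag or long prompts."""
    if needs_reasoning or len(prompt.split()) > 300:
        return BIG_MODEL
    return CHEAP_MODEL

# Short FAQ-style question -> cheap model; flagged analytical task -> big model.
```

Even this crude version cuts blended cost if most of your traffic is genuinely simple; the weekly re-evaluation is what keeps the thresholds honest.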


2) Batch anything that can wait

Nightly summarization, bulk embeddings, backfills. Batch is ~50% cheaper and doesn’t block your UX. Keep interactive paths on on-demand.


3) Engineer prompts for cost + turn on prompt caching

Shorten system prompts, cap max_output_tokens, and push repetitive context into tools/KB. Prompt caching can trim ~90%+ of input-token cost on repeated prefixes (depends on model; see cache read/write prices for Claude 3.5 Sonnet v2). Track CacheReadInputTokens/CacheWriteInputTokens to validate.
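Whether caching pays off is a one-liner to check. A sketch using the cache read/write rates from the snapshot above (Claude 3.5 Sonnet v2); plug in your own model’s rates:

```python
# Does prompt caching pay off? Compare a shared prefix re-sent N times at the
# base input rate vs. one cache write plus (N-1) cache reads.
# Per-1K rates from the Claude 3.5 Sonnet v2 snapshot above.

IN_RATE = 0.006        # base input, per 1K tokens
CACHE_WRITE = 0.0075   # cache write, per 1K tokens
CACHE_READ = 0.0006    # cache read, per 1K tokens

def prefix_cost(prefix_tokens, calls, cached):
    """Estimated USD cost of sending the same prefix across `calls` requests."""
    k = prefix_tokens / 1000
    if not cached:
        return k * IN_RATE * calls
    return k * CACHE_WRITE + k * CACHE_READ * (calls - 1)

# 5K-token system prompt reused across 1,000 calls:
uncached = prefix_cost(5000, 1000, cached=False)  # $30.00
cached = prefix_cost(5000, 1000, cached=True)     # ~$3.03
```

That is the ~90% input-token saving on repeated prefixes, and it is exactly what the CacheReadInputTokens/CacheWriteInputTokens metrics let you verify in production.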


4) Keep RAG lean (vector cost floor is real)

OpenSearch Serverless vector engine has a minimum monthly OCU footprint; architect for small, relevant indexes, strict filters, and chunking tuned to retrieval — not maximalist ingestion. Consider alternative backends for tiny datasets.


5) Agents: meter the orchestration, not just the LLM

AgentCore charges per-second vCPU/GB plus Gateway ops; logs land in CloudWatch (billed). Consolidate tool calls, throttle steps, and sample telemetry. Also budget for network egress under the Nov 1, 2025 change.


6) Flows: every transition counts

At $0.035 / 1K transitions, over-chatty graphs cost real money. Flatten where possible, and batch sub-steps.
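The arithmetic behind “over-chatty graphs cost real money,” using the $0.035 per 1K transitions rate above; the node counts are made-up examples:

```python
# Flows cost scales with node transitions: $0.035 per 1,000 transitions.
# Graph sizes below are hypothetical, purely to show the sensitivity.

def flows_cost(transitions):
    return transitions / 1000 * 0.035

# A 12-node graph at 1M runs/month = 12M transitions -> $420/mo.
# Flattening it to 5 nodes = 5M transitions -> $175/mo for the same runs.
chatty = flows_cost(12_000_000)
flat = flows_cost(5_000_000)
```

Same workload, fewer hops, less than half the Flows bill.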


7) Regional discipline

Co-locate Bedrock, S3, KBs, and app to avoid data-transfer surprises; cross-region inference itself isn’t extra, but your data can still incur standard transfer.


8) Governance from day one

Tag by env/team/model/version; allocate costs; set budgets and anomaly alerts. Finout’s Bedrock guides are handy for allocation gotchas and dashboards.


Use cases

  • Support copilot (chat + RAG + Guardrails): Route simple intents to a small model; cache long system prompts; index only high-value docs; apply Guardrails for safety ($0.15 / 1K text units).
  • Ops digest (batch summarization): Dump tickets/logs to S3 and run Batch every night; escalate a minority of summaries to bigger models.
  • Agentic workflows: Keep tools minimal; collapse redundant tool hops; monitor Gateway ops (ListTools/InvokeTool) and plan for CloudWatch + egress.

Quick math founders actually ask for


“Is PT cheaper for us?”

Only if your utilization is consistently high and you need the SLOs. Otherwise, On-Demand + Batch + caching + routing often wins on blended cost.
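A break-even sketch for that question. The MU hourly rate below is a placeholder; substitute real figures from the Bedrock Provisioned Throughput pricing page for your model and commitment term:

```python
# Break-even sketch: PT vs On-Demand. MU_HOURLY_RATE is a placeholder, not a
# real Bedrock price; pull actual MU rates and throughput from AWS pricing.

MU_HOURLY_RATE = 40.0    # USD/hour per model unit (placeholder)
HOURS_PER_MONTH = 730

def pt_monthly_cost(model_units):
    """Fixed monthly bill: PT is billed per MU-hour regardless of usage."""
    return model_units * MU_HOURLY_RATE * HOURS_PER_MONTH

def on_demand_monthly_cost(in_tokens, out_tokens, in_per_1k, out_per_1k):
    """Variable monthly bill at on-demand token rates."""
    return in_tokens / 1000 * in_per_1k + out_tokens / 1000 * out_per_1k

# PT wins only when on-demand spend at your *real* utilization exceeds the
# fixed MU bill, which is exactly the "high and steady" rule of thumb above.
```

Run the comparison with your actual monthly token volumes; if on-demand plus Batch plus caching comes in under the MU bill, the commitment is not buying you anything but the SLO.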


“Where are the hidden costs?”

  • KBs (vector store floor),
  • AgentCore (CPU/GB-seconds, Gateway ops, egress from Nov 1, 2025),
  • Flows (transitions),
  • Guardrails (text units),
  • Marketplace endpoints (instance hours).

Is Bedrock the cheapest? The balanced take


Not automatically. But for teams that value security, privacy, and native AWS integration, Bedrock can be operationally the most efficient — and cost-competitive if you use the levers: right-size + route, Batch, prompt caching, lean RAG, and governance. That’s how you get speed without blank-check spending.


Cloudshim AWS FinOps


If you want a handy reference to keep costs sane, we are putting together a lightweight FinOps quick guide at aws.cloudshim.com: bookmarkable links to pricing/cost/usage pages, gotchas, and a simple Q&A. We are rolling this out across the top AWS services; the Bedrock page is here: https://aws.cloudshim.com/aws-top-services/amazon-bedrock.
