Tokens are the new unit of operating cost
When AI is a demo, token cost is a rounding error. When AI is wired into every employee's daily workflow, it becomes one of the largest variable costs in the technology stack. Glean's whitepaper reframes token economics from a billing curiosity into a primary architectural concern: every retrieval pattern, every prompt template, every choice about which model handles which request shows up on the bill — and compounds across millions of interactions per month.
The provocation is that most enterprises are paying for tokens they never had to spend. Bloated context windows, redundant retrieval, oversized models doing undersized work, and ungoverned memory growth quietly inflate the per-interaction cost of every assistant and agent the business runs.
The four levers that actually move the cost curve
Glean groups the optimization surface into four interlocking decisions:
- Context retrieval and routing. Pulling only the documents and snippets that matter for the question — not a maximalist context dump — is the single highest-leverage choice. Better retrieval means smaller prompts, faster responses, and dramatically lower token spend.
- Memory management. Conversation history and user context need to be summarized, tiered, and pruned with intent. Unbounded memory feels generous to the user and is ruinous on the invoice.
- System architecture. How requests are batched, cached, and routed across infrastructure determines whether the same logical task costs ten tokens or ten thousand.
- Model selection. Matching model size to task complexity — small models for classification and routing, large models reserved for genuine reasoning — is where most of the easy savings live.
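As a rough illustration of how three of these levers could look in code, here is a minimal Python sketch. It is not Glean's implementation. The names (`Snippet`, `select_context`, `prune_history`, `pick_model`), the four-characters-per-token estimate, and the prices are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per 1,000 input tokens; not real vendor rates.
PRICE_PER_1K = {"small": 0.0005, "large": 0.01}


@dataclass
class Snippet:
    text: str
    score: float  # retrieval relevance, higher is better


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    return max(1, len(text) // 4)


def select_context(snippets, budget_tokens):
    """Retrieval lever: keep the highest-scoring snippets that fit the budget."""
    chosen, used = [], 0
    for s in sorted(snippets, key=lambda s: s.score, reverse=True):
        cost = estimate_tokens(s.text)
        if used + cost <= budget_tokens:
            chosen.append(s)
            used += cost
    return chosen, used


def prune_history(turns, keep_last):
    """Memory lever: keep only the most recent turns of a conversation."""
    return turns[-keep_last:] if keep_last > 0 else []


def pick_model(task: str) -> str:
    """Model lever: small model for simple tasks, large model otherwise."""
    return "small" if task in {"classification", "routing"} else "large"


def prompt_cost(tokens: int, model: str) -> float:
    """Input cost in USD for a prompt of the given size on the given model."""
    return tokens / 1000 * PRICE_PER_1K[model]
```

In this sketch the token budget caps prompt size no matter how many documents retrieval returns, and the routing rule keeps the expensive model away from classification and routing work. A production system would use a real tokenizer and summarize older turns rather than simply dropping them.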
What enterprises should take away
The strategic message is that token cost is a designed outcome, not an inevitable one. Two companies running the same AI use case can see unit economics that differ by 5–10x, depending on how thoughtfully their retrieval, memory, and routing layers are built.
For any operator deploying AI at scale, hospitality included, this changes the buying conversation. The right question is not "what does this assistant cost per seat?" but "what does this assistant cost per useful interaction, and how does that cost change as usage grows?" Architecture determines the answer to both parts of that question.
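One way to make "cost per useful interaction" concrete is the short calculation below. It is a sketch under stated assumptions: the function name, the blended token price, the interaction volume, and the share of interactions counted as useful are all hypothetical and do not come from the whitepaper.

```python
def cost_per_useful_interaction(tokens_per_interaction, price_per_1k_tokens,
                                interactions, useful_rate):
    """Total token spend divided by the number of interactions that were useful.

    useful_rate is the assumed fraction (0 < rate <= 1) of interactions that
    actually helped the user; how it is measured is left to the operator.
    """
    if interactions <= 0 or not 0 < useful_rate <= 1:
        raise ValueError("interactions must be > 0 and useful_rate in (0, 1]")
    total_tokens = tokens_per_interaction * interactions
    total_cost = total_tokens / 1000 * price_per_1k_tokens
    return total_cost / (interactions * useful_rate)


# Hypothetical comparison: same use case and volume, different prompt sizes.
PRICE = 0.01            # assumed USD per 1,000 tokens
VOLUME = 1_000_000      # assumed interactions per month
USEFUL = 0.8            # assumed share of useful interactions

lean = cost_per_useful_interaction(2_000, PRICE, VOLUME, USEFUL)
bloated = cost_per_useful_interaction(12_000, PRICE, VOLUME, USEFUL)
print(f"lean: ${lean:.3f}, bloated: ${bloated:.3f}, ratio: {bloated / lean:.1f}x")
```

With these made-up inputs, the only difference between the two deployments is tokens per interaction, which is the quantity that retrieval, memory, and routing choices control. Rerunning the calculation at several values of `VOLUME` and `USEFUL` addresses the second half of the question: how the cost behaves as usage grows.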
Source: Glean — How Enterprise AI Systems Reduce Token Cost at Scale