For Cong Y. · Senior Product Director, AI & Agent · OKX

The infrastructure layer for OKX's AI and agent products

Building AI-powered features on a platform with 71M+ users and $108B+ peak daily volume creates infrastructure demands that most AI platforms never face: inference that must respond in milliseconds for trading decisions, agents that must stay stateful across 130+ chains simultaneously, and LLM costs that compound fast at exchange scale. This brief maps Cloudflare's developer platform specifically to the problems a Senior Product Director of AI & Agents at OKX is solving.

OKX AI product surface this covers:

AI Trading Signals · Onchain AI Agents · Web3 Wallet Risk AI · Copy Trading Intelligence · Market Analysis LLMs · Conversational AI · DeFi Smart Routing

What makes AI at OKX different from AI at a typical SaaS company

The infrastructure requirements for AI on a crypto exchange are qualitatively different from AI on a standard web platform. Two constraints that change everything:

LLM cost at 71M-user scale

A feature that runs one LLM call per user session at $0.01/call comes to $710K/day if all 71M users trigger one session daily. Semantic caching, model routing, and intelligent fallback aren't nice-to-haves at OKX's scale — they're the difference between an AI product that's economically viable and one that isn't.

$0 Cost for cached inference — identical prompts served once
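A back-of-envelope version of that arithmetic (the cache hit rate is an illustrative assumption, not a measurement):

```typescript
// Cost model for LLM calls at exchange scale, with and without caching.
const dailyActiveUsers = 71_000_000;
const costPerCallUsd = 0.01;
const cacheHitRate = 0.6; // assumed: market-analysis prompts repeat heavily

const grossDaily = dailyActiveUsers * costPerCallUsd;   // $710,000/day
const netDaily = grossDaily * (1 - cacheHitRate);       // $284,000/day
console.log({ grossDaily, netDaily });
```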

Onchain agents require persistent multi-chain state

An AI agent that monitors a user's portfolio across 130+ chains, watches DeFi positions, and executes rebalancing autonomously cannot be stateless. Each agent needs durable session state that persists between interactions — tracking chain state, pending transactions, risk parameters, and user intent — without round-trips to a central database on every action.

130+ Chains OKX Wallet supports — each needs agent state

Cloudflare's AI stack → OKX AI & Agents

Three Cloudflare product stacks below directly address the constraints above — each built for the specific demands of AI at exchange scale. AI Gateway, the highest-priority piece, anchors the rollout recommendation at the end.

Onchain AI Agents

Agents SDK + Durable Objects + Workflows

High Priority

OKX's "Onchain OS" and Web3 wallet position AI agents as a core product direction — agents that autonomously monitor portfolios, execute DeFi strategies, bridge assets across 130+ chains, and respond to market conditions without user intervention. Building agents that are reliable at OKX's user scale requires infrastructure that handles the hard parts: persistent state, tool execution, memory across sessions, and graceful failure recovery.

Cloudflare's Agents SDK is purpose-built for this. It handles the orchestration layer — LLM calls, tool invocations, state management, and human-in-the-loop checkpoints — as a managed runtime that runs globally on Cloudflare's network. One agent per user session, co-located with the user.
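A minimal sketch of that pattern, assuming the `agents` npm package. The state shape and the watch-chain intent are hypothetical stand-ins for OKX's real portfolio surface:

```typescript
import { Agent } from "agents";

interface Env {} // bindings from wrangler.toml would go here

interface PortfolioState {
  chains: string[];             // chains this user's agent is watching
  pendingTxs: string[];         // tx hashes awaiting confirmation
  riskTolerance: "low" | "medium" | "high";
}

export class PortfolioAgent extends Agent<Env, PortfolioState> {
  // Persisted in the agent's Durable Object, so it survives across sessions.
  initialState: PortfolioState = {
    chains: [],
    pendingTxs: [],
    riskTolerance: "medium",
  };

  async onRequest(request: Request): Promise<Response> {
    const { watchChain } = await request.json<{ watchChain?: string }>();

    if (watchChain && !this.state.chains.includes(watchChain)) {
      // setState writes through to durable storage; when the user returns
      // tomorrow, this agent instance still knows what it was watching.
      this.setState({
        ...this.state,
        chains: [...this.state.chains, watchChain],
      });
    }

    return Response.json(this.state);
  }
}
```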

Onchain agent architecture on Cloudflare: OKX User ↔ CF Agent (Agents SDK · Durable Object) → onchain tools → OKX APIs · DeFi protocols
  • Durable Objects for per-user agent state — each OKX user's AI agent is a Durable Object. It holds the agent's memory: portfolio context, active DeFi positions, pending transactions across chains, risk parameters, and conversation history. When the user returns the next day, the agent remembers everything — no stateless cold start, no round-trip to a central session store.
  • Human-in-the-loop checkpoints — for high-value actions (large swaps, cross-chain bridges, DeFi strategy execution), agents can pause and request user confirmation via the OKX app before proceeding. Cloudflare's Agents SDK has native support for durable human-in-the-loop pauses — the agent waits indefinitely for confirmation without consuming resources.
  • Workflows for multi-step onchain execution — a DeFi rebalancing operation might span: check prices → calculate optimal routes → execute swap on chain A → bridge to chain B → deposit into yield protocol. Cloudflare Workflows makes each step durable and retryable — if the bridge step fails, it retries from the bridge, not from the beginning (see the sketch after this list).
  • Browser Rendering for Web3 agent interactions — agents that need to interact with DApps that don't have APIs (new protocol frontends, NFT platforms, governance portals) can use Cloudflare's Browser Rendering to operate headless browsers at the edge — no browser infrastructure to manage, sessions persisted via Durable Objects for continuity.
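A sketch of that rebalance as a Workflow. The step bodies are hypothetical helpers; the checkpointing semantics are the point: a failure in the bridge step resumes at the bridge step, not at the price check.

```typescript
import { WorkflowEntrypoint, WorkflowStep, WorkflowEvent } from "cloudflare:workers";

interface Env {}
type RebalanceParams = { userId: string; fromChain: string; toChain: string };

// Hypothetical onchain helpers, stubbed for the sketch.
declare function fetchPrices(chain: string): Promise<Record<string, number>>;
declare function bestRoute(prices: Record<string, number>): Promise<string>;
declare function bridge(from: string, to: string, route: string): Promise<void>;
declare function deposit(route: string): Promise<void>;

export class RebalanceWorkflow extends WorkflowEntrypoint<Env, RebalanceParams> {
  async run(event: WorkflowEvent<RebalanceParams>, step: WorkflowStep) {
    const prices = await step.do("check prices", () =>
      fetchPrices(event.payload.fromChain),
    );

    const route = await step.do("calculate route", () => bestRoute(prices));

    // Chain RPCs and bridges are flaky; give this step its own retry policy.
    await step.do(
      "bridge assets",
      { retries: { limit: 5, delay: "30 seconds", backoff: "exponential" } },
      () => bridge(event.payload.fromChain, event.payload.toChain, route),
    );

    await step.do("deposit into yield protocol", () => deposit(route));
  }
}
```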
OKX AI fit: OKX's "Onchain OS" vision — AI agents that autonomously manage crypto portfolios across 130+ chains — requires exactly this stack. One Durable Object per user agent, globally co-located, with Workflows ensuring that multi-step onchain operations complete reliably even when individual chain RPC calls fail.

Edge AI Inference

Workers AI + Vectorize

High Priority

Workers AI runs 50+ open-source ML models on serverless GPUs across Cloudflare's global network — co-located with OKX's users. For AI features where latency is critical and frontier model quality isn't required, Workers AI delivers sub-50ms inference from a PoP in Singapore, Tokyo, or Dubai — not from a centralized US cloud endpoint.

🪙
Real-time token/contract risk scoring

When a user in OKX's Web3 wallet is about to interact with an unfamiliar contract, Workers AI can run a risk classification model locally at the edge PoP — checking against known scam patterns, rug-pull signatures, and anomalous token behavior — in under 50ms before the transaction is signed.

📈
On-device market sentiment

Small, fine-tuned sentiment models (Llama 3.1 8B, Mistral) running on Workers AI can classify news headlines, social signals, and on-chain metrics into trading sentiment in real time — co-located with the user, at a fraction of the cost of routing every classification to GPT-4o.
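A minimal sketch using an off-the-shelf classifier from the Workers AI catalog (the binding name and headline feed are assumptions; a production signal would swap in a fine-tuned model):

```typescript
interface Env {
  AI: Ai; // Workers AI binding
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { headline } = await request.json<{ headline: string }>();

    // Runs on a GPU in the PoP handling this request, not a central endpoint.
    const result = await env.AI.run("@cf/huggingface/distilbert-sst-2-int8", {
      text: headline,
    });

    // result is a list of { label: "POSITIVE" | "NEGATIVE", score } entries.
    return Response.json({ headline, sentiment: result });
  },
};
```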

🔍
Vectorize — semantic portfolio search

OKX users hold complex portfolios across 130+ chains. Vectorize enables semantic search across a user's portfolio history, transaction records, and DeFi positions — "show me all my yield positions that are below threshold" — without rebuilding a search index per user or routing through a centralized database.
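Roughly what that query path looks like. Binding names, the userId metadata field, and the embedding model are illustrative assumptions; positions would be embedded and upserted into the index by a separate ingestion path.

```typescript
interface Env {
  AI: Ai;                    // Workers AI binding
  PORTFOLIO_INDEX: Vectorize; // Vectorize index binding
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { userId, query } = await request.json<{ userId: string; query: string }>();

    // Embed the natural-language query at the edge.
    const embedding = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
      text: [query],
    });

    // One shared index, scoped per user with a metadata filter, so there is
    // no per-user search index to build or rebuild.
    const matches = await env.PORTFOLIO_INDEX.query(embedding.data[0], {
      topK: 10,
      filter: { userId },
      returnMetadata: "all",
    });

    return Response.json(matches);
  },
};
```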

🌐
74-language NLP at the edge

OKX serves 180+ regions. NLP features — trading intent classification, document analysis, customer support AI — need to work in the user's native language with minimal latency. Workers AI's multilingual models run at the edge PoP nearest to the user, not at a US-East API endpoint.
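One concrete shape this can take, sketched with the m2m100 translation model from the Workers AI catalog (binding name and language codes are illustrative):

```typescript
interface Env {
  AI: Ai; // Workers AI binding
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { text, lang } = await request.json<{ text: string; lang: string }>();

    // Normalize the user's native-language query to English before routing
    // it to downstream intent classification.
    const out = await env.AI.run("@cf/meta/m2m100-1.2b", {
      text,
      source_lang: lang,
      target_lang: "en",
    });

    return Response.json({ english: out.translated_text });
  },
};
```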

OKX AI fit: The "proactive protection" feature in OKX's Web3 wallet — flagging malicious domains and smart contracts — is exactly the Workers AI use case. Risk classification runs at the edge in the user's region, in under 50ms, without a round-trip to a central risk scoring service. At OKX's transaction volume, the latency difference between edge inference and centralized inference compounds into a measurable product quality gap.

AI Training & Inference Data

R2 Object Storage

High Priority

OKX's AI models need training data — historical market data, transaction records, onchain signals, model checkpoints, and fine-tuning datasets. At exchange scale, this is petabyte-class data that the AI/ML pipeline reads repeatedly during training runs, model evaluation, and feature engineering. S3 egress charges on this data compound into a significant monthly cost for any team running frequent training iterations.

  • $0 egress on training data reads — AI training pipelines read the same historical market data hundreds of times across experiments. R2 eliminates egress charges entirely — reading a 1TB market data snapshot 50 times during hyperparameter tuning is free, versus roughly $4,500 in S3 egress at ~$0.09/GB, repeated for every experiment series.
  • Model artifact storage — fine-tuned model weights, ONNX exports, and inference-optimized model files stored in R2 and pulled into edge inference pipelines, with zero egress when models are loaded for inference (see the sketch after this list).
  • Prompt and completion logging for AI Gateway — AI Gateway can log all LLM requests and responses to R2 for model evaluation, fine-tuning dataset generation, and compliance. This is the training flywheel: production inference data feeds back into model improvement without a separate data pipeline.
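A sketch of the artifact read path, assuming a MODEL_BUCKET binding and a hypothetical object key:

```typescript
interface Env {
  MODEL_BUCKET: R2Bucket; // R2 bucket binding
}

export default {
  async fetch(_req: Request, env: Env): Promise<Response> {
    // Reading from R2 inside a Worker incurs no egress fees.
    const artifact = await env.MODEL_BUCKET.get("models/sentiment-v2.onnx");
    if (!artifact) return new Response("model not found", { status: 404 });

    // Stream the weights to the caller (or into an inference runtime).
    return new Response(artifact.body, {
      headers: { "content-type": "application/octet-stream" },
    });
  },
};
```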
OKX AI fit: An AI product team running frequent model iterations against OKX's proprietary market data will spend significant engineering time managing training data infrastructure. R2 eliminates the egress cost that makes data-intensive ML experimentation expensive and enables a direct pipeline from AI Gateway logs to R2 to fine-tuning.

Cloudflare's AI stack for OKX

OKX AI Product Need | Cloudflare Product | Specific Value | Priority
LLM cost, reliability, observability across all AI features | AI Gateway | Semantic caching eliminates redundant inference; model fallback for 24/7 availability; per-feature cost dashboards | Highest
Stateful onchain AI agents (portfolio mgmt, DeFi, multi-chain) | Agents SDK + Durable Objects + Workflows | Per-user persistent agent state, durable multi-step onchain execution, human-in-the-loop checkpoints | High
Edge inference — token risk, sentiment, NLP at low latency | Workers AI + Vectorize | <50ms inference globally; open-source models at edge; semantic portfolio search | High
AI training data, model storage, inference logging | R2 | $0 egress on training data reads, model artifacts, AI Gateway prompt logging to R2 | High
Web3 agent browser interactions (DApps without APIs) | Browser Rendering | Headless Chrome at edge for agent DApp interactions; session persistence via Durable Objects | Consider

Start with AI Gateway — immediate cost impact

AI Gateway is a one-line integration — point your existing LLM base URLs at Cloudflare's gateway instead of directly at the provider. From that moment: every LLM call is logged with cost and latency, semantically similar prompts start hitting cache, and your AI features have automatic fallback if a provider degrades at 3 AM. No code changes beyond the base URL.
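Concretely, with the OpenAI SDK; the {account_id} and {gateway_slug} path segments are placeholders for values from the Cloudflare dashboard:

```typescript
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  // The one-line change: point the SDK at the gateway instead of the provider.
  baseURL:
    "https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_slug}/openai",
});

// From here on, every call is logged, cache-checked, and eligible for
// fallback at the gateway. No other code changes.
const res = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Summarize today's BTC funding rates." }],
});
console.log(res.choices[0].message.content);
```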

For a product director running AI features at OKX's scale, the semantic caching economics are the most immediate lever. Happy to do a 30-minute walkthrough showing exactly how to configure caching policies for OKX's specific AI feature types — trading signals, market analysis, and risk scoring all have different caching characteristics worth understanding before rollout.