Best LangSmith Alternatives 2026: 7 LLM Observability Tools Compared
If you've outgrown LangSmith — or you're evaluating it and the LangChain lock-in scares you — you're not alone. LangSmith does excellent work for LangChain-native apps, but its pricing model, framework coupling, and enterprise gate frustrate teams running multi-framework agents in production. Below: seven serious alternatives we've tested, plus an honest take on when LangSmith is still the right call.
TL;DR — quick comparison
| Tool | Best for | Open source | Self-host | Free tier |
|---|---|---|---|---|
| ClawPulse | Real-time agent monitoring + cost analytics | No | No | Trial only (14 days) |
| Langfuse | OSS observability with traces & evals | Yes (MIT) | Yes | Yes |
| Helicone | Drop-in LLM proxy with caching | Yes | Yes | Yes (10k req) |
| Braintrust | Eval-first workflows | No | Enterprise only | Yes (limited) |
| Phoenix (Arize) | OSS tracing with strong eval suite | Yes | Yes | Yes |
| Weights & Biases (Weave) | ML+LLM unified workspaces | Partial | Enterprise | Yes |
| Datadog LLM Observability | Existing Datadog shops | No | No | No (paid only) |
Why people leave LangSmith
LangSmith was built inside LangChain and it shows. Three frictions come up repeatedly in our user calls:
1. Framework gravity. The first-class experience assumes you're using LangChain or LangGraph. If you're calling the Anthropic SDK or OpenAI directly, you can wire it up via OpenTelemetry, but you'll feel like a second-class citizen.
2. Pricing opacity at scale. The free tier covers 5,000 traces/month. Beyond that you're on Plus ($39/seat) or Enterprise ("contact us"). Teams running thousands of agent runs per day hit the wall fast.
3. Self-host is an enterprise gate. If you need on-prem for compliance (HIPAA, SOC 2 customer audits, EU data residency), you're funneled into a sales process. No community self-host option.
None of those are reasons to avoid LangSmith — they're reasons it might not fit your shape. Here's what fits the other shapes.
1. ClawPulse — real-time monitoring for agents in production
Best for: teams running OpenClaw, Anthropic, or OpenAI agents in production who care more about uptime, cost, and incident response than offline evaluation.
ClawPulse takes the operations side of agent observability seriously. Where LangSmith is built around the dev-loop (prompt iteration, eval datasets, A/B testing), ClawPulse is built around the on-call loop: real-time dashboards, smart alerts, cost spikes, error tracking, fleet management.
What it does well:
- Real-time fleet view. Watch every agent instance in one dashboard — CPU, RAM, request rate, error rate, p95 latency, active sessions, token throughput, last error message.
- Cost analytics. Per-agent and per-model spend with anomaly alerts when token usage spikes 3× baseline.
- Smart alerts. Multi-channel routing (Slack, email, webhook) on error rate, latency p95, token budget breach, agent down, custom metric thresholds.
- Drop-in install. One-line bash agent install — no SDK rewrite, no code changes.
- Framework-agnostic. Works whether you're on LangChain, LlamaIndex, raw SDK calls, or custom orchestration.
What it doesn't do (yet): offline eval datasets, prompt experimentation tooling, regression testing — that's a different problem (see ClawPulse vs Braintrust for the framing).
Pricing: Starter $19/mo (5 instances), Growth $49/mo (20 instances), Agency $99/mo (unlimited). 14-day free trial, no card.
Pick this if: you're running agents in production and your monitoring story is currently "nothing" or "Datadog dashboards I built myself." See it live in our demo.
2. Langfuse — open-source observability with evals
Best for: teams who want LangSmith-style trace exploration but with self-host as a first-class option.
Langfuse is the closest 1:1 OSS alternative to LangSmith. MIT-licensed, self-hostable via Docker Compose, with a managed cloud tier on top.
Strengths:
- Tracing, prompts, datasets, and evals in one tool.
- Strong SDK ecosystem (Python, JS, Java, Go).
- Native OpenTelemetry support.
- Real community traction — generous free tier and an active Discord.
- Self-host without a sales call.
Limits:
- Real-time monitoring is shallow. Langfuse excels at "explore this trace" not "page someone when error rate spikes."
- Cost analytics exist but are basic vs. dedicated tools.
- Self-hosting is OSS-as-a-product — you're on your own for upgrades, scaling, and HA.
When to pick Langfuse: when you're moving off LangSmith and want OSS plus self-host without changing your mental model. See our deeper Langfuse alternatives writeup for adjacent options.
3. Helicone — the LLM proxy approach
Best for: teams who want zero-instrumentation observability and aggressive cost cutting via caching.
Helicone takes a fundamentally different architecture: it's a proxy, not an observer. You change your `base_url` from `api.openai.com` to `oai.helicone.ai`, and Helicone now sees every request as middleware.
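To make the base-URL swap concrete, here is a minimal sketch that uses plain request objects rather than any SDK. The `oai.helicone.ai` host comes from the paragraph above and `/v1/chat/completions` is OpenAI's standard chat endpoint. The `buildChatRequest` helper is hypothetical, and the `Helicone-Auth` header name is an assumption to verify against Helicone's documentation.

```typescript
// Sketch: route the same OpenAI-style request directly or through a proxy
// by changing only the base URL. `buildChatRequest` is a hypothetical helper.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

type RequestPlan = {
  url: string;
  headers: Record<string, string>;
  body: string;
};

const DIRECT_BASE_URL = "https://api.openai.com/v1";
const PROXY_BASE_URL = "https://oai.helicone.ai/v1"; // host named in the article

function buildChatRequest(
  baseUrl: string,
  openaiKey: string,
  model: string,
  messages: ChatMessage[],
  proxyKey?: string, // assumption: the proxy authenticates with its own key
): RequestPlan {
  const headers: Record<string, string> = {
    "content-type": "application/json",
    authorization: `Bearer ${openaiKey}`,
  };
  if (proxyKey) {
    // Assumed header name; check the proxy's documentation before relying on it.
    headers["Helicone-Auth"] = `Bearer ${proxyKey}`;
  }
  return {
    url: `${baseUrl}/chat/completions`,
    headers,
    body: JSON.stringify({ model, messages }),
  };
}

// Same payload, different destination: only the base URL and one header change.
const messages: ChatMessage[] = [{ role: "user", content: "ping" }];
const direct = buildChatRequest(DIRECT_BASE_URL, "sk-example", "gpt-4o-mini", messages);
const proxied = buildChatRequest(PROXY_BASE_URL, "sk-example", "gpt-4o-mini", messages, "hk-example");

// To send: await fetch(proxied.url, { method: "POST", headers: proxied.headers, body: proxied.body });
```

Because the request body is identical in both plans, rolling back from the proxy is a one-line configuration change, which is the main appeal of this architecture.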
Why this matters:
- Zero SDK changes. Just swap a URL and you're observed.
- Caching for free. Helicone caches identical requests — real cost savings on repeated prompts.
- Rate-limiting and retries built into the proxy.
Why this is also a problem:
- Adds a network hop. Every LLM call now traverses Helicone's infrastructure. P95 latency increases, and Helicone availability becomes part of your dependency graph.
- Vendor sees prompts. All your traffic goes through their infra (yes, they encrypt; that's not the same as not having access).
- Limited to LLM HTTP traffic. Doesn't observe agent code, tool calls, or non-LLM logic.
Pick Helicone if: you want fast wins (caching saves real money) and you're OK with the proxy tradeoff. See Helicone alternatives if the tradeoff isn't OK.
4. Braintrust — eval-first workflows
Best for: teams whose primary problem is "is this prompt change actually better?" not "is my agent down right now?"
Braintrust goes deepest in offline evaluation: dataset curation, scorers (LLM-as-judge or heuristic), CI integration, eval comparison views, and human review workflows. If you're running 50-prompt sweeps every release, this is your tool.
Strengths:
- Best-in-class eval UI.
- Strong CI/CD integration (CLI, GitHub Actions).
- Side-by-side prompt comparison.
- Scorers compose well (heuristic + LLM-judge + human).
Limits:
- Production monitoring is secondary. Real-time alerting and incident response are not the focus.
- Not open source. Self-host is enterprise-only.
- Pricing is per-seat + usage; can get spendy.
When to pick Braintrust: when your team's bottleneck is "we ship prompt changes and discover they regressed in prod two days later." See our Braintrust comparison for the monitoring-vs-evals framing.
5. Phoenix (Arize) — OSS tracing with serious eval
Best for: teams who want OSS like Langfuse but with deeper ML/evals heritage.
Arize Phoenix is a notebook-friendly, OSS LLM tracing and eval tool from the team behind Arize AI's enterprise ML observability platform. It bridges the ML-ops world (drift, embeddings, vectors) with the LLM-ops world (traces, prompts, evals).
Strengths:
- Apache 2.0 OSS, self-host via Docker.
- Strong embedding-drift and RAG-quality tooling — useful if your agent uses retrieval.
- OpenTelemetry-native.
- Good notebook DX for iterating on traces.
Limits:
- Real-time fleet monitoring is not its game.
- The full enterprise feature set lives in Arize AX (paid).
- Smaller community than Langfuse; less velocity on integrations.
When to pick Phoenix: when your agents lean RAG-heavy and you want eval + drift + tracing in one OSS tool.
6. Weights & Biases Weave — ML+LLM unified
Best for: teams already using W&B for classical ML who want one workspace.
Weave is W&B's LLM observability layer. Tracing, evals, datasets — but inside the broader W&B ecosystem (experiments, models, artifacts).
Strengths:
- Tight integration with W&B Experiments — useful if you're running fine-tuning alongside agents.
- Mature platform; big-team-friendly RBAC.
- Strong Python SDK.
Limits:
- Pricing is enterprise-anchored — small teams find it overkill.
- LLM observability features still maturing relative to LangSmith / Langfuse.
- Real-time alerting is shallow.
Pick W&B Weave if: your org already lives in W&B and a unified pane outweighs feature depth.
7. Datadog LLM Observability — existing Datadog shops
Best for: companies already on Datadog where adding "yet another tool" is politically expensive.
Datadog now offers LLM Observability — traces, prompts, latency/cost tracking — all inside the Datadog UI you already pay for.
Strengths:
- Single pane of glass with your infra metrics, APM, logs.
- Existing alerting, dashboards, on-call workflows.
- No new vendor procurement.
Limits:
- Expensive at agent scale. Datadog's per-host and per-event pricing wasn't designed for high-volume LLM traces.
- LLM-specific features lag specialized tools (evals, prompt management).
- Vendor lock-in deepens.
Pick Datadog LLM Obs if: you're already a Datadog shop and the path of least resistance wins.
How we'd actually choose
Two questions cut through it.
Q1: What's your primary problem — dev-loop or on-call-loop?
- Dev-loop (prompt iteration, eval, regression): LangSmith → Braintrust → Phoenix.
- On-call-loop (uptime, cost, alerts, fleet): ClawPulse → Datadog → Helicone (proxy).
Q2: Self-host required, or cloud OK?
- Self-host required: Langfuse, Phoenix, Helicone (OSS edition).
- Cloud OK: anything on the list.
If your answers are "on-call-loop" and "cloud OK", look at ClawPulse first. If they are "dev-loop" and "self-host", look at Phoenix or Langfuse.
A note on the "framework gravity" problem
LangSmith is excellent if you're committed to LangChain. The integrations are first-class, the docs assume LangChain primitives, and the team ships LangSmith features in lockstep with LangChain releases.
The trouble is that production agents in 2026 are increasingly not pure LangChain. Teams mix:
- LangChain for orchestration.
- The Anthropic SDK directly for high-stakes calls.
- A custom Python class for the agent loop.
- LlamaIndex for RAG.
When your stack is multi-framework, framework-coupled observability becomes a tax. Every direct-SDK call is a "second-class citizen" trace. That's the architectural reason behind most ClawPulse migrations from LangSmith: not pricing, not features, but the mismatch between what teams actually run and what LangSmith was built for.
FAQ
Is LangSmith open source?
No. LangSmith is closed-source SaaS. The LangChain framework is OSS (MIT) but the observability platform is not. For OSS alternatives, see Langfuse and Phoenix above.
Can I self-host LangSmith?
Only on the Enterprise plan. There is no community self-host option. If self-host without a sales call matters, look at Langfuse, Phoenix, or Helicone.
What's the cheapest LangSmith alternative?
ClawPulse Starter ($19/mo) or Helicone's free tier (10k requests). For OSS-self-host (your hardware = the only cost), Langfuse and Phoenix.
Does LangSmith work without LangChain?
Yes — via the OpenTelemetry SDK or direct API. But the developer experience is noticeably worse than LangChain-native usage. Plan for friction.
Which LangSmith alternative is best for production monitoring vs offline eval?
Different tools for different jobs. Production monitoring: ClawPulse, Datadog LLM Obs, Helicone. Offline eval: Braintrust, Phoenix, LangSmith itself. Don't try to do both with one tool — see our monitoring-vs-evals breakdown for why.
How long does migrating off LangSmith take?
Depends on instrumentation. If you used `@traceable` decorators, swap them for OTel or vendor SDK in a day. If you leaned on LangChain callback handlers, plan a week. ClawPulse's bash-agent install is the fastest path because it requires zero code changes — no SDK swap at all.
Can I run LangSmith and another tool in parallel during evaluation?
Yes. Most teams do exactly this for 2-4 weeks: LangSmith for the dev workflow, the new tool for production, then deprecate one. Both Langfuse and ClawPulse run side-by-side without conflict.
Start monitoring your OpenClaw agents in 2 minutes
Free 14-day trial. No credit card. Just drop in one curl command.
Prefer a walkthrough? Book a 15-min demo.
Verdict
LangSmith is excellent inside its lane: LangChain-native dev workflows. Outside that lane, the alternatives often fit better:
- Production-first, framework-agnostic: ClawPulse.
- OSS + self-host: Langfuse or Phoenix.
- Eval-first: Braintrust.
- Proxy-style cost optimization: Helicone.
- Already on Datadog: Datadog LLM Observability.
Pick by the shape of your problem, not by what's hottest on HN this week. And remember: the right tool for prompt iteration is rarely the right tool for 3am alerts.
> 🇫🇷 French-speaking reader? See our French version: Meilleures alternatives à Langfuse en 2026 : 7 outils de monitoring d'agents IA comparés
---
May 2026 update — five-profile decision matrix: LangSmith vs ClawPulse vs the rest
Most "alternatives" articles dump a feature checklist and call it a day. Here is what we actually see in the field, broken down by team profile. We pulled this from 90+ migration conversations over Q1-Q2 2026 with teams running 1k–250k requests/day.
| Profile | Stack reality | Best fit | Why LangSmith is wrong | Why ClawPulse fits |
| --- | --- | --- | --- | --- |
| A. LangChain-native R&D (eval-heavy, prompt iteration) | Python, LangChain `@traceable`, Jupyter notebooks, 1–5 devs | LangSmith stays | (none — this is its sweet spot) | Not the right tool — keep LangSmith for offline evals |
| B. Production multi-provider agents (OpenAI + Anthropic + Mistral) | Node/Python, mixed SDKs, no framework lock-in | ClawPulse | LangChain-coupling pulls dead weight; OTel path is second-class | Framework-agnostic, agent-side, zero SDK changes |
| C. LangChain in prod at scale (>50k req/day, $5k+/mo LLM bill) | Python, LangChain, AWS/GCP, ops-aware team | ClawPulse + Phoenix for evals | LangSmith Plus tier scales linearly with traces and quickly passes $1k/mo | Per-instance pricing, no trace tax, retry-storm detection |
| D. Compliance-first (Quebec/EU) | Any stack, Loi 25 / GDPR audits, SOC 2 in flight | ClawPulse | LangSmith data residency is US-East default; EU plan exists but adds latency and cost | CA-host (Aiven Toronto), Loi 25 art.17/18/28.1 documented |
| E. Solo founder / weekend hacker | One repo, Vercel, $50–500/mo bill | Helicone free or ClawPulse Starter | LangSmith free tier limits + LangChain coupling = high friction for non-LangChain stacks | $19/mo Starter covers 5 instances, no trace caps |
The single most common migration we see: Profile B teams who adopted LangSmith because "it was the LangChain default" and now run 60% of their traffic via raw OpenAI/Anthropic SDKs. LangSmith captures the LangChain 40% well and the rest as opaque blobs. These teams migrate to ClawPulse and keep LangSmith for the few LangChain pipelines that still benefit from the tracing tree.
Drop-in instrumentation: replace LangSmith `@traceable` with a framework-agnostic wrapper
If you are migrating off LangSmith and want to keep observability without coupling to any framework, the pattern below is what we recommend. It is a thin observer over your LLM client — no proxy, no SDK fork, no `@traceable` decorator chain. Drop it into a shared `lib/llm-observe.ts` and call it from anywhere.
```typescript
// lib/llm-observe.ts: framework-agnostic LLM observer
// Replaces LangSmith's @traceable / RunTree without LangChain coupling.
import { createHash, randomUUID } from "node:crypto";

type LLMCall = {
  provider: "openai" | "anthropic" | "mistral" | "groq" | string;
  model: string;
  messages: Array<{ role: string; content: string }>;
  temperature?: number;
  user_id?: string;
  tenant_id?: string;
  trace_id?: string; // root span id (compat with W3C traceparent)
  parent_span_id?: string; // for nested calls (agents, tools)
};

type LLMResult<T> = {
  data: T;
  usage?: { prompt_tokens?: number; completion_tokens?: number; total_tokens?: number };
  cache_read?: boolean; // Anthropic prompt caching, OpenAI cached input
  finish_reason?: string;
};

const CLAWPULSE_BEACON =
  process.env.CLAWPULSE_INGEST_URL ?? "https://www.clawpulse.org/api/agent/event";
const TOKEN = process.env.CLAWPULSE_TOKEN ?? "";

function hashPrompt(messages: LLMCall["messages"]): string {
  // Hash the messages so we can group retry storms by prompt without leaking content.
  const normalized = messages.map(m => `${m.role}:${m.content}`).join("\n");
  return createHash("sha256").update(normalized).digest("hex").slice(0, 16);
}

export async function instrumentLLMCall<T>(
  meta: LLMCall,
  fn: () => Promise<LLMResult<T>>
): Promise<T> {
  const span_id = randomUUID();
  const trace_id = meta.trace_id ?? span_id;
  const t0 = performance.now();
  let status: "ok" | "error" = "ok";
  let error_class: string | undefined;
  let result: LLMResult<T> | undefined;
  try {
    result = await fn();
    return result.data;
  } catch (err: any) {
    status = "error";
    error_class = err?.name ?? err?.constructor?.name ?? "Error";
    // Don't swallow: observability must be transparent to call sites.
    throw err;
  } finally {
    const latency_ms = Math.round(performance.now() - t0);
    const event = {
      schema: "llm.call.v1",
      span_id,
      trace_id,
      parent_span_id: meta.parent_span_id,
      ts: new Date().toISOString(),
      provider: meta.provider,
      model: meta.model,
      latency_ms,
      status,
      error_class,
      prompt_hash: hashPrompt(meta.messages),
      message_count: meta.messages.length,
      temperature: meta.temperature,
      user_id: meta.user_id,
      tenant_id: meta.tenant_id,
      prompt_tokens: result?.usage?.prompt_tokens,
      completion_tokens: result?.usage?.completion_tokens,
      total_tokens: result?.usage?.total_tokens,
      cache_read: result?.cache_read ?? false,
      finish_reason: result?.finish_reason,
    };
    // Fire-and-forget: the fetch is not awaited, so it never blocks the request hot path.
    // AbortSignal.timeout(250) cancels a slow beacon after 250ms (Node 18+).
    // No retry on the call site (the agent retries server-side).
    void fetch(CLAWPULSE_BEACON, {
      method: "POST",
      headers: { "content-type": "application/json", authorization: `Bearer ${TOKEN}` },
      body: JSON.stringify(event),
      keepalive: true,
      signal: AbortSignal.timeout(250),
    }).catch(() => {});
  }
}

// Usage — direct OpenAI SDK, no LangChain required:
//
// import OpenAI from "openai";
// const client = new OpenAI();
//
// const text = await instrumentLLMCall(
// { provider: "openai", model: "gpt-4o-mini", messages, user_id, tenant_id },
// async () => {
// const r = await client.chat.completions.create({ model: "gpt-4o-mini", messages });
// return {
// data: r.choices[0].message.content ?? "",
// usage: r.usage,
// finish_reason: r.choices[0].finish_reason,
// };
// }
// );
```
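The usage comment above covers the OpenAI SDK. For an Anthropic-style response, an adapter along the following lines could produce the same result shape. This is a sketch under assumptions: `AnthropicMessageLike` mirrors only the subset of Messages API fields used here, `toLLMResult` is a hypothetical helper, `LLMResult` is redeclared so the block stands alone, and the code assumes `input_tokens` does not already include cache reads.

```typescript
// Sketch: adapt an Anthropic-style Messages response to the LLMResult shape.
type LLMResult<T> = {
  data: T;
  usage?: { prompt_tokens?: number; completion_tokens?: number; total_tokens?: number };
  cache_read?: boolean;
  finish_reason?: string;
};

// Subset of the response used here; not the full API type.
type AnthropicMessageLike = {
  content: Array<{ type: string; text?: string }>;
  stop_reason: string | null;
  usage: {
    input_tokens: number;
    output_tokens: number;
    cache_read_input_tokens?: number | null;
  };
};

function toLLMResult(msg: AnthropicMessageLike): LLMResult<string> {
  // Concatenate text blocks; non-text blocks (for example tool_use) are skipped.
  const text = msg.content
    .filter(block => block.type === "text")
    .map(block => block.text ?? "")
    .join("");
  const cached = msg.usage.cache_read_input_tokens ?? 0;
  // Assumption: input_tokens excludes cache reads, so add them back for prompt size.
  const prompt = msg.usage.input_tokens + cached;
  return {
    data: text,
    usage: {
      prompt_tokens: prompt,
      completion_tokens: msg.usage.output_tokens,
      total_tokens: prompt + msg.usage.output_tokens,
    },
    cache_read: cached > 0,
    finish_reason: msg.stop_reason ?? undefined,
  };
}

// Made-up response used only to exercise the adapter.
const sample: AnthropicMessageLike = {
  content: [{ type: "text", text: "Hello" }, { type: "text", text: " world" }],
  stop_reason: "end_turn",
  usage: { input_tokens: 20, output_tokens: 5, cache_read_input_tokens: 100 },
};
const adapted = toLLMResult(sample);
```

Inside the callback passed to `instrumentLLMCall`, you would return `toLLMResult(response)` in place of the hand-built object shown in the OpenAI example.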
Three architectural wins versus LangSmith's `@traceable`:
1. No framework coupling. Works with raw OpenAI, Anthropic, Mistral, Groq, Ollama, vLLM. LangChain is not in the dependency graph.
2. Off the request hot path. The 250ms hard timeout + `keepalive: true` means a beacon outage never affects user latency. LangSmith's `@traceable` blocks the run unless you opt into background mode (and even then, the SDK is in your bundle).
3. Cache-aware billing. The `cache_read` flag lets ClawPulse price cached tokens at the provider's discounted rate (Anthropic 0.1x, OpenAI cached input 0.5x). LangSmith displays raw tokens, so your dashboard cost will diverge from your actual provider bill by 20–60% on cache-heavy workloads.
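The arithmetic behind the cache-aware point can be sketched as follows. The 0.1x and 0.5x multipliers are the ones quoted above and should be confirmed against current provider pricing. The `inputCostUSD` helper, the $3 per million token price, and the 60% cache share are placeholders for illustration, not real rates or measurements.

```typescript
// Sketch: raw vs cache-aware input cost when part of the prompt is served from cache.
const CACHE_READ_MULTIPLIER: Record<string, number> = {
  anthropic: 0.1, // multiplier quoted in the article
  openai: 0.5, // multiplier quoted in the article
};

function inputCostUSD(
  provider: string,
  uncachedTokens: number,
  cachedTokens: number,
  pricePerMillionUSD: number, // hypothetical list price for input tokens
): { raw: number; cacheAware: number } {
  const perToken = pricePerMillionUSD / 1_000_000;
  const multiplier = CACHE_READ_MULTIPLIER[provider] ?? 1; // unknown provider: no discount
  // Raw pricing treats every input token as full price.
  const raw = (uncachedTokens + cachedTokens) * perToken;
  // Cache-aware pricing discounts only the cached portion.
  const cacheAware = (uncachedTokens + cachedTokens * multiplier) * perToken;
  return { raw, cacheAware };
}

// Example: 1M input tokens, 60% of them cache reads, at a placeholder $3 per million.
const cost = inputCostUSD("anthropic", 400_000, 600_000, 3);
// Fraction by which a raw-token dashboard overstates the bill.
const gap = (cost.raw - cost.cacheAware) / cost.raw;
```

With these placeholder inputs the raw figure is about $3.00 against a cache-aware $1.38, a gap of roughly 54%, which sits inside the 20–60% range mentioned above.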
Postmortem: $11.4k Toronto fintech RAG retry storm — caught by ClawPulse in 9 minutes
A Toronto-based fintech (regulated, Loi 25 in scope) was running a customer-support RAG over Anthropic Claude Sonnet 4.5 with LangSmith tracing on the LangChain side and direct Anthropic SDK on a separate path. On April 18, 2026, a frontend `useEffect` regression triggered a 3x retrieval-corpus duplication on every chat turn. Within 4 hours:
- Anthropic prompt-tokens spend went from $1,200/day baseline to $11,400 in 4h (~$68k/day run rate).
- Anthropic dashboards showed nothing actionable — their daily-cost chart aggregates over 24h windows.
- LangSmith captured the LangChain pipeline cleanly, but the duplicated retrieval ran on the direct-SDK path and was invisible to LangSmith.
- ClawPulse's per-route z-score on `prompt_tokens` crossed 4.1 at minute 9, fired a Slack alert, and pinpointed the prompt_hash of the duplicated retrieval.
Time-to-detection: 9 minutes. Time-to-mitigation: 31 minutes. Estimated savings vs. catching at next-day Anthropic invoice: ~$54k, on a $49/mo Growth plan. The team kept LangSmith for offline RAG evaluation (which it does well) and moved production monitoring to ClawPulse.
Two non-obvious lessons:
- Coverage is the single biggest blind spot. LangSmith covered 40% of LLM traffic — the LangChain side. The other 60% (direct SDK calls for embeddings, reranking, classification) had no observability. Framework-coupled monitoring is structurally limited to the framework's call graph.
- Aggregation windows kill anomaly detection. Provider dashboards aggregate at hour/day boundaries. You need per-minute z-scores on prompt_tokens to catch retry storms before the bill arrives.
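A per-minute z-score check like the one described in the lessons above can be sketched in a few lines. This is a generic illustration, not ClawPulse's actual detector: the function names are hypothetical, the token counts are made up, and the 3.0 threshold is borrowed from the SQL recipe and checklist elsewhere in this article.

```typescript
// Sketch: flag a minute whose prompt-token total is a z-score outlier
// relative to a trailing baseline of earlier minutes.
function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function sampleStdDev(xs: number[]): number {
  const mu = mean(xs);
  const variance = xs.reduce((acc, x) => acc + (x - mu) ** 2, 0) / (xs.length - 1);
  return Math.sqrt(variance);
}

// Returns null when the baseline is too short or too flat to score against.
function zScore(baseline: number[], current: number): number | null {
  if (baseline.length < 2) return null;
  const sigma = sampleStdDev(baseline);
  if (sigma === 0) return null;
  return (current - mean(baseline)) / sigma;
}

function isRetryStorm(baseline: number[], current: number, threshold = 3.0): boolean {
  const z = zScore(baseline, current);
  return z !== null && z > threshold;
}

// Per-minute prompt_tokens for one route (made-up numbers).
const baselineMinutes = [1000, 1100, 900, 1050, 950, 1000, 1020, 980];
const normal = isRetryStorm(baselineMinutes, 1080); // within normal variation
const storm = isRetryStorm(baselineMinutes, 3000); // roughly triple the baseline
```

Returning `null` for a flat or short baseline avoids dividing by zero and keeps brand-new routes from alerting before there is enough history to compare against.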
Four production SQL recipes — adapt to your warehouse
If you self-host or warehouse your LLM events (BigQuery, Snowflake, Postgres, ClickHouse), these are the four queries we run weekly. They are written in BigQuery Standard SQL (`APPROX_QUANTILES`, `SAFE_DIVIDE`, `TIMESTAMP_SUB`), so swap in your warehouse's equivalents elsewhere. The schema assumes a flat `llm_calls` table with columns matching the `llm.call.v1` event above.
1. Per-route p95 latency + cost outliers (last 24h)
```sql
SELECT
COALESCE(parent_span_id, trace_id) AS route,
COUNT(*) AS n,
APPROX_QUANTILES(latency_ms, 100)[OFFSET(95)] AS p95_ms,
SUM(prompt_tokens) AS in_tok,
SUM(completion_tokens) AS out_tok,
SUM(IF(cache_read, prompt_tokens, 0)) AS cached_in_tok,
SUM(IF(status='error', 1, 0)) AS errors
FROM llm_calls
WHERE ts > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
GROUP BY route
HAVING n >= 50
ORDER BY p95_ms DESC
LIMIT 25;
```
2. Retry-storm detection by prompt_hash (rolling z-score)
```sql
WITH per_min AS (
SELECT
TIMESTAMP_TRUNC(ts, MINUTE) AS minute,
prompt_hash,
COUNT(*) AS n
FROM llm_calls
WHERE ts > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 6 HOUR)
GROUP BY minute, prompt_hash
),
baseline AS (
SELECT prompt_hash,
AVG(n) AS mu,
STDDEV(n) AS sigma
FROM per_min
WHERE minute < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 MINUTE)
GROUP BY prompt_hash
HAVING COUNT(*) >= 20
)
SELECT p.minute, p.prompt_hash, p.n, b.mu, b.sigma,
SAFE_DIVIDE(p.n - b.mu, NULLIF(b.sigma, 0)) AS z
FROM per_min p JOIN baseline b USING (prompt_hash)
WHERE p.minute >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 MINUTE)
AND SAFE_DIVIDE(p.n - b.mu, NULLIF(b.sigma, 0)) > 3.0
ORDER BY z DESC;
```
3. Cache-read ratio degradation (Anthropic prompt caching, OpenAI cached input)
```sql
SELECT
DATE(ts) AS day,
provider,
model,
SUM(IF(cache_read, prompt_tokens, 0)) /
NULLIF(SUM(prompt_tokens), 0) AS cache_ratio,
SUM(prompt_tokens) AS total_in
FROM llm_calls
WHERE ts > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 14 DAY)
GROUP BY day, provider, model
ORDER BY day DESC, total_in DESC;
```
A drop in cache_ratio from 0.65 to 0.30 overnight typically means a prompt template changed and broke the cache prefix: a costly silent regression that LangSmith does not surface.
4. Multi-tenant fairness (catch a single tenant burning the budget)
```sql
SELECT
tenant_id,
COUNT(*) AS calls,
SUM(total_tokens) AS tokens,
SUM(total_tokens) * 1.0 / SUM(SUM(total_tokens)) OVER () AS share
FROM llm_calls
WHERE ts > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
GROUP BY tenant_id
ORDER BY tokens DESC
LIMIT 20;
```
If one `tenant_id` is consuming >60% of tokens, you have either a runaway customer, a missing rate-limit, or a bug. We have seen all three in the wild.
Seven-tool comparison: where LangSmith genuinely wins, and where it does not
We tested LangSmith, ClawPulse, Helicone, Langfuse (self-hosted), Phoenix (Arize), Braintrust, and Datadog LLM Observability across eight capabilities that matter in production. Scores are current as of May 2026.
| Capability | LangSmith | ClawPulse | Helicone | Langfuse SH | Phoenix | Braintrust | Datadog LLM |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Framework-agnostic | ⚠️ OTel path is 2nd-class | ✅ Bash agent | ✅ Proxy | ✅ SDK | ✅ OTel | ⚠️ Eval-first | ✅ APM |
| Off request hot path | ⚠️ Background mode | ✅ Agent | ❌ Proxy SPOF | ⚠️ SDK | ✅ OTel | ⚠️ SDK | ⚠️ Agent |
| Cache-aware billing | ❌ Raw tokens | ✅ Discounted | ⚠️ Partial | ⚠️ Partial | ❌ | ❌ | ⚠️ |
| Retry-storm z-score | ❌ | ✅ Per-min | ⚠️ Manual | ⚠️ Manual | ❌ | ❌ | ⚠️ |
| Tracing tree depth | ✅ Best-in-class | ⚠️ Span-level | ⚠️ Flat | ✅ | ✅ | ⚠️ | ⚠️ |
| Offline evals + datasets | ✅ Best-in-class | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ |
| Setup time (clean repo) | ~30 min | ~3 min | ~10 min | ~2 hours SH | ~20 min | ~30 min | ~1 hour |
| CA/EU data residency | ⚠️ EU plan +latency | ✅ Aiven Toronto | ⚠️ US default | ✅ Self-host | ⚠️ Self-host | ⚠️ US default | ✅ EU |
LangSmith genuinely wins on: tracing tree depth (RunTree visualization), offline evaluation + dataset management, and the LangChain-native developer experience. If you are LangChain-native and your job is prompt iteration / eval, do not migrate away.
LangSmith loses on: non-LangChain coverage, cache-aware billing accuracy, retry-storm detection, and CA/EU residency. If you are a Profile B/C/D team, those four gaps cost more per month than the entire LangSmith bill.
Compliance: Quebec Loi 25 + GDPR — what changes when you move off LangSmith
LangSmith's default data residency is US-East. Their EU plan exists but routes through Frankfurt with measurable added latency (we saw +120ms p50 in our tests). For Profile D teams under Loi 25 (Quebec) or GDPR Article 28/32:
- ClawPulse data residency: Aiven MySQL in Toronto (Canada Central). Loi 25 art.17 (data localization) compliant by default.
- Subprocessor list: Vercel (US, edge only — no PII storage), Aiven (CA, primary), Resend (US, transactional email only). Loi 25 art.28.1 disclosure documented in our DPA.
- Right-to-erasure: SHA-256 prompt hashing + per-tenant DELETE — we can prove erasure within 72h of request. GDPR art.17 + Loi 25 art.28.2 compliant.
- Audit log retention: 13 months default, configurable. SOC 2 Type II in progress (Q4 2026 target).
If your DPO has flagged LangSmith's US-default residency as a risk, ClawPulse is one of two production-grade options that ship with CA hosting out of the box (the other is self-hosted Langfuse on your own infra).
Ten-point pre-migration checklist
Before you flip the switch off LangSmith, run through this:
1. Inventory your LLM call sites. Grep for `@traceable`, `RunTree`, `traceable(`, `tracing_v2_enabled`. Anything not covered = silent regression risk post-migration.
2. Identify non-LangChain LLM calls. Direct SDK usage (`openai.chat.completions.create`, `anthropic.messages.create`) — these need wrapping with `instrumentLLMCall` or equivalent.
3. Map your eval pipelines. If you depend on LangSmith datasets + evaluators, plan to keep LangSmith for offline evals or migrate datasets to Phoenix/Braintrust.
4. Document your trace-tree dependencies. Any dashboard or alert built on LangChain RunTree IDs — translate to your new span_id/trace_id schema.
5. Validate cache-read billing. Run a 7-day side-by-side: LangSmith dashboard cost vs. provider invoice. If the gap is >10%, prioritize a cache-aware tool.
6. Set up retry-storm alerts. Per-route z-score > 3.0 on prompt_tokens, 5-minute window, fires to Slack/PagerDuty.
7. Verify data residency. Confirm with your DPO that the new tool's primary region matches your compliance requirements.
8. Plan a 14-day overlap. Run both LangSmith and the new tool in parallel. Do not turn off LangSmith until you have caught at least one production incident on the new system.
9. Update runbooks. Anywhere your on-call docs reference "LangSmith trace URL" — update to the new tool's URL pattern.
10. Archive your LangSmith data. Export the last 90 days of traces + datasets before you cancel: LangSmith's standard retention deletes traces 30 days after cancellation.
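Checklist item 1 can be scripted instead of grepped by hand. The sketch below scans source text for the four markers named in that item. `findTraceCallSites` is a hypothetical helper that takes a filename-to-contents map so it stays pure; reading files from disk (for example with `node:fs`) is left to the caller, and the sample files are invented.

```typescript
// Sketch: find LangSmith instrumentation markers in source text.
// Patterns come from checklist item 1; extend them for your codebase.
const LANGSMITH_PATTERNS: RegExp[] = [
  /@traceable\b/,
  /\bRunTree\b/,
  /\btraceable\(/,
  /\btracing_v2_enabled\b/,
];

type Hit = { file: string; line: number; text: string };

function findTraceCallSites(files: Record<string, string>): Hit[] {
  const hits: Hit[] = [];
  for (const [file, source] of Object.entries(files)) {
    source.split("\n").forEach((text, index) => {
      if (LANGSMITH_PATTERNS.some(pattern => pattern.test(text))) {
        // Line numbers are 1-based to match editor and grep output.
        hits.push({ file, line: index + 1, text: text.trim() });
      }
    });
  }
  return hits;
}

// Invented sample: one decorated Python function, one direct SDK call.
const sampleFiles: Record<string, string> = {
  "agent.py": "from langsmith import traceable\n\n@traceable\ndef run(q):\n    return q",
  "client.ts": "const r = await openai.chat.completions.create({ model, messages });",
};
const hits = findTraceCallSites(sampleFiles);
```

Files that produce no hits but still call a provider SDK, like `client.ts` here, are the ones checklist item 2 asks you to wrap, so it is worth pairing this scan with a second pattern list for direct SDK calls.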
Extended FAQ
Q: Can I keep LangSmith for evals and use ClawPulse for production monitoring?
Yes — this is the most common pattern we see in Profile C teams. LangSmith's dataset + evaluator UX is best-in-class; ClawPulse's production observability is framework-agnostic and cache-aware. They do not overlap meaningfully. Wire the evaluation pipeline to LangSmith, wire the production hot path to ClawPulse.
Q: How do I migrate LangSmith `@traceable` decorators without rewriting every call site?
Use a codemod: `@traceable` → `instrumentLLMCall`. We published a `jscodeshift` transform on GitHub (search `clawpulse/codemods`). It handles the common cases (sync/async, named/anonymous, nested decorators) and flags the edge cases for manual review. Most teams complete the migration in <1 day for <500 call sites.
Q: What happens to my historical LangSmith data on cancellation?
LangSmith's standard retention deletes traces 30 days after cancellation. Export everything via their API before you cancel. We have a script template at `clawpulse.org/blog/export-langsmith-traces-before-cancellation` (or contact support — we will run the export for you on Growth+ plans).
Q: Does ClawPulse support OpenTelemetry?
Yes — the bash agent emits OTel spans by default and the API ingests OTLP/HTTP. If your stack already exports to a collector (Jaeger, Tempo, Honeycomb), you can dual-write without changing instrumentation.
Q: How does pricing compare at 100k requests/day?
At 100k req/day (~3M req/month), LangSmith Plus runs $390–$650/mo depending on trace retention. ClawPulse Growth covers it at $49/mo flat. The crossover point where LangSmith becomes cheaper for production monitoring does not exist — it is a different pricing model (trace-tax vs flat).
Q: Can I use ClawPulse with LangChain?
Yes. Wrap your LangChain `LLMChain.invoke()` calls with `instrumentLLMCall`, or add the bash agent to your container — the agent captures provider HTTP traffic regardless of framework. We have customers running LangChain + LlamaIndex + raw SDK in the same service, all observed through ClawPulse.
Q: Does ClawPulse support custom evaluators?
Not yet (Q3 2026 roadmap). For now, pair ClawPulse for production monitoring with Phoenix (Arize) or LangSmith for evals. Phoenix is open-source and has good LangChain integration.
Q: What is the SOC 2 status?
SOC 2 Type II audit in progress, Q4 2026 target. Type I report available on request under NDA. Bridge letter from our infra partners (Vercel, Aiven) covers the gap for most procurement teams.
> Ready to migrate? Start a 14-day free trial at clawpulse.org/signup, or book a demo — we will walk you through the LangSmith → ClawPulse migration on your stack live.
LangSmith alternatives by deployment shape: OSS, self-hosted on AWS/Azure, or fully managed
The most common autocomplete queries around "langsmith alternative" are no longer about feature parity; they ask where the tool can run. Three deployment shapes dominate the 2026 buying conversation:
1. Open-source / self-hosted (data never leaves your VPC)
If you're a regulated tenant (HIPAA, FedRAMP-moderate, EU-only data residency) or you simply prefer to own the storage layer, the realistic OSS-friendly options are:
- Langfuse OSS — Postgres + (since v3.0) ClickHouse for traces. Helm chart, ~6 services to operate.
- Phoenix (Arize OSS) — single Docker image, SQLite or Postgres. Easiest local install, no horizontal scaling story.
- OpenTelemetry Collector + ClickHouse + Grafana — no LLM-specific UI, but the cheapest at scale (~$0.20/M spans).
ClawPulse offers a single-tenant cloud deployment (your dedicated DB + dedicated worker pool, US-east, EU-west, or APAC) which covers ~90% of "self-hosted" requests without the operational tax. See the /pricing page for the Agency tier.
2. Self-hosted on AWS
Practical recipe for running a Langfuse-class stack on AWS:
```bash
# RDS Postgres 16 (db.t4g.medium, ~$25/mo)
# "llmobs" is a placeholder username; the password is generated and
# stored in Secrets Manager via --manage-master-user-password.
aws rds create-db-instance \
  --db-instance-identifier llm-obs \
  --engine postgres --engine-version 16.3 \
  --db-instance-class db.t4g.medium \
  --allocated-storage 50 --storage-type gp3 \
  --master-username llmobs \
  --manage-master-user-password

# ClickHouse on ECS Fargate (1 vCPU, 2GB) for traces
# OR: ClickHouse Cloud ~$50/mo for <10M spans
# Langfuse web + worker on App Runner (autoscale 1→4)
# Total ~$120/mo at 5M spans/month
```
The hidden cost is operational toil: you're now on the hook for Postgres upgrades, ClickHouse merges, and TLS rotation. Most teams we talk to spend 4–6 hours/month babysitting this stack. At a $200k loaded engineering cost (about $96/hour over a 2,080-hour year), that's roughly $400–$600/mo of hidden labor, which makes the build-vs-buy math much tighter than the AWS bill suggests.
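The labor estimate is easy to rerun for your own team. The sketch below assumes a 2,080-hour working year (52 weeks of 40 hours), which is our assumption rather than a figure from any vendor; the $120/mo infrastructure bill and the 4–6 hours/month come from the recipe above.

```python
HOURS_PER_YEAR = 2080  # assumption: 52 weeks x 40 hours


def hourly_rate(loaded_annual_cost_usd: float) -> float:
    """Convert a fully loaded annual engineering cost to an hourly rate."""
    return loaded_annual_cost_usd / HOURS_PER_YEAR


def monthly_toil_cost(hours_per_month: float, loaded_annual_cost_usd: float) -> float:
    """Labor cost of operating the stack for one month."""
    return hours_per_month * hourly_rate(loaded_annual_cost_usd)


def total_self_host_cost(
    infra_usd: float, hours_per_month: float, loaded_annual_cost_usd: float
) -> float:
    """Cloud bill plus operator time: the number to compare against a managed plan."""
    return infra_usd + monthly_toil_cost(hours_per_month, loaded_annual_cost_usd)


for hours in (4, 5, 6):
    total = total_self_host_cost(120, hours, 200_000)
    print(f"{hours} h/month -> ${total:.0f}/month all-in")
```

With these inputs the all-in figure lands between roughly $505 and $697 per month, several times the AWS bill alone.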
3. Self-hosted on Azure
Azure-native deployment leverages:
- Azure Database for PostgreSQL Flexible Server (Burstable B2s tier, ~$35/mo)
- Azure Container Apps for Langfuse web + worker (scale-to-zero saves ~40% on dev environments)
- Azure OpenAI Service as the LLM provider — ClawPulse, Langfuse, and Phoenix all auto-detect Azure OpenAI endpoints from the `AZURE_OPENAI_ENDPOINT` env var
A common pitfall: Private Endpoints are not free ($7.30/endpoint/month + $0.01/GB processed). A typical 3-endpoint observability stack adds ~$22/month (3 × $7.30) before any traffic. Budget for this upfront.
The Langfuse → ClickHouse migration: should you wait, switch, or stay flat?
In April 2026 Langfuse migrated their hosted traces from Postgres to ClickHouse. The autocomplete data shows real anxiety around this — `langfuse clickhouse migration`, `langfuse clickhouse alternative`, and `langfuse clickhouse acquisition` are all rising queries.
What it means in practice:
| Concern | Reality |
| --- | --- |
| Self-hosted upgrade | OSS v3.0+ requires you to stand up ClickHouse alongside Postgres. The migration script is one-way. |
| Query latency | Faster for aggregate analytics (p50 1.2s → 380ms on 100M spans), slower for single-trace lookups (~120ms penalty). |
| Cost at scale | ClickHouse compresses spans ~10×. A 500GB Postgres dataset compacts to ~50GB. Wins above 50M spans/month. |
| Operational complexity | +1 stateful service. ClickHouse merges, ZooKeeper (or ClickHouse Keeper), backup strategy. |
If you're below 5M spans/month, the migration cost outweighs the benefit — stay on Postgres-only OSS, or move to a managed alternative like ClawPulse which bundles columnar storage transparently. If you're above 50M spans/month and self-hosting, the migration is unavoidable but worth doing — we maintain a Langfuse → ClawPulse parallel-write playbook for teams who want a managed exit.
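The thresholds above can be written down as a small decision helper. The 5M and 50M cut-offs come from the paragraph above; the middle band is not covered there, so treating it as "benchmark before deciding" is our assumption, and the names are illustrative.

```python
from enum import Enum


class Recommendation(Enum):
    STAY = "stay on Postgres-only OSS or move to a managed alternative"
    EVALUATE = "benchmark your own queries before deciding"  # assumed middle band
    MIGRATE = "migrate to ClickHouse or move to a managed alternative"


LOW_WATERMARK = 5_000_000    # spans/month, from the text above
HIGH_WATERMARK = 50_000_000  # spans/month, from the text above


def recommend(spans_per_month: int) -> Recommendation:
    """Map monthly span volume to the guidance in this section."""
    if spans_per_month < 0:
        raise ValueError("spans_per_month must be non-negative")
    if spans_per_month < LOW_WATERMARK:
        return Recommendation.STAY
    if spans_per_month > HIGH_WATERMARK:
        return Recommendation.MIGRATE
    return Recommendation.EVALUATE
```

For example, `recommend(2_000_000)` returns `Recommendation.STAY`, while a team at 80M spans/month gets `Recommendation.MIGRATE`.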
LangGraph + LangSmith alternative: drop-in tracing for graph runs
LangGraph (the agentic-orchestration cousin of LangChain) auto-emits traces to LangSmith when `LANGCHAIN_TRACING_V2=true`. To swap in an alternative without rewriting your graph:
```python
import os

from langgraph.graph import StateGraph
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Point OTel at ClawPulse (or Langfuse OSS, Phoenix, etc.)
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter(
    endpoint="https://otel.clawpulse.org/v1/traces",
    # Single quotes inside the f-string keep this valid before Python 3.12
    headers={"authorization": f"Bearer {os.environ['CLAWPULSE_TOKEN']}"},
)))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)


# Wrap each node so graph transitions become OTel spans
def traced_node(name, fn):
    def wrapped(state):
        with tracer.start_as_current_span(f"graph.node.{name}") as span:
            span.set_attribute("gen_ai.system", "openai")
            span.set_attribute("gen_ai.request.model", state.get("model", ""))
            result = fn(state)
            span.set_attribute("gen_ai.usage.input_tokens", result.get("input_tokens", 0))
            span.set_attribute("gen_ai.usage.output_tokens", result.get("output_tokens", 0))
            return result
    return wrapped


# retrieve_docs and llm_call are your existing node functions (state dict in, dict out)
graph = StateGraph(dict)
graph.add_node("retrieve", traced_node("retrieve", retrieve_docs))
graph.add_node("generate", traced_node("generate", llm_call))
```
The `gen_ai.*` attributes follow the OpenTelemetry GenAI semantic conventions — same wire format consumed by Phoenix, Langfuse v3, ClawPulse, and Datadog LLM Observability. Tracing once, exporting anywhere is the only future-proof bet in 2026.
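One refinement worth considering: the wrapper above records zero token counts for nodes that never call a model (such as `retrieve`), which can skew per-span averages. A minimal sketch of a helper that only emits usage attributes when the node actually reports them follows; `genai_span_attributes` is a hypothetical name, and the attribute keys are the same ones used in the snippet above.

```python
from typing import Any, Dict, Mapping


def genai_span_attributes(
    system: str, model: str, result: Mapping[str, Any]
) -> Dict[str, Any]:
    """Build gen_ai.* span attributes, skipping token counts the node did not report."""
    attrs: Dict[str, Any] = {
        "gen_ai.system": system,
        "gen_ai.request.model": model,
    }
    for key in ("input_tokens", "output_tokens"):
        value = result.get(key)
        # Accept only real, non-negative integers (bool is excluded on purpose).
        if isinstance(value, int) and not isinstance(value, bool) and value >= 0:
            attrs[f"gen_ai.usage.{key}"] = value
    return attrs
```

Inside the wrapper, the four `set_attribute` calls could then collapse into a single `span.set_attributes(genai_span_attributes("openai", state.get("model", ""), result))` after the node returns.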
FAQ
Q: What is the best open-source LangSmith alternative in 2026?
A: It depends on what you optimize for. Langfuse has the most features but requires Postgres + ClickHouse since v3; Phoenix is the simplest install (a single Docker image); a raw OpenTelemetry Collector + ClickHouse + Grafana stack is the cheapest at high scale. All three implement the OTel GenAI semantic conventions, so you can switch later without instrumentation changes.
Q: Can I run a LangSmith alternative inside my AWS or Azure VPC?
A: Yes. Langfuse OSS, Phoenix, and ClawPulse Agency-tier all support fully isolated deployments. AWS recipe: RDS Postgres + ECS Fargate ClickHouse + App Runner (~$120/mo at 5M spans). Azure: Postgres Flexible Server + Container Apps (~$95/mo + Private Endpoint fees).
Q: Should I migrate to Langfuse v3 with ClickHouse, or pick an alternative?
A: Below 5M spans/month, the migration overhead is not worth it — stay on Postgres or switch to a managed alternative. Above 50M spans/month, columnar storage is essential — either migrate, or move to a vendor that handles it for you.
Q: Does ClawPulse work with LangGraph and LangChain?
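A: Yes. Set `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=https://otel.clawpulse.org/v1/traces` and `LANGCHAIN_TRACING_V2=true`. (The generic `OTEL_EXPORTER_OTLP_ENDPOINT` variable expects a base URL, and OTLP/HTTP exporters append `/v1/traces` to it themselves.) ClawPulse consumes both the native LangChain callback format and OpenTelemetry GenAI spans. Book a demo or start a 14-day trial.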