Datadog LLM Observability: serious AI ops for Datadog shops

Datadog LLM Observability is a strong fit for teams that want LLM traces, evaluations, and security controls inside the same observability stack they already use.

Last updated April 2026 · Pricing and features verified against official documentation

Datadog has spent more than a decade turning messy production systems into something engineers can actually reason about. LLM Observability is the natural extension of that idea: prompts, responses, tool calls, latency, token usage, and evaluation data are treated as ordinary production telemetry rather than as a special AI sidecar.

That is what makes the product more credible than many of the newer AI observability tools around it. Datadog shipped LLM Observability in 2024, then expanded it in June 2025 with AI Agent Monitoring, LLM Experiments, and an AI Agents Console. The New Stack covered the product at DASH 2024 as part of the wider push to make observability work for the LLM era, and TechTarget captured the 2025 upgrade cycle: better visibility, but still plenty of buyer caution about pricing and complexity.

The honest case for Datadog LLM Observability is straightforward. If your team already runs Datadog for APM, logs, RUM, or security, this is one of the cleanest ways to add AI tracing without introducing another platform to govern. The product goes beyond a trace viewer; it gives you evaluation workflows, redaction, data access controls, and the ability to correlate model behavior with the rest of the system.

The honest case against it is just as straightforward. Datadog is not selling a lean AI-native tool here. It is selling another layer inside a large observability platform, with Datadog pricing and Datadog complexity attached. If you want open-source control, a narrower AI workflow, or self-hosting, Langfuse, LangSmith, and Braintrust are better starting points.

Datadog LLM Observability is a serious product. It is also a product that makes the most sense when Datadog already owns the rest of your stack.

What the Product Actually Is Now

Datadog LLM Observability is best understood as Datadog’s AI observability layer inside a much larger monitoring and security platform. The product now covers end-to-end tracing, automated and human-assisted evaluations, dataset creation from production traces, and security checks for hallucinations, prompt injection, and sensitive data exposure. Ingestion is flexible as well: Datadog supports OpenTelemetry, an HTTP API, and SDK-based instrumentation in Python, Node.js, and Java.
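To make the instrumentation story concrete, here is a minimal sketch using the ddtrace Python SDK’s LLMObs interface. The app name, model details, and function names are illustrative, and the exact decorator set can vary by SDK version, so treat this as a shape rather than a verified integration.

```python
# Minimal sketch of SDK-based instrumentation with ddtrace's LLMObs.
# App name, model, and functions are illustrative placeholders.
from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import llm, workflow

# Enable LLM Observability; DD_API_KEY and DD_SITE can come from the
# environment instead of being passed explicitly.
LLMObs.enable(ml_app="support-bot", agentless_enabled=True)

@llm(model_name="gpt-4o", model_provider="openai")
def summarize(prompt: str) -> str:
    response = "placeholder completion"  # stand-in for a real model call
    # Annotate the span so it carries the prompt and completion payloads.
    LLMObs.annotate(input_data=prompt, output_data=response)
    return response

@workflow
def handle_ticket(ticket_text: str) -> str:
    # Parent span that groups the LLM call with surrounding steps.
    return summarize(f"Summarize this ticket: {ticket_text}")
```

Each decorated call becomes a span in the trace view, which is what lets Datadog line LLM activity up against the APM and log data discussed below.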

That broader scope matters because the product is no longer just about staring at a broken prompt chain. In Datadog’s current docs, the workflow starts with traces and then expands into quality monitoring, cost monitoring, anomaly detection, and redaction. In other words, it behaves like production infrastructure, not like an AI sandbox.

Strengths

It connects AI behavior to the rest of production. Datadog’s real advantage is not that it can show a trace. Many tools can do that. It is that LLM traces can sit alongside APM, RUM, logs, and infrastructure data, which makes root-cause work much faster when the problem lives outside the model itself.

The evaluation loop is genuinely useful. The product does more than record output quality. It lets teams build datasets from production traces, compare prompts and model swaps in Playground, and use built-in evaluators to catch hallucinations, prompt-injection attempts, and PII exposure before those failures ship again.
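The SDK side of that loop is worth sketching too. Below is a hedged example assuming ddtrace’s submit_evaluation interface; the factuality label and score are placeholders for whatever evaluator a team actually runs, and the export call needs an active LLM span to succeed.

```python
# Sketch of attaching a custom evaluation result to a live LLM span.
# Label and value are hypothetical; run inside an active LLMObs span.
from ddtrace.llmobs import LLMObs

def record_quality_score(score: float) -> None:
    # Export the currently active span so the evaluation joins its trace.
    span_context = LLMObs.export_span()
    LLMObs.submit_evaluation(
        span_context=span_context,
        label="factuality",    # evaluation name as shown in Datadog
        metric_type="score",   # "score" for numeric, "categorical" for labels
        value=score,
    )
```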

It is easy to adopt if Datadog already runs your stack. Teams already using Datadog do not need to justify a new vendor just to observe AI systems. The same platform can handle AI telemetry, security scanning, and the rest of the application surface, which is a real operational advantage for larger teams.

The security controls are better than the usual AI product defaults. Datadog’s docs call out data access control, redaction, and restricted team access for sensitive LLM applications. Combined with the trust center’s compliance posture, that makes the product credible for organizations that cannot casually ship raw prompts into a consumer-style assistant.

Weaknesses

It inherits the Datadog platform tax. If you are not already a Datadog customer, this is a heavy way to buy AI observability. The product is broad, powerful, and clearly designed for platform teams. It is not the lightest answer for someone who just wants to inspect a few agent runs.

The pricing story is still messy. The dedicated LLM Observability page now shows Free and Pro plans, with the Pro plan at $240 per month and a pricing change scheduled for May 1, 2026, while Datadog’s broader pricing page still frames the product as starting at $8 per 10K monitored LLM requests with annual or on-demand billing and a 100K minimum. That is not fatal, but it is a sign that buyers need to read the current page carefully instead of trusting old screenshots or cached documentation.

It is less opinionated than the best specialist tools. Datadog gives you observability plus security plus platform context. That breadth is useful, but it also means the product can feel less focused than Langfuse for open-source teams, less deployment-oriented than LangSmith, and less eval-centric than Braintrust.

Pricing

Datadog’s pricing makes the buying decision look simple until you ask what kind of customer Datadog actually wants. The product page presents a Free tier at $0 per month with up to 40K LLM spans, 15-day retention, unlimited context and evals, and full feature access. Pro is $240 per month, starts at 100K LLM spans, keeps the same 15-day retention, and carries the same full feature set, positioned for production use.

That is the clearest customer-facing framing, but it is not the only one. Datadog’s main pricing page still describes LLM Observability as starting at $8 per 10K monitored LLM requests, billed annually or $12 on-demand, with a 100K minimum. That mismatch is enough to make the product feel more like metered infrastructure pricing than a clean SaaS subscription, and it is exactly the kind of thing that creates budget surprises if nobody checks the current page before purchase.
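A back-of-the-envelope comparison shows why the two framings feel different. Using only the numbers quoted above, and assuming the metered rates apply linearly, the metered minimum lands well below the Pro plan’s sticker price; note too that the metered page counts requests while the plan page counts spans, which are not obviously the same unit.

```python
# Back-of-the-envelope comparison of the two pricing framings above.
# Figures come from the pages quoted in this review; verify before buying.
minimum_requests = 100_000  # stated 100K minimum on the metered page

annual = (minimum_requests / 10_000) * 8      # $8 per 10K, billed annually
on_demand = (minimum_requests / 10_000) * 12  # $12 per 10K, on-demand
pro_plan = 240                                # Pro plan, starts at 100K spans

print(f"Metered, annual billing: ${annual:.0f}/month")     # $80
print(f"Metered, on-demand:      ${on_demand:.0f}/month")  # $120
print(f"Pro plan:                ${pro_plan}/month")       # $240
```

None of those numbers is alarming on its own; the problem is that a buyer can reasonably arrive at three different monthly figures depending on which page they read first.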

For individuals and small teams, the Free tier is the obvious place to start. For teams that are already running AI in production, Pro is the practical choice if Datadog already owns the rest of the observability stack. The value proposition is less about absolute price and more about consolidation: you are paying to keep AI telemetry inside the same system that already knows the rest of your application.

Privacy

Datadog’s privacy posture is closer to ordinary enterprise SaaS than to a consumer AI app. The company says personal information is used to provide, improve, and secure Datadog products, and its privacy policy makes clear that data may be collected, transferred, accessed, or stored outside the user’s country. The trust center lists a long compliance roster, including SOC 2, GDPR, HIPAA, and ISO/IEC 27001, among other certifications and frameworks.

For LLM Observability specifically, Datadog documents data access controls and redaction tools so teams can limit who sees sensitive traces and remove sensitive values before they leave the application. I did not find a public statement saying customer traces are used to train models by default, which is the right answer for a product like this. The practical risk is not model training; it is that observability data often contains prompts, payloads, identifiers, and URLs that teams should not assume will sanitize themselves.

Who It’s Best For

Teams already standardized on Datadog. If your engineers already use Datadog for monitoring and incident response, LLM Observability is the most natural place to add AI telemetry. You get the AI layer without adding a new observability vendor.

Platform teams shipping production agents. These teams need traces, evals, cost visibility, and the ability to compare behavior across releases. Datadog is a strong fit because it lets them keep that work inside the same operational system they already use for the rest of the stack.

Enterprises that want one vendor for observability and security. Datadog’s appeal is consolidation. If the buying team would rather extend an existing contract than stand up a separate AI-native tool, this product is built for that procurement pattern.

Teams that need security controls around AI traces. The combination of redaction, access control, and the broader Datadog trust posture makes the product a better fit for sensitive workloads than a consumer-grade assistant or a lightweight logging tool.

Who Should Look Elsewhere

Teams that want open-source control or self-hosting. Datadog is SaaS, and this product is a layer inside it. If you need to run the observability stack yourself, Langfuse is the better starting point.

Teams that want a narrower, AI-native workflow. LangSmith is more deployment-oriented and Braintrust more eval-centric. Both will feel more focused than Datadog’s breadth if AI observability is the only job to be done.

Small teams that just want to inspect a few agent runs. The platform tax shows up fast here, in both price and complexity. A lightweight tool will get you to a usable trace view with far less ceremony.

Bottom Line

Datadog LLM Observability is the right answer when the question is not “what is the best AI observability tool?” but “how do we add AI observability without creating another island?” For Datadog customers, that answer is unusually strong: the product is integrated, operationally useful, and serious enough to track real production systems rather than demos.

The limitation is the same one that has always made Datadog powerful and expensive. This is a platform decision, not a point tool decision. If you are willing to buy into that model, Datadog LLM Observability is one of the most credible enterprise answers in the category. If you are not, the specialist tools will feel cleaner and cheaper for a long time.