OpenAI Example
Trace real OpenAI chat completion calls.
Overview
This example traces a real OpenAI chat completion and captures the full request-response cycle: the prompt, response content, model name, latency, token usage, and streaming chunks. The wrap_openai helper monkey-patches the OpenAI client so that every chat.completions.create call is traced automatically.
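To make the monkey-patching idea concrete, here is a minimal sketch. It is not TraceLLM's actual implementation. The helper name wrap_create, the record fields latency_ms and total_tokens, and the stand-in client built from SimpleNamespace are all assumptions made for illustration, chosen so the snippet runs without an API key or network access.

```python
import functools
import time
from types import SimpleNamespace


def wrap_create(client, records):
    """Hypothetical helper: swap chat.completions.create for a timing wrapper."""
    original = client.chat.completions.create

    @functools.wraps(original)
    def traced_create(*args, **kwargs):
        start = time.perf_counter()
        response = original(*args, **kwargs)
        latency_ms = (time.perf_counter() - start) * 1000
        usage = getattr(response, "usage", None)
        # Store a small record per call; a real tracer would persist this.
        records.append({
            "model": kwargs.get("model"),
            "latency_ms": latency_ms,
            "total_tokens": getattr(usage, "total_tokens", None),
        })
        return response

    # The monkey-patch: replace the method on this client instance.
    client.chat.completions.create = traced_create
    return client


def fake_create(**kwargs):
    """Stand-in for the real API call (assumption: mimics the response shape)."""
    return SimpleNamespace(
        choices=[SimpleNamespace(message=SimpleNamespace(content="hello"))],
        usage=SimpleNamespace(total_tokens=7),
    )


fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=fake_create))
)

records = []
wrapped = wrap_create(fake_client, records)
reply = wrapped.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "hi"}],
)
print(reply.choices[0].message.content)
print(records[0]["model"], records[0]["total_tokens"])
```

Because the wrapper returns the original response unchanged, calling code does not need to know that tracing is in place.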
Code
openai_example.py
python
import os
import sys

# Add the parent directory to sys.path (useful when running against a local tracellm checkout).
sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))

from openai import OpenAI

from tracellm import trace
from tracellm.integrations.openai import wrap_openai

# OpenAI() reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()
# Wrap the client so chat.completions.create calls are traced.
client = wrap_openai(client)


@trace(project="openai-demo", environment="development")
def ask_llm(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=500,
        temperature=0.7,
    )
    # message.content can be None, so fall back to an empty string.
    return response.choices[0].message.content or ""


if __name__ == "__main__":
    result = ask_llm(
        "Explain how transformer attention works in three sentences."
    )
    print(f"\nResponse received ({len(result)} chars)")
    print(result)

Tip
Set your API key with export OPENAI_API_KEY="sk-..." before running. Make sure the TraceLLM stack is running (tracellm start) so traces are persisted.
Expected Output
Console output
text
╭── TraceLLM Trace ───────────────────────────── SUCCESS ──╮
│                                                          │
│  Trace ID     tr_f9e2a1b7                                │
│  Prompt       Explain how transformer attention works in │
│               three sentences.                           │
│  Model        gpt-4.1-mini                               │
│  Project      openai-demo                                │
│  Environment  development                                │
│  Latency      1,873.42 ms                                │
│  Token Count  142                                        │
│  Retries      0                                          │
│  Steps        1                                          │
│  Status       SUCCESS                                    │
│                                                          │
╰──────────────────────────────────────────────────────────╯

#  Tool         Duration  Status  Detail
1  openai_chat  1873ms    OK

Response received (486 chars)
Transformer attention works by computing three vectors — Query, Key, and Value — from each input token. It calculates attention scores by taking the dot product of every Query with every Key, then applies a softmax to produce a probability distribution. These scores determine how much each token contributes to the output, allowing the model to focus on relevant parts of the input when generating each token.
Dashboard Result
Open http://localhost:3000/traces to see the trace in the dashboard:
Dashboard UI
text
TraceLLM Dashboard > Traces

Status     Trace ID     Prompt                                      Model         Latency   Tokens  Time
─────────  ───────────  ──────────────────────────────────────────  ────────────  ────────  ──────  ───────────────────
● Success  tr_f9e2a1b7  Explain how transformer attention works in  gpt-4.1-mini  1,873 ms  142     2026-05-31 14:22:10
                        three sentences.

> Clicking the trace opens the detail view:

┌─ tr_f9e2a1b7 ───────────────────────────────────────────── [Success] ─┐
│ Model   gpt-4.1-mini | Latency 1,873 ms | Tokens 142                  │
│ Retries 0            | Steps   1        | At     2026-05-31 14:22:10  │
└───────────────────────────────────────────────────────────────────────┘
┌─ Step Timeline ───────────────────────────────────────────────────────┐
│                                                                       │
│ openai_chat ────────────────────────────────────────────── 1,873ms OK │
│                                                                       │
└───────────────────────────────────────────────────────────────────────┘
┌─ Prompt ────────────────┐  ┌─ Response ───────────────────────────────┐
│ Explain how transformer │  │ Transformer attention works by computing │
│ attention works in      │  │ three vectors — Query, Key, and Value —  │
│ three sentences.        │  │ from each input token...                 │
└─────────────────────────┘  └──────────────────────────────────────────┘
Replay Result
Use the CLI to replay the trace step-by-step:
terminal
bash
tracellm replay tr_f9e2a1b7
Replay output
text
╭────────────────── Replaying execution timeline... ──────────────────╮
│                                                                     │
│ ╭─ Replay ────────────────────────────────────────────────────────╮ │
│ │                                                                 │ │
│ │  trace_id  tr_f9e2a1b7                                          │ │
│ │  status    SUCCESS                                              │ │
│ │  latency   1873.42 ms                                           │ │
│ │  retries   0                                                    │ │
│ │  steps     1                                                    │ │
│ │                                                                 │ │
│ ╰─────────────────────────────────────────────────────────────────╯ │
│                                                                     │
│ ╭─ Step 1/1 ────────────────────────────────╮                       │
│ │                                           │                       │
│ │  step      1/1                            │                       │
│ │  tool      openai_chat                    │                       │
│ │  duration  1873 ms                        │                       │
│ │  status    OK                             │                       │
│ │  input     {'model': 'gpt-4.1-mini', ...} │                       │
│ │  output    {'content': 'Transformer att...│                       │
│ │                                           │                       │
│ ╰───────────────────────────────────────────╯                       │
╰─────────────────────────────────────────────────────────────────────╯
╭── TraceLLM Trace ───────────────────────────── SUCCESS ──╮
│                                                          │
│  Trace ID     tr_f9e2a1b7                                │
│  Prompt       Explain how transformer attention...       │
│  Model        gpt-4.1-mini                               │
│  Project      openai-demo                                │
│  Environment  development                                │
│  Latency      1,873.42 ms                                │
│  Token Count  142                                        │
│  Retries      0                                          │
│  Steps        1                                          │
│  Status       SUCCESS                                    │
│                                                          │
╰──────────────────────────────────────────────────────────╯
Replay complete
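When a replay needs to be triggered from a script rather than typed by hand, the same command can be launched through Python's subprocess module. The following is a sketch under assumptions: build_replay_command and replay are hypothetical helper names, and only the tracellm replay <trace_id> form from the terminal example is used.

```python
import shutil
import subprocess
from typing import List, Optional


def build_replay_command(trace_id: str) -> List[str]:
    """Return the argv list for `tracellm replay <trace_id>`."""
    if not trace_id:
        raise ValueError("trace_id must be a non-empty string")
    return ["tracellm", "replay", trace_id]


def replay(trace_id: str) -> Optional[int]:
    """Run the replay command and return its exit code.

    Returns None when the tracellm CLI is not found on PATH.
    """
    if shutil.which("tracellm") is None:
        return None
    completed = subprocess.run(build_replay_command(trace_id), check=False)
    return completed.returncode


print(build_replay_command("tr_f9e2a1b7"))
```

Passing the command as a list avoids shell quoting issues if a trace ID ever contains unexpected characters.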