MongoDB
Configure and operate the trace database.
Version 1 Requirement
TraceLLM currently requires MongoDB for trace storage.
Required environment variables:
- MONGO_URL
- DB_NAME
Future versions may support additional storage options.
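A startup check can fail fast when either variable is missing. The sketch below shows one way to do it. MongoSettings and load_mongo_settings are hypothetical names, not TraceLLM's actual configuration code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# The two variables TraceLLM requires (from the list above)
REQUIRED_VARS = ("MONGO_URL", "DB_NAME")


@dataclass(frozen=True)
class MongoSettings:
    mongo_url: str
    db_name: str


def load_mongo_settings(env: Optional[Mapping[str, str]] = None) -> MongoSettings:
    """Read the required MongoDB settings; unset or empty values count as missing."""
    source = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not source.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return MongoSettings(mongo_url=source["MONGO_URL"], db_name=source["DB_NAME"])
```

The resulting values map directly onto the connect_to_mongo(mongo_url, db_name) call described under Connection Management.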
Make sure MONGO_URL and DB_NAME are set before running tracellm start.

TraceLLM uses MongoDB as its persistent store for all trace documents, project records, and API keys. The connection is managed through the Motor async driver (AsyncIOMotorClient), which integrates natively with FastAPI's async event loop. The CLI bridges its synchronous code to Motor through a persistent event loop in db.py.
Connection Management
The MongoDB connection is managed in app/database/mongodb.py. The module uses a singleton pattern backed by module-level globals:
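# app/database/mongodb.py
from typing import Optional

from motor.motor_asyncio import AsyncIOMotorClient, AsyncIOMotorDatabase

# Module-level globals (singleton state)
client: Optional[AsyncIOMotorClient] = None
database: Optional[AsyncIOMotorDatabase] = None

async def connect_to_mongo(mongo_url: str, db_name: str) -> AsyncIOMotorDatabase:
    global client, database  # Required to rebind the module-level names
    if database is not None:
        return database  # Already connected
    client = AsyncIOMotorClient(mongo_url)
    database = client[db_name]
    await client.admin.command("ping")  # Verify connectivity
    return database

def get_database() -> AsyncIOMotorDatabase:
    if database is None:
        raise RuntimeError("MongoDB is not connected yet.")
    return database

async def close_mongo_connection() -> None:
    global client, database
    if client is not None:
        client.close()
    # Clear the singletons so a later connect_to_mongo() opens a fresh client
    client = None
    database = None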
Collections & Schema
Three MongoDB collections store all TraceLLM data:
| Collection | Schema Model | Purpose | Key Indexes |
|---|---|---|---|
| traces | TraceSchema | Full trace documents with steps, metadata, status | trace_id, created_at, status, model_name, project_id, environment |
| projects | ProjectSchema | Project records with name, description, timestamps | project_id (unique), name (unique) |
| api_keys | ApiKeySchema | API key records with key hash, project, environment | key (unique), project_id, environment |
Each trace document follows the TraceSchema Pydantic model, which enforces field types, defaults, and validators at both write and read boundaries:
TraceSchema:
    trace_id: str                  # UUID4, prefixed "tr_"
    prompt: str                    # Input prompt or operation name
    response: Optional[str]        # LLM or system response text
    latency: float                 # Total execution time in ms (>= 0)
    token_count: int               # Estimated or actual tokens (>= 0)
    model_name: Optional[str]      # Model identifier (e.g. gpt-4o)
    project_id: str                # Project grouping ("default")
    project_name: Optional[str]
    api_key: Optional[str]         # Stored for audit purposes
    environment: str               # "development", "staging", "production"
    status: Literal["success", "warning", "failed"]
    steps: list[StepSchema]        # Ordered execution steps
    retry_count: int               # Number of retries
    slow_request: bool             # True if latency >= 1500ms
    failure_reason: Optional[str]
    created_at: datetime           # Execution start (UTC)
    updated_at: datetime           # Persistence time (UTC)

StepSchema:
    step_id: str                   # UUID4
    tool_name: str                 # e.g. "vector_retrieval"
    input: dict                    # Input parameters
    output: dict                   # Returned result
    duration: float                # Wall-clock time in ms (>= 0)
    success: bool                  # Completed without error
    timestamp: datetime            # Execution time (UTC)

Index Strategy
Indexes are created automatically during the FastAPI startup event via the on_event("startup") handler. The creation functions are idempotent and safe to call on every restart:
# Runs inside the async startup handler; Motor's create_index() returns an awaitable
# traces collection
await traces.create_index("trace_id")        # Single trace lookup
await traces.create_index("created_at")      # Time-range queries
await traces.create_index("status")          # Filter by status
await traces.create_index("model_name")      # Filter by model
await traces.create_index("project_id")      # Multi-tenant isolation
await traces.create_index("environment")     # Environment scoping
# projects collection
await projects.create_index("project_id", unique=True)
await projects.create_index("name", unique=True)
# api_keys collection
await api_keys.create_index("key", unique=True)  # Key lookup
await api_keys.create_index("project_id")        # List by project
await api_keys.create_index("environment")       # Filter by env
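The same index set can also be kept as data and applied in a loop, which makes it easier to audit. This is a minimal sketch. It assumes db is an AsyncIOMotorDatabase, though any object whose db[name].create_index(...) is awaitable works. INDEX_SPECS and ensure_indexes are hypothetical names, not part of TraceLLM.

```python
from typing import Any

# Hypothetical spec table mirroring the indexes listed above:
# collection name -> list of (field, create_index keyword options)
INDEX_SPECS: dict[str, list[tuple[str, dict[str, Any]]]] = {
    "traces": [
        ("trace_id", {}),
        ("created_at", {}),
        ("status", {}),
        ("model_name", {}),
        ("project_id", {}),
        ("environment", {}),
    ],
    "projects": [
        ("project_id", {"unique": True}),
        ("name", {"unique": True}),
    ],
    "api_keys": [
        ("key", {"unique": True}),
        ("project_id", {}),
        ("environment", {}),
    ],
}


async def ensure_indexes(db: Any) -> list[str]:
    """Create every index in INDEX_SPECS and return the names MongoDB reports.

    Creating an index that already exists with the same keys and options is a
    no-op on the server, so this is safe to run on every startup.
    """
    names: list[str] = []
    for collection_name, specs in INDEX_SPECS.items():
        collection = db[collection_name]
        for field, options in specs:
            names.append(await collection.create_index(field, **options))
    return names
```

In the startup handler this becomes a single call: await ensure_indexes(get_database()).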
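The traces indexes are all single-field indexes. They back the filter combinations used by the dashboard: status + project, model + environment, and latency range + status. For each of these, MongoDB narrows the scan with one indexed field and applies the remaining predicates to the matched documents. The created_at index serves the time-sorted queries behind the analytics time-series charts. There is no index on latency, so latency-range filters rely on another filter's index to limit the documents scanned.

Trace Normalization Pipeline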
Before insertion, every trace document passes through a normalization pipeline in normalize_trace_document() (in trace_service.py):
Input: raw trace dict from @trace/CLI
│
├── 1. Parse created_at ──► _coerce_datetime()
│ Supports datetime objects, ISO strings, or falls back to utcnow()
│
├── 2. Normalize steps ──► _normalize_steps()
│ Maps input/input_data, output/output_data keys
│ Validates each step against StepSchema
│ Generates step_id if missing
│
├── 3. Infer retry count ──► _infer_retry_count()
│ Counts duplicate tool_name occurrences in step list
│ Uses explicit retry_count if provided
│
├── 4. Infer status ──► _infer_status()
│ explicit status > any failed step > failure_reason/retries > success
│
├── 5. Infer failure_reason ──► _infer_failure_reason()
│ explicit message > first failed step's output.error > tool_name
│
├── 6. Set slow_request flag
│ True if latency >= SLOW_TRACE_THRESHOLD_MS (1500ms)
│
└── 7. Validate ──► TraceSchema(...), then .model_dump(mode="python")
Pydantic validation catches negative values, wrong types, etc.
Output: clean MongoDB document

Common Query Patterns
The trace service uses these query patterns for the API and dashboard (shown here in mongosh syntax):
// List traces with filters
db.traces.find({
  status: "failed",
  project_id: "my-app",
  environment: "production",
  latency: { $gte: 100, $lte: 5000 },
  token_count: { $gte: 50 }
}).sort({ created_at: -1 }).limit(50)

// Get single trace
db.traces.findOne({ trace_id: "tr_2kf9q3m1" })

// Analytics - all traces in date order
db.traces.find({}).sort({ created_at: 1 })

// Failures - recent failed/retry/slow traces
db.traces.find({
  $or: [
    { status: "failed" },
    { retry_count: { $gt: 0 } },
    { slow_request: true }
  ]
}).sort({ created_at: -1 }).limit(25)

Running MongoDB
Start a local MongoDB instance for development:
# Docker (recommended)
docker run -d --name tracellm-mongo -p 27017:27017 mongo:7
# Native
mongod --dbpath /data/db --port 27017
# Verify connection
mongosh --eval "db.runCommand({ ping: 1 })"
For MongoDB Atlas, set MONGO_URL to your Atlas SRV connection string. The startup.py module tests connectivity with a 3-second timeout and logs a warning if the server is unreachable.
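A connectivity check with a hard timeout can be sketched with asyncio.wait_for. This illustrates the idea; it is not the contents of startup.py. The function check_mongo_reachable and the logger name are hypothetical. The ping argument is assumed to be a callable that wraps a Motor call such as client.admin.command("ping"). An alternative is to bound the wait inside the driver itself with PyMongo's serverSelectionTimeoutMS client option, for example AsyncIOMotorClient(url, serverSelectionTimeoutMS=3000).

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("tracellm.startup")  # hypothetical logger name


async def check_mongo_reachable(
    ping: Callable[[], Awaitable[object]], timeout: float = 3.0
) -> bool:
    """Return True if ping() completes within `timeout` seconds; log a warning otherwise."""
    try:
        await asyncio.wait_for(ping(), timeout=timeout)
    except asyncio.TimeoutError:
        logger.warning("MongoDB did not answer ping within %.1fs", timeout)
        return False
    except Exception as exc:  # connection refused, DNS failure, auth error, ...
        logger.warning("MongoDB is unreachable: %s", exc)
        return False
    return True
```

During startup this could be called as await check_mongo_reachable(lambda: client.admin.command("ping")). Returning a boolean instead of raising lets the app start and log the problem, which matches the warn-only behavior described above.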