Hey HN,
I built RAG Logger, a lightweight open-source logging tool specifically designed for Retrieval-Augmented Generation (RAG) applications. LangSmith is excellent, but my usage is quite minimal, and I would prefer a locally hosted version that is easy to customize.
Key features:
Detailed step-by-step pipeline tracking
Performance monitoring (embedding, retrieval, LLM generation)
Structured JSON logs with timing and metadata
Zero external dependencies
Easy integration with existing RAG systems
The tool helps debug RAG applications by tracking query understanding, embedding generation, document retrieval, and LLM responses. Each step is timed and logged with relevant metadata.
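The step-timing idea can be sketched in a few lines of plain Python. Note that the names below are illustrative, not RAG Logger's actual API:

```python
import json
import time
from contextlib import contextmanager

class PipelineLogger:
    """Hypothetical sketch: structured, timed step logging for a RAG pipeline."""
    def __init__(self):
        self.steps = []

    @contextmanager
    def step(self, name, **metadata):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.steps.append({
                "step": name,
                "duration_ms": round((time.perf_counter() - start) * 1000, 2),
                "metadata": metadata,
            })

    def dump(self):
        # Emit the whole run as one structured JSON document.
        return json.dumps({"steps": self.steps}, indent=2)

log = PipelineLogger()
with log.step("retrieval", top_k=5):
    docs = ["doc-a", "doc-b"]  # stand-in for a vector-store query
print(log.dump())
```

Each pipeline stage (embedding, retrieval, generation) gets its own timed entry, and the resulting JSON can be diffed or grepped offline.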
Really awesome seeing more people work on this! I’m one of the founders of Opik https://github.com/comet-ml/opik which does similar things but also has a UI and supports massive scale. Curious to hear if you have any feedback!
How is this a replacement for LangSmith? I browsed the source and I could only find what appear to be a few small helper functions for emitting structured logs.
I’m less familiar with LangSmith, but browsing their site suggests they offer observability into LLM interactions in addition to other parts of the workflow lifecycle. This just seems to handle logging, and you have to pass all the data yourself; it’s not instrumenting an LLM client, for example.
> in addition to other parts of the workflow lifecycle
FWIW this is primarily based on the LangChain framework, so it's fairly turnkey, but it has no integration with the rest of your application. You can use the @traceable decorator in Python to decorate a custom function in code too, but this doesn't integrate with frameworks like OpenTelemetry, which makes it hard to see everything that happens.
So for example, if your LLM feature is plugged into another feature area in the rest of your product, you need to do a lot more work to capture things like which user is involved, or, if you did some post-processing on a response later down the road, what steps were taken to produce a better response. It's quite useful for chat apps right now, but most enterprise RAG use cases will likely want to instrument with OpenTelemetry directly.
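To illustrate the "which user is involved" problem: this is a stdlib sketch of the context-propagation idea behind OpenTelemetry, not real OTel code. Ambient context set at the product boundary stays visible deep inside the LLM call, so a trace can record who triggered a generation without threading a user_id through every function:

```python
import contextvars

# Context set once at the feature boundary, readable anywhere below it.
current_user = contextvars.ContextVar("current_user", default="anonymous")

def handle_request(user_id, question):
    token = current_user.set(user_id)   # set at the product boundary
    try:
        return llm_call(question)
    finally:
        current_user.reset(token)       # restore the previous context

def llm_call(prompt):
    # Deep inside the stack: no user_id parameter needed, yet the "span"
    # can still attribute this generation to the right user.
    span_attrs = {"user.id": current_user.get(), "prompt.len": len(prompt)}
    return {"answer": f"echo: {prompt}", "span": span_attrs}

result = handle_request("user-42", "What is RAG?")
print(result["span"])  # {'user.id': 'user-42', 'prompt.len': 12}
```

Real OTel does this with context propagation across threads, processes, and services, which is exactly what a logging-only approach can't give you.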
Congrats on the launch. Cool to see a RAG-specific tracing tool. Excited to try it out. Full disclosure: I am the cofounder and core maintainer of Langtrace (https://github.com/Scale3-Labs/langtrace), which is also an open-source tool for tracing and observing your LLM stack; our SDKs are OTEL-based. In my experience, the biggest challenge right now for RAG pipelines is the lack of flexibility in the current crop of tracing tools. It's not just about visualizing the entire retrieval flow across all the components of the stack (the framework calls, vector DB retrievals, re-ranker I/O if any, and the final LLM inference), but also about being able to run experiments: freezing a setup, iterating on it, and measuring performance end to end so you clearly know how the changes map to results. This is what we think about most while building Langtrace as well.
You can, which is why tools like Traceloop do this.
Although it's worth noting that long context doesn't always mix well with o11y systems, since they usually put limits on the size of a log body or trace attribute.
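In practice that usually means truncating prompts and responses before attaching them as attributes. A small defensive helper (the 1024-char limit here is made up; real backends vary):

```python
MAX_ATTR_LEN = 1024  # hypothetical backend limit on attribute size

def safe_attr(value, limit=MAX_ATTR_LEN):
    """Truncate long prompt/response text before attaching it to a span,
    keeping a marker so you know the payload was cut."""
    if len(value) <= limit:
        return value
    return value[:limit] + f"...[truncated {len(value) - limit} chars]"

long_prompt = "x" * 5000
attr = safe_attr(long_prompt)
```

The trade-off: you keep traces cheap and within limits, but lose the full payload, which is one reason people pair tracing with a separate full-fidelity log store.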
I've just published my own LLM logging and debugging tool with local storage to GitHub: https://github.com/zby/llm_recorder It is more for debugging than for observability in production like your package.
I think I am ready to push it to PyPi now.
It replaces the llm client and logs everything that goes through it.
It is very simplistic in comparison with the remote loggers, but you can use all the local tools, like grep or your favourite editor. The feature I needed from it is replaying past interactions; I use it for debugging execution paths that happen only sometimes. Can Langfuse do that?
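The record/replay idea might look roughly like this (a simplified sketch I wrote for illustration, not llm_recorder's actual code):

```python
import json
import os
import tempfile
from pathlib import Path

class RecordingClient:
    """Hypothetical sketch: wrap the LLM client, log every call to a local
    JSON file, and replay the saved responses later for debugging."""
    def __init__(self, client, log_path, replay=False):
        self.client, self.path, self.replay = client, Path(log_path), replay
        self._log = json.loads(self.path.read_text()) if replay else []
        self._cursor = 0

    def complete(self, prompt):
        if self.replay:                              # serve recorded response
            entry = self._log[self._cursor]
            self._cursor += 1
            return entry["response"]
        response = self.client.complete(prompt)      # live call
        self._log.append({"prompt": prompt, "response": response})
        self.path.write_text(json.dumps(self._log, indent=2))
        return response

class FakeLLM:                                       # stand-in for a real client
    def complete(self, prompt):
        return f"answer to: {prompt}"

path = os.path.join(tempfile.mkdtemp(), "calls.json")
first = RecordingClient(FakeLLM(), path).complete("hello")
replayed = RecordingClient(None, path, replay=True).complete("hello")
```

Because the log is plain JSON on disk, the rare execution path only has to happen once; after that you can replay it deterministically as many times as you need.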
Cool project, but this doesn't replace langsmith at all.
The power of LangSmith is seeing full traces of a request moving through the graph and being able to inspect the inputs and outputs for each step. I suppose your framework supports that, but LangSmith gives it all for free out of the box. Your code is really a replacement for OpenTelemetry or something akin to New Relic / Datadog, which is a much tougher sell IMO. Why use this over OpenTelemetry?
Is anyone using Prometheus / Grafana for LLM metrics? Seems like there’s a lot of existing leverage there. What makes LLM metrics different than other performance metrics? Why not use a single system to collect and analyze both?
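FWIW, latency-wise LLM calls are ordinary numeric metrics; the wrinkle is the long tail (one slow generation can dwarf the median), which is why Prometheus-style histograms beat plain averages. A stdlib sketch of that point, with made-up sample numbers:

```python
import statistics

# Made-up latency samples per model; note the 4100ms outlier in the first set.
latencies_ms = {
    "gpt-4o": [820, 950, 4100, 780, 1200],
    "small-model": [90, 110, 85, 130, 95],
}

for model, samples in latencies_ms.items():
    q = statistics.quantiles(samples, n=20)       # 19 cut points
    # q[9] is the median (p50), q[18] the 95th percentile (p95).
    print(f"{model}: p50={q[9]:.0f}ms p95={q[18]:.0f}ms")
```

The p50/p95 gap shows the tail that an average would hide, which is the same reason Prometheus exposes LLM-style latencies as histogram buckets rather than a single gauge.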