Feedback First: Accelerating AI Development with Early Insights

Explore the pivotal role of early user feedback in refining and directing the development of generative AI applications.

In the rapidly evolving landscape of generative AI, developing applications based on Large Language Models (LLMs) presents a unique set of challenges and opportunities. As I have recently learned, one of the most critical steps in this journey is actively seeking out and using early user feedback. In this blog post, I will explain why listening to users right from the start matters so much.

Gaining Insights into User Interaction

When building LLM-based applications, understanding how users interact with your tool is crucial. This is especially true for solutions like chatbots that let users phrase their inquiries without any restrictions. LLM applications are meant to serve a wide range of users, who come with varying degrees of familiarity and expectations. Instead of constraining how users interact with these tools, it is more beneficial to observe their requests and learn whether we can accommodate them. This approach not only enhances user satisfaction but also broadens our understanding of how to improve the application for a diverse user base.

Publishing your application early allows you to collect valuable feedback that sheds light on user expectations and queries. Often, it is only through observing real user interactions that we can gain a full understanding of what users are looking for and the type of answers they expect.

Managing User Expectations

Managing stakeholder expectations is essential, as outputs in the early stages might not be satisfactory and the tool might have a lot of rough edges.

Stakeholders need to be aware that initial outputs might be far from perfect and could significantly misalign with user expectations. This is not uncommon when building LLM-based applications. It is equally important to assure stakeholders that quality will improve over successive rounds of refinement, provided they contribute constructive feedback.

Users, in turn, should understand that it might take a couple of iterations to get the application to the level they expect. This underscores the importance of gathering feedback as soon as possible and iterating until users are satisfied. Detailed, actionable insights from users enable developers to make precise adjustments and improvements.

Influencing Project Direction

User feedback doesn’t just offer insights into user needs; it also plays a pivotal role in shaping the project’s direction. Based on feedback from early users, as well as on our own assessment of inputs and outputs, we can clearly identify problems or gaps and make the necessary adjustments.

Solutions to those problems could range from minor tweaks to the logic or prompts to significant changes in the application’s architecture, approach, or data-processing methods. In a recent project I was involved in, user tests revealed that a simple vector search didn’t satisfy user queries. This led us to introduce a new, more structured data source, significantly improving the application’s effectiveness and user satisfaction.

Feedback Collection

To streamline feedback gathering, consider automating it. Automation not only saves time but also ensures that you’re consistently collecting data on how users interact with your application. Automated feedback can provide a wealth of information for setting up evaluation datasets and identifying areas for improvement. Several tools and platforms already exist, such as:

  • LangSmith - A DevOps platform by LangChain for LLM application lifecycle management, offering features for development, testing, deployment, and monitoring, with seamless LangChain integration.
  • Langfuse - An open-source platform for debugging and iterating LLM applications. Offers collaborative tools and a self-hosted option, enhancing team efficiency.
  • PromptFlow - Facilitates the creation of executable flows that combine LLMs, prompts, and Python tools in a visual graph. It supports debugging, team collaboration, prompt testing, and deploying real-time endpoints to leverage LLMs effectively.

These tools not only facilitate the streamlined collection of user feedback but also support various stages of LLM application development, from debugging and collaboration to deployment and monitoring. The above list of tools is by no means exhaustive - the pace of growth in the whole generative AI ecosystem is impressive. New tools appear almost every day, and existing tools evolve just as quickly.

Setting up automated feedback gathering might require some work at the beginning but will definitely pay off in the long run.
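
To make this concrete, below is a minimal, tool-agnostic sketch of capturing explicit user feedback (thumbs up/down plus an optional comment) alongside each response. The `record_feedback` helper and the JSONL destination are hypothetical stand-ins for whatever platform you end up adopting:

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical destination; a real setup would send events to a platform
# such as LangSmith or Langfuse instead of a local file.
FEEDBACK_LOG = Path("feedback.jsonl")

def record_feedback(query: str, response: str, rating: int, comment: str = "") -> str:
    """Append one user-feedback event (e.g. thumbs up = 1, thumbs down = -1)."""
    event = {
        "id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": query,
        "response": response,
        "rating": rating,
        "comment": comment,
    }
    with FEEDBACK_LOG.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
    return event["id"]

# Wired to the thumbs up/down buttons in the UI:
record_feedback(
    "What is our refund policy?",
    "Refunds are accepted within 30 days...",
    rating=-1,
    comment="Answer ignored the enterprise plan.",
)
```

Even a log this simple doubles as raw material for an evaluation dataset: negatively rated query-response pairs are exactly the cases worth turning into test cases.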

Another valuable aspect of automated feedback and telemetry gathering is the ability to capture intermediate outputs, such as documents retrieved from a vector store in a Retrieval-Augmented Generation (RAG) application. This makes it possible to evaluate context relevancy and to implement and assess improvements to context quality, whether by adjusting search queries, the configuration of the underlying datastore, or other areas.
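
As an illustration, here is a minimal sketch of tracing those intermediate outputs alongside the final answer. The `retriever` and `llm_answer_fn` arguments are hypothetical stand-ins for your actual retrieval and generation steps, not a specific library’s API:

```python
import json
from datetime import datetime, timezone

def answer_with_trace(question, retriever, llm_answer_fn, trace_file="rag_traces.jsonl"):
    """Answer a question and log the retrieved context next to the final output.

    `retriever` returns a list of (doc_id, text, score) tuples;
    `llm_answer_fn` turns the question plus context texts into an answer string.
    Both are assumed interfaces.
    """
    docs = retriever(question)
    answer = llm_answer_fn(question, [text for _, text, _ in docs])
    trace = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,
        # Truncate stored passages to keep trace files manageable.
        "retrieved": [{"doc_id": d, "score": s, "text": t[:500]} for d, t, s in docs],
        "answer": answer,
    }
    with open(trace_file, "a", encoding="utf-8") as f:
        f.write(json.dumps(trace) + "\n")
    return answer
```

With traces like these, each retrieved passage can later be scored for relevancy against the question, so you can measure whether a change to the query or the datastore configuration actually improved the context.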

Iterating on the Feedback

Using feedback from users and monitoring tools, we can rapidly improve the application. This process helps identify areas for improvement, which might be as straightforward as:

  • Changing the order of messages in a chat model’s prompt,
  • Improving the query-rewriting logic (see the sketch after these lists).

There could also be more complex changes needed, such as:

  • Adding a new data source,
  • Changing how the application uses the data source, like adding extra filters or sorting,
  • Fixing missing data caused by errors in how the data is processed.
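
As an example of the simpler end of this spectrum, here is one way a query-rewriting step might look. This is a minimal sketch assuming the official OpenAI Python client; the prompt wording and model choice are purely illustrative:

```python
from openai import OpenAI  # assumes the official OpenAI Python client (v1+)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative instruction; in practice this prompt is exactly the kind of
# thing you iterate on in response to user feedback.
REWRITE_PROMPT = (
    "Rewrite the user's question into a concise, self-contained search query. "
    "Expand abbreviations and resolve pronouns. Return only the query."
)

def rewrite_query(question: str, model: str = "gpt-4o-mini") -> str:
    """Rewrite a raw user question into a cleaner retrieval query."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": REWRITE_PROMPT},
            {"role": "user", "content": question},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

# e.g. "what about refunds tho?" -> "What is the refund policy?"
```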

This process of making changes based on feedback is key to improving the application, meeting user needs, and raising overall performance.

Setting a Baseline

Early feedback also allows you to establish a baseline for your application, offering a clear view of its initial performance and user-satisfaction levels. This insight is invaluable for tracking the application’s progress over time. It enables the creation of regression tests that guard against declines in performance or satisfaction as the application evolves, and it makes it easier to evaluate the impact of implemented changes and enhancements.
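
For instance, once a baseline is recorded, a simple regression guard can run on every change. The sketch below assumes hypothetical `run_app` and `score_answer` helpers and a `baseline_scores.json` captured from the first release; none of these names come from a specific library:

```python
import json

# Assumed helpers: `run_app` calls your application, `score_answer` returns
# a 0-1 quality score (e.g. an LLM-as-judge or a human-labelled rubric).
from my_app import run_app, score_answer  # hypothetical module

BASELINE_FILE = "baseline_scores.json"  # scores recorded from the first release

def test_no_regression_against_baseline():
    with open(BASELINE_FILE, encoding="utf-8") as f:
        baseline = json.load(f)
    for case in baseline["cases"]:
        answer = run_app(case["question"])
        score = score_answer(case["question"], answer)
        # Allow a small tolerance, but fail the build on a clear decline.
        assert score >= case["score"] - 0.05, (
            f"Regression on {case['question']!r}: {score:.2f} < {case['score']:.2f}"
        )
```

Run under pytest, this fails the build whenever a change drops quality noticeably below the recorded baseline for any test question.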

Conclusion

In the development of generative AI applications, early user feedback is not just beneficial; it’s essential. It gives developers a deeper understanding of user interactions, influences the project’s direction, and helps fine-tune the application to better meet user expectations. After all, meeting those expectations is paramount – we build these applications for our users.

The article Engineering Practices for LLM Application Development, published on Martin Fowler’s site, likewise highlights the importance of user feedback in LLM-based application development.

Cover photo by Midjourney Bot v6