Refine the GenAI app based on evaluation insights
The end goal of GenAI evaluation is to use the insights it produces to refine your LLM application or system until it meets your criteria and is production-worthy. Refinement is one stage in the evaluation workflow.
Now that you have a trustworthy benchmark, you can use a variety of techniques to improve your GenAI system. These include:
- LLM fine-tuning: Fine-tuning updates the LLM's parameters to adapt its behavior to your criteria. Snorkel integrates with Amazon SageMaker as one fine-tuning option, and Snorkel's LLM fine tuning and alignment tutorial walks through using SageMaker to fine-tune your LLM (see the first sketch after this list).
- RAG tuning: On request, Snorkel can provide an example notebook with instructions for using Snorkel to tune a RAG system (a generic parameter-sweep sketch appears after this list).
- Prompt development: Improve the system and user prompts used in your LLM application with Snorkel's Prompt development workflow (see the prompt-comparison sketch after this list).
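To make the fine-tuning option concrete, the sketch below shows what launching a training job with the SageMaker Python SDK's HuggingFace estimator can look like. The training script, instance type, framework versions, S3 path, and hyperparameters are placeholders to adapt to your setup; the exact steps in Snorkel's tutorial may differ.

```python
import sagemaker
from sagemaker.huggingface import HuggingFace

role = sagemaker.get_execution_role()

# Placeholder script, versions, and hyperparameters; substitute your own.
estimator = HuggingFace(
    entry_point="train.py",       # your fine-tuning script
    source_dir="./scripts",
    instance_type="ml.g5.2xlarge",
    instance_count=1,
    role=role,
    transformers_version="4.36",
    pytorch_version="2.1",
    py_version="py310",
    hyperparameters={
        "model_name_or_path": "meta-llama/Llama-2-7b-hf",
        "epochs": 3,
        "learning_rate": 2e-5,
    },
)

# Point the job at training data curated from your benchmark findings.
estimator.fit({"train": "s3://your-bucket/finetune/train.jsonl"})
```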
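For RAG tuning, one common benchmark-driven approach is to sweep retrieval parameters, such as chunk size and the number of retrieved passages, and score each configuration against your benchmark. The helpers `build_index`, `answer`, and `score_against_benchmark` below are hypothetical stand-ins for your RAG stack and evaluation harness:

```python
from itertools import product

# Hypothetical inputs: `corpus` is your document collection;
# `benchmark_questions` and `benchmark_answers` come from your benchmark.
CHUNK_SIZES = [256, 512, 1024]  # tokens per indexed chunk
TOP_KS = [3, 5, 10]             # passages retrieved per question

best = None
for chunk_size, top_k in product(CHUNK_SIZES, TOP_KS):
    index = build_index(corpus, chunk_size=chunk_size)
    predictions = [answer(q, index, top_k=top_k) for q in benchmark_questions]
    score = score_against_benchmark(predictions, benchmark_answers)
    if best is None or score > best[0]:
        best = (score, chunk_size, top_k)

print(f"Best config: chunk_size={best[1]}, top_k={best[2]}, score={best[0]:.3f}")
```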
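Prompt development can follow the same benchmark-driven loop: run each candidate system prompt over the benchmark and compare scores. Here, `run_app` and `evaluate` are hypothetical stand-ins for your application entry point and benchmark scorer:

```python
# Candidate system prompts to compare; the wording here is illustrative.
SYSTEM_PROMPTS = {
    "baseline": "You are a helpful assistant.",
    "grounded": (
        "You are a helpful assistant. Answer using only the provided "
        "context, and say you don't know when the context is insufficient."
    ),
}

scores = {}
for name, system_prompt in SYSTEM_PROMPTS.items():
    outputs = [run_app(q, system_prompt=system_prompt) for q in benchmark_questions]
    scores[name] = evaluate(outputs, benchmark_answers)

# Rank prompts from best- to worst-scoring on the benchmark.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```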
Once your GenAI application has been sufficiently improved, it can undergo another round of evaluation. Continue to track your evaluation progress until the system meets your performance thresholds.
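One lightweight way to track progress across evaluation rounds is a simple threshold check over your benchmark metrics. The metric names and target values below are illustrative; align them with the metrics defined in your benchmark:

```python
# Hypothetical metric names and targets.
THRESHOLDS = {"answer_quality": 0.85, "faithfulness": 0.90}

def meets_thresholds(scores: dict[str, float]) -> bool:
    """Return True when every tracked metric meets its target."""
    return all(
        scores.get(metric, 0.0) >= target
        for metric, target in THRESHOLDS.items()
    )

# Example: scores from the latest evaluation round.
latest = {"answer_quality": 0.88, "faithfulness": 0.86}
print(meets_thresholds(latest))  # False: faithfulness is below its target
```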
Next steps
After refining your GenAI app to meet your initial performance requirements, you can continue evaluating production data on an ongoing basis. When continued evaluation surfaces performance issues or areas for improvement, revisit earlier steps in the evaluation workflow, such as refining your benchmark. This creates a continuous cycle of assessment and improvement.