Features
Our best-in-class safety judge beats Prometheus2, Wildguard, Llamaguard3, and prompted GPT4 and Claude-3.5 on BigGenBench.
Currently in the top 5 for safety evaluations on RewardBench.
Our lightning-fast custom reliability judge beats Lynx and prompted GPT4 and Claude-3.5 on Aggrefact and HaluBench.