Features

One-stop Post-training Platform

Evaluate your AI on bespoke metrics using Collinear AI Judges to quantify gaps in performance

SAFETY JUDGE
Collinear Guard

Our best-in-class safety judge beats Prometheus 2, WildGuard, Llama Guard 3, and prompted GPT-4 and Claude 3.5 on BigGenBench.
Currently ranked in the top 5 for safety evaluations on RewardBench.

Supports three key safety evaluation tasks:
Prompt
Response
Refusal

RELIABILITY JUDGE
Veritas

Our lightning-fast custom reliability judge beats Lynx and prompted GPT-4 and Claude 3.5 on AggreFact and HaluBench.

Compatible with any input format:
NLI
QA
Dialog
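
To illustrate what supporting several input formats can mean in practice, here is a minimal sketch of mapping NLI, QA, and dialog inputs onto one evidence-plus-claim shape that a reliability judge could score. This is not the Veritas API: `JudgeInput`, `from_nli`, `from_qa`, and `from_dialog` are hypothetical names chosen for this example, and the way each format is folded into a claim is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgeInput:
    """Hypothetical common shape: evidence text plus the claim to verify."""
    context: str
    claim: str


def from_nli(premise: str, hypothesis: str) -> JudgeInput:
    # NLI: the premise is the evidence, the hypothesis is the claim.
    return JudgeInput(context=premise, claim=hypothesis)


def from_qa(document: str, question: str, answer: str) -> JudgeInput:
    # QA: keep the question with the answer so the claim is self-contained.
    return JudgeInput(context=document, claim=f"Q: {question}\nA: {answer}")


def from_dialog(document: str, turns: list[tuple[str, str]]) -> JudgeInput:
    # Dialog: the last turn is the response being checked; earlier turns are
    # appended to the evidence so the judge sees the conversation history.
    if not turns:
        raise ValueError("dialog needs at least one turn")
    *history, (speaker, last) = turns
    transcript = "\n".join(f"{s}: {t}" for s, t in history)
    context = f"{document}\n{transcript}".strip()
    return JudgeInput(context=context, claim=f"{speaker}: {last}")
```

With a single shape like this, the same scoring call can serve all three formats, and adding a new format only requires one more adapter function.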

Fix any gaps with

SYNTHETIC DATA

Weaver

Synthetic Data Generation Engine for conversational and preference data
Supports automated data curation for:
Instruction following
Preference data

ALIGNMENT

Auto-align

Use your judge outputs to reinforce desired behavior and limit undesired behavior
Supports fine-tuning for:
SFT
RLAIF
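
As a rough illustration of how judge outputs can feed both kinds of fine-tuning, the sketch below turns scored responses into SFT examples (keep only high-scoring responses) and into chosen/rejected preference pairs of the kind used in RLAIF-style training. It is not Auto-align's implementation: the function names, the `(prompt, response, score)` tuple layout, the 0-to-1 score scale, and the thresholds are all assumptions made for this example.

```python
from itertools import combinations


def build_sft_examples(scored, min_score=0.8):
    """Keep (prompt, response) pairs whose judge score clears a threshold."""
    return [
        {"prompt": prompt, "completion": response}
        for prompt, response, score in scored
        if score >= min_score
    ]


def build_preference_pairs(scored, min_margin=0.2):
    """Pair responses to the same prompt; the higher-scored one is 'chosen'."""
    by_prompt = {}
    for prompt, response, score in scored:
        by_prompt.setdefault(prompt, []).append((response, score))

    pairs = []
    for prompt, candidates in by_prompt.items():
        for (resp_a, score_a), (resp_b, score_b) in combinations(candidates, 2):
            if abs(score_a - score_b) < min_margin:
                continue  # too close to call; skip ambiguous pairs
            if score_a > score_b:
                chosen, rejected = resp_a, resp_b
            else:
                chosen, rejected = resp_b, resp_a
            pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs
```

The margin filter is a design choice: pairs whose scores are nearly tied carry little preference signal, so dropping them trades dataset size for cleaner labels.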