
Open Source Contributions

Selected merged PRs to open-source systems I use for RL infrastructure and software-engineering agent training.

slime

  • CISPO advantage estimator - added the MiniMax-M1 CISPO estimator at slime's policy-loss seam, with tests for surrogate value and gradient routing.
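
For readers unfamiliar with CISPO, the idea can be sketched in a few lines. This is a minimal pure-Python illustration, not slime's code: the function name `cispo_token_terms`, the default clip bounds, and the token-mean normalization are assumptions made for the example. The importance ratio is clipped and treated as a constant (stop-gradient), so every token, including clipped ones, still contributes a gradient through its log-probability.

```python
import math

def cispo_token_terms(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.2):
    """Hypothetical sketch of a CISPO-style surrogate over a flat list of tokens.

    Returns (loss, grads), where grads[i] is d(loss)/d(logp_new[i]) under the
    assumption that the clipped importance weight is a stop-gradient constant.
    """
    n = len(logp_new)
    loss = 0.0
    grads = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)
        # Clip the importance weight; it is NOT differentiated through.
        weight = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
        # Surrogate: -sg(clip(ratio)) * advantage * log pi, averaged over tokens.
        loss += -weight * adv * lp_new / n
        # The gradient flows only through log pi, so it is never zeroed by clipping.
        grads.append(-weight * adv / n)
    return loss, grads

loss, grads = cispo_token_terms(
    logp_new=[-1.0, -0.5],
    logp_old=[-1.0, -1.5],
    advantages=[1.0, 1.0],
)
```

In this example the second token's ratio exceeds the upper bound, so its weight is clipped, yet its gradient entry stays nonzero. A PPO-style clipped objective would instead drop that token's gradient, which is the contrast a gradient-routing test would check.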

Harbor