outrider v1.6.3 · Live on the GitHub Marketplace
// product · the ExperimentOps platform

From hypothesis to decision, on the record

Remyx gives your AI development the structure of the scientific method. Recommendations grounded in your codebase and recent research, results captured from the tools you already run, and a decision record your whole team builds on.

// 01 capabilities

Everything an experiment needs

recommendations

Ranked candidates for what to try next, drawn from your codebase, your experiment history, and this week's research. Each one arrives with the evidence behind it.

research_interests

Named descriptions of what to track. Create one from free-form context, a GitHub repo, or a project's experiments, and the pipeline matches new arXiv work to it daily.

experiment_records

Every experiment links its hypothesis, PR, ticket, metrics, and decision in one place. Provenance stays intact from the first commit to the rollout.

validation

Results flow in from your eval suites, experiment trackers, and production signals, so every recommendation reflects your product, your data, and your users.

decision_records

Ship, iterate, or reject, each call captured with a structured rationale. Your team always knows which changes to keep and which to roll back.

portfolio_view

Leads see every active experiment, its trajectory, and pending calls in one dashboard, without interrupting anyone's flow.

// 02 decision_records

Every experiment leaves a record the next one builds on

What you tried, what happened, what you decided. Captured once, feeding every recommendation after.

  • hypothesis, change, and results, linked
  • the decision, with its rationale
  • searchable history your next hire can read
EXP-0412 · retrieval-reranker · CLOSED · 2 DAYS
hypothesis  Rerank after retrieval raises groundedness within the latency budget.
change      PR #214 · rerank top-20 → top-5 before context build
results     groundedness +4.8 pts · answer relevance +2.9 pts · p95 +41ms, in budget
verdict     SHIP · holds on both eval suites; watch latency on long contexts

# illustrative example, not a customer result
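
For readers who think in data structures, a record like the EXP-0412 card above could be modeled roughly as follows. This is a minimal sketch, not Remyx's actual schema: the names `ExperimentRecord`, `Verdict`, and `search`, and every field name, are assumptions made for illustration. The values are copied from the illustrative card.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    # The three calls named on this page: ship, iterate, or reject.
    SHIP = "SHIP"
    ITERATE = "ITERATE"
    REJECT = "REJECT"


@dataclass
class ExperimentRecord:
    """Hypothetical shape of one record; not the product's real schema."""
    exp_id: str
    name: str
    hypothesis: str
    change: str
    results: dict[str, str]
    verdict: Verdict
    rationale: str


# Values taken from the illustrative EXP-0412 card.
record = ExperimentRecord(
    exp_id="EXP-0412",
    name="retrieval-reranker",
    hypothesis="Rerank after retrieval raises groundedness within the latency budget.",
    change="PR #214: rerank top-20 → top-5 before context build",
    results={
        "groundedness": "+4.8 pts",
        "answer relevance": "+2.9 pts",
        "p95": "+41ms, in budget",
    },
    verdict=Verdict.SHIP,
    rationale="holds on both eval suites; watch latency on long contexts",
)


def search(records, term):
    """Return records whose hypothesis, change, or rationale mentions term."""
    term = term.lower()
    return [
        r for r in records
        if term in f"{r.hypothesis} {r.change} {r.rationale}".lower()
    ]
```

Keeping the rationale next to the verdict is what would make a plain text search such as `search([record], "latency")` useful to someone reading the history later.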

changed                     experiment   verdict
M retrieval/reranker.py     EXP-0412     SHIP
M tools/schema.json         EXP-0405     SHIP
M prompts/system.md         EXP-0398     ITERATE
M router/fallback.py        EXP-0391     REJECT
M context/builder.py        EXP-0387     SHIP

# illustrative example

// 03 the_loop

Every validated result sharpens the next recommendation

Each draft PR runs through your project's offline eval suite and your A/B or online signal. A gate skips low-signal PRs; a pass promotes the PR and assigns reviewers, according to your policy, starting in observe-only mode.
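
The gate described above can be pictured as a small policy function. This is a hypothetical sketch, not the product's implementation: `GatePolicy`, `GateResult`, `gate`, the `min_signal` threshold, and the numeric signal score are all assumed names and values. The "hold" outcome for a PR that clears the gate but fails an evaluation is also an assumption, since the page does not say what happens in that case.

```python
from dataclasses import dataclass


@dataclass
class GatePolicy:
    min_signal: float = 0.5            # assumed threshold; below it a PR is skipped
    observe_only: bool = True          # record what would happen without acting
    reviewers: tuple[str, ...] = ()    # who gets assigned when a PR is promoted


@dataclass
class GateResult:
    action: str                        # "skip", "promote", or "hold"
    applied: bool                      # False while the policy is observe-only
    reviewers: tuple[str, ...] = ()


def gate(signal: float, offline_pass: bool, online_pass: bool,
         policy: GatePolicy) -> GateResult:
    """Decide what to do with one draft PR under the given policy."""
    applied = not policy.observe_only
    if signal < policy.min_signal:
        # Low-signal PRs are skipped before any promotion decision.
        return GateResult("skip", applied)
    if offline_pass and online_pass:
        # A pass on both checks promotes the PR and assigns reviewers.
        return GateResult("promote", applied, policy.reviewers)
    # Assumed behavior: anything else waits for another iteration.
    return GateResult("hold", applied)
```

With the default policy, `gate(0.9, True, True, GatePolicy(reviewers=("alice",)))` returns a "promote" result with `applied` set to `False`, which is one way to express starting observe-only: the decision is logged, but nothing is acted on until the policy is switched.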

Every validated result trains the engine, and the changes you evaluate and A/B test are the strongest signal. Over time, recommendations arrive with higher confidence, matched to a call site in your code, and with less to triage.

// 04 integrations

Drops into the stack you already run

GitHub, Linear, Jira, and Slack for planning and shipping. MLflow, Weights & Biases, Arize, Langfuse, Statsig, and LaunchDarkly for measurement. Claude Code, Modal, and Hugging Face for build and run. More land every month.

# Claude Code today, more providers soon.

Put your next experiment on the record

Connect your repo and get your first recommendation in minutes.