From hypothesis to decision, on the record

Remyx gives your AI development the structure of the scientific method. Recommendations grounded in your codebase and recent research, results captured from the tools you already run, and a decision record your whole team builds on.

Start for free · See pricing

// 01 capabilities

Everything an experiment needs

recommendations

Ranked candidates for what to try next, drawn from your codebase, your experiment history, and this week's research. Each one arrives with the evidence behind it.

research_interests

Named descriptions of what to track. Create one from free-form context, a GitHub repo, or a project's experiments, and the pipeline matches new arXiv work to it daily.

experiment_records

Every experiment links its hypothesis, PR, ticket, metrics, and decision in one place. Provenance stays intact from the first commit to the rollout.

validation

Results flow in from your eval suites, experiment trackers, and production signals, so every recommendation reflects your product, your data, and your users.

decision_records

Ship, iterate, or reject, each call captured with a structured rationale. Your team always knows which changes to keep and which to roll back.

portfolio_view

Leads see every active experiment, its trajectory, and pending calls in one dashboard, without interrupting anyone's flow.

// 02 decision_records

Every experiment leaves a record the next one builds on

What you tried, what happened, what you decided. Captured once, feeding every recommendation after.

hypothesis, change, and results, linked
the decision, with its rationale
searchable history your next hire can read

EXP-0412 · retrieval-reranker · CLOSED · 2 DAYS

hypothesis: Rerank after retrieval raises groundedness within the latency budget.

change: PR #214, rerank top-20 → top-5 before context build

results: groundedness +4.8 pts · answer relevance +2.9 pts · p95 +41ms, in budget

verdict: SHIP · holds on both eval suites; watch latency on long contexts

# illustrative example, not a customer result
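A record like the card above could be modeled along these lines. This is a minimal sketch, not Remyx's actual schema: the `ExperimentRecord` class, its field names, and the `Verdict` enum are hypothetical, taken only from the labels shown on the card.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class Verdict(Enum):
    # The three calls named on this page.
    SHIP = "SHIP"
    ITERATE = "ITERATE"
    REJECT = "REJECT"


@dataclass
class ExperimentRecord:
    # Hypothetical fields mirroring the labels on the example card.
    exp_id: str
    name: str
    hypothesis: str
    change: str
    results: Dict[str, str] = field(default_factory=dict)
    verdict: Optional[Verdict] = None
    rationale: str = ""

    def is_decided(self) -> bool:
        """True once a ship/iterate/reject call has been recorded."""
        return self.verdict is not None


# Values copied from the illustrative card, not from a customer result.
record = ExperimentRecord(
    exp_id="EXP-0412",
    name="retrieval-reranker",
    hypothesis="Rerank after retrieval raises groundedness within the latency budget.",
    change="PR #214, rerank top-20 -> top-5 before context build",
    results={
        "groundedness": "+4.8 pts",
        "answer relevance": "+2.9 pts",
        "p95": "+41ms, in budget",
    },
    verdict=Verdict.SHIP,
    rationale="Holds on both eval suites; watch latency on long contexts.",
)
```

Keeping the hypothesis, change, results, and verdict on one object is what lets a later search or recommendation read the whole story from a single record.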

changed                    experiment   verdict
M retrieval/reranker.py    EXP-0412     SHIP
M tools/schema.json        EXP-0405     SHIP
M prompts/system.md        EXP-0398     ITERATE
M router/fallback.py       EXP-0391     REJECT
M context/builder.py       EXP-0387     SHIP

# illustrative example

// 03 the_loop

Every validated result sharpens the next recommendation

Each draft PR runs through your project's eval suite offline and against your A/B or online signal. A gate skips low-signal PRs; a pass promotes the PR and assigns reviewers according to your policy, starting in observe-only mode.
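The gate described above might look roughly like this. It is a sketch under stated assumptions, not the product's implementation: `GatePolicy`, its thresholds, the reviewer names, and the reading of "observe-only" as "record the outcome without acting on it" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class GatePolicy:
    # All values are hypothetical; a real policy would be set by the team.
    min_offline_gain: float = 1.0    # minimum eval-suite improvement, in points
    min_online_gain: float = 0.0     # minimum A/B or online improvement
    observe_only: bool = True        # assumed meaning: record, but do not act
    reviewers: Tuple[str, ...] = ("alice", "bob")


def gate(offline_gain: float, online_gain: float, policy: GatePolicy) -> dict:
    """Return the gate's outcome for one draft PR."""
    passed = (
        offline_gain >= policy.min_offline_gain
        and online_gain >= policy.min_online_gain
    )
    return {
        "outcome": "promote" if passed else "skip",
        # In observe-only mode the outcome is recorded but nothing is applied.
        "applied": not policy.observe_only,
        "reviewers": list(policy.reviewers) if passed else [],
    }


observed = gate(4.8, 2.9, GatePolicy())                     # default: observe-only
enforced = gate(0.2, 0.0, GatePolicy(observe_only=False))   # below the offline bar
```

Starting with `observe_only=True` lets a team compare the gate's calls against its own judgment before allowing it to promote or skip anything.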

Every validated result trains the engine, and the changes you evaluate and A/B test are the strongest signal. Over time, recommendations arrive with higher confidence, matched to a call site in your code, and with less to triage.

// 04 integrations

Drops into the stack you already run

GitHub, Linear, Jira, and Slack for planning and shipping. MLflow, Weights & Biases, Arize, Langfuse, Statsig, and LaunchDarkly for measurement. Claude Code, Modal, and Hugging Face for build and run. More land every month.

# Claude Code today, more providers soon.

Put your next experiment on the record

Connect your repo and get your first recommendation in minutes.
