Today, we are open-sourcing SAGE-OSS-40B, a 40-billion-parameter mixture-of-experts language model, heavily fine-tuned around the LoopCoder architecture and released under the Apache 2.0 license.
This is not our latest model, nor is it trained entirely from scratch. It is something different: a window into how we think about reasoning at SAGEA under extreme compute constraints.
Why We're Releasing This
As a small research lab in Nepal operating primarily on computing grants and barter arrangements, training massive dense models from scratch is largely out of our reach. Instead, we have to think differently about reasoning, parameter utilization, and generalization. SAGE-OSS-40B represents our strategy of starting with an existing open-source MoE base and drastically altering its logic pathways with our own structural adaptations.
By integrating Multi-Context Heads (MCH) and Inverse Reasoning (IR) into the LoopCoder architecture, we sought to embed iterative reasoning directly into the model. This is not just prompting behavior or a post-training patch; it is a core architectural modification. The model performs two reasoning passes over a sliding token window, using IR to evaluate backward viability, before producing output.
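As a rough illustration of the control flow only, the sketch below applies a step function over sliding windows and then repeats the pass on the result. It is a toy under stated assumptions, not the LoopCoder implementation: `sliding_windows`, `looped_passes`, and the averaging `step` are hypothetical names, the real model operates on hidden states inside the network, and the Inverse Reasoning check is left out entirely.

```python
from statistics import fmean
from typing import Callable, List, Sequence


def sliding_windows(values: Sequence[float], window_size: int) -> List[Sequence[float]]:
    """For each position, return the window of up to `window_size` items ending there."""
    return [values[max(0, end - window_size):end] for end in range(1, len(values) + 1)]


def looped_passes(
    values: Sequence[float],
    step: Callable[[Sequence[float]], float],
    loop_num: int = 2,
    window_size: int = 64,
) -> List[float]:
    """Apply `step` to every sliding window, running `loop_num` passes in total."""
    current = list(values)
    for _ in range(loop_num):
        # Each pass consumes the output of the previous pass.
        current = [step(window) for window in sliding_windows(current, window_size)]
    return current


# Toy stand-in for a learned update (assumption): average each window.
refined = looped_passes([1.0, 2.0, 3.0], step=fmean, loop_num=2, window_size=2)
print(refined)  # [1.0, 1.25, 2.0]
```

The defaults of two passes and a 64-token window mirror the loop settings listed in the architecture overview; everything else here is illustrative.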
We built this before Actus and before Celer 2.5. It informed how we think about reasoning in our current model families, and we believe it is worth sharing with the community.
Architecture Overview
SAGE-OSS-40B is a 40B mixture-of-experts model with several notable properties:
- Multi-Context Heads (MCH): Dedicated attention heads structurally trained to track independent and parallel reasoning paths.
- Inverse Reasoning (IR): A constraint-driven backward pass that verifies logical steps prior to the forward text generation.
- Loop reasoning: loop_num: 2 iterative passes with a loop_window_size of 64 tokens via LoopCoder.
- Long context: 131,072-token context window with RoPE theta at 500,000.
- Efficient attention: Grouped query attention with 40 attention heads and 8 KV heads.
- BF16 release: Fine-tuned and released in bfloat16; can fit on a single A100 with appropriate quantization.
- 76,800-token vocabulary: Expanded vocabulary reflecting multilingual and domain-specific training needs.
- 80 layers, hidden size 5120: A deep architecture designed for robust representational capacity.
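To put the numbers above in perspective, here is a back-of-the-envelope memory estimate. It is a sketch under stated assumptions rather than a measured footprint: it assumes the head dimension equals hidden size divided by the number of attention heads, a bf16 KV cache, a nominal 40 billion total parameters, and no extra state from the loop passes. The function names are illustrative.

```python
# Values taken from the architecture list above.
NUM_LAYERS = 80
HIDDEN_SIZE = 5120
NUM_ATTENTION_HEADS = 40
NUM_KV_HEADS = 8
CONTEXT_LENGTH = 131_072
TOTAL_PARAMS = 40_000_000_000  # nominal "40B" (assumption: exact count may differ)
BYTES_BF16 = 2

# Assumption: head_dim = hidden_size / num_attention_heads.
head_dim = HIDDEN_SIZE // NUM_ATTENTION_HEADS
# Grouped query attention: how many query heads share one KV head.
queries_per_kv = NUM_ATTENTION_HEADS // NUM_KV_HEADS


def kv_cache_bytes(num_tokens: int, kv_heads: int = NUM_KV_HEADS) -> int:
    """Bytes for keys plus values across all layers, stored in bf16."""
    return 2 * NUM_LAYERS * kv_heads * head_dim * BYTES_BF16 * num_tokens


def weight_bytes(bits_per_param: int) -> int:
    """Bytes needed to hold all parameters at a given precision."""
    return TOTAL_PARAMS * bits_per_param // 8


GIB = 2**30
print(f"head_dim={head_dim}, query heads per KV head={queries_per_kv}")
print(f"KV cache per token: {kv_cache_bytes(1) / 1024:.0f} KiB")
print(f"KV cache at full context (GQA): {kv_cache_bytes(CONTEXT_LENGTH) / GIB:.0f} GiB")
print(f"Same with 40 KV heads (no GQA): "
      f"{kv_cache_bytes(CONTEXT_LENGTH, NUM_ATTENTION_HEADS) / GIB:.0f} GiB")
for bits in (16, 8, 4):
    print(f"Weights at {bits}-bit: {weight_bytes(bits) / 1e9:.0f} GB")
```

Under these assumptions, the bf16 weights alone come to roughly 80 GB, which is why quantization is needed to fit on a single A100, and sharing each KV head across five query heads cuts the full-context KV cache from about 200 GiB to about 40 GiB.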
The model uses a custom sageloopcoder architecture with dedicated config and modeling files. You will need trust_remote_code=True to run it.
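A minimal loading sketch with Hugging Face Transformers might look like the following. The repository ID is a placeholder, not a confirmed location, and the sketch assumes the release's custom code registers itself with `AutoModelForCausalLM` and `AutoTokenizer`; `load_and_complete` is an illustrative helper name.

```python
# Placeholder only: substitute the repository ID or local path of the released weights.
MODEL_ID = "your-org/SAGE-OSS-40B"

# The custom sageloopcoder config and modeling files require this flag.
COMMON_KWARGS = {"trust_remote_code": True}


def load_and_complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the model in bfloat16 and return a plain text completion."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, **COMMON_KWARGS)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,
        device_map="auto",  # requires the `accelerate` package
        **COMMON_KWARGS,
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Because `trust_remote_code=True` executes Python shipped with the checkpoint, it is worth reviewing the config and modeling files before running them. Since this is a base model, prompts work best as text to be continued rather than as chat turns.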
Intended Use
This is a research model. It is not instruction-tuned and it is not RLHF-aligned. If you are looking for a production chat assistant, this is not the right model.
It is a base model with a distinct reasoning architecture that the community can study, fine-tune, and extend. If you are researching iterative inference, loop-based reasoning, or MoE scaling, this release is designed for you.
Comparison Against Current SAGE Models
To set expectations clearly, SAGE-OSS-40B trails our current flagship production lines. That is expected: this model reflects an earlier research direction.
Fig 1: MMLU and AgentBench values are internal directional scores for relative positioning. They are not external leaderboard claims.
The takeaway is straightforward: SAGE-OSS-40B is a useful and study-worthy open model. It outperforms Celer Low 2.5, while Celer Mid 2.5 and Celer High 2.5 remain ahead by a moderate margin. SAGE 2.4 Actus continues to lead, especially on agentic workloads.
Safety & Responsibility
As part of our commitment to responsible AI development, we are explicitly clarifying the nature of this release. SAGE-OSS-40B is a foundational research artifact, not a highly aligned consumer product.
Unlike our production models (SAGE Actus and SAGE Celer), this system has not undergone extensive constitutional alignment, instruction-tuning, or Reinforcement Learning from Human Feedback (RLHF). Because it is a raw baseline model, it reflects the biases present within its training distribution and may produce unsafe, unethical, or problematic outputs if subjected to adversarial prompting.
Prior to this open release, we conducted internal red-teaming evaluations to detect specific catastrophic risks—namely, capabilities related to chemical, biological, radiological, and nuclear (CBRN) threat proliferation. Our evaluation found that SAGE-OSS-40B remains within the bounds of safety for open weights, as it provides no specialized operational knowledge beyond what is already publicly accessible.
We release SAGE-OSS-40B to accelerate scientific inquiry into MoE parameter scaling and efficient iterative reasoning. However, developers seeking to integrate this architecture into downstream environments are strongly advised to implement their own domain-specific guardrails, prompt-filtering layers, and post-training alignment mechanisms.

