Usage & Documentation

Get started with Moxin-LLM. Find guides for running inference, optimizing for deployment, and fine-tuning the model for your own applications.

Quick Start

Run Moxin-LLM in Minutes

Get up and running quickly using the Hugging Face `transformers` library. This example uses the `Moxin-7B-Instruct` model.


```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "moxin-org/moxin-instruct-7b"

# Download the tokenizer and weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Format the prompt for instruction-following
prompt = "Can you explain the concept of regularization in machine learning?"
formatted_prompt = f"<|user|>\n{prompt}<|end|>\n<|assistant|>"

# Tokenize, generate up to 200 new tokens, and decode the reply.
inputs = tokenizer(formatted_prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Model Guides

Prompting for Best Results

Using Moxin-7B-Instruct

The instruct model is fine-tuned for dialogue and instruction following. For best results, structure your prompts as a conversation; the Quick Start example above shows the standard single-turn format.
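If you assemble conversation strings by hand, a small helper keeps multi-turn prompts consistent. This is a minimal sketch assuming the same chat markers (`<|user|>`, `<|assistant|>`, `<|end|>`) as the Quick Start example; check the model card for the exact template your checkpoint expects.

```python
# Minimal multi-turn prompt builder. The chat markers are assumed to
# match the Quick Start example; verify them against the model card.
def format_conversation(turns):
    """Render a list of (role, text) pairs into a single prompt string."""
    parts = [f"<|{role}|>\n{text}<|end|>" for role, text in turns]
    # Leave the assistant tag open so the model generates the next reply.
    parts.append("<|assistant|>")
    return "\n".join(parts)

conversation = [
    ("user", "What is overfitting?"),
    ("assistant", "Overfitting is when a model memorizes its training data."),
    ("user", "How does regularization help with that?"),
]
prompt = format_conversation(conversation)
```

Feed `prompt` to the tokenizer and `generate` exactly as in the Quick Start example.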

Using Moxin-7B-Reasoning

This model excels at chain-of-thought (CoT) tasks such as math and logic, and was further trained with Group Relative Policy Optimization (GRPO). To leverage its full potential, ask it to "think step-by-step" or "show its work."
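For illustration, the snippet below adds an explicit step-by-step instruction to the prompt. The repo id `moxin-org/moxin-reasoning-7b` is a hypothetical placeholder, and the chat markers are carried over from the Quick Start example; verify both against the official model card.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Hypothetical repo id -- check moxin-org on Hugging Face for the exact name.
model_id = "moxin-org/moxin-reasoning-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

question = "A train travels 120 km in 1.5 hours. What is its average speed?"
# An explicit step-by-step cue draws out the model's chain of thought.
prompt = (
    f"<|user|>\n{question}\n"
    "Think step-by-step and show your work before giving the final answer."
    "<|end|>\n<|assistant|>"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=400)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```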

Deployment & Optimization

High Performance on the Edge

Optimized for On-Device AI

Moxin-LLM is designed for efficient inference on edge devices such as PCs and mobile phones, serving applications that require privacy and low latency.

The OminiX Engine

For the best performance, we recommend our in-house OminiX inference and fine-tuning engine, which is optimized for a wide range of edge hardware, including domestic NPUs.
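OminiX has its own APIs, which are not covered here. As a generic sketch of one common edge-deployment technique, the snippet below loads the model with 4-bit NF4 quantization through the `transformers` integration with `bitsandbytes` (the `bitsandbytes` and `accelerate` packages are required), cutting weight memory to roughly a quarter of fp16.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "moxin-org/moxin-instruct-7b"  # same id as the Quick Start example

# 4-bit NF4 quantization: weights are stored in 4 bits and dequantized
# to bfloat16 on the fly during compute.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers on the available GPU/CPU automatically
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```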

Proven Efficiency

Our optimization techniques are strong enough to deploy a 235B-parameter model on a single notebook computer, achieving roughly 14 tokens per second.

Fine-Tuning Moxin-LLM

Leverage Moxin's complete openness to create your own specialized models. Our transparency with training data and scripts makes the fine-tuning process more efficient and effective.

Step 1: Start with Moxin-7B-Base

The `Moxin-7B-Base` model is the recommended starting point for a custom fine-tuning project: it retains the full pretrained capabilities without being specialized to a particular chat or reasoning format.
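A minimal loading sketch is shown below. The repo id `moxin-org/moxin-llm-7b` is an assumption based on the naming used elsewhere in this guide; confirm the base model's exact id on the moxin-org Hugging Face page.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Assumed repo id for the base model -- verify before use.
base_id = "moxin-org/moxin-llm-7b"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)
```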

Step 2: Prepare Your Custom Dataset

Collect and format your data for a specific task, such as robotics commands, professional translation terms, or any other domain-specific knowledge.
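One simple, widely used layout is JSON Lines with one prompt/response pair per record. The schema below is only an illustration; adapt the field names and content to your task.

```python
import json

# Two toy records for a translation task; replace with your own data.
examples = [
    {"prompt": "Translate to French: good morning", "response": "bonjour"},
    {"prompt": "Translate to French: thank you", "response": "merci"},
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```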

Step 3: Run the Fine-Tuning Process

Use standard open-source training scripts to fine-tune the model on your dataset. Our open approach ensures you have full control and visibility.
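The sketch below shows a minimal full-parameter run with the standard Hugging Face `Trainer`, using the JSONL file from Step 2 and the assumed base-model id from Step 1. In practice, a 7B model usually calls for parameter-efficient methods (e.g. LoRA) or a multi-GPU setup; treat this as a template, not a tuned recipe.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_id = "moxin-org/moxin-llm-7b"  # assumed repo id, see Step 1

tokenizer = AutoTokenizer.from_pretrained(base_id)
if tokenizer.pad_token is None:
    # Many decoder-only tokenizers ship without a pad token.
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_id)

# train.jsonl from Step 2: one {"prompt": ..., "response": ...} per line.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

def tokenize(example):
    # Join prompt and response into one training sequence, ending with EOS.
    text = f"{example['prompt']}\n{example['response']}{tokenizer.eos_token}"
    return tokenizer(text, truncation=True, max_length=512)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="moxin-7b-custom",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=3,
        learning_rate=2e-5,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    # mlm=False gives standard next-token (causal LM) labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("moxin-7b-custom")
tokenizer.save_pretrained("moxin-7b-custom")
```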

Step 4: Deploy Your Custom Model

Once trained, your specialized model is ready to be deployed, bringing powerful, customized AI to your specific application.
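Once the checkpoint from Step 3 is saved, it loads like any other `transformers` model, for example through a text-generation pipeline:

```python
from transformers import pipeline

# Load the fine-tuned checkpoint saved in Step 3 and run a smoke test.
generator = pipeline("text-generation", model="moxin-7b-custom")
result = generator("Translate to French: see you soon", max_new_tokens=20)
print(result[0]["generated_text"])
```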