# Model Card: Arc-Intelligence/crm-01-4b

`crm-01-4B` is a Qwen3-4B-based model fine-tuned to solve high-stakes CRM policy tasks (e.g., Quote Approval, Invalid Configuration Identification). Training focused on tool-aware instruction following, enabling the model to call helper functions defined in `crm_sandbox/env/functions.py` and return concise, policy-compliant answers. It inherits all Qwen3 capabilities, including the optional thinking / non-thinking modes, and extends them with domain-specific skills.

It was trained with our Learning Orchestrator method on the CRMArena-Pro benchmark, focusing on the Policy skill category. The resulting model can ingest user queries, invoke the correct Salesforce-style tool functions, and return definitive, policy-compliant answers.
## Intended Use

We recommend integrating the model into an agent framework (e.g., LangChain, CrewAI) or calling it directly via `transformers`. It is capable of:
- Primary Use: Executing CRM policy tasks end-to-end – validating discounts, approvals, or configuration rules – by calling the appropriate helper functions and returning a final answer.
- Out-of-Scope: Open-ended chit-chat, generic creative writing, or domains far outside CRM without additional fine-tuning.
```python
# Quick start with Hugging Face transformers
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_id = "Arc-Intelligence/crm-01-4b"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
agent = pipeline("text-generation", model=model, tokenizer=tok, max_new_tokens=256)

FEW_SHOT = (
    "You are an enterprise policy agent.\n"
    "• Answer in ≤ 3 sentences.\n"
    "• Use tool calls when needed.\n"
    "• Tool call format:\n"
    "  <tool> {\"name\": \"search_knowledge_articles\", \"arguments\": {\"search_term\": \"...\"}} </tool>\n"
    "\n"
)

prompt = FEW_SHOT + "USER: /think Can I apply a 20% discount on Analytics Suite?"  # '/think' triggers Qwen3 reasoning mode
print(agent(prompt)[0]["generated_text"])  # The model may include a <think> block followed by the answer
```
Expected Output (concise):

```
• Check discount policy via tool call:
  <tool>{"name":"search_knowledge_articles","arguments":{"search_term":"discount policy"}}</tool>
• If max_discount ≥ requested, approve; otherwise call:
  <tool>{"name":"approval_workflow","arguments":{"deal_value":"..."}}</tool>
• Validate deal size & customer tier before final submit.
```
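To act on output like the above, an agent loop must extract the `<tool>` spans and parse their JSON payloads. A minimal sketch of such a parser, assuming the tag format shown in the few-shot prompt (the tool names here are the illustrative ones from the example, not an exhaustive registry):

```python
import json
import re

def extract_tool_calls(text: str) -> list[dict]:
    """Parse <tool>{...}</tool> spans emitted by the model into dicts."""
    calls = []
    for payload in re.findall(r"<tool>\s*(\{.*?\})\s*</tool>", text, re.DOTALL):
        try:
            calls.append(json.loads(payload))
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of crashing the agent loop
    return calls

output = (
    'Check discount policy via tool call: '
    '<tool>{"name": "search_knowledge_articles", '
    '"arguments": {"search_term": "discount policy"}}</tool>'
)
print(extract_tool_calls(output))
```

Skipping malformed spans (rather than raising) keeps the loop robust when the model occasionally emits truncated or invalid JSON.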
## Limitations and Bias

This model inherits the potential biases of its base model (Qwen3-4B) and the CRMArena-Pro dataset. In addition, users should be mindful of the following:
- Verbosity – The model tends to elaborate unless guided. Include a few-shot prefix (see example) specifying tone, length, and format to obtain concise output.
- Skill Specialisation – Training focused on Policy tasks (32% average gain; 57% on Quote Approval). Performance on CRUD, Analytics, or Knowledge tasks may lag and may require additional fine-tuning.
- Tool Schema Dependence – Generated strategies assume the tool signatures defined in `crm_sandbox/env/functions.py`. Mismatched tool schemas will degrade effectiveness.
- Domain Bias – Knowledge is centred on CRM scenarios. For non-CRM domains, outputs may be generic.
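Because of the schema dependence noted above, it is worth guarding the agent loop with a check that rejects tool calls whose name or argument keys drift from the deployed signatures. A minimal sketch with a hypothetical registry (real deployments should mirror `crm_sandbox/env/functions.py` exactly):

```python
# Hypothetical schemas for illustration; mirror crm_sandbox/env/functions.py in practice.
TOOL_REGISTRY = {
    "search_knowledge_articles": {"search_term"},
    "approval_workflow": {"deal_value"},
}

def validate_tool_call(call: dict) -> bool:
    """Return True only if the call names a known tool with exactly the expected argument keys."""
    expected = TOOL_REGISTRY.get(call.get("name"))
    return expected is not None and set(call.get("arguments", {})) == expected

print(validate_tool_call({"name": "approval_workflow", "arguments": {"deal_value": "50000"}}))  # True
print(validate_tool_call({"name": "approval_workflow", "arguments": {"amount": "50000"}}))      # False: wrong key
```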
## Changelog (Training Highlights)

- Learning Orchestrator fine-tuning – 45-minute run on 2× H100 GPUs.
- Policy skill accuracy +32.1% (CRMArena).
- Quote Approval task +57% relative improvement.
For full methodology and benchmark details, see our blog post.
## Training Data

The model was fine-tuned on the B2B split of the CRMArena-Pro dataset. This split covers 19 enterprise-grade tasks spanning 4 business skill categories (Workflow, Policy, Text, Database) and 3 difficulty levels (Easy / Medium / Hard). The B2B training set contains 2,140 procedural workflow examples, all derived from a simulated Salesforce environment representative of real B2B organisations.
## Training Procedure

The model was trained using our Learning Orchestrator methodology, which implements a Self-Evolving Curriculum (SEC). A Qwen3-4B base model was fine-tuned with Group Relative Policy Optimization (GRPO). The reward signal came directly from task success on CRMArena-Pro policy tasks, enabling the agent to learn when and how to call the appropriate tools and produce concise, compliant answers. For full details, please see our [Technical Report].
## Evaluation Results

`crm-01-4B` achieved a +32.1% relative improvement in overall task success (Policy category) on a held-out set of CRMArena-Pro tasks.
- Key Improvement Area: Performance on high-stakes, policy-driven tasks saw the most significant gains, with a +57.0% relative improvement on the "Quote Approval" task.