# Model Card: Arc-Intelligence/crm-01-4b

`crm-01-4B` is a Qwen3-4B-based model fine-tuned to solve high-stakes CRM policy tasks (e.g., Quote Approval, Invalid Configuration Identification). Training focused on tool-aware instruction following, enabling the model to call helper functions defined in `crm_sandbox/env/functions.py` and return concise, policy-compliant answers. It inherits all Qwen3 capabilities, including the optional thinking / non-thinking modes, and extends them with domain-specific skills.

It was trained with our Learning Orchestrator method on the CRMArena-Pro benchmark, focusing on the Policy skill category. The resulting model can ingest user queries, invoke the correct Salesforce-style tool functions, and return definitive, policy-compliant answers.
## Intended Use

We recommend integrating the model into an agent framework (e.g., LangChain, CrewAI) or calling it directly via `transformers`. It is capable of:
- Primary Use: Executing CRM policy tasks end-to-end – validating discounts, approvals, or configuration rules – by calling the appropriate helper functions and returning a final answer.
- Out-of-Scope: Open-ended chit-chat, generic creative writing, or domains far outside CRM without additional fine-tuning.
```python
# Quick start with Hugging Face transformers
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_id = "Arc-Intelligence/crm-01-4b"
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
agent = pipeline("text-generation", model=model, tokenizer=tok, max_new_tokens=256)

FEW_SHOT = (
    "You are an enterprise policy agent.\n"
    "• Answer in ≤ 3 sentences.\n"
    "• Use tool calls when needed.\n"
    "• Tool call format:\n"
    "  <tool> {\"name\": \"search_knowledge_articles\", \"arguments\": {\"search_term\": \"...\"}} </tool>\n"
    "\n"
)

prompt = FEW_SHOT + "USER: /think Can I apply a 20% discount on Analytics Suite?"  # '/think' triggers Qwen3 reasoning mode
print(agent(prompt)[0]["generated_text"])  # The model may include a <think> block followed by the answer
```
Expected Output (concise):

```
• Check discount policy via tool call:
  <tool>{"name":"search_knowledge_articles","arguments":{"search_term":"discount policy"}}</tool>
• If max_discount ≥ requested, approve; otherwise call:
  <tool>{"name":"approval_workflow","arguments":{"deal_value":"..."}}</tool>
• Validate deal size & customer tier before final submit.
```
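To act on output like the above, an agent loop must extract the `<tool>` spans and parse their JSON payloads. A minimal sketch of such a parser, assuming the tag format shown in the few-shot prompt (the tool names here are the illustrative ones from the example, not an exhaustive registry):

```python
import json
import re

def extract_tool_calls(text: str) -> list[dict]:
    """Parse <tool>{...}</tool> spans emitted by the model into dicts."""
    calls = []
    for payload in re.findall(r"<tool>\s*(\{.*?\})\s*</tool>", text, re.DOTALL):
        try:
            calls.append(json.loads(payload))
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of crashing the agent loop
    return calls

output = (
    'Check discount policy via tool call: '
    '<tool>{"name": "search_knowledge_articles", '
    '"arguments": {"search_term": "discount policy"}}</tool>'
)
print(extract_tool_calls(output))
```

Skipping malformed spans (rather than raising) keeps the loop robust when the model occasionally emits truncated or invalid JSON.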
## Limitations and Bias

This model inherits the potential biases of its base model (Qwen3-4B) and the CRMArena-Pro dataset. In addition, users should be mindful of the following:
- Verbosity – The model tends to elaborate unless guided. Include a few-shot prefix (see example) specifying tone, length, and format to obtain concise output.
- Skill Specialisation – Training focused on Policy tasks (32% average gain; 57% on Quote Approval). Performance on CRUD, Analytics, or Knowledge tasks may lag and may require additional fine-tuning.
- Tool Schema Dependence – Generated strategies assume the tool signatures defined in `crm_sandbox/env/functions.py`. Mismatched tool schemas will degrade effectiveness.
- Domain Bias – Knowledge is centred on CRM scenarios. For non-CRM domains, outputs may be generic.
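Because of the schema dependence noted above, it is worth guarding the agent loop with a check that rejects tool calls whose name or argument keys drift from the deployed signatures. A minimal sketch with a hypothetical registry (real deployments should mirror `crm_sandbox/env/functions.py` exactly):

```python
# Hypothetical schemas for illustration; mirror crm_sandbox/env/functions.py in practice.
TOOL_REGISTRY = {
    "search_knowledge_articles": {"search_term"},
    "approval_workflow": {"deal_value"},
}

def validate_tool_call(call: dict) -> bool:
    """Return True only if the call names a known tool with exactly the expected argument keys."""
    expected = TOOL_REGISTRY.get(call.get("name"))
    return expected is not None and set(call.get("arguments", {})) == expected

print(validate_tool_call({"name": "approval_workflow", "arguments": {"deal_value": "50000"}}))  # True
print(validate_tool_call({"name": "approval_workflow", "arguments": {"amount": "50000"}}))      # False: wrong key
```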
## Changelog (Training Highlights)

- Learning Orchestrator fine-tuning – 45-minute run on 2× H100 GPUs.
- Policy skill accuracy +32.1% (CRMArena).
- Quote Approval task +57% relative improvement.
For full methodology and benchmark details, see our blog post.
## Training Data

The model was fine-tuned on the B2B split of the CRMArena-Pro dataset. This split covers 19 enterprise-grade tasks spanning 4 business skill categories (Workflow, Policy, Text, Database) and 3 difficulty levels (Easy / Medium / Hard). The B2B training set contains 2,140 procedural workflow examples, all derived from a simulated Salesforce environment representative of real B2B organisations.
## Training Procedure

The model was trained using our Learning Orchestrator methodology, which implements a Self-Evolving Curriculum (SEC). A Qwen3-4B base model was fine-tuned with Group Relative Policy Optimization (GRPO). The reward signal came directly from task success on CRMArena-Pro policy tasks, enabling the agent to learn when and how to call the appropriate tools and produce concise, compliant answers. For full details, please see our [Technical Report].
## Evaluation Results

`crm-01-4B` achieved a +32.1% relative improvement in overall task success (Policy category) on a held-out set of CRMArena-Pro tasks.
- Key Improvement Area: Performance on high-stakes, policy-driven tasks saw the most significant gains, with a +57.0% relative improvement on the "Quote Approval" task.