aws-neuron
/

optimum-neuron-cache

Model card Files Files and versions Community

optimum-neuron-cache / neuronxcc-2.15.128.0+56dc5a86 /0_REGISTRY /0.0.25.dev0 /training /llama /meta-llama /Llama-3.2-1B /098fefb1fa6e4aed109d.json

samir-souza

Synchronizing local compiler cache.

90f79a2 verified 8 months ago

raw

history blame

4.1 kB

{"architectures": ["LlamaForCausalLM"], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 128256, "eos_token_id": 128257, "head_dim": 64, "hidden_act": "silu", "hidden_size": 2048, "initializer_range": 0.02, "intermediate_size": 8192, "max_position_embeddings": 131072, "mlp_bias": false, "model_type": "llama", "neuron": {"compiler_version": "2.15.128.0+56dc5a86", "input_specs": {"chosen": ["<|im_start|>user\nFind the quotient when $x^5-23x^3+11x^2-14x+10$ is divided by $x+5$.<|im_end|>\n<|im_start|>assistant\nI recognize this as a polynomial division problem, and I recall that there is a method called synthetic division that can help me find the quotient more efficiently than using long division.\nSynthetic division involves writing the coefficients of the dividend polynomial (the one being divided) in a row, and writing the opposite of the divisor's constant term in a box to the left of the row.\nSo, in this case, I write -5 in the box, and I write 1, -23, 11, -14, 10 in the row, leaving some space below them.\nI also notice that the dividend polynomial is missing a term with $x^4$, so I insert a zero as a placeholder for that coefficient.\nNow I have:\n\n -5 | 1 0 -23 11 -14 10\n |_____________________________\nNext, I start the synthetic division process by bringing down the first coefficient, 1, to the bottom row.\nThen I multiply -5 by 1 and write the result, -5, below the second coefficient, 0.\nI add 0 and -5 and write the sum, -5, in the bottom row.\nI repeat this process for the rest of the coefficients: multiply -5 by -5 and write 25 below -23, add -23 and 25 and write 2 in the bottom row, multiply -5 by 2 and write -10 below 11, add 11 and -10 and write 1 in the bottom row, multiply -5 by 1 and write -5 below -14, add -14 and -5 and write -19 in the bottom row, multiply -5 by -19 and write 95 below 10, add 10 and 95 and write 105 in the bottom row.\nNow I have:\n\n -5 | 1 0 -23 11 -14 10\n | -5 25 -10 -5 95\n |_____________________________\n 1 -5 2 1 -19 105\nThe last number in the bottom row, 105, is the remainder of the division.\nThe other numbers in the bottom row are the coefficients of the quotient polynomial, starting from the highest power of x and decreasing by one.\nSo, the quotient polynomial is $x^4-5x^3+2x^2+x-19$, and the remainder is 105.\nI can write the final answer as the quotient polynomial plus the remainder divided by the divisor, like this:\n\n# Answer\n\nx^4-5x^3+2x^2+x-19+105/(x+5)<|im_end|>\n"], "chosen_attention_mask": [1, 512], "chosen_input_ids": [1, 512], "chosen_labels": [1, 512], "prompt": ["Find the quotient when $x^5-23x^3+11x^2-14x+10$ is divided by $x+5$."], "prompt_attention_mask": [1, 512], "prompt_input_ids": [1, 512], "question": ["Find the quotient when $x^5-23x^3+11x^2-14x+10$ is divided by $x+5$."], "rejected": ["<|im_start|>user\nFind the quotient when $x^5-23x^3+11x^2-14x+10$ is divided by $x+5$.<|im_end|>\n<|im_start|>assistant\nWe use synthetic division.\n\\[\n\\begin{array}{rrrrrrr}\n\\multicolumn{1}{r|}{-5} & {1} & 0 & -23 & 11 & -14 & 10 \\\\\n\\multicolumn{1}{r|}{} & & -5& 25& -10 & -5 & 95 \\\\\n\\cline{2-7}\n & 1& -5& 2& 1 & -19& \\multicolumn{1}{|r}{105} \\\\\n\\end{array}\n\\]So we have a quotient of $\\boxed{x^4-5x^3+2x^2+x-19}$ and a remainder of $105$.<|im_end|>\n"], "rejected_attention_mask": [1, 512], "rejected_input_ids": [1, 512], "rejected_labels": [1, 512], "source": ["prm_dpo_pairs"]}, "model_class": "PeftModelForCausalLM", "num_neuron_cores_per_node": 2, "pipeline_parallel_size": 1, "precision": "bfloat16", "tensor_parallel_size": 1, "training": true}, "num_attention_heads": 32, "num_hidden_layers": 16, "num_key_value_heads": 8, "pad_token_id": 128257, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": {"factor": 32.0, "high_freq_factor": 4.0, "low_freq_factor": 1.0, "original_max_position_embeddings": 8192, "rope_type": "llama3"}, "rope_theta": 500000.0, "tie_word_embeddings": true, "torch_dtype": "bfloat16", "use_cache": true, "vocab_size": 128258}