---
language:
- en
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
- PolyCom
- PolyNorm
- PolyReLU
---

# Introduction

This repository contains the checkpoints of the ICLR 2025 paper **[“Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models”](https://arxiv.org/pdf/2411.03884)**.

In this work, we introduce a novel activation function called **Polynomial Composition (PolyCom)**, which enhances the expressiveness of large language models (LLMs) through dynamic polynomial compositions. Our method significantly improves the performance of both dense and mixture-of-experts (MoE) models across a variety of downstream tasks, without adding significant computational overhead. An illustrative sketch of a PolyNorm-style activation is given at the end of this card.

# Datasets and Training

We use the [RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T) dataset and pretrain the PolyCom model on 250B tokens. For more training details, please refer to [the source code](https://github.com/BryceZhuo/PolyCom).

# Inference

Here is an example of how to use the PolyCom model for inference:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# path_of_model: local directory or Hub repo id of the downloaded PolyCom checkpoint
model = AutoModelForCausalLM.from_pretrained(path_of_model, device_map="cuda", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(path_of_model, padding_side="right", trust_remote_code=True)

prompt = "Hello, my name is"
input_ids = tokenizer.encode(prompt, return_tensors="pt").to("cuda")

# Greedy decoding
greedy_output = model.generate(input_ids)
print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))
```

# Citing this work

If you find this work helpful or use it in your research, please consider citing our paper:

```bibtex
@inproceedings{zhuo2025polycom,
  title={Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models},
  author={Zhijian Zhuo and Ya Wang and Yutao Zeng and Xiaoqing Li and Xun Zhou and Jinwen Ma},
  booktitle={The Thirteenth International Conference on Learning Representations (ICLR)},
  year={2025}
}
```
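
# PolyCom activation at a glance

To give a concrete sense of the method, below is a minimal, illustrative PyTorch sketch of a PolyNorm-style activation: each element-wise power of the input is normalized over the hidden dimension and mixed with a learnable coefficient. The order, the RMS-style normalization, and the parameter initialization here are assumptions made for illustration only; please refer to [the source code](https://github.com/BryceZhuo/PolyCom) for the reference implementation actually used in these checkpoints.

```python
import torch
import torch.nn as nn


class PolyNormSketch(nn.Module):
    """Illustrative PolyNorm-style activation of a given order (not the official implementation)."""

    def __init__(self, order: int = 3, eps: float = 1e-6):
        super().__init__()
        self.order = order
        self.eps = eps
        # Learnable mixing coefficients, one per polynomial term, plus a bias term.
        self.coeffs = nn.Parameter(torch.full((order,), 1.0 / order))
        self.bias = nn.Parameter(torch.zeros(1))

    def _norm(self, x: torch.Tensor) -> torch.Tensor:
        # RMS-style normalization over the hidden (last) dimension.
        return x * torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sum of normalized element-wise powers x, x**2, ..., x**order,
        # each weighted by its learnable coefficient.
        out = self.bias
        for i in range(1, self.order + 1):
            out = out + self.coeffs[i - 1] * self._norm(x ** i)
        return out


if __name__ == "__main__":
    act = PolyNormSketch(order=3)
    hidden_states = torch.randn(2, 8, 16)  # (batch, sequence, hidden)
    print(act(hidden_states).shape)        # torch.Size([2, 8, 16])
```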