# DLF: Disentangled-Language-Focused Multimodal Sentiment Analysis (AAAI 2025)
### [arXiv Paper](https://arxiv.org/abs/2412.12225)
## Main Contributions
Our main contributions can be summarized as follows:
- **Proposed Framework:** We propose a Disentangled-Language-Focused (DLF) multimodal representation learning framework to advance multimodal sentiment analysis (MSA). The framework follows a structured pipeline: feature extraction, disentanglement, enhancement, fusion, and prediction.
- **Language-Focused Attractor (LFA):** We develop the LFA to fully harness the potential of the dominant language modality within the modality-specific space. The LFA exploits language-guided multimodal cross-attention to achieve targeted feature enhancement ($X \rightarrow$ Language); a minimal sketch is given after this list.
- **Hierarchical Predictions:** We devise hierarchical predictions that leverage both the pre-fused and post-fused features, improving overall MSA accuracy (see the second sketch below).
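The following is a minimal sketch of language-guided cross-attention in the spirit of the LFA. The module, layer sizes, and tensor shapes are illustrative assumptions, not the repository's actual implementation.
```
import torch
import torch.nn as nn

class LanguageGuidedAttention(nn.Module):
    """Illustrative stand-in for the LFA's cross-attention (not the repo code)."""
    def __init__(self, dim=128, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads)  # expects (seq, batch, dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, language, other):
        # Language features act as queries over another modality (X -> Language),
        # so only language-relevant information is attracted into the enhancement.
        enhanced, _ = self.attn(query=language, key=other, value=other)
        return self.norm(language + enhanced)  # residual enhancement

lfa = LanguageGuidedAttention()
lang = torch.randn(50, 8, 128)   # language sequence: (seq_len, batch, dim)
audio = torch.randn(50, 8, 128)  # audio sequence, same shape for simplicity
out = lfa(lang, audio)           # language features enhanced by audio cues
```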
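Likewise, here is a minimal sketch of hierarchical prediction: separate heads score the pre-fused and post-fused features, and their losses are combined. The head shapes and the loss weighting are assumptions for illustration only.
```
import torch
import torch.nn as nn

pre_head = nn.Linear(128, 1)    # head on a pre-fused (per-modality) feature
post_head = nn.Linear(128, 1)   # head on the post-fused (joint) feature
criterion = nn.L1Loss()         # MAE is common for MOSI/MOSEI regression

pre_feat = torch.randn(8, 128)
post_feat = torch.randn(8, 128)
target = torch.randn(8, 1)

# Combine both prediction levels; the 0.5 weight is illustrative.
loss = criterion(post_head(post_feat), target) + 0.5 * criterion(pre_head(pre_feat), target)
loss.backward()
```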
## The Framework

An overview of the DLF framework. Please refer to the [paper](https://arxiv.org/abs/2412.12225) for details.
## Usage
### Prerequisites
- Python 3.9.13
- PyTorch 1.13.0
- CUDA 11.7
### Installation
- Create a conda environment. Please make sure conda is installed first.
```
conda create -n DLF python=3.9.13
```
- Activate the DLF environment.
```
conda activate DLF
```
- Install PyTorch with CUDA support.
```
pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu117
```
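After installing, you can verify that this PyTorch build sees your GPU:
```
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```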
- Clone this repo.
```
git clone https://github.com/pwang322/DLF.git
```
- Install the necessary packages.
```
cd DLF
pip install -r requirements.txt
```
### Datasets
Data files (containing processed MOSI, MOSEI datasets) can be downloaded from [here](https://drive.google.com/drive/folders/1BBadVSptOe4h8TWchkhWZRLJw8YG_aEi?usp=sharing).
Create a `./dataset` directory, put the downloaded datasets into it, and revise the path in `./config/config.json`. For example, if the processed MOSI dataset is located at `./dataset/MOSI/aligned_50.pkl`, make sure `"dataset_root_dir"` is set to `"./dataset"` and `"featurePath"` to `"MOSI/aligned_50.pkl"`.
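For reference, the relevant entries in `./config/config.json` would then look like this (all other keys omitted):
```
{
    "dataset_root_dir": "./dataset",
    "featurePath": "MOSI/aligned_50.pkl"
}
```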
Please note that the meta information and the raw data are not released, to protect the privacy of the YouTube content creators. For more details, please see the datasets' [official repository](https://github.com/ecfm/CMU-MultimodalSDK).
### Run the Codes
- Training
First set the training dataset name in `./train.py` to "mosei" or "mosi", then run:
```
python3 train.py
```
By default, the trained model will be saved in the `./pt` directory; you can change this in `train.py`.
- Testing
First set the testing dataset name in `./test.py` to "mosei" or "mosi", then test the trained model:
```
python3 test.py
```
We also provide pre-trained models for testing ([Google Drive](https://drive.google.com/drive/folders/1GgCfC1ITAnRRw6RScGc7c2YUg5Ccbdba?usp=sharing)).
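If you want to inspect a downloaded checkpoint before running `test.py`, a minimal sketch is below; the filename is a placeholder, so substitute the actual file from the Drive folder.
```
import torch

# "dlf_mosi.pth" is a placeholder name for a downloaded checkpoint.
state = torch.load("./pt/dlf_mosi.pth", map_location="cpu")
print(type(state))  # typically a state_dict (or a dict containing one)
```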
### Citation
If you find the code and our idea helpful in your research or work, please cite the following paper:
```
@article{wang2024dlf,
  title={DLF: Disentangled-Language-Focused Multimodal Sentiment Analysis},
  author={Wang, Pan and Zhou, Qiang and Wu, Yawen and Chen, Tianlong and Hu, Jingtong},
  journal={arXiv preprint arXiv:2412.12225},
  year={2024}
}
```