# DLF: Disentangled-Language-Focused Multimodal Sentiment Analysis (AAAI 2025)

### [Arxiv Paper](https://arxiv.org/abs/2412.12225)

## Main Contributions

Our main contributions can be summarized as follows:

- **Proposed Framework:** We propose a Disentangled-Language-Focused (DLF) multimodal representation learning framework to advance multimodal sentiment analysis (MSA). The framework follows a structured pipeline: feature extraction, disentanglement, enhancement, fusion, and prediction.
- **Language-Focused Attractor (LFA):** We develop the LFA to fully harness the potential of the dominant language modality within the modality-specific space. The LFA exploits language-guided multimodal cross-attention to achieve targeted feature enhancement ($X$ -> Language); a minimal sketch follows this list.
- **Hierarchical Predictions:** We devise hierarchical predictions that leverage both pre-fused and post-fused features, improving overall MSA accuracy.
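
As an illustration of the LFA idea, here is a minimal PyTorch sketch of language-guided cross-attention, not the repository's actual implementation: the shared feature dimension, head count, and the residual/LayerNorm placement are our assumptions.

```
import torch
import torch.nn as nn

class LanguageFocusedAttractorSketch(nn.Module):
    """Queries come from language; keys/values come from each auxiliary
    modality, and the attended features are added back onto language."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.audio_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.visual_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, lang, audio, visual):
        # All inputs: (batch, seq_len, dim), projected into a shared space.
        a, _ = self.audio_attn(query=lang, key=audio, value=audio)
        v, _ = self.visual_attn(query=lang, key=visual, value=visual)
        return self.norm(lang + a + v)  # residual enhancement of language

lfa = LanguageFocusedAttractorSketch(dim=128)
l, a, v = (torch.randn(8, 50, 128) for _ in range(3))
print(lfa(l, a, v).shape)  # torch.Size([8, 50, 128])
```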


## The Framework
![](./imgs/Framework.png)
The DLF framework. Please refer to the [paper](https://arxiv.org/abs/2412.12225) for details.


## Usage

### Prerequisites
- Python 3.9.13
- PyTorch 1.13.0
- CUDA 11.7

### Installation
- Create a conda environment. Please make sure conda is installed beforehand.
```
conda create -n DLF python==3.9.13
```
- Activate the built DLF environment.
```
conda activate DLF
```
- Install PyTorch with CUDA support.
```
pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu117
```
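- (Optional) Verify the installation. This is a generic PyTorch sanity check, not a script from this repo:
```
import torch
print(torch.__version__)          # expect 1.13.0+cu117
print(torch.cuda.is_available())  # True if the CUDA build sees a GPU
```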
- Clone this repo.
```
git clone https://github.com/pwang322/DLF.git
```
- Install the necessary packages.
```
cd DLF
pip install -r requirements.txt
```

### Datasets
Data files (containing processed MOSI, MOSEI datasets) can be downloaded from [here](https://drive.google.com/drive/folders/1BBadVSptOe4h8TWchkhWZRLJw8YG_aEi?usp=sharing). 
Create a `./dataset` directory, put the downloaded datasets into it, and revise the path in `./config/config.json`. For example, if the processed MOSI dataset is located at `./dataset/MOSI/aligned_50.pkl`, make sure `"dataset_root_dir"` is set to `"./dataset"` and `"featurePath"` to `"MOSI/aligned_50.pkl"`.
Please note that the meta information and the raw data are not available due to the privacy of YouTube content creators. For more details, please refer to the [official website](https://github.com/ecfm/CMU-MultimodalSDK) of these datasets.
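
To double-check that the paths line up before training, a quick inspection script may help. The two config keys are quoted above; whether they sit at the top level of `config.json`, and the internal layout of the pickle (e.g. train/valid/test splits), are assumptions here.

```
import json
import pickle

# Keys quoted in this README; top-level placement is an assumption.
with open("./config/config.json") as f:
    cfg = json.load(f)
print(cfg.get("dataset_root_dir"), cfg.get("featurePath"))

# Peek at the processed MOSI file; the split layout is an assumption.
with open("./dataset/MOSI/aligned_50.pkl", "rb") as f:
    data = pickle.load(f)
print(type(data))
if isinstance(data, dict):
    print(list(data.keys()))
```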

### Run the Codes
- Training

You can first set the training dataset name in `./train.py` to `"mosei"` or `"mosi"`, and then run:
```
python3 train.py
```
By default, the trained model will be saved in the `./pt` directory. You can change this in `train.py`.
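
To inspect a saved model afterwards, something like the following should work; the checkpoint filename below is hypothetical, so substitute whatever `train.py` actually writes under `./pt`.

```
import torch

# Hypothetical filename; adjust to the file train.py produces.
state = torch.load("./pt/dlf-mosi.pth", map_location="cpu")
print(type(state))  # typically a state_dict (an OrderedDict of tensors)
```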

- Testing

You can first set the testing dataset name in `./test.py` to `"mosei"` or `"mosi"`, and then test the trained model:
```
python3 test.py
```
We also provide pre-trained models for testing ([Google Drive](https://drive.google.com/drive/folders/1GgCfC1ITAnRRw6RScGc7c2YUg5Ccbdba?usp=sharing)).


### Citation
If you find the code and our ideas helpful in your research or work, please cite the following paper:

```
@article{wang2024dlf,
  title={DLF: Disentangled-Language-Focused Multimodal Sentiment Analysis},
  author={Wang, Pan and Zhou, Qiang and Wu, Yawen and Chen, Tianlong and Hu, Jingtong},
  journal={arXiv preprint arXiv:2412.12225},
  year={2024}
}
```