Create README.md
README.md
ADDED
@@ -0,0 +1,131 @@
---
base_model:
- Lightricks/LTX-Video
library_name: diffusers
---

<p align="center">
<img src="https://github.com/mkturkcan/suturingmodels/blob/main/static/images/title.svg?raw=true" />
</p>

# Towards Suturing World Models (LTX-Video, i2v)
<p align="center">
<img src="https://github.com/mkturkcan/suturingmodels/blob/main/static/images/i2v_lora_sample.jpg?raw=true" />
</p>

This repository hosts a fine-tuned LTX-Video image-to-video (i2v) diffusion model specialized for generating realistic robotic surgical suturing videos. It captures fine-grained sub-stitch actions: needle positioning, targeting, driving, and withdrawal. The model distinguishes between ideal and non-ideal surgical technique, which makes it suitable for surgical training, skill evaluation, and the development of autonomous surgical systems.
## Model Details

- **Base Model**: LTX-Video (the usage example loads the `a-r-r-o-w/LTX-Video-0.9.1-diffusers` checkpoint)
- **Resolution**: 768×512 pixels (adjustable)
- **Frame Length**: 49 frames per generated video (adjustable)
- **Fine-tuning Method**: Low-Rank Adaptation (LoRA)
- **Data Source**: Annotated laparoscopic surgery exercise videos (~2,000 clips)
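The resolution and frame length are adjustable, but upstream LTX-Video documentation recommends widths and heights divisible by 32 and frame counts of the form 8k + 1. The defaults listed above (768×512, 49 frames) satisfy both. The following is a minimal sketch of helpers that snap requested values onto that grid; `snap_resolution` and `snap_num_frames` are hypothetical names and are not part of this repository or of diffusers.

```python
def snap_resolution(value, multiple=32):
    """Round a width or height down to the nearest multiple (never below one multiple)."""
    return max(multiple, (value // multiple) * multiple)


def snap_num_frames(num_frames, stride=8):
    """Round a frame count down to the nearest value of the form stride * k + 1."""
    k = max(1, (num_frames - 1) // stride)
    return stride * k + 1


# The documented defaults are already valid and pass through unchanged.
width, height = snap_resolution(768), snap_resolution(512)
num_frames = snap_num_frames(49)
print(width, height, num_frames)  # 768 512 49

# Arbitrary requests are rounded down to the nearest valid value.
print(snap_resolution(500), snap_num_frames(60))  # 480 57
```

Rounding down rather than up keeps memory use at or below what was requested, which seemed the safer default for a sketch.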
## Usage Example

```python
import os

import torch
from diffusers.utils import export_to_video, load_image

# The STG pipeline is not shipped with diffusers; stg_ltx_i2v_pipeline.py
# must be importable (for example, placed next to this script).
from stg_ltx_i2v_pipeline import LTXImageToVideoSTGPipeline


def generate_video_from_image(
    image_path,
    prompt,
    output_dir="outputs",
    width=768,
    height=512,
    num_frames=49,
    lora_path="mehmetkeremturkcan/Suturing-LTX-I2V",
    lora_weight=1.0,
    prefix="suturingmodel, ",
    negative_prompt="worst quality, inconsistent motion, blurry, jittery, distorted",
    stg_mode="STG-A",
    stg_applied_layers_idx=[19],
    stg_scale=1.0,
    do_rescaling=True,
):
    """Generate a suturing video from a conditioning image and save it as MP4."""
    # Create the output directory if it doesn't exist
    os.makedirs(output_dir, exist_ok=True)

    # Load the base model
    pipe = LTXImageToVideoSTGPipeline.from_pretrained(
        "a-r-r-o-w/LTX-Video-0.9.1-diffusers",
        torch_dtype=torch.bfloat16,
        local_files_only=False,
    )

    # Apply the LoRA weights and set their strength
    pipe.load_lora_weights(
        lora_path,
        weight_name="pytorch_lora_weights.safetensors",
        adapter_name="suturing",
    )
    pipe.set_adapters("suturing", lora_weight)
    pipe.to("cuda")

    # Prepare the conditioning image and the prompt (with the trigger prefix)
    image = load_image(image_path).resize((width, height))
    full_prompt = prefix + prompt if prefix else prompt

    # Derive the output filename from the image name, without its extension
    basename = os.path.splitext(os.path.basename(image_path))[0]
    output_path = os.path.join(output_dir, f"{basename}_i2v.mp4")

    # Generate the video
    print(f"Generating video with prompt: {full_prompt}")
    video = pipe(
        image=image,
        prompt=full_prompt,
        negative_prompt=negative_prompt,
        width=width,
        height=height,
        num_frames=num_frames,
        num_inference_steps=50,
        decode_timestep=0.03,
        decode_noise_scale=0.025,
        generator=None,
        stg_mode=stg_mode,
        stg_applied_layers_idx=stg_applied_layers_idx,
        stg_scale=stg_scale,
        do_rescaling=do_rescaling,
    ).frames[0]

    # Export the video
    export_to_video(video, output_path, fps=24)
    print(f"Video saved to: {output_path}")
    return output_path


if __name__ == "__main__":
    generate_video_from_image(
        image_path="../suturing_datasetv2/images/9_railroad_final_8487-8570_NeedleWithdrawalNonIdeal.png",
        prompt="A needlewithdrawalnonideal clip, generated from a backhand task.",
    )
```
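The prompt in the usage example appears to follow a fixed template: a lowercased action label followed by the task name. The sketch below assumes that template generalizes to other labels and tasks, which this README does not confirm. `build_prompt` is a hypothetical helper, and only the `NeedleWithdrawalNonIdeal` / `backhand` pair is taken from the example above.

```python
def build_prompt(action_label, task):
    """Compose a prompt in the style of the usage example.

    The template is inferred from a single example prompt, not from documentation.
    """
    return f"A {action_label.lower()} clip, generated from a {task.lower()} task."


prompt = build_prompt("NeedleWithdrawalNonIdeal", "backhand")
print(prompt)
# A needlewithdrawalnonideal clip, generated from a backhand task.
```

The result is meant to be passed as the `prompt` argument of `generate_video_from_image`, which prepends the `suturingmodel, ` trigger prefix itself.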
## Applications

- **Surgical Training**: Generate demonstrations of both ideal and non-ideal surgical techniques for training purposes.
- **Skill Evaluation**: Assess surgical skills by comparing actual procedures against model-generated standards.
- **Robotic Automation**: Inform autonomous surgical robotic systems for real-time guidance and procedure automation.
## Quantitative Performance

| Metric                 | Value                   |
|------------------------|-------------------------|
| L2 Reconstruction Loss | 0.24501                 |
| Inference Time         | ~18.7 seconds per video |
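To put the inference time in context, it can be compared with the playback length of a generated clip. The sketch below assumes the ~18.7 s figure was measured with the 49-frame, 24 fps settings from the usage example, which the table does not state; the function names are illustrative only.

```python
def clip_duration_seconds(num_frames, fps):
    """Playback length of a clip in seconds."""
    return num_frames / fps


def slowdown_vs_real_time(inference_seconds, num_frames, fps):
    """How many times longer generation takes than playback of the result."""
    return inference_seconds / clip_duration_seconds(num_frames, fps)


duration = clip_duration_seconds(49, 24)
factor = slowdown_vs_real_time(18.7, 49, 24)
print(f"{duration:.2f} s of video, {factor:.1f}x slower than real time")
# 2.04 s of video, 9.2x slower than real time
```

Under that assumption, generation runs roughly nine times slower than playback.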
## Future Directions

Future work will focus on increasing model robustness, expanding dataset diversity, and improving real-time applicability in robotic surgical scenarios.
## Citation

Please cite our work if you find this model useful:

```bibtex
@article{turkcan2024suturing,
  title={Towards Suturing World Models: Learning Predictive Models for Robotic Surgical Tasks},
  author={Turkcan, Mehmet Kerem and Ballo, Mattia and Filicori, Filippo and Kostic, Zoran},
  journal={arXiv preprint arXiv:2024},
  year={2024}
}
```