---
base_model:
- Lightricks/LTX-Video
library_name: diffusers
---

<p align="center">
<img src="https://github.com/mkturkcan/suturingmodels/blob/main/static/images/title.svg?raw=true" />
</p>

# Towards Suturing World Models (LTX-Video, i2v)

<p align="center">
<img src="https://github.com/mkturkcan/suturingmodels/blob/main/static/images/i2v_lora_sample.jpg?raw=true" />
</p>

This repository hosts a fine-tuned LTX-Video image-to-video (i2v) diffusion model specialized for generating realistic robotic surgical suturing videos. It captures fine-grained sub-stitch actions, including needle positioning, targeting, driving, and withdrawal, and can differentiate between ideal and non-ideal surgical technique, making it suitable for surgical training, skill evaluation, and the development of autonomous surgical systems.

## Model Details

- **Base Model**: LTX-Video
- **Resolution**: 768×512 pixels (adjustable)
- **Frame Length**: 49 frames per generated video (adjustable)
- **Fine-tuning Method**: Low-Rank Adaptation (LoRA)
- **Data Source**: Annotated laparoscopic surgery exercise videos (~2,000 clips)
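As a quick sanity check on these defaults: a 49-frame clip exported at 24 fps (the frame rate used in the usage example below) lasts just over two seconds. A trivial helper for that arithmetic:

```python
def clip_duration_seconds(num_frames: int, fps: int) -> float:
    """Duration of a generated clip, given its frame count and export frame rate."""
    return num_frames / fps

# Default settings: 49 frames exported at 24 fps -> ~2.04 seconds of video.
print(round(clip_duration_seconds(49, 24), 2))
```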
## Usage Example

```python
import os

import torch
from diffusers.utils import export_to_video, load_image

# Custom spatiotemporal guidance (STG) pipeline shipped alongside this repository
from stg_ltx_i2v_pipeline import LTXImageToVideoSTGPipeline


def generate_video_from_image(
    image_path,
    prompt,
    output_dir="outputs",
    width=768,
    height=512,
    num_frames=49,
    lora_path="mehmetkeremturkcan/Suturing-LTX-I2V",
    lora_weight=1.0,
    prefix="suturingmodel, ",
    negative_prompt="worst quality, inconsistent motion, blurry, jittery, distorted",
    stg_mode="STG-A",
    stg_applied_layers_idx=[19],
    stg_scale=1.0,
    do_rescaling=True,
):
    # Create the output directory if it doesn't exist
    os.makedirs(output_dir, exist_ok=True)

    # Load the base model
    pipe = LTXImageToVideoSTGPipeline.from_pretrained(
        "a-r-r-o-w/LTX-Video-0.9.1-diffusers",
        torch_dtype=torch.bfloat16,
        local_files_only=False,
    )

    # Apply the suturing LoRA weights
    pipe.load_lora_weights(
        lora_path,
        weight_name="pytorch_lora_weights.safetensors",
        adapter_name="suturing",
    )
    pipe.set_adapters("suturing", lora_weight)
    pipe.to("cuda")

    # Prepare the conditioning image and the prompt
    image = load_image(image_path).resize((width, height))
    full_prompt = prefix + prompt if prefix else prompt

    # Derive the output filename from the input image name
    basename = os.path.splitext(os.path.basename(image_path))[0]
    output_path = os.path.join(output_dir, f"{basename}_i2v.mp4")

    # Generate the video
    print(f"Generating video with prompt: {full_prompt}")
    video = pipe(
        image=image,
        prompt=full_prompt,
        negative_prompt=negative_prompt,
        width=width,
        height=height,
        num_frames=num_frames,
        num_inference_steps=50,
        decode_timestep=0.03,
        decode_noise_scale=0.025,
        generator=None,
        stg_mode=stg_mode,
        stg_applied_layers_idx=stg_applied_layers_idx,
        stg_scale=stg_scale,
        do_rescaling=do_rescaling,
    ).frames[0]

    # Export the video
    export_to_video(video, output_path, fps=24)
    print(f"Video saved to: {output_path}")
    return output_path


generate_video_from_image(
    image_path="../suturing_datasetv2/images/9_railroad_final_8487-8570_NeedleWithdrawalNonIdeal.png",
    prompt="A needlewithdrawalnonideal clip, generated from a backhand task.",
)
```
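The script above prepends a LoRA trigger prefix (`suturingmodel, `) to every prompt and names the output video after the seed image. That glue logic can be sketched standalone (the helper names here are illustrative, not part of the repository):

```python
import os


def build_prompt(prompt: str, prefix: str = "suturingmodel, ") -> str:
    """Prepend the trigger prefix used during LoRA fine-tuning, if one is set."""
    return prefix + prompt if prefix else prompt


def output_path_for(image_path: str, output_dir: str = "outputs") -> str:
    """Name the generated video after the seed image, with an `_i2v` suffix."""
    basename = os.path.splitext(os.path.basename(image_path))[0]
    return os.path.join(output_dir, f"{basename}_i2v.mp4")


print(build_prompt("A needle driving clip."))
print(output_path_for("images/clip_001.png"))
```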
## Applications

- **Surgical Training**: Generate demonstrations of both ideal and non-ideal surgical techniques for training purposes.
- **Skill Evaluation**: Assess surgical skills by comparing actual procedures against model-generated standards.
- **Robotic Automation**: Inform autonomous surgical robotic systems for real-time guidance and procedure automation.

## Quantitative Performance

| Metric                 | Performance             |
|------------------------|-------------------------|
| L2 Reconstruction Loss | 0.24501                 |
| Inference Time         | ~18.7 seconds per video |

## Future Directions

Further improvements will focus on increasing model robustness, expanding dataset diversity, and enhancing real-time applicability in robotic surgical scenarios.

## Citation

Please cite our work if you find this model useful:

```bibtex
@article{turkcan2024suturing,
  title={Towards Suturing World Models: Learning Predictive Models for Robotic Surgical Tasks},
  author={Turkcan, Mehmet Kerem and Ballo, Mattia and Filicori, Filippo and Kostic, Zoran},
  journal={arXiv preprint arXiv:2024},
  year={2024}
}
```