mehmetkeremturkcan/Suturing-LTX-I2V (Diffusers)


mehmetkeremturkcan committed on Mar 11

Commit 2cf332f (verified) · 1 parent: 62c6c47

Create README.md


Files changed (1): README.md +131 -0



---
base_model:
- Lightricks/LTX-Video
library_name: diffusers
---

<p align="center">
  <img src="https://github.com/mkturkcan/suturingmodels/blob/main/static/images/title.svg?raw=true"  />
</p>

# Towards Suturing World Models (LTX-Video, i2v)

<p align="center">
  <img src="https://github.com/mkturkcan/suturingmodels/blob/main/static/images/i2v_lora_sample.jpg?raw=true"  />
</p>

This repository hosts an LTX-Video image-to-video (i2v) diffusion model fine-tuned to generate realistic robotic surgical suturing videos. It captures fine-grained sub-stitch actions, including needle positioning, targeting, driving, and withdrawal. The model can differentiate between ideal and non-ideal surgical technique. This makes it suitable for applications in surgical training, skill evaluation, and autonomous surgical system development.
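
The action and technique quality are selected through the text prompt. The usage example on this card generates a non-ideal needle withdrawal with the prompt `A needlewithdrawalnonideal clip, generated from a backhand task.`, which is preceded by the trigger prefix `suturingmodel, `. The helper below is a hypothetical sketch that builds prompts in the same format. It assumes that the other actions use labels following the `NeedleWithdrawalNonIdeal` pattern seen in the example file name. The names in `ACTIONS` other than `NeedleWithdrawal` are illustrative guesses, and the card does not confirm them.

```python
# Hypothetical helper that builds prompts in the format of this card's usage example.
# ASSUMPTION: labels other than "NeedleWithdrawal" follow the same
# "<Action><Ideal|NonIdeal>" pattern as the example file name; these names are guesses.
ACTIONS = ("NeedlePositioning", "NeedleTargeting", "NeedleDriving", "NeedleWithdrawal")


def build_prompt(action: str, ideal: bool, task: str) -> str:
    """Return a prompt such as 'A needlewithdrawalnonideal clip, generated from a backhand task.'"""
    if action not in ACTIONS:
        raise ValueError(f"unknown action {action!r}; expected one of {ACTIONS}")
    # The label is written as a single lowercase word, e.g. "needlewithdrawalnonideal"
    label = (action + ("Ideal" if ideal else "NonIdeal")).lower()
    return f"A {label} clip, generated from a {task} task."


if __name__ == "__main__":
    print(build_prompt("NeedleWithdrawal", ideal=False, task="backhand"))
```

The trigger prefix is added separately inside `generate_video_from_image`. For that reason, this helper returns only the prompt body.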

## Model Details

- **Base Model**: LTX-Video
- **Resolution**: 768×512 pixels, width × height (adjustable)
- **Frame Length**: 49 frames per generated video, about 2 s at 24 fps (adjustable)
- **Fine-tuning Method**: Low-Rank Adaptation (LoRA)
- **Data Source**: Annotated laparoscopic surgery exercise videos (∼2,000 clips)
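
Resolution and frame count are adjustable within limits. The upstream LTX-Video documentation states that the model works on resolutions divisible by 32 and on frame counts of the form 8k + 1. The defaults satisfy this (768 and 512 are multiples of 32, and 49 = 8·6 + 1). The sketch below uses a hypothetical helper, `snap_generation_settings`, which is not part of diffusers. It rounds requested settings to the nearest values meeting those constraints and reports the clip length at the 24 fps used for export.

```python
def snap_generation_settings(width: int, height: int, num_frames: int, fps: int = 24):
    """Round (width, height, num_frames) to values LTX-Video is documented to accept.

    Assumes the upstream constraints: width and height divisible by 32 and
    num_frames of the form 8 * k + 1. Returns the snapped values plus the
    clip duration in seconds at `fps`.
    """
    def to_multiple(value: int, base: int) -> int:
        # Nearest positive multiple of `base` (Python's round() breaks ties to even)
        return max(base, base * round(value / base))

    snapped_width = to_multiple(width, 32)
    snapped_height = to_multiple(height, 32)
    # Nearest frame count of the form 8k + 1, with k >= 1
    snapped_frames = 8 * max(1, round((num_frames - 1) / 8)) + 1
    duration_s = snapped_frames / fps
    return snapped_width, snapped_height, snapped_frames, duration_s


if __name__ == "__main__":
    w, h, f, seconds = snap_generation_settings(770, 500, 50)
    print(f"{w}x{h}, {f} frames, {seconds:.2f} s")  # 768x512, 49 frames, 2.04 s
```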

## Usage Example

This example imports `LTXImageToVideoSTGPipeline` from `stg_ltx_i2v_pipeline`. That is a local module, not part of the `diffusers` package, which adds Spatiotemporal Skip Guidance (STG) to the LTX-Video image-to-video pipeline. The module must be on your Python path, and a CUDA GPU is assumed.

```python
import os

import torch
from diffusers.utils import export_to_video, load_image

# Local module (not shipped with diffusers): LTX-Video image-to-video pipeline
# with Spatiotemporal Skip Guidance (STG)
from stg_ltx_i2v_pipeline import LTXImageToVideoSTGPipeline


def generate_video_from_image(
    image_path,
    prompt,
    output_dir="outputs",
    width=768,
    height=512,
    num_frames=49,
    lora_path="mehmetkeremturkcan/Suturing-LTX-I2V",
    lora_weight=1.0,
    prefix="suturingmodel, ",
    negative_prompt="worst quality, inconsistent motion, blurry, jittery, distorted",
    stg_mode="STG-A",
    stg_applied_layers_idx=None,
    stg_scale=1.0,
    do_rescaling=True,
    seed=None,
):
    # Default STG layer; a None default avoids sharing one mutable list across calls
    if stg_applied_layers_idx is None:
        stg_applied_layers_idx = [19]

    # Create the output directory if it doesn't exist
    os.makedirs(output_dir, exist_ok=True)

    # Load the base LTX-Video 0.9.1 pipeline in bfloat16
    pipe = LTXImageToVideoSTGPipeline.from_pretrained(
        "a-r-r-o-w/LTX-Video-0.9.1-diffusers",
        torch_dtype=torch.bfloat16,
    )

    # Apply the suturing LoRA weights
    pipe.load_lora_weights(
        lora_path,
        weight_name="pytorch_lora_weights.safetensors",
        adapter_name="suturing",
    )
    pipe.set_adapters("suturing", lora_weight)
    pipe.to("cuda")

    # Prepare the conditioning image (PIL's resize takes (width, height)) and the prompt
    image = load_image(image_path).resize((width, height))
    full_prompt = prefix + prompt if prefix else prompt

    # Derive the output filename from the input image name (safe for names with extra dots)
    basename = os.path.splitext(os.path.basename(image_path))[0]
    output_path = os.path.join(output_dir, f"{basename}_i2v.mp4")

    # A fixed seed makes runs reproducible; None samples fresh noise each time
    generator = torch.Generator(device="cuda").manual_seed(seed) if seed is not None else None

    # Generate the video
    print(f"Generating video with prompt: {full_prompt}")
    video = pipe(
        image=image,
        prompt=full_prompt,
        negative_prompt=negative_prompt,
        width=width,
        height=height,
        num_frames=num_frames,
        num_inference_steps=50,
        decode_timestep=0.03,
        decode_noise_scale=0.025,
        generator=generator,
        stg_mode=stg_mode,
        stg_applied_layers_idx=stg_applied_layers_idx,
        stg_scale=stg_scale,
        do_rescaling=do_rescaling,
    ).frames[0]

    # Export the frames as an MP4 at 24 fps
    export_to_video(video, output_path, fps=24)
    print(f"Video saved to: {output_path}")
    return output_path


if __name__ == "__main__":
    # Example frame from the authors' local suturing dataset; replace with your own image
    generate_video_from_image(
        image_path="../suturing_datasetv2/images/9_railroad_final_8487-8570_NeedleWithdrawalNonIdeal.png",
        prompt="A needlewithdrawalnonideal clip, generated from a backhand task.",
    )
```
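
The `stg_mode`, `stg_scale` and `do_rescaling` arguments configure Spatiotemporal Skip Guidance. As described in the STG paper, the sampler runs an extra denoiser pass in which selected transformer layers are skipped; here that is `stg_applied_layers_idx=[19]`. It then pushes the prediction away from that weakened output, on top of classifier-free guidance (CFG). The NumPy sketch below illustrates that combination for a single denoising step. The rescaling branch assumes the CFG-rescale approach, which matches the guided prediction's standard deviation to the conditional prediction's and then blends the two. The function name and the `guidance_scale` and `rescale_phi` defaults are assumptions. This is not the code inside `stg_ltx_i2v_pipeline`, whose exact formula may differ.

```python
import numpy as np


def stg_guided_prediction(
    pred_uncond: np.ndarray,
    pred_cond: np.ndarray,
    pred_perturbed: np.ndarray,
    guidance_scale: float = 3.0,  # assumed CFG scale
    stg_scale: float = 1.0,
    do_rescaling: bool = True,
    rescale_phi: float = 0.7,  # assumed blend factor for rescaling
) -> np.ndarray:
    """Combine CFG and skip guidance for one denoising step (illustrative sketch).

    pred_uncond:    denoiser output for the negative prompt
    pred_cond:      denoiser output for the text prompt
    pred_perturbed: text-conditioned output with the selected layers skipped
    """
    guided = (
        pred_uncond
        + guidance_scale * (pred_cond - pred_uncond)  # classifier-free guidance term
        + stg_scale * (pred_cond - pred_perturbed)  # skip-guidance term
    )
    if do_rescaling:
        # Scale the guided prediction so its std matches the conditional prediction,
        # then blend it with the unrescaled result
        rescaled = guided * (pred_cond.std() / guided.std())
        guided = rescale_phi * rescaled + (1.0 - rescale_phi) * guided
    return guided
```

With `guidance_scale=1.0`, `stg_scale=0.0` and rescaling off, the function returns the conditional prediction unchanged. Each guidance term therefore acts as an independent correction on top of it.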

## Applications

- **Surgical Training**: Generate demonstrations of both ideal and non-ideal surgical technique for training purposes.
- **Skill Evaluation**: Assess surgical skill by comparing actual procedures against model-generated reference videos.
- **Robotic Automation**: Inform autonomous surgical robotic systems, for example for real-time guidance and procedure automation.

## Quantitative Performance

| Metric                 | Value                   |
|------------------------|-------------------------|
| L2 reconstruction loss | 0.24501                 |
| Inference time         | ~18.7 seconds per video |
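
The card does not define how the L2 reconstruction loss is computed. The sketch below gives one hypothetical reading: the mean squared error between a generated clip and its reference clip. It assumes frames stored as uint8 arrays of shape (frames, height, width, channels), scaled to [0, 1] before comparison. The reported value may have been computed in a different space (for example, VAE latents) or with a different normalization. For that reason, numbers from this sketch are not directly comparable to the table.

```python
import numpy as np


def l2_reconstruction_loss(generated: np.ndarray, reference: np.ndarray) -> float:
    """Mean squared error between two clips (hypothetical metric definition).

    Both inputs are uint8 arrays of shape (frames, height, width, channels);
    pixel values are scaled to [0, 1] before comparison.
    """
    if generated.shape != reference.shape:
        raise ValueError(f"shape mismatch: {generated.shape} vs {reference.shape}")
    gen = generated.astype(np.float64) / 255.0
    ref = reference.astype(np.float64) / 255.0
    return float(np.mean((gen - ref) ** 2))
```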

## Future Directions

Future work will focus on improving model robustness, increasing dataset diversity, and improving real-time applicability to robotic surgical scenarios.

## Citation

Please cite our work if you find this model useful:

```bibtex
@article{turkcan2024suturing,
  title={Towards Suturing World Models: Learning Predictive Models for Robotic Surgical Tasks},
  author={Turkcan, Mehmet Kerem and Ballo, Mattia and Filicori, Filippo and Kostic, Zoran},
  journal={arXiv preprint arXiv:2024},
  year={2024}
}
```