---
base_model:
- Lightricks/LTX-Video
library_name: diffusers
---

# Towards Suturing World Models (LTX-Video, i2v)

This repository hosts a fine-tuned LTX-Video image-to-video (i2v) diffusion model specialized for generating realistic robotic surgical suturing videos. It captures fine-grained sub-stitch actions, including needle positioning, targeting, driving, and withdrawal, and can differentiate between ideal and non-ideal surgical technique, making it suitable for surgical training, skill evaluation, and the development of autonomous surgical systems.

## Model Details

- **Base Model**: LTX-Video
- **Resolution**: 768×512 pixels (adjustable)
- **Frame Length**: 49 frames per generated video (adjustable)
- **Fine-tuning Method**: Low-Rank Adaptation (LoRA)
- **Data Source**: Annotated laparoscopic surgery exercise videos (~2,000 clips)

## Usage Example

```python
import os

import torch
from diffusers.utils import export_to_video, load_image

from stg_ltx_i2v_pipeline import LTXImageToVideoSTGPipeline


def generate_video_from_image(
    image_path,
    prompt,
    output_dir="outputs",
    width=768,
    height=512,
    num_frames=49,
    lora_path="mehmetkeremturkcan/Suturing-LTX-I2V",
    lora_weight=1.0,
    prefix="suturingmodel, ",
    negative_prompt="worst quality, inconsistent motion, blurry, jittery, distorted",
    stg_mode="STG-A",
    stg_applied_layers_idx=[19],
    stg_scale=1.0,
    do_rescaling=True,
):
    # Create the output directory if it doesn't exist
    os.makedirs(output_dir, exist_ok=True)

    # Load the base LTX-Video pipeline with spatiotemporal guidance (STG)
    pipe = LTXImageToVideoSTGPipeline.from_pretrained(
        "a-r-r-o-w/LTX-Video-0.9.1-diffusers",
        torch_dtype=torch.bfloat16,
        local_files_only=False,
    )

    # Apply the suturing LoRA weights on top of the base model
    pipe.load_lora_weights(
        lora_path,
        weight_name="pytorch_lora_weights.safetensors",
        adapter_name="suturing",
    )
    pipe.set_adapters("suturing", lora_weight)
    pipe.to("cuda")

    # Prepare the conditioning image and the full prompt
    image = load_image(image_path).resize((width, height))
    full_prompt = prefix + prompt if prefix else prompt

    # Derive the output filename from the input image name
    basename = os.path.basename(image_path).split(".")[0]
    output_filename = f"{basename}_i2v.mp4"
    output_path = os.path.join(output_dir, output_filename)

    # Generate the video
    print(f"Generating video with prompt: {full_prompt}")
    video = pipe(
        image=image,
        prompt=full_prompt,
        negative_prompt=negative_prompt,
        width=width,
        height=height,
        num_frames=num_frames,
        num_inference_steps=50,
        decode_timestep=0.03,
        decode_noise_scale=0.025,
        generator=None,
        stg_mode=stg_mode,
        stg_applied_layers_idx=stg_applied_layers_idx,
        stg_scale=stg_scale,
        do_rescaling=do_rescaling,
    ).frames[0]

    # Export the frames to an mp4 file
    export_to_video(video, output_path, fps=24)
    print(f"Video saved to: {output_path}")

    return output_path


generate_video_from_image(
    image_path="../suturing_datasetv2/images/9_railroad_final_8487-8570_NeedleWithdrawalNonIdeal.png",
    prompt="A needlewithdrawalnonideal clip, generated from a backhand task.",
)
```

A note on reproducible seeding and memory offloading follows at the end of this card.

## Applications

- **Surgical Training**: Generate demonstrations of both ideal and non-ideal surgical techniques for training purposes.
- **Skill Evaluation**: Assess surgical skill by comparing recorded procedures against model-generated references.
- **Robotic Automation**: Inform autonomous surgical systems for real-time guidance and procedure automation.

## Quantitative Performance

| Metric                 | Value                    |
|------------------------|--------------------------|
| L2 Reconstruction Loss | 0.24501                  |
| Inference Time         | ~18.7 seconds per video  |

A sketch of how a frame-wise L2 metric of this kind can be computed is given at the end of this card.

## Future Directions

Further work will focus on increasing model robustness, expanding dataset diversity, and improving real-time applicability in robotic surgical scenarios.
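## Note: Seeding and Memory Offloading

The usage example above passes `generator=None`, so every call draws fresh noise and produces a different clip. For reproducible outputs, seed a `torch.Generator` and pass it to the pipeline call. On memory-constrained GPUs, diffusers' model offloading can stand in for the explicit `.to("cuda")`. The snippet below is a sketch that reuses `pipe` from the example above and assumes `LTXImageToVideoSTGPipeline` inherits the standard `DiffusionPipeline` offload hook:

```python
import torch

# Reproducibility: a seeded generator fixes the initial noise draw, so the
# same seed, prompt, and image yield the same video across runs.
generator = torch.Generator(device="cuda").manual_seed(42)
# ...then pass `generator=generator` to the pipe(...) call above.

# Memory: offload submodules to CPU between forward passes instead of keeping
# the whole pipeline resident on the GPU. This assumes the STG pipeline
# subclasses diffusers' DiffusionPipeline; call it in place of pipe.to("cuda").
pipe.enable_model_cpu_offload()
```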
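## Note: Computing a Frame-wise L2 Metric

On the L2 reconstruction figure reported above: a common way to compute such a metric is the mean squared error between generated and ground-truth frames in normalized pixel space. The helper below is an illustrative sketch under that assumption; the function name and the [0, 1] normalization are choices made here, not a description of this repository's exact evaluation protocol.

```python
import numpy as np


def framewise_l2(generated_frames, reference_frames):
    """Mean squared error over aligned frames, pixels normalized to [0, 1].

    Both arguments are equal-length sequences of PIL images (or HxWxC uint8
    arrays) of matching size, e.g. the `.frames[0]` output of the pipeline
    and the corresponding ground-truth clip.
    """
    assert len(generated_frames) == len(reference_frames), "clip lengths differ"
    losses = []
    for gen, ref in zip(generated_frames, reference_frames):
        gen_arr = np.asarray(gen, dtype=np.float32) / 255.0
        ref_arr = np.asarray(ref, dtype=np.float32) / 255.0
        losses.append(np.mean((gen_arr - ref_arr) ** 2))
    # Average the per-frame losses over the whole clip
    return float(np.mean(losses))
```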