Generate clusters and visualizations from images
Transcribe audio and YouTube videos to text
Process video to analyze human motion