Video Understanding#
A video is essentially a time-ordered sequence of images (frames), usually accompanied by an audio track.
ffmpeg#
The Swiss Army Knife of video processing. You must know how to use it.
# Extract 1 frame per second
ffmpeg -i video.mp4 -vf fps=1 frame_%04d.jpg
# Extract audio track
ffmpeg -i video.mp4 -q:a 0 -map a audio.mp3
Video LLMs#
Models like Gemini 1.5 Pro accept video files natively. They can perform temporal reasoning, that is, understanding how a sequence of events unfolds over time, which is impossible from a single image.
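Temporal reasoning depends on knowing when each frame occurs. The frames written by the `fps=1` command above carry that information only in their file names, so the following is a minimal sketch of recovering an approximate timestamp for each frame. It assumes the `frame_%04d.jpg` naming from that command and ffmpeg's default numbering, which starts at 1. The helper names `frame_timestamps` and `format_timestamp` are hypothetical, not part of any library.

```python
import re

# Matches the names produced by `frame_%04d.jpg` (assumed naming scheme).
FRAME_PATTERN = re.compile(r"frame_(\d+)\.jpg$")


def frame_timestamps(frame_names, fps=1.0):
    """Return (seconds, name) pairs sorted by time.

    Timestamps are approximate: frame N is taken to sit at (N - 1) / fps
    seconds, assuming numbering starts at 1.
    """
    pairs = []
    for name in frame_names:
        match = FRAME_PATTERN.search(str(name))
        if match is None:
            continue  # skip files that are not extracted frames
        index = int(match.group(1))
        pairs.append(((index - 1) / fps, str(name)))
    return sorted(pairs)


def format_timestamp(seconds):
    """Format a number of seconds as MM:SS for use in a prompt or caption."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    names = ["frame_0003.jpg", "frame_0001.jpg", "frame_0002.jpg"]
    for seconds, name in frame_timestamps(names, fps=1):
        print(format_timestamp(seconds), name)
```

When a model only accepts still images, one option is to send the frames in this order with each `MM:SS` label placed next to its image, so the ordering that a native video model gets for free is at least stated explicitly.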