Minimum Requirements: Video Files for Processing

Learn the minimum requirements that video files must meet to be processed reliably and efficiently.

Updated on April 15, 2026

This document defines the technical specifications and general requirements for all video and audio assets processed within the platform. The purpose is to ensure consistent input quality across all machine learning pipelines, enabling reliable analysis, feature extraction, and accurate model performance.

All submitted media files must adhere to the following guidelines regarding file size, structure, and encoding. These requirements are designed to minimize decoding issues, avoid data loss during processing, and maintain visual and acoustic integrity across different workflows.

The specifications are grouped into three main sections:

  • General Requirements — Overall file structure and metadata prerequisites.
  • Video Criteria — Technical standards for visual data such as resolution, bitrate, and codec.
  • Audio Criteria — Audio encoding, sampling, and channel setup requirements.

Adhering to these criteria ensures that assets can be automatically validated, efficiently decoded, and seamlessly integrated into the machine learning and analytics pipelines without manual preprocessing.

Preferred and Supported Formats

While various container and codec combinations are supported, certain formats have proven to be significantly more stable, efficient, and reliable across all processing pipelines.

Preferred Format (Gold Standard)
For optimal performance, compatibility, and processing efficiency, the following format is strongly recommended:

  • Container: MP4
  • Video Codec: H.264 (AVC)
  • Audio Codec: AAC

This combination provides the best balance between quality, file size, decoding performance, and compatibility across machine learning workflows.
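
As an illustration, files in other formats could be converted to the preferred combination with FFmpeg, a tool this document does not prescribe. The sketch below only builds the argument list and does not run anything. The function name `build_transcode_command` and the output naming are hypothetical; `libx264` and `aac` are FFmpeg's encoder names for H.264 and AAC and assume an FFmpeg build that includes them.

```python
from pathlib import Path
from typing import List


def build_transcode_command(source: str) -> List[str]:
    """Return an ffmpeg argument list that re-encodes `source` to MP4 / H.264 / AAC."""
    target = Path(source).with_suffix(".mp4")
    if target == Path(source):
        # Hypothetical naming choice: avoid writing over an input that is already .mp4.
        target = target.with_name(target.stem + "_h264.mp4")
    return [
        "ffmpeg",
        "-i", source,
        "-c:v", "libx264",  # H.264 (AVC) video encoder
        "-c:a", "aac",      # AAC audio encoder
        str(target),        # the .mp4 extension selects the MP4 container
    ]


if __name__ == "__main__":
    print(" ".join(build_transcode_command("feature.mov")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where `ffmpeg` is installed.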

Well-Supported Alternatives
The following formats are also supported and generally perform well within the platform:

  • MP4 / MOV with H.265 (HEVC) and AAC
    Suitable for high-resolution (e.g., 4K) and HDR content with improved compression efficiency.
  • MOV with ProRes (e.g., ProRes 422) and PCM or AAC audio
    Commonly used as a high-quality mezzanine format in post-production workflows. Note that the larger file sizes may increase processing time and storage requirements.
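
The preferred and alternative combinations listed above can be expressed as a small lookup, for example to label assets before upload. This is a minimal sketch: the function name, the lowercase codec labels (`h264`, `hevc`, `prores`, `aac`, `pcm`), and the three result strings are assumptions made for illustration. A result of "unlisted" does not mean unsupported, since other combinations are also accepted.

```python
# Combinations taken from the lists above; labels are illustrative lowercase names.
PREFERRED = {("mp4", "h264", "aac")}
ALTERNATIVES = {
    ("mp4", "hevc", "aac"),
    ("mov", "hevc", "aac"),
    ("mov", "prores", "pcm"),
    ("mov", "prores", "aac"),
}


def classify_format(container: str, video_codec: str, audio_codec: str) -> str:
    """Return "preferred", "alternative", or "unlisted" for a format combination."""
    audio = audio_codec.lower()
    if audio.startswith("pcm"):
        # Treat all PCM variants (e.g. pcm_s16le, pcm_s24le) as plain PCM.
        audio = "pcm"
    key = (container.lower().lstrip("."), video_codec.lower(), audio)
    if key in PREFERRED:
        return "preferred"
    if key in ALTERNATIVES:
        return "alternative"
    return "unlisted"


if __name__ == "__main__":
    print(classify_format("mp4", "h264", "aac"))
    print(classify_format(".mov", "prores", "pcm_s24le"))
```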

General Requirements

| Criterion | Value | Required |
|---|---|---|
| File size | Max. 10 GB per video file | yes |
| Duration | ≥ 10 minutes | yes |
| DRM / Encryption | No DRM protection or other encryption methods. | yes |
| File structure | One file per asset containing audio and video data. Segmented files are not supported. | yes |
| No Watermarks / Logos | Visual overlays may interfere with feature extraction models (e.g., object or brand detection). | yes |
| No Letterbox / Pillarbox | Crop black bars to preserve only the active image area. | yes |
| File Naming | Use the standardized format: movie_title_resolution_date_version.ext | Optional |
| Supplementary Data | Provide poster, title, synopsis, genre, language, and cast as separate data via API. | Optional |
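
Two of the general requirements, the file size limit and the optional naming convention, lend themselves to an automated pre-upload check. The sketch below is one possible reading, not the platform's own validation: it assumes 10 GB means 10 × 1000³ bytes, and it assumes lowercase words joined by underscores for the title, a resolution such as `1080p`, an eight-digit date, and a version such as `v2`. None of these details are specified in this document, and both function names are hypothetical.

```python
import re
from typing import Dict, Optional

# Assumption: "10 GB" is read as decimal gigabytes.
MAX_FILE_SIZE_BYTES = 10 * 1000**3

# Assumed interpretation of movie_title_resolution_date_version.ext
FILENAME_PATTERN = re.compile(
    r"^(?P<title>[a-z0-9]+(?:_[a-z0-9]+)*)"  # movie_title, e.g. the_great_escape
    r"_(?P<resolution>\d{3,4}p)"             # resolution, e.g. 1080p
    r"_(?P<date>\d{8})"                      # date, assumed YYYYMMDD
    r"_(?P<version>v\d+)"                    # version, e.g. v2
    r"\.(?P<ext>[a-z0-9]+)$"                 # file extension
)


def exceeds_size_limit(size_bytes: int) -> bool:
    """Return True when the file is larger than the assumed 10 GB limit."""
    return size_bytes > MAX_FILE_SIZE_BYTES


def parse_asset_filename(name: str) -> Optional[Dict[str, str]]:
    """Split a file name into its parts, or return None if it does not match."""
    match = FILENAME_PATTERN.match(name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_asset_filename("the_great_escape_1080p_20260415_v2.mp4"))
    print(exceeds_size_limit(12 * 1000**3))
```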

Video Criteria

 

| Criterion | Recommendation / Value | Required |
|---|---|---|
| Resolution | ≥ 720p | yes |
| Bitrate | Minimum 5 Mbps (higher preferred). Avoid excessive compression artifacts. | yes |
| Container Format | Various container formats supported. | yes |
| Video Codec | Various audio and video codecs supported. | yes |
| Frame Rate | Constant frame rate (e.g., 24 or 25 fps). No variable frame rate (VFR). | yes |
| GOP Structure | Short GOP preferred (e.g., 1–12 frames per keyframe) for frame-level accuracy. | yes |
| Duration | Any length; a minimum of 10 minutes (≥ 10 minutes) applies only to the Binge Markers product. | Optional |
| Color Space / Depth | Rec. 709 / 8-bit (standard). Optionally Rec. 2020 or 10-bit for HDR workflows. | Optional |
| Interlacing | Always progressive (no interlaced content). | yes |
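
Several of the video criteria above can be checked against stream metadata before upload. The sketch below assumes the metadata comes from `ffprobe -show_streams -of json`, whose video stream entries include `height`, `bit_rate`, `r_frame_rate`, `avg_frame_rate`, and `field_order`. It reads "720p" as a frame height of at least 720 pixels and treats a mismatch between the nominal and average frame rate as a hint of variable frame rate; both are assumptions, and the frame rate test is only a heuristic. The function names and the 0.5% tolerance are hypothetical, and GOP length is not covered because it requires frame-level inspection.

```python
from fractions import Fraction
from typing import Dict, List, Optional

MIN_HEIGHT = 720             # assumed reading of ">= 720p"
MIN_BITRATE_BPS = 5_000_000  # 5 Mbps
INTERLACED_FIELD_ORDERS = {"tt", "bb", "tb", "bt"}


def _parse_rate(value: str) -> Optional[Fraction]:
    """Parse a rate such as "30000/1001"; return None for an undefined "0/0"."""
    num, _, den = value.partition("/")
    den = den or "1"
    if int(den) == 0:
        return None
    return Fraction(int(num), int(den))


def looks_constant_frame_rate(stream: Dict, tolerance: float = 0.005) -> bool:
    """Heuristic: nominal and average frame rate agree within the tolerance."""
    nominal = _parse_rate(stream.get("r_frame_rate", "0/0"))
    average = _parse_rate(stream.get("avg_frame_rate", "0/0"))
    if not nominal or not average:
        return False
    return abs(nominal - average) / nominal <= tolerance


def check_video_stream(stream: Dict) -> List[str]:
    """Return a list of problems found in one ffprobe-style video stream entry."""
    problems = []
    if int(stream.get("height", 0)) < MIN_HEIGHT:
        problems.append("resolution below 720p")
    if int(stream.get("bit_rate", 0)) < MIN_BITRATE_BPS:
        problems.append("bitrate below 5 Mbps or not reported")
    if not looks_constant_frame_rate(stream):
        problems.append("frame rate may be variable")
    if stream.get("field_order") in INTERLACED_FIELD_ORDERS:
        problems.append("interlaced content")
    return problems


if __name__ == "__main__":
    sample = {"height": 1080, "bit_rate": "8000000", "r_frame_rate": "25/1",
              "avg_frame_rate": "25/1", "field_order": "progressive"}
    print(check_video_stream(sample))
```

An empty list means none of the checked criteria were violated; it does not confirm that the file meets every requirement in the table.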

Audio Criteria

| Criterion | Recommendation / Value | Required |
|---|---|---|
| Audio Codec | Various audio and video codecs supported. One file per asset. | yes |
| Sampling Rate | 44.1 kHz or 48 kHz (minimum 44.1 kHz). | yes |
| Number of Channels | Mono or stereo (2.0). Avoid multichannel (5.1 / 7.1) unless explicitly required. | yes |
| Loudness Normalization | Target: -23 LUFS (for consistent speech/sound detection). | Optional |
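
The audio criteria can be checked in the same way as the video criteria. The sketch below assumes an audio stream entry from `ffprobe -show_streams -of json`, which reports `sample_rate` and `channels`. It also builds a filter string for FFmpeg's `loudnorm` filter, whose `I` option sets the integrated loudness target. The function names are hypothetical, and using FFmpeg for normalization is an assumption rather than a platform requirement.

```python
from typing import Dict, List

MIN_SAMPLE_RATE_HZ = 44_100
MAX_CHANNELS = 2  # mono or stereo


def check_audio_stream(stream: Dict) -> List[str]:
    """Return a list of problems found in one ffprobe-style audio stream entry."""
    problems = []
    if int(stream.get("sample_rate", 0)) < MIN_SAMPLE_RATE_HZ:
        problems.append("sampling rate below 44.1 kHz")
    if int(stream.get("channels", 0)) > MAX_CHANNELS:
        problems.append("more than two channels")
    return problems


def loudnorm_filter(target_lufs: float = -23.0) -> str:
    """Build an ffmpeg audio filter string targeting the given integrated loudness."""
    return f"loudnorm=I={target_lufs:g}"


if __name__ == "__main__":
    print(check_audio_stream({"sample_rate": "48000", "channels": 2}))
    print(loudnorm_filter())
```

The filter string would be passed to FFmpeg after the `-af` option when re-encoding the audio track.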