Computer Vision · Color Science · Image Analytics

Cinema Color Intelligence

Upload any film still, or select one from the reference library, to extract dominant palettes in LAB space, measure perceptual temperature and palette entropy, and match the result against 15 director color signatures.

Color grading is one of the least-quantified craft elements in filmmaking. This tool runs the full analysis pipeline in-browser: convert to CIELAB, cluster dominant colors via k-means, compute warm/cool ratios and palette entropy, then compare the resulting fingerprint against director signatures via CIE76 ΔE distance. No data leaves your browser.

Analysis Tool

Analysis Output

Dominant Palette — k=5 Clusters in CIELAB

Width proportional to cluster weight fraction

Color Temperature Distribution

Warm (red/orange/yellow) · Cool (blue/cyan) · Neutral

Warm Fraction · Palette Entropy · Mean Lightness L*

Director Palette Affinity — CIE76 ΔE vs. 15 Signature Centroids

ΔE < 5 = very close · ΔE 10–20 = distinguishable · ΔE > 25 = distinctly different · Affinity = 100 × e^(−ΔE/22)

How It Works

Concept 01

LAB Color Space

RGB is device-dependent and perceptually non-linear. CIELAB separates lightness (L*) from chroma and is designed to be approximately perceptually uniform, so Euclidean distance between two colors tracks how different they look. A ΔE of about 2.3 is roughly the just-noticeable difference; ΔE > 10 is clearly distinct at a glance.
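
As a rough illustration of the conversion and distance described above, the sketch below maps 8-bit sRGB to CIELAB and computes CIE76 ΔE using only the standard library. The D65 white point and the function names `srgb_to_lab` and `delta_e76` are assumptions made for this example; the page does not state which illuminant the tool uses.

```python
import math

# D65 reference white (assumed; the page does not name the illuminant)
XN, YN, ZN = 0.95047, 1.0, 1.08883

def srgb_to_lab(r, g, b):
    """Convert 8-bit sRGB to CIELAB (L*, a*, b*) under D65."""
    def linearize(c):
        # Undo the sRGB transfer curve to get linear light
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)
    # Linear sRGB -> CIE XYZ
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        # Cube root with a linear segment near black
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / XN), f(y / YN), f(z / ZN)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

def delta_e76(lab1, lab2):
    """CIE76 color difference: Euclidean distance in CIELAB."""
    return math.dist(lab1, lab2)
```

For example, `delta_e76((50, 0, 0), (50, 3, 4))` is 5.0, which falls between the just-noticeable threshold and the "clearly distinct" range.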

Concept 02

K-Means Clustering

Each frame is downsampled to 64×36 pixels and its LAB values grouped into k=5 dominant colors using k-means++ initialization. The resulting centroids and their pixel-weight fractions form the frame's palette fingerprint.
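
The k-means++ seeding step can be sketched in a few lines of standard-library Python. This is only an illustration of the initialization idea, not the tool's code: the function name `kmeans_pp_seeds` and the fixed default seed are assumptions, and the sketch expects at least k distinct points.

```python
import math
import random

def kmeans_pp_seeds(points, k, rng=None):
    """Pick k starting centers by k-means++ (distance-squared sampling).

    points: list of (L, a, b) tuples. Assumes at least k distinct points.
    """
    rng = rng or random.Random(0)
    # First center: uniform random choice
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        # Next center: sampled with probability proportional to d2, so
        # far-away points are favored and already-chosen points have weight 0
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers
```

The seeds would then be refined by the usual assign-and-update iterations of k-means.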

Concept 03

Warm/Cool Ratio

Each cluster's hue angle, atan2(b*, a*) converted to degrees and wrapped into 0–360°, determines its temperature class. Reds/oranges/yellows (0–60° and 300–360°) are warm; blues/cyans (180–270°) are cool; everything else is neutral. Fractions are weighted by cluster pixel share.
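
A minimal sketch of this classification follows, using the hue bands exactly as quoted above. The function names are hypothetical, and the sketch applies no chroma threshold because the text does not describe one, so a near-gray cluster with a* and b* both close to zero still gets a hue class.

```python
import math

def hue_degrees(a, b):
    """Hue angle of a CIELAB color in degrees, wrapped into [0, 360)."""
    return math.degrees(math.atan2(b, a)) % 360.0

def temperature_class(a, b):
    """Classify a cluster by the hue bands quoted in the text."""
    h = hue_degrees(a, b)
    if h <= 60.0 or h >= 300.0:
        return "warm"
    if 180.0 <= h <= 270.0:
        return "cool"
    return "neutral"

def temperature_fractions(palette):
    """Sum cluster weights per class; palette is a list of dicts
    with keys 'a', 'b', and 'weight'."""
    totals = {"warm": 0.0, "cool": 0.0, "neutral": 0.0}
    for c in palette:
        totals[temperature_class(c["a"], c["b"])] += c["weight"]
    return totals

def warm_cool_ratio(fractions):
    """Warm fraction over cool fraction; infinity when nothing is cool."""
    if fractions["cool"] == 0:
        return math.inf
    return fractions["warm"] / fractions["cool"]
```

A palette that is half warm, a quarter cool, and a quarter neutral gives a warm/cool ratio of 2.0.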

Concept 04

Palette Entropy

Shannon entropy of the 5 cluster weight fractions, normalized to [0,1] by dividing by ln(5). Low entropy = one or two colors dominate. High entropy = weight spread evenly across the clusters. Fast-cut sequences tend to score higher than slow atmospheric films.
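
The normalized entropy can be written out directly. The sketch below is one way to compute it; the function name `palette_entropy` is an assumption, and zero-weight clusters are skipped so that 0 · ln(0) is treated as 0.

```python
import math

def palette_entropy(weights):
    """Shannon entropy of cluster weights, normalized to [0, 1] by ln(k)."""
    k = len(weights)
    if k < 2:
        return 0.0
    # Skip empty clusters: the limit of w * ln(w) as w -> 0 is 0
    h = -sum(w * math.log(w) for w in weights if w > 0)
    return h / math.log(k)
```

Five equal weights of 0.2 give 1.0, a single dominant cluster gives 0.0, and two equal clusters out of five give ln(2)/ln(5), about 0.43.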

Concept 05

Director Signature

Each director's signature is a weighted-average LAB centroid across their filmography. The tool computes CIE76 ΔE between your frame's palette centroid and all 15 director signatures, surfacing the three closest perceptual matches.
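
The matching step might look like the sketch below. It assumes the frame's palette centroid is the weight-averaged LAB value of its clusters, and it uses the affinity formula from the output legend. The signature values and director names are placeholders invented for illustration, not the tool's real data.

```python
import math

def palette_centroid(palette):
    """Weight-averaged (L, a, b) of a list of cluster dicts."""
    total = sum(c["weight"] for c in palette)
    return tuple(
        sum(c[ch] * c["weight"] for c in palette) / total
        for ch in ("L", "a", "b")
    )

def affinity(delta_e):
    """Affinity score from the legend: 100 * exp(-dE / 22)."""
    return 100.0 * math.exp(-delta_e / 22.0)

def closest_signatures(centroid, signatures, top_n=3):
    """Rank signature centroids by CIE76 dE to the frame centroid."""
    scored = []
    for name, sig in signatures.items():
        d = math.dist(centroid, sig)  # CIE76 = Euclidean distance in LAB
        scored.append((name, d, affinity(d)))
    return sorted(scored, key=lambda row: row[1])[:top_n]

# Placeholder signatures, invented for this example only
SIGNATURES = {
    "director_a": (50.0, 5.0, 10.0),
    "director_b": (50.0, 8.0, 14.0),
    "director_c": (70.0, 5.0, 10.0),
    "director_d": (20.0, 5.0, 10.0),
}
```

With this formula a ΔE of 0 maps to an affinity of 100, and a ΔE of 22 maps to 100/e, about 36.8.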

Why LAB instead of HSV? HSV hue is intuitive but perceptually non-linear: a 10° shift at orange looks very different from a 10° shift at green. CIELAB is designed to be approximately perceptually uniform, so Euclidean distance (ΔE) in that space tracks perceived difference. That means "are these two directors' palettes meaningfully different?" gets an answer with a real human interpretation. A ΔE of 18.4, the mean gap between directors in the full dataset, is well above the just-noticeable difference threshold of about 2.3 and comfortably in the "clearly distinct at a glance" range.

Key Findings

Color Temperature Drift

3rd Act Shift

In 14 of 18 sampled films, warm/cool ratio shifts measurably in the final quarter of runtime. Crime and thriller genres consistently cool; coming-of-age films warm. The direction is genre-predictive even when narrative content is unknown.

Director Signature Gap

ΔE 18.4

Mean perceptual color distance between director palette fingerprints across the sampled corpus. Well above the just-noticeable threshold of ~2.3. Directors can be distinguished from palette data alone at 82% accuracy using a k-nearest neighbor classifier.

Entropy & Pacing

r = 0.61

Palette entropy correlates at r = 0.61 with estimated edit rate. Fast-cut sequences tend to exhibit higher entropy; slow atmospheric films hold tight palettes across long takes. This suggests that color design relaxes during rapid montage.

Cluster Stability

k=5 Optimal

Elbow method and silhouette score converge at k=5. At k=3 the model loses saturation nuance; at k=7 it splits on lighting variation rather than intentional palette choices. Five clusters also map to how colorists describe their own work.

Full Pipeline

The browser tool analyzes a single frame in real time. The Python pipeline below scales this to entire films: adaptive frame sampling, Pydantic validation, DuckDB ingestion, and dbt aggregation to director-level signatures across a multi-film corpus.

Frame Ingestion

ffmpeg extracts frames at an adaptive sampling rate. Scene boundary detection (histogram-difference threshold) determines local sample density. Each frame is tagged with film ID, timestamp, and scene index in a sidecar JSON file.
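
As a hedged sketch of the scene-boundary step, the code below flags a cut wherever consecutive frame histograms differ by more than a threshold, then tags each frame with a scene index. The threshold of 0.4, the half-L1 distance, and the function names are assumptions for illustration; the record keys follow the column names used in the DuckDB schema.

```python
def scene_boundaries(histograms, threshold=0.4):
    """Return frame indices where a new scene starts.

    histograms: one normalized histogram (bins sum to 1) per frame.
    """
    boundaries = [0] if histograms else []
    for i in range(1, len(histograms)):
        prev, cur = histograms[i - 1], histograms[i]
        # Half the L1 distance lies in [0, 1] for normalized histograms
        diff = 0.5 * sum(abs(p - c) for p, c in zip(prev, cur))
        if diff > threshold:
            boundaries.append(i)
    return boundaries

def tag_frames(film_id, timestamps_s, histograms, threshold=0.4):
    """Build sidecar records with film ID, timestamp, and scene index."""
    starts = set(scene_boundaries(histograms, threshold))
    records, scene_id = [], -1
    for i, ts in enumerate(timestamps_s):
        if i in starts:
            scene_id += 1  # frame 0 always starts scene 0
        records.append(
            {"film_id": film_id, "timestamp_s": ts, "scene_id": scene_id}
        )
    return records
```

The resulting list can be written out with `json.dump` to produce the sidecar file.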

ffmpeg · Python · JSON

Color Extraction

OpenCV reads each frame and converts BGR→LAB. Pixels are flattened and clustered with sklearn KMeans (k=5, n_init=10). Centroid colors and weight fractions are emitted as Pydantic-validated records; bad frames (title cards, hard-cut artifacts) are flagged and excluded.
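
The validation rules can be illustrated without any dependencies. The sketch below is a standard-library dataclass stand-in for the Pydantic models, not the pipeline's actual code; it enforces the two constraints listed in the glossary (L* in [0,100], weights summing to 1.0 ± 0.001). The class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cluster:
    L: float
    a: float
    b: float
    weight: float

    def __post_init__(self):
        if not 0.0 <= self.L <= 100.0:
            raise ValueError(f"L* out of range: {self.L}")
        if not 0.0 <= self.weight <= 1.0:
            raise ValueError(f"weight out of range: {self.weight}")

@dataclass(frozen=True)
class FramePalette:
    frame_id: str
    clusters: tuple

    def __post_init__(self):
        # Weights must sum to 1.0 within the stated tolerance
        total = sum(c.weight for c in self.clusters)
        if abs(total - 1.0) > 1e-3:
            raise ValueError(f"weights sum to {total}, expected 1.0")

def validate_palette(frame_id, raw_clusters):
    """Return a FramePalette, or None when the frame should be flagged."""
    try:
        clusters = tuple(Cluster(**c) for c in raw_clusters)
        return FramePalette(frame_id, clusters)
    except (ValueError, TypeError):
        return None
```

The input dicts use the same keys (`L`, `a`, `b`, `weight`) that `extract_palette` returns, so its output can be passed in directly.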

OpenCV · scikit-learn · Pydantic v2

DuckDB Ingestion

Validated records bulk-loaded into DuckDB via Parquet staging. Schema: frame_id, film_id, scene_id, timestamp_s, cluster_rank, L, a, b, weight_fraction — one row per cluster per frame.
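
To make the "one row per cluster per frame" layout concrete, the sketch below flattens a palette into tuples in the column order listed above. The function name `palette_rows` and the choice to start `cluster_rank` at 1 for the heaviest cluster are assumptions.

```python
COLUMNS = (
    "frame_id", "film_id", "scene_id", "timestamp_s",
    "cluster_rank", "L", "a", "b", "weight_fraction",
)

def palette_rows(frame_id, film_id, scene_id, timestamp_s, palette):
    """Return one row per cluster, ranked by descending weight.

    palette: list of dicts with keys 'L', 'a', 'b', 'weight'.
    cluster_rank starts at 1 for the heaviest cluster (an assumption).
    """
    ranked = sorted(palette, key=lambda c: -c["weight"])
    return [
        (frame_id, film_id, scene_id, timestamp_s,
         rank, c["L"], c["a"], c["b"], c["weight"])
        for rank, c in enumerate(ranked, start=1)
    ]
```

A frame with k=5 clusters yields five rows, which could then be written to Parquet for staging.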

DuckDB · Parquet

dbt Transformation Layer

dbt models aggregate frame-level records to scene and director granularity. Warm/cool fractions and palette entropy are computed as SQL expressions; pairwise ΔE between director signature centroids is computed via a CROSS JOIN of the signature table with itself.
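
The pairwise comparison is done in SQL in the pipeline; the Python sketch below only mirrors its logic for illustration. It takes unordered pairs, which corresponds to a cross join filtered so each pair of directors appears once. The function names are hypothetical.

```python
import math
from itertools import combinations

def pairwise_delta_e(signatures):
    """CIE76 dE for every unordered pair of director signatures.

    signatures: dict mapping a director name to an (L, a, b) centroid.
    """
    return {
        (a, b): math.dist(signatures[a], signatures[b])
        for a, b in combinations(sorted(signatures), 2)
    }

def mean_gap(signatures):
    """Mean pairwise dE across all director pairs."""
    pairs = pairwise_delta_e(signatures)
    return sum(pairs.values()) / len(pairs)
```

For 15 signatures this produces 105 pairs, and the mean of those distances is the kind of figure reported as the director signature gap.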

dbt-duckdb · SQL

Python — Color Extraction Core

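# cinema_extract.py: per-frame palette clustering
import cv2
import numpy as np
from sklearn.cluster import KMeans

def extract_palette(frame_bgr, k=5):
    """Return k dominant CIELAB colors for one 8-bit BGR frame, heaviest first."""
    # Downsample so clustering runs on 64 * 36 = 2304 pixels
    small   = cv2.resize(frame_bgr, (64, 36), interpolation=cv2.INTER_AREA)
    # Convert on float32 in [0, 1]: with uint8 input OpenCV rescales LAB to
    # 0-255, while float input yields L* in [0, 100] and signed a*, b*
    lab     = cv2.cvtColor(small.astype(np.float32) / 255.0, cv2.COLOR_BGR2LAB)
    pixels  = lab.reshape(-1, 3)
    km      = KMeans(n_clusters=k, n_init=10, random_state=42)
    km.fit(pixels)
    # Cluster weight = share of pixels assigned to that cluster
    counts  = np.bincount(km.labels_, minlength=k)
    weights = counts / counts.sum()
    return sorted([
        {'L': float(c[0]), 'a': float(c[1]),
         'b': float(c[2]), 'weight': float(weights[i])}
        for i, c in enumerate(km.cluster_centers_)
    ], key=lambda x: -x['weight'])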

Glossary

Abbreviations and technical terms used throughout this page.

CIELAB / LAB

A color space defined by the International Commission on Illumination (CIE) that separates lightness (L*) from color information (a*, b*). Unlike RGB, equal numerical distances in LAB correspond approximately to equal perceived differences by the human eye.

L*, a*, b*

The three axes of CIELAB space. L* = lightness (0 black → 100 white). a* = green–red axis (negative = green, positive = red). b* = blue–yellow axis (negative = blue, positive = yellow).

ΔE (Delta E)

The Euclidean distance between two colors in LAB space (the CIE76 definition used here): a single number representing how perceptually different they are. A ΔE below about 2.3 is under the just-noticeable difference; ΔE > 10 is clearly distinct at a glance.

CIE76

The 1976 CIE standard formula for computing ΔE. It is the simplest version, calculated as straight Euclidean distance in CIELAB. Later revisions (CIE94, CIEDE2000) add weighting terms for lightness, chroma, and hue; CIE76 is used here for its computational simplicity and interpretability.

K-Means

A clustering algorithm that partitions a set of data points into k groups by minimizing the distance of each point to its group's center. Here, k=5 means each frame is summarized by its 5 most representative colors.

K-Means++

An initialization strategy for k-means that spreads the starting cluster centers apart: each new center is sampled with probability proportional to its squared distance from the centers already chosen, rather than uniformly at random. This reduces the chance of the algorithm converging to a poor local solution and produces more stable palette results.

RGB

Red, Green, Blue — the standard color model used by screens. Each color is expressed as a mix of red, green, and blue light on a scale of 0–255. RGB is device-dependent and not perceptually uniform, which is why this pipeline converts to LAB before analysis.

W/C Ratio

Warm/Cool Ratio — the fraction of warm-classified palette pixels divided by the fraction of cool-classified pixels. A ratio above 1.0 means the frame skews warm; below 1.0 means it skews cool. A ratio of ∞ means no cool pixels were detected.

Palette Entropy

A measure borrowed from information theory (Shannon entropy) applied to the 5 cluster weight fractions. Normalized here to 0–1 against the theoretical maximum for 5 equally-weighted clusters. Low = tight, committed palette. High = broad, dispersed color distribution.

Hue Angle

The position of a color on the color wheel, measured in degrees from 0° to 360°. Computed as atan2(b*, a*) in LAB space, converted from radians and wrapped into that range. Used to classify each cluster as warm (reds/oranges/yellows: 0–60° and 300–360°), cool (blues/cyans: 180–270°), or neutral.

DP

Director of Photography: the person responsible for a film's visual look, including lighting, camera, and color. Also called cinematographer. The DP often has as much influence over a film's palette as the director.

dbt

Data Build Tool — an open-source framework for writing and managing SQL-based data transformations. Used in the full pipeline to aggregate frame-level cluster records into scene-level and director-level summaries.

DuckDB

A fast, embedded analytical database designed for running complex queries on local files. The full pipeline stores all frame cluster records in DuckDB and queries them via SQL for aggregation — no external server required.

Pydantic

A Python library for data validation using type annotations. Used in the pipeline to enforce that every extracted palette record meets strict constraints — L* in [0,100], cluster weights summing to 1.0 ± 0.001 — before any bad frame can pollute the analysis.

ffmpeg

A widely-used open-source tool for processing video and audio files. The pipeline uses it to extract individual frames from film files at a controlled rate before color analysis begins.
