Cinema Color Intelligence
Extracting palette fingerprints from film frames and surfacing measurable aesthetic signatures by director, genre, and narrative arc.
Color grading is one of the least-quantified craft elements in filmmaking. Cinematographers and directors spend enormous effort building distinctive visual languages, but evaluating those decisions has always been subjective. This pipeline makes it empirical: sample frames from each film, convert them to an approximately perceptually uniform color space, cluster the dominant palettes, and aggregate the results into per-film and per-director fingerprints that can be compared directly.
The result is a dataset of color behavior across films, with enough structure to ask questions like: does this director consistently shift toward cooler tones during third-act tension? Does warm palette density predict critic reception in a given genre? Where in a film does color entropy peak?
Starting from any video file, the pipeline runs fully automated. Frames are extracted at a rate calibrated to scene-change frequency, not wall-clock time, so action-heavy cuts get proportionally more samples than static establishing shots.
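One way to picture scene-aware sampling is sketched below. It is a minimal illustration, not the pipeline's actual extractor: it assumes a per-frame dissimilarity score is already available (for example a normalized histogram distance between consecutive frames), and the names `detect_cuts`, `sample_indices`, `threshold`, and `per_shot` are hypothetical.

```python
from typing import List


def detect_cuts(diff_scores: List[float], threshold: float = 0.5) -> List[int]:
    """Return the frame indices at which a new shot starts.

    diff_scores[i] is an assumed dissimilarity between frame i and frame i + 1.
    Frame 0 always starts the first shot.
    """
    cuts = [0]
    for i, score in enumerate(diff_scores):
        if score > threshold:
            cuts.append(i + 1)
    return cuts


def sample_indices(n_frames: int, cuts: List[int], per_shot: int = 3) -> List[int]:
    """Pick up to `per_shot` evenly spaced frame indices inside every shot."""
    bounds = cuts + [n_frames]
    picked: List[int] = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        length = end - start
        n = min(per_shot, length)
        # Take the midpoint of each of n equal slices of the shot.
        picked.extend(start + (2 * j + 1) * length // (2 * n) for j in range(n))
    return picked
```

Because every shot receives the same sample budget regardless of its duration, a run of short shots contributes more samples per second of footage than one long static shot.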
Why LAB instead of HSV? HSV hue angle is intuitive but perceptually non-linear: a ΔH of 10° corresponds to a very different amount of perceived change at green than at orange. CIELAB was designed to be approximately perceptually uniform, so the Euclidean distance between two colors in that space (ΔE) roughly tracks how different they look. That matters when you ask "are these two directors' palettes meaningfully different?" and want an answer with a real perceptual interpretation rather than an arbitrary number.
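As a concrete illustration of that distance, the sketch below computes the simplest ΔE variant (CIE76), which is the plain Euclidean distance between two (L, a, b) triples. The function name `delta_e76` is an assumption for this example; the source does not say which ΔE formula the pipeline uses, and refinements such as CIEDE2000 exist for cases where CIE76 is not uniform enough.

```python
import math
from typing import Sequence


def delta_e76(lab1: Sequence[float], lab2: Sequence[float]) -> float:
    """CIE76 color difference: Euclidean distance between two (L, a, b) triples."""
    return math.dist(lab1, lab2)


# Same lightness, small shift in a and b: sqrt(3^2 + 4^2) = 5
small_shift = delta_e76((50.0, 10.0, 10.0), (50.0, 13.0, 14.0))

# Same chromaticity, 20 units of lightness apart: 20
lightness_gap = delta_e76((60.0, 20.0, 30.0), (40.0, 20.0, 30.0))
```

A commonly quoted rule of thumb puts a just-noticeable difference at a ΔE of roughly 2.3, so both of these example pairs would read as visibly different colors.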
# cinema_extract.py: per-frame palette clustering
import cv2
import numpy as np
from pydantic import BaseModel, validator
from sklearn.cluster import KMeans


class PaletteRecord(BaseModel):
    film_id: str
    scene_id: int
    frame_ts: float
    centroids: list[dict]  # [{L, a, b, weight}, ...]

    # pydantic v1-style validator; pydantic v2 spells this `field_validator`
    @validator('centroids')
    def weights_sum_to_one(cls, v):
        total = sum(c['weight'] for c in v)
        if abs(total - 1.0) > 0.001:
            raise ValueError(f'Weights sum to {total:.4f}, expected 1.0')
        return v


def extract_palette(frame_bgr: np.ndarray, k: int = 5) -> list[dict]:
    # Downsample to 64x36 before clustering to keep KMeans cheap
    small = cv2.resize(frame_bgr, (64, 36))
    # Assumes 8-bit BGR input. Scale to float32 in [0, 1] before converting:
    # on uint8 input OpenCV rescales L to 0-255 and offsets a/b by 128,
    # which would distort ΔE distances and hue angles downstream.
    lab = cv2.cvtColor(small.astype(np.float32) / 255.0, cv2.COLOR_BGR2LAB)
    pixels = lab.reshape(-1, 3)
    km = KMeans(n_clusters=k, n_init=10, tol=1e-4, random_state=42)
    km.fit(pixels)
    # Cluster weight = share of pixels assigned to that cluster
    counts = np.bincount(km.labels_, minlength=k)
    weights = counts / counts.sum()
    # One dict per cluster, heaviest cluster first
    return sorted(
        [
            {'L': float(c[0]), 'a': float(c[1]), 'b': float(c[2]),
             'weight': float(weights[i])}
            for i, c in enumerate(km.cluster_centers_)
        ],
        key=lambda x: -x['weight'],
    )
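-- mart/director_palette_signature.sql
-- Assumes frame_clusters holds one row per (frame, cluster) and that
-- weight_fraction sums to 1 within each frame.
WITH cluster_hue AS (
    SELECT
        film_id,
        scene_id,
        weight_fraction,
        -- LAB hue angle in degrees, normalized to [0, 360) with a CASE
        -- because not every SQL dialect supports % on floating-point values
        CASE
            WHEN ATAN2(b, a) < 0 THEN ATAN2(b, a) * 180.0 / PI() + 360
            ELSE ATAN2(b, a) * 180.0 / PI()
        END AS hue_deg
    FROM frame_clusters
),
scene_agg AS (
    SELECT
        film_id,
        scene_id,
        -- Dividing by SUM(weight_fraction) (= number of frames) gives the mean
        -- per-frame share; a plain AVG over cluster rows would also divide by k.
        -- warm: hue angle 0-60 plus 300-360
        SUM(CASE WHEN hue_deg BETWEEN 0 AND 60 OR hue_deg > 300
                 THEN weight_fraction ELSE 0 END)
            / SUM(weight_fraction) AS warm_frac,
        -- cool: hue angle 180-270
        SUM(CASE WHEN hue_deg BETWEEN 180 AND 270
                 THEN weight_fraction ELSE 0 END)
            / SUM(weight_fraction) AS cool_frac,
        -- mean per-frame Shannon entropy of the cluster weights
        -SUM(weight_fraction * LN(weight_fraction + 1e-9))
            / SUM(weight_fraction) AS palette_entropy
    FROM cluster_hue
    GROUP BY film_id, scene_id
)
SELECT
    d.director_id,
    AVG(s.warm_frac) AS mean_warm,
    AVG(s.cool_frac) AS mean_cool,
    1 - AVG(s.warm_frac) - AVG(s.cool_frac) AS mean_neutral,
    AVG(s.palette_entropy) AS mean_entropy
FROM scene_agg s
JOIN film_metadata f ON s.film_id = f.film_id
JOIN director_lookup d ON f.director_id = d.director_id
GROUP BY d.director_id
ORDER BY mean_warm DESC;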
Film selection. The 18-film corpus was chosen to span genre (crime, drama, sci-fi, coming-of-age), decade (1985 to 2022), and known cinematographic style. No studio was overrepresented by more than 3 films. Directors with fewer than 2 films in the corpus were excluded from the director comparison to avoid single-film noise.
Exclusion criteria. Frames flagged during Pydantic validation include opening title cards (predominantly black or white), hard-cut artifact frames (motion blur exceeding a perceptual threshold), and credits sequences. Approximately 3.1% of extracted frames were excluded.
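The title-card rule can be sketched as a simple check on a frame's palette. This is only an illustration of the idea: the function name `is_title_card` and the thresholds `dark_l`, `light_l`, and `min_share` are hypothetical, not the values the pipeline uses, and the blur and credits checks are not covered here.

```python
from typing import Dict, List


def is_title_card(
    centroids: List[Dict[str, float]],
    dark_l: float = 10.0,
    light_l: float = 90.0,
    min_share: float = 0.9,
) -> bool:
    """Flag a palette that is predominantly near-black or near-white.

    centroids: dicts with at least 'L' (0-100) and 'weight' (sums to 1).
    """
    dark = sum(c['weight'] for c in centroids if c['L'] <= dark_l)
    light = sum(c['weight'] for c in centroids if c['L'] >= light_l)
    return max(dark, light) >= min_share


black_card = [
    {'L': 2.0, 'a': 0.0, 'b': 0.0, 'weight': 0.95},
    {'L': 60.0, 'a': 5.0, 'b': 5.0, 'weight': 0.05},
]
night_scene = [
    {'L': 5.0, 'a': 0.0, 'b': -3.0, 'weight': 0.6},
    {'L': 45.0, 'a': 10.0, 'b': 20.0, 'weight': 0.4},
]
```

With these placeholder thresholds the black card is flagged and the dark but populated night scene is kept.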
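Warm/cool classification. The warm window (hue angle 0° to 60° plus 300° to 360° in the query) and the 90° cool window (180° to 270°) are approximations drawn from standard color theory, and they have two known limits. First, they follow an HSV-style hue wheel but are applied to the LAB hue angle atan2(b, a), where saturated yellow sits near 100° and saturated blue near 300°, so the boundaries are only a rough fit. Second, they don't account for saturation: a low-saturation orange near neutral gray classifies as warm even though it reads as gray. A future iteration would weight classification by chroma (C* in LAB) so desaturated pixels contribute less to temperature tallies.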
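The chroma-weighting idea could look something like the sketch below. It keeps the same hue windows as the query and scales each cluster's weight by its chroma before tallying. The function name `temperature_fractions` and the `chroma_ref` cutoff are assumptions made for illustration, not part of the existing pipeline.

```python
import math
from typing import Dict, List


def temperature_fractions(
    centroids: List[Dict[str, float]], chroma_ref: float = 40.0
) -> Dict[str, float]:
    """Warm/cool/neutral shares with each cluster discounted by low chroma.

    centroids: dicts with 'a', 'b', 'weight' (weights assumed to sum to 1).
    chroma_ref: hypothetical chroma at or above which a cluster counts fully.
    """
    warm = cool = 0.0
    for c in centroids:
        chroma = math.hypot(c['a'], c['b'])  # C* = sqrt(a^2 + b^2)
        hue = math.degrees(math.atan2(c['b'], c['a'])) % 360.0
        w = c['weight'] * min(chroma / chroma_ref, 1.0)
        if hue <= 60.0 or hue > 300.0:
            warm += w
        elif 180.0 <= hue <= 270.0:
            cool += w
    # Whatever was discounted or fell outside both windows counts as neutral.
    return {'warm': warm, 'cool': cool, 'neutral': 1.0 - warm - cool}


palette = [
    {'L': 50.0, 'a': 40.0, 'b': 30.0, 'weight': 0.5},    # saturated, warm hue
    {'L': 50.0, 'a': 3.0, 'b': 4.0, 'weight': 0.3},      # near-gray, warm hue
    {'L': 50.0, 'a': -30.0, 'b': -40.0, 'weight': 0.2},  # saturated, cool hue
]
fractions = temperature_fractions(palette)
```

In this example the near-gray cluster has a chroma of 5, so it contributes only 0.3 × 5/40 = 0.0375 to the warm tally instead of its full 0.3, and the remainder is counted as neutral.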
ΔE director gap. The reported mean ΔE of 18.4 is computed pairwise between all director signature centroids (mean palette across the full filmography), then averaged. Pairwise variance is high; the most stylistically similar director pair in the corpus had ΔE ~8.1, the most distinct pair ~31.6.
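The pairwise averaging described above can be sketched as follows, assuming each director signature has already been reduced to a single (L, a, b) centroid and that ΔE is the plain Euclidean (CIE76) distance. The function names and the three signatures are made-up placeholders for illustration, not values from the corpus.

```python
import math
from itertools import combinations
from typing import Dict, Tuple

Lab = Tuple[float, float, float]


def pairwise_delta_e(signatures: Dict[str, Lab]) -> Dict[Tuple[str, str], float]:
    """ΔE (Euclidean distance in LAB) for every unordered pair of directors."""
    return {
        (d1, d2): math.dist(signatures[d1], signatures[d2])
        for d1, d2 in combinations(sorted(signatures), 2)
    }


def summarize(gaps: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Mean, smallest, and largest pairwise gap."""
    values = list(gaps.values())
    return {'mean': sum(values) / len(values), 'min': min(values), 'max': max(values)}


# Placeholder centroids, chosen only so the arithmetic is easy to follow.
signatures = {
    'director_a': (50.0, 0.0, 0.0),
    'director_b': (50.0, 3.0, 4.0),
    'director_c': (50.0, 6.0, 8.0),
}
gaps = pairwise_delta_e(signatures)
summary = summarize(gaps)
```

Reporting the minimum and maximum alongside the mean keeps the high pairwise variance visible instead of hiding it behind a single average.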