11.2.3 Face Detection#

Duration: 35-40 minutes
Level: Intermediate
Prerequisites: Module 3.3.5 (Delaunay Triangulation)

Overview#

Face detection is one of the most compelling applications of computer vision, enabling creative interactions between humans and digital systems. In this exercise, you will use MediaPipe’s Face Mesh to detect 478 facial landmarks and transform them into striking low-poly portrait art using Delaunay triangulation.

This exercise connects directly to Module 3.3.5 Delaunay Triangulation, applying those geometric concepts to a new creative context. By combining face detection with triangulation, you will create a pipeline that transforms any face photograph into stylized geometric art, demonstrating how foundational algorithms transfer to advanced applications.

Learning Objectives#

By the end of this exercise, you will be able to:

Distinguish between face detection (locating faces) and face recognition (identifying who)
Use MediaPipe Face Mesh to detect 478 facial landmarks in real time
Apply Delaunay triangulation to facial landmarks for low-poly art generation
Create interactive face-driven generative effects using webcam input

Quick Start: See It In Action#

Run this code to transform a face photo into low-poly art:

Generate low-poly face art in about 20 lines#

import cv2
import numpy as np
from scipy.spatial import Delaunay
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# Load image and detect face landmarks
image = cv2.cvtColor(cv2.imread("sample_face.jpg"), cv2.COLOR_BGR2RGB)
mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=image)
detector = vision.FaceLandmarker.create_from_options(
    vision.FaceLandmarkerOptions(base_options=python.BaseOptions(
        model_asset_path="face_landmarker.task"), num_faces=1))
landmarks = np.array([[lm.x * image.shape[1], lm.y * image.shape[0]]
                      for lm in detector.detect(mp_image).face_landmarks[0]])

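# Triangulate and render
tri = Delaunay(landmarks)
output = np.zeros_like(image)
height, width = image.shape[:2]
for simplex in tri.simplices:
    pts = landmarks[simplex].astype(np.int32)
    # Sample the color at the centroid, clipped to valid pixel indices
    cx, cy = pts.mean(axis=0).astype(int)
    color = image[np.clip(cy, 0, height - 1), np.clip(cx, 0, width - 1)].tolist()
    cv2.fillPoly(output, [pts], color)

# Convert back to BGR before saving with OpenCV (output filename is an example)
cv2.imwrite("lowpoly_face.jpg", cv2.cvtColor(output, cv2.COLOR_RGB2BGR))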

The pipeline follows three key steps: detect facial landmarks, triangulate the points, and fill each triangle with sampled colors from the original image. The result is a geometric abstraction that preserves the essential features of the face while creating a distinctive artistic effect.

Side-by-side comparison of original portrait photograph and low-poly triangulated version — Original portrait transformed into low-poly art using 478 facial landmarks and Delaunay triangulation.#

Note

The sample face image used in this exercise is from Unsplash (CC0 license). Download: https://images.unsplash.com/photo-1507003211169-0a1dd7228f2d?w=800&q=80

Core Concepts#

Concept 1: Face Detection vs. Face Recognition#

Face detection and face recognition are often confused, but they serve fundamentally different purposes [ViolaJones2004]:

Face Detection: Locates faces in an image and provides their bounding boxes or landmark positions. It answers “Where are the faces?” without identifying who they belong to.
Face Recognition: Identifies specific individuals by comparing detected faces against a database. It answers “Who is this person?”

This exercise focuses exclusively on face detection, specifically using facial landmark detection to find 478 precise points on a face. Because the landmarks describe geometry rather than identity, they enable creative applications without identifying anyone.

The evolution of face detection algorithms reflects decades of computer vision research:

1990s-2000s: Haar Cascade classifiers using hand-crafted features [ViolaJones2004]
2010s: Deep neural network approaches with improved accuracy
2020s: MediaPipe and similar frameworks enabling real-time detection on mobile devices [MediaPipe2019] [BlazeFace2019]

Did You Know?

The Viola-Jones algorithm (2001) was revolutionary because it could detect faces in 384×288 pixel images at 15 frames per second on a 700 MHz processor, using a cascade of simple classifiers to quickly reject non-face regions [ViolaJones2004]. This was the technology behind early digital cameras’ face detection features.

Concept 2: The MediaPipe Face Mesh#

MediaPipe Face Mesh provides 478 three-dimensional landmarks covering the entire face surface [MediaPipe2019]. Unlike simple bounding box detection, these landmarks capture detailed facial geometry including:

Face Oval: 36 points defining the jawline and face contour
Eyes: 128 points covering eyelids, corners, and surrounding area
Eyebrows: 44 points tracing the shape of both eyebrows
Nose: 40 points from bridge to nostrils
Lips: 40 points for inner and outer lip contours
Irises: 10 points (5 per eye) for precise eye tracking
Face Surface: 180 additional points covering cheeks and forehead

Schematic diagram showing color-coded facial regions with labeled landmark groups — MediaPipe Face Mesh regions. Each color represents a different facial feature group, with 478 total landmarks providing detailed surface coverage. Diagram generated with Claude - Opus 4.5.#

The MediaPipe Tasks API [MediaPipeDocs] provides a simple interface for detection:

Detecting facial landmarks#

import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# Configure and create detector
base_options = python.BaseOptions(model_asset_path="face_landmarker.task")
options = vision.FaceLandmarkerOptions(
    base_options=base_options,
    num_faces=1  # Detect up to 1 face
)
detector = vision.FaceLandmarker.create_from_options(options)

# Detect landmarks in an image
# image_rgb is an RGB uint8 array (see the BGR-to-RGB conversion in the Quick Start)
mp_image = mp.Image(image_format=mp.ImageFormat.SRGB, data=image_rgb)
result = detector.detect(mp_image)

# Extract pixel coordinates (face_landmarks is empty when no face is found)
height, width = image_rgb.shape[:2]
landmarks = []
if result.face_landmarks:
    for lm in result.face_landmarks[0]:
        x = lm.x * width   # Convert normalized (0-1) to pixels
        y = lm.y * height
        landmarks.append((x, y))

Each landmark carries normalized x and y coordinates, nominally between 0 and 1, which must be multiplied by the image width and height respectively to get pixel positions.
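As a minimal sketch of this conversion, the snippet below scales an array of normalized coordinates to pixel positions in one vectorized step. The sample values and the helper name `to_pixels` are illustrative assumptions, not part of the MediaPipe API. The clipping step is an optional safeguard, on the assumption that a landmark near the frame edge can fall slightly outside the 0-1 range.

```python
import numpy as np

def to_pixels(normalized_xy, width, height):
    """Scale normalized (x, y) pairs to pixel coordinates.

    to_pixels is a hypothetical helper, not a MediaPipe function.
    """
    pts = np.asarray(normalized_xy, dtype=np.float64) * [width, height]
    # Keep every point on a valid pixel index (0 .. size - 1)
    pts[:, 0] = np.clip(pts[:, 0], 0, width - 1)
    pts[:, 1] = np.clip(pts[:, 1], 0, height - 1)
    return pts

# Made-up normalized coordinates standing in for three landmarks
sample = [(0.5, 0.5), (0.25, 0.75), (1.02, -0.01)]
pixels = to_pixels(sample, width=800, height=600)
print(pixels)
```

The third sample point lies just outside the frame, so it is pulled back to the nearest valid pixel rather than causing an index error later during color sampling.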

Concept 3: From Landmarks to Low-Poly Art#

The connection between face detection and geometric art lies in Delaunay triangulation [Delaunay1934], covered in Module 3.3.5. Facial landmarks provide semantically meaningful points that, when triangulated, create a mesh that follows facial features naturally.

The low-poly pipeline consists of four stages:

Stage 1: Detect Landmarks

MediaPipe provides 478 points distributed according to facial anatomy. Unlike random point placement, these points cluster around important features (eyes, nose, mouth) while remaining sparse on flat areas (cheeks, forehead).

Stage 2: Add Boundary Points

To ensure the triangulation covers the entire image (not just the face), add corner and edge points:

Adding boundary points for full coverage#

corners = np.array([
    [0, 0], [width, 0], [width, height], [0, height],
    [width/2, 0], [width, height/2], [width/2, height], [0, height/2]
])
all_points = np.vstack([landmarks, corners])

Stage 3: Triangulate

Apply Delaunay triangulation [SciPyDocs] to connect all points into non-overlapping triangles:

Creating the triangle mesh#

from scipy.spatial import Delaunay
triangulation = Delaunay(all_points)
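To see what the triangulation object contains, it can help to run it on a handful of points first. The sketch below uses toy data (the corners of a unit square plus its center) rather than real landmarks; the variable names are illustrative.

```python
import numpy as np
from scipy.spatial import Delaunay

# Four corners of a unit square plus its center (toy data, not landmarks)
points = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0.5, 0.5]])
tri = Delaunay(points)

# Each row of simplices holds three indices into points
print(tri.simplices.shape)
for simplex in tri.simplices:
    print(simplex, points[simplex].tolist())
```

The center point splits the square into four triangles, so `simplices` has four rows, and indexing `points[simplex]` recovers the three vertex coordinates that the rendering stage needs.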

Stage 4: Sample and Render

For each triangle, sample the color at its centroid and fill with that solid color using OpenCV [OpenCVDocs]:

Rendering triangles with sampled colors#

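output = np.zeros_like(image)  # Blank canvas, same size as the source image
for simplex in triangulation.simplices:
    triangle = all_points[simplex].astype(np.int32)
    centroid = np.mean(triangle, axis=0).astype(int)
    # Clip so the sample stays inside the image
    cx = np.clip(centroid[0], 0, width - 1)
    cy = np.clip(centroid[1], 0, height - 1)
    color = image[cy, cx]  # Sample at centroid
    cv2.fillPoly(output, [triangle], color.tolist())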

The result is a geometric abstraction where facial features remain recognizable because the triangle density is highest around eyes, nose, and mouth, exactly where humans focus their attention.

Final low-poly face art showing triangulated portrait with color-sampled fills — Low-poly face art generated from 478 landmarks plus 8 boundary points, creating approximately 960 triangles.#
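The triangle count in the caption can be checked with a standard identity for planar triangulations: for n points, of which b lie on the boundary of the convex hull, the number of triangles T is fixed. The worked numbers below assume that all 478 landmarks fall strictly inside the image, so only the 8 added boundary points lie on the hull.

```latex
T = 2n - 2 - b
\qquad\Rightarrow\qquad
T = 2(478 + 8) - 2 - 8 = 962
```

This agrees with the figure of approximately 960 triangles.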

Hands-On Exercises#

Exercise 1: Execute and Explore#

Run the basic face detection script to visualize all 478 landmarks:

Download face_detection_basic.py

python face_detection_basic.py

The script detects facial landmarks and draws them with color-coded regions:

Green: Face oval (jawline)
Blue: Eyebrows
Orange: Eyes
Yellow: Nose
Red: Lips
Cyan: Irises
Gray: Face surface (cheeks, forehead)

Face photograph with 478 colored landmark points overlaid showing different facial regions — All 478 MediaPipe landmarks visualized with color-coded regions. Notice the higher density around eyes and lips.#

After running the code, answer these reflection questions:

How many total landmarks does MediaPipe detect on a face?
Which facial features have the highest landmark density?
What happens when no face is detected in the image?

Exercise 2: Modify Parameters#

Experiment with the face landmark visualization by modifying these aspects.

Goal 1: Filter landmarks to show only specific regions

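Modify the visualization to show only landmarks around the eyes. The index ranges used here (33-133 and 263-364) are a rough filter: Face Mesh indices are not grouped contiguously by region, so these ranges also pick up some landmarks outside the eyes: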

Filtering to show only eye regions#

for idx, landmark in enumerate(face_landmarks):
    # Only draw eye landmarks
    if (33 <= idx <= 133) or (263 <= idx <= 364):
        x = int(landmark.x * width)
        y = int(landmark.y * height)
        cv2.circle(annotated_image, (x, y), 3, (255, 128, 0), -1)
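Because Face Mesh indices are not numbered region by region, an explicit index set is a more selective filter than a numeric range. The sketch below lists 16 eyelid-contour indices per eye as they appear in MediaPipe's published eye connection sets; treat them as an assumption and verify them against your installed version. The names `RIGHT_EYE`, `LEFT_EYE`, `EYE_INDICES`, and `is_eye_landmark` are illustrative.

```python
# Eyelid contour indices, 16 per eye (assumed from MediaPipe's eye connection sets)
RIGHT_EYE = {33, 7, 163, 144, 145, 153, 154, 155,
             133, 173, 157, 158, 159, 160, 161, 246}
LEFT_EYE = {362, 382, 381, 380, 374, 373, 390, 249,
            263, 466, 388, 387, 386, 385, 384, 398}
EYE_INDICES = RIGHT_EYE | LEFT_EYE

def is_eye_landmark(idx):
    """Return True when idx belongs to either eyelid contour."""
    return idx in EYE_INDICES

# Compare with the range filter: how many indices does each approach select?
range_filter = [i for i in range(478) if 33 <= i <= 133 or 263 <= i <= 364]
print(len(EYE_INDICES), len(range_filter))
```

To use it in the drawing loop, replace the range condition with `if is_eye_landmark(idx):`. The set selects far fewer points than the ranges, which is a quick way to see how many non-eye landmarks the range filter lets through.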

Goal 2: Change the landmark visualization style

Instead of colored circles, draw connected lines between landmarks:

Connecting landmarks with lines#

# Draw connections between consecutive landmarks
points = [(int(lm.x * width), int(lm.y * height)) for lm in face_landmarks]
for i in range(len(points) - 1):
    cv2.line(annotated_image, points[i], points[i+1], (0, 255, 0), 1)

Goal 3: Visualize landmark indices

Add text labels showing landmark numbers (useful for understanding the mesh topology):

Labeling key landmarks with indices#

key_indices = [1, 33, 133, 362, 263, 13, 14, 152]  # Nose, eyes, lips, chin
for idx in key_indices:
    lm = face_landmarks[idx]
    x, y = int(lm.x * width), int(lm.y * height)
    cv2.putText(annotated_image, str(idx), (x, y),
                cv2.FONT_HERSHEY_SIMPLEX, 0.4, (255, 255, 255), 1)

Exercise 3: Create Low-Poly Face Art#

Build the complete low-poly face transformation by completing the starter code below.

Download lowpoly_starter.py

Requirements:

Load an image and detect facial landmarks
Add boundary points for full canvas coverage
Apply Delaunay triangulation to the combined landmark and boundary points
Sample colors from the original image at triangle centroids
Render the result as filled triangles

Challenge Extension: Add white edge lines to create a “stained glass” effect:

Adding edge visualization#

# After filling all triangles, draw edges
for simplex in triangulation.simplices:
    triangle = all_points[simplex].astype(np.int32)
    cv2.polylines(output, [triangle], True, (255, 255, 255), 1)

Low-poly face art with white triangle edges visible, creating a stained glass effect — Low-poly face with visible edges, creating a stained glass aesthetic.#
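Drawing every triangle outline separately strokes each shared edge twice. One possible refinement, sketched here under the assumption that `simplices` is an array of vertex-index triples like the one SciPy returns, is to collect each edge once before drawing. The helper `unique_edges` is hypothetical, not part of SciPy or OpenCV.

```python
import numpy as np

def unique_edges(simplices):
    """Return each triangle edge once as a sorted (i, j) index pair.

    unique_edges is a hypothetical helper, not part of SciPy or OpenCV.
    """
    simplices = np.asarray(simplices)
    # The three edges of every triangle: (a, b), (b, c), (c, a)
    edges = np.vstack([simplices[:, [0, 1]],
                       simplices[:, [1, 2]],
                       simplices[:, [2, 0]]])
    edges = np.sort(edges, axis=1)      # (3, 1) and (1, 3) become the same edge
    return np.unique(edges, axis=0)     # drop duplicates from shared edges

# Two triangles sharing the edge between points 1 and 2
toy_simplices = [[0, 1, 2], [1, 3, 2]]
edges = unique_edges(toy_simplices)
print(edges)
```

Each returned pair indexes into the point array, so the two endpoints can be passed to `cv2.line` to draw that edge exactly once.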

Exercise 4: Real-Time Webcam Low-Poly Face#

Transform your live webcam feed into real-time low-poly art.

Download realtime_lowpoly.py

Run the real-time script:

python realtime_lowpoly.py

Controls:

Q: Quit the application
S: Save the current frame as an image
E: Toggle edge visibility
F: Toggle FPS display

The script applies the same low-poly pipeline to each video frame, processing at approximately 15-30 FPS depending on your hardware. Key adaptations for real-time processing include:

Video Mode: MediaPipe uses RunningMode.VIDEO for temporal consistency
Timestamp Handling: Each frame requires a monotonically increasing timestamp
Efficient Rendering: OpenCV’s fillPoly is faster than matplotlib for real-time use

Key differences for video processing#

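import time

# Configure for video mode (not static images)
options = vision.FaceLandmarkerOptions(
    base_options=base_options,
    running_mode=vision.RunningMode.VIDEO,  # Important!
    num_faces=1
)
detector = vision.FaceLandmarker.create_from_options(options)

# Each frame needs a timestamp in milliseconds that never decreases;
# a monotonic clock is safe even if the system clock is adjusted
timestamp_ms = int(time.monotonic() * 1000)
result = detector.detect_for_video(mp_image, timestamp_ms)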
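The timestamp requirement can be isolated in a small helper. This is a sketch, assuming the detector rejects a timestamp that is not larger than the previous one; the class `FrameClock` and its method name are hypothetical and not part of MediaPipe.

```python
import time

class FrameClock:
    """Produce strictly increasing millisecond timestamps (hypothetical helper)."""

    def __init__(self):
        self._start = time.monotonic()   # monotonic clock never runs backward
        self._last_ms = -1

    def next_timestamp_ms(self):
        elapsed_ms = int((time.monotonic() - self._start) * 1000)
        # Bump by 1 ms if two frames land in the same millisecond
        self._last_ms = max(elapsed_ms, self._last_ms + 1)
        return self._last_ms

clock = FrameClock()
stamps = [clock.next_timestamp_ms() for _ in range(5)]
print(stamps)
```

In a capture loop, one `clock.next_timestamp_ms()` call per frame would supply the second argument of `detect_for_video`. Starting from zero rather than wall-clock time also keeps the values small.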

Challenge: Modify the real-time script to support multiple faces by changing num_faces=1 to num_faces=3 and iterating over all detected faces.
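One way to approach the multi-face challenge is to convert every detected face to its own array of pixel coordinates before triangulating. The sketch below uses stand-in objects with `.x` and `.y` attributes instead of a real detection result; `faces_to_pixel_arrays` is a hypothetical helper.

```python
from types import SimpleNamespace
import numpy as np

def faces_to_pixel_arrays(face_landmarks, width, height):
    """Convert each detected face to an (N, 2) array of pixel coordinates.

    face_landmarks is expected to look like result.face_landmarks:
    one list of landmark objects (with .x and .y) per detected face.
    """
    faces = []
    for landmarks in face_landmarks:  # an empty list means no faces were found
        pts = np.array([[lm.x * width, lm.y * height] for lm in landmarks])
        faces.append(pts)
    return faces

# Stand-in data: two fake "faces" with two landmarks each
fake_result = [
    [SimpleNamespace(x=0.1, y=0.2), SimpleNamespace(x=0.3, y=0.4)],
    [SimpleNamespace(x=0.6, y=0.5), SimpleNamespace(x=0.8, y=0.9)],
]
faces = faces_to_pixel_arrays(fake_result, width=640, height=480)
print(len(faces), faces[0].shape)
```

The resulting arrays can be stacked together with the boundary points into a single triangulation, or each face can be triangulated and rendered on its own. When no face is detected the function returns an empty list, so the loop body is simply skipped.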

TouchDesigner Extension (Optional)#

Duration: +20 minutes (optional)

For those with TouchDesigner experience, this extension demonstrates how the NumPy-based face detection concepts translate directly to real-time performance applications.

Note

Requirements:

TouchDesigner 2022.20000 or later (tested on 2025.31310)
Python 3.11 with scipy installed
MediaPipe TouchDesigner Plugin

Download the complete project:

Download TouchDesigner Project

Download Script SOP Code

Output Examples:

Wireframe Delaunay triangulation on face — Wireframe triangulation effect#

Delaunay triangulation with noise distortion — With noise distortion effect#

This extension demonstrates how creative coding concepts transfer from Python prototyping to real-time interactive installations.

Summary#

Key Takeaways#

Face detection locates faces without identifying who they belong to; face recognition identifies individuals
MediaPipe Face Mesh provides 478 3D landmarks covering the entire face surface in real time
Delaunay triangulation applied to facial landmarks creates low-poly art that preserves facial features
Color sampling at centroids produces the characteristic flat-shaded geometric look
The technique transfers directly from still images to real-time video with minor API changes

Common Pitfalls#

BGR vs RGB: OpenCV loads images as BGR, but MediaPipe expects RGB. Always convert with cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
Normalized coordinates: MediaPipe landmarks are 0-1 normalized. Multiply by image width/height to get pixels
Empty results: Always check if result.face_landmarks: before accessing landmarks
Centroid bounds: When sampling colors, clip centroid coordinates to valid image bounds to avoid index errors
Video timestamps: In video mode, timestamps must be monotonically increasing (never go backward)
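The BGR vs RGB pitfall is easy to see on a tiny array. This sketch uses plain NumPy slicing rather than `cv2.cvtColor`, assuming a three-channel uint8 image; for such an image, reversing the channel axis gives the same channel order as `cv2.COLOR_BGR2RGB`.

```python
import numpy as np

# A 1x2 "image" in BGR order: a pure blue pixel and a pure red pixel
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the B and R channels
rgb = bgr[..., ::-1]

print(bgr[0, 0].tolist(), "->", rgb[0, 0].tolist())
```

If the conversion is skipped, blue and red trade places, which shows up as bluish skin tones in the rendered output. The slice is a view with reversed strides, so wrap it in `np.ascontiguousarray` before passing it to a library that needs contiguous memory.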

References#

[ViolaJones2004]

Viola, P., & Jones, M. J. (2004). Robust Real-Time Face Detection. International Journal of Computer Vision, 57(2), 137-154. https://doi.org/10.1023/B:VISI.0000013087.49260.fb

[MediaPipe2019]

Lugaresi, C., Tang, J., Nash, H., McClanahan, C., Uboweja, E., Hays, M., … & Grundmann, M. (2019). MediaPipe: A Framework for Building Perception Pipelines. arXiv preprint. https://arxiv.org/abs/1906.08172

[BlazeFace2019]

Bazarevsky, V., Kartynnik, Y., Vakunov, A., Raveendran, K., & Grundmann, M. (2019). BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs. arXiv preprint. https://arxiv.org/abs/1907.05047

[Delaunay1934]

Delaunay, B. (1934). Sur la sphère vide. Bulletin de l’Académie des Sciences de l’URSS, 6, 793-800.

[MediaPipeDocs]

Google. (2024). Face Landmarker guide for Python. Google AI Edge. https://ai.google.dev/edge/mediapipe/solutions/vision/face_landmarker/python

[OpenCVDocs]

OpenCV Developers. (2024). OpenCV Python Tutorials. OpenCV Documentation. https://docs.opencv.org/4.x/d6/d00/tutorial_py_root.html

[SciPyDocs]

SciPy Developers. (2024). scipy.spatial.Delaunay. SciPy Documentation. https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.Delaunay.html

[MediaPipeTD]

Blankensmith, T. (2024). MediaPipe TouchDesigner Plugin. GitHub. https://github.com/torinmb/mediapipe-touchdesigner