πŸ‘¦πŸ»

βš–οΈ Classical vs Modern Computer Vision: When to Use Each

Understanding when to use geometric methods vs deep learning in production perception systems, and how to combine them for robust performance


Computer vision has evolved dramatically with deep learning, but classical techniques remain essential for production systems. Understanding when to use geometric methods vs learned approachesβ€”and how to combine themβ€”is critical for building robust perception systems.

The Two Paradigms

Classical Computer Vision (Pre-2012)

Philosophy: Hand-crafted features + geometric reasoning + optimization

Core Techniques:

  • Feature detection (SIFT, SURF, ORB, FAST)
  • Edge detection (Canny, Sobel)
  • Geometric primitives (lines, circles, planes)
  • Epipolar geometry, homographies
  • Structure from Motion (SfM)
  • Bundle adjustment
  • Kalman filtering, particle filters
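
Several of these techniques fit in a few lines of code. As a minimal sketch, here is a 1D Kalman filter smoothing noisy scalar measurements (the noise variances `q` and `r` and the test signal are illustrative, not from any particular system):

```python
import numpy as np

def kalman_1d(measurements, q=1e-3, r=0.25):
    """Constant-position 1D Kalman filter over noisy scalar measurements.
    q: process noise variance, r: measurement noise variance (illustrative)."""
    x, p = measurements[0], 1.0          # initial state estimate and variance
    estimates = []
    for z in measurements:
        p = p + q                        # predict: variance grows by process noise
        k = p / (p + r)                  # Kalman gain
        x = x + k * (z - x)              # update: blend prediction with measurement
        p = (1.0 - k) * p                # posterior variance shrinks
        estimates.append(x)
    return np.array(estimates)

rng = np.random.default_rng(0)
true_value = 5.0
noisy = true_value + rng.normal(0.0, 0.5, size=200)
smoothed = kalman_1d(noisy)
```

The filtered estimate converges near the true value with far less variance than the raw measurements, and the whole update runs in constant time per sample, which is why these filters are ubiquitous on embedded hardware.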

Strengths:

  • Mathematically interpretable
  • Predictable behavior
  • Efficient (low compute/memory)
  • No training data needed
  • Generalizes to new scenes
  • Precise geometric reasoning

Weaknesses:

  • Limited semantic understanding
  • Struggles with cluttered scenes
  • Sensitive to viewpoint/lighting changes
  • Manual feature engineering
  • Breaks down with textureless surfaces

Modern Computer Vision (2012+)

Philosophy: Learn features + representations + reasoning end-to-end from data

Core Techniques:

  • Convolutional Neural Networks (CNNs)
  • Transformers (Vision Transformers, DETR)
  • Semantic segmentation (U-Net, DeepLab, Mask R-CNN)
  • Object detection (YOLO, Faster R-CNN)
  • Depth estimation (MonoDepth, DPT)
  • Optical flow networks (FlowNet, RAFT)
  • Neural rendering (NeRF, Gaussian Splatting)

Strengths:

  • Rich semantic understanding
  • Handles complex appearance variations
  • Works on textureless/ambiguous regions
  • Learns domain-specific priors
  • State-of-the-art accuracy on benchmarks

Weaknesses:

  • Requires large labeled datasets
  • Computationally expensive (GPU needed)
  • Unpredictable edge case behavior
  • Difficult to interpret/debug
  • Can fail silently without geometric consistency
  • Overfits to training distribution

When to Use Classical CV

1. Geometric Reasoning Tasks

Camera Calibration:

  • Zhang's method is simple, robust, well-understood
  • No need for ML when geometry is exact

Pose Estimation (PnP):

  • Given 2D-3D correspondences, classical solvers are optimal
  • EPnP, P3P are fast and accurate
  • Used in production: Astrobee, AR/VR systems, robotics
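
To make the idea concrete, here is a hedged sketch of linear camera resection via the Direct Linear Transform (a simpler stand-in for EPnP/P3P, not those algorithms themselves): given 2D-3D correspondences, solve for the 3x4 projection matrix and check reprojection. All camera values below are synthetic:

```python
import numpy as np

def resect_dlt(points_3d, points_2d):
    """Linear resection (DLT): recover the 3x4 projection matrix P from
    >= 6 noise-free 2D-3D correspondences via the SVD null space."""
    rows = []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    return vt[-1].reshape(3, 4)          # null-space vector = flattened P

def project(P, points_3d):
    X_h = np.hstack([points_3d, np.ones((len(points_3d), 1))])
    x = (P @ X_h.T).T
    return x[:, :2] / x[:, 2:3]          # perspective divide

# synthetic camera: focal length 500, small offset from the origin
K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
P_true = K @ np.hstack([np.eye(3), np.array([[0.1], [-0.2], [2.0]])])
rng = np.random.default_rng(1)
pts3d = rng.uniform(-1, 1, size=(12, 3))
pts2d = project(P_true, pts3d)

P_est = resect_dlt(pts3d, pts2d)
err = np.abs(project(P_est, pts3d) - pts2d).max()
```

With exact correspondences the recovered matrix reprojects to machine precision; production solvers add RANSAC and point normalization to cope with outliers and noise.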

Stereo Vision:

  • Rectification is pure geometry
  • Semi-global matching works well for textured scenes
  • No training data needed
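
The core of classical stereo is a one-dimensional search along rectified scanlines. A toy SAD block matcher (synthetic data; real SGM adds smoothness terms across the image) looks like this:

```python
import numpy as np

def block_match_disparity(left, right, x, y, half=3, max_disp=10):
    """Classical SAD block matching: find the disparity that best aligns a
    left-image patch with the right image along the same scanline."""
    patch = left[y-half:y+half+1, x-half:x+half+1].astype(float)
    costs = []
    for d in range(max_disp + 1):
        cand = right[y-half:y+half+1, x-d-half:x-d+half+1].astype(float)
        costs.append(np.abs(patch - cand).sum())  # sum of absolute differences
    return int(np.argmin(costs))

rng = np.random.default_rng(2)
left = rng.integers(0, 255, size=(40, 60))
true_disp = 4
right = np.roll(left, -true_disp, axis=1)   # right view shifted by the disparity

d = block_match_disparity(left, right, x=30, y=20)
```

On well-textured patches the cost minimum is sharp and exact; the "textureless surfaces" weakness above corresponds to a flat cost curve with no distinct minimum.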

Structure from Motion:

  • COLMAP, OpenSfM provide robust 3D reconstruction
  • Handles sparse features efficiently
  • Bundle adjustment optimizes geometry globally

2. Resource-Constrained Environments

Embedded Systems:

  • Space robotics (Astrobee): ARM processors, limited memory
  • Drones: Real-time on mobile GPUs
  • Edge devices: Can't run large neural networks

Example: Astrobee on ISS

  • BRISK features: 15-20ms per frame on ARM Cortex-A9
  • EPnP + RANSAC: 10ms for pose estimation
  • Total: 30 FPS real-time on space-grade hardware
  • No GPU needed, deterministic performance

3. Limited or No Training Data

Novel Environments:

  • Space stations, underwater, caves
  • No pre-existing datasets
  • Classical methods generalize without training

Custom Scenarios:

  • Unique industrial inspection tasks
  • Scientific instruments
  • One-off robotics applications

4. Safety-Critical Systems

Interpretability:

  • Geometric methods have clear failure modes
  • Can prove mathematical correctness
  • Easier to validate and test

Predictability:

  • Deterministic behavior
  • No silent failures from out-of-distribution data

Example: Autonomous vehicles

  • Use classical methods for localization (HD maps + particle filters)
  • Combine with learned perception for robustness

5. Real-Time Requirements

Low Latency:

  • Feature detection: < 10ms
  • Optical flow (Lucas-Kanade): < 5ms
  • Template matching: < 1ms
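
These latency figures follow from how little arithmetic the methods need. A single-window Lucas-Kanade step, for example, reduces optical flow to solving a 2x2 linear system (sketch on a synthetic subpixel shift; frame pattern and shift are illustrative):

```python
import numpy as np

def lucas_kanade_window(frame0, frame1):
    """Single-window Lucas-Kanade: solve the 2x2 normal equations for the
    flow (u, v) explaining the brightness change between two frames."""
    Iy, Ix = np.gradient(frame0)          # spatial gradients (axis 0 = y)
    It = frame1 - frame0                  # temporal gradient
    A = np.array([[np.sum(Ix*Ix), np.sum(Ix*Iy)],
                  [np.sum(Ix*Iy), np.sum(Iy*Iy)]])
    b = -np.array([np.sum(Ix*It), np.sum(Iy*It)])
    return np.linalg.solve(A, b)          # (u, v)

# smooth synthetic pattern shifted by a small subpixel translation
y, x = np.mgrid[0:64, 0:64]
shift = (0.3, 0.5)                        # true (u, v) in pixels
frame0 = np.sin(0.2*x) + np.cos(0.15*y)
frame1 = np.sin(0.2*(x - shift[0])) + np.cos(0.15*(y - shift[1]))

u, v = lucas_kanade_window(frame0, frame1)
```

The estimate lands within a few percent of the true shift for small motions; pyramidal variants extend this to larger displacements while keeping the per-window cost tiny and fixed.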

Predictable Timing:

  • No variable network inference time
  • Can guarantee real-time deadlines

When to Use Modern CV (Deep Learning)

1. Semantic Understanding

Object Recognition:

  • Classify objects, detect instances
  • Handle intra-class variation
  • Learn complex appearance models

Scene Understanding:

  • Semantic segmentation (road, sidewalk, building, vegetation)
  • Instance segmentation (separate objects)
  • Panoptic segmentation (stuff + things)

Example: Zipline Delivery Zones

  • Segment aerial imagery into safe landing zones
  • Identify obstacles (trees, power lines, buildings)
  • Learn from labeled satellite/drone data

2. Dense Prediction Tasks

Monocular Depth Estimation:

  • Predict depth from single image
  • Classical methods require stereo or motion
  • Networks learn geometric priors from data

Optical Flow:

  • Dense motion field estimation
  • FlowNet, RAFT outperform classical methods
  • Better at motion boundaries

Surface Normal Estimation:

  • Predict 3D orientation from shading
  • Learns shape-from-shading priors

3. Ill-Posed Problems

Super-Resolution:

  • Hallucinate high-frequency details
  • Learned priors from natural image statistics

Denoising/Inpainting:

  • Remove artifacts, fill missing regions
  • Neural priors for plausible completions

HDR Reconstruction:

  • Merge multiple exposures
  • Handle saturated regions

4. Large-Scale Annotated Datasets

Pre-Trained Models:

  • ImageNet (1.4M images)
  • COCO (330K images)
  • Cityscapes (5K urban scenes)

Transfer Learning:

  • Fine-tune on specific task
  • Requires less data than training from scratch

5. Complex Appearance Variations

Illumination Changes:

  • Shadows, highlights, reflections
  • Learned features are more robust than hand-crafted

Occlusions:

  • Partially visible objects
  • Networks learn to complete from context

Clutter:

  • Crowded scenes with overlapping objects
  • Attention mechanisms focus on relevant features

Hybrid Approaches: Best of Both Worlds

The most robust production systems combine classical and modern techniques.

Architecture Pattern

Raw Sensor Data
      ↓
[Classical Preprocessing]
- Undistortion
- Rectification
- Feature detection
      ↓
[Learned Feature Extraction]
- CNN backbone
- Feature pyramid
      ↓
[Classical Geometric Reasoning]
- Epipolar constraints
- PnP / Triangulation
- Bundle adjustment
      ↓
[Learned Refinement]
- Pose refinement network
- Depth completion
      ↓
Output (Pose, Depth, Segmentation)

Example Systems

ORB-SLAM3 + DepthNet:

  • Classical: ORB features, tracking, bundle adjustment
  • Learned: Monocular depth prediction
  • Combination: Depth network provides scale, SLAM provides consistency

DeepTAM:

  • Classical: Camera pose optimization
  • Learned: Dense depth prediction
  • Combination: Geometric consistency constrains network

Astrobee + Future ML:

  • Classical: Current production system (feature tracking, PnP)
  • Learned: Potential future addition (semantic understanding, textureless localization)
  • Combination: ML provides hints, geometry provides precision

When to Combine

Learned Feature Detection + Classical Matching:

  • SuperPoint, DISK for learned keypoints
  • Geometric verification (RANSAC, epipolar constraints)
  • Best of both: robust features + geometric consistency
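
The geometric-verification half of this pattern is simple to sketch. Here is a toy RANSAC loop using a pure-translation model in place of the homography or essential matrix a real pipeline would fit (matches and outlier fraction are synthetic):

```python
import numpy as np

def ransac_translation(src, dst, iters=200, tol=2.0, rng=None):
    """Toy geometric verification: RANSAC with a pure-translation model.
    Real systems fit a homography or essential matrix the same way."""
    if rng is None:
        rng = np.random.default_rng()
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        i = rng.integers(len(src))              # minimal sample: one match
        t = dst[i] - src[i]                     # hypothesized translation
        residuals = np.linalg.norm(dst - (src + t), axis=1)
        inliers = residuals < tol
        if inliers.sum() > best_inliers.sum():  # keep the largest consensus set
            best_inliers = inliers
    return best_inliers

rng = np.random.default_rng(3)
src = rng.uniform(0, 100, size=(50, 2))
dst = src + np.array([5.0, -3.0])               # true geometric model
dst[:15] = rng.uniform(0, 100, size=(15, 2))    # 30% bad matches (outliers)

inliers = ransac_translation(src, dst, rng=rng)
```

The consensus set rejects the planted outliers regardless of whether the putative matches came from ORB or from a learned detector like SuperPoint, which is exactly why the two layers compose so well.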

Classical Depth + Learned Completion:

  • Stereo matching for sparse/semi-dense depth
  • Neural network fills holes, smooths noise
  • Hybrid: metric accuracy + dense output

Semantic Segmentation + Geometric SLAM:

  • Segment scene into semantic classes
  • Use only static classes (road, building) for SLAM
  • Ignore dynamic objects (cars, people)
  • Reduces drift in dynamic environments
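
Mechanically, this combination is just a semantic lookup before the geometric stage. A minimal sketch, with hypothetical class ids standing in for a real segmentation model's output:

```python
import numpy as np

STATIC = {0, 1}      # hypothetical ids: e.g. road, building
DYNAMIC = {2, 3}     # hypothetical ids: e.g. car, person

def filter_static_keypoints(keypoints, seg_mask):
    """Drop keypoints on dynamic classes so SLAM only tracks static structure.
    keypoints: (N, 2) array of (x, y); seg_mask: (H, W) class-id image."""
    xs = keypoints[:, 0].astype(int)
    ys = keypoints[:, 1].astype(int)
    labels = seg_mask[ys, xs]                    # class id under each keypoint
    keep = np.isin(labels, list(STATIC))
    return keypoints[keep]

seg = np.zeros((10, 10), dtype=int)
seg[:, 5:] = 2                                   # right half labeled as a "car"
kps = np.array([[2.0, 3.0], [7.0, 3.0], [4.0, 8.0]])
static_kps = filter_static_keypoints(kps, seg)   # keypoint on the car is dropped
```

Everything downstream (matching, PnP, bundle adjustment) stays purely classical; the network only decides which observations are trustworthy.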

Decision Framework

Choose Classical If:

  • βœ… Problem has clear geometric structure
  • βœ… Computation is constrained (embedded systems)
  • βœ… Interpretability is critical (safety systems)
  • βœ… Training data is scarce or unavailable
  • βœ… Need predictable, deterministic behavior
  • βœ… Real-time deadlines are strict

Choose Modern If:

  • βœ… Need semantic understanding (object classes, attributes)
  • βœ… Have large labeled dataset or pre-trained models
  • βœ… Problem is appearance-based (recognition, classification)
  • βœ… Dealing with complex variations (lighting, occlusions)
  • βœ… Have GPU compute available
  • βœ… Can tolerate occasional edge-case failures

Combine Both If:

  • βœ… Building production system (most robust approach)
  • βœ… Need both geometric precision and semantic understanding
  • βœ… Want geometric constraints to validate neural outputs
  • βœ… Targeting autonomous systems (cars, drones, robots)

Production Considerations

Validation Strategy

Classical Methods:

  • Unit tests on synthetic data
  • Verify geometric properties (epipolar error, reprojection error)
  • Analytical error bounds

Learned Methods:

  • Test set evaluation (but may not cover edge cases)
  • Cross-validation on multiple datasets
  • Adversarial testing
  • Out-of-distribution detection

Hybrid:

  • Use geometry to validate neural predictions
  • Flag inconsistencies for human review
  • Degrade gracefully (fall back to classical methods)
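
One concrete version of this check: compare a dense network depth map against sparse geometric depth (from stereo or triangulation) and flag frames where they disagree. The tolerance and synthetic data below are illustrative:

```python
import numpy as np

def validate_depth(pred_depth, geo_depth, geo_valid, rel_tol=0.15):
    """Cross-check dense network depth against sparse geometric depth.
    Returns the fraction of checked pixels where the two disagree;
    a high value flags the frame for review or classical fallback."""
    rel_err = np.abs(pred_depth - geo_depth) / np.maximum(geo_depth, 1e-6)
    disagree = (rel_err > rel_tol) & geo_valid
    return disagree.sum() / max(geo_valid.sum(), 1)

rng = np.random.default_rng(4)
geo = rng.uniform(1.0, 10.0, size=(48, 64))      # geometric depth (metric)
valid = rng.random((48, 64)) < 0.1               # sparse: ~10% of pixels checked

good_pred = geo * rng.normal(1.0, 0.02, size=geo.shape)   # consistent network
bad_pred = geo * 1.5                                      # silently wrong scale

ok_rate = validate_depth(good_pred, geo, valid)
bad_rate = validate_depth(bad_pred, geo, valid)
```

A scale error that a test-set metric might average away shows up immediately here, because the geometric depth carries metric ground truth the network cannot fake.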

Debugging

Classical:

  • Visualize features, matches, epipolar lines
  • Check algebraic constraints
  • Trace mathematical errors

Learned:

  • Visualize activation maps, attention
  • Check for overfitting, dataset bias
  • Analyze failure modes statistically

Deployment

Classical:

  • Lightweight: CPU-only, low memory
  • Deterministic: Same input β†’ same output
  • Portable: Easy to cross-compile for embedded

Learned:

  • Heavy: GPU required for real-time
  • Nondeterministic: GPU kernel scheduling, quantization, and framework versions can introduce output variability
  • Complex: ONNX, TensorRT, model optimization

Case Studies

Zipline Autonomous Delivery

Offboard Perception (Cloud-side):

  • Classical: Structure from Motion for 3D mapping
  • Learned: Semantic segmentation for safe zones
  • Hybrid: Geometric 3D + semantic understanding

Onboard Perception (Aircraft):

  • Classical: Visual odometry, PnP localization
  • Learned: Obstacle detection, landing zone validation
  • Hybrid: Learned detections validated by geometry

Self-Driving Cars

Localization:

  • Classical: Particle filter with HD maps, ICP
  • Learned: Learned features for robust matching
  • Hybrid: Classical provides metric pose, learned handles appearance changes

Perception:

  • Learned: Object detection (pedestrians, vehicles)
  • Classical: Sensor fusion (cameras, lidar, radar)
  • Hybrid: Geometric tracking + semantic classification

Mobile Robotics (Astrobee)

Current System:

  • Classical: BRISK features, bag-of-words, EPnP
  • Why: Resource-constrained, safety-critical, no training data for ISS

Future Enhancements:

  • Learned: Semantic understanding (airlock, handrails, equipment)
  • Hybrid: Classical pose + learned scene understanding

Key Takeaways

  1. Classical CV is not obsolete - Essential for geometric reasoning, resource-constrained, safety-critical systems

  2. Deep learning excels at semantic tasks - Object recognition, segmentation, learning complex appearance models

  3. Hybrid systems are most robust - Combine geometric constraints with learned features

  4. Choose based on constraints:

    • Data availability (classical needs less)
    • Compute budget (classical is lightweight)
    • Interpretability (classical is transparent)
    • Task type (geometric vs semantic)

  5. Production systems benefit from both:

    • Classical provides metric accuracy and consistency
    • Learned provides robustness and semantic understanding
    • Geometric constraints validate neural predictions

  6. Understand the math - Even when using learned methods, geometric reasoning validates outputs

For perception engineers:

  • Master both paradigms
  • Know when each applies
  • Build hybrid systems for production
  • Use geometry to keep networks honest
  • Always validate with classical methods

References

  • Hartley & Zisserman, "Multiple View Geometry in Computer Vision"
  • Szeliski, "Computer Vision: Algorithms and Applications"
  • LeCun et al., "Deep Learning" (Nature 2015)
  • Kendall & Cipolla, "Geometric Loss Functions for Camera Pose Regression with Deep Learning" (CVPR 2017)
  • DeTone et al., "SuperPoint: Self-Supervised Interest Point Detection and Description" (CVPR Workshops 2018)
  • ORB-SLAM3: https://github.com/UZ-SLAMLab/ORB_SLAM3