Classical vs Modern Computer Vision: When to Use Each
Computer vision has evolved dramatically with deep learning, but classical techniques remain essential for production systems. Understanding when to use geometric methods versus learned approaches, and how to combine them, is critical for building robust perception systems.
The Two Paradigms
Classical Computer Vision (Pre-2012)
Philosophy: Hand-crafted features + geometric reasoning + optimization
Core Techniques:
- Feature detection (SIFT, SURF, ORB, FAST)
- Edge detection (Canny, Sobel)
- Geometric primitives (lines, circles, planes)
- Epipolar geometry, homographies
- Structure from Motion (SfM)
- Bundle adjustment
- Kalman filtering, particle filters
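Kalman filtering, the last item above, can be illustrated in one dimension: predict, then blend in each new measurement weighted by the Kalman gain. A minimal pure-Python sketch (the variances and readings are made up; real trackers use the full multivariate form with state-transition and covariance matrices):

```python
# Minimal 1D Kalman filter: estimate a (nearly) constant value from noisy
# scalar measurements. Illustrative sketch only.

def kalman_1d(measurements, process_var=1e-5, meas_var=0.1):
    """Filter a sequence of scalar measurements of a nearly static state."""
    x, p = measurements[0], 1.0   # initial estimate and its variance
    estimates = [x]
    for z in measurements[1:]:
        p += process_var          # predict: state unchanged, uncertainty grows
        k = p / (p + meas_var)    # Kalman gain: trust in measurement vs prediction
        x += k * (z - x)          # update: pull estimate toward the measurement
        p *= (1 - k)              # uncertainty shrinks after each update
        estimates.append(x)
    return estimates

# Noisy readings of a true value of 5.0
readings = [5.3, 4.8, 5.1, 4.9, 5.2, 5.0, 4.7, 5.1]
print(kalman_1d(readings)[-1])  # converges close to 5.0
```

The same predict/update cycle, generalized to state vectors and covariance matrices, is what runs in visual-inertial odometry and tracking pipelines.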
Strengths:
- Mathematically interpretable
- Predictable behavior
- Efficient (low compute/memory)
- No training data needed
- Generalizes to new scenes
- Precise geometric reasoning
Weaknesses:
- Limited semantic understanding
- Struggles with cluttered scenes
- Sensitive to viewpoint/lighting changes
- Manual feature engineering
- Breaks down with textureless surfaces
Modern Computer Vision (2012+)
Philosophy: Learn features + representations + reasoning end-to-end from data
Core Techniques:
- Convolutional Neural Networks (CNNs)
- Transformers (Vision Transformers, DETR)
- Semantic segmentation (U-Net, DeepLab, Mask R-CNN)
- Object detection (YOLO, Faster R-CNN)
- Depth estimation (MonoDepth, DPT)
- Optical flow networks (FlowNet, RAFT)
- Neural rendering (NeRF, Gaussian Splatting)
Strengths:
- Rich semantic understanding
- Handles complex appearance variations
- Works on textureless/ambiguous regions
- Learns domain-specific priors
- State-of-the-art accuracy on benchmarks
Weaknesses:
- Requires large labeled datasets
- Computationally expensive (GPU needed)
- Unpredictable edge case behavior
- Difficult to interpret/debug
- Can fail silently without geometric consistency
- Overfits to training distribution
When to Use Classical CV
1. Geometric Reasoning Tasks
Camera Calibration:
- Zhang's method is simple, robust, well-understood
- No need for ML when geometry is exact
Pose Estimation (PnP):
- Given 2D-3D correspondences, classical solvers are optimal
- EPnP, P3P are fast and accurate
- Used in production: Astrobee, AR/VR systems, robotics
Stereo Vision:
- Rectification is pure geometry
- Semi-global matching works well for textured scenes
- No training data needed
Structure from Motion:
- COLMAP, OpenSfM provide robust 3D reconstruction
- Handles sparse features efficiently
- Bundle adjustment optimizes geometry globally
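The triangulation step at the heart of SfM can be shown in two dimensions: two cameras at known positions each observe a bearing to the same landmark, and the landmark is recovered by intersecting the rays. A simplified sketch (real pipelines triangulate in 3D from pixel coordinates, e.g. via the DLT method, but the geometric idea is identical):

```python
import math

def triangulate_2d(c1, theta1, c2, theta2):
    """Intersect rays c1 + t*d1 and c2 + s*d2 (bearing angles in radians)."""
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    # Solve c1 + t*d1 = c2 + s*d2 as a 2x2 linear system via Cramer's rule
    det = d1[0] * (-d2[1]) - (-d2[0]) * d1[1]
    if abs(det) < 1e-12:
        raise ValueError("rays are parallel: no unique intersection")
    bx, by = c2[0] - c1[0], c2[1] - c1[1]
    t = (bx * (-d2[1]) - (-d2[0]) * by) / det
    return (c1[0] + t * d1[0], c1[1] + t * d1[1])

# Cameras at (0,0) and (4,0) both see a landmark at (2,2):
# bearings of 45 and 135 degrees respectively.
p = triangulate_2d((0, 0), math.pi / 4, (4, 0), 3 * math.pi / 4)
print(p)  # ~ (2.0, 2.0)
```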
2. Resource-Constrained Environments
Embedded Systems:
- Space robotics (Astrobee): ARM processors, limited memory
- Drones: Real-time on mobile GPUs
- Edge devices: Can't run large neural networks
Example: Astrobee on ISS
- BRISK features: 15-20ms per frame on ARM Cortex-A9
- EPnP + RANSAC: 10ms for pose estimation
- Total: 30 FPS real-time on space-grade hardware
- No GPU needed, deterministic performance
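The RANSAC loop paired with EPnP above follows the same hypothesize-and-verify pattern regardless of the model being fit. A minimal sketch using 2D line fitting as a stand-in (thresholds and iteration counts are illustrative, not tuned):

```python
import random

def ransac_line(points, iters=200, thresh=0.2, seed=0):
    """Fit y = a*x + b to points with outliers via RANSAC."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)   # minimal sample: 2 points
        if x1 == x2:
            continue                                  # vertical: skip this hypothesis
        a = (y2 - y1) / (x2 - x1)                     # fit model to the sample
        b = y1 - a * x1
        # verify: count points consistent with the hypothesized model
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) < thresh]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (a, b), inliers
    return best_model, best_inliers

# Points on y = 2x + 1, plus two gross outliers
pts = [(x, 2 * x + 1) for x in range(10)] + [(3, 40), (7, -25)]
(a, b), inliers = ransac_line(pts)
print(a, b, len(inliers))  # recovers a=2, b=1 with 10 inliers
```

In pose estimation the minimal sample is 3-4 2D-3D correspondences fed to P3P/EPnP, and the residual is reprojection error, but the loop is the same.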
3. Limited or No Training Data
Novel Environments:
- Space stations, underwater, caves
- No pre-existing datasets
- Classical methods generalize without training
Custom Scenarios:
- Unique industrial inspection tasks
- Scientific instruments
- One-off robotics applications
4. Safety-Critical Systems
Interpretability:
- Geometric methods have clear failure modes
- Can prove mathematical correctness
- Easier to validate and test
Predictability:
- Deterministic behavior
- No silent failures from out-of-distribution data
Example: Autonomous vehicles
- Use classical methods for localization (HD maps + particle filters)
- Combine with learned perception for robustness
5. Real-Time Requirements
Low Latency:
- Feature detection: < 10ms
- Optical flow (Lucas-Kanade): < 5ms
- Template matching: < 1ms
Predictable Timing:
- No variable network inference time
- Can guarantee real-time deadlines
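Template matching at this speed is just a brute-force slide-and-score loop. A toy sum-of-squared-differences (SSD) version on a tiny grayscale grid; production code would call an optimized routine such as OpenCV's matchTemplate, but the operation is exactly this:

```python
def match_template(image, template):
    """Return the (x, y) placement of template with the lowest SSD score."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best_pos, best_ssd = None, float("inf")
    for y in range(ih - th + 1):          # slide the template over the image
        for x in range(iw - tw + 1):
            ssd = sum(
                (image[y + j][x + i] - template[j][i]) ** 2
                for j in range(th) for i in range(tw)
            )
            if ssd < best_ssd:
                best_pos, best_ssd = (x, y), ssd
    return best_pos, best_ssd

image = [
    [0, 0, 0, 0, 0],
    [0, 9, 8, 0, 0],
    [0, 7, 9, 0, 0],
    [0, 0, 0, 0, 0],
]
template = [[9, 8], [7, 9]]
print(match_template(image, template))  # exact match at (x=1, y=1), SSD 0
```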
When to Use Modern CV (Deep Learning)
1. Semantic Understanding
Object Recognition:
- Classify objects, detect instances
- Handle intra-class variation
- Learn complex appearance models
Scene Understanding:
- Semantic segmentation (road, sidewalk, building, vegetation)
- Instance segmentation (separate objects)
- Panoptic segmentation (stuff + things)
Example: Zipline Delivery Zones
- Segment aerial imagery into safe landing zones
- Identify obstacles (trees, power lines, buildings)
- Learn from labeled satellite/drone data
2. Dense Prediction Tasks
Monocular Depth Estimation:
- Predict depth from single image
- Classical methods require stereo or motion
- Networks learn geometric priors from data
Optical Flow:
- Dense motion field estimation
- FlowNet, RAFT outperform classical methods
- Better at motion boundaries
Surface Normal Estimation:
- Predict 3D orientation from shading
- Learns shape-from-shading priors
3. Ill-Posed Problems
Super-Resolution:
- Hallucinate high-frequency details
- Learned priors from natural image statistics
Denoising/Inpainting:
- Remove artifacts, fill missing regions
- Neural priors for plausible completions
HDR Reconstruction:
- Merge multiple exposures
- Handle saturated regions
4. Large-Scale Annotated Datasets
Pre-Trained Models:
- ImageNet (1.4M images)
- COCO (330K images)
- Cityscapes (5K urban scenes)
Transfer Learning:
- Fine-tune on specific task
- Requires less data than training from scratch
5. Complex Appearance Variations
Illumination Changes:
- Shadows, highlights, reflections
- Learned features are more robust than hand-crafted
Occlusions:
- Partially visible objects
- Networks learn to complete from context
Clutter:
- Crowded scenes with overlapping objects
- Attention mechanisms focus on relevant features
Hybrid Approaches: Best of Both Worlds
The most robust production systems combine classical and modern techniques.
Architecture Pattern
Raw Sensor Data
    ↓
[Classical Preprocessing]
- Undistortion
- Rectification
- Feature detection
    ↓
[Learned Feature Extraction]
- CNN backbone
- Feature pyramid
    ↓
[Classical Geometric Reasoning]
- Epipolar constraints
- PnP / Triangulation
- Bundle adjustment
    ↓
[Learned Refinement]
- Pose refinement network
- Depth completion
    ↓
Output (Pose, Depth, Segmentation)
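The pattern above can be sketched as composable stages. Every function body below is a stub with made-up names and return values; only the structure matters: classical geometry brackets the learned components and validates their output.

```python
# Structural sketch of a hybrid perception pipeline. All stages are stubs;
# the names and signatures are illustrative, not from any real library.

def undistort(frame):                     # classical preprocessing
    return frame

def detect_features(frame):               # classical keypoints (e.g. ORB-style)
    return [(10, 12), (40, 33), (22, 8)]

def cnn_descriptors(frame, keypoints):    # stand-in for a learned extractor
    return {kp: hash(kp) % 256 for kp in keypoints}

def estimate_pose(descriptors):           # stand-in for matching + PnP + RANSAC
    return {"rotation": (0.0, 0.0, 0.0), "translation": (0.0, 0.0, 0.0)}

def refine_pose(pose, descriptors):       # stand-in for a learned refinement net
    return pose

def perceive(frame):
    frame = undistort(frame)
    keypoints = detect_features(frame)
    descriptors = cnn_descriptors(frame, keypoints)
    pose = estimate_pose(descriptors)
    return refine_pose(pose, descriptors)

print(perceive(frame="raw-sensor-data"))
```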
Example Systems
ORB-SLAM3 + DepthNet:
- Classical: ORB features, tracking, bundle adjustment
- Learned: Monocular depth prediction
- Combination: Depth network provides scale, SLAM provides consistency
DeepTAM:
- Classical: Camera pose optimization
- Learned: Dense depth prediction
- Combination: Geometric consistency constrains network
Astrobee + Future ML:
- Classical: Current production system (feature tracking, PnP)
- Learned: Potential future addition (semantic understanding, texture-less localization)
- Combination: ML provides hints, geometry provides precision
When to Combine
Learned Feature Detection + Classical Matching:
- SuperPoint, DISK for learned keypoints
- Geometric verification (RANSAC, epipolar constraints)
- Best of both: robust features + geometric consistency
Classical Depth + Learned Completion:
- Stereo matching for sparse/semi-dense depth
- Neural network fills holes, smooths noise
- Hybrid: metric accuracy + dense output
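The hole-filling step can be illustrated on a single scan line of stereo depths, with nearest-neighbor fill as a deliberately crude stand-in for the completion network:

```python
# Toy "depth completion": fill holes (None) in a 1-D scan line of stereo
# depths with the nearest valid measurement. A neural completion network
# would instead produce smooth, edge-aware fills; this is only a sketch.

def fill_nearest(depths):
    """Replace each None with the depth of the nearest valid sample."""
    valid = [(i, d) for i, d in enumerate(depths) if d is not None]
    return [
        d if d is not None else min(valid, key=lambda v: abs(v[0] - i))[1]
        for i, d in enumerate(depths)
    ]

scan = [2.0, None, None, 2.6, None, 3.0]
print(fill_nearest(scan))  # [2.0, 2.0, 2.6, 2.6, 2.6, 3.0]
```

The hybrid property is visible even here: measured samples are kept verbatim (metric accuracy), and only the holes are hallucinated (dense output).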
Semantic Segmentation + Geometric SLAM:
- Segment scene into semantic classes
- Use only static classes (road, building) for SLAM
- Ignore dynamic objects (cars, people)
- Reduces drift in dynamic environments
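The masking step can be sketched directly: given a per-pixel label grid from the segmentation network, keep only keypoints that land on static classes. The class names and label grid below are illustrative:

```python
# Keep only SLAM keypoints that fall on static semantic classes; drop
# those on dynamic objects so they cannot corrupt the map.

STATIC_CLASSES = {"road", "building"}

def filter_keypoints(keypoints, label_grid):
    """Keep keypoints (x, y) whose per-pixel semantic label is static."""
    return [
        (x, y) for x, y in keypoints
        if label_grid[y][x] in STATIC_CLASSES
    ]

labels = [
    ["road", "road", "car"],
    ["building", "person", "road"],
]
kps = [(0, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
print(filter_keypoints(kps, labels))  # [(0, 0), (0, 1), (2, 1)]
```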
Decision Framework
Choose Classical If:
- ✅ Problem has clear geometric structure
- ✅ Computation is constrained (embedded systems)
- ✅ Interpretability is critical (safety systems)
- ✅ Training data is scarce or unavailable
- ✅ Need predictable, deterministic behavior
- ✅ Real-time deadlines are strict
Choose Modern If:
- ✅ Need semantic understanding (object classes, attributes)
- ✅ Have large labeled dataset or pre-trained models
- ✅ Problem is appearance-based (recognition, classification)
- ✅ Dealing with complex variations (lighting, occlusions)
- ✅ Have GPU compute available
- ✅ Can tolerate occasional edge-case failures
Combine Both If:
- ✅ Building production system (most robust approach)
- ✅ Need both geometric precision and semantic understanding
- ✅ Want geometric constraints to validate neural outputs
- ✅ Targeting autonomous systems (cars, drones, robots)
Production Considerations
Validation Strategy
Classical Methods:
- Unit tests on synthetic data
- Verify geometric properties (epipolar error, reprojection error)
- Analytical error bounds
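A reprojection-error check of this kind can be unit-tested on synthetic data, where pixels generated by a pinhole model must reproject with zero error. A minimal sketch (the focal length, principal point, and 3D points are made up):

```python
import math

def project(point3d, f, cx, cy):
    """Pinhole projection: focal length f, principal point (cx, cy)."""
    X, Y, Z = point3d
    return (f * X / Z + cx, f * Y / Z + cy)

def mean_reprojection_error(points3d, pixels, f, cx, cy):
    """Average pixel distance between projected 3D points and observations."""
    errs = []
    for p3, (u, v) in zip(points3d, pixels):
        pu, pv = project(p3, f, cx, cy)
        errs.append(math.hypot(pu - u, pv - v))
    return sum(errs) / len(errs)

# Synthetic test: pixels generated by the same model must reproject exactly
pts3d = [(0.5, -0.2, 2.0), (-1.0, 0.4, 4.0), (0.0, 0.0, 1.0)]
f, cx, cy = 500.0, 320.0, 240.0
pix = [project(p, f, cx, cy) for p in pts3d]
print(mean_reprojection_error(pts3d, pix, f, cx, cy))  # 0.0
```

In a real estimator the same function is run on the recovered pose, and a reprojection error above a threshold flags a bad solution.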
Learned Methods:
- Test set evaluation (but may not cover edge cases)
- Cross-validation on multiple datasets
- Adversarial testing
- Out-of-distribution detection
Hybrid:
- Use geometry to validate neural predictions
- Flag inconsistencies for human review
- Degrade gracefully (fallback to classical)
Debugging
Classical:
- Visualize features, matches, epipolar lines
- Check algebraic constraints
- Trace mathematical errors
Learned:
- Visualize activation maps, attention
- Check for overfitting, dataset bias
- Analyze failure modes statistically
Deployment
Classical:
- Lightweight: CPU-only, low memory
- Deterministic: Same input → same output
- Portable: Easy to cross-compile for embedded
Learned:
- Heavy: GPU required for real-time
- Less deterministic: nondeterministic GPU kernels and quantization can shift outputs between runs or builds
- Complex: ONNX, TensorRT, model optimization
Case Studies
Zipline Autonomous Delivery
Offboard Perception (Cloud-side):
- Classical: Structure from Motion for 3D mapping
- Learned: Semantic segmentation for safe zones
- Hybrid: Geometric 3D + semantic understanding
Onboard Perception (Aircraft):
- Classical: Visual odometry, PnP localization
- Learned: Obstacle detection, landing zone validation
- Hybrid: Learned detections validated by geometry
Self-Driving Cars
Localization:
- Classical: Particle filter with HD maps, ICP
- Learned: Learned features for robust matching
- Hybrid: Classical provides metric pose, learned handles appearance changes
Perception:
- Learned: Object detection (pedestrians, vehicles)
- Classical: Sensor fusion (cameras, lidar, radar)
- Hybrid: Geometric tracking + semantic classification
Mobile Robotics (Astrobee)
Current System:
- Classical: BRISK features, bag-of-words, EPnP
- Why: Resource-constrained, safety-critical, no training data for ISS
Future Enhancements:
- Learned: Semantic understanding (airlock, handrails, equipment)
- Hybrid: Classical pose + learned scene understanding
Key Takeaways
1. Classical CV is not obsolete - essential for geometric reasoning, resource-constrained, and safety-critical systems
2. Deep learning excels at semantic tasks - object recognition, segmentation, learning complex appearance models
3. Hybrid systems are most robust - combine geometric constraints with learned features
4. Choose based on constraints:
   - Data availability (classical needs less)
   - Compute budget (classical is lightweight)
   - Interpretability (classical is transparent)
   - Task type (geometric vs semantic)
5. Production systems benefit from both:
   - Classical provides metric accuracy and consistency
   - Learned provides robustness and semantic understanding
   - Geometric constraints validate neural predictions
6. Understand the math - even when using learned methods, geometric reasoning validates outputs
For perception engineers:
- Master both paradigms
- Know when each applies
- Build hybrid systems for production
- Use geometry to keep networks honest
- Always validate with classical methods
References
- Hartley & Zisserman, "Multiple View Geometry in Computer Vision"
- Szeliski, "Computer Vision: Algorithms and Applications"
- LeCun et al., "Deep Learning" (Nature 2015)
- Kendall et al., "Geometric Loss Functions for Camera Pose Regression with Deep Learning" (CVPR 2017)
- DeTone et al., "SuperPoint: Self-Supervised Interest Point Detection and Description" (CVPR Workshops 2018)
- ORB-SLAM3: https://github.com/UZ-SLAMLab/ORB_SLAM3