Official Program for CVPR 2015

Monday, June 8
8:30am-8:40am Ballrooms A,B,C Opening Remarks from Conference Chairs
8:40am-10:10am Oral Session
10:10am-12:30pm Exhibit Hall A Poster Session 1A
12:30pm-2:00pm Exhibit Hall B Lunch
2:00pm-3:30pm Oral Session
3:30pm-6:00pm Exhibit Hall A Poster Session 1B
6:00pm-7:30pm Ballrooms A,B,C Reception & Awards
7:30pm-8:30pm Rooms 302,304,306 PAMI Technical Committee/Computer Vision Foundation Meeting
Tuesday, June 9
8:30am-10:00am Oral Session
10:00am-12:30pm Exhibit Hall A Poster Session 2A
12:30pm-2:00pm Exhibit Hall B Lunch
2:00pm-3:30pm Oral Session
3:30pm-6:00pm Exhibit Hall A Poster Session 2B
6:00pm-9:00pm Sheraton Grand Ballroom Banquet Dinner
Wednesday, June 10
8:30am-10:00am Oral Session
10:30am-11:25am Ballrooms A,B,C Plenary Speaker:
11:30am-12:25pm Ballrooms A,B,C Plenary Speaker:
12:30pm-2:00pm Exhibit Hall B Lunch
2:00pm-3:30pm Oral Session
3:30pm-6:00pm Exhibit Hall A Poster Session 3B

Monday June 8, 8:40am-10:10am
Oral Session
CNN Architectures, Ballrooms A,B,C
Hypercolumns for Object Segmentation and Fine-Grained Localization
Modeling Local and Global Deformations in Deep Learning: Epitomic Convolution, Multiple Instance Learning, and Sliding Window Detection
Improving Object Detection With Deep Convolutional Networks via Bayesian Optimization and Structured Prediction
Going Deeper With Convolutions
Understanding Image Representations by Measuring Their Equivariance and Equivalence
Deep Neural Networks Are Easily Fooled: High Confidence Predictions for Unrecognizable Images

Depth and 3D Surfaces, Rooms 302,304,306
DynamicFusion: Reconstruction and Tracking of Non-Rigid Scenes in Real-Time
3D Scanning Deformable Objects With a Single RGBD Sensor
An Efficient Volumetric Framework for Shape Tracking
Part-Based Modelling of Compound Scenes From Images
SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite
Small-Variance Nonparametric Clustering on the Hypersphere

Monday June 8, 10:10am-12:30pm
Poster Session
Session 1A, Exhibit Hall A
Poster #  Title
1    Going Deeper With Convolutions
2    Propagated Image Filtering
3    Web Scale Photo Hash Clustering on A Single Machine
4    Expanding Object Detector's Horizon: Incremental Learning Framework for Object Detection in Videos
5    Supervised Discrete Hashing
6    What do 15,000 Object Categories Tell Us About Classifying and Localizing Actions?
7    Landmarks-Based Kernelized Subspace Alignment for Unsupervised Domain Adaptation
8    Blur Kernel Estimation Using Normalized Color-Line Prior
9    A Light Transport Model for Mitigating Multipath Interference in Time-of-Flight Sensors
10   Traditional Saliency Reloaded: A Good Old Model in New Shape
11   Automatic Construction Of Robust Spherical Harmonic Subspaces
12   Leveraging Stereo Matching With Learning-Based Confidence Measures
13   Saliency Detection via Cellular Automata
14   Efficient Sparse-to-Dense Optical Flow Estimation Using a Learned Basis and Layers
15   Learning Multiple Visual Tasks While Discovering Their Structure
16   Projection Metric Learning on Grassmann Manifold With Application to Video Based Face Recognition
17   Structural Sparse Tracking
18   Data-Driven Depth Map Refinement via Multi-Scale Sparse Representation
19   Uncalibrated Photometric Stereo Based on Elevation Angle Recovery From BRDF Symmetry of Isotropic Materials
20   Attributes and Categories for Generic Instance Search From One Example
21   Heat Diffusion Over Weighted Manifolds: A New Descriptor for Textured 3D Non-Rigid Shapes
22   A Dynamic Programming Approach for Fast and Robust Object Pose Recognition From Range Images
23   Beyond Gaussian Pyramid: Multi-Skip Feature Stacking for Action Recognition
24   A Geodesic-Preserving Method for Image Warping
25   Shape Driven Kernel Adaptation in Convolutional Neural Network for Robust Facial Traits Recognition
26   From Categories to Subcategories: Large-Scale Image Classification With Partial Class Label Refinement
27   Combination Features and Models for Human Detection
28   Improving Object Detection With Deep Convolutional Networks via Bayesian Optimization and Structured Prediction
29   A Metric Parametrization for Trifocal Tensors With Non-Colinear Pinholes
30   An Efficient Volumetric Framework for Shape Tracking
31   Structured Sparse Subspace Clustering: A Unified Optimization Framework
32   Delving Into Egocentric Actions
33   Latent Trees for Estimating Intensity of Facial Action Units
34   Robust Regression on Image Manifolds for Ordered Label Denoising
35   Privacy Preserving Optics for Miniature Vision Sensors
36   Deep Transfer Metric Learning
37   Small-Variance Nonparametric Clustering on the Hypersphere
38   DynamicFusion: Reconstruction and Tracking of Non-Rigid Scenes in Real-Time
39   Reliable Patch Trackers: Robust Visual Tracking by Exploiting Reliable Patches
40   Predicting Eye Fixations Using Convolutional Neural Networks
41   Kernel Fusion for Better Image Deblurring
42   Direction Matters: Depth Estimation With a Surface Normal Classifier
43   Modeling Local and Global Deformations in Deep Learning: Epitomic Convolution, Multiple Instance Learning, and Sliding Window Detection
44   Grasp Type Revisited: A Modern Perspective on a Classical Feature for Vision
45   Learning Hypergraph-Regularized Attribute Predictors
46   A Coarse-to-Fine Model for 3D Pose Estimation and Sub-Category Recognition
47   Deep Neural Networks Are Easily Fooled: High Confidence Predictions for Unrecognizable Images
48   Deformable Part Models are Convolutional Neural Networks
49   Hypercolumns for Object Segmentation and Fine-Grained Localization
50   Mapping Visual Features to Semantic Profiles for Retrieval in Medical Imaging
51   Event-Driven Stereo Matching for Real-Time 3D Panoramic Vision
52   Graph-Based Simplex Method for Pairwise Energy Minimization With Binary Variables
53   Image Denoising via Adaptive Soft-Thresholding Based on Non-Local Samples
54   3D Scanning Deformable Objects With a Single RGBD Sensor
55   Nested Motion Descriptors
56   Efficient Minimal-Surface Regularization of Perspective Depth Maps in Variational Stereo
57   Maximum Persistency via Iterative Relaxed Inference With Graphical Models
58   Deep Hierarchical Parsing for Semantic Segmentation
59   Designing Deep Networks for Surface Normal Estimation
60   Layered RGBD Scene Flow Estimation
61   Hashing With Binary Autoencoders
62   SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite
63   Collaborative Feature Learning From Social Media
64   Diversity-Induced Multi-View Subspace Clustering
65   Building a Bird Recognition App and Large Scale Dataset With Citizen Scientists: The Fine Print in Fine-Grained Dataset Collection
66   Early Burst Detection for Memory-Efficient Image Retrieval
67   Indoor Scene Structure Analysis for Single Image Depth Estimation
68   Light Field Layer Matting
69   Depth Camera Tracking With Contour Cues
70   Radial Distortion Homography
71   Efficient Object Localization Using Convolutional Networks
72   Just Noticeable Defocus Blur Detection and Estimation
73   How Do We Use Our Hands? Discovering a Diverse Set of Common Grasps
74   Rotating Your Face Using Multi-Task Deep Neural Network
75   Is Object Localization for Free? - Weakly-Supervised Learning With Convolutional Neural Networks
76   Super-Resolution Person Re-Identification With Semi-Coupled Low-Rank Discriminant Dictionary Learning
77   Dual Domain Filters Based Texture and Structure Preserving Image Non-Blind Deconvolution
78   Region-Based Temporally Consistent Video Post-Processing
79   Global Refinement of Random Forest
80   Adaptive Region Pooling for Object Detection
81   Discriminative and Consistent Similarities in Instance-Level Multiple Instance Learning
82   MUlti-Store Tracker (MUSTer): A Cognitive Psychology Inspired Approach to Object Tracking
83   Finding Action Tubes
84   Learning a Convolutional Neural Network for Non-Uniform Motion Blur Removal
85   Complexity-Adaptive Distance Metric for Object Proposals Generation
86   High-Fidelity Pose and Expression Normalization for Face Recognition in the Wild
87   Transformation of Markov Random Fields for Marginal Distribution Estimation
88   Sparse Convolutional Neural Networks
89   FaceNet: A Unified Embedding for Face Recognition and Clustering
90   Cascaded Hand Pose Regression
91   Cross-Scene Crowd Counting via Deep Convolutional Neural Networks
92   The Application of Two-Level Attention Models in Deep Convolutional Neural Network for Fine-Grained Image Classification
93   End-to-End Integration of a Convolution Network, Deformable Parts Model and Non-Maximum Suppression
94   A Mixed Bag of Emotions: Model, Predict, and Transfer Emotion Distributions
95   Neuroaesthetics in Fashion: Modeling the Perception of Fashionability
96   Part-Based Modelling of Compound Scenes From Images
97   Efficient Parallel Optimization for Potts Energy With Hierarchical Fusion
98   Pooled Motion Features for First-Person Videos
99   Functional Correspondence by Matrix Completion
100  Elastic-Net Regularization of Singular Values for Robust Subspace Learning
101  Hardware Compliant Approximate Image Codes
102  Photometric Refinement of Depth Maps for Multi-Albedo Objects
103  Predicting the Future Behavior of a Time-Varying Probability Distribution
104  Classifier Based Graph Construction for Video Segmentation
105  ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding
106  Mid-Level Deep Pattern Mining
107  Prediction of Search Targets From Fixations in Open-World Settings
108  Understanding Image Representations by Measuring Their Equivariance and Equivalence
109  Effective Learning-Based Illuminant Estimation Using Simple Features
110  PAIGE: PAirwise Image Geometry Encoding for Improved Efficiency in Structure-From-Motion
111  Dense, Accurate Optical Flow Estimation With Piecewise Parametric Model
112  Single-Image Estimation of the Camera Response Function in Near-Lighting
113  Multispectral Pedestrian Detection: Benchmark Dataset and Baseline
114  A Low-Dimensional Step Pattern Analysis Algorithm With Application to Multimodal Retinal Image Registration
115  Bilinear Heterogeneous Information Machine for RGB-D Action Recognition
116  MRF Optimization by Graph Approximation
117  SALICON: Saliency in Context
118  Weakly Supervised Object Detection With Convex Clustering
119  Interleaved Text/Image Deep Mining on a Very Large-Scale Radiology Database
120  Learning Semantic Relationships for Better Action Retrieval in Images
121  Hierarchical Recurrent Neural Network for Skeleton Based Action Recognition

Monday June 8, 2:00pm-3:30pm
Oral Session
Discovery and Dense Correspondences, Ballrooms A,B,C
Discovering States and Transformations in Image Collections
Unsupervised Object Discovery and Localization in the Wild: Part-Based Matching With Bottom-Up Region Proposals
FlowWeb: Joint Image Set Alignment by Weaving Consistent, Pixel-Wise Correspondences
EpicFlow: Edge-Preserving Interpolation of Correspondences for Optical Flow
Phase-Based Frame Interpolation for Video
Towards Open World Recognition

3D Shape: Matching, Recognition, Reconstruction, Rooms 302,304,306
Category-Specific Object Reconstruction From a Single Image
Discriminative Shape From Shading in Uncalibrated Illumination
Learning to Generate Chairs With Convolutional Neural Networks
3D ShapeNets: A Deep Representation for Volumetric Shapes
Sketch-Based 3D Shape Retrieval Using Convolutional Neural Networks
Data-Driven 3D Voxel Patterns for Object Category Recognition

Monday June 8, 3:30pm-6:00pm
Poster Session
Session 1B, Exhibit Hall A
Poster #  Title
1    Depth and Surface Normal Estimation From Monocular Images Using Regression on Deep Features and Hierarchical CRFs
2    Discriminative Shape From Shading in Uncalibrated Illumination
3    Multi-Manifold Deep Metric Learning for Image Set Classification
4    Target Identity-Aware Network Flow for Online Multiple Target Tracking
5    Adaptive As-Natural-As-Possible Image Stitching
6    EpicFlow: Edge-Preserving Interpolation of Correspondences for Optical Flow
7    Learning Coarse-to-Fine Sparselets for Efficient Object Detection and Scene Classification
8    Continuous Visibility Feature
9    FlowWeb: Joint Image Set Alignment by Weaving Consistent, Pixel-Wise Correspondences
10   Unsupervised Object Discovery and Localization in the Wild: Part-Based Matching With Bottom-Up Region Proposals
11   Supervised Descriptor Learning for Multi-Output Regression
12   A Statistical Model of Riemannian Metric Variation for Deformable Shape Analysis
13   Temporally Coherent Interpretations for Long Videos Using Pattern Theory
14   Line-Sweep: Cross-Ratio For Wide-Baseline Matching and 3D Reconstruction
15   Simplified Mirror-Based Camera Pose Computation via Rotation Averaging
16   On the Relationship Between Visual Attributes and Convolutional Networks
17   Saliency Detection by Multi-Context Deep Learning
18   DeepShape: Deep Learned Shape Descriptor for 3D Shape Matching and Retrieval
19   Bayesian Adaptive Matrix Factorization With Automatic Model Selection
20   Joint Action Recognition and Pose Estimation From Video
21   Fast Action Proposals for Human Action Detection and Search
22   Joint Multi-Feature Spatial Context for Scene Recognition on the Semantic Manifold
23   Large-Scale Damage Detection Using Satellite Imagery
24   A Novel Locally Linear KNN Model for Visual Recognition
25   Bilinear Random Projections for Locality-Sensitive Binary Codes
26   Combining Local Appearance and Holistic View: Dual-Source Deep Neural Networks for Human Pose Estimation
27   Superpixel Segmentation Using Linear Spectral Clustering
28   Person Count Localization in Videos From Noisy Foreground and Detections
29   Good Features to Track for Visual SLAM
30   Discovering States and Transformations in Image Collections
31   Generalized Deformable Spatial Pyramid: Geometry-Preserving Dense Correspondence Estimation
32   Classifier Adaptation at Prediction Time
33   Phase-Based Frame Interpolation for Video
34   Matching-CNN Meets KNN: Quasi-Parametric Human Parsing
35   Absolute Pose for Cameras Under Flat Refractive Interfaces
36   Protecting Against Screenshots: An Image Processing Approach
37   Pose-Conditioned Joint Angle Limits for 3D Human Pose Reconstruction
38   VisKE: Visual Knowledge Extraction and Question Answering by Visual Verification of Relation Phrases
39   A Graphical Model Approach for Matching Partial Signatures
40   From Captions to Visual Concepts and Back
41   Semi-Supervised Low-Rank Mapping Learning for Multi-Label Classification
42   ConceptLearner: Discovering Visual Concepts From Weakly Labeled Image Collections
43   Computationally Bounded Retrieval
44   Viewpoints and Keypoints
45   Discrete Hyper-Graph Matching
46   Rolling Shutter Motion Deblurring
47   Learning to Generate Chairs With Convolutional Neural Networks
48   Accurate Depth Map Estimation From a Lenslet Light Field Camera
49   Deep Semantic Ranking Based Hashing for Multi-Label Image Retrieval
50   Similarity Learning on an Explicit Polynomial Kernel Feature Map for Person Re-Identification
51   Learning to Propose Objects
52   Basis Mapping Based Boosting for Object Detection
53   Computing the Stereo Matching Cost With a Convolutional Neural Network
54   Recognize Complex Events From Static Images by Fusing Deep Channels
55   Multi-Feature Max-Margin Hierarchical Bayesian Model for Action Recognition
56   Model Recommendation: Generating Object Detectors From Few Samples
57   A Linear Least-Squares Solution to Elastic Shape-From-Template
58   Robust Large Scale Monocular Visual SLAM
59   Membership Representation for Detecting Block-Diagonal Structure in Low-Rank or Sparse Subspace Clustering
60   Bayesian Inference for Neighborhood Filters With Application in Denoising
61   Deep LAC: Deep Localization, Alignment and Classification for Fine-Grained Recognition
62   Unconstrained Realtime Facial Performance Capture
63   Blind Optical Aberration Correction by Exploring Geometric and Visual Priors
64   Ontological Supervision for Fine Grained Classification of Street View Storefronts
65   Finding Distractors In Images
66   From Image-Level to Pixel-Level Labeling With Convolutional Networks
67   Semantic Alignment of LiDAR Data at City Scale
68   Oriented Edge Forests for Boundary Detection
69   Query-Adaptive Late Fusion for Image Search and Person Re-Identification
70   Filtered Feature Channels for Pedestrian Detection
71   GRSA: Generalized Range Swap Algorithm for the Efficient Optimization of MRFs
72   PatchCut: Data-Driven Object Segmentation via Local Shape Transfer
73   Illumination and Reflectance Spectra Separation of a Hyperspectral Image Meets Low-Rank Matrix Factorization
74   Semantic Part Segmentation Using Compositional Model Combining Shape and Appearance
75   A Discriminative CNN Video Representation for Event Detection
76   24/7 Place Recognition by View Synthesis
77   Understanding Image Virality
78   Book2Movie: Aligning Video Scenes With Book Chapters
79   3D Model-Based Continuous Emotion Recognition
80   Learning to Rank in Person Re-Identification With Metric Ensembles
81   Making Better Use of Edges via Perceptual Grouping
82   Real-Time Joint Estimation of Camera Orientation and Vanishing Points
83   Sketch-Based 3D Shape Retrieval Using Convolutional Neural Networks
84   Salient Object Detection via Bootstrap Learning
85   Towards Open World Recognition
86   Data-Driven 3D Voxel Patterns for Object Category Recognition
87   3D ShapeNets: A Deep Representation for Volumetric Shapes
88   Robust Image Alignment With Multiple Feature Descriptors and Matching-Guided Neighborhoods
89   Pushing the Frontiers of Unconstrained Face Detection and Recognition: IARPA Janus Benchmark A
90   Depth From Shading, Defocus, and Correspondence Using Light-Field Angular Coherence
91   New Insights Into Laplacian Similarity Search
92   Feature-Independent Context Estimation for Automatic Image Annotation
93   Category-Specific Object Reconstruction From a Single Image
94   Active Sample Selection and Correction Propagation on a Gradually-Augmented Graph
95   Efficient and Accurate Approximations of Nonlinear Convolutional Networks
96   Ranking and Retrieval of Image Sequences From Multiple Paragraph Queries
97   Casual Stereoscopic Panorama Stitching
98   Superpixel Meshes for Fast Edge-Preserving Surface Reconstruction
99   Best-Buddies Similarity for Robust Template Matching
100  Superdifferential Cuts for Binary Energies
101  The S-Hock Dataset: Analyzing Crowds at the Stadium
102  Discriminant Analysis on Riemannian Manifold of Gaussian Distributions for Face Recognition With Image Sets
103  Texture Representations for Image and Video Synthesis
104  Shadow Optimization From Structured Deep Edge Detection
105  Total Variation Regularization of Shape Signals
106  Learning Similarity Metrics for Dynamic Scene Segmentation
107  Subspace Clustering by Mixture of Gaussian Regression
108  DASC: Dense Adaptive Self-Correlation Descriptor for Multi-Modal and Multi-Spectral Correspondence
109  In Defense of Color-Based Model-Free Tracking
110  Best of Both Worlds: Human-Machine Collaboration for Object Annotation
111  Robust Multiple Homography Estimation: An Ill-Solved Problem
112  Semi-Supervised Domain Adaptation With Subspace Learning for Visual Recognition
113  Articulated Motion Discovery Using Pairs of Trajectories
114  A Solution for Multi-Alignment by Transformation Synchronisation
115  A Convex Optimization Approach to Robust Fundamental Matrix Estimation
116  Simultaneous Pose and Non-Rigid Shape With Particle Dynamics
117  Semi-Supervised Learning With Explicit Relationship Regularization
118  Person Re-Identification by Local Maximal Occurrence Representation and Metric Learning
119  Joint Patch and Multi-Label Learning for Facial Action Unit Detection
120  Real-Time Visual Analysis of Microvascular Blood Flow for Critical Care

Tuesday June 9, 8:30am-10:00am
Oral Session
Images and Language, Ballrooms A,B,C
Show and Tell: A Neural Image Caption Generator
Deep Visual-Semantic Alignments for Generating Image Descriptions
Long-Term Recurrent Convolutional Networks for Visual Recognition and Description
Image Specificity
Don't Just Listen, Use Your Imagination: Leveraging Visual Common Sense for Non-Visual Tasks
Becoming the Expert - Interactive Multi-Class Machine Teaching

Multiple View Geometry, Rooms 302,304,306
Reconstructing the World* in Six Days *(As Captured by the Yahoo 100 Million Image Dataset)
Joint Vanishing Point Extraction and Tracking
Robust Camera Location Estimation by Convex Programming
Efficient Globally Optimal Consensus Maximisation With Tree Search
R6P - Rolling Shutter Absolute Camera Pose
Building Proteins in a Day: Efficient 3D Molecular Reconstruction

Tuesday June 9, 10:00am-12:30pm
Poster Session
Session 2A, Exhibit Hall A
Poster #  Title
1    JOTS: Joint Online Tracking and Segmentation
2    Gaze-Enabled Egocentric Video Summarization via Constrained Submodular Maximization
3    Sparse Depth Super Resolution
4    Efficient Illuminant Estimation for Color Constancy Using Grey Pixels
5    Can Humans Fly? Action Understanding With Multiple Classes of Actors
6    Reweighted Laplace Prior Based Hyperspectral Compressive Sensing for Unknown Sparsity
7    Class Consistent Multi-Modal Fusion With Binary Features
8    R6P - Rolling Shutter Absolute Camera Pose
9    Embedded Phase Shifting: Robust Phase Shifting With Embedded Signals
10   Shape and Light Directions From Shading and Polarization
11   3D Deep Shape Descriptor
12   Cross-Age Face Verification by Coordinating With Cross-Face Age Verification
13   Beyond Mahalanobis Metric: Cayley-Klein Metric Learning
14   From Dictionary of Visual Words to Subspaces: Locality-Constrained Affine Subspace Coding
15   FPA-CS: Focal Plane Array-Based Compressive Imaging in Short-Wave Infrared
16   BOLD - Binary Online Learned Descriptor For Efficient Image Matching
17   Defocus Deblurring and Superresolution for Time-of-Flight Depth Cameras
18   Burst Deblurring: Removing Camera Shake Through Fourier Burst Accumulation
19   SOM: Semantic Obviousness Metric for Image Quality Assessment
20   DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection
21   Efficient Globally Optimal Consensus Maximisation With Tree Search
22   Mind's Eye: A Recurrent Visual Representation for Image Caption Generation
23   Hierarchical Sparse Coding With Geometric Prior For Visual Geo-Location
24   P3.5P: Pose Estimation With Unknown Focal Length
25   Joint Vanishing Point Extraction and Tracking
26   Learning a Non-Linear Knowledge Transfer Model for Cross-View Action Recognition
27   Random Tree Walk Toward Instantaneous 3D Human Pose Estimation
28   Deep Hashing for Compact Binary Codes Learning
29   Completing 3D Object Shape From One Depth Image
30   Encoding Based Saliency Detection for Videos and Images
31   Online Sketching Hashing
32   Enriching Object Detection With 2D-3D Registration and Continuous Viewpoint Estimation
33   Representing 3D Texture on Mesh Manifolds for Retrieval and Recognition Applications
34   Saliency Propagation From Simple to Difficult
35   Learning an Efficient Model of Hand Shape Variation From Depth Images
36   On the Minimal Problems of Low-Rank Matrix Factorization
37   Symmetry-Based Text Line Detection in Natural Scenes
38   DevNet: A Deep Event Network for Multimedia Event Detection and Evidence Recounting
39   Learning to Detect Motion Boundaries
40   Improving Object Proposals With Multi-Thresholding Straddling Expansion
41   Visual Recognition by Counting Instances: A Multi-Instance Cardinality Potential Kernel
42   Unconstrained 3D Face Reconstruction
43   Becoming the Expert - Interactive Multi-Class Machine Teaching
44   Long-Term Recurrent Convolutional Networks for Visual Recognition and Description
45   Zero-Shot Object Recognition by Semantic Manifold Distance
46   Hyper-Class Augmented and Regularized Deep Learning for Fine-Grained Image Classification
47   Direct Structure Estimation for 3D Reconstruction
48   Global Supervised Descent Method
49   Robust Camera Location Estimation by Convex Programming
50   Practical Robust Two-View Translation Estimation
51   Learning From Massive Noisy Labeled Data for Image Classification
52   KL Divergence Based Agglomerative Clustering for Automated Vitiligo Grading
53   Robust Saliency Detection via Regularized Random Walks Ranking
54   Weakly Supervised Semantic Segmentation for Social Images
55   Image Specificity
56   A Multi-Plane Block-Coordinate Frank-Wolfe Algorithm for Training Structural SVMs With a Costly Max-Oracle
57   Web-Scale Training for Face Identification
58   Dynamically Encoded Actions Based on Spacetime Saliency
59   Three Viewpoints Toward Exemplar SVM
60   Visual Recognition by Learning From Web Data: A Weakly Supervised Domain Generalization Approach
61   Clustering of Static-Adaptive Correspondences for Deformable Object Tracking
62   Geo-Semantic Segmentation
63   Towards Unified Depth and Semantic Prediction From a Single Image
64   Towards Force Sensing From Vision: Observing Hand-Object Interactions to Infer Manipulation Forces
65   A MRF Shape Prior for Facade Parsing With Occlusions
66   Probability Occupancy Maps for Occluded Depth Images
67   Segment Based 3D Object Shape Priors
68   Shape-From-Template in Flatland
69   Understanding Tools: Task-Oriented Object Modeling, Learning and Recognition
70   Deep Roto-Translation Scattering for Object Classification
71   Non-Rigid Registration of Images With Geometric and Photometric Deformation by Using Local Affine Fourier-Moment Matching
72   Detector Discovery in the Wild: Joint Multiple Instance and Representation Learning
73   Deeply Learned Face Representations Are Sparse, Selective, and Robust
74   Unsupervised Visual Alignment With Similarity Graphs
75   Video Anomaly Detection and Localization Using Hierarchical Feature Representation and Gaussian Process Regression
76   Inferring 3D Layout of Building Facades From a Single Image
77   Evaluation of Output Embeddings for Fine-Grained Image Classification
78   Virtual View Networks for Object Reconstruction
79   Real-Time Coarse-to-Fine Topologically Preserving Segmentation
80   Supervised Mid-Level Features for Word Image Representation
81   Learning Lightness From Human Judgement on Relative Reflectance
82   Scene Classification With Semantic Fisher Vectors
83   Don't Just Listen, Use Your Imagination: Leveraging Visual Common Sense for Non-Visual Tasks
84   Co-Saliency Detection via Looking Deep and Wide
85   Adopting an Unconstrained Ray Model in Light-Field Cameras for 3D Shape Reconstruction
86   Towards 3D Object Detection With Bimodal Deep Boltzmann Machines Over RGBD Imagery
87   An Active Search Strategy for Efficient Object Class Detection
88   Geodesic Exponential Kernels: When Curvature and Linearity Conflict
89   Transformation-Invariant Convolutional Jungles
90   Exemplar SVMs as Visual Feature Encoders
91   Object Scene Flow for Autonomous Vehicles
92   Reflectance Hashing for Material Recognition
93   Joint Photo Stream and Blog Post Summarization and Exploration
94   Video Summarization by Learning Submodular Mixtures of Objectives
95   Building Proteins in a Day: Efficient 3D Molecular Reconstruction
96   Learning Descriptors for Object Recognition and 3D Pose Estimation
97   Image Partitioning Into Convex Polygons
98   Deep Visual-Semantic Alignments for Generating Image Descriptions
99   Unsupervised Learning of Complex Articulated Kinematic Structures Combining Motion and Skeleton Information
100  Elastic Functional Coding of Human Actions: From Vector-Fields to Latent Variables
101  Show and Tell: A Neural Image Caption Generator
102  Descriptor Free Visual Indoor Localization With Line Segments
103  Fixation Bank: Learning to Reweight Fixation Candidates
104  Deep Networks for Saliency Detection via Local Estimation and Global Search
105  Reflection Removal Using Ghosting Cues
106  A Dataset for Movie Description
107  Fast and Robust Hand Tracking Using Detection-Guided Optimization
108  Efficient SDP Inference for Fully-Connected CRFs Based on Low-Rank Decomposition
109  Discriminative Learning of Iteration-Wise Priors for Blind Deconvolution
110  Eye Tracking Assisted Extraction of Attentionally Important Objects From Videos
111  Multi-View Feature Engineering and Learning
112  Self Scaled Regularized Robust Regression
113  Simultaneous Feature Learning and Hash Coding With Deep Neural Networks
114  MatchNet: Unifying Feature and Metric Learning for Patch-Based Matching
115  Reconstructing the World* in Six Days *(As Captured by the Yahoo 100 Million Image Dataset)
116  Exact Bias Correction and Covariance Estimation for Stereo Vision
117  Computing Similarity Transformations From Only Image Correspondences
118  Image Segmentation in Twenty Questions
119  Interaction Part Mining: A Mid-Level Approach for Fine-Grained Action Recognition
120  Sparse Projections for High-Dimensional Binary Codes

Tuesday June 9, 2:00pm-3:30pm
Oral Session
Segmentation in Images and Video, Ballrooms A,B,C
Causal Video Object Segmentation From Persistence of Occlusions
Semantic Object Segmentation via Detection in Weakly Labeled Video
Fully Convolutional Networks for Semantic Segmentation
Shape-Tailored Local Descriptors and Their Application to Segmentation and Tracking
Deep Filter Banks for Texture Recognition and Segmentation
Active Learning for Structured Probabilistic Models With Histogram Approximation

3D Models and Images, Rooms 302,304,306
Picture: A Probabilistic Programming Language for Scene Perception
Rent3D: Floor-Plan Priors for Monocular Layout Estimation
The Stitched Puppet: A Graphical Model of 3D Human Shape and Pose
3D Shape Estimation From 2D Landmarks: A Convex Relaxation Approach
Holistic 3D Scene Understanding From a Single Geo-Tagged Image
Joint SFM and Detection Cues for Monocular 3D Localization in Road Scenes

Tuesday June 9, 3:30pm-6:00pm
Poster Session
Session 2B, Exhibit Hall A
Poster #  Title
1Hierarchically-Constrained Optical Flow
2The k-Support Norm and Convex Envelopes of Cardinality and Rank
3Matching Bags of Regions in RGBD images
4Recurrent Convolutional Neural Network for Object Recognition
5Feedforward Semantic Segmentation With Zoom-Out Features
6The Aperture Problem for Refractive Motion
7Saliency-Aware Geodesic Video Object Segmentation
8DEEP-CARVING: Discovering Visual Attributes by Carving Deep Neural Nets
9Rent3D: Floor-Plan Priors for Monocular Layout Estimation
10Learning a Sequential Search for Landmarks
11Fully Convolutional Networks for Semantic Segmentation
12Deep Correlation for Matching Images and Text
13Multi-Objective Convolutional Learning for Face Labeling
14Deep Multiple Instance Learning for Image Classification and Auto-Annotation
15Multi-Instance Object Segmentation With Occlusion Handling
16 Material Recognition in the Wild With the Materials in Context Database
17 Understanding Pedestrian Behaviors From Stationary Crowd Groups
18 Depth From Focus With Your Mobile Phone
19 Fusion Moves for Correlation Clustering
20 Second-Order Constrained Parametric Proposals and Sequential Search-Based Structured Prediction for Semantic Segmentation in RGB-D Images
21 Metric Imitation by Manifold Transfer for Efficient Vision Applications
22 The Stitched Puppet: A Graphical Model of 3D Human Shape and Pose
23 Scene Labeling With LSTM Recurrent Neural Networks
24 FAemb: A Function Approximation-Based Embedding Method for Image Retrieval
25 Automatically Discovering Local Visual Material Attributes
26 Depth Image Enhancement Using Local Tangent Plane Approximations
27 Video Co-Summarization: Video Summarization by Visual Co-Occurrence
28 Watch and Learn: Semi-Supervised Learning for Object Detectors From Video
29 Generalized Tensor Total Variation Minimization for Visual Data Recovery
30 Active Learning for Structured Probabilistic Models With Histogram Approximation
31 Image Parsing With a Wide Range of Classes and Scene-Level Context
32 Bayesian Sparse Representation for Hyperspectral Image Super Resolution
33 Semantic Object Segmentation via Detection in Weakly Labeled Video
34 Learning With Dataset Bias in Latent Subcategory Models
35 Project-Out Cascaded Regression With an Application to Face Alignment
36 Image Retrieval Using Scene Graphs
37 Unifying Holistic and Parts-Based Deformable Model Fitting
38 Small Instance Detection by Integer Programming on Object Density Maps
39 Motion Part Regularization: Improving Action Recognition via Trajectory Selection
40 Multi-Task Deep Visual-Semantic Embedding for Video Thumbnail Selection
41 Fine-Grained Visual Categorization via Multi-Stage Metric Learning
42 Saturation-Preserving Specular Reflection Separation
43 Joint SFM and Detection Cues for Monocular 3D Localization in Road Scenes
44 Fisher Vectors Meet Neural Networks: A Hybrid Classification Architecture
45 UniHIST: A Unified Framework for Image Restoration With Marginal Histogram Constraints
46 Human Action Segmentation With Hierarchical Supervoxel Consistency
47 Robust Manhattan Frame Estimation From a Single RGB-D Image
48 Learning to Segment Under Various Forms of Weak Supervision
49 Fast and Accurate Image Upscaling With Super-Resolution Forests
50 Light Field From Micro-Baseline Image Pair
51 Efficient ConvNet-Based Marker-Less Motion Capture in General Scenes With a Low Number of Cameras
52 Learning Scene-Specific Pedestrian Detectors Without Real Data
53 Deep Filter Banks for Texture Recognition and Segmentation
54 Multiple Random Walkers and Their Application to Image Cosegmentation
55 Beyond the Shortest Path: Unsupervised Domain Adaptation by Sampling Subspaces Along the Spline Flow
56 Spherical Embedding of Inlier Silhouette Dissimilarities
57 Semantics-Preserving Hashing for Cross-View Retrieval
58 Object Proposal by Multi-Branch Hierarchical Segmentation
59 Ambient Occlusion via Compressive Visibility Estimation
60 Shape-Tailored Local Descriptors and Their Application to Segmentation and Tracking
61 Scalable Object Detection by Filter Compression With Regularized Sparse Coding
62 An Improved Deep Learning Architecture for Person Re-Identification
63 Understanding Classifier Errors by Examining Influential Neighbors
64 Riemannian Coding and Dictionary Learning: Kernels to the Rescue
65 Scalable Structure From Motion for Densely Sampled Videos
66 Parsing Occluded People by Flexible Compositions
67 Joint Calibration of Ensemble of Exemplar SVMs
68 Holistic 3D Scene Understanding From a Single Geo-Tagged Image
69 A Large-Scale Car Dataset for Fine-Grained Categorization and Verification
70 DeepContour: A Deep Convolutional Feature Learned by Positive-Sharing Loss for Contour Detection
71 Convolutional Feature Masking for Joint Object and Stuff Segmentation
72 A Fixed Viewpoint Approach for Dense Reconstruction of Transparent Objects
73 Low-Level Vision by Consensus in a Spatial Hierarchy of Regions
74 Line Drawing Interpretation in a Multi-View Context
75 Toward User-Specific Tracking by Detection of Human Shapes in Multi-Cameras
76 Intra-Frame Deblurring by Leveraging Inter-Frame Camera Motion
77 Salient Object Subitizing
78 Hierarchical-PEP Model for Real-World Face Recognition
79 The Common Self-Polar Triangle of Concentric Circles and Its Application to Camera Calibration
80 Taking a Deeper Look at Pedestrians
81 Learning to Segment Moving Objects in Videos
82 GMMCP Tracker: Globally Optimal Generalized Maximum Multi Clique Problem for Multiple Object Tracking
83 Learning Graph Structure for Multi-Label Image Classification via Clique Generation
84 Matrix Completion for Resolving Label Ambiguity
85 Video Magnification in Presence of Large Motions
86 Flying Objects Detection From a Single Moving Camera
87 Line-Based Multi-Label Energy Optimization for Fisheye Image Rectification and Calibration
88 Adaptive Eye-Camera Calibration for Head-Worn Devices
89 Modeling Object Appearance Using Context-Conditioned Component Analysis
90 Displets: Resolving Stereo Ambiguities Using Object Knowledge
91 Time-to-Contact From Image Intensity
92 Transferring a Semantic Representation for Person Re-Identification and Search
93 Robust Video Segment Proposals With Painless Occlusion Handling
94 Face Alignment Using Cascade Gaussian Process Regression Trees
95 Regularizing Max-Margin Exemplars by Reconstruction and Generative Models
96 A Fast Algorithm for Elastic Shape Distances Between Closed Planar Curves
97 Reflection Removal for In-Vehicle Black Box Videos
98 Tree Quantization for Large-Scale Similarity Search and Classification
99 Integrating Parametric and Non-Parametric Models For Scene Labeling
100 Mining Semantic Affordances of Visual Object Categories
101 Causal Video Object Segmentation From Persistence of Occlusions
102 Multiple Instance Learning for Soft Bags via Top Instances
103 Multiclass Semantic Video Segmentation With Object-Level Active Inference
104 Effective Face Frontalization in Unconstrained Images
105 Action Recognition With Trajectory-Pooled Deep-Convolutional Descriptors
106 Weakly Supervised Localization of Novel Objects Using Appearance Transfer
107 First-Person Pose Recognition Using Egocentric Workspaces
108 Simultaneous Time-of-Flight Sensing and Photometric Stereo With a Single ToF Sensor
109 Active Learning and Discovery of Object Categories in the Presence of Unnameable Instances
110 Learning to Compare Image Patches via Convolutional Neural Networks
111 Watch-n-Patch: Unsupervised Understanding of Actions and Relations
112 Optimal Graph Learning With Partial Tags and Multiple Features for Image and Video Annotation
113 DeepEdge: A Multi-Scale Bifurcated Deep Network for Top-Down Contour Detection
114 Picture: A Probabilistic Programming Language for Scene Perception
115 Exploiting Uncertainty in Regression Forests for Accurate Camera Relocalization
116 Fusing Subcategory Probabilities for Texture Classification
117 Video Event Recognition With Deep Hierarchical Context Model
118 Object-Based RGBD Image Co-Segmentation With Mutex Constraint
119 Associating Neural Word Embeddings With Deep Image Representations Using Fisher Vectors
120 3D Shape Estimation From 2D Landmarks: A Convex Relaxation Approach

Wednesday June 10, 8:30am-10:00am
Action and Event Recognition, Ballrooms A,B,C
How Many Bits Does it Take for a Stimulus to Be Salient?
Deeply Learned Attributes for Crowded Scene Understanding
Joint Inference of Groups, Events and Human Roles in Aerial Videos
Modeling Video Evolution for Action Recognition
Space-Time Tree Ensemble for Action Recognition
Social Saliency Prediction

Computational Photography, Rooms 302,304,306
Visual Vibrometry: Estimating Material Properties From Small Motion in Video
Recovering Inner Slices of Translucent Objects by Multi-Frequency Illumination
Fast Bilateral-Space Stereo for Synthetic Defocus
Simultaneous Video Defogging and Stereo Reconstruction
One-Day Outdoor Photometric Stereo via Skylight Estimation

Wednesday June 10, 10:30am-12:25pm
Plenary Speakers
Ballrooms A,B,C
What's Wrong with Deep Learning?
Yann LeCun
Facebook AI Research & New York University

Deep learning methods have had a profound impact on a number of areas in recent years, including natural image understanding and speech recognition. Other areas seem on the verge of being similarly impacted, notably natural language processing, biomedical image analysis, and the analysis of sequential signals in a variety of application domains. But deep learning systems, as they exist today, have many limitations.

First, they lack mechanisms for reasoning, search, and inference. Complex and/or ambiguous inputs require deliberate reasoning to arrive at a consistent interpretation. Producing structured outputs, such as a long text or a label map for image segmentation, requires sophisticated search and inference algorithms to satisfy complex sets of constraints. One approach to this problem is to marry deep learning with structured prediction (an idea first presented at CVPR 1997). While several deep learning systems augmented with structured prediction modules trained end-to-end have been proposed for OCR, body pose estimation, and semantic segmentation, new concepts are needed for tasks that require more complex reasoning.

Second, they lack short-term memory. Many tasks in natural language understanding, such as question-answering, require a way to temporarily store isolated facts. Correctly interpreting events in a video and being able to answer questions about it requires remembering abstract representations of what happens in the video. Deep learning systems, including recurrent nets, are notoriously inefficient at storing temporary memories. This has led researchers to propose neural net systems augmented with separate memory modules, such as LSTM, Memory Networks, Neural Turing Machines, and Stack-Augmented RNN. While these proposals are interesting, new ideas are needed.

Lastly, they lack the ability to perform unsupervised learning. Animals and humans learn most of the structure of the perceptual world in an unsupervised manner. While the interest of the ML community in neural nets was revived in the mid-2000s by progress in unsupervised learning, the vast majority of practical applications of deep learning have used purely supervised learning. There is little doubt that future progress in computer vision will require breakthroughs in unsupervised learning, particularly for video understanding. But what principles should unsupervised learning be based on?

Preliminary work in each of these areas paves the way for future progress in image and video understanding.


Yann LeCun is Director of AI Research at Facebook, and Silver Professor of Data Science, Computer Science, Neural Science, and Electrical Engineering at New York University, affiliated with the NYU Center for Data Science, the Courant Institute of Mathematical Sciences, the Center for Neural Science, and the Electrical and Computer Engineering Department.

He received the Electrical Engineer Diploma from École Supérieure d'Ingénieurs en Électrotechnique et Électronique (ESIEE), Paris, in 1983, and a PhD in Computer Science from Université Pierre et Marie Curie (Paris) in 1987. After a postdoc at the University of Toronto, he joined AT&T Bell Laboratories in Holmdel, NJ, in 1988. He became head of the Image Processing Research Department at AT&T Labs-Research in 1996, and joined NYU as a professor after a brief period as a Fellow of the NEC Research Institute in Princeton. He directed NYU's initiative in data science and became the founding director of the NYU Center for Data Science. He was later named Director of AI Research at Facebook, and retains a part-time position on the NYU faculty.

His current interests include AI, machine learning, computer perception, mobile robotics, and computational neuroscience. He has published over 180 technical papers and book chapters on these topics as well as on neural networks, handwriting recognition, image processing and compression, and on dedicated circuits and architectures for computer perception. The character recognition technology he developed at Bell Labs is used by several banks around the world to read checks and was reading between 10 and 20% of all the checks in the US in the early 2000s. His image compression technology, called DjVu, is used by hundreds of websites and publishers and millions of users to access scanned documents on the Web. Since the late 1980s he has been working on deep learning methods, particularly the convolutional network model, which is the basis of many products and services deployed by companies such as Facebook, Google, Microsoft, Baidu, IBM, NEC, AT&T and others for image and video understanding, document recognition, human-computer interaction, and speech recognition.

LeCun has been on the editorial board of IJCV, IEEE PAMI, and IEEE Trans. Neural Networks, was program chair of CVPR'06, and is chair of ICLR. He is on the science advisory board of the Institute for Pure and Applied Mathematics, and has advised many large and small companies about machine learning technology, including several startups he co-founded. He is the lead faculty at NYU for the Moore-Sloan Data Science Environment, a $36M initiative in collaboration with UC Berkeley and University of Washington to develop data-driven methods in the sciences. He is the recipient of the IEEE Neural Network Pioneer Award.

Reverse Engineering the Human Visual System
Jack L. Gallant
University of California at Berkeley

The human brain is the most sophisticated image processing system known, capable of impressive feats of recognition and discrimination under challenging natural conditions. Reverse-engineering the brain might enable us to design artificial systems with the same capabilities. My laboratory uses a data-driven system identification approach to tackle this reverse-engineering problem. Our approach consists of four broad stages. First, we use functional MRI to measure brain activity while people watch naturalistic movies. We divide these data into two parts, one used to fit models and the other to test model predictions. Second, we use a system identification framework (based on multiple linearizing feature spaces) to model activity measured at each point in the brain. Third, we inspect the most accurate models to understand how the brain represents low-, mid- and high-level information in the movies. Finally, we use the estimated models to decode brain activity, reconstructing the structural and semantic content in the movies. Any effort to reverse-engineer the brain is inevitably limited by the spatial and temporal resolution of brain measurements, and at this time the resolution of human brain measurements is relatively poor. Still, as measurement technology progresses, this framework could inform the development of biologically inspired computer vision systems, and it could aid in the development of practical new brain-reading technologies.


Jack Gallant is Chancellor's Professor of Psychology at the University of California at Berkeley. He is affiliated with the graduate programs in Bioengineering, Biophysics, Neuroscience and Vision Science. He received his Ph.D. from Yale University and did post-doctoral work at the California Institute of Technology and Washington University Medical School. His research program focuses on computational modeling of the human brain. These models accurately describe how the brain encodes information during complex, naturalistic tasks, and they show how information about the external and internal world is mapped systematically across the surface of the cerebral cortex. These models can also be used to decode information in the brain in order to reconstruct mental experiences. Gallant's brain decoding algorithm was one of Time Magazine's Inventions of the Year, and he appears frequently on radio and television. Further information about ongoing work in the Gallant lab, including links to talks, papers, and an online interactive brain viewer, is available online.

Wednesday June 10, 2:00pm-3:30pm
Learning and Matching Local Features, Ballrooms A,B,C
Domain-Size Pooling in Local Descriptors: DSP-SIFT
Learning Deep Representations for Ground-to-Aerial Geolocalization
Understanding Deep Image Representations by Inverting Them
Situational Object Boundary Detection
Fast 2D Border Ownership Assignment
A Flexible Tensor Block Coordinate Ascent Scheme for Hypergraph Matching

Image and Video Processing and Restoration, Rooms 302,304,306
Generalized Video Deblurring for Dynamic Scenes
Approximate Nearest Neighbor Fields in Video
Single Image Super-Resolution From Transformed Self-Exemplars
L0TV: A New Method for Image Restoration in the Presence of Impulse Noise
On Learning Optimized Reaction Diffusion Processes for Effective Image Restoration
Fast and Flexible Convolutional Sparse Coding

Wednesday June 10, 3:30pm-6:00pm
Poster Session
Session 3B, Exhibit Hall A
Poster # Title and Authors
1 3D All The Way: Semantic Segmentation of Urban Scenes From Start to End in 3D
2 Fast Bilateral-Space Stereo for Synthetic Defocus
3 Large-Scale and Drift-Free Surface Reconstruction Using Online Subvolume Registration
4 Fast Randomized Singular Value Thresholding for Nuclear Norm Minimization
5 LMI-Based 2D-3D Registration: From Uncalibrated Images to Euclidean Scene
6 Clique-Graph Matching by Preserving Global & Local Structure
7 Appearance-Based Gaze Estimation in the Wild
8 One-Day Outdoor Photometric Stereo via Skylight Estimation
9 A New Retraction for Accelerating the Riemannian Three-Factor Low-Rank Matrix Completion Algorithm
10 Heteroscedastic Max-Min Distance Analysis
11 Sparse Composite Quantization
12 Sparse Representation Classification With Manifold Constraints Transfer
13 CIDEr: Consensus-Based Image Description Evaluation
14 Joint Inference of Groups, Events and Human Roles in Aerial Videos
15 Photometric Stereo With Near Point Lighting: A Solution by Mesh Deformation
16 Efficient Label Collection for Unlabeled Image Datasets
17 Separating Objects and Clutter in Indoor Scenes
18 FaLRR: A Fast Low Rank Representation Solver
19 Simulating Makeup Through Physics-Based Manipulation of Intrinsic Image Layers
20 Correlation Filters With Limited Boundaries
21 Shape-Based Automatic Detection of a Large Number of 3D Facial Landmarks
22 Material Classification With Thermal Imagery
23 Deeply Learned Attributes for Crowded Scene Understanding
24 Learning To Look Up: Realtime Monocular Gaze Correction Using Machine Learning
25 Background Subtraction via Generalized Fused Lasso Foreground Modeling
26 Mirror, Mirror on the Wall, Tell Me, Is the Error Small?
27 Beyond Short Snippets: Deep Networks for Video Classification
28 segDeepM: Exploiting Segmentation and Context in Deep Neural Networks for Object Detection
29 Situational Object Boundary Detection
30 Real-Time 3D Head Pose and Facial Landmark Estimation From Depth Images Using Triangular Surface Patch Features
31 Aligning 3D Models to RGB-D Images of Cluttered Scenes
32 A Stable Multi-Scale Kernel for Topological Machine Learning
33 The Treasure Beneath Convolutional Layers: Cross-Convolutional-Layer Pooling for Image Classification
34 Face Video Retrieval With Image Query via Hashing Across Euclidean Space and Riemannian Manifold
35 EgoSampling: Fast-Forward and Stereo for Egocentric Videos
36 Social Saliency Prediction
37 Beyond Principal Components: Deep Boltzmann Machines for Face Modeling
38 Statistical Inference Models for Image Datasets With Systematic Variations
39 Beyond Frontal Faces: Improving Person Recognition Using Multiple Cues
40 Superpixel-Based Video Object Segmentation Using Perceptual Organization and Location Prior
41 Robust Image Filtering Using Joint Static and Dynamic Guidance
42 Solving Multiple Square Jigsaw Puzzles With Missing Pieces
43 A Dynamic Convolutional Layer for Short Range Weather Prediction
44 SWIFT: Sparse Withdrawal of Inliers in a First Trial
45 VIP: Finding Important People in Images
46 Dataset Fingerprints: Exploring Image Collections Through Data Mining
47 Transport-Based Single Frame Super Resolution of Very Low Resolution Face Images
48 3D Reconstruction in the Presence of Glasses by Acoustic and Stereo Fusion
49 Deep Sparse Representation for Robust Image Registration
50 Real-Time Part-Based Visual Tracking via Adaptive Correlation Filters
51 Beyond Spatial Pooling: Fine-Grained Representation Learning in Multiple Domains
52 HC-Search for Structured Prediction in Computer Vision
53 Revisiting Kernelized Locality-Sensitive Hashing for Improved Large-Scale Image Retrieval
54 High-Speed Hyperspectral Video Acquisition With a Dual-Camera Architecture
55 More About VLAD: A Leap From Euclidean to Riemannian Manifolds
56 Camera Intrinsic Blur Kernel Estimation: A Reliable Framework
57 Classifier Learning With Hidden Information
58 Single Target Tracking Using Adaptive Clustered Decision Trees and Dynamic Multi-Level Appearance Models
59 Simultaneous Video Defogging and Stereo Reconstruction
60 Face Alignment by Coarse-to-Fine Shape Searching
61 Learning Deep Representations for Ground-to-Aerial Geolocalization
62 Unsupervised Simultaneous Orthogonal Basis Clustering Feature Selection
63 Space-Time Tree Ensemble for Action Recognition
64 Subgraph Decomposition for Multi-Target Tracking
65 Understanding Image Structure via Hierarchical Shape Parsing
66 Coarse-To-Fine Region Selection and Matching
67 Label Consistent Quadratic Surrogate Model for Visual Saliency Prediction
68 Subgraph Matching Using Compactness Prior for Robust Feature Correspondence
69 Pedestrian Detection Aided by Deep Learning Semantic Tasks
70 Multihypothesis Trajectory Analysis for Robust Visual Tracking
71 Domain-Size Pooling in Local Descriptors: DSP-SIFT
72 Object Detection by Labeling Superpixels
73 Fast 2D Border Ownership Assignment
74 From Single Image Query to Detailed 3D Reconstruction
75 Fast and Flexible Convolutional Sparse Coding
76 Iteratively Reweighted Graph Cut for Multi-Label MRFs With Non-Convex Priors
77 Pairwise Geometric Matching for Large-Scale Object Retrieval
78 Deep Convolutional Neural Fields for Depth Estimation From a Single Image
79 Data-Driven Sparsity-Based Restoration of JPEG-Compressed Images in Dual Transform-Pixel Domain
80 TVSum: Summarizing Web Videos Using Titles
81 Understanding Deep Image Representations by Inverting Them
82 Single Image Super-Resolution From Transformed Self-Exemplars
83 Constrained Planar Cuts - Object Partitioning for Point Clouds
84 A Weighted Sparse Coding Framework for Saliency Detection
85 Handling Motion Blur in Multi-Frame Super-Resolution
86 Approximate Nearest Neighbor Fields in Video
87 Inverting RANSAC: Global Model Detection via Inlier Rate Estimation
88 Robust Multi-Image Based Blind Face Hallucination
89 On Learning Optimized Reaction Diffusion Processes for Effective Image Restoration
90 A Flexible Tensor Block Coordinate Ascent Scheme for Hypergraph Matching
91 TILDE: A Temporally Invariant Learned DEtector
92 A Maximum Entropy Feature Descriptor for Age Invariant Face Recognition
93 Sense Discovery via Co-Clustering on Images and Text
94 An Approximate Shading Model for Object Relighting
95 Deep Domain Adaptation for Describing People Based on Fine-Grained Clothing Attributes
96 A Convolutional Neural Network Cascade for Face Detection
97 Visual Vibrometry: Estimating Material Properties From Small Motion in Video
98 Jointly Learning Heterogeneous Features for RGB-D Activity Recognition
99 Convolutional Neural Networks at Constrained Time Cost
100 Fine-Grained Histopathological Image Analysis via Robust Segmentation and Large-Scale Retrieval
101 L0TV: A New Method for Image Restoration in the Presence of Impulse Noise
102 Modeling Video Evolution for Action Recognition
103 Long-Term Correlation Tracking
104 Joint Tracking and Segmentation of Multiple Targets
105 RGBD-Fusion: Real-Time High Precision Depth Recovery
106 Modeling Deformable Gradient Compositions for Single-Image Super-Resolution
107 Generalized Video Deblurring for Dynamic Scenes
108 Active Pictorial Structures
109 Ego-Surfing First-Person Videos
110 Visual Saliency Based on Multiscale Deep Features
111 Recovering Inner Slices of Translucent Objects by Multi-Frequency Illumination
112 Local High-Order Regularization on Data Manifolds
113 Fine-Grained Classification of Pedestrians in Video: Benchmark and State of the Art
114 Curriculum Learning of Multiple Tasks
115 How Many Bits Does it Take for a Stimulus to Be Salient?
116 Discrete Optimization of Ray Potentials for Semantic 3D Reconstruction
117 SOLD: Sub-Optimal Low-rank Decomposition for Efficient Video Segmentation
118 On the Appearance of Translucent Edges
119 On Pairwise Costs for Network Flow Multi-Object Tracking
120 Fine-Grained Recognition Without Part Annotations
121 Robust Reconstruction of Indoor Scenes