ResearchHub | Open Science Community

Torsten Sattler

Author with expertise in Simultaneous Localization and Mapping

Achievements

Cited Author

Key Stats

Upvotes received:

Publications:

(47% Open Access)

Cited by:

5,582

h-index:

i10-index:

103

Reputation

Biology

< 1%

Chemistry

< 1%

Economics

< 1%

How is this calculated?

Publications

D2-Net: A Trainable CNN for Joint Description and Detection of Local Features

Mihai Dusmanu et al.Jun 1, 2019

In this work we address the problem of finding reliable pixel-level correspondences under difficult imaging conditions. We propose an approach where a single convolutional neural network plays a dual role: It is simultaneously a dense feature descriptor and a feature detector. By postponing the detection to a later stage, the obtained keypoints are more stable than their traditional counterparts based on early detection of low-level structures. We show that this model can be trained using pixel correspondences extracted from readily available large-scale SfM reconstructions, without any further annotations. The proposed method obtains state-of-the-art performance on both the difficult Aachen Day-Night localization dataset and the InLoc indoor localization benchmark, as well as competitive performance on other benchmarks for image matching and 3D reconstruction.

Philosophy

Artificial Intelligence

Paper

Philosophy

790

Save

A Multi-view Stereo Benchmark with High-Resolution Images and Multi-camera Videos

Thomas Schöps et al.Jul 1, 2017

Motivated by the limitations of existing multi-view stereo benchmarks, we present a novel dataset for this task. Towards this goal, we recorded a variety of indoor and outdoor scenes using a high-precision laser scanner and captured both high-resolution DSLR imagery as well as synchronized low-resolution stereo videos with varying fields-of-view. To align the images with the laser scans, we propose a robust technique which minimizes photometric errors conditioned on the geometry. In contrast to previous datasets, our benchmark provides novel challenges and covers a diverse set of viewpoints and scene types, ranging from natural scenes to man-made indoor and outdoor environments. Furthermore, we provide data at significantly higher temporal and spatial resolution. Our benchmark is the first to cover the important use case of hand-held mobile devices while also providing high-resolution DSLR camera images. We make our datasets and an online evaluation server available at http://www.eth3d.net.

Artificial Intelligence

Aerospace Engineering

Paper

Artificial Intelligence

607

Save

Efficient & Effective Prioritized Matching for Large-Scale Image-Based Localization

Torsten Sattler et al.Sep 20, 2016

Accurately determining the position and orientation from which an image was taken, i.e., computing the camera pose, is a fundamental step in many Computer Vision applications. The pose can be recovered from 2D-3D matches between 2D image positions and points in a 3D model of the scene. Recent advances in Structure-from-Motion allow us to reconstruct large scenes and thus create the need for image-based localization methods that efficiently handle large-scale 3D models while still being effective, i.e., while localizing as many images as possible. This paper presents an approach for large scale image-based localization that is both efficient and effective. At the core of our approach is a novel prioritized matching step that enables us to first consider features more likely to yield 2D-to-3D matches and to terminate the correspondence search as soon as enough matches have been found. Matches initially lost due to quantization are efficiently recovered by integrating 3D-to-2D search. We show how visibility information from the reconstruction process can be used to improve the efficiency of our approach. We evaluate the performance of our method through extensive experiments and demonstrate that it offers the best combination of efficiency and effectiveness among current state-of-the-art approaches for localization.

Artificial Intelligence

Aerospace Engineering

Paper

Artificial Intelligence

510

Save

Fast image-based localization using direct 2D-to-3D matching

Torsten Sattler et al.Nov 1, 2011

Recently developed Structure from Motion (SfM) reconstruction approaches enable the creation of large scale 3D models of urban scenes. These compact scene representations can then be used for accurate image-based localization, creating the need for localization approaches that are able to efficiently handle such large amounts of data. An important bottleneck is the computation of 2D-to-3D correspondences required for pose estimation. Current stateof- the-art approaches use indirect matching techniques to accelerate this search. In this paper we demonstrate that direct 2D-to-3D matching methods have a considerable potential for improving registration performance. We derive a direct matching framework based on visual vocabulary quantization and a prioritized correspondence search. Through extensive experiments, we show that our framework efficiently handles large datasets and outperforms current state-of-the-art methods.

Artificial Intelligence

Aerospace Engineering

Paper

Artificial Intelligence

478

Save

Image-Based Localization Using LSTMs for Structured Feature Correlation

Florian Walch et al.Oct 1, 2017

In this work we propose a new CNN+LSTM architecture for camera pose regression for indoor and outdoor scenes. CNNs allow us to learn suitable feature representations for localization that are robust against motion blur and illumination changes. We make use of LSTM units on the CNN output, which play the role of a structured dimensionality reduction on the feature vector, leading to drastic improvements in localization performance. We provide extensive quantitative comparison of CNN-based and SIFT-based localization methods, showing the weaknesses and strengths of each. Furthermore, we present a new large-scale indoor dataset with accurate ground truth from a laser scanner. Experimental results on both indoor and outdoor public datasets show our method outperforms existing deep architectures, and can localize images in hard conditions, e.g., in the presence of mostly textureless surfaces, where classic SIFT-based methods fail.

Paper

Save

Computer Vision and Pattern Recognition 2020

Zeynep Akata et al.Oct 15, 2021

Artificial Intelligence

Industrial And Manufacturing Engineering

Paper

Artificial Intelligence

422

Save

Understanding the Limitations of CNN-Based Absolute Camera Pose Regression

Torsten Sattler et al.Jun 1, 2019

Visual localization is the task of accurate camera pose estimation in a known scene. It is a key problem in computer vision and robotics, with applications including self-driving cars, Structure-from-Motion, SLAM, and Mixed Reality. Traditionally, the localization problem has been tackled using 3D geometry. Recently, end-to-end approaches based on convolutional neural networks have become popular. These methods learn to directly regress the camera pose from an input image. However, they do not achieve the same level of pose accuracy as 3D structure-based methods. To understand this behavior, we develop a theoretical model for camera pose regression. We use our model to predict failure cases for pose regression techniques and verify our predictions through experiments. We furthermore use our model to show that pose regression is more closely related to pose approximation via image retrieval than to accurate pose estimation via 3D structure. A key result is that current approaches do not consistently outperform a handcrafted image retrieval baseline. This clearly shows that additional research is needed before pose regression algorithms are ready to compete with structure-based methods.

Geology

Artificial Intelligence

Paper

Geology

333

Save

Comparative Evaluation of Hand-Crafted and Learned Local Features

Johannes Schönberger et al.Jul 1, 2017

Matching local image descriptors is a key step in many computer vision applications. For more than a decade, hand-crafted descriptors such as SIFT have been used for this task. Recently, multiple new descriptors learned from data have been proposed and shown to improve on SIFT in terms of discriminative power. This paper is dedicated to an extensive experimental evaluation of learned local features to establish a single evaluation protocol that ensures comparable results. In terms of matching performance, we evaluate the different descriptors regarding standard criteria. However, considering matching performance in isolation only provides an incomplete measure of a descriptors quality. For example, finding additional correct matches between similar images does not necessarily lead to a better performance when trying to match images under extreme viewpoint or illumination changes. Besides pure descriptor matching, we thus also evaluate the different descriptors in the context of image-based reconstruction. This enables us to study the descriptor performance on a set of more practical criteria including image retrieval, the ability to register images under strong viewpoint and illumination changes, and the accuracy and completeness of the reconstructed cameras and scenes. To facilitate future research, the full evaluation pipeline is made publicly available.

Artificial Intelligence

Paleontology

Paper

Artificial Intelligence

310

Save

Image Retrieval for Image-Based Localization Revisited

Torsten Sattler et al.Jan 1, 2012

To reliably determine the camera pose of an image relative to a 3D point cloud of a scene, correspondences between 2D features and 3D points are needed. Recent work has demonstrated that directly matching the features against the points outperforms methods that take an intermediate image retrieval step in terms of the number of images that can be localized successfully. Yet, direct matching is inherently less scalable than retrievalbased approaches. In this paper, we therefore analyze the algorithmic factors that cause the performance gap and identify false positive votes as the main source of the gap. Based on a detailed experimental evaluation, we show that retrieval methods using a selective voting scheme are able to outperform state-of-the-art direct matching methods. We explore how both selective voting and correspondence computation can be accelerated by using a Hamming embedding of feature descriptors. Furthermore, we introduce a new dataset with challenging query images for the evaluation of image-based localization.

Artificial Intelligence

Aerospace Engineering

Paper

Artificial Intelligence

305

Save

Semantic Visual Localization

Johannes Schönberger et al.Jun 1, 2018

Robust visual localization under a wide range of viewing conditions is a fundamental problem in computer vision. Handling the difficult cases of this problem is not only very challenging but also of high practical relevance, e.g., in the context of life-long localization for augmented reality or autonomous robots. In this paper, we propose a novel approach based on a joint 3D geometric and semantic understanding of the world, enabling it to succeed under conditions where previous approaches failed. Our method leverages a novel generative model for descriptor learning, trained on semantic scene completion as an auxiliary task. The resulting 3D descriptors are robust to missing observations by encoding high-level 3D geometric and semantic information. Experiments on several challenging large-scale localization datasets demonstrate reliable localization under extreme viewpoint, illumination, and geometry changes.

Artificial Intelligence

Paleontology

Paper

Artificial Intelligence

284

Save