###### List of past seminars (details below):

*2021-04-08* Yann Labbé **Pose estimation of rigid objects and articulated robots** *(video)*

*2020-10-22* Angel Villar Corrales **Pose Based Image Retrieval in Greek Vase Paintings**

*2020-10-21* T. Werner, M. Kružík, Z. Kúkelová, V. Korotynskiy, A. Bellon, P. Trutman **Optimization Workshop** *(video)*

*2020-10-01* Bastien Dechamps **Efficient Camera Pose Verification Via Neural Rendering** *(video)*

*2020-08-06* Evgeniy Martyushev **Finding Polynomial Constraints by Sampling** *(video)*

*2020-06-25* Jan Tomešek **Visual Geo-Localization in Natural Environments**

*2020-05-21* Michal Vavrecka **Incognite research group – an overview** *(video)*

*2020-05-20* ELLIS against Covid-19 **Online Workshop on Covid19 V.**

*2020-05-07* Hugo Cisneros **Evolving Structures in Complex Systems** *(video)*

*2020-05-06* ELLIS against Covid-19 **Online Workshop on Covid19 IV.**

*2020-04-23* Michal Rolinek **Differentiation of Blackbox Combinatorial Solvers** *(video)*

*2020-04-22* ELLIS against Covid-19 **ELLIS 3rd Online Workshop on Covid19**

*2020-04-16* Tomas Pajdla **Lex Fridman: Deep Learning State of the Art 2020**

*2020-04-15* ELLIS against Covid-19 **ELLIS 2nd Online Workshop on Covid19** *(video)*

*2020-04-09* Luis Gomez Camara **Robust Long-term Autonomous Navigation of Robots in Challenging Environments** *(video)*

*2020-04-02* Kateryna Zorina **Reading Group on Policy Gradient Methods in Reinforcement Learning**

*2020-04-01* ELLIS against Covid-19 **ELLIS Online Workshop on Covid19** *(video)*

*2020-03-31* Kathlén Kohn **Minimal Problems in Computer Vision**

*2020-01-21* Luca Magri **Multiple structure recovery via clustering in preference space**

*2020-01-14* Pavel Trutman **Globally Optimal Solution to Inverse Kinematics of 7DOF Serial Manipulator**

*2019-12-05* Kathlén Kohn **Point-Line Minimal Problems for 3 Cameras with Partial Visibility**

*2019-12-04* Martin Bråtelund **Critical loci for reconstruction from two views**

*2019-10-01* Akihiro Sugimoto **Revisiting Depth Image Fusion with Variational Message Passing**

*2019-09-13* Ekaterina Zorina **Learning from demonstrations**

*2019-08-30* Margaret Regan **Image Reconstruction Using Numerical Algebraic Geometry**

*2019-08-29* Viktor Larsson **Geometric Estimation with Radial Distortion**

*2019-08-27* Andrew Pryhuber **Ideals of the Multiview Variety**

*2019-08-26* Torsten Sattler **Domain Adaptation vs. Semantic Invariance for Long-Term Visual Localization** *(slides)*

*2019-08-22* Teven Le Scao **Neural Differential Equations for image super-resolution**

*2019-07-18* Timothy Duff **Intro to homotopy continuation with a view towards minimal problems**

*2019-07-17* Kathlén Kohn **Point-Line Minimal Problems in Complete Multi-View Visibility**

*2019-07-16* Andrea Fusiello **Synchronisation: from pairwise measures to global values** *(slides)*

*2019-05-06* Horia-Mihai Bujanca **SLAMBench 3.0: Benchmarking beyond traditional Visual SLAM**

*2019-03-13* Mohab Safey El Din **Polar varieties, matrices and real root classification**

*2019-03-05* Simon Telen **Stabilized algebraic methods for numerical root finding**

*2019-01-11* Guillem Alenya **Challenges in robotics for human environments and specially for textile manipulation**

*2019-01-11* Kimitoshi Yamazaki **Vision-Based Cloth Manipulation by Autonomous Robots**

*2018-10-25* David Fouhey **Understanding how to get to places and do things** *(video)*

*2018-10-24* Yuriy Kaminskyi **Semantic segmentation for indoor localization**

*2018-08-16* Viktor Larsson **Ortho-Perspective Epipolar Geometry, Optimal Trilateration and Time-of-Arrival**

*2018-07-24* Torsten Sattler **Challenges in Long-Term Visual Localization**

*2018-05-25* Alexei Efros **Self-supervision, Meta-supervision, Curiosity: Making Computers Study Harder** *(video)*

*2018-05-22* Di Meng **Cameras behind glass – polynomial constraints on image projection**

*2018-02-23* Oles Dobosevych **On Ukrainian characters MNIST dataset**

*2018-02-22* F. Bach, P.-Y. Massé, N. Mansard, J. Šivic, T. Werner **AIME@CZ – Czech Workshop on Applied Mathematics in Engineering**

*2017-12-07* Federica Arrigoni **Synchronization Problems in Computer Vision**

*2017-11-23* Antonín Šulc **Lightfield Analysis for non-Lambertian Scenes**

*2017-10-30* Jana Košecká **Semantic Understanding for Robot Perception**

*2017-10-19* Mircea Cimpoi **Deep Filter Banks for Texture Recognition**

*2017-10-18* Wolfgang Förstner **Evaluation of Estimation Results within Structure from Motion Problems**

*2017-10-17* Ludovic Magerand **Projective Structure-from-Motion and Rolling Shutter Pose Estimation**

*2017-09-25* Akihiro Sugimoto **Deeply Supervised 3D Recurrent FCN for Salient Object Detection in Videos**

*2017-08-31* Viktor Larsson **Building Polynomial Solvers for Computer Vision Applications**

*2017-08-25* Tomas Mikolov **Neural Networks for Natural Language Processing**

*2017-08-21* Torsten Sattler, Eric Brachmann, Ignacio Rocco **Workshop on learnable representations for geometric matching**

*2017-06-07* Joe Kileel **Using Computational Algebra for Computer Vision**

*2017-05-11* Torsten Sattler **Camera Localization**

#### AAG/IMPACT Online Seminar No. 012

Yann Labbé (INRIA/ENS, Paris, FR): **Pose estimation of rigid objects and articulated robots**; Thursday 2021-04-08 at 10:00; online via Zoom (video)

**Abstract**

Accurately recovering the poses of multiple objects and robots in non-instrumented environments is an important step toward giving autonomous systems the ability to solve real tasks in the wild, especially in the context of collaborative robotics. In this talk, I will present our recent work on object and robot pose estimation from one or multiple uncalibrated RGB cameras. First, I will present CosyPose, our state-of-the-art method for single-view 6D pose estimation of rigid objects, which won the BOP challenge at ECCV 2020. Second, I will present our multi-view approach, which is designed to address the limitations inherent to single-view pose estimation. This multi-view approach significantly improves robustness and accuracy and can automatically process noisy or incomplete visual information from multiple cameras into a complete scene interpretation in near real time. Third, I will present our latest work, RoboPose, a method for recovering the 6D pose and the joint angles of an articulated robot from a single RGB image. Our method significantly improves on the state of the art for multiple commonly used robotic manipulators. It opens up many exciting applications in visually guided manipulation or collaborative robotics without fiducial markers or time-consuming hand-eye calibration.

This talk is based on the following papers:

[1] Yann Labbé, Justin Carpentier, Mathieu Aubry, Josef Sivic. CosyPose: Consistent multi-view multi-object 6D pose estimation. ECCV 2020.

[2] Yann Labbé, Justin Carpentier, Mathieu Aubry, Josef Sivic. Single-view robot pose and joint angle estimation via render & compare. CVPR 2021 (Oral).

#### AAG/IMPACT Online Seminar No. 011

Angel Villar Corrales (FAU Erlangen-Nuremberg, Erlangen, Germany): **Pose Based Image Retrieval in Greek Vase Paintings**; Thursday 2020-10-22 at 15:00; online via Zoom

**Abstract**

Digital art collections are growing rapidly due to ongoing digitization efforts. Nevertheless, navigating these libraries is often a challenging and time-consuming task. In our work, we address the problem of retrieval and discovery in large artwork collections, namely collections of Greek vase paintings. We introduce an approach for automated image retrieval based on human pose similarity. More precisely, given a particular human pose, pose estimation and image retrieval techniques are used to find, in a (possibly huge) library, all images that contain a person with a similar posture. However, human pose estimation in Greek vase paintings is a demanding task: state-of-the-art pose estimation methods fail to generalize to this particular painting style, and no datasets of annotated Greek vase paintings are available. We use style transfer techniques to generate labeled images in the style of ancient Greek vase paintings, and this improves the performance of state-of-the-art human detection and pose estimation algorithms on our data. Our experimental results show that our methods outperform state-of-the-art models for human pose estimation in Greek vase paintings. Furthermore, we show that efficient pose-based image retrieval is possible in large databases.
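The abstract does not say how pose similarity is measured. The sketch below illustrates pose-based retrieval with a simple similarity measure that is assumed here and is not the speakers' method. Each pose is an (N, 2) array of 2D keypoints. It is normalized for translation and scale and then compared to the database by Euclidean distance. The sketch does not align rotation and does not handle missing joints.

```python
import numpy as np

def normalize_pose(keypoints):
    """Remove translation and scale from an (N, 2) array of 2D keypoints."""
    centered = keypoints - keypoints.mean(axis=0)
    return (centered / np.linalg.norm(centered)).ravel()

def retrieve(query, database, k=3):
    """Indices of the k database poses closest to the query pose
    (Euclidean distance between normalized keypoint vectors)."""
    q = normalize_pose(query)
    dists = [np.linalg.norm(q - normalize_pose(p)) for p in database]
    return [int(i) for i in np.argsort(dists)[:k]]
```

Because of the normalization, a pose that is a shifted and rescaled copy of a database entry retrieves that entry first.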

#### AAG/IMPACT Online Workshop No. 010

Organized by Tomáš Kroupa and Tomáš Pajdla (Czech Technical University, Prague, CZ): **Optimization Workshop**; Wednesday 2020-10-21, 9:30-15:30; online via Zoom (video)

**Program**

9:30-10:00 Tomáš Werner – Relative-interior Rule in Block-coordinate Descent

10:15-10:45 Martin Kružík – Optimal control problems with oscillations, concentrations and discontinuities

11:00-11:30 Zuzana Kúkelová – Making minimal solvers fast

13:30-14:00 Viktor Korotynskiy – Using Monodromy to Simplify Minimal Problems in Computer Vision

14:15-14:45 Antonio Bellon – On the Properties of Trajectories of Solutions to Parametric Semidefinite Programming

15:00-15:30 Pavel Trutman – Globally Optimal Solution to Inverse Kinematics of 7DOF Serial Manipulator

#### AAG/IMPACT Online Seminar No. 09

Bastien Dechamps (Ecole des Ponts ParisTech, France): **Efficient Camera Pose Verification Via Neural Rendering**; Thursday 2020-10-01 at 11:00; online via Zoom (video)

**Abstract**

The recent visual localization pipeline InLoc showed that verifying the best poses estimated by RANSAC substantially improves the classification results. In this pose verification step, the 3D points are reprojected using the estimated camera pose to create a novel view of the scene, which is then compared with the query image. However, these renderings of the 3D scene lack realism and do not share the semantic and appearance structure of the image to be localized. In this talk, we present recent neural rendering techniques that can generate realistic renderings of a scene from any viewpoint and under any appearance conditions. We use one of these approaches, Neural Rerendering in the Wild, to enhance the quality of InLoc renderings and improve the localization results.

#### AAG/IMPACT Online Seminar No. 08

Evgeniy Martyushev (South Ural State University, Chelyabinsk, Russia): **Finding Polynomial Constraints by Sampling**; Thursday 2020-08-06 at 11:00; online via Zoom (video)

**Abstract**

In this talk, I will address the implicitization problem: converting the parameterization of a given algebraic variety into its defining polynomial equations. In many cases, the brute-force implicitization approach based on Gröbner basis computation fails due to the complexity of the underlying variety. In this situation, an alternative sampling-based method can solve the implicitization problem, at least in part. I will briefly describe the sampling method and consider several examples of applying it to find polynomial constraints for well-known entities from multiview geometry, such as the essential matrix, a compatible triplet of essential matrices, and the calibrated trifocal tensor.
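The abstract does not describe the sampling method itself. The sketch below shows only the general idea and is not the speaker's algorithm:

1. Sample points on a parametrized variety.
2. Evaluate every monomial up to a chosen degree at those samples.
3. Read the vanishing polynomials off the numerical null space of the evaluation matrix.

The toy variety (the twisted cubic (t, t², t³)), the sample range and the rank tolerance `tol` are assumptions chosen for the example.

```python
import itertools
import numpy as np

def monomials(num_vars, max_degree):
    """Exponent tuples of all monomials of total degree <= max_degree."""
    return [e for e in itertools.product(range(max_degree + 1), repeat=num_vars)
            if sum(e) <= max_degree]

def evaluation_matrix(points, exps):
    """One row per sample point, one column per monomial."""
    return np.column_stack([np.prod(points ** np.array(e), axis=1) for e in exps])

def sample_twisted_cubic(n, rng):
    """n random points (t, t^2, t^3) on the affine twisted cubic."""
    t = rng.uniform(-1.5, 1.5, size=n)
    return np.column_stack([t, t**2, t**3])

def implicit_equations(points, max_degree, tol=1e-9):
    """Coefficient vectors (rows) spanning the polynomials of degree <= max_degree
    that vanish at all sample points, i.e. the numerical null space."""
    exps = monomials(points.shape[1], max_degree)
    _, s, vt = np.linalg.svd(evaluation_matrix(points, exps))
    rank = int(np.sum(s > tol * s[0]))
    return exps, vt[rank:]

rng = np.random.default_rng(0)
exps, eqs = implicit_equations(sample_twisted_cubic(30, rng), max_degree=2)
print(len(eqs))  # 3
```

Up to degree 2, the 10 monomials evaluate to only 7 distinct powers of t (t⁰ through t⁶), so the null space has dimension 3. It is spanned by y - x², z - xy and xz - y², although the SVD returns some other basis of the same span.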

#### AAG/IMPACT Online Seminar No. 07

Jan Tomešek (FIT Brno University of Technology, CZ): **Visual Geo-Localization in Natural Environments**; Thursday 2020-06-25 at 11:00; online via Zoom

**Abstract**

We will present our work-in-progress on localization of photographs captured in mountainous areas. In general, we approach this problem through deep learning and cast it as large-scale cross-modal image retrieval. In addition to standard challenges of general outdoor localization, such as changing appearance and scene geometry, there are other challenges in this scenario, such as large databases to be searched through and the lack of real photographs from natural environments to be used for training. We will present current results, starting from small synthetic experiments which provide some insight into the potential of localization in natural environments, followed by experiments using real photographs in combination with large databases.

#### AAG/IMPACT Online Seminar No. 06

Michal Vavrecka (CIIRC Czech Technical University, Prague, CZ): **Incognite research group – an overview**; Thursday 2020-05-21 at 11:00; online via Zoom (video)

**Abstract**

The Incognite Research Group focuses on cognitive robotics, especially machine-learning-based cognitive architectures for robot control. I will present a basic overview of the projects we are involved in, as well as our latest scientific results. The presentation is divided into 4 sections:

1. Edutainment robot – adaptation of the Alquist chatbot to the Czech language and its implementation on a humanoid robot.

2. Visual question answering – how to parse a long question into a compositional chain of operators that will answer it. Example of interpretable neural module networks.

3. Robotic manipulation based on intrinsic motivation – how to train manipulation tasks without any supervision from the environment. Example of intrinsic rewards and goals.

4. Ciircgym – our environment for training robotic tasks in virtual simulator.

#### AAG/IMPACT Online Seminar No. ELLIS5

ELLIS against Covid-19 (European Lab for Learning and Intelligent Systems): **Online Workshop on Covid19 V.**; Wednesday 2020-05-20 at 13:30; online via YouTube

**Program**

https://ellis.eu/en/covid-19/events/ellis-against-covid-19-20-05-2020

**Abstract**

Our speakers Neil Lawrence (Cambridge University), Frank Noe (Freie Universität Berlin), Andrea Thorn (Würzburg Universität), Ahmed Alaa (UCLA) and Mark Calmiano (UCB Pharma) will tell us about their research on Covid-19.

This time we will also hear about the impact of Covid-19 on our academic life, from Carl-Johann Simon-Gabriel (ETH Zürich) and Shakir Mohamed (Google DeepMind) who will talk about conferences in the age of pandemics.

#### AAG/IMPACT Online Seminar No. 05

Hugo Cisneros (CIIRC CTU, Prague, CZ & INRIA/ENS, Paris, FR): **Evolving Structures in Complex Systems**; Thursday 2020-05-07 at 11:00; online (video)

**Abstract**

Open-ended evolution is regarded as a promising way of solving complex tasks and could transform our idea of artificial intelligence. Complex systems with emergent properties of increasing complexity are a possible way of achieving this goal. The talk is about constructing a metric for measuring growth of complexity of emerging patterns in a particular class of complex systems: cellular automata. Approaches based on compression algorithms and artificial neural networks are investigated. With the metric, we were able to automatically construct computational models with properties similar to those found in Conway’s Game of Life, as well as many other emergent phenomena (IEEE SSCI, 2019). We further investigate the case of large-scale cellular automata; thanks to our reduction techniques that help visualize complex computations within those large systems, we identify interesting emergent behaviors at multiple scales (unpublished, submitted to ALife 2020).
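Here is a crude sketch of the compression-based idea. It is an illustrative assumption, not the metric from the talk, which also uses neural networks. The sketch runs an elementary cellular automaton from a single live cell and uses the zlib-compressed size of its space-time diagram as a complexity proxy. The grid width, the number of steps and the choice of zlib are arbitrary.

```python
import zlib

def step(cells, rule):
    """One update of an elementary cellular automaton (Wolfram rule numbering),
    with periodic boundary conditions."""
    n = len(cells)
    return [
        (rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

def run(rule, width=101, steps=100):
    """Space-time diagram (list of rows) starting from a single live cell."""
    cells = [0] * width
    cells[width // 2] = 1
    history = [cells]
    for _ in range(steps):
        cells = step(cells, rule)
        history.append(cells)
    return history

def compressed_size(history):
    """Length in bytes of the zlib-compressed space-time diagram."""
    raw = bytes(c for row in history for c in row)
    return len(zlib.compress(raw, 9))

for rule in (0, 4, 30, 110):
    print(rule, compressed_size(run(rule)))
```

Trivial rules such as 0 (everything dies) and 4 (the single cell stays put) compress to very little. The chaotic rule 30 compresses much less well.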

#### AAG/IMPACT Online Seminar No. ELLIS4

ELLIS against Covid-19 (European Lab for Learning and Intelligent Systems): **Online Workshop on Covid19 IV.**; Wednesday 2020-05-06 at 13:30; online via YouTube

**Program**

https://ellis.eu/en/covid-19/events/ellis-against-covid-19-06-05-2020

**Abstract**

The world is facing a public health emergency caused by the Covid-19 pandemic. We all need to take this on, and science can make a major contribution. This online workshop will present projects on how to tackle Covid-19 using methods of machine learning and AI, carried out by leading international researchers.

Research topics include outbreak prediction, epidemiological modelling, drug development, viral and host genome sequencing, and health care management.

As many are interested in the topic or eager to contribute, the event will be open to the general public via livestreaming and recording.

#### AAG/IMPACT Online Seminar No. 04

Michal Rolinek (Max Planck Institute for Intelligent Systems, Germany): **Differentiation of Blackbox Combinatorial Solvers**; Thursday 2020-04-23 at 11:00; online (video)

**Abstract**

Achieving fusion of deep learning with combinatorial algorithms promises transformative changes to artificial intelligence. One possible approach is to introduce combinatorial building blocks into neural networks. Such end-to-end architectures have the potential to tackle combinatorial problems on raw input data, such as ensuring global consistency in multi-object tracking or route planning on maps in robotics. We present a method that implements an efficient backward pass through blackbox implementations of combinatorial solvers with linear objective functions. We provide both theoretical and experimental backing. In the talk, we will describe the method, including initial synthetic experiments (ICLR 2020), as well as two follow-ups: one on rank-based loss functions (may appear at CVPR 2020) and another on deep graph matching for keypoint correspondence.
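The sketch below shows the kind of backward pass described in the abstract, as we understand the ICLR 2020 paper. The solver minimizes a linear cost ⟨w, y⟩. The forward pass returns y = solver(w). The backward pass calls the solver once more on the perturbed costs w + λ·dL/dy and returns -(y - y_λ)/λ as the gradient with respect to w. The toy solver (pick the cheapest item, as a one-hot vector) and the value of λ are assumptions made for the illustration.

```python
import numpy as np

def solver(w):
    """Toy blackbox combinatorial solver: argmin_y <w, y> over one-hot vectors y."""
    y = np.zeros_like(w)
    y[np.argmin(w)] = 1.0
    return y

def blackbox_backward(w, grad_y, lam):
    """Gradient w.r.t. the costs w, using one extra call to the solver
    on costs perturbed in the direction of the incoming gradient dL/dy."""
    y = solver(w)
    y_lam = solver(w + lam * grad_y)
    return -(y - y_lam) / lam

w = np.array([3.0, 1.0, 2.0])        # costs; the solver picks index 1
target = np.array([0.0, 0.0, 1.0])   # desired solution: index 2
y = solver(w)
# dL/dy for the loss L = 0.5 * ||y - target||^2
grad_w = blackbox_backward(w, grad_y=y - target, lam=2.0)
print(grad_w)
```

In this example `grad_w` is [0, -0.5, 0.5]. A gradient-descent step on `w` therefore raises the cost of the currently chosen item and lowers the cost of the target one. The solver itself is never differentiated.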

#### AAG/IMPACT Online Seminar No. ELLIS3

ELLIS against Covid-19 (European Lab for Learning and Intelligent Systems): **ELLIS 3rd Online Workshop on Covid19**; Wednesday 2020-04-22 at 13:30; online via YouTube

**Video**

https://youtu.be/3VsKaiJB4Mc

**Program**

https://ellis.eu/en/covid-19/events/ellis-against-covid-19-22-04-2020

**Abstract**

The world is facing a public health emergency caused by the Covid-19 pandemic. We all need to take this on, and science can make a major contribution. This online workshop will present projects on how to tackle Covid-19 using methods of machine learning and AI, carried out by leading international researchers.

Research topics include outbreak prediction, epidemiological modelling, drug development, viral and host genome sequencing, and health care management.

As many are interested in the topic or eager to contribute, the event will be open to the general public via livestreaming and recording.

#### AAG/IMPACT Online Seminar No. 03

Tomas Pajdla (CIIRC Czech Technical University, Prague, CZ): **Lex Fridman: Deep Learning State of the Art 2020**; Thursday 2020-04-16 at 11:00; online

**Abstract**

AAG/IMPACT discussion of salient moments from “Lex Fridman: Deep Learning State of the Art 2020”.

#### AAG/IMPACT Online Seminar No. ELLIS2

ELLIS against Covid-19 (European Lab for Learning and Intelligent Systems): **ELLIS 2nd Online Workshop on Covid19**; Wednesday 2020-04-15 at 13:30; online via YouTube

**Video**

https://youtu.be/tK9109eAAcg

**Program**

https://ellis.eu/en/covid-19/events/ellis-against-covid-19-15-04-2020

**Abstract**

The world is facing a public health emergency caused by the Covid-19 pandemic. We all need to take this on, and science can make a major contribution. This online workshop will present projects on how to tackle Covid-19 using methods of machine learning and AI, carried out by leading international researchers.

Research topics include outbreak prediction, epidemiological modelling, drug development, viral and host genome sequencing, and health care management.

As many are interested in the topic or eager to contribute, the event will be open to the general public via livestreaming and recording.

#### AAG/IMPACT Online Seminar No. 02

Luis Gomez Camara (CIIRC Czech Technical University, Prague, CZ): **Towards Robust Long-term Autonomous Navigation of Robots in Challenging Environments: A Visual Place Recognition Deep Learning Approach**; Thursday 2020-04-09 at 11:00; online (video)

**Video**

mp4

**Abstract**

We present a visual place recognition (VPR) pipeline that achieves substantially better precision than the pipelines commonly found in the literature. It is based on a standard image retrieval configuration with two stages. An initial stage shortlists the database candidates closest to a query image, and a second stage re-ranks this list. The re-ranking uses a novel geometric verification procedure based on the activations of a pre-trained Convolutional Neural Network, which is both simple and very robust to viewpoint and condition changes. As a stand-alone spatial matching method, it could easily be added to existing VPR approaches whose output is a ranked list of candidates. The VPR pipeline has been implemented in a teach-and-repeat navigation system to both localize and control the steering of a robot. Indoor tests show a maximum error of less than 10 cm and excellent robustness to perturbations such as drastic changes in illumination, lateral displacements, different starting positions, or even kidnapping.
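The sketch below shows the two-stage structure described above. It assumes global image descriptors compared by cosine similarity for the shortlist. The CNN-activation-based geometric verification from the talk is not reproduced. It is replaced by `verify_score`, a hypothetical callable that returns a verification score for a database index.

```python
import numpy as np

def shortlist(query_desc, db_descs, k):
    """Stage 1: indices of the k database images whose global descriptors
    have the highest cosine similarity to the query descriptor."""
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    sims = db @ q
    return [int(i) for i in np.argsort(-sims)[:k]]

def rerank(candidates, verify_score):
    """Stage 2: reorder the shortlist by a (hypothetical) verification score,
    highest score first."""
    return sorted(candidates, key=verify_score, reverse=True)
```

Because stage 2 only consumes and returns a ranked list of candidates, any verification function can be plugged into it. This matches the abstract's remark that the matching step could be added to existing VPR approaches.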

#### AAG/IMPACT Online Seminar No. RL2

Kateryna Zorina (CIIRC Czech Technical University, Prague, CZ): **Reading Group on Policy Gradient Methods in Reinforcement Learning**; Thursday 2020-04-02 at 11:00; online via Zoom

**Reading materials for the reading group**

Vanilla policy gradient

TRPO

PPO
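The reading list only names the three methods. As a minimal sketch of what the last one optimizes, here is PPO's clipped surrogate objective in NumPy. It is written independently of the reading-group material.

- `ratio` is the probability ratio π_new(a|s) / π_old(a|s) for sampled actions.
- `advantage` is an advantage estimate for each sample.
- `eps` is the clipping parameter. The default of 0.2 is an assumed, commonly used value.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate objective (to be maximized), averaged over samples."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the minimum removes any incentive to move the ratio outside [1 - eps, 1 + eps].
    return np.mean(np.minimum(unclipped, clipped))
```

For example, with ratios [1.5, 0.5] and advantages [1, -1], the per-sample terms are min(1.5, 1.2) = 1.2 and min(-0.5, -0.8) = -0.8. The objective is therefore 0.2.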

#### AAG/IMPACT Online Seminar No. ELLIS1

ELLIS against Covid-19 (European Lab for Learning and Intelligent Systems): **ELLIS Online Workshop on Covid19**; Wednesday 2020-04-01 at 13:30-17:00; online via YouTube

**Video**

https://youtu.be/0jg_NNwF7k4

**Program**

https://ellis.eu/en/pages/covid-19

**Abstract**

The world is facing a public health emergency caused by the Covid-19 pandemic. We all need to take this on, and science can make a major contribution. This online workshop will present projects on how to tackle Covid-19 using methods of machine learning and AI, carried out by leading international researchers.

Research topics include outbreak prediction, epidemiological modelling, drug development, viral and host genome sequencing, and health care management.

As many are interested in the topic or eager to contribute, the event will be open to the general public via livestreaming and recording.

#### AAG/IMPACT Online Seminar No. 01

Kathlén Kohn (KTH Royal Institute of Technology, Stockholm): **Minimal Problems in Computer Vision**; Tuesday 2020-03-31 at 17:00; online via Zoom

**Abstract**

We present a complete classification of minimal problems for generic arrangements of points and lines in space observed partially by three calibrated perspective cameras, where each line is incident to at most one point. This is a large class of interesting minimal problems that allows missing observations in images due to occlusions and missed detections. There is an infinite number of such minimal problems; however, we show that they can be reduced to 140616 equivalence classes by removing superfluous features and relabeling the cameras. We also introduce camera-minimal problems, which are practical for designing minimal solvers, and show how to pick a simplest camera-minimal problem for each minimal problem. This simplification results in 74575 equivalence classes. Only 76 of these were known; the rest are new. To identify problems that have potential for practical image matching and 3D reconstruction, we present several smaller natural subfamilies of camera-minimal problems. We also compute solution counts for all camera-minimal problems that have fewer than 300 solutions for generic data.

**Paper**

Timothy Duff, Kathlén Kohn, Anton Leykin, Tomas Pajdla: PL1P — Point-line Minimal Problems under Partial Visibility in Three Views

#### Seminar No. 36

Luca Magri (Politecnico di Milano, Italy): **Multiple structure recovery via clustering in preference space**; Tuesday 2020-01-21 at 14:30; CIIRC Room B-670 (building B, floor 6)

**Abstract**

Many tasks in empirical sciences can be formulated in terms of robust estimation of multiple parametric models that fit data corrupted by noise and outliers. Typical examples can be found in 3D reconstruction, where multi-model fitting is employed either to estimate multiple rigid moving objects, or to produce intermediate geometric interpretations of reconstructed 3D point clouds.

Other scenarios include face clustering, body-pose estimation, and motion segmentation, to name just a few. In all these cases, this turns out to be a thorny problem, since it is necessary to overcome a “chicken-&-egg dilemma”: to estimate models, one first needs to partition the data, and to partition the data, it is necessary to know which model each point belongs to. Depending on which horn of this dilemma is addressed first, two main approaches can be singled out: consensus analysis and preference analysis.

Consensus-based algorithms put the emphasis on the estimation part and focus on finding models that describe as many points as possible. In contrast, preference approaches concentrate on the segmentation side and aim to find a proper partition of the data into meaningful structures.

In this talk, we will see how the change of perspective from consensus to preference allows us to derive a conceptual space in which the multi-model fitting task can be conveniently formulated as a clustering problem.
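The sketch below illustrates preference-space clustering with a simplified J-linkage-style procedure. It is a generic illustration, not the specific algorithms presented in the talk. The procedure works as follows:

1. Sample many candidate models (here, 2D lines through random point pairs).
2. Give each point a preference set: the models it fits within a threshold `eps`.
3. Greedily merge the clusters whose preference sets are closest in Jaccard distance. A merged cluster keeps the intersection of its members' preference sets.

The number of sampled models, the threshold and the toy data are assumptions.

```python
import numpy as np

def fit_line(p, q):
    """Line n . x + c = 0 through points p and q, with unit normal n."""
    d = q - p
    n = np.array([-d[1], d[0]]) / np.linalg.norm(d)
    return n, -n @ p

def preference_sets(points, n_models=200, eps=0.1, seed=0):
    """Preference set of point k: indices of sampled models that fit it within eps."""
    rng = np.random.default_rng(seed)
    prefs = [set() for _ in range(len(points))]
    for m in range(n_models):
        i, j = rng.choice(len(points), size=2, replace=False)
        n, c = fit_line(points[i], points[j])
        for k in np.flatnonzero(np.abs(points @ n + c) < eps):
            prefs[k].add(m)
    return prefs

def jaccard_distance(s, t):
    union = len(s | t)
    return 1.0 if union == 0 else 1.0 - len(s & t) / union

def j_linkage(prefs):
    """Merge the closest pair of clusters until all pairs have Jaccard distance 1;
    return the clusters as sorted lists of point indices."""
    clusters = [([k], set(p)) for k, p in enumerate(prefs)]
    while True:
        best = None
        for u in range(len(clusters)):
            for v in range(u + 1, len(clusters)):
                d = jaccard_distance(clusters[u][1], clusters[v][1])
                if d < 1.0 and (best is None or d < best[0]):
                    best = (d, u, v)
        if best is None:
            return sorted(sorted(members) for members, _ in clusters)
        _, u, v = best
        merged = (clusters[u][0] + clusters[v][0], clusters[u][1] & clusters[v][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (u, v)] + [merged]

xs = np.arange(10.0)
points = np.vstack([np.column_stack([xs, np.zeros(10)]),   # points 0-9 on y = 0
                    np.column_stack([xs, xs + 5.0])])      # points 10-19 on y = x + 5
clusters = j_linkage(preference_sets(points))
```

The number of structures is never given to the clustering. It stops on its own when no two clusters share a model. On this noise-free toy data, that yields one cluster per line.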

#### Seminar No. 35

Pavel Trutman (CIIRC Czech Technical University, Prague, CZ): **Globally Optimal Solution to Inverse Kinematics of 7DOF Serial Manipulator**; Tuesday 2020-01-14 at 13:45; CIIRC Room B-670 (building B, floor 6)

**Abstract**

The Inverse Kinematics (IK) problem is to find the robot control parameters that bring the robot into a desired position under kinematic and collision constraints. We present a global solution to the optimal IK problem for a general serial 7DOF manipulator with revolute joints and a quadratic polynomial objective function. We show that the kinematic constraints due to rotations can all be generated by second-degree polynomials. This matters because it significantly simplifies the subsequent step, in which we find the optimal solution by Lasserre relaxations of non-convex polynomial systems. We demonstrate that the second relaxation is sufficient to solve the 7DOF IK problem. Our approach is certifiably globally optimal. We demonstrate the method on the 7DOF KUKA LBR IIWA manipulator and show that we are able to compute the optimal IK or certify infeasibility in 99.9% of the tested poses.
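One standard way to see why rotation constraints can be written with second-degree polynomials is shown below. The abstract does not spell out the exact formulation used in the talk. A 3×3 matrix $R$ with columns $\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3$ is a rotation exactly when

$$
R \in SO(3) \iff R^\top R = I_3 \quad\text{and}\quad \mathbf{r}_1 \times \mathbf{r}_2 = \mathbf{r}_3 .
$$

Every entry of $R^\top R$ and of $\mathbf{r}_1 \times \mathbf{r}_2$ is a polynomial of degree 2 in the entries of $R$. The cubic condition $\det R = 1$ does not need to be imposed separately, since $\det R = \mathbf{r}_3 \cdot (\mathbf{r}_1 \times \mathbf{r}_2) = \mathbf{r}_3 \cdot \mathbf{r}_3 = 1$.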

#### Seminar No. 34

Kathlén Kohn (KTH Stockholm, Sweden): **Point-Line Minimal Problems for 3 Cameras with Partial Visibility**; Thursday 2019-12-05 at 11:00; CIIRC Room B-671 (building B, floor 6)

**Abstract**

We present a complete classification of all minimal problems for generic arrangements of points and lines observed by three calibrated perspective cameras. Our classification includes all possible settings where some of the cameras see only some of the available points and lines. Using basic tools from algebraic geometry, we first find 143494 candidates for minimal point-line problems. Afterwards, we determine algorithmically which of these candidates are in fact minimal. We find that only 5707 of our candidates are non-minimal. Hence, we conclude that there are 137787 point-line minimal problems for three calibrated cameras. For each minimal problem, we aim to compute the generic number of solutions, as it captures the difficulty of the problem at hand. This is joint work with Timothy Duff, Anton Leykin, and Tomas Pajdla.

#### Seminar No. 33

Martin Bråtelund (University of Oslo, Norway): **Critical loci for reconstruction from two views**; Wednesday 2019-12-04 at 11:00; CIIRC Room B-671 (building B, floor 6)

**Abstract**

In general, given a set of images and sufficiently many point correspondences, it is possible to reconstruct the 3D object uniquely from these images. There are, however, some cases where such a reconstruction is not unique; these are called critical configurations. We will show that all critical configurations consist of cameras and points lying on ruled quadric surfaces, and we will give a classification of all critical configurations for two cameras. We will also show how the different possible reconstructions are related. This work is largely based on previous work by Richard Hartley and Fredrik Kahl.

#### Seminar No. 32

Akihiro Sugimoto (National Institute of Informatics, NII, Japan): **Revisiting Depth Image Fusion with Variational Message Passing**; Tuesday 2019-10-01 at 11:00; CIIRC Room B-670 (building B, floor 6)

**Abstract**

The running average approach has long been considered the best choice for fusing depth measurements captured by a consumer-grade RGB-D camera into a global 3D model. This strategy, however, assumes exact correspondences between points in the 3D model and points in the captured RGB-D images. This assumption does not hold in many cases because of errors in motion tracking, noise, occlusions, or inconsistent surface sampling during measurement. As a result, reconstructed 3D models suffer from unpleasant visual artifacts. In this talk, we revisit the depth fusion problem from a probabilistic viewpoint and formulate it as a probabilistic optimization using variational message passing in a Bayesian network. Our formulation enables us to fuse depth images robustly, accurately, and quickly for high-quality RGB-D keyframe creation, even when exact point correspondences are not available. It also allows us to smoothly combine depth and color information for further improvements without increasing the computational cost. A quantitative and qualitative comparative evaluation on keyframes built from indoor scenes shows that the proposed framework achieves promising results for reconstructing accurate 3D models.
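For reference, here is a minimal sketch of the running-average baseline that the talk revisits, in its common weighted form: D ← (W·D + w·d) / (W + w) and W ← W + w. It assumes the depth maps are already aligned pixel to pixel, which is exactly the exact-correspondence assumption the abstract questions. It also assumes that a depth of 0 marks a missing measurement. It does not implement the proposed variational method.

```python
import numpy as np

def fuse_running_average(depth_acc, weight_acc, new_depth, new_weight=1.0):
    """Per-pixel weighted running average of aligned depth maps.
    Pixels where new_depth is 0 (no measurement) are left unchanged."""
    valid = new_depth > 0
    total = weight_acc + new_weight
    fused = np.where(valid, (weight_acc * depth_acc + new_weight * new_depth) / total, depth_acc)
    weights = np.where(valid, total, weight_acc)
    return fused, weights

depth, weight = np.zeros((2, 2)), np.zeros((2, 2))
depth, weight = fuse_running_average(depth, weight, np.array([[1.0, 2.0], [0.0, 4.0]]))
depth, weight = fuse_running_average(depth, weight, np.array([[3.0, 2.0], [5.0, 0.0]]))
print(depth)   # [[2. 2.] [5. 4.]]
```

Every measurement is averaged with equal trust into the pixel it is assumed to correspond to. Misaligned or noisy samples are therefore blended in directly, which is one source of the artifacts the abstract mentions.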

#### Seminar No. 31 (internal)

Ekaterina Zorina (Ukrainian Catholic University, Lviv, Ukraine): **Learning from demonstrations**; Friday 2019-09-13 at 10:30; CIIRC Room B-633 (building B, floor 6)

**Abstract**

In this talk, I will present our work on the learning-from-demonstration task. Our final goal is to learn the shoveling task (shoveling sand into a wheelbarrow) from video demonstrations. We have been working on it for two weeks so far. For now, we have simplified the problem by replacing video demonstrations with virtual-reality demonstrations. We learn the shovel movement policy with reinforcement learning. In the future, we plan to control the shovel with a robotic arm.

#### Seminar No. 30 (AAG/IMPACT Computer Vision / Algebraic Geometry week 26-30 Aug 2019)

Margaret Regan (University of Notre Dame, IN, USA): **Image Reconstruction Using Numerical Algebraic Geometry**; Friday 2019-08-30 at 10:00; CIIRC Room B-670 (building B, floor 6)

**Abstract**

Many problems in computer vision can be formulated using a parameterized system of polynomials which must be solved quickly and efficiently for given instances of the parameters. We propose a new numerical algebraic geometric method to efficiently solve these systems. First, our new approach uses locally adaptive methods and sparse matrix calculations to solve parameterized overdetermined systems in projective space. Examples will be provided in 2D image reconstruction to compare the new methods with traditional approaches in numerical algebraic geometry. Second, we propose new homotopy continuation methods for solving two minimal trifocal calibrated relative pose problems defined by point and line correspondences, which appear together, e.g., in urban scenes or observing curves. Simulations and comparisons will be shown using real and synthetic data to demonstrate that challenging scenes can be reconstructed where standard methods fail.
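The methods in this abstract handle overdetermined systems in projective space with adaptive and sparse techniques, which is beyond a short example. The sketch below is only a toy univariate illustration of basic homotopy continuation. It tracks the roots of a start system g(x) = x² - 1 to the roots of a target f(x) = x² - 5x + 6 along H(x, t) = (1 - t)·γ·g(x) + t·f(x). Each step uses an Euler predictor and a few Newton corrector iterations. The fixed step count and the complex constant γ are assumptions. Using a generic complex γ (the "gamma trick") keeps the paths away from singularities.

```python
import numpy as np

def track_path(f, df, g, dg, x0, gamma, steps=200, newton_iters=3):
    """Track one root of H(x, t) = (1 - t) * gamma * g(x) + t * f(x)
    from t = 0 (root x0 of g) to t = 1 (a root of f)."""
    x = complex(x0)
    ts = np.linspace(0.0, 1.0, steps + 1)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        # Euler predictor: along the path, dx/dt = -H_t / H_x.
        h_x = (1 - t0) * gamma * dg(x) + t0 * df(x)
        h_t = f(x) - gamma * g(x)
        x = x - (t1 - t0) * h_t / h_x
        # Newton corrector: pull x back onto H(x, t1) = 0.
        for _ in range(newton_iters):
            h = (1 - t1) * gamma * g(x) + t1 * f(x)
            h_x = (1 - t1) * gamma * dg(x) + t1 * df(x)
            x = x - h / h_x
    return x

f = lambda x: x**2 - 5 * x + 6      # target system, roots 2 and 3
df = lambda x: 2 * x - 5
g = lambda x: x**2 - 1              # start system, roots +1 and -1
dg = lambda x: 2 * x
gamma = 0.6 + 0.8j                  # fixed non-real constant (gamma trick)
roots = sorted((track_path(f, df, g, dg, s, gamma) for s in (1.0, -1.0)),
               key=lambda z: z.real)
print(roots)  # close to 2 and 3, with negligible imaginary parts
```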

#### Seminar No. 29 (AAG/IMPACT Computer Vision / Algebraic Geometry week 26-30 Aug 2019)

Viktor Larsson (ETH Zurich, CH): **Geometric Estimation with Radial Distortion**; Thursday 2019-08-29 at 11:00; CIIRC Room B-670 (building B, floor 6)

**Abstract**

While many modern cameras can be approximated by a pinhole camera, more complicated models are necessary to achieve higher-quality reconstruction and localization results. For most cameras, the modeling error is dominated by radial distortion. This distortion is a non-linear warping of the image plane, and the extra non-linearity makes geometric estimation problems more difficult. In this talk, I will present two recent papers about dealing with radial distortion in geometric vision. We will look at two fundamental problems: two-view triangulation and absolute camera pose estimation. I will also briefly present some ongoing work related to calibrated radial multiple-view geometry.
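To make "non-linear warping" concrete, the sketch below uses one widely used single-parameter radial model, the division model x_u = x_d / (1 + λ‖x_d‖²). The abstract does not say which distortion model the talk uses. The sketch also assumes normalized image coordinates centred at the distortion centre. Removing distortion under this model is a closed-form rescaling. Applying it requires solving a quadratic in the scale factor, which gives a first glimpse of the extra non-linearity the abstract mentions.

```python
import numpy as np

def undistort(xd, lam):
    """Division model: x_u = x_d / (1 + lam * |x_d|^2), applied row-wise."""
    r2 = np.sum(xd**2, axis=-1, keepdims=True)
    return xd / (1.0 + lam * r2)

def distort(xu, lam):
    """Inverse of `undistort`: x_d = s * x_u, where s = 2 / (1 + sqrt(1 - 4 lam |x_u|^2))
    is the root of lam * |x_u|^2 * s^2 - s + 1 = 0 that tends to 1 as lam -> 0.
    Requires 1 - 4 * lam * |x_u|^2 >= 0."""
    r2 = np.sum(xu**2, axis=-1, keepdims=True)
    return xu * (2.0 / (1.0 + np.sqrt(1.0 - 4.0 * lam * r2)))

xu = np.array([[0.3, -0.2], [0.0, 0.0], [0.5, 0.1]])
xd = distort(xu, lam=-0.2)
print(np.allclose(undistort(xd, lam=-0.2), xu))  # True
```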

#### Seminar No. 28 (AAG/IMPACT Computer Vision / Algebraic Geometry week 26-30 Aug 2019)

Andrew Pryhuber (University of Washington, Seattle, WA, USA): **Ideals of the Multiview Variety**. Tuesday 2019-08-27 at 15:30, CIIRC Room B-670 (building B, floor 6).

**Abstract**

The problem of reconstructing a 3D point from image data can be posed as minimizing Euclidean distance to the set of all images. We describe all polynomials that vanish on the space of images and relate them to the well-known bifocal and trifocal constraints. We briefly discuss recent work which imposes inequality constraints to force imaged points to have positive depth in each camera.
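The best-known of these vanishing polynomials is the bifocal (epipolar) constraint. For calibrated cameras P1 = [I | 0] and P2 = [R | t], any two images x1, x2 of the same 3D point satisfy x2^T E x1 = 0 with E = [t]_x R. The sketch below checks this numerically for one hypothetical camera pair and point. It illustrates only this bilinear constraint, not the full ideal described in the talk.

```python
import math


def cross_matrix(t):
    """Skew-symmetric matrix [t]_x such that [t]_x v equals the cross product t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(A, v):
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]


def rot_z(theta):
    """Rotation by angle theta (radians) about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def epipolar_residual(X, R, t):
    """Return x2^T E x1 for the images of 3D point X in cameras [I | 0] and [R | t]."""
    x1 = list(X)                                   # homogeneous image in camera 1
    x2 = [a + b for a, b in zip(matvec(R, X), t)]  # homogeneous image in camera 2
    E = matmul(cross_matrix(t), R)                 # essential matrix
    return sum(a * b for a, b in zip(x2, matvec(E, x1)))


if __name__ == "__main__":
    print(epipolar_residual([0.5, -1.0, 4.0], rot_z(0.3), [1.0, 0.2, -0.1]))  # ~0
```

The residual vanishes because x2 = R X + t is orthogonal to t x (R X), so the constraint holds for every point, which is what it means for the polynomial to vanish on the set of image pairs.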

#### Seminar No. 27 (AAG/IMPACT Computer Vision / Algebraic Geometry week 26-30 Aug 2019)

Torsten Sattler (Chalmers University of Technology, Gothenburg, Sweden): **Domain Adaptation vs. Semantic Invariance for Long-Term Visual Localization**. Monday 2019-08-26 at 11:00, CIIRC Room B-670 (building B, floor 6).

**Presentation**

Slides (pdf)

**Abstract**

Visual localization is the problem of estimating the 6 degree-of-freedom camera pose from which a given image was taken with respect to the scene. Localization is an important subsystem in Computer Vision / AI applications such as autonomous robots (self-driving cars, drones, etc.) and Augmented / Mixed / Virtual Reality. Long-term visual localization deals with the fact that both the appearance and the geometry of scenes change over time. For example, furniture in indoor scenes is moved around, and vegetation in outdoor scenes changes significantly with the seasons. A direct result of the dynamic nature of real-world scenes is that scene representations quickly become outdated, which makes it hard to associate current images with the data stored in them. Yet such data associations are required for successful camera pose estimation. In this talk, we explore two paradigms for localizing images under strong changes in viewing conditions: 1) Rather than designing representations (e.g., local features) that are invariant or robust to changes in the scene, domain adaptation (for example, in the form of generative neural networks) can be used to transform the current viewing conditions to a state close to the conditions under which the scene representation was constructed. 2) Rather than trying to predict how scenes evolve over time, invariant representations try to encapsulate the gist of a scene. Here, we consider semantic invariance, i.e., the fact that the semantic meaning of a scene should be invariant to seasonal and illumination changes. We will discuss how both approaches can be used as part of visual localization systems and present our current work in both directions.

#### Seminar No. 26

Teven Le Scao (Carnegie Mellon University, Pittsburgh, PA, USA): **Neural Differential Equations for image super-resolution**. Thursday 2019-08-22 at 11:00, CIIRC Room B-670 (building B, floor 6).

**Abstract**

Neural Differential Equations are a machine learning framework that aims to combine efficient numerical techniques from differential equation solvers with the flexibility of deep learning. Although they have recently received a lot of interest in both the machine learning and physical sciences communities, they have so far only been tested on toy problems. Image super-resolution is an interesting use case for this technique, as it attempts to continuously transform the input signal. We compare classical and differential systems on this task to gauge the technique's potential, and study the impact of a few optimisation tricks for the differential one.

#### Seminar No. 25

Timothy Duff (Georgia Institute of Technology, GA, USA): **Intro to homotopy continuation with a view towards minimal problems**. Thursday 2019-07-18 at 11:00, CIIRC Seminar Room B-670, floor 6 of Building B.

**Abstract**

Homotopy continuation is a versatile numerical method for solving polynomial systems of equations. It is the main subroutine in a field known as numerical algebraic geometry, which aims to describe the solution set of an arbitrary system in terms of certain generic “witness points.” It also has an emerging role in several applications. I will introduce the “how” and “why” of homotopy continuation, survey specialized solution techniques, and briefly explain their use in the study of 3D relative pose reconstruction.
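As a minimal sketch of the "how", the code below uses a straight-line homotopy H(x, t) = (1 - t) * gamma * g(x) + t * f(x) that deforms a start system g(x) = x^d - 1, whose roots are known, into a target univariate polynomial f of the same degree d. Each start root is tracked from t = 0 to t = 1 with an Euler predictor and a Newton corrector. The random complex constant `gamma` is the standard "gamma trick", which keeps the paths away from singularities with probability one. The fixed step count and the univariate setting are simplifying assumptions. Practical solvers, and the multivariate systems arising in relative pose, use adaptive step sizes and endgames.

```python
import cmath
import random


def poly_eval(coeffs, x):
    """Evaluate a polynomial (coefficients highest degree first) by Horner's rule."""
    val = 0j
    for c in coeffs:
        val = val * x + c
    return val


def poly_deriv(coeffs):
    """Coefficients of the derivative, highest degree first."""
    d = len(coeffs) - 1
    return [c * (d - i) for i, c in enumerate(coeffs[:-1])]


def track_roots(f, steps=400, newton_iters=3, seed=0):
    """Approximate all roots of f (degree >= 1, nonzero leading coefficient)
    by tracking the roots of g(x) = x^d - 1 along
    H(x, t) = (1 - t) * gamma * g(x) + t * f(x) from t = 0 to t = 1."""
    d = len(f) - 1
    g = [1] + [0] * (d - 1) + [-1]  # start system x^d - 1
    df, dg = poly_deriv(f), poly_deriv(g)
    gamma = cmath.exp(2j * cmath.pi * random.Random(seed).random())  # gamma trick

    def H(x, t):
        return (1 - t) * gamma * poly_eval(g, x) + t * poly_eval(f, x)

    def H_x(x, t):
        return (1 - t) * gamma * poly_eval(dg, x) + t * poly_eval(df, x)

    def H_t(x, t):
        return poly_eval(f, x) - gamma * poly_eval(g, x)

    roots = []
    for k in range(d):
        x = cmath.exp(2j * cmath.pi * k / d)  # k-th root of unity solves g(x) = 0
        for i in range(steps):
            t0, t1 = i / steps, (i + 1) / steps
            # Euler predictor along the path ODE dx/dt = -H_t / H_x
            x -= (t1 - t0) * H_t(x, t0) / H_x(x, t0)
            # Newton corrector back onto the path at t1
            for _ in range(newton_iters):
                x -= H(x, t1) / H_x(x, t1)
        roots.append(x)
    return roots


if __name__ == "__main__":
    print(track_roots([1, -6, 11, -6]))  # f(x) = (x - 1)(x - 2)(x - 3)
```

Because the start and target systems have the same degree, every root of f is reached by exactly one path, which is the property that makes the method "complete" in the univariate case.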

#### Seminar No. 24

Kathlén Kohn (University of Oslo, Norway): **Point-Line Minimal Problems in Complete Multi-View Visibility**. Wednesday 2019-07-17 at 14:00, CIIRC Seminar Room B-670, floor 6 of Building B.

**Abstract**

We present a complete classification of all minimal problems for generic arrangements of points and lines completely observed by calibrated perspective cameras. We show that there are only 30 minimal problems in total, and that no minimal problems exist for more than 6 cameras, more than 5 points, or more than 6 lines. For all minimal problems discovered, we present their algebraic degrees, i.e., the number of solutions, which measure their intrinsic difficulty. Our classification shows that there are many interesting new minimal problems. Our results also show exactly how the difficulty of problems grows with the number of views. Importantly, we discovered several new minimal problems with small degrees that might be practical in image matching and 3D reconstruction. This is joint work with Timothy Duff, Anton Leykin, and Tomas Pajdla.

#### Seminar No. 23

Andrea Fusiello (University of Udine, Italy): **Synchronisation: from pairwise measures to global values**. Tuesday 2019-07-16 at 11:00, CIIRC Seminar Room B-670, floor 6 of Building B.

**Presentation**

Slides (pdf)

**Abstract**

In this talk I will give an overview of some problems that take the following general form: given a graph whose edge labels correspond to noisy measures of the ratio of the unknown labels of adjacent vertices, find the vertex labels. These are called “synchronization” problems. I will focus in particular on instances that are relevant to computer vision, namely rotation synchronisation and translation synchronisation. Finally, I will also touch upon localisation from bearings and applications to structure from motion.
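The simplest instance of the "ratio" form is synchronisation of planar rotations, represented as unit complex numbers z_i, where each edge carries a measurement z_ij ≈ z_i * conj(z_j). The sketch below uses the spectral relaxation, shown as an illustration rather than as the method presented in the talk. It stacks all measurements into a Hermitian matrix, takes the leading eigenvector by power iteration, and projects each entry back onto the unit circle. Because the answer is defined only up to a global rotation, the hypothetical `sync_so2` helper fixes z_0 = 1. It assumes a connected measurement graph.

```python
import cmath


def sync_so2(n, measurements, iters=200):
    """Spectral SO(2) synchronisation.

    measurements: dict {(i, j): z_ij} with z_ij ~ z_i * conj(z_j) and |z_ij| = 1.
    Returns unit complex numbers z_i, gauge-fixed so that z_0 = 1.
    """
    # Hermitian measurement matrix with unit diagonal; missing edges stay zero.
    W = [[0j] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = 1 + 0j
    for (i, j), z in measurements.items():
        W[i][j] = z
        W[j][i] = z.conjugate()
    # Power iteration for the leading eigenvector (all edges used at once).
    v = [1 + 0j] * n
    for _ in range(iters):
        v = [sum(W[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = sum(abs(a) ** 2 for a in v) ** 0.5
        v = [a / norm for a in v]
    # Project onto the unit circle, then remove the global rotation.
    z = [a / abs(a) for a in v]
    ref = z[0].conjugate()
    return [a * ref for a in z]


if __name__ == "__main__":
    truth = [cmath.exp(1j * a) for a in (0.0, 0.5, 1.2, -0.7)]
    edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
    meas = {(i, j): truth[i] * truth[j].conjugate() for i, j in edges}
    print([round(cmath.phase(z), 6) for z in sync_so2(4, meas)])  # close to the true angles
```

With noise-free data the leading eigenvector is the vector of true rotations (up to a common phase), so the recovery is exact; with noisy measures the eigenvector averages the inconsistencies over all cycles of the graph.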

#### Seminar No. 22

Horia-Mihai Bujanca (The University of Manchester, United Kingdom): **SLAMBench 3.0: Benchmarking beyond traditional Visual SLAM**. Monday 2019-05-06 at 11:00, CIIRC Seminar Room B-670, floor 6 of Building B.

**Abstract**

As the SLAM research area matures and the number of SLAM systems available increases, the need for frameworks that can objectively evaluate them against prior work grows. This new version of SLAMBench moves beyond traditional visual SLAM, and provides new support for scene understanding and non-rigid environments (dynamic SLAM). More concretely for dynamic SLAM, SLAMBench 3.0 includes the first publicly available implementation of DynamicFusion, along with an evaluation infrastructure. In addition, we include two SLAM systems (one dense, one sparse) augmented with convolutional neural networks for scene understanding, together with datasets and appropriate metrics.

#### Seminar No. 21

Mohab Safey El Din (PolSys, Polynomial Systems; INRIA / CNRS / Sorbonne Université, France): **Polar varieties, matrices and real root classification**. Wednesday 2019-03-13 at 11:00, CIIRC Seminar Room B-670, floor 6 of Building B.

**Abstract**

Solving polynomial systems with parameters is a topical issue in effective real algebraic geometry. One challenging task is to compute semi-algebraic descriptions of regions of the parameter space over which the number of real solutions to the input system is invariant. In this talk, I will present a new algorithm for solving this problem. It makes intensive use of combinatorial properties of Gröbner bases combined with the notion of polar varieties. Joint work with J.-C. Faugère and P. Le.

#### Seminar No. 20

Simon Telen (KU Leuven, Belgium): **Stabilized algebraic methods for numerical root finding**. Tuesday 2019-03-05 at 14:00, CIIRC Seminar Room B-641, floor 6 of Building B.

**Abstract**

We consider the problem of finding the isolated points defined by an ideal in a ring of (Laurent) polynomials with complex coefficients. Algebraic approaches to this problem use rewriting techniques modulo the ideal to reduce it to a univariate root-finding or eigenvalue problem. We introduce a general framework for algebraic solvers in which it is possible to stabilize the computations in finite-precision arithmetic. The framework is based on truncated normal forms (TNFs), which generalize Gröbner and border bases. The stabilization comes from a ‘good’ choice of basis for the quotient algebra of the ideal and from compactification of the solution space.

#### Seminar R4I No. 3

Guillem Alenyà (Universitat Politècnica de Catalunya, Spain): **Vision-Based Cloth Manipulation by Autonomous Robots**. Friday 2019-01-11 at 10:15, CIIRC Seminar Room B-633, floor 6 of Building B.

**Abstract**

The Perception and Manipulation group at IRI (Institute of Robotics) focuses on enhancing the perception, learning, and planning capabilities of robots to achieve higher degrees of autonomy and user-friendliness during everyday manipulation tasks. Topics addressed include the geometric interpretation of perceptual information, construction of 3D object models, action selection and planning, reinforcement learning, and teaching by demonstration. We will discuss challenges and current developments, primarily in the inclusion of robots in everyday environments and in the manipulation of textiles.

#### Seminar R4I No. 2

Kimitoshi Yamazaki (Shinshu University, Faculty of Engineering, Nagano, Japan): **Vision-Based Cloth Manipulation by Autonomous Robots**. Friday 2019-01-11 at 9:30, CIIRC Seminar Room B-633, floor 6 of Building B.

**Abstract**

In this talk, we will introduce topics related to the manipulation of cloth by autonomous robots. Cloth is a deformable object, and its shape changes drastically when it is manipulated. We will mainly explain the sensor information processing, knowledge representation, and recognition methods needed to manipulate such objects successfully.

#### Seminar No. 19

David Fouhey (University of Michigan, MI, USA): **Understanding how to get to places and do things**. Thursday 2018-10-25 at 11:00, CIIRC Seminar Room B-670, floor 6 of Building B.

**Video**

https://www.youtube.com/watch?v=99Mep3Cw9-Q

**Abstract**

What does it mean to understand an image or video? One common answer in computer vision has been that understanding means naming things: this part of the image corresponds to a refrigerator and that to a person, for instance. While important, this ability is not enough: humans can effortlessly reason about the rich world that images depict and what they can do in it. For example, if a friend shows you the way to their kitchen for you to get something, they won’t worry that you’ll get lost walking back (navigation) or that you’d have trouble figuring out how to open their refrigerator or cabinets. While both are ordinary feats for humans (or even a dog or cat), they are currently far beyond the abilities of computers.

In my talk, I’ll discuss my efforts towards bridging this gap. In the first part, I’ll discuss the task of navigation, getting from one place to another. In particular, our goal is to take a single demonstration of a path and retrace it, either forwards or backwards, under noisy actuation and a changing environment. Rather than build an explicit model of the world, we learn a network that attends to a sequence of memories in order to make decisions. In the second part, I will discuss how to scalably gather data of humans interacting with the world, resulting in a new dataset of human interactions, VLOG, as well as what we can learn from this data.

**Bio**

David Fouhey is starting as an assistant professor at the University of Michigan in January 2019 and is currently a visitor at INRIA Paris. His research interests include computer vision and machine learning, with a particular focus on scene understanding. He received a Ph.D. in robotics in 2016 from Carnegie Mellon University where he was supported by NSF and NDSEG fellowships, and was then a postdoctoral fellow at UC Berkeley. He has spent time at the University of Oxford’s Visual Geometry Group and at Microsoft Research. More information is here: http://web.eecs.umich.edu/~fouhey/

#### Seminar No. 18

Yuriy Kaminskyi (Ukrainian Catholic University, Lviv, Ukraine): **Semantic segmentation for indoor localization**. Wednesday 2018-10-24 at 16:00, CIIRC IMPACT Room B-641, floor 6 of Building B.

**Abstract**

The seminar will be a progress report on the ongoing indoor localization and navigation project. It will briefly cover the problem and its motivation. The main goal of the talk is to present different approaches to segmentation (both instance and semantic), together with techniques that may improve existing solutions, and to show their results on the InLoc dataset.

#### Seminar No. 17

Viktor Larsson (Lund University, Sweden): **Orthographic-Perspective Epipolar Geometry, Optimal Trilateration and Non-Linear Variable Projection for Time-of-Arrival**. Thursday 2018-08-16 at 11:00, CIIRC Seminar Room B-670, floor 6 of Building B.

**Abstract**

In this three-part talk, I will briefly discuss some current work in progress. The first topic relates to the epipolar geometry of one perspective and one orthographic camera, with applications in RADAR-to-camera calibration. The second part is about position estimation using distances to known 3D points. Finally, I will discuss applying the variable projection method to the non-separable time-of-arrival problem. Preliminary experiments show greatly improved convergence compared to both joint and alternating optimization methods.
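For the second topic, a common non-optimal baseline estimates a position x from distances d_i to known points p_i by subtracting the first sphere equation |x - p_0|^2 = d_0^2 from the others. This cancels the quadratic term |x|^2 and leaves the linear system 2 (p_i - p_0) · x = |p_i|^2 - |p_0|^2 - d_i^2 + d_0^2. The sketch below solves it for exactly four anchors in 3D with Cramer's rule. It is a point of comparison only, not the optimal trilateration method discussed in the talk. With noisy distances one would use more anchors and least squares.

```python
def det3(M):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def trilaterate(anchors, dists):
    """Linear trilateration from four known 3D anchor points and their distances."""
    def sq(p):
        return sum(c * c for c in p)

    p0, d0 = anchors[0], dists[0]
    A, b = [], []
    for p, d in zip(anchors[1:4], dists[1:4]):
        # Row i of: 2 (p_i - p_0) . x = |p_i|^2 - |p_0|^2 - d_i^2 + d_0^2
        A.append([2.0 * (pc - p0c) for pc, p0c in zip(p, p0)])
        b.append(sq(p) - sq(p0) - d * d + d0 * d0)
    D = det3(A)
    if abs(D) < 1e-12:
        raise ValueError("anchors are (nearly) coplanar")
    # Cramer's rule: replace one column of A by b at a time.
    x = []
    for col in range(3):
        Mc = [row[:col] + [b[r]] + row[col + 1:] for r, row in enumerate(A)]
        x.append(det3(Mc) / D)
    return x


if __name__ == "__main__":
    anchors = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
    target = (0.2, 0.3, 0.4)
    dists = [sum((a - t) ** 2 for a, t in zip(p, target)) ** 0.5 for p in anchors]
    print(trilaterate(anchors, dists))  # approximately [0.2, 0.3, 0.4]
```

Subtracting equations is convenient but reweights the noise unevenly across rows, which is one reason a statistically optimal formulation is worth studying.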

#### Seminar No. 16

Torsten Sattler (ETH Zurich, CH): **Challenges in Long-Term Visual Localization**. Tuesday 2018-07-24 at 11:00, CIIRC Seminar Room B-670, floor 6 of Building B.

**Abstract**

Visual localization is the problem of estimating the position and orientation from which an image was taken with respect to a 3D model of a known scene. This problem has important applications, including autonomous vehicles (including self-driving cars and other robots) and augmented / mixed / virtual reality. While multiple solutions to the visual localization problem exist both in the Robotics and Computer Vision communities for accurate camera pose estimation, they typically assume that the scene does not change over time. However, this assumption is often invalid in practice, both in indoor and outdoor environments. This talk thus briefly discusses the challenges encountered when trying to localize images over a longer period of time. Next, we show how a combination of 3D scene geometry and higher-level scene understanding can help to enable visual localization in conditions where both classical and recently proposed learning-based approaches struggle.

**Bio**

Torsten Sattler received a PhD in Computer Science from RWTH Aachen University, Germany, in 2013 under the supervision of Prof. Bastian Leibe and Prof. Leif Kobbelt. In December 2013, he joined the Computer Vision and Geometry Group of Prof. Marc Pollefeys at ETH Zurich, Switzerland, where he is currently a senior researcher and Marc Pollefeys’ deputy while Prof. Pollefeys is on leave from ETH. His research interests include (large-scale) image-based localization using Structure-from-Motion point clouds, real-time localization and SLAM on mobile devices and for robotics, 3D mapping, Augmented & Virtual Reality, (multi-view) stereo, image retrieval and efficient spatial verification, camera calibration and pose estimation. Torsten has worked on dense sensing for self-driving cars as part of the V-Charge project. He is currently involved in enabling semantic SLAM and re-localization for gardening robots (as part of an EU Horizon 2020 project where he leads the efforts on a work package), in research for Google’s Tango project, where he leads CVG’s research efforts, and in work on self-driving cars.

#### Seminar No. 15

Alexei Efros (UC Berkeley, CA, USA): **Self-supervision, Meta-supervision, Curiosity: Making Computers Study Harder**. Friday 2018-05-25 at 11:00, CIIRC Seminar Room A-1001, floor 10 of Building A.

**Video**

https://www.youtube.com/watch?v=_V-WpE8cmpc

**Abstract**

Computer vision has made impressive gains through the use of deep learning models trained with large-scale labeled data. However, labels require expertise and curation and are expensive to collect. Even worse, direct semantic supervision often leads learning algorithms to “cheat” and take shortcuts instead of actually doing the work. In this talk, I will briefly summarize several of my group’s efforts to combat this using self-supervision, meta-supervision, and curiosity, all ways of using the data as its own supervision. These lead to practical applications in image synthesis (such as pix2pix and cycleGAN), image forensics, audio-visual source separation, etc.

**Bio**

Alexei Efros is a professor of Electrical Engineering and Computer Sciences at UC Berkeley. Before 2013, he spent nine years on the faculty of Carnegie Mellon University, and he has also been affiliated with École Normale Supérieure/INRIA and the University of Oxford. His research is in the area of computer vision and computer graphics, especially at the intersection of the two. He is particularly interested in using data-driven techniques to tackle problems where large quantities of unlabeled visual data are readily available. Efros received his PhD in 2003 from UC Berkeley. He is a recipient of the Sloan Fellowship (2008), Guggenheim Fellowship (2008), SIGGRAPH Significant New Researcher Award (2010), 3 Helmholtz Test-of-Time Prizes (1999, 2003, 2005), and the ACM Prize in Computing (2016).

**Web**

https://www.ciirc.cvut.cz/alexei-efros-na-ciirc/

#### Seminar No. 14

Di Meng (University of Burgundy, France): **Cameras behind glass – polynomial constraints on image projection**. Tuesday 2018-05-22 at 11:00, CIIRC Seminar Room B-670, floor 6 of Building B.

**Abstract**

Advanced Driver Assistance Systems (ADAS) are used in autonomous and self-driving cars. Cameras in these systems commonly provide functions such as pedestrian detection, road sign detection, or obstacle avoidance. The cameras are calibrated, but they are mounted inside the car behind the windshield. The windshield refracts light rays, which causes a disparity when mapping points in space onto image pixels. Even a small disparity at the pixel level can become a large error for points several meters away. We therefore model the camera together with three windshield shapes to obtain a more precise camera calibration for automotive applications.

The talk is divided into two main sections. The first section presents the optical model of three types of glass slab: a planar slab, a slab with non-parallel surfaces, and a spherical slab. Polynomial projection equations are formulated for each. The second section takes the camera parameters into account and describes how to map points in space onto image pixels with the windshield in between.
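For the simplest of the three cases, a planar slab with parallel faces, Snell's law gives a closed form. A ray entering at angle `theta` from the normal leaves parallel to its original direction but shifted sideways by s = thickness * sin(theta - theta_r) / cos(theta_r), where n_air * sin(theta) = n_glass * sin(theta_r). The sketch below evaluates this textbook formula. The thickness and refractive index in the usage line are illustrative assumptions, not windshield parameters from the talk. The non-parallel and spherical slabs need the polynomial models described above.

```python
import math


def slab_lateral_shift(theta, thickness, n_glass, n_air=1.0):
    """Lateral displacement of a ray crossing a planar slab with parallel faces.

    theta: incidence angle in radians, measured from the surface normal.
    The result has the same unit as thickness.
    """
    theta_r = math.asin(n_air * math.sin(theta) / n_glass)  # Snell's law
    return thickness * math.sin(theta - theta_r) / math.cos(theta_r)


if __name__ == "__main__":
    # Illustrative values: 5 mm of glass with refractive index 1.5, 30 degree incidence.
    print(slab_lateral_shift(math.radians(30), 5.0, 1.5))  # roughly 0.97 (mm)
```

The shift grows with the incidence angle, so points imaged near the border of a wide-angle camera are displaced more than those near the centre, which is why a pinhole model alone is insufficient behind glass.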

#### Seminar No. 13

Oles Dobosevych (Ukrainian Catholic University, Lviv, Ukraine): **On Ukrainian characters MNIST dataset**. Friday 2018-02-23 at 14:00, CIIRC Seminar Room B-671, floor 6 of Building B.

**Abstract**

The Modified National Institute of Standards and Technology (MNIST) dataset of handwritten digits is the best-known dataset and is widely used as a benchmark for validating various ideas in Machine Learning. We present a newly created dataset of 32 handwritten Ukrainian letters, divided into 72 different style subclasses, with 2000 examples in each class. We also propose a recognition model for these symbols and explain why approaches that work well on MNIST do not succeed in our case. Finally, we discuss several real-world applications of our model that can help to save paper, time and money.

#### Seminar No. 12

**AIME@CZ – Czech Workshop on Applied Mathematics in Engineering**

Organized by Didier Henrion and Tomáš Pajdla

**Thursday 2018-02-22, 9:45-18:00**

CIIRC Seminar Room **B-670 (9:45-13:30) and B-671 (14:30-16:00)**, floor 6 of Building B

- 9:45-10:30 Francis Bach (Inria/ENS Paris, FR): **Linearly-convergent stochastic gradient algorithms**
- 11:00-11:45 Pierre-Yves Massé: **Online Optimisation of Time Varying Systems**
- 12:15-13:00 Nicolas Mansard (LAAS-CNRS Univ. Toulouse, FR): **Why we need a memory of movement**
- 14:30-15:00 Josef Šivic (Inria/ENS Paris, FR and CIIRC CTU Prague, CZ): **Joint Discovery of Object States and Manipulation Actions**
- 15:15-15:45 Tomáš Werner (CTU Prague, CZ): **Solving LP Relaxations of Some Hard Problems Is Hard**
- 16:30-18:00 *Demos and visit of CIIRC*

Title: Linearly-convergent stochastic gradient algorithms

Speaker: Francis Bach

Title: Online Optimisation of Time Varying Systems

Speaker: Pierre-Yves Massé

Abstract: Dynamical systems are a wide-ranging framework that can model time-varying settings, from engineering (e.g., cars) to machine learning (e.g., recurrent neural networks). The correct behaviour of these systems often depends on a parameter (e.g., the gear ratio or the wheel in the case of cars, or the weights in the case of neural networks) that the user has to choose. Finding the best possible parameter is called optimising, or training, the system. Many real-life problems require this training to happen online, with immediate processing of the inputs received by the system (e.g., the readings returned by a car's sensors about its surroundings, or the successive frames of a video fed to a neural network). We present a proof of convergence for classical online optimisation algorithms used to train these systems, such as the “Real Time Recurrent Learning” (RTRL) or “Truncated Backpropagation Through Time” (TBTT) algorithms. These algorithms avoid time-consuming computations by storing information about the past in the form of a time-dependent tensor. However, the memory required to do so may be huge, preventing their use on even moderately large systems. The “No Back Track” (NBT) algorithm, and its implementation-friendly “Unbiased Online Recurrent Optimisation” (UORO) variant, are general-principle algorithms that approximate the aforementioned tensor by a random, rank-one, unbiased tensor, thus decisively reducing the storage costs while preserving the crucial unbiasedness property that allows convergence. We prove that, with arbitrarily high probability, the NBT algorithm converges to the same local optimum as the RTRL or TBTT algorithms. We may conclude by briefly presenting the “Learning the Learning Rate” (LLR) algorithm, which adapts the step size of a gradient descent online by performing a gradient descent on the step size itself. It thus reduces the sensitivity of the descent to the numerical choice of the step size, a well-documented practical implementation issue.
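The idea of adapting the step size by gradient descent on the step size itself can be illustrated with the hypergradient-descent rule alpha <- alpha + beta * (g_t · g_{t-1}). This is an assumption: it is the well-known update of Baydin et al., not necessarily the exact LLR algorithm from the talk. It follows from theta_t = theta_{t-1} - alpha * g_{t-1}, which gives d f(theta_t) / d alpha = -g_t · g_{t-1}. The step grows while consecutive gradients agree and shrinks when they oscillate. The quadratic objective and meta step `beta` below are hypothetical.

```python
def hypergradient_descent(grad, theta, alpha=0.01, beta=0.001, steps=200):
    """Gradient descent whose step size alpha is itself adapted by gradient descent.

    Since d f(theta_t) / d alpha = -g_t . g_{t-1}, a descent step on alpha
    gives alpha <- alpha + beta * g_t . g_{t-1}.
    """
    prev_g = [0.0] * len(theta)
    for _ in range(steps):
        g = grad(theta)
        alpha += beta * sum(a * b for a, b in zip(g, prev_g))  # update the step size
        theta = [t - alpha * gi for t, gi in zip(theta, g)]    # ordinary gradient step
        prev_g = g
    return theta, alpha


if __name__ == "__main__":
    # Hypothetical objective f(x, y) = (x - 3)^2 + 2 (y + 1)^2, minimum at (3, -1).
    grad = lambda th: [2 * (th[0] - 3), 4 * (th[1] + 1)]
    theta, alpha = hypergradient_descent(grad, [0.0, 0.0])
    print(theta, alpha)  # theta near (3, -1); alpha has grown from 0.01
```

A small initial `alpha` is deliberately chosen here: plain gradient descent with that step would still be far from the minimum after 200 steps, while the adapted step converges, which is the reduced sensitivity the abstract describes.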

Title: Why we need a memory of movement

Speaker: Nicolas Mansard

Title: Joint Discovery of Object States and Manipulation Actions

Speaker: Josef Šivic (Inria/ENS Paris, FR and CIIRC CTU Prague, CZ)

Abstract: Many human activities involve object manipulations aiming to modify the object state. Examples of common state changes include full/empty bottle, open/closed door, and attached/detached car wheel. In this work, we seek to automatically discover the states of objects and the associated manipulation actions. Given a set of videos for a particular task, we propose a joint model that learns to identify object states and to localize state-modifying actions. Our model is formulated as a discriminative clustering cost with constraints. We assume a consistent temporal order for the changes in object states and manipulation actions, and introduce new optimization techniques to learn model parameters without additional supervision. We demonstrate successful discovery of manipulation actions and corresponding object states on a new dataset of videos depicting real-life object manipulations. We show that our joint formulation improves object state discovery through action recognition, and vice versa. Joint work with Jean-Baptiste Alayrac, Ivan Laptev and Simon Lacoste-Julien.

Title: Solving LP Relaxations of Some Hard Problems Is Hard

Speaker: Tomáš Werner (CTU Prague, CZ)

Abstract: I will present our result that solving linear programming (LP) relaxations of a number of classical NP-hard combinatorial optimization problems (set cover/packing, facility location, maximum satisfiability, maximum independent set, multiway cut, 3-D matching, weighted CSP) is as hard as solving the general LP problem. Precisely, these LP relaxations are LP-complete under (nearly) linear-time reductions, assuming sparse encoding of instances. In polyhedral terms, this means that every polytope is a scaled coordinate projection of the optimal set of each LP relaxation, computable in (nearly) linear time. For some of the LP relaxations (exact cover, 3-D matching, weighted CSP), a stronger result holds: every polytope is a scaled coordinate projection of their feasible set, which implies that the corresponding reduction is approximation-preserving. Moreover, the considered LP relaxations are P-complete under log-space reductions, and therefore also hard to parallelize. These results pose a fundamental limitation on designing very efficient algorithms to compute exact or even approximate solutions to the LP relaxations, because finding such an algorithm might improve the complexity of the best known general LP solvers, which is unlikely. Joint work with Daniel Průša.

#### Seminar R4I No. 1

Federica Arrigoni (University of Udine, Italy): **Synchronization Problems in Computer Vision**. Thursday 2017-12-07 at 11:00, CIIRC Seminar Room B-670, floor 6 of Building B.

**Abstract**

Consider a network of nodes where each node is characterized by an unknown state, and suppose that pairs of nodes can measure the ratio (or difference) between their states. The goal of “synchronization” is to infer the unknown states from the pairwise measures. Typically, states are represented by elements of a group, such as the Symmetric Group or the Special Euclidean Group. The former can represent local labels of a set of features, as in multi-view matching, whereas the latter can represent camera reference frames, as in structure from motion, or local coordinate systems in which 3D points are represented, as in multiple point-set registration. A related problem is “bearing-only network localization”, where each node is located at a fixed (unknown) position in 3-space and pairs of nodes can measure the direction of the line joining their locations. We are interested in global techniques, where all the measures are considered at once, as opposed to incremental approaches that grow a solution by adding pieces iteratively.
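The "difference" form can be illustrated with real-valued states, say 1D positions x_i with measurements t_ij ≈ x_i - x_j. At the global least-squares solution, every node satisfies x_i = mean over its neighbours j of the predictions x_j + t_ij. The hypothetical sketch below solves these normal equations with Gauss-Seidel sweeps, fixing x_0 = 0 to remove the global offset. All measurements are used at once, in the spirit of the global techniques above. It assumes a connected graph; real systems would use sparse linear solvers and robust costs.

```python
from collections import defaultdict


def sync_translations(n, measurements, sweeps=500):
    """Global least-squares 1D translation synchronization.

    measurements: dict {(i, j): t_ij} with t_ij ~ x_i - x_j.
    Returns x with the gauge fixed by x[0] = 0 (the graph must be connected).
    """
    nbrs = defaultdict(list)
    for (i, j), t in measurements.items():
        nbrs[i].append((j, t))   # x_i ~ x_j + t_ij
        nbrs[j].append((i, -t))  # x_j ~ x_i - t_ij
    x = [0.0] * n
    for _ in range(sweeps):
        for i in range(1, n):    # node 0 stays fixed at 0
            x[i] = sum(x[j] + t for j, t in nbrs[i]) / len(nbrs[i])
    return x


if __name__ == "__main__":
    # Inconsistent loop: 1 + 1 != 2.3, so the 0.3 error must be distributed.
    print(sync_translations(3, {(1, 0): 1.0, (2, 1): 1.0, (2, 0): 2.3}))  # ~[0, 1.1, 2.2]
```

In the usage example, the least-squares solution spreads the 0.3 loop inconsistency evenly over the three edges (each residual is 0.1). An incremental approach chaining edges 1-0 and 2-1 would instead put the whole error on the last edge.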

#### Seminar No. 11

Antonín Šulc (University of Konstanz, Germany): **Lightfield Analysis for non-Lambertian Scenes**. Thursday 2017-11-23 at 11:00, CIIRC Seminar Room B-670, floor 6 of Building B.

**Abstract**

In most natural scenes, we can see objects composed of non-Lambertian materials, whose appearance changes if we change viewpoint. In many computer vision tasks, we consider these as something undesirable and treat them as outliers. However, if we can record data from multiple dense viewpoints, such as with a light-field camera, we have a chance to not only deal with them but also extract additional information about the scene.

In this talk, I will show the capabilities of the light-field paradigm on various problems. Key ideas are a linear algorithm for structure from motion to generate refocusable panoramas and depth estimation for multi-layered objects which are semitransparent or partially reflective. Using these, I will show that we can decompose such scenes and further perform a robust volumetric reconstruction. Finally, I will consider decomposition of light fields into reflectance, natural illumination and geometry, a problem known as inverse rendering.

#### Seminar No. 10

Jana Košecká (George Mason University, Fairfax, VA, USA): **Semantic Understanding for Robot Perception**. Monday 2017-10-30 at 16:00, Czech Technical University, Karlovo náměstí, G-205.

**Abstract**

Advancements in robotic navigation, mapping, object search and recognition rest to a large extent on robust, efficient and scalable semantic understanding of the surrounding environment. In recent years we have developed several approaches for capturing geometry and semantics of environment from video, RGB-D data, or just simply a single RGB image, focusing on indoors and outdoors environments relevant for robotics applications.

I will demonstrate our work on detailed semantic parsing and 3D structure recovery using deep convolutional neural networks (CNNs), and on object detection and object pose recovery from a single RGB image. The applicability of the presented techniques to autonomous driving, service robotics, mapping and augmented reality applications will be discussed.

#### Seminar No. 9

Mircea Cimpoi (CIIRC, Czech Technical University, Prague): **Deep Filter Banks for Texture Recognition**. Thursday 2017-10-19 at 11:00, CIIRC Seminar Room A-303, floor 3 of Building A.

**Abstract**

This talk is about texture and material recognition from images, revisiting classical texture representations in the context of deep learning. The results were presented at CVPR 2015 and in IJCV 2016. Visual textures are ubiquitous and play an important role in image understanding because they convey significant semantics of images, and because texture representations that pool local image descriptors in an orderless manner have had a tremendous impact in various practical applications. In the talk, we will revisit classic texture representations, including bag-of-visual-words and Fisher vectors, in the context of deep learning, and show that these have excellent efficiency and generalization properties if the convolutional layers of a deep model are used as filter banks. In this manner we obtain state-of-the-art performance on numerous datasets well beyond textures, an efficient method to apply deep features to image regions, and benefits when transferring features from one domain to another.
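Orderless pooling can be illustrated with the simplest such encoder, a bag-of-visual-words histogram. Each local descriptor (in the talk, the vector of convolutional-layer responses at one spatial location) is assigned to its nearest codeword, and the counts are normalized, so the spatial arrangement of the descriptors is discarded. The tiny 2D codebook and descriptors below are hypothetical. Fisher vectors replace the hard counts with soft-assigned first- and second-order statistics relative to a Gaussian mixture.

```python
def bovw_histogram(descriptors, codebook):
    """Orderless bag-of-visual-words encoding with hard assignment and L1 normalization."""
    counts = [0] * len(codebook)
    for d in descriptors:
        # Squared Euclidean distance from the descriptor to every codeword.
        dists = [sum((a - b) ** 2 for a, b in zip(d, c)) for c in codebook]
        counts[dists.index(min(dists))] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


if __name__ == "__main__":
    codebook = [(0.0, 0.0), (1.0, 1.0)]
    descriptors = [(0.1, 0.0), (0.9, 1.2), (1.0, 0.8), (-0.2, 0.1)]
    print(bovw_histogram(descriptors, codebook))  # [0.5, 0.5]
```

Because the encoding only counts assignments, shuffling the descriptors (i.e., rearranging the image locations) leaves the histogram unchanged, which is exactly the invariance that suits textures.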

**References**

[1] Cimpoi, M., Maji, S., Kokkinos, I., Vedaldi, A., Deep Filter Banks for Texture Recognition, Description, and Segmentation, IJCV (2016) 118:65

[2] Cimpoi, M., Maji, S., and Vedaldi, A., Deep Filter Banks for Texture Recognition and Segmentation, CVPR (2015)

#### Seminar No. 8

Wolfgang Förstner (University of Bonn, Germany): **Evaluation of Estimation Results within Structure from Motion Problems**. Wednesday 2017-10-18 at 11:00, CIIRC Seminar Room B-670, floor 6 of Building B.

**Abstract**

Parameter estimation is the core of many geometric problems within structure from motion. The number of parameters ranges from a few, e.g., for pose estimation or triangulation, to huge numbers, such as in bundle adjustment or surface reconstruction. The ability for planning, self-diagnosis, and evaluation is critical for successful project management. Uncertainty of observed and estimated quantities needs to be available, faithful, and realistic. The talk presents methods (1) to critically check the faithfulness of the result of estimation procedures, (2) to evaluate suboptimal estimation procedures, and (3) to evaluate and compare competing procedures w.r.t. their precision in the presence of rank deficiencies. Evaluating bundle adjustment results is taken as one example problem.
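A standard first check of whether an estimate's uncertainty is faithful is the estimated variance factor sigma0^2 = v^T W v / r. Here v are the residuals, W is the inverse covariance of the observations, and r = n - u is the redundancy (observations minus unknowns). It is shown as an assumption about what point (1) involves, not as the talk's exact procedure. If the stochastic model is realistic, sigma0^2 should be close to 1; values far from 1 suggest mis-specified uncertainties or outliers. The hypothetical sketch below computes it for a weighted straight-line fit with independent observations of known standard deviation.

```python
def variance_factor_line_fit(xs, ys, sigmas):
    """Weighted least-squares fit of y = a + b x; returns (a, b, sigma0_sq).

    sigma0_sq = sum(w_i * v_i^2) / (n - 2) is the estimated variance factor,
    with weights w_i = 1 / sigma_i^2 and residuals v_i; it should be near 1
    if the stated sigmas are realistic.
    """
    w = [1.0 / s ** 2 for s in sigmas]
    Sw = sum(w)
    Sx = sum(wi * x for wi, x in zip(w, xs))
    Sy = sum(wi * y for wi, y in zip(w, ys))
    Sxx = sum(wi * x * x for wi, x in zip(w, xs))
    Sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    # Normal equations: [Sw Sx; Sx Sxx] [a; b] = [Sy; Sxy]
    det = Sw * Sxx - Sx * Sx
    b = (Sw * Sxy - Sx * Sy) / det
    a = (Sy - b * Sx) / Sw
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    redundancy = len(xs) - 2  # n observations minus 2 unknowns
    sigma0_sq = sum(wi * v * v for wi, v in zip(w, residuals)) / redundancy
    return a, b, sigma0_sq


if __name__ == "__main__":
    # Residuals of +-0.1 with a claimed sigma of 0.1 give sigma0^2 = 2.0, i.e. the
    # stated precision is too optimistic by a factor of about sqrt(2).
    print(variance_factor_line_fit([0, 1, 2, 3], [1.1, 2.9, 4.9, 7.1], [0.1] * 4))
```

In practice, sigma0^2 is compared against a chi-square bound for the given redundancy rather than read off directly, since with few observations it fluctuates considerably.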

**References**

[1] T. Dickscheid, T. Läbe, and W. Förstner, Benchmarking Automatic Bundle Adjustment Results, in 21st Congress of the International Society for Photogrammetry and Remote Sensing (ISPRS), Beijing, China, 2008, pp. 7–12, Part B3a.

[2] W. Förstner and K. Khoshelham, Efficient and Accurate Registration of Point Clouds with Plane to Plane Correspondences, in 3rd International Workshop on Recovering 6D Object Pose, 2017.

[3] W. Förstner and B. P. Wrobel, Photogrammetric Computer Vision — Statistics, Geometry, Orientation and Reconstruction, Springer, 2016.

[4] T. Läbe, T. Dickscheid, and W. Förstner, On the Quality of Automatic Relative Orientation Procedures, in 21st Congress of the International Society for Photogrammetry and Remote Sensing (ISPRS), Beijing, China, 2008, pp. 37–42, Part B3b-1.

#### Seminar No. 7

Ludovic Magerand, CIIRC, Czech Technical University, Prague

**Projective Structure-from-Motion and Rolling Shutter Pose Estimation**

Tuesday 2017-10-17 at 11:00, CIIRC Seminar Room A-303, floor 3 of Building A

**Abstract**

This talk is divided into two parts. The first part presents an ICCV’17 paper on a practical solution to the Projective Structure-from-Motion (PSfM) problem that deals efficiently with missing data (up to 98%), with outliers and, for the first time, with large-scale 3D reconstruction scenarios. This is achieved by embedding the projective depths into the projective parameters of the points and views, which improves computational speed. To do so while ensuring a valid reconstruction, an extension of the linear constraints from the Generalized Projective Reconstruction Theorem is used. With an incremental approach, views and points are added robustly to an initial solvable sub-problem until the underlying factorization is complete.

The second part of the talk will present my PhD thesis “Dynamic pose estimation with CMOS cameras using sequential acquisition”. CMOS cameras are cheap and can acquire images at very high frame rates thanks to an acquisition mode called Rolling Shutter, which exposes the scan-lines sequentially. This makes them very interesting for high-speed robotics, but it comes with what was long seen as a drawback: when an object (or the camera itself) moves in the scene, distortions appear in the image. These rolling shutter effects actually contain information about the motion and can become another advantage for high-speed robotics by extending the usual pose estimation to also estimate the motion parameters. Two methods achieving this will be presented: the first assumes a non-uniform motion model, and the second uses a projection model suitable for polynomial optimization.
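To make the coupling between image rows and capture time concrete, the Python sketch below projects a point under a deliberately simplified rolling-shutter model. The model is an assumption of this example, not one of the two models from the thesis: no rotation, constant translational velocity, principal point at the origin, and row v (in pixels from the centre row) exposed at time v · line_delay. Because the row depends on the projection itself, the sketch solves for it by fixed-point iteration.

```python
from typing import Sequence, Tuple

Vec3 = Sequence[float]


def pinhole(point_cam: Vec3, f: float) -> Tuple[float, float]:
    """Project a camera-frame point with focal length f (principal point at 0)."""
    x, y, z = point_cam
    return f * x / z, f * y / z


def rolling_shutter_project(point: Vec3, center0: Vec3, velocity: Vec3,
                            f: float, line_delay: float,
                            iters: int = 20) -> Tuple[float, float]:
    """Project a world point with a translation-only, constant-velocity rolling shutter.

    Row v is exposed at time v * line_delay, when the camera centre is
    center0 + velocity * time. The row and the centre depend on each other,
    so we iterate: guess v, move the camera, reproject, repeat.
    """
    u, v = 0.0, 0.0
    for _ in range(iters):
        t = v * line_delay
        centre = [c + vel * t for c, vel in zip(center0, velocity)]
        u, v = pinhole([p - c for p, c in zip(point, centre)], f)
    return u, v
```

With zero velocity this reduces to the ordinary global-shutter pinhole projection; with motion, the same 3D point lands at a row-dependent offset, which is the distortion that the abstract proposes to exploit for motion estimation.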

#### Seminar No. 6

Akihiro Sugimoto, National Institute of Informatics (NII), Japan

**Deeply Supervised 3D Recurrent FCN for Salient Object Detection in Videos**

Monday 2017-09-25 at 11:00, CIIRC Seminar Room B-670, floor 6 of Building B

**Abstract**

This talk presents a novel end-to-end 3D fully convolutional network for salient object detection in videos. The proposed network uses 3D filters in the spatio-temporal domain to learn spatial and temporal information directly, yielding 3D deep features, and transfers these features to pixel-level saliency prediction, outputting saliency voxels. In the network, we combine refinement at each layer with deep supervision to detect salient object boundaries efficiently and accurately. The refinement module recurrently enhances the feature map with contextual information. Applying deep supervision to the hidden layers, in turn, improves the details of the intermediate saliency voxels, so that the saliency voxels are progressively refined. Extensive experiments on publicly available benchmark datasets confirm that our network outperforms state-of-the-art methods. The proposed saliency model also works effectively for video object segmentation.
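What a single “3D filter in the spatio-temporal domain” computes can be shown with a plain single-channel 3D cross-correlation, written here in Python with explicit loops for readability. This is only an illustration of the basic operation: the network in the talk stacks many multi-channel layers with learned weights, recurrent refinement, and deep supervision, none of which is shown, and real frameworks use optimized kernels.

```python
from typing import List

Volume = List[List[List[float]]]  # indexed [t][y][x]


def conv3d_valid(volume: Volume, kernel: Volume) -> Volume:
    """Single-channel 3D cross-correlation with "valid" padding and stride 1.

    Each output voxel aggregates a kt x ky x kx spatio-temporal neighbourhood,
    so a single filter can respond to appearance and to change over time.
    """
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, ky, kx = len(kernel), len(kernel[0]), len(kernel[0][0])
    out: Volume = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - ky + 1):
            row = []
            for x in range(W - kx + 1):
                acc = 0.0
                for dt in range(kt):
                    for dy in range(ky):
                        for dx in range(kx):
                            acc += volume[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out
```

For instance, the 2×1×1 kernel [[[-1.0]], [[1.0]]] is a temporal difference filter: it is zero on static content and responds wherever intensities change between consecutive frames, something a purely 2D filter cannot express.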

#### Seminar No. 5

Viktor Larsson, Lund University, Sweden

**Building Polynomial Solvers for Computer Vision Applications**

Thursday 2017-08-31 at 11:00, CIIRC Seminar Room B-670, floor 6 of Building B

**Abstract**

In the first part of the talk, I will give a brief overview of how polynomial equation systems are typically solved in Computer Vision. These equation systems often arise from minimal problems, which are fundamental building blocks in most Structure-from-Motion pipelines.

In the second part, I will present two recent papers on methods for constructing polynomial solvers. The first paper is about automatically generating the so-called elimination templates. The second paper extends the method to also handle saturated ideals. This essentially allows us to add constraints requiring that some polynomials be non-zero. Both papers are joint work with Kalle Åström and Magnus Oskarsson.

**References**

[1] Larsson V., Åström K., Oskarsson M., Efficient Solvers for Minimal Problems by Syzygy-Based Reduction, CVPR, 2017. [http://www.maths.lth.se/matematiklth/personal/viktorl/papers/larsson2017efficient.pdf]

[2] Larsson V., Åström K., Oskarsson M., Polynomial Solvers for Saturated Ideals, ICCV, 2017.

#### Seminar No. 4

Tomas Mikolov, Facebook AI Research

**Neural Networks for Natural Language Processing**

Friday 2017-08-25 at 11:00, CIIRC Seminar Room B-670, floor 6 of Building B

**Abstract**

Artificial neural networks are currently very successful in various machine learning tasks that involve natural language. In this talk, I will describe recurrent neural network language models, as well as their most frequent applications to speech recognition and machine translation. I will also talk about distributed word representations, their interesting properties, and efficient ways to compute them and use them in tasks such as text classification. Finally, I will describe our latest efforts to create a novel dataset that could be used to develop machines that can truly communicate with human users in natural language.
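One well-known property of distributed word representations is that some relations appear as vector offsets, so analogies can be answered by nearest-neighbour search around b − a + c under cosine similarity. The Python sketch below shows only that query; the three-dimensional embeddings in the usage are hand-made toy vectors (hypothetical), whereas real word2vec vectors are learned from large corpora and have hundreds of dimensions.

```python
import math
from typing import Dict, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(a: str, b: str, c: str, vectors: Dict[str, Sequence[float]]) -> str:
    """Answer 'a is to b as c is to ?' with the word closest (cosine) to b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Hypothetical toy embeddings: dimensions loosely mean (royalty, male, female).
toy = {
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "apple": [0.1, 0.1, 0.1],
}
print(analogy("man", "king", "woman", toy))  # queen
```

The query words themselves are excluded from the candidates, as is customary, since b − a + c is often closest to b or c.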

**Short bio**:

Tomáš Mikolov has been a research scientist at the Facebook AI Research group since 2014. Previously, he was a member of the Google Brain team, where he developed and implemented efficient algorithms for computing distributed representations of words (the word2vec project). He obtained his PhD from the Brno University of Technology in 2012 for his work on recurrent neural network-based language models (RNNLM). His long-term research goal is to develop intelligent machines that people can communicate with and use to accomplish complex tasks.

#### Seminar No. 3

**Workshop on learnable representations for geometric matching**

Monday 2017-08-21, 14:00-18:00

CIIRC Seminar Room B-670, floor 6 of Building B

- 14:00-15:00 Torsten Sattler (ETH Zurich) Hard Matching Problems in 3D Vision
- 15:00-15:20 Coffee break, discussion
- 15:20-16:20 Eric Brachmann (TU Dresden) Scene Coordinate Regression: From Random Forests to End-to-End Learning
- 16:20-16:40 Coffee break, discussion
- 16:40-17:40 Ignacio Rocco (Inria) Convolutional neural network architecture for geometric matching
- 17:40-18:00 Discussion

Speaker: Torsten Sattler, ETH Zurich

Title: Hard Matching Problems in 3D Vision

Abstract: Estimating correspondences, i.e., data association, is a fundamental step of every 3D Computer Vision pipeline. For example, 2D-3D matches between pixels in an image and 3D points in a 3D scene model are required for camera pose computation and thus for visual localization. Existing approaches for correspondence estimation, e.g., based on local image descriptors such as SIFT, have been shown to work well for a range of viewing conditions. Still, existing solutions are rather limited in challenging scenes. This talk will focus on data association in challenging scenarios. We first discuss the impact of day-night changes on visual localization, demonstrating that state-of-the-art algorithms perform considerably worse than in the day-day scenario typically considered in the literature. Next, we discuss ongoing work aiming to boost the performance of local descriptors in this scenario via a dense-sparse feature detection and matching pipeline. A key idea in this work is to use pre-trained convolutional neural networks to obtain descriptors that contain mid-level semantic information, as opposed to the low-level information used by SIFT. Based on the intuition that semantic information provides a higher form of invariance, the second part of the talk considers exploiting semantic (image) segmentations in the context of visual localization and visual SLAM.
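The descriptor-based baseline the abstract contrasts with can be sketched as brute-force nearest-neighbour matching with Lowe's ratio test, as commonly used to form 2D-3D matches between query image features and descriptors attached to the 3D points of a scene model. This Python sketch is a generic illustration only: the ratio 0.8 is a commonly used value rather than one from the talk, and the dense-sparse and semantic methods discussed are not shown.

```python
import math
from typing import List, Sequence, Tuple


def ratio_test_matches(query: Sequence[Sequence[float]],
                       database: Sequence[Sequence[float]],
                       ratio: float = 0.8) -> List[Tuple[int, int]]:
    """Match each query descriptor to its nearest database descriptor.

    A match (i, j) is kept only if the nearest neighbour is clearly closer than
    the second nearest (Lowe's ratio test), which rejects ambiguous matches.
    """
    matches = []
    for i, q in enumerate(query):
        dists = sorted((math.dist(q, d), j) for j, d in enumerate(database))
        if len(dists) < 2:
            continue  # the ratio test needs at least two candidates
        (d1, j1), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((i, j1))
    return matches
```

Under strong appearance changes such as day versus night, the nearest and second-nearest distances tend to become similar, so many true correspondences fail this test; this is one way to see why low-level descriptors struggle in the scenarios the talk addresses.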

Speaker: Eric Brachmann, TU Dresden

Title: Scene Coordinate Regression: From Random Forests to End-to-End Learning

Abstract: For decades, estimation of accurate 6D camera poses relied on hand-crafted sparse feature pipelines and geometric processing. Motivated by recent successes, some authors have asked whether camera localization can be cast as a learning problem. Despite some success, the accuracy of unconstrained CNN architectures trained for this task is still inferior to that of traditional approaches. In this talk, we discuss an alternative line of research, which tries to combine geometric processing with constrained machine learning in the form of scene coordinate regression. We discuss how random forests or CNNs can be trained to substitute sparse feature detection and matching. Furthermore, we show how to train camera localization pipelines end-to-end using a novel, differentiable formulation of RANSAC. We will close the talk with some thoughts about open problems in learning camera localization.
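A key idea in the differentiable RANSAC formulation (DSAC) is to replace the non-differentiable argmax over hypothesis scores with a probabilistic selection, a softmax over the scores, and to train on the expected loss. The Python sketch below shows only that selection step; hypothesis generation, scoring, and the scene coordinate regressor are omitted, and the sharpness parameter `alpha` is an assumption of this example.

```python
import math
from typing import List, Sequence


def selection_probabilities(scores: Sequence[float], alpha: float = 1.0) -> List[float]:
    """Softmax over hypothesis scores; alpha controls how peaked the distribution is."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(alpha * (s - m)) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def expected_loss(scores: Sequence[float], losses: Sequence[float],
                  alpha: float = 1.0) -> float:
    """Expected task loss under probabilistic hypothesis selection.

    Unlike the loss of the single best-scoring hypothesis, this is a smooth
    function of the scores, so gradients can flow back into the scoring and
    into whatever network produced the hypotheses.
    """
    probs = selection_probabilities(scores, alpha)
    return sum(p * l for p, l in zip(probs, losses))
```

As `alpha` grows, the distribution concentrates on the best-scoring hypothesis and the expected loss approaches the loss of ordinary RANSAC selection, at the price of less informative gradients.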

Speaker: Ignacio Rocco, Inria Paris

Title: Convolutional neural network architecture for geometric matching

Abstract: We address the problem of determining correspondences between two images in agreement with a geometric model such as an affine or thin-plate spline transformation, and estimating its parameters. The contributions of this work are three-fold. First, we propose a convolutional neural network architecture for geometric matching. The architecture is based on three main components that mimic the standard steps of feature extraction, matching, and simultaneous inlier detection and model parameter estimation, while being trainable end-to-end. Second, we demonstrate that the network parameters can be trained from synthetically generated imagery without the need for manual annotation, and that our matching layer significantly increases generalization to previously unseen images. Finally, we show that the same model can perform both instance-level and category-level matching, giving state-of-the-art results on the challenging Proposal Flow dataset.
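The matching step in such an architecture can be pictured as an all-pairs correlation between two dense feature maps: every L2-normalized descriptor of one image is compared with every descriptor of the other. The Python sketch below computes only these correlations, loosely following that idea; the normalization and regression stages that follow in the actual network are omitted, and the tiny feature maps in the test are made up.

```python
import math
from typing import List, Sequence

FeatureMap = List[List[List[float]]]  # indexed [row][col][channel]


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a descriptor to unit length (zero vectors are left unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def correlation_layer(fa: FeatureMap, fb: FeatureMap) -> List[List[List[float]]]:
    """All-pairs correlation between two feature maps.

    out[i][j][k] is the dot product between the normalized descriptor of B at
    position (i, j) and the normalized descriptor of A at its k-th position
    (row-major order), i.e. a cosine similarity in [-1, 1].
    """
    flat_a = [l2_normalize(v) for row in fa for v in row]
    return [[[sum(x * y for x, y in zip(l2_normalize(vb), va)) for va in flat_a]
             for vb in row] for row in fb]
```

Because the output keeps one score per pair of positions rather than committing to a single best match, a subsequent network stage can reason jointly about all candidate correspondences when estimating the transformation parameters.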

#### Seminar No. 2

Joe Kileel, Princeton

**Using Computational Algebra for Computer Vision**

Wednesday 2017-06-07 at 15:00, CIIRC Seminar Room B-670, floor 6 of Building B

**Abstract**

Scene reconstruction is a fundamental task in computer vision: given multiple images taken from different angles, create a 3D model of a world scene. Nowadays, self-driving cars need to perform 3D reconstruction in real time to navigate their surroundings. Large-scale photo-tourism is another popular application. In this talk, we will explain how key subroutines in reconstruction algorithms amount to solving polynomial systems with special geometric structure. We will answer a question of Sameer Agarwal (Google Research) about recovering the motion of two calibrated cameras. Next, we will quantify the “algebraic complexity” of polynomial systems arising from three calibrated cameras. In terms of multi-view geometry, we deal with essential matrices and trifocal tensors. The first part applies tools such as resultants from algebra, while the second part offers an introduction to numerical homotopy continuation methods. Those wondering “if algebraic geometry is good for anything practical” are especially encouraged to attend.
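For two calibrated cameras, the special structure comes from the essential variety: every essential matrix E = [t]ₓR satisfies det(E) = 0 and the cubic trace constraint 2EEᵀE − tr(EEᵀ)E = 0. The small Python check below verifies these classical constraints numerically for one arbitrarily chosen rotation and translation; it is illustrative only and does not reproduce the resultant or Chow-form results of the talk.

```python
import math
from typing import List

Mat = List[List[float]]


def matmul(a: Mat, b: Mat) -> Mat:
    """3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(a: Mat) -> Mat:
    return [list(row) for row in zip(*a)]


def skew(t: List[float]) -> Mat:
    """Cross-product matrix [t]_x, so that [t]_x v = t x v."""
    x, y, z = t
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]


def rot_z(theta: float) -> Mat:
    """Rotation by theta radians about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def det3(a: Mat) -> float:
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def essential_residual(e: Mat) -> float:
    """Largest violation of det(E) = 0 and 2 E E^T E - tr(E E^T) E = 0."""
    eet = matmul(e, transpose(e))
    trace = eet[0][0] + eet[1][1] + eet[2][2]
    eete = matmul(eet, e)
    cubic = [[2.0 * eete[i][j] - trace * e[i][j] for j in range(3)] for i in range(3)]
    return max(abs(det3(e)), max(abs(x) for row in cubic for x in row))


E = matmul(skew([1.0, 2.0, 0.5]), rot_z(0.3))
print(essential_residual(E))  # zero up to floating-point rounding
```

The cubic constraint follows from EEᵀ = |t|²I − ttᵀ, which gives EEᵀE = |t|²E and tr(EEᵀ) = 2|t|²; a generic 3×3 matrix such as the identity violates both constraints.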

**References**

[1] G. Floystad, J. Kileel, G. Ottaviani: “The Chow form of the essential variety in computer vision,” J. Symbolic Comput., to appear. [https://arxiv.org/pdf/1604.04372]

[2] J. Kileel: “Minimal problems for the calibrated trifocal variety,” SIAM Appl. Alg. Geom., to appear. [https://arxiv.org/pdf/1611.05947]

#### Seminar No. 1

Torsten Sattler, ETH Zurich

**Camera Localization**

Thursday 2017-05-11 at 11:00, CIIRC Lecture Hall A1001 of Building A (Jugoslavskych partyzanu 3)

**Abstract**

Estimating the position and orientation of a camera in a scene based on images is an essential part of many (3D) Computer Vision and Robotics algorithms such as Structure-from-Motion, Simultaneous Localization and Mapping (SLAM), and visual localization. Camera localization has applications in navigation for autonomous vehicles/robots, Augmented and Virtual Reality, and 3D mapping. Furthermore, there are strong relations to camera calibration and visual place recognition. In this talk, I will give an overview of past and current efforts on robust, efficient, and accurate camera localization. I will begin the talk by showing that classical localization approaches have not been made obsolete by deep learning. Following a local feature-based approach, the talk will discuss how to adapt such methods for real-time visual localization on mobile devices with limited computational capabilities, as well as approaches that scale to large (city-scale) scenes, including the challenges encountered at large scale. The final part of the talk will discuss open problems in camera localization and 3D mapping, both problems we are currently working on and interesting long-term goals.
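In a local feature-based pipeline, a candidate camera pose is typically verified by counting how many 2D-3D matches it explains, i.e., how many 3D points reproject close to their matched image features. The Python sketch below shows that verification step for a calibrated pinhole camera; the simplified intrinsics (focal length only, principal point at the origin) and the 2-pixel threshold are assumptions of this example, and pose solvers (e.g., minimal solvers inside RANSAC) are not shown.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point2 = Tuple[float, float]
Point3 = Sequence[float]


def project(R: Sequence[Sequence[float]], t: Sequence[float], f: float,
            X: Point3) -> Optional[Point2]:
    """Pinhole projection of world point X with pose (R, t), i.e. x_cam = R X + t.

    Assumes square pixels and the principal point at the image origin.
    """
    xc = [sum(R[i][k] * X[k] for k in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0.0:
        return None  # point lies behind the camera
    return f * xc[0] / xc[2], f * xc[1] / xc[2]


def count_inliers(R: Sequence[Sequence[float]], t: Sequence[float], f: float,
                  matches: List[Tuple[Point2, Point3]],
                  threshold_px: float = 2.0) -> int:
    """Count 2D-3D matches whose reprojection error is below threshold_px pixels."""
    inliers = 0
    for (u, v), X in matches:
        p = project(R, t, f, X)
        if p is not None and math.hypot(p[0] - u, p[1] - v) < threshold_px:
            inliers += 1
    return inliers
```

The pose with the most inliers is usually kept and refined; at city scale, the fraction of correct matches drops, which is one of the large-scale challenges the talk mentions.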

**Short bio**:

Torsten Sattler received a PhD in Computer Science from RWTH Aachen University, Germany, in 2013 under the supervision of Prof. Bastian Leibe and Prof. Leif Kobbelt. In December 2013, he joined the Computer Vision and Geometry Group of Prof. Marc Pollefeys at ETH Zurich, Switzerland, where he is currently a senior researcher and Marc Pollefeys’ deputy while Prof. Pollefeys is on leave from ETH. His research interests include (large-scale) image-based localization using Structure-from-Motion point clouds, real-time localization and SLAM on mobile devices and for robotics, 3D mapping, Augmented & Virtual Reality, (multi-view) stereo, image retrieval and efficient spatial verification, camera calibration, and pose estimation. Torsten has worked on dense sensing for self-driving cars as part of the V-Charge project. He is currently involved in enabling semantic SLAM and re-localization for gardening robots (as part of an EU Horizon 2020 project, where he leads the efforts on a work package), in research for Google’s Tango project, where he leads CVG’s research efforts, and in work on self-driving cars.