###### Upcoming seminars

*2025-01-21 11:00 (B-670)* Stanislav Fort **Adversarial attacks as a baby version of A(G)I alignment**

###### List of past seminars

*2024-09-02* Aaron Hertzmann **Toward a Theory of Perspective Perception in Pictures**

*2024-07-22* Dinesh Manocha **Robot Navigation in Complex Indoor and Outdoor Environments**

*2024-03-18* Jitendra Malik **When will we have intelligent robots?** *(video)*

*2024-02-06* Eugen Hruška **Exploration-exploitation tradeoff for protein conformations and dynamics** *(video)*

*2023-10-12* Alexei Efros **It’s Still All About the Data**

*2023-09-29* Bryan Russell **Language-guided audiovisual learning**

*2023-05-05* Yann Labbé **MegaPose: 6D pose estimation of novel objects via render&compare** *(video)*

*2022-12-14* Aless Lasaruk **An Efficient Model for a Camera Behind a Parallel Refractive Slab**

*2022-09-22* David Fouhey **Understanding 3D Rooms and Interacting Hands** *(video)*

*2022-08-25* Dmytro Mishkin **Affine Correspondences and Where to Find Them** *(video)*

*2022-08-17* Ratthachat Chatpatanasiri **Terpene Synthesis Framework & Pytorch Library**

*Algebraic Vision Meeting II* *(full schedule)*

*2022-07-25* Simon Telen **Toric Geometry of Entropic Regularization**

*2022-07-25* Evgeniy Martyushev **Optimizing Elimination Templates by Greedy Parameter Search**

*2022-07-25* James Pritts **Rectification, Auto-Calibration, and Scene Parsing from Affine-Correspondences of Repetitive Textures**

*2022-07-22* Snehal Bhayani **Minimal problems for Semi-generalized Homographies**

*2022-07-22* Anton Leykin **Homotopy continuation for geometric problems in vision**

*2022-07-21* Kathlén Kohn **Algebraic statistics — Invariant theory and scaling algorithms for maximum likelihood estimation**

*2022-07-21* Federica Arrigoni **Viewing graph solvability**

*2022-07-21* Luca Magri **Self-calibration in the presence of multiple motions**

*2022-07-21* Viktor Korotynskiy **Interpolating Symmetries of Parametric Polynomial Systems**

*2022-07-20* Andrea Porfiri Dal Cin **Synchronization on Group-labelled Multi-graphs**

*Algebraic Vision Meeting* *(full schedule, meetings)*

*2022-06-03* Gabriel Ong **G-Equivariant Neural Networks in Computer Vision** *(video)*

*2022-06-03* Timothy Duff **Polynomial constraints on points and cameras** *(video)*

*2022-06-03* Elima Shehu **The line multiview variety** *(video)*

*2022-06-02* Kathlén Kohn **The Geometry of Linear Convolutional Networks** *(video)*

*2022-06-02* Viktor Larsson **Camera Pose Estimation with Implicit Distortion Models** *(video)*

*2022-06-02* Diana Sungatullina **NeRF in detail: Learning to sample for view synthesis** *(Reading group in Pattern Recognition and Computer Vision)*

*2022-06-01* Diego Thomas **Building new bridges between the Cyber and physical worlds by 3D vision** *(video)*

*2022-06-01* Paul Breiding **HomotopyContinuation.jl: A package for homotopy continuation in Julia.**

*2022-06-01* Orlando Marigliano **Minimal Problems for Rolling Shutter Cameras** *(video)*

*2022-05-31* Zuzana Kúkelová **Methods for Generating Efficient Algebraic Solvers for Computer Vision Problems** *(AICzechia seminar)*

*2022-05-31* Felix Rydell **The Generalized Multiview Variety** *(video)*

*2022-05-31* Petr Hrubý **Learning to Solve Hard Minimal Problems** *(video)*

*2022-05-31* Martin Bråtelund **On the Compatibility of Fundamental Matrices** *(video)*

*2022-05-23* Médéric Fourmy **State estimation and localization of legged robots: a tightly-coupled approach based on a-posteriori maximization** *(video)*

*2022-05-20* Timothy Duff **Galois/monodromy groups for decomposing minimal problems in 3D reconstruction** *(video)*

*2022-03-31* Georges Chahine **Multi-Sensor Mapping in Natural Environments: 3D Reconstruction and Temporal Alignment** *(video)*

*2021-09-29* Lenka Zdeborová **Understanding machine learning via exactly solvable models** *(video)*

*2021-08-26* Lucas Disson **Analysis of Molecular Dynamic Simulations for Alzheimer’s Disease Research using VAMPnet Neural Networks**

*2021-04-08* Yann Labbé **Pose estimation of rigid objects and articulated robots** *(video)*

*2020-10-22* Angel Villar Corrales **Pose Based Image Retrieval in Greek Vase Paintings**

*2020-10-21* T. Werner, M. Kružík, Z. Kúkelová, V. Korotynskiy, A. Bellon, P. Trutman **Optimization Workshop** *(video)*

*2020-10-01* Bastien Dechamps **Efficient Camera Pose Verification Via Neural Rendering** *(video)*

*2020-08-06* Evgeniy Martyushev **Finding Polynomial Constraints by Sampling** *(video)*

*2020-06-25* Jan Tomešek **Visual Geo-Localization in Natural Environments**

*2020-05-21* Michal Vavrecka **Incognite research group – an overview** *(video)*

*2020-05-07* Hugo Cisneros **Evolving Structures in Complex Systems** *(video)*

*2020-04-23* Michal Rolinek **Differentiation of Blackbox Combinatorial Solvers** *(video)*

*2020-04-16* Tomas Pajdla **Lex Fridman: Deep Learning State of the Art 2020**

*2020-04-09* Luis Gomez Camara **Robust Long-term Autonomous Navigation of Robots in Challenging Environments** *(video)*

*2020-04-02* Kateryna Zorina **Reading Group on Policy Gradient Methods in Reinforcement Learning**

*2020-03-31* Kathlén Kohn **Minimal Problems in Computer Vision**

*2020-01-21* Luca Magri **Multiple structure recovery via clustering in preference space**

*2020-01-14* Pavel Trutman **Globally Optimal Solution to Inverse Kinematics of 7DOF Serial Manipulator**

*2019-12-05* Kathlén Kohn **Point-Line Minimal Problems for 3 Cameras with Partial Visibility**

*2019-12-04* Martin Bråtelund **Critical loci for reconstruction from two views**

*2019-10-01* Akihiro Sugimoto **Revisiting Depth Image Fusion with Variational Message Passing**

*2019-09-13* Ekaterina Zorina **Learning from demonstrations**

*2019-08-30* Margaret Regan **Image Reconstruction Using Numerical Algebraic Geometry**

*2019-08-29* Viktor Larsson **Geometric Estimation with Radial Distortion**

*2019-08-27* Andrew Pryhuber **Ideals of the Multiview Variety**

*2019-08-26* Torsten Sattler **Domain Adaptation vs. Semantic Invariance for Long-Term Visual Localization** *(slides)*

*2019-08-22* Teven Le Scao **Neural Differential Equations for image super-resolution**

*2019-07-18* Timothy Duff **Intro to homotopy continuation with a view towards minimal problems**

*2019-07-17* Kathlén Kohn **Point-Line Minimal Problems in Complete Multi-View Visibility**

*2019-07-16* Andrea Fusiello **Synchronisation: from pairwise measures to global values** *(slides)*

*2019-05-06* Horia-Mihai Bujanca **SLAMBench 3.0: Benchmarking beyond traditional Visual SLAM**

*2019-03-13* Mohab Safey El Din **Polar varieties, matrices and real root classification**

*2019-03-05* Simon Telen **Stabilized algebraic methods for numerical root finding**

*2019-01-11* Guillem Alenya **Challenges in robotics for human environments and specially for textile manipulation**

*2019-01-11* Kimitoshi Yamazaki **Vision-Based Cloth Manipulation by Autonomous Robots**

*2018-10-25* David Fouhey **Understanding how to get to places and do things** *(video)*

*2018-10-24* Yuriy Kaminskyi **Semantic segmentation for indoor localization**

*2018-08-16* Viktor Larsson **Ortho-Perspective Epipolar Geometry, Optimal Trilateration and Time-of-Arrival**

*2018-07-24* Torsten Sattler **Challenges in Long-Term Visual Localization**

*2018-05-25* Alexei Efros **Self-supervision, Meta-supervision, Curiosity: Making Computers Study Harder** *(video)*

*2018-05-22* Di Meng **Cameras behind glass – polynomial constraints on image projection**

*2018-02-23* Oles Dobosevych **On Ukrainian characters MNIST dataset**

*2018-02-22* F. Bach, P.-Y. Massé, N. Mansard, J. Šivic, T. Werner *AIME@CZ – Czech Workshop on Applied Mathematics in Engineering*

*2017-12-07* Federica Arrigoni **Synchronization Problems in Computer Vision**

*2017-11-23* Antonín Šulc **Lightfield Analysis for non-Lambertian Scenes**

*2017-10-30* Jana Košecká **Semantic Understanding for Robot Perception**

*2017-10-19* Mircea Cimpoi **Deep Filter Banks for Texture Recognition**

*2017-10-18* Wolfgang Förstner **Evaluation of Estimation Results within Structure from Motion Problems**

*2017-10-17* Ludovic Magerand **Projective Structure-from-Motion and Rolling Shutter Pose Estimation**

*2017-09-25* Akihiro Sugimoto **Deeply Supervised 3D Recurrent FCN for Salient Object Detection in Videos**

*2017-08-31* Viktor Larsson **Building Polynomial Solvers for Computer Vision Applications**

*2017-08-25* Tomas Mikolov **Neural Networks for Natural Language Processing**

*2017-08-21* Torsten Sattler, Eric Brachmann, Ignacio Rocco *Workshop on learnable representations for geometric matching*

*2017-06-07* Joe Kileel **Using Computational Algebra for Computer Vision**

*2017-05-11* Torsten Sattler **Camera Localization**

#### Seminar No. 88

| Speaker | Title | Affiliation | Date and time | Venue |
|---|---|---|---|---|
| Stanislav Fort | Adversarial attacks as a baby version of A(G)I alignment | Google DeepMind | Tuesday 2025-01-21 at 11:00 | CIIRC Room B-670 (building B, floor 6) |

**Abstract**

Adversarial attacks pose a significant challenge to the robustness, reliability and alignment of deep neural networks, from simple computer vision models to hundred-billion-parameter language models. Despite their ubiquity, our theoretical understanding of their character and ultimate causes, as well as our ability to defend against them, remain noticeably lacking. This talk examines the robustness of modern deep learning methods and the surprising scaling of attacks on them, and showcases several practical examples of transferable attacks on the largest closed-source vision-language models available. Building on biological insights and new empirical evidence, I will introduce our solution proposed in [1], which takes a step towards aligning implicit human and explicit machine vision representations, closely connecting interpretability and robustness. I will conclude with a direct analogy between the problem of adversarial examples and the much larger task of general AI alignment.

[1] Stanislav Fort, Balaji Lakshminarayanan. Ensemble everything everywhere: Multi-scale aggregation for adversarial robustness.

#### Seminar No. 86

| Speaker | Title | Affiliation | Date and time | Venue |
|---|---|---|---|---|
| Aaron Hertzmann | Toward a Theory of Perspective Perception in Pictures | Adobe Research | Monday 2024-09-02 at 11:00 | CIIRC Room B-670 (building B, floor 6) |

**Abstract**

I propose a new approach to understanding how human vision interprets 3D shape in realistic pictures, focusing specifically on perspective. I argue that most human shape perception happens within single eye fixations, mostly in foveal vision, and that humans retain surprisingly little 3D awareness over time. In pictures, this means that each individual eye fixation can, to some extent, have a perspective separate from the others. This theory integrates ideas from human vision science, art history, and computational photography, and suggests new ways to think about how pictures work and how we can make them.

**Bio**

Aaron is a Principal Scientist at Adobe Research and Affiliate Faculty at the University of Washington. He received a Bachelor's degree in Computer Science and Art from Rice University and a Ph.D. in Computer Science from New York University. He was previously a Professor of Computer Science at the University of Toronto for ten years. He has published over 120 papers in computer graphics, several subfields of AI, and the science of art. He is an IEEE Fellow, an ACM Fellow, and winner of the 2024 SIGGRAPH Computer Graphics Achievement Award.

#### Seminar No. 85

| Speaker | Title | Affiliation | Date and time | Venue |
|---|---|---|---|---|
| Dinesh Manocha | Robot Navigation in Complex Indoor and Outdoor Environments | University of Maryland at College Park | Monday 2024-07-22 at 11:00 | CIIRC Room B-670 (building B, floor 6) |

**Abstract**

In the last few decades, most robotics success stories have been limited to structured or controlled environments. A major challenge is to develop robot systems that can operate in complex or unstructured environments such as homes, dense traffic, outdoor terrain, and public places. In this talk, we give an overview of our ongoing work on developing robust planning and navigation technologies that use recent advances in computer vision, sensor technologies, machine learning, and motion planning algorithms. We present new methods that utilize multi-modal observations from an RGB camera, 3D LiDAR, and robot odometry for scene perception, along with deep reinforcement learning for reliable planning. The latter is also used to compute dynamically feasible and spatially aware velocities for a robot navigating among mobile obstacles and over uneven terrain. We have integrated these methods with wheeled robots, home robots, and legged platforms, and we highlight their performance in crowded indoor scenes, home environments, and dense outdoor terrain.

**Bio**

Dinesh Manocha is the Paul Chrisman-Iribe Chair in Computer Science & ECE and a Distinguished University Professor at the University of Maryland, College Park. His research interests include virtual environments, physically-based modeling, and robotics. His group has developed a number of software packages that have become standard and have been licensed to more than 60 commercial vendors. He has published more than 750 papers and supervised 50 PhD dissertations. He is a Fellow of AAAI, AAAS, ACM, IEEE, and NAI, a member of the ACM SIGGRAPH and IEEE VR Academies, and a recipient of the Bézier Award from the Solid Modeling Association. He received the Distinguished Alumni Award from IIT Delhi and the Distinguished Career in Computer Science Award from the Washington Academy of Sciences. He was a co-founder of Impulsonic, a developer of physics-based audio simulation technologies, which was acquired by Valve Inc. in November 2016.

#### Seminar No. 84

| Speaker | Title | Affiliation | Date and time | Venue |
|---|---|---|---|---|
| Jitendra Malik | When will we have intelligent robots? | UC Berkeley | Monday 2024-03-18 at 11:00 | CIIRC Room A-1001 (building A, floor 10) |

**Video**

https://youtu.be/rbeaRHWaRu8

**Abstract**

Deep learning has resulted in remarkable breakthroughs in fields such as speech recognition, computer vision, natural language processing, and protein structure prediction. Robotics has proved to be much more challenging, as there are no pre-existing repositories of behavior to draw upon; rather, the robot has to learn from its own trial and error in its own specific body, and it has to generalize and adapt. I believe that the most promising approach for this is to train robot skills in simulation and then transfer them to the real world. I will show multiple examples of skills – legged locomotion (quadruped and humanoid), navigation, and dexterous manipulation such as in-hand rotation and twisting caps off bottles – acquired in this paradigm. Along the way, we developed “Rapid Motor Adaptation”, a method for adaptive control in the framework of deep reinforcement learning. Looking to the future, I believe that there are multiple insights from the development of motor skills in children that are relevant to robotics; I will sketch some examples and partial results. While we are many years away from having robots with the skills of a five-year-old, progress in the last few years has been remarkable and substantial.

**Bio**

Jitendra Malik is the Arthur J. Chick Professor in the Department of Electrical Engineering and Computer Sciences at UC Berkeley. He is also a part-time Research Scientist Director at Meta. Malik’s research group has worked on many different topics in computer vision, human visual perception, robotics, machine learning and artificial intelligence. Several well-known concepts and algorithms arose in this research, such as anisotropic diffusion, normalized cuts, high dynamic range imaging, shape contexts and R-CNN. His honors include the 2013 IEEE PAMI-TC Distinguished Researcher in Computer Vision Award, the 2014 K.S. Fu Prize from the International Association of Pattern Recognition, the 2016 ACM-AAAI Allen Newell Award, the 2018 IJCAI Award for Research Excellence in AI, and the 2019 IEEE Computer Society Computer Pioneer Award. He is a member of the National Academy of Engineering and the National Academy of Sciences, and a fellow of the American Academy of Arts and Sciences.

#### Seminar No. 83

| Speaker | Title | Affiliation | Date and time | Venue |
|---|---|---|---|---|
| Eugen Hruška | Exploration-exploitation tradeoff for protein conformations and dynamics | Charles University, Czech Republic | Tuesday 2024-02-06 at 11:00 | CIIRC Room B-670 (building B, floor 6) + online via Zoom (video) |

**Abstract**

While the folded structure of proteins can be rapidly estimated to some accuracy, the accurate estimation of protein conformational distributions and their dynamics remains an open challenge. Different conformations and their dynamics frequently play a role in pharmaceutical questions. Experimental datasets of accurate protein conformational distributions are limited. Brute-force in silico estimation of protein conformational distributions at molecular-mechanics accuracy is currently limited to small systems by computational resources. Here, we will discuss scaling high-throughput simulation and the representation of high-dimensional protein dynamics to larger systems. Furthermore, we will examine strategies to speed up convergence and the related exploration-exploitation tradeoff.

#### Seminar No. 82

| Speaker | Title | Affiliation | Date and time | Venue |
|---|---|---|---|---|
| Alexei Efros | It’s Still All About the Data | UC Berkeley, CA, USA | Thursday 2023-10-12 at 11:00 | CIIRC Room B-670 (building B, floor 6) |

**Abstract**

The use of large-scale data is the primary driver for the recent advances in AI systems. Yet, the data often does not get as much respect as it deserves. In this talk, I will discuss several aspects of visual data including various sources of dataset bias, methods for “deep fake” creation and detection, and the problem of data attribution in generative models.

#### Seminar No. 81

| Speaker | Title | Affiliation | Date and time | Venue |
|---|---|---|---|---|
| Bryan Russell | Language-guided audiovisual learning | Adobe, San Francisco, CA | Friday 2023-09-29 at 11:00 | CIIRC Room B-670 (building B, floor 6) |

**Abstract**

In this talk, I will describe recent work on two tasks where joint reasoning over the visual, audio, and language modalities is critical. In the first part, I will present an approach for recommending a music track for an input video while allowing a user to guide the music selection with free-form natural language. Our approach can match or exceed the performance of prior methods on video-to-music retrieval while significantly improving retrieval accuracy when using text guidance. In the second part, I will present a self-supervised approach that performs audio source separation in videos based on natural language queries, using only unlabeled video and audio pairs as training data in conjunction with a pretrained image-language foundation model. Our approach outperforms prior strongly supervised approaches for this task despite not using object detectors or text labels during training.

**Bio**

Bryan Russell is a Senior Research Scientist at Adobe in San Francisco, CA. His research focuses on problems in video and 3D understanding, where he has co-authored over 45 publications. He jointly received the Helmholtz “test-of-time” prize at ICCV 2017 for work on “Discovering Objects and their Location in Images”, which appeared at ICCV 2005. He has collaborated on several tech transfers to Adobe’s products, including video auto tagging, video shot type filtering, automatic ground plane estimation, and automatic portrait segmentation.

#### Seminar No. 80

| Speaker | Title | Affiliation | Date and time | Venue |
|---|---|---|---|---|
| Yann Labbé | MegaPose: 6D pose estimation of novel objects via render&compare | INRIA/ENS, Paris, FR | Friday 2023-05-05 at 11:00 | CIIRC Room B-670 (building B, floor 6) + online via Zoom (video) |

**Abstract**

In this talk, I will present MegaPose, a method to estimate the 6D pose of novel objects, that is, objects unseen during training. This is an important problem to allow robotic and AR/VR systems to be rapidly deployed in novel scenes containing novel objects. The approach relies on three main contributions: (i) a scoring network for finding the best initial estimate among a set of coarse hypotheses, (ii) a network for iterative refinement where the shape and coordinate system of the novel object are implicitly provided as inputs, and (iii) a large-scale synthetic training dataset displaying thousands of different objects in challenging visual conditions. We evaluate MegaPose on hundreds of novel objects in real images from several pose estimation benchmarks and show our approach achieves performance competitive with existing methods that require access to the target objects during training. Code, dataset and trained models are available on the project page https://megapose6d.github.io.

#### Seminar No. 79

| Speaker | Title | Affiliation | Date and time | Venue |
|---|---|---|---|---|
| Aless Lasaruk | An Efficient Model for a Camera Behind a Parallel Refractive Slab | ZF Friedrichshafen, Germany | Wednesday 2022-12-14 at 11:00 | CIIRC Room B-670 (building B, floor 6) |

**Abstract**

We present a new and efficient solution for the forward world-to-image projection of a pinhole camera with distortions placed behind a planar refractive slab. First, we introduce a novel way to compute the projection by reducing the problem to finding a real quantity called the slab shift. We characterize the physically meaningful slab shift both as the unique solution of a fixed-point equation and as a specific, uniquely defined root of a quartic polynomial in the unknown slab shift. In the latter case, we obtain a closed-form formula that provides the unique physically meaningful projection. Second, we develop an approximation of the projection through the slab that reaches single-precision floating-point accuracy for practically relevant problem instances at considerably lower computational cost than the exact solution. We demonstrate the accuracy and efficiency of our method with realistic synthetic experiments, and show with real experiments that it enables efficient camera calibration behind the windshield for automotive industry applications.
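To give a feel for the forward projection problem, here is a minimal numerical sketch. It is not the closed-form quartic solution or the approximation from the talk; it assumes an idealized geometry: a distortion-free pinhole camera at the origin looking along +z, a slab of thickness `thickness` and refractive index `n` whose faces are perpendicular to the optical axis (a real windshield is tilted), air on both sides, and a point beyond the slab. A ray leaving the camera at angle θ refracts to angle φ inside the slab (sin θ = n sin φ), exits again at θ, and must reach the point's radial distance R, i.e. (Z - thickness) tan θ + thickness tan φ = R. The helper name `project_through_slab` is hypothetical, and the equation is solved by plain bisection.

```python
import math

def project_through_slab(P, thickness, n, iters=200):
    """Normalized image coordinates (x, y) of P = (X, Y, Z) seen by a pinhole camera
    at the origin through a planar slab perpendicular to the optical axis.

    Hypothetical helper under idealized assumptions: no lens distortion, n >= 1,
    air outside the slab, and P lies beyond the slab (so Z > thickness).
    In this geometry the camera-to-slab distance cancels out of the equation.
    """
    X, Y, Z = P
    R = math.hypot(X, Y)
    if R == 0.0:
        return 0.0, 0.0  # a point on the optical axis is not displaced

    def radial_offset(theta):
        # Snell's law at the entry face: sin(theta) = n * sin(phi).
        phi = math.asin(math.sin(theta) / n)
        # Both air segments together span Z - thickness along z at angle theta
        # (the ray leaves the parallel exit face at theta again); the slab adds
        # thickness * tan(phi).
        return (Z - thickness) * math.tan(theta) + thickness * math.tan(phi)

    # Since tan(phi) <= tan(theta), the solution lies between these two angles.
    lo, hi = math.atan2(R, Z), math.atan2(R, Z - thickness)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if radial_offset(mid) < R:
            lo = mid
        else:
            hi = mid
    tan_theta = math.tan(0.5 * (lo + hi))
    # The apparent direction keeps the azimuth of P; only the polar angle changes.
    return tan_theta * X / R, tan_theta * Y / R
```

With `n = 1` the function reduces to the ordinary pinhole projection (X/Z, Y/Z), which is a convenient sanity check; for `n > 1` the point appears slightly farther from the image center than without the slab.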

#### Seminar No. 78

| Speaker | Title | Affiliation | Date and time | Venue |
|---|---|---|---|---|
| David Fouhey | Understanding 3D Rooms and Interacting Hands | University of Michigan, MI, USA | Thursday 2022-09-22 at 15:00 | CIIRC Room B-671 (building B, floor 6) + online via Zoom (video) |

**Abstract**

The long-term goal of my research is to enable computers to understand the physical world from images, including both 3D properties and how humans or robots could interact with things. This talk will summarize two recent directions aimed at enabling this goal.

I will begin with learning to reconstruct full 3D scenes, including invisible surfaces, from a single RGB image and present work that can be trained with the ordinary unstructured 3D scans that sensors usually collect. Our method uses implicit functions, which have shown great promise when learned on watertight meshes. When trained on non-watertight meshes, we show that the conventional setup incentivizes neural nets to systematically distort their prediction. We offer a simple solution with a distance-like function that leads to strong results for full scene reconstruction on Matterport3D and other datasets.

I will then focus on understanding what humans are doing with their hands. Hands are a primary means for humans to manipulate the world, but fairly basic information about what they’re doing is often off limits to computers (at least in challenging data). I’ll describe some of our efforts on understanding hand state, including work on learning to segment hands and hand-held objects in images via a system that learns from large-scale video data.

**Bio**

David Fouhey is an assistant professor at the University of Michigan. He received a Ph.D. in robotics from Carnegie Mellon University and was then a postdoctoral fellow at UC Berkeley. His work has been recognized by an NSF CAREER award and by NSF and NDSEG fellowships. He has spent time at the University of Oxford’s Visual Geometry Group, INRIA Paris, and Microsoft Research.

#### Seminar No. 77

| Speaker | Title | Affiliation | Date and time | Venue |
|---|---|---|---|---|
| Dmytro Mishkin | Affine Correspondences and Where to Find Them | VRG, FEL, CTU in Prague / HOVER Inc | Thursday 2022-08-25 at 10:00 | CIIRC Room B-670 (building B, floor 6) + online via Zoom (video) |

**Abstract**

We discuss possible ways of obtaining affine correspondences, starting with classical affine-covariant local features such as Hessian-Affine and MSER, and continuing with the view-synthesis-based approach first proposed in the Affine-SIFT paper. We will cover deep learning approaches such as patch-based AffNet and dense AffNet. Finally, we will discuss two-view and multi-view upgrades to affine correspondences, starting from similarity-covariant or even point correspondences.

#### Seminar No. 76

| Speaker | Title | Affiliation | Date and time | Venue |
|---|---|---|---|---|
| Ratthachat Chatpatanasiri | Terpene Synthesis Framework & Pytorch Library | ThaiKeras | Wednesday 2022-08-17 at 16:00 | CIIRC Room B-633 (building B, floor 6) |

**Abstract**

This talk introduced a joint machine-learning and biochemistry project to understand the process of “terpene synthesis”. We briefly explained the background on molecular synthesis and one important molecular class, the “terpenes”, which are the main target of this project. We further illustrated how to formulate the terpene synthesis prediction problem using either large language models or graph neural networks, as well as the conventional prediction metrics used in the chemoinformatics literature. Preliminary experiments showed that a straightforward language model evaluated with conventional metrics was not able to capture the essence of terpenes. Finally, we suggested some possible directions to continue this exciting biochemistry machine-learning journey.

#### Seminar No. 75 (Algebraic Vision Meeting II)

| Speaker | Title | Affiliation | Date and time | Venue |
|---|---|---|---|---|
| Simon Telen | Toric Geometry of Entropic Regularization | Centrum Wiskunde & Informatica (CWI) Amsterdam, Netherlands | Monday 2022-07-25 at 14:00 | CIIRC Room B-670 (building B, floor 6) + online via Zoom |

**Abstract**

Entropic regularization is a method for large-scale linear programming. Geometrically, one traces intersections of the feasible polytope with scaled toric varieties, starting at the Birch point. We compare this to log-barrier methods, with reciprocal linear spaces, starting at the analytic center. We revisit entropic regularization for unbalanced optimal transport. We compute the degree of the associated toric variety, and we explore algorithms like iterative scaling. This is joint work with Bernd Sturmfels, François-Xavier Vialard, and Max von Renesse.
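For readers unfamiliar with iterative scaling, here is a minimal sketch of the classical Sinkhorn iteration for entropically regularized *balanced* optimal transport; the unbalanced case and the toric-geometric viewpoint from the talk are not attempted here. It assumes nonnegative marginals `a` and `b` with equal total mass, a cost matrix `C`, and a regularization weight `eps`; the function name `sinkhorn` is illustrative. The iteration alternately rescales the rows and columns of the fixed kernel K = exp(-C/eps), so the returned plan has the form diag(u) K diag(v) and approximately minimizes ⟨C, P⟩ minus eps times the entropy of P over plans with marginals a and b.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iter=1000, tol=1e-12):
    """Entropically regularized balanced optimal transport via Sinkhorn scaling.

    Illustrative sketch: a and b are marginals with equal total mass,
    C is the cost matrix, eps the regularization weight.
    """
    K = np.exp(-C / eps)   # fixed Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)    # rescale rows so that row sums equal a
        v = b / (K.T @ u)  # rescale columns so that column sums equal b
        # Column sums now match b exactly; stop once the row sums match a too.
        if np.max(np.abs(u * (K @ v) - a)) < tol:
            break
    return u[:, None] * K * v[None, :]  # plan diag(u) K diag(v)
```

Decreasing `eps` moves the plan towards a solution of the unregularized linear program, but the iteration typically needs more steps and can underflow for very small `eps`; log-domain implementations are a common remedy.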

#### Seminar No. 74 (Algebraic Vision Meeting II)

| Speaker | Title | Affiliation | Date and time | Venue |
|---|---|---|---|---|
| Evgeniy Martyushev | Optimizing Elimination Templates by Greedy Parameter Search | South Ural State University, Chelyabinsk, Russia | Monday 2022-07-25 at 11:00 | CIIRC Room B-670 (building B, floor 6) + online via Zoom |

**Abstract**

We propose a new method for optimizing elimination templates for efficient polynomial system solving of minimal problems in structure from motion, image matching, and camera tracking. We first construct a particular affine parameterization of the elimination templates for systems with a finite number of distinct solutions. Then, we use a heuristic greedy optimization strategy over the space of parameters to get a template with a small size. We test our method on 34 minimal problems in computer vision. For all of them, we found the templates either of the same or smaller size compared to the state-of-the-art. For some difficult examples, our templates are, e.g., 2.1, 2.5, 3.8, 6.6 times smaller. For the problem of refractive absolute pose estimation with unknown focal length, we have found a template that is 20 times smaller. Our experiments on synthetic data also show that the new solvers are fast and numerically accurate. We also present a fast and numerically accurate solver for the problem of relative pose estimation with unknown common focal length and radial distortion.

#### Seminar No. 73 (Algebraic Vision Meeting II)

| Speaker | Title | Affiliation | Date and time | Venue |
|---|---|---|---|---|
| James Pritts | Rectification, Auto-Calibration, and Scene Parsing from Affine-Correspondences of Repetitive Textures | CIIRC, Czech Technical University in Prague, Czech Republic | Monday 2022-07-25 at 9:00 | CIIRC Room B-670 (building B, floor 6) + online via Zoom |

**Abstract**

The talk will introduce the use of affine-covariant features for single-view geometry estimation. In particular, we show how to construct constraints on scene-plane rectification and camera intrinsics using affine features detected from coplanar repetitive texture. We will introduce general solvers that jointly undistort and rectify from imaged translations, reflections, and rigid transforms. We show how to use these solvers in a robust estimation framework to construct a generative model of a coplanar pattern, and introduce a maximum-likelihood estimator to refine it. We will discuss the sensitivity of single-view tasks to noisy region extractions and what is required to make these methods work in practice. Finally, we will show how to combine affine features with circular arcs to generate hybrid solvers that use coplanar repetitive texture and parallel scene lines to auto-calibrate cameras and recover absolute pose.

#### Seminar No. 72 (Algebraic Vision Meeting II)

| Speaker | Title | Affiliation | Date and time | Venue |
|---|---|---|---|---|
| Snehal Bhayani | Minimal problems for Semi-generalized Homographies | Center for Machine Vision and Signal Analysis (CMVS), University of Oulu, Finland | Friday 2022-07-22 at 14:00 | CIIRC Room B-670 (building B, floor 6) + online via Zoom |

**Abstract**

In computer vision, a homography is a mapping between two images induced by point transfer through a plane in the scene. If the two images are captured by a pinhole camera and a generalized camera (a system with multiple camera centers), we have a “semi-generalized” homography. Such a semi-generalized homography can be used to estimate the relative pose between the pinhole and the generalized camera. In this talk, we will derive minimal solvers for this problem from five point correspondences through a scene plane. The pinhole camera can be calibrated, or partially calibrated with an unknown focal length. With a suitable choice of coordinate system and parameterization, we can derive extremely simple minimal solvers. Such solvers can be very useful for visual localization, as evidenced by synthetic as well as real experiments.
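To make "point transfer through a plane" concrete, here is a minimal sketch for the ordinary case of two pinhole cameras; the semi-generalized case from the talk is not covered. It assumes calibration matrices `K1` and `K2`, a relative pose with X2 = R X1 + t, and a scene plane n·X1 = d written in the first camera's frame. Substituting n·X1 / d = 1 into X2 = R X1 + t gives X2 = (R + t nᵀ/d) X1, so the plane induces the homography H = K2 (R + t nᵀ/d) K1⁻¹. The function names are illustrative.

```python
import numpy as np

def plane_induced_homography(K1, K2, R, t, n, d):
    """Homography from camera-1 pixels to camera-2 pixels for the plane n . X = d
    (plane in camera-1 coordinates; camera 2 sees X2 = R @ X1 + t)."""
    return K2 @ (R + np.outer(t, n) / d) @ np.linalg.inv(K1)

def transfer_point(H, x):
    """Transfer a 2D pixel x through H using homogeneous coordinates."""
    y = H @ np.array([x[0], x[1], 1.0])
    return y[:2] / y[2]  # back to inhomogeneous pixel coordinates
```

Usage: project a point lying on the plane into both cameras; transferring its first-camera pixel with H should reproduce its second-camera pixel, while points off the plane generally do not transfer correctly.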

#### Seminar No. 71 (Algebraic Vision Meeting II)

| Speaker | Title | Affiliation | Date and time | Venue |
|---|---|---|---|---|
| Anton Leykin | Homotopy continuation for geometric problems in vision | Georgia Institute of Technology, School of Mathematics, GA, USA | Friday 2022-07-22 at 11:00 | CIIRC Room B-670 (building B, floor 6) + online via Zoom |

**Abstract**

I will survey methods based on the homotopy continuation which led to success in several projects related to vision. One strand of research (with Duff, Fabbri, Hruby, Kohn, Pajdla, and others) was aimed at classification of minimal problems and construction of efficient solvers. The other (with Christian, Duff, Mancini) concerns orbit determination from bearing measurements in astrodynamics.
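As a toy illustration of the basic technique (far simpler than the multivariate solvers for minimal problems discussed in the talk), the sketch below tracks all roots of a univariate polynomial f of degree d along the straight-line homotopy H(x, t) = (1 - t)·γ·(x^d - 1) + t·f(x), starting from the d-th roots of unity at t = 0. Each step uses an Euler predictor followed by a few Newton corrector iterations. The complex constant γ (the "gamma trick") is there so that, for generic γ, the paths avoid singularities for t in [0, 1). The fixed step count and the particular γ are arbitrary assumptions; practical tools such as HomotopyContinuation.jl use adaptive step sizes and much more careful path tracking.

```python
import numpy as np

def track_roots(coeffs, n_steps=400, gamma=0.6 + 0.8j, newton_iters=3):
    """Approximate all roots of the polynomial with coefficients `coeffs`
    (highest degree first) by tracking H(x, t) = (1 - t) * gamma * g(x) + t * f(x)
    from the start system g(x) = x**d - 1 at t = 0 to f at t = 1."""
    f = np.poly1d(coeffs)
    df = f.deriv()
    d = f.order
    g = np.poly1d([1.0] + [0.0] * (d - 1) + [-1.0])  # start system x**d - 1
    dg = g.deriv()

    def H(x, t):
        return (1 - t) * gamma * g(x) + t * f(x)

    def H_x(x, t):  # partial derivative with respect to x
        return (1 - t) * gamma * dg(x) + t * df(x)

    def H_t(x, t):  # partial derivative with respect to t (independent of t here)
        return f(x) - gamma * g(x)

    ts = np.linspace(0.0, 1.0, n_steps + 1)
    start = np.exp(2j * np.pi * np.arange(d) / d)  # d-th roots of unity solve g = 0
    ends = []
    for x in start:
        for t0, t1 in zip(ts[:-1], ts[1:]):
            # Euler predictor along dx/dt = -H_t / H_x.
            x = x - (t1 - t0) * H_t(x, t0) / H_x(x, t0)
            # Newton corrector: pull x back onto the path H(., t1) = 0.
            for _ in range(newton_iters):
                x = x - H(x, t1) / H_x(x, t1)
        ends.append(x)
    return np.array(ends)
```

For example, `track_roots([1.0, 0.0, -2.0])` should return two values close to ±√2 with negligible imaginary parts.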

#### Seminar No. 70 (Algebraic Vision Meeting II)

| Speaker | Title | Affiliation | Date and time | Venue |
|---|---|---|---|---|
| Kathlén Kohn | Algebraic statistics — Invariant theory and scaling algorithms for maximum likelihood estimation | KTH Royal Institute of Technology, Stockholm, Sweden | Thursday 2022-07-21 at 15:00 | CIIRC Room B-670 (building B, floor 6) + online via Zoom |

**Abstract**

This talk will give an introduction to algebraic statistics. As an example, we will discuss maximum likelihood estimation from the algebraic perspective. This is a widespread approach for the fundamental task of fitting data to a model. It aims to find a maximum likelihood estimate (MLE) by maximizing the likelihood of observing the data as we range over the model.

For two common statistical settings (log-linear models / finite discrete exponential families and Gaussian transformation families), we show that this approach is equivalent to a capacity problem in invariant theory: finding a point of minimal norm in an orbit under a corresponding group action. The existence of the MLE can then be characterized by stability notions under the action. Moreover, algorithms from statistics can be used in invariant theory, and vice versa.

This talk is based on joint work with Carlos Améndola, Philipp Reichenbach and Anna Seigal.

#### Seminar No. 69 (Algebraic Vision Meeting II)

Federica Arrigoni, **Viewing graph solvability**, Politecnico di Milano, Italy. Thursday 2022-07-21 at 14:00, CIIRC Room B-670 (building B, floor 6) + online via Zoom.

**Abstract**

In structure-from-motion, the viewing graph is a graph whose vertices correspond to cameras and whose edges represent fundamental matrices. We provide a new formulation and an algorithm for establishing whether a viewing graph is solvable, i.e., whether it uniquely determines a set of projective cameras. Known theoretical conditions either do not fully characterize the solvability of all viewing graphs, or are exceedingly hard to compute because they involve solving a system of polynomial equations with a large number of unknowns. We show how to reduce the number of unknowns by exploiting cycle consistency. We advance the understanding of solvability by (i) completing the classification of all previously undecided minimal graphs with up to 9 nodes, (ii) extending practical solvability testing to minimal graphs with up to 90 nodes, and (iii) definitively answering an open research question by showing that finite solvability is not equivalent to solvability. Finally, we present an experiment on real data showing that unsolvable graphs appear in practical situations.

#### Seminar No. 68 (Algebraic Vision Meeting II)

Luca Magri, **Self-calibration in the presence of multiple motions**, Politecnico di Milano, Italy. Thursday 2022-07-21 at 11:00, CIIRC Room B-670 (building B, floor 6) + online via Zoom.

**Abstract**

One of the main assumptions behind Structure-from-Motion techniques is that of a rigid scene, i.e., the scene is static or composed of a single moving object. The rigidity constraint, typically encoded in the Kruppa equations, is at the core of self-calibration techniques and enables Euclidean 3D reconstruction from uncalibrated images. In this talk, we explore how motion segmentation can be exploited to improve self-calibration when the scene is composed of multiple rigidly moving objects.

#### Seminar No. 67 (Algebraic Vision Meeting II)

Viktor Korotynskiy, **Interpolating Symmetries of Parametric Polynomial Systems**, CIIRC, Czech Technical University in Prague, Czech Republic. Thursday 2022-07-21 at 10:00, CIIRC Room B-670 (building B, floor 6) + online via Zoom.

**Abstract**

Parametric polynomial systems arise in geometric computer vision as minimal problems. In practice, we need efficient solvers for minimal problems that can solve 3D reconstruction tasks in real time. Solving a minimal problem means solving the polynomial system specialized at the given parameters (the pixel coordinates of points and lines in the images). Sometimes a parametric polynomial system decomposes into a sequence of simpler subsystems, which makes it possible to design a simpler solver. Monodromy is an effective tool from numerical algebraic geometry for checking whether a polynomial system is decomposable. However, monodromy itself does not provide an algorithm that computes a decomposition. We concentrate on the case when decomposability is caused by symmetries of the polynomial system (e.g., the twisted pair symmetry in the 5-point problem in computer vision). In that case, we believe that computational invariant theory can answer the question of how to find a decomposition effectively. In this talk, I will describe how to verify that a parametric polynomial system has symmetries and how they can be computed. This is joint work in progress with Timothy Duff, Tomas Pajdla, and Margaret Regan.

#### Seminar No. 66 (Algebraic Vision Meeting II)

Andrea Porfiri Dal Cin, **Synchronization on Group-labelled Multi-graphs**, Politecnico di Milano, Italy. Wednesday 2022-07-20 at 11:00, CIIRC Room B-670 (building B, floor 6) + online via Zoom.

**Abstract**

Synchronization refers to the problem of inferring the unknown values attached to the vertices of a graph. Each edge is labelled with the ratio of its incident vertices, and the labels belong to a group. This work addresses, for the first time, the synchronization problem on multi-graphs, i.e., graphs with more than one edge connecting the same pair of nodes. The problem arises naturally when multiple measurements are available to model the relationship between two vertices. This happens when different sensors measure the same quantity, or when the original graph is partitioned into sub-graphs that are solved independently. In the latter case, the relationships among sub-graphs give rise to multi-edges, and the problem can be cast as multi-graph synchronization. Specifically, we present MULTISYNC, the first synchronization algorithm for multi-graphs, which is based on a principled constrained eigenvalue optimization. MULTISYNC is a general solution that can cope with any linear group, and we show that it can be used profitably on both synthetic and real multi-graph synchronization problems.
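A classical single-graph baseline makes the problem concrete. Take the group of planar rotations, written as unit complex numbers. Each edge then carries z_ij ≈ x_i·conj(x_j). The leading eigenvector of the Hermitian matrix of measurements recovers the x_i up to a global rotation. The sketch below is that textbook spectral method, not MULTISYNC, and assumes NumPy. Parallel edges are merged by simply summing them into the same matrix entry. This naive aggregation is an assumption made here only to show where multi-edges enter.

```python
import numpy as np

def spectral_sync_u1(n, edges):
    """edges: list of (i, j, z_ij) with z_ij ~ x_i * conj(x_j) and |z_ij| = 1.
    Parallel edges between the same pair are summed (naive multi-edge handling)."""
    M = np.zeros((n, n), dtype=complex)
    for i, j, z in edges:
        M[i, j] += z
        M[j, i] += np.conj(z)
    # eigh returns eigenvalues in ascending order; take the leading eigenvector.
    _, vecs = np.linalg.eigh(M)
    v = vecs[:, -1]
    x = v / np.abs(v)                # project each entry back onto the unit circle
    return x * np.conj(x[0])         # remove the global phase so that x[0] = 1
```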

#### Seminar No. 65 (Algebraic Vision Meeting)

Gabriel Ong, **G-Equivariant Neural Networks in Computer Vision**, Max Planck Institute for Mathematics in the Sciences (MPI-MiS), Leipzig, Germany & Bowdoin College, ME, USA. Friday 2022-06-03 at 14:00, CIIRC Room B-670 (building B, floor 6) + online via Zoom (video).

**Abstract**

Convolutional layers have proven to be highly effective tools in computer vision due to their equivariance under the translation group. This exploits the underlying symmetry and structure of images to build more effective neural networks for image classification. However, it captures only part of the symmetry present in images, which also includes rotations and reflections. We will survey the group equivariant neural networks introduced by Cohen and Welling, discussing the various ways algebraic structures can be leveraged in vision, as well as the proof of universal approximation for equivariant neural networks. This talk is based on the work of Cohen and Welling, as well as that of Sannai et al.
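Translation equivariance can be checked numerically: shifting the input of a circular convolution shifts its output by the same amount. The sketch below assumes NumPy and periodic boundaries, and the function name is illustrative.

```python
import numpy as np

def circular_conv(signal, kernel):
    """1D circular cross-correlation, the operation a CNN 'convolution' layer applies."""
    n, k = len(signal), len(kernel)
    # For each output position, take k consecutive input samples (wrapping around).
    return np.array([kernel @ np.take(signal, np.arange(i, i + k), mode="wrap")
                     for i in range(n)])
```

The analogous check fails for a reflection of the input when the kernel is generic and asymmetric. This is the kind of symmetry that group equivariant networks are designed to build in.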

#### Seminar No. 64 (Algebraic Vision Meeting)

Timothy Duff, **Polynomial constraints on points and cameras**, University of Washington, WA, USA. Friday 2022-06-03 at 11:00, CIIRC Room B-670 (building B, floor 6) + online via Zoom (video).

**Abstract**

Projective space, rational maps, and other notions from algebraic geometry appear naturally in the study of image formation and various camera models in computer vision. Considerable attention has been paid to multiview ideals, which collect all polynomial constraints on images that must be satisfied by a given camera arrangement. We extend past work on multiview ideals to settings where the camera arrangement is unknown. We characterize various “generalized multiview ideals”, which are interesting objects in their own right. Several known results about multiview ideals also follow from our framework. We give a new proof of a result by Aholt, Sturmfels, and Thomas that the multiview ideal has a universal Groebner basis consisting of k-focals (also known as k-linearities in the vision literature) for k ∈ {2, 3, 4}. This is a preliminary report based on ongoing joint work with Sameer Agarwal, Max Lieblich, and Rekha Thomas.

#### Seminar No. 63 (Algebraic Vision Meeting)

Elima Shehu, **The line multiview variety**, Osnabrück University, Germany & Max Planck Institute for Mathematics in the Sciences (MPI-MiS), Leipzig, Germany. Friday 2022-06-03 at 10:00, CIIRC Room B-670 (building B, floor 6) + online via Zoom (video).

**Abstract**

Consider the 3D reconstruction problem in computer vision, restricted to the case when the photographed objects are lines. The first step is to find the lines in each image that originate from the same line in 3D space. These correspondences can be studied using tools from algebraic geometry. We will see that the line multiview variety is a determinantal variety of dimension 4. I will explain the motivation behind this construction and some of its geometric properties. Based on joint work with Felix Rydell, Angelica Torres, and Paul Breiding.
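There is a standard numerical test for whether image lines can come from a single 3D line. Each image line l_i back-projects to the plane P_i^T l_i. Planes that all contain one 3D line span at most a 2-dimensional space, so the 4×n matrix of planes has rank at most 2. We assume this rank condition as a stand-in for the determinantal description in the abstract; the talk's exact formulation may differ. The sketch uses NumPy and random cameras, and the helper names are illustrative.

```python
import numpy as np

def image_line(P, X, Y):
    """Image of the 3D line through homogeneous points X, Y: l = (P X) x (P Y)."""
    return np.cross(P @ X, P @ Y)

def plane_matrix(cameras, lines):
    """4 x n matrix whose columns are the back-projected planes P_i^T l_i."""
    return np.column_stack([P.T @ l for P, l in zip(cameras, lines)])

def relative_third_singular_value(M):
    """sigma_3 / sigma_1, close to 0 iff rank(M) <= 2, i.e. the planes share a line."""
    s = np.linalg.svd(M, compute_uv=False)
    return s[2] / s[0]
```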

#### Seminar No. 62 (Algebraic Vision Meeting)

Kathlén Kohn, **The Geometry of Linear Convolutional Networks**, KTH Royal Institute of Technology, Stockholm, Sweden. Thursday 2022-06-02 at 15:00, CIIRC Room B-670 (building B, floor 6) + online via Zoom (video).

**Abstract**

We discuss linear convolutional neural networks (LCNs) and their critical points. We observe that the function space (i.e., the set of functions represented by LCNs) can be identified with polynomials that admit certain factorizations, and we use this perspective to describe the impact of the network’s architecture on the geometry of the function space. For instance, for LCNs with one-dimensional convolutions having stride one and arbitrary filter sizes, we provide a full description of the boundary of the function space. We further study the optimization of an objective function over such LCNs: We characterize the relations between critical points in function space and in parameter space and show that there do exist spurious critical points. We compute an upper bound on the number of critical points in function space using Euclidean distance degrees and describe dynamical invariants for gradient descent. This talk is based on joint work with Thomas Merkh, Guido Montúfar, and Matthew Trager.
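The identification of LCNs with polynomials can be seen in a few lines. With stride one, composing convolution layers convolves their filters, and convolving coefficient vectors is polynomial multiplication. The sketch assumes NumPy's full-mode `np.convolve` as the layer operation. Real layers typically compute cross-correlations with padding, which does not change the point. As one consequence, two size-2 filters can only produce a quadratic end-to-end filter with real roots. The filter [1, 0, 1], i.e. x² + 1, is therefore not in the function space of that architecture.

```python
from functools import reduce
import numpy as np

def end_to_end_filter(filters):
    """Stride-one linear conv layers collapse to one filter: the product polynomial."""
    return reduce(np.convolve, filters)

def apply_layers(x, filters):
    """Apply the linear layers one after another (full convolution, no nonlinearity)."""
    for w in filters:
        x = np.convolve(x, w)
    return x

def realizable_by_two_size2_filters(c):
    """A quadratic filter [a, b, c0] with a != 0 factors into two real size-2 filters
    iff it has real roots, i.e. its discriminant b^2 - 4*a*c0 is non-negative."""
    a, b, c0 = c
    return b * b - 4 * a * c0 >= 0
```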

#### Seminar No. 61 (Algebraic Vision Meeting)

Viktor Larsson, **Camera Pose Estimation with Implicit Distortion Models**, Lund University, Sweden. Thursday 2022-06-02 at 14:00, CIIRC Room B-670 (building B, floor 6) + online via Zoom (video).

**Abstract**

In this talk, I will discuss recent work on 6-degree-of-freedom camera pose estimation using implicit distortion models. In contrast to standard methods, we do not assume an explicit distortion model. Instead, we use a regularization term that ensures the latent distortion map varies smoothly across the image. The proposed model is effectively parameter-free and allows us to optimize the 6-degree-of-freedom camera pose without explicitly knowing the intrinsic calibration.

We show that the method is applicable to a wide selection of cameras with varying distortion and in multiple applications, such as visual localization and structure-from-motion.

#### Seminar No. 60 (Algebraic Vision Meeting)

Diego Thomas, **Building new bridges between the Cyber and physical worlds by 3D vision**, Kyushu University, Fukuoka, Japan. Wednesday 2022-06-01 at 14:00, CIIRC Room B-670 (building B, floor 6) + online via Zoom (video).

**Abstract**

High-fidelity 3D maps of our world can now be reconstructed using modern 3D mapping technology. However, realistic integration of digital content (such as digital avatars) into real 3D environments remains extremely challenging, because not only static but also dynamic scenes must be captured and modeled. In this talk, I will discuss the evolution of hardware-based technology for RGB-D 3D reconstruction. I will also talk about recent AI-based solutions for 3D shape reconstruction from a single image and for data-driven 3D human body animation, which are bringing the technology to a wider public. Finally, I will discuss promising future directions for adapting state-of-the-art AI-based 3D reconstruction methods to represent the diversity present in our world.

#### Seminar No. 59 (Algebraic Vision Meeting)

Paul Breiding, **HomotopyContinuation.jl: A package for homotopy continuation in Julia**, Osnabrück University, Germany. Wednesday 2022-06-01 at 11:00, CIIRC Room B-670 (building B, floor 6) + online via Zoom.

**Abstract**

I will present the software package HomotopyContinuation.jl. It provides efficient implementations of algorithms for solving systems of polynomial equations. I will show applications to problems in 3D-reconstruction.

#### Seminar No. 58 (Algebraic Vision Meeting)

Orlando Marigliano, **Minimal Problems for Rolling Shutter Cameras**, KTH Royal Institute of Technology, Stockholm, Sweden. Wednesday 2022-06-01 at 10:00, CIIRC Room B-670 (building B, floor 6) + online via Zoom (video).

**Abstract**

This talk is about 3D reconstruction problems for points and lines using a rolling shutter camera model. This model generalizes the pinhole camera model. It is more complicated because taking a picture has a time component, which can create distortions that must be accounted for during 3D reconstruction. It is also more accurate, since most smartphone cameras are actually rolling shutter cameras. I will describe how to approach this problem algebraically using certain surfaces in the projective Grassmannian Gr(1,3).

#### Seminar No. 57 (Algebraic Vision Meeting)

Felix Rydell, **The Generalized Multiview Variety**, KTH Royal Institute of Technology, Stockholm, Sweden. Tuesday 2022-05-31 at 14:00, CIIRC Room B-670 (building B, floor 6) + online via Zoom (video).

**Abstract**

Point and line correspondences for pinhole cameras are, for the most part, well understood by now. In this talk, we generalize the study to correspondences of k-planes in camera planes of different dimensions. The geometry of this problem is interesting in itself, but it also has potential applications in modelling dynamic scenes and in understanding certain determinantal varieties.

#### Seminar No. 56 (Algebraic Vision Meeting)

Petr Hrubý, **Learning to Solve Hard Minimal Problems**, ETH Zürich, Switzerland. Tuesday 2022-05-31 at 11:00, CIIRC Room B-670 (building B, floor 6) + online via Zoom (video).

**Abstract**

We present an approach to solving hard geometric optimization problems in the RANSAC framework. The hard minimal problems arise from relaxing the original geometric optimization problem into a minimal problem with many spurious solutions. Our approach avoids computing large numbers of spurious solutions. We design a learning strategy for selecting a starting problem-solution pair that can be numerically continued to the problem and the solution of interest. We demonstrate our approach by developing a RANSAC solver for the problem of computing the relative pose of three calibrated cameras, via a minimal relaxation using four points in each view. On average, we can solve a single problem in under 70 microseconds. We also benchmark and study our engineering choices on the very familiar problem of computing the relative pose of two calibrated cameras, via the minimal case of five points in two views.

#### Seminar No. 55 (Algebraic Vision Meeting)

Martin Bråtelund, **On the Compatibility of Fundamental Matrices**, University of Oslo, Norway. Tuesday 2022-05-31 at 10:00, CIIRC Room B-670 (building B, floor 6) + online via Zoom (video).

**Abstract**

In this talk, we will explore the problem of compatibility of fundamental matrices: given a set of 3×3 matrices of rank 2, when does there exist a set of cameras for which these are the fundamental matrices? While the problem has been solved in some settings, parts of it remain open.
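The forward direction is easy to state in code. Two cameras P1 and P2 determine F = [e2]× P2 P1⁺ (Hartley and Zisserman), where e2 is the image of the first camera centre in the second view. F has rank 2 and satisfies x2ᵀ F x1 = 0 for corresponding points. Compatibility asks the converse for several such matrices at once, which this sketch does not address. It assumes NumPy and random cameras.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x, so that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])

def fundamental_from_cameras(P1, P2):
    """F with x2^T F x1 = 0 for x1 = P1 X, x2 = P2 X, via F = [e2]_x P2 pinv(P1)."""
    C1 = np.linalg.svd(P1)[2][-1]      # camera centre of P1: right null vector
    e2 = P2 @ C1                       # epipole in the second image
    return skew(e2) @ P2 @ np.linalg.pinv(P1)
```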

#### Seminar No. 54

Médéric Fourmy, **State estimation and localization of legged robots: a tightly-coupled approach based on a-posteriori maximization**, LAAS, CNRS, Toulouse, France. Monday 2022-05-23 at 15:00, CIIRC Room B-670 (building B, floor 6) + online via Zoom (video).

**Abstract**

Legged robots are complex dynamical systems whose stable behavior depends on several physical quantities that must be observed at high speed and with high accuracy, such as the robot's velocity and its orientation with respect to gravity. Since the robot state cannot be observed directly, it is typically reconstructed by fusing very diverse sensor modalities. The goals of state estimation are twofold. On the one hand, the robot state is needed to maintain balance and locomote safely. On the other hand, a representation of the environment is required for navigation and interaction. The current approach for many legged systems is to solve these problems independently, using cascades of estimators that may neglect some of the correlations present in the data. Instead, we advocate building a single tightly-coupled estimator capable of estimating all quantities needed by the robot. To this end, we use the framework of a-posteriori estimation, formalized as a factor graph, which is already a popular way of representing and solving SLAM estimation problems. We will focus on a few example applications:

* robot base odometry based on the fusion of inertial and kinematic information;
* the inclusion of external forces for the centroidal estimation of a legged robot;
* object-level visual-inertial algorithms based on fiducial markers and deep-learning-based object pose estimation.
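In the simplest setting, with linear measurement models and Gaussian noise, the MAP estimate of a factor graph is a weighted least-squares problem with one row per factor. The toy below models a robot moving along a line, with a prior, two odometry factors and one absolute measurement. It is a hypothetical example assuming NumPy, not the estimator from the talk, which deals with nonlinear models and 3D orientations.

```python
import numpy as np

def solve_linear_factor_graph(num_vars, factors):
    """MAP estimate of a linear-Gaussian factor graph by weighted least squares.
    Each factor is (coeffs: dict var_index -> coefficient, measurement z, std-dev sigma)."""
    A = np.zeros((len(factors), num_vars))
    b = np.zeros(len(factors))
    for row, (coeffs, z, sigma) in enumerate(factors):
        for var, c in coeffs.items():
            A[row, var] = c / sigma     # whiten each row by its measurement std-dev
        b[row] = z / sigma
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x
```

With unit noise, the factors "x0 = 0", "x1 - x0 = 1", "x2 - x1 = 1" and "x2 = 2.3" give x ≈ (0.075, 1.15, 2.225). The 0.3 disagreement is spread evenly over the four factors.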

#### Seminar No. 53

Timothy Duff, **Galois/monodromy groups for decomposing minimal problems in 3D reconstruction**, University of Washington, WA, USA. Friday 2022-05-20 at 15:00, CIIRC Room B-670 (building B, floor 6) + online via Zoom (video).

**Abstract**

In computer vision, the study of minimal problems is critical for many 3D reconstruction tasks. Solving minimal problems comes down to solving systems of polynomial equations of a very particular structure. “Structure” can be understood in terms of the Galois/monodromy group of an associated branched cover. For classical problems such as homography estimation and five-point relative pose, efficient solutions exploit imprimitivity of the Galois groups; in these cases, the imprimitivity comes from certain rational deck transformations. In general, Galois groups can be computed with numerical homotopy continuation using a variety of software. I will highlight joint work with Viktor Korotynskiy, Tomas Pajdla, and Maggie Regan that studies an ever expanding zoo of minimal problems and their Galois groups, with a view towards identifying new minimal problems that might be useful in practice.

#### Seminar No. 52 (online)

Georges Chahine, **Multi-Sensor Mapping in Natural Environments: 3D Reconstruction and Temporal Alignment**, Georgia Institute of Technology, GA, USA. Thursday 2022-03-31 at 10:00, online via Zoom (video).

**Abstract**

Using semantic knowledge, we constrain the Iterative Closest Point (ICP) algorithm in order to build semantic keyframes and to align them both spatially and temporally. Hierarchical clustering of ICP-generated transformations is then used both to eliminate outliers and to find an alignment consensus. This is followed by an optimization scheme based on a factor graph that includes loop closure. Data were captured using a portable robotic sensor suite consisting of three cameras, a three-dimensional lidar, and an inertial navigation system. They were acquired at monthly intervals over 12 months, by revisiting the same trajectory between August 2020 and July 2021. Finally, we show that monthly surveys taken over a year with this sensor suite can be aligned, providing insightful metrics for change evaluation in natural environments.
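The semantic constraints and clustering are specific to the talk, but the underlying ICP loop alternates two steps. It first matches each source point to its nearest target point. It then solves in closed form for the best rigid transform of the matched pairs (Kabsch/SVD). The sketch below is plain point-to-point ICP with no outlier handling. It assumes NumPy, brute-force nearest neighbours, and a small initial misalignment.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares R, t with dst ~ src @ R.T + t for paired points (Kabsch/SVD)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0] * (src.shape[1] - 1) + [d])
    R = Vt.T @ D @ U.T
    return R, cd - R @ cs

def icp(src, dst, iters=30):
    """Plain point-to-point ICP; returns the accumulated R, t mapping src onto dst."""
    dim = src.shape[1]
    R_total, t_total = np.eye(dim), np.zeros(dim)
    cur = src.copy()
    for _ in range(iters):
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=-1)
        matched = dst[d2.argmin(axis=1)]         # nearest target point for each source point
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```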

#### Seminar No. 51

Lenka Zdeborová, **Understanding machine learning via exactly solvable models**, École polytechnique fédérale de Lausanne (EPFL), Switzerland. Wednesday 2021-09-29 at 11:00, CIIRC Room B-670 (building B, floor 6).

**Video**

https://youtu.be/KrNDnwYXUjc

**Abstract**

Inspired by physics, where simple models are at the core of our theoretical understanding of the physical world, we study simple models of neural networks to clarify some of the open questions surrounding learning with neural networks. In this talk, I will describe some of our recent progress in this direction.

#### Seminar No. 50 (online)

Lucas Disson, **Analysis of Molecular Dynamic Simulations for Alzheimer’s Disease Research using VAMPnet Neural Networks**, CIIRC CTU, Prague, CZ / ENS, Lyon, FR. Thursday 2021-08-26 at 10:00, online via Zoom.

**Abstract**

Proteins provide essential functions to living organisms. Some disordered proteins do not fold into a unique and stable 3-dimensional structure; instead, they tend to partially fold and unfold in a variety of conformations. Hence, knowledge of the associated dynamics is key to understanding and controlling the function of such proteins. Markov State Models are a popular method for analysing protein dynamics. They cluster the conformational space of the protein into a few Markov states, such that the dynamics of the protein can be faithfully summarised by the transitions between these states. Recent methods have leveraged the approximation power of deep neural networks to estimate better Markov State Models. We propose using these methods to analyse the effect of a drug targeting the aggregation of the protein amyloid beta 42, a hallmark and possible cause of Alzheimer’s disease.
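Once conformations are clustered into discrete states, the simplest Markov State Model estimator counts transitions at a lag time and row-normalizes the count matrix. The stationary distribution of the resulting transition matrix gives the equilibrium state populations. The sketch below is this classical counting estimator (non-reversible, assuming every state is visited), written with NumPy. It is not the VAMPnet approach, which learns a soft state assignment with neural networks.

```python
import numpy as np

def estimate_msm(dtraj, n_states, lag=1):
    """Row-stochastic transition matrix and stationary distribution from a discrete
    trajectory, by counting i -> j transitions at the given lag.
    Assumes every state has at least one outgoing transition (no zero rows)."""
    C = np.zeros((n_states, n_states))
    for a, b in zip(dtraj[:-lag], dtraj[lag:]):
        C[a, b] += 1
    T = C / C.sum(axis=1, keepdims=True)
    # Stationary distribution: left eigenvector of T for eigenvalue 1.
    vals, vecs = np.linalg.eig(T.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return T, pi / pi.sum()
```

For the toy trajectory `[0, 0, 1, 1, 1, 0, 0, 1, 1, 0]`, this gives T = [[0.5, 0.5], [0.4, 0.6]] and populations (4/9, 5/9).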

#### Seminar No. 49 (online)

Yann Labbé, **Pose estimation of rigid objects and articulated robots**, INRIA/ENS, Paris, FR. Thursday 2021-04-08 at 10:00, online via Zoom (video).

**Abstract**

Accurately recovering the poses of multiple objects and robots in non-instrumented environments is an important problem for giving autonomous systems the ability to solve real tasks in the wild, especially in the context of collaborative robotics. In this talk, I will present our recent work on object and robot pose estimation from one or multiple uncalibrated RGB cameras. First, I will present CosyPose, our state-of-the-art method for single-view 6D pose estimation of rigid objects, which won the BOP challenge at ECCV 2020. Second, I will present our multi-view approach, designed to address the limitations inherent to single-view pose estimation. This multi-view approach significantly improves robustness and accuracy. It can automatically process noisy or incomplete visual information from multiple cameras into a complete scene interpretation in near real time. Third, I will present our latest work on RoboPose, a method for recovering the 6D pose and the joint angles of an articulated robot from a single RGB image. Our method significantly improves the state of the art for multiple commonly used robotic manipulators. It opens up many exciting applications in visually guided manipulation or collaborative robotics without fiducial markers or time-consuming hand-eye calibration.

This talk is based on the following papers:

[1] Yann Labbé, Justin Carpentier, Mathieu Aubry, Josef Sivic. CosyPose: Consistent multi-view multi-object 6D pose estimation. ECCV 2020.

[2] Yann Labbé, Justin Carpentier, Mathieu Aubry, Josef Sivic. Single-view robot pose and joint angle estimation via render & compare. CVPR 2021 (Oral).

#### Seminar No. 48 (online)

Angel Villar Corrales, **Pose Based Image Retrieval in Greek Vase Paintings**, FAU Erlangen-Nuremberg, Erlangen, Germany. Thursday 2020-10-22 at 15:00, online via Zoom.

**Abstract**

Digital art collections are growing rapidly due to ongoing efforts to digitize art. Nevertheless, navigating these libraries is often a challenging and time-consuming task. In our work, we address the problem of retrieval and discovery in large artwork collections, namely of Greek vase paintings. We introduce an approach for automated image retrieval based on human pose similarity. More precisely, given a particular human pose, pose estimation and image retrieval techniques are used to find all images in a (possibly huge) library that contain a person with a similar posture. However, human pose estimation in Greek vase paintings is a demanding task. State-of-the-art pose estimation methods fail to generalize to that particular painting style, and there are no datasets available containing annotated Greek vase paintings. We use style transfer techniques to generate labeled images in the style of ancient Greek vase paintings. This improves the performance of state-of-the-art human detection and pose estimation algorithms on our data. Our experimental results show that our methods outperform state-of-the-art models for human pose estimation in Greek vase paintings. Furthermore, we show that efficient pose-based image retrieval is possible in large databases.

#### AAG/IMPACT Workshop No. 47 (online)

**Optimization Workshop**, organized by Tomáš Kroupa and Tomáš Pajdla, Czech Technical University, Prague, CZ. Wednesday 2020-10-21, 9:30-15:30, online via Zoom (video).

**Program**

9:30-10:00 Tomáš Werner – Relative-interior Rule in Block-coordinate Descent

10:15-10:45 Martin Kružík – Optimal control problems with oscillations, concentrations and discontinuities

11:00-11:30 Zuzana Kúkelová – Making minimal solvers fast

13:30-14:00 Viktor Korotynskiy – Using Monodromy to Simplify Minimal Problems in Computer Vision

14:15-14:45 Antonio Bellon – On the Properties of Trajectories of Solutions to Parametric Semidefinite Programming

15:00-15:30 Pavel Trutman – Globally Optimal Solution to Inverse Kinematics of 7DOF Serial Manipulator

#### Seminar No. 46 (online)

Bastien Dechamps, **Efficient Camera Pose Verification Via Neural Rendering**, Ecole des Ponts ParisTech, France. Thursday 2020-10-01 at 11:00, online via Zoom (video).

**Abstract**

The recent visual localization pipeline InLoc showed that verifying the best poses estimated by RANSAC considerably improves the classification results. In this pose verification step, the 3D points are reprojected using the estimated camera pose to create a novel view of the scene, which is then compared with the query image. However, these renderings of the 3D scene lack realism and do not have the same semantic and appearance structure as the image to be localized. In this talk, we present recent neural rendering techniques that can generate realistic renderings of a scene from any viewpoint and under any appearance conditions. We use one of these approaches, Neural Rerendering in the Wild, to enhance the quality of InLoc renderings and improve the localization results.

#### Seminar No. 45 (online)

Evgeniy Martyushev, **Finding Polynomial Constraints by Sampling**, South Ural State University, Chelyabinsk, Russia. Thursday 2020-08-06 at 11:00, online via Zoom (video).

**Abstract**

In this talk, I will address the implicitization problem: converting the parameterization of a certain algebraic variety into its defining polynomial equations. In many cases, the brute-force implicitization approach based on Groebner basis computation fails due to the complexity of the underlying variety. In this situation, an alternative sampling-based method can solve the implicitization problem, at least in part. I will briefly describe the sampling method and present several examples of using it to find polynomial constraints for well-known entities from multiview geometry. These include the essential matrix, compatible triplets of essential matrices, and the calibrated trifocal tensor.
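The sampling idea can be illustrated on a toy curve. Evaluate all monomials up to a chosen degree at sampled points of the parameterization, then compute the null space of the resulting matrix. Each null vector is the coefficient vector of a polynomial vanishing on the samples. For the cuspidal curve (t², t³), this recovers y² - x³. The sketch assumes NumPy, a floating-point SVD and a fixed rank tolerance. It is a generic interpolation sketch rather than the speaker's procedure, and entities such as the essential matrix require far more monomials and careful numerics.

```python
import numpy as np

def implicitize_by_sampling(param, degree, n_samples=40, seed=0, rel_tol=1e-9):
    """Return the monomial exponents (a, b) and a basis (rows) of coefficient vectors
    of polynomials in x, y of total degree <= degree vanishing on samples of param(t)."""
    rng = np.random.default_rng(seed)
    pts = [param(t) for t in rng.uniform(-1.5, 1.5, n_samples)]
    monomials = [(a, d - a) for d in range(degree + 1) for a in range(d, -1, -1)]
    # One row per sample: every monomial x^a * y^b evaluated at that point.
    M = np.array([[x**a * y**b for a, b in monomials] for x, y in pts])
    _, s, Vt = np.linalg.svd(M)
    # Singular values below the tolerance span the (numerical) null space.
    null_dim = int(np.sum(s < rel_tol * s[0]))
    return monomials, Vt[len(s) - null_dim:]
```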

#### Seminar No. 44 (online)

Jan Tomešek, **Visual Geo-Localization in Natural Environments**, FIT Brno University of Technology, CZ. Thursday 2020-06-25 at 11:00, online via Zoom.

**Abstract**

We will present our work-in-progress on localization of photographs captured in mountainous areas. In general, we approach this problem through deep learning and cast it as large-scale cross-modal image retrieval. In addition to standard challenges of general outdoor localization, such as changing appearance and scene geometry, there are other challenges in this scenario, such as large databases to be searched through and the lack of real photographs from natural environments to be used for training. We will present current results, starting from small synthetic experiments which provide some insight into the potential of localization in natural environments, followed by experiments using real photographs in combination with large databases.

#### Seminar No. 43 (online)

Michal Vavrecka, **Incognite research group – an overview**, CIIRC Czech Technical University, Prague, CZ. Thursday 2020-05-21 at 11:00, online via Zoom (video).

**Abstract**

The Incognite Research Group focuses on cognitive robotics, especially machine-learning-based cognitive architectures for robot control. I will present a basic overview of the projects we are involved in and our latest scientific results. The presentation is divided into four sections:

1. Edutainment robot – adaptation of the Alquist chatbot to the Czech language and its implementation on a humanoid robot.

2. Visual question answering – how to parse a long question into a compositional chain of operators that answer it. An example of interpretable neural module networks.

3. Robotic manipulation based on intrinsic motivation – how to train manipulation tasks without any supervision from the environment. Examples of intrinsic rewards and goals.

4. Ciircgym – our environment for training robotic tasks in a virtual simulator.

#### Seminar No. 42 (online)

Hugo Cisneros, **Evolving Structures in Complex Systems**, CIIRC CTU, Prague, CZ & INRIA/ENS, Paris, FR. Thursday 2020-05-07 at 11:00, online (video).

**Abstract**

Open-ended evolution is regarded as a promising way of solving complex tasks and could transform our idea of artificial intelligence. Complex systems with emergent properties of increasing complexity are a possible way of achieving this goal. The talk is about constructing a metric for measuring growth of complexity of emerging patterns in a particular class of complex systems: cellular automata. Approaches based on compression algorithms and artificial neural networks are investigated. With the metric, we were able to automatically construct computational models with properties similar to those found in Conway’s Game of Life, as well as many other emergent phenomena (IEEE SSCI, 2019). We further investigate the case of large-scale cellular automata; thanks to our reduction techniques that help visualize complex computations within those large systems, we identify interesting emergent behaviors at multiple scales (unpublished, submitted to ALife 2020).
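As a crude illustration of compression-based complexity (not the metric developed in the talk), one can evolve an elementary cellular automaton and measure how well its space-time diagram compresses. The sketch assumes NumPy and Python's `zlib`, and the function names are illustrative. A chaotic rule such as Rule 30 yields a much larger compressed size than a trivial rule such as Rule 0.

```python
import zlib
import numpy as np

def run_elementary_ca(rule, width=128, steps=128):
    """Evolve an elementary cellular automaton from a single live cell (periodic boundary)."""
    table = np.array([(rule >> i) & 1 for i in range(8)], dtype=np.uint8)
    state = np.zeros(width, dtype=np.uint8)
    state[width // 2] = 1
    history = [state]
    for _ in range(steps - 1):
        left, right = np.roll(state, 1), np.roll(state, -1)
        # Neighbourhood (left, centre, right) read as a 3-bit number indexes the rule table.
        state = table[4 * left + 2 * state + right]
        history.append(state)
    return np.array(history)

def compressed_size(history):
    """Crude complexity proxy: bytes of the zlib-compressed, bit-packed space-time diagram."""
    return len(zlib.compress(np.packbits(history).tobytes(), 9))
```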

#### Seminar No. 41 (online)

Michal Rolinek, **Differentiation of Blackbox Combinatorial Solvers**, Max Planck Institute for Intelligent Systems, Germany. Thursday 2020-04-23 at 11:00, online (video).

**Abstract**

Achieving fusion of deep learning with combinatorial algorithms promises transformative changes to artificial intelligence. One possible approach is to introduce combinatorial building blocks into neural networks. Such end-to-end architectures have the potential to tackle combinatorial problems on raw input data such as ensuring global consistency in multi-object tracking or route planning on maps in robotics. We present a method that implements an efficient backward pass through blackbox implementations of combinatorial solvers with linear objective functions. We provide both theoretical and experimental backing. In the talk, we will cover the description of the method including initial synthetic experiments (ICLR 2020), as well as two follow-ups; one on rank-based loss functions (may appear at CVPR 2020) and another regarding deep graph matching for keypoint correspondence.
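As we understand the ICLR 2020 paper, the backward pass calls the solver once more on costs perturbed by the incoming gradient, w' = w + λ·dL/dy. It then returns -(ŷ - y_λ)/λ, where ŷ is the forward solution and y_λ is the solution for w'. The sketch below applies this rule to a toy solver that picks the cheapest item. The solver, names and numbers are hypothetical and assume NumPy, and the paper should be consulted for the exact formulation.

```python
import numpy as np

def solver(w):
    """Toy 'combinatorial' solver with a linear objective:
    the one-hot vector y minimising w . y (i.e. pick the cheapest item)."""
    y = np.zeros_like(w)
    y[np.argmin(w)] = 1.0
    return y

def blackbox_backward(w, y_hat, grad_y, lam):
    """Gradient w.r.t. w from one extra solver call on perturbed costs w + lam * dL/dy."""
    y_lam = solver(w + lam * grad_y)
    return -(y_hat - y_lam) / lam
```

If λ is too small to change the solver's output, the returned gradient is zero. Roughly, λ trades off faithfulness to the original piecewise-constant mapping against how informative the gradient is.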

#### Seminar No. 40 (online)

Tomas Pajdla, **Lex Fridman: Deep Learning State of the Art 2020**, CIIRC Czech Technical University, Prague, CZ. Thursday 2020-04-16 at 11:00, online.

**Abstract**

AAG/IMPACT discussion of salient moments from “Lex Fridman: Deep Learning State of the Art 2020”.

#### Seminar No. 39 (online)

Luis Gomez Camara, **Towards Robust Long-term Autonomous Navigation of Robots in Challenging Environments: A Visual Place Recognition Deep Learning Approach**, CIIRC Czech Technical University, Prague, CZ. Thursday 2020-04-09 at 11:00, online (video).

**Video**

mp4

**Abstract**

We present a visual place recognition (VPR) pipeline that achieves substantially better precision than pipelines commonly found in the literature. It is based on a standard image retrieval configuration. An initial stage shortlists the database candidates closest to a query image, and a second stage re-ranks the list of candidates. The re-ranking uses a novel geometric verification procedure based on the activations of a pre-trained Convolutional Neural Network, which is both simple and very robust to viewpoint and condition changes. As a stand-alone spatial matching method, it could easily be added to enhance existing VPR approaches whose output is a ranked list of candidates. The VPR pipeline has been implemented in a teach-and-repeat navigation system to both localize the robot and control its steering. Indoor tests show a maximum error of less than 10 cm and excellent robustness to perturbations such as drastic changes in illumination, lateral displacements, different starting positions, or even kidnapping.

#### Seminar No. 38 (online)

*Thursday 2020-04-02, 11:00, online via Zoom* Kateryna Zorina (CIIRC Czech Technical University, Prague, CZ) **Reading Group on Policy Gradient Methods in Reinforcement Learning**

**Reading materials for the reading group**

Vanilla policy gradient

TRPO

PPO

#### Seminar No. 37 (online)

*Tuesday 2020-03-31, 17:00, online via Zoom* Kathlén Kohn (KTH Royal Institute of Technology, Stockholm) **Minimal Problems in Computer Vision**

**Abstract**

We present a complete classification of minimal problems for generic arrangements of points and lines in space observed partially by three calibrated perspective cameras when each line is incident to at most one point. This is a large class of interesting minimal problems that allows missing observations in images due to occlusions and missed detections. There is an infinite number of such minimal problems; however, we show that they can be reduced to 140616 equivalence classes by removing superfluous features and relabeling the cameras. We also introduce camera-minimal problems, which are practical for designing minimal solvers, and show how to pick a simplest camera-minimal problem for each minimal problem. This simplification results in 74575 equivalence classes. Only 76 of these were known; the rest are new. In order to identify problems that have potential for practical solving of image matching and 3D reconstruction, we present several smaller natural subfamilies of camera-minimal problems and compute solution counts for all camera-minimal problems that have fewer than 300 solutions for generic data.

**Paper**

Timothy Duff, Kathlén Kohn, Anton Leykin, Tomas Pajdla: PL1P — Point-line Minimal Problems under Partial Visibility in Three Views

#### Seminar No. 36

*Tuesday 2020-01-21, 14:30, CIIRC Room B-670 (building B, floor 6)* Luca Magri (Politecnico di Milano, Italy) **Multiple structure recovery via clustering in preference space**

**Abstract**

Many tasks in empirical sciences can be formulated in terms of robust estimation of multiple parametric models that fit data corrupted by noise and outliers. Typical examples can be found in 3D reconstruction, where multi-model fitting is employed either to estimate multiple rigid moving objects, or to produce intermediate geometric interpretations of reconstructed 3D point clouds.

Other scenarios include face clustering, body-pose estimation and motion segmentation, to name just a few. In all these cases, this turns out to be a thorny problem, since it is necessary to overcome a “chicken-&-egg dilemma”: to estimate the models, one first needs to partition the data, and to partition the data, it is necessary to know which model each point belongs to. Depending on which horn of this dilemma is addressed first, two main approaches can be singled out, namely consensus and preference analysis.

Consensus-based algorithms put the emphasis on the estimation part and focus on finding models that describe as many points as possible. In contrast, preference approaches concentrate on the segmentation side and aim at finding a proper partition of the data into meaningful structures.

In this talk, we will see how the change of perspective from consensus to preference allows us to derive a conceptual space where the multi-model fitting task can be conveniently formulated as a clustering problem.
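To make the preference view concrete, the following Python sketch fits 2D lines in the style of J-Linkage (Toldo and Fusiello), one published preference-based method; the formulation discussed in the talk may differ. Each point is represented by the set of line hypotheses it is an inlier to, and clusters with the most similar preference sets (Jaccard distance) are merged until no two clusters share a hypothesis. As assumptions, hypotheses are all lines through pairs of points (instead of random sampling, so the result is deterministic), `eps` is an inlier threshold, and all function names are hypothetical.

```python
import itertools

def line_through(p, q):
    """Line a*x + b*y + c = 0 through points p and q, with (a, b) of unit length."""
    a, b = q[1] - p[1], p[0] - q[0]
    norm = (a * a + b * b) ** 0.5
    a, b = a / norm, b / norm
    return a, b, -(a * p[0] + b * p[1])

def preference_clustering(points, eps=0.01):
    """J-Linkage-style multi-model fitting of 2D lines in preference space."""
    # Hypotheses: every line through a pair of points.
    hyps = [line_through(p, q) for p, q in itertools.combinations(points, 2)]
    # Preference set of a point: indices of the hypotheses it is an inlier to.
    prefs = [frozenset(h for h, (a, b, c) in enumerate(hyps)
                       if abs(a * x + b * y + c) < eps) for x, y in points]
    clusters = [([i], prefs[i]) for i in range(len(points))]
    while True:
        best = None
        for (i, (_, pi)), (j, (_, pj)) in itertools.combinations(enumerate(clusters), 2):
            union = pi | pj
            dist = 1.0 - len(pi & pj) / len(union) if union else 1.0  # Jaccard distance
            if dist < 1.0 and (best is None or dist < best[0]):
                best = (dist, i, j)
        if best is None:  # no two clusters share a preferred model
            break
        _, i, j = best
        # The preference set of a merged cluster is the intersection of both sets.
        merged = (clusters[i][0] + clusters[j][0], clusters[i][1] & clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(members) for members, _ in clusters]
```

For five points on the line y = 0 followed by five points on the line x = 10, points on the same line share all hypotheses of that line and merge first; the two resulting clusters have disjoint preference sets, so the procedure stops with one cluster per line.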

#### Seminar No. 35

*Tuesday 2020-01-14, 13:45, CIIRC Room B-670 (building B, floor 6)* Pavel Trutman (CIIRC Czech Technical University, Prague, CZ) **Globally Optimal Solution to Inverse Kinematics of 7DOF Serial Manipulator**

**Abstract**

The Inverse Kinematics (IK) problem is to find robot control parameters that bring the robot into a desired position under kinematic and collision constraints. We present a global solution to the optimal IK problem for a general serial 7DOF manipulator with revolute joints and a quadratic polynomial objective function. We show that the kinematic constraints due to rotations can all be generated by second-degree polynomials. This is important since it significantly simplifies the subsequent step, in which we find the optimal solution by Lasserre relaxations of non-convex polynomial systems. We demonstrate that the second relaxation is sufficient to solve the 7DOF IK problem. Our approach is certifiably globally optimal. We demonstrate the method on a 7DOF KUKA LBR IIWA manipulator and show that we are able to compute the optimal IK or certify infeasibility in 99.9% of the tested poses.

#### Seminar No. 34

*Thursday 2019-12-05, 11:00, CIIRC Room B-671 (building B, floor 6)* Kathlén Kohn (KTH Stockholm, Sweden) **Point-Line Minimal Problems for 3 Cameras with Partial Visibility**

**Abstract**

We present a complete classification of all minimal problems for generic arrangements of points and lines observed by three calibrated perspective cameras. Our classification includes all possible settings where some of the cameras only see some of the available points and lines. Using basic tools from algebraic geometry, we first find 143494 candidates for minimal point-line problems. Afterwards, we determine algorithmically which of these candidates are in fact minimal. We find that only 5707 of our candidates are non-minimal. Hence, we conclude that there are 137787 point-line minimal problems for three calibrated cameras. For each minimal problem, we aim to compute the generic number of solutions, as it captures the difficulty of the problem at hand. This is joint work with Timothy Duff, Anton Leykin, and Tomas Pajdla.

#### Seminar No. 33

*Wednesday 2019-12-04, 11:00, CIIRC Room B-671 (building B, floor 6)* Martin Bråtelund (University of Oslo, Norway) **Critical loci for reconstruction from two views**

**Abstract**

In general, given a set of images and sufficiently many point correspondences, it is possible to reconstruct the 3D object uniquely from these images. There are, however, some cases where such a reconstruction is not unique; these are called critical configurations. We will show that all critical configurations consist of cameras and points lying on ruled quadric surfaces, and give a classification of all critical configurations for two cameras. We will also show how the different possible reconstructions are related. This work is largely based on previous work by Richard Hartley and Fredrik Kahl.

#### Seminar No. 32

*Tuesday 2019-10-01, 11:00, CIIRC Room B-670 (building B, floor 6)* Akihiro Sugimoto (National Institute of Informatics, NII, Japan) **Revisiting Depth Image Fusion with Variational Message Passing**

**Abstract**

The running average approach has long been perceived as the best choice for fusing depth measurements captured by a consumer-grade RGB-D camera into a global 3D model. This strategy, however, assumes exact correspondences between points in a 3D model and points in the captured RGB-D images. Such an assumption does not hold in many cases because of errors in motion tracking, noise, occlusions, or inconsistent surface sampling during measurements. Accordingly, reconstructed 3D models suffer from unpleasant visual artifacts. In this talk, we revisit the depth fusion problem from a probabilistic viewpoint and formulate it as a probabilistic optimization using variational message passing in a Bayesian network. Our formulation enables us to fuse depth images robustly, accurately, and fast for high-quality RGB-D keyframe creation, even if exact point correspondences are not always available. Our formulation also allows us to smoothly combine depth and color information for further improvements without increasing the computational cost. The quantitative and qualitative comparative evaluation on keyframes built from indoor scenes shows that our proposed framework achieves promising results for reconstructing accurate 3D models.
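For reference, the running average baseline that the talk questions can be written as a per-element weighted update. The Python sketch below assumes the common formulation in which each new measurement `d` with weight `w` is blended into the fused value `D` with accumulated weight `W`; the function name is hypothetical, and the sketch bakes in the exact-correspondence assumption by pairing `d[k]` with `D[k]`.

```python
import numpy as np

def running_average_fuse(D, W, d, w):
    """Fuse one new depth (or TSDF) observation by a weighted running average.

    D, W -- current fused values and accumulated weights (arrays of equal shape)
    d, w -- new per-element measurements and their weights
    Assumes d[k] observes exactly the model element D[k], i.e. the
    exact-correspondence assumption discussed in the abstract.
    """
    W_new = W + w
    # Elements that have never received any weight keep their current value.
    D_new = np.where(W_new > 0, (W * D + w * d) / np.maximum(W_new, 1e-12), D)
    return D_new, W_new
```

Fusing the measurements `[1, 2]` and then `[3, 4]` (unit weights) into an empty model gives `D = [2, 3]` and `W = [2, 2]`: every misaligned measurement is averaged in with the same trust as a correct one, which is the source of the artifacts mentioned above.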

#### Seminar No. 31 (internal)

*Friday 2019-09-13, 10:30, CIIRC Room B-633 (building B, floor 6)* Ekaterina Zorina (Ukrainian Catholic University, Lviv, Ukraine) **Learning from demonstrations**

**Abstract**

In this talk, I will present our work on the learning-from-demonstration task. Our final goal is to learn the shoveling task (shoveling sand into a wheelbarrow) from video demonstrations. We have been working on it for two weeks so far, and for now we have simplified the video demonstrations to virtual reality demonstrations. We learn the shovel movement policy with reinforcement learning. In the future, we plan to control the shovel with a robotic arm.

#### Seminar No. 30 (AAG/IMPACT Computer Vision / Algebraic Geometry week 26-30 Aug 2019)

*Friday 2019-08-30, 10:00, CIIRC Room B-670 (building B, floor 6)* Margaret Regan (University of Notre Dame, IN, USA) **Image Reconstruction Using Numerical Algebraic Geometry**

**Abstract**

Many problems in computer vision can be formulated using a parameterized system of polynomials which must be solved quickly and efficiently for given instances of the parameters. We propose a new numerical algebraic geometric method to efficiently solve these systems. First, our new approach uses locally adaptive methods and sparse matrix calculations to solve parameterized overdetermined systems in projective space. Examples will be provided in 2D image reconstruction to compare the new methods with traditional approaches in numerical algebraic geometry. Second, we propose new homotopy continuation methods for solving two minimal trifocal calibrated relative pose problems defined by point and line correspondences, which appear together, e.g., in urban scenes or when observing curves. Simulations and comparisons will be shown using real and synthetic data to demonstrate that challenging scenes can be reconstructed where standard methods fail.

#### Seminar No. 29 (AAG/IMPACT Computer Vision / Algebraic Geometry week 26-30 Aug 2019)

*Thursday 2019-08-29, 11:00, CIIRC Room B-670 (building B, floor 6)* Viktor Larsson (ETH Zurich, CH) **Geometric Estimation with Radial Distortion**

**Abstract**

While many modern cameras can be approximated by the pinhole camera model, more complicated models are necessary to achieve higher-quality reconstruction and localization results. For most cameras, the modeling error is dominated by radial distortion. This distortion is a non-linear warping of the image plane, and this extra non-linearity makes geometric estimation problems more difficult. In this talk, I will present two recent papers on dealing with radial distortion in geometric vision. We will look at two fundamental problems: two-view triangulation and absolute camera pose estimation. I will also briefly present some ongoing work related to calibrated radial multiple-view geometry.

#### Seminar No. 28 (AAG/IMPACT Computer Vision / Algebraic Geometry week 26-30 Aug 2019)

*Tuesday 2019-08-27, 15:30, CIIRC Room B-670 (building B, floor 6)* Andrew Pryhuber (University of Washington, Seattle, WA, USA) **Ideals of the Multiview Variety**

**Abstract**

The problem of reconstructing a 3D point from image data can be posed as minimizing Euclidean distance to the set of all images. We describe all polynomials that vanish on the space of images and relate them to the well-known bifocal and trifocal constraints. We briefly discuss recent work which imposes inequality constraints to force imaged points to have positive depth in each camera.

#### Seminar No. 27 (AAG/IMPACT Computer Vision / Algebraic Geometry week 26-30 Aug 2019)

*Monday 2019-08-26, 11:00, CIIRC Room B-670 (building B, floor 6)* Torsten Sattler (Chalmers University of Technology, Gothenburg, Sweden) **Domain Adaptation vs. Semantic Invariance for Long-Term Visual Localization**

**Presentation**

Slides (pdf)

**Abstract**

Visual localization is the problem of estimating the 6 degree-of-freedom camera pose from which a given image was taken with respect to the scene. Localization is an important subsystem in interesting Computer Vision / AI applications such as autonomous robots (self-driving cars, drones, etc.) and Augmented / Mixed / Virtual Reality. Long-term visual localization deals with the problem that both the appearance and the geometry of scenes change over time. For example, furniture in indoor scenes is moved around and vegetation in outdoor scenes changes significantly with the seasons. A direct result of the dynamic nature of real-world scenes is that scene representations quickly become outdated. The issue with outdated scene representations is that it is hard to associate current images with the data stored in them. Yet, such data associations are required for successful camera pose estimation. In this talk, we explore two paradigms that can be used to localize images under strong changes in the viewing conditions: 1) Rather than designing representations (e.g., in the form of local features) that are invariant / robust to changes in the scene, domain adaptation (for example, in the form of generative neural networks) can be used to transform the current viewing conditions to a state close to the conditions under which the scene representation was constructed. 2) Rather than trying to predict how scenes evolve over time, invariant representations try to encapsulate the gist of a scene. Here, we consider semantic invariance, i.e., the fact that the semantic meaning of a scene should be invariant to seasonal and illumination changes. We will discuss how both approaches can be used as part of visual localization systems and present our current work in both directions.

#### Seminar No. 26

*Thursday 2019-08-22, 11:00, CIIRC Room B-670 (building B, floor 6)* Teven Le Scao (Carnegie Mellon University, Pittsburgh, PA, USA) **Neural Differential Equations for image super-resolution**

**Abstract**

Neural Differential Equations are a machine learning framework that aims to combine the numerical-efficiency techniques of differential equation solvers with the flexibility of deep learning. Although they have recently received a lot of interest in both the machine learning and physical sciences communities, they have so far only been tested on toy problems. Image super-resolution is an interesting use case for this technique, as it attempts to continuously transform the input signal; we compare classical and differential systems on that task to gauge the potential of the technique, and study the impact of a few optimisation tricks for the differential one.

#### Seminar No. 25

*Thursday 2019-07-18, 11:00, CIIRC Seminar Room B-670, floor 6 of Building B* Timothy Duff (Georgia Institute of Technology, GA, USA) **Intro to homotopy continuation with a view towards minimal problems**

**Abstract**

Homotopy continuation is a versatile numerical method for solving polynomial systems of equations. It is the main subroutine in a field known as numerical algebraic geometry, which aims to describe the solution set of an arbitrary system in terms of certain generic “witness points.” It also has an emerging role in several applications. I will introduce the “how” and “why” of homotopy continuation, survey specialized solution techniques, and briefly explain their use in the study of 3D relative pose reconstruction.
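The basic “how” can be illustrated on a single univariate polynomial. The Python sketch below is a toy under stated assumptions, not a production tracker: it uses the straight-line homotopy H(x, t) = (1 - t)·γ·G(x) + t·F(x) with a fixed complex constant γ (the so-called gamma trick, which keeps the paths away from singularities), a fixed step size, an Euler predictor and a few Newton corrector steps. The function name `track_root` and all parameter values are illustrative choices; real solvers add adaptive step control and endgames.

```python
def track_root(F, dF, G, dG, x0, gamma=0.6 + 0.8j, steps=100, newton_iters=3):
    """Track a root x0 of the start system G to a root of the target system F
    along the straight-line homotopy H(x, t) = (1 - t) * gamma * G(x) + t * F(x)."""
    x = complex(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        # Predictor: one Euler step of dx/dt = -(dH/dt) / (dH/dx).
        H_x = (1 - t) * gamma * dG(x) + t * dF(x)
        H_t = F(x) - gamma * G(x)
        x = x - dt * H_t / H_x
        # Corrector: Newton iterations on H(x, t + dt) = 0.
        t_next = (k + 1) * dt
        for _ in range(newton_iters):
            H = (1 - t_next) * gamma * G(x) + t_next * F(x)
            H_x = (1 - t_next) * gamma * dG(x) + t_next * dF(x)
            x = x - H / H_x
    return x

# Target F(x) = x^2 - 2; start system G(x) = x^2 - 1 with known roots +1 and -1.
F = lambda x: x * x - 2
dF = lambda x: 2 * x
G = lambda x: x * x - 1
dG = lambda x: 2 * x
roots = [track_root(F, dF, G, dG, s) for s in (1, -1)]
```

Starting from the two known roots of the easy system, the two paths end at +√2 and -√2 (as complex numbers with negligible imaginary part). Choosing a start system with the same number of solutions as the target and a known solution set is what makes the method applicable to minimal problems.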

#### Seminar No. 24

*Wednesday 2019-07-17, 14:00, CIIRC Seminar Room B-670, floor 6 of Building B* Kathlén Kohn (University of Oslo, Norway) **Point-Line Minimal Problems in Complete Multi-View Visibility**

**Abstract**

We present a complete classification of all minimal problems for generic arrangements of points and lines completely observed by calibrated perspective cameras. We show that there are only 30 minimal problems in total and that no minimal problems exist for more than 6 cameras, more than 5 points, or more than 6 lines. For all minimal problems discovered, we present their algebraic degrees, i.e., the numbers of solutions, which measure their intrinsic difficulty. Our classification shows that there are many interesting new minimal problems. Our results also show exactly how the difficulty of problems grows with the number of views. Importantly, we discovered several new minimal problems with small degrees that might be practical in image matching and 3D reconstruction. This is joint work with Timothy Duff, Anton Leykin, and Tomas Pajdla.

#### Seminar No. 23

*Tuesday 2019-07-16, 11:00, CIIRC Seminar Room B-670, floor 6 of Building B* Andrea Fusiello (University of Udine, Italy) **Synchronisation: from pairwise measures to global values**

**Presentation**

Slides (pdf)

**Abstract**

In this talk, I will give an overview of some problems that take the following general form: given a graph whose edge labels correspond to noisy measurements of the ratio of the unknown labels of adjacent vertices, find the vertex labels. These are called “synchronization” problems. I will focus in particular on instances that are relevant in computer vision, namely rotation synchronization and translation synchronization. At the end, I will also touch upon localisation from bearings and applications to structure from motion.
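A small instance of this general form is synchronization of planar rotations, where each vertex label is a unit complex number z_i and each edge label measures the ratio z_i / z_j = z_i·conj(z_j). The Python sketch below uses the spectral method (leading eigenvector of the Hermitian measurement matrix), one standard approach that may or may not be among those covered in the talk; the function name and the dictionary input format are assumptions. The solution is only defined up to a global rotation.

```python
import numpy as np

def synchronize_angles(n, measurements):
    """Spectral synchronization of planar rotations (unit complex numbers).

    measurements -- dict {(i, j): z_ij} with z_ij ~ z_i * conj(z_j), |z_ij| = 1
    Returns estimates of z_0..z_{n-1}, up to one global rotation.
    """
    M = np.zeros((n, n), dtype=complex)
    for (i, j), z in measurements.items():
        M[i, j] = z
        M[j, i] = np.conj(z)  # keep the matrix Hermitian
    np.fill_diagonal(M, 1.0)
    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    v = eigvecs[:, -1]                    # leading eigenvector
    return v / np.abs(v)                  # project each entry onto the unit circle
```

With noise-free measurements on a complete graph the matrix equals z·z^H, so its leading eigenvector is the vector of true labels times an arbitrary phase; with noise, the eigenvector gives a global, least-squares-like compromise between inconsistent pairwise measurements.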

#### Seminar No. 22

*Monday 2019-05-06, 11:00, CIIRC Seminar Room B-670, floor 6 of Building B* Horia-Mihai Bujanca (The University of Manchester, United Kingdom) **SLAMBench 3.0: Benchmarking beyond traditional Visual SLAM**

**Abstract**

As the SLAM research area matures and the number of SLAM systems available increases, the need for frameworks that can objectively evaluate them against prior work grows. This new version of SLAMBench moves beyond traditional visual SLAM, and provides new support for scene understanding and non-rigid environments (dynamic SLAM). More concretely for dynamic SLAM, SLAMBench 3.0 includes the first publicly available implementation of DynamicFusion, along with an evaluation infrastructure. In addition, we include two SLAM systems (one dense, one sparse) augmented with convolutional neural networks for scene understanding, together with datasets and appropriate metrics.

#### Seminar No. 21

*Wednesday 2019-03-13, 11:00, CIIRC Seminar Room B-670, floor 6 of Building B* Mohab Safey El Din (PolSys: Polynomial Systems, INRIA / CNRS / Sorbonne Université, France) **Polar varieties, matrices and real root classification**

**Abstract**

Solving polynomial systems with parameters is a topical issue in effective real algebraic geometry. One challenging task is to compute semi-algebraic descriptions of regions of the parameter space over which the number of real solutions to the input system is invariant. In this talk, I will present a new algorithm for solving this problem. It makes intensive use of combinatorial properties of Gröbner bases combined with the notion of polar varieties. Joint work with J.-C. Faugère and P. Le.

#### Seminar No. 20

*Tuesday 2019-03-05, 14:00, CIIRC Seminar Room B-641, floor 6 of Building B* Simon Telen (KU Leuven, Belgium) **Stabilized algebraic methods for numerical root finding**

**Abstract**

We consider the problem of finding the isolated points defined by an ideal in a ring of (Laurent) polynomials with complex coefficients. Algebraic approaches for solving this use rewriting techniques modulo the ideal to reduce the problem to a univariate root finding or eigenvalue problem. We introduce a general framework for algebraic solvers in which it is possible to stabilize the computations in finite precision arithmetic. The framework is based on truncated normal forms (TNFs), which generalize Groebner and border bases. The stabilization comes from a ‘good’ choice of basis for the quotient algebra of the ideal and from compactification of the solution space.

#### Seminar R4I No. 3

*Friday 2019-01-11, 10:15, CIIRC Seminar Room B-633, floor 6 of Building B* Guillem Alenyà (Universitat Politècnica de Catalunya, Spain) **Vision-Based Cloth Manipulation by Autonomous Robots**

**Abstract**

The Perception and Manipulation group at IRI (Institute of Robotics) focuses on enhancing the perception, learning, and planning capabilities of robots to achieve higher degrees of autonomy and user-friendliness during everyday manipulation tasks. Topics addressed include the geometric interpretation of perceptual information, construction of 3D object models, action selection and planning, reinforcement learning, and teaching by demonstration. We will discuss challenges and current developments, primarily in the inclusion of robots in everyday environments and in the manipulation of textiles.

#### Seminar R4I No. 2

*Friday 2019-01-11, 9:30, CIIRC Seminar Room B-633, floor 6 of Building B* Kimitoshi Yamazaki (Shinshu University, Faculty of Engineering, Nagano, Japan) **Vision-Based Cloth Manipulation by Autonomous Robots**

**Abstract**

In this talk, we will introduce topics related to the manipulation of cloth by autonomous robots. Cloth is a deformable object, and its shape changes drastically when it is manipulated. We will mainly explain the sensor information processing, knowledge representation, and recognition methods needed to successfully manipulate such objects.

#### Seminar No. 19

*Thursday 2018-10-25, 11:00, CIIRC Seminar Room B-670, floor 6 of Building B* David Fouhey (University of Michigan, MI, USA) **Understanding how to get to places and do things**

**Video**

https://www.youtube.com/watch?v=99Mep3Cw9-Q

**Abstract**

What does it mean to understand an image or video? One common answer in computer vision has been that understanding means naming things: this part of the image corresponds to a refrigerator and that to a person, for instance. While important, this ability is not enough: humans can effortlessly reason about the rich world that images depict and what they can do in it. For example, if a friend shows you the way to their kitchen for you to get something, they won’t worry that you’ll get lost walking back (navigation) or that you’d have trouble figuring out how to open their refrigerator or cabinets. While both are ordinary feats for humans (or even a dog or cat), they are currently far beyond the abilities of computers.

In my talk, I’ll discuss my efforts towards bridging this gap. In the first part, I’ll discuss the task of navigation, getting from one place to another. In particular, our goal is to take a single demonstration of a path and retrace it, either forwards or backwards, under noisy actuation and a changing environment. Rather than build an explicit model of the world, we learn a network that attends to a sequence of memories in order to make decisions. In the second part, I will discuss how to scalably gather data of humans interacting with the world, resulting in a new dataset of human interactions, VLOG, as well as what we can learn from this data.

**Bio**

David Fouhey is starting as an assistant professor at the University of Michigan in January 2019 and is currently a visitor at INRIA Paris. His research interests include computer vision and machine learning, with a particular focus on scene understanding. He received a Ph.D. in robotics in 2016 from Carnegie Mellon University where he was supported by NSF and NDSEG fellowships, and was then a postdoctoral fellow at UC Berkeley. He has spent time at the University of Oxford’s Visual Geometry Group and at Microsoft Research. More information is here: http://web.eecs.umich.edu/~fouhey/

#### Seminar No. 18

*Wednesday 2018-10-24, 16:00, CIIRC IMPACT Room B-641, floor 6 of Building B* Yuriy Kaminskyi (Ukrainian Catholic University, Lviv, Ukraine) **Semantic segmentation for indoor localization**

**Abstract**

The seminar will be a progress report on the ongoing indoor localization and navigation project. It will briefly cover the problem and its motivation. The main goal of the talk is to show different approaches to segmentation (both instance and semantic) and approaches that may help to improve the existing solutions. The talk will also cover different segmentation methods and present their results on the InLoc dataset.

#### Seminar No. 17

*Thursday 2018-08-16, 11:00, CIIRC Seminar Room B-670, floor 6 of Building B* Viktor Larsson (Lund University, Sweden) **Orthographic-Perspective Epipolar Geometry, Optimal Trilateration and Non-Linear Variable Projection for Time-of-Arrival**

**Abstract**

In this three-part talk, I will briefly discuss some current work in progress. The first topic relates to the epipolar geometry of one perspective camera and one orthographic camera, with applications in RADAR-to-camera calibration. The second part is about position estimation using distances to known 3D points. Finally, I will discuss applying the variable projection method to the non-separable time-of-arrival problem. Preliminary experiments show greatly improved convergence compared to both joint and alternating optimization methods.
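Position estimation from distances to known 3D points (trilateration) has a simple linear baseline, sketched below in Python. It is explicitly not the optimal method from the talk: subtracting the first sphere equation |x - p_0|² = d_0² from the others, |x - p_i|² = d_i², cancels the quadratic term and leaves the linear system 2(p_i - p_0)·x = |p_i|² - |p_0|² - d_i² + d_0², solved here by least squares. The function name is hypothetical, and at least four non-coplanar anchor points are assumed.

```python
import numpy as np

def trilaterate_linear(anchors, dists):
    """Linear least-squares position from distances to known 3D points.

    anchors -- (m, 3) known point positions p_i, m >= 4 and not coplanar
    dists   -- (m,) measured distances d_i from the unknown position to p_i
    Solves 2 (p_i - p_0) . x = |p_i|^2 - |p_0|^2 - d_i^2 + d_0^2 for i = 1..m-1.
    """
    P = np.asarray(anchors, dtype=float)
    d = np.asarray(dists, dtype=float)
    A = 2.0 * (P[1:] - P[0])
    b = np.sum(P[1:] ** 2, axis=1) - np.sum(P[0] ** 2) - d[1:] ** 2 + d[0] ** 2
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x
```

With exact distances this recovers the true position; with noisy distances the linearization weights the errors unevenly, which is one reason why an optimal formulation is of interest.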

#### Seminar No. 16

*Tuesday 2018-07-24, 11:00, CIIRC Seminar Room B-670, floor 6 of Building B* Torsten Sattler (ETH Zurich, CH) **Challenges in Long-Term Visual Localization**

**Abstract**

Visual localization is the problem of estimating the position and orientation from which an image was taken with respect to a 3D model of a known scene. This problem has important applications, including autonomous vehicles (including self-driving cars and other robots) and augmented / mixed / virtual reality. While multiple solutions to the visual localization problem exist both in the Robotics and Computer Vision communities for accurate camera pose estimation, they typically assume that the scene does not change over time. However, this assumption is often invalid in practice, both in indoor and outdoor environments. This talk thus briefly discusses the challenges encountered when trying to localize images over a longer period of time. Next, we show how a combination of 3D scene geometry and higher-level scene understanding can help to enable visual localization in conditions where both classical and recently proposed learning-based approaches struggle.

**Bio**

Torsten Sattler received a PhD in Computer Science from RWTH Aachen University, Germany, in 2013 under the supervision of Prof. Bastian Leibe and Prof. Leif Kobbelt. In December 2013, he joined the Computer Vision and Geometry Group of Prof. Marc Pollefeys at ETH Zurich, Switzerland, where he currently is a senior researcher and Marc Pollefeys’ deputy while Prof. Pollefeys is on leave from ETH. His research interests include (large-scale) image-based localization using Structure-from-Motion point clouds, real-time localization and SLAM on mobile devices and for robotics, 3D mapping, Augmented & Virtual Reality, (multi-view) stereo, image retrieval and efficient spatial verification, camera calibration and pose estimation. Torsten has worked on dense sensing for self-driving cars as part of the V-Charge project. He is currently involved in enabling semantic SLAM and re-localization for gardening robots (as part of an EU Horizon 2020 project where he leads the efforts on a workpackage), research for Google’s Tango project, where he leads CVG’s research efforts, and work on self-driving cars.

#### Seminar No. 15

*Friday 2018-05-25, 11:00, CIIRC Seminar Room A-1001, floor 10 of Building A* Alexei Efros (UC Berkeley, CA, USA) **Self-supervision, Meta-supervision, Curiosity: Making Computers Study Harder**

**Video**

https://www.youtube.com/watch?v=_V-WpE8cmpc

**Abstract**

Computer vision has made impressive gains through the use of deep learning models trained with large-scale labeled data. However, labels require expertise and curation, and they are expensive to collect. Even worse, direct semantic supervision often leads learning algorithms to “cheat” and take shortcuts instead of actually doing the work. In this talk, I will briefly summarize several of my group’s efforts to combat this using self-supervision, meta-supervision, and curiosity: all ways of using the data as its own supervision. These lead to practical applications in image synthesis (such as pix2pix and cycleGAN), image forensics, audio-visual source separation, etc.

**Bio**

Alexei Efros is a professor of Electrical Engineering and Computer Sciences at UC Berkeley. Before 2013, he spent nine years on the faculty of Carnegie Mellon University, and he has also been affiliated with École Normale Supérieure/INRIA and the University of Oxford. His research is in the area of computer vision and computer graphics, especially at the intersection of the two. He is particularly interested in using data-driven techniques to tackle problems where large quantities of unlabeled visual data are readily available. Efros received his PhD in 2003 from UC Berkeley. He is a recipient of the Sloan Fellowship (2008), Guggenheim Fellowship (2008), SIGGRAPH Significant New Researcher Award (2010), 3 Helmholtz Test-of-Time Prizes (1999, 2003, 2005), and the ACM Prize in Computing (2016).

**Web**

https://www.ciirc.cvut.cz/alexei-efros-na-ciirc/

#### Seminar No. 14

*Tuesday 2018-05-22, 11:00, CIIRC Seminar Room B-670, floor 6 of Building B* Di Meng (University of Burgundy, France) **Cameras behind glass – polynomial constraints on image projection**

**Abstract**

Advanced Driver Assistance Systems (ADAS) are used in autonomous and self-driving cars. Their cameras are commonly used for functions such as pedestrian detection, guideboard detection, or obstacle avoidance. The cameras are calibrated, but they are mounted inside the car, behind the windshield. The windshield refracts light rays, which causes a discrepancy when points in space are mapped onto image pixels. A small discrepancy at the pixel level becomes a large error a few meters away from the camera. We therefore model the camera with three shapes of windshield to obtain a more precise camera calibration for automotive applications.

The talk is divided into two main sections. The first section presents the optical model of three types of glass slab: a planar slab, a slab with non-parallel surfaces, and a spherical slab. Polynomial projection equations are formulated. The second section takes the camera parameters into account and describes how to map points in space onto image pixels with the windshield in between.
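For the simplest of the three cases, an ideal planar slab with parallel faces, the effect of refraction on a single ray can be computed in closed form: the ray leaves the slab parallel to its original direction but laterally shifted by t·sin(θ_i - θ_t)/cos(θ_t), where θ_t follows from Snell's law. The Python sketch below only illustrates this textbook relation; it is not the polynomial projection model from the talk, and the function name and example values are assumptions.

```python
import math

def slab_lateral_shift(theta_deg, thickness, n_glass, n_air=1.0):
    """Lateral displacement of a ray crossing an ideal planar parallel glass slab.

    theta_deg -- angle of incidence w.r.t. the slab normal (degrees)
    thickness -- slab thickness; the shift is returned in the same unit
    The ray travels thickness / cos(theta_t) inside the glass and exits
    parallel to its incoming direction, shifted by
    thickness * sin(theta_i - theta_t) / cos(theta_t).
    """
    theta_i = math.radians(theta_deg)
    theta_t = math.asin(n_air * math.sin(theta_i) / n_glass)  # Snell's law
    return thickness * math.sin(theta_i - theta_t) / math.cos(theta_t)
```

For a 5 mm slab with refractive index 1.5, the shift is zero at normal incidence and about 0.97 mm at 30°. Because it depends on the incidence angle, rays from different scene points are displaced differently, so the effect cannot be absorbed by a single constant offset in the camera model.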

#### Seminar No. 13

*Friday 2018-02-23, 14:00, CIIRC Seminar Room B-671, floor 6 of Building B* Oles Dobosevych (Ukrainian Catholic University, Lviv, Ukraine) **On Ukrainian characters MNIST dataset**

**Abstract**

The Modified National Institute of Standards and Technology (MNIST) dataset of handwritten digits is the best-known dataset and is widely used as a benchmark for validating various ideas in Machine Learning. We present a newly created dataset of 32 handwritten Ukrainian letters, divided into 72 different style subclasses, with 2000 examples in each class. We also suggest a recognition model for these symbols and explain why approaches that work well on the MNIST dataset do not succeed in our case. Finally, we discuss several real-world applications of our model that can help save paper, time and money.

#### Seminar No. 12

**AIME@CZ – Czech Workshop on Applied Mathematics in Engineering**

Organized by Didier Henrion and Tomáš Pajdla

**Thursday 2018-02-22, 9:45-18:00**

CIIRC Seminar Room **B-670 (9:45-13:30) and B-671 (14:30-16:00)**, floor 6 of Building B

- 9:45-10:30 Francis Bach (Inria/ENS Paris, FR): **Linearly-convergent stochastic gradient algorithms**
- 11:00-11:45 Pierre-Yves Massé: **Online Optimisation of Time Varying Systems**
- 12:15-13:00 Nicolas Mansard (LAAS-CNRS Univ. Toulouse, FR): **Why we need a memory of movement**
- 14:30-15:00 Josef Šivic (Inria/ENS Paris, FR and CIIRC CTU Prague, CZ): **Joint Discovery of Object States and Manipulation Actions**
- 15:15-15:45 Tomáš Werner (CTU Prague, CZ): **Solving LP Relaxations of Some Hard Problems Is Hard**
- 16:30-18:00 *Demos and visit of CIIRC*

Title: Linearly-convergent stochastic gradient algorithms

Speaker: Francis Bach

Title: Online Optimisation of Time Varying Systems |

Speaker: Pierre-Yves Massé |

Abstract: Dynamical systems are a wide-ranging framework for modelling time-varying settings, from engineering (e.g., cars) to machine learning (e.g., recurrent neural networks). The correct behaviour of these systems often depends on a parameter that the user has to choose (e.g., the gear ratio or the wheel in the case of cars, or the weights in the case of neural networks). Finding the best possible parameter is called optimising, or training, the system. Many real-life problems require this training to occur online, with immediate processing of the inputs received by the system (e.g., the feedback from a car's sensors about its surroundings, or the successive frames of a video fed to a neural network). We present a proof of convergence for classical online optimisation algorithms used to train these systems, such as “Real Time Recurrent Learning” (RTRL) or “Truncated Backpropagation Through Time” (TBTT). These algorithms avoid time-consuming computations by storing information about the past in the form of a time-dependent tensor. However, the memory required to do so may be huge, preventing their use on even moderately large systems. The “No Back Track” (NBT) algorithm and its implementation-friendly variant, “Unbiased Online Recurrent Optimisation” (UORO), are general-purpose algorithms that approximate this tensor by a random, rank-one, unbiased tensor, thus decisively reducing storage costs while preserving the unbiasedness property that is crucial for convergence. We prove that, with arbitrarily high probability, the NBT algorithm converges to the same local optimum as the RTRL or TBTT algorithms. We may conclude by briefly presenting the “Learning the Learning Rate” (LLR) algorithm, which adapts the step size of a gradient descent online by performing a gradient descent on this very step size. It thus reduces the sensitivity of the descent to the numerical choice of the step size, a well-documented practical implementation issue.
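The key property behind NBT and UORO, replacing a sum of outer products by a single random rank-one outer product with the same expectation, can be illustrated with random signs. This is a minimal sketch of that one idea only; `rank_one_estimate` is a hypothetical helper, and the actual algorithms additionally rescale the two factors to reduce variance and update the estimate recursively over time.

```python
import numpy as np

def rank_one_estimate(vs, ws, rng):
    """Hypothetical helper: random rank-one matrix whose expectation is
    sum_i outer(vs[i], ws[i]).

    With independent signs nu_i in {-1, +1}, E[nu_i * nu_j] = 1 if i == j else 0,
    so the cross terms of outer(sum nu_i v_i, sum nu_j w_j) vanish on average."""
    nu = rng.choice([-1.0, 1.0], size=len(vs))
    v_tilde = sum(n * v for n, v in zip(nu, vs))
    w_tilde = sum(n * w for n, w in zip(nu, ws))
    return np.outer(v_tilde, w_tilde)

rng = np.random.default_rng(0)
vs = [rng.standard_normal(3) for _ in range(4)]
ws = [rng.standard_normal(5) for _ in range(4)]
exact = sum(np.outer(v, w) for v, w in zip(vs, ws))  # dense storage: 3 x 5 numbers

# Each sample needs only two vectors (3 + 5 numbers); averaging many
# independent samples approaches the exact matrix.
approx = np.mean([rank_one_estimate(vs, ws, rng) for _ in range(20000)], axis=0)
```

Any single estimate is noisy; unbiasedness is what lets a stochastic-gradient-style convergence argument go through despite that noise.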

Title: Why we need a memory of movement

Speaker: Nicolas Mansard

Title: Joint Discovery of Object States and Manipulation Actions

Speaker: Josef Šivic (Inria/ENS Paris, FR and CIIRC CTU Prague, CZ)

Abstract: Many human activities involve object manipulations aimed at modifying the object state. Examples of common state changes include full/empty bottle, open/closed door, and attached/detached car wheel. In this work, we seek to automatically discover the states of objects and the associated manipulation actions. Given a set of videos for a particular task, we propose a joint model that learns to identify object states and to localize state-modifying actions. Our model is formulated as a discriminative clustering cost with constraints. We assume a consistent temporal order for the changes in object states and manipulation actions, and introduce new optimization techniques to learn model parameters without additional supervision. We demonstrate successful discovery of manipulation actions and corresponding object states on a new dataset of videos depicting real-life object manipulations. We show that our joint formulation improves object state discovery through action recognition, and vice versa.

Joint work with Jean-Baptiste Alayrac, Ivan Laptev and Simon Lacoste-Julien.

Title: Solving LP Relaxations of Some Hard Problems Is Hard

Speaker: Tomas Werner (CTU Prague, CZ)

Abstract: I will present our result that solving linear programming (LP) relaxations of a number of classical NP-hard combinatorial optimization problems (set cover/packing, facility location, maximum satisfiability, maximum independent set, multiway cut, 3-D matching, weighted CSP) is as hard as solving the general LP problem. More precisely, these LP relaxations are LP-complete under (nearly) linear-time reductions, assuming sparse encoding of instances. In polyhedral terms, this means that every polytope is a scaled coordinate projection of the optimal set of each LP relaxation, computable in (nearly) linear time. For some of the LP relaxations (exact cover, 3-D matching, weighted CSP), a stronger result holds: every polytope is a scaled coordinate projection of their feasible set, which implies that the corresponding reduction is approximation-preserving. Moreover, the considered LP relaxations are P-complete under log-space reductions, and are therefore also hard to parallelize. These results pose a fundamental limitation on designing very efficient algorithms to compute exact or even approximate solutions to the LP relaxations, because finding such an algorithm might improve the complexity of the best known general LP solvers, which is unlikely.

Joint work with Daniel Prusa.

#### Seminar R4I No. 1

Federica Arrigoni (University of Udine, Italy): **Synchronization Problems in Computer Vision**, Thursday 2017-12-07 at 11:00, CIIRC Seminar Room B-670, floor 6 of Building B

**Abstract**

Consider a network of nodes in which each node is characterized by an unknown state, and suppose that pairs of nodes can measure the ratio (or difference) between their states. The goal of “synchronization” is to infer the unknown states from these pairwise measurements. Typically, states are represented by elements of a group, such as the Symmetric Group or the Special Euclidean Group. The former can represent local labels of a set of features, as in multi-view matching, whereas the latter can represent camera reference frames, as in structure from motion, or the local coordinate frames in which 3D points are expressed, as in multiple point-set registration. A related problem is “bearing-only network localization”, where each node is located at a fixed (unknown) position in 3-space and pairs of nodes can measure the direction of the line joining their locations. We are interested in global techniques, which consider all measurements at once, as opposed to incremental approaches that grow a solution by adding pieces iteratively.
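As a minimal illustration of a global synchronization method, the sketch below uses the simplest possible group, the real numbers under addition, rather than the Symmetric or Special Euclidean groups discussed in the talk. All pairwise difference measurements enter one linear least-squares problem at once, and one node is fixed to remove the global-shift ambiguity. The function name and the toy data are hypothetical.

```python
import numpy as np

def synchronize_differences(n, edges, measurements):
    """Hypothetical helper: least-squares synchronization over (R, +).

    edges: list of (i, j) pairs; measurements[k] approximates x_i - x_j.
    Differences determine the states only up to a common shift, so node 0
    is pinned to 0 by an extra equation (the gauge constraint)."""
    A = np.zeros((len(edges) + 1, n))
    b = np.zeros(len(edges) + 1)
    for k, ((i, j), d) in enumerate(zip(edges, measurements)):
        A[k, i], A[k, j], b[k] = 1.0, -1.0, d
    A[-1, 0] = 1.0  # gauge constraint x_0 = 0
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

true = np.array([0.0, 2.0, -1.0, 3.5])
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]   # the graph contains cycles
noise = np.array([0.05, -0.03, 0.02, 0.01, -0.04])
meas = [true[i] - true[j] + e for (i, j), e in zip(edges, noise)]
est = synchronize_differences(4, edges, meas)
```

Because the graph has cycles, the noisy measurements are redundant, and the global solve spreads the inconsistency over all edges instead of accumulating it along one path, as an incremental approach would.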

#### Seminar No. 11

Antonín Šulc (University of Konstanz, Germany): **Lightfield Analysis for non-Lambertian Scenes**, Thursday 2017-11-23 at 11:00, CIIRC Seminar Room B-670, floor 6 of Building B

**Abstract**

In most natural scenes, we can see objects composed of non-Lambertian materials, whose appearance changes with the viewpoint. In many computer vision tasks, we consider these undesirable and treat them as outliers. However, if we can record data from multiple densely sampled viewpoints, such as with a light-field camera, we have a chance not only to deal with them but also to extract additional information about the scene.

In this talk, I will show the capabilities of the light-field paradigm on various problems. Key ideas are a linear algorithm for structure from motion to generate refocusable panoramas and depth estimation for multi-layered objects which are semitransparent or partially reflective. Using these, I will show that we can decompose such scenes and further perform a robust volumetric reconstruction. Finally, I will consider decomposition of light fields into reflectance, natural illumination and geometry, a problem known as inverse rendering.

#### Seminar No. 10

Jana Košecká (George Mason University, Fairfax, VA, USA): **Semantic Understanding for Robot Perception**, Monday 2017-10-30 at 16:00, Czech Technical University, Karlovo namesti, G-205

**Abstract**

Advances in robotic navigation, mapping, object search and recognition rest to a large extent on robust, efficient and scalable semantic understanding of the surrounding environment. In recent years, we have developed several approaches for capturing the geometry and semantics of the environment from video, RGB-D data, or simply a single RGB image, focusing on indoor and outdoor environments relevant for robotics applications.

I will demonstrate our work on detailed semantic parsing and 3D structure recovery using deep convolutional neural networks (CNNs), as well as object detection and object pose recovery from a single RGB image. The applicability of the presented techniques to autonomous driving, service robotics, mapping and augmented reality applications will be discussed.

#### Seminar No. 9

Mircea Cimpoi (CIIRC, Czech Technical University, Prague): **Deep Filter Banks for Texture Recognition**, Thursday 2017-10-19 at 11:00, CIIRC Seminar Room A-303, floor 3 of Building A

**Abstract**

This talk is about texture and material recognition from images, and about revisiting classical texture representations in the context of deep learning. The results were presented at CVPR 2015 and in IJCV 2016. Visual textures are ubiquitous and play an important role in image understanding because they convey significant semantics of images, and because texture representations that pool local image descriptors in an order-less manner have had a tremendous impact in various practical applications. In the talk, we will revisit classic texture representations, including bag-of-visual-words and Fisher vectors, in the context of deep learning, and show that they have excellent efficiency and generalization properties if the convolutional layers of a deep model are used as filter banks. In this manner, we obtain state-of-the-art performance on numerous datasets well beyond textures, an efficient method for applying deep features to image regions, and benefits when transferring features from one domain to another.
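The order-less pooling idea can be sketched with the simplest of the classic encoders, bag-of-visual-words, applied to the spatial positions of a convolutional feature map treated as local descriptors. This is an illustrative sketch, not the authors' implementation: the codebook and feature map below are random stand-ins for a learned codebook and real CNN activations, and Fisher vectors pool richer first- and second-order statistics.

```python
import numpy as np

def bovw_encode(descriptors, codebook):
    """Hypothetical helper: assign each local descriptor (row) to its nearest
    codeword and return an L1-normalized histogram of the assignments.

    Counting discards where each descriptor came from, so the encoding
    is order-less."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    assignments = d2.argmin(axis=1)
    hist = np.bincount(assignments, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
codebook = rng.standard_normal((8, 64))          # stand-in for k-means centres
feature_map = rng.standard_normal((14, 14, 64))  # stand-in conv output, H x W x C
local_desc = feature_map.reshape(-1, 64)         # one descriptor per spatial position
code = bovw_encode(local_desc, codebook)
```

Because the encoding ignores positions, it can be computed over any image region simply by selecting the descriptors that fall inside it.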

**References**

[1] Cimpoi, M., Maji, S., Kokkinos, I., Vedaldi, A., Deep Filter Banks for Texture Recognition, Description, and Segmentation, IJCV (2016) 118:65

[2] Cimpoi, M., Maji, S., and Vedaldi, A., Deep Filter Banks for Texture Recognition and Segmentation, CVPR (2015)

#### Seminar No. 8

Wolfgang Förstner (University of Bonn, Germany): **Evaluation of Estimation Results within Structure from Motion Problems**, Wednesday 2017-10-18 at 11:00, CIIRC Seminar Room B-670, floor 6 of Building B

**Abstract**

Parameter estimation is the core of many geometric problems within structure from motion. The number of parameters ranges from a few, e.g., for pose estimation or triangulation, to huge numbers, such as in bundle adjustment or surface reconstruction. The ability for planning, self-diagnosis, and evaluation is critical for successful project management. Uncertainty of observed and estimated quantities needs to be available, faithful, and realistic. The talk presents methods (1) to critically check the faithfulness of the result of estimation procedures, (2) to evaluate suboptimal estimation procedures, and (3) to evaluate and compare competing procedures w.r.t. their precision in the presence of rank deficiencies. Evaluating bundle adjustment results is taken as one example problem.
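One classical check of whether an estimation result is faithful, standard in adjustment theory, is the estimated variance factor: the weighted sum of squared residuals divided by the redundancy should be close to 1 when the assumed observation precision is realistic. The sketch below applies it to a weighted straight-line fit. It is a generic textbook illustration with hypothetical names and synthetic data, not the specific procedures presented in the talk.

```python
import numpy as np

def variance_factor(A, y, sigma):
    """Hypothetical helper: weighted least squares with known observation
    standard deviations `sigma`. Returns the estimate and the estimated
    variance factor s0^2 = v^T W v / R with redundancy R = n - u."""
    W = np.diag(1.0 / sigma**2)
    N = A.T @ W @ A                    # normal equation matrix
    x = np.linalg.solve(N, A.T @ W @ y)
    v = A @ x - y                      # residuals
    R = A.shape[0] - A.shape[1]        # redundancy
    return x, float(v @ W @ v / R)

rng = np.random.default_rng(1)
t = np.linspace(0.0, 10.0, 200)
A = np.column_stack([np.ones_like(t), t])  # straight-line model y = a + b t
sigma = np.full_like(t, 0.1)               # assumed observation precision
y = 1.0 + 0.5 * t + rng.normal(0.0, 0.1, size=t.size)
x_hat, s0_sq = variance_factor(A, y, sigma)
```

A value well above 1 suggests that the assumed standard deviations are too optimistic or that the model is wrong; in this example, halving the assumed `sigma` multiplies the variance factor by exactly 4.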

**References**

[1] T. Dickscheid, T. Läbe, and W. Förstner, Benchmarking Automatic Bundle Adjustment Results, in 21st Congress of the International Society for Photogrammetry and Remote Sensing (ISPRS), Beijing, China, 2008, pp. 7–12, Part B3a.

[2] W. Förstner and K. Khoshelham, Efficient and Accurate Registration of Point Clouds with Plane to Plane Correspondences, in 3rd International Workshop on Recovering 6D Object Pose, 2017.

[3] W. Förstner and B. P. Wrobel, Photogrammetric Computer Vision — Statistics, Geometry, Orientation and Reconstruction, Springer, 2016.

[4] T. Läbe, T. Dickscheid, and W. Förstner, On the Quality of Automatic Relative Orientation Procedures, in 21st Congress of the International Society for Photogrammetry and Remote Sensing (ISPRS), Beijing, China, 2008, pp. 37–42, Part B3b-1.

#### Seminar No. 7

Ludovic Magerand (CIIRC, Czech Technical University, Prague): **Projective Structure-from-Motion and Rolling Shutter Pose Estimation**, Tuesday 2017-10-17 at 11:00, CIIRC Seminar Room A-303, floor 3 of Building A

**Abstract**

This talk is divided into two parts. The first presents an ICCV’17 paper describing a practical solution to the Projective Structure from Motion (PSfM) problem that deals efficiently with missing data (up to 98%), outliers and, for the first time, large-scale 3D reconstruction scenarios. This is achieved by embedding the projective depths into the projective parameters of the points and views, which improves computational speed. To do so while ensuring a valid reconstruction, an extension of the linear constraints from the Generalized Projective Reconstruction Theorem is used. With an incremental approach, views and points are added robustly to an initial solvable sub-problem until the underlying factorization is complete.
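For context, the classical projective factorization that such methods build on can be sketched in a few lines: when all points are visible in all views and the projective depths are known, the rescaled measurement matrix has rank 4, and a truncated SVD splits it into stacked cameras and homogeneous points, up to a 4 × 4 projective transformation. This sketch assumes complete data and exact depths, which is precisely what the talk's method avoids; the names and synthetic data are hypothetical.

```python
import numpy as np

def projective_factorization(W):
    """Hypothetical helper: rank-4 factorization of a rescaled measurement matrix.

    W stacks lambda_ij * [u_ij, v_ij, 1]^T for m views (3m rows) and n points
    (n columns). With correct depths lambda_ij, W = P @ X has rank 4."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    P = U[:, :4] * s[:4]   # 3m x 4 stacked camera matrices
    X = Vt[:4, :]          # 4 x n homogeneous points
    return P, X

rng = np.random.default_rng(2)
m, n = 3, 10
P_true = rng.standard_normal((3 * m, 4))
X_true = np.vstack([rng.standard_normal((3, n)), np.ones((1, n))])
W = P_true @ X_true  # each 3-vector block equals lambda_ij * x_ij exactly
P_est, X_est = projective_factorization(W)
```

With missing entries, W can no longer be fed to a plain SVD, which is one reason the incremental approach described above is needed.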

The second part of the talk presents my PhD thesis, “Dynamic pose estimation with CMOS cameras using sequential acquisition”. CMOS cameras are cheap and can acquire images at very high frame rates thanks to an acquisition mode called Rolling Shutter, which exposes the scan-lines sequentially. This makes them very interesting for high-speed robotics, but it comes with what was long seen as a drawback: when an object (or the camera itself) moves in the scene, distortions appear in the image. These rolling shutter effects actually contain information about the motion and can become another advantage for high-speed robotics, by extending the usual pose estimation to also estimate the motion parameters. Two methods achieving this will be presented: one assumes a non-uniform motion model, and the other uses a projection model suitable for polynomial optimization.
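The coupling that distinguishes rolling shutter from global shutter projection can be sketched as follows: each image row is exposed at its own time, so the projected row depends on the camera pose at that time, which in turn depends on the row. The sketch below assumes a hypothetical translation-only, constant-velocity motion (no rotation) and resolves the coupling by fixed-point iteration; it is not either of the methods from the thesis.

```python
import numpy as np

def rolling_shutter_project(X, K, c0, velocity, line_delay, iters=50, tol=1e-9):
    """Hypothetical helper: project 3D point X with a rolling shutter camera
    whose centre moves as c(t) = c0 + velocity * t (rotation fixed to identity).
    Row r is exposed at time t = r * line_delay, so we iterate until the row
    used for the exposure time matches the row of the projection."""
    row = 0.0
    for _ in range(iters):
        t = row * line_delay
        p = K @ (X - (c0 + velocity * t))
        u, new_row = p[0] / p[2], p[1] / p[2]
        if abs(new_row - row) < tol:
            return np.array([u, new_row])
        row = new_row
    return np.array([u, row])

K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
X = np.array([0.5, 0.2, 5.0])
c0 = np.zeros(3)
velocity = np.array([1.0, 0.5, 0.0])  # metres per second (hypothetical)
line_delay = 3e-5                     # seconds between consecutive rows (hypothetical)
rs = rolling_shutter_project(X, K, c0, velocity, line_delay)
gs = rolling_shutter_project(X, K, c0, velocity, 0.0)  # global shutter: all rows at t = 0
```

The difference between `rs` and `gs` is the rolling shutter distortion; since it is a function of `velocity`, observing it is what allows the motion parameters to be estimated alongside the pose.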

#### Seminar No. 6

Akihiro Sugimoto (National Institute of Informatics, NII, Japan): **Deeply Supervised 3D Recurrent FCN for Salient Object Detection in Videos**, Monday 2017-09-25 at 11:00, CIIRC Seminar Room B-670, floor 6 of Building B

**Abstract**

This talk presents a novel end-to-end 3D fully convolutional network for salient object detection in videos. The proposed network uses 3D filters in the spatio-temporal domain to learn spatial and temporal information directly, yielding 3D deep features, and transfers these features to pixel-level saliency prediction, outputting saliency voxels. In the network, we combine refinement at each layer with deep supervision to detect salient object boundaries efficiently and accurately. The refinement module recurrently enriches the feature map with contextual information. Applying deeply supervised learning to hidden layers, on the other hand, improves the details of the intermediate saliency voxels, so that the saliency voxels are progressively refined. Extensive experiments on publicly available benchmark datasets confirm that our network outperforms state-of-the-art methods. The proposed saliency model also works effectively for video object segmentation.
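The spatio-temporal filtering described above can be illustrated with a naive single-channel 3D convolution (computed as cross-correlation, as in deep-learning frameworks) over a T × H × W video volume. This is only a sketch of the basic operation with a hypothetical function name; the network in the talk stacks learned multi-channel 3D layers with refinement and deep supervision.

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Hypothetical helper: naive 'valid' 3D cross-correlation of a T x H x W
    volume with a kt x kh x kw kernel. Each output voxel aggregates a
    spatio-temporal neighbourhood, so appearance and motion are filtered jointly."""
    kt, kh, kw = kernel.shape
    T, H, W = volume.shape
    out = np.empty((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                out[t, y, x] = np.sum(volume[t:t + kt, y:y + kh, x:x + kw] * kernel)
    return out

video = np.random.default_rng(0).standard_normal((6, 7, 8))  # T x H x W
temporal_diff = np.array([-1.0, 1.0]).reshape(2, 1, 1)        # responds to change over time
motion_response = conv3d_valid(video, temporal_diff)
```

A purely spatial 2D filter could not produce `motion_response`, because it never sees two frames at once.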

#### Seminar No. 5

Viktor Larsson (Lund University, Sweden): **Building Polynomial Solvers for Computer Vision Applications**, Thursday 2017-08-31 at 11:00, CIIRC Seminar Room B-670, floor 6 of Building B

**Abstract**

In the first part of the talk, I will give a brief overview of how polynomial equation systems are typically solved in Computer Vision. These equation systems often come from minimal problems, which are fundamental building blocks in most Structure-from-Motion pipelines.

In the second part, I will present two recent papers on methods for constructing polynomial solvers. The first paper is about automatically generating the so-called elimination templates. The second paper extends the method to also handle saturated ideals, which essentially allows us to add constraints requiring that some polynomials be non-zero. Both papers are joint work with Kalle Åström and Magnus Oskarsson.
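The final step shared by many such solvers, reading the solutions off the eigenvalues of an action (multiplication) matrix, is easiest to see in one variable, where the action matrix of multiplication by x on C[x]/(p) is the companion matrix of p. In the multivariate case, elimination templates are what make it possible to construct this matrix; the sketch below omits that part entirely and uses a hypothetical function name.

```python
import numpy as np

def roots_via_action_matrix(coeffs):
    """Hypothetical helper: roots of p(x) = c_0 x^d + c_1 x^(d-1) + ... + c_d
    as eigenvalues of the multiplication-by-x matrix on C[x]/(p), written in
    the monomial basis 1, x, ..., x^(d-1) (column k is the image of x^k)."""
    c = np.asarray(coeffs, dtype=float)
    c = c / c[0]                 # make p monic
    d = len(c) - 1
    M = np.zeros((d, d))
    M[1:, :-1] = np.eye(d - 1)   # x * x^k = x^(k+1) for k < d - 1
    M[:, -1] = -c[:0:-1]         # x * x^(d-1) = x^d, reduced modulo p
    return np.linalg.eigvals(M)

# (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6; eigenvalues are 1, 2, 3 in some order
roots = roots_via_action_matrix([1, -6, 11, -6])
```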

**References**

[1] Larsson, V., Åström, K., Oskarsson, M., Efficient Solvers for Minimal Problems by Syzygy-Based Reduction, CVPR (2017). [http://www.maths.lth.se/matematiklth/personal/viktorl/papers/larsson2017efficient.pdf]

[2] Larsson, V., Åström, K., Oskarsson, M., Polynomial Solvers for Saturated Ideals, ICCV (2017).

#### Seminar No. 4

Tomas Mikolov (Facebook AI Research): **Neural Networks for Natural Language Processing**, Friday 2017-08-25 at 11:00, CIIRC Seminar Room B-670, floor 6 of Building B

**Abstract**

Artificial neural networks are currently very successful in various machine learning tasks that involve natural language. In this talk, I will describe recurrent neural network language models, as well as their most frequent applications to speech recognition and machine translation. I will also talk about distributed word representations, their interesting properties, and efficient ways to compute them and use them in tasks such as text classification. Finally, I will describe our latest efforts to create a novel dataset that could be used to develop machines that can truly communicate with human users in natural language.
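To make “distributed word representations” slightly more concrete: word2vec's skip-gram model learns word vectors by predicting the context words around each word. The sketch below only generates the (center, context) training pairs for a fixed window; the function name is hypothetical, and the actual word2vec tool adds refinements such as a randomly shrunk window and subsampling of frequent words, which are omitted here.

```python
def skipgram_pairs(tokens, window):
    """Hypothetical helper: (center, context) training pairs for a skip-gram
    model; every word within `window` positions of the center counts as context."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs

# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
example = skipgram_pairs("the cat sat".split(), 1)
```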

**Short bio**:

Tomáš Mikolov has been a research scientist in the Facebook AI Research group since 2014. Previously, he was a member of the Google Brain team, where he developed and implemented efficient algorithms for computing distributed representations of words (the word2vec project). He obtained his PhD from Brno University of Technology in 2012 for his work on recurrent neural network-based language models (RNNLM). His long-term research goal is to develop intelligent machines that people can communicate with and use to accomplish complex tasks.

#### Seminar No. 3

**Workshop on learnable representations for geometric matching**

Monday 2017-08-21, 14:00-18:00

CIIRC Seminar Room B-670, floor 6 of Building B

- 14:00-15:00 Torsten Sattler (ETH Zurich) Hard Matching Problems in 3D Vision
- 15:00-15:20 Coffee break, discussion
- 15:20-16:20 Eric Brachmann (TU Dresden) Scene Coordinate Regression: From Random Forests to End-to-End Learning
- 16:20-16:40 Coffee break, discussion
- 16:40-17:40 Ignacio Rocco (Inria) Convolutional neural network architecture for geometric matching
- 17:40-18:00 Discussion

Speaker: Torsten Sattler, ETH Zurich

Title: Hard Matching Problems in 3D Vision

Abstract: Estimating correspondences, i.e., data association, is a fundamental step of every 3D Computer Vision pipeline. For example, 2D-3D matches between pixels in an image and 3D points in a 3D scene model are required for camera pose computation and thus for visual localization. Existing approaches for correspondence estimation, e.g., based on local image descriptors such as SIFT, have been shown to work well for a range of viewing conditions. Still, they are rather limited in challenging scenes. This talk will focus on data association in challenging scenarios. We first discuss the impact of day-night changes on visual localization, demonstrating that state-of-the-art algorithms perform significantly worse than in the day-day scenario typically considered in the literature. Next, we discuss ongoing work aimed at boosting the performance of local descriptors in this scenario via a dense-sparse feature detection and matching pipeline. A key idea in this work is to use pre-trained convolutional neural networks to obtain descriptors that contain mid-level semantic information, as opposed to the low-level information used by SIFT. Based on the intuition that semantic information provides a higher form of invariance, the second part of the talk considers exploiting semantic (image) segmentations in the context of visual localization and visual SLAM.
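For reference, the standard way of matching local descriptors such as SIFT is nearest-neighbour search combined with Lowe's ratio test, which keeps a match only if the best candidate is clearly closer than the second best. The sketch below is a brute-force version with a hypothetical function name and random stand-in descriptors; under strong appearance changes such as day-night, the true match often no longer passes this test.

```python
import numpy as np

def ratio_test_matches(desc_query, desc_db, ratio=0.8):
    """Hypothetical helper: brute-force nearest-neighbour matching with the
    ratio test. Returns (query_index, database_index) pairs."""
    d = np.linalg.norm(desc_query[:, None, :] - desc_db[None, :, :], axis=-1)
    order = np.argsort(d, axis=1)
    rows = np.arange(len(desc_query))
    best, second = d[rows, order[:, 0]], d[rows, order[:, 1]]
    keep = best < ratio * second   # reject ambiguous matches
    return [(int(i), int(order[i, 0])) for i in np.flatnonzero(keep)]

rng = np.random.default_rng(0)
db = rng.standard_normal((50, 128))  # stand-in for database descriptors
queries = np.vstack([db[3] + 0.05 * rng.standard_normal(128),
                     db[7] + 0.05 * rng.standard_normal(128)])
matches = ratio_test_matches(queries, db)
```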

Speaker: Eric Brachmann, TU Dresden

Title: Scene Coordinate Regression: From Random Forests to End-to-End Learning

Abstract: For decades, estimation of accurate 6D camera poses relied on hand-crafted sparse feature pipelines and geometric processing. Motivated by recent successes, some authors have asked whether camera localization can be cast as a learning problem. Despite some success, the accuracy of unconstrained CNN architectures trained for this task is still inferior to that of traditional approaches. In this talk, we discuss an alternative line of research, which tries to combine geometric processing with constrained machine learning in the form of scene coordinate regression. We discuss how random forests or CNNs can be trained to substitute for sparse feature detection and matching. Furthermore, we show how to train camera localization pipelines end-to-end using a novel, differentiable formulation of RANSAC. We will close the talk with some thoughts on open problems in learning camera localization.
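The obstacle to training RANSAC end-to-end is its hard argmax over hypothesis scores. A minimal sketch of a probabilistic alternative, in the spirit of differentiable RANSAC, is to select hypotheses with softmax probabilities over their scores and minimize the expected task loss, which is smooth in the scores. The function name, the scaling `alpha`, and the numbers are assumptions for illustration.

```python
import numpy as np

def dsac_expected_loss(scores, losses, alpha=1.0):
    """Hypothetical helper: choose hypothesis h with probability
    softmax(alpha * score_h) instead of the argmax, and return the expected
    task loss together with the selection probabilities."""
    z = alpha * np.asarray(scores, dtype=float)
    p = np.exp(z - z.max())   # subtract max for numerical stability
    p /= p.sum()
    return float(p @ np.asarray(losses, dtype=float)), p

scores = [2.0, 5.0, 1.0]   # e.g., inlier counts of three hypotheses (hypothetical)
losses = [0.8, 0.1, 1.5]   # task loss of each hypothesis (hypothetical)
expected, probs = dsac_expected_loss(scores, losses)
```

The gradient of the expected loss with respect to score h is alpha * p_h * (loss_h - expected), so training pushes up the scores of hypotheses whose loss is below average; as alpha grows, the selection approaches the hard argmax.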

Speaker: Ignacio Rocco, Inria Paris

Title: Convolutional neural network architecture for geometric matching

Abstract: We address the problem of determining correspondences between two images in agreement with a geometric model, such as an affine or thin-plate spline transformation, and estimating its parameters. The contributions of this work are threefold. First, we propose a convolutional neural network architecture for geometric matching. The architecture is based on three main components that mimic the standard steps of feature extraction, matching, and simultaneous inlier detection and model parameter estimation, while being trainable end-to-end. Second, we demonstrate that the network parameters can be trained from synthetically generated imagery without the need for manual annotation, and that our matching layer significantly improves generalization to previously unseen images. Finally, we show that the same model can perform both instance-level and category-level matching, giving state-of-the-art results on the challenging Proposal Flow dataset.
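The matching step of such an architecture can be sketched as a correlation layer that compares every descriptor of one dense feature map with every descriptor of the other. This is a generic, hypothetical sketch using L2-normalized dot products; the exact normalization used in the paper's matching layer may differ.

```python
import numpy as np

def correlation_layer(feat_a, feat_b):
    """Hypothetical helper: all-pairs matching scores between two dense
    feature maps of shape H x W x C. Each descriptor is L2-normalized, then
    every position of A is compared with every position of B.
    Output shape: (H*W) x (H*W)."""
    def l2_normalize(f):
        f = f.reshape(-1, f.shape[-1])
        return f / np.linalg.norm(f, axis=1, keepdims=True)
    return l2_normalize(feat_a) @ l2_normalize(feat_b).T

rng = np.random.default_rng(0)
fa = rng.standard_normal((4, 5, 16))
fb = 2.0 * fa                 # same features at a different scale
C = correlation_layer(fa, fb)
```

The resulting score volume is what the subsequent layers consume to regress the parameters of the geometric transformation.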

#### Seminar No. 2

Joe Kileel (Princeton): **Using Computational Algebra for Computer Vision**, Wednesday 2017-06-07 at 15:00, CIIRC Seminar Room B-670, floor 6 of Building B

**Abstract**

Scene reconstruction is a fundamental task in computer vision: given multiple images from different angles, create a 3D model of a world scene. Nowadays, self-driving cars need to perform 3D reconstruction in real time to navigate their surroundings. Large-scale photo-tourism is also a popular application. In this talk, we will explain how key subroutines in reconstruction algorithms amount to solving polynomial systems with special geometric structure. We will answer a question of Sameer Agarwal (Google Research) about recovering the motion of two calibrated cameras. Next, we will quantify the “algebraic complexity” of polynomial systems arising from three calibrated cameras. In terms of multi-view geometry, we deal with essential matrices and trifocal tensors. The first part applies tools like resultants from algebra, while the second part will offer an introduction to numerical homotopy continuation methods. Those wondering “if algebraic geometry is good for anything practical” are especially encouraged to attend.
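As a small example of the algebra involved: every essential matrix E = [t]x R satisfies det(E) = 0 and the cubic matrix equation 2 E E^T E - tr(E E^T) E = 0. The sketch below checks these polynomial equations numerically for a synthetic calibrated motion. It only illustrates the constraints, not the Chow-form or trifocal results of the talk, and the helper names are hypothetical.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]])

def essential_constraints(E):
    """Hypothetical helper: residuals of det(E) = 0 and of the cubic
    constraint 2 E E^T E - tr(E E^T) E = 0."""
    EEt = E @ E.T
    return np.linalg.det(E), 2.0 * EEt @ E - np.trace(EEt) * E

# A calibrated two-view motion: rotation about the z-axis plus a translation.
th = 0.3
R = np.array([[np.cos(th), -np.sin(th), 0.0],
              [np.sin(th), np.cos(th), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([1.0, 0.2, -0.5])
E = skew(t) @ R
det_E, cubic_residual = essential_constraints(E)
```

Both residuals vanish up to rounding for `E`, while a generic 3 × 3 matrix such as the identity violates them.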

**References**

[1] G. Floystad, J. Kileel, G. Ottaviani: “The Chow form of the essential variety in computer vision,” J. Symbolic Comput., to appear. [https://arxiv.org/pdf/1604.04372]

[2] J. Kileel: “Minimal problems for the calibrated trifocal variety,” SIAM Appl. Alg. Geom., to appear. [https://arxiv.org/pdf/1611.05947]

#### Seminar No. 1

Torsten Sattler (ETH Zurich): **Camera Localization**, Thursday 2017-05-11 at 11:00, CIIRC Lecture Hall A1001 of Building A (Jugoslavskych partyzanu 3)

**Abstract**

Estimating the position and orientation of a camera in a scene based on images is an essential part of many (3D) Computer Vision and Robotics algorithms, such as Structure-from-Motion, Simultaneous Localization and Mapping (SLAM), and visual localization. Camera localization has applications in navigation for autonomous vehicles/robots, Augmented and Virtual Reality, and 3D mapping. Furthermore, there are strong relations to camera calibration and visual place recognition. In this talk, I will give an overview of past and current efforts on robust, efficient, and accurate camera localization. I will begin by showing that classical localization approaches have not been made obsolete by deep learning. Following a local feature-based approach, the talk will then discuss how to adapt such methods for real-time visual localization on mobile devices with limited computational capabilities, as well as approaches that scale to large (city-scale) scenes, including the challenges encountered at large scale. The final part of the talk will discuss open problems in the areas of camera localization and 3D mapping, both in terms of problems we are currently working on and interesting long-term goals.

**Short bio**:

Torsten Sattler received a PhD in Computer Science from RWTH Aachen University, Germany, in 2013 under the supervision of Prof. Bastian Leibe and Prof. Leif Kobbelt. In December 2013, he joined the Computer Vision and Geometry Group of Prof. Marc Pollefeys at ETH Zurich, Switzerland, where he is currently a senior researcher and Marc Pollefeys’ deputy while Prof. Pollefeys is on leave from ETH. His research interests include (large-scale) image-based localization using Structure-from-Motion point clouds, real-time localization and SLAM on mobile devices and for robotics, 3D mapping, Augmented & Virtual Reality, (multi-view) stereo, image retrieval and efficient spatial verification, camera calibration and pose estimation. Torsten has worked on dense sensing for self-driving cars as part of the V-Charge project. He is currently involved in enabling semantic SLAM and re-localization for gardening robots (as part of an EU Horizon 2020 project in which he leads the efforts on a work package), in research for Google’s Tango project, where he leads CVG’s research efforts, and in work on self-driving cars.