Research Publications Affiliation

Paulo Fabiano Urnau Gotardo

Paulo Gotardo is a postdoctoral researcher with Disney Research, Pittsburgh. His research interests include machine vision and learning, with a focus on the 3D reconstruction and modeling of dynamic scenes, and human-computer interaction. He received his BSc and MSc degrees in Informatics from Universidade Federal do Paraná, Brazil, in 2000 and 2002, and his PhD degree in Electrical and Computer Engineering from The Ohio State University (OSU) in 2010. During his graduate studies, he was a recipient of distinguished fellowships from The Brazilian Ministry of Education and The Fulbright Committee. Prior to joining Disney Research, he was a graduate research associate with OSU's Advanced Computing Center for the Arts and Design (ACCAD) and a postdoctoral researcher with the Computational Biology and Cognitive Science Lab (CBCSL) at OSU.

Lattes C.V. (in Portuguese)


As of October 2012, I have taken a postdoc position with Disney Research, Pittsburgh. I can still be reached at


Learning Spatially-Smooth Mappings in Non-Rigid Structure from Motion (ECCV 2012)

Non-rigid structure from motion (NRSFM) is a classical underconstrained problem in computer vision. A common approach to make NRSFM more tractable is to constrain 3D shape deformation to be smooth over time. This constraint has been used to compress the deformation model and reduce the number of unknowns that are estimated. However, temporal smoothness cannot be enforced when the data lacks temporal ordering and its benefits are less evident when objects undergo abrupt deformations. This paper proposes a new NRSFM method that addresses these problems by considering deformations as spatial variations in shape space and then enforcing spatial, rather than temporal, smoothness. This is done by modeling each 3D shape coefficient as a function of its input 2D shape. This mapping is learned in the feature space of a rotation invariant kernel, where spatial smoothness is intrinsically defined by the mapping function. As a result, our model represents shape variations compactly using a custom-built coefficient basis B learned from the input data, rather than a pre-specified basis such as the Discrete Cosine Transform. The resulting kernel-based mapping is a by-product of the NRSFM solution and leads to another fundamental advantage of our approach: for a newly observed 2D shape, its 3D shape is recovered by simply evaluating the learned function (watch the supplementary video).

NRSFM with RIKs is a generic new approach that can make use of customized RIKs to build mappings that even exploit correlations between object appearance and 3D shape. Our approach can potentially combine the functionalities of NRSFM and 3D active appearance models with RIKs (Hamsici & Martinez, ICCV 2009): while NRSFM is seen as the training stage, "testing" corresponds to the evaluation of the learned mapping with a previously unseen 2D shape. These new capabilities allow for learning deformable models in a studio, reliably (e.g., with known camera positions), to reconstruct the 3D shapes of objects observed elsewhere.
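To make the mapping idea above concrete, here is a minimal sketch of kernel ridge regression from 2D shapes to 3D shape coefficients. The kernel below, an RBF on Gram matrices X Xᵀ (which are unchanged by in-plane rotation), is a generic illustrative stand-in for the RIKs of Hamsici & Martinez, and all function names, the regularizer `lam`, and the bandwidth `sigma` are assumptions, not the paper's implementation:

```python
import numpy as np

def rik(X, Y, sigma=1.0):
    """Toy rotation-invariant kernel: compares 2D shapes (n x 2 arrays)
    via their Gram matrices, which are invariant to in-plane rotation."""
    Gx, Gy = X @ X.T, Y @ Y.T
    return np.exp(-np.linalg.norm(Gx - Gy, 'fro')**2 / (2 * sigma**2))

def fit_coefficient_map(shapes_2d, coeffs, lam=1e-3, sigma=1.0):
    """Kernel ridge regression from observed 2D shapes to their
    3D shape coefficients (one row of `coeffs` per training shape)."""
    coeffs = np.asarray(coeffs)
    F = len(shapes_2d)
    K = np.array([[rik(shapes_2d[i], shapes_2d[j], sigma)
                   for j in range(F)] for i in range(F)])
    alpha = np.linalg.solve(K + lam * np.eye(F), coeffs)
    return alpha, K

def predict_coeffs(new_shape, shapes_2d, alpha, sigma=1.0):
    """Recover the coefficients of a newly observed 2D shape by simply
    evaluating the learned mapping, as described above."""
    k = np.array([rik(new_shape, s, sigma) for s in shapes_2d])
    return k @ alpha
```

The key property being illustrated is that `rik(X, X @ R.T)` equals `rik(X, X)` for any rotation `R`, so the learned function does not need to disentangle camera rotation from deformation.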



Kernel Non-Rigid Structure from Motion (ICCV 2011)

Non-rigid structure from motion (NRSFM) is a difficult, underconstrained problem in computer vision. The standard approach in NRSFM constrains 3D shape deformation using a linear combination of K basis shapes; the solution is then obtained as the low-rank factorization of an input observation matrix. An important but overlooked problem with this approach is that non-linear deformations are often observed; these deformations lead to a weakened low-rank constraint due to the need to use additional basis shapes to linearly model points that move along curves.

We demonstrate how the kernel trick can be applied in standard NRSFM. This approach is flexible and can use different kernels to build different non-linear models. Using the kernel trick, our model complements the low-rank constraint by capturing non-linear relationships in the shape coefficients of the linear model. The net effect can be seen as using non-linear dimensionality reduction to further compress the (shape) space of possible solutions.
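The "non-linear dimensionality reduction of the shape space" intuition can be illustrated with a plain kernel PCA embedding of the shape coefficients. This is a generic sketch, not the paper's formulation: the RBF kernel, `gamma`, and the function name are all illustrative assumptions.

```python
import numpy as np

def kernel_pca(C, n_components=2, gamma=1.0):
    """Kernel PCA on shape coefficients C (F frames x K coefficients).
    The RBF kernel here is a generic stand-in for the paper's kernels."""
    sq = np.sum(C**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * C @ C.T))
    F = K.shape[0]
    one = np.ones((F, F)) / F
    Kc = K - one @ K - K @ one + one @ K @ one  # center in feature space
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]  # top eigenpairs
    vals, vecs = vals[idx], vecs[:, idx]
    return vecs * np.sqrt(np.maximum(vals, 0))   # embedded coordinates
```

Points that move along a curve in the linear shape space (e.g., coefficients tracing a circle) map to a much lower-dimensional structure in the kernel embedding, which is the compression effect described above.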

The top-right figure illustrates how the deformation of a 3D shape (walking person) is modeled as the smooth time-trajectory of a point in a 2-dimensional shape space. The 3D reconstruction of each observed 2D shape is the output of a non-linear mapping of points along this shape trajectory.

See the ICCV poster. To download the source code and the supplementary video of results, please scroll down to the Publications section.


Non-Rigid Structure from Motion with Complementary Rank-3 Spaces (CVPR 2011)

Video presentation of our CVPR 2011 paper. Download the original video, source code, and supplementary results here!

Analysis of 3D Facial Expressions in ASL Video

An application of computer vision is in the interpretation of the facial expressions of sign languages from video. Here, the reconstruction of 3D faces from video is formulated as the problem of non-rigid structure from motion with occlusion. This video shows the results for a face close-up sequence of an American Sign Language (ASL) sentence. Note that head rotation and hand gesticulation often cause the occlusion of facial features. Facial landmarks were manually annotated in each image when visible. These markings contain small-magnitude noise due to annotation errors caused by partial occlusion of facial features and motion blur in the video images. The results show that, even when a hand occludes the mouth, our algorithm provides a correct estimate for the occluded 3D shape by enforcing smoothness of deformation while modeling the visible shapes in adjacent images.

Structure from Motion (PAMI 2011)

We derived a family of efficient methods that estimate the column space of a matrix using compact parameterizations in the Discrete Cosine Transform (DCT) domain. Our methods tolerate high percentages of missing data and incorporate new models for the smooth time-trajectories of 2D points, affine and weak-perspective cameras, and 3D deformable shape. We solve a rigid structure from motion (SFM) problem by estimating the smooth time-trajectory of a single camera moving around the structure of interest. By considering a weak-perspective camera model from the outset, we directly compute Euclidean 3D shape reconstructions without requiring post-processing steps such as Euclidean upgrade and bundle adjustment. Our results on real SFM datasets with high percentages of missing data compared favorably against those of a number of methods in the computer vision literature.
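The core idea can be sketched for a single 1D point track: parameterize the smooth trajectory by a few DCT coefficients and fit them by least squares over the observed frames only. The function names and the rank `d` below are illustrative assumptions, and this toy omits the camera models and factorization structure of the actual method:

```python
import numpy as np

def dct_basis(F, d):
    """First d columns of the orthonormal DCT-II basis for F frames,
    i.e. the compact trajectory parameterization: B.T @ B = I."""
    n = np.arange(F)
    B = np.cos(np.pi * (2 * n[:, None] + 1) * np.arange(d)[None, :] / (2 * F))
    B *= np.sqrt(2.0 / F)
    B[:, 0] /= np.sqrt(2)
    return B  # (F, d)

def fit_smooth_track(track, mask, d=5):
    """Fit DCT coefficients to the observed entries only (mask=True),
    then evaluate the basis over all frames to fill the missing data."""
    B = dct_basis(len(track), d)
    c, *_ = np.linalg.lstsq(B[mask], track[mask], rcond=None)
    return B @ c  # completed, smooth trajectory
```

Because only `d` coefficients are estimated per trajectory, a track observed in just a handful of frames can still be completed, which is why the parameterization tolerates such high percentages of missing data.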

The 3D Euclidean shape on the right was reconstructed from the dinosaur dataset. A total of 4983 two-dimensional points were automatically tracked in 36 images, including 2300 points tracked in only two images. Due to tracking errors and self-occlusion, 90.8% of the input point tracks are missing. The result on the right also indicates the presence of outliers (gross tracking errors) in the input data.


Non-rigid Structure from Motion (PAMI 2011)

In non-rigid SFM, we assume that the structure deforms only smoothly over time. We propose a novel 3D shape trajectory approach that solves for the deformable structure as the smooth time-trajectory of a single point in a linear shape space. A key result shows that, compared to state-of-the-art algorithms (see video), our non-rigid SFM method can better model complex articulated deformation with higher frequency DCT components while still maintaining the low-rank factorization constraint. Finally, we also offer an approach for non-rigid SFM with missing data.

The results shown here were obtained from 2D (XY) projections of 3D motion capture datasets. Reconstructed and original 3D shapes are overlaid for comparison. The top row shows the result obtained with the current state-of-the-art algorithm in the computer vision literature. The bottom row has our new result.

An Integrated 3D Space -- a.k.a. "In the days before the Kinect" (SIGGRAPH 2010)

This project explores the use of real-time computer vision techniques and a pair of standard computer cameras to provide 3D human body awareness in an inexpensive, immersive environment system. The goal is to enhance the user experience of immersion in a virtual scene that is displayed by a 3D screen. We combine stereo vision and stereo projection to allow for both the user and the virtual scene to become aware of each other's 3D presence as part of a single, integrated 3D space.

We focus on enabling authoring applications based on the direct manipulation of virtual objects, with users interacting from a first-person perspective. This emphasis contrasts with the avatar-based, mostly reactive focus often employed in the design of computer game interfaces.

No markers or other special user-borne equipment is required. The user's presence in front of the projection screen is automatically detected, and head tracking dynamically updates the camera frustum, adjusting the 3D output. This capability significantly enhances the experience of immersion, as it allows 3D objects to be seen from different points of view as the user moves sideways or closer to and farther from the screen. Hand detection provides a means for the user to directly interact with virtual objects by touching and moving them in the scene. The above capabilities are achieved without the need for the user to learn to operate complex trackers or control devices, allowing passers-by to immediately begin interacting with applications.
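The head-coupled frustum update can be sketched as computing an off-axis (asymmetric) frustum from the tracked eye position. This is a generic head-coupled-perspective sketch, not the project's actual code; the screen-centred coordinate convention and function name are assumptions:

```python
import numpy as np

def head_coupled_frustum(eye, screen_w, screen_h, near):
    """Off-axis frustum bounds (left, right, bottom, top at the near
    plane) for a viewer at `eye`, in coordinates where the display spans
    [-w/2, w/2] x [-h/2, h/2] at z = 0 and the viewer looks down -z
    from positive z. Suitable for a glFrustum-style projection."""
    ex, ey, ez = eye            # ez > 0: viewer distance from the screen
    scale = near / ez           # project screen edges onto the near plane
    left   = (-screen_w / 2 - ex) * scale
    right  = ( screen_w / 2 - ex) * scale
    bottom = (-screen_h / 2 - ey) * scale
    top    = ( screen_h / 2 - ey) * scale
    return left, right, bottom, top
```

When the viewer is centered, the frustum is symmetric; as the head moves sideways, the frustum skews in the opposite direction so that on-screen objects appear fixed in the room, which is the motion-parallax effect described above.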

Our initial prototype performed stereo imaging using a pair of inexpensive USB cameras (see next video below). Current versions use Microsoft's Kinect sensor to provide more accurate depth maps for the estimation of 3D body pose.

Real-time stereo vision in HCI

This project explores the use of stereoscopic vision and image understanding techniques, combined with inexpensive video cameras, to provide 3D full-body gesture data as input to an immersive environment system. The goal is to help enhance the experience of immersion by developing software tools that add human body awareness to the virtual environment displayed by a stereoscopic 3D screen. We seek a simple and effective means of interaction independent of controllers, tracking markers, or other user-borne devices; only polarized glasses are required for the perception of the 3D output. Our aim is also to develop low-cost, accessible, forward-looking solutions that give interaction designers the means to prototype ideas easily, in anticipation of future releases of similar technology in mainstream HCI applications.

Silhouette segmentation and skeletonization

This module performs two-dimensional body pose extraction and interpretation in monocular video with background infra-red illumination, providing body pose information for applications with large projection screens in dark rooms. The digit displayed on each hand indicates the number of fingers identified (intended for use with closer views of the hands).

Segmentation of planar and quadric surface patches in range images (TSMC-B 2004)

Our range image segmentation method employs a novel robust estimator to iteratively detect and extract distinct planar and quadric surface patches. Our robust estimator extends M-estimator Sample Consensus/Random Sample Consensus (MSAC/RANSAC) to use local surface orientation, enhancing the accuracy of inlier/outlier classification when processing noisy range data describing multiple structures. An efficient approximation to the true geometric distance between a point and a quadric surface also contributes to effectively reject weak surface hypotheses and avoid the extraction of false surface components. Additionally, a genetic algorithm was specifically designed to accelerate the optimization process of surface extraction, while avoiding premature convergence. The segmentation algorithm was applied to three real range image databases and competes favorably against eleven other segmenters using the most popular evaluation framework in the literature.
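A minimal sketch of the MSAC-style hypothesize-and-verify loop with the normal-consistency idea described above, restricted to planes for brevity. The thresholds, function names, and the synthetic-data conventions are illustrative assumptions, and the genetic-algorithm acceleration and quadric distance approximation are omitted:

```python
import numpy as np

def fit_plane(pts):
    """Total least-squares plane through 3+ points: (unit normal, d)
    with n . x + d = 0."""
    c = pts.mean(axis=0)
    _, _, Vt = np.linalg.svd(pts - c)
    n = Vt[-1]
    return n, -n @ c

def msac_plane(points, normals, n_iters=200, dist_tol=0.02, ang_tol=0.3,
               rng=None):
    """MSAC-style robust plane detection that scores hypotheses with both
    point-to-plane distance and agreement of local surface normals."""
    rng = rng or np.random.default_rng(0)
    best_cost, best = np.inf, None
    for _ in range(n_iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        n, d = fit_plane(sample)
        dist = np.abs(points @ n + d)
        ang = np.arccos(np.clip(np.abs(normals @ n), 0.0, 1.0))
        inlier = (dist < dist_tol) & (ang < ang_tol)
        cost = np.where(inlier, dist**2, dist_tol**2).sum()  # MSAC cost
        if cost < best_cost:
            best_cost, best = cost, (n, d, inlier)
    return best
```

Unlike plain RANSAC's inlier count, the MSAC cost penalizes inliers by their residual, and the extra normal test rejects hypotheses that happen to pass near points belonging to a differently oriented surface.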

Detection of ventricular dyssynchrony from cardiac MRI (CVPR 2006)

Intra-ventricular dyssynchrony (IVD) in the left ventricle (LV), the asynchronous activation of the LV walls, has been identified as a novel target for therapy in heart failure patients. Current guidelines for resynchronization therapy rely on measures that do not reliably predict successful patient response to treatment, in part due to poor characterization of IVD. We present a two-class statistical pattern recognition approach for the detection of IVD in the LV from routinely acquired MRI sequences depicting complete cardiac cycles. First, the LV endocardial and epicardial boundaries were extracted from a number of studies, including dyssynchronous and nondyssynchronous LVs. A pose normalization procedure was then applied to align the resulting spatio-temporal characterizations of LV wall motion, before training a classifier using Principal Component Analysis plus Linear Discriminant Analysis.
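The PCA-plus-LDA classification stage can be sketched as follows, assuming the pose-normalized wall-motion characterizations are already stacked into one feature row per study. The data, dimensions, and function names below are synthetic and illustrative, not the study's:

```python
import numpy as np

def pca_lda_train(X, y, n_pcs=10):
    """PCA for dimensionality reduction followed by two-class Fisher
    LDA. X: (samples x features), y: binary labels in {0, 1}."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    W = Vt[:n_pcs].T                       # PCA projection matrix
    Z = (X - mu) @ W
    m0, m1 = Z[y == 0].mean(0), Z[y == 1].mean(0)
    Sw = np.cov(Z[y == 0].T) + np.cov(Z[y == 1].T)  # within-class scatter
    w = np.linalg.solve(Sw, m1 - m0)       # Fisher discriminant direction
    thresh = w @ (m0 + m1) / 2             # midpoint decision threshold
    return mu, W, w, thresh

def pca_lda_predict(X, mu, W, w, thresh):
    return (((X - mu) @ W) @ w > thresh).astype(int)
```

PCA first removes directions with negligible variance so that the within-class scatter matrix is well conditioned; LDA then finds the single direction that best separates the two classes.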

Fourier Hands: computer vision class project (autumn quarter, 2004) -- MATLAB code

The goal of this project was to develop an interface to control a very simple game using hand gestures captured by a regular webcam. The software was implemented in Matlab, and the two main image processing techniques are (1) background subtraction and (2) Fourier shape descriptors. In the beginning of the video to the right, the user is asked to specify the hand gestures used to pause, resume, and quit the game, and to fire or navigate the "space ship" in the upper-right corner (the small "asteroids" are not visible :-).
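The Fourier shape descriptor step can be sketched in a few lines (shown here in Python rather than the project's MATLAB; the normalization choices below are one common variant, not necessarily the ones used in the class project):

```python
import numpy as np

def fourier_descriptors(contour, n_desc=10):
    """Fourier shape descriptors of a closed 2D contour (n x 2 array of
    boundary points), normalized for invariance to translation, scale,
    rotation, and starting point."""
    z = contour[:, 0] + 1j * contour[:, 1]  # complex boundary signal
    F = np.fft.fft(z)
    F[0] = 0                  # drop DC term -> translation invariance
    mags = np.abs(F)          # magnitudes -> rotation/start-point invariance
    mags /= mags[1]           # divide by first harmonic -> scale invariance
    return mags[1:n_desc + 1]
```

After background subtraction isolates the hand silhouette, its boundary is traced and summarized by these few coefficients, so gestures can be matched by simple nearest-neighbor comparison of short descriptor vectors.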


Past Affiliation

Oct 2010 - Aug 2012
Computational Biology and Cognitive Science Laboratory (CBCSL)
Department of Electrical and Computer Engineering, OSU - USA
Postdoctoral Researcher

Jul 2008 - Sep 2010
Computational Biology and Cognitive Science Laboratory (CBCSL)
Department of Electrical and Computer Engineering, OSU - USA
Graduate Student

Oct 2008 - Jun 2010
The Advanced Computing Center for the Arts and Design (ACCAD)
The College of The Arts, OSU - USA
Graduate Research Associate

Sep 2007 - Jun 2008
CMR/CT Laboratory
Division of Cardiovascular Medicine, The OSU Medical Center - USA
Graduate Research Associate

Sep 2003 - Aug 2007
Signal Analysis and Machine Perception Laboratory (SAMPL)
Department of Electrical and Computer Engineering, OSU - USA
Graduate Student

Mar 2003 - Jun 2003
Departamento de Informática, UFPR - Brasil

Feb 2003 - Mar 2003
OPET Centro Tecnológico, Curitiba (PR) - Brasil

Nov 1999 - Aug 2003
IMAGO - Computer Vision, Graphics, and Image Processing Research Group
Departamento de Informática, UFPR - Brasil
Undergraduate/Graduate Student

Aug 1998 - Dec 2000
Programa Especial de Treinamento (PET / CAPES)
Departamento de Informática, UFPR - Brasil
Undergraduate Student

Contact Information

Department of Electrical and Computer Engineering
205 Dreese Laboratory
2015 Neil Avenue
The Ohio State University
Columbus, OH 43210-1272



TBDBITL: Video Games

The Amazing Iguassu Falls

Iguassu Falls (main site)

Many views of Iguassu Falls (in my home town)

T.A.B.C.A.T. - Capoeira angola in Columbus, OH

Another capoeira video
