Otávio Penatti - Personal Web Page

Otávio Augusto Bizetto Penatti

Hi! I'm a research senior manager in Artificial Intelligence at Samsung R&D Institute Brazil, and this is my personal web page. On this page, you can find news and materials related to me and my research.
You can also find more information about me at the links below.

LinkedIn Google Scholar GitHub Curriculum Lattes

Some news:

[May, 2023] 2 papers accepted at EUSIPCO (European Signal Processing Conference)

[Apr, 2023] Paper accepted at QoMEX (International Conference on Quality of Multimedia Experience): "Photoplethysmogram Signal Quality Assessment via 1D-to-2D Projections and Vision Transformers"

[Mar, 2023] Patent granted in USPTO: "Method for generating an adaptive multiplane image from a single high-resolution image"

[Feb, 2023] Patent granted in INPI (Brazilian patent office): "Method for multiclass classification in open-set scenarios and uses thereof" (previously granted in USPTO in 2018)

[Feb, 2023] Paper accepted at IEEE Journal of Biomedical and Health Informatics (JBHI)! "Learning to estimate heart rate from accelerometer and user's demographics during physical exercises"

[Feb, 2023] Paper accepted at IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023)! "Towards Low-Power Heart Rate Estimation Based on User's Demographics and Activity Level For Wearables"

[Nov, 2022] Paper published at the Brazilian Congress on Health Informatics (CBIS): "Signal Quality Assessment of Photoplethysmogram Signals Using Hybrid Rule- and Learning-Based Models" (later published in the Journal of Health Informatics)

[Jun, 2022] Patents granted in USPTO: Two patents filed in 2018 and in 2020 were just granted in USPTO!

[Feb, 2022] Personal web page restored: I restored my personal web page after some months offline and applied some small adjustments.

[Nov, 2021] 2021 Employee of the Year Award: I feel very honored to be one of the two employees from all Samsung global R&D centers who received this recognition in 2021.

[Oct, 2021] Paper published at BMVC 2021: Learning multiplane images from single views with self-supervision. Here is the PDF and the video.

[Jun, 2021] Paper presentation at a CVPR 2021 Workshop: Our paper published at WACV 2021 will be presented in the Learning to Generate 3D Shapes and Scenes workshop at CVPR on June 25th.

[Nov, 2020] Paper accepted at WACV 2021: "Adaptive Multiplane Image Generation From a Single Internet Picture"

[Sep, 2020] Paper accepted at SIBGRAPI 2020: "A comparison of graph-based semi-supervised learning for data augmentation" with colleagues from Unifesp (São José dos Campos).

[Aug, 2020] Code release on GitHub: The source code of the Eva tool was released on my GitHub page. This tool was developed during my master's degree.

[July, 2020] Top cited paper [1]: A paper that I published in the Pattern Recognition journal in 2017 in collaboration with colleagues from UFMG currently ranks in the top 3 most cited papers of the journal, according to the updated Google Scholar metrics for papers published from 2015 to 2019.

[July, 2020] Top cited paper [2]: A paper that I published at CVPR Workshops in 2015 in collaboration with colleagues from UFMG currently ranks in the top 6 most cited papers of CVPR Workshops, according to the updated Google Scholar metrics for papers published from 2015 to 2019.

[May, 2020] Code release on GitHub [1]: Code for all the descriptors that I used during my master's studies was released on my GitHub page in the libdescriptors repository. These descriptors were evaluated theoretically and experimentally in the paper "Comparative study of descriptors for content-based image retrieval on the Web".

[May, 2020] Code release on GitHub [2]: Code of the visual-word spatial arrangement technique (WSA) was released on my GitHub page in the wsa repository. WSA is presented in the paper "Visual word spatial arrangement for image retrieval and classification".

[June, 2019] Most cited paper: A paper published with colleagues from UFMG (Keiller Nogueira and Jefersson dos Santos) is currently the most cited paper of Elsevier Pattern Recognition journal, considering the articles published since 2016: "Towards better exploiting convolutional neural networks for remote sensing scene classification".

[October, 2018] Best Paper Award at SIBGRAPI 2018: One of my papers at SIBGRAPI 2018 was awarded as the best paper in the category of Image Processing/Computer Vision/Pattern Recognition. The paper "Bag of Attributes for Video Event Retrieval" was written in collaboration with colleagues from Unifesp (São José dos Campos).

[September, 2018] Patent granted on USPTO: "Method for multiclass classification in open-set scenarios and uses thereof". Patent 14/532,580

[September, 2018] Paper published in IEEE Geoscience and Remote Sensing Letters (GRSL): "Exploiting ConvNet Diversity for Flooding Identification". The paper is an extension of the work presented in the Flood Detection in Satellite Images (FDSI) task of the MediaEval 2017 challenge.

[August, 2018] 2 papers accepted at SIBGRAPI 2018: "Object-based Temporal Segment Relational Network for Activity Recognition" with colleagues from UFMG and "Bag of Attributes for Video Event Retrieval" with colleagues from Unifesp (São José dos Campos).

[August, 2018] 5th most cited paper: Paper from CVPRW 2015 is currently the 5th most cited paper among all CVPRW papers according to Google Scholar. The paper is about deep features generalization from everyday objects to remote sensing applications.

[September, 2017] Best results for Flood Detection in Satellite Images (FDSI) in the MediaEval challenge of 2017: In collaboration with colleagues from UFMG, Unicamp, UEFS and Samsung Brazil, we achieved good results in the two sub-tasks of the Satellite task of MediaEval 2017. For Flood Detection (FDSI), we got the 1st place. For Disaster Image Retrieval from Social Media (DIRSM), we obtained the highest average precision (AP@480) in two runs (textual only and textual+visual). Our working notes paper explains the approaches used.

[August, 2017] Paper accepted at Elsevier Pattern Recognition Letters journal: "TWM: A framework for creating highly compressible videos targeted to computer vision tasks". Paper in collaboration with F. Andaló and V. Testoni. The method in this paper is related to US Patent 9,699,476.

[July, 2017] Patent granted on USPTO: "System and method for video context-based composition and compression from normalized spatial resolution objects". Patent 9,699,476.

[June, 2017] Paper accepted at Elsevier Future Generation Computer Systems journal: "Kuaa: a unified framework for design, deployment, execution, and recommendation of machine learning experiments", presenting a framework for automating machine learning experiments, including a tool for recommendation of experimental setups. Paper in collaboration with colleagues from University of Campinas. Code available on GitHub.

[November, 2016] Paper published in the Springer Machine Learning journal: "Nearest neighbors distance ratio open-set classifier", which proposes a new multiclass open-set classifier that is robust to classes that are unknown at training time but appear during testing (open-set scenario).

[July, 2016] Paper accepted at the Pattern Recognition journal: "Towards Better Exploiting Convolutional Neural Networks for Remote Sensing Scene Classification", in collaboration with K. Nogueira and J. A. dos Santos. Paper on Google Scholar.

[July, 2016] 2 papers accepted at SIBGRAPI 2016:
The following papers were accepted at the Conference on Graphics, Patterns and Images (SIBGRAPI), to be held in São José dos Campos, SP, in October 2016:
- Bag of Genres for Video Retrieval, with colleagues L. A. Duarte and J. Almeida, and
- Transmitting What Matters - Task-oriented video composition and compression, with colleagues F. A. Andaló and V. Testoni.

[June, 2016] Unicamp Inventors Awards 2016: The research team of my postdoc project received the Unicamp Inventors Award 2016 in the category of "Licensed Technology" due to the patents filed during the research collaboration between Unicamp and Samsung. The project was about feature engineering and open-set recognition.

[May, 2016] Honored papers 2015: The paper about real-time heart view plane classification of echocardiograms was recognized as an Esteemed Paper of 2015 by the Elsevier Computers in Biology and Medicine journal. This award is given to the top 10-15 papers of the year published in this journal. [Pre-print PDF]

[April, 2016] Paper among the journal's most cited papers: The survey paper published in 2012 is currently the 2nd most cited article in the Journal of Visual Communication and Image Representation among the articles published since 2011. "Comparative study of descriptors for content-based image retrieval on the Web", in collaboration with E. Valle and R. da S. Torres.

[March, 2016] Results of my postdoc project in the news: "Sistemas para classificação de imagens do coração e de amostras" - press Inova Unicamp (Innovation agency from University of Campinas - in Portuguese)

[March, 2016] Paper accepted at the IEEE Transactions on Geoscience and Remote Sensing: "Detection of Fragmented Rectangular Enclosures in Very-High-Resolution Remote Sensing Images", in collaboration with I. Zingman, D. Saupe, and K. Lambers

[Other news about my research in the RECOD's blog]

Awards:

[My awards on LinkedIn]

Academic Projects:

Postdoc (12/2012 - 12/2013) - Unicamp/Samsung:
Pattern recognition and classification by feature engineering, *-fusion, open set recognition and meta-recognition
Principal investigator: Anderson de Rezende Rocha
Awards:
- Unicamp Inventors Award 2016 - category Licensed Technology, for the patents filed during the research collaboration between Unicamp and Samsung.
- Esteemed Paper 2015 - Top 10-15 papers of the year - Mid-level Image Representations for Real-time Heart View Plane Classification of Echocardiograms - Computers in Biology and Medicine, volume 66, p. 66-81, 2015.

PhD (03/2009 - 11/2012):
Exploring visual dictionaries for Web image retrieval
Advisor: Ricardo da Silva Torres
Awards:
- 3rd place - Posters - PhD - Workshop of Thesis and Dissertations 2009 - Institute of Computing - University of Campinas (Unicamp)
- Best Paper Award - 2011 - "Encoding spatial arrangement of visual words" - Iberoamerican Congress on Pattern Recognition (CIARP-IAPR)
Visiting scholar:
- 2011 (June to July) - Cedric/Cnam - Paris, France - Prof. Valerie Gouet-Brunet
- 2012 (July to October) - University of Colorado at Colorado Springs (UCCS), CO, USA - Prof. Terrance E. Boult and Prof. Walter J. Scheirer
Abstract | Thesis (PDF)
Effectively encoding visual properties from multimedia content is challenging. One popular approach to deal with this challenge is the visual dictionary model. In this model, images are handled as an unordered set of local features being represented by the so-called bag-of-(visual-)words vector. In this thesis, we work on three research problems related to the visual dictionary model.
The first research problem is concerned with the generalization power of dictionaries, that is, the ability to represent images from one dataset well even when using a dictionary created from another dataset or from small dataset samples. We perform experiments in closed datasets, as well as in a Web environment. The results suggest that samples that are diverse in appearance are enough to generate a good dictionary.
The second research problem is related to the importance of the spatial information of visual words in the image space, which could be crucial to distinguish types of objects and scenes. The traditional pooling methods usually discard the spatial configuration of visual words in the image. We have proposed a pooling method, named Word Spatial Arrangement (WSA), which encodes the relative position of visual words in the image, having the advantage of generating more compact feature vectors than most of the existing spatial pooling strategies. Experiments for image retrieval show that WSA outperforms the most popular spatial pooling method, the Spatial Pyramids.
The third research problem under investigation in this thesis is related to the lack of semantic information in the visual dictionary model. We show that the problem of having no semantics in the space of low-level descriptions is reduced when we move to the bag-of-words representation. However, even in the bag-of-words space, we show that there is little separability between distance distributions of different semantic concepts. Therefore, we investigate moving one step further and propose a representation based on visual words that carry more semantics, according to human visual perception. We have proposed a bag-of-prototypes model, in which the prototypes are the elements containing more semantics. This approach goes in the direction of reducing the so-called semantic gap problem. We propose a dictionary based on scenes, which is used for video representation in experiments for video geocoding. Video geocoding is the task of assigning a geographic location to a given video. The evaluation was performed in the context of the Placing Task of the MediaEval challenge, and the proposed bag-of-scenes model has shown promising performance.
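As an illustration of the bag-of-(visual-)words vector mentioned in the abstract, here is a minimal sketch, assuming hard assignment of each local feature to its nearest visual word under Euclidean distance. The function names, the toy 2-D features, and the three-word dictionary are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter

def nearest_word(feature, dictionary):
    """Return the index of the visual word closest to a local feature."""
    return min(range(len(dictionary)),
               key=lambda i: math.dist(feature, dictionary[i]))

def bag_of_words(local_features, dictionary):
    """Hard-assign each local feature to a visual word; return a normalized histogram."""
    if not local_features:
        return [0.0] * len(dictionary)
    counts = Counter(nearest_word(f, dictionary) for f in local_features)
    total = len(local_features)
    return [counts.get(i, 0) / total for i in range(len(dictionary))]

# Toy 2-D "descriptors"; real local features usually have many more dimensions.
dictionary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
features = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (4.8, 5.1)]
histogram = bag_of_words(features, dictionary)
print(histogram)
```

Because the histogram ignores where each feature was found in the image, the spatial configuration of the visual words is lost, which is the limitation the second research problem addresses.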

Master's (03/2007 - 03/2009):
Comparative study of descriptors for content-based image retrieval on the Web
Advisor: Ricardo da Silva Torres
Awards:
- JVCI Best Paper Award Runner Up - 2014 - "Comparative study of global color and texture descriptors for web image retrieval" - Journal of Visual Communication and Image Representation, volume 23, number 2, p. 359-380, 2012.
Abstract | Libdescriptors repository | Eva tool | Dissertation (in Portuguese)

The growth in size of image collections and the worldwide availability of these collections has increased the demand for image retrieval systems. A promising approach to address this demand is to retrieve images based on image content (Content-Based Image Retrieval). This approach considers the image visual properties, like color, texture and shape of objects, for indexing and retrieval. The main component of a content-based image retrieval system is the image descriptor. The image descriptor is responsible for encoding image properties into feature vectors. Given two feature vectors, the descriptor compares them and computes a distance value. This value quantifies the difference between the images represented by their vectors. In a content-based image retrieval system, these distance values are used to rank database images with respect to their distance to a given query image.
This dissertation presents a comparative study of image descriptors considering the Web as the environment of use. This environment presents a huge number of images with heterogeneous content. The comparative study was conducted by taking into account two approaches. The first approach considers the asymptotic complexity of feature vector extraction algorithms and distance functions, the size of the feature vectors generated by the descriptors, and the environment where each descriptor was validated. The second approach compares the descriptors in practical experiments using four different image databases. The evaluation considers the time required for feature extraction, the time for computing distance values, the storage requirements, and the effectiveness of each descriptor. Color, texture, and shape descriptors were compared. The experiments were performed with each kind of descriptor independently and, based on these results, a set of descriptors was evaluated in an image database containing more than 230 thousand heterogeneous images, reflecting the content existing on the Web. The evaluation of descriptor effectiveness in the heterogeneous database was carried out through experiments with real users. This dissertation also presents a tool for executing experiments aiming to evaluate image descriptors.
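The notion of a descriptor as a pair (extraction algorithm, distance function) used to rank database images can be sketched as follows. This is only an illustration under stated assumptions: the gray-level histogram, the L1 distance, and the tiny in-memory "database" are hypothetical choices, not the descriptors evaluated in the dissertation or the API of the libdescriptors repository.

```python
def extract(image, bins=4, max_value=256):
    """Toy extraction function: normalized gray-level histogram.

    `image` is a non-empty flat list of integer pixel values in [0, max_value).
    """
    hist = [0] * bins
    for pixel in image:
        hist[pixel * bins // max_value] += 1
    return [count / len(image) for count in hist]

def distance(vector_a, vector_b):
    """Toy distance function: L1 (city-block) distance between feature vectors."""
    return sum(abs(a - b) for a, b in zip(vector_a, vector_b))

def rank(query_image, database):
    """Sort database image names by increasing distance to the query image."""
    query_vector = extract(query_image)
    return sorted(database,
                  key=lambda name: distance(query_vector, extract(database[name])))

database = {
    "dark": [10, 20, 30, 40],
    "bright": [200, 220, 240, 250],
    "mixed": [10, 20, 220, 240],
}
ranking = rank([15, 25, 35, 45], database)
print(ranking)
```

In a real system the feature vectors would be extracted once and stored, which is why the study considers extraction time, distance computation time, and storage requirements separately.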

Undergraduate research project (07/2006 - 12/2006):
Content-based image retrieval using spatial relationship descriptors
Advisor: Ricardo da Silva Torres
Awards:
- 3rd place CTIC 2007 - Undergraduate Research Projects Contest - Brazilian Computer Society (SBC)
- Best Undergraduate Research Project 2006 - Institute of Computing - University of Campinas (Unicamp)
Abstract | Demo

The growth in size of image collections has increased the demand for image retrieval systems. These systems use a great variety of techniques. One of the most important ones is content-based image retrieval (CBIR). CBIR is based on image properties, like color, texture, shape and spatial relationships. The last one can be fundamental for the recognition and retrieval of images, bringing benefits to several applications, such as geographic and medical ones. This work presents a comparative study of spatial relationship descriptors. The experiments compare several descriptors considering efficiency and effectiveness as the evaluation criteria. Also, new spatial relationship descriptors are proposed. The results indicate that the proposed descriptors are superior to the existing ones.

Patents:

[My patents on Google Patents]

Publications:

[My publications on Google Scholar]

Learning multiplane images from single views with self-supervision

CARVALHO, G. S. P. ; LUVIZON, D. C. ; JOIA, A. ; PACHECO, A. G. C. ; PENATTI, O. A. B.

In: British Machine Vision Conference (BMVC), 2021.
Abstract | Paper | Slides & Video (BMVC) | Video (YouTube)
Generating single novel views from an already captured image is a hard task in computer vision and graphics, in particular when the single input image has dynamic parts such as persons or moving objects. In this paper, we tackle this problem by proposing a new framework, called CycleMPI, that is capable of learning a multiplane image representation from single images through a cyclic training strategy for self-supervision. Our framework does not require stereo data for training, therefore it can be trained with massive visual data from the Internet, resulting in a better generalization capability even for very challenging cases. Although our method does not require stereo data for supervision, it reaches results on stereo datasets comparable to the state of the art in a zero-shot scenario. We evaluated our method on the RealEstate10K and Mannequin Challenge datasets for view synthesis, and presented qualitative results on the Places II dataset.

Adaptive Multiplane Image Generation From a Single Internet Picture

LUVIZON, D. C. ; CARVALHO, G. S. P. ; dos Santos, A. A. ; CONCEICAO, J. S. ; FLORES-CAMPANA, J. L. ; DECKER, L. G. ; SOUZA, M. R. ; PEDRINI, H. ; JOIA, A. ; PENATTI, O. A. B.

In: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2021, p. 2556-2565.
Abstract | Paper
In the last few years, several works have tackled the problem of novel view synthesis from a pair of stereo images or even from a single picture. However, previous methods are computationally expensive, especially for high-resolution images. In this paper, we address the problem of generating an efficient multiplane image (MPI) from a single high-resolution picture. We present the adaptive-MPI representation, which allows rendering novel views with low computational requirements. To this end, we propose an adaptive slicing algorithm that produces an MPI with a variable number of image planes. We also present a new lightweight CNN for depth estimation, which is learned by knowledge distillation from a larger network. Occluded regions in the adaptive-MPI are also inpainted by a lightweight CNN. We show that our method is capable of producing high-quality predictions with one order of magnitude fewer parameters than previous approaches. In addition, we show the robustness of our method for novel view synthesis on challenging pictures from the Internet.

A comparison of graph-based semi-supervised learning for data augmentation

OLIVEIRA, W. D. G. ; PENATTI, O. A. B. ; BERTON, L.

In: Conference on Graphics, Patterns, and Images (SIBGRAPI), 2020.
Abstract | Paper | Videos: 8min / 4min
In supervised learning, the algorithm accuracy usually improves with the size of the labeled dataset used for training the classifier. However, in many real-life scenarios, obtaining enough labeled data is costly or even not possible. In such circumstances, Data Augmentation (DA) techniques are usually employed, generating more labeled data for training machine learning algorithms. The common DA techniques are applied to already labeled data, generating simple variations of this data. For example, for image classification, image samples are rotated, cropped, flipped, or otherwise transformed to generate variations of the input image samples while keeping their original labels. Other options are to use neural network algorithms that create new synthetic data or to employ Semi-supervised Learning (SSL), which labels existing unlabeled data. In this paper, we perform a comparison among graph-based semi-supervised learning (GSSL) algorithms to augment the labeled dataset. The main advantage of using GSSL is that we can increase the training set by adding non-annotated images to it; therefore, we can benefit from the huge amount of unlabeled data available. Experiments are performed on five datasets for recognition of handwritten digits and letters (MNIST and EMNIST), animals (Dogs vs Cats), clothes (MNIST-Fashion) and remote sensing images (Brazilian Coffee Scenes), in which we compare different possibilities for DA, including the GSSL, Generative Adversarial Networks (GANs) and traditional Image Transformations (IT) applied on input labeled data. We also evaluated the impact of such techniques on different convolutional neural networks (CNN). Results indicate that, although all DA techniques performed well, GSSL was more robust to different image properties, presenting less accuracy variation across datasets.

Object-based Temporal Segment Relational Network for Activity Recognition

MELO, V. H. C. ; SANTOS, J. B. ; CAETANO, C. ; SENA, J. ; PENATTI, O. A. B. ; SCHWARTZ, W. R.

In: Conference on Graphics, Patterns, and Images (SIBGRAPI), 2018, p. 103-109.
Abstract | Paper
Video understanding is the next frontier of computer vision, in which activity recognition plays a major role. Despite the recent improvements in holistic activity recognition, further researching part-based models such as context may allow us to better understand what is important for activities and thus improve our current activity recognition models. This work tackles contextual cues obtained from object detections, in which we posit that objects relevant to an action are related to its spatial arrangement regarding an agent. Based on that, we propose Egocentric Pyramid to encode such spatial relationships. We further extend it by proposing a data-centric approach named Temporal Segment Relational Network (TSRN). Our experiments give support to the hypothesis that object spatiality provides an important clue to activity recognition. In addition, our data-centric approach shows that besides such spatial features, there may be other important information that further enhances the object-based activity recognition, such as co-occurrence, relative size, and temporal information.

Bag of Attributes for Video Event Retrieval

DUARTE, L. A. ; PENATTI, O. A. B. ; ALMEIDA, J.

In: Conference on Graphics, Patterns, and Images (SIBGRAPI), 2018, p. 447-454.
Abstract (Best Paper Award) | Paper
In this paper, we present the Bag-of-Attributes (BoA) model for video representation aiming at video event retrieval. The BoA model is based on a semantic feature space for representing videos, resulting in high-level video feature vectors. For creating a semantic space, i.e., the attribute space, we can train a classifier using a labeled image dataset, obtaining a classification model that can be understood as a high-level codebook. This model is used to map low-level frame vectors into high-level vectors (e.g., classifier probability scores). Then, we apply pooling operations to the frame vectors to create the final bag of attributes for the video. In the BoA representation, each dimension corresponds to one category (or attribute) of the semantic space. Other interesting properties are: compactness, flexibility regarding the classifier, and ability to encode multiple semantic concepts in a single video representation. Our experiments considered the semantic space created by state-of-the-art convolutional neural networks pre-trained on 1000 object categories of ImageNet. Such deep neural networks were used to classify each video frame and then different coding strategies were used to encode the probability distribution from the softmax layer into a frame vector. Next, different pooling strategies were used to combine frame vectors in the BoA representation for a video. Results using BoA were comparable or superior to the baselines in the task of video event retrieval using the EVVE dataset, with the advantage of providing a much more compact representation.
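The pooling step described above, which combines per-frame classifier scores into a single video vector, can be sketched as follows. This is a minimal illustration, assuming max and average pooling as the pooling operations and three hypothetical categories; the function name and the toy scores are not taken from the paper.

```python
def pool_frames(frame_vectors, mode="max"):
    """Combine per-frame score vectors into one video-level vector.

    Each frame vector holds one classifier score per category (attribute),
    so the pooled vector has one dimension per category as well.
    """
    if not frame_vectors:
        raise ValueError("at least one frame vector is required")
    columns = list(zip(*frame_vectors))  # one tuple of scores per category
    if mode == "max":
        return [max(column) for column in columns]
    if mode == "avg":
        return [sum(column) / len(column) for column in columns]
    raise ValueError(f"unknown pooling mode: {mode}")

# Hypothetical probability scores for 3 categories over 3 video frames.
frames = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
]
video_max = pool_frames(frames, "max")
video_avg = pool_frames(frames, "avg")
print(video_max, video_avg)
```

With max pooling, a category that is strongly activated in even one frame stays visible in the video vector, whereas average pooling favors categories that are present throughout the video.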

Exploiting ConvNet Diversity for Flooding Identification

NOGUEIRA, K. ; FADEL, S. G. ; DOURADO, I. C. ; WERNECK, R. de O. ; MUNOZ, J. A. V. ; PENATTI, O. A. B. ; CALUMBY, R. T. ; LI, L. T. ; SANTOS, J. A. ; TORRES, R. da S.

In: IEEE Geoscience and Remote Sensing Letters, volume 15, issue 9, p. 1446-1450, 2018.
Abstract | Paper
Flooding is the world's most costly type of natural disaster in terms of both economic losses and human casualties. A first and essential procedure towards flood monitoring is based on identifying the area most vulnerable to flooding, which gives authorities relevant regions to focus on. In this work, we propose several methods to perform flooding identification in high-resolution remote sensing images using deep learning. Specifically, some proposed techniques are based upon unique networks, such as dilated and deconvolutional ones, while others were conceived to exploit the diversity of distinct networks in order to extract the maximum performance of each classifier. Evaluation of the proposed methods was conducted on a high-resolution remote sensing dataset. Results show that the proposed algorithms outperformed state-of-the-art baselines, providing improvements ranging from 1 to 4% in terms of the Jaccard Index.
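For reference, the Jaccard Index used as the evaluation measure is the ratio between the intersection and the union of the predicted and reference flooded areas. The sketch below assumes binary masks stored as flat lists, with 1 meaning "flooded"; the function name and the toy masks are hypothetical.

```python
def jaccard_index(predicted, reference):
    """Intersection over union of two binary masks given as flat lists of 0/1."""
    if len(predicted) != len(reference):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, r in zip(predicted, reference) if p and r)
    union = sum(1 for p, r in zip(predicted, reference) if p or r)
    # Two empty masks agree perfectly, so this sketch defines their index as 1.0.
    return intersection / union if union else 1.0

predicted_mask = [1, 1, 0, 0, 1, 0]
reference_mask = [1, 0, 0, 1, 1, 0]
score = jaccard_index(predicted_mask, reference_mask)
print(score)
```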

TWM: A framework for creating highly compressible videos targeted to computer vision tasks

ANDALÓ, F. A. ; PENATTI, O. A. B. ; TESTONI, V.

In: Elsevier Pattern Recognition Letters, volume 114, p. 63-72, 2017.
Abstract | Paper | Patent
We present a simple yet effective framework - Transmitting What Matters (TWM) - to generate highly compressible videos containing only relevant information targeted to specific computer vision tasks, such as faces for the task of face expression recognition, license plates for the task of optical character recognition, among others. TWM takes advantage of the final desired computer vision task to compose video frames only with the necessary data. The video frames are compressed and can be stored or transmitted to powerful servers where extensive and time-consuming tasks are performed. Experiments explore the trade-offs between distortion and bitrate for a wide range of compression levels, and the impact generated by compression artifacts on the accuracy of the desired vision task. We show that, for two computer vision tasks implemented by different methods, it is possible to dramatically reduce the amount of required data to be stored or transmitted, without compromising accuracy. With PSNR_YUV quality of over 41 dB, the bitrate was reduced up to four times, while a detection task was affected by only ~1 pixel and a classification task by 1-2 percentage points.
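As background for the distortion figures quoted above, peak signal-to-noise ratio (PSNR) is computed from the mean squared error between the original and the compressed signal. The sketch below handles a single 8-bit plane given as a flat list; how the paper combines the Y, U, and V planes into PSNR_YUV is not stated in the abstract, so that step is intentionally left out, and the function name and sample values are hypothetical.

```python
import math

def psnr(original, compressed, max_value=255):
    """PSNR in dB between two equally sized planes given as flat lists of samples."""
    if len(original) != len(compressed) or not original:
        raise ValueError("planes must be non-empty and have the same size")
    mse = sum((o - c) ** 2 for o, c in zip(original, compressed)) / len(original)
    if mse == 0:
        return math.inf  # identical planes: no distortion at all
    return 10 * math.log10(max_value ** 2 / mse)

original_plane = [100, 110, 120, 130]
compressed_plane = [101, 109, 121, 129]
quality_db = psnr(original_plane, compressed_plane)
print(round(quality_db, 2))
```

Higher values mean less distortion, so a bitrate reduction is only attractive if the PSNR and the accuracy of the downstream vision task both remain acceptable.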

Kuaa: A unified framework for design, deployment, execution, and recommendation of machine learning experiments

WERNECK, R. de O. ; DE ALMEIDA, W. R. ; STEIN, B. V. ; PAZINATO, D. V. ; MENDES JUNIOR, P. R. ; PENATTI, O. A. B. ; TORRES, R. da S. ; ROCHA, A.

In: Future Generation Computer Systems, volume 78, part 1, p. 59-76, 2018.
Abstract | Paper | Code (Github)
In this work, we propose Kuaa, a workflow-based framework that can be used for designing, deploying, and executing machine learning experiments in an automated fashion. This framework is able to provide a standardized environment for exploratory analysis of machine learning solutions, as it supports the evaluation of feature descriptors, normalizers, classifiers, and fusion approaches in a wide range of tasks involving machine learning. Kuaa is also capable of providing users with recommendations of machine-learning workflows. The use of recommendations allows users to identify, evaluate, and possibly reuse previously defined successful solutions. We propose the use of similarity measures (e.g., Jaccard, Sørensen, and Jaro-Winkler) and learning-to-rank methods (LRAR) in the implementation of the recommendation service. Experimental results show that Jaro-Winkler yields the highest effectiveness performance with comparable results to those observed for LRAR, presenting the best alternative machine learning experiments to the user. In both cases, the recommendations performed are very promising and the developed framework might help users in different daily exploratory machine learning tasks.

Nearest neighbors distance ratio open-set classifier

MENDES JUNIOR, P. R. ; DE SOUZA, R. M. ; WERNECK, R. de O. ; STEIN, B. V. ; PAZINATO, D. V. ; DE ALMEIDA, W. R. ; PENATTI, O. A. B. ; TORRES, R. da S. ; ROCHA, A.

In: Springer Machine Learning, volume 106, issue 3, p. 359-386, 2017.
Abstract | Paper | Patent
In this paper, we propose a novel multiclass classifier for the open-set recognition scenario. This scenario is the one in which there are no a priori training samples for some classes that might appear during testing. Usually, many applications are inherently open set. Consequently, successful closed-set solutions in the literature are not always suitable for real-world recognition problems. The proposed open-set classifier extends upon the Nearest-Neighbor (NN) classifier. Nearest neighbors are simple, parameter independent, multiclass, and widely used for closed-set problems. The proposed Open-Set NN (OSNN) method incorporates the ability of recognizing samples belonging to classes that are unknown at training time, being suitable for open-set recognition. In addition, we explore evaluation measures for open-set problems, properly measuring the resilience of methods to unknown classes during testing. For validation, we consider large freely-available benchmarks with different open-set recognition regimes and demonstrate that the proposed OSNN significantly outperforms its counterparts in the literature.
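The abstract does not spell out the decision rule, so the following is only a sketch of the idea suggested by the title: compare the distance to the nearest training sample with the distance to the nearest training sample of a different class, and reject the test sample as unknown when the ratio exceeds a threshold. The function name, the threshold value of 0.7, and the toy data are assumptions made for illustration, not values from the paper.

```python
import math

UNKNOWN = "unknown"

def osnn_predict(sample, training, threshold=0.7):
    """Nearest-neighbor classification with a distance-ratio rejection test.

    `training` is a non-empty list of (point, label) pairs.
    """
    ordered = sorted(training, key=lambda item: math.dist(sample, item[0]))
    nearest_point, nearest_label = ordered[0]
    # Closest training point whose label differs from the nearest neighbor's label.
    other_point = next((p for p, label in ordered if label != nearest_label), None)
    if other_point is None:
        return nearest_label  # only one class available: no ratio can be computed
    other_distance = math.dist(sample, other_point)
    if other_distance == 0:
        return UNKNOWN  # sample coincides with points of two classes: ambiguous
    ratio = math.dist(sample, nearest_point) / other_distance
    return nearest_label if ratio <= threshold else UNKNOWN

training = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"),
            ((5.0, 5.0), "b"), ((5.0, 6.0), "b")]
close_to_a = osnn_predict((0.2, 0.1), training)
in_between = osnn_predict((2.5, 2.5), training)
print(close_to_a, in_between)
```

A sample far from every known class tends to be roughly equidistant from its two nearest classes, which pushes the ratio towards 1 and triggers the rejection.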

Towards Better Exploiting Convolutional Neural Networks for Remote Sensing Scene Classification

NOGUEIRA, K. ; PENATTI, O. A. B. ; SANTOS, J. A. dos

In: Pattern Recognition, volume 61, p. 539-556, 2017.
Abstract | Paper
We present an analysis of three possible strategies for exploiting the power of existing convolutional neural networks (ConvNets or CNNs) in scenarios different from the ones for which they were trained: full training, fine tuning, and using ConvNets as feature extractors. In many applications, especially including remote sensing, it is not feasible to fully design and train a new ConvNet, as this usually requires a considerable amount of labeled data and demands high computational costs. Therefore, it is important to understand how to better use existing ConvNets. We perform experiments with six popular ConvNets using three remote sensing datasets. We also compare ConvNets in each strategy with existing descriptors and with state-of-the-art baselines. Results indicate that fine tuning tends to be the best performing strategy. In fact, using the features from the fine-tuned ConvNet with linear SVM obtains the best results. We also achieved state-of-the-art results for the three datasets used.

Bag of Genres for Video Retrieval

DUARTE, L. A. ; PENATTI, O. A. B. ; ALMEIDA, J.

In: Conference on Graphics, Patterns, and Images (SIBGRAPI), 2016, p. 257-264.
Abstract | Paper
Often, videos are composed of multiple concepts or even genres. For instance, news videos may contain sports, action, nature, etc. Therefore, encoding the distribution of such concepts/genres in a compact and effective representation is a challenging task. In this sense, we propose the Bag of Genres representation, which is based on a visual dictionary defined by a genre classifier. Each visual word corresponds to a region in the classification space. The Bag of Genres video vector contains a summary of the activations of each genre in the video content. We evaluate the proposed method for video genre retrieval using the dataset of MediaEval Tagging Task of 2012 and for video event retrieval using the EVVE dataset. Results show that the proposed method achieves results comparable or superior to state-of-the-art methods, with the advantage of providing a much more compact representation than existing features.

Transmitting What Matters - Task-oriented video composition and compression

ANDALÓ, F. A. ; PENATTI, O. A. B. ; TESTONI, V.

In: Conference on Graphics, Patterns, and Images (SIBGRAPI), 2016, p. 72-79.
Abstract | Paper
We present a simple yet effective framework - Transmitting What Matters (TWM) - to generate compressed videos containing only relevant objects targeted to specific computer vision tasks, such as faces for the task of face expression recognition, license plates for the task of optical character recognition, among others. TWM takes advantage of the final desired computer vision task to compose video frames only with the necessary data. The video frames are compressed and can be stored or transmitted to powerful servers where extensive and time-consuming tasks can be performed. We experimentally present the trade-offs between distortion and bitrate for a wide range of compression levels, and the impact generated by compression artifacts on the accuracy of the desired vision task. We show that, for one selected computer vision task, it is possible to dramatically reduce the amount of required data to be stored or transmitted, without compromising accuracy.

Detection of Fragmented Rectangular Enclosures in Very-High-Resolution Remote Sensing Images

ZINGMAN, I. ; SAUPE, D. ; PENATTI, O. A. B. ; LAMBERS, K.

In: IEEE Transactions on Geoscience and Remote Sensing, volume 54, number 8, p. 4580-4593, 2016.
Abstract | Paper
We develop an approach for the detection of ruins of livestock enclosures (LEs) in alpine areas captured by high-resolution remotely sensed images. These structures are usually of approximately rectangular shape and appear in images as faint fragmented contours in complex background. We address this problem by introducing a rectangularity feature that quantifies the degree of alignment of an optimal subset of extracted linear segments with a contour of rectangular shape. The rectangularity feature has high values not only for perfectly regular enclosures but also for ruined ones with distorted angles, fragmented walls, or even a completely missing wall. Furthermore, it has a zero value for spurious structures with fewer than three sides of a perceivable rectangle. We show how the detection performance can be improved by learning a linear combination of the rectangularity and size features from just a few available representative examples and a large number of negatives. Our approach allowed detection of enclosures in the Silvretta Alps that were previously unknown. A comparative performance analysis is provided. Among other features, our comparison includes the state-of-the-art features that were generated by pretrained deep convolutional neural networks (CNNs). The deep CNN features, although learned from a very different type of images, provided the basic ability to capture the visual concept of the LEs. However, our handcrafted rectangularity-size features showed considerably higher performance.

Pixel-Level Tissue Classification for Ultrasound Images

PAZINATO, D. V. ; STEIN, B. V. ; DE ALMEIDA, W. R. ; WERNECK, R. de O. ; MENDES JUNIOR, P. R. ; PENATTI, O. A. B. ; TORRES, R. da S. ; MENEZES, F. H. ; ROCHA, A.

In: IEEE Journal of Biomedical and Health Informatics (J-BHI), volume 20, number 1, p. 256-267, 2016.
Abstract | Paper | Pre-print PDF
Background: Pixel-level tissue classification for ultrasound images, commonly applied to carotid images, is usually based on defining thresholds for the isolated pixel values. Ranges of pixel values are defined for the classification of each tissue. The classification of pixels is then used to determine the carotid plaque composition and, consequently, to determine the risk of diseases (e.g., strokes) and whether or not a surgery is necessary. Threshold-based methods date from the early 2000s but are still widely used for virtual histology. Methodology/Principal Findings: We propose the use of descriptors that take into account information about a neighborhood of a pixel when classifying it. We evaluated experimentally different descriptors (statistical moments, texture-based, gradient-based, local binary patterns, etc.) on a dataset of five types of tissues: blood, lipids, muscle, fibrous, and calcium. The pipeline of the proposed classification method is based on image normalization, multiscale feature extraction, including the proposal of a new descriptor, and machine learning classification. We have also analyzed the correlation between the proposed pixel classification method in the ultrasound images and the real histology with the aid of medical specialists. Conclusions/Significance: The classification accuracy obtained by the proposed method with the novel descriptor in the ultrasound tissue images (around 73%) is significantly above the accuracy of the state-of-the-art threshold-based methods (around 54%). The results are validated by statistical tests. The correlation between the virtual and real histology confirms the quality of the proposed approach, showing it is a robust ally for the virtual histology in ultrasound images.

Mid-level Image Representations for Real-time Heart View Plane Classification of Echocardiograms

PENATTI, O. A. B. ; WERNECK, R. de O. ; DE ALMEIDA, W. R. ; STEIN, B. V. ; PAZINATO, D. V. ; MENDES JUNIOR, P. R. ; TORRES, R. da S. ; ROCHA, A.

In: Computers in Biology and Medicine, volume 66, p. 66-81, 2015.
Abstract | Paper | Pre-print PDF
In this paper, we explore mid-level image representations for real-time heart view plane classification of 2D echocardiogram ultrasound images. The proposed representations rely on bags of visual words, successfully used by the computer vision community in visual recognition problems. An important element of the proposed representations is the image sampling with large regions, drastically reducing the execution time of the image characterization procedure. Through an extensive set of experiments, we evaluate the proposed approach against different image descriptors for classifying four heart view planes. The results show that our approach is effective and efficient for the target problem, making it suitable for use in real-time setups. The proposed representations are also robust to different image transformations, e.g., downsampling, noise filtering, and different machine learning classifiers, keeping classification accuracy above 90%. Feature extraction can be performed at 30 fps or 60 fps in some cases. This paper also includes an in-depth review of the literature in the area of automatic echocardiogram view classification, giving the reader a thorough comprehension of this field of study.

Do Deep Features Generalize from Everyday Objects to Remote Sensing and Aerial Scenes Domains?

PENATTI, O. A. B. ; NOGUEIRA, K. ; SANTOS, J. A. dos

In: IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPR EarthVision Workshop), p. 44-51, 2015.
Abstract | Paper | Coffee Scenes dataset
In this paper, we evaluate the generalization power of deep features (ConvNets) in two new scenarios: aerial and remote sensing image classification. We evaluate experimentally ConvNets trained for recognizing everyday objects for the classification of aerial and remote sensing images. ConvNets obtained the best results for aerial images, while for remote sensing, they performed well but were outperformed by low-level color descriptors, such as BIC. We also present a correlation analysis, showing the potential for combining/fusing different ConvNets with other descriptors or even for combining multiple ConvNets. A preliminary set of experiments fusing ConvNets obtains state-of-the-art results for the well-known UCMerced dataset.

Unsupervised Manifold Learning for Video Genre Retrieval

ALMEIDA, J. ; PEDRONETTE, D. C. G. ; PENATTI, O. A. B.

In: Iberoamerican Congress on Pattern Recognition (CIARP), Puerto Vallarta, Mexico, 2014, p. 604-612 (LNCS 8827)
Abstract | Paper | PDF
This paper investigates the perspective of exploiting pairwise similarities to improve the performance of visual features for video genre retrieval. We employ manifold learning based on the reciprocal neighborhood and on the authority of ranked lists to improve the retrieval of videos considering their genre. A comparative analysis of different visual features is conducted and discussed. We experimentally show in the dataset of 14,838 videos from the MediaEval benchmark that we can achieve considerable improvements in results. In addition, we also evaluate how the late fusion of different visual features using the same manifold learning scheme can improve the retrieval results.

Efficient and Effective Hierarchical Feature Propagation

SANTOS, J. A. dos ; PENATTI, O. A. B. ; GOSSELIN, P-H. ; FALCÃO, A. X. ; PHILIPP-FOLIGUET, S. ; TORRES, R. da S.

In: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (JSTARS), volume 7, number 12, p. 4632-4643, 2014.
Abstract | Paper
Many methods have been recently proposed to deal with the large amount of data provided by the new remote sensing technologies. Several of those methods rely on the use of segmented regions. However, a common issue in region-based applications is the definition of the appropriate representation scale of the data, a problem usually addressed by exploiting multiple scales of segmentation. The use of multiple scales, however, raises new challenges related to the definition of effective and efficient mechanisms for extracting features. In this paper, we address the problem of extracting features from a hierarchy by proposing two approaches that exploit the existing relationships among regions at different scales. The H-Propagation propagates any histogram-based low-level descriptors. The BoW-Propagation approach uses the bag-of-visual-word model to propagate features along multiple scales. The proposed methods are very efficient as features need to be extracted only at the base of the hierarchy and yield comparable results to low-level extraction approaches.
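The idea of extracting features only at the base of the hierarchy can be illustrated with a minimal sketch. It assumes the hierarchy is given as a mapping from each region to its child regions and that a parent histogram is obtained by summing its children's histograms bin by bin. The function name `propagate_histograms`, the data layout, and the summation combiner are illustrative assumptions, not the paper's exact H-Propagation procedure.

```python
def propagate_histograms(children, base_histograms, root):
    """Fill in histograms for every region above the base of a hierarchy.

    children: {region: [child regions]} for non-leaf regions (hypothetical layout).
    base_histograms: {leaf region: [bin counts]} extracted at the finest scale.
    """
    histograms = dict(base_histograms)

    def visit(region):
        # Leaves (and regions already computed) are returned directly.
        if region in histograms:
            return histograms[region]
        child_hists = [visit(child) for child in children[region]]
        # Combine children bin by bin; summation is an assumed combiner.
        histograms[region] = [sum(bins) for bins in zip(*child_hists)]
        return histograms[region]

    visit(root)
    return histograms
```

In this sketch the feature extractor runs only once per base region; every coarser scale reuses those histograms instead of re-reading pixels.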

Unsupervised Distance Learning By Reciprocal kNN Distance for Image Retrieval

PEDRONETTE, D. C. G. ; PENATTI, O. A. B. ; CALUMBY, R. T. ; TORRES, R. da S.

In: ACM International Conference on Multimedia Retrieval (ICMR), Glasgow, Scotland, 2014, p. 345:345-345:352.
Abstract | Paper
This paper presents a novel unsupervised learning approach that takes into account the intrinsic dataset structure, which is represented in terms of the reciprocal neighborhood references found in different ranked lists. The proposed Reciprocal kNN Distance defines a more effective distance between two images, and is used to improve the effectiveness of image retrieval systems. Several experiments were conducted for different image retrieval tasks involving shape, color, and texture descriptors. The proposed approach is also evaluated on multimodal retrieval tasks, considering visual and textual descriptors. Experimental results demonstrate the effectiveness of the proposed approach. The Reciprocal kNN Distance yields better results in terms of effectiveness than various state-of-the-art algorithms.

Unsupervised Manifold Learning Using Reciprocal kNN Graphs in Image Re-Ranking and Rank Aggregation Tasks

PEDRONETTE, D. C. G. ; PENATTI, O. A. B. ; TORRES, R. da S.

In: Image and Vision Computing, volume 32, number 2, p. 120-130, 2014.
Abstract | Paper | Pre-print PDF
In this paper, we present an unsupervised distance learning approach for improving the effectiveness of image retrieval tasks. We propose a Reciprocal kNN Graph algorithm that considers the relationships among ranked lists in the context of a k-reciprocal neighborhood. The similarity is propagated among neighbors considering the geometry of the dataset manifold. The proposed method can be used both for re-ranking and rank aggregation tasks. Unlike traditional diffusion process methods, which require matrix multiplication operations, our algorithm takes only a subset of ranked lists as input, presenting linear complexity in terms of computational and storage requirements. We conducted a large evaluation protocol involving shape, color, and texture descriptors, various datasets, and comparisons with other post-processing approaches. The re-ranking and rank aggregation algorithms yield better results in terms of effectiveness performance than various state-of-the-art algorithms recently proposed in the literature, achieving bull's eye and MAP scores of 100% on the well-known MPEG-7 shape dataset.
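As a small illustration of the k-reciprocal neighborhood mentioned above, the sketch below treats two items as k-reciprocal neighbors when each appears in the other's top-k ranked list. The function names and the dictionary-of-lists input are assumptions made for this example; the sketch covers only the neighborhood test, not the similarity propagation of the Reciprocal kNN Graph algorithm.

```python
def top_k(ranked_lists, item, k):
    """Return the set of the k best-ranked items for `item`."""
    return set(ranked_lists[item][:k])


def k_reciprocal_neighbors(ranked_lists, item, k):
    """Items in `item`'s top-k whose own top-k also contains `item`.

    ranked_lists: {item: [other items, most similar first]} (hypothetical layout).
    """
    return {
        other
        for other in top_k(ranked_lists, item, k)
        if other != item and item in top_k(ranked_lists, other, k)
    }
```

Only the first k positions of each ranked list are read, which matches the abstract's point that a subset of the ranked lists is enough as input.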

Visual word spatial arrangement for image retrieval and classification

PENATTI, O. A. B. ; SILVA, F. B. ; VALLE, E. ; GOUET-BRUNET, V. ; TORRES, R. da S.

In: Pattern Recognition, volume 47, number 2, p. 705-720, 2014.
Abstract | WSA info page
We present word spatial arrangement (WSA), an approach to represent the spatial arrangement of visual words under the bag-of-visual-words model. It lies in a simple idea which encodes the relative position of visual words by splitting the image space into quadrants using each detected point as origin. WSA generates compact feature vectors and is flexible for being used for image retrieval and classification, for working with hard or soft assignment, requiring no pre/post processing for spatial verification. Experiments in the retrieval scenario show the superiority of WSA in relation to Spatial Pyramids. Experiments in the classification scenario show a reasonable compromise between those methods, with Spatial Pyramids generating larger feature vectors, while WSA provides adequate performance with much more compact features. As WSA encodes only the spatial information of visual words and not their frequency of occurrence, the results indicate the importance of such information for visual categorization.
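The quadrant idea can be illustrated with a minimal sketch. It assumes each detected point is an (x, y, word_id) triple and counts, for every point taken as origin, which quadrant each other point falls into, accumulating the counts per visual word. The function name `quadrant_counts`, the quadrant ordering, and the handling of points lying on an axis are illustrative assumptions rather than the exact WSA formulation.

```python
from collections import defaultdict


def quadrant_counts(points):
    """points: list of (x, y, word_id). Returns {word_id: [q0, q1, q2, q3]}.

    Quadrants (assumed order): 0 upper-right, 1 upper-left,
    2 lower-left, 3 lower-right.
    """
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for i, (ox, oy, _) in enumerate(points):
        for j, (x, y, word) in enumerate(points):
            if i == j:
                continue
            # Points on an axis go to the "greater or equal" side (an assumption).
            right = x >= ox
            upper = y >= oy
            if upper:
                quadrant = 0 if right else 1
            else:
                quadrant = 3 if right else 2
            counts[word][quadrant] += 1
    return dict(counts)
```

Each visual word ends up with four counters regardless of image size, which is consistent with the compact feature vectors the abstract describes.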

A rank aggregation framework for video multimodal geocoding

LI, L. T. ; PEDRONETTE, D. C. G. ; ALMEIDA, J. ; PENATTI, O. A. B. ; CALUMBY, R. T. ; TORRES, R. da S.

In: Multimedia Tools and Applications, 2013.
Abstract | Paper
This paper proposes a rank aggregation framework for video multimodal geocoding. Textual and visual descriptions associated with videos are used to define ranked lists. These ranked lists are later combined, and the resulting ranked list is used to define appropriate locations for videos. An architecture that implements the proposed framework is designed. In this architecture, there are specific modules for each modality (e.g., textual and visual) that can be developed and evolved independently. Another component is a data fusion module responsible for combining seamlessly the ranked lists defined for each modality. We have validated the proposed framework in the context of the MediaEval 2012 Placing Task, whose objective is to automatically assign geographical coordinates to videos. Obtained results show how our multimodal approach improves the geocoding results when compared to methods that rely on a single modality (either textual or visual descriptors). We also show that the proposed multimodal approach yields comparable results to the best submissions to the Placing Task in 2012 using no extra information besides the available development/training data. Another contribution of this work is related to the proposal of a new effectiveness evaluation measure. The proposed measure is based on distance scores that summarize how effective a designed/tested approach is, considering its overall result for a test dataset.

Image and Video Representations based on Visual Dictionaries

PENATTI, O. A. B. ; VALLE, E. ; TORRES, R. da S.

In: Workshop of Thesis and Dissertations (WTD), 26th Conference on Graphics, Patterns, and Images (SIBGRAPI), Arequipa, Peru, 2013.
Abstract | Paper
The thesis explores three research topics involving the popular approach used for representing visual content: the visual dictionaries. The first topic concerns the generality of visual dictionaries: does a dictionary based on one dataset generalize to another dataset? Our findings create the opportunity to greatly alleviate the burden in generating dictionaries. The second topic is related to the importance of the spatial information of visual words in the image space for distinguishing types of scenes and objects. We propose an efficient and effective spatial pooling approach which presents promising results for image retrieval. And the third topic refers to the semantic information in the visual dictionary model. We claim that a bag-of-prototypes model, where the prototypes are visual words carrying semantics, is promising for improving image and video representations. Employing this model, we propose a semantically enriched dictionary based on scenes, which was effectively used for video geocoding. Defended on November 29th, 2012, the thesis has already generated 6 publications, including a best paper award. One of the proposed approaches has also obtained one of the best results in the Placing Task of MediaEval challenge in the last two years.

Domain-specific Image Geocoding: A Case Study on Virginia Tech Building Photos

LI, L. T. ; PENATTI, O. A. B. ; FOX, E. A. ; TORRES, R. da S.

In: Joint Conference on Digital Libraries (JCDL), Indianapolis, Indiana, USA, 2013, p. 363-366.
Abstract | Paper | VTBuildings dataset
The use of map-based browser services is of great relevance in numerous digital libraries. The implementation of such services, however, demands the use of geocoded data collections. This paper investigates the use of image content local representations in geocoding tasks. Performed experiments demonstrate that some of the evaluated descriptors yield effective results in the task of geocoding VT building photos. This study is the first step to geocode multimedia material related to the VT April 16, 2007 school shooting tragedy.

Remote Sensing Image Representation based on Hierarchical Histogram Propagation

SANTOS, J. A. dos ; PENATTI, O. A. B. ; TORRES, R. da S. ; GOSSELIN, P-H. ; PHILIPP-FOLIGUET, S. ; FALCÃO, A. X.

In: IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Melbourne, Australia, 2013.
Abstract | Paper
Many methods have been recently proposed to deal with the large amount of data provided by high-resolution remote sensing technologies. Several of these methods rely on the use of image segmentation algorithms for delineating target objects. However, a common issue in geographic object-based applications is the definition of the appropriate data representation scale, a problem that can be addressed by exploiting multiscale segmentation. The use of multiple scales, however, raises new challenges related to the definition of effective and efficient mechanisms for extracting features. In this paper, we address the problem of extracting histogram-based features from a hierarchy of regions for multiscale classification. The strategy, called H-Propagation, exploits the existing relationships among regions in a hierarchy to iteratively propagate features along multiple scales. The proposed method speeds up the feature extraction process and yields good results when compared with global low-level extraction approaches.

Multimedia Multimodal Geocoding

LI, L. T. ; PEDRONETTE, D. C. G. ; ALMEIDA, J. ; PENATTI, O. A. B. ; CALUMBY, R. T. ; TORRES, R. da S.

In: ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL GIS), Redondo Beach, California, USA, 2012. p. 474-477.
Abstract | Paper
This work is developed in the context of the placing task of the MediaEval 2011 initiative. The objective is to geocode (or geotag) a set of videos, i.e., automatically assign geographical coordinates to them. This paper presents an architecture for multimodal geocoding that exploits both visual and textual descriptions associated with videos. This paper also describes our efforts regarding the implementation of this architecture aiming to demonstrate its applicability. Conducted experiments show how our multimodal approach enhances the results compared to relying on a single modality (text or visual).

A Visual Approach for Video Geocoding using Bag-of-Scenes

PENATTI, O. A. B. ; LI, L. T. ; ALMEIDA, J. ; TORRES, R. da S.

In: ACM International Conference on Multimedia Retrieval (ICMR), Hong Kong, China, 2012, p. 53:1-53:8.
Abstract | Paper | PDF
This paper presents a novel approach for video representation, called bag-of-scenes. The proposed method is based on dictionaries of scenes, which provide a high-level representation for videos. Scenes are elements with much more semantic information than local features, especially for geotagging videos using visual content. Thus, each component of the representation model has self-contained semantics and, hence, it can be directly related to a specific place of interest. Experiments were conducted in the context of the MediaEval 2011 Placing Task. The reported results show our strategy compared to those from other participants that used only visual content to accomplish this task. Despite our very simple way to generate the visual dictionary, which has taken photos at random, the results show that our approach presents high accuracy relative to the state-of-the-art solutions.

Improving Texture Description in Remote Sensing Image Multi-Scale Classification Tasks By Using Visual Words

SANTOS, J. A. dos ; PENATTI, O. A. B. ; TORRES, R. da S. ; GOSSELIN, P-H. ; PHILIPP-FOLIGUET, S. ; FALCÃO, A. X.

In: International Conference on Pattern Recognition (ICPR), Tsukuba Science City, Japan, 2012, p. 3090-3093.
Abstract | Paper
Although texture features are important for region-based classification of remote sensing images, the literature shows that texture descriptors usually have poor performance when compared and combined with color descriptors. In this paper, we propose a bag-of-visual-words (BOW) "propagation" approach to extract texture features from a hierarchy of regions. This strategy improves the features' efficacy by encoding texture independently of the region shape. Experiments show that the proposed approach improves the classification results when compared with global descriptors using the bounding box padding strategy.

Comparative Study of Global Color and Texture Descriptors for Web Image Retrieval

PENATTI, O. A. B. ; VALLE, E. ; TORRES, R. da S.

In: Journal of Visual Communication and Image Representation, volume 23, number 2, p. 359-380, 2012.
Abstract (JVCI Best Paper Award Runner Up 2014) | Paper | Journal's Hottest Article: 11th (2014), 5th (2013), 1st (2012).
This paper presents a comparative study of color and texture descriptors considering the Web as the environment of use. We take into account the diversity and large-scale aspects of the Web considering a large number of descriptors (24 color and 28 texture descriptors, including both traditional and recently proposed ones). The evaluation is made on two levels: a theoretical analysis in terms of algorithm complexity and an experimental comparison considering efficiency and effectiveness aspects. The experimental comparison contrasts the performances of the descriptors in small-scale datasets and in a large heterogeneous database containing more than 230 thousand images. Although there is a significant correlation between descriptor performances in the two settings, there are notable deviations, which must be taken into account when selecting the descriptors for large-scale tasks. An analysis of the correlation is provided for the best descriptors, which hints at the best opportunities of their use in combination.

Encoding spatial arrangement of visual words

PENATTI, O. A. B. ; VALLE, E. ; TORRES, R. da S.

In: Iberoamerican Congress on Pattern Recognition (CIARP), Pucón, Chile, 2011, p. 240-247 (LNCS 7042)
Abstract (Best Paper Award) | Paper
This paper presents a new approach to encode spatial-relationship information of visual words in the well-known visual dictionary model. The current most popular approach to describe images based on visual words is by means of bags-of-words which do not encode any spatial information. We propose a graceful way to capture spatial-relationship information of visual words that encodes the spatial arrangement of every visual word in an image. Our experiments show the importance of the spatial information of visual words for image classification and show the gain in classification accuracy when using the new method. The proposed approach creates opportunities for further improvements in image description under the visual dictionary model.

User-oriented evaluation of color descriptors for Web image retrieval

PENATTI, O. A. B. ; TORRES, R. da S.

In: European Conference on Research and Advanced Technology for Digital Libraries (ECDL), Glasgow, Scotland, 2010, p. 486-489.
Abstract | Paper
This paper proposes a methodology for effectiveness evaluation in content-based image retrieval systems. The methodology is based on the opinion of real users. This paper also presents the results of using this methodology to evaluate color descriptors for Web image retrieval. The experiments were performed using a database containing more than 230 thousand heterogeneous images that represents the existing content on the Web.

Eva - An Evaluation Tool for Comparing Descriptors in Content-based Image Retrieval Tasks

PENATTI, O. A. B. ; TORRES, R. da S.

In: 11th ACM SIGMM International Conference on Multimedia Information Retrieval (MIR), Philadelphia, Pennsylvania, USA, 2010, p. 413-416.
Abstract | Paper | Eva tool
This paper presents Eva, a tool for evaluating image descriptors for content-based image retrieval. Eva integrates the most common stages of an image retrieval process and provides functionalities to facilitate the comparison of image descriptors in the context of content-based image retrieval. Eva supports the management of image descriptors and image collections and creates a standardized environment to run comparative experiments using them.

Evaluating the Potential of Texture and Color Descriptors for Remote Sensing Image Retrieval and Classification

SANTOS, J. A. dos ; PENATTI, O. A. B. ; TORRES, R. da S.

In: Proceedings of International Conference on Computer Vision Theory and Applications (VISAPP), Angers, France, 2010, p. 203-210.
Abstract | Paper
Classifying Remote Sensing Images (RSI) is a hard task. There are automatic approaches whose results normally need to be revised. The identification and polygon extraction tasks usually rely on applying classification strategies that exploit visual aspects related to spectral and texture patterns identified in RSI regions. There are many image descriptors proposed in the literature for content-based image retrieval purposes that can be useful for RSI classification. This paper presents a comparative study to evaluate the potential of using successful color and texture image descriptors for remote sensing retrieval and classification. Seven descriptors that encode texture information and twelve color descriptors that can be used to encode spectral information were selected. We perform experiments to evaluate the effectiveness of these descriptors, considering image retrieval and classification tasks. To evaluate descriptors in classification tasks, we also propose a methodology based on the KNN classifier. Experiments demonstrate that Joint Auto-Correlogram (JAC), Color Bitmap, Invariant Steerable Pyramid Decomposition (SID) and Quantized Compound Change Histogram (QCCH) yield the best results.

Color Descriptors for Web Image Retrieval: a Comparative Study

PENATTI, O. A. B. ; TORRES, R. da S.

In: XXI Brazilian Symposium on Computer Graphics and Image Processing (SIBGRAPI), Campo Grande, MS, Brazil, 2008, p. 163-170.
Abstract | Paper
This paper presents a comparative study of color descriptors for content-based image retrieval on the Web. Several image descriptors were compared theoretically and the most relevant ones were implemented and tested in two different databases. The main goal was to find out the best descriptors for Web image retrieval. Descriptors are compared according to the complexity of the extraction and distance functions, the compactness of feature vectors, and the ability to retrieve relevant images.

Recuperação de Imagens: Desafios e Novos Rumos (Image retrieval: Challenges and new trends)

TORRES, R. da S. ; ZEGARRA, J. A. M. ; SANTOS, J. A. ; FERREIRA, C. D. ; PENATTI, O. A. B. ; ANDALO, F. A. ; ALMEIDA JUNIOR, J. G.

In: XXXV Seminário Integrado de Software e Hardware, Belém, PA, Brazil, 2008, p. 223-237.
Abstract | Paper
Huge image collections have been created, managed and stored into image databases. Given the large size of these collections it is essential to provide efficient and effective mechanisms to retrieve images. This is the objective of the so-called content-based image retrieval (CBIR) systems. Traditionally, these systems are based on objective criteria to represent and compare images. However, users of CBIR systems tend to use subjective elements to compare images. The use of these elements has improved the effectiveness of content-based image retrieval systems. This paper discusses approaches that incorporate semantic information into the content-based image retrieval process, highlighting some new challenges in this area.

Spatial relationship descriptor based on partitions (Descritor de Relacionamento Espacial Baseado em Partições)

PENATTI, O. A. B. ; TORRES, R. da S.

In: Electronic Magazine of Undergraduate Research Projects (REIC-SBC), v. VII, p. 3, 2007. In Portuguese
Abstract (translated from pt-br) | Paper
In this work, we propose a new spatial relationship descriptor for content-based image retrieval. Spatial relationships can be fundamental for image recognition and retrieval, benefiting geographic and medical applications, for example. The new descriptor is based on partitioning the space under analysis into quadrants and counting the occurrences of points of the object of interest in each quadrant. The experiments compare the proposed descriptor with descriptors from the literature. The results show that the new descriptor is more effective than important descriptors from the literature.

Working notes:

[MediaEval 2017 - Satellite Task] Data-Driven Flood Detection using Neural Networks

NOGUEIRA, K. ; FADEL, S. G. ; DOURADO, I. C. ; WERNECK, R. de O. ; MUNOZ, J. A. V. ; PENATTI, O. A. B. ; CALUMBY, R. T. ; LI, L. T. ; SANTOS, J. A. dos ; TORRES, R. da S.

In: Working Notes Proceedings of the MediaEval Workshop, Dublin, Ireland, 2017.
PDF | Best results in FDSI sub-task and best AP@480 on 2 runs (textual only and textual+visual) of DIRSM sub-task

[MediaEval 2016 - Placing Task] RECOD @ Placing Task of MediaEval 2016: A Ranking Fusion Approach for Geographic-Location Prediction of Multimedia Objects

MUNOZ, J. A. V. ; LI, L. T. ; DOURADO, I. C. ; NOGUEIRA, K. ; FADEL, S. G. ; PENATTI, O. A. B. ; ALMEIDA, J. ; PEREIRA, L. A. M. ; CALUMBY, R. T. ; SANTOS, J. A. dos ; TORRES, R. da S.

In: Working Notes Proceedings of the MediaEval Workshop, Hilversum, Netherlands, 2016, v. 1739.
PDF

[MediaEval 2016 - Diversity Task] Recod @ MediaEval 2016: Diverse Social Images Retrieval

FERREIRA, C. D. ; CALUMBY, R. T. ; ARAUJO, I. B. A. C. ; DOURADO, I. C. ; MUNOZ, J. A. V. ; PENATTI, O. A. B. ; LI, L. T. ; ALMEIDA, J. ; TORRES, R. da S.

In: Working Notes Proceedings of the MediaEval Workshop, Hilversum, Netherlands, 2016, v. 1739.
PDF

[MediaEval 2015 - Placing Task] RECOD @ Placing Task of MediaEval 2015

LI, L. T. ; MUNOZ, J. A. V. ; ALMEIDA, J. ; CALUMBY, R. T. ; PENATTI, O. A. B. ; DOURADO, I. C. ; NOGUEIRA, K. ; MENDES JR, P. R. ; PEREIRA, L. A. M. ; PEDRONETTE, D. C. G. ; SANTOS, J. A. dos ; GONÇALVES, M. A. ; TORRES, R. da S.

In: Working Notes Proceedings of the MediaEval Workshop, Wurzen, Germany, 2015, v. 1463.
PDF

[MediaEval 2015 - Diversity Task] Recod @ MediaEval 2015: Diverse Social Images Retrieval

CALUMBY, R. T. ; ARAUJO, I. B. A. do C. ; SANTANA, V. P. ; MUNOZ, J. A. V. ; PENATTI, O. A. B. ; LI, L. T. ; ALMEIDA, J. ; CHIACHIA, G. ; GONÇALVES, M. A. ; TORRES, R. da S.

In: Working Notes Proceedings of the MediaEval Workshop, Wurzen, Germany, 2015, v. 1463.
PDF

[MediaEval 2014 - Placing Task] Multimedia Geocoding: The RECOD 2014 Approach

LI, L. T. ; PENATTI, O. A. B. ; ALMEIDA, J. ; CHIACHIA, G. ; CALUMBY, R. T. ; MENDES JR, P. R. ; PEDRONETTE, D. C. G. ; TORRES, R. da S.

In: Working Notes Proceedings of the MediaEval Workshop, Barcelona, Spain, 2014, v. 1263.
PDF | Best overall results for 100m precision range & Distinctive Mention for multimodality

[MediaEval 2014 - Diversity Task] Recod @ MediaEval 2014: Diverse Social Images Retrieval

CALUMBY, R. T. ; SANTANA, V. P. ; CORDEIRO, F. S. ; PENATTI, O. A. B. ; LI, L. T. ; CHIACHIA, G. ; TORRES, R. da S.

In: Working Notes Proceedings of the MediaEval Workshop, Barcelona, Spain, 2014, v. 1263.
PDF

[MediaEval 2013 - Placing Task] Multimodal Image Geocoding: The 2013 RECOD's Approach

LI, L. T. ; ALMEIDA, J. ; PENATTI, O. A. B. ; CALUMBY, R. T. ; PEDRONETTE, D. C. G. ; GONÇALVES, M. A. ; TORRES, R. da S.

In: Working Notes Proceedings of the MediaEval Workshop, Barcelona, Spain, 2013, v. 1043.
PDF

[MediaEval 2012 - Placing Task] A Multimodal Approach for Video Geocoding

LI, L. T. ; ALMEIDA, J. ; PEDRONETTE, D. C. G. ; PENATTI, O. A. B. ; TORRES, R. da S.

In: Working Notes Proceedings of the MediaEval Workshop, Pisa, Italy, 2012, v. 927.
PDF | Best results of visual-based geocoding

[MediaEval 2012 - Tagging Task] UNICAMP-UFMG at MediaEval 2012: Genre Tagging Task

ALMEIDA, J. ; SALLES, T. ; MARTINS, E. F. ; PENATTI, O. A. B. ; TORRES, R. da S. ; GONÇALVES, M. A. ; ALMEIDA, J. M.

In: Working Notes Proceedings of the MediaEval Workshop, Pisa, Italy, 2012, v. 927.
PDF

Courses/Teaching: (in Portuguese)

Machine Learning and Computer Vision - An Introduction to Artificial Intelligence
IDP Open Class
November 2020

Unicamp - Programa de Estágio Docente (PED)
Course MC102: Algorithms and Computer Programming (Algoritmos e Programação de Computadores)
1st semester, 2010
Classes: K and L (Food Engineering)

Unicamp - Programa de Estágio Docente (PED)
Course MC102: Algorithms and Computer Programming (Algoritmos e Programação de Computadores)
1st semester, 2008
Classes: Q, R, S, and T (Computer Engineering)

Unicamp - Programa de Estágio Docente (PED)
Course MC102: Algorithms and Computer Programming (Algoritmos e Programação de Computadores)
2nd semester, 2007
Classes: O and P (Chemical Engineering)

Extras

Here you can find some old things that I created.

Mercadão Unicamp - A website created to make it easier for Unicamp students to buy and sell products among themselves.
Image Converter - Converts images to HTML tables or to text. I made this converter to learn and practice some PHP image functions.
Dots Game - Team project for the Artificial Intelligence course during my undergraduate studies at Unicamp (2006).
XSLT Tutorial - Published on March 15, 2005, on the Dicas-L website.
XPath Tutorial - Published on March 14, 2005, on the Dicas-L website.
Renaming Multiple Files - Published on March 5, 2005, on the Dicas-L website.
Multiple searches in a given web site - Published on December 8, 2004, on the Dicas-L website.

Contact:

If you want to get in touch, please send me a message on LinkedIn.

Disclaimer: This page is not an official Samsung publication. Its content is the sole responsibility of Otávio Penatti.