Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Pages

Posts

Projects

Harnessing Big Data to Inform Tourism Destination Management Organizations

Published:

This project studied the potential of Big Data to inform destination management organizations. To do so, it discussed three sources of Big Data: telecom, social media and Airbnb data. This was done by demonstrating and analyzing a set of visualizations and tools, and by discussing applications of these data and recommendations for challenges identified in the market.

IPSTERS - IPSentinel Terrestrial Enhanced Recognition System

Published:

This project explored several machine learning (ML) techniques covering different stages of a Land Use/Land Cover (LULC) classification pipeline: data ingestion, feature selection, data filtering and classification. At each of these stages, the techniques aimed to minimise problems typically found in this kind of data. This work was a joint effort between Manvel Khudinyan and me.

Amphan - Analyzing Experiences of Extreme Weather Events using Online Data

Published:

Cyclone Amphan made landfall in South Asia on May 20, 2020. It was the costliest cyclone on record in the North Indian Ocean, rendering hundreds of thousands of people homeless, ravaging agricultural land and causing billions of dollars in damage. How were people affected by the storm? What were the responses of individuals, governments, corporations and NGOs? How was it covered by local, national and international media, as opposed to individuals’ accounts? Who created the dominant narratives of Cyclone Amphan, and whose voices went unheard? We aim to use online data, such as Twitter posts, news headlines and research publications, to analyze people’s experiences of Cyclone Amphan.

MapIntel - Interactive Visual Analytics Platform for Competitive Intelligence

Published:

This project aims to develop a Competitive Intelligence platform using Natural Language Processing and various visualization techniques. We employ text preprocessing and embedding techniques to encode a large corpus of text, as well as Self-Organizing Maps, an unsupervised neural network that supports multiple machine learning tasks and the visualization of high-dimensional data.
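For readers unfamiliar with Self-Organizing Maps, here is a minimal NumPy sketch of the classic online training rule. For each sample, it finds the best-matching unit (BMU) on a 2-D grid of prototypes, then pulls that unit and its grid neighbors toward the sample. The grid size, the exponential decay schedules and the function names `train_som` and `map_to_grid` are illustrative assumptions. This is not MapIntel's implementation, which is not described here.

```python
import numpy as np


def train_som(data, rows=10, cols=10, n_iter=1000, lr0=0.5, sigma0=None, seed=0):
    """Train a rectangular Self-Organizing Map with the classic online update.

    Returns a (rows, cols, n_features) array of prototype vectors.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n_samples, n_features = data.shape
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Initialise each unit's prototype with a randomly chosen data point.
    idx = rng.integers(0, n_samples, size=rows * cols)
    weights = data[idx].reshape(rows, cols, n_features)
    # (row, col) coordinates of every unit, used by the neighborhood function.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)

    for t in range(n_iter):
        # Learning rate and neighborhood radius decay to ~5% of their start values.
        decay = np.exp(-3.0 * t / n_iter)
        lr, sigma = lr0 * decay, sigma0 * decay
        x = data[rng.integers(0, n_samples)]
        # Best-matching unit (BMU): the prototype closest to x.
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.array(np.unravel_index(np.argmin(dists), dists.shape))
        # Gaussian neighborhood measured on the 2-D grid, not in feature space.
        h = np.exp(-np.sum((grid - bmu) ** 2, axis=-1) / (2.0 * sigma**2))
        # Move every prototype toward x, weighted by its neighborhood value.
        weights += lr * h[..., None] * (x - weights)
    return weights


def map_to_grid(weights, data):
    """Return the (row, col) of the BMU for every row of `data`."""
    flat = weights.reshape(-1, weights.shape[-1])
    data = np.asarray(data, dtype=float)
    dists = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    return np.stack(np.unravel_index(dists.argmin(axis=1), weights.shape[:2]), axis=1)
```

In a setting like the one described above, `data` would hold the document embeddings. `map_to_grid` then assigns each document a cell on the 2-D map, which is what makes the high-dimensional corpus easy to visualize.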

ML-Research - An Open Source Library for Machine Learning Research

Published:

ML-Research contains software implementations of most algorithms used or developed in my research. Specifically, it contains scikit-learn-compatible implementations for active learning, oversampling and datasets, as well as various utilities to assist with experiment design and results reporting. Other techniques, such as self-supervised and semi-supervised learning, are currently under development; they are being implemented in PyTorch and are intended to be scikit-learn-compatible.

Publications

Imbalanced Learning in Land Cover Classification: Improving Minority Classes’ Prediction Accuracy Using the Geometric SMOTE Algorithm

Published in Remote Sensing, 2019

In this paper, we address the imbalanced learning problem, a common and difficult conundrum in remote sensing that degrades the quality of classification results, by proposing Geometric SMOTE (G-SMOTE), a novel oversampling method, as a tool for tackling it.

Recommended citation: Douzas, G., Bacao, F., Fonseca, J., & Khudinyan, M. (2019). Imbalanced Learning in Land Cover Classification: Improving Minority Classes’ Prediction Accuracy Using the Geometric SMOTE Algorithm. Remote Sensing, 11(24), 3040. https://doi.org/10.3390/rs11243040
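As a rough illustration of how G-SMOTE differs from SMOTE, the sketch below draws each synthetic point from inside a hypersphere around a minority sample, rather than from a line segment. It then optionally truncates and deforms that region. This is a simplified NumPy reconstruction under stated assumptions. The helper names `geometric_sample` and `gsmote_oversample` are hypothetical, and surface points are taken only from minority-class neighbors, which is one of several possible selection strategies. It is not the authors' reference implementation.

```python
import numpy as np


def geometric_sample(center, surface, truncation=0.0, deformation=0.0, rng=None):
    """Draw one synthetic point from the (possibly truncated and deformed)
    hypersphere centred at `center` whose radius reaches `surface`."""
    rng = np.random.default_rng() if rng is None else rng
    center = np.asarray(center, dtype=float)
    surface = np.asarray(surface, dtype=float)
    radius = np.linalg.norm(surface - center)
    if radius == 0:
        return center.copy()
    dim = center.size
    # Uniform point inside the unit hypersphere: random direction, radius u**(1/dim).
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    point = rng.uniform() ** (1.0 / dim) * direction
    # Unit vector from the center toward the surface point.
    e_par = (surface - center) / radius
    proj = point @ e_par
    # Truncation: mirror points in the excluded cap onto the opposite side.
    if (truncation > 0 and proj < truncation - 1) or (truncation < 0 and proj > truncation + 1):
        point = point - 2.0 * proj * e_par
    # Deformation: shrink the component perpendicular to e_par.
    par = (point @ e_par) * e_par
    point = par + (1.0 - deformation) * (point - par)
    # Scale back to the real radius and translate to the center.
    return center + radius * point


def gsmote_oversample(X_min, n_new, k=5, truncation=0.0, deformation=0.0, seed=0):
    """Generate `n_new` points, using minority-class neighbors as surface points."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    k = min(k, len(X_min) - 1)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(d, axis=1)[:, :k]
    samples = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = neighbors[i, rng.integers(k)]
        samples.append(geometric_sample(X_min[i], X_min[j], truncation, deformation, rng))
    return np.array(samples)
```

With `truncation=0` and `deformation=0`, points are drawn uniformly from the whole hypersphere. With `truncation=1` and `deformation=1`, the region collapses to the segment between the center and the surface point, which is close in spirit to SMOTE.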

Narratives and Needs: Analyzing Experiences of Cyclone Amphan Using Twitter Discourse

Published in IJCAI 2021 Workshop on AI for Social Good, 2021

In this paper, we contribute two novel methodologies that leverage Twitter discourse to characterize narratives and identify unmet needs in response to Cyclone Amphan, which affected 18 million people in May 2020.

Recommended citation: Crayton, A., Fonseca, J., Mehra, K., Ng, M., Ross, J., Sandoval-Castañeda, M., & von Gnecht, R. (2021). Narratives and Needs: Analyzing Experiences of Cyclone Amphan Using Twitter Discourse. IJCAI 2021 Workshop on AI for Social Good. https://crcs.seas.harvard.edu/publications/narratives-and-needs-analyzing-experiences-cyclone-amphan-using-twitter-discourse

Improving Imbalanced Land Cover Classification with K-means SMOTE: Detecting and Oversampling Distinctive Minority Spectral Signatures

Published in Information, 2021

In this paper, we address the imbalanced learning problem by combining K-means clustering and the Synthetic Minority Oversampling TEchnique (SMOTE) into an improved oversampling algorithm. K-means SMOTE improves the quality of newly created artificial data by addressing both the between-class imbalance, as traditional oversamplers do, and the within-class imbalance. This avoids the generation of noisy data while effectively overcoming data imbalance.

Recommended citation: Fonseca, J., Douzas, G., & Bacao, F. (2021). Improving Imbalanced Land Cover Classification with K-Means SMOTE: Detecting and Oversampling Distinctive Minority Spectral Signatures. Information, 12(7), 266. https://doi.org/10.3390/info12070266
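Roughly, K-means SMOTE works in three steps. First, it clusters the input space. Next, it keeps only the clusters where the minority class dominates. Finally, it runs SMOTE inside the surviving clusters, giving sparser clusters more synthetic samples. The sketch below implements these steps in NumPy. The hand-rolled k-means++/Lloyd clustering, the imbalance-ratio formula `(majority + 1) / (minority + 1)`, the `ir_threshold` default and the exact sparsity weighting are assumptions for illustration and may differ from the paper's formulation. All function names are hypothetical.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=50):
    """Lloyd's k-means with k-means++ seeding; returns one cluster label per row."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # k-means++: favour points far from the centers chosen so far.
        d2 = np.min(np.linalg.norm(X[:, None] - np.array(centers)[None], axis=-1) ** 2, axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=-1).argmin(axis=1)
        # Recompute centers, keeping the previous one if a cluster empties.
        centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                            for c in range(k)])
    return labels


def smote(X_min, n_new, k_neighbors, rng):
    """Classic SMOTE: interpolate between a sample and one of its nearest minority neighbors."""
    k = min(k_neighbors, len(X_min) - 1)
    d = np.linalg.norm(X_min[:, None] - X_min[None], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    i = rng.integers(len(X_min), size=n_new)
    j = nn[i, rng.integers(k, size=n_new)]
    gap = rng.uniform(size=(n_new, 1))
    return X_min[i] + gap * (X_min[j] - X_min[i])


def kmeans_smote(X, y, minority, n_new, n_clusters=8, ir_threshold=1.0, k_neighbors=3, seed=0):
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    labels = kmeans(X, n_clusters, rng)
    kept, sparsity = [], []
    for c in range(n_clusters):
        X_c_min = X[(labels == c) & (y == minority)]
        n_c_maj = np.sum((labels == c) & (y != minority))
        # Filter: keep clusters where the minority class dominates.
        if len(X_c_min) < 2 or (n_c_maj + 1) / (len(X_c_min) + 1) > ir_threshold:
            continue
        # Sparsity = 1 / density, with density = count / (mean pairwise distance ** n_features).
        d = np.linalg.norm(X_c_min[:, None] - X_c_min[None], axis=-1)
        mean_d = max(d[np.triu_indices(len(X_c_min), k=1)].mean(), 1e-12)
        kept.append(X_c_min)
        sparsity.append(mean_d ** X.shape[1] / len(X_c_min))
    if not kept:
        raise ValueError("no cluster passed the imbalance-ratio filter")
    # Sparser clusters receive more synthetic samples.
    weights = np.array(sparsity) / np.sum(sparsity)
    counts = np.floor(weights * n_new).astype(int)
    counts[np.argmax(weights)] += n_new - counts.sum()  # give the rounding remainder to one cluster
    return np.vstack([smote(Xm, n, k_neighbors, rng) for Xm, n in zip(kept, counts) if n > 0])
```

Generation only happens inside clusters dominated by the minority class. As a result, isolated minority points sitting in majority regions are never used as seeds, which is how this kind of approach avoids amplifying noise.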

Increasing the Effectiveness of Active Learning: Introducing Artificial Data Generation in Active Learning for Land Use/Land Cover Classification

Published in Remote Sensing, 2021

In this paper, we introduce a new component to the typical active learning (AL) framework: the data generator, a source of artificial data that reduces the amount of user-labeled data required in AL.

Recommended citation: Fonseca, J., Douzas, G., & Bacao, F. (2021). Increasing the Effectiveness of Active Learning: Introducing Artificial Data Generation in Active Learning for Land Use/Land Cover Classification. Remote Sensing, 13(13), 2619. https://doi.org/10.3390/rs13132619
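The loop below sketches pool-based active learning with a data-generation step inserted before each training round. Every component is a stand-in chosen to keep the example self-contained and dependency-free: the nearest-centroid "classifier", the SMOTE-style interpolation used as a hypothetical generator, entropy as the uncertainty measure, and the function names. It assumes the initial labeled set covers every class.

```python
import numpy as np


def centroid_proba(X_train, y_train, X):
    """Toy classifier: softmax over negative distances to each class centroid."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    logits = -np.linalg.norm(X[:, None] - centroids[None], axis=-1)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def augment(X_lab, y_lab, n_per_class, rng):
    """Hypothetical generator: add SMOTE-style interpolations within each labeled class."""
    X_out, y_out = [X_lab], [y_lab]
    for c in np.unique(y_lab):
        X_c = X_lab[y_lab == c]
        if len(X_c) < 2:
            continue
        i = rng.integers(len(X_c), size=n_per_class)
        j = rng.integers(len(X_c), size=n_per_class)
        gap = rng.uniform(size=(n_per_class, 1))
        X_out.append(X_c[i] + gap * (X_c[j] - X_c[i]))
        y_out.append(np.full(n_per_class, c))
    return np.vstack(X_out), np.concatenate(y_out)


def active_learning(X_pool, y_oracle, init_idx, n_rounds=5, batch=5, n_gen=20, seed=0):
    """Pool-based AL; `y_oracle` stands in for the human annotator."""
    rng = np.random.default_rng(seed)
    labeled = list(init_idx)
    for _ in range(n_rounds):
        unlabeled = np.setdiff1d(np.arange(len(X_pool)), labeled)
        if len(unlabeled) == 0:
            break
        # 1. Train on the labeled data plus artificial samples from the generator.
        X_train, y_train = augment(X_pool[labeled], y_oracle[labeled], n_gen, rng)
        p = centroid_proba(X_train, y_train, X_pool[unlabeled])
        # 2. Query the unlabeled points with the highest predictive entropy.
        entropy = -(p * np.log(p + 1e-12)).sum(axis=1)
        query = unlabeled[np.argsort(entropy)[::-1][:batch]]
        # 3. The "annotator" labels them and they join the labeled set.
        labeled.extend(query.tolist())
    return np.array(labeled)
```

The generator sits between the labeled set and the classifier, so it can be swapped for any oversampler without touching the query strategy.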

Research Trends and Applications of Data Augmentation Algorithms

Published in arXiv, 2022

In this paper, we identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends and their progression over time, and research gaps in the data augmentation literature.

Recommended citation: Fonseca, J., & Bacao, F. (2022). Research Trends and Applications of Data Augmentation Algorithms. arXiv preprint arXiv:2207.08817. https://arxiv.org/abs/2207.08817

Geometric SMOTE for Imbalanced Datasets with Nominal and Continuous Features

Under submission, 2023

In this paper, we propose Geometric SMOTE for Nominal and Continuous features (G-SMOTENC), based on a combination of G-SMOTE and SMOTENC.

Recommended citation: Fonseca, J., & Bacao, F. (2023). Geometric SMOTE for Imbalanced Datasets with Nominal and Continuous Features. Under Submission.
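SMOTENC handles nominal features with two rules. First, its nearest-neighbor distance adds the median standard deviation of the continuous features for every mismatching nominal feature. Second, each synthetic sample takes the most frequent category among its k nearest neighbors. The sketch below pairs those two rules with a uniform draw inside a G-SMOTE-style hypersphere for the continuous features. How G-SMOTENC actually combines the two methods is not described here, so several choices in this sketch are assumptions:

- the pairing of the two methods;
- sizing the sphere by the continuous-feature distance only;
- omitting truncation and deformation;
- the names `smotenc_distance` and `gsmotenc_sample`.

```python
import numpy as np
from collections import Counter


def smotenc_distance(a_cont, a_nom, B_cont, B_nom, penalty):
    """SMOTENC-style distance: Euclidean on continuous features, adding penalty**2
    inside the sum for every nominal feature that differs."""
    mismatches = (B_nom != a_nom).sum(axis=1)
    return np.sqrt(((B_cont - a_cont) ** 2).sum(axis=1) + mismatches * penalty**2)


def gsmotenc_sample(X_cont, X_nom, i, k=5, rng=None):
    """Create one synthetic (continuous, nominal) pair from minority sample i."""
    rng = np.random.default_rng() if rng is None else rng
    X_cont = np.asarray(X_cont, dtype=float)
    k = min(k, len(X_cont) - 1)
    # Penalty: median of the per-feature standard deviations of the continuous features.
    penalty = np.median(X_cont.std(axis=0))
    d = smotenc_distance(X_cont[i], X_nom[i], X_cont, X_nom, penalty)
    d[i] = np.inf  # a sample is not its own neighbor
    nn = np.argsort(d)[:k]
    j = rng.choice(nn)
    # Continuous part: uniform point inside the hypersphere centred at sample i
    # whose radius is the continuous distance to neighbor j.
    center = X_cont[i]
    radius = np.linalg.norm(X_cont[j] - center)
    direction = rng.normal(size=center.size)
    direction /= np.linalg.norm(direction)
    new_cont = center + radius * rng.uniform() ** (1.0 / center.size) * direction
    # Nominal part: most frequent category among the k neighbors (SMOTENC rule).
    new_nom = np.array([Counter(X_nom[nn, f]).most_common(1)[0][0] for f in range(X_nom.shape[1])])
    return new_cont, new_nom
```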

Talks

Teaching