Aug 4, 2021

UII America Research Results Published at Premier AI/Vision Conferences in 2021

Three papers from the UII America team have been accepted at the 2021 IEEE/CVF International Conference on Computer Vision (ICCV) and the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). ICCV and CVPR are premier computer vision events focused on disseminating cutting-edge research in the fields of computer vision and AI.

In “Ensemble Attention Distillation for Privacy-Preserving Federated Learning”, the UII America team developed a new technique to train machine learning models from multiple decentralized compute nodes while preserving the privacy of local data at these nodes. One application of this technique is to train a “central” machine learning model in a multi-hospital setting while ensuring local proprietary hospital data stays strictly within each hospital, a critical consideration in preserving data privacy. To meet this requirement, the researchers relied only on data that was already in the public domain (e.g., standard benchmark datasets) and proposed new techniques to “distill” information from multiple machine learning models (i.e., one at each hospital) into the central machine learning model.

Figure 1: An illustration of the proposed privacy-preserving federated learning framework compared to traditional update- or parameter-sharing-based federated learning frameworks.
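To give a flavor of the distillation idea in general terms (this is a generic ensemble-distillation sketch, not the paper's attention-based method, and all function names here are illustrative): each hospital's model produces predictions on shared public data, those predictions are averaged into ensemble "soft labels", and the central model is trained to match them. Only model outputs on public data cross the node boundary, never raw local data.

```python
import numpy as np

def softmax(z, t=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / t
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_soft_labels(local_logits, t=2.0):
    """Average the softened predictions of each local (hospital) model
    on shared public data; only these outputs leave the nodes."""
    return np.mean([softmax(l, t) for l in local_logits], axis=0)

def distillation_loss(central_logits, soft_labels, t=1.0):
    """Per-sample KL divergence between the ensemble soft labels and the
    central model's softened predictions -- the training signal used to
    'distill' the local models' knowledge into the central model."""
    p = soft_labels
    q = softmax(central_logits, t)
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))) / p.shape[0])

# Toy example: 3 local models, a batch of 4 public images, 5 classes.
rng = np.random.default_rng(0)
local_logits = [rng.normal(size=(4, 5)) for _ in range(3)]
targets = ensemble_soft_labels(local_logits)
loss = distillation_loss(rng.normal(size=(4, 5)), targets)
```

In a full training loop, `loss` would be minimized with respect to the central model's parameters; the sketch only shows how the privacy-preserving target is formed.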

In “Spatio-Temporal Representation Factorization for Video-based Person Re-Identification”, our scientists developed a new technique to automatically re-identify a person of interest in crowded places like the lobby of a hospital or a convention center. Existing techniques that address this problem typically fail to give satisfactory results in the presence of many commonly occurring real-world challenges such as occlusions or similar appearances (e.g., multiple people may wear similar-looking clothes in scenarios that warrant a dress code). To address these challenges, our team proposed a new way of learning a factorized feature representation (comprising spatial and temporal factors) that trains models to adjust the weight of the constituent factors appropriately based on the scenario (e.g., given images of people wearing similar clothes, the model learns to give more weight to temporal factors, such as walking patterns).
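The adaptive weighting of factors can be pictured as a learned gate. The sketch below is a simplified illustration of this idea, not the paper's architecture; the gate parameters and feature shapes are hypothetical. A gate looks at both the spatial (appearance) and temporal (motion) features and outputs two weights that sum to one, letting the model lean on temporal cues when appearance is ambiguous.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def combine_factors(spatial_feat, temporal_feat, gate_w, gate_b):
    """Fuse spatial and temporal factors with learned, input-dependent
    weights: the gate maps the concatenated factors to two non-negative
    weights summing to 1, then forms a weighted sum of the factors."""
    x = np.concatenate([spatial_feat, temporal_feat], axis=-1)
    weights = softmax(x @ gate_w + gate_b)            # shape (..., 2)
    w_s, w_t = weights[..., :1], weights[..., 1:]
    return w_s * spatial_feat + w_t * temporal_feat, weights

rng = np.random.default_rng(1)
d = 8
spatial = rng.normal(size=(2, d))    # appearance features, one row per tracklet
temporal = rng.normal(size=(2, d))   # motion/gait features, one row per tracklet
W = rng.normal(size=(2 * d, 2)) * 0.1
b = np.zeros(2)
fused, weights = combine_factors(spatial, temporal, W, b)
```

During training, the gate parameters would be learned end-to-end, so the weighting adapts per input rather than being fixed by hand.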

In “A Peek into the Reasoning of Neural Networks: Interpreting with Structural Visual Concepts”, our scientists developed a generic framework, called VRX, that can equip existing deep learning models in computer vision with structural reasoning capability. Our team demonstrated proof-of-concept results with numerous popular computer vision classification models, showing VRX can help answer questions such as “why did the neural network predict an image as class A instead of class B?”. By providing reasoning for any incorrect predictions made by the model, VRX gives deep learning practitioners a tool to diagnose failures of the neural network and improve its performance, which is especially important in situations where mistakes can be critical, such as the medical field.

Figure 2: Overview of the interpretation process.
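As a much-simplified illustration of concept-level "class A vs. class B" reasoning (a toy linear decomposition, not VRX's graph-based method; all names and numbers here are made up): if a classifier's final layer is linear over concept activations, the logit gap between two classes decomposes exactly into per-concept contributions, and the largest terms "explain" why one class won.

```python
import numpy as np

def concept_contribution_gap(concept_scores, class_weights, a, b):
    """For a linear head, logit(c) = class_weights[c] @ concept_scores,
    so logit(a) - logit(b) splits exactly into one contribution per
    concept; sorting them surfaces the concepts that decided the call."""
    gap = (class_weights[a] - class_weights[b]) * concept_scores
    order = np.argsort(-gap)   # concepts ranked by how much they favored a over b
    return gap, order

# Toy setup: 4 visual concepts, 3 classes.
concept_scores = np.array([0.9, 0.1, 0.5, 0.3])   # how strongly each concept fires
W = np.array([[2.0, 0.1, 0.5, 0.0],               # class 0 relies heavily on concept 0
              [0.1, 1.5, 0.4, 0.2],
              [0.3, 0.2, 0.1, 1.0]])
gap, order = concept_contribution_gap(concept_scores, W, a=0, b=1)
# gap.sum() equals logit(0) - logit(1), so the explanation is exact here.
```

For a practitioner, a ranking like `order` points at which concept (e.g., a misleading texture) drove a wrong prediction, which is the kind of diagnosis the article describes at a far richer, structural level.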