Projects

Wav2Vec-U: Unsupervised Speech Recognition with GANs : Ablation Study

Replicated the 2021 paper “Unsupervised Speech Recognition”, which developed a method to train speech recognition models without labeled data, focusing on GANs and Autoencoders. Worked on improving the baseline performance of the model. Performed experiments to improve the baseline performance. Achieved a Levenshtein distance of 20.12 on the test dataset.

Report Code

Skills: Generative Adversarial Networks (GANs) · PyTorch · Fairseq · Variational Autoencoders (VAEs) · Speech Recognition

Attention-based End-to-End Speech-to-Text Deep Neural Network

Learnt to build an encoder to effectively extract features from a speech signal, constructed a decoder that can sequentially spell out the transcription of the audio, and implemented an attention mechanism between the decoder and the encoder.Learnt to build an encoder to effectively extract features from a speech signal, constructed a decoder that can sequentially spell out the transcription of the audio, and implemented an attention mechanism between the decoder and the encoder. Worked on improving the performance of the baseline LAS (Listen, Attend, Spell) model.

Report Code

Skills: Attention Mechanism · Teacher Forcing · Beam Search

Few-shot Object Detection in Valorant-a first-person Game

This project addresses the issue by introducing few-shot object detection methods tailored for Valorant, a three-dimensional first-person game by Riot Games. Given the ever-evolving nature of game content, our objective is to develop a detector capable of adapting swiftly to new categories with limited training data, while capitalizing on knowledge acquired from a well-annotated base set. The proposed approach involves fine-tuning a pre-trained Faster-RCNN object detector with 20 images from 10 distinct Valorant characters. Three training methods are explored: training with no frozen layers, training with all frozen layers except the last, and a novel method named Multiplicative Layer-wise Learning Rates (MLLR). Our findings reveal that the MLLR method outperforms the other two approaches on the custom dataset.This project addresses the issue by introducing few-shot object detection methods tailored for Valorant, a three-dimensional first-person game by Riot Games. Given the ever-evolving nature of game content, our objective is to develop a detector capable of adapting swiftly to new categories with limited training data, while capitalizing on knowledge acquired from a well-annotated base set. The proposed approach involves fine-tuning a pre-trained Faster-RCNN object detector with 20 images from 10 distinct Valorant characters. Three training methods are explored: training with no frozen layers, training with all frozen layers except the last, and a novel method named Multiplicative Layer-wise Learning Rates (MLLR). Our findings reveal that the MLLR method outperforms the other two approaches on the custom dataset.
Report

Skills: Computer Vision · Few-shot Object Detection · Google Cloud Platform (GCP) · Version Control

Speech Recognition by Predicting Phonemes

Implemented RNNs and the dynamic programming algorithm, Connectionist Temporal Classification, to generate labels.

Face Classification and Verification using CNN

In this kaggle challenge, worked on implementing pattern recognition problems that require position invariance. Specifically on the problem of recognizing or verifying faces in images. Ranked among the top 10% of the competition. Kaggle

Image Registration in Medical Imaging using CNN

The project is designed to find and understand ways that deep learning is utilized to help and improve image registration, specifically for medical imaging. Medical imaging is a great tool for getting information about the inside of the body, both in structure and function. I believe that some of the main areas of improvement that are constantly being researched in this field include quality and usefulness of the images obtained. This also expands to 3D modeling, registration, segmentation, and other methods to gain more information about the patient.The project is designed to find and understand ways that deep learning is utilized to help and improve image registration, specifically for medical imaging. Medical imaging is a great tool for getting information about the inside of the body, both in structure and function. I believe that some of the main areas of improvement that are constantly being researched in this field include quality and usefulness of the images obtained. This also expands to 3D modeling, registration, segmentation, and other methods to gain more information about the patient.

Report Code

Skills: Image Processing · Computation Modeling

Disease Modelling and Analysis of Meningitis

Presented and implemented a summary of different computational approaches aimed at describing and modeling the effect of different factors on mortality rate of meningitis around the world.

Slides

Game Theory in Python

Implemented several game theory models to explain the neuroscience of decision making. Simulated game theory models such as Axelrod and Chicken Game in python.

Phylogenetic Relationship of Conserved and Non-Conserved Proteins across Model Organisms

This study compares both structures and sequences of one conserved and one non-conserved protein different species across a broad range of phylogeny. Histone protein responsible for DNA packaging is the subject of the study. Histones H1 and H4 are considered as most and least conserved among all histone proteins respectively. This study also provides an idea on how the sequence of protein and their structures have changed over time and adapted to the environment.

Return to Main Page

Adrita Das

Projects