Projects
Wav2Vec-U: Unsupervised Speech Recognition with GANs: Ablation Study
Replicated the 2021 paper “Unsupervised Speech Recognition”, which introduced a method for training speech recognition models without labeled data, with a focus on GANs and autoencoders. Ran ablation experiments to improve on the baseline model's performance. Achieved a Levenshtein distance of 20.12 on the test dataset.
Skills: Generative Adversarial Networks (GANs) · PyTorch · Fairseq · Variational Autoencoders (VAEs) · Speech Recognition
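For context on the metric reported above, the Levenshtein (edit) distance between a reference and a predicted sequence can be computed with a short dynamic program. This is a minimal illustrative sketch, not the evaluation code used in the project; the function name `levenshtein` is an assumption, and how the reported score was averaged or normalized is not stated here.

```python
def levenshtein(ref, hyp):
    """Minimum number of insertions, deletions and substitutions
    needed to turn `ref` into `hyp` (works on strings or lists)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty sequence
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]
```

The same function applies to phoneme or character sequences passed as lists, for example `levenshtein(["k", "ae", "t"], ["k", "ah", "t"])`.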
Attention-based End-to-End Speech-to-Text Deep Neural Network
Learned to build an encoder that effectively extracts features from a speech signal, constructed a decoder that sequentially spells out the transcription of the audio, and implemented an attention mechanism between the decoder and the encoder. Worked on improving the performance of the baseline LAS (Listen, Attend and Spell) model.
Skills: Attention Mechanism · Teacher Forcing · Beam Search
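The attention step between decoder and encoder can be sketched as follows: the current decoder state acts as a query, each encoder output provides a key and a value, and the context vector is a softmax-weighted sum of the values. The scoring function used in the project is not stated, so plain dot-product scoring is assumed here, and the function name `dot_product_attention` is hypothetical.

```python
import math

def dot_product_attention(query, keys, values):
    """Return (context, weights) for one decoder step.

    query:  list of floats (decoder state)
    keys:   list of lists of floats, one per encoder time step
    values: list of lists of floats, one per encoder time step
    """
    # Similarity between the query and every encoder key.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    # Numerically stable softmax over encoder time steps.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: weighted sum of the encoder values.
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return context, weights
```

In a full LAS-style model the context vector would be fed to the decoder together with the previous output token; that wiring is omitted here.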
Few-shot Object Detection in Valorant, a First-person Game
This project introduces few-shot object detection methods tailored for Valorant, a three-dimensional first-person game by Riot Games. Given the ever-evolving nature of game content, our objective is to develop a detector capable of adapting swiftly to new categories with limited training data, while capitalizing on knowledge acquired from a well-annotated base set. The proposed approach involves fine-tuning a pre-trained Faster R-CNN object detector with 20 images from 10 distinct Valorant characters. Three training methods are explored: training with no frozen layers, training with all frozen layers except the last, and a novel method named Multiplicative Layer-wise Learning Rates (MLLR). Our findings reveal that the MLLR method outperforms the other two approaches on the custom dataset.
Report
Skills: Computer Vision · Few-shot Object Detection · Google Cloud Platform (GCP) · Version Control
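The details of MLLR are in the report and are not reproduced here. As a rough illustration of what a multiplicative layer-wise schedule could look like, the sketch below assumes that the final layer trains at a base learning rate and each earlier layer's rate is multiplied by a constant factor. This reading, the function name `layerwise_learning_rates`, and the layer names and values in the test are all assumptions, not the method as implemented in the project.

```python
def layerwise_learning_rates(layer_names, base_lr, factor):
    """Assign a learning rate to each layer.

    layer_names: layers ordered from input to output
    base_lr:     learning rate for the last (output-side) layer
    factor:      multiplier applied once per layer moving toward the input
                 (assumed scheme; factor < 1 means earlier layers change less)
    """
    n = len(layer_names)
    return {
        name: base_lr * factor ** (n - 1 - i)
        for i, name in enumerate(layer_names)
    }
```

Under this assumed scheme, freezing all layers except the last corresponds to a factor of 0, and training with no frozen layers at a single rate corresponds to a factor of 1, so the two baseline methods sit at the extremes. In PyTorch, a mapping like this would typically be turned into per-parameter-group `lr` options passed to the optimizer.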
Speech Recognition by Predicting Phonemes
Implemented RNNs and the dynamic programming algorithm, Connectionist Temporal Classification, to generate labels.
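To illustrate how labels are read out of a CTC-trained network, the sketch below shows greedy decoding: take the most likely label at each frame, merge consecutive repeats, then drop the blank symbol. The project description does not say which decoding strategy was used, so greedy decoding is an assumption here, as are the function name `ctc_greedy_decode` and the choice of 0 as the blank index.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse per-frame argmax labels into an output label sequence.

    frame_labels: most likely label index at each time step
    blank:        index of the CTC blank symbol (assumed to be 0)
    """
    decoded = []
    prev = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame
        # and is not the blank symbol.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded
```

A blank between two identical labels keeps them distinct, which is how CTC represents repeated phonemes.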
Face Classification and Verification using CNN
In this Kaggle challenge, worked on pattern recognition problems that require position invariance, specifically recognizing and verifying faces in images. Ranked in the top 10% of the competition. Kaggle
Image Registration in Medical Imaging using CNN
The project explores how deep learning is used to improve image registration, specifically for medical imaging. Medical imaging is a valuable tool for obtaining information about the inside of the body, both in structure and function. Some of the main areas of improvement under active research in this field are the quality and usefulness of the images obtained. This extends to 3D modeling, registration, segmentation, and other methods for gaining more information about the patient.
Skills: Image Processing · Computation Modeling
Disease Modelling and Analysis of Meningitis
Summarized, presented, and implemented several computational approaches for describing and modeling the effect of different factors on the mortality rate of meningitis around the world.
Game Theory in Python
Implemented several game theory models to explain the neuroscience of decision making. Simulated game theory models such as Axelrod and the Chicken Game in Python.
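As an example of the kind of model involved, the Chicken Game can be written as a two-player payoff table and searched for pure-strategy Nash equilibria. This is a minimal sketch; the payoff values and the names `CHICKEN` and `pure_nash_equilibria` are illustrative assumptions, not taken from the project.

```python
from itertools import product

ACTIONS = ["swerve", "straight"]

# CHICKEN[(a0, a1)] = (payoff to player 0, payoff to player 1).
# Values are illustrative: a crash is far worse than backing down.
CHICKEN = {
    ("swerve", "swerve"): (0, 0),
    ("swerve", "straight"): (-1, 1),
    ("straight", "swerve"): (1, -1),
    ("straight", "straight"): (-10, -10),
}

def pure_nash_equilibria(payoffs, actions):
    """Return action profiles where no player gains by deviating alone."""
    equilibria = []
    for a0, a1 in product(actions, repeat=2):
        u0, u1 = payoffs[(a0, a1)]
        # Player 0 considers alternatives while player 1's action is fixed.
        best0 = all(payoffs[(alt, a1)][0] <= u0 for alt in actions)
        # Player 1 considers alternatives while player 0's action is fixed.
        best1 = all(payoffs[(a0, alt)][1] <= u1 for alt in actions)
        if best0 and best1:
            equilibria.append((a0, a1))
    return equilibria
```

With these payoffs the only pure equilibria are the two asymmetric outcomes in which exactly one player swerves.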
Phylogenetic Relationship of Conserved and Non-Conserved Proteins across Model Organisms
This study compares both the structures and the sequences of one conserved and one non-conserved protein in different species across a broad phylogenetic range. Histones, the proteins responsible for DNA packaging, are the subject of the study. Histones H4 and H1 are considered the most and least conserved of all histone proteins, respectively. The study also gives an idea of how protein sequences and structures have changed over time and adapted to the environment.