|
Survey of Autoregressive Models for Image and Video Generation
Saqib Azim, Mehul Arora, Narayanan Ranganatha, Mahesh Kumar
abstract / report
This survey offers a comprehensive overview of recent advances in autoregressive (AR) models for image and video generation. It discusses state-of-the-art AR models such as PixelCNN, PixelRNN, Gated PixelCNN, and PixelSNAIL, emphasizing their distinctive architectures and contributions. The central challenge in AR models, handling long-range dependencies effectively, is addressed through approaches such as gated activations, self-attention mechanisms, and residual blocks. The paper presents Locally Masked Convolution and Autoregressive Diffusion Models as order-agnostic approaches that improve upon traditional autoregressive models, and explores transformer-based networks for autoregressive image generation, which achieve superior image quality on synthesis tasks. Quantization-based models are shown to enhance image diversity and quality through feature quantization and variational regularization. The paper then discusses autoregressive modeling in pixel space and latent space for video generation, and concludes with the strengths, limitations, and future research directions of autoregressive models, providing valuable insights for researchers and practitioners.
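To make the core mechanism concrete, below is a minimal sketch (assuming PyTorch) of the masked convolution underlying PixelCNN-style models: the kernel is zeroed so each pixel's prediction depends only on pixels above it and to its left in the raster-scan ordering. Names and hyperparameters are illustrative, not taken from any particular paper.

    import torch
    import torch.nn as nn

    class MaskedConv2d(nn.Conv2d):
        """Conv2d whose kernel is masked so each pixel only sees pixels
        above it and to its left (raster-scan autoregressive ordering)."""
        def __init__(self, mask_type, *args, **kwargs):
            super().__init__(*args, **kwargs)
            assert mask_type in ("A", "B")  # "A": exclude center pixel (first layer only)
            self.register_buffer("mask", torch.ones_like(self.weight))
            _, _, h, w = self.weight.shape
            self.mask[:, :, h // 2, w // 2 + (mask_type == "B"):] = 0  # center row, right of center
            self.mask[:, :, h // 2 + 1:, :] = 0                        # all rows below center

        def forward(self, x):
            self.weight.data *= self.mask  # zero out "future" pixels before convolving
            return super().forward(x)

    # usage: first layer uses mask "A", subsequent layers mask "B"
    layer = MaskedConv2d("A", in_channels=1, out_channels=64, kernel_size=7, padding=3)
    out = layer(torch.randn(1, 1, 28, 28))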
|
|
Speech Enhancement using Wavelet-based Convolutional-Recurrent Network
Parthasarathi Kumar, Saqib Azim
abstract / report / presentation
In this project, we present an end-to-end, data-driven system for enhancing the quality of speech signals using a convolutional-recurrent neural network. We present a quantitative and qualitative analysis of our speech enhancement system on a real-world noisy speech dataset and evaluate its performance using metrics such as SNR, PESQ, and STOI. We employ a wavelet pooling mechanism in place of max-pooling in the convolutional layers of our model and compare the performance of these variants. Our experiments demonstrate that the model performs better on noisy speech signals with Haar wavelet pooling than with max-pooling; in addition, the wavelet-based approach converges faster during training than the other variants.
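As an illustration of the pooling variant described above, here is a minimal sketch (assuming PyTorch) of one-level Haar wavelet pooling, which keeps the low-frequency (LL) approximation subband of each 2x2 block instead of its maximum; the actual model may differ in detail.

    import torch

    def haar_wavelet_pool2d(x):
        # One-level Haar DWT, keeping only the LL (approximation) subband.
        # x: (N, C, H, W) with even H and W -> output: (N, C, H/2, W/2)
        a = x[:, :, 0::2, 0::2]  # top-left of each 2x2 block
        b = x[:, :, 0::2, 1::2]  # top-right
        c = x[:, :, 1::2, 0::2]  # bottom-left
        d = x[:, :, 1::2, 1::2]  # bottom-right
        return (a + b + c + d) / 2.0  # orthonormal Haar scaling: sum / 2

    x = torch.randn(1, 3, 32, 32)
    print(haar_wavelet_pool2d(x).shape)  # torch.Size([1, 3, 16, 16])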
|
|
Semantic Temporal Constrained Pose Estimation using Structure-from-Motion
Narayanan Ranganatha, Saqib Azim, Mehul Arora, Mahesh Kumar
abstract / report
The objective of this project is to accurately estimate the 6D pose (position and orientation) of a monocular camera moving through an environment. We present an approach for visual pose estimation using the Structure from Motion (SfM) technique with temporally constrained frame matching and semantic assistance, in the context of autonomous driving scenarios. We address the challenge of pose estimation in dynamic environments, where moving objects can introduce incorrect feature matches that corrupt both the reconstructed 3D scene and the trajectory estimated by the SfM algorithm. Specifically, we evaluate our approach on visual data from outdoor driving scenarios such as the KITTI dataset, since accurate estimation of the car's pose in dynamic environments is crucial for autonomous driving. Our method contributes reliable and precise car pose information, advancing the development of autonomous driving systems.
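The sketch below (Python with OpenCV, illustrative only) shows one way to realize the semantic assistance: keypoints falling on dynamic classes (cars, pedestrians) are masked out before matching, so SfM triangulates only static structure. The class IDs, names, and function signature are hypothetical and dataset-dependent, not the project's exact implementation.

    import cv2
    import numpy as np

    def match_static_features(img1, img2, seg1, seg2, dynamic_ids=(11, 12, 13)):
        """Match ORB features between two temporally adjacent frames, keeping
        only keypoints on static classes. seg1/seg2: (H, W) integer label maps;
        dynamic_ids: labels to reject (hypothetical IDs for car/pedestrian)."""
        orb = cv2.ORB_create(2000)
        # mask is 255 where the pixel belongs to a static class, 0 otherwise
        m1 = np.where(np.isin(seg1, dynamic_ids), 0, 255).astype(np.uint8)
        m2 = np.where(np.isin(seg2, dynamic_ids), 0, 255).astype(np.uint8)
        k1, d1 = orb.detectAndCompute(img1, m1)
        k2, d2 = orb.detectAndCompute(img2, m2)
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        return k1, k2, matcher.match(d1, d2)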
|
|
Particle-Filter SLAM and 2D Texture Mapping for Autonomous Navigation
Saqib Azim
abstract / report
In this project, we developed a SLAM (Simultaneous Localization and Mapping) system that uses a particle filter for concurrent localization and mapping, fusing data from encoders, LIDAR, an IMU, and an RGBD Kinect camera. The project is structured in two phases. First, we apply a particle filter algorithm for localization and mapping using data solely from the LIDAR, encoders, and IMU. In the second phase, we add textural detail to the generated map by incorporating data from the RGBD Kinect camera mounted on the robot, together with the optimized robot trajectory obtained from the particle filter in the first phase. This two-phase approach yields a more detailed and accurate representation of the mapped environment.
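For reference, here is a minimal sketch (NumPy, illustrative) of the predict-update-resample cycle at the heart of the particle filter used in the first phase; `measurement_model` stands in for the LIDAR map-correlation scoring and is a placeholder, not the project's exact implementation.

    import numpy as np

    def particle_filter_step(particles, weights, control, lidar_scan,
                             motion_noise, measurement_model):
        """One predict-update-resample cycle of a 2D particle filter.
        particles: (N, 3) array of [x, y, theta] pose hypotheses.
        control: odometry increment [dx, dy, dtheta] from encoders/IMU.
        measurement_model: callable scoring how well lidar_scan fits the
        map at a given pose (e.g. map correlation)."""
        # 1. Predict: propagate each particle through the noisy motion model
        particles = particles + control + np.random.randn(*particles.shape) * motion_noise
        # 2. Update: reweight particles by the scan likelihood against the map
        weights = weights * np.array([measurement_model(p, lidar_scan) for p in particles])
        weights /= weights.sum()
        # 3. Resample when the effective particle count collapses
        n_eff = 1.0 / np.sum(weights ** 2)
        if n_eff < 0.5 * len(particles):
            idx = np.random.choice(len(particles), size=len(particles), p=weights)
            particles = particles[idx]
            weights = np.full(len(particles), 1.0 / len(particles))
        return particles, weights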
|
|
Proximal Policy Optimization (PPO) PyTorch Implementation
abstract / code
This repository offers a beginner-friendly, modular implementation of Proximal Policy Optimization (PPO) with a clipped objective in PyTorch, supporting both continuous and discrete action spaces. It includes YAML-based configuration for customizable hyperparameters and is compatible with OpenAI Gym environments such as CartPole, LunarLander, and HalfCheetah, providing a flexible framework for experimenting with reinforcement learning algorithms and adapting them to custom environments.
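For reference, a minimal sketch of the clipped surrogate objective that PPO optimizes (function and argument names are illustrative, not necessarily those used in the repository):

    import torch

    def ppo_clipped_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
        """PPO clipped surrogate objective (returned as a loss to minimize).
        ratio = pi_new(a|s) / pi_old(a|s); clipping keeps the policy update
        within a trust region around the old policy."""
        ratio = torch.exp(log_probs_new - log_probs_old)
        unclipped = ratio * advantages
        clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
        return -torch.min(unclipped, clipped).mean()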
|
|
GPT-2 Implementation in PyTorch
abstract / code
This project involves implementing OpenAI's GPT-2 model from scratch in PyTorch, trained on the FineWebEdu dataset. The code is structured modularly, offering customizable hyperparameters for training and inference.
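As a flavor of what such an implementation involves, here is a minimal sketch (PyTorch >= 2.0) of GPT-2's causal self-attention block; the defaults correspond to GPT-2 small (768-dim, 12 heads), but the structure and names are illustrative rather than copied from this repository.

    import torch
    import torch.nn.functional as F
    from torch import nn

    class CausalSelfAttention(nn.Module):
        """Multi-head self-attention with a causal mask, as used in GPT-2."""
        def __init__(self, n_embd=768, n_head=12):
            super().__init__()
            self.n_head = n_head
            self.qkv = nn.Linear(n_embd, 3 * n_embd)   # fused Q, K, V projection
            self.proj = nn.Linear(n_embd, n_embd)      # output projection

        def forward(self, x):                          # x: (B, T, n_embd)
            B, T, C = x.shape
            q, k, v = self.qkv(x).split(C, dim=2)
            # reshape each to (B, n_head, T, head_dim)
            q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
            k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
            v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
            # causal attention: each position attends only to itself and the past
            y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
            return self.proj(y.transpose(1, 2).contiguous().view(B, T, C))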
|
|
Hazardous Activity Detection in Workplaces using Computer Vision
Saqib Azim, Takumi Nito, Tomokazu Murakami
Accepted at Hitachi Annual Research Symposium 2020
|
|
Adversarial Robustness Analysis of Deep Learning Models
Saqib Azim, Lily Weng
summary
We used attack methods such as FGSM, PGD, and AutoAttack to generate adversarial examples and conducted an empirical analysis of the CLIP model's resilience to adversarial perturbations. We further developed a CLIP-based classifier robust to L2-norm perturbations using adversarial training and randomized smoothing, and evaluated it on the CIFAR-10 and ImageNet datasets.
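As an example of the attack family involved, here is a minimal FGSM sketch (PyTorch, illustrative names): a single-step L-infinity attack that perturbs each pixel in the sign of the loss gradient.

    import torch

    def fgsm_attack(model, images, labels, eps,
                    loss_fn=torch.nn.CrossEntropyLoss()):
        """Fast Gradient Sign Method: one-step L-infinity attack.
        Moves each pixel by +/- eps in the direction that increases the loss."""
        images = images.clone().detach().requires_grad_(True)
        loss = loss_fn(model(images), labels)
        loss.backward()
        adv = images + eps * images.grad.sign()
        return adv.clamp(0, 1).detach()  # keep pixels in the valid [0, 1] range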
|
|
TV Audience Measurement Challenge
Saqib Azim, Pranav Sankhe, Sachin Goyal, Sanyam Khandelwal, Tanmay Patil
Bronze Medal (3rd / 23 teams) at the Inter-IIT Technical Meet 2018
summary / code / presentation
Proposed scalable and robust solutions to challenges posed by BARC India, including channel identification, advertisement and content classification and recognition, and viewer age and gender recognition, along with a hardware-free solution for capturing TV viewership data across the country.
|
|
Zero-Shot Learning for Object Recognition
Advisor: Prof. Subhasis Chaudhuri
summary / code
Proposed a semi-supervised VGG16-based encoder-decoder network that learns a visual-to-semantic space mapping using a novel combination of a margin-based hinge-rank loss and Word2Vec embeddings. Explored multiple networks for better visual feature representations. Improved recognition performance over existing methods from 58.7% to 65.3% on the Animals with Attributes dataset.
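A minimal sketch (PyTorch, with illustrative names) of a margin-based hinge-rank loss of the kind described: the projected visual embedding should be closer to its class's Word2Vec vector than to any other class's vector, by at least a margin.

    import torch
    import torch.nn.functional as F

    def hinge_rank_loss(visual_emb, word_vecs, labels, margin=0.1):
        # visual_emb: (B, D) projected image features; word_vecs: (K, D) class
        # Word2Vec embeddings; labels: (B,) true class indices.
        v = F.normalize(visual_emb, dim=1)
        w = F.normalize(word_vecs, dim=1)
        sims = v @ w.t()                             # (B, K) cosine similarities
        pos = sims.gather(1, labels.unsqueeze(1))    # similarity to the true class
        hinge = (margin - pos + sims).clamp(min=0)   # margin violations per wrong class
        mask = torch.ones_like(hinge)
        mask.scatter_(1, labels.unsqueeze(1), 0.0)   # zero out the true-class term
        return (hinge * mask).sum(dim=1).mean()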
|
|
Photoplethysmogram (PPG) Signal Acquisition Module
Saqib Azim, Pranav Sankhe, Ritik Madan
abstract / report
A photoplethysmogram (PPG) is an optically obtained plethysmogram, a volumetric measurement of an organ. With each cardiac cycle, the heart pumps blood to the periphery, and the resulting change in blood volume is detected by illuminating the skin with infrared (IR) light. We developed and implemented an electronic system to capture and display the PPG signal. We shine IR light on the fingertip and measure the reflected light with a phototransistor, whose output carries the PPG signal. The raw PPG signal is the phototransistor's current output, typically 0.2-0.4 mA, which we convert to a voltage using a current-to-voltage converter. The raw PPG often rides on a large, slowly varying baseline that must be removed to make optimal use of the available ADC range; we perform baseline restoration by controlling the bias voltage of a current injector with a microcontroller. We amplify the signal using a fixed gain resistor in the current-to-voltage converter. We also designed an automatic LED intensity control that adjusts the LED current, and hence the emitted IR light, to make the acquisition module adaptable to users with varying skin colours, motion artifacts, and other conditions. Finally, we transmit the PPG signal over Bluetooth and display it on an Android smartphone.
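As a worked example of the current-to-voltage stage (the resistor value here is hypothetical, for illustration only): with a feedback resistor of 10 kOhm, the 0.2-0.4 mA photocurrent maps to 2-4 V, filling most of a 0-5 V ADC range.

    R_f = 10e3                           # feedback resistor in ohms (hypothetical value)
    for i_photo in (0.2e-3, 0.4e-3):     # photocurrent range from the abstract, in amperes
        v_out = i_photo * R_f            # transimpedance relation: V_out = I_photo * R_f
        print(f"{i_photo * 1e3:.1f} mA -> {v_out:.1f} V")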
|