I'm Isha Arora

Let's start scrolling and learn more about me.

About Me

I recently graduated from Northeastern University with a Master's in Data Science. I particularly enjoy working with data and exploring how Machine Learning and Data Science can solve problems in all spheres. Innovative by nature, I am extremely passionate about what I do. I have worked on state-of-the-art Computer Vision models (EfficientNet, InceptionV3, and ResNet) for cancer detection and classification. I also interned at Massachusetts General Hospital as a Research Student, where I primarily worked on statistical analyses, and recently published a paper establishing the validity of a self-report questionnaire for Childbirth-related Posttraumatic Stress Disorder.

In my downtime, I love cooking and trying out new cuisines (I'm on a roll trying to cook more East Asian dishes). Reading has always been one of my favorite things to do (although, I must admit, I have a lot of reading to catch up on if I ever intend to complete my list).

Also, I recently decided to try to learn one new thing, be it work-related, like machine learning, or some random trivia (the more you know, I guess).

I'm currently on the lookout for my next opportunity.


Contact Details

Isha Arora
Boston, MA US
arora.isha4128@gmail.com

Skills

I have experimented a lot with the aim of finding my niche, and have thus gained varied skills along the way. Some of my major skills are highlighted below.

Python
R
RStudio
SQL
PostgreSQL
MySQL
Oracle PL/SQL
Research
Data Curation
Data Cleaning
Data Analysis
Data Interpretation
Data Visualization
Statistical Analysis
Statistical Modeling
Supervised Machine Learning
Unsupervised Machine Learning
Deep Learning
Image Processing
Computer Vision
Predictive Modeling
Time-Series Modeling
TensorFlow
Keras
Torch
OpenCV
sklearn
plotly
Linear Regression
Logistic Regression
Random Forest
Naïve Bayes
Support Vector Machine
Clustering
Natural Language Processing
Amazon Web Services
Microsoft Azure
Google Cloud
Microsoft Office
C/C++
Java
Git
Tableau
Power BI
OpenRefine
SPSS
SAS
VBA
Spring Boot

Career

Data Analyst

Massachusetts General Hospital - The Dekel Laboratory
May 2024 - Present
January 2023 - August 2023

  • Working with Dr. Sharon Dekel and Dr. Kathleen Jagodnik on research focused on developing a Machine Learning approach to identify the risk factors for childbirth-associated posttraumatic stress disorder (CB-PTSD).
  • Identified data sources from the Massachusetts General Hospital (MGH) that not only hosted the data required but also remained compliant with the IRB protocol created for the study.
  • Cleaned the collected data, prepared it for model deployment, and reviewed the literature to understand the extent of existing work and to design a new model.
  • Analyzed and researched PTSD reporting metrics, involving data analysis and in-depth use of statistical computational methods. More about this study can be read in our paper here.
  • Conducted comprehensive analysis to investigate the impact of partner deployment during wartime on the mental health of expecting and postpartum women, and examined the influence of delivery mode on postpartum mothers' well-being.
  • Currently analyzing how Childbirth PTSD, Depression, Maternal-Infant Bonding, and Maternal-Fetal Bonding are related, using data collected from around 1200 MGH patients.

Data Science Research Assistant

Northeastern University - The Amal Lab for Precision Medicine
February 2023 - November 2024

  • Working with Prof. Saeed Amal on using Artificial Intelligence and Digital Pathology for Cancer Care.
  • Initially focused on classifying Prostate Cancer using Whole Slide Images, reviewing state-of-the-art models like EfficientNet (b0-b7), Inception_V3, ResNet-34, ResNet-50, and PROMETEO (a CNN model specifically designed to test for PCa).
  • Achieved a quadratic weighted kappa (QWK) of 0.66 and a weighted accuracy of 0.81 using a customized EfficientNet-b1 model on the PANDA Challenge dataset.
  • Also worked on model transferability to other cancers, such as Breast and Gastric Cancer.
  • Attained a 0.99 weighted accuracy on the binary classification of the BreakHis dataset - a Breast cancer dataset, using a VGG16-ResNet50 Ensemble model, and contributed to a paper; more about it can be read here.
  • Extended the use of VGG-16, ResNet-34, ResNet-50, EfficientNet, and Ensemble models to the Gastric Cancer dataset GasHisSDB, achieving an accuracy of around 0.99 across all image resolutions using the best-performing Ensemble model.
    More about this paper here.
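
For readers unfamiliar with the QWK metric mentioned above, here is a minimal pure-Python sketch of how a quadratic weighted kappa can be computed between true and predicted grades. It is illustrative only and is not the code used in the project; the function name and the toy labels are assumptions.

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Quadratic weighted kappa for integer labels in [0, n_classes)."""
    n = len(y_true)

    # Observed confusion matrix: rows are true grades, columns are predictions.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    # Marginal counts of each grade among the true and predicted labels.
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n_classes))
                 for j in range(n_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Penalty grows with the squared distance between grades.
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            # Count expected by chance if the two labelings were independent.
            expected = hist_true[i] * hist_pred[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when all labels are identical")
    return 1.0 - weighted_observed / weighted_expected


# Toy example (hypothetical grades, not PANDA data).
print(quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], n_classes=4))
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement.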

Associate Engineer

Virtusa Consulting Services Pvt. Ltd.
August 2020 - August 2021

  • Collaborated with the Wolters Kluwer USA New York office on a project to create a system that would host all US laws (for all states) and regulations in the banking and insurance industries.
  • Worked in an Agile setting on cleaning and developing the PostgreSQL database that held all laws and regulations. The data sent in by the client was extremely noisy, so I learned OpenRefine to clean the dataset before adding it to the database.
  • The project helped improve client efficiency and sped up lookups by at least 60% over the following 5 months.

Data Analytics Intern

Financial Software and Systems Pvt. Ltd.
December 2019 - May 2020

  • Worked on a spam detection project for historical Google Play Store reviews of the banking application developed by the company for Vijaya Bank.
  • Pulled 4480 noisy reviews and manually labeled them as spam or not, creating a supervised learning problem.
  • Vectorized the reviews using Bag-of-Words and TF-IDF and trained Naïve Bayes and Decision Tree classifiers for the binary classification problem, achieving a testing accuracy of 61%.
  • Performed sentiment analysis on the non-spam reviews using the VADER algorithm and found that around 50% of the non-spam reviews were positive.

Project Trainee

Tata Consultancy Services
May 2019 - July 2019

  • Helped create a system to automatically redirect incoming email-based tickets created for the Tata Capital Project to the appropriate departments and projects.
  • Around 5000 historically labeled tickets were vectorized using Bag-of-Words and TF-IDF to create the training data, and a Naïve Bayes Classifier was trained to perform multiclass classification.
  • Cosine similarity was applied to each newly logged ticket, achieving a testing accuracy of 68%.
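
As a rough illustration of the cosine-similarity step above, the sketch below routes a new ticket to the department of its most similar labeled ticket using plain bag-of-words counts. It is a simplified stand-in, not the project's code: the function names, the whitespace tokenizer, and the sample tickets are all hypothetical, and the TF-IDF weighting and Naïve Bayes classifier are omitted.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase, whitespace-tokenized word counts (a deliberately naive tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty ticket is treated as similar to nothing
    return dot / (norm_a * norm_b)


def route_ticket(new_ticket, labeled_tickets):
    """Return the department of the labeled ticket most similar to new_ticket."""
    new_vec = bag_of_words(new_ticket)
    _, department = max(
        labeled_tickets,
        key=lambda pair: cosine_similarity(new_vec, bag_of_words(pair[0])),
    )
    return department


# Hypothetical labeled tickets, for illustration only.
history = [
    ("password reset for loan portal", "IT"),
    ("invoice payment overdue", "Finance"),
]
print(route_ticket("cannot reset my password", history))
```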

Publications

One of my favorite things about data science and machine learning is the vastness of it and that there is always something new to explore and find. The intention is that my work can be the start of something new and be helpful to someone, someday. A ripple effect.

A diagnostic questionnaire for childbirth related posttraumatic stress disorder: a validation study

American Journal of Obstetrics and Gynecology

July 2024 (published)
November 2023 (accepted)

PTSD (Post-Traumatic Stress Disorder) from traumatic childbirth is a very real condition resulting from distressing labor experiences. Similar to other traumas, it causes persistent symptoms of anxiety and avoidance behaviors. An estimated 7-8 million women globally are affected each year by childbirth-related PTSD (CB-PTSD). However, this condition is under-studied and therefore under-diagnosed and under-treated, and to date, no validated tools to rapidly and efficiently screen for this disorder have existed.

  • We validate the use of the self-reporting Posttraumatic Stress Disorder Checklist for DSM-5 (PCL-5) to effectively assess the presence of CB-PTSD against the Clinician-Administered PTSD Scale for DSM-5 (CAPS-5).
  • A cutoff value of 28 maximized the sensitivity (0.80) and specificity (0.93), and correctly diagnosed 86% of women.
  • Noted an AUC-ROC value of 0.94 and a correlation value of 0.82 between different reporting metrics, showing excellent diagnostic performance.
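
To make the cutoff idea above concrete, here is a minimal sketch of how sensitivity and specificity can be computed for a questionnaire score at a given cutoff, and how a cutoff can be chosen. It is not the analysis code from the paper: the toy scores are invented, the "score >= cutoff means positive" rule is an assumption, and Youden's J (sensitivity + specificity - 1) is just one common selection criterion.

```python
def sensitivity_specificity(scores, has_condition, cutoff):
    """Sensitivity and specificity when score >= cutoff is read as a positive screen."""
    tp = fn = tn = fp = 0
    for score, positive in zip(scores, has_condition):
        predicted_positive = score >= cutoff
        if positive and predicted_positive:
            tp += 1
        elif positive:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, has_condition):
    """Pick the observed score that maximizes Youden's J."""
    def youden_j(cutoff):
        sens, spec = sensitivity_specificity(scores, has_condition, cutoff)
        return sens + spec - 1
    return max(sorted(set(scores)), key=youden_j)


# Toy data (hypothetical scores and clinician diagnoses, not study data).
scores = [10, 20, 30, 40, 25, 35]
diagnosed = [False, False, True, True, False, True]
print(sensitivity_specificity(scores, diagnosed, cutoff=28))
print(best_cutoff(scores, diagnosed))
```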

Gastric cancer is a major worldwide health concern and the fifth most commonly occurring cancer, underscoring the importance of early detection to improve patient outcomes. Traditional histological analysis, while considered the gold standard, is labor-intensive and manual. Deep learning (DL) is a promising approach, but existing models fail to extract all of the visual information required for successful classification. This work overcomes these constraints by using ensemble models that combine different deep-learning architectures to improve classification performance for gastric cancer diagnosis.

  • Dataset: Gastric Histopathology Sub-Size Images Database (GasHisSDB)
  • Ensemble model (VGG-16 and ResNet-34) obtained an average accuracy of more than 99% at various resolutions (160 x 160, 120 x 120, 80 x 80).
  • ResNet50, VGGNet, and ResNet34 performed better than EfficientNet and VitNet, with the ensemble model consistently delivering higher accuracy.

Education

Northeastern University

Master of Science in Data Science
December 2023

Khoury College of Computer Sciences
Related Coursework: Supervised Machine Learning, Unsupervised Machine Learning, Deep Learning, Natural Language Processing, Introduction to Data Management and Processing, and Algorithms

Vellore Institute of Technology

B.Tech in Computer Science
July 2020

Related Coursework: Database Management Systems, Data Mining, Web Mining, Statistics, Image Processing
Activities and societies: Google Developers Group (GDG) VIT Vellore Chapter; volunteering at Make A Difference (MAD) Vellore Chapter

Certifications

Learning is a journey and there is always something new to discover. What better way to upskill than structured courses? Here are some of the key certifications I have received along the way.

A Few Of My Latest Projects