SKILLS AND TOOLS
Languages: Python, SQL (PostgreSQL, MySQL), C++
Libraries: NumPy, Pandas, Matplotlib, Seaborn, Scikit-learn, TensorFlow, Keras, NLTK
Tools: Tableau, Git, Jupyter Notebook, MS Office, Jira
Cloud: AWS (EC2, S3, RDS, Lambda, Elastic Beanstalk), Heroku
Data Science: Regression, Classification, Clustering, Dimensionality Reduction, Trees, Natural Language Processing, Deep Learning
Data Scientist, MOPO Life, Inc., Fremont, CA 08/20 - Present
• Designed and built a recommendation engine to match experiences to users. Compared three different machine learning models and
fed the results into Elasticsearch to obtain the final recommendations and similarity scores (best score ~93%).
• Generated initial data for the MVP, identified data sources for web scraping, built a data pipeline, and applied Natural Language
Processing (NLP) to analyze the data in Python.
• Designed a reusable and extensible web scraping template, then transformed and loaded (ETL) the data into AWS RDS and S3 buckets.
Data Science Intern, Banyan Data Services, San Jose, CA 06/20 - 08/20
• Created an end-to-end pipeline: extracted data from AWS, cleaned it using regex, NLP, and Pandas, trained models on it, and stored the data.
• Applied data mining techniques (log parsing, feature creation, and anomaly detection) to infrastructure log data using Python,
predicting the occurrence of system errors with 75% accuracy and improving server performance.
Research & Development Analyst - Student, Fidelity Investments, San Francisco, CA 01/20 - 05/20
• Collaborated with a team to perform search engine analytics, social media analytics, website analytics, and audits; identified top
trends, features, and search queries; and developed a website, “grokinvestments.com” (top features: interactive chatbot, quiz).
• Analyzed user behavior and engagement through topic modeling, correlation analysis, and descriptive analysis using data gathered
from Google Analytics, A/B/n tests, and survey results, thereby increasing the SEO score from 82 to 100 and the accessibility score from 79
to 84, and achieving an overall rating of 9/10. The purpose of the project was to make Fidelity products more appealing to millennials.
Quality Manager, Seacorr Nondestructive Testing Services LLC, Dubai, UAE 07/15 - 07/19
• Worked with cross-functional teams and developed SQL queries to capture and interpret raw data from a 10M-row product quality
dataset, and built dashboards and reports in Tableau, contributing to product performance improvements.
• Investigated user behavior and product performance gaps, and implemented regression models in Python on customer and product
data to validate churn, resulting in a 2% yearly improvement in overall sales.
• Created an end-to-end tool to clean, analyze, and interpret student data, and provided recommendations and corrective action plans,
increasing the quality objective targets by 5%.
PROJECTS 2019-2020
Amazon apparel - Image classification and product recommendation Tools/Package: Python, NLTK, TensorFlow, Keras, Scikit-learn
• Created a tool that identifies an apparel image (accuracy ~85%) and recommends the top N products based on the image, product
title, and description. Scraped over 1,000 apparel images from Amazon using Python and Selenium. Performed feature engineering,
feature extraction using VGG16, CNN image classification, and content-based filtering, and determined the best model.
Restaurant recommendation engine (Zindi competition) Tools/Package: Scikit-learn, Clustering, XGBoost, imblearn
• Built a recommendation engine to predict the restaurants that customers are most likely to order from, given the customer location,
restaurant information, and customer order history. Applied resampling strategies to the imbalanced data and built various data
modeling pipelines, achieving a best F1 score of 0.053 using SMOTE and KNN.
Relational Database Management System for Airbnb and Customer Behavior Analysis Tools: MySQL, Excel, Tableau
• Developed a relational database model for Airbnb to manage its property listings and stakeholders, and analyzed patterns in
customer behavior by seasonality trends, demographics, and user groups to estimate the likelihood of customer bookings.
Master of Science in Business Analytics/Data Science (GPA: 3.97/4.0), San Francisco State University, CA, USA 08/19 - 12/20
Bachelor’s in Engineering (Hons.) Mechanical (First class with honors), BITS Pilani, Dubai, UAE 08/11 - 06/15
LEADERSHIP & ACHIEVEMENTS
• Volunteer for Statistics Without Borders – Built an MVP: a curated list of national funding sources available to small
businesses impacted by COVID-19 (Methodology: web scraping, text analytics, topic modeling, document classification)