Ankur Ojha

About Me

Hey there,
I'm Ankur, a graduate student at USC specializing in Analytics. I am deeply passionate about harnessing data to uncover actionable insights and drive business success.

My academic and professional journey has solidified my foundation in Data Science, Data Analysis, Time Series Analysis, Machine Learning, and Statistical Inference & Modeling.

I am proficient in Python, R, and SQL, and skilled in BI tools like Tableau, Looker, and Plotly. I work with big data technologies such as Hadoop, Spark, and Snowflake, and cloud platforms including AWS, Google Cloud, and Databricks. I have applied these skills as a Data Analyst Intern at Marqeta and in academic projects.

Eager to embark on my career in Data Analytics, Data Science, and Data Engineering, I am ready to apply my comprehensive skill set and experiences to make a significant impact as a dynamic team contributor.

  • Degree: Master of Science, Analytics
  • Primary Email: ankur15ojha@gmail.com

Technical Skills

Programming (Python, R, SQL, Bash) 100%
Database (MySQL, HiveQL, Spark SQL, PostgreSQL) 90%
Data Analysis (Tableau, Looker, Plotly, Excel) 90%
Machine Learning (TensorFlow, Keras, PyTorch, scikit-learn, NumPy, Pandas) 80%
Big Data Technology (Hadoop, Spark, Kafka, Hive, Airflow) 75%
Cloud Platforms (AWS, Google Cloud, Databricks, Snowflake) 70%

Resume

Summary

Ankur Ojha

  • 107 S Mary Ave, Sunnyvale, CA
  • (213) 431-9255
  • ankur15ojha@gmail.com

Education

University of Southern California, Los Angeles, CA

GPA: 3.85/4.0
Aug 2022 - May 2024

Master of Science, Analytics

  • Data Mining
  • Machine Learning
  • Data Management (SQL, BI, ETL)
  • Predictive Analytics
  • Fraud Analytics
  • Text Analytics (NLP, LLM, Gen AI)
  • Data Visualization
  • Business Intelligence

Galgotias College of Engineering & Technology, Greater Noida, India

July 2015 - Jun 2019

Bachelor of Technology: Mechanical Engineering

  • Engineering Mathematics
  • Computer Programming
  • Statistics

Online Courses & Certifications

July 2021 - Current

Coursera, edX, MITx, Udemy

Completed various online courses related to Machine Learning and Data Analytics.

  • Machine Learning
  • Coursera: Deep Learning Specialization
  • Google Data Analytics
  • Python For Data Science

Professional Experience

Marqeta, Oakland, CA

June 2023 - August 2023

Software Engineer Intern

  • Designed & built an end-to-end fully automated data pipeline using Airflow and Spark, collecting and refining data from diverse IT systems via Okta Workflows, reducing data processing time by 80%.
  • Developed a Snowflake data repository to efficiently manage diverse datasets, including 50K Okta SSO entries, 3.4M minutes of app usage, user data and IT system data, reducing report generation SLA from 7 days to real-time.
  • Developed sophisticated SQL queries to optimize data retrieval, enhancing efficiency and reducing processing times.
  • Developed applications & tools such as interactive Looker dashboards on Snowflake data, providing real-time analytics on system usage and user access patterns. This facilitated planned maintenance, reduced response times by 35%, and boosted policy compliance by 25%.

BitVault Technologies Private Limited, Lucknow, India

Jul 2019 - Jun 2020

Software Engineer

  • Implemented clickstream analytics by building real-time data pipelines using Python and Spark to monitor user behavior across client websites. This allowed us to identify and optimize underperforming user flows, increasing engagement by 20% and enabling faster, data-driven decisions for business teams.
  • Optimized database designs by analyzing access patterns and usage analytics, reducing query response times by up to 40% and improving system scalability, ensuring optimal performance during peak traffic.
  • Built centralized data repositories across multiple products, reducing data silos and providing unified analytics, which streamlined cross-product reporting and enhanced operational efficiency.
  • Developed ETL pipelines using Spark and Kafka for efficient data collection and processing, enabling real-time analytics. Also created real-time dashboards and data APIs to improve data accessibility and drive better decision-making across teams.

USC Viterbi School of Engineering, Los Angeles, CA

Jul 2024 - Present

Data Science Research Assistant

  • Designed and built efficient, robust, and scalable data pipelines for machine learning tasks in a cloud environment, utilizing Spark and SageMaker to enhance data processing by 30% for rapid model iterations.
  • Utilized AWS Glue for serverless ETL and Athena for efficient SQL querying on S3, streamlining data analysis and reducing overhead. Tuned Spark SQL for faster data transformations, enhancing overall efficiency by 20%.
  • Leveraged SageMaker Canvas to build no-code, automated ML pipelines, simplifying model development & deployment.
  • Explored GCP tools such as Vertex AI and BigQuery to build end-to-end data engineering pipelines.

Kiana Analytics, Los Angeles, CA

January 2023 - May 2023

Data Analytics Consulting (USC Practicum)

  • Consulted with clients to understand business needs, gathered requirements, and translated them into analytical solutions, KPIs, and clear user stories, defining the who, what, and business value for the development team.
  • Analyzed indoor device location data to offer location-based analytical solutions for predictive maintenance, intrusion detection, and emergency preparedness using locality patterns.
  • Developed features like moving radius, centroid, frequency, and working days from device location data to identify movement patterns. Utilized clustering and classification algorithms to classify devices as fixed or moving, achieving 96% accuracy. Further categorized devices into employees and visitors based on movement patterns and active hours.
  • Utilized Spark for machine learning and data preprocessing, handling 200 million records to generate nuanced variables for in-depth analysis. Implemented real-time intrusion detection using Kafka. Developed predictive ML models to accurately forecast visitor flow and equipment usage patterns.
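
A minimal sketch of how the movement features named above (centroid, moving radius, frequency, working days) could be computed for a single device. The `(x, y, day)` input layout, the function names, and the fixed radius threshold are all assumptions for illustration. The project itself used clustering and classification algorithms, not a hand-set threshold.

```python
from math import hypot
from statistics import mean


def movement_features(points):
    """Summarize one device's observations.

    points: list of (x, y, day) tuples (hypothetical layout).
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    cx, cy = mean(xs), mean(ys)  # centroid of all observed positions
    # Moving radius: farthest observed distance from the centroid.
    radius = max(hypot(x - cx, y - cy) for x, y in zip(xs, ys))
    return {
        "centroid": (cx, cy),
        "moving_radius": radius,
        "frequency": len(points),                  # number of observations
        "working_days": len({p[2] for p in points}),  # distinct active days
    }


def is_fixed(features, radius_threshold=2.0):
    """Stand-in rule (assumed threshold): small radius means a fixed device."""
    return features["moving_radius"] <= radius_threshold


printer = [(10.0, 5.0, "Mon"), (10.0, 5.0, "Tue"), (10.0, 5.0, "Wed")]
phone = [(0.0, 0.0, "Mon"), (30.0, 40.0, "Mon")]
print(is_fixed(movement_features(printer)), is_fixed(movement_features(phone)))
```

In practice these per-device features would feed a clustering or classification model rather than a single cutoff.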

R.K. Enterprises, Gorakhpur, India

Jun 2020 - Jul 2022

Business Analyst

  • Collected and analyzed operational data from cross-functional teams to assess and enhance business performance. Implemented data-driven methods to improve customer service, forecasting accuracy, resource allocation, and scrap settlement efficiency.
  • Built MySQL data repository for manufacturing and operational data and automated SQL-based reporting to visualize various business KPIs, improving decision time by 55% using key insights via Tableau dashboards.
  • Utilized Tableau's Gantt charts for resource management, cutting delivery times by 20%.
  • Developed predictive models for precise resource procurement and delivery estimates, significantly boosting the production efficiency of 10 million bricks annually.

NTPC Limited, Vidyut Nagar, India

Jun 2018 - August 2018

Intern

  • Learned about the sustainability strategies NTPC employs under the TBL framework and worked on the Ash Handling Plant and Fly Ash Utilization processes, which involved using data analysis and smart recommendations for energy management.

Head Volunteer

June 2016 - December 2021

UPKAR SAMITI NGO, Pratapgarh, India

  • Nai Roshni (A Project of Ministry of Minority Affairs, Govt. of India for the Empowerment & Leadership Development of Minority Women).
  • National Tobacco Control Programme (A project of the Ministry of Health, Govt. of Uttar Pradesh).
  • Atal Bhujal Yojana (Ground Water Conservation, A Project funded by World Bank).

Project Work

A list of key projects I developed during my studies. Through these projects I tried to apply what I learned to real problems. They include core machine learning projects, where the focus was on achieving higher accuracy; analysis projects, where the objective was to find key business insights; and data engineering projects, which explore new technologies and frameworks for handling large datasets and achieving higher query performance.

Customer Segmentation using Yelp review Data

The customer segmentation platform is a scalable solution for marketing needs. I built an aggregated data cube that combines customer behavior attributes (such as the categories users like, food preferences, and location preferences) with predictive attributes such as business recommendations. These attributes bring all the data together, which results in faster audience creation. I also created a data pipeline using Airflow, Spark, and Kafka to incorporate real-time updates.


Credit Card Application Fraud Detection

Built a supervised machine learning model for an imbalanced classification problem: detecting fraudulent credit card applications. The model was trained on a synthetic dataset of 1 million records containing personally identifying information, and it identified 53.43% of fraud within the top-scoring 3% of applications.


- Marshall School of Business, Spring 2023

NYC Property Tax Fraud Detection Using Unsupervised ML Models

This project detects tax fraud in over 1 million NYC property records using unsupervised ML techniques. Implemented PCA for dimensionality reduction, Minkowski distance for anomaly detection, and trained an autoencoder to identify non-linear fraud patterns. Developed a comprehensive fraud scoring system to rank properties and utilized z-score heatmaps for in-depth analysis of fraud score drivers, preparing the dataset for expert fraud assessment.
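
A minimal sketch of the distance-based part of such a fraud score, under simplifying assumptions: each record is z-scaled per feature, and its Minkowski distance from the origin is used as the anomaly score. The PCA step and the autoencoder are omitted here, and the function names and toy values are hypothetical.

```python
from statistics import mean, pstdev


def zscale(rows):
    """Z-scale each column so every feature has mean 0 and unit spread."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    # Guard against constant columns (standard deviation of 0).
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def minkowski_score(zrow, p=2):
    """Minkowski distance of a z-scaled record from the origin."""
    return sum(abs(z) ** p for z in zrow) ** (1 / p)


# Toy records with two features each (made-up values); the last row is unusual.
records = [[1, 10], [2, 20], [3, 30], [100, 1000]]
scores = [minkowski_score(z) for z in zscale(records)]
ranking = sorted(range(len(records)), key=lambda i: scores[i], reverse=True)
print(ranking[0])  # index of the most anomalous record
```

With `p=2` the score is the Euclidean distance; other values of `p` change how strongly a single extreme feature dominates the score.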


- Marshall School of Business, Spring 2023

Stock Price Prediction of NVIDIA and AMD

Led a project comparing ARIMA and LSTM models using Python for forecasting stock trends. Applied ADF tests and time series decomposition to enhance model accuracy, optimizing ARIMA through AIC/BIC and refining LSTM with strategic look-back adjustments. Conducted comprehensive performance analysis using RMSE metrics, offering valuable insights for data-driven financial decision-making.
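
The RMSE comparison mentioned above can be illustrated with a small sketch. The price series and both sets of forecasts below are made-up numbers, not results from the project, and the labels `arima_like` and `lstm_like` are placeholders and not outputs of real models.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


# Hypothetical closing prices and two hypothetical forecast series.
actual = [100.0, 102.0, 101.0, 105.0]
arima_like = [99.0, 101.0, 103.0, 104.0]
lstm_like = [100.0, 103.0, 101.0, 107.0]

results = {"arima_like": rmse(actual, arima_like), "lstm_like": rmse(actual, lstm_like)}
best = min(results, key=results.get)  # the lower RMSE wins
print(results, best)
```

Because RMSE squares the errors, a model with a few large misses is penalized more than one with many small ones.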


- Financial Analytics, Viterbi School of Engineering, Fall 2023

Aspect-Based Sentiment Analysis on Yelp Reviews

Employed NLP techniques for detailed aspect-level sentiment analysis on Yelp reviews across various businesses. Utilized EDA, NLTK, LDA, and BERTopic to identify key aspects like Food, Place, Service, Menu, Drinks, and Time, enhancing review classification with a three-point sentiment scale. Over 3,000 reviews were meticulously labeled and analyzed using advanced ML models like RNN, LSTM, and BERT, benchmarked against OpenAI's GPT-3.5.


- Text Analytics, Viterbi School of Engineering, Fall 2023

Credit Card Transaction Fraud Detection Project

Developed a supervised ML model to detect and predict credit card fraud from a dataset of 96,753 transactions in 2010. The project involved cleaning data, creating variables, selecting relevant features, and building various models including Logistic Regression, Random Forest, LightGBM, and Neural Network. By analyzing and optimizing model performance based on Fraud Detection Rate (FDR) at a 3% rejection rate, the final model effectively identified 55.63% of fraud cases, potentially saving an estimated $21 million annually. This project demonstrates significant potential in mitigating financial losses due to credit card fraud.
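
A minimal sketch of the FDR metric, assuming it is defined as the share of all fraud cases that fall within the top-scoring fraction of records. The function name, the rounding of the cutoff, and the toy data are assumptions for illustration.

```python
def fdr_at(scores, labels, top_fraction=0.03):
    """Fraud Detection Rate: fraction of all fraud caught in the top-scored slice.

    scores: model fraud scores (higher means more suspicious).
    labels: 1 for fraud, 0 for legitimate.
    """
    n_top = max(1, round(len(scores) * top_fraction))
    # Rank records from most to least suspicious.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    caught = sum(label for _, label in ranked[:n_top])
    total_fraud = sum(labels)
    return caught / total_fraud if total_fraud else 0.0


# Toy data: 100 transactions scored 0..99, with fraud at four positions.
scores = list(range(100))
labels = [0] * 100
for fraud_index in (10, 50, 98, 99):
    labels[fraud_index] = 1

print(fdr_at(scores, labels))  # share of the four fraud cases in the top 3 records
```

Measuring at a fixed rejection rate reflects a setting where only a small share of transactions can be declined or reviewed.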


- Fraud Analytics, Marshall School of Business, Spring 2023

Real Estate Price Prediction

Developed a deep learning predictive model using TensorFlow to estimate home sale prices between May 2014 and May 2015 in King County. The model utilized features like bedrooms, bathrooms, views, and square footage. Key steps included training the model with a limited feature set, freezing weights to evaluate performance on a new test dataset, and re-training with an increased number of features. This predictive tool is designed to aid real estate agents, buyers, and sellers in making well-informed decisions regarding property pricing and transactions.


Tableau Dashboards

Here are some of my featured dashboards.

Customer Purchase Analysis Dashboard

This dashboard analyzes 12 months of customer sales data in the USA, showcasing sales metrics, demographic trends, and geographical demand through diverse chart types, and assessing the impact of discounts on purchasing behavior.

Bank Customer Segmentation Dashboard

This dashboard segments bank customers using a UK-based dummy dataset. Insights revealed key demographics like age, gender, and job classification across regions, guiding recommendations for targeted sales strategies.

WeWash_USleep_dashboard

Online Courses & Certifications

Tableau Certificate

Python Programming

SQL Programming

SQL Programming Joins

SQL Certification

Python Certificate

Facts

95th percentile / 120K GATE-2021

GRE Quant: 165 / 170

Years Of Experience

Hours Of Volunteering

Percentile Problem Solving Test (CBSE)

Contact

Location:

107 S Mary Ave, Sunnyvale, CA 94086

Call:

+1-213-431-9255
