Module 1 - Data Wrangling with Python
Data Processing with NumPy and Pandas
This project marks the beginning of my journey into data analysis. It applies fundamental concepts such as data cleaning, correlation analysis, and other introductory analysis techniques.
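A minimal sketch of the kind of cleaning and correlation analysis involved, using pandas; the file name and columns are placeholders, not the project's actual data:

```python
import pandas as pd

# Load a hypothetical dataset (placeholder file name).
df = pd.read_csv("data.csv")

# Basic cleaning: drop duplicate rows and fill missing numeric values
# with each column's median.
df = df.drop_duplicates()
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# Pairwise Pearson correlations between numeric features.
print(df[numeric_cols].corr().round(2))
```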
Module 1 - Data Wrangling with Python
Data analysis and visualization
This marks the second phase of the project, where I delve deeper into data analysis and expand my use of data visualization techniques. I use libraries such as Matplotlib, Seaborn, pandas, and NumPy to explore various ways of visualizing data, analyze distribution patterns, and derive insights from correlations between features.
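As an illustration of these visualization techniques, here is a small sketch with Matplotlib and Seaborn; the dataset and the "feature" column are assumed placeholders:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.read_csv("data.csv")  # placeholder dataset

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Distribution of a single numeric feature, with a KDE overlay.
sns.histplot(data=df, x="feature", kde=True, ax=axes[0])
axes[0].set_title("Distribution of feature")

# Heatmap of pairwise correlations between numeric columns.
sns.heatmap(df.select_dtypes("number").corr(), annot=True, cmap="coolwarm", ax=axes[1])
axes[1].set_title("Feature correlations")

plt.tight_layout()
plt.show()
```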
Module 1 - Data Wrangling with Python
Capstone Project
This wraps up Module 1. The capstone project brings together the concepts covered in the previous projects, using familiar libraries such as Matplotlib, Seaborn, pandas, and NumPy, and incorporating the Geopy library for geocoding and access to other geographical data APIs.
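A brief sketch of geocoding with Geopy's Nominatim backend; the user agent string and address are illustrative, not the capstone's actual inputs:

```python
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

# Nominatim requires an identifying user_agent string.
geolocator = Nominatim(user_agent="capstone-demo")

# RateLimiter spaces out requests to respect the service's usage policy.
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

location = geocode("Vilnius, Lithuania")
if location is not None:
    print(location.latitude, location.longitude)
```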
Module 2 - Data Analysis
Data Analysis with SQL
In Module 2, this project applies data analysis and statistics to explore a dataset. Through iterative questioning and strategic exploration, I identified key insights and patterns, using data-driven approaches to address issues related to mental health in the tech industry.
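A minimal sketch of the SQL-plus-pandas workflow; the database file, table, and column names below are assumptions for illustration, not the actual schema:

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect("mental_health.sqlite")  # placeholder database

# Aggregate responses per question directly in SQL, then pull the
# result into a DataFrame for further analysis.
query = """
SELECT question, COUNT(*) AS responses
FROM answers
GROUP BY question
ORDER BY responses DESC
LIMIT 10;
"""
df = pd.read_sql_query(query, conn)
conn.close()
print(df)
```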
Module 2 - Data Analysis
A/B Testing
For these two A/B testing projects, I applied statistical concepts such as estimation, the central limit theorem, and confidence intervals, while deepening my understanding of hypothesis testing and experiment design. I conducted experiments to evaluate and validate business hypotheses, driving data-informed decision-making and optimizing outcomes.
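For example, a two-proportion z-test with confidence intervals is one common shape for such an experiment; the conversion counts below are made up:

```python
import numpy as np
from statsmodels.stats.proportion import proportion_confint, proportions_ztest

# Hypothetical conversions and sample sizes for control (A) and treatment (B).
conversions = np.array([412, 468])
visitors = np.array([5000, 5000])

# Two-sided z-test for a difference in conversion rates.
stat, p_value = proportions_ztest(conversions, visitors)

# 95% confidence intervals for each group's conversion rate.
(low_a, low_b), (high_a, high_b) = proportion_confint(conversions, visitors, alpha=0.05)

print(f"z = {stat:.3f}, p = {p_value:.4f}")
print(f"A: [{low_a:.4f}, {high_a:.4f}]  B: [{low_b:.4f}, {high_b:.4f}]")
```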
Module 2 - Data Analysis
Regression
In this regression analysis project, I developed explanatory models to investigate the factors that influence the target variable. By fitting statistical models to the dataset, I identified the variables with the most significant impact on the target, providing insight into the underlying relationships between predictors and the target.
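A sketch of an explanatory OLS model with statsmodels; the target and predictor names are placeholders:

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("data.csv")  # placeholder dataset

# Add an intercept and fit an ordinary least squares model.
X = sm.add_constant(df[["area", "rooms"]])
y = df["price"]
model = sm.OLS(y, X).fit()

# The summary reports coefficients, p-values, and R-squared, which is
# what identifies the predictors that most influence the target.
print(model.summary())
```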
Module 3 - Machine Learning
Supervised Machine Learning Fundamentals
In this project, I used fundamental supervised machine learning techniques to identify key patterns and build predictive models. Alongside exploratory data analysis (EDA), statistical inference, confidence intervals, and hypothesis testing, I applied Support Vector Machines (SVMs) with hyperparameter tuning to find the best model for the dataset.
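A compact sketch of SVM training with grid-searched hyperparameters, here on a bundled scikit-learn dataset rather than the project's own:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# SVMs are scale-sensitive, so scaling lives inside the pipeline.
pipe = make_pipeline(StandardScaler(), SVC())

param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__kernel": ["linear", "rbf"],
    "svc__gamma": ["scale", 0.01],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print(search.best_params_)
print(f"Test accuracy: {search.best_estimator_.score(X_test, y_test):.3f}")
```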
Module 3 - Machine Learning
Gradient Boosted Trees, XGBoost, CatBoost, and LightGBM
This project explores Gradient Boosted Trees and ensemble learning techniques using XGBoost, CatBoost, and LightGBM to analyze a dataset. It focuses on feature engineering, hyperparameter tuning, and model evaluation, leveraging ensemble methods to improve predictive accuracy and generalization.
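As a sketch of that tuning workflow, here is a randomized search over a few core LightGBM hyperparameters (the same pattern applies to XGBoost and CatBoost); the dataset is a stand-in:

```python
from lightgbm import LGBMClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Sample a handful of configurations instead of an exhaustive grid.
param_distributions = {
    "n_estimators": [100, 300, 500],
    "learning_rate": [0.01, 0.05, 0.1],
    "num_leaves": [15, 31, 63],
}

search = RandomizedSearchCV(
    LGBMClassifier(random_state=42),
    param_distributions,
    n_iter=10,
    cv=5,
    random_state=42,
)
search.fit(X_train, y_train)

print(search.best_params_)
print(f"Test accuracy: {search.best_estimator_.score(X_test, y_test):.3f}")
```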
Module 3 - Machine Learning
Unsupervised Learning and Hyperparameter Tuning
This project provides hands-on practice with a variety of machine learning models, focusing on ensemble learning with XGBoost, CatBoost, and LightGBM. It covers building and optimizing model ensembles, applying hyperparameter tuning, and leveraging AutoML tools for improved performance, along with data visualization in Matplotlib and Seaborn and data manipulation tasks such as reading, querying, and filtering datasets.
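On the unsupervised side, a minimal sketch of treating the number of clusters as a hyperparameter and selecting it by silhouette score; the dataset is a stand-in for the project's own:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Score a range of cluster counts and keep the best one.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

best_k = max(scores, key=scores.get)
print(scores)
print(f"Best k by silhouette: {best_k}")
```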
Module 3 - Machine Learning
Capstone Project
This capstone project integrates key machine learning concepts, providing hands-on experience with pandas, NumPy, and visualization libraries such as Matplotlib and Seaborn. It focuses on translating business requirements into ML-driven solutions through statistical analysis, data wrangling, and exploratory data analysis (EDA). The project covers supervised and unsupervised learning algorithms from Scikit-Learn and gradient boosting libraries, along with model selection, evaluation, and hyperparameter tuning. It also involves statistical inference techniques, proficient use of Jupyter Notebooks, and model deployment on Google Cloud or other platforms accessible via HTTP requests.
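A minimal sketch of serving a trained model over HTTP, here with Flask; the model file and feature names are hypothetical, and the same app could be containerized for Google Cloud:

```python
import joblib
import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical pre-trained scikit-learn pipeline saved with joblib.
model = joblib.load("model.joblib")

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON list of feature records,
    # e.g. [{"area": 50, "rooms": 2}].
    records = request.get_json()
    X = pd.DataFrame(records)
    return jsonify({"predictions": model.predict(X).tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```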