Fabiano Chapuis

GitHub Portfolio

I started these studies to learn about Data Science and AI while improving my coding skills. Data analysis is a field of personal interest, and I wanted to immerse myself in it and apply what I learn in practical settings. Ultimately, I aim to contribute positively to society by uncovering insights that can benefit us all.


A brief introduction about me

I hold a BA in Business Management and an Oracle DBA certification, have completed a postgraduate program in Applied Business, and am currently self-studying Data Science. Over more than 30 years I have worked in both domestic and international companies, in roles including DBA, R&D Project Manager, Data Analyst, Business Manager, Team Leader, Researcher, Customer Service, Business Intelligence Analyst, Business Analyst, and Banking. I currently work as a Data Engineer. My experience covers collecting business data, analyzing it, and developing improvement solutions based on the findings. I bring strong management skills, critical thinking, and attention to detail. I chose this path to engage more deeply with data, a subject of great interest to me, and to apply these skills in real-world situations where the resulting insights can add value to society.

Module 1 - Data Wrangling with Python

Data Processing with NumPy and Pandas

This project marks the beginning of my journey into data analysis. It applies fundamental concepts such as data cleaning, correlation analysis, and introductory data analysis techniques.
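As an illustration of the kind of cleaning and correlation step described above, here is a minimal sketch using pandas. The toy data, the column names `hours` and `score`, and the helper `clean` are assumptions made up for this example and are not taken from the project itself.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
raw = pd.DataFrame({
    "hours": [1.0, 2.0, 2.0, np.nan, 4.0, 5.0],
    "score": [10.0, 20.0, 20.0, 30.0, 40.0, 50.0],
})


def clean(df):
    """Drop exact duplicate rows, then rows with any missing value."""
    return df.drop_duplicates().dropna().reset_index(drop=True)


cleaned = clean(raw)

# Series.corr computes the Pearson correlation coefficient by default.
corr = cleaned["hours"].corr(cleaned["score"])
print(len(cleaned), round(corr, 3))
```

Dropping rows is only one possible way to handle missing values. Imputation may be a better fit when data is scarce.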

Module 1 - Data Wrangling with Python

Data analysis and visualization

This is the second phase of the project, where I delve deeper into data analysis concepts and expand my use of data visualization techniques. I use matplotlib, seaborn, pandas, and numpy to explore different ways of visualizing data, analyze distribution patterns, and derive insights from correlations between features.

Module 1 - Data Wrangling with Python

Capstone Project

This project wraps up Module 1. The capstone applies the concepts covered in the previous projects, using the same libraries (matplotlib, seaborn, pandas, and numpy) together with the Geopy library for geocoding services and other geographical data APIs.

Module 2 - Data Analysis

Data Analysis with SQL

This Module 2 project applies data analysis and statistics to explore a dataset. Through iterative questioning and strategic exploration, I identified key insights and patterns, and used data-driven approaches to propose ideas for addressing issues related to mental health in the tech industry.

Module 2 - Data Analysis

A/B Testing

In these two A/B testing projects, I applied statistical concepts such as estimation, the central limit theorem, and confidence intervals, while deepening my understanding of hypothesis testing and experiment design. I ran experiments to evaluate business hypotheses and support data-informed decisions.
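To make the confidence-interval idea concrete, the sketch below computes a normal-approximation (Wald) interval for the difference between two conversion rates, using only the Python standard library. The function name `diff_confidence_interval` and the visitor and conversion counts are hypothetical. They are not results from the projects.

```python
from math import sqrt
from statistics import NormalDist


def diff_confidence_interval(conv_a, n_a, conv_b, n_b, level=0.95):
    """Wald interval for the difference in conversion rates (B minus A).

    Relies on the central limit theorem, so it assumes both samples are
    large enough for the normal approximation to be reasonable.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Standard error of the difference between two independent proportions.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    # Two-sided critical value, e.g. about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


# Hypothetical counts: 100/1000 conversions in A, 130/1000 in B.
low, high = diff_confidence_interval(100, 1000, 130, 1000)
print(f"95% CI for the lift: [{low:.4f}, {high:.4f}]")
```

If the interval excludes zero, the observed difference is unlikely to be due to sampling noise alone at the chosen level. An interval that includes zero means the experiment is inconclusive.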

Module 2 - Data Analysis

Regression

In this regression analysis project, I developed explanatory models to investigate the factors that influence the target variable. By fitting statistical models to the dataset, I identified the predictors with the most significant effect on the target and gained insight into the relationships between predictors and target.
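The idea of fitting an explanatory model can be sketched with ordinary least squares on a single predictor. This is a deliberately simplified illustration: the project may well use several predictors and a dedicated statistics library, and the function `fit_line` and the data points here are invented for the example.

```python
from statistics import mean


def fit_line(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R^2: share of the target's variance explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical data that lies exactly on the line y = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
slope, intercept, r_squared = fit_line(x, y)
print(slope, intercept, r_squared)
```

In an explanatory setting, the slope is read as the expected change in the target per unit change in the predictor, and R squared indicates how much of the variation the model accounts for.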

Module 3 - Machine Learning

Supervised Machine Learning Fundamentals

In this project, I used fundamental supervised learning techniques to identify key patterns and build predictive models. After exploratory data analysis (EDA) and statistical inference, including confidence intervals and testing, I applied Support Vector Machines (SVMs) with hyperparameter tuning to find the best model for the dataset.

Module 3 - Machine Learning

Gradient Boosted Trees, XGBoost, CatBoost, and LightGBM

This project explores Gradient Boosted Trees and ensemble learning techniques using XGBoost, CatBoost, and LightGBM to analyze a dataset. It focuses on feature engineering, hyperparameter tuning, and model evaluation, leveraging ensemble methods to improve predictive accuracy and generalization.

Module 3 - Machine Learning

Unsupervised Learning and Hyperparameter Tuning

This project provides hands-on practice with various machine learning models, focusing on ensemble learning techniques using XGBoost, CatBoost, and LightGBM. It covers building and optimizing model ensembles, applying hyperparameter tuning, and leveraging AutoML tools for improved performance. Additionally, it includes data visualization with Matplotlib & Seaborn and data manipulation tasks such as reading, querying, and filtering datasets.

Module 3 - Machine Learning

Capstone Project

This capstone project integrates key machine learning concepts, providing hands-on experience with Pandas, NumPy, and visualization libraries such as Matplotlib and Seaborn. It focuses on translating business requirements into ML-driven solutions through statistical analysis, data wrangling, and exploratory data analysis (EDA). The project covers supervised and unsupervised learning algorithms, using scikit-learn and gradient boosting libraries, along with model selection, evaluation, and hyperparameter tuning. It also involves statistical inference techniques, work in Jupyter Notebooks, and model deployment on Google Cloud or another platform, with the model accessible via HTTP requests.