About me
Hi, I'm Jacqueline, and I love numbers, data, and facts. I help companies turn data into real business advantages, from predictive analytics and machine learning models to dashboards that support decisions at C-level. With over 7 years of experience in analytics, data science, and business intelligence, I have developed data-driven solutions at companies including LG Electronics and SharkNinja that reduced reporting times by up to 60% and measurably improved marketing and sales strategies. What sets me apart? ✨ End-to-end expertise: from SQL pipelines through ML models in Python and R to Power BI, Tableau & Looker Studio. ✨ Cloud native: GCP & AWS. ✨ Communication: explaining complex data simply, whether for developer teams or executive boards. My superpower: I build the bridge between business and data, which means not just building models but rolling them out so that your teams can keep using them in the long run. If you are looking for someone who combines machine learning, data engineering, and visualization to really get your data projects off the ground, get in touch and we will explore together where the data journey can go.
Skills
Expert
Advanced
Basic knowledge
Portfolio
Multi-Strategy Customer Segmentation Summary
Projects
A Computer Vision Application: Convolutional Neural Networks for Detection of Cancer Tissue in Histopathologic Images
2022 — 2023
To this day, cancer has no single cure, because it is not one disease but many diseases that take different forms. A cancer diagnosis changes the patient's world, and a diagnosis that comes too late can be fatal. To diagnose cancer, a biopsy is performed, in which atypical tissue cells are extracted and then examined under a microscope; this examination is called histopathology. In this paper, histopathology images are examined and classified as either positive or negative for cancer. The images are 96x96 pixels, and if an image comes from a patient with cancer, the cancer tissue is located in the central 32x32-pixel region of the image. Tumor tissue around the edges does not influence the classification. The goal is to classify each image correctly as positive or negative for cancer. The first approach is to train models with pre-trained bases to see where accuracy and loss settle during training. After training several such models, a custom convolutional neural network is built and trained. All trained models are explored in the paper. A good model has not only high accuracy but also a low number of false negatives. False positives are less critical: if a patient without cancer is flagged as having cancer, further medical examinations will reveal the correct diagnosis. However, if a patient has cancer but the tissue is falsely identified as non-cancerous, the doctor might stop further examinations, and treatment time is lost. With cancer, the patient is racing against time. Because the dataset does not provide labels for the test images, the prediction results were uploaded to the Kaggle challenge for evaluation to see how well the model performs.
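The emphasis on false negatives can be made concrete by reporting recall (the share of actual cancer cases the model catches) next to accuracy. The following is a minimal sketch in plain Python; the labels are made-up toy values, not results from the project, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 = cancer and 0 = no cancer."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Share of all images that were classified correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def recall(y_true, y_pred):
    """Share of actual cancer images that the model detected."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(f"false negatives: {fn}, accuracy: {accuracy(y_true, y_pred):.2f}, "
      f"recall: {recall(y_true, y_pred):.2f}")
```

Two models with the same accuracy can differ in recall, which is why the false-negative count is tracked separately.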
Drug Violations in Boston - A Forecast Based on Drug Violations in previous years
2021 — 2022
Criminal offenses are a problem for societies around the world. Offenses involving drugs are particularly important because drugs change a person’s physical and mental state. The Boston Police Department introduced a new criminal incident report system in 2019. Data dating back to mid-2015 have been uploaded into the system. All reported criminal offenses can be downloaded from the Analyze Boston website. The data is kept up to date, allowing the public to track any incident that has been reported. The data set does not include personal information or details about the case; it records the offense that occurred, when it occurred, and where. While the data set gives insight into all violations that have occurred, it does not give insight into the reasons behind them. Every country makes it a priority to reduce the number of criminal acts so that an area is safe for everyone to live in. Being able to predict what happens in the near future is not only helpful for the police but also reassures people that they are safe. The purpose of this analysis is to see whether a model can be created to predict the number of drug-related violations with limited information about the offense and about the person who committed it. Because the data includes date and time, various time series models can be created. Before any analysis or modeling can start, the data has to be prepared, which is a crucial step. Because the data set includes all types of criminal incidents, the first step is to subset the data to include only drug-related violations. Then the date column is reworked by stripping the time from the date. Afterwards, the number of violations on a given day can be counted. Without this aggregation it would be impossible to predict the number of violations, as there are many different offense codes.
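The preparation steps described above (subset to drug violations, strip the time from the date, count per day) can be sketched in Python as follows. This is a minimal sketch, not the original analysis code: the field names `offense_group` and `occurred_on`, the label "Drug Violation", and the timestamp format are assumptions for illustration, and the sample rows are made up.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows; field names and values are assumptions.
incidents = [
    {"offense_group": "Drug Violation", "occurred_on": "2019-03-01 14:05:00"},
    {"offense_group": "Larceny", "occurred_on": "2019-03-01 15:00:00"},
    {"offense_group": "Drug Violation", "occurred_on": "2019-03-01 23:10:00"},
    {"offense_group": "Drug Violation", "occurred_on": "2019-03-02 08:30:00"},
]


def daily_counts(rows, group="Drug Violation"):
    """Count incidents of one offense group per calendar day."""
    counts = Counter()
    for row in rows:
        # Step 1: keep only the offense group of interest.
        if row["offense_group"] != group:
            continue
        # Step 2: drop the time of day and keep only the date.
        day = datetime.strptime(row["occurred_on"], "%Y-%m-%d %H:%M:%S").date()
        # Step 3: aggregate to one number per day.
        counts[day] += 1
    # Sort by date so the result can be used as a time series.
    return dict(sorted(counts.items()))


for day, n in daily_counts(incidents).items():
    print(day.isoformat(), n)
```

The resulting one-value-per-day series is the kind of input the time series models expect.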
The first type of model was created using Facebook’s prophet function. These models show an extreme decrease in the number of violations in the future. They even predict a negative number of violations, which is unrealistic. The second type of model is a dynamic model based on linear regression. These models show that the variable date explains about 24% of the variation in the dependent variable, the number of violations. The dynamic models also show a decrease, though not as extreme as with the prophet function. The third and final type of model was created using the time series and arima functions in R. Its forecast is the most realistic prediction of the number of future violations. As all of the models show a decrease in the number of drug-related violations, it can be concluded that the number of violations will decrease in the coming years. However, two of the seven and a half years included in this analysis (26% of the dataset) were extraordinary because of worldwide lockdowns during a pandemic, so it is hard to conclude whether the forecast is correct. The main limitation encountered is that a forecast model can be created with the given information, but its predictive power is limited. More information, such as age, race, sex, educational background, financial background, first-time or repeat violation, home district, and prior incarceration, could help in creating a better model.
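The share of variation attributed to the date variable corresponds to the coefficient of determination (R²) of a linear regression. The original analysis was done in R; the following Python sketch only illustrates how such a value is computed for a simple regression of daily counts on a day index. The data are made-up toy values and the function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y):
    """Share of the variation in y explained by the fitted line."""
    slope, intercept = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical toy series: day index and number of violations per day.
days = [0, 1, 2, 3, 4, 5]
violations = [12, 9, 11, 8, 10, 6]

slope, intercept = fit_line(days, violations)
print(f"slope: {slope:.3f}, R^2: {r_squared(days, violations):.3f}")
```

A negative slope matches a downward trend. A low R² means the date alone leaves most of the day-to-day variation unexplained.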
A tutorial on building a CNN to predict the emotions of people in images
2021 — 2022
Background
People around the world express emotions through facial expressions. To understand someone’s emotion, one does not have to speak the other person’s language. The feelings can be read off the person’s face. Paul Ekman analyzed exactly this question: whether facial expressions of emotion are universal.
Aim
The aim is to see whether a convolutional neural network can be built that assigns each picture to the emotion actually shown.
Dataset
The dataset used can be downloaded from Kaggle.
The dataset consists of 48x48 pixel pictures of people. The pictures can be categorized into 7 groups: 0 = Angry, 1 = Disgust, 2 = Fear, 3 = Happy, 4 = Sad, 5 = Surprise, 6 = Neutral.
The data set is divided into train, validation, and two separate test sets: private and public. The full training set provided in the Kaggle challenge included 28709 images. These images were then split into training and validation sets, with 65% used for training and 35% for validation.
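A 65/35 split of the 28709 training images can be sketched as below. This is a minimal illustration, not the notebook's actual code: the shuffling, the seed, and the rounding of the validation size are assumptions, and `split_indices` is a hypothetical helper.

```python
import random


def split_indices(n_items, val_fraction=0.35, seed=0):
    """Shuffle item indices and split them into (train, validation) lists."""
    indices = list(range(n_items))
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(indices)
    n_val = round(n_items * val_fraction)
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = split_indices(28709)
print(len(train_idx), len(val_idx))
```

With this rounding, the split yields 18661 training and 10048 validation images. The indices can then be used to select the matching rows and labels.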
Technology
For this project, the notebook was created and run online on Paperspace Gradient. The platform lets the user choose a machine with the required GPU and CPU settings. To run the models at a moderate speed, a GPU with 16 GiB of memory is needed, along with 8 CPUs and 30 GiB of RAM.
To prepare the data, the pandas and NumPy libraries were used. To create the models, various TensorFlow libraries were imported into the notebook.
Benefits
Even though the model's predictions have weaknesses, an accuracy of over 63% was achieved by testing various implementations of the model, varying batch size, epochs, dropout values, and filter sizes.
Drawbacks
After evaluating a couple of images, it is noticeable that some images have wrong labels. As a result, a model trained on them cannot predict reliably. To eliminate this issue, all images would need to be reviewed to check that the labels are correct, and the models would then need to be rerun.
Challenges
The challenge that persisted throughout the notebook was boosting the accuracy on validation and test data. The first model created had an accuracy of about 50%. The model that served as the basis for the final model started at around 56% accuracy, and the final model reached 63% accuracy on the test data. This is an improvement, but not enough for a model to predict facial expressions reliably.
Results
The final model was trained for 200 epochs with a batch size of 128. It achieved an accuracy of 61.3% on the public test set and 63.56% on the private test set. The model has multiple Conv2D layers with max-pooling, dropout, and batch normalization layers.
Work experience
Teaching Assistant / Teaching Fellow · Fixed-term employment
Harvard University – Harvard Extension School · Education and Science
2024 — present
- Holding sessions for Foundations of Data Science and Engineering I & Foundations of Data Science and Engineering II
- Grading of assignments and projects
Senior SS&A Analyst – Strategic Planning Operations · Full-time
SharkNinja · Consumer Goods and Retail
2024 — 2025
- Led analytics for strategic sales planning using SQL and Python to extract and analyze business-critical data.
- Designed and owned Power BI dashboards that reduced manual reporting time by 60% and were used by senior leadership.
- Provided actionable insights that influenced product launch timelines and demand planning strategy.
- Collaborated with Data Architects, Data Engineers, and stakeholders to scope and prioritize enhancements to existing reporting pipelines.
Data Analytics Specialist – Online Brand Store / D2C · Full-time
LG Electronics · Consumer Goods and Retail
2022 — 2024
- Managed reporting operations for eCommerce business, developing KPI dashboards and marketing ROI analysis.
- Built SQL pipelines querying GCP DB to support campaign performance tracking and sales funnel analysis.
- Developed Tableau & Looker Studio dashboards adopted by cross-functional teams across marketing and sales departments.
- Implemented data definitions and ensured data integrity across systems.