Wei-Wei Du | Machine Learning Scientist

Hi, I'm Wei-Wei Du.


Passionate machine learning researcher with a business mindset, driven to solve real-world challenges through data and innovation.

About

I am a Machine Learning Researcher at Sony, focusing on user modeling. Before joining Sony, I was a Machine Learning Scientist Intern at Appier, where I developed real-time bidding models. Before graduating, I conducted research across several domains, including property valuation (self-supervised learning for few-shot scenarios and graph neural networks), natural language processing (depression detection and multi-modal fact-checking), and sports AI, with multiple publications on these topics. In 2024, I published a comprehensive survey on self-supervised learning for non-sequential tabular data (ACML-24) to contribute more broadly to the academic community. With over four years of experience in machine learning, I am passionate about applying data-driven solutions to real-world problems and continuously learning. In my free time, I enjoy photography, cycling, and practicing yoga.

Programming: Python, Linux, Shell, R, SQL, PySpark, Java, C++
Languages: English, Mandarin
Tools & Technologies: Git, Docker, AWS, GCP

Experience

Sony

Machine Learning Researcher

Develop personalization techniques, including recommender systems and causal inference, to improve user engagement.
Research irregular time intervals in time-series data and the adaptation of LLMs to recommendation tasks.

Oct 2023 - Present | Tokyo, Japan

Appier

Machine Learning Scientist Intern

Implemented an ensemble real-time bidding (RTB) model using new data-driven features derived from 10M+ e-commerce clickstream records, achieving 2x performance in production.
Conducted tree analysis and SHAP-based feature-importance analysis to interpret model behavior.
Collaborated with three data scientists to build the RTB model for re-engagement campaigns.

Jun 2022 - Nov 2022 | Taipei, Taiwan

Research Papers

                
SSL for NS-TD
A Survey on Self-Supervised Learning for Non-Sequential Tabular Data (ACML-24)
Accomplishments
The first comprehensive survey of recent advances in SSL4NS-TD, covering problem definitions, a taxonomy, application issues, NS-TD datasets, and evaluation protocols.
Evaluated representative SSL4NS-TD approaches from each learning category on TabZilla, the most recent large-scale benchmark.
Highlighted the challenges addressed by existing SSL methods for NS-TD and outlined future directions.
                  
AI for Sport
Benchmarking Stroke Forecasting with Stroke-Level Badminton Dataset (IJCAI-24 Demo)
Accomplishments
Introduced ShuttleSet22, a stroke-level badminton singles dataset collected from real-world, high-ranking matches in 2022.
Initiated a challenge task within the CoachAI Badminton Challenge 2023 (https://sites.google.com/view/coachai-challenge-2023/), held in conjunction with IJCAI 2023.
                  
Depression Detection
Ensemble Models with VADER and Contrastive Learning for Detecting Signs of Depression from Social Media (ACL-22 Workshop)
Accomplishments
Developed an ensemble model combining VADER and contrastive learning to detect signs of depression in social media posts.
Won second place among 30+ teams without using any auxiliary information.
                  
Multi-Modal Fact Checking
Parameter-Efficient Large Foundation Models with Feature Representations for Multi-Modal Fact Verification (AAAI-23 Workshop)
Accomplishments
Introduced a parameter-efficient large foundation model using adapters and additional feature representations.
Incorporated co-attention modules across modalities (image and text) and input types (claim and document).
Surpassed the official baseline by 25.9%.
                  
SSL for Few-Shot Learning
DoRA: Domain-Based Self-Supervised Learning Framework for Low-Resource Real Estate Appraisal (CIKM-23)
Accomplishments
The first work focusing on low-resource real estate appraisal, addressing the needs of real-world scenarios.
Introduced novel and effective intra- and inter-sample SSL objectives to learn robust geographical knowledge from unlabeled records.
Presented a system built on DoRA and its real-world industrial applications for cities and towns with extremely limited transactions.

Graph-Based Learning
Look Around! A Neighbor Relation Graph Learning Framework for Real Estate Appraisal (PAKDD-24)
Accomplishments
Modeled the relationships between the target transaction and its neighbors with an attention mechanism.
Used the neighbors' price information to predict a preliminary value.
Introduced a dynamic predictor to model the prices of target transactions with different characteristics.