Hi, I'm Wei-Wei Du.
A passionate machine learning researcher with a business mindset, driven to solve real-world challenges through data and innovation.
About
I am a Machine Learning Researcher at Sony, focusing on user modeling tasks. Before joining Sony, I worked as a Machine Learning Scientist Intern at Appier, developing real-time bidding models. Before graduation, my research spanned several domains, including property valuation (self-supervised learning for few-shot scenarios, graph neural networks), natural language processing (depression detection, multi-modal fact-checking), and Sports AI, with multiple publications on these topics. This year, I published a comprehensive survey on self-supervised learning to make a broader impact on the academic community. With over four years of experience in the field of Machine Learning, I am passionate about making an impact by applying data-driven solutions to real-world applications and continuously seeking new knowledge. In my free time, I enjoy photography, cycling, and practicing yoga.
- Programming: Python, Linux, Shell, R, SQL, PySpark, Java, C++
- Languages: English, Mandarin
- Tools & Technologies: Git, Docker, AWS, GCP
Experience
- Develop personalization techniques, including recommender systems and causal inference, to improve user engagement.
- Research irregular time intervals in time-series data and their application to adapting LLMs for recommendation tasks.
- Implemented an ensemble real-time bidding (RTB) model with new data-driven features built from 10M+ e-commerce clickstream data, achieving 2x performance in production.
- Conducted tree analysis and SHAP-based feature importance analysis to interpret model behavior.
- Collaborated with three data scientists to build the RTB model for re-engagement campaigns.
Research Papers

A Survey on Self-Supervised Learning for Non-Sequential Tabular Data (ACML-24)
- The first comprehensive survey of recent advancements in SSL4NS-TD, covering problem definitions, taxonomy, application issues, NS-TD datasets, and evaluation protocols.
- Evaluates representative SSL4NS-TD approaches from each learning category on TabZilla, the most recent large-scale benchmark.
- Highlights the challenges addressed by existing SSL methods for NS-TD and outlines future directions.

Benchmarking Stroke Forecasting with Stroke-Level Badminton Dataset (IJCAI-24 Demo)

Ensemble Models with VADER and Contrastive Learning for Detecting Signs of Depression from Social Media (ACL-22 Workshop)

Parameter-Efficient Large Foundation Models with Feature Representations for Multi-Modal Fact Verification (AAAI-23 Workshop)

DoRA: Domain-Based Self-Supervised Learning Framework for Low-Resource Real Estate Appraisal (CIKM-23)
- The first work focusing on low-resource real estate appraisal, which meets the needs of real-world scenarios.
- Introduces novel and effective intra- and inter-sample SSL objectives to learn robust geographical knowledge from unlabeled records.
- Illustrates a system built on DoRA and its real-world industrial scenarios for cities and towns with extremely limited transactions.

Look Around! A Neighbor Relation Graph Learning Framework for Real Estate Appraisal (PAKDD-24)
Education
National Yang Ming Chiao Tung University
Advanced Database System Lab, Advisor: Prof. Wen-Chih Peng
Degree: Master of Data Science and Engineering
Research Interests:
- Recommender System
- Natural Language Processing
- Explainable AI
- Self-supervised Learning
Data Lab, Advisor: Prof. Shan-Hung Wu
Degree: Bachelor of Quantitative Finance and Computer Science
Relevant Coursework:
- Natural Language Processing
- Deep Learning
- Machine Learning
- Statistical Learning
- Database System