Hi, I'm Wei-Wei Du.

A passionate Applied Research Scientist with a business mindset, driven to solve real-world challenges through data and innovation.

About

I am an Applied Research Scientist at Sony, working on personalization, recommender systems, and off-policy evaluation to enhance user engagement. Before joining Sony, I worked as a Machine Learning Scientist Intern at Appier, developing real-time bidding models. Before graduation, my research spanned several domains, including property valuation (self-supervised learning for few-shot scenarios, graph neural networks), natural language processing (depression detection, multi-modal fact-checking), and Sports AI, with multiple publications on these topics. This year, I published a comprehensive survey on self-supervised learning to make a broader impact on the academic community.

With over five years of experience in the field of Machine Learning, I am passionate about making an impact by applying data-driven solutions to real-world applications and continuously seeking new knowledge. In my free time, I enjoy photography, cycling, and practicing yoga.

  • Programming: Python, Linux, Shell, R, SQL, PySpark, Java, C++
  • Languages: English, Mandarin
  • Tools & Technologies: Git, Docker, AWS, GCP

Experience

Applied Research Scientist
  • Develop personalization techniques, including recommender systems and off-policy evaluation, to improve user engagement.
  • Research LLM-based recommender models and next-generation fan engagement techniques.
  • Collaborate with global R&D teams to support and contribute to overseas business units.
Oct 2023 - Present | Tokyo, Japan
Machine Learning Scientist Intern
  • Implemented an ensemble real-time bidding (RTB) model with new data-driven features from 10M+ e-commerce clickstream data, achieving 2x performance in production.
  • Conducted tree analysis and SHAP-based feature-importance analysis to interpret model behavior.
  • Collaborated with 3 data scientists to build the RTB model for re-engagement campaigns.
Jun 2022 - Nov 2022 | Taipei, Taiwan

Research Papers

LLM-based Recommender

Not Just What, But When: Integrating Irregular Intervals to LLM for Sequential Recommendation (RecSys-25)

Accomplishments
  • The first work to integrate time intervals into LLMs for sequential recommendation.
  • Propose Interval-Infused Attention to capture the temporal relevance between items and intervals.
  • Introduce a novel perspective on the cold-start problem by considering time intervals and reveal that existing methods suffer from significant performance drops in interval cold-start scenarios.
SSL for NS-TD

A Survey on Self-Supervised Learning for Non-Sequential Tabular Data (ACML-24)

Accomplishments
  • The first comprehensive survey of recent advancements in SSL4NS-TD, covering problem definitions, taxonomy, application issues, NS-TD datasets, and evaluation protocols.
  • Evaluate representative SSL4NS-TD approaches of each learning category on the most recent and large-scale benchmark, TabZilla.
  • Highlight the challenges addressed by existing SSL4NS-TD methods and future directions for the field.
AI for Sport

Benchmarking Stroke Forecasting with Stroke-Level Badminton Dataset (IJCAI-24 Demo)

Accomplishments
  • Introduced ShuttleSet22, a stroke-level badminton singles dataset collected from real-world high-ranking matches in 2022.
  • Initiated a challenge within CoachAI Badminton Challenge 2023 (https://sites.google.com/view/coachai-challenge-2023/) in conjunction with IJCAI 2023.
Depression Detection

Ensemble Models with VADER and Contrastive Learning for Detecting Signs of Depression from Social Media (ACL-22 Workshop)

Accomplishments
  • Developed an ensemble model with VADER and contrastive learning for detecting depression.
  • Won second place among 30+ teams without using any auxiliary information.
Multi-Modal Fact Checking

Parameter-Efficient Large Foundation Models with Feature Representations for Multi-Modal Fact Verification (AAAI-23 Workshop)

Accomplishments
  • Introduced a parameter-efficient large foundation model by utilizing adapters and additional features.
  • Incorporated co-attention modules for different modalities (image and text) and different types (claim and document).
  • Outperformed the official baseline by 25.9%.
SSL for Few-Shot Learning

DoRA: Domain-Based Self-Supervised Learning Framework for Low-Resource Real Estate Appraisal (CIKM-23)

Accomplishments
  • The first work focusing on low-resource real estate appraisal, which meets the needs of real-world scenarios.
  • Introduced novel and effective intra- and inter-sample SSL objectives to learn robust geographical knowledge from unlabeled records.
  • Illustrated a developed DoRA system and real-world industrial scenarios for cities and towns with extremely limited transactions.
Graph-Based Learning

Look Around! A Neighbor Relation Graph Learning Framework for Real Estate Appraisal (PAKDD-24)

Accomplishments
  • Incorporate the relationship between the target transaction and its neighbors with an attention mechanism.
  • Utilize the neighbors’ price information to predict a preliminary value.
  • Introduce a dynamic predictor to model the price of target transactions with different characteristics.

Education

National Yang Ming Chiao Tung University

Advanced Database System Lab, Advisor: Prof. Wen-Chih Peng

Degree: Master of Data Science and Engineering

    Research Interests:

    • Recommender System
    • Natural Language Processing
    • Explainable AI
    • Self-supervised Learning

National Tsing Hua University

Data Lab, Advisor: Prof. Shan-Hung Wu

Degree: Bachelor of Quantitative Finance and Computer Science

    Relevant Coursework:

    • Natural Language Processing
    • Deep Learning
    • Machine Learning
    • Statistical Learning
    • Database System

Contact