Hi, I'm Wei-Wei Du.

A passionate machine learning researcher with a business mindset, driven to solve real-world challenges through data and innovation.

About

I am a Machine Learning Researcher at Sony, focusing on user modeling tasks. Before joining Sony, I worked as a Machine Learning Scientist Intern at Appier, developing real-time bidding models. During my studies, my research spanned several domains, including property valuation (self-supervised learning for few-shot scenarios, graph neural networks), natural language processing (depression detection, multi-modal fact-checking), and sports AI, with multiple publications on these topics. This year, I published a comprehensive survey on self-supervised learning to make a broader impact on the academic community. With over four years of experience in machine learning, I am passionate about applying data-driven solutions to real-world problems and continuously seeking new knowledge. In my free time, I enjoy photography, cycling, and practicing yoga.

  • Programming: Python, Linux, Shell, R, SQL, PySpark, Java, C++
  • Languages: English, Mandarin
  • Tools & Technologies: Git, Docker, AWS, GCP

Experience

Machine Learning Researcher
  • Develop personalization techniques, including recommender systems and causal inference, to improve user engagement.
  • Research irregular time intervals in time-series data, with application to adapting LLMs for recommendation tasks.
Oct 2023 - Present | Tokyo, Japan
Machine Learning Scientist Intern
  • Implemented an ensemble real-time bidding model with new data-driven features from 10M+ e-commerce clickstream records that achieved 2x performance in production.
  • Conducted tree analysis and feature importance with SHAP to analyze model behavior.
  • Cooperated with 3 data scientists to build the RTB model for re-engagement campaigns.
Jun 2022 - Nov 2022 | Taipei, Taiwan

Research Papers

SSL for NSTD

A Survey on Self-Supervised Learning for Non-Sequential Tabular Data (ACML-24)

Accomplishments
  • The first comprehensive survey of recent advancements in SSL4NS-TD, covering problem definitions, taxonomy, application issues, NS-TD datasets, and evaluation protocols.
  • Evaluate representative SSL4NS-TD approaches from each learning category on the most recent large-scale benchmark, TabZilla.
  • Highlight the addressed challenges and future directions in SSL for the existing NS-TD methods.
AI for Sport

Benchmarking Stroke Forecasting with Stroke-Level Badminton Dataset (IJCAI-24 Demo)

Accomplishments
  • Introduce ShuttleSet22, a stroke-level badminton singles dataset collected from real-world high-ranking matches in 2022.
  • Initiated a challenge within the CoachAI Badminton Challenge 2023 (https://sites.google.com/view/coachai-challenge-2023/) in conjunction with IJCAI 2023.
Depression Detection

Ensemble Models with VADER and Contrastive Learning for Detecting Signs of Depression from Social Media (ACL-22 Workshop)

Accomplishments
  • Developed an ensemble model with VADER and contrastive learning for detecting depression.
  • Placed second among 30+ teams without using any auxiliary information.
Multi-Modal Fact Checking

Parameter-Efficient Large Foundation Models with Feature Representations for Multi-Modal Fact Verification (AAAI-23 Workshop)

Accomplishments
  • Introduced a parameter-efficient large foundation model by utilizing adapters and additional features.
  • Incorporated co-attention modules for different modalities (image and text) and different types (claim and document).
  • Surpassed the official baseline by 25.9%.
SSL for Few-Shot Learning

DoRA: Domain-Based Self-Supervised Learning Framework for Low-Resource Real Estate Appraisal (CIKM-23)

Accomplishments
  • The first work focusing on low-resource real estate appraisal, which meets the needs of real-world scenarios.
  • Introduced novel and effective intra- and inter-sample SSL objectives to learn robust geographical knowledge from unlabeled records.
  • Illustrated the developed DoRA system and its real-world industrial scenarios for cities and towns with extremely limited transactions.
Graph-Based Learning

Look Around! A Neighbor Relation Graph Learning Framework for Real Estate Appraisal (PAKDD-24)

Accomplishments
  • Incorporate the relationship between the target transaction and its neighbors with an attention mechanism.
  • Utilize the neighbors’ price information to predict a preliminary value.
  • Introduce a dynamic predictor to model the price of target transactions with different characteristics.

Education

National Yang Ming Chiao Tung University

Advanced Database System Lab, Advisor: Prof. Wen-Chih Peng

Degree: Master of Data Science and Engineering

    Research Interests:

    • Recommender System
    • Natural Language Processing
    • Explainable AI
    • Self-supervised Learning

National Tsing Hua University

Data Lab, Advisor: Prof. Shan-Hung Wu

Degree: Bachelor of Quantitative Finance and Computer Science

    Relevant Coursework:

    • Natural Language Processing
    • Deep Learning
    • Machine Learning
    • Statistical Learning
    • Database System

Contact