a.k.a. Oliver
I'm an AI Researcher and individual contributor at NUS working with Prof. Michael Shieh. I research next-generation modeling paradigms and build scalable foundation model systems.
I'm enjoying these topics at the moment:
Google Scholar  /  X (Twitter)  /  Blog Posts  /  GitHub  /  LinkedIn  /  Zhihu  /  Email
I'm on the job market! Feel free to drop me an email or reach out on X.
Featured Research

* means equal contribution.

Large Language Models Pretraining, Scaling, and Architectures
2025.10 >> Quokka
- Training Optimal Large Diffusion Language Models
- [paper][tweet][github][training backend][resources]
- Jinjie Ni, Qian Liu, Chao Du, Longxu Dou, Hang Yan, Zili Wang, Tianyu Pang, Michael Qizhe Shieh
- The first large-scale scaling laws for diffusion language models, covering both compute-optimal and data-constrained settings and exploring extensive modeling and optimization designs: up to 11B model parameters, 260B tokens, and 24,000+ runs. Quokka is a good friend of Chinchilla while being more comprehensive.
>> OpenMoE 2
- OpenMoE 2: Sparse Diffusion Language Models
- [blog][tweet][github][training backend][resources]
- Jinjie Ni and team.
- The first sparse diffusion large language model trained from scratch, focusing on architectural insights.
>> Diffusion Language Models are Super Data Learners
- Diffusion Language Models are Super Data Learners
- [paper][blog][tweet1][tweet2][github][training backend][resources]
- Jinjie Ni, Qian Liu, Longxu Dou, Chao Du, Zili Wang, Hang Yan, Tianyu Pang, Michael Qizhe Shieh
- The first work to empirically show that diffusion language models have much higher data potential than autoregressive ones at scale (up to 8B parameters, 1.5T tokens, 480 epochs). Clear crossovers appear across model sizes, data budgets, data qualities, model sparsities, and more.
>> MegaDLMs
- MegaDLMs: Training Diffusion Language Models at Any Scale
- [github]
- Jinjie Ni.
- A GPU-optimized framework for training diffusion language models at any scale, and the training backend of Quokka, Super Data Learners, and OpenMoE 2. MegaDLMs delivers up to 47% Model FLOPs Utilization (MFU) and 3× faster training than other frameworks.
2024 >> OpenMoE
- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
- [paper][tweet][github][resources]
- ICML 2024 (Poster)
- Fuzhao Xue, Zian Zheng, Yao Fu, Jinjie Ni, Zangwei Zheng, Wangchunshu Zhou, Yang You
- The first fully open decoder-only MoE LLM trained from scratch.
2022 >> GHA
- Finding the Pillars of Strength for Multi-head Attention
- [paper][github]
- ACL 2023 main track (Poster)
- Jinjie Ni, Rui Mao, Zonglin Yang, Han Lei, Erik Cambria
- Cuts redundancy in attention layers, with SOTA efficiency and performance among efficient transformers. Concurrent work with GQA, cited and discussed in the GQA paper.
Reinforcement Learning for LLM Reasoning
2025 >> NoisyRollout
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
- [paper][tweet1][tweet2][github][resources]
- NeurIPS 2025 (Poster)
- Xiangyan Liu*, Jinjie Ni*, Zijian Wu*, Chao Du, Longxu Dou, Haonan Wang, Tianyu Pang, Michael Qizhe Shieh
- NoisyRollout is a simple, zero-cost method that improves vision-language reinforcement learning and achieves state-of-the-art reasoning capabilities.
>> SynthRL
- SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis
- [paper][tweet1][tweet2][github][resources]
- Zijian Wu*, Jinjie Ni*, Xiangyan Liu*, Zichen Liu, Hang Yan, Michael Qizhe Shieh
- SynthRL is a scalable method with correctness guarantees that automatically synthesizes verifiably correct and more challenging RL training samples for visual reasoning models, validated by consistent and significant performance gains across reasoning benchmarks.
Large Language Models Evaluation
2024 >> MixEval
- MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures
- [paper][tweet][github][resources]
- NeurIPS 2024 main track (Poster)
- Jinjie Ni, Fuzhao Xue, Xiang Yue, Yuntian Deng, Mahir Shah, Kabir Jain, Graham Neubig, Yang You
- A method for building gold-standard LLM evaluations from off-the-shelf benchmark mixtures by mapping the real-world task distribution to offline benchmarks. The best LLM evaluation at the time of release for its SOTA model-ranking accuracy (0.96 correlation with Chatbot Arena) and efficiency (6% of the time and cost of running MMLU), and it is easily refreshed to stay dynamic.
>> MixEval-X
- MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures
- [paper][tweet][github][resources]
- ICLR 2025 (Spotlight)
- Jinjie Ni, Yifan Song, Deepanway Ghosal, Bo Li, David Junhao Zhang, Xiang Yue, Fuzhao Xue, Zian Zheng, Kaichen Zhang, Mahir Shah, Kabir Jain, Yang You, Michael Qizhe Shieh
- MixEval-X is the first any-to-any, real-world benchmark featuring diverse input-output modalities, real-world task distributions, consistently high standards across modalities, and dynamism. It achieves up to 0.98 correlation with arena-like multi-modal evaluations while being far more efficient.
Experiences

Academia
National University of Singapore (2023 - now)
- Research Fellow
- Foundation Models.

Nanyang Technological University (2020 - 2023)
- Ph.D. in Computer Science
- Efficient Language Models and Dialogue Systems.

Northwestern Polytechnical University (2016 - 2020)
- B.Eng. in Electrical Engineering
- Multimodal Models.
Industry
SEA AI Lab, Singapore (2024.10 - now)
- Research Associate
- Working on LLM pretraining and architectures, (multi-modal) reinforcement learning for reasoning, and diffusion language models.

DAMO Academy, Alibaba Group, Singapore (2022.04 - 2022.10)
- Research Intern
- Worked on modality alignment for pre-trained models.
Activities

Teaching
2021
- NTU-SC1003: Introduction to Computational Thinking and Programming (Teaching Assistant)
- NTU-CE2100: Probability and Statistics for Computing (Lecturer)
2020
- NTU-CE1113: Physics for Computing (Teaching Assistant)
- NTU-CZ2007: Introduction to Databases (Teaching Assistant)
- NTU-CZ2004: Human-Computer Interaction (Teaching Assistant)
Services
Conference Reviewer: NeurIPS 2025, ICML 2025, ICLR 2025, NeurIPS 2024, ACL 2024, EMNLP 2024, ACL 2023, EMNLP 2023, AAAI 2023, ICASSP 2023
Journal Reviewer: Knowledge-Based Systems, Information Fusion, Artificial Intelligence Review, Cognitive Computation
Co-organizer: MLNLP community