a.k.a. Oliver
I'm an AI Researcher and individual contributor at NUS working with Prof. Michael Shieh. I research next-generation modeling paradigms and build scalable foundation model systems.
I'm enjoying these topics at the moment:
Google Scholar  /  X (Twitter)  /  Blog Posts  /  GitHub  /  LinkedIn  /  Zhihu  /  Email
I'm on the job market! Feel free to drop me an email or reach out on X.
Featured Research

* means equal contribution.

Large Language Models Pretraining, Scaling, and Architectures
2025.10 >> Quokka
- Training Optimal Large Diffusion Language Models
- [paper][tweet][github][training backend][resources]
- Jinjie Ni, Qian Liu, Chao Du, Longxu Dou, Hang Yan, Zili Wang, Tianyu Pang, Michael Qizhe Shieh
- The first large-scale scaling laws for diffusion language models, covering both compute-optimal and data-constrained settings and exploring extensive modeling and optimization designs: up to 11B model parameters, 260B tokens, and 24,000+ runs. Quokka is a good friend of Chinchilla while being more comprehensive.
>> OpenMoE 2
- OpenMoE 2: Sparse Diffusion Language Models
- [blog][tweet][github][training backend][resources]
- Jinjie Ni and team.
- The first sparse diffusion large language model trained from scratch, focusing on architectural insights.
>> Diffusion Language Models are Super Data Learners
- Diffusion Language Models are Super Data Learners
- [paper][blog][tweet1][tweet2][github][training backend][resources]
- Jinjie Ni, Qian Liu, Longxu Dou, Chao Du, Zili Wang, Hang Yan, Tianyu Pang, Michael Qizhe Shieh
- The first work to empirically show that diffusion language models have much higher data potential than autoregressive ones at scale (up to 8B parameters, 1.5T tokens, 480 epochs). Clear crossovers appear across model sizes, data budgets, data qualities, model sparsities, and more.
>> MegaDLMs
- MegaDLMs: Training Diffusion Language Models at Any Scale
- [github]
- Jinjie Ni.
- A GPU-optimized framework for training diffusion language models at any scale, and the training backend of Quokka, Super Data Learners, and OpenMoE 2. MegaDLMs delivers up to 47% Model FLOPs Utilization (MFU) and 3× faster training than other frameworks.
2024 >> OpenMoE
- OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
- [paper][tweet][github][resources]
- ICML 2024 (Poster)
- Fuzhao Xue, Zian Zheng, Yao Fu, Jinjie Ni, Zangwei Zheng, Wangchunshu Zhou, Yang You
- The first fully open decoder-only MoE LLM trained from scratch.
2022 >> GHA
- Finding the Pillars of Strength for Multi-head Attention
- [paper][github]
- ACL 2023 main track (Poster)
- Jinjie Ni, Rui Mao, Zonglin Yang, Han Lei, Erik Cambria
- Cuts redundancy in attention layers, with SOTA efficiency and performance among efficient transformers. Concurrent work with GQA, cited and discussed in the GQA paper.
Reinforcement Learning for LLM Reasoning
2025 >> NoisyRollout
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
- [paper][tweet1][tweet2][github][resources]
- NeurIPS 2025 (Poster)
- Xiangyan Liu*, Jinjie Ni*, Zijian Wu*, Chao Du, Longxu Dou, Haonan Wang, Tianyu Pang, Michael Qizhe Shieh
- NoisyRollout is a simple, zero-cost method that improves vision-language reinforcement learning and achieves state-of-the-art reasoning capabilities.
>> SynthRL
- SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis
- [paper][tweet1][tweet2][github][resources]
- Zijian Wu*, Jinjie Ni*, Xiangyan Liu*, Zichen Liu, Hang Yan, Michael Qizhe Shieh
- SynthRL is a scalable method with correctness guarantees that automatically synthesizes verifiably correct and more challenging RL training samples for visual reasoning models, validated by consistent and significant performance gains across reasoning benchmarks.
Large Language Models Evaluation
2024 >> MixEval
- MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures
- [paper][tweet][github][resources]
- NeurIPS 2024 main track (Poster)
- Jinjie Ni, Fuzhao Xue, Xiang Yue, Yuntian Deng, Mahir Shah, Kabir Jain, Graham Neubig, Yang You
- A method for building gold-standard LLM evaluations from off-the-shelf benchmark mixtures by mapping the real-world task distribution to offline benchmarks. The best LLM evaluation at the time of release for its SOTA model-ranking accuracy (0.96 correlation with Chatbot Arena) and efficiency (6% of the time and cost of running MMLU), and it is easily refreshed to stay dynamic.
>> MixEval-X
- MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures
- [paper][tweet][github][resources]
- ICLR 2025 (Spotlight)
- Jinjie Ni, Yifan Song, Deepanway Ghosal, Bo Li, David Junhao Zhang, Xiang Yue, Fuzhao Xue, Zian Zheng, Kaichen Zhang, Mahir Shah, Kabir Jain, Yang You, Michael Qizhe Shieh
- MixEval-X is the first any-to-any, real-world benchmark featuring diverse input-output modalities, real-world task distributions, consistently high standards across modalities, and dynamism. It achieves up to 0.98 correlation with arena-like multi-modal evaluations while being far more efficient.
Experiences

Academia
National University of Singapore (2023 - now)
- Research Fellow
- Foundation Models.

Nanyang Technological University (2020 - 2023)
- Ph.D. in Computer Science
- Efficient Language Models and Dialogue Systems.

Northwestern Polytechnical University (2016 - 2020)
- B.Eng. in Electrical Engineering
- Multimodal Models.
Industry
SEA AI Lab, Singapore (2024.10 - now)
- Research Associate
- Working on LLM pretraining and architectures, (multi-modal) reinforcement learning for reasoning, and diffusion language models.

DAMO Academy, Alibaba Group, Singapore (2022.04 - 2022.10)
- Research Intern
- Worked on modality alignment for pre-trained models.
Activities

Teaching
2021
- NTU-SC1003: Introduction to Computational Thinking and Programming (Teaching Assistant)
- NTU-CE2100: Probability and Statistics for Computing (Lecturer)
2020
- NTU-CE1113: Physics for Computing (Teaching Assistant)
- NTU-CZ2007: Introduction to Databases (Teaching Assistant)
- NTU-CZ2004: Human-Computer Interaction (Teaching Assistant)
Services
Conference Reviewer: NeurIPS 2025, ICML 2025, ICLR 2025, NeurIPS 2024, ACL 2024, EMNLP 2024, ACL 2023, EMNLP 2023, AAAI 2023, ICASSP 2023
Journal Reviewer: Knowledge-Based Systems, Information Fusion, Artificial Intelligence Review, Cognitive Computation
Co-organizer: MLNLP community