I am a third-year Ph.D. candidate in Computer Engineering at Northeastern University, USA, where I am part of the SMILE Lab, fortunate to be advised by Professor Yun Raymond Fu (Member of the Academy of Europe; Fellow of NAI, AAAS, AAAI, IEEE, AIMBE, OSA, SPIE, IAPR, AAIA). My research focuses on generative models, including Visual Language Models, Video Generation, and Trajectory & Motion Synthesis/Prediction. I am particularly interested in bridging these areas to enable efficient video understanding, AI-driven content creation, and autonomous systems.
Beyond academia, I have industry experience in applied AI research. Most recently, I was an Applied Scientist Intern at Amazon AWS AI Lab, where I worked on video understanding and large language models. Previously, I was a research intern at Tencent (腾讯), focusing on generative models for images and videos.
I am actively seeking a research internship for 2025 and open to collaborations. If you are interested in working together, feel free to reach out!
2024.05: I joined Amazon AWS AI Labs as an Applied Scientist Intern this summer.
2024.02: Our paper "OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising" has been accepted at CVPR 2024.
2023.09: I received the ACM SIGMM travel award.
2023.08: Our paper "Layout Sequence Prediction From Noisy Mobile Modality" has been accepted at ACM MM 2023.
2022.09: I joined SMILE Lab at Northeastern University.
Research (First-Author)
My research lies in Computer Vision and Artificial Intelligence, aiming to explore the potential of generative models for AIGC and trajectory prediction.
I have worked on Diffusion Models, AIGC, VLMs, Video Synthesis/Editing, Image Editing, Multimodal Learning, Trajectory Prediction, NeRF, and GANs.
Token Dynamics: Towards Efficient and Dynamic Video Token Representation for Video Large Language Models
Haichao Zhang, Zhuowei Li, Dimitris Metaxas, Yun Fu
Extreme Token Reduction for Video LLM.
Preprint
[arxiv] [code (Release Soon)]
@article{zhang2025token,
title={Token Dynamics: Towards Efficient and Dynamic Video Token Representation for Video Large Language Models},
author={Zhang, Haichao and Li, Zhuowei and Metaxas, Dimitris and Fu, Yun},
journal={arXiv preprint arXiv:2503.16980},
year={2025}
}
OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising
Haichao Zhang, Yi Xu, Hongsheng Lu, Takayuki Shimizu, Yun Fu
IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024 (CVPR'24)
[arxiv]
[project page] [code]
@inproceedings{zhang2024oostraj,
title={OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising},
author={Zhang, Haichao and Xu, Yi and Lu, Hongsheng and Shimizu, Takayuki and Fu, Yun},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={14802--14811},
year={2024}
}
Layout Sequence Prediction From Noisy Mobile Modality
Haichao Zhang, Yi Xu, Hongsheng Lu, Takayuki Shimizu, Yun Fu
31st ACM International Conference on Multimedia (ACM MM'23)
[arxiv]
[project page] [video]
@inproceedings{zhang2023layout,
title={Layout Sequence Prediction From Noisy Mobile Modality},
author={Zhang, Haichao and Xu, Yi and Lu, Hongsheng and Shimizu, Takayuki and Fu, Yun},
booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
pages={3965--3974},
year={2023}
}
@article{zhang2023camouflaged,
title={Camouflaged Image Synthesis Is All You Need to Boost Camouflaged Detection},
author={Zhang, Haichao and Qin, Can and Yin, Yu and Fu, Yun},
journal={arXiv preprint arXiv:2308.06701},
year={2023}
}
@article{zhang2021sketch,
title={Sketch Me A Video},
author={Zhang, Haichao and Yu, Gang and Chen, Tao and Luo, Guozhong},
journal={arXiv preprint arXiv:2110.04710},
year={2021}
}
@article{zhang2021fine,
title={Fine-grained identity preserving landmark synthesis for face reenactment},
author={Zhang, Haichao and Ben, Youcheng and Zhang, Weixi and Chen, Tao and Yu, Gang and Fu, Bin},
journal={arXiv preprint arXiv:2110.04708},
year={2021}
}
@article{zhang2021restore,
title={Restore DeepFakes video frames via identifying individual motion styles},
author={Zhang, Haichao and Lu, Zhe-Ming and Luo, Hao and Feng, Ya-Pei},
journal={Electronics Letters},
volume={57},
number={4},
pages={183--186},
year={2021},
publisher={Wiley Online Library}
}
Some Very Old & Irrelevant Projects
Several years ago, I delved into the fascinating world of sensor modalities and signal processing, which sparked a keen interest in embedded platforms. That experience led me to explore artificial intelligence and computer vision further.
Proposed a method to detect eye-blink EMG noise mixed into the EEG signal: the strong eye-blink signal controls the wheelchair's direction, while the EEG is analyzed to estimate the user's tension and relaxation levels, which control the wheelchair's speed.
An affordable solution for paralyzed patients to control their wheelchairs and move independently.
Responsible for developing host-computer software that received signals from the MSP430 PCB board and filtered them in the spectral domain, and for developing an algorithm to detect abnormal ECG patterns.
Sign language recognition system of wearable bending sensor gloves
First Prize at Mobile Application Innovation Contest of North China
Jul. 2016
Responsible for programming the embedded microprocessor to sample the analog signals from the bending sensors on the gloves, which are used to recognize sign language, and for displaying the prediction results in the app.
Vision-based paper money and coin sorting machine
Summer 2015
Responsible for programming the embedded microprocessors that control the mechanical structure, and for developing host-machine software that detects the denomination of paper money using traditional image-processing methods and then sorts it.