I am a third-year Ph.D. candidate in Computer Engineering at Northeastern University, USA, where I am part of the SMILE Lab, fortunate to be advised by Professor Yun Raymond Fu (Member of the Academy of Europe; Fellow of NAI, AAAS, AAAI, IEEE, AIMBE, OSA, SPIE, IAPR, AAIA). My research focuses on generative models, including Visual Language Models, Video Generation, and Trajectory & Motion Synthesis/Prediction. I am particularly interested in bridging these areas to enable efficient video understanding, AI-driven content creation, and autonomous systems.
Beyond academia, I have industry experience in applied AI research. Most recently, I was an Applied Scientist Intern at Amazon AWS AI Lab, where I worked on video understanding and large language models. Previously, I was a research intern at Tencent (腾讯), focusing on generative models for images and videos.
I am actively seeking a research internship for 2025 and open to collaborations. If you are interested in working together, feel free to reach out!
2024.05: I joined Amazon AWS AI Labs as an Applied Scientist Intern this summer.
2024.02: Our paper "OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising" has been accepted at CVPR 2024.
2023.09: I received the ACM SIGMM travel award.
2023.08: Our paper "Layout Sequence Prediction From Noisy Mobile Modality" has been accepted at ACM MM 2023.
2022.09: I joined SMILE Lab at Northeastern University.
Research (First-Author)
My research lies in Computer Vision and Artificial Intelligence, aiming to explore the potential of generative models for AIGC and trajectory prediction.
I have worked on Diffusion Models, AIGC, VLMs, Video Synthesis/Editing, Image Editing, Multimodal Learning, Trajectory Prediction, NeRF, and GANs.
Token Dynamics: Towards Efficient and Dynamic Video Token Representation for Video Large Language Models
Haichao Zhang, Zhuowei Li, Dimitris Metaxas, Yun Fu
Extreme Token Reduction for Video LLM.
Preprint
[arxiv] [code (Release Soon)]
@article{zhang2025token,
title={Token Dynamics: Towards Efficient and Dynamic Video Token Representation for Video Large Language Models},
author={Zhang, Haichao and Li, Zhuowei and Metaxas, Dimitris and Fu, Yun},
journal={arXiv preprint arXiv:2503.16980},
year={2025}
}
OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising
Haichao Zhang, Yi Xu, Hongsheng Lu, Takayuki Shimizu, Yun Fu
IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024 (CVPR'24)
[arxiv]
[project page] [code]
@inproceedings{zhang2024oostraj,
title={OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising},
author={Zhang, Haichao and Xu, Yi and Lu, Hongsheng and Shimizu, Takayuki and Fu, Yun},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={14802--14811},
year={2024}
}
Layout Sequence Prediction From Noisy Mobile Modality
Haichao Zhang, Yi Xu, Hongsheng Lu, Takayuki Shimizu, Yun Fu
31st ACM International Conference on Multimedia (ACM MM'23)
[arxiv]
[project page] [video]
@inproceedings{zhang2023layout,
title={Layout Sequence Prediction From Noisy Mobile Modality},
author={Zhang, Haichao and Xu, Yi and Lu, Hongsheng and Shimizu, Takayuki and Fu, Yun},
booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
pages={3965--3974},
year={2023}
}
@article{zhang2023camouflaged,
title={Camouflaged Image Synthesis Is All You Need to Boost Camouflaged Detection},
author={Zhang, Haichao and Qin, Can and Yin, Yu and Fu, Yun},
journal={arXiv preprint arXiv:2308.06701},
year={2023}
}
@article{zhang2021sketch,
title={Sketch Me A Video},
author={Zhang, Haichao and Yu, Gang and Chen, Tao and Luo, Guozhong},
journal={arXiv preprint arXiv:2110.04710},
year={2021}
}
@article{zhang2021fine,
title={Fine-grained identity preserving landmark synthesis for face reenactment},
author={Zhang, Haichao and Ben, Youcheng and Zhang, Weixi and Chen, Tao and Yu, Gang and Fu, Bin},
journal={arXiv preprint arXiv:2110.04708},
year={2021}
}
@article{zhang2021restore,
title={Restore DeepFakes video frames via identifying individual motion styles},
author={Zhang, Haichao and Lu, Zhe-Ming and Luo, Hao and Feng, Ya-Pei},
journal={Electronics Letters},
volume={57},
number={4},
pages={183--186},
year={2021},
publisher={Wiley Online Library}
}
Some Very Old & Irrelevant Projects
Several years ago, I delved into the fascinating world of sensor modalities and signal processing, which sparked a keen interest in embedded platforms. That experience led me to explore artificial intelligence and computer vision further.
Proposed a method to detect eye-blink EMG noise mixed into the EEG signal: the strong eye-blink signal controls the wheelchair's direction, while the EEG is analyzed to estimate the user's tension and relaxation levels, which control the wheelchair's speed.
An affordable solution for paralyzed patients to control their wheelchairs and move independently.
Responsible for developing host-computer software that received signals from the MSP430 PCB board and filtered them in the spectral domain, and for developing an algorithm to detect abnormal ECG patterns.
Sign language recognition system of wearable bending sensor gloves
First Prize at Mobile Application Innovation Contest of North China
Jul. 2016
Responsible for programming the embedded microprocessor to sample the analog signals from the bending sensors on the gloves, which are used to recognize sign language, and for displaying the prediction results in the app.
Vision-based paper money and coin sorting machine
Summer 2015
Responsible for programming the embedded microprocessors that control the mechanical structure, and for developing host-machine software that detects the denomination of paper money using traditional image-processing methods and then sorts it.