About Me

Hi, I'm Chia-Hsuan Hsu (@tongyu0924), an undergraduate student majoring in Computer Science and Information Engineering at National Taiwan University of Science and Technology (NTUST). In 2023 and 2024, I served as a Core Team Member at Google Developer Groups (GDG) On Campus NTUST.

My research focuses on computer vision (CV), large language models (LLMs), and multimodal learning. I'm passionate about knowledge sharing and actively engaged in the developer community. I also contribute to open-source projects in AI and machine learning, including microsoft/autogen, huggingface/diffusers, and scikit-learn.

I'm currently conducting research at Far Eastern Memorial Hospital on LLMs and their integration with reinforcement learning (RL) and agent-based reasoning over electronic health records (EHRs).

Previously, I worked on diffusion-based generative models at Academia Sinica and researched camera-based visual localization systems (e.g., VO/SLAM) at HIS-Lab (Human-computer interaction and Imaging Systems), NTUST.

I also work part-time as an AI Engineer at Toppan Security, a Japanese company that specializes in digital identity verification solutions.

In addition, I'm a co-developer of the open-source project spy-search, which has received over 300 GitHub stars and 40 forks.

Feel free to contact me. I'm always open to collaboration and meaningful exchange!

Currently Collaborating

Spy-Search: A Modular Multimodal Agent for Fast Research QA – Co-developing an AI research assistant platform (★300+) that supports natural language queries, multimodal document QA, and tool-augmented reasoning over text and non-text modalities, built with LangChain and LLMs.

Publications

VSTFusion-VO: Monocular Visual Odometry with Video Swin Transformer Multimodal Fusion.
Chia-Hsuan Hsu, Hsin-Chun Lin, Sin-Ye Jhong, Hui-Che Hsu, Ming-Xian Hong, and Yung-Yao Chen
IEEE International Conference on Advanced Robotics and Intelligent Systems (ARIS), 2025. [Paper] [Code]

Projects

AI / Machine Learning

Spy-Search – Modular multimodal assistant for retrieval and QA, using LangChain and LLMs.
VSTFusion-VO – Visual odometry using Video Swin Transformer and depth fusion. Accepted at IEEE ARIS 2025.
GPT Multi-Role Debate with Speech – Multi-agent autonomous debate simulator with speech output.
ORB-based Visual Odometry – VO system using ORB features and RANSAC for pose estimation.
Secure Diffusion Watermarking Survey – A study of traceability techniques in diffusion models.

Software Engineering

NTUST Order System – Real-time food ordering platform built with Go and Firebase.
ChessBoard – Interactive chess game in C++ using ImGui with sound and real-time stats.

Open Source Contributions

Major contributions:

Microsoft AutoGen
- PR #6057
Hugging Face Diffusers

Other notable contributions:

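Some Things from High School

111 NTUST CSIE Interview – NTUST CSIE interview questions and reflections
Graph Theory – Eulerian paths and Hamiltonian paths
AP325 Graph Theory – AP325 graph theory code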

徐家萱