About Me
Hi, I'm Chia-Hsuan Hsu (@tongyu0924), an undergraduate student majoring in Computer Science and Information Engineering at National Taiwan University of Science and Technology (NTUST). In 2023 and 2024, I served as a Core Team Member at Google Developer Groups (GDG) On Campus NTUST.
My research focuses on computer vision (CV), large language models (LLMs), and multimodal learning. I'm passionate about knowledge sharing and actively engaged in the developer community. I also contribute to open-source projects in AI and machine learning, including microsoft/autogen, huggingface/diffusers, and scikit-learn.
I'm currently conducting research at Far Eastern Memorial Hospital on integrating LLMs with reinforcement learning (RL) and agent-based reasoning over electronic health records (EHRs).
Previously, I worked on diffusion-based generative models at Academia Sinica and researched camera-based visual localization systems, such as visual odometry (VO) and SLAM, at HIS-Lab (Human-computer interaction and Imaging Systems), NTUST.
I also work part-time as an AI Engineer at Toppan Security, a Japanese company that specializes in digital identity verification solutions.
In addition, I'm a co-developer of the open-source project spy-search, which has received over 300 GitHub stars and 40 forks.
Feel free to contact me. I'm always open to collaboration and meaningful exchange!
Currently Collaborating
- Spy-Search: A Modular Multimodal Agent for Fast Research QA – Co-developing an AI research assistant platform (★300+) that supports natural language queries, multimodal document QA, and tool-augmented reasoning across text and non-text modalities, built with LangChain and LLMs.
Publications
- VSTFusion-VO: Monocular Visual Odometry with Video Swin Transformer Multimodal Fusion.
Chia-Hsuan Hsu, Hsin-Chun Lin, Sin-Ye Jhong, Hui-Che Hsu, Ming-Xian Hong, and Yung-Yao Chen
IEEE International Conference on Advanced Robotics and Intelligent Systems (ARIS), 2025. [Paper] [Code]
Projects
AI / Machine Learning
- Spy-Search – Modular multimodal assistant for retrieval and QA, using LangChain and LLMs.
- VSTFusion-VO – Visual odometry using Video Swin Transformer and depth fusion. Accepted at IEEE ARIS 2025.
- GPT Multi-Role Debate with Speech – Multi-agent autonomous debate simulator with speech output.
- ORB-based Visual Odometry – VO system using ORB features and RANSAC for pose estimation.
- Secure Diffusion Watermarking Survey – A study of traceability techniques in diffusion models.
Software Engineering
- NTUST Order System – Real-time food ordering platform built with Go and Firebase.
- ChessBoard – Interactive chess game in C++ using ImGui with sound and real-time stats.
Open Source Contributions
- The Turing Way – Outside collaborator
- scikit-learn
- Argilla-io Distilabel
- TensorZero
- Ersilia
- Harmony