Shulin He

Shulin He (何树林) is currently a Postdoctoral Researcher with the SUSTech Audio Intelligence Lab (SAIL), Department of Computer Science and Engineering, Southern University of Science and Technology (SUSTech), Shenzhen, China, working with Dr. Zhongqiu Wang.

He received the B.Eng. degree in Computer Science from Inner Mongolia University (IMU), Hohhot, China, in 2019. That same year he entered a combined master-to-Ph.D. program at IMU, supervised by Prof. Xueliang Zhang in the IMUSpeech group.

From June 2021 to September 2021, he was a visiting researcher at the Key Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences. From May 2022 to May 2023, he took part in the Tencent Rhino-Bird Elite Talent Program as a joint Ph.D. trainee and received its Excellent Student Award. From October 2023 to April 2024, he was a visiting Ph.D. student in the Division of Emerging Interdisciplinary Areas, Hong Kong University of Science and Technology. He was also selected for the inaugural CAST Young Talent Support Program (Doctoral Student Track).

His research focuses on target speaker extraction, speech enhancement, and deep learning for speech and audio. His work has appeared in journals and conferences including IEEE Transactions on Audio, Speech and Language Processing, Neural Networks, ICASSP, INTERSPEECH, and APSIPA ASC.

Education

Ph.D. Candidate, School of Computer Science, Inner Mongolia University
• Sep 2021 – Jun 2025
• Supervised by Prof. Xueliang Zhang

Visiting Ph.D. Student, Division of Emerging Interdisciplinary Areas (EMIA), Hong Kong University of Science and Technology
• Oct 2023 – Apr 2024
• Worked with Prof. Wei Xue on open-domain sound extraction

Joint Ph.D. Trainee, Tencent Rhino-Bird Elite Talent Program
• May 2022 – May 2023
• Recipient of the Excellent Student Award

Visiting Researcher, Key Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
• Jun 2021 – Sep 2021
• Worked with Prof. Wenju Liu and co-advisors Shuai Nie and Shan Liang on speech anti-spoofing

M.Eng. (fast-track), School of Computer Science, Inner Mongolia University
• Sep 2019 – Jun 2021
• Supervised by Prof. Xueliang Zhang

B.Eng., Inner Mongolia University
• Sep 2015 – Jul 2019

Research Interests

Target speaker extraction
Speech enhancement
Deep learning for speech & audio

Publications

P Shen, K Chen, S He, P Chen, S Yuan, H Kong, X Zhang, ZQ Wang. Listen to Extract: Onset-Prompted Target Speaker Extraction. IEEE Transactions on Audio, Speech and Language Processing 33, 4832-4843. (TASLP 2025)
T Ling, S He, P Shen, ZQ Wang. MC-LExt: Multi-Channel Target Speaker Extraction with Onset-Prompted Speaker Conditioning Mechanism. arXiv preprint arXiv:2510.15437. (arXiv 2025)
S He, ZQ Wang. VM-UNSSOR: Unsupervised Neural Speech Separation Enhanced by Higher-SNR Virtual Microphone Arrays. arXiv preprint arXiv:2510.08914. (arXiv 2025)
J Sun, S He, R Pang, ZQ Wang. Neural Forward Filtering for Speaker-Image Separation. arXiv preprint arXiv:2510.05757. (arXiv 2025)
K Li, G Chen, W Sang, Y Luo, Z Chen, S Wang, S He, ZQ Wang, A Li, .... Advances in speech separation: Techniques, challenges, and future trends. arXiv preprint arXiv:2508.10830. (arXiv 2025)
S He, W Xue, Y Yang, H Zhang, J Pan, X Zhang. Enhancing target speaker extraction with hierarchical speaker representation learning. Neural Networks 188, 107388. (NN 2025)
F Zhao, S He, X Zhang. Room Impulse Response as a Prompt for Acoustic Echo Cancellation. arXiv preprint arXiv:2505.19480. (arXiv 2025)
Z Li, S He, X Zhang. Robust target speaker direction of arrival estimation. ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and … (ICASSP 2025)
X Liu, S He, X Zhang. HWB-Net: A Novel High-Performance and Efficient Hybrid Waveform Bandwidth Extension Method. Proc. Interspeech 2025, 4088-4092. (INTERSPEECH 2025)
Z Li, S He, J Bai, X Zhang. TF-SkiMNet: Speech Enhancement Based on Inplace Modeling and Skipping Memory in Time-Frequency Domain. Proc. Interspeech 2025, 5143-5147. (INTERSPEECH 2025)
Z Ye, Z Ju, H Liu, X Tan, J Chen, Y Lu, P Sun, J Pan, W Bian, S He, W Xue, .... FlashSpeech: Efficient zero-shot speech synthesis. Proceedings of the 32nd ACM International Conference on Multimedia, 6998-7007. (ACM MM 2024)
P Shen, S He, X Zhang. ExARN: Target Speaker Extraction with Attentive Recurrent Networks. National Conference on Man-Machine Speech Communication, 238-249. (NCMMSC 2024)
S He, H Zhang, W Rao, K Zhang, Y Ju, Y Yang, X Zhang. Hierarchical speaker representation for target speaker extraction. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and … (ICASSP 2024)
C Zhao, S He, X Zhang. SICRN: Advancing speech enhancement through state space model and inplace convolution techniques. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and … (ICASSP 2024)
S He, J Liu, H Li, Y Yang, F Chen, X Zhang. 3S-TSE: Efficient three-stage target speaker extraction for real-time and low-resource applications. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and … (ICASSP 2024)
F Zhao, C Zhang, S He, J Liu, X Zhang. Deep Echo Path Modeling for Acoustic Echo Cancellation. Interspeech. (INTERSPEECH 2024)
T Wu, S He, J Pan, H Huang, Z Mo, X Zhang. Unified Audio Visual Cues for Target Speaker Extraction. Interspeech. (INTERSPEECH 2024)
T Wu, S He, H Zhang, XL Zhang. ScaleFormer: Transformer-based speech enhancement in the multi-scale time domain. 2023 Asia Pacific Signal and Information Processing Association Annual … (APSIPA ASC 2023)
J Pan, S He, T Wu, H Zhang, X Zhang. PDPCRN: Parallel Dual-Path CRN with Bi-directional Inter-Branch Interactions for Multi-Channel Speech Enhancement. arXiv preprint arXiv:2309.10379. (arXiv 2023)
J Pan, S He, H Zhang, X Zhang. Hierarchical Modeling of Spatial Cues via Spherical Harmonics for Multi-Channel Speech Enhancement. arXiv preprint arXiv:2309.10393. (arXiv 2023)
J Chen, W Rao, Z Wang, J Lin, Y Ju, S He, Y Wang, Z Wu. MC-SpEx: Towards effective speaker extraction with multi-scale interfusion and conditional speaker modulation. arXiv preprint arXiv:2306.16250. (arXiv 2023)
W Liu, Y Shi, J Chen, W Rao, S He, A Li, Y Wang, Z Wu. Gesper: A restoration-enhancement framework for general speech reconstruction. arXiv preprint arXiv:2306.08454. (arXiv 2023)
J Chen, Y Shi, W Liu, W Rao, S He, A Li, Y Wang, Z Wu, S Shang, .... Gesper: A unified framework for general speech restoration. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and … (ICASSP 2023)
Y Ju, J Chen, S Zhang, S He, W Rao, W Zhu, Y Wang, T Yu, S Shang. TEA-PSE 3.0: Tencent-Ethereal-Audio-Lab personalized speech enhancement system for ICASSP 2023 DNS-Challenge. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and … (ICASSP 2023)
S He, W Rao, J Liu, J Chen, Y Ju, X Zhang, Y Wang, S Shang. Speech enhancement with intelligent neural homomorphic synthesis. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and … (ICASSP 2023)
K Zhang, S He, H Li, X Zhang. A Dual-branch Convolutional Network Architecture Processing on both Frequency and Time Domain for Single-channel Speech Enhancement. APSIPA Transactions on Signal and Information Processing 12 (3). (APSIPA Trans 2023)
T Zhang, S He, H Li, X Zhang. RAT: RNN-Attention Transformer for Speech Enhancement. 2022 13th International Symposium on Chinese Spoken Language Processing … (ISCSLP 2022)
S He, H Li, X Zhang. Speakerfilter-Pro: an improved target speaker extractor combines the time domain and frequency domain. 2022 13th International Symposium on Chinese Spoken Language Processing … (ISCSLP 2022)
P Shen, S He, X Zhang. ExARN: self-attending RNN for target speaker extraction. arXiv preprint arXiv:2212.01106. (arXiv 2022)
S He, W Rao, K Zhang, Y Ju, Y Yang, X Zhang, Y Wang, S Shang. Local-global speaker representation for target speaker extraction. arXiv preprint arXiv:2210.15849. (arXiv 2022)
K Zhang, S Liang, S Nie, S He, J Pan, X Zhang, H Ma, J Yi. A robust deep audio splicing detection method via singularity detection feature. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and … (ICASSP 2022)
J Pan, S Nie, H Zhang, S He, K Zhang, S Liang, X Zhang, J Tao. Speaker recognition-assisted robust audio deepfake detection. Interspeech, 4202-4206. (INTERSPEECH 2022)
K Zhang, S He, H Li, X Zhang. DBNet: A dual-branch network architecture processing on spectrum and waveform for single-channel speech enhancement. arXiv preprint arXiv:2105.02436. (arXiv 2021)
S He, H Li, X Zhang. Speakerfilter: Deep learning-based target speaker extraction using anchor speech. ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and … (ICASSP 2020)

Last updated: 2026-03-16

Projects

Personalized speech enhancement for edge devices — Doctoral Innovation Project, Inner Mongolia University (May 2023 – May 2024)

Academic Service

Reviewer: ICASSP, INTERSPEECH, ASRU, WASPAA

Industrial Transfer

Target speaker extraction algorithm deployed in Tencent Meeting (iOS) (2023)

Awards

Graduate Scholarship, IMU (2020, 2021, 2022)
National Scholarship (2020)
INTERSPEECH 2021 DNS-Challenge — 5th & 8th place
ICASSP 2023 DNS-Challenge — 1st place (both tracks)
ICASSP 2023 SSI-Challenge — 1st place (both tracks)
Excellent Student Award, Tencent Rhino-Bird Program (2023)
CAST Young Talent Support Program (Doctoral Student Track) (2025)
Second Prize, Natural Science Award of the Inner Mongolia Autonomous Region (2025)