Shulin He is currently a Postdoctoral Researcher with the SUSTech Audio Intelligence Lab (SAIL), Department of Computer Science and Engineering, Southern University of Science and Technology (SUSTech), Shenzhen, China, working with Dr. Zhongqiu Wang.

He received the B.Eng. degree in Computer Science from Inner Mongolia University (IMU), Hohhot, China, in 2019. That same year he entered a combined master-to-Ph.D. program at IMU, supervised by Prof. Xueliang Zhang in the IMUSpeech group.

From June 2021 to September 2021, he was a visiting researcher at the Key Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences. From May 2022 to May 2023, he took part in the Tencent Rhino-Bird Elite Talent Program as a joint Ph.D. trainee and received its Excellent Student Award. From October 2023 to April 2024, he was a visiting Ph.D. student in the Division of Emerging Interdisciplinary Areas, Hong Kong University of Science and Technology. He was also selected for the inaugural CAST Young Talent Support Program (Doctoral Student Track).

His research focuses on target speaker extraction, speech enhancement, and deep learning for speech and audio. His work has been published in journals and conferences including Neural Networks, ICASSP, INTERSPEECH, and APSIPA ASC.


Education

Ph.D. Candidate, School of Computer Science, Inner Mongolia University
  • Sep 2021 – Jun 2025
  • Supervised by Prof. Xueliang Zhang

Visiting Ph.D. Student, Division of Emerging Interdisciplinary Areas (EMIA), Hong Kong University of Science & Technology
  • Oct 2023 – Apr 2024
  • Worked with Prof. Wei Xue on open-domain sound extraction

Joint Ph.D. Trainee, Tencent Rhino-Bird Elite Talent Program
  • May 2022 – May 2023
  • Recipient of the Excellent Student Award

Visiting Researcher, Key Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
  • Jun 2021 – Sep 2021
  • Worked with Prof. Wenju Liu and co-advisors Shuai Nie and Shan Liang on speech anti-spoofing research

M.Eng. (fast-track), School of Computer Science, Inner Mongolia University
  • Sep 2019 – Jun 2021
  • Supervised by Prof. Xueliang Zhang

B.Eng., Inner Mongolia University
  • Sep 2015 – Jul 2019

Research Interests

  • Target speaker extraction
  • Speech enhancement
  • Deep learning for speech & audio

Publications

  1. S. He, W. Xue, Y. Yang, H. Zhang, J. Pan, X. Zhang. Enhancing Target Speaker Extraction with Hierarchical Speaker Representation Learning. Neural Networks, 188, 107388, 2025. (NN 2025)
  2. Z. Li, S. He, X. Zhang. Robust Target Speaker Direction of Arrival Estimation. IEEE International Conference on Acoustics, Speech and Signal Processing. 2025. (ICASSP 2025)
  3. S. He, J. Liu, H. Li, Y. Yang, F. Chen, X. Zhang. 3S-TSE: Efficient Three-Stage Target Speaker Extraction for Real-Time and Low-Resource Applications. IEEE International Conference on Acoustics, Speech and Signal Processing. 2024: 421-425. (ICASSP 2024)
  4. S. He, H. Zhang, W. Rao, K. Zhang, Y. Ju, Y. Yang, X. Zhang. Hierarchical Speaker Representation for Target Speaker Extraction. IEEE International Conference on Acoustics, Speech and Signal Processing. 2024: 10361-10365. (ICASSP 2024)
  5. C. Zhao, S. He, X. Zhang. SiCRN: Advancing Speech Enhancement Through State-Space Model and In-Place Convolution Techniques. IEEE International Conference on Acoustics, Speech and Signal Processing. 2024: 10506-10510. (ICASSP 2024)
  6. P. Shen, S. He, X. Zhang. ExARN: Target Speaker Extraction with Attentive Recurrent Networks. National Conference on Man-Machine Speech Communication. 2024: 238-249. (NCMMSC 2024)
  7. Z. Ye, Z. Ju, H. Liu, X. Tan, J. Chen, Y. Lu, P. Sun, J. Pan, W. Bian, S. He, W. Xue, et al. FlashSpeech: Efficient Zero-Shot Speech Synthesis. Proceedings of the 32nd ACM International Conference on Multimedia. 2024: 6998-7007. (ACM MM 2024)
  8. F. Zhao, C. Zhang, S. He, J. Liu, X. Zhang. Deep Echo Path Modeling for Acoustic Echo Cancellation. Interspeech. 2024: 612-616. (INTERSPEECH 2024)
  9. T. Wu, S. He, J. Pan, H. Huang, Z. Mo, X. Zhang. Unified Audio-Visual Cues for Target Speaker Extraction. Interspeech. 2024: 4343-4347. (INTERSPEECH 2024)
  10. T. Wu, S. He, H. Zhang, X. Zhang. ScaleFormer: Transformer-Based Speech Enhancement in the Multi-Scale Time Domain. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference. 2023: 2448-2453. (APSIPA ASC 2023)
  11. J. Chen, W. Rao, Z. Wang, J. Lin, Y. Ju, S. He, Y. Wang, Z. Wu. MC-SpEx: Towards Effective Speaker Extraction with Multi-Scale Interfusion and Conditional Speaker Modulation. Interspeech. 2023: 4034-4038. (INTERSPEECH 2023)
  12. W. Liu, Y. Shi, J. Chen, W. Rao, S. He, A. Li, Y. Wang, Z. Wu. GESPER: A Restoration-Enhancement Framework for General Speech Reconstruction. Interspeech. 2023: 4044-4048. (INTERSPEECH 2023)
  13. J. Chen, Y. Shi, W. Liu, W. Rao, S. He, A. Li, Y. Wang, Z. Wu, S. Shang, C. Zheng. GESPER: A Unified Framework for General Speech Restoration. IEEE International Conference on Acoustics, Speech and Signal Processing. 2023: 1-2. (ICASSP 2023)
  14. Y. Ju, J. Chen, S. Zhang, S. He, W. Rao, W. Zhu, Y. Wang, T. Yu, S. Shang. Tea-PSE 3.0: Tencent-Ethereal-Audio-Lab Personalized Speech Enhancement System for ICASSP 2023 DNS-Challenge. IEEE International Conference on Acoustics, Speech and Signal Processing. 2023: 1-2. (ICASSP 2023)
  15. S. He, W. Rao, J. Liu, J. Chen, Y. Ju, X. Zhang, Y. Wang, S. Shang. Speech Enhancement with Intelligent Neural Homomorphic Synthesis. IEEE International Conference on Acoustics, Speech and Signal Processing. 2023: 1-5. (ICASSP 2023)
  16. K. Zhang, S. He, H. Li, X. Zhang. A Dual-Branch Convolutional Network Architecture Processing on Both Frequency and Time Domain for Single-Channel Speech Enhancement. APSIPA Transactions on Signal and Information Processing, 12(3), 2023. (APSIPA Trans 2023)
  17. T. Zhang, S. He, H. Li, X. Zhang. RAT: RNN-Attention Transformer for Speech Enhancement. International Symposium on Chinese Spoken Language Processing. 2022: 463-467. (ISCSLP 2022)
  18. S. He, H. Li, X. Zhang. SpeakerFilter-Pro: An Improved Target Speaker Extractor Combines the Time Domain and Frequency Domain. International Symposium on Chinese Spoken Language Processing. 2022: 473-477. (ISCSLP 2022)
  19. J. Pan, S. Nie, H. Zhang, S. He, K. Zhang, S. Liang, X. Zhang, J. Tao. Speaker Recognition-Assisted Robust Audio Deepfake Detection. Interspeech. 2022: 4202-4206. (INTERSPEECH 2022)
  20. K. Zhang, S. Liang, S. Nie, S. He, J. Pan, X. Zhang, H. Ma, J. Yi. A Robust Deep Audio Splicing Detection Method via Singularity Detection Feature. IEEE International Conference on Acoustics, Speech and Signal Processing. 2022: 2919-2923. (ICASSP 2022)
  21. K. Zhang, S. He, H. Li, X. Zhang. DBNet: A Dual-Branch Network Architecture Processing on Spectrum and Waveform for Single-Channel Speech Enhancement. Interspeech. 2021: 2821-2825. (INTERSPEECH 2021)
  22. S. He, H. Li, X. Zhang. SpeakerFilter: Deep Learning-Based Target Speaker Extraction Using Anchor Speech. IEEE International Conference on Acoustics, Speech and Signal Processing. 2020: 376-380. (ICASSP 2020)

Projects

  • Personalized speech enhancement for edge devices — Doctoral Innovation Project, Inner Mongolia University (May 2023 – May 2024)

Academic Service

  • Reviewer: ICASSP, INTERSPEECH, ASRU, WASPAA

Industrial Transfer

  • Target speaker extraction algorithm deployed in Tencent Meeting (iOS) (2023)

Awards

  • Graduate Scholarship, IMU (2020, 2021, 2022)
  • National Scholarship (2020)
  • INTERSPEECH 2021 DNS-Challenge — 5th & 8th place
  • ICASSP 2023 DNS-Challenge — 1st place (both tracks)
  • ICASSP 2023 SSI-Challenge — 1st place (both tracks)
  • Excellent Student Award, Tencent Rhino-Bird Program (2023)
  • CAST Young Talent Support Program (Doctoral Student Track) (2025)
  • Second Prize, Natural Science Award of the Inner Mongolia Autonomous Region (2025)