Publications

You can also find my articles on my Google Scholar profile.

P Shen, K Chen, S He, P Chen, S Yuan, H Kong, X Zhang, ZQ Wang. Listen to Extract: Onset-Prompted Target Speaker Extraction. IEEE Transactions on Audio, Speech and Language Processing 33, 4832-4843. 2025.
T Ling, S He, P Shen, ZQ Wang. MC-LExt: Multi-Channel Target Speaker Extraction with Onset-Prompted Speaker Conditioning Mechanism. arXiv preprint arXiv:2510.15437. (arXiv 2025)
S He, ZQ Wang. VM-UNSSOR: Unsupervised Neural Speech Separation Enhanced by Higher-SNR Virtual Microphone Arrays. arXiv preprint arXiv:2510.08914. (arXiv 2025)
J Sun, S He, R Pang, ZQ Wang. Neural Forward Filtering for Speaker-Image Separation. arXiv preprint arXiv:2510.05757. (arXiv 2025)
K Li, G Chen, W Sang, Y Luo, Z Chen, S Wang, S He, ZQ Wang, A Li, .... Advances in speech separation: Techniques, challenges, and future trends. arXiv preprint arXiv:2508.10830. (arXiv 2025)
S He, W Xue, Y Yang, H Zhang, J Pan, X Zhang. Enhancing target speaker extraction with hierarchical speaker representation learning. Neural Networks 188, 107388. (NN 2025)
F Zhao, S He, X Zhang. Room Impulse Response as a Prompt for Acoustic Echo Cancellation. arXiv preprint arXiv:2505.19480. (arXiv 2025)
Z Li, S He, X Zhang. Robust target speaker direction of arrival estimation. ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and …. (ICASSP 2025)
X Liu, S He, X Zhang. HWB-Net: A Novel High-Performance and Efficient Hybrid Waveform Bandwidth Extension Method. Proc. Interspeech 2025, 4088-4092. (INTERSPEECH 2025)
Z Li, S He, J Bai, X Zhang. TF-SkiMNet: Speech Enhancement Based on Inplace Modeling and Skipping Memory in Time-Frequency Domain. Proc. Interspeech 2025, 5143-5147. (INTERSPEECH 2025)
P Shen, S He, X Zhang. ExARN: Target Speaker Extraction. Man-Machine Speech Communication: 19th National Conference, NCMMSC 2024 …. 2024.
Z Ye, Z Ju, H Liu, X Tan, J Chen, Y Lu, P Sun, J Pan, W Bian, S He, W Xue, .... FlashSpeech: Efficient zero-shot speech synthesis. Proceedings of the 32nd ACM International Conference on Multimedia, 6998-7007. (ACM MM 2024)
P Shen, S He, X Zhang. ExARN: Target Speaker Extraction with Attentive Recurrent Networks. National Conference on Man-Machine Speech Communication, 238-249. (NCMMSC 2024)
S He, H Zhang, W Rao, K Zhang, Y Ju, Y Yang, X Zhang. Hierarchical speaker representation for target speaker extraction. ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …. (ICASSP 2024)
C Zhao, S He, X Zhang. SICRN: Advancing speech enhancement through state space model and inplace convolution techniques. ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …. (ICASSP 2024)
S He, J Liu, H Li, Y Yang, F Chen, X Zhang. 3S-TSE: Efficient three-stage target speaker extraction for real-time and low-resource applications. ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …. (ICASSP 2024)
F Zhao, C Zhang, S He, J Liu, X Zhang. Deep Echo Path Modeling for Acoustic Echo Cancellation. Interspeech. (INTERSPEECH 2024)
T Wu, S He, J Pan, H Huang, Z Mo, X Zhang. Unified Audio Visual Cues for Target Speaker Extraction. Interspeech. (INTERSPEECH 2024)
T Wu, S He, H Zhang, XL Zhang. ScaleFormer: Transformer-based speech enhancement in the multi-scale time domain. 2023 Asia Pacific Signal and Information Processing Association Annual …. 2023.
J Pan, S He, T Wu, H Zhang, X Zhang. PDPCRN: Parallel Dual-Path CRN with Bi-directional Inter-Branch Interactions for Multi-Channel Speech Enhancement. arXiv preprint arXiv:2309.10379. (arXiv 2023)
J Pan, S He, H Zhang, X Zhang. Hierarchical Modeling of Spatial Cues via Spherical Harmonics for Multi-Channel Speech Enhancement. arXiv preprint arXiv:2309.10393. (arXiv 2023)
J Chen, W Rao, Z Wang, J Lin, Y Ju, S He, Y Wang, Z Wu. MC-SpEx: Towards effective speaker extraction with multi-scale interfusion and conditional speaker modulation. arXiv preprint arXiv:2306.16250. (arXiv 2023)
W Liu, Y Shi, J Chen, W Rao, S He, A Li, Y Wang, Z Wu. Gesper: A restoration-enhancement framework for general speech reconstruction. arXiv preprint arXiv:2306.08454. (arXiv 2023)
J Chen, Y Shi, W Liu, W Rao, S He, A Li, Y Wang, Z Wu, S Shang, .... Gesper: A unified framework for general speech restoration. ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …. (ICASSP 2023)
Y Ju, J Chen, S Zhang, S He, W Rao, W Zhu, Y Wang, T Yu, S Shang. TEA-PSE 3.0: Tencent-Ethereal-Audio-Lab personalized speech enhancement system for ICASSP 2023 DNS-Challenge. ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …. (ICASSP 2023)
S He, W Rao, J Liu, J Chen, Y Ju, X Zhang, Y Wang, S Shang. Speech enhancement with intelligent neural homomorphic synthesis. ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …. (ICASSP 2023)
K Zhang, S He, H Li, X Zhang. A Dual-branch Convolutional Network Architecture Processing on both Frequency and Time Domain for Single-channel Speech Enhancement. APSIPA Transactions on Signal and Information Processing 12 (3). (APSIPA Trans 2023)
T Zhang, S He, H Li, X Zhang. RAT: RNN-Attention Transformer for Speech Enhancement. 2022 13th International Symposium on Chinese Spoken Language Processing …. (ISCSLP 2022)
S He, H Li, X Zhang. Speakerfilter-pro: an improved target speaker extractor combines the time domain and frequency domain. 2022 13th International Symposium on Chinese Spoken Language Processing …. (ISCSLP 2022)
P Shen, S He, X Zhang. ExARN: self-attending RNN for target speaker extraction. arXiv preprint arXiv:2212.01106. (arXiv 2022)
S He, W Rao, K Zhang, Y Ju, Y Yang, X Zhang, Y Wang, S Shang. Local-global speaker representation for target speaker extraction. arXiv preprint arXiv:2210.15849. (arXiv 2022)
K Zhang, S Liang, S Nie, S He, J Pan, X Zhang, H Ma, J Yi. A robust deep audio splicing detection method via singularity detection feature. ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …. (ICASSP 2022)
J Pan, S Nie, H Zhang, S He, K Zhang, S Liang, X Zhang, J Tao. Speaker recognition-assisted robust audio deepfake detection. Interspeech, 4202-4206. (INTERSPEECH 2022)
K Zhang, S He, H Li, X Zhang. DBNet: A dual-branch network architecture processing on spectrum and waveform for single-channel speech enhancement. arXiv preprint arXiv:2105.02436. (arXiv 2021)
S He, H Li, X Zhang. Speakerfilter: Deep learning-based target speaker extraction using anchor speech. ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …. (ICASSP 2020)

Last updated: 2026-03-16

Shulin He