Publications

You can also find my articles on my Google Scholar profile.
  1. T Ling, S He, P Shen, ZQ Wang. MC-LExt: Multi-Channel Target Speaker Extraction with Onset-Prompted Speaker Conditioning Mechanism. arXiv preprint arXiv:2510.15437. (arXiv 2025)
  2. S He, ZQ Wang. VM-UNSSOR: Unsupervised Neural Speech Separation Enhanced by Higher-SNR Virtual Microphone Arrays. arXiv preprint arXiv:2510.08914. (arXiv 2025)
  3. J Sun, S He, R Pang, ZQ Wang. Neural Forward Filtering for Speaker-Image Separation. arXiv preprint arXiv:2510.05757. (arXiv 2025)
  4. K Li, G Chen, W Sang, Y Luo, Z Chen, S Wang, S He, ZQ Wang, A Li, .... Advances in speech separation: Techniques, challenges, and future trends. arXiv preprint arXiv:2508.10830. (arXiv 2025)
  5. S He, W Xue, Y Yang, H Zhang, J Pan, X Zhang. Enhancing target speaker extraction with Hierarchical Speaker Representation Learning. Neural Networks 188, 107388. (NN 2025)
  6. F Zhao, S He, X Zhang. Room Impulse Response as a Prompt for Acoustic Echo Cancellation. arXiv preprint arXiv:2505.19480. (arXiv 2025)
  7. P Shen, K Chen, S He, P Chen, S Yuan, H Kong, X Zhang, ZQ Wang. Listen to Extract: Onset-Prompted Target Speaker Extraction. arXiv preprint arXiv:2505.05114. (arXiv 2025)
  8. Z Li, S He, X Zhang. Robust Target Speaker Direction of Arrival Estimation. ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). (ICASSP 2025)
  9. X Liu, S He, X Zhang. HWB-Net: A Novel High-Performance and Efficient Hybrid Waveform Bandwidth Extension Method. Proc. Interspeech 2025, 4088-4092. (INTERSPEECH 2025)
  10. Z Li, S He, J Bai, X Zhang. TF-SkiMNet: Speech Enhancement Based on Inplace Modeling and Skipping Memory in Time-Frequency Domain. Proc. Interspeech 2025, 5143-5147. (INTERSPEECH 2025)
  11. P Shen, S He, X Zhang. ExARN: Target Speaker Extraction. Man-Machine Speech Communication: 19th National Conference, NCMMSC 2024, Urumqi, China, August 15–18, Proceedings. 2024.
  12. Z Ye, Z Ju, H Liu, X Tan, J Chen, Y Lu, P Sun, J Pan, W Bian, S He, W Xue, .... FlashSpeech: Efficient zero-shot speech synthesis. Proceedings of the 32nd ACM International Conference on Multimedia, 6998-7007. (ACM MM 2024)
  13. P Shen, S He, X Zhang. ExARN: Target Speaker Extraction with Attentive Recurrent Networks. National Conference on Man-Machine Speech Communication, 238-249. (NCMMSC 2024)
  14. S He, H Zhang, W Rao, K Zhang, Y Ju, Y Yang, X Zhang. Hierarchical speaker representation for target speaker extraction. ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). (ICASSP 2024)
  15. C Zhao, S He, X Zhang. SICRN: Advancing speech enhancement through state space model and inplace convolution techniques. ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). (ICASSP 2024)
  16. S He, J Liu, H Li, Y Yang, F Chen, X Zhang. 3S-TSE: Efficient three-stage target speaker extraction for real-time and low-resource applications. ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). (ICASSP 2024)
  17. F Zhao, C Zhang, S He, J Liu, X Zhang. Deep echo path modeling for acoustic echo cancellation. Interspeech 2024, 612-616. (INTERSPEECH 2024)
  18. T Wu, S He, J Pan, H Huang, Z Mo, X Zhang. Unified audio visual cues for target speaker extraction. Proc. Interspeech 2024, 4343-4347. (INTERSPEECH 2024)
  19. T Wu, S He, H Zhang, XL Zhang. ScaleFormer: Transformer-based speech enhancement in the multi-scale time domain. 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). (APSIPA ASC 2023)
  20. J Pan, S He, T Wu, H Zhang, X Zhang. PDPCRN: Parallel Dual-Path CRN with Bi-directional Inter-Branch Interactions for Multi-Channel Speech Enhancement. arXiv preprint arXiv:2309.10379. (arXiv 2023)
  21. J Pan, S He, H Zhang, X Zhang. Hierarchical Modeling of Spatial Cues via Spherical Harmonics for Multi-Channel Speech Enhancement. arXiv preprint arXiv:2309.10393. (arXiv 2023)
  22. J Chen, W Rao, Z Wang, J Lin, Y Ju, S He, Y Wang, Z Wu. MC-SpEx: Towards effective speaker extraction with multi-scale interfusion and conditional speaker modulation. arXiv preprint arXiv:2306.16250. (arXiv 2023)
  23. W Liu, Y Shi, J Chen, W Rao, S He, A Li, Y Wang, Z Wu. Gesper: A restoration-enhancement framework for general speech reconstruction. arXiv preprint arXiv:2306.08454. (arXiv 2023)
  24. J Chen, Y Shi, W Liu, W Rao, S He, A Li, Y Wang, Z Wu, S Shang, .... Gesper: A unified framework for general speech restoration. ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). (ICASSP 2023)
  25. Y Ju, J Chen, S Zhang, S He, W Rao, W Zhu, Y Wang, T Yu, S Shang. TEA-PSE 3.0: Tencent-Ethereal-Audio-Lab personalized speech enhancement system for ICASSP 2023 DNS-Challenge. ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). (ICASSP 2023)
  26. S He, W Rao, J Liu, J Chen, Y Ju, X Zhang, Y Wang, S Shang. Speech enhancement with intelligent neural homomorphic synthesis. ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). (ICASSP 2023)
  27. K Zhang, S He, H Li, X Zhang. A Dual-branch Convolutional Network Architecture Processing on both Frequency and Time Domain for Single-channel Speech Enhancement. APSIPA Transactions on Signal and Information Processing 12 (3). (APSIPA Trans 2023)
  28. T Zhang, S He, H Li, X Zhang. RAT: RNN-Attention Transformer for Speech Enhancement. 2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP). (ISCSLP 2022)
  29. S He, H Li, X Zhang. Speakerfilter-pro: an improved target speaker extractor combines the time domain and frequency domain. 2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP). (ISCSLP 2022)
  30. P Shen, S He, X Zhang. ExARN: self-attending RNN for target speaker extraction. arXiv preprint arXiv:2212.01106. (arXiv 2022)
  31. S He, W Rao, K Zhang, Y Ju, Y Yang, X Zhang, Y Wang, S Shang. Local-global speaker representation for target speaker extraction. arXiv preprint arXiv:2210.15849. (arXiv 2022)
  32. K Zhang, S Liang, S Nie, S He, J Pan, X Zhang, H Ma, J Yi. A robust deep audio splicing detection method via singularity detection feature. ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). (ICASSP 2022)
  33. J Pan, S Nie, H Zhang, S He, K Zhang, S Liang, X Zhang, J Tao. Speaker recognition-assisted robust audio deepfake detection. Interspeech, 4202-4206. (INTERSPEECH 2022)
  34. K Zhang, S He, H Li, X Zhang. DBNet: A dual-branch network architecture processing on spectrum and waveform for single-channel speech enhancement. arXiv preprint arXiv:2105.02436. (arXiv 2021)
  35. S He, H Li, X Zhang. Speakerfilter: Deep learning-based target speaker extraction using anchor speech. ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). (ICASSP 2020)
Last updated: 2026-01-26