Baseline Comparison
Audio Quality Comparison
We compare our method with DarkStream, a state-of-the-art streaming anonymization approach, on samples from the CMU-ARCTIC corpus (http://www.festvox.org/cmu_arctic/).
| Speaker | Transcript | Original | DarkStream (Wav+CL+KM) | Stream-Voice-Anon (Ours) |
|---|---|---|---|---|
| BDL | "For the twentieth time that evening the two men shook hands." | (audio) | (audio) | (audio) |
| CLB | "God bless 'em I hope I will go on seeing them forever." | (audio) | (audio) | (audio) |
| RMS | "He turned sharply and faced Gregson across the table." | (audio) | (audio) | (audio) |
| SLT | "Gregson shoved back his chair and rose to his feet." | (audio) | (audio) | (audio) |
Dynamic Delay Control
Audio Comparison
Our system supports dynamic delay control: the latency-quality trade-off can be adjusted at inference time without retraining. The examples below use delays of d=0 (90 ms), d=1 (130 ms), d=2 (180 ms), and d=8 (440 ms).
| Speaker | Transcript | Original | delay=0 (90 ms) | delay=1 (130 ms) | delay=2 (180 ms) | delay=8 (440 ms) |
|---|---|---|---|---|---|---|
| BDL | "Not at this particular case Tom apologized Whittemore." | (audio) | (audio) | (audio) | (audio) | (audio) |
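A delay setting of this kind can be pictured as a lookahead buffer. The sketch below is a minimal illustration under stated assumptions, not the Stream-Voice-Anon implementation: `stream_with_delay` and the `convert` callback are hypothetical names, and one unit of delay is assumed to mean one input frame of lookahead. Output frame t is emitted only once input frame t + d has arrived, so a larger d gives the converter more future context at the cost of waiting longer.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

# Hypothetical per-frame converter: receives every input frame seen so far and
# the index of the frame to convert, so it can use up to `delay` frames of
# lookahead. A real system would call an anonymization model here.
Converter = Callable[[Sequence[float], int], float]


def stream_with_delay(frames: Iterable[float], delay: int,
                      convert: Converter) -> Iterator[Tuple[int, int, float]]:
    """Yield (output_index, arrival_index, value) for each output frame.

    Assumption for illustration: output frame t is emitted as soon as input
    frame t + delay has arrived, i.e. `delay` frames of lookahead.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    seen: List[float] = []
    next_out = 0
    for arrival_index, frame in enumerate(frames):
        seen.append(frame)
        # Emit every output whose lookahead window is now complete.
        while next_out + delay <= arrival_index:
            yield next_out, arrival_index, convert(seen, next_out)
            next_out += 1
    # End of stream: flush remaining outputs with whatever lookahead exists.
    last = len(seen) - 1
    while next_out < len(seen):
        yield next_out, last, convert(seen, next_out)
        next_out += 1


if __name__ == "__main__":
    # Identity converter: shows when each output frame would be emitted.
    for out_idx, arrived, _ in stream_with_delay([0.0, 1.0, 2.0, 3.0], 1,
                                                 lambda s, t: s[t]):
        print(f"output {out_idx} emitted after input {arrived} arrived")
```

With `delay=2`, output frame 0 is emitted only when input frame 2 arrives, and the last outputs are flushed at the end of the stream with reduced lookahead. In this sketch `delay` only changes the emission schedule, so the same converter can be reused with any delay value.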
Citation
If you find this work useful, please cite:
@misc{kuzmin2026streamvoiceanonenhancingutilityrealtime,
title={Stream-Voice-Anon: Enhancing Utility of Real-Time Speaker Anonymization via Neural Audio Codec and Language Models},
author={Nikita Kuzmin and Songting Liu and Kong Aik Lee and Eng Siong Chng},
year={2026},
eprint={2601.13948},
archivePrefix={arXiv},
primaryClass={eess.AS},
url={https://arxiv.org/abs/2601.13948},
}
Links & Contact
For questions or collaboration inquiries, please open an issue on GitHub, or email s220028@e.ntu.edu.sg or lius0114@e.ntu.edu.sg.