Publications

(2025). SLASH: Self-Supervised Speech Pitch Estimation Leveraging DSP-derived Absolute Pitch. In Proc. Interspeech 2025.
(2025). Comparative Analysis of Fast and High-Fidelity Neural Vocoders for Low-Latency Streaming Synthesis in Resource-Constrained Environments. In Proc. Interspeech 2025.
(2025). BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing. In Proc. Interspeech 2025.
(2025). Description-based Controllable Text-to-Speech with Cross-Lingual Voice Control. In Proc. ICASSP 2025.
(2024). LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning. In Proc. Interspeech 2024.
(2023). PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-to-Speech Using Natural Language Descriptions. In Proc. ICASSP 2024.
(2023). Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform. In Proc. ICASSP 2023.
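(2022). Design of a Loss Function Based on Temporal Variation of Loudness for Synthesis Parameter Extraction with a Differentiable Digital Signal Processing Mixture Model. Acoustical Society of Japan, 2022 Autumn Meeting.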
(2022). Differentiable Digital Signal Processing Mixture Model for Synthesis Parameter Extraction from Mixture of Harmonic Sounds. In Proc. ICASSP 2022.
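(2022). Experimental Evaluation of Synthesis Parameter Extraction from Mixtures of Instrument Sounds Using a Differentiable DSP Mixture Model. Acoustical Society of Japan, 2022 Spring Meeting.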