ResearchHub | Open Science Community

Lei Xie

Author with expertise in Speech Enhancement Techniques

Achievements

Cited Author

Key Stats

Upvotes received: 0

Publications: 16 (25% Open Access)

Cited by: 445

h-index: 37 / i10-index: 158

Reputation

Biology: < 1%

Chemistry: < 1%

Economics: < 1%

Publications

DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement

Yanxin Hu et al., Oct 25, 2020

Speech enhancement has benefited from the success of deep learning in terms of intelligibility and perceptual quality. Conventional time-frequency (TF) domain methods focus on predicting TF masks or the speech spectrum via a naive convolutional neural network (CNN) or recurrent neural network (RNN). Some recent studies use the complex-valued spectrogram as a training target but train in a real-valued network, predicting either the magnitude and phase components or the real and imaginary parts. In particular, the convolution recurrent network (CRN) integrates a convolutional encoder-decoder (CED) structure and long short-term memory (LSTM), which has proven helpful for complex targets. To train the complex target more effectively, in this paper we design a new network structure that simulates complex-valued operations, called Deep Complex Convolution Recurrent Network (DCCRN), in which both the CNN and RNN structures can handle complex-valued operations. The proposed DCCRN models are very competitive with previous networks on both objective and subjective metrics. With only 3.7M parameters, our DCCRN models submitted to the Interspeech 2020 Deep Noise Suppression (DNS) challenge ranked first in the real-time track and second in the non-real-time track in terms of Mean Opinion Score (MOS).
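The abstract above mentions simulating complex-valued operations inside a network. As a rough illustration of the general idea only (not the DCCRN architecture or the authors' code), the sketch below builds a complex 1-D convolution from four real-valued convolutions, using the identity (x_r + j x_i) * (w_r + j w_i) = (x_r * w_r - x_i * w_i) + j (x_r * w_i + x_i * w_r). The function name `complex_conv1d` and the random test signal are assumptions made for this example.

```python
import numpy as np


def complex_conv1d(x_real, x_imag, w_real, w_imag):
    """Complex 1-D convolution expressed with four real-valued convolutions.

    Hypothetical helper for illustration; not taken from the DCCRN paper.
    """
    # Real part: real*real minus imag*imag.
    out_real = np.convolve(x_real, w_real) - np.convolve(x_imag, w_imag)
    # Imaginary part: real*imag plus imag*real.
    out_imag = np.convolve(x_real, w_imag) + np.convolve(x_imag, w_real)
    return out_real, out_imag


# Random complex input and kernel (illustrative data only).
rng = np.random.default_rng(0)
x = rng.standard_normal(8) + 1j * rng.standard_normal(8)
w = rng.standard_normal(3) + 1j * rng.standard_normal(3)

out_real, out_imag = complex_conv1d(x.real, x.imag, w.real, w.imag)

# Compare against NumPy's native complex convolution.
reference = np.convolve(x, w)
matches_reference = np.allclose(out_real + 1j * out_imag, reference)
```

Keeping the real and imaginary parts as separate real-valued arrays is what lets ordinary real-valued layers carry complex arithmetic; in a trained network the kernel parts would be learned weights rather than random values.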

Artificial Intelligence

Signal Processing

MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition

Bingshen Mu et al., Jan 1, 2024

Artificial Intelligence

Signal Processing

Freeze start of proton exchange membrane fuel cell systems with closed-loop purging and improved voltage consistency

Haisong Xu et al., Nov 1, 2024

Artificial Intelligence

Lesen: Label-Efficient Deep Learning for Multi-Parametric MRI-Based Visual Pathway Segmentation

Alou Diakite et al., May 27, 2024

Artificial Intelligence

Radiology, Nuclear Medicine And Imaging

Text-aware and Context-aware Expressive Audiobook Speech Synthesis

Dake Guo et al., Sep 1, 2024

Recent advances in text-to-speech have significantly improved the expressiveness of synthetic speech. However, a major challenge remains in generating speech that captures the diverse styles exhibited by professional narrators in audiobooks without relying on manual labels or reference speech. To address this, we propose a text-aware and context-aware (TACA) style modeling approach for expressive audiobook speech synthesis. We first establish a text-aware style space to cover diverse styles via contrastive learning with the supervision of the speech-style space. Meanwhile, we adopt a context encoder to incorporate cross-sentence information and the style embedding obtained from text. Finally, we introduce the context encoder to two typical TTS models: VITS-based TTS and language model-based TTS. Experimental results show that our proposed approach can effectively capture diverse styles and coherent prosody, and thus improve naturalness and expressiveness in audiobook speech synthesis.

Artificial Intelligence

Whisper-SV: Adapting Whisper for low-data-resource speaker verification

Li Zhang et al., Jul 1, 2024

Artificial Intelligence

Signal Processing

An Audio-Quality-Based Multi-Strategy Approach for Target Speaker Extraction in the MISP 2023 Challenge

Runduo Han et al., Apr 14, 2024

Artificial Intelligence

BS-PLCNet: Band-Split Packet Loss Concealment Network with Multi-Task Learning Framework and Multi-Discriminators

Zihan Zhang et al., Apr 14, 2024

Artificial Intelligence

Electrical And Electronic Engineering

Multi-level Temporal-channel Speaker Retrieval for Zero-shot Voice Conversion

Zhichao Wang et al., Jan 1, 2024

Zero-shot voice conversion (VC) converts source speech into the voice of any desired speaker using only one utterance of that speaker, without requiring additional model updates. Typical methods use a speaker representation from a pre-trained speaker verification (SV) model or learn the speaker representation during VC training to achieve zero-shot VC. However, existing speaker modeling methods overlook the variation of speaker information richness in the temporal and frequency channel dimensions of speech. This insufficient speaker modeling hampers the ability of the VC model to accurately represent unseen speakers who are not in the training dataset. In this study, we present a robust zero-shot VC model with multi-level temporal-channel retrieval, referred to as MTCR-VC. Specifically, to flexibly adapt to the dynamically varying speaker characteristics along the temporal and channel axes of the speech, we propose a novel fine-grained speaker modeling method, called temporal-channel retrieval (TCR), to find out when and where speaker information appears in speech. It retrieves variable-length speaker representations from both the temporal and channel dimensions under the guidance of a pre-trained SV model. In addition, inspired by the hierarchical process of human speech production, the MTCR speaker module stacks several TCR blocks to extract speaker representations at multiple levels of granularity. Furthermore, we introduce a cycle-based training strategy that simulates zero-shot inference recurrently to achieve better speech disentanglement and reconstruction. To drive this process, we adopt perceptual constraints on three aspects: content, style, and speaker. Experiments demonstrate that MTCR-VC is superior to previous zero-shot VC methods in modeling speaker timbre while maintaining good speech naturalness.

Artificial Intelligence

Enhancing Lip Reading with Multi-Scale Video and Multi-Encoder

He Wang et al., Jul 15, 2024

Artificial Intelligence
