This paper presents a novel neural vocoder named APNet which reconstructs
speech waveforms from acoustic features by predicting amplitude and phase
spectra directly. The APNet vocoder is composed of an amplitude spectrum
predictor (ASP) and a phase spectrum predictor (PSP). The ASP is a residual
convolution network which predicts frame-level log amplitude spectra from
acoustic features. The PSP also adopts a residual convolution network using
acoustic features as input; its output is fed through two parallel linear
convolution layers, whose outputs are then combined by a phase calculation
formula to estimate frame-level phase spectra. The outputs of the ASP and PSP
are finally combined to reconstruct the speech waveform via the inverse
short-time Fourier transform (ISTFT). All operations of the ASP and PSP are
performed at the frame level. We train the ASP and PSP jointly and define
multilevel loss functions based on amplitude mean square error, phase
anti-wrapping error, short-time spectral inconsistency error and time domain
reconstruction error. Experimental results show that our proposed APNet vocoder
achieves an approximately 8x faster inference speed than HiFi-GAN v1 on a CPU
due to the all-frame-level operations, while its synthesized speech quality is
comparable to that of HiFi-GAN v1. The synthesized speech quality of the APNet vocoder
is also better than that of several equally efficient models. Ablation
experiments also confirm that the proposed parallel phase estimation
architecture is essential to phase modeling and the proposed loss functions are
helpful for improving the synthesized speech quality.
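The two ideas specific to this abstract can be sketched briefly: the parallel phase estimation (two branch outputs combined by a phase formula into a value on the principal interval) and the anti-wrapping treatment of phase error (so a loss does not penalize harmless 2π jumps). The sketch below is a minimal numpy illustration, not the authors' implementation; the function names are placeholders, and the phase formula is written as the two-argument arctangent, which is the standard way to map a pseudo-real/pseudo-imaginary pair onto (-π, π].

```python
import numpy as np

def phase_formula(pseudo_real, pseudo_imag):
    """Combine the two parallel branch outputs into a phase spectrum.

    atan2 maps the branch pair onto the principal interval (-pi, pi],
    which is the role the paper's phase calculation formula plays.
    """
    return np.arctan2(pseudo_imag, pseudo_real)

def reconstruct_spectrum(log_amplitude, phase):
    """Form the complex STFT frame from predicted log-amplitude and phase.

    Passing the full spectrogram built this way to an ISTFT routine
    (e.g. scipy.signal.istft) yields the output waveform.
    """
    return np.exp(log_amplitude) * np.exp(1j * phase)

def anti_wrapping(phase_error):
    """Wrap a phase error into (-pi, pi] before taking its magnitude.

    An error of exactly 2*pi is perceptually zero; wrapping first keeps
    a phase loss from penalizing such jumps.
    """
    two_pi = 2.0 * np.pi
    return phase_error - two_pi * np.round(phase_error / two_pi)
```

Because the ISTFT only needs a complex spectrogram, predicting amplitude and phase at the frame level and combining them this way is what lets the whole model avoid sample-level upsampling, which is the source of the reported CPU speedup.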

Latest Change: May 16, 2023, 7:32 a.m.