Results: 5

(searched for: doi:10.13176/11.112)
Published: 25 January 2019
Zeitschrift für Medizinische Physik, Volume 29, pp 86-101; https://doi.org/10.1016/j.zemedi.2018.12.003

Abstract:
This paper tries to give a gentle introduction to deep learning in medical image processing, proceeding from theoretical foundations to applications. We first discuss general reasons for the popularity of deep learning, including several major breakthroughs in computer science. Next, we start reviewing the fundamental basics of the perceptron and neural networks, along with some fundamental theory that is often omitted. Doing so allows us to understand the reasons for the rise of deep learning in many application domains. Obviously, medical image processing is one of the areas that has been largely affected by this rapid progress, in particular in image detection and recognition, image segmentation, image registration, and computer-aided diagnosis. There are also recent trends in physical simulation, modeling, and reconstruction that have led to astonishing results. Yet, some of these approaches neglect prior knowledge and hence bear the risk of producing implausible results. These apparent weaknesses highlight current limitations of deep learning. However, we also briefly discuss promising approaches that might be able to resolve these problems in the future.
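
As a quick illustration of the perceptron basics this abstract refers to, the sketch below trains a single perceptron with the classic error-driven update rule on a toy, linearly separable problem. The data and hyperparameters are made up for illustration and are not taken from the paper.

```python
# Minimal perceptron sketch (illustrative only, not code from the reviewed paper).
# Trains a single-layer perceptron on a toy linearly separable problem.
import numpy as np

def train_perceptron(X, y, epochs=20, lr=0.1):
    """Classic perceptron rule: w += lr * (target - prediction) * x."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            update = lr * (target - pred)
            w += update * xi
            b += update
    return w, b

# Toy data: logical AND, which is linearly separable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print([1 if xi @ w + b > 0 else 0 for xi in X])  # expected: [0, 0, 0, 1]
```

Deep networks stack many such units with nonlinear activations and are trained by backpropagation rather than this simple rule.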
, Sabrina Dorn, Michael Lell, Marc Kachelrieß, Andreas Maier
Published: 21 February 2018
Informatik aktuell pp 269-274; https://doi.org/10.1007/978-3-662-56537-7_70

The publisher has not yet granted permission to display this abstract.
Junichi Yamagishi, Bela Usabaev, , Oliver Watts, John Dines, , Yong Guan, , Keiichiro Oura, Yi-Jian Wu, et al.
IEEE Transactions on Audio, Speech, and Language Processing, Volume 18, pp 984-1004; https://doi.org/10.1109/tasl.2010.2045237

Abstract:
In conventional speech synthesis, large amounts of phonetically balanced speech data recorded in highly controlled recording studio environments are typically required to build a voice. Although using such data is a straightforward solution for high quality synthesis, the number of voices available will always be limited, because recording costs are high. On the other hand, our recent experiments with HMM-based speech synthesis systems have demonstrated that speaker-adaptive HMM-based speech synthesis (which uses an “average voice model” plus model adaptation) is robust to non-ideal speech data that are recorded under various conditions and with varying microphones, that are not perfectly clean, and/or that lack phonetic balance. This enables us to consider building high-quality voices on “non-TTS” corpora such as ASR corpora. Since ASR corpora generally include a large number of speakers, this leads to the possibility of producing an enormous number of voices automatically. In this paper, we demonstrate the thousands of voices for HMM-based speech synthesis that we have made from several popular ASR corpora such as the Wall Street Journal (WSJ0, WSJ1, and WSJCAM0), Resource Management, Globalphone, and SPEECON databases. We also present the results of associated analysis based on perceptual evaluation, and discuss remaining issues.
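
The core idea the abstract describes, an "average voice model" that is adapted to a target speaker with relatively little data, can be sketched very roughly as follows. Real systems adapt HMM state output distributions with transforms such as MLLR; the sketch below only shows a MAP-style interpolation of Gaussian mean vectors on made-up data, as a simplified stand-in rather than the paper's actual pipeline.

```python
# Very simplified illustration of the "average voice model + adaptation" idea.
# All names and numbers below are invented for the example.
import numpy as np

rng = np.random.default_rng(0)

# Pretend "average voice" model: per-state mean vectors of acoustic features,
# estimated from many speakers.
avg_means = rng.normal(size=(5, 3))          # 5 states, 3-dim features
tau = 10.0                                   # MAP prior weight

# A little adaptation data from one target speaker, aligned to states.
target_frames = {s: rng.normal(loc=0.5, size=(rng.integers(2, 8), 3))
                 for s in range(5)}

# MAP-style mean adaptation: interpolate between the average-voice mean and
# the target speaker's sample mean, weighted by the amount of data per state.
adapted_means = avg_means.copy()
for s, frames in target_frames.items():
    n = len(frames)
    adapted_means[s] = (tau * avg_means[s] + frames.sum(axis=0)) / (tau + n)

print(np.round(adapted_means - avg_means, 3))  # shift toward the target speaker
```

States with more adaptation data move further from the average voice, which is why such systems remain usable even when only a few utterances per speaker are available.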
The Journal of the Acoustical Society of America, Volume 126, pp 2589-2602; https://doi.org/10.1121/1.3216913

Abstract:
Speech of children with cleft lip and palate (CLP) is sometimes still disordered even after adequate surgical and nonsurgical therapies. Such speech shows complex articulation disorders, which are usually assessed perceptually, consuming time and manpower. Hence, there is a need for an easy-to-apply and reliable automatic method. To create a reference for an automatic system, speech data of 58 children with CLP were assessed perceptually by experienced speech therapists for characteristic phonetic disorders at the phoneme level. The first part of the article aims to detect such characteristics by a semiautomatic procedure, and the second to evaluate a fully automatic, thus simple, procedure. The methods are based on a combination of speech processing algorithms. The semiautomatic method achieves moderate to good agreement (kappa approximately 0.6) for the detection of all phonetic disorders. On the speaker level, a significant correlation of 0.89 between the perceptual evaluation and the automatic system is obtained. The fully automatic system yields a correlation of 0.81 to the perceptual evaluation on the speaker level. This correlation is in the range of the inter-rater correlation of the listeners. The automatic speech evaluation is able to detect phonetic disorders at an experts' level without any additional human postprocessing.
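
To make the two statistics quoted in this abstract concrete, the following sketch computes Cohen's kappa for phoneme-level detection agreement and a Pearson correlation for speaker-level scores. The arrays are small made-up examples, not data from the study.

```python
# Sketch of the two agreement measures cited in the abstract: Cohen's kappa for
# phoneme-level detection agreement and Pearson correlation for speaker-level
# scores. The data below is invented for illustration.
import numpy as np

def cohens_kappa(a, b):
    """Cohen's kappa for two binary label sequences."""
    a, b = np.asarray(a), np.asarray(b)
    po = np.mean(a == b)                          # observed agreement
    pe = (np.mean(a == 1) * np.mean(b == 1) +     # chance agreement
          np.mean(a == 0) * np.mean(b == 0))
    return (po - pe) / (1 - pe)

# Phoneme-level decisions: 1 = disorder detected, 0 = not detected.
perceptual = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
automatic  = [1, 0, 1, 1, 1, 0, 1, 0, 0, 0]
print("kappa:", round(cohens_kappa(perceptual, automatic), 2))

# Speaker-level scores (e.g., proportion of disordered phonemes per child).
perc_scores = np.array([0.10, 0.35, 0.50, 0.20, 0.70])
auto_scores = np.array([0.15, 0.30, 0.55, 0.25, 0.60])
print("Pearson r:", round(np.corrcoef(perc_scores, auto_scores)[0, 1], 2))
```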