IEICE Transactions on Information and Systems

Journal Information
ISSN / EISSN : 0916-8532 / 1745-1361
Total articles ≅ 6,047

Latest articles in this journal

Ran Li, Huibiao Zhu, Jiaqi Yin
IEICE Transactions on Information and Systems, pp 2154-2163;

Ceph is an object-based parallel distributed file system that provides excellent performance, reliability, and scalability. Ceph also provides the Cephx authentication system to identify and authenticate users. In this paper, we first model the basic architecture of Ceph using the process algebra CSP (Communicating Sequential Processes). We then feed the constructed model to the model checker PAT (Process Analysis Toolkit) and verify several related properties, including Deadlock Freedom, Data Reachability, Data Write Integrity, Data Consistency and Authentication. The verification results show that the original model does not satisfy the Authentication property. Therefore, we formalize a new model of Ceph in which Cephx is adopted. The new verification results show that the model with Cephx satisfies all of these properties.
Miho Yamakura, Ryousei Takano, Akram BEN Ahmed, Midori Sugaya, Hideharu Amano
IEICE Transactions on Information and Systems, pp 2078-2088;

FPGA (Field Programmable Gate Array) based accelerators are attracting significant interest in cloud computing systems. Combining multi-FPGA systems with cloud computing brings a new perspective to reconfigurable computing research. However, the multi-tenancy of a multi-FPGA system has not been fully discussed in previous research. In this paper, we propose a multi-tenant resource management system, named FiC-RM, for a multi-FPGA cloud system. FiC-RM provides users with a set of FPGA resources according to their requirements and allows them to exclusively access FPGA boards and the interconnection network. To achieve this, we propose a placement algorithm, which is key to efficiently sharing the limited resources. We demonstrate that FiC-RM can control a practical-scale multi-FPGA system. Moreover, our simulation study shows that our placement algorithm achieves a 3 to 4% improvement in the average resource usage and a 20-second reduction in the response time, compared to other existing naive algorithms.
Tomoya Itsubo, Michihiro Koibuchi, Hideharu Amano, Hiroki Matsutani
IEICE Transactions on Information and Systems, pp 2057-2067;

Since deep learning workloads perform a large number of matrix operations on training data, GPUs (Graphics Processing Units) are especially efficient for the training phase. A cluster of computers, each equipped with multiple GPUs, can significantly accelerate deep learning workloads. More specifically, a back-propagation algorithm following a gradient descent approach is used for the training. Although the gradient computation is still a major bottleneck of the training, gradient aggregation and optimization impose both communication and computation overheads, which should also be reduced to further shorten the training time. To address this issue, in this paper, multiple GPUs are interconnected with a PCI Express (PCIe) over 10Gbit Ethernet (10GbE) technology. Since these remote GPUs are interconnected with network switches, gradient aggregation and optimizers (e.g., SGD, AdaGrad, Adam, and SMORMS3) are offloaded to FPGA-based 10GbE switches between remote GPUs; thus, the gradient aggregation and parameter optimization are completed in the network. The proposed FPGA-based 10GbE switches with the four optimizers are implemented on a NetFPGA-SUME board. Their resource utilization is increased by the PEs for the optimizers, and they consume up to 56% of the resources. Evaluation results using four remote GPUs connected via the proposed FPGA-based switch demonstrate that these optimizers are accelerated by up to 3.0x and 1.25x compared to CPU and GPU implementations, respectively. Also, the gradient aggregation throughput of the FPGA-based switch reaches up to 98.3% of the 10GbE line rate.
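
As a rough illustration of the two operations this abstract describes offloading to the switch, the following software sketch aggregates gradients from several workers and then applies an SGD or AdaGrad update. It is not the authors' FPGA design: the function names, the use of a plain element-wise sum for aggregation, and the learning-rate values are all assumptions made for the example.

```python
import math


def aggregate(gradients):
    """Element-wise sum of the gradient vectors sent by each worker (assumed scheme)."""
    return [sum(values) for values in zip(*gradients)]


def sgd_step(params, grad, lr=0.1):
    """Plain SGD update: p <- p - lr * g."""
    return [p - lr * g for p, g in zip(params, grad)]


def adagrad_step(params, grad, accum, lr=0.1, eps=1e-8):
    """AdaGrad update: scale each step by the root of the accumulated squared gradients."""
    new_accum = [a + g * g for a, g in zip(accum, grad)]
    new_params = [
        p - lr * g / (math.sqrt(a) + eps)
        for p, g, a in zip(params, grad, new_accum)
    ]
    return new_params, new_accum


# Four hypothetical workers (standing in for four remote GPUs), two parameters each.
worker_grads = [[1.0, -2.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
total_grad = aggregate(worker_grads)
sgd_params = sgd_step([1.0, 1.0], total_grad, lr=0.25)
adagrad_params, adagrad_accum = adagrad_step([1.0, 1.0], total_grad, [0.0, 0.0])
```

Doing both steps at the point where the gradients already meet is what lets the update complete "in the network"; the sketch only shows the arithmetic, not the communication.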
Hiroki Okada, Masato Yoshimi, Celimuge Wu, Tsutomu Yoshinaga
IEICE Transactions on Information and Systems, pp 2121-2130;

In this study, we propose a mechanism called adaptive failsoft control to address peak traffic in mobile live streaming, using a chasing playback function. Although a cache system is available to support the chasing playback function for live streaming in a base station and device-to-device communication, the concentration of requests caused by highlight scenes influences the traffic load owing to data unavailability. To avoid data unavailability, we adapted two live streaming features: (1) streaming data while switching the video quality, and (2) time variability of the number of requests. The second feature enables a fallback mechanism for the cache system by prioritizing cache eviction and terminating the transfer of cache-missed requests. This paper discusses the simulation results of the proposed mechanism, which adopts a request model appropriate for (a) avoiding peak traffic and (b) maintaining continuity of service.
Ryosuke Kuramochi, Hiroki Nakahara
IEICE Transactions on Information and Systems, pp 2068-2077;

Convolutional neural networks (CNNs) are widely used for image processing tasks in both embedded systems and data centers. In data centers, high accuracy and low latency are desired for various tasks such as image processing of streaming videos. We propose an FPGA-based low-latency CNN inference for randomly wired convolutional neural networks (RWCNNs), whose layer structures are based on random graph models. Because RWCNNs have several convolution layers that have no direct dependencies between them, our architecture can process them efficiently using a pipeline method. At each layer, we need to use the calculation results of multiple layers as the input. We use an FPGA with HBM2 to enable parallel access to the input data with multiple HBM2 channels. We schedule the order of execution of the layers to improve the pipeline efficiency. We build a conflict graph using the scheduling results. Then, we allocate the calculation results of each layer to the HBM2 channels by coloring the graph. Because the pipeline execution needs to be properly controlled, we developed an automatic generation tool for hardware functions. We implemented the proposed architecture on the Alveo U50 FPGA. We investigated a trade-off between latency and recognition accuracy for the ImageNet classification task by comparing the inference performances for different input image sizes. We compared our accelerator with a conventional accelerator for ResNet-50. The results show that our accelerator reduces the latency by 2.21 times. We also obtained 12.6 and 4.93 times better efficiency than CPU and GPU, respectively. Thus, our accelerator for RWCNNs is suitable for low-latency inference.
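
The channel allocation step described above (build a conflict graph, then color it) can be sketched in software as follows. This is a minimal illustration only: the abstract does not say which coloring algorithm is used, so the greedy highest-degree-first heuristic, the function name `color_conflict_graph`, and the layer names are assumptions.

```python
def color_conflict_graph(conflicts, num_channels):
    """Greedily assign each layer a channel so that conflicting layers never share one.

    conflicts: iterable of (layer_a, layer_b) pairs whose outputs are read at the same time.
    num_channels: number of available memory channels (the colors).
    """
    # Build a symmetric adjacency map from the conflict pairs.
    adjacency = {}
    for a, b in conflicts:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    assignment = {}
    # Visit the most constrained layers first; ties are broken by name for determinism.
    for layer in sorted(adjacency, key=lambda n: (-len(adjacency[n]), n)):
        used = {assignment[n] for n in adjacency[layer] if n in assignment}
        free = [c for c in range(num_channels) if c not in used]
        if not free:
            raise ValueError(f"not enough channels to place {layer}")
        assignment[layer] = free[0]
    return assignment


# Hypothetical conflict graph: conv1, conv2 and conv3 feed the same layer,
# and conv3 is also read together with conv4.
conflicts = [
    ("conv1", "conv2"),
    ("conv2", "conv3"),
    ("conv1", "conv3"),
    ("conv3", "conv4"),
]
channels = color_conflict_graph(conflicts, num_channels=4)
```

Two layers joined by an edge end up on different channels, so their results can be fetched in parallel; layers without a conflict (here conv1 and conv4) may share a channel.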
Hongcui Wang, Pierre Roussel, Bruce Denby
IEICE Transactions on Information and Systems, pp 2209-2217;

A Silent Speech Interface (SSI) is a sensor-based, Artificial Intelligence (AI) enabled system in which articulation is performed without the use of the vocal cords, resulting in a voice interface that conserves the ambient audio environment, protects private data, and also functions in noisy environments. Though portable SSIs based on ultrasound imaging of the tongue have obtained Word Error Rates rivaling those of acoustic speech recognition, SSIs remain relegated to the laboratory due to stability issues. Indeed, reliable extraction of acoustic features from ultrasound tongue images in real-life situations has proven elusive. Recently, Representation Learning has shown considerable success in learning underlying structure in noisy, high-dimensional raw data. In its unsupervised form, Representation Learning is able to reveal structure in unlabeled data, thus greatly simplifying the data preparation task. In the present article, a 3D Convolutional Neural Network architecture is applied to unlabeled ultrasound images, and is shown to reliably predict future tongue configurations. By comparing the 3DCNN to a simple previous-frame predictor, it is possible to recognize tongue trajectories comprising transitions between regions of stability that correlate with formant trajectories in a spectrogram of the signal. Prospects for using the underlying structural representation to provide features for subsequent speech processing tasks are presented.
Ryoma Senda, Yoshiaki Takata, Hiroyuki Seki
IEICE Transactions on Information and Systems, pp 2131-2144;

A pushdown system (PDS) is known as an abstract model of recursive programs. For PDS, model checking methods have been studied and applied to various software verification tasks such as interprocedural data flow analysis and malware detection. However, PDS cannot manipulate data values from an infinite domain. A register PDS (RPDS) is an extension of PDS that adds registers to deal with data values in a restricted way. This paper proposes algorithms for LTL model checking problems for RPDS with simple and regular valuations, which are labelings of atomic propositions to configurations under reasonable restrictions. First, we introduce RPDS and related models, and then define the LTL model checking problems for RPDS. Second, we give algorithms for solving these problems and also show that the problems are EXPTIME-complete. As practical examples, we show solutions to malware detection and XML schema checking problems in the proposed framework.
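
To make the notion of a pushdown system as a model of recursive programs concrete, here is a small sketch of a plain PDS (without registers) and a depth-bounded exploration of its configurations. It is not the model checking algorithm of the paper: the rule encoding, the function name `reachable_configs`, and the example program are assumptions, and practical PDS model checkers use symbolic techniques rather than this bounded enumeration.

```python
from collections import deque


def reachable_configs(rules, start, max_steps):
    """Enumerate PDS configurations reachable from start in at most max_steps transitions.

    A configuration is (state, stack), where stack is a tuple with its top at index 0.
    rules maps (state, top_symbol) to a list of (next_state, symbols_replacing_top).
    """
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        (state, stack), depth = frontier.popleft()
        if depth == max_steps or not stack:
            continue  # depth bound reached, or empty stack: no rule applies
        top, rest = stack[0], stack[1:]
        for next_state, pushed in rules.get((state, top), []):
            config = (next_state, pushed + rest)
            if config not in seen:
                seen.add(config)
                frontier.append((config, depth + 1))
    return seen


# Hypothetical recursive program: main calls f; f either calls itself or returns.
rules = {
    ("q", "main"): [("q", ("f", "ret_main"))],      # call f, remember return point
    ("q", "f"): [("q", ("f", "ret_f")), ("q", ())],  # recurse, or return (pop)
    ("q", "ret_f"): [("q", ())],                      # resume in caller f, then return
}
start = ("q", ("main",))
configs = reachable_configs(rules, start, max_steps=3)
```

The stack can grow without bound as f recurses, which is why the set of configurations is infinite in general and why the exploration above needs a depth limit.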
Zifen He, Shouye Zhu, Ying Huang, Yinhui Zhang
IEICE Transactions on Information and Systems, pp 2237-2243;

This paper presents a novel method for weakly supervised semantic segmentation of 3D point clouds using a graph and edge convolutional neural network (GECNN), targeting point clouds in which only 1% or 10% of the points are labeled. Our general framework facilitates semantic segmentation by encoding both global and local scale features via a parallel graph and edge aggregation scheme. More specifically, global scale graph structure cues of point clouds are captured by a graph convolutional neural network, which is propagated from pairwise affinity representation over the whole graph established in a d-dimensional feature embedding space. We integrate local scale features derived from a dynamic edge feature aggregation convolutional neural network, which allows us to fuse both global and local cues of 3D point clouds. The proposed GECNN model is trained using a comprehensive objective that consists of incomplete, inexact, self-supervision and smoothness constraints based on partially labeled points. The proposed approach enforces global and local consistency constraints directly on the objective losses. It inherently handles the challenges of segmenting sparse 3D point clouds with limited annotations in a large scale point cloud space. Our experiments on the ShapeNet and S3DIS benchmarks demonstrate the effectiveness of the proposed approach for efficient (within 20 epochs) learning of large scale point cloud semantics despite very limited labels.
Wenyi Ge, Yi Lin, Zhitao Wang, Guigui Wang, Shihan Tan
IEICE Transactions on Information and Systems, pp 2218-2225;

In this paper, we present a simple yet powerful deep neural network for natural image dehazing. The proposed method is based on the U-Net architecture, with two design changes. First, we replace Batch Normalization with Group Normalization to solve the problem of insufficient batch size due to hardware limitations. Second, we introduce FReLU activation into the U-Net block, which can capture complicated visual layouts with regular convolutions. Experimental results on public benchmarks demonstrate the effectiveness of the modified components. On the SOTS Indoor and Outdoor datasets, it obtains PSNRs of 32.23 and 31.64, respectively, which are comparable to those of state-of-the-art methods. The code will be made publicly available online soon.
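
The reason Group Normalization helps with small batches is that its statistics are computed within a single sample, over groups of channels, so they do not depend on the batch size. The sketch below shows that computation for one sample. It is an illustration only, not the authors' code: the function name, the list-based data layout, and the omission of the learnable scale and shift parameters are assumptions.

```python
import math


def group_norm(channels, num_groups, eps=1e-5):
    """Normalize one sample by groups of channels.

    channels: list of per-channel value lists (each list is a flattened feature map).
    num_groups: number of channel groups; must divide the number of channels.
    """
    if len(channels) % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    group_size = len(channels) // num_groups
    normalized = []
    for g in range(num_groups):
        group = channels[g * group_size:(g + 1) * group_size]
        values = [v for channel in group for v in channel]
        # Mean and variance are taken over all values in this group of this sample only.
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(variance + eps)
        normalized.extend([[(v - mean) * scale for v in channel] for channel in group])
    return normalized


# One hypothetical sample with four channels of two values each, split into two groups.
sample = [[1.0, 3.0], [5.0, 7.0], [10.0, 10.0], [20.0, 20.0]]
normalized = group_norm(sample, num_groups=2)
```

Because no batch dimension appears anywhere in the computation, the result is the same whether the sample is processed alone or in a large batch.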
Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura
IEICE Transactions on Information and Systems, pp 2195-2208;

Real-time machine speech translation systems mimic human interpreters and translate incoming speech from a source language to the target language in real time. Such systems can be achieved by performing low-latency processing in the ASR (automatic speech recognition) module before passing the output to the MT (machine translation) and TTS (text-to-speech synthesis) modules. Although several studies recently proposed sequence mechanisms for neural incremental ASR (ISR), these frameworks have a more complicated training mechanism than the standard attention-based ASR because they have to decide the incremental step and learn the alignment between speech and text. In this paper, we propose attention-transfer ISR (AT-ISR), which learns the knowledge from attention-based non-incremental ASR for low-delay end-to-end speech recognition. ISR comes with a trade-off between delay and performance, so we investigate how to reduce AT-ISR delay without a significant performance drop. Our experiment shows that AT-ISR achieves performance comparable to the non-incremental ASR when the incremental recognition begins after the speech utterance reaches 25% of the complete utterance length. Additional experiments to investigate the effect of ISR on translation tasks are also performed. The focus is to find the optimum granularity of the output unit. The results reveal that our end-to-end subword-level ISR yields the best translation quality with the lowest WER and the lowest uncovered-word rate.