Please help us test our new pre-print finding feature by giving the pre-print link a rating. A 5 star rating indicates the linked pre-print has the exact same content as the published article.

Abstract: Srinivas Eswar, Koby Hayashi, Grey Ballard, Ramakrishnan Kannan, Michael A. Matheson, Haesun Park

We consider the problem of low-rank approximation of massive dense nonnegative tensor data, for example, to discover latent patterns in video and imaging applications. As the size of data sets grows, single workstations are hitting bottlenecks in both computation time and available memory. We propose a distributed-memory parallel computing solution to handle massive data sets, loading the input data across the memories of multiple nodes, and performing efficient and scalable parallel algorithms to compute the low-rank approximation. We present a software package called Parallel Low-rank Approximation with Nonnegativity Constraints, which implements our solution and allows for extension in terms of data (dense or sparse, matrices or tensors of any order), algorithm (e.g., from multiplicative updating techniques to alternating direction method of multipliers), and architecture (we exploit GPUs to accelerate the computation in this work). PubDate: Sat, 26 Jun 2021 00:00:00 GMT

Please help us test our new pre-print finding feature by giving the pre-print link a rating. A 5 star rating indicates the linked pre-print has the exact same content as the published article.

Abstract: Ahmad Abdelfattah, Timothy Costa, Jack Dongarra, Mark Gates, Azzam Haidar, Sven Hammarling, Nicholas J. Higham, Jakub Kurzak, Piotr Luszczek, Stanimire Tomov, Mawussi Zounon

This article describes a standard API for a set of Batched Basic Linear Algebra Subprograms (Batched BLAS or BBLAS). The focus is on many independent BLAS operations on small matrices that are grouped together and processed by a single routine, called a Batched BLAS routine. The matrices are grouped together in uniformly sized groups, with just one group if all the matrices are of equal size. The aim is to provide more efficient, but portable, implementations of algorithms on high-performance many-core platforms. These include multicore and many-core CPU processors, GPUs and coprocessors, and other hardware accelerators with floating-point compute facility. As well as the standard types of single and double precision, we also include half and quadruple precision in the standard. PubDate: Sat, 26 Jun 2021 00:00:00 GMT

Please help us test our new pre-print finding feature by giving the pre-print link a rating. A 5 star rating indicates the linked pre-print has the exact same content as the published article.

Abstract: Henrik Barthels, Christos Psarras, Paolo Bientinesi

The translation of linear algebra computations into efficient sequences of library calls is a non-trivial task that requires expertise in both linear algebra and high-performance computing. Almost all high-level languages and libraries for matrix computations (e.g., Matlab, Eigen) internally use optimized kernels such as those provided by BLAS and LAPACK; however, their translation algorithms are often too simplistic and thus lead to a suboptimal use of said kernels, resulting in significant performance losses. To combine the productivity offered by high-level languages, and the performance of low-level kernels, we are developing Linnea, a code generator for linear algebra problems. As input, Linnea takes a high-level description of a linear algebra problem; as output, it returns an efficient sequence of calls to high-performance kernels. PubDate: Sat, 26 Jun 2021 00:00:00 GMT

Please help us test our new pre-print finding feature by giving the pre-print link a rating. A 5 star rating indicates the linked pre-print has the exact same content as the published article.

Abstract: Carmen Campos, Jose E. Roman

SLEPc is a parallel library for the solution of various types of large-scale eigenvalue problems. Over the past few years, we have been developing a module within SLEPc, called NEP, that is intended for solving nonlinear eigenvalue problems. These problems can be defined by means of a matrix-valued function that depends nonlinearly on a single scalar parameter. We do not consider the particular case of polynomial eigenvalue problems (which are implemented in a different module in SLEPc) and focus here on rational eigenvalue problems and other general nonlinear eigenproblems involving square roots or any other nonlinear function. The article discusses how the NEP module has been designed to fit the needs of applications and provides a description of the available solvers, including some implementation details such as parallelization. PubDate: Sat, 26 Jun 2021 00:00:00 GMT

Please help us test our new pre-print finding feature by giving the pre-print link a rating. A 5 star rating indicates the linked pre-print has the exact same content as the published article.

Abstract: Zdeněk Průůa, Nicki Holighaus, Peter Balazs

Finding the best K-sparse approximation of a signal in a redundant dictionary is an NP-hard problem. Suboptimal greedy matching pursuit algorithms are generally used for this task. In this work, we present an acceleration technique and an implementation of the matching pursuit algorithm acting on a multi-Gabor dictionary, i.e., a concatenation of several Gabor-type time-frequency dictionaries, each of which consists of translations and modulations of a possibly different window and time and frequency shift parameters. The technique is based on pre-computing and thresholding inner products between atoms and on updating the residual directly in the coefficient domain, i.e., without the round-trip to the signal domain. PubDate: Sat, 26 Jun 2021 00:00:00 GMT

Please help us test our new pre-print finding feature by giving the pre-print link a rating. A 5 star rating indicates the linked pre-print has the exact same content as the published article.

Abstract: Patrick E. Farrell, Matthew G. Knepley, Lawrence Mitchell, Florian Wechsung

Effective relaxation methods are necessary for good multigrid convergence. For many equations, standard Jacobi and Gauß–Seidel are inadequate, and more sophisticated space decompositions are required; examples include problems with semidefinite terms or saddle point structure. In this article, we present a unifying software abstraction, PCPATCH, for the topological construction of space decompositions for multigrid relaxation methods. Space decompositions are specified by collecting topological entities in a mesh (such as all vertices or faces) and applying a construction rule (such as taking all degrees of freedom in the cells around each entity). The software is implemented in PETSc and facilitates the elegant expression of a wide range of schemes merely by varying solver options at runtime. PubDate: Sat, 26 Jun 2021 00:00:00 GMT

Please help us test our new pre-print finding feature by giving the pre-print link a rating. A 5 star rating indicates the linked pre-print has the exact same content as the published article.

In this article, we propose the Fast Algorithms for Maxwell’s Equations (FAME) package for solving Maxwell’s equations for modeling three-dimensional photonic crystals. FAME combines the null-space free method with fast Fourier transform (FFT)-based matrix-vector multiplications to solve the generalized eigenvalue problems (GEPs) arising from Yee’s discretization. The GEPs are transformed into a null-space free standard eigenvalue problem with a Hermitian positive-definite coefficient matrix. The computation times for FFT-based matrix-vector multiplications with matrices of dimension 7 million are only 0.33 and 3.6 × 10 − 3 seconds using MATLAB with an Intel Xeon CPU and CUDA C++ programming with a single NVIDIA Tesla P100 GPU, respectively. PubDate: Sat, 26 Jun 2021 00:00:00 GMT

Please help us test our new pre-print finding feature by giving the pre-print link a rating. A 5 star rating indicates the linked pre-print has the exact same content as the published article.

Abstract: Dounia Lakhmiri, Sébastien Le Digabel, Christophe Tribes

The performance of deep neural networks is highly sensitive to the choice of the hyperparameters that define the structure of the network and the learning process. When facing a new application, tuning a deep neural network is a tedious and time-consuming process that is often described as a “dark art.” This explains the necessity of automating the calibration of these hyperparameters. Derivative-free optimization is a field that develops methods designed to optimize time-consuming functions without relying on derivatives. This work introduces the HyperNOMAD package, an extension of the NOMAD software that applies the MADS algorithm [7] to simultaneously tune the hyperparameters responsible for both the architecture and the learning process of a deep neural network (DNN). PubDate: Sat, 26 Jun 2021 00:00:00 GMT

Please help us test our new pre-print finding feature by giving the pre-print link a rating. A 5 star rating indicates the linked pre-print has the exact same content as the published article.

Abstract: Jure Slak, Gregor Kosec

Medusa, a novel library for implementation of non-particle strong form mesh-free methods, such as GFDM or RBF-FD, is described. We identify and present common parts and patterns among many such methods reported in the literature, such as node positioning, stencil selection, and stencil weight computation. Many different algorithms exist for each part and the possible combinations offer a plethora of possibilities for improvements of solution procedures that are far from fully understood. As a consequence there are still many unanswered questions in the mesh-free community resulting in vivid ongoing research in the field. Medusa implements the core mesh-free elements as independent blocks, which offers users great flexibility in experimenting with the method they are developing, as well as easily comparing it with other existing methods. PubDate: Sat, 26 Jun 2021 00:00:00 GMT

Please help us test our new pre-print finding feature by giving the pre-print link a rating. A 5 star rating indicates the linked pre-print has the exact same content as the published article.

Abstract: Pavel Škrabánek, Natália Martínková

Fuzzy regression provides an alternative to statistical regression when the model is indefinite, the relationships between model parameters are vague, the sample size is low, or the data are hierarchically structured. Such cases allow to consider the choice of a regression model based on the fuzzy set theory. In fuzzyreg, we implement fuzzy linear regression methods that differ in the expectations of observational data types, outlier handling, and parameter estimation method. We provide a wrapper function that prepares data for fitting fuzzy linear models with the respective methods from a syntax established in R for fitting regression models. The function fuzzylm thus provides a novel functionality for R through standardized operations with fuzzy numbers. PubDate: Sat, 26 Jun 2021 00:00:00 GMT

Please help us test our new pre-print finding feature by giving the pre-print link a rating. A 5 star rating indicates the linked pre-print has the exact same content as the published article.