ResearchHub | Open Science Community

Rajesh Gupta

Author with expertise in Parallel Computing and Performance Optimization

Achievements

Cited Author

Open Access Advocate

Key Stats

Upvotes received:

Publications:

(57% Open Access)

Cited by:

5,299

h-index:

i10-index:

234

Reputation

Biology

< 1%

Chemistry

< 1%

Economics

< 1%

Publications

Sparse Signal Recovery With Temporally Correlated Source Vectors Using Sparse Bayesian Learning

Zhilin Zhang et al. Jun 21, 2011

We address the sparse signal recovery problem in the context of multiple measurement vectors (MMV) when elements in each nonzero row of the solution matrix are temporally correlated. Existing algorithms do not consider such temporal correlations and thus their performance degrades significantly with the correlations. In this work, we propose a block sparse Bayesian learning framework which models the temporal correlations. In this framework we derive two sparse Bayesian learning (SBL) algorithms, which have superior recovery performance compared to existing algorithms, especially in the presence of high temporal correlations. Furthermore, our algorithms are better at handling highly underdetermined problems and require less row-sparsity on the solution matrix. We also provide analysis of the global and local minima of their cost function, and show that the SBL cost function has the very desirable property that the global minimum is at the sparsest solution to the MMV problem. Extensive experiments also provide some interesting results that motivate future theoretical research on the MMV model.

Artificial Intelligence

Paleontology

Paper

797

Hardware-software cosynthesis for digital systems

Rajesh Gupta et al. Sep 1, 1993

As system design grows increasingly complex, the use of predesigned components, such as general-purpose microprocessors, can simplify synthesized hardware. While the problems in designing systems that contain processors and application-specific integrated circuit chips are not new, computer-aided synthesis of such heterogeneous or mixed systems poses unique problems. The authors demonstrate the feasibility of synthesizing heterogeneous systems by using timing constraints to delegate tasks between hardware and software so that performance requirements can be met. System functionality is captured using the HardwareC hardware description language. The synthesis of an Ethernet-based network coprocessor is discussed as an example.

Software

Computer Networks And Communications

Paper

615

Occupancy-driven energy management for smart building automation

Yuvraj Agarwal et al. Nov 2, 2010

Buildings are among the largest consumers of electricity in the US. A significant portion of this energy use in buildings can be attributed to HVAC systems used to maintain comfort for occupants. In most cases these building HVAC systems run on fixed schedules and do not employ any fine-grained control based on detailed occupancy information. In this paper we present the design and implementation of a presence sensor platform that can be used for accurate occupancy detection at the level of individual offices. Our presence sensor is low-cost, wireless, and incrementally deployable within existing buildings. Using a pilot deployment of our system across ten offices over a two-week period, we identify significant opportunities for energy savings due to periods of vacancy. Our energy measurements show that our presence node has an estimated battery lifetime of over five years, while detecting occupancy accurately. Furthermore, using a building simulation framework and the occupancy information from our testbed, we show potential energy savings from 10% to 15% using our system.

Mechanical Engineering

Automotive Engineering

Paper

545

Leakage aware dynamic voltage scaling for real-time embedded systems

Ravindra Jejurikar et al. Jun 7, 2004

A five-fold increase in leakage current is predicted with each technology generation. While Dynamic Voltage Scaling (DVS) is known to reduce dynamic power consumption, it also causes increased leakage energy drain by lengthening the interval over which a computation is carried out. Therefore, to minimize the total energy, one needs to determine an operating point, called the critical speed. We compute processor slowdown factors based on the critical speed for energy minimization. Procrastination scheduling attempts to maximize the duration of idle intervals by keeping the processor in a sleep/shutdown state even if there are pending tasks, within the constraints imposed by performance requirements. Our simulation experiments show that the critical speed slowdown results in up to 5% energy gains over leakage-oblivious dynamic voltage scaling. The procrastination scheduling scheme extends the sleep intervals to up to 5 times, resulting in up to an additional 18% energy gains, while meeting all timing requirements.

Law

Hardware And Architecture

Paper

493

Extension of SBL Algorithms for the Recovery of Block Sparse Signals With Intra-Block Correlation

Zhilin Zhang et al. Jan 18, 2013

We examine the recovery of block sparse signals and extend the framework in two important directions; one by exploiting signals' intra-block correlation and the other by generalizing signals' block structure. We propose two families of algorithms based on the framework of block sparse Bayesian learning (BSBL). One family, directly derived from the BSBL framework, requires knowledge of the block structure. Another family, derived from an expanded BSBL framework, is based on a weaker assumption on the block structure, and can be used when the block structure is completely unknown. Using these algorithms we show that exploiting intra-block correlation is very helpful in improving recovery performance. These algorithms also shed light on how to modify existing algorithms or design new ones to exploit such correlation and improve performance.

Signal Processing

Computational Mechanics

Paper

491

Hardware/software co-design

G. Michell et al. Mar 1, 1997

Most electronic systems, whether self-contained or embedded, have a predominant digital component consisting of a hardware platform which executes software application programs. Hardware/software co-design means meeting system-level objectives by exploiting the synergism of hardware and software through their concurrent design. Co-design problems have different flavors according to the application domain, implementation technology and design methodology. Digital hardware design has increasingly more similarities to software design. Hardware circuits are often described using modeling or programming languages, and they are validated and implemented by executing software programs, which are sometimes conceived for the specific hardware design. Current integrated circuits can incorporate one (or more) processor core(s) and memory array(s) on a single substrate. These "systems on silicon" exhibit a sizable amount of embedded software, which provides flexibility for product evolution and differentiation purposes. Thus the design of these systems requires designers to be knowledgeable in both hardware and software domains to make good design tradeoffs. The paper introduces various aspects of co-design. We highlight the commonalities and point out the differences in various co-design problems in some application areas. Co-design issues and their relationship to classical system implementation tasks are discussed to help develop a perspective on modern digital system design that relies on computer-aided design (CAD) tools and methods.

Software

Computer Networks And Communications

Paper

401

Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs

Ritchie Zhao et al. Feb 2, 2017

Convolutional neural networks (CNNs) are the current state-of-the-art for many computer vision tasks. CNNs outperform older methods in accuracy, but require vast amounts of computation and memory. As a result, existing CNN applications are typically run on clusters of CPUs or GPUs. Studies into the FPGA acceleration of CNN workloads have achieved reductions in power and energy consumption. However, large GPUs outperform modern FPGAs in throughput, and the existence of compatible deep learning frameworks gives GPUs a significant advantage in programmability. Recent research in machine learning demonstrates the potential of very low precision CNNs, i.e., CNNs with binarized weights and activations. Such binarized neural networks (BNNs) appear well suited for FPGA implementation, as their dominant computations are bitwise logic operations and their memory requirements are reduced. A combination of low-precision networks and high-level design methodology may help address the performance and productivity gap between FPGAs and GPUs. In this paper, we present the design of a BNN accelerator that is synthesized from C++ to FPGA-targeted Verilog. The accelerator outperforms existing FPGA-based CNN accelerators in GOPS as well as energy and resource efficiency.

Artificial Intelligence

Electrical And Electronic Engineering

Paper

395

CoolSpots

Trevor Pering et al. Jun 19, 2006

CoolSpots enable a wireless mobile device to automatically switch between multiple radio interfaces, such as WiFi and Bluetooth, in order to increase battery lifetime. The main contribution of this work is an exploration of the policies that enable a system to switch among these interfaces, each with diverse radio characteristics and different ranges, in order to save power, supported by detailed quantitative measurements. The system and policies do not require any changes to the mobile applications themselves, and changes required to existing infrastructure are minimal. Results are reported for a suite of commonly used applications, such as file transfer, web browsing, and streaming media, across a range of operating conditions. Experimental validation of the CoolSpot system on a mobile research platform shows substantial energy savings: more than a 50% reduction in energy consumption of the wireless subsystem is possible, with an associated increase in the effective battery lifetime.

History

Computer Networks And Communications

Paper

392

SPARK: a high-level synthesis framework for applying parallelizing compiler transformations

Sumit Gupta et al. Aug 27, 2003

This paper presents a modular and extensible high-level synthesis research system, called SPARK, that takes a behavioral description in ANSI-C as input and produces synthesizable register-transfer level VHDL. SPARK uses parallelizing compiler technology, developed previously, to enhance instruction-level parallelism and re-instruments it for high-level synthesis by incorporating ideas of mutual exclusivity of operations, resource sharing and hardware cost models. In this paper, we present the design flow through the SPARK system and a set of transformations that include speculative code motions and dynamic transformations, and show how these transformations and other optimizing synthesis and compiler techniques are employed by a scheduling heuristic. Experiments are performed on two moderately complex industrial applications, namely MPEG-1 and the GIMP image processing tool. The results show that the various code transformations lead to up to 70% improvements in performance without any increase in the overall area and critical path of the final synthesized design.

Computer Networks And Communications

Hardware And Architecture

Paper

392

Sentinel

Bharathan Balaji et al. Oct 22, 2013

Commercial buildings contribute to 19% of the primary energy consumption in the US, with HVAC systems accounting for 39.6% of this usage. To reduce HVAC energy use, prior studies have proposed using wireless occupancy sensors or even cameras for occupancy-based actuation, showing energy savings of up to 42%. However, most of these solutions require these sensors and the associated network to be designed, deployed, tested and maintained within existing buildings, which is costly.

Mechanical Engineering

Automotive Engineering

Paper

309
