ResearchHub | Open Science Community

Larrabee

Larry Seiler et al., Aug 1, 2008

This paper presents a many-core visual computing architecture code-named Larrabee, a new software rendering pipeline, a many-core programming model, and performance analysis for several applications. Larrabee uses multiple in-order x86 CPU cores that are augmented by a wide vector processor unit, as well as some fixed-function logic blocks. This provides dramatically higher performance per watt and per unit of area than out-of-order CPUs on highly parallel workloads. It also greatly increases the flexibility and programmability of the architecture as compared to standard GPUs. A coherent on-die second-level (L2) cache allows efficient inter-processor communication and high-bandwidth local data access by CPU cores. In Larrabee, task scheduling is performed entirely in software rather than in fixed-function logic. The customizable software graphics rendering pipeline for this architecture uses binning in order to reduce required memory bandwidth, minimize lock contention, and increase opportunities for parallelism relative to standard GPUs. The Larrabee native programming model supports a variety of highly parallel applications that use irregular data structures. Performance analysis on those applications demonstrates Larrabee's potential for a broad range of parallel computation.
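The binning step mentioned above can be illustrated with a small sketch. This is a hypothetical example, not Larrabee's actual binning code: it assumes screen-space triangles given as three (x, y) vertices, a square tile size of 64 pixels chosen only for illustration, and a conservative bounding-box overlap test. Once every triangle is sorted into per-tile bins, each tile can be rasterized independently, which is one way binning exposes parallelism and keeps a tile's working set small.

```python
import math
from collections import defaultdict

TILE = 64  # hypothetical tile edge in pixels, chosen for illustration only


def bin_triangles(triangles, width, height, tile=TILE):
    """Return {(tx, ty): [triangle indices]} for every tile that the
    triangle's screen-space bounding box overlaps (a conservative test)."""
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    bins = defaultdict(list)
    for idx, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Convert the bounding box to tile coordinates, clamped to the screen.
        x0 = max(0, math.floor(min(xs)) // tile)
        x1 = min(tiles_x - 1, math.floor(max(xs)) // tile)
        y0 = max(0, math.floor(min(ys)) // tile)
        y1 = min(tiles_y - 1, math.floor(max(ys)) // tile)
        # Off-screen triangles give x0 > x1 or y0 > y1, so no tile is visited.
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins[(tx, ty)].append(idx)
    return dict(bins)


if __name__ == "__main__":
    tris = [[(10, 10), (50, 10), (10, 50)], [(60, 60), (70, 60), (60, 70)]]
    # On a 128 x 128 screen (2 x 2 tiles) the second triangle straddles all four tiles.
    print(bin_triangles(tris, 128, 128))
    # -> {(0, 0): [0, 1], (1, 0): [1], (0, 1): [1], (1, 1): [1]}
```

A bounding-box test may place a triangle in a tile it does not actually cover; that only costs some wasted work during rasterization, never a missing triangle.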

Software

Computer Networks And Communications

Mathematical Properties of the Banzhaf Power Index

Pradeep Dubey et al., May 1, 1979

The Banzhaf index of power in a voting situation depends on the number of ways in which each voter can effect a “swing” in the outcome. It is comparable—but not actually equivalent—to the better-known Shapley-Shubik index, which depends on the number of alignments or “orders of support” in which each voter is pivotal. This paper investigates some properties of the Banzhaf index, the main topics being its derivation from axioms and its behavior in weighted-voting models when the number of small voters tends to infinity. These matters have previously been studied from the Shapley-Shubik viewpoint, but the present work reveals some striking differences between the two indices. The paper also attempts to promote better communication and less duplication of mathematical effort by identifying and describing several other theories, formally equivalent to Banzhaf’s, that are found in fields ranging from sociology to electrical engineering. An extensive bibliography is provided.
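The swing-counting definition above can be made concrete for a small weighted-voting game. The sketch below is an illustration, not code from the paper: it assumes a game written as a quota plus a list of voter weights, enumerates every coalition (so it is only practical for a handful of voters), counts the winning coalitions in which each voter is a swing (the coalition wins, but loses without that voter), and normalizes the counts so they sum to 1.

```python
from fractions import Fraction
from itertools import combinations


def banzhaf(quota, weights):
    """Return (raw swing counts, normalized Banzhaf index) for the weighted
    voting game [quota; weights], by brute-force enumeration of coalitions."""
    n = len(weights)
    swings = [0] * n
    for size in range(1, n + 1):
        for coalition in combinations(range(n), size):
            total = sum(weights[i] for i in coalition)
            if total < quota:
                continue  # losing coalition: no member can swing it
            for i in coalition:
                # Voter i swings if removing i turns the coalition into a losing one.
                if total - weights[i] < quota:
                    swings[i] += 1
    total_swings = sum(swings)
    if total_swings == 0:
        return swings, [Fraction(0)] * n
    return swings, [Fraction(s, total_swings) for s in swings]


if __name__ == "__main__":
    counts, index = banzhaf(3, [2, 1, 1])
    print(counts, [str(f) for f in index])  # -> [3, 1, 1] ['3/5', '1/5', '1/5']
```

For the game with quota 3 and weights (2, 1, 1), the heavy voter is a swing in three coalitions and each light voter in one, giving (3/5, 1/5, 1/5). The Shapley-Shubik index of the same game is (2/3, 1/6, 1/6), a small instance of the difference between the two indices.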

Law

Mechanical Engineering

Debunking the 100X GPU vs. CPU myth

Victor Lee et al., Jun 19, 2010

Recent advances in computing have led to an explosion in the amount of data being generated. Processing this ever-growing data in a timely manner has made throughput computing an important aspect of emerging applications. Our analysis of a set of important throughput computing kernels shows that there is an ample amount of parallelism in these kernels, which makes them suitable for today's multi-core CPUs and GPUs. In the past few years there have been many studies claiming GPUs deliver substantial speedups (between 10X and 1000X) over multi-core CPUs on these kernels. To understand where such a large performance difference comes from, we perform a rigorous performance analysis and find that after applying optimizations appropriate for both CPUs and GPUs, the performance gap between an Nvidia GTX280 processor and the Intel Core i7-960 processor narrows to only 2.5x on average. In this paper, we discuss optimization techniques for both CPU and GPU, analyze which architectural features contributed to performance differences between the two architectures, and recommend a set of architectural features that provide significant improvement in architectural efficiency for throughput kernels.

Computer Networks And Communications

Hardware And Architecture

Default and Punishment in General Equilibrium

Pradeep Dubey et al., Dec 3, 2004

We extend the standard model of general equilibrium with incomplete markets to allow for default and punishment by thinking of assets as pools. The equilibrating variables include expected delivery rates, along with the usual prices of assets and commodities. By reinterpreting the variables, our model encompasses a broad range of adverse selection and signalling phenomena in a perfectly competitive, general equilibrium framework. Perfect competition eliminates the need for lenders to compute how the size of their loan or the price they quote might affect default rates. It also makes for a simple equilibrium refinement, which we propose in order to rule out irrational pessimism about deliveries of untraded assets. We show that refined equilibrium always exists in our model, and that default, in conjunction with refinement, opens the door to a theory of endogenous assets. The market chooses the promises, default penalties, and quantity constraints of actively traded assets.

Ecology

Finance

Inefficiency of Nash Equilibria

Pradeep Dubey, Feb 1, 1986

It is shown that Nash Equilibria of smooth games generally tend to be inefficient in the Pareto sense.

Modeling And Simulation

Economics And Econometrics

FAST

Changkyu Kim et al., Jun 6, 2010

In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous computing power by integrating multiple cores, each with wide vector units. There has been much work to exploit modern processor architectures for database primitives like scan, sort, join and aggregation. However, unlike other primitives, tree search presents significant challenges due to irregular and unpredictable data accesses in tree traversal.

Artificial Intelligence

Theoretical Computer Science

On the uniqueness of the Shapley value

Pradeep Dubey, Sep 1, 1975

Philosophy

Paleontology

3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs

Anthony Nguyen et al., Nov 1, 2010

Stencil computation sweeps over a spatial grid over multiple time steps to perform nearest-neighbor computations. The bandwidth-to-compute requirement for a large class of stencil kernels is very high, and their performance is bound by the available memory bandwidth. Since memory bandwidth grows more slowly than compute, the performance of stencil kernels will not scale with increasing compute density. We present a novel 3.5D-blocking algorithm that performs 2.5D spatial and temporal blocking of the input grid into on-chip memory for both CPUs and GPUs. The resulting algorithm is amenable to both thread-level and data-level parallelism, and scales near-linearly with the SIMD width and the number of cores. Our performance is faster than or comparable to state-of-the-art stencil implementations on CPUs and GPUs. For single-precision floating-point inputs, our implementation of the 7-point stencil is 1.5X faster on CPUs and 1.8X faster on GPUs than previously reported numbers. For Lattice Boltzmann methods, the corresponding speedup on CPUs is 2.1X.
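For reference, the kind of kernel being optimized can be written as a plain 7-point stencil sweep. This is a naive, unblocked sketch for illustration only, not the paper's 3.5D-blocking algorithm; the coefficients `alpha` and `beta`, the Jacobi-style double buffering (each step reads the old grid and writes a new one), and the fixed boundary cells are all assumptions.

```python
def stencil7_step(g, alpha=0.4, beta=0.1):
    """One sweep of a 7-point stencil over the interior of a 3D grid g[i][j][k].
    Boundary cells are copied unchanged; results go to a new grid (Jacobi style)."""
    nx, ny, nz = len(g), len(g[0]), len(g[0][0])
    out = [[row[:] for row in plane] for plane in g]
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                # Sum of the six face neighbours of (i, j, k).
                neighbours = (g[i - 1][j][k] + g[i + 1][j][k]
                              + g[i][j - 1][k] + g[i][j + 1][k]
                              + g[i][j][k - 1] + g[i][j][k + 1])
                out[i][j][k] = alpha * g[i][j][k] + beta * neighbours
    return out


def stencil7(g, steps, alpha=0.4, beta=0.1):
    """Apply `steps` time steps; without blocking, each step streams the whole grid."""
    for _ in range(steps):
        g = stencil7_step(g, alpha, beta)
    return g
```

In this formulation each output point costs 8 floating-point operations (six additions and two multiplications) while touching seven input values, so an unblocked sweep that streams the whole grid from memory on every time step tends to be limited by memory bandwidth rather than by arithmetic, which is the imbalance that spatial and temporal blocking targets.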

Computational Mechanics

Computer Networks And Communications

ClearPath

Stephen Guy et al., Aug 1, 2009

We present a new local collision avoidance algorithm for multiple agents in real-time simulations. Our approach extends the notion of velocity obstacles from robotics and formulates the conditions for collision-free navigation as a quadratic optimization problem. We use a discrete optimization method to efficiently compute the motion of each agent. The resulting algorithm can be parallelized by exploiting data-level and thread-level parallelism. The overall approach, ClearPath, is general and can robustly handle dense scenarios with tens or hundreds of thousands of heterogeneous agents in a few milliseconds. Compared to prior collision avoidance algorithms, we observe more than an order of magnitude performance improvement.
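The velocity-obstacle idea that ClearPath builds on can be sketched for two disc-shaped agents. This is a simplified illustration under stated assumptions (2D discs, constant velocities, an optional time horizon), not ClearPath's formulation: the paper casts collision-free navigation as a quadratic optimization solved with a discrete method, whereas `choose_velocity` below is a hypothetical helper that simply picks the admissible candidate closest to the preferred velocity.

```python
INF = float("inf")


def in_velocity_obstacle(p_a, v_a, r_a, p_b, v_b, r_b, horizon=INF):
    """True if disc A moving at v_a would touch disc B moving at v_b at some
    time t in [0, horizon], assuming both velocities stay constant."""
    px, py = p_b[0] - p_a[0], p_b[1] - p_a[1]  # position of B relative to A
    vx, vy = v_a[0] - v_b[0], v_a[1] - v_b[1]  # velocity of A relative to B
    r = r_a + r_b                               # combined radius
    vv = vx * vx + vy * vy
    # Time of closest approach along the relative motion, clamped to [0, horizon].
    t = 0.0 if vv == 0.0 else min(max((px * vx + py * vy) / vv, 0.0), horizon)
    dx, dy = px - vx * t, py - vy * t
    return dx * dx + dy * dy < r * r


def choose_velocity(agent, neighbours, preferred, candidates, horizon=INF):
    """Hypothetical helper: among `candidates`, return the velocity closest to
    `preferred` that lies outside every neighbour's velocity obstacle."""
    p, r = agent
    admissible = [
        c for c in candidates
        if not any(in_velocity_obstacle(p, c, r, pb, vb, rb, horizon)
                   for pb, vb, rb in neighbours)
    ]
    if not admissible:
        return None  # no safe candidate; a real planner would need a fallback
    return min(admissible,
               key=lambda c: (c[0] - preferred[0]) ** 2 + (c[1] - preferred[1]) ** 2)
```

For example, an agent of radius 1 at the origin heading straight at a stationary agent of radius 1 at (5, 0) is inside that agent's velocity obstacle, so from the candidates (1, 0), (1, 1) and (0, 1) with preferred velocity (1, 0) the helper returns (1, 1).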

Artificial Intelligence

Automotive Engineering

Sort vs. Hash revisited

Changkyu Kim et al., Aug 1, 2009

Join is an important database operation. As computer architectures evolve, the best join algorithm may change hands. This paper re-examines two popular join algorithms, hash join and sort-merge join, to determine whether the latest computer architecture trends shift the tide that has favored hash join for many years. For a fair comparison, we implemented the most optimized parallel version of both algorithms on the latest Intel Core i7 platform. Both implementations scale well with the number of cores in the system and take advantage of the latest processor features for performance. Our hash-based implementation achieves more than 100M tuples per second, which is 17X faster than the best reported performance on CPUs and 8X faster than that reported for GPUs. Moreover, the performance of our hash join implementation is consistent over a wide range of input data sizes, from 64K to 128M tuples, and is not affected by data skew. We compare this implementation to our highly optimized sort-based implementation, which achieves 47M to 80M tuples per second. We developed analytical models to study how both algorithms would scale with upcoming processor architecture trends. Our analysis projects that the current architectural trends of wider SIMD, more cores, and less memory bandwidth per core imply better scalability potential for sort-merge join. Consequently, sort-merge join is likely to outperform hash join on upcoming chip multiprocessors. In summary, we offer multicore implementations of hash join and sort-merge join that consistently outperform all previously reported results. We further conclude that the tide that favors the hash join algorithm has not changed yet, but the change is just around the corner.
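The two algorithms being compared can be illustrated with textbook, single-threaded equi-joins. These sketches only show the structure of each method; they are not the paper's optimized multicore, SIMD implementations, and the (key, payload) tuple layout is an assumption.

```python
from collections import defaultdict


def hash_join(build, probe):
    """Equi-join on key: build a hash table on `build`, then stream `probe`."""
    table = defaultdict(list)
    for key, payload in build:
        table[key].append(payload)
    out = []
    for key, payload in probe:
        for b in table.get(key, ()):  # .get does not insert missing keys
            out.append((key, b, payload))
    return out


def sort_merge_join(left, right):
    """Equi-join on key: sort both inputs, then merge matching key runs."""
    l = sorted(left, key=lambda t: t[0])
    r = sorted(right, key=lambda t: t[0])
    out = []
    i = j = 0
    while i < len(l) and j < len(r):
        if l[i][0] < r[j][0]:
            i += 1
        elif l[i][0] > r[j][0]:
            j += 1
        else:
            key = l[i][0]
            # Find the run of equal keys on each side, then emit their cross product.
            i_end = i
            while i_end < len(l) and l[i_end][0] == key:
                i_end += 1
            j_end = j
            while j_end < len(r) and r[j_end][0] == key:
                j_end += 1
            for _, a in l[i:i_end]:
                for _, b in r[j:j_end]:
                    out.append((key, a, b))
            i, j = i_end, j_end
    return out
```

Both functions return the same set of (key, left payload, right payload) tuples, possibly in a different order. The hash join makes one pass over each input after building its table, while the sort-merge join pays for two sorts and then merges sequentially; the paper's argument concerns how those costs scale with SIMD width, core count, and per-core memory bandwidth.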

Artificial Intelligence

Computer Networks And Communications
