ResearchHub | Open Science Community

Mastering the game of Go with deep neural networks and tree search

David Silver et al., Jan 26, 2016

Artificial Intelligence

Law

Paper

Artificial Intelligence

14,315

Mastering the game of Go without human knowledge

David Silver et al., Oct 1, 2017

Artificial Intelligence

Law

Paper

Artificial Intelligence

8,183

A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play

David Silver et al., Dec 7, 2018

The game of chess is the longest-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. By contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go by reinforcement learning from self-play. In this paper, we generalize this approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging games. Starting from random play and given no domain knowledge except the game rules, AlphaZero convincingly defeated a world champion program in the games of chess and shogi (Japanese chess), as well as Go.

Artificial Intelligence

Social Psychology

Paper

Artificial Intelligence

2,897

Grandmaster level in StarCraft II using multi-agent reinforcement learning

Oriol Vinyals et al., Oct 30, 2019

Artificial Intelligence

Law

Paper

Artificial Intelligence

2,758

Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates

Shixiang Gu et al., May 1, 2017

Reinforcement learning holds the promise of enabling autonomous robots to learn large repertoires of behavioral skills with minimal human intervention. However, robotic applications of reinforcement learning often compromise the autonomy of the learning process in favor of achieving training times that are practical for real physical systems. This typically involves introducing hand-engineered policy representations and human-supplied demonstrations. Deep reinforcement learning alleviates this limitation by training general-purpose neural network policies, but applications of direct deep reinforcement learning algorithms have so far been restricted to simulated settings and relatively simple tasks, due to their apparent high sample complexity. In this paper, we demonstrate that a recent deep reinforcement learning algorithm based on off-policy training of deep Q-functions can scale to complex 3D manipulation tasks and can learn deep neural network policies efficiently enough to train on real physical robots. We demonstrate that the training times can be further reduced by parallelizing the algorithm across multiple robots which pool their policy updates asynchronously. Our experimental evaluation shows that our method can learn a variety of 3D manipulation skills in simulation and a complex door opening skill on real robots without any prior demonstrations or manually designed representations.

Artificial Intelligence

Electrical And Electronic Engineering

Paper

Artificial Intelligence

1,253

Mastering Atari, Go, chess and shogi by planning with a learned model

Julian Schrittwieser et al., Dec 23, 2020

Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess and Go, where a perfect simulator is available. However, in real-world problems the dynamics governing the environment are often complex and unknown. In this work we present the MuZero algorithm which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. MuZero learns a model that, when applied iteratively, predicts the quantities most directly relevant to planning: the reward, the action-selection policy, and the value function. When evaluated on 57 different Atari games - the canonical video game environment for testing AI techniques, in which model-based planning approaches have historically struggled - our new algorithm achieved a new state of the art. When evaluated on Go, chess and shogi, without any knowledge of the game rules, MuZero matched the superhuman performance of the AlphaZero algorithm that was supplied with the game rules.
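The iterative use of a learned model described in this abstract can be illustrated with a minimal sketch. The function names (`representation`, `dynamics`, `prediction`, `unroll_model`) and the toy stand-ins are assumptions made for illustration only. In the actual system these would be trained neural networks, and the tree search built on top of them is not shown.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]
Prediction = Tuple[List[float], float]  # (policy, value)
Step = Tuple[float, List[float], float]  # (reward, policy, value)


def unroll_model(
    representation: Callable[[Sequence[float]], State],
    dynamics: Callable[[State, int], Tuple[float, State]],
    prediction: Callable[[State], Prediction],
    observation: Sequence[float],
    actions: Sequence[int],
) -> List[Step]:
    """Apply a learned model iteratively along a sequence of actions.

    Only the first step looks at a real observation. Every later step
    works on the model's own internal state, so no simulator is needed.
    """
    state = representation(observation)
    policy, value = prediction(state)
    steps: List[Step] = [(0.0, policy, value)]  # no reward before the first action
    for action in actions:
        reward, state = dynamics(state, action)
        policy, value = prediction(state)
        steps.append((reward, policy, value))
    return steps


# Hypothetical toy stand-ins for the three learned functions.
def toy_representation(observation: Sequence[float]) -> State:
    return (float(sum(observation)),)


def toy_dynamics(state: State, action: int) -> Tuple[float, State]:
    return float(action), (state[0] + action,)


def toy_prediction(state: State) -> Prediction:
    return [0.5, 0.5], state[0]


steps = unroll_model(
    toy_representation, toy_dynamics, toy_prediction, [1.0, 2.0], [1, 0]
)
```

Each entry of `steps` holds the three quantities the abstract names: reward, action-selection policy, and value.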

Artificial Intelligence

Developmental And Educational Psychology

Paper

Artificial Intelligence

1,196

Why Copy Others? Insights from the Social Learning Strategies Tournament

Luke Rendell et al., Apr 8, 2010

It Pays to Be a Copy Cat. Does it pay to copy what others do? Rendell et al. (p. 208) elected to copy Robert Axelrod's 1979 tournament, in which strategies for playing the iterated prisoner's dilemma game were pitted against each other until an overall winner emerged: the tit-for-tat strategy. In the 2008 tournament, 100 social learning strategies designed to cope with a changing environment competed against each other; the winning strategy involved sampling the behaviors of other players periodically, rather than exploring the environment alone.

Cultural Studies

Sociology And Political Science

Paper

A deep learning framework for neuroscience

Blake Richards et al., Oct 28, 2019

Systems neuroscience seeks explanations for how the brain implements a wide variety of perceptual, cognitive and motor tasks. Conversely, artificial intelligence attempts to design computational systems based on the tasks they will have to solve. In artificial neural networks, the three components specified by design are the objective functions, the learning rules and the architectures. With the growing success of deep learning, which utilizes brain-inspired architectures, these three designed components have increasingly become central to how we model, engineer and optimize complex artificial learning systems. Here we argue that a greater focus on these components would also benefit systems neuroscience. We give examples of how this optimization-based framework can drive theoretical and experimental progress in neuroscience. We contend that this principled perspective on systems neuroscience will help to generate more rapid progress. A deep network is best understood in terms of components used to design it—objective functions, architecture and learning rules—rather than unit-by-unit computation. Richards et al. argue that this inspires fruitful approaches to systems neuroscience.

Artificial Intelligence

Cognitive Neuroscience

6

Paper

Artificial Intelligence

692

Random synaptic feedback weights support error backpropagation for deep learning

Timothy Lillicrap et al., Nov 8, 2016

The brain processes information through multiple layers of neurons. This deep architecture is representationally powerful, but complicates learning because it is difficult to identify the responsible neurons when a mistake is made. In machine learning, the backpropagation algorithm assigns blame by multiplying error signals with all the synaptic weights on each neuron’s axon and further downstream. However, this involves a precise, symmetric backward connectivity pattern, which is thought to be impossible in the brain. Here we demonstrate that this strong architectural constraint is not required for effective error propagation. We present a surprisingly simple mechanism that assigns blame by multiplying errors by even random synaptic weights. This mechanism can transmit teaching signals across multiple layers of neurons and performs as effectively as backpropagation on a variety of tasks. Our results help reopen questions about how the brain could use error signals and dispel long-held assumptions about algorithmic constraints on learning.
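The mechanism in this abstract, sending errors backward through fixed random weights instead of the transposed forward weights, can be sketched for a single hidden layer. This is a minimal illustration under stated assumptions: a tanh hidden layer, a linear output, a squared-error loss, and plain Python lists in place of a tensor library. The function name `feedback_alignment_step` and the matrix name `B` are chosen here, not taken from the paper.

```python
import math
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def feedback_alignment_step(
    W1: Matrix, W2: Matrix, B: Matrix,
    x: List[float], target: List[float], lr: float = 0.05,
) -> float:
    """One gradient-style step, updating W1 and W2 in place.

    W1: hidden x input, W2: output x hidden (forward weights).
    B:  hidden x output, fixed random feedback weights that are never updated.
    Returns the squared error measured before the update.
    """
    h = [math.tanh(a) for a in matvec(W1, x)]
    y = matvec(W2, h)
    e = [yi - ti for yi, ti in zip(y, target)]

    # Backpropagation would multiply e by the transpose of W2 here.
    # This sketch uses the fixed random matrix B instead.
    back = matvec(B, e)
    delta_h = [b * (1.0 - hi * hi) for b, hi in zip(back, h)]  # tanh derivative

    for i, row in enumerate(W2):
        for j in range(len(row)):
            row[j] -= lr * e[i] * h[j]
    for i, row in enumerate(W1):
        for j in range(len(row)):
            row[j] -= lr * delta_h[i] * x[j]
    return sum(v * v for v in e)


W1 = [[0.1, -0.2], [0.3, 0.4]]
W2 = [[0.0, 0.0]]          # zero forward weights: backprop would leave W1 unchanged
B = [[1.0], [-1.0]]        # hypothetical fixed feedback weights
loss = feedback_alignment_step(W1, W2, B, [1.0, 2.0], [1.0], lr=0.1)
```

With `W2` set to zero, exact backpropagation would send no error to the hidden layer. Here `W1` still changes, because the hidden-layer signal comes from `B`.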

Philosophy

Artificial Intelligence

Paper

Vector-based navigation using grid-like representations in artificial agents

Andrea Banino et al., May 1, 2018

Deep neural networks have achieved impressive successes in fields ranging from object recognition to complex games such as Go [1,2]. Navigation, however, remains a substantial challenge for artificial agents, with deep neural networks trained by reinforcement learning [3–5] failing to rival the proficiency of mammalian spatial behaviour, which is underpinned by grid cells in the entorhinal cortex [6]. Grid cells are thought to provide a multi-scale periodic representation that functions as a metric for coding space [7,8] and is critical for integrating self-motion (path integration) [6,7,9] and planning direct trajectories to goals (vector-based navigation) [7,10,11]. Here we set out to leverage the computational functions of grid cells to develop a deep reinforcement learning agent with mammal-like navigational abilities. We first trained a recurrent network to perform path integration, leading to the emergence of representations resembling grid cells, as well as other entorhinal cell types [12]. We then showed that this representation provided an effective basis for an agent to locate goals in challenging, unfamiliar, and changeable environments, optimizing the primary objective of navigation through deep reinforcement learning. The performance of agents endowed with grid-like representations surpassed that of an expert human and comparison agents, with the metric quantities necessary for vector-based navigation derived from grid-like units within the network. Furthermore, grid-like representations enabled agents to conduct shortcut behaviours reminiscent of those performed by mammals. Our findings show that emergent grid-like representations furnish agents with a Euclidean spatial metric and associated vector operations, providing a foundation for proficient navigation. As such, our results support neuroscientific theories that see grid cells as critical for vector-based navigation [7,10,11], demonstrating that the latter can be combined with path-based strategies to support navigation in challenging environments. Grid-like representations emerge spontaneously within a neural network trained to self-localize, enabling the agent to take shortcuts to destinations using vector-based navigation.
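The two quantities this abstract relies on, path integration and the vector to a goal, can be written down directly. The sketch below is the plain geometric computation, not the recurrent network from the paper, which learns to approximate it from velocity inputs. The function names and the `(speed, turn_rate, dt)` step format are assumptions made for illustration.

```python
import math
from typing import Iterable, Tuple


def integrate_path(
    steps: Iterable[Tuple[float, float, float]],
    start: Tuple[float, float] = (0.0, 0.0),
    heading: float = 0.0,
) -> Tuple[float, float, float]:
    """Accumulate self-motion into a position estimate (path integration).

    Each step is (speed, turn_rate, dt): the heading is updated first,
    then the position advances along the new heading.
    """
    x, y = start
    for speed, turn_rate, dt in steps:
        heading += turn_rate * dt
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
    return x, y, heading


def goal_vector(
    position: Tuple[float, float], goal: Tuple[float, float]
) -> Tuple[float, float]:
    """Displacement from the current position straight to the goal."""
    return goal[0] - position[0], goal[1] - position[1]


# Move 2 units along the x axis, then turn 90 degrees and move 1 unit.
x, y, final_heading = integrate_path([(1.0, 0.0, 2.0), (1.0, math.pi / 2, 1.0)])
shortcut = goal_vector((x, y), (5.0, 5.0))
```

A direct vector such as `shortcut` is what allows a straight-line route to the goal without retracing the path taken so far.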

Artificial Intelligence

Cognitive Neuroscience

Paper

Artificial Intelligence

602
