Physical and digital books, media, journals, archives, and databases.
Results include:
  1. Deep exploration via randomized value functions [electronic resource]

    Osband, Ian.
    2016.

    The "Big Data" revolution is spawning systems designed to make decisions from data. Statistics and machine learning has made great strides in prediction and estimation from any fixed dataset. However, if you want to learn to take actions where your choices can affect both the underlying system and the data you observe, you need reinforcement learning. Reinforcement learning builds upon learning from datasets, but also addresses the issues of partial feedback and long term consequences. In a reinforcement learning problem the decisions you make may affect the data you get, and even alter the underlying system for future timesteps. Statistically efficient reinforcement learning requires "deep exploration" or the ability to plan to learn. Previous approaches to deep exploration have not been computationally tractable beyond small scale problems. For this reason, most practical implementations use statistically inefficient methods for exploration such as epsilon-greedy dithering, which can lead to exponentially slower learning. In this dissertation we present an alternative approach to deep exploration through the use of randomized value functions. Our work is inspired by the Thompson sampling heuristic for multi-armed bandits which suggests, at a high level, to "randomly select a policy according to the probability that it is optimal". We provide insight into why this algorithm can be simultaneously more statistically efficient and more computationally efficient than existing approaches. We leverage these insights to establish several state of the art theoretical results and performance guarantees. Importantly, and unlike previous approaches to deep exploration, this approach also scales gracefully to complex domains with generalization. We complement our analysis with extensive empirical experiments; these include several didactic examples as well as a recommendation system, Tetris, and Atari 2600 games.

  2. A Tutorial on Thompson Sampling

    Russo, Daniel J.
    Hanover, MA : Now Publishers Inc., 2018

    Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new information that may improve future performance. The algorithm addresses a broad range of problems in a computationally efficient manner and is therefore enjoying wide use. A Tutorial on Thompson Sampling covers the algorithm and its application, illustrating concepts through a range of examples, including Bernoulli bandit problems, shortest path problems, product recommendation, assortment, active learning with neural networks, and reinforcement learning in Markov decision processes. Most of these problems involve complex information structures, where information revealed by taking an action informs beliefs about other actions. It also discusses when and why Thompson sampling is or is not effective, as well as its relation to alternative algorithms.

    Online: ProQuest Ebook Central
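
    For the Bernoulli bandit setting the tutorial opens with, Thompson sampling reduces to a few lines: keep a Beta posterior per arm, draw one sample per arm, and play the argmax. A minimal sketch (the arm probabilities in true_p are made-up values for illustration, not from the tutorial):

      import numpy as np

      rng = np.random.default_rng(1)

      # True success probabilities, unknown to the algorithm.
      true_p = np.array([0.45, 0.55, 0.60])
      K = len(true_p)

      # Beta(1, 1) priors: a counts successes, b counts failures.
      a = np.ones(K)
      b = np.ones(K)

      for t in range(1000):
          theta = rng.beta(a, b)               # one posterior sample per arm
          arm = int(np.argmax(theta))          # play the arm that looks best now
          reward = rng.random() < true_p[arm]  # Bernoulli outcome
          a[arm] += reward                     # update only the played arm
          b[arm] += 1 - reward

      print("posterior means:", a / (a + b))

    Early on, any arm that is plausibly best under its posterior gets sampled often; as evidence accumulates the posteriors concentrate and play shifts to the true best arm, which is exactly the explore/exploit balance the abstract describes.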
