Physical and digital books, media, journals, archives, and databases.
Results include:
  1. Robust learning : information theory and algorithms

    Steinhardt, Jacob
    [Stanford, California] : [Stanford University], 2018.

    We study the problem of robust learning in the presence of outliers when the dimensionality of the underlying space is large. We first develop a criterion, called resilience, under which robust learning is information-theoretically possible. We show that resilience gives tight bounds in many cases, and study its finite-sample behavior. Next, we turn our attention to efficient algorithms. We present two classes of algorithms---based on moment estimation and duality, respectively---that provide robust estimates as long as certain moments of the data are bounded. We apply these algorithms to mean estimation, stochastic optimization, and clustering.
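
    The moment-based approach admits a brief illustration. The sketch below (Python/NumPy) iteratively removes points that project too far along the top eigenvector of the empirical covariance, which is the general flavor of moment-estimation-based robust mean estimation; the function name, thresholds, and stopping rule are illustrative assumptions, not the thesis's exact procedure.

    # Minimal sketch of moment-based robust mean estimation (illustrative,
    # not the exact algorithm from the thesis): points that deviate most
    # along the top direction of the empirical covariance are filtered out.
    import numpy as np

    def robust_mean(X, eps=0.1, max_iter=20):
        """Estimate the mean of X (n x d) when at most an eps fraction
        of rows may be adversarial outliers."""
        X = np.asarray(X, dtype=float)
        keep = np.ones(len(X), dtype=bool)
        for _ in range(max_iter):
            Y = X[keep]
            mu = Y.mean(axis=0)
            vals, vecs = np.linalg.eigh(np.cov(Y, rowvar=False))
            v = vecs[:, -1]
            # Stop once no direction shows unusually large variance.
            if vals[-1] <= 2.0 * np.median(vals):
                break
            # Drop the eps fraction of kept points farthest along v.
            scores = ((Y - mu) @ v) ** 2
            cutoff = np.quantile(scores, 1.0 - eps)
            idx = np.flatnonzero(keep)
            keep[idx[scores > cutoff]] = False
        return X[keep].mean(axis=0)

    The intent, under bounded-moment assumptions like those in the abstract, is that filtering along high-variance directions removes outliers faster than inliers, keeping the final average close to the true mean.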

  2. A neural network with feature sparsity

    Lemhadri, Ismael
    [Stanford, California] : [Stanford University], 2021.

    First, we propose a neural network model with a separate linear (residual) term that explicitly bounds the input-layer weights for each feature by the linear weight for that feature. The model can be seen as a modification of so-called residual neural networks that produces a path of models that are feature-sparse, that is, models that use only a subset of the features. This is analogous to the solution path of the usual Lasso (L1-regularized) linear regression. We call the proposed procedure LassoNet and develop a projected proximal gradient algorithm for its optimization. This approach can sometimes give test error as low as or lower than that of a standard neural network, and its feature selection yields more interpretable solutions. The thesis illustrates the method on both simulated and real data examples and shows that it often achieves competitive performance with a much smaller number of input features. We also discuss extensions of this work beyond supervised learning, including unsupervised learning, matrix completion, and sparsity in learned features. Second, we consider the problem of local feature attribution and selection for arbitrary black-box models. We introduce a geometric method that we call RbX, for Region-Based Explanations. The method approximates the prediction model's level sets by convex polytopes, which helps simplify and interpret the model. We demonstrate its effectiveness on a variety of synthetic and real data sets.
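
    The coupling between the linear term and the network can be conveyed in a few lines. The sketch below (Python/NumPy) is a simplified stand-in for the projection step, not the thesis's exact hierarchical proximal operator: an L1 soft-threshold on the linear coefficients, followed by clipping each feature's first-layer weights to a multiple of that feature's linear coefficient, so a zeroed linear coefficient removes the feature from the network entirely. The function name, the constant M, and the step sizes are assumptions for illustration.

    # Simplified stand-in for the projected proximal step behind the
    # feature-sparsity constraint (illustrative only): theta[j] is the
    # linear (residual) coefficient of feature j, and every first-layer
    # weight fed by feature j is kept within M * |theta[j]|.
    import numpy as np

    def feature_sparsity_prox(theta, W, lam, M, lr):
        """theta: (d,) linear coefficients; W: (h, d) first-layer weights."""
        # L1 step on the linear coefficients (soft-thresholding).
        theta = np.sign(theta) * np.maximum(np.abs(theta) - lr * lam, 0.0)
        # Per-feature budget: if theta[j] == 0, column j of W is zeroed,
        # so feature j is dropped from the network as well.
        bound = M * np.abs(theta)
        W = np.clip(W, -bound, bound)
        return theta, W

    Applied after each gradient step and swept over lam, this kind of operator traces a path from a dense network down to the empty model, mirroring the Lasso solution path mentioned above.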

  3. Lightweight statistical learning [electronic resource] : accelerating and avoiding empirical risk minimization

    Frostig, Roy.
    2017.

    In statistical machine learning, the goal is to train a model that, once deployed in the world, continues to predict accurately on fresh data. A unifying training principle is empirical risk minimization (ERM): globally minimizing the average prediction error incurred on a representative training set. Essentially, ERM prescribes optimization as a proxy for statistical estimation. Learning tasks for which ERM is computationally tractable call for optimization algorithms that scale as efficiently as possible as training set dimensions grow. Other tasks---namely, neural network learning and learning with indirect supervision---elude a general, tractable ERM algorithm in theory, motivating a reformulation of the training problem altogether. In this thesis, we first focus on efficient algorithms for empirical risk minimization in the overdetermined, convex setting with fully observed data, where ERM is tractable and the aim is optimal computational efficiency. We improve the guaranteed running time for a broad range of problems (including basic regression problems) in terms of the condition number induced by the underlying training set. Atop these methods, we develop an algorithm for principal component regression. Next, we move to settings where general tractability of ERM is not guaranteed, yet ERM still guides algorithm design in practice. Specifically, we study two learning frameworks: convolutional neural network learning, and learning conditional random fields from indirect supervision. In both settings, the prevalent replacement for global ERM is gradient descent. Since gradient descent can converge to arbitrarily bad local optima, initialization becomes especially consequential. In each setting, we develop lightweight training algorithms based on convex procedures. For neural networks, we consider replacing optimization of hidden layers with randomization. For indirect supervision, we reduce estimation to solving a linear system followed by a convex maximum-likelihood step. Through the role of initialization, both algorithms imply conditions under which local descent ought to fare at least as well as they do.
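
    The two ideas at the end of this abstract, ERM as convex optimization and replacing hidden-layer optimization with randomization, combine naturally in a random-features sketch. In the Python/NumPy code below, first-layer weights are drawn at random and frozen, and only the output layer is fit, by ridge-regularized least squares, a convex ERM problem with a closed-form minimizer. The Gaussian weights, ReLU features, and hyperparameters are illustrative assumptions, not the thesis's construction.

    # Minimal sketch: randomize the hidden layer, then solve a convex ERM
    # problem for the output layer (regularized least squares, closed form).
    import numpy as np

    def random_features_fit(X, y, n_hidden=512, reg=1e-2, seed=0):
        """X: (n, d) inputs, y: (n,) targets; returns (W, b, w_out)."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        W = rng.normal(scale=1.0 / np.sqrt(d), size=(d, n_hidden))
        b = rng.uniform(-1.0, 1.0, size=n_hidden)
        H = np.maximum(X @ W + b, 0.0)        # frozen random ReLU features
        A = H.T @ H + reg * np.eye(n_hidden)  # convex ERM: ridge regression
        w_out = np.linalg.solve(A, H.T @ y)
        return W, b, w_out

    def random_features_predict(X, W, b, w_out):
        return np.maximum(X @ W + b, 0.0) @ w_out

    Because the remaining problem is convex with a closed-form solution, the fit does not depend on initialization at all, which is one sense in which randomized layers sidestep the local-optima concerns raised above.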
