Physical and digital books, media, journals, archives, and databases.
Results include
  1. Effective data visualization : from design fundamentals to big data techniques

    Heer, Jeffrey Michael
    [Place of publication not identified] : O'Reilly Media, [2014]

    "Learn the methods you need to bring data to life through effective visualizations. In this video course, host Jeffrey Heer--co-founder of Trifacta--takes you through best practices for designing interactive visualizations, performing exploratory data analysis, and examining multidimensional data. You'll begin by learning the value of visualization, through design principles drawn from graphic design, visual art, perceptual psychology, and cognitive science. Using Trifacta's data transformation tools to illustrate some of the concepts, you'll also learn techniques for scaling visualizations to extremely large data sets."--Resource description page.

    Online: Safari Books Online

  2. Software tools to facilitate research programming [electronic resource]

    Guo, Philip Jia.

    Research programming is a type of programming activity where the goal is to write computer programs to obtain insights from data. Millions of professionals in fields ranging from science and engineering to business, finance, public policy, and journalism, as well as numerous students and computer hobbyists, perform research programming on a daily basis. My thesis is that by understanding the unique challenges faced during research programming, it becomes possible to apply techniques from dynamic program analysis, mixed-initiative recommendation systems, and OS-level tracing to make research programmers more productive. This dissertation characterizes the research programming process, describes typical challenges faced by research programmers, and presents five software tools that I have developed to address some key challenges. (1) Proactive Wrangler is an interactive graphical tool that helps research programmers reformat and clean data prior to analysis. (2) IncPy is a Python interpreter that speeds up the data analysis scripting cycle and helps programmers manage code and data dependencies. (3) SlopPy is a Python interpreter that automatically makes existing scripts error-tolerant, thereby also speeding up the data analysis scripting cycle. (4) Burrito is a Linux-based system that helps programmers organize, annotate, and recall past insights about their experiments. (5) CDE is a software packaging tool that makes it easy to deploy, archive, and share research code. Taken together, these five tools enable research programmers to iterate and potentially discover insights faster by offloading the burdens of data management and provenance to the computer.

  3. Distantly supervised information extraction using bootstrapped patterns [electronic resource]

    Gupta, Sonal, 1985-

    Information extraction (IE) involves extracting information such as entities, relations, and events from unstructured text. Although most work in IE focuses on tasks that have abundant training data by exploiting supervised machine learning techniques, in practice, most IE problems do not have any supervised training data available. Learning conditional random fields (CRFs), a state-of-the-art supervised approach, is impractical for such real world applications because: (1) they require large and expensive labeled corpora, and (2) it is difficult to interpret them and analyze errors, an often-ignored but important feature. This dissertation focuses on information extraction for tasks that have no labeled data available, apart from some seed examples. Supervision using seed examples is usually easier to obtain than fully labeled sentences. In addition, for many tasks, the seed examples can be acquired using existing resources like Wikipedia and other human curated knowledge bases. I present Bootstrapped Pattern Learning (BPL), an iterative pattern and entity learning approach, as an effective and interpretable approach to entity extraction tasks with only seed examples as supervision. I propose two new tasks: (1) extracting key aspects from scientific articles to study the influence of sub-communities of a research community, and (2) extracting medical entities from online web forums. For the first task, I propose three new categories of key aspects and a new definition of influence based on the key aspects. This dissertation is the first work to address the second task of extracting drugs & treatments and symptoms & conditions entities from patient-authored text. Extracting these entities can aid in studying the efficacy and side effects of drugs and home remedies at a large scale. I show that BPL, using either dependency patterns or lexico-syntactic surface-word patterns, is an effective approach to solve both problems. It outperforms existing tools and CRFs. 
    Similar to most bootstrapped or semi-supervised systems, BPL systems developed earlier either ignore the unlabeled data or make closed world assumptions about it, resulting in less accurate classifiers. To address this problem, I propose improvements to BPL's pattern and entity scoring functions by evaluating the unlabeled entities using unsupervised similarity measures, such as word embeddings and contrasting domain-specific and general text. I improve the entity classifier of BPL by expanding the training sets using similarity computed by distributed representations of entities. My systems successfully leverage unlabeled data and significantly outperform the baselines by not making closed world assumptions. Developing any learning system usually requires a developer-in-the-loop to tune the parameters. I utilize the interpretability of patterns to humans, a highly desirable attribute for industrial applications, to develop a new diagnostic tool for visualization of the output of multiple pattern-based entity learning systems. Such comparisons can help in diagnosing errors faster, resulting in a shorter and easier development cycle. I make source code of all tools developed in this dissertation publicly available.


