Physical and digital books, media, journals, archives, and databases.
Results include
  1. Interactive systems for data transformation and assessment [electronic resource]

    Kandel, Sean

    In spite of advances in technologies for processing and visualizing data, analysts still spend an inordinate amount of time diagnosing data quality issues and manipulating data into a usable form. This process often constitutes the most tedious and time-consuming aspect of analysis. This dissertation contributes novel techniques for coupling automated routines with interactive interfaces to enable more rapid data transformation and quality assessment. We first present an interview study with enterprise data analysts. We characterize the process of industrial data analysis, document how organizational features of an enterprise impact analysis, describe recurring pain points, and discuss design implications for visual analysis tools. Next, we introduce Wrangler, an interactive system for creating data transformation scripts. Wrangler combines direct manipulation of visualized data with automatic inference of relevant transforms, enabling analysts to iteratively explore the space of applicable operations and preview their effects. We present user study results showing that Wrangler significantly reduces specification time and promotes the use of robust, auditable transforms instead of manual editing. Underlying the Wrangler interface is a declarative data transformation language that supports generating executable code for a variety of runtime platforms. For large data sets, an analyst can build and test a script on a sample of data before applying the script to the entire data set. Often, errors or other anomalies appear in the full data set that did not appear in the sample. We introduce and evaluate two methods to aid more rapid debugging of large-scale transformation scripts. Surprise-based anomaly detection applies a model to classify output records as exceptions. Rule-based transform disambiguation generates example records to help analysts refine transformation scripts before applying them.
After transforming a data set, an analyst often inspects the result for other data quality issues. We present Profiler, a visual analytic tool for assessing data quality. We describe Profiler's architecture, including modular components for custom data types, anomaly detection routines, and summary visualizations. The system contributes novel methods for integrated statistical and visual analysis, automatic view suggestion, and scalable visual summaries that support real-time interaction with millions of data points entirely in the browser. Taken together, this dissertation contributes novel methods for integrating automated routines with interaction and visualization techniques to improve the efficiency and scale at which data analysts can work.

  2. Principles of data wrangling : practical techniques for data preparation

    Sebastopol : O'Reilly, 2017.

    "A key task that any aspiring data-driven organization needs to learn is data wrangling, the process of converting raw data into something truly useful. This practical guide provides business analysts with an overview of various data wrangling techniques and tools, and puts the practice of data wrangling into context by asking, 'What are you trying to do and why?'"--Back cover.

    Online: Safari Books Online
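The first result's abstract describes building a transformation script on a sample, running it on the full data set, and using a model to flag surprising output records. As a rough illustration only, the Python sketch below substitutes a simple assumed heuristic for that model. Each output record is reduced to a coarse character-class pattern, and any record from the full run whose pattern never appeared in the sample output is flagged. The names `split_field`, `run_script`, `shape`, and `flag_surprises` are hypothetical. This is not Wrangler's transformation language or the dissertation's anomaly-detection model.

```python
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins: a record is a dict of string fields, and a
# transform step maps one record to a new record.
Record = Dict[str, str]
Transform = Callable[[Record], Record]


def split_field(field: str, sep: str, into: Tuple[str, str]) -> Transform:
    """Hypothetical step: split `field` on the first `sep` into two fields."""
    def step(rec: Record) -> Record:
        left, _, right = rec[field].partition(sep)
        out = {k: v for k, v in rec.items() if k != field}
        out[into[0]], out[into[1]] = left.strip(), right.strip()
        return out
    return step


def run_script(script: List[Transform], rec: Record) -> Record:
    """Apply each step of the script in order."""
    for step in script:
        rec = step(rec)
    return rec


def shape(value: str) -> str:
    """Coarse pattern: runs of digits become '9', runs of letters become 'a'."""
    return re.sub(r"[A-Za-z]+", "a", re.sub(r"[0-9]+", "9", value))


def record_shape(rec: Record) -> Tuple[Tuple[str, str], ...]:
    """Order-independent pattern signature of a whole record."""
    return tuple(sorted((k, shape(v)) for k, v in rec.items()))


def flag_surprises(script: List[Transform], sample: List[Record],
                   full: List[Record], min_count: int = 1) -> List[Record]:
    """Count output patterns on the sample, then flag full-data outputs
    whose pattern occurred fewer than `min_count` times in the sample."""
    seen = Counter(record_shape(run_script(script, r)) for r in sample)
    return [out for out in (run_script(script, r) for r in full)
            if seen[record_shape(out)] < min_count]


script = [split_field("loc", "|", ("city", "state"))]
sample = [{"loc": "Palo Alto | CA"}, {"loc": "Boston | MA"}]
full = sample + [{"loc": "Seattle | WA"}, {"loc": "Springfield"},
                 {"loc": "94105 | CA"}]
print(flag_surprises(script, sample, full))
```

Here the full-data record `"Springfield"` has no separator, so it yields an empty `state` field and is flagged. `"94105 | CA"` is also flagged because its city is numeric. `"Seattle | WA"` matches a pattern already seen in the sample, so it passes.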
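The first result's abstract also mentions Profiler's modular anomaly detection routines and scalable visual summaries. The sketch below is a minimal illustration of how such pieces might fit together, under stated assumptions. It uses a plug-in registry of detectors and a median-absolute-deviation (modified z-score) test as a stand-in detector, plus an equal-width binned count of a numeric column as a summary. The registry, the `mad_outlier` name, the 3.5 threshold, and the binning scheme are illustrative assumptions, not Profiler's actual API or methods.

```python
import statistics
from typing import Callable, Dict, List

# Hypothetical plug-in registry: a detector maps a numeric column to the
# indices of the rows it flags.
Detector = Callable[[List[float]], List[int]]
DETECTORS: Dict[str, Detector] = {}


def register(name: str) -> Callable[[Detector], Detector]:
    """Decorator that adds a detector to the registry under `name`."""
    def wrap(fn: Detector) -> Detector:
        DETECTORS[name] = fn
        return fn
    return wrap


@register("mad_outlier")
def mad_outliers(values: List[float], threshold: float = 3.5) -> List[int]:
    """Flag values whose modified z-score, 0.6745 * |v - median| / MAD,
    exceeds `threshold` (MAD = median absolute deviation)."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread: this test cannot separate outliers
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


def binned_counts(values: List[float], bins: int = 10) -> List[int]:
    """Count values in `bins` equal-width bins between min and max; the
    result has a fixed size no matter how many rows are summarized."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid dividing by zero for constant data
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


column = [10, 11, 9, 10, 12, 10, 95]
for name, detect in DETECTORS.items():
    print(name, detect(column))
print(binned_counts(column, bins=5))
```

On this column the detector flags only index 6 (the value 95), and the five-bin summary is `[6, 0, 0, 0, 1]`. Binning keeps the summary's size fixed at `bins` counts regardless of row count. That is one common way to keep such summaries cheap to render.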



More search tools

Tools to help you discover resources at Stanford and beyond.