If you have done data analysis with pandas before, you have likely encountered the pandas .plot() API, which renders static images using Matplotlib. The pandas .plot() API has emerged as a de facto standard for high-level plotting APIs in Python, and is now supported by many different libraries (xarray, pandas bokeh, cufflinks, etc.) that use various underlying plotting engines to provide additional power and flexibility. Learning this API allows you to access capabilities provided by a wide variety of underlying tools, with relatively little additional effort.
In the first part of this tutorial we'll explore what is possible with the default .plot() API and demonstrate the additional capabilities provided by hvPlot, a Python library that is part of the HoloViz ecosystem and built on top of powerful data visualization libraries like HoloViews, GeoViews, Datashader, Panel, and Bokeh. You will see how easy it is to:
- create interactive plots that help you understand your data fully,
- render large datasets faithfully to show both trends and outliers,
- obtain small multiples and widget selectors for exploring complex data,
- compose and link plots to show relationships.
In a data pipeline (a series of processing steps), each step can accept multiple parameters. For instance, you may want to evaluate the impact of aggregating your data based on the mean or median of some column or dimension, or to select by individual values of some variable. In the end, the set of possible parameter values to explore can be quite large, even for a rather simple pipeline. This is when turning your pipeline into a web app can provide a huge benefit over the more traditional, cumbersome, and manual method of exploring data by editing Python code. A web app offers a clean, code-free view of the output, with parameters driven directly by buttons and sliders manipulated by the end user.
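To make the idea concrete, here is a minimal pandas-only sketch of such a pipeline with two parameters, one for selection and one for aggregation. The dataset, the column names, and the `pipeline` function are hypothetical:

```python
import pandas as pd

# Hypothetical dataset: flow readings at two sites.
df = pd.DataFrame({
    "site": ["A", "A", "A", "B", "B", "B"],
    "month": [1, 1, 2, 1, 1, 2],
    "flow": [10.0, 20.0, 60.0, 5.0, 7.0, 100.0],
})


def pipeline(data, site, agg):
    """Select the rows for one site, then aggregate flow per month."""
    selected = data[data["site"] == site]
    return selected.groupby("month")["flow"].agg(agg)


# Exploring by hand means one edited call per parameter combination.
# Even two parameters with two values each already give four runs.
results = {
    (site, agg): pipeline(df, site, agg)
    for site in ["A", "B"]
    for agg in ["mean", "median"]
}
```

Each extra parameter multiplies the number of combinations, which is what makes widget-driven exploration attractive.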
In the second part of this tutorial, we’ll introduce the new .interactive() API of hvPlot that makes it trivial to turn data pipelines from PyData APIs you already use (pandas, xarray, Dask, cuDF, etc.) into web apps. This will include:
- an introduction to the concept of data pipelines,
- a brief introduction to Panel (a Python library for iteratively building polished web apps) which provides the widgets that can be used to replace arguments in a pipeline (e.g. a slider for a number) and components that can be used to display the output of a pipeline (e.g. tables, plots, BI indicators, etc.),
- an explanation of how .interactive() works and replays your pipeline whenever a widget value changes, without you having to learn how to write and debug complex code with callbacks, and
- the iterative process of creating a polished web app from a typical data pipeline, whether from a Jupyter Notebook or your favorite editor.
We will cover the first part of a larger tutorial (that is about 4 hours long) that you can find at holoviz.org. The instructions to install the datasets and dependencies required to run the tutorial can be found on the Setup page.
Affiliation: Anaconda Inc.
Visit the speaker at: GitHub
Software Engineer at Anaconda, maintaining and improving the open-source data viz libraries of the HoloViz ecosystem. Previously a civil engineer specialized in flood risk assessment, making flood maps with hydraulic simulation software.