Messy Sensor Data: A Programmer’s Cleaning Guide

Humans love to invent. From sundials to clocks, timers, and embedded computer sensors, humans have a symbiotic relationship with their creations. They create and maintain sensors like weather stations, and these sensors tell them what the weather is like outside. Inventions like weather stations are so widely used that most people never imagine they could malfunction. Even worse, a sensor might work well enough to record data, yet be defective enough to record the wrong numbers. How do we make the best of our data? In this talk, we will discuss different ways to clean up inconsistent data formats and remove invalid samples, using a weather station as an example. Then, we will present tools to efficiently snap our data points to a grid and fill in gaps to tidy up our data. Finally, we will look at how to prepare and store data for use in data science. This talk uses Python and open source packages, but the concepts apply to any general-purpose programming language.
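As a rough illustration of the three cleaning steps named above (removing invalid samples, snapping to a grid, and filling gaps), here is a minimal sketch using only the Python standard library. It is not the code from the talk: the function name `clean`, the `VALID_RANGE` limits, the 10-minute grid, and the choice of linear interpolation are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Assumed plausible range for air temperature in degrees Celsius (hypothetical limits).
VALID_RANGE = (-90.0, 60.0)


def clean(samples, step=timedelta(minutes=10)):
    """Drop invalid readings, snap timestamps to a regular grid, and fill gaps.

    `samples` is an iterable of (datetime, float or None) pairs.
    Returns a list of (datetime, float) pairs, one per grid point.
    """
    epoch = datetime(1970, 1, 1)
    snapped = {}
    for stamp, value in samples:
        # Step 1: remove missing or physically implausible readings.
        if value is None or not VALID_RANGE[0] <= value <= VALID_RANGE[1]:
            continue
        # Step 2: snap the timestamp to the nearest grid point.
        ticks = round((stamp - epoch) / step)
        # Keep the first valid reading that lands on each grid point.
        snapped.setdefault(epoch + ticks * step, value)

    if not snapped:
        return []

    times = sorted(snapped)
    result = []
    for left, right in zip(times, times[1:]):
        result.append((left, snapped[left]))
        # Step 3: fill any missing grid points by linear interpolation.
        slots = round((right - left) / step)
        for i in range(1, slots):
            frac = i / slots
            value = snapped[left] + frac * (snapped[right] - snapped[left])
            result.append((left + i * step, value))
    result.append((times[-1], snapped[times[-1]]))
    return result


if __name__ == "__main__":
    start = datetime(2020, 1, 1)
    raw = [
        (start + timedelta(minutes=1), 20.0),
        (start + timedelta(minutes=9), 999.0),  # implausible reading, dropped
        (start + timedelta(minutes=31), 23.0),
    ]
    for stamp, value in clean(raw):
        print(stamp.isoformat(), value)
```

Linear interpolation is only one way to fill gaps; for long outages it may be more honest to leave the grid points empty and mark them as missing.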

Presented by

Xavier Ho

Xavier is a curiosity-driven designer, researcher and software engineer. He works for CSIRO creating interactive data visualisations. Pursuing a PhD part-time at Design Lab, University of Sydney keeps him busy, and sometimes he wonders about machines and humans and that philosophical lot. Previously, Xavier worked in a Sydney startup doing computer vision work, freelanced as a videographer, and taught a handful of programming classes to university design students. His passion lies somewhere in the spectrum of chocolates, video games, and a better world. Twitter: @Xavier_Ho