Data and smoothing used in ConnectStats

ConnectStats lets us apply basic statistical techniques to analyse our fitness data. We use these techniques to answer questions about our activities. How well did I do on this run? Is my stamina improving? Did I push myself more today than last week?

Raw Data

Before doing any analysis, let’s review the raw data we have access to. ConnectStats relies on data collected by a Garmin device and a collection of sensors.

The raw data varies by activity type and device.
For outdoor activities, the raw data will be GPS coordinates, a timestamp, speed and values from different sensors: heart rate in beats per minute, cadence (legs or pedals), power delivered on the bike, and sometimes more advanced running dynamics such as vertical oscillation and ground contact time. For swimming, the data collected will be the time for each length and the number of strokes.

Here is an example of what the raw data looks like:

[Figure: Blog txt]

Depending on the activity or the device, the raw data will be collected either at fixed intervals or at varying intervals chosen to reduce the amount of data saved. If you move in a straight line for 5 seconds at constant speed, it isn’t necessary to record a point every second, as the intermediate points can easily be interpolated.
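To make the interpolation idea concrete, here is a minimal sketch in Python that rebuilds a one-point-per-second series from points recorded at varying intervals. The function name `resample_linear` and the sample values are hypothetical and only for illustration; this is not how ConnectStats itself is implemented, and it assumes the timestamps are strictly increasing.

```python
from bisect import bisect_right


def resample_linear(times, values, step=1.0):
    """Linearly interpolate (times, values) onto a regular grid.

    `times` must be strictly increasing, in seconds. Returns a list of
    (time, value) pairs spaced `step` seconds apart, starting at times[0].
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two samples of equal length")
    count = int((times[-1] - times[0]) / step) + 1
    resampled = []
    for k in range(count):
        t = times[0] + k * step
        # Index of the recorded sample that starts the segment containing t
        i = min(bisect_right(times, t) - 1, len(times) - 2)
        t0, t1 = times[i], times[i + 1]
        v0, v1 = values[i], values[i + 1]
        weight = (t - t0) / (t1 - t0)
        resampled.append((t, v0 + weight * (v1 - v0)))
    return resampled


# Distance in metres, recorded only at 0s, 5s and 7s (made-up values)
points = resample_linear([0.0, 5.0, 7.0], [0.0, 15.0, 23.0])
```

With only three recorded points, the sketch returns eight points, one for each second from 0 to 7, which is why a device can skip the intermediate points without losing much information.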

From this raw data, there are two main directions for the analysis. The first is analysing a single activity, and the second is analysing the summary of each activity over time.

Focus on a single activity

The summary statistics are quite simple. Mostly we will want to look at the average, maximum and minimum of the different measures. Where it gets more interesting is looking at the series of data. As in most statistical analysis, the first phase is cleaning the data.
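As a small illustration of these summary statistics, the sketch below computes the average, minimum and maximum of one measure. The function name `summarize`, the convention that missing sensor readings appear as `None`, and the heart rate values are all assumptions made for this example.

```python
def summarize(values):
    """Return average, minimum and maximum of a series, skipping missing readings."""
    valid = [v for v in values if v is not None]
    if not valid:
        return None  # nothing was recorded for this measure
    return {
        "avg": sum(valid) / len(valid),
        "min": min(valid),
        "max": max(valid),
    }


# Heart rate in beats per minute, with one missing reading (made-up values)
heart_rate_summary = summarize([140, 150, None, 160])
```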

Data Cleaning

Let’s look at the graph of the data collected by the GPS. You can see it is quite noisy, and a large part of the data is not really interesting to look at: some of it is GPS noise, areas with bad reception, or pauses at stop lights.

[Figure: Rawdatabad]

To clean the data, we can first use assumptions about expected constraints on the data. For example, we can define a minimum speed to be considered a valid running speed, or a minimum valid heart rate, to remove unreasonable spikes. Another technique used by ConnectStats is to filter out points that have unrealistic acceleration.
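The filters described above could look something like the following sketch. The thresholds (1 m/s minimum speed, 30 bpm minimum heart rate, 3 m/s² maximum acceleration), the function name `filter_bad_points` and the tuple layout are hypothetical choices for illustration, not the values or code used by the app.

```python
def filter_bad_points(points, min_speed=1.0, min_heart_rate=30.0, max_acceleration=3.0):
    """Drop points that violate simple validity constraints.

    `points` is a list of (time_s, speed_mps, heart_rate_bpm) tuples.
    All thresholds are illustrative assumptions.
    """
    kept = []
    for t, speed, heart_rate in points:
        # Constraint 1: below the minimum valid speed or heart rate
        if speed < min_speed or heart_rate < min_heart_rate:
            continue
        # Constraint 2: unrealistic acceleration relative to the last kept point
        if kept:
            prev_t, prev_speed, _ = kept[-1]
            dt = t - prev_t
            if dt <= 0 or abs(speed - prev_speed) / dt > max_acceleration:
                continue
        kept.append((t, speed, heart_rate))
    return kept


raw = [
    (0, 3.0, 140),
    (1, 3.1, 141),
    (2, 12.0, 142),  # speed spike: acceleration of 8.9 m/s^2
    (3, 3.2, 143),
    (4, 0.2, 143),   # too slow, for example waiting at a stop light
    (5, 3.0, 20),    # heart rate too low to be valid
    (6, 3.1, 144),
]
clean = filter_bad_points(raw)
```

One design choice to be aware of: the acceleration is measured against the last point that was kept, so a bad first point would cause the following points to be rejected. A more robust version would need to handle that case.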

[Figure: FilterBadValues]

The last smoothing technique we can apply is a moving average over a few points. Usually the app will use a 5-point moving average, but this can be changed in the sliding settings window on the left.
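A moving average over a fixed number of points can be sketched as follows. This version is a trailing average (each point is averaged with the points before it) and uses a shorter window at the start of the series; whether the app uses a trailing or a centred window is not stated here, so treat that as an assumption. The name `moving_average` is hypothetical.

```python
def moving_average(values, window=5):
    """Trailing moving average over the last `window` points.

    The first few outputs average over fewer points, since a full
    window is not yet available.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Remove the point that just left the window
            running_sum -= values[i - window]
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed


# Speed in m/s with one noisy reading (made-up values)
speeds = [3.0, 3.2, 2.9, 4.5, 3.1, 3.0, 3.2]
smoothed_speeds = moving_average(speeds, window=5)
```

A larger window gives a smoother curve but reacts more slowly to real changes in pace, which is the trade-off behind making the number of points adjustable.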

[Figure: Filterbadnormalsmoothing]

In the case below, the graph starts to be more readable, especially with the 2-minute moving average overlay, which shows the two short, faster run intervals around 30 min and 40 min.
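When points are not recorded at fixed intervals, a moving average defined by duration rather than by number of points is one way to get an overlay such as the 2-minute one. The sketch below shows that idea under stated assumptions: the function name `time_window_average` is hypothetical, the window is trailing, and this is not necessarily how the app computes its overlay.

```python
from collections import deque


def time_window_average(samples, window_s=120.0):
    """Average each value with all samples from the preceding `window_s` seconds.

    `samples` is a list of (time_s, value) pairs sorted by time.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    window = deque()
    total = 0.0
    averaged = []
    for t, value in samples:
        window.append((t, value))
        total += value
        # Drop samples that are window_s seconds old or older
        while window[0][0] <= t - window_s:
            _, old_value = window.popleft()
            total -= old_value
        averaged.append((t, total / len(window)))
    return averaged


# Speed in m/s recorded once a minute (made-up values)
overlay = time_window_average([(0, 2.0), (60, 4.0), (120, 6.0), (180, 8.0)])
```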

2 thoughts on “Data and smoothing used in connectstats”

  1. I couldn’t find your email, so I am using this way to give you an idea to think about.
    I have seen apps that can connect to the calendar. It would be beautiful to see activities from ConnectStats with time, date and activity, maybe even burned calories, as a calendar entry 😊

    I like the http://www.swimmingwatchtools.com/upload connect editor. Not one swim training log is right, so almost all need correction. You might be able to directly take over the source code from that guy, who is not selling that editor yet and is not providing it on iOS. That could become an in-app purchase option I would easily pay 5 EUR for, and some serious swimmers might too.

    It would be nice to get some comment from you on my proposal.

    I like your app.

    Mathias

    • For the calendar, I will look at the Apple API to see if it’s easy. Though I think most people use the calendar for future dates, whereas here the entries would be in the past.
      For editing, the tricky part is to get the user interface simple and intuitive on a phone. I’ll add your proposal to my list of potential features to think about, but I have little time to work on the app, so I’m not sure it will be done soon… sorry.
