biscotty's Workshop

GPS Mapping with R

A comparison with Python

Brian Carey
Apr 15, 2025

I recently published a series of articles on analyzing GPS data from personal sport tracking software using Python. I’ve started using R again lately and, while I like Python, I really like R. R is not a general-purpose language like Python, and is therefore far less commonly studied. It was mission-built for this type of work, and, as is often the case with custom-built tools, it is powerful, comfortable, and, frankly, fun to work with. I thought it might be interesting to compare the process in Python described in the previous articles with the process in R.

Libraries

Python data frames are built on the pandas and numpy libraries, with matplotlib as the primary plotting tool. Data frames and vector processing are native to R. The dplyr library provides convenience functions for manipulating data. The amazing ggplot2 provides the functionality of matplotlib and seaborn and more, with simple syntax. The sf package, which stands for Simple Features, provides the geometry and geospatial functionality that geopandas and shapely provide in Python. I’ll use ggspatial here for basemap tiles where I used contextily in Python, and I’ll add patchwork for convenient side-by-side display.

library(gpx)
library(sf)
library(dplyr)
library(ggplot2)
library(ggspatial)
library(patchwork)

Syntax

A few notes about syntax to start off: R uses the <- arrow for assignment, but accepts = as well. Data frame slicing uses the same [row,column] approach with explicit or boolean values. Unlike pandas, which distinguishes between df.loc and df.iloc, in R you can slice using numeric indices or strings without distinction. On a similar topic, R uses one-based indexing and its ranges include the end value, so df[1:2, ] in R is equivalent to Python’s df.iloc[:2], or more explicitly df.iloc[0:2], and the second row is accessed directly with df[2, ] instead of df.iloc[1].

There’s an important twist to this which will catch you multiple times. Even though the syntax is [row,column], if you supply only one numeric index, without a comma, it will be interpreted as a column index. So df[1] gets the first column, while df[1, ] gets the first row.
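To make the row and column behavior concrete, here is a minimal sketch in base R. The data frame df and its columns are made up for illustration and are not the trek data.

```r
# A small, made-up data frame for illustration (not the trek data)
df <- data.frame(
  id   = 1:3,
  name = c("a", "b", "c"),
  elev = c(10.5, 11.0, 12.25)
)

first_two_rows <- df[1:2, ]              # rows 1 and 2, all columns (like df.iloc[:2])
first_two_cols <- df[1:2]                # no comma: columns 1 and 2, all rows
second_row     <- df[2, ]                # one row, still a data frame
by_name        <- df[, "elev"]           # one column selected by name, returned as a vector
by_mask        <- df[df$elev > 10.75, ]  # boolean row selection

print(dim(first_two_rows))  # rows x columns of the row slice
print(dim(first_two_cols))  # rows x columns of the column slice
```

Comparing the two dim() results shows the difference: the version with the comma keeps every column and trims rows, while the version without it keeps every row and trims columns.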

The main syntactical difference is R’s extensive use of piping, using either the native pipe operator |> (available since R 4.1) or the older %>% from the magrittr package. This looks similar to accessing a series of an object’s methods through a chain of .s in Python, but it isn’t. The pipe in R works like the pipe in a Linux shell, simply passing the output of one function to the next function as its first argument. This is one of my personal favorite aspects of working in R, since it allows for natural expression of a series of steps which constitute a workflow. ggplot2 takes a similar syntactical approach, layering elements of the plot by chaining them with the + operator.
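As a small illustration of the pipe, the sketch below computes the same value twice, once with nested calls and once piped. The elevation values are invented, and the native |> operator assumes R 4.1 or later.

```r
# Made-up elevation readings in metres (illustrative only)
elevation <- c(12, 15, 11, 18, 14)

# Nested calls have to be read from the inside out
nested <- round(mean(diff(elevation)), 2)

# Piped, the same steps read left to right; each result becomes
# the first argument of the next function
piped <- elevation |>
  diff() |>
  mean() |>
  round(2)

print(identical(nested, piped))
```

Both forms call exactly the same functions; the pipe only changes the order in which you read them.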

Loading the data

Let’s get started. With Python, we needed to parse the raw gpx data, which is in an XML format, into a CSV file, which could then be imported into a pandas data frame and converted to a geopandas data frame. I used beautifulsoup for the parsing. R, fortunately, has a gpx library that allows us to go straight from gpx into a data frame. Let’s see what that looks like. The str() function will let us know what’s inside.

trek_data <- read_gpx("data/b3/Workout-2024-09-06-16-29-37.gpx")
str(trek_data)
List of 3
 $ routes   :List of 1
  ..$ :'data.frame':    0 obs. of  4 variables:
  .. ..$ Elevation: logi(0) 
  .. ..$ Time     : logi(0) 
  .. ..$ Latitude : logi(0) 
  .. ..$ Longitude: logi(0) 
 $ tracks   :List of 1
  ..$ River Vale:'data.frame':  137 obs. of  6 variables:
  .. ..$ Elevation : num [1:137] -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 ...
  .. ..$ Time      : POSIXct[1:137], format: "2024-09-06 16:29:37" "2024-09-06 16:29:37" ...
  .. ..$ Latitude  : num [1:137] 41 41 41 41 41 ...
  .. ..$ Longitude : num [1:137] -74 -74 -74 -74 -74 ...
  .. ..$ extensions: logi [1:137] NA NA NA NA NA NA ...
  .. ..$ Segment ID: int [1:137] 1 1 1 1 1 1 1 1 1 1 ...
 $ waypoints:List of 1
  ..$ :'data.frame':    0 obs. of  4 variables:
  .. ..$ Elevation: logi(0) 
  .. ..$ Time     : logi(0) 
  .. ..$ Latitude : logi(0) 
  .. ..$ Longitude: logi(0) 

The result is not a data frame, but a list of three lists (routes, tracks, and waypoints), each of which holds data frames. The second one, called tracks, is the only one with observations, so we can start with that. Don’t forget that R does not zero-index lists, so we use 2 not 1, and extract it with double square brackets.

trek_tracks <- trek_data[[2]]
str(trek_tracks)
List of 1
 $ River Vale:'data.frame': 137 obs. of  6 variables:
  ..$ Elevation : num [1:137] -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 ...
  ..$ Time      : POSIXct[1:137], format: "2024-09-06 16:29:37" "2024-09-06 16:29:37" ...
  ..$ Latitude  : num [1:137] 41 41 41 41 41 ...
  ..$ Longitude : num [1:137] -74 -74 -74 -74 -74 ...
  ..$ extensions: logi [1:137] NA NA NA NA NA NA ...
  ..$ Segment ID: int [1:137] 1 1 1 1 1 1 1 1 1 1 ...

This gets us closer: now we have a list of one. Let’s pull that out and display the first two rows.

trek <- trek_tracks[[1]]
trek[1:2,]
  Elevation                Time Latitude Longitude extensions Segment ID
1        -4 2024-09-06 16:29:37 41.01128  -74.0101         NA          1
2        -4 2024-09-06 16:29:37 41.01128  -74.0101         NA          1

Note the comma, which is very important. If only one value is supplied, it chooses columns instead of rows.

head(trek[1:2], 2)
  Elevation                Time
1        -4 2024-09-06 16:29:37
2        -4 2024-09-06 16:29:37

And the final frame looks like:

str(trek)
'data.frame':   137 obs. of  6 variables:
 $ Elevation : num  -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 ...
 $ Time      : POSIXct, format: "2024-09-06 16:29:37" "2024-09-06 16:29:37" ...
 $ Latitude  : num  41 41 41 41 41 ...
 $ Longitude : num  -74 -74 -74 -74 -74 ...
 $ extensions: logi  NA NA NA NA NA NA ...
 $ Segment ID: int  1 1 1 1 1 1 1 1 1 1 ...
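The two positional extractions used above can also be written by name, since the str() output shows that the list elements are named. The sketch below uses a mock list, mock_gpx, with the same shape as that output; its values are made up, and it is only meant to show the difference between single and double brackets.

```r
# Mock list shaped like the str() output above (values are made up)
mock_gpx <- list(
  routes    = list(data.frame()),
  tracks    = list("River Vale" = data.frame(
    Elevation = c(-4, -4),
    Latitude  = c(41.01128, 41.01128),
    Longitude = c(-74.0101, -74.0101)
  )),
  waypoints = list(data.frame())
)

# Extraction by position, as in the text, and by name
by_position <- mock_gpx[[2]][[1]]
by_name     <- mock_gpx$tracks[["River Vale"]]
print(identical(by_position, by_name))

# Single brackets keep the list wrapper; double brackets return the element itself
print(class(mock_gpx$tracks[1]))    # "list"
print(class(mock_gpx$tracks[[1]]))  # "data.frame"
```

Extracting by name is a little more robust if the order of the list elements ever changes, at the cost of depending on the track name stored in each file.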

Importing a collection of treks

Now that I know “where” the information is, I can go ahead and import a series of files and combine them into a single data frame. As I did with Python, I will assign a unique identifier to each trek, and then combine them. The R equiva

© 2025 Brian Carey