A beginners guide to downloading, processing, and plotting marine data with Python
I recently came across a really cool website https://oceanpython.org/ which has some cool tutorials on how to plot some marine science data in Python. It does have a couple of problems though. First, the tutorials are all in Python2 (Python3 came out in 2008!), and the code tends to just be vomited into a single file. Let’s see if we can clean it up and learn about marine science along the way!
As a general start, let’s begin by making a virtual environment
$ python -m venv .venv
$ source .venv/bin/activate
We’ll also need a requirements.txt
with the following dependencies
1numpy
2matplotlib
3scipy
4cartopy
Which we can install with
$ pip install -r requirements.txt
The first type of plot is a quiver plot. These plots display the current of an ocean in an intuitive way. The data will be sourced from http://lobo.satlantic.com/ which makes two measurements - the current across the channel (which we call u
) and the current along the channel (which we call v
).
The first step is to import what we’ll need plus define some constants
1from urllib.request import urlopen
2from io import StringIO
3import csv
4import numpy as np
5from datetime import datetime
6import matplotlib.pyplot as plt
7
8START = '20070501'
9END = '20070530'
START
and END
represent the data ranges we want to view.
We want to download the data from lobo and then convert it into a nice convenient format for us to work with, so let’s make a function to handle that
1def download_data():
2 # Read data from LOBO buoy
3 response = urlopen('http://lobo.satlantic.com/cgi-data/nph-data.cgi?min_date='
4 +START+'&max_date='+END+'&y=current_across1,current_along1')
5 data = StringIO(response.read().decode("utf-8"))
6
7 r = csv.DictReader(data, delimiter='\t')
8
9 return r
We now have the data in the format of a list of dictionaries. The next thing we’ll want to do is convert it into a more usable format for the plot. We can make another function for that
1def process_data(data):
2 d, u, v = [],[],[]
3 for row in data:
4 d.append(row['date [AST]'])
5 u.append(row['current across channel [m/s]'])
6 v.append(row['current along channel [m/s]'])
7
8 # Change the time strings into datetime objects
9 # ...and calculate 'decday' (i.e. "days from start")
10 datestart = datetime.strptime(d[0],"%Y-%m-%d %H:%M:%S")
11 date,decday = [],[]
12 for row in d:
13 daterow = datetime.strptime(row,"%Y-%m-%d %H:%M:%S")
14 date.append(daterow)
15 decday.append((daterow-datestart).total_seconds()/(3600*24))
16
17 # Convert lists to numpy arrays
18 u = np.array(u, dtype=float)
19 v = np.array(v, dtype=float)
20
21 return (date, decday, u, v)
We want 4 things from this: the absolute date, the number of days since the start of the range, as well as the two currents.
We actually have everything we need to start plotting now. We can plot with the following function
1def quiver_plot(date, decday, u, v):
2 # Plot the output ************************************************************
3 # Plot quiver
4 fig1, (ax1, ax2) = plt.subplots(2,1)
5 magnitude = (u**2 + v**2)**0.5
6 maxmag = max(magnitude)
7 ax1.set_ylim(-maxmag, maxmag)
8 fill1 = ax1.fill_between(decday, magnitude, 0, color='k', alpha=0.1)
9
10 # Fake 'box' to be able to insert a legend for 'Magnitude'
11 p = ax1.add_patch(plt.Rectangle((1,1),1,1,fc='k',alpha=0.1))
12 leg1 = ax1.legend([p], ["Current magnitude [m/s]"],loc='lower right')
13 leg1._drawFrame=False
14
15 # 1D Quiver plot
16 q = ax1.quiver(decday,0,u,v,
17 color='r',
18 units='y',
19 scale_units='y',
20 scale = 1,
21 headlength=1,
22 headaxislength=1,
23 width=0.004,
24 alpha=0.5)
25 qk = plt.quiverkey(q,0.2, 0.05, 0.2,
26 r'$0.2 \frac{m}{s}$',
27 labelpos='W',
28 fontproperties={'weight': 'bold'})
29
30 # Plot u and v components
31 ax1.axes.get_xaxis().set_visible(False)
32 ax1.set_xlim(0,max(decday)+0.5)
33 ax1.set_title("Current velocity from LOBO (Halifax, Canada)")
34 ax1.set_ylabel("Velocity (m/s)")
35 ax2.plot(decday, v, 'b-')
36 ax2.plot(decday, u, 'g-')
37 ax2.set_xlim(0,max(decday)+0.5)
38 ax2.set_xlabel("Days from start")
39 ax2.set_ylabel("Velocity (m/s)")
40 leg2 = plt.legend(['v','u'],loc='upper left')
41 leg2._drawFrame=False
42
43 # plt.show()
44
45 # Save figure (without 'white' borders)
46 plt.savefig('1D_quiver.png', bbox_inches='tight')
47
Of course, we just need to tie this all together with a main function
1def main():
2 data = download_data()
3 date, decday, u, v = process_data(data)
4 quiver_plot(date, decday, u, v)
5
6if __name__ == "__main__":
7 main()
Which produces the following output
To explain this plot, it makes sense to start with the bottom plot. It shows the current velocity across and along the channel. Along the channel is clearly much more volatile. The top plot shows the total current velocity in grey. The red arrows show the current as a vector - the length shows the magnitude and the angle shows the direction. Up and down represents along the channel and side to side shows across the channel - you can think of each point as like a top-down direction pointing in the direction the current is flowing
For the next plot we’ll use a Butterworth filter to smooth the temperature readings from the LOBO dataset. We’ll start again with importing what we need and some constants
1from urllib.request import urlopen
2from io import StringIO
3import csv
4import numpy as np
5from datetime import datetime
6import scipy.signal as signal
7import matplotlib.pyplot as plt
8
9START = '20111118'
10END = '20121125'
This time we use a slightly different URL to request different data from LOBO
1def download_data():
2 # Read data from LOBO buoy
3 response = urlopen('http://lobo.satlantic.com/cgi-data/nph-data.cgi?min_date='
4 +START+'&max_date='+END+'&y=temperature')
5 data = StringIO(response.read().decode("utf-8"))
6
7 r = csv.DictReader(data, delimiter='\t')
8
9 return r
Now we need to process that into two arrays: one of DateTime objects and the other a numpy array of temperature
1def process_data(r):
2 date, temp = [],[]
3 date, temp = zip(*[(datetime.strptime(x['date [AST]'], "%Y-%m-%d %H:%M:%S"), \
4 x['temperature [C]']) for x in r if x['temperature [C]'] is not None])
5
6 temp = np.array(temp)
7 temp = temp.astype(float)
8
9 return (date, temp)
Using Scipy to create and apply a Butterworth filter is really easy (without scipy, it would be a pretty arduous task). We can play around with N and Wn to see what kinds of different smoothing we can get.
1def smooth_data(temp):
2 N = 2 # Filter order
3 Wn = 0.01 # Cutoff frequency
4 B, A = signal.butter(N, Wn, output='ba')
5
6 tempf = signal.filtfilt(B, A, temp)
7
8 return tempf
However, we have the full dataset so we can use a smoother instead of a filter (filters generally smooth datasets reading by reading, whereas a smoother takes the entire dataset into account). Fortunately, someone working here at Naurt made a perfect smoother for Python, called the Whittaker-Eilers smoother. You can find the repo here. To install it, add the following to requirements.txt
whittaker-eilers
And run pip install -r requirements.txt
again
Import it with
from whittaker_eilers import WhittakerSmoother
The Whittaker smoother can be used like so
1def smooth_data_whittaker(temp):
2
3 whittaker_smoother = WhittakerSmoother(lmbda=2e4, order=2, data_length=len(temp))
4
5 smoothed_data = whittaker_smoother.smooth(temp)
6
7 return smoothed_data
Finally, we need to plot this, which we can do very simply. We’ll also plot the residuals to see if the Whittaker performed better than the Butterworth filter for this configuration.
1def plot_temps(date, temp, tempf, w_temp):
2 # Make plots
3 fig = plt.figure()
4 ax1 = fig.add_subplot(211)
5 plt.plot(date, temp, 'b-')
6 plt.plot(date, tempf, 'r-',linewidth=2)
7 plt.plot(date, w_temp, 'g-',linewidth=2)
8 plt.ylabel("Temperature (oC)")
9 plt.legend(['Original', 'Filtered (Butterworth)', 'Filtered (Whittaker)'])
10 plt.title("Temperature from LOBO (Halifax, Canada)")
11 ax1.axes.get_xaxis().set_visible(False)
12
13 ax1 = fig.add_subplot(212)
14 plt.plot(date,temp-tempf, 'r-')
15 plt.plot(date,temp-w_temp, 'g-')
16 plt.ylabel("Temperature (oC)")
17 plt.xlabel("Date")
18 plt.legend(['Residuals (Butterworth)', 'Residuals (Whittaker)'])
19
20 plt.show()
We can tie all this together with a main function
1def main():
2 r = download_data()
3 date, temp = process_data(r)
4 tempf = smooth_data(temp)
5 w_temp = smooth_data_whittaker(temp)
6 plot_temps(date, temp, tempf, w_temp)
7
8if __name__ == "__main__":
9 main()
The Whittaker smoother is quite clearly better for this configuration. It’s able to respond much more sensitively to sudden changes in temperature, which allows it to reduce the residual. This is often the case for smoothers, working with the full dataset against a filter running measurement-by-measurement.