Time Series Analysis of Arctic Sea Ice
The main objective of our research work is to identify a model to forecast the sea ice level in Arctic sea. The data in our study is sea ice level from January 1990 to March 2011. We performed data analysis using Autoregressive Integrated Moving Average (ARIMA) modeling techniques. We selected a suitable forecasting model by considering the Akaike Information Criterion (AIC). From the results, we concluded that the autoregressive AR(4) model was the best suitable model. Then, we forecasted the observations for testing data using this model and calculated mean absolute percentage error (MAPE) to be 24.56737% indicating that the model is much accurate.
Keywords: Time Series Analysis, ARIMA model, AR model.
In recent years, it has been observed that Arctic sea ice is melting at a faster rate. Global warming is causing a decline in the sea ice level. It can result into ill effects on ecosystems as well as a threat to marine species. Thus, it is of importance to consider the analysis of Arctic sea ice. Several statistical methods like time series analysis, regression analysis are available for forecasting. Using these methods, we can select a suitable forecasting model. The model should have higher accuracy, which means minimum error.
Our work is organized as follows. We discuss the literature review in Section II. In Section III, we discuss the methods and statistical techniques used for obtaining the suitable time series model. In the next section, we present the results obtained from analysis. Finally, in Section V, we discuss the conclusions.
II. Literature Review
The research on time series analysis of Arctic sea ice has been conducted by many authors over the years. Holt, B., Johnson, M. P., Perkovic‐Martin, D., and Panzer, B. ( 2015)  studied the time series analysis on depth of Arctic sea ice. Piwowar, J. & Wessel, G. & LeDrew, Ellsworth. (1996)  studied the image analysis using the time series for Arctic sea ice.
Our main objective is to identify a best suitable model for forecasting the sea ice level of Arctic sea, using time series modeling techniques. We have used Microsoft Excel and R language for data preprocessing and data analysis.
A. Data Preprocessing
We used the data set  about the monthly sea ice level of Arctic sea over a period from January 1990 to March 2011. Since we required the number of observations in multiples of 12, we deleted the observations corresponding to January 2011, February 2011 and March 2011.
1. Finding missing values
Before proceeding to analysis, there is a need to check whether missing values are present in data. If some of the observations are missing, then we must impute the missing observations.
2. Changing the format of data
The variable ‘Time’ in the data is taking values like 1990M02, which indicates February 1990. Using Microsoft Excel, we transformed this variable, to take value 1990 for January 1990 and 1990.083333 for February 1990; (1/12 = 083333).
3. Dividing data into training and test data
In order to build a time series model and then check its accuracy for forecasting, we included 80% of observations in the training data and remaining 20% observations in the test data.
B. Checking for the presence of trend and seasonality
We assume the model to be additive, where, denotes the trend component, denotes the seasonality component and denotes an error component of time series.
We obtained the rough idea about the presence of the trend and seasonality from time series plot. To get the exact idea about the presence of the trend, we used the relative ordering test. We have to test, H0: Absence of trend in the data against H1: Presence of trend in the data. The statistic is,
Z = ~ N(0,1), under H0, where, T = 1-4Q/(n(n-1)) and is called as Kendall’s T, the rank correlation coefficient, E(T) = 0 and V(T) = . Q is the number of decreasing points. The decision criterion is, reject H0, if observed |Z| > Zα/2 at α% level of significance. If Q > E(Q), it indicates a decreasing trend, else, it indicates an increasing trend. E(Q) = n(n-1)/4, under H0.
In order to check the presence of seasonality, we applied Friedman’s non parametric test on detrended data, to test H0: No seasonality is present in the data against H1: Seasonality is present in the data. The test statistic is, , where r is the number of months and c is the number of years, Mi = and Mij is the rank corresponding to ith month and jth year. Here, S~, under H0. The decision criterion is, reject H0, if the observed S > .
C. Estimation of trend and seasonality
We obtained estimate for the trend using a MA(moving average) filter. Let, d denote the period of seasonality. For, d = 2q, the estimate of trend is given by,
Estimated trend will be, . In order to estimate trend for the remaining time points, we use the symmetric padding technique. We add first q observations above the first term and last q terms below the last term, in order to estimate the trend for the remaining time points with the help of moving average method. Finally, we obtained estimate for the trend corresponding to all time points, . We obtained the detrended data as,
We obtained estimate for seasonality using the fast changing trend method. From the detrended data, we obtained the averages wk of following deviation over j years for all k=1, 2, …, d :
Xd(j-1)+k – md(j-1)+k, q+1 ≤ d(j-1) + k ≤ n-q. Estimate of seasonality is given by,
We obtained the deseasonalized data as, . We again estimated the trend by using moving average and symmetric padding and the data is again detrended. Let, this data be denoted as .
D. Checking for the stationarity of a time series
Before identification of model, we need to check whether data is stationary or not. By stationarity, we mean that the expectation, variance and auto-covariance structure become stable over the time. When stationarity is attained, the statistical equilibrium is achieved. To check stationarity, we used Augmented Dickey–Fuller (ADF) test. We have to test H0: ϒ = 0 against H1: ϒ < 0 (stationarity). The test statistic is . The more negative it is, the more likely that H0 will be rejected. If p-value is less than 0.05, then, H0 is rejected at 5% level of significance.
E. Building model and estimation of parameters
We used an autoregressive integrated moving average (ARIMA) model for forecasting. Let, denote the dth difference of . Then,
ARIMA(p, d, q) model is given by,
where, , p is the number of autoregressive terms, q is the number of lagged forecast errors and d is the number of non-seasonal differences needed for stationarity. If p ≠ 0, d = 0 and q = 0, then, it is AR(p) model. AR indicates an autoregressive model. If p = 0, d = 0 and q ≠ 0, then, it is MA(q) model. MA indicates moving average model. We estimated the parameters with the help of Akaike Information Criterion (AIC).
F. Forecasting using the time series model
Once we select an appropriate time series model, the next step is to forecast observations using the model. The testing data is also detrended and deseasonalized. We used the selected model to forecast observations on testing data.
G. Checking accuracy of the time series model
We measured the accuracy of the time series model by using mean absolute percentage error (MAPE). It is given by,
where, denotes actual observation and denotes forecast value. Lower is the MAPE, higher is the accuracy of the time series model.
IV. Data Analysis
We obtained the time series plot for training data as,[image: E:1st graph.jpeg]
From the graph, we observed that there is a decreasing trend and seasonality is present with period d = 12.
Using relative ordering test for testing the presence of trend, we get, Q = 11469 and under H0: E(Q) = 10353. Thus, Q > E(Q) indicating decreasing trend. We get, observed |Z| = 2.289772 and Z0.05/2 = 1.96. Since, observed |Z| > Z0.05/2, we reject H0 at 5% level of significance and conclude that trend is present in the time series data.
Using Friedman’s non parametric test for testing presence of seasonality, we get, observed S = 181.6244 and . Since, observed S > , we reject H0 at 5% level of significance and conclude that seasonality is present in the data.
After detrending and deseasonalizing data, we proceed to check the stationarity of data. Using Augmented Dickey–Fuller test (ADF), we get, DF = -4.2423 and p-value = 0.01 < 0.05. Thus, we reject H0 at 5% level of significance and conclude that the time series data is stationary.
We get, the time series model is ARIMA(4, 0, 0) which means AR(4) model, where AR indicates an autoregressive model. We obtained the estimates for the parameters of the model and their standard errors as,
We get, . Thus, the AR(4) model is,
Using this model, we obtained forecast for observations in testing dataset. We obtained the time series plot for actual observations and forecast for the test data as,[image: E:2nd graph.jpeg]
Points on the green line indicate forecast, while, those on the red line indicate actual observations for the testing data. From the above plot, we observe that the error in forecasting by using the AR(4) model is not very large. We will check the accuracy of the model on the test data set by using mean absolute percentage error (MAPE). We get, MAPE = 24.56737%. Thus, we can say that the error in forecasting is lesser and the model is much accurate.
The purpose of this research was to find a suitable model for forecasting the sea ice level in Arctic sea. We divided data into training data and testing data. We used training data for building the model and testing data for checking the accuracy of the model. In the training data we used relative ordering test for testing presence of trend and Friedman’s non parametric test for testing presence of seasonality. We concluded that the trend and seasonality were present in the data. Using moving average method and symmetric padding technique, we obtained the estimates for trend. Using fast changing trend method, we obtained estimates for seasonality and then, detrended and deseasonalized the data. We checked stationarity of the data by using Augmented Dickey–Fuller test (ADF). We concluded that the data is stationary and used the ARIMA modeling for forecasting the sea ice level. Then, we chose the suitable forecasting model with the help of Akaike Information Criteria (AIC). The results showed that the best suited model for forecasting sea ice level will be ARIMA(4, 0 ,0) model, which means, AR(4) model. The model is given as,
Then, we forecasted the observations for testing data using this model and calculated mean absolute percentage error (MAPE) to be 24.56737%, indicating that the model is much accurate.
- [Online]. Available: http://cran.us.r-project.org/ [Date accessed 21 June, 2020]
- [Online]. Available: https://en.wikipedia.org/wiki/Arctic_sea_ice_decline [Date accessed 21 June, 2020]
- Box, G.E.P., and Jenkins, G., (1970) Time Series Analysis, Forecasting and Control, HoldenDay, San Francisco
- Holt, B., Johnson, M. P., Perkovic‐Martin, D., and Panzer, B. ( 2015), Snow depth on Arctic sea ice derived from radar: In situ comparisons and time series analysis, J. Geophys. Res. Oceans, 120, 4260– 4287, doi:10.1002/2015JC010815
- Piwowar, J. & Wessel, G. & LeDrew, Ellsworth. (1996). Time Series Image Analysis Arctic Sea Ice. 18th Canadian Remote Sensing Symposium.. Proceedings..
- Chujai, Pasapitch & Kerdprasop, Nittaya & Kerdprasop, Kittisak. (2013). Time series analysis of household electric consumption with ARIMA and ARMA models. Lecture Notes in Engineering and Computer Science. 2202. 295-300.
- [Online]. Available: https://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average [Date accessed 21 June, 2020]
- [Online]. Available: https://people.duke.edu/~rnau/411arim.htm#:~:text=The%20ARIMA%20forecasting%20equation%20for,lags%20of%20the%20forecast%20errors. [Date accessed 21 June, 2020]
- [Online]. Available: https://en.wikipedia.org/wiki/Augmented_Dickey%E2%80%93Fuller_test [Date accessed 22 June, 2020]
- [Online]. Available: https://www.rdocumentation.org/packages/forecast/versions/8.12/topics/auto.arima [Date accessed 23 June, 2020]
- [Online]. Available: https://www.rdocumentation.org/packages/aTSA/versions/3.1.2/topics/adf.test [Date accessed 23 June, 2020]
- [Online]. Available: https://timeseries.weebly.com/data-sets.html [Date accessed 18 June, 2020]