SVM regression on time series, is there a lag?


It would be nice if we could predict the future. For example, given the following time series, can we predict the next point?

Let’s use SVM regression, which is said to be powerful. We use the immediately preceding data point as the predictor and train the model on the first 70% of the data. Blue and black are the actual data; red and pink are the predicted data.
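A minimal sketch of this setup in Python, assuming scikit-learn's SVR as the SVM regression implementation. The post's data is not shown, so the noisy sine wave, the RBF kernel, and the C and epsilon values below are illustrative assumptions, not the settings used for the figures.

```python
import numpy as np
from sklearn.svm import SVR

# Hypothetical stand-in for the post's time series: a noisy sine wave.
rng = np.random.default_rng(0)
t = np.arange(400)
series = np.sin(2 * np.pi * t / 20) + 0.02 * rng.normal(size=t.size)

# Predictor: the current point. Target: the next point.
x = series[:-1].reshape(-1, 1)
y = series[1:]

# Train on the first 70% of the (predictor, target) pairs.
n_train = int(0.7 * len(y))
model = SVR(kernel="rbf", C=10.0, epsilon=0.01)  # assumed hyperparameters
model.fit(x[:n_train], y[:n_train])

# Predict over the whole series (training part and held-out part).
pred = model.predict(x)
```

Plotting `pred` on top of `y` is one way to look for the kind of one-step offset discussed next.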

In general, the prediction matches the trend. But if you look closely, you can see that the predicted data always lags the actual data by one time step. See the zoomed-in view below.

Where does this lag come from?

Let’s plot the predictor and the predicted (i.e. the current data point vs the next data point):

It looks normal to me.

It took me a few hours of thinking to figure this out, and the reason turns out to be simple: our SVM model is too simple, since it takes only the last data point as the predictor. If the data has an increasing trend, the SVM model, which considers only the immediate history, gives a high predicted value when the current value is high and a low predicted value when the current value is low. As a consequence, the predicted value is more similar to the current value than to the next one, and that shows up as a lag when compared with the actual data.
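The argument above can be checked numerically. The sketch below takes the extreme case of a model that simply returns the current value as its prediction for the next point, then measures which time shift best aligns the prediction with the actual series. The helper name `best_shift` and the sine-wave data are hypothetical and only serve the illustration.

```python
import numpy as np

def best_shift(pred, actual, max_shift=5):
    """Return the shift k (in time steps) at which pred[t] best matches
    actual[t - k], together with the mean squared error for each k."""
    pred = np.asarray(pred, dtype=float)
    actual = np.asarray(actual, dtype=float)
    errors = []
    for k in range(max_shift + 1):
        diff = pred[k:] - actual[:len(actual) - k]
        errors.append(float(np.mean(diff ** 2)))
    return int(np.argmin(errors)), errors

t = np.arange(200)
actual = np.sin(2 * np.pi * t / 50)

# Extreme case of "prediction follows the current value":
# the prediction for time t is just the value at time t - 1.
persistence = np.concatenate(([actual[0]], actual[:-1]))

lag, errors = best_shift(persistence, actual)
print(lag)  # prints 1
```

A model whose output mostly tracks the current value behaves similarly: its predictions line up with the actual series best after shifting by one step.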

To reduce the lag, you can build a more powerful SVM model, for example one that uses the past 2 data points as the predictor. It will make a more reliable prediction, provided the data is not random. See the comparison below: the lag is clearly much smaller.
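A rough way to try this comparison, again assuming scikit-learn's SVR and a synthetic noisy sine wave rather than the post's own data and code. The helpers `lagged_matrix` and `holdout_mse` are hypothetical names introduced for this sketch.

```python
import numpy as np
from sklearn.svm import SVR

def lagged_matrix(series, n_lags):
    """Each row holds the n_lags most recent points; y is the point that follows."""
    n = len(series)
    X = np.column_stack([series[i:n - n_lags + i] for i in range(n_lags)])
    y = series[n_lags:]
    return X, y

def holdout_mse(series, n_lags, train_frac=0.7):
    """Fit on the first train_frac of the samples, return MSE on the rest."""
    X, y = lagged_matrix(series, n_lags)
    n_train = int(train_frac * len(y))
    model = SVR(kernel="rbf", C=10.0, epsilon=0.01)  # assumed hyperparameters
    model.fit(X[:n_train], y[:n_train])
    pred = model.predict(X[n_train:])
    return float(np.mean((pred - y[n_train:]) ** 2))

# Hypothetical data: a noisy sine wave with a period of 20 samples.
rng = np.random.default_rng(0)
t = np.arange(400)
series = np.sin(2 * np.pi * t / 20) + 0.02 * rng.normal(size=t.size)

mse_one = holdout_mse(series, n_lags=1)
mse_two = holdout_mse(series, n_lags=2)
print(f"1 lag: {mse_one:.4f}, 2 lags: {mse_two:.4f}")
```

On a series like this one, a single point cannot tell the model whether the signal is rising or falling, while two consecutive points can, which is why the second lag helps.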

Source code can be downloaded here test_svr. Part of the source code is adapted from



8 Replies to “SVM regression on time series, is there a lag?”

  1. Hi Mr Cui,
    I have found the same situation that you described here in this post.
    I tried adding more data points from before the day I want to predict, but the lag is still there. How is that possible? Do I have to manually change some weights in the SVM function for these previous data points in order to obtain a future value?

  2. Hi Xu Cui,

    1) Which part of your source code corresponds to the lag reduction?
    2) What if my data does not have a trend? If it’s random, can the code be used to reduce the lag?
