Abstract
In kernel-based methods such as Support Vector Machines, Kernel PCA,
Gaussian Processes or Regularization Networks, the computational requirements scale as
O(n^3), where n is the number of training points. In this paper we investigate
Kernel Principal Component Regression (KPCR) with an Expectation Maximization approach
to estimating a subset of p principal components (p < n) in a feature space
defined by a positive-definite kernel function. The computational requirements of the method are O(pn^2).
Moreover, the algorithm can be implemented with memory requirements of O(p^2) + O((p+1)n).
We give a theoretical description explaining how the desired generalization of KPCR
is achieved by proper selection of a subset of non-linear principal components.
We demonstrate this fact experimentally on two data sets.
Moreover, on a noisy chaotic Mackey-Glass time series prediction task
the best performance is achieved with p << n, and the experiments also suggest that
in such cases a significantly reduced training data set can be used to estimate the
non-linear principal components.
The theoretical relation to Kernel Ridge Regression
and epsilon-insensitive Support Vector Regression is also given, together with an experimental comparison.
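Below is a minimal sketch, in Python/NumPy, of how an EM iteration for kernel PCA can be combined with ordinary least squares on the extracted components to obtain a KPCR-style predictor. The Gaussian (RBF) kernel, the function and variable names (rbf_kernel, em_kernel_pca, W, sigma, n_iter), and the shortcut of regressing directly on a (not necessarily orthonormal) basis of the leading p-dimensional subspace are illustrative assumptions rather than details taken from the paper; the sketch is only meant to make the O(pn^2) per-iteration cost concrete.

    import numpy as np

    def rbf_kernel(X, Y, sigma=1.0):
        # Gaussian kernel matrix between the rows of X (m x d) and Y (n x d).
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    def center_kernel(K):
        # Centre the training kernel matrix in feature space.
        n = K.shape[0]
        J = np.ones((n, n)) / n
        return K - J @ K - K @ J + J @ K @ J

    def em_kernel_pca(K, p, n_iter=50, seed=0):
        # EM iterations for the leading p kernel principal components,
        # expressed through an n x p coefficient matrix W (each component is
        # a linear combination of the mapped training points).  Each iteration
        # is dominated by K @ W, i.e. O(p n^2) operations.
        n = K.shape[0]
        W = np.random.default_rng(seed).standard_normal((n, p))
        for _ in range(n_iter):
            KW = K @ W                            # n x p, O(p n^2)
            H = np.linalg.solve(W.T @ KW, KW.T)   # E-step: p x n latent coordinates
            W = np.linalg.solve(H @ H.T, H).T     # M-step: W = H^T (H H^T)^{-1}
        return W

    def kpcr_fit(X, y, p, sigma=1.0, n_iter=50):
        # Kernel PCA via EM on the centred kernel, then ordinary least squares
        # on the p projections.  At convergence the columns of W span the
        # leading eigen-subspace of Kc, and OLS is invariant to an invertible
        # change of basis of that subspace, so orthonormalisation is not
        # required for the regression itself.
        K = rbf_kernel(X, X, sigma)
        Kc = center_kernel(K)
        W = em_kernel_pca(Kc, p, n_iter)
        Z = np.column_stack([np.ones(len(y)), Kc @ W])  # intercept + projections
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        return {"X": X, "K": K, "W": W, "beta": beta, "sigma": sigma}

    def kpcr_predict(model, X_new):
        X, K, W, beta, sigma = (model[k] for k in ("X", "K", "W", "beta", "sigma"))
        K_new = rbf_kernel(X_new, X, sigma)
        # Centre the test kernel consistently with the training centring.
        n = K.shape[0]
        Jmn = np.ones((len(X_new), n)) / n
        Jnn = np.ones((n, n)) / n
        K_new_c = K_new - Jmn @ K - K_new @ Jnn + Jmn @ K @ Jnn
        return beta[0] + (K_new_c @ W) @ beta[1:]

As a usage illustration, one might call model = kpcr_fit(X_train, y_train, p=10, sigma=0.5) and then kpcr_predict(model, X_test); varying p is the model-selection step whose effect on generalization is studied in the paper.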