
Microwave Characterization Using Ridge Polynomial Neural Networks and Least-Square Support Vector Machines

T. Hacib, Y. Le Bihan, M. K. Smail, M. R. Mekideche, O. Meyer, and L. Pichon

Laboratoire d’études et de modélisation en électrotechnique, Faculté des Sciences de l’Ingénieur, Univ. Jijel, BP 98, Ouled Aissa, Jijel 18000, Algeria

Laboratoire de Génie Electrique de Paris, SUPELEC, UMR 8507 CNRS, UPMC University of Paris, Plateau de Moulon, 91192 Gif-sur-Yvette Cedex, France

This paper shows that Ridge Polynomial Neural Networks (RPNN) and Least-Square Support Vector Machines (LS-SVM) provide efficient tools for the microwave characterization of dielectric materials. Such methods avoid the slow learning of multilayer perceptrons (MLP), which rely on computationally intensive training algorithms and can get trapped in local minima. RPNN and LS-SVM are combined in this work with the Finite Element Method (FEM) to evaluate the properties of dielectric materials. The RPNN is constructed from Pi-Sigma units of increasing order; it maintains the fast learning properties and powerful mapping capabilities of single-layer High Order Neural Networks (HONN). LS-SVM is a statistical learning method that has good generalization capability and learning performance. The FEM is used to create the data set required to train the RPNN and LS-SVM. The performance of an LS-SVM model depends on a careful setting of its associated hyper-parameters. In this study the LS-SVM hyper-parameters are optimized by using a Bayesian regularization technique. Results show that LS-SVM can achieve good accuracy and faster speed than neural network methods.

Index Terms—Least-square support vector machines, microwave characterization, ridge polynomial neural networks.

I. INTRODUCTION

NON-DESTRUCTIVE testing methods for measuring physical parameters have become of strong interest in all fields of engineering. The dielectric properties of materials (dielectric constant $\varepsilon'$ and loss factor $\varepsilon''$) can be deduced from the admittance measured at the discontinuity plane of a coaxial probe. This implies the implementation of an inversion procedure. The inversion methodology combines forward and inverse problem solutions of the investigated structure.

There are two kinds of approaches for the inverse problem solution. The first one is based on the use of global optimization methods, and the second on the use of polynomial functions to determine the correlation between the searched parameters and the measured admittance.

The application of global optimization methods assumes the construction of an iterative procedure that uses the forward problem solution technique. The polynomial functions approach uses the forward problem solution technique as well. The admittance data of different but known values of the model parameters are used to construct the desired functions. There are different methods for this construction, but neural networks provide efficient help [1].

Higher Order Neural Networks (HONN) are a type of feedforward neural network. They have certain advantages over the usual Multilayer Perceptron (MLP) neural network. For a given degree of complexity of the function to be approximated, they are simpler in their architecture, which potentially reduces the number of required training parameters. As a result, they can learn faster [2].

Manuscript received May 31, 2010; revised July 16, 2010; accepted October 02, 2010. Date of current version April 22, 2011. Corresponding author: T. Hacib (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TMAG.2010.2087743

Higher order terms or product units in HONN can increase the information capacity of the networks. The representational power of higher order terms can help solve complex problems using significantly smaller network sizes, while maintaining fast learning properties [2]. Pi-Sigma Neural Networks (PSNN) are a type of HONN, introduced by Shin et al. [3]. They have a regular structure and require a reduced number of free parameters when compared to other single-layer HONN. However, the PSNN is not a universal approximator [4]. The RPNN was originally proposed by Shin et al. [4]. The RPNN has a well-regulated structure, which is achieved through the embedding of different orders of PSNN, and is a universal approximator [5].

The Support Vector Machine (SVM) is a recent machine learning method based on statistical learning theory, which adheres to the structural risk minimization principle: it aims to minimize both the empirical risk (estimate of the training error) and the complexity of the model, thereby providing high generalization ability (estimation accuracy) [6]. Nonlinear SVM provides robust solutions by mapping the input space into a higher dimensional feature space using kernel functions. Originally, SVM was developed to solve pattern recognition problems. With the introduction of Vapnik’s ε-insensitive loss function, SVM has been extended to solve nonlinear regression problems and has shown excellent performance [7].

Least-Square Support Vector Machines (LS-SVM) are a reformulation of the standard SVM that uses equality constraints (instead of the inequality constraints of standard SVM) and a quadratic error term to obtain a linear set of equations in a dual space [8]. In this way, LS-SVM reduces the learning cost compared to SVM.

In this paper, we study these new inversion methods for the evaluation of the microwave properties of dielectric materials (complex permittivity) from the admittance measured at the discontinuity plane of an open-ended coaxial probe connected to an impedance analyzer. The method combines either the RPNN scheme or the LS-SVM scheme with the Finite Element Method (FEM).

Fig. 1. Pi-Sigma neural network.

In this work the FEM is used for the elaboration of the RPNN and LS-SVM inverse models: it provides the data sets required by these models. A data set is constituted of input (complex admittance, frequency) and output ($\varepsilon'$, $\varepsilon''$) pairs. The RPNN and LS-SVM models are then evaluated on experimental data.

II. HIGHER ORDER NEURAL NETWORKS

A. Pi-Sigma Neural Network

The PSNN was introduced by Shin et al. [3]. It is a feedforward network with a single hidden layer and product units at the output layer [4]. The PSNN uses the product of sums of the input components. The number of free parameters in the PSNN increases linearly with the order of the network. This reduction in the number of weights (parameters) allows the PSNN to enjoy faster training. The structure of the PSNN avoids the combinatorial explosion of higher order terms. The PSNN is able to learn in a stable manner even with fairly large learning rates [9]. In addition, the use of linear summing units makes the convergence analysis of the learning rules for the PSNN more accurate and tractable.

Shin et al. [9] investigated the applicability of the PSNN to shift, scale and rotation invariant pattern recognition. Results for function approximation and classification were encouraging when compared to backpropagation networks of similar performance. Ghosh et al. [3] argued that the PSNN requires less memory, and at least two orders of magnitude fewer computations, than an MLP of similar performance level over a broad class of problems. Fig. 1 shows a PSNN, whose output is determined according to (1) and (2):

$$h_j = \sum_{i=1}^{N} w_{ji}\, x_i + \theta_j \qquad (1)$$

$$y = \sigma\Big(\prod_{j=1}^{K} h_j\Big) \qquad (2)$$

Fig. 2. Ridge polynomial neural network.

where the $w_{ji}$ are the adjustable weights, the $x_i$ are the scalar inputs, the $\theta_j$ are the biases of the summing units, $K$ is the number of summing units, $N$ is the number of input nodes, and $\sigma$ is a suitable nonlinear transfer function. The PSNN has demonstrated a competent ability to solve scientific and engineering problems despite not being a universal approximator [3], [9].
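To make the computation in (1)-(2) concrete, here is a minimal NumPy sketch of the PSNN forward pass. It is an illustration, not the authors' code; the function and variable names are ours.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def psnn_forward(x, W, theta):
    """Forward pass of a K-th order PSNN: K linear summing units
    feed a single product unit, followed by a nonlinearity."""
    h = W @ x + theta            # (1): h_j = sum_i w_ji * x_i + theta_j
    return sigmoid(np.prod(h))   # (2): y = sigma(prod_j h_j)

# A 3rd-order PSNN with 4 inputs has only 3 * (4 + 1) = 15 free parameters,
# illustrating the linear growth of the parameter count with the order.
x = np.array([0.2, -0.1, 0.5, 0.3])
W = 0.1 * np.random.default_rng(0).standard_normal((3, 4))
print(psnn_forward(x, W, np.zeros(3)))
```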

B. Ridge Polynomial Neural Network

The RPNN was introduced by Shin et al. [5]. The network is constructed by gradually adding more complex PSNN, denoted by $P_i(\mathbf{x})$ in (3). The RPNN can approximate any multivariate continuous function defined on a compact set in a multidimensional input space with an arbitrary degree of accuracy. Similarly to the PSNN, the RPNN has only a single layer of adaptive weights, as shown in Fig. 2, and hence the network preserves all the advantages of the PSNN.

Any multivariate polynomial can be represented in the form of a ridge polynomial and realized by an RPNN [5], whose output is determined according to the following equations:

$$y = \sigma\Big(\sum_{i=1}^{n} P_i(\mathbf{x})\Big) \qquad (3)$$

$$P_i(\mathbf{x}) = \prod_{j=1}^{i}\big(\langle \mathbf{w}_j, \mathbf{x}\rangle + \theta_j\big) \qquad (4)$$

where $\langle \mathbf{w}_j, \mathbf{x}\rangle$ is the inner product between the trainable weight vector $\mathbf{w}_j$ and the input vector $\mathbf{x}$, the $\theta_j$ are the biases of the summing units in the corresponding PSNN units, $n$ is the number of PSNN units used (or, alternatively, the order of the RPNN), and $\sigma$ denotes a suitable nonlinear transfer function, typically the sigmoid.
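Following the same conventions, here is a minimal sketch of the RPNN forward pass in (3)-(4). The list `units` holds one (W, theta) pair per Pi-Sigma unit, the i-th unit having i summing rows; again, the naming is ours, not the authors' implementation.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def rpnn_forward(x, units):
    """RPNN output: a sum of Pi-Sigma units of increasing order (3),
    each unit being a product of first-degree ridge terms (4)."""
    total = 0.0
    for W, theta in units:              # P_i(x) = prod_j (<w_j, x> + theta_j)
        total += np.prod(W @ x + theta)
    return sigmoid(total)

# An order-3 RPNN over 4 inputs: Pi-Sigma units of order 1, 2 and 3.
rng = np.random.default_rng(0)
units = [(0.1 * rng.standard_normal((i, 4)), np.zeros(i)) for i in (1, 2, 3)]
print(rpnn_forward(np.array([0.2, -0.1, 0.5, 0.3]), units))
```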

The RPNN provides a natural mechanism for incremental network growth, by which the number of free parameters is gradually increased with the addition of Pi-Sigma units of higher orders. The structure of the RPNN is highly regular, in the sense that Pi-Sigma units are added incrementally until an appropriate order of the network or a predefined error-level criterion is reached.

Shin et al. [5] tested the RPNN on a surface fitting problem, the classification of high-dimensional data, and the realization of a multivariate polynomial function. Results showed that an RPNN trained with the constructive learning algorithm provided smooth and steady learning and used much less computation and memory, in terms of the number of units and weights, than MLP networks. In this work, the RPNN were trained with the incremental backpropagation learning algorithm [10].

III. LS-SVM FOR FUNCTION ESTIMATION

Given a training set $\{(\mathbf{x}_k, y_k)\}_{k=1}^{N}$ with input data $\mathbf{x}_k \in \mathbb{R}^{n}$ and output data $y_k \in \mathbb{R}$, the LS-SVM model for nonlinear function approximation is represented in feature space as

$$y(\mathbf{x}) = \mathbf{w}^{T}\varphi(\mathbf{x}) + b \qquad (5)$$

Here the nonlinear function $\varphi(\cdot)$ maps the input space into a higher-dimensional feature space, $b$ is a bias term, and $\mathbf{w}$ is the weight vector. The optimization problem consists in minimizing

$$\min_{\mathbf{w},\, b,\, e}\; J(\mathbf{w}, e) = \frac{1}{2}\,\mathbf{w}^{T}\mathbf{w} + \frac{\gamma}{2}\sum_{k=1}^{N} e_k^{2} \qquad (6)$$

subject to the equality constraints

$$y_k = \mathbf{w}^{T}\varphi(\mathbf{x}_k) + b + e_k, \qquad k = 1, \ldots, N \qquad (7)$$

where the fitting error is denoted by $e_k$. The hyper-parameter $\gamma$ controls the trade-off between the smoothness of the function and the accuracy of the fitting. The solution is obtained by constructing the Lagrangian of the optimization problem.

The resulting LS-SVM model for nonlinear function estimation becomes

$$y(\mathbf{x}) = \sum_{k=1}^{N} \alpha_k\, K(\mathbf{x}, \mathbf{x}_k) + b \qquad (8)$$

where the $\alpha_k$ and $b$ constitute the solution of the linear system and $K(\mathbf{x}, \mathbf{x}_k)$ is the so-called kernel function. The most usual kernel functions are the linear, the polynomial and, above all, the radial basis function (RBF) [8]. In this paper RBF kernels are used:

$$K(\mathbf{x}, \mathbf{x}_k) = \exp\!\big(-\|\mathbf{x} - \mathbf{x}_k\|^{2} / (2\sigma^{2})\big) \qquad (9)$$

where $\sigma$ is a constant defining the kernel width. The performance of the LS-SVM model hinges on an optimal selection of the hyper-parameters $\gamma$ (introduced in (6)) and $\sigma$. This can be done using an approach based on Bayesian inference [11].

IV. MEASUREMENT SETUP AND NUMERICAL METHOD

The characterization cell implemented in this work, called SuperPol, consists of a junction between a coaxial waveguide and a circular guide which is filled in an inhomogeneous way. This means that the material under test is located in the continuity of the inner conductor and that it is held up by a Teflon crown (Fig. 3). The whole device is connected to an impedance analyzer.

Fig. 3. SuperPol measuring cell.

This configuration may be used from low frequencies up to a few gigahertz using the GR900 coaxial waveguide (inner diameter … mm, outer diameter … mm).

The modelling of this configuration is done using the FEM in order to generate the data sets required by the LS-SVM and the RPNN.

The problem is expressed in terms of the electric field $\mathbf{E}$, which satisfies the following harmonic wave equation:

$$\nabla \times \Big(\frac{1}{\mu}\,\nabla \times \mathbf{E}\Big) - \omega^{2}\varepsilon\,\mathbf{E} = 0 \qquad (10)$$

where $\omega$ is the pulsation, and $\varepsilon = \varepsilon_0\varepsilon_r$ and $\mu$ are the permittivity and the permeability, respectively. $\varepsilon_0$ is the permittivity of free space. Second-order 3-D tetrahedral edge finite elements are used.

V. IMPLEMENTATION AND EVALUATION

In this part we summarize the steps of the RPNN and LS-SVM implementation (Fig. 4).

1) Generation of the samples of the data sets (training, validation and test sets) used for the model elaboration, and pre-processing of them (centring and normalisation).

2) Training, validation (except for LS-SVM with Bayesian regularization, which does not involve validation) and test of the RPNN and LS-SVM models.

3) Use of the designed RPNN and LS-SVM for microwave data inversion.

Two single-output RPNN or LS-SVM, corresponding to the two estimated real quantities $\varepsilon'$ and $\varepsilon''$, are used. The inputs are the values of the complex admittance (real part $G$, imaginary part $B$) and the measurement frequency.

According to the split-sample procedure [10], the data set is divided into three different sets: training, validation and test. The RPNN and LS-SVM were designed with 2000 learning examples, each constituted of an input (complex admittance, frequency) and output ($\varepsilon'$, $\varepsilon''$) pair. Of the 2000 vectors, a random sample of 1000 (50%) was used for training, 500 (25%) for validation and 500 (25%) for testing.
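A minimal sketch of the split and of the pre-processing of step 1, under the assumption (not detailed in the paper) that the centring and normalisation statistics are computed on the training set only:

```python
import numpy as np

def split_and_standardize(X, y, n_train=1000, n_val=500, seed=0):
    """Random 50/25/25 split of the 2000 FEM samples, followed by
    centring and normalisation of the inputs."""
    idx = np.random.default_rng(seed).permutation(len(X))
    tr, va, te = np.split(idx, [n_train, n_train + n_val])
    mu, sd = X[tr].mean(axis=0), X[tr].std(axis=0)
    z = lambda A: (A - mu) / sd   # centre and normalise with training statistics
    return (z(X[tr]), y[tr]), (z(X[va]), y[va]), (z(X[te]), y[te])
```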

The order of the RPNN is equal to 7 (the order starts from 1 and grows to 7). The training of each PSNN is stopped when the magnitude of the weight updates is less than a given threshold. The network is trained using an incremental method with a Levenberg-Marquardt algorithm [10] to improve the convergence speed.
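The constructive procedure can be sketched as follows: Pi-Sigma units of order 1, 2, ..., 7 are appended one at a time, and training of the newest unit stops once the update magnitude falls below the threshold. This toy version tunes only the newest unit and uses a numerical gradient in place of the Levenberg-Marquardt updates; it illustrates the control flow, not the paper's actual optimizer.

```python
import numpy as np

sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))

def rpnn_out(units, X):
    """Batch RPNN forward pass, (3)-(4)."""
    s = np.zeros(len(X))
    for W, theta in units:
        s += np.prod(X @ W.T + theta, axis=1)
    return sigmoid(s)

def train_rpnn(X, y, max_order=7, tol=1e-5, lr=0.05, epochs=500):
    """Grow the RPNN one Pi-Sigma unit at a time (orders 1..max_order)."""
    rng = np.random.default_rng(0)
    units = []
    for order in range(1, max_order + 1):
        nw = order * X.shape[1]
        v = 0.1 * rng.standard_normal(nw + order)           # [W.ravel(), theta]
        unpack = lambda u, k=order, n=nw: (u[:n].reshape(k, -1), u[n:])
        loss = lambda u: np.mean((rpnn_out(units + [unpack(u)], X) - y) ** 2)
        for _ in range(epochs):
            # central-difference gradient of the MSE w.r.t. the newest unit
            g = np.array([(loss(v + 1e-6 * e) - loss(v - 1e-6 * e)) / 2e-6
                          for e in np.eye(v.size)])
            v = v - lr * g
            if lr * np.linalg.norm(g) < tol:                # weight-update threshold
                break
        units.append(unpack(v))
    return units
```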


Fig. 4. Microwave properties parameter extracting procedure.

TABLE I
SUMMARY OF MSE, r AND CPU TIME

The quality of each model is assessed by the mean squared error (MSE) and the linear correlation coefficient $r$:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\big(\hat{y}_i - y_i\big)^{2} \qquad (11)$$

$$r = \frac{\sum_{i=1}^{N}\big(y_i - \bar{y}\big)\big(\hat{y}_i - \bar{\hat{y}}\big)}{\sqrt{\sum_{i=1}^{N}\big(y_i - \bar{y}\big)^{2}\,\sum_{i=1}^{N}\big(\hat{y}_i - \bar{\hat{y}}\big)^{2}}} \qquad (12)$$

where $y_i$ is the value of the microwave properties parameter, $\hat{y}_i$ is the predicted value, $N$ is the number of test data, $\bar{y}$ is the mean value of the microwave properties parameter, and $\bar{\hat{y}}$ is the mean of the predicted values.
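Both figures of merit are one-liners; a sketch (our naming) of (11)-(12) applied to the test set:

```python
import numpy as np

def mse_and_r(y_true, y_pred):
    """Mean squared error (11) and linear correlation coefficient (12)."""
    mse = np.mean((y_pred - y_true) ** 2)
    yc, pc = y_true - y_true.mean(), y_pred - y_pred.mean()
    r = np.sum(yc * pc) / np.sqrt(np.sum(yc**2) * np.sum(pc**2))
    return mse, r
```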

VI. RESULTS

The RPNN and LS-SVM approaches are compared in terms of accuracy and time cost (the time required for the elaboration of each model). $\varepsilon'$ varies between 1 and 100 and $\varepsilon''$ between 0 and 80, whereas the measurement frequency ranges from 1 MHz to 1.8 GHz. The MSE, the correlation coefficient calculated on the test set, and the CPU time are summarized in Table I.

From Table I, it can be observed that the LS-SVM outperforms the RPNN approach in terms of time cost while still reaching good accuracy.

As a measure of effectiveness, the optimized RPNN and LS-SVM were tested on ethanol data obtained from experiments. The RPNN and LS-SVM results are compared to results obtained from a time-consuming iterative inversion method, which uses a direct model inside an iterative procedure. Measurements were carried out using an Agilent 4291A impedance analyzer on an ethanol sample whose dielectric characteristics are known. The thickness of the sample under test is 2.9 mm.

Fig. 5. Permittivity evolution obtained by LS-SVM, RPNN and the iterative inversion procedure.

Fig. 5 shows the good agreement between the RPNN results and those obtained by the LS-SVM model. Furthermore, the results fit those found in the scientific literature [12].

VII. CONCLUSION

In this paper, methods employing RPNN and LS-SVM have been proposed and studied for solving the inverse problem of determining the complex permittivity of dielectric materials. For the microwave characterization of dielectric materials, it was observed that the LS-SVM approach outperformed the RPNN technique thanks to its good accuracy, faster speed and strong generalization. Further work will concern the optimization of the structure of the training data set.

REFERENCES

[1] H. Acikgoz, Y. Le Bihan, O. Meyer, and L. Pichon, “Neural networks for broadband evaluation of complex permittivity using a coaxial discontinuity,” Eur. Phys. J. Appl. Phys., vol. 39, no. 2, pp. 197–201, 2007.

[2] L. Leerink, C. Giles, B. Horne, and M. Jabri, “Learning with product units,” in Advances in Neural Information Processing Systems 7. Cambridge, MA: MIT Press, 1995, pp. 537–544.

[3] Y. Shin and J. Ghosh, “The Pi-Sigma network: An efficient higher-order neural network for pattern classification and function approximation,” in Proc. Int. Joint Conf. Neural Netw., Jul. 1991, vol. 1, pp. 13–18.

[4] J. Ghosh and Y. Shin, “Efficient higher-order neural networks for function approximation and classification,” Int. J. Neural Syst., vol. 3, no. 4, pp. 323–350, 1992.

[5] Y. Shin and J. Ghosh, “Ridge polynomial networks,” IEEE Trans.Neural Netw., vol. 6, no. 3, pp. 610–622, 1995.

[6] V. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.

[7] L. Xia, R. Xu, and B. Yan, “Modeling of 3-D vertical interconnect using support vector machine regression,” IEEE Microw. Wireless Compon. Lett., vol. 16, no. 12, pp. 639–641, 2006.

[8] J. A. K. Suykens, “Nonlinear modelling and support vector machines,” in Proc. IEEE IMTC Conf., Hungary, 2001, pp. 287–294.

[9] Y. Shin and J. Ghosh, “Computationally efficient invariant patternrecognition with higher order pi-sigma networks,” Univ. Texas atAustin, Tech. Rep., 1992.

[10] S. Haykin, Neural Networks: A Comprehensive Foundation. New York: Macmillan, 1994.

[11] T. Hacib, Y. Le Bihan, M. R. Mekideche, H. Acikgoz, O. Meyer, and L. Pichon, “Microwave characterization using least-square support vector machines,” IEEE Trans. Magn., vol. 46, no. 8, pp. 2811–2814, Aug. 2010.

[12] F. Buckley and A. A. Maryott, Tables of Dielectric Dispersion Data for Pure Liquids and Dilute Solutions. Washington, DC: U.S. Dept. of Commerce, National Bureau of Standards, 1958.

