Review and Empirical Comparative Study
of Parameterless Genetic Algorithms
(cont'd)
EXPERIMENTAL RESULTS AND ANALYSIS
A. Basic Statistical Measures and
Results
In this section we explain the various
statistical measures (defined below) that are used to evaluate the
performance of the five test algorithms. The actual numerical results are
presented in five tables of data, one per test function; see Appendix A,
Tables I-V.
Percentage of Runs to Optimal Fitness:
Each GA was run 30 times. This measure reflects the percentage of runs that
converged to the optimal solution at or before 500,000 (fitness function)
evaluations.
Ave. No. of Evaluations to Best Fitness (and C.V.): This measure represents
the average number of evaluations required for a GA to achieve its best
fitness value in a run. In cases where the best fitness is 1, it serves as a
measure of convergence velocity. Every run produces a different number of
evaluations to best fitness; C.V. (Coefficient of Variation) is the standard
deviation of that set of evaluation counts divided by its average, and serves
as a measure of reliability.
Ave. No. of Evaluations to Near-Optimal Fitness: Near-optimal fitness is
defined as a fitness of 0.95. In cases where optimal fitness is not obtained,
near-optimal fitness is the next best measure of convergence velocity. This
measure is defined in the same way as the preceding measure, except that
near-optimal is substituted for optimal.
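To make these measures concrete, the following Python sketch (not part of the
original study; the data layout and the numerical values are hypothetical)
computes the percentage of runs to optimal fitness, the average number of
evaluations to best fitness together with its C.V., and the average number of
evaluations to near-optimal fitness from a set of per-run records.

    # Hypothetical per-run records: (best fitness reached, evaluations to best
    # fitness, evaluations to near-optimal fitness or None if 0.95 never reached).
    from statistics import mean, stdev

    MAX_EVALS = 500_000  # evaluation budget per run

    runs = [
        (1.00, 42_500, 31_200),
        (1.00, 48_900, 35_700),
        (0.97, 447_800, 61_300),
        # ... one tuple per run, 30 in total
    ]

    def pct_runs_to_optimal(runs):
        """Percentage of runs that reached fitness 1 within the evaluation budget."""
        hits = sum(1 for best, evals, _ in runs if best >= 1.0 and evals <= MAX_EVALS)
        return 100.0 * hits / len(runs)

    def ave_evals_to_best(runs):
        """Average number of evaluations to best fitness and its C.V. (s.d. / mean)."""
        evals = [e for _, e, _ in runs]
        avg = mean(evals)
        return avg, stdev(evals) / avg

    def ave_evals_to_near_optimal(runs):
        """Average evaluations to fitness 0.95, over the runs that reached it."""
        evals = [e for _, _, e in runs if e is not None]
        return mean(evals) if evals else None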
Average Best Fitness (and S.D.): This
is the average of the set of best fitness values achieved in all 30 GA runs;
S.D. is the standard deviation of that set. Naturally, this is a crucial
measure: GAs that reliably achieve a best fitness of 1 are taken seriously,
while those that return best fitnesses of less than 1 (or reach 1 only
inconsistently) are not as good.
Ave. Mean Fitness (and S.D.): This is the average of the set of mean fitness
values recorded at the end of the 30 GA runs; S.D. is the standard deviation
of that set.
Ave. Mean Population Size to Optimal Fitness: In a given run, the size of the
population may differ from one generation to the next until (and after) the
GA converges to the optimal value (if ever). In one run, the average size of
all the populations preceding optimal convergence is called Average
Population Size to Optimal Fitness (or APSOF). Every one of the 30 runs may
return a value for APSOF. The average value for the set of APSOF values is
the Ave. Mean Population Size to Optimal Fitness.
Ave. Max. Population Size to Optimal Fitness: For each GA run, the largest
population size prior to optimal convergence is stored in a set. The mean of
that set is the average maximum population size to optimal fitness.
Ave. Mean Population Size to Near-Optimal Fitness: In a given run, the size
of the population may differ from one generation to the next until (and
after) the GA converges to the near-optimal value of 0.95 (if ever). In one
run, the average size of all the populations preceding near-optimal
convergence is called Average Population Size to Near-Optimal Fitness (or
APSNOF). Every one of the 30 runs may return a value for APSNOF. The average
value for the set of APSNOF values is the Ave. Mean Population Size to
Near-Optimal Fitness.
Ave. Max. Population Size to Near-Optimal Fitness: For each GA run, the
largest population size prior to near-optimal convergence is stored in a set.
The mean of that set is the average maximum population size to near-optimal
fitness.
These measures allow GA users to assess the memory requirements of a given
GA. The smaller the population required to obtain an optimally fit
individual, the better, because smaller populations require less memory; and
memory is still a serious concern when large populations are used for
real-world optimization and design problems.
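The population-size measures can be sketched in the same hypothetical style.
The code below assumes that each run log records the population size of every
generation together with the generation at which optimal (or near-optimal)
fitness was first reached, and it reads "populations preceding convergence"
as all generations up to and including that one; both the data layout and the
sample values are ours, not the study's.

    from statistics import mean

    def mean_pop_to_convergence(pop_sizes, conv_gen):
        """APSOF/APSNOF for one run: average population size up to convergence."""
        return None if conv_gen is None else mean(pop_sizes[:conv_gen + 1])

    def max_pop_to_convergence(pop_sizes, conv_gen):
        """Largest population size encountered before convergence in one run."""
        return None if conv_gen is None else max(pop_sizes[:conv_gen + 1])

    def ave_over_runs(values):
        """Average a per-run measure, ignoring runs that never converged."""
        converged = [v for v in values if v is not None]
        return mean(converged) if converged else None

    # Three hypothetical runs: per-generation population sizes and the
    # generation at which the optimum was first reached (None = never reached).
    runs = [
        ([10, 20, 40, 80, 40], 3),
        ([10, 20, 20, 40, 80], 4),
        ([10, 20, 40, 80, 160], None),
    ]
    ave_mean_pop = ave_over_runs([mean_pop_to_convergence(p, g) for p, g in runs])
    ave_max_pop = ave_over_runs([max_pop_to_convergence(p, g) for p, g in runs])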
B. Performance
Measures and Ranking Tables
In order to compare the performance of
any number of parameterless GAs - not just those used in this paper - one
requires a standardized set of metrics that are a) fair, i.e. algorithm- and
platform-independent, and b) meaningful, i.e. reflective of those properties
of a GA that are relevant to the human user.
To ensure fairness, variables that may
discriminate in favor of a particular type of GA must not be used. For
example, the number of generations is not an acceptable component of a
formula measuring speed of convergence, because some GAs may use big
populations and run them for a small number of generations, while other GAs
may run for somewhat longer periods (in terms of generations) but with much
smaller populations. On the other hand, wall-clock time is not a good measure
of speed, since different computers run at radically different speeds. Nor is
algorithmic complexity, because what actual users (mostly engineers) care
about is not some idealized measure of complexity but a portable measure of
real speed. We therefore use the number of fitness evaluations. This and the
other two measures are explained in detail below.
As to being meaningful, the metrics we
propose are: reliability, speed and memory load. GAs are often accused of
being unreliable, since they are non-deterministic and few things are proven
about their performance - this situation is changing, but slowly. What we can
do, empirically, to remedy this situation is to attach a sound statistical
measure of reliable performance to GAs. As to speed: speed is important for
real-time, near-real-time and on-line applications. Speed is also important
for off-line applications that operate within industrial and/or commercial
environments. Finally, memory load determines whether a computer with
specific physical (and virtual) memory resources is able to run a given GA at
all. If a given computer is slow, a GA will simply take longer to converge;
however, if a GA requires more memory than is available, the computer will
not be able to run the GA at all unless additional memory is added. Together,
speed and memory load are the two core measures of any computational
algorithm. When defining reliability, speed and memory load, we assume that
each algorithm is run (at least) 30 times and that the maximum number of
fitness evaluations allowed during one run is 500,000.
The actual ranking results are
presented in Tables VI-X in Appendix B; they come from applying the formulae
for reliability, speed and memory load (defined below) to the data of Tables
I-V in Appendix A.
Reliability.
Reliability is primarily defined as the percentage of runs that were able to
return an optimally fit individual (within 500,000 evaluations). Of course,
it is possible for more than one GA to achieve 100% reliability. In order to
differentiate between such GAs, we use the C.V. as a further means of
differentiation: the lower the C.V. of a given GA, the more reliable it is,
and vice versa. Once all GAs with 100% reliability are ranked, those GAs with
less than 100% (but more than 0%) reliability are ranked simply according to
their reliability percentage. To mark the fact that their reliability is less
than 100%, an asterisk (*) is attached to their rankings. If a given GA
completely fails to return any optimally fit individuals over all the runs
(reliability = 0%), then it is given an NA (not applicable) designation,
since such a GA is totally unreliable.
For example, assume that 3 of 5
parameterless GAs (call them GA1, GA2 and GA5) return an optimally fit
individual 100% of the time, with GA3 and GA4 converging to the optimum 0%
and 98% of the time, respectively. Further assume that the C.V.s for GA1, GA2
and GA5 are 10, 14 and 64, respectively. This leads to rankings of 1, 2, NA,
4* and 3 for GA1-GA5.
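A minimal Python sketch of this ranking rule, reproducing the worked example
above (the code is our reading of the rule, not taken from the study):

    def rank_reliability(gas):
        """gas: dict name -> (percentage of optimal runs, C.V. or None).
        Returns dict name -> rank string."""
        full = sorted((n for n, (p, _) in gas.items() if p == 100),
                      key=lambda n: gas[n][1])          # lower C.V. ranks first
        partial = sorted((n for n, (p, _) in gas.items() if 0 < p < 100),
                         key=lambda n: -gas[n][0])      # higher percentage ranks first
        failed = [n for n, (p, _) in gas.items() if p == 0]

        ranks, next_rank = {}, 1
        for n in full:                                  # 100%-reliable GAs
            ranks[n], next_rank = str(next_rank), next_rank + 1
        for n in partial:                               # partially reliable GAs, marked '*'
            ranks[n], next_rank = str(next_rank) + "*", next_rank + 1
        for n in failed:                                # 0% reliability
            ranks[n] = "NA"
        return ranks

    print(rank_reliability({"GA1": (100, 10), "GA2": (100, 14), "GA3": (0, None),
                            "GA4": (98, None), "GA5": (100, 64)}))
    # {'GA1': '1', 'GA2': '2', 'GA5': '3', 'GA4': '4*', 'GA3': 'NA'}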
Speed.
Speed is primarily defined as the average number of fitness evaluations to
optimal convergence. First, GAs with a primary reliability of 100% are ranked
according to the average number of evaluations executed up to optimal
convergence. Then, those GAs that have, on average, reached a maximum fitness
of 0.95 (or more, but less than 1) are ranked according to the average number
of evaluations executed before a fitness of 0.95 was reached; to mark the
fact that their convergence is near-optimal, an asterisk (*) is attached to
their rankings. Any GA that, on average, failed to reach even the
near-optimal fitness (of 0.95) is given an NA (not applicable) assignment.
For example, assume five GAs, GA1-GA5.
GA1-GA4 all achieved positive reliability (>0%), but only GA1 and GA2
returned 100% reliability, with the rest (GA3 and GA4) achieving 85% and 33%
reliability, respectively. Assume the numbers of evaluations required for
optimal convergence in GA1 and GA2 were (on average) 42545 and 56010,
respectively. Further assume that the numbers of evaluations required for
near-optimal convergence in GA3 and GA4 were (on average) 65050 and 32545,
respectively. Finally, the average maximum fitness of GA5 never reached 0.95.
This leads to the following speed rankings for GA1-GA5: 1, 2, 4*, 3* and NA.
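The speed-ranking rule can be sketched analogously; the code below reproduces
the worked example above (again, our reading of the rule with hypothetical
inputs, not the study's code):

    def rank_speed(gas):
        """gas: dict name -> (reliability %, ave. evals to optimal or None,
        ave. evals to near-optimal or None). Returns dict name -> rank string."""
        optimal = sorted((n for n, (r, e_opt, _) in gas.items() if r == 100),
                         key=lambda n: gas[n][1])       # fewer evaluations ranks first
        near = sorted((n for n, (r, _, e_near) in gas.items()
                       if r < 100 and e_near is not None),
                      key=lambda n: gas[n][2])          # ranked on near-optimal speed
        failed = [n for n in gas if n not in optimal and n not in near]

        ranks, next_rank = {}, 1
        for n in optimal:
            ranks[n], next_rank = str(next_rank), next_rank + 1
        for n in near:                                   # near-optimal only, marked '*'
            ranks[n], next_rank = str(next_rank) + "*", next_rank + 1
        for n in failed:                                 # never reached 0.95 on average
            ranks[n] = "NA"
        return ranks

    print(rank_speed({"GA1": (100, 42545, None), "GA2": (100, 56010, None),
                      "GA3": (85, None, 65050), "GA4": (33, None, 32545),
                      "GA5": (0, None, None)}))
    # {'GA1': '1', 'GA2': '2', 'GA4': '3*', 'GA3': '4*', 'GA5': 'NA'}

The memory-load ranking described next follows the same two-tier procedure,
with ave-max population sizes in place of evaluation counts.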
Memory Load.
Memory load is primarily defined as the average size of the largest
populations encountered during runs that resulted in optimal convergence
(call that ave-max); populations that were evolved after the moment of
optimal convergence are excluded. First, GAs with 100% reliability are ranked
according to their ave-max values; GAs requiring smaller populations receive
better rankings, and vice versa. Then, those GAs that have, on average,
reached a maximum fitness of 0.95 (or more, but less than 1) are ranked
according to the average size of the largest populations encountered during
runs that resulted in near-optimal convergence; to mark the fact that their
convergence is near-optimal, an asterisk (*) is attached to their rankings.
Finally, any GA that, on average, failed to reach even the near-optimal
fitness (of 0.95) is given an NA (not applicable) assignment.
CONCLUSIONS
The most significant contributions of this study are: a) a
standardized means of measuring and comparing parameterless Genetic
Algorithms using three metrics; b) an extensive empirical evaluation and
ranking of five parameterless GAs; and c) a review and classification of all
types of parameterless GAs.
The new metrics we propose are applicable to any
parameterless GA and are platform-independent. Adopting them would facilitate
the comparison of many GAs without having to repeat other people's work. They
are also meaningful, in that they allow GA users to choose those GAs that are
most reliable, fastest, or require the least amount of memory. The five GAs
we evaluated displayed considerable variance in performance, and as a result
we were able to make well-founded recommendations (in the form of rankings)
about which of them should be used, and when. Finally, the short review at
the start of the paper provides a good first reference for researchers who
wish to familiarize themselves with the field of parameterless GAs.
If we were to make recommendations for future work, we
would urge researchers to develop more reliable parameterless GAs that work
for all kinds of fitness surfaces, and not just for one application or
theoretical situation. Indeed, we have shown in this paper that there are
many GAs that need not be manually tuned before use; our greatest hope is
that researchers will make all fundamentally new GAs parameterless, even if
that requires some extra effort on their (our) part.