Review and Empirical Comparative Study
of Parameterless Genetic Algorithms
(cont'd)
EXPERIMENTAL RESULTS AND ANALYSIS
A. Basic Statistical Measures and
Results
In this section we explain the various
statistical measures (defined below) that are used to evaluate the
performance of the five test algorithms. The actual numerical results are
presented in five tables of data, one per test function; see Appendix A,
Tables I-V.
Percentage of Runs to Optimal Fitness:
Each GA was run 30 times. This measure reflects the percentage of runs that
converged to the optimal solution at or before 500,000 (fitness function)
evaluations.
Ave. No. of Evaluations to Best Fitness (and C.V.): This measure represents
the average number of evaluations required for a GA to achieve its best
fitness value in a run. In cases where the best fitness is 1, it serves as a
measure of convergence velocity. Every run produces a different number of
evaluations to best fitness; C.V. (Coefficient of Variation) is the standard
deviation of that set of evaluation counts divided by its average, and serves
as a measure of reliability.
Ave. No. of Evaluations to Near-Optimal Fitness: Near-optimal fitness is
defined as a fitness of 0.95. In cases where optimal fitness is not obtained,
near-optimal fitness is the next best measure of convergence velocity. This
measure is defined in the same way as the preceding measure, except that
near-optimal is substituted for optimal.
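To make these measures concrete, the following Python sketch (not part of the
original study; the data layout and the numerical values are hypothetical)
computes the percentage of runs to optimal fitness, the average number of
evaluations to best fitness together with its C.V., and the average number of
evaluations to near-optimal fitness from a set of per-run records.

    # Hypothetical per-run records: (best fitness reached, evaluations to best
    # fitness, evaluations to near-optimal fitness or None if 0.95 never reached).
    from statistics import mean, stdev

    MAX_EVALS = 500_000  # evaluation budget per run

    runs = [
        (1.00, 42_500, 31_200),
        (1.00, 48_900, 35_700),
        (0.97, 447_800, 61_300),
        # ... one tuple per run, 30 in total
    ]

    def pct_runs_to_optimal(runs):
        """Percentage of runs that reached fitness 1 within the evaluation budget."""
        hits = sum(1 for best, evals, _ in runs if best >= 1.0 and evals <= MAX_EVALS)
        return 100.0 * hits / len(runs)

    def ave_evals_to_best(runs):
        """Average number of evaluations to best fitness and its C.V. (s.d. / mean)."""
        evals = [e for _, e, _ in runs]
        avg = mean(evals)
        return avg, stdev(evals) / avg

    def ave_evals_to_near_optimal(runs):
        """Average evaluations to fitness 0.95, over the runs that reached it."""
        evals = [e for _, _, e in runs if e is not None]
        return mean(evals) if evals else None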
Average Best Fitness (and S.D.): This
is the average of the set of best fitness values achieved in all 30 GA runs;
S.D. is the standard deviation of that set. Naturally, this is a crucial
measure: GAs that reliably achieve a best fitness of 1 are taken seriously,
while those that return best fitnesses of less than 1 (or reach 1 only
inconsistently) are not as good.
Ave. Mean Fitness (and S.D.): This is the average of the set of mean fitness
values recorded at the end of the 30 GA runs; S.D. is the standard deviation
of that set.
Ave. Mean Population Size to Optimal Fitness: In a given run, the size of the
population may differ from one generation to the next until (and after) the
GA converges to the optimal value (if ever). In one run, the average size of
all the populations preceding optimal convergence is called Average
Population Size to Optimal Fitness (or APSOF). Every one of the 30 runs may
return a value for APSOF. The average value for the set of APSOF values is
the Ave. Mean Population Size to Optimal Fitness.
Ave. Max. Population Size to Optimal Fitness: For each GA run, the largest
population size prior to optimal convergence is stored in a set. The mean of
that set is the average maximum population size to optimal fitness.
Ave. Mean Population Size to Near-Optimal Fitness: In a given run, the size
of the population may differ from one generation to the next until (and
after) the GA converges to the near-optimal value of 0.95 (if ever). In one
run, the average size of all the populations preceding near-optimal
convergence is called Average Population Size to Near-Optimal Fitness (or
APSNOF). Every one of the 30 runs may return a value for APSNOF. The average
value for the set of APSNOF values is the Ave. Mean Population Size to
Near-Optimal Fitness.
Ave. Max. Population Size to Near-Optimal Fitness: For each GA run, the
largest population size prior to near-optimal convergence is stored in a set.
The mean of that set is the average maximum population size to near-optimal
fitness.
These measures allow GA users to assess the memory requirements of a given
GA. The smaller the population required to obtain an optimally fit
individual, the better, because smaller populations require less memory; and
memory is still a serious concern when large populations are used for
real-world optimization and design problems.
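The population-size measures can be sketched in the same hypothetical style.
The code below assumes that each run log records the population size of every
generation together with the generation at which optimal (or near-optimal)
fitness was first reached, and it reads "populations preceding convergence"
as all generations up to and including that one; both the data layout and the
sample values are ours, not the study's.

    from statistics import mean

    def mean_pop_to_convergence(pop_sizes, conv_gen):
        """APSOF/APSNOF for one run: average population size up to convergence."""
        return None if conv_gen is None else mean(pop_sizes[:conv_gen + 1])

    def max_pop_to_convergence(pop_sizes, conv_gen):
        """Largest population size encountered before convergence in one run."""
        return None if conv_gen is None else max(pop_sizes[:conv_gen + 1])

    def ave_over_runs(values):
        """Average a per-run measure, ignoring runs that never converged."""
        converged = [v for v in values if v is not None]
        return mean(converged) if converged else None

    # Three hypothetical runs: per-generation population sizes and the
    # generation at which the optimum was first reached (None = never reached).
    runs = [
        ([10, 20, 40, 80, 40], 3),
        ([10, 20, 20, 40, 80], 4),
        ([10, 20, 40, 80, 160], None),
    ]
    ave_mean_pop = ave_over_runs([mean_pop_to_convergence(p, g) for p, g in runs])
    ave_max_pop = ave_over_runs([max_pop_to_convergence(p, g) for p, g in runs])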
B. Performance
Measures and Ranking Tables
In order to compare the performance of
any number of parameterless GAs - not just those used in this paper - one
requires a standardized set of metrics that are a) fair, i.e. algorithm- and
platform-independent, and b) meaningful, i.e. reflective of those properties
of a GA that are relevant to the human user.
To ensure fairness, variables that may
discriminate in favor of a particular type of GA must not be used. For
example, the number of generations is not an acceptable component of a
formula measuring speed of convergence, because some GAs may use big
populations and run them for a small number of generations, while other GAs
may run for somewhat longer periods (in terms of generations) but with much
smaller populations. On the other hand, wall-clock time is not a good measure
of speed, since different computers run at radically different speeds. Nor is
algorithmic complexity, because what actual users (mostly engineers) care
about is not some idealized measure of complexity but a portable measure of
real speed. We therefore use the number of fitness evaluations. This and the
other two measures are explained in detail below.
As to being meaningful, the metrics we
propose are: reliability, speed and memory load. GAs are often accused of
being unreliable, since they are non-deterministic and few things are proven
about their performance - this situation is changing, but slowly. What we can
do, empirically, to remedy this situation is to attach a sound statistical
measure of reliable performance to GAs. As to speed: speed is important for
real-time, near-real-time and on-line applications. Speed is also important
for off-line applications that operate within industrial and/or commercial
environments. Finally, memory load determines whether a computer with
specific physical (and virtual) memory resources is able to run a given GA at
all. If a given computer is slow, a GA will simply take longer to converge;
however, if a GA requires more memory than is available, the computer will
not be able to run the GA at all unless additional memory is added. Together,
speed and memory load are the two core measures of any computational
algorithm. When defining reliability, speed and memory load, we assume that
each algorithm is run (at least) 30 times and that the maximum number of
fitness evaluations allowed during one run is 500,000.
The actual ranking results are
presented in Tables VI-X in Appendix B; they come from applying the formulae
for reliability, speed and memory load (defined below) to the data of Tables
I-V in Appendix A.
Reliability.
Reliability is primarily defined as the percentage of runs that were able to
return an optimally fit individual (within 500,000 evaluations). Of course,
it is possible for more than one GA to achieve 100% reliability. In order to
differentiate between such GAs, we use the C.V. as a further means of
differentiation: the lower the C.V. of a given GA, the more reliable it is,
and vice versa. Once all GAs with 100% reliability are ranked, those GAs with
less than 100% (but more than 0%) reliability are ranked simply according to
their reliability percentage. To mark the fact that their reliability is less
than 100%, an asterisk (*) is attached to their rankings. If a given GA
completely fails to return any optimally fit individuals over all the runs
(reliability = 0%), then it is given an NA (not applicable) designation,
since such a GA is totally unreliable.
For example, assume that 3 of 5
parameterless GAs (call them GA1, GA2 and GA5) return an optimally fit
individual 100% of the time, with GA3 and GA4 converging to the optimum 0%
and 98% of the time, respectively. Further assume that the C.V.s for GA1, GA2
and GA5 are 10, 14 and 64, respectively. This leads to rankings of 1, 2, NA,
4* and 3 for GA1-GA5.
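A minimal Python sketch of this ranking rule, reproducing the worked example
above (the code is our reading of the rule, not taken from the study):

    def rank_reliability(gas):
        """gas: dict name -> (percentage of optimal runs, C.V. or None).
        Returns dict name -> rank string."""
        full = sorted((n for n, (p, _) in gas.items() if p == 100),
                      key=lambda n: gas[n][1])          # lower C.V. ranks first
        partial = sorted((n for n, (p, _) in gas.items() if 0 < p < 100),
                         key=lambda n: -gas[n][0])      # higher percentage ranks first
        failed = [n for n, (p, _) in gas.items() if p == 0]

        ranks, next_rank = {}, 1
        for n in full:                                  # 100%-reliable GAs
            ranks[n], next_rank = str(next_rank), next_rank + 1
        for n in partial:                               # partially reliable GAs, marked '*'
            ranks[n], next_rank = str(next_rank) + "*", next_rank + 1
        for n in failed:                                # 0% reliability
            ranks[n] = "NA"
        return ranks

    print(rank_reliability({"GA1": (100, 10), "GA2": (100, 14), "GA3": (0, None),
                            "GA4": (98, None), "GA5": (100, 64)}))
    # {'GA1': '1', 'GA2': '2', 'GA5': '3', 'GA4': '4*', 'GA3': 'NA'}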
Speed.
Speed is primarily defined as the average number of fitness evaluations to
optimal convergence. First, GAs with a primary reliability of 100% are ranked
according to the average number of evaluations executed up to optimal
convergence. Then, those GAs that have, on average, reached a maximum fitness
of 0.95 (or more, but less than 1) are ranked according to the average number
of evaluations executed before a fitness of 0.95 was reached; to mark the
fact that their convergence is near-optimal, an asterisk (*) is attached to
their rankings. Any GA that, on average, failed to reach even the
near-optimal fitness (of 0.95) is given an NA (not applicable) assignment.
For example, assume five GAs, GA1-GA5.
GA1-GA4 all achieved positive reliability (>0%), but only GA1 and GA2
returned 100% reliability, with the rest (GA3 and GA4) achieving 85% and 33%
reliability, respectively. Assume the numbers of evaluations required for
optimal convergence in GA1 and GA2 were (on average) 42545 and 56010,
respectively. Further assume that the numbers of evaluations required for
near-optimal convergence in GA3 and GA4 were (on average) 65050 and 32545,
respectively. Finally, the average maximum fitness of GA5 never reached 0.95.
This leads to the following speed rankings for GA1-GA5: 1, 2, 4*, 3* and NA.
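The speed-ranking rule can be sketched analogously; the code below reproduces
the worked example above (again, our reading of the rule with hypothetical
inputs, not the study's code):

    def rank_speed(gas):
        """gas: dict name -> (reliability %, ave. evals to optimal or None,
        ave. evals to near-optimal or None). Returns dict name -> rank string."""
        optimal = sorted((n for n, (r, e_opt, _) in gas.items() if r == 100),
                         key=lambda n: gas[n][1])       # fewer evaluations ranks first
        near = sorted((n for n, (r, _, e_near) in gas.items()
                       if r < 100 and e_near is not None),
                      key=lambda n: gas[n][2])          # ranked on near-optimal speed
        failed = [n for n in gas if n not in optimal and n not in near]

        ranks, next_rank = {}, 1
        for n in optimal:
            ranks[n], next_rank = str(next_rank), next_rank + 1
        for n in near:                                   # near-optimal only, marked '*'
            ranks[n], next_rank = str(next_rank) + "*", next_rank + 1
        for n in failed:                                 # never reached 0.95 on average
            ranks[n] = "NA"
        return ranks

    print(rank_speed({"GA1": (100, 42545, None), "GA2": (100, 56010, None),
                      "GA3": (85, None, 65050), "GA4": (33, None, 32545),
                      "GA5": (0, None, None)}))
    # {'GA1': '1', 'GA2': '2', 'GA4': '3*', 'GA3': '4*', 'GA5': 'NA'}

The memory-load ranking described next follows the same two-tier procedure,
with ave-max population sizes in place of evaluation counts.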
Memory Load.
Memory load is primarily defined as the average size of the largest
populations encountered during runs that resulted in optimal convergence
(call that ave-max); populations that were evolved after the moment of
optimal convergence are excluded. First, GAs with 100% reliability are ranked
according to their ave-max values; GAs requiring smaller populations receive
better rankings, and vice versa. Then, those GAs that have, on average,
reached a maximum fitness of 0.95 (or more, but less than 1) are ranked
according to the average size of the largest populations encountered during
runs that resulted in near-optimal convergence; to mark the fact that their
convergence is near-optimal, an asterisk (*) is attached to their rankings.
Finally, any GA that, on average, failed to reach even the near-optimal
fitness (of 0.95) is given an NA (not applicable) assignment.
CONCLUSIONS
The most significant contributions of this study are: a) a
standardized means of measuring and comparing parameterless Genetic
Algorithms using three metrics; b) an extensive empirical evaluation and
ranking of five parameterless GAs; and c) a review and classification of all
types of parameterless GAs.
The new metrics we propose are applicable to any
parameterless GA and are platform-independent. Adopting them would facilitate
the comparison of many GAs without having to repeat other people's work. They
are also meaningful, in that they allow GA users to choose those GAs that are
most reliable, fastest, or require the least amount of memory. The five GAs
we evaluated displayed considerable variance in performance, and as a result
we were able to make well-founded recommendations (in the form of rankings)
about which of them should be used, and when. Finally, the short review at
the start of the paper provides a good first reference for researchers who
wish to familiarize themselves with the field of parameterless GAs.
If we were to make recommendations for future work, we
would urge researchers to develop more reliable parameterless GAs that work
for all kinds of fitness surfaces, and not just for one application or
theoretical situation. Indeed, we have shown in this paper that there are
many GAs that need not be manually tuned before use; our greatest hope is
that researchers will make all fundamentally new GAs parameterless, even if
that requires some extra effort on their (our) part.