5 Conclusions
5.1 Feature Selection vs. Feature Weighting
Feature selection is clearly superior to feature weighting in terms of feature reduction. The main reason for this superiority appears to be the number of weight levels involved: feature selection uses only two, whereas feature weighting allows a potentially unlimited number. However, feature weighting can be made as effective as feature selection at eliminating features, for example by forcing all weights below a given threshold to zero, as sketched below.
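As an illustration of this thresholding idea, the following sketch (a hypothetical helper written for this discussion, not code from our experimental system) converts a vector of real-valued feature weights into a binary selection mask by zeroing every weight that falls below a chosen threshold:

import numpy as np

def weights_to_mask(weights, threshold=0.1):
    # Force all weights below `threshold` to zero, turning a
    # feature-weighting solution into a feature-selection mask.
    weights = np.asarray(weights, dtype=float)
    mask = (weights >= threshold).astype(float)
    return weights * mask, mask

# Example: four real-valued weights, two of which fall below the threshold.
weighted, selected = weights_to_mask([0.82, 0.03, 0.47, 0.09], threshold=0.1)
print(weighted)   # [0.82 0.   0.47 0.  ]
print(selected)   # [1. 0. 1. 0.]

The threshold value itself is an assumption; in practice it would have to be chosen (or evolved) alongside the weights.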
In the presence of irrelevant features, feature selection, rather than feature weighting, is the technique best suited to feature reduction.
Furthermore, it is necessary to use some method of feature
selection before a 1-NN or similar nearest-neighbour classifier is
applied, because the performance of such classifiers degrades rapidly in the presence of irrelevant features. Since it has
been shown that GA-based feature selection is effective in
eliminating irrelevant features (here and in the literature), it is
reasonable to try a GA (at least) as a method of feature selection
before classification is attempted using nearest-neighbour
classifiers.
The classification accuracy of a 1-NN classifier does not suffer greatly from the presence of a significant number of redundant features in the (original) feature set. However, because of the other problems associated with redundant features, namely increased dimensionality (and hence computational cost), the arbitrariness of the resulting procedure, and slightly worse classification accuracy, it is recommended that a GA-based feature selection method be used to eliminate as many of the redundant features as possible. The feature selection/weighting methods best suited to such a task are clearly FS and 3FW; the choice between them should depend on the trade-off between the need for real-valued weights and computational efficiency.
Although feature weighting achieves the best training classification accuracies, feature selection generalizes better, and is hence more suited to real-world applications (in which most data is new). This is because FW overfits the training data, losing generalizability in the process. Therefore, it is advisable to use at most 2 (FS) or 3 (3FW) weight levels, as Kohavi et al. (1997) recommend; a simple 3-level quantization is sketched below.
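For concreteness, a 3FW solution can be viewed as a real-valued weight vector quantized onto three levels. The levels used below (0, 0.5 and 1) are illustrative assumptions, not necessarily the levels used in our experiments:

import numpy as np

def quantize_3fw(weights, levels=(0.0, 0.5, 1.0)):
    # Snap each real-valued weight to the nearest of three allowed
    # levels, giving a 3-level feature-weighting (3FW) solution.
    weights = np.asarray(weights, dtype=float)
    levels = np.asarray(levels, dtype=float)
    # For each weight, pick the level with the smallest absolute distance.
    idx = np.abs(weights[:, None] - levels[None, :]).argmin(axis=1)
    return levels[idx]

print(quantize_3fw([0.9, 0.1, 0.6, 0.35]))   # [1.  0.  0.5 0.5]

With only three admissible values per weight, the search space shrinks dramatically relative to unconstrained FW, which is the source of the efficiency and generalization advantages discussed above.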
5.2 Performance of GA-based Feature Selection
GA-based feature selection is a reliable method of locating optimal or near-optimal feature subsets, and it saves time relative to exhaustive search. However, its effective use in large feature spaces depends on the invention of specialized feature selection/weighting methods (e.g., DVeGA) and/or the availability of parallel processors.
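For readers unfamiliar with the wrapper approach, the following is a minimal, self-contained sketch of a GA-based feature selector with a 1-NN wrapper fitness. It is illustrative only: the parameter values, the penalty term, and the plain Python/NumPy implementation are assumptions made for this sketch, not the settings used in our experiments.

import numpy as np

rng = np.random.default_rng(0)

def one_nn_accuracy(mask, X_train, y_train, X_val, y_val):
    # 1-NN accuracy on a validation set, using only the features
    # selected by the binary mask.
    if mask.sum() == 0:
        return 0.0
    Xt, Xv = X_train[:, mask == 1], X_val[:, mask == 1]
    # Pairwise Euclidean distances between validation and training samples.
    d = np.linalg.norm(Xv[:, None, :] - Xt[None, :, :], axis=2)
    pred = y_train[d.argmin(axis=1)]
    return float((pred == y_val).mean())

def fitness(mask, data, alpha=0.01):
    # Wrapper fitness: accuracy minus a small penalty per selected feature.
    return one_nn_accuracy(mask, *data) - alpha * mask.sum()

def ga_feature_selection(data, n_features, pop_size=30, generations=40,
                         p_cross=0.8, p_mut=0.02):
    pop = rng.integers(0, 2, size=(pop_size, n_features))
    for _ in range(generations):
        scores = np.array([fitness(ind, data) for ind in pop])
        def pick():  # binary tournament selection
            i, j = rng.integers(0, pop_size, 2)
            return pop[i] if scores[i] >= scores[j] else pop[j]
        children = []
        while len(children) < pop_size:
            a, b = pick().copy(), pick().copy()
            if rng.random() < p_cross:            # one-point crossover
                cut = int(rng.integers(1, n_features))
                a[cut:], b[cut:] = b[cut:].copy(), a[cut:].copy()
            for child in (a, b):                  # bit-flip mutation
                flips = rng.random(n_features) < p_mut
                child[flips] = 1 - child[flips]
                children.append(child)
        pop = np.array(children[:pop_size])
    scores = np.array([fitness(ind, data) for ind in pop])
    return pop[scores.argmax()]

Here `data` is simply an (X_train, y_train, X_val, y_val) tuple, and `alpha` trades classification accuracy against the number of retained features; both names are ours.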
6 Future Research & Applications
6.1 Research Challenges
We present
below a list of research problems, carefully justified, that we
believe future research in GAs should address. It is our belief that
solutions to such problems will not only help with the automation of
feature selection/weighting in pattern recognition applications, but
will advance GA research in general.
· Research Problem 1.
The cost of evaluating fitness for one whole generation is roughly proportional to the number of individuals in that generation. Hence, the complexity of the fitness function is a critical determinant of the complexity of any GA-based system. The fitness function we will use has at least two core components: accuracy of recognition and complexity of the recognition system. The second component is easy to calculate. Accuracy, however, is more important than complexity, and a precise estimate of it can be achieved only by utilizing (at some point) a large set of actual pattern samples carefully selected from a database of real-world symbols. The goal is to retain authenticity without exploding complexity, a typical engineering trade-off. A common technique, sketched below, is to measure all the features that may be selected by the GA-FS before the start of the run, then store the measurements in memory or in a file (Gaborski & Anderson, 1993).
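A minimal sketch of this precomputation idea follows; the function names, the file name, and the array layout are our own illustrative assumptions rather than the scheme of Gaborski and Anderson (1993):

import numpy as np

def precompute_feature_matrix(samples, feature_extractors, cache_file="features.npy"):
    # Measure every candidate feature for every sample exactly once, and
    # store the resulting (n_samples x n_features) matrix so that fitness
    # evaluation only slices columns instead of re-measuring raw symbols.
    matrix = np.array([[f(s) for f in feature_extractors] for s in samples])
    np.save(cache_file, matrix)
    return matrix

# During the GA run, evaluating a chromosome touches only cached columns:
# X = np.load("features.npy")
# X_subset = X[:, chromosome == 1]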
· Research Problem 2.
The other problem is the dimensionality of the search space. Feature selection, and therefore feature weighting, is NP-complete. Hence, although feature selection has shown very promising results, practical applications are limited by the dimensionality of the solution search space. Moser and Murty (2000) examined the scalability of the Distributed Vertical Genetic Algorithm (DVeGA) to very large-scale feature selection applications with more than 500 features. Their application succeeded in reducing the dimensionality while simultaneously maintaining high accuracy. Crucially, their "experiments showed that GAs scale well to domains of large complexity in feature selection" (Moser & Murty, 2000). We intend to draw on the DVeGA idea in implementing our own GA optimizer, in order to make highly dimensional feature spaces feasible.
· Research Problem 3.
There is a problem common to all GA applications: how do we minimize
the number of parameters (e.g., size of population, probability of
crossover, etc.) that must be set in order to make GAs truly
autonomous optimization and design tools (that require minimal human
intervention)? We intend to use our experimental platform to investigate possible solutions to this problem. These include (a) keeping the type of GA as simple as possible, and (b) allowing the genome to describe as much of the functionality of an individual as possible, including some or all of the GA parameters. However, (b) is a 'solution' with problems of its own, including increased computational cost; a minimal illustration of it is sketched below.
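The sketch below illustrates option (b) under our own assumptions (a single self-adapted mutation rate carried on the chromosome alongside the feature mask); it is not a description of an existing implementation:

import numpy as np

rng = np.random.default_rng(1)

def make_individual(n_features):
    # A chromosome = binary feature mask plus one self-adapted gene
    # holding that individual's own mutation probability.
    return {"mask": rng.integers(0, 2, n_features),
            "p_mut": rng.uniform(0.01, 0.1)}

def mutate(ind):
    # Mutate the mask at the individual's own rate, then perturb the rate
    # itself so that useful rates can be inherited and refined over time.
    flips = rng.random(ind["mask"].size) < ind["p_mut"]
    ind["mask"][flips] = 1 - ind["mask"][flips]
    ind["p_mut"] = float(np.clip(ind["p_mut"] * np.exp(rng.normal(0.0, 0.2)),
                                 1e-3, 0.5))
    return ind

The extra gene removes one externally set parameter, but every individual now carries (and must evaluate the consequences of) its own settings, which is one source of the added computational cost mentioned above.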
· Research Problem 4.
Whereas the reason for tackling research problems 1-3 above is to
build an efficient and effective GA-based optimization system that
is as autonomous as possible, the next step in our research is to
understand how to generalize the lessons gained from the successful
application of the GA optimizer (to a particular symbol set) to new
and different symbol sets. The first symbol set we intend to use is hand-written English characters; next, mathematical notation (see Ahmed, Ward, & Kharma, 2001); and, finally, any (pre-segmented) black & white or gray-scale 2D line drawing. Indeed,
the real power of the GA optimization approach we are proposing will
not be fully realized until the experimental bench starts working
successfully with different symbol sets. Furthermore, it should do
so without recourse to extensive and lengthy trial and error tuning.
· Research Problem 5.
We have shown
that FS, in general, is superior to FW in terms of the number of
eliminated features, as well as accuracy of recognition, especially
in cases where irrelevant/redundant features are present in the
original feature set. However, several combinations of FS and FW could perform better than FS or FW alone. For example, Raymer, Punch, Goodman, Kuhn, and Jain (2000) applied simultaneous feature weighting and selection via a masking technique, and obtained better results on validation samples than with FW alone. It is likely that applying FS simultaneously with FW uncovers nonlinear interactions between features that cannot be found with FS or FW alone. Hence, researchers may wish to investigate this approach further by comparing the application of FS and FW separately and sequentially with the application of FS and FW simultaneously. It is important that, in such a study, the methods used for FS and FW be the same in both cases (sequential and simultaneous). The work of Raymer et al. (2000) is a good example of the simultaneous FS and FW approach to feature set optimization; a minimal masking chromosome is sketched below.
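The encoding below is our own illustration, loosely inspired by the masking idea in Raymer et al. (2000) rather than their exact representation: each feature carries a real-valued weight and a mask bit, and the effective weight is the product of the two.

import numpy as np

rng = np.random.default_rng(2)

def make_chromosome(n_features):
    # Each feature gets a real-valued weight and a mask bit, so selection
    # and weighting are evolved simultaneously on the same chromosome.
    return {"weights": rng.uniform(0.0, 1.0, n_features),
            "mask": rng.integers(0, 2, n_features)}

def effective_weights(chrom):
    # A masked-out feature contributes nothing, regardless of its weight.
    return chrom["weights"] * chrom["mask"]

def weighted_distance(x, y, chrom):
    # Weighted Euclidean distance for use by a nearest-neighbour classifier.
    w = effective_weights(chrom)
    return float(np.sqrt(np.sum(w * (np.asarray(x) - np.asarray(y)) ** 2)))

Comparing a GA that evolves such chromosomes against the sequential application of FS followed by FW, with everything else held fixed, is the kind of controlled experiment proposed above.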
6.2 Potential Applications
The final
practical objective of our research is to build a prototype of a
GA-configurable generic symbol recognition system, which is capable
of achieving a commercial-grade recognition rate (>98%) for (at
least) two different finite 2D symbol sets, such as the English
alphabet and a set of mathematical symbols. The potential industrial
applications are significant. Such a system would, for example, allow an IT company eyeing a new and expanding market of palm-top computer users in Russia to build and configure a character recognition software product for that market relatively quickly, with minimal help from pattern recognition experts.
References
Aha, D. W.
(1992). Tolerating noisy, irrelevant, and novel attributes in
instance-based learning algorithms. International Journal of Man-Machine Studies, 36, 267-287.
Aha, D. W., &
Bankert, R. L. (1994). Feature selection for case-based
classification of cloud types: An empirical comparison. In D. W. Aha
(Ed.) Case-Based Reasoning: Papers from the 1994 Workshop
(Technical Report WS-94-01). Menlo Park, CA: AAAI Press. (NCARAI TR:
AIC-94-011)
Ahmed, M.,
Ward, R., and Kharma, N. (2001). A Knowledge-base System for
Recognizing and Solving Mathematical Problems, in the 1st IEEE International Symposium on Signal Processing and Information
Technology, (to
be held in)
Cairo, Egypt,
December 28-30, 2001.
Brill, F.,
Brown, D., & Martin, W. (1992). Fast genetic selection of features
for neural network classifiers. IEEE Transactions on Neural Networks, 3(2), 324-328.
Dasarathy, B. V. (1991). Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques. Los Alamitos, CA: IEEE Computer Society Press.
Dash, M., & Liu, H. (1997). Feature selection for
classification. Intelligent Data Analysis, 1, 131-156.
Demiroz, G., & Guvenir,
H. A. (1996). Genetic algorithms to learn feature weights for the
nearest neighbor algorithm. In Proceedings of the 6th Belgian-Dutch Conference on Machine Learning (BENELEARN-96), 117-126.
Estevez, P.A., &
Fernandez, M. (1999). Selection of features for the classification
of wood
board defects. In Proceedings of the Ninth International Conference on Artificial Neural Networks, 1, 347-352.
Fung, G., Liu,
J., & Lau, R. (1996). Feature selection in automatic signature
verification based on genetic algorithms. In Proceedings of the International Conference on Neural Information Processing, pages 811-815.
Gaborski, R.
S., & Anderson, P. G. (1993). Genetic algorithm selection of
features for handwritten character identification. In the International Conference on Artificial Neural Networks & Genetic Algorithms (ANNGA 93), Innsbruck, Austria, April 1993.
Geman, S., Bienenstock, E., & Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation, 4, 1-58.
Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley.
Handels, H., Ross, T., Kreusch, J., Wolff, H. H., & Poppl, S.J.
(1999). Feature selection for optimized skin tumor recognition using
genetic algorithms. Artificial Intelligence in Medicine, 16(3), 283-297.
Howe, N., &
Cardie, C. (1997). Examining locally varying weights for nearest
neighbor algorithms. Case-Based Reasoning Research and Development: Second International Conference on Case-Based Reasoning,
D. Leake and E. Plaza, eds., Lecture Notes in Artificial Intelligence, Springer, 455-466, 1997.
Hussein, F., Kharma, N., & Ward, R. (2001). Genetic algorithms for feature selection and weighting. In Proceedings of the Sixth International Conference on Document Analysis and Recognition (ICDAR'01), Seattle, Washington, September 10-13, 2001.
Jain, A. K., Duin, R. P. W., & Mao, J. (2000). Statistical pattern recognition: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1), 4-37.
Kelly, J., & Davis, L.
(1991). A hybrid genetic algorithm for classification. In Proceedings of the Twelfth International Joint Conference on
Artificial Intelligence, pages 645-650.
Kharma,
N., and Ward, R. (1999). Character Recognition Systems for the
Non-expert, in IEEE Canadian Review, 33, Autumn, 1999.
Kim,
G., & Kim, S. (2000). Feature selection using genetic algorithms for
handwritten character recognition. In Proceedings of the Seventh International Workshop on Frontiers in Handwriting Recognition, pages 103-112.
Kohavi, R., &
Sommerfield, D. (1995). Feature subset selection using the wrapper
method: overfitting and dynamic search space topology. Proceedings of the First International Conference on Knowledge Discovery
and Data Mining, KDD'95, Montreal, Canada, pp.
192-197.
Kohavi, R., & John, G. (1997). Wrappers for feature subset selection. Artificial Intelligence (special issue on relevance), 97(1-2), 273-324.
Kohavi, R.,
Langley, P., & Yun, Y. (1997). The utility of feature weighting in
nearest-neighbor algorithms, European Conference on Machine Learning, ECML'97.
Komosinski, M., & Krawiec, K. (2000). Evolutionary weighting of
image features for diagnoses of CNS tumors. Artificial Intelligence in Medicine. 19, 25-38.
Kudo,
M., & Sklansky, J. (2000). Comparison of algorithms that select
features for pattern classifiers. Pattern Recognition, 33(1), 25-41.
Martin-Bautista, M.J., & Vila, M.A. (1998). Applying genetic
algorithms to the feature selection problem in information
retrieval. In Lecture Notes On Artificial Intelligence (LNAI), 1495. Springer-Verlag.
Matthew, W. (1999). GAlib (version 2.4.4). Massachusetts Institute of Technology (MIT). http://lancet.mit.edu/ga/
Michalewicz, Z., & Fogel, D. B. (1998). How to Solve It. Springer-Verlag, pp. 117-124.
Moser, A. (1999). A
distributed vertical genetic algorithm for feature selection. In
Proceedings of the Fifth International Conference on Document
Analysis and Recognition,
Open Research Forum.
Moser, A., & Murty, M.
(2000). On the scalability of genetic algorithms to very large-scale
feature selection. Real World Applications of Evolutionary Computing: Proceedings, 1803:
77-86.
Murphy, P., & Aha, D. (1994). Repository of machine
learning databases. Department of Information and Computer Science,
University of California, Irvine, CA.
http://www.ics.uci.edu/~mlearn/MLRepository
Punch, W.,
Goodman, E., Pei, M., Chia-Shun, L., Hovland, P., & Enbody, R.
(1993). Further research on feature selection and classification
using genetic algorithms. In Proceedings of the Fifth International Conference on Genetic Algorithms, pages 379-383.
Raymer, M.,
Punch, W., Goodman, E., Kuhn, L., & Jain, A. (2000). Dimensionality
reduction using genetic algorithms. IEEE Transactions on Evolutionary Computation, 4(2), 164-171.
Shi, D., Shu,
W., & Liu, H. (1998). Feature selection for handwritten Chinese
character recognition based on genetic algorithms. In Proceedings of the IEEE International Conference on Systems, Man and
Cybernetics,
5, 4201-4206.
Siedlecki, W., & Sklansky, J. (1988). On automatic feature selection. International Journal of Pattern Recognition and Artificial Intelligence, 2, 197-220.
Siedlecki, W., & Sklansky, J. (1989). A note on genetic algorithms for large-scale feature selection. Pattern Recognition Letters, 10, 335-347.
Smith, J.E.,
Fogarty, T.C., & Johnson, I.R. (1994). Genetic selection of features
for clustering and classification. In IEE Colloquium on Genetic Algorithms in Image Processing and Vision.
Vafaie, H., & De Jong, K.
(1993). Robust feature selection algorithms. In
Proceedings of
the IEEE International Conference on Tools with Artificial
Intelligence,
pages 356-366.
Wettschereck,
D., Aha, D. W., & Mohri, T. (1997). A review and empirical
evaluation of feature weighting methods for a class of lazy learning
algorithms. Artificial Intelligence Review, 11, 273-314.
Wilson, D. R., & Martinez, T. R. (1996). Instance-based learning with genetically derived attribute weights. In Proceedings of the International Conference on Artificial Intelligence, Expert Systems and Neural Networks (AIE'96), pp. 11-14.