5 Conclusions
5.1 Feature Selection vs. Feature Weighting
Feature selection is clearly superior to feature weighting in terms of feature reduction. The main reason for this superiority appears to be the number of weight levels involved: feature selection uses only two, whereas feature weighting allows a potentially unlimited number. However, feature weighting can be made as effective as feature selection at eliminating features, for example by forcing all weights below a given threshold to zero, as sketched below.
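As an illustration of this thresholding idea, the following sketch (a hypothetical helper written for this discussion, not code from our experimental system) converts a vector of real-valued feature weights into a binary selection mask by zeroing every weight that falls below a chosen threshold:

import numpy as np

def weights_to_mask(weights, threshold=0.1):
    # Force all weights below `threshold` to zero, turning a
    # feature-weighting solution into a feature-selection mask.
    weights = np.asarray(weights, dtype=float)
    mask = (weights >= threshold).astype(float)
    return weights * mask, mask

# Example: four real-valued weights, two of which fall below the threshold.
weighted, selected = weights_to_mask([0.82, 0.03, 0.47, 0.09], threshold=0.1)
print(weighted)   # [0.82 0.   0.47 0.  ]
print(selected)   # [1. 0. 1. 0.]

The threshold value itself is an assumption; in practice it would have to be chosen (or evolved) alongside the weights.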
In the presence of irrelevant features, feature selection, rather than feature weighting, is the technique best suited to feature reduction.
Furthermore, it is necessary to use some method of feature
selection before a 1-NN or similar nearest-neighbour classifier is
applied, because the performance of such classifiers degrades rapidly in the presence of irrelevant features. Since it has
been shown that GA-based feature selection is effective in
eliminating irrelevant features (here and in the literature), it is
reasonable to try a GA (at least) as a method of feature selection
before classification is attempted using nearest-neighbour
classifiers.
The classification accuracy of a 1-NN classifier does not suffer greatly from the presence of a significant number of redundant features in the (original) feature set. However, because of the other problems associated with redundant features, namely increased dimensionality (and hence computational cost), the arbitrariness of the resulting procedure, and slightly worse classification accuracy, it is recommended that a GA-based feature selection method be used to eliminate as many of the redundant features as possible. The feature selection/weighting methods best suited to such a task are clearly FS and 3FW; the choice between them should depend on the trade-off between the need for real-valued weights and computational efficiency.
Although feature weighting achieves the best training classification accuracies, feature selection generalizes better, and is hence more suited to real-world applications (in which most data is new). This is because FW overfits the training data, losing generalizability in the process. Therefore, it is advisable to use at most 2 (FS) or 3 (3FW) weight levels, as Kohavi et al. (1997) recommend; a simple 3-level quantization is sketched below.
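For concreteness, a 3FW solution can be viewed as a real-valued weight vector quantized onto three levels. The levels used below (0, 0.5 and 1) are illustrative assumptions, not necessarily the levels used in our experiments:

import numpy as np

def quantize_3fw(weights, levels=(0.0, 0.5, 1.0)):
    # Snap each real-valued weight to the nearest of three allowed
    # levels, giving a 3-level feature-weighting (3FW) solution.
    weights = np.asarray(weights, dtype=float)
    levels = np.asarray(levels, dtype=float)
    # For each weight, pick the level with the smallest absolute distance.
    idx = np.abs(weights[:, None] - levels[None, :]).argmin(axis=1)
    return levels[idx]

print(quantize_3fw([0.9, 0.1, 0.6, 0.35]))   # [1.  0.  0.5 0.5]

With only three admissible values per weight, the search space shrinks dramatically relative to unconstrained FW, which is the source of the efficiency and generalization advantages discussed above.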
5.2 Performance of GA-based Feature Selection
GA-based feature selection is a reliable method of locating optimal or near-optimal feature subsets, and it saves time relative to exhaustive search. However, its effective use in large feature spaces depends on the invention of specialized feature selection/weighting methods (e.g., DVeGA) and/or the availability of parallel processors.
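For readers unfamiliar with the wrapper approach, the following is a minimal, self-contained sketch of a GA-based feature selector with a 1-NN wrapper fitness. It is illustrative only: the parameter values, the penalty term, and the plain Python/NumPy implementation are assumptions made for this sketch, not the settings used in our experiments.

import numpy as np

rng = np.random.default_rng(0)

def one_nn_accuracy(mask, X_train, y_train, X_val, y_val):
    # 1-NN accuracy on a validation set, using only the features
    # selected by the binary mask.
    if mask.sum() == 0:
        return 0.0
    Xt, Xv = X_train[:, mask == 1], X_val[:, mask == 1]
    # Pairwise Euclidean distances between validation and training samples.
    d = np.linalg.norm(Xv[:, None, :] - Xt[None, :, :], axis=2)
    pred = y_train[d.argmin(axis=1)]
    return float((pred == y_val).mean())

def fitness(mask, data, alpha=0.01):
    # Wrapper fitness: accuracy minus a small penalty per selected feature.
    return one_nn_accuracy(mask, *data) - alpha * mask.sum()

def ga_feature_selection(data, n_features, pop_size=30, generations=40,
                         p_cross=0.8, p_mut=0.02):
    pop = rng.integers(0, 2, size=(pop_size, n_features))
    for _ in range(generations):
        scores = np.array([fitness(ind, data) for ind in pop])
        def pick():  # binary tournament selection
            i, j = rng.integers(0, pop_size, 2)
            return pop[i] if scores[i] >= scores[j] else pop[j]
        children = []
        while len(children) < pop_size:
            a, b = pick().copy(), pick().copy()
            if rng.random() < p_cross:            # one-point crossover
                cut = int(rng.integers(1, n_features))
                a[cut:], b[cut:] = b[cut:].copy(), a[cut:].copy()
            for child in (a, b):                  # bit-flip mutation
                flips = rng.random(n_features) < p_mut
                child[flips] = 1 - child[flips]
                children.append(child)
        pop = np.array(children[:pop_size])
    scores = np.array([fitness(ind, data) for ind in pop])
    return pop[scores.argmax()]

Here `data` is simply an (X_train, y_train, X_val, y_val) tuple, and `alpha` trades classification accuracy against the number of retained features; both names are ours.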
6 Future Research & Applications
6.1 Research Challenges
We present
below a list of research problems, carefully justified, that we
believe future research in GAs should address. It is our belief that
solutions to such problems will not only help with the automation of
feature selection/weighting in pattern recognition applications, but
will advance GA research in general.
· Research Problem 1.
The cost of evaluating fitness for one whole generation is roughly proportional to the number of individuals in that generation. Hence, the complexity of the fitness function is a critical determinant of the complexity of any GA-based system. The fitness function we will use has at least two core components: accuracy of recognition and complexity of the recognition system. The second component is easy to calculate. Accuracy, however, is more important than complexity, and a precise estimate of it can be achieved only by utilizing (at some point) a large set of actual pattern samples carefully selected from a database of real-world symbols. The goal is to retain authenticity without exploding complexity, a typical engineering trade-off. A common technique, sketched below, is to measure all the features that may be selected by the GA-FS before the start of the run, then store the measurements in memory or in a file (Gaborski & Anderson, 1993).
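A minimal sketch of this precomputation idea follows; the function names, the file name, and the array layout are our own illustrative assumptions rather than the scheme of Gaborski and Anderson (1993):

import numpy as np

def precompute_feature_matrix(samples, feature_extractors, cache_file="features.npy"):
    # Measure every candidate feature for every sample exactly once, and
    # store the resulting (n_samples x n_features) matrix so that fitness
    # evaluation only slices columns instead of re-measuring raw symbols.
    matrix = np.array([[f(s) for f in feature_extractors] for s in samples])
    np.save(cache_file, matrix)
    return matrix

# During the GA run, evaluating a chromosome touches only cached columns:
# X = np.load("features.npy")
# X_subset = X[:, chromosome == 1]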
· Research Problem 2.
The other problem is the dimensionality of the search space. Feature selection, and therefore feature weighting, is NP-complete. Hence, although feature selection has shown very promising results, practical applications are limited by the dimensionality of the solution search space. Moser and Murty (2000) examined the scalability of the Distributed Vertical Genetic Algorithm (DVeGA) to very large-scale feature selection applications with more than 500 features. Their application succeeded in reducing the dimensionality while simultaneously maintaining high accuracy. Crucially, their "experiments showed that GAs scale well to domains of large complexity in feature selection" (Moser & Murty, 2000). We intend to draw on the DVeGA idea in implementing our own GA optimizer, in order to make highly dimensional feature spaces feasible.
· Research Problem 3.
There is a problem common to all GA applications: how do we minimize
the number of parameters (e.g., size of population, probability of
crossover, etc.) that must be set in order to make GAs truly
autonomous optimization and design tools (that require minimal human
intervention)? We intend to use our experimental platform to investigate possible solutions to this problem. These include (a) keeping the type of GA as simple as possible, and (b) allowing the genome to describe as much of the functionality of an individual as possible, including some or all of the GA parameters. However, (b) is a 'solution' with problems of its own, including increased computational cost; a minimal illustration of it is sketched below.
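The sketch below illustrates option (b) under our own assumptions (a single self-adapted mutation rate carried on the chromosome alongside the feature mask); it is not a description of an existing implementation:

import numpy as np

rng = np.random.default_rng(1)

def make_individual(n_features):
    # A chromosome = binary feature mask plus one self-adapted gene
    # holding that individual's own mutation probability.
    return {"mask": rng.integers(0, 2, n_features),
            "p_mut": rng.uniform(0.01, 0.1)}

def mutate(ind):
    # Mutate the mask at the individual's own rate, then perturb the rate
    # itself so that useful rates can be inherited and refined over time.
    flips = rng.random(ind["mask"].size) < ind["p_mut"]
    ind["mask"][flips] = 1 - ind["mask"][flips]
    ind["p_mut"] = float(np.clip(ind["p_mut"] * np.exp(rng.normal(0.0, 0.2)),
                                 1e-3, 0.5))
    return ind

The extra gene removes one externally set parameter, but every individual now carries (and must evaluate the consequences of) its own settings, which is one source of the added computational cost mentioned above.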
· Research Problem 4.
Whereas the reason for tackling research problems 1-3 above is to
build an efficient and effective GA-based optimization system that
is as autonomous as possible, the next step in our research is to
understand how to generalize the lessons gained from the successful
application of the GA optimizer (to a particular symbol set) to new
and different symbol sets. The first symbol set we intend to use is hand-written English characters; next, mathematical notation (see Ahmed, Ward, & Kharma, 2001); and, finally, any (pre-segmented) black & white or gray-scale 2D line drawing. Indeed,
the real power of the GA optimization approach we are proposing will
not be fully realized until the experimental bench starts working
successfully with different symbol sets. Furthermore, it should do
so without recourse to extensive and lengthy trial and error tuning.
· Research Problem 5.
We have shown
that FS, in general, is superior to FW in terms of the number of
eliminated features, as well as accuracy of recognition, especially
in cases where irrelevant/redundant features are present in the
original feature set. However, several combinations of FS and FW could perform better than FS or FW alone. For example, Raymer, Punch, Goodman, Kuhn, and Jain (2000) applied simultaneous feature weighting and selection via a masking technique, and obtained better results on validation samples than with FW alone. It is likely that applying FS simultaneously with FW uncovers nonlinear interactions between features that cannot be found with FS or FW alone. Hence, researchers may wish to investigate this approach further by comparing the application of FS and FW separately and sequentially with the application of FS and FW simultaneously. It is important that, in such a study, the methods used for FS and FW be the same in both cases (sequential and simultaneous). The work of Raymer et al. (2000) is a good example of the simultaneous FS and FW approach to feature set optimization; a minimal masking chromosome is sketched below.
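The encoding below is our own illustration, loosely inspired by the masking idea in Raymer et al. (2000) rather than their exact representation: each feature carries a real-valued weight and a mask bit, and the effective weight is the product of the two.

import numpy as np

rng = np.random.default_rng(2)

def make_chromosome(n_features):
    # Each feature gets a real-valued weight and a mask bit, so selection
    # and weighting are evolved simultaneously on the same chromosome.
    return {"weights": rng.uniform(0.0, 1.0, n_features),
            "mask": rng.integers(0, 2, n_features)}

def effective_weights(chrom):
    # A masked-out feature contributes nothing, regardless of its weight.
    return chrom["weights"] * chrom["mask"]

def weighted_distance(x, y, chrom):
    # Weighted Euclidean distance for use by a nearest-neighbour classifier.
    w = effective_weights(chrom)
    return float(np.sqrt(np.sum(w * (np.asarray(x) - np.asarray(y)) ** 2)))

Comparing a GA that evolves such chromosomes against the sequential application of FS followed by FW, with everything else held fixed, is the kind of controlled experiment proposed above.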
6.2 Potential Applications
The final
practical objective of our research is to build a prototype of a
GA-configurable generic symbol recognition system, which is capable
of achieving a commercial-grade recognition rate (>98%) for (at
least) two different finite 2D symbol sets, such as the English
alphabet and a set of mathematical symbols. The potential industrial
applications are significant. Such a system would, for example, allow an IT company eyeing a new and expanding market of palm-top computer users in Russia to build and configure a character recognition software product for that market relatively quickly, with minimal help from pattern recognition experts.
References
Aha, D. W.
(1992). Tolerating noisy, irrelevant, and novel attributes in
instance-based learning algorithms. International Journal of Man-Machine Studies, 36, 267-287.
Aha, D. W., &
Bankert, R. L. (1994). Feature selection for case-based
classification of cloud types: An empirical comparison. In D. W. Aha
(Ed.) Case-Based Reasoning: Papers from the 1994 Workshop
(Technical Report WS-94-01). Menlo Park, CA: AAAI Press. (NCARAI TR:
AIC-94-011)
Ahmed, M.,
Ward, R., and Kharma, N. (2001). A Knowledge-base System for
Recognizing and Solving Mathematical Problems, in the 1st IEEE International Symposium on Signal Processing and Information
Technology, (to
be held in)
Cairo, Egypt,
December 28-30, 2001.
Brill, F.,
Brown, D., & Martin, W. (1992). Fast genetic selection of features
for neural network classifiers. IEEE Transactions on Neural Networks, 3(2), 324-328.
Dasarathy, B. V. (1991). Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques. Los Alamitos, CA: IEEE Computer Society Press.
Dash, M., & Liu, H. (1997). Feature selection for
classification. Intelligent Data Analysis, 1, 131-156.
Demiroz, G., & Guvenir,
H. A. (1996). Genetic algorithms to learn feature weights for the
nearest neighbor algorithm. In Proceedings of the 6th Belgian-Dutch Conference on Machine Learning (BENELEARN-96), 117-126.
Estevez, P.A., &
Fernandez, M. (1999). Selection of features for the classification
of wood
board defects. In Proceedings of the Ninth International Conference on Artificial Neural Networks, 1, 347-352.
Fung, G., Liu,
J., & Lau, R. (1996). Feature selection in automatic signature
verification based on genetic algorithms. In Proceedings of the International Conference on Neural Information Processing, pages 811-815.
Gaborski, R.
S., & Anderson, P. G. (1993). Genetic algorithm selection of
features for handwritten character identification. In the International Conference on Artificial Neural Networks & Genetic Algorithms (ANNGA 93), Innsbruck, Austria, April 1993.
Geman, S., Bienenstock, E., & Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation, 4, 1-58.
Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley.
Handels, H., Ross, T., Kreusch, J., Wolff, H. H., & Poppl, S.J.
(1999). Feature selection for optimized skin tumor recognition using
genetic algorithms. Artificial Intelligence in Medicine, 16(3), 283-297.
Howe, N., &
Cardie, C. (1997). Examining locally varying weights for nearest
neighbor algorithms. Case-Based Reasoning Research and Development: Second International Conference on Case-Based Reasoning,
D. Leake and E. Plaza, eds., Lecture Notes in Artificial Intelligence, Springer, 455-466, 1997.
Hussein, F., Kharma, N., & Ward, R. (2001). Genetic algorithms for feature selection and weighting. In Proceedings of the Sixth International Conference on Document Analysis and Recognition (ICDAR'01), Seattle, Washington, September 10-13, 2001.
Jain, A. K., Duin, R. P. W., & Mao, J. (2000). Statistical pattern recognition: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1), 4-37.
Kelly, J., & Davis, L.
(1991). A hybrid genetic algorithm for classification. In Proceedings of the Twelfth International Joint Conference on
Artificial Intelligence, pages 645-650.
Kharma,
N., and Ward, R. (1999). Character Recognition Systems for the
Non-expert, in IEEE Canadian Review, 33, Autumn, 1999.
Kim,
G., & Kim, S. (2000). Feature selection using genetic algorithms for
handwritten character recognition. In Proceedings of the Seventh International Workshop on Frontiers in Handwriting Recognition, pages 103-112.
Kohavi, R., &
Sommerfield, D. (1995). Feature subset selection using the wrapper
method: overfitting and dynamic search space topology. Proceedings of the First International Conference on Knowledge Discovery
and Data Mining, KDD'95, Montreal, Canada, pp.
192-197.
Kohavi, R., & John, G. (1997). Wrappers for feature subset selection. Artificial Intelligence (special issue on relevance), 97(1-2), 273-324.
Kohavi, R.,
Langley, P., & Yun, Y. (1997). The utility of feature weighting in
nearest-neighbor algorithms, European Conference on Machine Learning, ECML'97.
Komosinski, M., & Krawiec, K. (2000). Evolutionary weighting of
image features for diagnoses of CNS tumors. Artificial Intelligence in Medicine. 19, 25-38.
Kudo,
M., & Sklansky, J. (2000). Comparison of algorithms that select
features for pattern classifiers. Pattern Recognition, 33(1), 25-41.
Martin-Bautista, M.J., & Vila, M.A. (1998). Applying genetic
algorithms to the feature selection problem in information
retrieval. In Lecture Notes On Artificial Intelligence (LNAI), 1495. Springer-Verlag.
Matthew, W. (1999). GAlib (version 2.4.4). Massachusetts Institute of Technology (MIT). http://lancet.mit.edu/ga/
Michalewicz, Z., & Fogel, D. B. (1998). How to Solve It. Springer-Verlag, pp. 117-124.
Moser, A. (1999). A
distributed vertical genetic algorithm for feature selection. In
Proceedings of the Fifth International Conference on Document
Analysis and Recognition,
Open Research Forum.
Moser, A., & Murty, M.
(2000). On the scalability of genetic algorithms to very large-scale
feature selection. Real World Applications of Evolutionary Computing: Proceedings, 1803:
77-86.
Murphy, P., & Aha, D. (1994). Repository of machine
learning databases. Department of Information and Computer Science,
University of California, Irvine, CA.
http://www.ics.uci.edu/~mlearn/MLRepository
Punch, W.,
Goodman, E., Pei, M., Chia-Shun, L., Hovland, P., & Enbody, R.
(1993). Further research on feature selection and classification
using genetic algorithms. In Proceedings of the Fifth International Conference on Genetic Algorithms, pages 379-383.
Raymer, M.,
Punch, W., Goodman, E., Kuhn, L., & Jain, A. (2000). Dimensionality
reduction using genetic algorithms. IEEE Transactions on Evolutionary Computation, 4(2), 164-171.
Shi, D., Shu,
W., & Liu, H. (1998). Feature selection for handwritten Chinese
character recognition based on genetic algorithms. In Proceedings of the IEEE International Conference on Systems, Man and
Cybernetics,
5, 4201-4206.
Siedlecki, W., & Sklansky, J. (1988). On automatic feature selection. International Journal of Pattern Recognition and Artificial Intelligence, 2, 197-220.
Siedlecki, W., & Sklansky, J. (1989). A note on genetic algorithms for large-scale feature selection. Pattern Recognition Letters, 10, 335-347.
Smith, J.E.,
Fogarty, T.C., & Johnson, I.R. (1994). Genetic selection of features
for clustering and classification. In IEE Colloquium on Genetic Algorithms in Image Processing and Vision.
Vafaie, H., & De Jong, K.
(1993). Robust feature selection algorithms. In
Proceedings of
the IEEE International Conference on Tools with Artificial
Intelligence,
pages 356-366.
Wettschereck,
D., Aha, D. W., & Mohri, T. (1997). A review and empirical
evaluation of feature weighting methods for a class of lazy learning
algorithms. Artificial Intelligence Review, 11, 273-314.
Wilson, D. R., & Martinez, T. R. (1996). Instance-based learning with genetically derived attribute weights. In Proceedings of the International Conference on Artificial Intelligence, Expert Systems and Neural Networks (AIE'96), pp. 11-14.