M. Didkovska, A. Gogolev. Comparative analysis of clustering algorithms
A constant information production rate’s growth leads to disproportionality of "information noise" due to the weak data structuring, inconsistency of formally relevant information and its multiple duplication. Search in systematically updated information space can be simplified by means of categorization. The problem of "information noise" is important for online stores. Their databases contain about half a million products. Prices are constantly changing, some goods are no longer for sale, new items appear. For this reason, online store has to update constantly their databases. And every time during update items must be categorized. Currently there is no unified approach to this problem. In the article the mathematical formulation of goods’ categorization problem is represented, the following stages are pointed out: indexing, classification and evaluation. Experimental study of classifiers (naive Bayes classifier, SVM method and decision tree) has shown that SVM method is the most effective to solve the problem of categorization.
Size 2.9 MB - File type application/pdf

