Wineinformatics: Using the Full Power of the Computational Wine Wheel to Understand 21st Century Bordeaux Wines from the Reviews

Abstract
Wine has been produced for several thousand years, and the ancient beverage remains popular and has become even more affordable in modern times. Among all winemaking regions, Bordeaux, France, is probably one of the most prestigious in history. Because hundreds of wines are produced in Bordeaux each year, humans are unlikely to be able to examine all wines across multiple vintages to define the characteristics of outstanding 21st century Bordeaux wines. Wineinformatics is a newly proposed data science research area that takes wine as its application domain and uses computers to process large amounts of wine data. The goal of this paper is to build a high-quality computational model on wine reviews processed by the full power of the Computational Wine Wheel to understand 21st century Bordeaux wines. On top of the 985 binary attributes generated from the Computational Wine Wheel in our previous research, we use the wheel's CATEGORY and SUBCATEGORY levels to add 14 and 34 continuous attributes, respectively, to the All Bordeaux (14,349 wines) and 1855 Bordeaux (1359 wines) datasets. We believe that successfully merging the original binary attributes with the new continuous attributes can give the Naïve Bayes and Support Vector Machine (SVM) classifiers more information for building a wine grade category prediction model. The experimental results suggest that, for the All Bordeaux dataset, with the additional 14 attributes retrieved from CATEGORY, the Naïve Bayes classification algorithm outperformed the existing research results, increasing accuracy by 2.15%, precision by 8.72%, and the F-score by 1.48%. For the 1855 Bordeaux dataset, with the additional attributes retrieved from CATEGORY and SUBCATEGORY, the SVM classification algorithm outperformed the existing research results, increasing accuracy by 5%, precision by 2.85%, recall by 5.56%, and the F-score by 4.07%.
The improvements demonstrated in this research show that attributes retrieved from CATEGORY and SUBCATEGORY have the power to provide more information to classifiers for superior model generation. The model built in this research can better distinguish outstanding and classic 21st century Bordeaux wines. This paper provides new directions in Wineinformatics for technical research in data science, such as regression and multi-target classification, and for domain-specific research, including wine region terroir analysis, wine quality prediction, and weather impact examination.
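
The merging of binary and continuous attributes described above can be illustrated with a minimal sketch. It rests on assumptions: the abstract does not state how the continuous attributes are computed, so the sketch takes one plausible reading, in which each CATEGORY attribute counts how many of a review's matched attributes fall under that category. The `WHEEL` mapping, its five attribute names, its three category names, and the `feature_vector` helper are all hypothetical stand-ins for the real wheel with its 985 attributes and 14 categories.

```python
from collections import Counter

# Hypothetical excerpt of a wine wheel: normalized attribute -> CATEGORY.
# These names are illustrative only, not taken from the actual wheel.
WHEEL = {
    "BLACKBERRY": "FRUITY",
    "CHERRY": "FRUITY",
    "EARTH": "EARTHY",
    "OAK": "WOOD",
    "VANILLA": "WOOD",
}
ATTRIBUTES = sorted(WHEEL)                 # fixed column order, binary part
CATEGORIES = sorted(set(WHEEL.values()))   # fixed column order, count part


def feature_vector(found):
    """Return binary attribute flags followed by per-category counts.

    `found` is the set of wheel attributes matched in one review.
    Assumption: a category's value is the number of matched attributes
    that belong to it.
    """
    found = set(found) & set(WHEEL)        # ignore anything not in the wheel
    binary = [1 if name in found else 0 for name in ATTRIBUTES]
    counts = Counter(WHEEL[name] for name in found)
    continuous = [counts[cat] for cat in CATEGORIES]
    return binary + continuous


# One review that matched two fruit attributes and one wood attribute.
vec = feature_vector({"BLACKBERRY", "CHERRY", "OAK"})
print(dict(zip(ATTRIBUTES + CATEGORIES, vec)))
```

Vectors built this way could then be passed to a classifier such as Naïve Bayes or an SVM. Because the count columns are not binary, they may need scaling or a suitable likelihood model, a design choice this sketch leaves open.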