FuzzyClass: A family of Fuzzy and Non-Fuzzy probabilistic-based classifiers

Classification is the task of assigning labels or classes to the elements of a data set (Pathak, 2014). Classification methods are used in pattern recognition (Webb, 2003), computational intelligence (Konar, 2006), and decision making (Efraim, 2011), and the difficulties encountered in classification are considered one of the central problems of machine learning. Although many methods exist, all of them have the same goal. Classification in which the class label takes only two values is called binary classification. When the target variable has more than two values, it is called multiclass classification.

Uncertainty and imprecision are sources of problems in modeling and building classifiers. Uncertainty can be modeled by probability theory, and imprecision can be modeled by fuzzy set theory, which was developed by Zadeh (1965). In fuzzy set theory, an element can belong to more than one set simultaneously, each with a degree of membership in the range [0, 1] that determines how much the element belongs to that fuzzy set.
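The idea of partial membership in several sets at once can be made concrete with a small sketch. It is written in Python purely for illustration and is not part of the FuzzyClass package; the triangular shape, the set names, and the breakpoints are all assumptions chosen for the example.

```python
def triangular_membership(x, a, b, c):
    """Degree of membership of x in a triangular fuzzy set.

    The set has support (a, c) and reaches full membership (1.0) at b.
    """
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Two overlapping fuzzy sets on a temperature scale (hypothetical breakpoints).
def cool(t):
    return triangular_membership(t, 10.0, 15.0, 25.0)


def warm(t):
    return triangular_membership(t, 15.0, 25.0, 30.0)


# A single value belongs to both sets, each with its own degree in [0, 1].
t = 18.0
degrees = {"cool": cool(t), "warm": warm(t)}
print(degrees)
```

Here the value 18 is mostly "cool" and partly "warm" at the same time, which a crisp set cannot express.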
Zadeh assumed that imprecision can be modeled using a fuzzy membership function on probability distributions (see Zadeh (1988) for more details). Several classification methods have been proposed using probability theory for fuzzy events (RM. Moraes & Machado, 2006, 2014). Classifiers based on probability and on Zadeh's probability were implemented using the Binomial distribution (RM. Moraes & Machado, 2016a), the Poisson distribution (RM. Moraes & Machado, 2015), the Beta distribution (RM. Moraes, Rodrigues, et al., 2020), the Exponential distribution (RM. Moraes & Machado, 2016b), the Gamma distribution (RM. Moraes et al., 2018), the Gaussian distribution (RM. Moraes & Machado, 2010), the Triangular distribution (R. ...), and the Trapezoidal distribution (Lopes et al., 2023).
These classifiers were implemented in R and made available through a package named FuzzyClass, which is the basis of this article and can be found at https://cran.r-project.org/web/packages/FuzzyClass/. Classifiers such as Naive Bayes, Gaussian Naive Bayes, Bernoulli Naive Bayes, and Poisson Naive Bayes can be found in libraries and software such as scikit-learn (Python), Weka, and the R packages naivebayes and e1071. However, none of them offers fuzzy implementations. All implementations involving fuzzy probability, as well as those based on distributions not covered by the libraries above, are contributions of this package.
It is worth noting that these works were developed in the LabTEVE (http://www.de.ufpb.br/~labteve/) and LEAPIG (http://www.de.ufpb.br/~leapig/) research laboratories, both at the Federal University of Paraiba, Brazil.

Statistical Modeling and Discrimination Measures
The classifiers presented in this paper are divided into those based on distributions for discrete variables and those based on distributions for continuous variables.

Naive Bayes and Fuzzy Naive Bayes
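In this section, the data are assumed to be multivariate random variables, represented by $\mathbf{x}$. Thus, let $\mathbf{x} = \{x_1, x_2, \dots, x_n\}$ be a random vector of data observed in a given sample, with $n$ pieces of information (dimensions/variables) obtained from the training data, and let $w_i$, $i \in \Omega$, be the real class of $\mathbf{x}$. Let $\Omega = \{1, \dots, M\}$ be the set of classes, where $M$ denotes the total number of classes. Assuming that each variable $x_k$ is conditionally independent of any other variable $x_l$ for all $k \neq l \leq n$, the probability of the class $w_i$ is:
$$P(w_i \mid \mathbf{x}) = \frac{1}{S}\, P(w_i) \prod_{k=1}^{n} P(x_k \mid w_i)$$
where $S$ is a normalizing constant that does not depend on the class.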
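The naive Bayes computation described above, a class prior multiplied by the conditional probabilities of each variable and then normalized, can be followed on a small numeric example. The sketch below is in Python and is only illustrative: the priors and conditional probabilities are made-up numbers, and none of the names correspond to functions of the FuzzyClass package.

```python
import math

# Hypothetical priors P(w_i) for M = 3 classes.
priors = {1: 0.5, 2: 0.3, 3: 0.2}

# Hypothetical conditional probabilities P(x_k | w_i) of the observed values
# of n = 2 variables, one list per class.
likelihoods = {
    1: [0.2, 0.5],
    2: [0.6, 0.5],
    3: [0.9, 0.1],
}

# Unnormalized score of each class: P(w_i) * prod_k P(x_k | w_i).
scores = {i: priors[i] * math.prod(likelihoods[i]) for i in priors}

# Normalizing constant: the sum of the scores over all classes.
normalizer = sum(scores.values())
posterior = {i: s / normalizer for i, s in scores.items()}

# The sample is assigned to the class with the largest posterior probability.
predicted = max(posterior, key=posterior.get)
print(posterior, predicted)
```

Because the normalizing constant is the same for every class, the predicted class can be read directly from the unnormalized scores; normalization is needed only when the probabilities themselves are of interest.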

The Fuzzy Naive Bayes Network
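Fuzzy Naive Bayes Networks are based on Zadeh's definition of the probability of fuzzy events (Zadeh, 1968). Thus, let $\mu_{w_i}(x_k)$ be the membership function for the variable $x_k$, the $k$-th of the $n$ variables in $\mathbf{x}$, and the class $w_i$. Zadeh's probability for this class is:
$$P(w_i \mid \mathbf{x}) = \frac{1}{S}\, P(w_i) \prod_{k=1}^{n} P(x_k \mid w_i)\, \mu_{w_i}(x_k)$$
where $S$ is a normalizing constant that does not depend on the class. As the decision criterion, in both the non-fuzzy and the fuzzy case, the classifier assigns the vector $\mathbf{x}$ to the class $\hat{w}$ with the highest probability:
$$\hat{w} = \arg\max_{i \in \Omega} P(w_i \mid \mathbf{x})$$
where $P(w_i \mid \mathbf{x})$ is computed from the probability function or probability density function (pdf) of one of the Binomial, Beta, Exponential, Gamma, Gaussian, Poisson, Triangular, and Trapezoidal distributions.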
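To make the role of the membership function in the decision rule more tangible, the following is a minimal sketch of a Gaussian naive Bayes classifier with an optional fuzzy weighting. It is written in Python and is not the implementation used in FuzzyClass: the bell-shaped membership function, the toy data, and every function name are assumptions made for illustration, and the package may estimate memberships differently.

```python
import math
from collections import defaultdict


def fit(samples, labels):
    """Estimate the prior, means and variances of each class."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    model = {}
    for y, rows in grouped.items():
        n_vars = len(rows[0])
        means = [sum(r[k] for r in rows) / len(rows) for k in range(n_vars)]
        variances = [
            # A small floor avoids division by zero for constant variables.
            max(sum((r[k] - means[k]) ** 2 for r in rows) / len(rows), 1e-9)
            for k in range(n_vars)
        ]
        model[y] = {
            "prior": len(rows) / len(samples),
            "means": means,
            "variances": variances,
        }
    return model


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def membership(x, mean, var):
    # Assumed membership function: bell-shaped, equal to 1 at the class mean.
    return math.exp(-((x - mean) ** 2) / (2 * var))


def log_score(x, params, fuzzy):
    """log P(w_i) + sum_k log[P(x_k | w_i) * mu(x_k)], up to a constant."""
    total = math.log(params["prior"])
    for xk, m, v in zip(x, params["means"], params["variances"]):
        term = gaussian_pdf(xk, m, v)
        if fuzzy:
            term *= membership(xk, m, v)
        total += math.log(max(term, 1e-300))  # guard against log(0)
    return total


def predict(model, x, fuzzy=True):
    """Return the class with the largest (log) score for the sample x."""
    return max(model, key=lambda y: log_score(x, model[y], fuzzy))


# Toy training data: two variables, two well-separated classes.
train = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [4.0, 5.0], [4.2, 4.8], [3.8, 5.2]]
labels = [1, 1, 1, 2, 2, 2]
model = fit(train, labels)
print(predict(model, [1.1, 2.1], fuzzy=True), predict(model, [4.1, 4.9], fuzzy=False))
```

Working with logarithms of the scores avoids numerical underflow when many small probabilities are multiplied, and it does not change which class attains the maximum.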

Motivating examples
Package functions take input arguments, some of which are described below; the others can be consulted in the package's documentation:
• train: a matrix or data frame of training set cases;
• cl: a factor of true classifications of the training set;
• fuzzy: a boolean variable indicating whether or not to use the membership function.
In the example below, an application with real data is presented using data from the paper by RM. Moraes & Machado (2010), applying the Fuzzy Gaussian Naive Bayes classifier, which is named FuzzyGaussianNaiveBayes in the package.
The data presented below were used for performance evaluation in a virtual reality (VR) simulator in that paper. Three classes of performance were defined by the expert and numbered (M=3): correct procedures (1), acceptable procedures (2), and badly executed procedures (3). The feedback given to a trainee could then be: "you are well qualified", "you need some training yet", or "you need more training". Thus, the example has three distinct classes, which are stored in the variable V4.
The fit, stored in fit_FGNB, estimates the distribution parameters (means and variances) and the membership functions. These results can be accessed by the user through fit_FGNB$medias, fit_FGNB$varian, and fit_FGNB$Pertinencias, respectively.
The function predict returns the predicted classes. The probabilities for each sample can be accessed by the user through the input parameter type="matrix".
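The relation between the two kinds of output, a vector of predicted classes and a matrix of per-class probabilities, can be sketched as follows. The code is in Python and is an assumption-laden illustration: the log scores are invented, and the normalization shown here is one common choice, not necessarily how the package builds the matrix returned with type="matrix".

```python
import math


def to_probabilities(log_scores):
    """Turn the per-class log scores of one sample into probabilities summing to 1."""
    shift = max(log_scores.values())  # subtract the maximum for numerical stability
    weights = {c: math.exp(s - shift) for c, s in log_scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


# Hypothetical log scores for two samples and three classes.
log_scores = [
    {1: -2.0, 2: -5.0, 3: -9.0},
    {1: -7.5, 2: -3.1, 3: -3.4},
]

# One row of probabilities per sample (the "matrix" view).
prob_matrix = [to_probabilities(row) for row in log_scores]

# The predicted class of each sample is the column with the largest probability.
classes = [max(row, key=row.get) for row in prob_matrix]
print(classes)
```

The probability matrix carries more information than the class vector: in the second sample, classes 2 and 3 receive similar probabilities, which the predicted label alone does not reveal.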
Through this example, which was also the subject of published articles, the same steps can be followed to apply the classifiers to other data, since the different classifiers follow the same structure for predicting classes. For more detailed help on each classifier, the package manual can be found at the following link: https://cran.r-project.org/web/packages/FuzzyClass/FuzzyClass.pdf.