Abstract
Football is one of the most followed sports in Turkey and worldwide. Its popularity has drawn the attention of information technology, and advances in data science have made match statistics easy to collect. The most important outcome of a football competition is the match result, which is affected by many different factors (the number of goals scored, the number of cards a team receives, the weather, playing away, etc.). This study uses data from matches played in the Turkish Football Federation Super League during the 2019-2020 and 2020-2021 seasons. Its main purpose is to model whether teams win or lose using classification and decision tree methods. The red and yellow cards received by the home and away teams, the number of foreign players in each team, and the number of goals scored were converted to categorical form and used as independent variables. Based on these variables, whether the home team wins or loses was modeled using Logistic Regression and Decision Tree (CART, QUEST and CHAID) algorithms. Six different models were created within the scope of the study. After comparing the accuracy, sensitivity, specificity and F-score values of these models, the CART decision tree, with an accuracy of 67.6%, was selected as the best model. In this model, the opposing team's red card status and the offensive and defensive strengths were found to be important in determining whether the team wins or loses. The study also shows that machine learning algorithms can be used to model football data.
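The four comparison measures named above can all be derived from a binary confusion matrix. The following is a minimal Python sketch, assuming that "home team wins" is treated as the positive class. The function name `classification_metrics` and the counts in the usage lines are hypothetical illustrations, not values from the study's data.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute accuracy, sensitivity, specificity and F-score.

    tp: wins predicted as wins        fn: wins predicted as losses
    fp: losses predicted as wins      tn: losses predicted as losses
    Assumes every denominator below is non-zero.
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total          # share of all matches classified correctly
    sensitivity = tp / (tp + fn)          # share of actual wins that were detected
    specificity = tn / (tn + fp)          # share of actual losses that were detected
    precision = tp / (tp + fp)            # share of predicted wins that were real wins
    # F-score (F1): harmonic mean of precision and sensitivity
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": f_score,
    }


# Hypothetical counts, chosen only to illustrate the calculation.
metrics = classification_metrics(tp=40, fn=10, fp=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting sensitivity and specificity alongside accuracy is useful because accuracy alone can hide a model that predicts one outcome well and the other poorly.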