
Unlocking College Admissions Using Machine Learning

  • Fatma Elçin Kurnaz

Abstract

Machine learning, a rapidly advancing branch of artificial intelligence, has become essential for uncovering patterns in complex datasets and supporting data-driven decision-making. Its growing relevance in education and admissions systems underscores the need for transparent, efficient, and equitable evaluation methods. This study focuses on supervised learning, a subfield of machine learning in which models are trained on labeled datasets (data with known inputs and corresponding outputs) to make predictions. Specifically, this research employs a feed-forward neural network, a widely used approach for classification tasks, to develop a predictive model from a college admissions dataset. The model analyzes various input features, including academic performance, demographic characteristics, and socioeconomic factors, to forecast the likelihood of admission. Model performance is assessed through accuracy and loss metrics on a held-out validation set, demonstrating that the network can provide reliable predictions. The study not only examines the variables that inform the admissions process but also illustrates the broader potential of machine learning to inform and enhance decision-making in educational institutions.


1. Introduction

College admissions is a complex process influenced by various factors, including academic performance, extracurricular involvement, and personal achievements. Understanding how these factors affect admissions decisions is crucial for creating data-driven models to analyze and predict outcomes. This study develops a machine learning model based on neural networks to predict college admissions. The dataset is sourced from the Kaggle platform [1], and the analysis incorporates insights from influential works in data science, such as Hastie, Tibshirani, and Friedman [2].


The article is organized as follows. We first introduce supervised learning, a key area of machine learning in which models are trained on labeled datasets to make predictions; a labeled dataset is a collection of data in which each example is paired with an associated label or output. The college admissions problem serves as a practical example of this approach. We discuss the structure of supervised learning datasets, explaining key concepts such as examples, features, and labels, and how they apply to our admissions dataset. Next, we delve into neural networks, the chosen method for classification, covering the selection of parameters, the training process, and error minimization. The performance of the model is evaluated on a validation set, where we analyze its accuracy and the role of different factors in determining admissions outcomes. Finally, we conclude with a discussion of the model's findings and its potential applications.


Table 1. College Admission Dataset

To illustrate the dataset, Table 1 presents college admission information for five students. College admissions is a process of evaluating various factors to determine whether an applicant should be accepted into a college or university. A "1" in the final column of the table indicates that the applicant was admitted, while a "0" indicates that the applicant was not. The admissions process often considers factors such as academic performance, extracurricular activities, personal essays, and recommendation letters; each of these elements contributes to the overall evaluation, helping admissions officers assess the applicant's suitability for the institution. Acceptance can also be influenced by economic, educational, sociocultural, and other background factors. Table 1 lists the 10 factors used as the basic criteria for the admission decisions in this dataset.

Supervised learning is a type of machine learning in which a model is trained on labeled data. In this dataset, each applicant is an individual example, and the data for each applicant is divided into features and a label. The label indicates whether the applicant was admitted to college, while the features include "type of school", "school accreditation", "gender", "interest", "residence", "parent age", "parent salary", "house area", "average grades", and "parent was in college" (whether a parent attended college).


2. Methods

In this study, we analyze a college admissions dataset of 1,000 students to predict the likelihood of admission using neural networks, implemented with the Keras library. Keras is a user-friendly deep learning library built on top of frameworks such as TensorFlow; its modular design and simplicity make it well suited for fast prototyping and for building and training neural networks across a wide range of applications. The five students presented in Table 1 serve as examples from the dataset. The following sections describe the analysis process and the results of the investigation.
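To make the setup concrete, here is a minimal sketch of loading and inspecting the dataset with pandas; the CSV file name and the exact column layout are assumptions, since the article does not show this step.

```python
import pandas as pd

# Load the Kaggle dataset [1]; the file name is an assumption.
df = pd.read_csv("go_to_college.csv")

# Inspect the first five applicants -- the rows shown in Table 1.
print(df.head())
print(df.shape)  # expected: (1000, 11) -- 1,000 students, 10 features + label
```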


2.1. Data Preprocessing

To prepare the dataset for analysis, categorical variables were converted into numerical values. In the "type of school" column, "Academic" was replaced with 1 and "Vocational" with 0. Similarly, in the "school accreditation" column, "A" was encoded as 1 and "B" as 0. In the "gender" column, "Male" was replaced with 1 and "Female" with 0. Lastly, for the "residence" column, "Urban" was converted to 1 and "Rural" to 0.
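A minimal sketch of these binary encodings with pandas follows; the column names (type_school, school_accreditation, gender, residence) are assumptions about how the CSV headers are spelled.

```python
import pandas as pd

# Small stand-in rows; the real column names in the CSV are assumptions.
df = pd.DataFrame({
    "type_school": ["Academic", "Vocational"],
    "school_accreditation": ["A", "B"],
    "gender": ["Male", "Female"],
    "residence": ["Urban", "Rural"],
})

# Encode each two-valued categorical column as 0/1, as described above.
df["type_school"] = df["type_school"].map({"Academic": 1, "Vocational": 0})
df["school_accreditation"] = df["school_accreditation"].map({"A": 1, "B": 0})
df["gender"] = df["gender"].map({"Male": 1, "Female": 0})
df["residence"] = df["residence"].map({"Urban": 1, "Rural": 0})
print(df)
```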

In the column "parent was in college", a 1 means that at least one of the student's parents attended college, and a 0 means that neither did. The same encoding applies to the label column "will go to college". These transformations enable the model to process categorical data effectively, ensuring consistency and improving prediction accuracy.

For each student, the "interest" feature takes one of five possible values: "interested", "less interested", "not interested", "uncertain", and "very interested". One-hot encoding, a technique that converts categorical data into a binary matrix in which each category is represented by a unique vector with a single 1 and 0s elsewhere, is applied to this feature. If a student's interest level is "less interested", the new features for this student are 1 for "less interested" and 0 for "interested", "not interested", "very interested", and "uncertain". A similar rule applies to the other interest levels.
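The same transformation can be performed with pandas' built-in one-hot encoder, as in the sketch below; the column name interest and the use of get_dummies are assumptions, since the article does not show its preprocessing code.

```python
import pandas as pd

# Small stand-in for the "interest" column (column name assumed).
df = pd.DataFrame({"interest": ["Less Interested", "Interested", "Uncertain"]})

# One-hot encode: each interest level becomes its own 0/1 column.
encoded = pd.get_dummies(df, columns=["interest"], dtype=int)
print(encoded)
# A "Less Interested" row has 1 in its interest_Less Interested column
# and 0 in every other interest_* column.
```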


Table 2 lists the information for the first five students after the preprocessing steps described in this section.

Table 2. Processed College Admission Dataset

2.2. Binary Classification Problems

Classification is a central concept in both statistics and machine learning, and it can be broadly divided into two main types: binary classification and multiclass classification. In binary classification, each data point is assigned one of two possible labels: 1 or 0. In our dataset, a label of 1 indicates that the student was accepted to the college, while a label of 0 indicates that the student was rejected. The data is thus divided into two distinct categories: category 1 contains students who were admitted to college, and category 0 contains those who were not.

A model designed for binary classification functions as a mapping that takes the input features of an example and returns an output value between 0 and 1. This output, denoted ŷ, represents the model's prediction of the likelihood that the example belongs to category 1. The model then classifies the example based on this value: if ŷ is greater than 0.5, the model predicts category 1 (admit); otherwise, it predicts category 0 (not admit). This approach lets the classification model make probabilistic decisions based on the features provided, allowing for more nuanced predictions.
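In code, this decision rule reduces to a single comparison against 0.5. The sketch below illustrates it on a few made-up predicted probabilities.

```python
import numpy as np

# Example predicted probabilities ŷ from a binary classifier.
y_prob = np.array([0.91, 0.42, 0.63, 0.08])

# Apply the 0.5 decision threshold described above.
y_pred = (y_prob > 0.5).astype(int)
print(y_pred)  # [1 0 1 0] -> admit, not admit, admit, not admit
```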

2.3. Neural Networks

The neural network model is designed to predict whether a student will go to college (will_go_to_college) based on several factors, including school type, accreditation, gender, interest level, residence, parents' age, salary, house area, average grades, and whether the parent went to college.


The model architecture, realized in Keras as sketched in the code after this list, consists of:

  • Input Layer: This layer receives the input features (the columns in the table). These features are likely preprocessed (e.g., one-hot encoded for categorical variables and scaled for numerical variables, as indicated by X_train_scaled).

  • Hidden Layers: Three hidden layers are implemented. These layers are crucial for learning complex, non-linear relationships within the data.

    • The first hidden layer has 16 neurons (Dense(16, activation='relu')).

    • The second hidden layer has 8 neurons (Dense(8, activation='relu')).

    • The third hidden layer has 4 neurons (Dense(4, activation='relu')).

    • Each of these hidden layers utilizes the ReLU (Rectified Linear Unit) activation function. ReLU is computationally efficient and helps address the vanishing gradient problem, which can slow down training in deep networks. It introduces non-linearity, allowing the model to learn complex patterns, and its sparsity can improve generalization.

  • Output Layer: This layer has a single neuron (Dense(1, activation='sigmoid')) with a sigmoid activation function. Because the target variable (will_go_to_college) is binary (True/False), the sigmoid function is ideal. It outputs a probability between 0 and 1, representing the likelihood that a student will go to college.
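Putting these pieces together, a Keras version of the architecture could look like the following sketch. The layer sizes and activations come from the list above; the input width of 14 (9 binary/numeric features plus 5 one-hot interest columns) and the choice of the Adam optimizer are assumptions, as the article does not state them.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Three hidden ReLU layers (16 -> 8 -> 4) and a sigmoid output,
# as described in the list above. The input width of 14 assumes
# 9 binary/numeric features plus 5 one-hot "interest" columns.
model = keras.Sequential([
    layers.Input(shape=(14,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(8, activation="relu"),
    layers.Dense(4, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

# Binary cross-entropy matches the binary label; the Adam optimizer
# is an assumption, as the article does not name one.
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```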


Figure 1. Loss plot

2.4. Binary Cross-Entropy Error

Binary Cross-Entropy (BCE) is a widely used loss function in binary classification tasks. It quantifies the difference between the true label (y) and the predicted probability (ŷ), where y is either 0 or 1, and ŷ lies between 0 and 1. The formula for BCE is:

BCE(y, ŷ) = −[y log(ŷ) + (1 − y) log(1 − ŷ)]

This function has several important properties. First, it is always non-negative: BCE(y, ŷ) ≥ 0. Second, when the prediction perfectly matches the true label (ŷ = y), the BCE is zero, indicating no error. Finally, as ŷ approaches y, the BCE value decreases, reflecting a more accurate prediction.


For instance, suppose the true label is y = 0. If the model predicts ŷ = 0.3, the BCE becomes:

BCE(0, 0.3) = −log(1 − 0.3) = −log(0.7) ≈ 0.36


Now, if the prediction improves to ŷ = 0.1, the BCE reduces to:

BCE(0, 0.1) = −log(1 − 0.1) = −log(0.9) ≈ 0.10


Similarly, consider a case where y = 1. If the predicted probability is ŷ = 0.6, the BCE is:

BCE(1, 0.6) = −log(0.6) ≈ 0.51


However, if the prediction improves to ŷ = 0.8, the BCE becomes:

BCE(1, 0.8) = −log(0.8) ≈ 0.22

These examples demonstrate that more accurate predictions yield smaller BCE values. By penalizing incorrect predictions more heavily, BCE guides the model during training to minimize errors and improve classification performance.
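These worked examples are easy to verify numerically. The short sketch below reproduces all four values using the natural logarithm.

```python
import numpy as np

def bce(y, y_hat):
    """Binary cross-entropy for a single example (natural log)."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Reproduce the four worked examples above.
print(bce(0, 0.3))  # ~0.357
print(bce(0, 0.1))  # ~0.105
print(bce(1, 0.6))  # ~0.511
print(bce(1, 0.8))  # ~0.223
```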


Figure 2. Values of the model

Binary cross-entropy loss appears in the training history under the "loss" and "val_loss" columns, which record the training and validation BCE values at each epoch.

At epoch 0:

Training BCE loss: 0.680897

Validation BCE loss: 0.674166

At epoch 199:

Training BCE loss: 0.084084

Validation BCE loss: 0.294436

The BCE values decrease significantly over the epochs, which indicates that the model is learning effectively. Initially, the loss is high because the predictions are less accurate. As the model trains, the loss decreases, reflecting better alignment between predictions and actual labels.
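These history values are what Keras returns from training. A hedged sketch of how they would be produced, assuming the model and the scaled data splits defined elsewhere in this article, is:

```python
# Train for 200 epochs (epoch indices 0..199, matching the values quoted
# above); `model`, `X_train_scaled`, `y_train`, `X_val_scaled`, and `y_val`
# are assumed to be defined as in the other sketches in this article.
history = model.fit(
    X_train_scaled, y_train,
    validation_data=(X_val_scaled, y_val),
    epochs=200,
    verbose=0,
)

# Training and validation BCE per epoch: the "loss" and "val_loss" columns.
print(history.history["loss"][0], history.history["val_loss"][0])
print(history.history["loss"][-1], history.history["val_loss"][-1])
```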

3. Results

3.1. Dataset Splitting and Model Evaluation

To develop our machine learning model, the dataset was divided into two subsets: the training set and the validation set. The training set contains the examples used to train the model, while the validation set is reserved for evaluating the model's performance during development.

Following standard practices, we allocated 80% of the dataset to the training set and the remaining 20% to the validation set. This split was performed randomly, meaning each example had an 80% probability of being assigned to the training set and a 20% probability of being assigned to the validation set. Both subsets include the features and corresponding labels necessary for training and evaluation.
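A standard way to realize this split, together with the feature scaling implied by the X_train_scaled name used earlier, is sketched below; the use of scikit-learn and the fixed random seed are assumptions.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# X: preprocessed feature matrix, y: 0/1 admission labels; both are
# assumed to come from the preprocessing steps in Section 2.1.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42  # 80/20 random split
)

# Scale features; fit the scaler on the training set only so no
# validation information leaks into training.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)
```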

After training, we evaluated the model on the validation set. The accuracy of the model was calculated as follows:

Accuracy = (number of correct predictions ÷ total number of predictions) × 100%

This metric reflects the percentage of correct predictions made by the model on the validation set, providing an estimate of its performance.

The model achieved an accuracy of 91.5% on the validation set. This result suggests that the model generalizes well to data it has not seen during training. These findings highlight the model's strong potential for reliable predictions, although further testing on unseen data is essential to confirm its generalizability.
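In Keras, this validation accuracy can be read off directly with model.evaluate, assuming the trained model and the validation split from the earlier sketches:

```python
# Evaluate the trained model on the held-out validation set; Keras
# returns the BCE loss and the accuracy metric configured at compile time.
val_loss, val_accuracy = model.evaluate(X_val_scaled, y_val, verbose=0)
print(f"Validation accuracy: {val_accuracy:.1%}")  # reported: 91.5%
```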

4. Conclusion

A machine learning model was developed using a multi-layer perceptron built with the TensorFlow and Keras libraries to predict the likelihood of college admission for applicants. The dataset was split into a training set (80%) and a validation set (20%) to ensure both effective learning and reliable testing. The model achieved an accuracy of approximately 0.91, performing well on both the training and validation datasets. This stable performance across both sets suggests that the model learns meaningful patterns without severe overfitting. These results highlight the model's reliability and its potential to predict admission outcomes for new applicants.


5. References

1. Kaggle, College Admissions ("Go to College") Dataset, Kaggle.com. Retrieved from https://www.kaggle.com/datasets/saddamazyazy/go-to-college-dataset.

2. Hastie, T., Tibshirani, R., and Friedman, J., The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.), Springer, 2009.

3. Bogdan O., Radu D., and Sorin C., Predicting Students' Results in Higher Education Using Neural Networks, Proceedings of the International Conference on Applied Information and Communication Technologies (AICT2013), April 25–26, 2013, Jelgava, Latvia. Retrieved from http://aict.itf.llu.lv.

4. Srivastava, S., Understanding the Difference Between ReLU and Sigmoid Activation Functions in Deep Learning, Medium. Retrieved from https://medium.com/@srivastavashivansh8922/understanding-the-difference-between-relu-and-sigmoid-activation-functions-in-deep-learning-33b280fc2071.
