Susan Currie Sivek

Writer, researcher, hiker, knitter. Data Science Journalist for Alteryx, Inc. Former journalism professor. Curious about everything.

Oct 3, 2020
Published on: Towards Data Science

Photo by MD Duran on Unsplash

Caps, gowns, diplomas … and data!

Each student’s journey through a higher education institution creates lots of data. Recruitment, advising, retention, financial aid, administrative processes, assessment measures, course work, athletics and alumni activities all can be tracked in detail.

That data can be put to work in predictive models that advance institutional goals and aid student success. Beyond the better-known use cases, here are two more innovative ways researchers have used machine learning to make predictions in the world of higher ed. There are challenges, of course, but predictive analytics can provide insights into all kinds of higher ed data.

KISS: Keep It Simple for Students (… and Models)

With many colleges and universities primarily teaching online right now, students are facing unusual learning challenges. Online learning management systems (LMS) offer tons of data on how students engage with course activities, online resources and each other. But which data best predict which students may struggle, and which models offer the most utility?

A team of researchers gathered data from Moodle, a popular LMS, across four semesters of an online introductory computer programming course. The data included students’ “cognitive interactions” with course content, their “social interactions” with each other, and their “teaching interactions” with the instructor; the researchers thought these categories might have differing predictive power. They also collected additional data, such as each student’s total number of LMS interactions, and gave students a questionnaire about motivation and demographics. Finally, they built new features, including a “commitment factor”: the ratio of a student’s weekly total of interactions to the weekly average for all students in the class.
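To make the “commitment factor” concrete, here is a minimal sketch of that ratio in Python. The function name, the dictionary layout and the interaction totals are invented for illustration; the researchers’ actual implementation isn’t described here.

```python
from statistics import mean

def commitment_factors(weekly_totals):
    """Map each student to (their weekly interactions) / (class average)."""
    class_average = mean(weekly_totals.values())
    if class_average == 0:
        # No one interacted this week; the ratio is undefined, so report 0.0.
        return {student: 0.0 for student in weekly_totals}
    return {student: total / class_average
            for student, total in weekly_totals.items()}

# Made-up weekly interaction totals for three hypothetical students.
week_3 = {"student_a": 30, "student_b": 10, "student_c": 20}
factors = commitment_factors(week_3)
print(factors)  # student_a: 1.5, student_b: 0.5, student_c: 1.0
```

A value above 1 means the student was more active than the class average that week; a value below 1 means less active.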

Photo by Iris Wang on Unsplash

With all this intriguing data on hand, the researchers tested 13 different combinations of data and six different predictive algorithms to see which would best identify students at risk of dropping out of the course or failing by the eighth week.

Surprisingly, despite trying to develop new ways to examine students’ data, they found that “the simple counting of interactions can be used to generate predictive models,” though other research had suggested this approach might not be sophisticated enough. Their top-performing model for predicting at-risk students was an AdaBoost classifier trained on total counts of all student interactions, and the second-best model also used AdaBoost with the same counts plus the “commitment factor” feature. Even the student questionnaire didn’t improve the models beyond these few simple data points.
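For readers curious about what AdaBoost does with a single feature like total interaction counts, here is a rough, standard-library-only sketch. It is not the researchers’ model: the counts and labels are invented, the weak learners are simple one-threshold “stumps,” and the function names are hypothetical. In practice, a library implementation such as scikit-learn’s AdaBoostClassifier would be the usual choice.

```python
import math

def train_adaboost(counts, labels, rounds=3):
    """Boost threshold stumps on one feature. Labels: +1 at risk, -1 not."""
    n = len(counts)
    weights = [1.0 / n] * n
    model = []  # each entry: (alpha, threshold, polarity)
    for _ in range(rounds):
        best = None
        for threshold in sorted(set(counts)):
            for polarity in (1, -1):
                # A stump predicts `polarity` at or below the threshold,
                # and the opposite label above it.
                preds = [polarity if c <= threshold else -polarity
                         for c in counts]
                err = sum(w for w, p, y in zip(weights, preds, labels)
                          if p != y)
                if best is None or err < best[0]:
                    best = (err, threshold, polarity, preds)
        err, threshold, polarity, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) below
        alpha = 0.5 * math.log((1 - err) / err)  # this stump's vote strength
        model.append((alpha, threshold, polarity))
        # Misclassified students get more weight in the next round.
        weights = [w * math.exp(-alpha * y * p)
                   for w, p, y in zip(weights, preds, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def predict_at_risk(model, count):
    """Weighted vote of all stumps: +1 means predicted at risk."""
    score = sum(alpha * (polarity if count <= threshold else -polarity)
                for alpha, threshold, polarity in model)
    return 1 if score > 0 else -1

# Invented data: total LMS interactions and whether the student struggled.
counts = [4, 9, 15, 22, 40, 55, 70, 85]
labels = [1, 1, 1, -1, 1, -1, -1, -1]
model = train_adaboost(counts, labels, rounds=3)
predictions = [predict_at_risk(model, c) for c in counts]
```

No single threshold separates this toy data cleanly (a student with 22 interactions succeeded while one with 40 struggled), but the weighted vote of three stumps classifies every training example correctly. That says nothing about how such a model would perform on real students.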

“We are able to conclude that a more structured course, with dozens of materials, best fits the students’ needs, because they can have good interactions with the course and, consequently, succeed. It also seems that student interaction means engagement, and more engagement leads students to succeed,” the researchers wrote.

While it seems like a no-brainer — build a robust online course, and students are more likely to succeed! — these results are helpful for those wanting to try out learning analytics and prediction themselves. You don’t necessarily have to build a super-complex model to identify and reach out to at-risk students. A simpler approach that tracks students’ online engagement and identifies those less engaged could still contribute to students’ success.

Photo by Andrew McElroy on Unsplash

Predictive Analytics in College Athletics: Tweets for Success

Machine learning isn’t just for universities’ academic and administrative needs. Another research project, “From Hashtags to Heismans: Social Media and Networks in College Football Recruiting,” demonstrated how logistic regression could be used with football student-athletes’ Twitter posts to predict with 87% accuracy whether they would receive a scholarship offer in the month after those tweets.

Logistic regression outperformed other algorithms, including random forest and SVM, in correctly predicting the offers. The researcher hand-labeled over 7,000 tweets, but automated natural language processing, like sentiment analysis, could also have been useful.

Though selecting an athlete for a team would seem like a complex decision with a lot of intangible elements, it’s interesting that Twitter content by itself turned out to be predictive. Important variables included whether the athletes posted “self-promoting” tweets, “ingratiating” tweets praising specific coaches and teams, and information such as camps they attended or coaches who had visited them. The researcher, Bigsby, also created another logistic regression model that could predict whether the athletes would commit to or “decommit” from certain teams.
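As a rough illustration of how logistic regression turns tweet variables like these into a prediction, here is a small sketch using only the Python standard library. Everything specific in it is an assumption made for the example: the three yes/no features, the invented rows and labels, and the function names. It does not reproduce the study’s data, feature coding or results.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and a bias with plain batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            for j, xi in enumerate(x):
                grad_w[j] += (p - y) * xi
            grad_b += p - y
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

def predict_offer_probability(weights, bias, x):
    """Estimated probability of an offer for one feature row."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Hypothetical features per athlete-month (1 = yes, 0 = no):
# [posted self-promoting tweet, posted ingratiating tweet,
#  mentioned a camp or coach visit]
rows = [[1, 1, 1], [1, 0, 1], [0, 1, 1], [1, 1, 0],
        [0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # invented: 1 = offer the next month

weights, bias = train_logistic(rows, labels)
likely = predict_offer_probability(weights, bias, [1, 1, 0])
unlikely = predict_offer_probability(weights, bias, [0, 0, 1])
```

On this toy data, `likely` ends up above 0.5 and `unlikely` below it. A real project would use an established library and hold out data for evaluation, which is how an accuracy figure like the study’s 87% would be measured.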

Beyond athletics and higher education, the research also offers ideas for how this predictive approach could be applied creatively to recruitment for all kinds of jobs.

Photo by Roman Mager on Unsplash

Potential Issues

While these examples use data that’s pretty easy to access from LMSes or public social media, higher education data can be tough to gather and analyze in practice. Institutional silos, decentralized data, and concerns about student privacy and biases all pose challenges.

This recent article from The Hechinger Report covers some potential unintended consequences of using predictive analytics for student outcomes in particular. A model (and an advisor interpreting it) could steer a student away from a first-choice major that’s predicted to be too ambitious for that student … but the student might have been able to rise to the challenge. Is the guidance from the model and the advisor in the student’s best interests? That’s not an easy question to answer. Questions about privacy and systemic biases also come into play.

To be sure, there are complex questions here. With care, though, there are many ways that predictive analytics can be used to help students and everyone else involved in crafting a quality higher ed experience.

For more inspiration on how to use predictive analytics, watch the video below from Educause, where some institutional leaders explain the role of predictive analytics at their institutions. You can also check out this free e-book that showcases seven different schools’ use of analytics in different areas of their institutions.