Admittedly, it has been a very long time since I’ve updated this website. Life just keeps getting in the way. However, I did want to go ahead and update the website with a quick project that I came up with a couple of days ago.
I’ve long been a fan of Drum and Bugle Corps, having gone to see my first show back when I was a junior in high school. After that I went to at least one show per year, mostly in and around Central Texas. At some point, San Antonio became the designated stop for the Southwestern Championship, and it is now usually where all the World Class Drum Corps meet in one place for the first time each summer. That weekend also usually falls around my birthday, so naturally I make a whole weekend of it every year.
My love for Drum Corps doesn’t stop in Texas, though. Ten years ago, I spent the 4th of July weekend in California, and I went to three different shows from Pasadena (at the Rose Bowl) to Vista, CA. In 2007, I was able to actually go to Championship Weekend in Pasadena at the Rose Bowl. Then, of course, while I was living in Upstate New York, I went to two different shows in 2014 and 2017 in Buffalo.
So what got me writing this article?
Very late on Wednesday night, I got this crazy idea to put some of the skills I learned in my Data Science Immersive class earlier in the year to use by applying time-series modeling techniques to predict the scores of the Top 12 Drum Corps in the Semifinal and Final rounds of the DCI Championships this weekend. I had to work pretty fast, since I had a natural deadline: the semifinal round wrapped up last night (Friday night), and the final round is Saturday night. Thankfully, I rose to the challenge!
Having said that, my code isn’t exactly clean yet or ready for public consumption, but I do plan on publishing it on GitHub as part of my “portfolio” soon. Might as well, right?
Another thing that got me excited about this project was trying out my web-scraping skills. I’ll admit that, coming out of the class, my web-scraping skills weren’t really all that good. We did some basic examples in class, and, of course, the Reddit Project we did entailed some web-scraping. Doing it on my own initiative, driven to get this project done, was pure joy, though. That’s what I ended up doing for most of Wednesday night. I also did not use the typical DCI Website to get my data. Instead I used DCI Scores. After getting the data, it was a matter of cleaning it up and prepping it for modeling.
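To give a feel for the scrape-then-clean step, here is a minimal sketch using only Python’s standard library. It is not my actual project code: the table markup, the `scrape_scores` and `TableCellParser` names, and every date and score in it are made up for illustration, and the real page layout will certainly differ.

```python
from datetime import date, datetime
from html.parser import HTMLParser

# Made-up markup and made-up scores, for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Corps</th><th>Score</th></tr>
  <tr><td>2000-07-21</td><td> Corps A </td><td>91.450</td></tr>
  <tr><td>2000-07-21</td><td>Corps B</td><td>89.900</td></tr>
  <tr><td>2000-07-28</td><td>Corps A</td><td>93.800</td></tr>
  <tr><td>2000-07-28</td><td>Corps B</td><td>n/a</td></tr>
</table>
"""


class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            # Header rows use <th>, so they collect no cells and are skipped.
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data


def scrape_scores(html):
    """Return a date-sorted list of (date, corps, score) tuples."""
    parser = TableCellParser()
    parser.feed(html)
    records = []
    for row in parser.rows:
        if len(row) != 3:
            continue
        raw_date, corps, raw_score = (cell.strip() for cell in row)
        try:
            show_date = datetime.strptime(raw_date, "%Y-%m-%d").date()
            score = float(raw_score)
        except ValueError:
            continue  # drop rows whose date or score cannot be parsed
        records.append((show_date, corps, score))
    return sorted(records)


records = scrape_scores(SAMPLE_HTML)
for show_date, corps, score in records:
    print(show_date, corps, score)
```

The cleaning here is deliberately simple: whitespace is trimmed, dates and scores are converted to real types, and any row that fails to parse (like the `n/a` one) is dropped rather than guessed at.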
Since I’ve titled this post after Santa Clara Vanguard, let’s take a look at their scores over the course of the summer:
The almost perfect trend line is typical of any Corps’ scores throughout the summer. I would be surprised if any Corps had a flat line; actually, that would be downright depressing!!
In any case, with dates and their corresponding scores, this is a classic time-series problem. I had to dust off my lessons and labs from class to wrangle together a model that would make any kind of sense. As it turned out, I used an ARIMA model to predict the scores for Santa Clara Vanguard as well as the other 11 Corps in the Top 12 coming out of the Preliminary Round Thursday night.
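For anyone curious what an ARIMA-style forecast looks like at its simplest, here is a small sketch of one special case: ARIMA(0,1,0) with drift, also known as a random walk with drift. This is an assumption chosen for illustration, not the model order I actually fit, and a library such as statsmodels would be the usual tool for the general case. The function name and the scores are made up, and the sketch assumes evenly spaced shows, which a real season is not.

```python
from statistics import mean


def forecast_random_walk_with_drift(scores, steps=1):
    """Forecast with ARIMA(0,1,0) plus drift.

    Differencing once (the "I" in ARIMA) turns the scores into show-to-show
    gains. With no AR or MA terms, the h-step forecast is simply the last
    score plus h times the average gain.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores to estimate the drift")
    gains = [later - earlier for earlier, later in zip(scores, scores[1:])]
    drift = mean(gains)
    return [scores[-1] + h * drift for h in range(1, steps + 1)]


# Made-up season of scores, evenly spaced, for illustration only.
season = [70.0, 74.0, 78.5, 82.0, 86.0]
next_show, show_after = forecast_random_walk_with_drift(season, steps=2)
print(next_show, show_after)
```

A straight-line extrapolation like this will happily run past 100, the maximum possible score, so anything beyond a show or two ahead should be taken with a grain of salt.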
The following table shows my predictions for the Semifinal round as well as the score for each Corps sorted by rank after Thursday night’s Preliminary Round. The model predicted that there would be one change to the rankings after the Semifinal round; it predicted that the Mandarins would jump over both Phantom Regiment and the Blue Knights.
As it turns out, my model was almost right, as seen below!! The Mandarins did jump up in the rankings, overtaking Phantom Regiment, but didn’t quite move past the Blue Knights. Incredible, if I do say so myself!! It’s also incredible because rankings don’t often change much during this part of the season, yet the one Corps my model predicted would move was indeed the one that did! Amazing!!!
So what can we expect going into the Finals tonight? Well, here are the predictions:
Apparently my model still wants the Mandarins to jump over the Blue Knights, so we’ll see if that actually happens. Other than that, not too many surprises. I’m actually pretty excited about the idea of SCV pushing past 99, though. Anybody willing to make any bets on that?
I’m actually quite proud of this project. It was something I thought about on a whim the other night. From that point, I wouldn’t let myself sleep until I had all of my data scraped, cleaned, and ready for modeling. Then I lost sleep trying to figure out how to apply time-series modeling to my dataset in Python. As it turned out…not that difficult at all.
Good luck to all the Drum Corps tonight, and thank you for another wonderful season! The highlight of every summer has to be this weekend, when the DCI World Championships take place in Indianapolis. I’ll be back tonight or, more likely, early Sunday morning with the final comparisons!