
Or how I learned to love big data
ML/AI is all the rage, and there are a lot of demos on how to implement it. Running through the ML.NET tutorial and creating a model that can evaluate a review and determine whether it is positive or negative was fun but not really illuminating. Granted, it was fun to see that “Loved waiting for hours for cold food” did come back as a negative review, but I wanted more.
So I started looking around for data I could use to create an ML model and remembered a long-ago thought experiment using Ohio voter data. I dug up my notes, downloaded the latest set of records, and started the process of loading them into a database. I will do a longer post with code this week on the whole process, but I am happy to say I have been able to write my first ML model from scratch using ML.NET.
I have some basic demographic information for every registered voter, and I know whether they voted in each of the last 81 elections, as well as how they were registered at the time they voted. All of that is included in the data that comes from the Ohio Secretary of State.
I’ve found several other data sources that can give me more information to build the model out. I’m not planning on trying to replicate any of the great tools that are out there to handle this type of data mining, though I will document what I would add if I were going to try to make a project like this commercially viable when I write up the long post.
Quick Summary
For this short version, here is what I have learned:
Tools are good, but not every tool is good for everything
Preprocessing the data before handing it off to SSIS sped up ingestion by multiple orders of magnitude.
Understanding the data is key
Having a mountain of data does you no good if you don’t know what the data actually represents.
Formatting the data is important
Training requires the data in specific formats, so shaping the data into a form the trainer can consume is just as important.
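To make the formatting point concrete, here is a minimal sketch in Python of what "shaping the data for training" can look like. It is not the ML.NET code from this project, and it makes several assumptions: the column names (voter_id, election_1, and so on) are hypothetical stand-ins for the real voter file layout, a non-empty cell is assumed to mean the voter participated in that election, and the most recent election is arbitrarily chosen as the label to predict.

```python
import csv
import io


def to_training_rows(csv_text, id_col, election_cols):
    """Flatten wide voter records into rows a trainer could consume.

    Assumption: election_cols is ordered oldest to newest. The last
    column becomes the label; the earlier ones are summarized into
    numeric features.
    """
    *history_cols, label_col = election_cols
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: any non-blank value means the voter took part.
        history = [(record.get(col) or "").strip() for col in history_cols]
        rows.append({
            "voter_id": record[id_col],
            "elections_voted": sum(1 for value in history if value),
            "elections_tracked": len(history),
            "voted_label": bool((record.get(label_col) or "").strip()),
        })
    return rows


# Hypothetical sample data; the real file has far more columns.
SAMPLE = """voter_id,election_1,election_2,election_3
A1,X,,X
A2,,,
A3,D,R,
"""

if __name__ == "__main__":
    columns = ["election_1", "election_2", "election_3"]
    for row in to_training_rows(SAMPLE, "voter_id", columns):
        print(row)
```

The design choice worth noting is that dozens of sparse per-election columns are collapsed into a couple of numeric features plus one boolean label. Whether that is the right summary depends on understanding what each column actually represents, which is the second lesson above.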
I’ll get everything cleaned up in the next couple of days and post my code to my GitHub so folks can see it and, if they want, play with it themselves.