I decided to run linear regression models on the data points for each candidate from the 2000 through 2012 election cycles in order to make predictions for the 2016 cycle. The goal was to predict the actual New Hampshire primary result. The variables I considered were Iowa's polling predictions, Iowa's results, whether a candidate won Iowa, a candidate's national polling average at the time of the Iowa caucus, New Hampshire's polling predictions, national polling at the time of New Hampshire, the difference between the actual Iowa results and the Iowa polling, and the difference between national polling at the time of New Hampshire and at the time of Iowa.
I'll spare you the boring details and just say that I constructed some linear regression models, but found that when Republican and Democratic data points were used together in the same model, no variable was statistically significant besides the New Hampshire polling values. So I thought: since Republicans and Democrats diverge in how they think about the country, why wouldn't they also need different linear models? That hunch proved correct.
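To illustrate the idea of fitting one model per party instead of a single pooled model, here is a minimal Python sketch. It is an illustration rather than the code behind the analysis: the `fit_ols` helper, the column layout, and every number in `synthetic` are made up for the example, and the two "true" relationships are chosen only so the fit has something to recover.

```python
def fit_ols(rows, targets):
    """Ordinary least squares via the normal equations.

    Returns [coef_1, ..., coef_k, intercept]. Hypothetical helper for
    illustration; assumes the predictors are not perfectly collinear.
    """
    design = [list(r) + [1.0] for r in rows]  # append an intercept column
    k = len(design[0])
    # Build X'X and X'y.
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, targets)) for i in range(k)]
    # Solve (X'X) b = X'y by Gauss-Jordan elimination with partial pivoting.
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Synthetic rows: (party, NH poll, national number, NH actual).
# These are NOT real election data; each party follows a different made-up rule.
synthetic = []
for poll, national in [(10, 5), (20, 30), (35, 25), (40, 50), (15, 12)]:
    synthetic.append(("D", poll, national, 1.0 * poll - 0.5 * national + 1.0))
    synthetic.append(("R", poll, national, 1.2 * poll - 0.2 * national))

# Fit a separate model for each party.
models = {}
for party in ("D", "R"):
    subset = [row for row in synthetic if row[0] == party]
    predictors = [(poll, national) for _, poll, national, _ in subset]
    actuals = [actual for *_, actual in subset]
    models[party] = fit_ols(predictors, actuals)

for party, coefs in models.items():
    print(party, [round(c, 3) for c in coefs])
```

Because each synthetic party was generated from its own rule, the per-party fits recover two different sets of coefficients, which a single pooled model could not do.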
For both Republicans and Democrats, Iowa results do not directly influence the results in New Hampshire. However, Iowa affects each party's model in a significantly different way.
For Democrats, Iowa seems to affect the results in New Hampshire through the change in a candidate's national polling from the time of the Iowa caucus to the time of the New Hampshire primary.
The effect is actually strange when you think about it. Candidates who gain in the national polls after Iowa actually do worse in New Hampshire, all else being equal. The best explanation I can offer is that there are significant differences between the electorates of Iowa and New Hampshire. What voters like in Iowa, which influences the national polls in the aftermath of the caucus, is not necessarily what Democratic voters in New Hampshire like in a candidate.
Therefore, I believe the best linear equation for predicting Bernie's and Hillary's outcomes in New Hampshire is the following:
NHActual = 1.00695 * NHPredict - 0.77068 * NatDifference + 0.73230
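To make the equation concrete, here is a small Python sketch that plugs numbers into it. The coefficients are copied from the equation above. The function name and the input values are hypothetical placeholders, not the actual polling averages, and I am assuming NatDifference means national polling at the time of New Hampshire minus national polling at the time of Iowa.

```python
# Coefficients copied from the Democratic equation above.
DEM_NH_PREDICT_COEF = 1.00695
DEM_NAT_DIFFERENCE_COEF = -0.77068
DEM_INTERCEPT = 0.73230


def predict_nh_democrat(nh_predict, nat_at_iowa, nat_at_nh):
    """Return the modeled New Hampshire result in percentage points.

    Assumption: NatDifference = national polling at the time of New
    Hampshire minus national polling at the time of Iowa.
    """
    nat_difference = nat_at_nh - nat_at_iowa
    return (
        DEM_NH_PREDICT_COEF * nh_predict
        + DEM_NAT_DIFFERENCE_COEF * nat_difference
        + DEM_INTERCEPT
    )


# Hypothetical inputs for illustration only, not real polling numbers.
example = predict_nh_democrat(nh_predict=50.0, nat_at_iowa=40.0, nat_at_nh=42.0)
print(round(example, 2))
```

With these made-up inputs, a 2-point rise in the national polls after Iowa takes roughly 1.5 points off the prediction, which is the negative effect described above.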
The prediction, then, is that Bernie Sanders will easily win New Hampshire. I ran the model with two different sets of numbers. The first set is the average of polls on RealClearPolitics. The second set, the Silver numbers, comes from FiveThirtyEight, which is run by Nate Silver. The numbers differ slightly because each site weights polls differently. If I had to guess which will be more accurate, I would pick the prediction run with the Silver numbers, because FiveThirtyEight weights polls based on when the poll was taken as well as on the track record of the polling institution. Based on the model, Sanders should win by about 12 or 13 percentage points.
The Republican side was much different. The results on the Republican side were influenced not by the change in the candidates' national poll numbers, but by the candidates' national poll numbers themselves. As with the Democrats, the sign is negative: higher national poll numbers are associated with a worse New Hampshire result. That leads me to believe that New Hampshire is greatly different from the rest of the nation, not only on the Democratic side but also on the Republican side.
Therefore, the model I think best fits the Republican candidates is:
NHActual = 1.25411 * NHPredict - 0.21161 * Nat@NH - 0.17777
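Here is the same kind of Python sketch for the Republican equation, this time ranking a small field. The coefficients are copied from the equation above. The function name, the candidate labels, and all the input numbers are hypothetical and are only there to show how the equation turns polling inputs into an ordering.

```python
# Coefficients copied from the Republican equation above.
REP_NH_PREDICT_COEF = 1.25411
REP_NAT_AT_NH_COEF = -0.21161
REP_INTERCEPT = -0.17777


def predict_nh_republican(nh_predict, nat_at_nh):
    """Return the modeled New Hampshire result in percentage points."""
    return (
        REP_NH_PREDICT_COEF * nh_predict
        + REP_NAT_AT_NH_COEF * nat_at_nh
        + REP_INTERCEPT
    )


# Hypothetical field: name -> (NH polling average, national polling at NH).
# These are placeholder values, not real polling numbers.
hypothetical_field = {
    "Candidate A": (30.0, 35.0),
    "Candidate B": (15.0, 5.0),
    "Candidate C": (14.0, 18.0),
}

predictions = {
    name: predict_nh_republican(nh_poll, national)
    for name, (nh_poll, national) in hypothetical_field.items()
}

# Sort from highest to lowest predicted result.
ranking = sorted(predictions, key=predictions.get, reverse=True)
for name in ranking:
    print(name, round(predictions[name], 2))
```

In this made-up field, Candidates B and C poll almost identically in New Hampshire, but C's much higher national number pulls C's prediction well below B's, which is how the negative national coefficient can separate candidates who look close in the state polls.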
The model predicts that Trump wins, Kasich comes in a distant second, and Rubio comes in third. How close the race for second and third looks depends on which set of numbers you consider more accurate (I assume Silver's). In the RealClearPolitics numbers, Kasich leads Rubio by more than a percentage point. In Silver's numbers, that lead shrinks to about 0.3 points. Additionally, Rubio's lead over Bush grows from about 1 point to about 2 points going from the RealClearPolitics numbers to the Silver numbers.
It'll be interesting to see how these results play out. Remember that the sample sizes for these models are EXTREMELY small. Linear regression is usually performed on far more data, often thousands of data points. There were only 12 data points for Democrats and 18 for Republicans.
Personally, I think Bernie will beat Hillary, and the top 3 for the Republicans will be Trump, Rubio, and Cruz. However, the data points to a different result on the Republican side. We have less than 24 hours to wait to see which is correct.
All the data and code I used to conduct this analysis can be found on my GitHub page at: https://github.com/ScottOnestak/Data-Analysis-Projects/tree/master/New-Hampshire-Predict
Update: All post-analysis of the model can be found here: https://github.com/ScottOnestak/Data-Analysis-Projects/blob/master/New-Hampshire-Predict/Results%26Analysis.md