As we move from analytics to deep learning and eventually to AI, there is a growing concern that we aren’t focused enough on assuring the accuracy of the results. The efforts seem more focused on collecting lots of data than on assuring the validity of the process. The effort can be severely compromised in three places: the collection of the data (or sample), the initial analysis of the data, and the interpretation of the analysis.
In the recent request for voter data, and in the responses to President Trump’s effort to prove massive voter fraud, both sides are showcasing endemic problems that would corrupt the process. Let’s walk through some of the problems that will likely assure that neither side’s position will be founded on fact.
The folks who believe that there is no fraud do have an impressive amount of prior research to point to, but none of it seems to either relate to the last election or look at the nation as a whole, and thus it may not be representative of the last election. The mismatch between the capabilities of hostile states like Russia and China and the security of the voting process has become exceedingly great, which suggests that the possibility of at least state-sponsored voter fraud has increased significantly. (By the way, one of the more interesting papers done on the 2016 election uses pattern analysis and shows no evidence of fraud. But a state-level attack could have taken this into account and worked within the patterns to drive a result that would have been undetectable with pattern analysis.)
Assuming the Outcome
In any form of analysis, there must be a willingness to accept the outcome, regardless of what it is. If that willingness doesn’t exist, the process is inherently corrupted. Those collecting, analyzing and interpreting the data will all be focused on achieving the expected outcome and will tend to disregard or discredit information that doesn’t conform with the preconceived notions. For those who want to prove tampering, this means putting excess weight on any evidence that tampering occurred, even though it could be trivial. For those who don’t believe tampering occurred, it means trivializing any related evidence to assure the result is consistent with their belief. The clear bias on both sides assures a result that will be impossible to reconcile and highly distrusted (not to mention inaccurate).
Biased Sample
It is fascinating that the states that seem to believe there is voter fraud are supplying information and the ones that don’t, which are now the majority, aren’t. This suggests that in any initial analysis, the sample will be exceptionally biased toward fraud, driving a result that the states not submitting are decidedly against. Not supplying data because you are afraid of the outcome may drive the very outcome that you didn’t want in the first place. If there are security concerns over the data, make reasonable protection of it a condition of supply. But given how insecure these voter systems currently are, using security as a reason not to provide data seems at best ill-advised. If the real concern is the intentional corruption of the data, which is a valid concern, then make an unbiased and auditable analysis process a condition of supplying the data.
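To make the sampling problem concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the ten states, their rates of flagged registrations, and the rule that only states already suspecting a problem hand over data are all invented, and none of the numbers are real. It simply compares the average across all states with the average across the self-selected ones.

```python
from statistics import mean

# Hypothetical, made-up rates of flagged registrations per 100,000 voters
# for ten imaginary states. These are illustrative numbers, not real data.
flagged_rate = {
    "State A": 1.0, "State B": 1.0, "State C": 2.0, "State D": 2.0,
    "State E": 3.0, "State F": 3.0, "State G": 4.0, "State H": 8.0,
    "State I": 9.0, "State J": 12.0,
}

# Assumption: a state volunteers its data only when its own rate is high
# enough that its officials already suspect a problem.
SUBMIT_THRESHOLD = 8.0


def population_mean(rates):
    """Average rate across every state, whether or not it submits data."""
    return mean(rates.values())


def self_selected_mean(rates, threshold):
    """Average rate across only the states that choose to submit data."""
    submitted = [rate for rate in rates.values() if rate >= threshold]
    return mean(submitted)


full = population_mean(flagged_rate)
biased = self_selected_mean(flagged_rate, SUBMIT_THRESHOLD)

print(f"All states:            {full:.2f} per 100,000")
print(f"Submitting states only: {biased:.2f} per 100,000")
```

With these invented numbers, the self-selected sample shows a rate more than double the rate across all states, even though nothing about the underlying population changed. The only thing that changed is who chose to participate.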
Assured Analysis
This takes us to the area where no one seems to be all that focused: given the preconceptions of the administration, the analysis process is biased. Regardless of the above, the analysis needs to be unbiased and auditable; otherwise, no one on the opposing side will believe it. Rather than answering a question, it will just lead to more drama. The goal must change from proving something to finding out what the true state of the population is. This means that the analysis must be done by people who want to know the answer, not by people who want to prove the answer they prefer.
Interpretation
This is going to be ugly because any time you deliver analysis to an executive who only wants one answer, you are putting your job on the line. The best advice I have for this kind of problem is to let someone you don’t like do this project; right or wrong, that person is screwed. If the results agree with the president’s claims, his work will be torn apart by those opposing them, who will look for any anomaly or mistake to discredit the work. If the results disagree with the president’s claims, he’ll likely be crucified for his efforts. This is the very definition of a lose-lose. The best path would be to staff the team with people whom neither side could easily attack, have them assure the accuracy of the outcome, and have them present it jointly. This is a case where shared responsibility and credit could be a life saver.
Wrapping Up: Data Is Your Friend
You’d think we’d catch on that data is our friend. But often we care more about appearing right than being right. This can be avoided by not taking a hard position before the analysis is done, but I know of few who do that, and the focus on voter fraud is a case in point. Is there likely some voter fraud? Sure. In fact, that shouldn’t even be a question, given the focus on Russia’s involvement in the election. But the question really is whether this was material and, without any real in-depth analysis on the last election, both sides of this argument took positions and are now aggressively defending them largely without any detailed, reliable, unbiased data and analysis. Worse, the focus seems to be on either preventing this analysis or compromising it, which won’t end in knowing which side is right, let alone accurately defining the cause so it can be corrected.
That’s how you waste millions on analytics. You destroy the validity of the result before you even start. This is something to think about when you fund your own projects. Can you provide a result that is accurate, and will the executives who get it accept an accurate result? If the answer to either of these is no, kill or pass on the effort as a career killer and move on.
While proving voter fraud favoring Hillary Clinton right after the election might have boosted President Trump’s influence, because he then had no record as president, any such boost now would be vastly offset by his behavior in office. On the other hand, were he to discover fraud that pushed the election in his favor, it would remove any mandate he had left and might even trigger a sequence of events forcing him out of office. The fact that he isn’t even considering that any fraud discovered could turn out to have favored him, given strong indications that Russia may have done exactly that, showcases the final forgotten item. You can be right about a problem but be very wrong about its nature and outcome.
One more thing: If you look at the mass of existing research and couple that with the Russian allegations, there are two likely outcomes. One is that there is no fraud, which would make the president look bad. The other is that there was fraud, but favoring Trump rather than Clinton, which would be like Christmas in July for the Democrats fighting this process. This suggests a Republican and a Democrat could walk together down the street wearing “I’m with Stupid” t-shirts this summer and both be right.
Rob Enderle is President and Principal Analyst of the Enderle Group, a forward-looking emerging technology advisory firm. With over 30 years’ experience in emerging technologies, he has provided regional and global companies with guidance in how to better target customer needs; create new business opportunities; anticipate technology changes; select vendors and products; and present their products in the best possible light. Rob covers the technology industry broadly. Before founding the Enderle Group, Rob was the Senior Research Fellow for Forrester Research and the Giga Information Group, and held senior positions at IBM and ROLM.