Big Data - Can it Stop School Shootings?


Actual Posts from Youth Wanting To Kill Others

Valentine's Day 2018 will forever be remembered for another school shooting. For those directly impacted, the pain is unimaginable. For the rest of us, the "what can be done" debate renews. Pundits and online commenters are debating the merits of gun control and appropriate mental health services, and whether new laws and funding would stop future mass shootings. But the reality is that no legislation is going to happen soon enough to stop the next violent act. In analyzing Nikolas Cruz's social media posts after the fact, it is obvious that he is a deeply disturbed young man. Many of the posts he made before the Marjory Stoneman Douglas High School shooting foreshadowed the violence that was to come. The FBI and Florida police report that they had monitored his posts yet had no way of identifying Nikolas Cruz as the writer, and taken individually, his posts did not portend an imminent threat.

However, with 100% certainty, social media companies, mobile companies, search engine companies, and other digital organizations could have identified the pattern, identified the potential threat, and identified Cruz himself down to his mobile phone number and thus home address. How do we know? Because if I want to find an individual who has used a term on a social media post, I can do so. That post can be tied back to an account, and the keepers of the digital security key, e.g., Facebook, Twitter, Google, etc., can tie that account username and password and ultimately an email address to an IP address. That information can then be cross-referenced and used to identify an individual. Digital companies do this with their own data every second of every day.

THE BIG DATA / BIG IDEA: What if the top social media and technology companies pooled their resources and created an independent third-party organization responsible for aggregating and mining digital data for potential threats?

  • This group would be responsible for aggregating and mining anonymous and non-identifiable digital data for threatening language.

  • The organization would have its own independent oversight board to ensure objectivity, and to ensure that the data is not resold to companies, including the founding partners.

  • If a threat is identified, the information would be forwarded to Homeland Security.

  • Homeland Security would be required to get a search warrant before it could unlock and access any identifiable information.
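One way to picture the separation described in the list above is a keyed hash that replaces each account name with a token before any analysis happens. The sketch below is only an illustration under stated assumptions: the key, the field names (`account_id`, `text`) and the functions `pseudonymize`, `strip_identity` and `reidentify` are all hypothetical, and a real system would need much stronger safeguards than this.

```python
import hashlib
import hmac
from typing import List, Optional

# Hypothetical secret, assumed to be held only by the oversight body.
PSEUDONYM_KEY = b"example-key-held-by-oversight-board"


def pseudonymize(account_id: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Return a stable token that stands in for the account during analysis."""
    digest = hmac.new(key, account_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def strip_identity(post: dict) -> dict:
    """Keep the text, swap the account for its token, drop every other field."""
    return {
        "author_token": pseudonymize(post["account_id"]),
        "text": post["text"],
    }


def reidentify(token: str, known_accounts: List[str],
               warrant_granted: bool) -> Optional[str]:
    """Map a token back to an account, but only once a warrant is granted."""
    if not warrant_granted:
        raise PermissionError("a search warrant is required to unlock identity")
    for account in known_accounts:
        if pseudonymize(account) == token:
            return account
    return None
```

Because the same account always produces the same token, analysts could still see that several posts came from one author without ever seeing who that author is.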

Can reviewing digital posts, text messages, emails and other digital data really identify a potential threat? I did some research and found that in every case of mass school violence, the shooter posted threatening messages on social media, used search engines to locate information about implementing a violent act, and/or even sent text messages, oftentimes weeks prior to the event. Using some of the search techniques I share in my programs, in less than a minute I found thousands of recent messages that contained concerning language (see above; I blurred the names).

Are these real threats and are the young people who made them serious? Probably not.


Yet knowing what we know today, should they still be taken seriously? Yes.


An automated system could enter the anonymized data into an analytics engine, scour the combined digital database as described above, and determine if there is consistent messaging across multiple media by the same person. Then by comparing the data across billions of other data points - including the historic digital information and posts of those who ultimately did implement violence - the system could identify patterns and determine the likelihood of a future act.


At the least, these young people could be offered help and a caring person ready to listen, which ultimately might stop the violent thinking before it mutates out of its digital form.


Unfortunately - or fortunately, depending on where you reside on the privacy debate - companies including Apple, Facebook, Twitter, and Google do not make their data available to others, and certainly not to the government. They take such a firm stance on privacy because they value liberty, and from a business and marketing standpoint, they must do so if they are to attract and retain users.


Yet an independent organization that aggregates anonymous data, and that requires a search warrant before any identity is unlocked, might provide the separation necessary for this solution to fit within the participating organizations' business models and our nation's history and core values of protecting individual liberties. Yes, this solution is controversial. But it is far less controversial than trying to modify the Second Amendment. And thus, it can be implemented more rapidly.


Another key issue with this potential solution is "false positives." While there is a perception that artificial intelligence (AI) and machine learning are nearly mathematically perfect, with today's technology, this is inaccurate. Where AI often fails today is context. While AI at its core uses mathematical and statistical modeling, math and stats alone cannot decipher meaning. For example, a young person Tweeting "I'm killing it at school today" in reference to doing well on a test could, without the proper context, be interpreted completely differently by an algorithm.
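To make the false-positive problem concrete, here is a minimal sketch of the crudest possible approach: flagging any post that contains a word from a short list. The keyword list and the function name `naive_flag` are made up for illustration and do not describe how any real company's system works.

```python
import re

# Illustrative keyword list only; chosen to show how context gets lost.
KEYWORDS = {"kill", "killing", "shoot", "shooting"}


def naive_flag(text: str) -> bool:
    """Flag a post if any single word matches a keyword, ignoring context."""
    words = re.findall(r"[a-z']+", text.lower())
    return any(word in KEYWORDS for word in words)


posts = [
    "I'm killing it at school today",
    "Great shooting at basketball practice",
    "Looking forward to the weekend",
]
flags = [naive_flag(post) for post in posts]
print(flags)  # [True, True, False]
```

The first two posts are harmless, yet both get flagged, which is exactly the kind of mistake that context is needed to catch.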


That is why, for such a solution to work, there must be billions of data points from numerous sources, so the algorithms get smarter over time. In addition, one post would not be enough to raise concern, so the system would need to accurately analyze an individual's holistic digital footprint and ensure that the cumulative data is correctly associated back to that specific individual. Finally, to ensure that such a system is 99.9% accurate before making an assumption about an individual's intent, a human must review the data and its context before a search warrant is sought and anonymity is removed.
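The "one post is not enough" rule can be sketched as a simple threshold: an anonymous author is passed to a human reviewer only after several flagged posts across more than one source. The thresholds, the `(author_token, source)` input shape and the function name `review_queue` are assumptions chosen for illustration, not tested values.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

MIN_FLAGGED_POSTS = 3  # hypothetical: a single post never triggers review
MIN_SOURCES = 2        # hypothetical: must show up on more than one platform


def review_queue(flagged: Iterable[Tuple[str, str]]) -> List[str]:
    """Take (author_token, source) pairs for posts that were already flagged
    and return the tokens that should go to a human reviewer."""
    counts: Dict[str, int] = defaultdict(int)
    sources: Dict[str, Set[str]] = defaultdict(set)
    for token, source in flagged:
        counts[token] += 1
        sources[token].add(source)
    return sorted(
        token for token in counts
        if counts[token] >= MIN_FLAGGED_POSTS
        and len(sources[token]) >= MIN_SOURCES
    )
```

Nothing in this sketch decides intent. It only narrows the list of anonymous tokens that a person would then read in full context.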


The reality is that similar data analytics is already being done on every individual by big companies for marketing and other purposes. Yet there is a big difference between data algorithms being used to serve relevant advertising and that same data being used by government to try and predict an individual's future actions.


With great power comes great responsibility because the opportunities for misuse are many. That is why third-party oversight is imperative, and why human involvement is paramount to ensure any algorithmic conclusions are reviewed for proper context.


What can you do? If you believe this idea has merit, contact your legislator and ask that they work with technology companies to find options on the best way to share digital data in an anonymized fashion and work together on creating a private/public partnership that - with proper oversight - leverages big data and analytics to identify future threats. As most of the companies that possess the data are publicly traded, you can also contact those companies' boards of directors and attend annual shareholder meetings and voice your concern and share ideas.


There is no escaping the reality that we have already moved from the technology age to the big data age. Those who think that digital data should be protected and not shared are kidding themselves, as that ship has already sailed. All who choose to share their digital information via social media, email, mobile devices, messaging, etc. need to understand the consequence of that choice: the data is available for companies to use to make decisions on everything from hyper-personalized marketing to insurance and healthcare pricing.


Are we also willing to simultaneously give up some privacy - with appropriate restrictions and safeguards - to the government in return for a safer society? As a nation, what liberties are we willing to lessen for the security of our children?



Author: Sam Richter, CSP, CPAE National Speaker Hall of Fame | Top 50 Sales Keynote Presenters | Bestselling Author