Fake news is a term that entered common vocabulary only recently, starting with the 2016 US presidential election. Until then, the phenomenon was known mostly through clickbait articles that quickly went viral on social media. Fake news gained notoriety only when its global impact was acknowledged by the heads of IT giants such as Google and Facebook, who admitted for the first time that this kind of news has the power to influence democratic elections.
"Of course" - that was the answer given by Google's CEO, Sundar Pichai, in a BBC interview in November 2016, when asked whether fake news had tilted the balance towards Donald Trump in the United States presidential election. The statement came only a few days after Adam Mosseri, VP of Product Management at Facebook, said that the company must do more to stop this phenomenon, in the context of Facebook becoming the main social media distribution channel for online news in 2016.
Solutions were not long in coming. From blacklists of sites that create fake news, to crowdsourcing experiments and even services offered by organizations specialized in fact checking, the fake news phenomenon became the media industry's number one topic, a serious cause for concern for content distributors and a dilemma for researchers in the field. Meanwhile, Zetta Cloud, a startup founded in Cluj-Napoca in 2013, was getting ready to launch the first version of TrustServista, a software product that promised to help news organizations detect untrustworthy news in a 100% automated way.
Conceived almost a year before the fake news phenomenon took over the entire planet and financed in 2016 by a Google grant (Digital News Initiative), TrustServista was launched at the beginning of February 2017. It is the second product for the "digital news" sector from Zetta Cloud, a company specialized in data analytics and artificial intelligence, after the Știrili application launched in 2013, the first and still the only one of its kind in Romania.
The TrustServista platform is aimed mainly at journalists who work for news agencies, but also at any media professional in a newsroom who needs a tool that automates the filtering and verification of information. Its main goal is to significantly reduce the time needed to collect and analyze information by fully automating tasks otherwise done by hand: searching, collecting and filtering news articles, finding related content on a specific topic and the links between processed items, and classifying information based on its trustworthiness.
Drawing on concepts from investigative journalism and Open Source Intelligence (OSINT), TrustServista uses artificial intelligence algorithms, namely Natural Language Processing (NLP), to automatically extract as much information as possible from online news articles. Each collected article is represented as a vector of the entities extracted from its text and their frequencies. Named entities, that is, keywords categorized as person names, geographical locations, organizations or measurement units, define the context of each article. The value of this automatic extraction of named entities is that it abstracts each news article, which makes it easier to search for specific topics and to detect similarities in content.
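The entity-based representation described above can be illustrated with a minimal sketch. It assumes the named entities have already been extracted by an NLP step and compares articles with cosine similarity; the function names, the sample entities and the choice of cosine similarity are assumptions made for illustration, not a description of how TrustServista itself is implemented.

```python
from collections import Counter
from math import sqrt


def entity_vector(entities):
    """Turn a list of extracted entity mentions into a frequency vector."""
    return Counter(e.strip().lower() for e in entities)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse entity-frequency vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an article with no entities is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical entity mentions, as an extraction step might return them.
article_a = entity_vector(["Google", "Sundar Pichai", "BBC", "Google"])
article_b = entity_vector(["Google", "BBC", "Facebook"])
article_c = entity_vector(["Cluj-Napoca", "Zetta Cloud"])

print(cosine_similarity(article_a, article_b))  # shared entities: above zero
print(cosine_similarity(article_a, article_c))  # no shared entities: 0.0
```

Articles that mention the same people, places and organizations score close to 1, while articles with no entities in common score 0, which is what makes such vectors useful for finding related content on a topic.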
Any news story also has a source of information, which can be another referenced article, a social media post, a public statement or an event. The reliability of a news article is determined foremost by the information source it used, which TrustServista calls "patient zero", a name borrowed (not at all by chance) from epidemiology. Since the propagation of false news resembles the spread of a virus, it is crucial to determine the source of the outbreak. TrustServista finds this "patient zero" by extracting the URLs in articles and by finding and tracking implicit references, where sources are only mentioned ("according to the Guardian") without being referenced by URL. The task that falls to the algorithms is to find the referenced news article accurately, even if its name or URL is not known beforehand.
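The two kinds of reference described above, explicit URLs and implicit mentions, can be sketched with regular expressions. This is a deliberately naive illustration: the attribution phrases, the patterns and the sample text are assumptions, and resolving an implicit mention to the actual referenced article would need far more than this.

```python
import re
from urllib.parse import urlparse

# Explicit references: anything that looks like an http(s) URL.
URL_PATTERN = re.compile(r"https?://[^\s<>()]+")

# Implicit references: a hypothetical, very small set of attribution phrases
# followed by a capitalized outlet name.
IMPLICIT_PATTERN = re.compile(
    r"\b(?i:according to|as reported by)\s+(?:the\s+)?"
    r"([A-Z][\w&-]*(?:\s+[A-Z][\w&-]*)*)"
)


def explicit_sources(text):
    """Return the domains of all URLs found in the article text."""
    urls = (u.rstrip(".,;") for u in URL_PATTERN.findall(text))
    return [urlparse(u).netloc for u in urls]


def implicit_sources(text):
    """Return outlet names that are mentioned without a URL."""
    return IMPLICIT_PATTERN.findall(text)


sample = (
    "According to the Guardian, the claim first appeared on "
    "https://example.com/story-1. The figures were later repeated, "
    "as reported by BBC News."
)
print(explicit_sources(sample))
print(implicit_sources(sample))
```

The explicit case yields a domain that can be followed directly; the implicit case yields only an outlet name, which still has to be matched against collected articles to locate the one being cited.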
This link analysis produces an article graph that can be traversed to find both patient zero and other links that may give rise to new hypotheses. The objectivity or subjectivity of each news article is determined automatically by an algorithm based on sentiment analysis. Context, another metric important to trustworthiness and usually missing from clickbait articles, is determined by analyzing named entities (person names, locations and time units) and by checking whether the news article has an author. All these elements are used to automatically classify content according to its trustworthiness. Users can then use the result to determine which sources generate and propagate fake news.
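Traversing the article graph to reach patient zero can be sketched as a breadth-first search over reference links. The graph below is hypothetical, and treating "an article that cites nothing further" as the origin is a simplifying assumption for illustration rather than TrustServista's actual rule.

```python
from collections import deque


def find_patient_zero(references, start):
    """Follow reference links from `start` and return the reachable
    articles that cite no further source."""
    seen = {start}
    queue = deque([start])
    origins = []
    while queue:
        article = queue.popleft()
        sources = references.get(article, [])
        if not sources:
            origins.append(article)  # nothing cited: a candidate origin
        for source in sources:
            if source not in seen:  # guards against circular citations
                seen.add(source)
                queue.append(source)
    return origins


# Hypothetical graph: each key cites the articles in its list.
references = {
    "blog-post": ["aggregator", "tabloid"],
    "aggregator": ["tabloid"],
    "tabloid": ["anonymous-tweet"],
    "anonymous-tweet": [],
}
print(find_patient_zero(references, "blog-post"))
```

The `seen` set matters in practice, because articles that cite each other in a loop would otherwise keep the traversal running forever; in that case the function returns an empty list, signalling that no origin could be identified.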
TrustServista's fully automatic approach does not exclude human intervention. The automated trustworthiness scoring requires human validation and calibration, and in future releases it will be improved by using machine learning for different fake news recipes (clickbait, propaganda and so on). Future releases will also verify more elements, both of the information sources and of the news articles. Most importantly, TrustServista can process information in languages other than English, because information travels beyond the borders and the language of its country of origin.
TrustServista is built on Big Data technology, with a microservices architecture based on Apache Kafka. It mainly uses Hadoop components such as HBase, HbaseGraph and ZooKeeper. Many of TrustServista's algorithms, including those for Natural Language Processing, were created by Zetta Cloud, but for some specific operations (such as sentiment analysis) it uses the Rosette platform from Basis Technology.
TrustServista offers, in addition to the web GUI, an API that allows integration with complementary technologies such as "newsroom dashboards", OSINT platforms and even social media platforms that would use TrustServista's automated mechanism to determine the confidence level of information, in order to prevent the spread of fake news.
Starting in June 2017, TrustServista will be available as production version 1.0 and can be purchased under a subscription model.
by Ovidiu Mățan
by Lucian Torje