TSM - Big Data - Apache Hadoop

Robert Enyedi - Senior Software Developer

Having introduced the “big data” world in the second issue and covered NoSQL databases in the third, we now turn to another important member of the family: Apache Hadoop. Apache Hadoop is a framework for processing large (and very large) data sets across a cluster of machines using a simple programming model, the map/reduce paradigm. It is designed to scale from a few machines (even a single one) to several thousand, each contributing processing power and storage. Rather than relying on hardware to provide “high availability”, the library itself is designed to detect and handle failures at the application level.
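
To give a feel for the map/reduce paradigm mentioned above, here is a minimal sketch of a word count in plain Python. It does not use the Hadoop API and runs in a single process; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration. In a real cluster the framework would run the map and reduce steps on many machines and perform the grouping step itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big clusters"]))
```

Each call to map_phase depends only on its own input line, and each call to reduce_phase only on the values of a single key. That independence is what allows the work to be spread over many machines.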