Big Data

Data Science and Data Mining

What are the differences between data science, data mining, machine learning, statistics, operations research, and so on?

Here I compare several overlapping analytic disciplines to explain their differences and common denominators. Some differences exist only for historical reasons; others are real and subtle. I also provide typical job titles, […]

Principles of Hadoop

Hadoop is an open source software project that enables the distributed processing of large data sets across clusters of commodity servers. It is designed to scale up from a single server to thousands of machines, with a very high degree of fault tolerance.
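Hadoop's classic processing model is MapReduce, one of the project's core components. The input is split into blocks, a map function runs on each block in parallel, intermediate results are grouped by key (the shuffle), and a reduce function runs once per key. The sketch below simulates those phases for a word count in a single Python process. It is an illustration under that assumption only: it uses no Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical. On a real cluster the framework distributes the splits across nodes and re-runs failed tasks, which is part of how it tolerates machine failures.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group every emitted value by its key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Hypothetical reducer: combine all partial counts for one word."""
    return key, sum(values)


def word_count(splits: List[List[str]]) -> Dict[str, int]:
    # Each split stands in for one block of input that a separate worker would map.
    mapped = [pair for split in splits for line in split for pair in map_phase(line)]
    grouped = shuffle(mapped)
    # Reduce each key independently; on a cluster these calls could run on different nodes.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    splits = [
        ["big data needs big clusters"],
        ["clusters of commodity servers"],
    ]
    print(word_count(splits))
```

Because each map call sees only its own split and each reduce call sees only one key, neither phase needs shared state, which is what lets the work spread from one server to many.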