Data Mining - Machine Learning = Data Mining ?

Machine learning (ML) is a branch of Artificial Intelligence (AI). In 1959, Arthur Samuel defined it as the field of study that gives computers the ability to learn without being explicitly programmed. As a result, ML experts tend to focus on the study, design and development of algorithms.

Data Mining (DM) is defined as the process of extracting useful information or knowledge from unstructured data. In practice, machine learning algorithms are used as the tools to extract that information.

Consequently, the purposes of DM and ML are not the same, but the two still overlap in the models they use, which come from machine learning. This is why well-known models such as the Artificial Neural Network and the Support Vector Machine appear in both subject areas.



Having understood the difference between data mining and machine learning, another question arises: what is the difference between classification and prediction? When people talk about prediction, we usually hear the keywords predict, classify and forecast. What is the difference among them? Frankly speaking, there is still no standard that defines and differentiates these words. As a consequence, the distinction below is based on personal opinion and experience.

Classification = Prediction ?
According to Foster Provost and Tom Fawcett (2013), classification predicts whether something will happen, while prediction predicts how much something will happen. In other words, classification assigns values to categorical (discrete) variables, using either supervised or unsupervised learning models, whereas prediction estimates numeric (continuous) targets with supervised learning models.
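To make the distinction concrete, here is a minimal Python sketch. Everything in it is my own assumption for illustration and is not taken from the source cited above: the toy data, the names `visits`, `churned` and `spend`, and the choice of a 1-nearest-neighbour classifier and a least-squares line. The point is only that the first model returns a category and the second returns a number.

```python
from statistics import mean

# Hypothetical toy data, invented for this illustration.
visits = [1, 2, 3, 8, 9, 10]                        # input variable
churned = ["yes", "yes", "yes", "no", "no", "no"]   # categorical target
spend = [10.0, 20.0, 30.0, 80.0, 90.0, 100.0]       # numeric target


def classify(x):
    """Classification: return the label of the closest training point (1-NN)."""
    nearest = min(range(len(visits)), key=lambda i: abs(visits[i] - x))
    return churned[nearest]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


a, b = fit_line(visits, spend)


def predict_spend(x):
    """Prediction (in the sense used here): return a numeric estimate."""
    return a + b * x


print(classify(2))        # a category: will the customer churn?
print(predict_spend(5))   # a number: how much will the customer spend?
```

Both functions take the same kind of input; what differs is the type of the target they estimate.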
 
Prediction = Forecasting ?
Once, a friend asked me which algorithm would be suitable for forecasting customer behaviour based on consumption records. In my opinion, there is a distinction between prediction and forecasting, so I suggested that he replace "forecast" with "predict".

In Chinese, "predict" and "forecast" have similar meanings. As I see it, prediction estimates the values of target variables from independent variables whose values lie within known ranges, while forecasting estimates the values of target variables for attributes that fall outside the known ranges (e.g. time after now). The graph shown below includes two models: linear regression and time series. The fitted line of the regression is plotted within the data range, while the time-series curves extend beyond the maximum of the data range.
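The same idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the monthly sales figures are invented, the helper names `fit_line` and `estimate` are hypothetical, and a straight line extended past the data stands in for a real time-series model. The sketch labels an estimate as a "prediction" when the input lies inside the observed range and as a "forecast" when it lies outside.

```python
from statistics import mean

# Hypothetical data: month index and sales, invented for this illustration.
months = [1, 2, 3, 4, 5, 6]
sales = [12.0, 14.0, 16.0, 18.0, 20.0, 22.0]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


a, b = fit_line(months, sales)


def estimate(x):
    """Return (kind, value): kind depends on whether x is inside the known range."""
    inside = min(months) <= x <= max(months)
    kind = "prediction" if inside else "forecast"
    return kind, a + b * x


print(estimate(3.5))  # input inside the observed months
print(estimate(8))    # input beyond the last observed month
```

The arithmetic is identical in both calls; the difference is only whether the input is covered by the data the model was fitted on, which is why an estimate outside the range deserves more caution.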

Coming back to the question, transaction records are historical data, so we can only predict customer behaviour within these known ranges.

As mentioned, the statements above mainly come from my own point of view and aim to spark more discussion on this topic. You are also welcome to share your own ideas or opinions here!

