Posts

Showing posts from August, 2015

MySQL - Create a relational database with a schema

As mentioned in previous articles, SQL is the language used to manage and control the data in a relational database, and MySQL is one of the database systems that speaks it. Nowadays, graphical query builders and popular web frameworks like Django mean you no longer have to write SQL statements by hand. Nonetheless, I still recommend understanding the principles of relational databases and SQL, especially if you have little knowledge of database system design. There are several relational database systems, including MySQL, SQLite, and PostgreSQL. In this tutorial, we will take advantage of MySQL to administer our own database. Imagine that you are a shop owner starting up a business, but you lack the resources to hire a technician to help you construct the database structure. What could you do? Ask for help? No, just do it yourself! Before building the operational database, we had better understand the data flow in this scenario. If you have never heard of the relational database model, you are recommended to understand more...

Databases - The Home of Data

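A database collects and stores the data generated by an organization and its processes. In database management, some databases are used for operations and others for analytics, so they are called operational databases and analytical databases. Operational databases: This type of database is designed for collecting, modifying, and maintaining the data produced by a running application. As a result, the data is generally dynamic and kept up to date. The simplest example is an inventory database that is continuously updated and shows the current stock quantities. Analytical databases: This type is used for storing, querying, and analyzing data. The static data covering a period of time can be used for performance evaluation, decision making, and trend forecasting. Under normal circumstances the data is not modified. For example, some firms use the data in a database that stores transactions to analyze marketing and sales performance. In addition, databases are built on different models, known as database models. Historically, before relational databases appeared, hierarchical databases and network databases were both widely used. Hierarchical database: It is structured as an inverted tree, and each entity has only a one-to-many relationship with another entity. For example, a product is supplied by one or more suppliers and purchased by one or more members, and a member makes one or more transactions. The figure below shows the structure of a hierarchical database: Each table is explicitly linked to the others. This design lets a requester or administrator retrieve data quickly. For example, if a product is no longer sold, the related records for that product in its parent or child tables are deleted as well. A common problem is that an entity under the root table cannot be stored without a linked entity in the parent table. In reality, a member can exist even without having bought anything. This can be worked around by adding a dummy record to the product table, or by separating the member table from the database so that members can be written to it, but either approach clearly adds an extra step and makes things more complicated. Network database: This type of database is represented in terms of nodes and set structures. The design helps address the shortcomings of the hierarchical database and allows fast access to data through queries. However, users still need to understand the set structures in the database. Relational database: Database models have changed with time and technology. The relational database model was created to solve problems of data inconsistency and integrity, and it is still widely used today. A table structure consists of keys, attributes, records, and relationships. Keys are classified as primary keys and foreign...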
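To make the relational idea above concrete, here is a minimal sketch using Python's built-in `sqlite3` module (chosen only so the example runs without a server; it is not the setup used in the post). The table and column names (`member`, `product`, `purchase`) are hypothetical, picked to mirror the member/product example in the excerpt. It shows that a member can be stored without any purchase, which was the sticking point in the hierarchical model, while a foreign key still rejects a purchase that points at a product that does not exist.

```python
import sqlite3

def build_shop_db():
    """Create an in-memory database with three hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE member (
            member_id INTEGER PRIMARY KEY,   -- primary key
            name      TEXT NOT NULL          -- attribute
        );
        CREATE TABLE product (
            product_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL
        );
        CREATE TABLE purchase (
            purchase_id INTEGER PRIMARY KEY,
            -- foreign keys express the relationships between tables
            member_id  INTEGER NOT NULL REFERENCES member(member_id),
            product_id INTEGER NOT NULL REFERENCES product(product_id)
        );
    """)
    return conn

conn = build_shop_db()

# A member record can exist even though no purchase refers to it.
conn.execute("INSERT INTO member (member_id, name) VALUES (1, 'Alice')")
member_count = conn.execute("SELECT COUNT(*) FROM member").fetchone()[0]

# A purchase that references a missing product violates the foreign key.
try:
    conn.execute("INSERT INTO purchase (member_id, product_id) VALUES (1, 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```

The same table definitions would look very similar in MySQL, though the exact column types and foreign key syntax differ between systems.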

Databases - The city where data lives

As the name suggests, a database is the base where data used to model an organization and its processes is collected and stored. In database management, there are operational databases and analytical databases. Operational databases: This type of database is designed for collecting, modifying, and maintaining data in day-to-day routines. As a result, the data is dynamic and kept up to date. The simplest example is an inventory database that keeps updating and shows the latest inventory figures. Analytical databases: This type is designed for storing, querying, and analyzing data. The static data for a certain period of time can be analyzed for performance reviews, decision making, and trend forecasting. The data is rarely modified. For example, some firms use their transaction databases for marketing analysis. Databases are built on a variety of models, which are called database models. Two databa...

Machine Learning = Data Mining?

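Machine learning is one of the disciplines within Artificial Intelligence. In 1959, Arthur Samuel, a pioneer of AI, defined machine learning as giving computers the ability to learn without being explicitly programmed. AI researchers generally study, design, and develop AI algorithms, with the aim of making machines think like humans. Some machine learning models are designed on theoretical assumptions from mathematical statistics, while others require no assumptions at all. All kinds of models keep evolving, from the most basic regression and trees, to ANN and Random Forest, and on to bagging and boosting, and the variety of combinations fascinates me. Data mining is defined as uncovering hidden information in structured or unstructured data. Machine learning models happen to serve as the tools of data mining, used for classification or prediction, with an emphasis on explaining how particular factors affect the outcome. Sometimes, in order to explain how a model arrives at its results, one chooses a model that performs worse but can be interpreted. Clearly, the biggest difference between data mining and machine learning is their purpose: machine learning aims for the machine to think and make the best decisions and predictions, whereas data mining focuses on giving the best explanation of the results. Because both fields use the same tools, a certain understanding of model design, assumptions, and theoretical foundations is needed. We also often hear the terms classification, prediction, and forecasting. They are easily confused, and everyone has their own understanding and definition of them. My own view is as follows: Classification = Prediction? Classification is used to predict whether a particular event, among several, will occur (Foster & Tom, 2013), so we use a model to assign each case to the group it most resembles, such as predicting whether customers will churn or classifying mushroom species. Prediction, by contrast, estimates how much of something will occur, that is, a continuous number such as a price or a quantity. Both supervised and unsupervised learning models can be used for classification, whereas prediction requires a supervised learning model. Prediction = Forecasting? Once, a friend asked me which algorithm could "forecast" customer behavior from purchase records. In my view, prediction and forecasting are different. I suggested that he use...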
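The distinction drawn above between classification (a discrete label) and prediction (a continuous number) can be illustrated with a small sketch in plain Python. The two techniques used here, a 1-nearest-neighbour classifier and a least-squares line, are my own choice for illustration and are not named in the post; the data (monthly visits, churn labels, spend) is made up.

```python
def classify_nearest(x, examples):
    """1-nearest-neighbour: return the label of the closest known example."""
    nearest = min(examples, key=lambda ex: abs(ex[0] - x))
    return nearest[1]

def fit_line(xs, ys):
    """Ordinary least squares for a single feature: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Classification: the output is one of a fixed set of labels.
# Hypothetical data: (visits per month, whether the customer churned).
churn_examples = [(1, "churn"), (2, "churn"), (8, "stay"), (9, "stay")]
label = classify_nearest(7, churn_examples)

# Prediction: the output is a continuous quantity.
# Hypothetical data: visits per month against monthly spend.
visits = [1, 2, 3, 4]
spend = [10.0, 20.0, 30.0, 40.0]
slope, intercept = fit_line(visits, spend)
predicted_spend = slope * 5 + intercept
```

Here `label` is a category, while `predicted_spend` is a number on a continuous scale, which is the difference the paragraph describes.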

Data Mining - Machine Learning = Data Mining?

Machine learning (ML) is a branch of Artificial Intelligence (AI). In 1959, Arthur Samuel defined it as the field of study that gives computers the ability to learn without being explicitly programmed. As a result, experts tend to focus on the study, design, and development of algorithms. Data Mining (DM) is defined as the process of extracting useful information or knowledge from unstructured data. In practice, machine learning algorithms are used as the tools to extract that information. Consequently, the purposes of DM and ML are not the same, but they overlap in the models of machine learning. That is why well-known models such as Artificial Neural Networks and Support Vector Machines appear in both subject areas. Having understood the difference between data mining and machine learning, another question arises: what is the difference between classification and prediction? We usually ...