Relational Database - Logical design of tables in relational database

In articles of creating database with schema, I have suggested to design the schema before establishing tables in the database. Frankly speaking, you may still be confused about the design of the schema. As a result, I am going to talk more about the logical design and tables structure in the relational database.

Firstly, I do highlight the relational databases could be far away from that of NoSQL and non-relational databases so do not mix up each of them. In order to make this tutorial more readable, the list of terms are explained and list out below:

Database: All tables and raw data stored and consolidated in
Table (Entity): Arrange and store data in with table structure (e.g. Member Table, Product Table)
Column (Attribute): the column of the table (e.g. member_id, member_name, gender in Member Table)
Row (Record): the row of the table (e.g. 1, horace,  M)


Some of you should have heard about "Normalization". Actually, the process of normalization is quite similar to the process of breaking down the table to keep data consistency. Based on my own experience, I will try to normalize the tables in line with entities when handling complicated system. For instant, if we were the online shop owner, we should know there should be an amount of data generated from a record of the transactions.


We can see the Transaction table with listed attributes including the sets of member and product's related data. It is a little bit of redundant and many value are replicated so that some consistency may occur. Suppose we have the three records of transaction data and present in table formats. Obviously, the records are ordered from earliest to latest.
 
transaction_id member_name membership_type product_name  price  payment_datetime
1 Horace Tang Sliver Coca-Cola  $10.00 11 Nov 12:30
2 Nicky Li Gold Milk  $11.00 11 Nov 14:05
3 Horace Tang Gold Potato Chips  $30.00 11 Nov 14:25

See? The member, "Horace Tang" exists twice in the three records in Transaction table, however, the membership_type is not same. It could be caused by that Horace is upgraded to gold membership after the transaction it makes at 12:30. Thus, the membership_type change to "Gold" and show in the second transaction. In order overcome such inconsistency, we will extract the information and form a new table of User from transaction table. Similarly, product data also is extracted to be a Product table to keep the data consistency.




Now, two new primary keys (member_id, product_id) are assigned to new Member and Product table respectively.. A user say "Horace Tang" (user_id: 1) could have many transaction records (transaction_id: 1, 3). Meanwhile, each transaction record should be owned by one user. To put it simply, it is the one-many relationship between User and Transaction table. So as to indicate "Horace Tang" belongs to transaction (transaction_id: 1), user_id will be assigned into transaction table as foreign key. The same manner can be applied to product table and product_id also stored as foreign key.






member_id member_name membership_type
1 Horace Tang Gold
2 Nicky Li Gold


product_id product_name  price 
1 Coca-Cola  $10.00
2 Milk  $11.00
3 Potato Chips  $30.00


transaction_id payment_datetime member_id product_id
1 11 Nov 12:30 1 1
2 11 Nov 14:05 2 2
3 11 Nov 14:25 1 3


Consequently, two new tables should be divided from the original table where you find there are two entity in with one-to-many relationship according to my own experience.

In reality, our members is allowed to purchase one or more than one product in a transaction. That means one or more than product could exist in one or more than transaction. As I have mentioned about the relationship, there are many-to-many relationship between entities apart from one-to-one and one-to-many. In this case, there are many-to-many relationship between Product and Transaction table. Nevertheless, it is hard to implement the many-to-many relations in the database. The table structure need to be further modified so that a transaction could show a list of purchased product list.

The solution is to create a new table to connect these two entities:



A new table named Purchased List is created and transaction_id, product_id is assigned to foreign key in Purchase List table. This implies that one product is stored in one or many purchased list (one-to-many relationship) and each transaction record consists of one or many purchased list (one-to-many relationship). As a consequence, It could be presented in table structure:


transaction_id payment_datetime member_id
1 11 Nov 12:30 1
2 11 Nov 14:05 2
3 11 Nov 14:25 1


product_id product_name  price 
1 Coca-Cola $10.00
2 Milk $11.00
3 Potato Chips $30.00


purchase_list_id transaction_id product_id
1 1 1
2 1 2
3 2 2

If we only knew that Horace brought on 11 Nov 12:30 and thus transaction id equals to 1, we could search the Purchase List table to get the purchase_list_id based on transaction id (= 1). After obtaining purchase_list_id and then search the product_id. Apparently, Horace has bought Coca-cola and milk as the product_id: 1, 2.

To put it into few words, Horace (member_id: 1) purchased both Coca-Cola (product_id: 1) and Milk (product_id: 2) in a transaction (transaction_id: 1).

In conclusion, we will keep it into one table unless we find there are one-to-many (many-to-one) and many-to-many relationship between the entities. It also aims to avoid data inconsistency. Glad if you become more familiar with logical design of relational databases now!



Comments

Popular posts from this blog

Boosting vs Bagging? 別再胡亂用了!

機器學習之陷阱 - Imbalance Class Classification

Excel VBA - 自動生成分析報告