Python - Multi-Athlon Programming
There is no doubt that Python, backed by its huge collection of libraries, makes tool development easier. Many friends have asked which is better for data analysis, R or Python. R has CRAN and Python has the Python Package Index, so Python can match almost every function that R offers. As both a back-end developer and a data analyst, I find Python the best choice of programming language, because it lets me handle data extraction, preparation, analysis, and modeling all in one place.
Download and Installation
The steps to install Python on each operating system are as follows. They also cover installing pip, which is used to download other useful modules.
Mac
1. Download the 2.7.x version from https://www.python.org/downloads/
2. Open the downloaded file and follow the instructions to install Python.
3. Run Python in the terminal:
$ python
>>> quit()  # Quit the Python shell
4. Open the terminal and install pip:
$ sudo easy_install pip
$ sudo pip install --upgrade pip
Windows
1. Download the 2.7.x version from https://www.python.org/downloads/
2. Open the downloaded file and follow the instructions to install Python.
3. Set the PATH variable (sorry that I cannot provide English screenshots, as my Windows laptop uses the Traditional Chinese interface):
Open My Computer, right-click, and select the last option.
Click the button circled in red in the screenshot to change the PATH variable.
Select the Path variable and add ";C:\Python27\;C:\Python27\Scripts" to the end of its value.
4. Open CMD and type:
> python
>>> quit()  # Quit the Python shell
Python now runs successfully in CMD!
5. Download get-pip.py.
6. Open CMD and type (assuming you saved the file on the Desktop):
> python <directory>\get-pip.py
7. Upgrade pip by typing this command in CMD:
> python -m pip install -U pip
Ubuntu
1. Install both Python and pip in the terminal:
$ sudo apt-get install python-setuptools python-pip python-dev build-essential
2. Start Python in the terminal:
$ python
Both Python and pip now run successfully!
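If you would like to double-check the installation from inside Python on any of the three systems, a small script like the one below may help. It is only a sketch: the function name describe_environment is my own invention for illustration, and it simply reports the interpreter version and whether the pip module can be imported.

```python
import importlib
import sys


def describe_environment():
    """Report the running Python version and whether pip can be imported.

    describe_environment is a hypothetical helper for this tutorial,
    not part of Python or pip.
    """
    version = "%d.%d.%d" % sys.version_info[:3]
    try:
        importlib.import_module("pip")
        pip_available = True
    except ImportError:
        pip_available = False
    return {
        "python_version": version,
        "executable": sys.executable,
        "pip_available": pip_available,
    }


info = describe_environment()
print(info)
```

If pip_available comes back False, the pip steps for your operating system probably need to be repeated.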
Application for Data Analysis
Data wrangling, as data scientists call it, means importing and reading data sources before cleansing and analyzing them. Moreover, Django, a famous web framework written in Python, can also serve as the API (application programming interface) between apps and the server. Under such a framework, data are serialized into JSON format and transmitted to the target location.
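To make "serialized into JSON" a little more concrete, here is a minimal sketch using only the json module from the standard library. The customer record is made up for illustration, and no Django code is involved; it only shows the round trip from a Python dictionary to a JSON string and back.

```python
import json

# A made-up customer record, used only for illustration.
customer = {"name": "Alice", "age": 30}

# Serialize: turn the Python dictionary into a JSON string that could be sent to an app.
payload = json.dumps(customer, sort_keys=True)
print(payload)

# Deserialize: the receiving side decodes the string back into a dictionary.
restored = json.loads(payload)
print(restored["age"])
```

The sort_keys=True argument is optional; it just keeps the key order predictable when you compare outputs.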
Read JSON
Assume we have downloaded the JSON file to the Desktop. To decode JSON in Python, we use a function that reads a JSON file.
Please download the source files from GitHub; the checklist is:
1. data.json: Sample customer data in JSON format.
2. read_json.py: Python file containing the function for importing JSON.
Suppose you have saved these files on the Desktop. Open the terminal (command prompt) or the IDLE installed earlier and type (the text after '#' is my comments):
>>> import sys, os, pprint  # Import the built-in modules in Python
>>> desktop_dir = os.path.join(os.path.expanduser("~"), "Desktop")  # Extract the desktop path and put it in the variable desktop_dir
>>> sys.path.insert(0, desktop_dir)  # Add the desktop directory to the module search path
>>> import read_json  # Import read_json.py
>>> raw_data = read_json.import_json(os.path.join(desktop_dir, "data.json"))  # Call the function named import_json and store the output in the variable raw_data
>>> pprint.pprint(raw_data)  # Pretty-print the imported data
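The contents of read_json.py are not shown in this post, so the following is only a guess at what a function like import_json might look like, assuming it simply wraps json.load. The sample file is written to a temporary folder so the sketch runs on its own; the "profile" and "age" keys follow the data used later, while the names are invented.

```python
import json
import os
import tempfile


def import_json(path):
    """Read a JSON file and return the decoded Python object.

    This is an assumed implementation; the real read_json.py may differ.
    """
    with open(path) as json_file:
        return json.load(json_file)


# Write a tiny stand-in for data.json (invented values) so the example is self-contained.
sample = {"profile": [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 41}]}
sample_path = os.path.join(tempfile.mkdtemp(), "data.json")
with open(sample_path, "w") as json_file:
    json.dump(sample, json_file)

raw_data = import_json(sample_path)
print(len(raw_data["profile"]))  # Number of customers in the stand-in file
```

Using "with open(...)" closes the file automatically, even if the JSON turns out to be malformed and json.load raises an error.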
Tips:
For more details on the modules in the standard library, visit https://docs.python.org/2/library/index.html
Good job! You have imported the data successfully! Next, we can do some analysis on these data. Let's be data wranglers!
Data Statistics Analysis
Let's do some simple statistics!
Please install the NumPy and SciPy modules first.
Mac
$ pip install numpy scipy
Windows
> pip install numpy
Download the SciPy installation package, click on it, and follow the instructions to install SciPy.
Ubuntu
$ sudo apt-get install python-numpy python-scipy python-matplotlib
Great! The data can now be summarized. Thanks to the support of these modules, you can skip programming the tedious formulas yourself.
>>> import numpy, scipy
>>> age = []  # Declare the empty array
>>> for i in range(len(raw_data["profile"])):  # Put the values of the ten customers' ages into the array
...     age.append(raw_data["profile"][i]["age"])
...
>>> print(numpy.mean(age))  # Call the function from NumPy to get the mean of age
>>> print(numpy.std(age, ddof=1))  # Get the sample standard deviation of age
Tips:
len(array): the built-in function that returns the number of elements in the array.
array.append: the basic method that adds an element to the end of the array.
In fact, the code in line 3 could be replaced by "for i in range(10):". However, what happens if the number of customers in our data changes? To avoid hard-coding, "10" is replaced with "len(raw_data["profile"])", which returns the number of customers.
You may also visit http://docs.scipy.org/doc/ for more statistical function references.
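For readers curious about what the mean and the sample standard deviation (the ddof = 1 case) work out to by hand, here is a rough pure-Python sketch of the same formulas. The four ages are invented, and the helper names mean and sample_std are mine; NumPy itself is deliberately not used, so nothing needs to be installed. The loop over the customers also never hard-codes their number.

```python
import math

# Invented sample data shaped like raw_data: a "profile" list with one entry per customer.
raw_data = {"profile": [{"age": 25}, {"age": 30}, {"age": 35}, {"age": 40}]}

# Iterate over the customers directly, so the code works for any number of them.
ages = [customer["age"] for customer in raw_data["profile"]]


def mean(values):
    """Arithmetic mean: the sum divided by the number of values."""
    return sum(values) / float(len(values))


def sample_std(values):
    """Sample standard deviation: divide by n - 1 instead of n."""
    m = mean(values)
    squared_deviations = sum((v - m) ** 2 for v in values)
    return math.sqrt(squared_deviations / (len(values) - 1))


age_mean = mean(ages)
age_std = sample_std(ages)
print(age_mean)
print(age_std)
```

Dividing by n - 1 rather than n is the usual correction when the ten customers are treated as a sample from a larger population.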
I hope this material is useful for analysts who are learning programming. Should you have any questions, please feel free to leave a comment. I will also demonstrate applications of advanced statistics and machine learning in Python later on.