Python | Learn for Master
  • Cool Python tricks and tips

    Here are some cool tricks for writing better Python code:

    List comprehensions:

    Instead of building a list with a loop:

    We can often build it much more concisely with a list comprehension:
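
    As a minimal sketch of the two styles (the names `numbers`, `squares_loop`, `squares`, and `even_squares` are illustrative assumptions, not taken from the original post):

```python
numbers = [1, 2, 3, 4, 5]

# Building a list with an explicit loop.
squares_loop = []
for n in numbers:
    squares_loop.append(n * n)

# The same list built with a list comprehension.
squares = [n * n for n in numbers]

# A comprehension can also filter items with a trailing "if".
even_squares = [n * n for n in numbers if n % 2 == 0]

print(squares_loop)
print(squares)
print(even_squares)
```

    Both forms produce the same list; the comprehension simply keeps the loop, the expression, and the optional filter on one line.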



    We can use enumerate in a for loop to get each item together with its index, like this:

    enumerate can also take a second argument, the starting index. Here is an example:
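
    A short sketch of both uses of enumerate, assuming a hypothetical list called `fruits`:

```python
fruits = ["apple", "banana", "cherry"]

# enumerate yields (index, item) pairs, so no manual counter is needed.
indexed = []
for i, fruit in enumerate(fruits):
    indexed.append((i, fruit))

# The optional second argument sets the starting index (here 1 instead of 0).
numbered = list(enumerate(fruits, 1))

print(indexed)
print(numbered)
```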


    Dict/Set comprehensions

    Dict and set comprehensions are simple to use and just as effective:
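
    For illustration, a minimal sketch with a hypothetical list `words` (the variable names are assumptions):

```python
words = ["apple", "bob", "cat", "apple"]

# Dict comprehension: map each word to its length.
lengths = {w: len(w) for w in words}

# Set comprehension: collect the distinct first letters.
first_letters = {w[0] for w in words}

print(lengths)
print(first_letters)
```

    Duplicate keys and duplicate set members collapse automatically, so "apple" appears only once in each result.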


    [Read More...]
  • python multi thread example

    In Python, it is easy to start multiple threads using the Thread class in the threading module. The threading module builds on the low-level features of the thread module to make it easier to write multithreaded programs in Python. If you want to run multiple operations concurrently in Python, you need to master the Thread class.

    Thread Objects
    Create and start a Thread

    We can easily make several threads run concurrently using the Thread class. The syntax to create and start a thread is as follows:
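
    A minimal Python 3 sketch of this pattern. The body of `my_function`, its `job_id` argument, and the shared `results` list are assumptions made for illustration:

```python
import threading

results = []
results_lock = threading.Lock()

def my_function(job_id):
    """Hypothetical job: record the square of the job id."""
    with results_lock:  # protect the shared list
        results.append(job_id * job_id)

# Create one Thread per job; "target" is the callable, "args" its arguments.
threads = [threading.Thread(target=my_function, args=(i,)) for i in range(4)]

for t in threads:
    t.start()   # begin running my_function in a new thread
for t in threads:
    t.join()    # wait for every thread to finish

print(sorted(results))
```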

    The jobs are defined in my_function,

    [Read More...]
  • Python Queue examples

    In this post, I will discuss how to use the Python Queue module (named queue in Python 3). This module implements queues for multithreaded programming. Specifically, the Queue object can be used to solve the multi-producer, multi-consumer problem, where messages must be exchanged safely between multiple threads. As the locking semantics are already implemented in the Queue class, you don’t need to handle the low-level lock and unlock operations, which can easily cause deadlocks.
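
    A minimal multi-producer, multi-consumer sketch, assuming Python 3 (where the module is imported as `queue`). The thread counts, the `SENTINEL` shutdown marker, and the function names are illustrative assumptions:

```python
import queue
import threading

NUM_PRODUCERS = 2
NUM_CONSUMERS = 3
ITEMS_PER_PRODUCER = 5
SENTINEL = None  # assumed marker telling a consumer to stop

q = queue.Queue()
consumed = []
consumed_lock = threading.Lock()

def producer(producer_id):
    for i in range(ITEMS_PER_PRODUCER):
        q.put((producer_id, i))  # put() is thread-safe

def consumer():
    while True:
        item = q.get()  # blocks until an item is available
        if item is SENTINEL:
            break
        with consumed_lock:
            consumed.append(item)

producers = [threading.Thread(target=producer, args=(p,)) for p in range(NUM_PRODUCERS)]
consumers = [threading.Thread(target=consumer) for _ in range(NUM_CONSUMERS)]

for t in producers + consumers:
    t.start()
for t in producers:
    t.join()
for _ in consumers:
    q.put(SENTINEL)  # one stop marker per consumer
for t in consumers:
    t.join()

print(len(consumed))
```

    The queue itself needs no explicit locking; the extra lock here only guards the plain list used to collect results.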


    Tip: the queue is one of the most widely used data structures in computer science.

    [Read More...]
  • Count word frequency

    Counting word frequency is a common task in text analysis. In this post, I describe how to count word frequency using a Java HashMap, a Python dictionary, and Spark.

    Use Java HashMap to Count Word frequency

    {a=5, b=2, c=6, d=3}

    Use Python Dict to count word frequency

    The output:

    {'a': 5, 'c': 6, 'b': 2, 'd': 3}
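
    A minimal sketch of the dictionary approach. The input string is an assumption, chosen so that the counts match the output above; the key order printed by Python 3 may differ from the order shown:

```python
# Hypothetical input text: 5 a's, 2 b's, 6 c's, 3 d's.
text = "a b c d a c d a c b a c d a c c"

freq = {}
for word in text.split():
    # get() returns 0 for a word we have not seen yet.
    freq[word] = freq.get(word, 0) + 1

print(freq)
```

    The standard library's collections.Counter does the same job in one call, e.g. `Counter(text.split())`.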

    Use Spark to count word frequency

    The above methods work well for small datasets. However, if you have a huge dataset, the hash-table-based methods will not work, because the table may no longer fit in the memory of a single machine. You will need to develop a distributed program to accomplish this task.

    [Read More...]
  • pyspark unit test based on python unittest library

    pyspark unit test

    PySpark is a powerful framework for large-scale data analysis. Because of its easy-to-use API, you can easily develop PySpark programs if you are familiar with Python programming.

    One problem is that unit testing PySpark code is a little hard. After searching Google for “pyspark unit test”, I only found articles about using py.test or other complicated libraries. However, I don’t want to install any third-party libraries. What I want is to set up the PySpark unit test environment based only on the unittest library,

    [Read More...]
  • python UnicodeEncodeError, converting unicode to ascii

    In Python, we often run into unicode conversion issues. For instance, when you try to print a unicode string, you may get the following exception:

    The reason is that the str() function tries to encode the unicode string as ASCII, which doesn’t support the character u'\xe6'.

    The solution is to encode the string as 'utf-8'.

    Recommended conversion workflow: input (any code page) -> convert to unicode -> (process) -> output to utf-8

    See the following two examples:

    Best practice:

    Always encode from unicode to bytes.
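
    The workflow above can be sketched as follows. This sketch assumes Python 3, where `str` is the unicode type and `bytes` holds encoded data (the original post targets Python 2), and it assumes a Latin-1 encoded input chosen for illustration:

```python
# Input bytes in some code page (assumed here: Latin-1, "æble").
raw = b"\xe6ble"

# 1. Convert the input to unicode as early as possible.
text = raw.decode("latin-1")

# Encoding to ASCII fails because 'æ' (u'\xe6') has no ASCII representation.
try:
    text.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = exc

# 2. Process the text while it is unicode.
processed = text.upper()

# 3. Encode from unicode to bytes only at output time, using UTF-8.
output = processed.encode("utf-8")

print(type(error).__name__)
print(output)
```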

    [Read More...]
  • Python error import: unable to open X server

    When you run a Python script from the shell, you may encounter the following error:

    import: unable to open X server `' @ error/import.c/ImportImageCommand/368

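    Without a shebang line, the shell runs the script itself and interprets the import statement as ImageMagick's import command. Double-check that you have the proper shebang line at the beginning of your Python script: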

    #!/usr/bin/env python

    Once you set the shebang line, you can run your Python script as:

    [Read More...]
  • Simple Json Manipulation using Python

    We list the most common JSON operations: load, loads, dump, dumps, and pretty-printing JSON.

    Create a json file from a python dictionary

    We can easily store a Python dictionary in a JSON file using the json.dump method. In the following code, we first define a dictionary, then write that dictionary to a JSON file:
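
    A minimal sketch of this step. The dictionary contents are assumptions, and the file is written to a temporary directory here so the example leaves nothing behind:

```python
import json
import os
import tempfile

# Hypothetical dictionary to store.
person = {"name": "Alice", "age": 30, "languages": ["Python", "Java"]}

# Write my.json into a temporary directory (an assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), "my.json")
with open(path, "w") as f:
    json.dump(person, f)  # dump() serializes to a file object

# Read the raw file content back.
with open(path) as f:
    content = f.read()
print(content)

# load() parses the file back into a dictionary.
with open(path) as f:
    loaded = json.load(f)
```

    json.dumps and json.loads work the same way but operate on strings instead of file objects.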

    The content of my.json file looks like this:


    How to pretty-print JSON?

    You can run Python with the json.tool module (python -m json.tool) to produce a more readable, pretty-printed JSON file.
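
    Inside a program, a similar result can be obtained with json.dumps and its `indent` and `sort_keys` parameters. A small sketch, assuming a hypothetical compact JSON string:

```python
import json

# Hypothetical compact JSON input.
compact = '{"name": "Alice", "languages": ["Python", "Java"], "age": 30}'

data = json.loads(compact)

# indent adds line breaks and indentation; sort_keys orders keys alphabetically.
pretty = json.dumps(data, indent=4, sort_keys=True)
print(pretty)
```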

    [Read More...]
  • Get shell output when calling shell from Python

    We have shown how to call a shell command from Python using the subprocess communicate method. However, this method buffers the whole output in memory, which is a problem when the output is very large. Also, when a shell script runs for a long time, we may need to examine its output in real time to check its status. To get real-time shell output in Python, we can use the following method:

    Suppose we have a shell script like this:

    Then we have the following Python program to call the shell command and print its output to the screen in real time.
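
    One possible way to do this, sketched with subprocess.Popen in Python 3.7 or later. The helper name `run_and_stream` is an assumption, and a small Python child process stands in for the shell script so the example runs anywhere:

```python
import subprocess
import sys

def run_and_stream(cmd):
    """Run cmd, print each output line as it arrives, return (exit code, lines)."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        text=True,                 # decode bytes to str
        bufsize=1,                 # line-buffered on our side
    )
    lines = []
    for line in proc.stdout:       # yields lines as the child writes them
        print(line, end="", flush=True)
        lines.append(line.rstrip("\n"))
    proc.stdout.close()
    return proc.wait(), lines

# Stand-in for a long-running shell script (assumed for this sketch).
child = [sys.executable, "-c", "print('step 1'); print('step 2')"]
exit_code, lines = run_and_stream(child)
```

    Reading the pipe line by line avoids holding the full output in memory, unlike communicate(). The child process may still buffer its own output, so real-time behavior also depends on how the script flushes.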

    [Read More...]
  • Run hadoop command in Python

    Hadoop is a widely used platform for big data analysis. It is easy to run Hadoop commands in a shell or a shell script. However, there is often a need to manipulate HDFS files directly from Python. We use examples to describe how to run Hadoop commands in Python to list and save HDFS files.

    We already know how to call an external shell command from Python. We can simply call Hadoop commands using the run_cmd method.

    Run Hadoop ls command in Python


    Run Hadoop get command in Python

    Run Hadoop put command in Python
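
    The three operations above can be sketched as follows. The `run_cmd` shown here is an assumed implementation built on subprocess, not necessarily the one from the earlier post, and the helper names and paths are hypothetical. The Hadoop calls are left commented out because they need a working `hadoop` command on the PATH:

```python
import subprocess
import sys

def run_cmd(args_list):
    """Assumed helper: run a command and return (return code, stdout, stderr)."""
    proc = subprocess.Popen(args_list, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return proc.returncode, out, err

def hdfs_ls_cmd(hdfs_path):
    return ["hadoop", "fs", "-ls", hdfs_path]

def hdfs_get_cmd(hdfs_path, local_path):
    return ["hadoop", "fs", "-get", hdfs_path, local_path]

def hdfs_put_cmd(local_path, hdfs_path):
    return ["hadoop", "fs", "-put", local_path, hdfs_path]

# With Hadoop installed, the commands would be run like this (hypothetical paths):
# ret, out, err = run_cmd(hdfs_ls_cmd("/user/me/data"))
# ret, out, err = run_cmd(hdfs_get_cmd("/user/me/data/part-00000", "part-00000"))
# ret, out, err = run_cmd(hdfs_put_cmd("local.txt", "/user/me/data/local.txt"))

# run_cmd itself can be checked without Hadoop, using the Python interpreter.
ret, out, err = run_cmd([sys.executable, "--version"])
print(ret)
```

    Passing the command as a list avoids shell quoting problems with paths that contain spaces.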


    [Read More...]