
Python Programming

Text Files

NCEA Compatibility

Module: as91896 Use advanced programming techniques to develop a computer program

Requirements: modifying data stored in collections (e.g. lists, arrays, dictionaries), storing multidimensional data in collections, creating methods, functions, or procedures that use parameters and/or return values


Learning Outcomes

On completion of this section you will know:


Introduction

Previously when we looked at output data, we exclusively meant output to the screen using the print() function. This certainly allowed us to see the processed data and thus determine if our processing was correct or not.

In a real-life situation, however, this would not be sufficient because, for a variety of reasons, we would need a more permanent record of the processed data. For keeping a permanent record we need to store our data to a file – either a simple text file or a database.

Here we shall look at simple data storage in text files, and also how to read back the same data for further processing.

To create a new file or open an existing file we use the function open(), and once we are finished with the file we use the complementary function close() to close it.

For writing data we shall use a function called write(). This function stores data in the file in the form of text. For this reason string data can be written directly to the file, whereas numeric data has to be converted to text using the str() function.
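The difference between writing text and writing numbers can be sketched as follows. This is only an illustration: the file name and the sample values are made up, and the file is created in a temporary folder so that no real file is touched. Each item is followed by a newline character, which is explained in the next paragraphs.

```python
import os
import tempfile

# Hypothetical file in a temporary folder, so no real file is touched
strPath = os.path.join(tempfile.mkdtemp(), "Write Demo.txt")

strName = "Tom Thumb"   # made-up sample values
floatHours = 37.5

myfile = open(strPath, "w")
myfile.write(strName + "\n")           # text can be written directly
myfile.write(str(floatHours) + "\n")   # a number must be converted with str() first
myfile.close()

# Read the two items back to confirm that both were stored as text
myfile = open(strPath, "r")
strFirst = myfile.readline()
strSecond = myfile.readline()
myfile.close()
print(repr(strFirst))
print(repr(strSecond))
```

Passing floatHours to write() without str() would stop the program with a TypeError, because write() accepts only text when a file is opened in this way.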

When writing to the file we need to conclude each item with a newline character. Before leaving the introduction we need to look at this newline character and why it is useful to us in processing text files.

Listing 1
                        #print Test
                        strData="The fool on the hill\nSees the sun going down"
                        print(strData)
                    

In Listing 1 above notice that the text stored in the variable strData has the character combination ‘\n’ in the middle. This is a signal to Python (and most other languages) to put the text following the ‘\n’ on a new line, hence the name newline character.

Figure 1 below shows us what happens when the code above is run. Notice that the single line of text in Listing 1 has now been broken into two lines. Also notice that the ‘\n’ does not appear anywhere in the output. Its sole function is to inform Python to put a new line between the words ‘hill’ and ‘Sees’.

Fig 1

When we write our data to the file we shall append a newline character at the end of each data item. The reason for this is that when later we are reading data from the file we will be using a function called readline(), which reads from the current file position as far as the next newline character.
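To see why the newline character matters, the sketch below writes the same two items twice, first without and then with a newline after each one, and reads them back with readline(). The file name and values are made up for illustration, and a temporary folder is used.

```python
import os
import tempfile

strPath = os.path.join(tempfile.mkdtemp(), "Newline Demo.txt")  # hypothetical file

# Two items written WITHOUT newline characters run together in the file
myfile = open(strPath, "w")
myfile.write("Tom Thumb")
myfile.write("37.5")
myfile.close()

myfile = open(strPath, "r")
strJoined = myfile.readline()   # reads everything, as there is no newline to stop at
myfile.close()
print(repr(strJoined))

# The same items written WITH a newline after each one
myfile = open(strPath, "w")
myfile.write("Tom Thumb\n")
myfile.write("37.5\n")
myfile.close()

myfile = open(strPath, "r")
strFirst = myfile.readline()    # stops after the first newline
strSecond = myfile.readline()   # continues from there to the second newline
myfile.close()
print(repr(strFirst))
print(repr(strSecond))
```

Without the newline characters there is no way for readline() to tell where the name ends and the hours begin.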


The File Pointer

Fig 2

In order to understand how a text file is read by a programme we need to understand the concept of the file pointer. Essentially a file pointer simply points at some byte inside the file. This byte can be anywhere from the very first byte to the last one. Normally when a file is opened for reading, the file pointer is at the very first byte of the file, or in other words, it points to the first byte that is to be read from the file. Once a byte is read the pointer is moved to the very next byte.

As stated before we shall be using the readline() function in order to read data from the file. Recall that we said that this function reads data from the current position of the file pointer up to and including the next newline character it meets.

Figure 2 above shows the different positions of the file pointer as it is moved along by successive uses of the readline() function. In this file the newline character is represented by the black square.

The top line in Figure 2 shows the file newly opened with the file pointer pointing to the very first byte of the file.

After a readline() function call the data ‘John Smith’ plus the following newline character is read and the file pointer is positioned over the first character of the next name to be read, i.e. over the ‘m’ of ‘mike jones’.

As each successive name is read the file pointer is moved along to the starting byte of the next name until, once the name ‘sue smith’ and its following newline character are read, the file pointer finally points just beyond the last character of the file. At this point any further call of the function readline() will return an empty string, i.e. a string of length zero.
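The behaviour described above can be checked with a short sketch. It assumes a file holding the three names used in Figure 2, written to a temporary folder under a made-up file name, and calls readline() four times.

```python
import os
import tempfile

strPath = os.path.join(tempfile.mkdtemp(), "Pointer Demo.txt")  # hypothetical file

# Create a file with the three names, each followed by a newline
myfile = open(strPath, "w")
myfile.write("John Smith\n")
myfile.write("mike jones\n")
myfile.write("sue smith\n")
myfile.close()

# Call readline() four times: one call more than there are names
myfile = open(strPath, "r")
lstReads = []
for intCounter in range(4):
    lstReads.append(myfile.readline())
myfile.close()

print(lstReads)
# ['John Smith\n', 'mike jones\n', 'sue smith\n', '']
```

The fourth call finds the file pointer already beyond the last character, so it returns an empty string. This is the condition that a reading loop can test for in order to know when to stop.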

Listing 2
                   
                        #File Demo.py
                        myfile = open("File Demo.txt","r")
                        strData=myfile.readline()
                        while len(strData)>0:
                            print(strData)
                            strData=myfile.readline()
                        myfile.close()

                    
                    

Listing 2 shows how a programme would read the above file while Figure 3 below shows the output of that programme.

Fig 3

Writing to a File

Listing 3
                        #C10 Payroll File.py
                        
                        #Main body of programme
                        import C9_Validating_Module
                        import C9_Payroll_Module
                        
                        #input section
                        strName = input("Enter employee's full name:  ")
                        floatHours=C9_Validating_Module.validateFloat(5, 60,"Enter value for hours:  ")
                        floatRate=C9_Validating_Module.validateFloat(16,100,"Enter value for rate:  ")
                        
                        #processing section
                        floatGross = C9_Payroll_Module.calculateGross(floatHours, floatRate)
                        floatTax = C9_Payroll_Module.calculateTax(floatGross)
                        floatNet = C9_Payroll_Module.calculateNet(floatGross,floatTax)
                        
                        #output section
                        myfile=open('Payroll File.txt','a')
                        
                        myfile.write(strName +"\n")
                        myfile.write(str(floatHours) +"\n")
                        myfile.write(str(floatRate) +"\n")
                        myfile.write(str(floatGross) +"\n")
                        myfile.write(str(floatTax) +"\n")
                        myfile.write(str(floatNet) +"\n")
                        
                        myfile.close()
                    

In order to explain the idea of a file pointer we have shown the sequence of writing to and reading from a text file in reverse order. We will now go back to the beginning and show exactly how data is written to the file in the first place.

Listing 3 above is almost identical to its equivalent in the chapter on Modules – or at least lines 1 – 17, which is the input and the processing, are identical. The output, however, is quite different. Instead of writing the results on the console as before, here the results are stored in a file.

At line 18 a file called ‘Payroll File.txt’ is opened. The second parameter in the function open(), i.e. ‘a’, means that the file is opened in append mode. This means that if there is already data in the file, the new data is added to the end of it. If no file by that name exists on the drive then a new file is created.

A reference to the newly opened file is stored in the variable myfile.

If the newly opened file has just been created then the file pointer is placed at the beginning of the file. On the other hand, if it already exists and has data in it, then the file pointer points beyond the last character in the file, ready to add the new set of data to the end.
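Both cases of append mode can be tried out with a small sketch. The helper function appendItem(), the file name and the sample names are assumptions made for this illustration only. The function is called twice to imitate two separate runs of a program.

```python
import os
import tempfile

def appendItem(strFileName, strItem):
    # Append mode: the file is created if it is missing, otherwise added to
    myfile = open(strFileName, "a")
    myfile.write(strItem + "\n")
    myfile.close()

strPath = os.path.join(tempfile.mkdtemp(), "Append Demo.txt")  # hypothetical file
boolExistedBefore = os.path.exists(strPath)

appendItem(strPath, "Tom Thumb")       # first "run": the file is created
appendItem(strPath, "Mary Contrary")   # second "run": data is added at the end

myfile = open(strPath, "r")
strFirst = myfile.readline()
strSecond = myfile.readline()
myfile.close()

print(boolExistedBefore)
print(strFirst.strip('\n'))
print(strSecond.strip('\n'))
```

The first name is still in the file after the second call, which shows that append mode preserved the existing data.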

Lines 20 – 25 are involved in writing both the input and the processed data to the file.

Line 20 writes the employee’s name followed by a newline character.

Line 21 converts the floating point variable, floatHours, to text using the str() function and then writes the converted version to the file, followed by a newline character.

The other lines do exactly the same with the rate, gross, tax and net.

At line 27 the file is closed.

Figure 4 below shows what the contents of the file look like when opened using Notepad.

Fig 4

Fig 4 above is a composite image. The left portion is Python IDLE showing three runs of the code in Listing 3, i.e. entering payroll data into the file 'Payroll File.txt'. The right portion shows the same file opened in a text editor.


Reading Data from a File

Listing 4
                    
                        #C10 Read Lines.py
                        myfile=open('Payroll File.txt','r')
                        floatTotalGross=0.0
                        floatTotalTax=0.0
                        floatTotalNet=0.0
                        strData = myfile.readline()
                        while len(strData)>0:
                            print("Employee name :"+strData.strip('\n'))
                            strData = myfile.readline()
                            print("Hours are :"+strData.strip('\n'))
                            strData = myfile.readline()
                            print("Hourly Rate is  :"+strData.strip('\n'))
                            strData = myfile.readline()
                            floatTotalGross+=float(strData)
                            print("Gross pay is :"+strData.strip('\n'))
                            strData = myfile.readline()
                            floatTotalTax+=float(strData)
                            print("Tax is :",strData.strip('\n'))
                            strData = myfile.readline()
                            floatTotalNet+=float(strData)
                            print("Net pay is  :"+strData.strip('\n'))
                            strData = myfile.readline()
                        myfile.close()
                        print("Totals")
                        print("Total Gross:"+str(floatTotalGross))
                        print("Total Tax:"+str(floatTotalTax))
                        print("Total Net:"+str(floatTotalNet))
                   
                    

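Listing 4 is a more complex version of Listing 2. It reads data from the file that was created and populated by Listing 3. This code does two things: it displays each employee's data on the console, and it accumulates totals for the gross, tax and net so that they can be printed at the end.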

Now let us see how the programme works.

At line 2 the file is opened, but this time the second parameter for the open() function is ‘r’, which means it is in read mode. The file pointer is positioned at the beginning.

Lines 3 – 5 initialise three floating point variables which will respectively hold the accumulated values of the gross, tax and net.

Line 6 performs the first reading of the file using the function readline(). This function reads from its current point in the file as far as the next newline character. That means that at the beginning of the file it should read ‘Tom Thumb’ as well as the newline that follows it. These would be stored in the variable strData. On completion of this line the file pointer is moved to the first byte beyond the newline character, i.e. to the first byte of the next item to be read.

Once the programme has done its priming read, control passes to the while loop at line 7. This tests the length of the data read for being greater than zero. Clearly it will be so on the first time around and thus the body of the loop is entered.

Line 8 prints the name of the employee. As this line is fairly unusual let us examine it in some detail.

print("Employee name :"+strData.strip('\n'))

Clearly enough it will print the text in quotes, i.e. “Employee name :”. This is followed by the variable strData, which, as we said earlier, should have the value “Tom Thumb”. Remember, however, that when the function readline() reads data from a file it reads as far as the next newline character, including the character itself. If this character were left in the string, print() would output it in addition to its own newline, leaving a blank line after each item, and thus we use the strip() function to remove it from the variable.
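The effect of strip('\n') can be seen in isolation with the short sketch below. The string is typed in directly rather than read from a file, purely for illustration.

```python
# The value readline() would return for the first item in the file
strData = "Tom Thumb\n"

# Without strip() the newline is printed too, leaving a blank line
print("Employee name :" + strData)

# With strip('\n') the newline is removed before printing
strClean = strData.strip('\n')
print("Employee name :" + strClean)

print(len(strData))    # 10: nine characters plus the newline
print(len(strClean))   # 9: the newline is gone
```

Note that strip('\n') removes newline characters from both ends of the string. Python also provides rstrip('\n'), which removes them from the right-hand end only.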

Also notice that we use the same function to remove the newline character from all other data items up as far as line 21.

Lines 9 – 12 work the same way with the hours and the rate.

At line 13 the data read from the file will be the value of the gross. At line 14 this is converted to floating point using the function float() and then the value is added to the variable floatTotalGross.

At line 15 the value of the gross is printed like all of the values before it including stripping the newline character from the end.

Lines 16 – 21 do the same for the tax and net.
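The conversion and accumulation steps can be sketched on their own. The list below stands in for three gross values as readline() would return them, newline included. The figures are made up for illustration.

```python
# Made-up gross values, in the form readline() would return them
lstGrossLines = ["450.0\n", "612.5\n", "300.0\n"]

floatTotalGross = 0.0
for strData in lstGrossLines:
    # float() ignores the trailing newline, so no strip() is needed here
    floatTotalGross += float(strData)

print("Total Gross:" + str(floatTotalGross))
```

This is why Listing 4 can pass strData straight to float() but has to strip the newline before printing.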

Line 22 performs the next read on the file. This time it should be the second employee’s name. The other five data items belonging to that employee’s payroll would then be read within the body of the loop.

The same would apply for the third employee.

Once the third employee has been processed we will have reached the end of the file, as we entered only three employees. Consequently at line 22 the readline() function will return an empty string and thus the variable strData will have a length of zero. When this is tested at line 7 the condition will be false and thus programme control will jump out of the loop and pass to line 23 where the file is closed.

The rest of the code simply prints out the accumulated totals for the gross, tax and net.

The output of the programme is shown below in Fig 5.

Fig 5

Processing Lists using a Text File

Here we will attempt to create a 'real world' application based on what we have learnt so far. We know how to store data to a file and retrieve the same data. Similarly we know how to store data about multiple employees in a one-dimensional list. We shall now look at an application that accepts data about multiple employees from a user, stores the data in a two dimensional list and finally saves the contents of that list to a text file. Its companion program will read the data from the file, process the data and write the processed data back to the file.

Below is the first program for collecting data and writing it to the file.

Listing 5
                        #Main body of programme                     
                        import C9_Payroll_Module
                        lstEmployee=[]
                        lstOrg =[]
                        #input section                       
                        strName = input("Enter employee's full name:  ")
                        while strName !="":
                            floatHours=float(input("Enter value for hours:  "))                    
                            floatRate=float(input("Enter value for rate:  "))                    
                            #processing section           
                            floatGross = C9_Payroll_Module.calculateGross(floatHours, floatRate)     
                            floatTax = C9_Payroll_Module.calculateTax(floatGross)       
                            floatNet = C9_Payroll_Module.calculateNet(floatGross,floatTax)
                            ytdGross = floatGross
                            ytdTax = floatTax
                            ytdNet = floatNet
                            lstEmployee.append(strName)
                            lstEmployee.append(str(floatHours))
                            lstEmployee.append(str(floatRate))
                            lstEmployee.append(str(floatGross))
                            lstEmployee.append(str(floatTax))
                            lstEmployee.append(str(floatNet))
                            lstEmployee.append(str(ytdGross))
                            lstEmployee.append(str(ytdTax))
                            lstEmployee.append(str(ytdNet))
                            lstOrg.append(lstEmployee)
                            lstEmployee=[]
                            strName = input("Enter employee's full name:  ")
                        #Output section
                        myfile=open('Storage File.txt','a')
                        for intFCounter in range(len(lstOrg)):
                            for intSCounter in range(len(lstOrg[intFCounter])):
                                myfile.write(lstOrg[intFCounter][intSCounter]+'\n')
                        myfile.close()
                    

This program is very similar to Listing 4 of the page Lists. There are two major differences:

The program for reading the data from the file and then processing and saving it is fairly long. It is also divided into two distinct sections, one for reading the data and populating the two dimensional list, and the other for processing the list and saving the processed data back to the file.

For the reasons above we have divided the program into two separate listings below. Listing 6 contains the first 28 lines of the program, i.e. the part that reads the data from the file and populates the two dimensional list with it, while Listing 7, containing lines 29 to 50, processes the data and stores the processed data back to the file.

Listing 6
                    
                        #Main body of programme                     
                        import C9_Payroll_Module
                        lstEmployee=[]
                        lstOrg =[]
                        myfile=open('Storage File.txt','r')                       
                        strData = myfile.readline()
                        while strData !="":
                            lstEmployee.append(strData.strip('\n'))
                            strData = myfile.readline()
                            lstEmployee.append(strData.strip('\n'))
                            strData = myfile.readline()
                            lstEmployee.append(strData.strip('\n'))                  
                            strData = myfile.readline()       
                            lstEmployee.append(strData.strip('\n'))
                            strData = myfile.readline() 
                            lstEmployee.append(strData.strip('\n'))
                            strData = myfile.readline() 
                            lstEmployee.append(strData.strip('\n'))
                            strData = myfile.readline() 
                            lstEmployee.append(strData.strip('\n'))
                            strData = myfile.readline() 
                            lstEmployee.append(strData.strip('\n'))
                            strData = myfile.readline() 
                            lstEmployee.append(strData.strip('\n'))
                            lstOrg.append(lstEmployee)
                            lstEmployee=[]
                            strData = myfile.readline()
                        myfile.close()                        
                    
                    

Lines 6 to 24 above are somewhat similar to lines 6 to 22 of Listing 4. In both cases the data for each employee is read from the file one text item at a time until the end of the file is reached. As each data item is read it is stripped of the newline character at the end and appended to the list lstEmployee.

The difference between the two pieces of code is that at line 25 the list lstEmployee is appended to the list lstOrg. At line 26 lstEmployee is reset to an empty list, ready to receive the next employee's data. At line 27 the next read of the file occurs, after which control passes back to line 7 where the data read is tested for being an empty string.
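Listing 6 repeats the same pair of statements for each of the nine data items. As an alternative sketch, not the listing's own method, the repetition can be replaced by an inner for loop inside a function that takes the file name and the number of items per employee as parameters and returns the two dimensional list. The function name readRecords(), the file name and the sample figures are all made up for this illustration, and the file is assumed to hold complete records.

```python
import os
import tempfile

def readRecords(strFileName, intItemsPerRecord):
    # Returns a two dimensional list: one sublist of text items per employee
    lstOrg = []
    myfile = open(strFileName, "r")
    strData = myfile.readline()                  # priming read
    while strData != "":
        lstEmployee = []
        for intCounter in range(intItemsPerRecord):
            lstEmployee.append(strData.strip('\n'))
            strData = myfile.readline()          # read the next item
        lstOrg.append(lstEmployee)
    myfile.close()
    return lstOrg

# Build a small sample file: two made-up employees, nine items each
lstSample = [
    ["Tom Thumb", "40.0", "20.0", "800.0", "160.0", "640.0", "800.0", "160.0", "640.0"],
    ["Mary Contrary", "10.0", "30.0", "300.0", "60.0", "240.0", "300.0", "60.0", "240.0"],
]
strPath = os.path.join(tempfile.mkdtemp(), "Storage Demo.txt")  # hypothetical file
myfile = open(strPath, "w")
for lstEmployee in lstSample:
    for strItem in lstEmployee:
        myfile.write(strItem + "\n")
myfile.close()

lstResult = readRecords(strPath, 9)
print(len(lstResult))      # number of employees read
print(lstResult[1][0])     # name of the second employee
```

The last read inside the inner loop fetches the first item of the next employee, or an empty string at the end of the file, so it plays the same role as line 27 of Listing 6.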

Listing 7
                    
                        for intCounter in range(len(lstOrg)):
                            floatHours = float(input("Enter hours worked by "+str(lstOrg[intCounter][0])))
                            floatRate = float(input("Enter hourly rate for "+str(lstOrg[intCounter][0])))
                            floatGross = C9_Payroll_Module.calculateGross(floatHours, floatRate)
                            floatTax = C9_Payroll_Module.calculateTax(floatGross)
                            floatNet = C9_Payroll_Module.calculateNet(floatGross, floatTax)
                            ytdGross = float(lstOrg[intCounter][6]) + floatGross
                            ytdTax = float(lstOrg[intCounter][7]) + floatTax
                            ytdNet = float(lstOrg[intCounter][8]) + floatNet
                            lstOrg[intCounter][1] = str(floatHours)
                            lstOrg[intCounter][2] = str(floatRate)
                            lstOrg[intCounter][3] = str(floatGross)
                            lstOrg[intCounter][4] = str(floatTax)
                            lstOrg[intCounter][5] = str(floatNet)
                            lstOrg[intCounter][6] = str(ytdGross)
                            lstOrg[intCounter][7] = str(ytdTax)
                            lstOrg[intCounter][8] = str(ytdNet)
                        myfile=open('Storage File.txt','w')   
                        for intFCounter in range(len(lstOrg)):
                            for intSCounter in range(len(lstOrg[intFCounter])):
                                myfile.write(lstOrg[intFCounter][intSCounter]+'\n')
                        myfile.close()
                    
                    

The purpose of this section is to bring up the details of each of the employees whose details are stored in the two dimensional list. This is controlled by a for loop which starts at line 29. The upper limit of the loop is the length of the list lstOrg. If we have three employees in the two dimensional list then the length of lstOrg will be three.

As we have nine data items for each of our employees, each of the sublists inside lstOrg will have nine elements, indexed from 0 to 8. With this knowledge let us examine the body of the loop.

The body of the loop, which spans lines 30 to 45, is a complete sub-program in that it has an input, process and output section. The input section comprises lines 30 and 31. Both are the standard input we have used up to now to get floating point values from the user. The only difference is that when we prompt the user for the hours we specify the name of the employee whose hours and rate we want to process. We get the name from element 0 of the current sublist. Thus if the employee's name is 'Tom Thumb', the prompt generated by line 30 will be 'Enter hours worked by Tom Thumb'. The same applies to the hourly rate.

The processing section spans lines 32 to 37. The values of the gross, tax and net are calculated at lines 32 to 34 in exactly the same way as we have done in previous examples, and thus there is no need to explain it further. The year-to-date values for the gross, tax and net are new, however, and therefore we will spend some time explaining them. These values are calculated at lines 35 to 37. The year-to-date values are stored at elements 6, 7 and 8 of the sublist and thus line 35 converts the value stored in element 6 to floating point, adds the value of floatGross to it and stores the result in ytdGross. The other two year-to-date values are calculated in the same way.

We now come to the output section, which spans lines 38 to 45. Recall that all the variables referred to in lines 30 to 37 are floating point values. For the purpose of file storage the elements in our sublists must be text and thus all of the values in the variables must be converted to text using the str() function. Thus at line 38 the value for the hours is converted to text and stored in element 1 of the sublist. The other variables have their data converted in the same way and stored in the appropriate elements of the list. Once the loop has finished, lines 46 to 50 reopen the file in write mode ('w'), which discards its old contents, and write every element of every sublist back to the file, each followed by a newline character.
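The saving step at the end of Listing 7 can also be written as a function with parameters. The sketch below is one possible arrangement, not the listing's own code: the function name writeRecords(), the file name and the shortened three-item records are made up for illustration. It loops over the sublists directly instead of using counters.

```python
import os
import tempfile

def writeRecords(strFileName, lstOrg):
    # Write mode replaces whatever the file held before
    myfile = open(strFileName, "w")
    for lstEmployee in lstOrg:
        for strItem in lstEmployee:
            myfile.write(strItem + "\n")
    myfile.close()

# Shortened, made-up records: name, hours and gross only
lstOrg = [["Tom Thumb", "40.0", "800.0"], ["Mary Contrary", "10.0", "300.0"]]
strPath = os.path.join(tempfile.mkdtemp(), "Save Demo.txt")  # hypothetical file

writeRecords(strPath, lstOrg)     # first save
lstOrg[0][1] = str(35.0)          # modify one element of the list
writeRecords(strPath, lstOrg)     # second save replaces the first

# Read everything back to check what the file now holds
lstLines = []
myfile = open(strPath, "r")
strData = myfile.readline()
while strData != "":
    lstLines.append(strData.strip('\n'))
    strData = myfile.readline()
myfile.close()
print(lstLines)
```

Because the file is opened with 'w' each time, the second save leaves one copy of each record in the file rather than two. Had 'a' been used, the old and the updated records would both be present.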


Practice

Copy the code in Listing 3 into a text editor, save it and then run it. You don’t need to copy the included files a second time as they are the same files that we used in the Module examples. Once you have added a few records open the file in Notepad and check that the data there corresponds to the data you entered.

Next copy the code of Listing 4 and run it. It should read the file and display its contents on the console, which should correspond with what was seen in the Notepad application, and finally it should print the accumulated totals at the end.


Exercise

  1. What function do we use to open a file?
  2. What does it mean when the second parameter to the open() function is ‘a’?
  3. In what form can we write data to the file?
  4. What function do you use to write data to a file?
  5. When a file is newly opened where is the file pointer positioned?
  6. Explain the workings of the readline() function.
  7. Explain what the str() function does.
  8. Why is the str() function necessary when writing data to a file?

Summary

Text File Processing

There are a number of ways of organising and processing text data in a file.

Whichever way you choose these features apply:

  • processing starts by opening the file using the open() function
  • this function takes a mode indicating how the file is to be opened (if the mode is left out, 'r' is assumed):
    • 'r' read only
    • 'w' write. Empties the file if it already exists, then writes data from the beginning
    • 'a' append. Writes data beginning at the end of the file, thus preserving existing data
    • 'r+' read/write. Can both read from the file and write back to it.
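The difference between write mode and append mode can be checked by counting the lines in a file after each kind of write. In the sketch below the helper function countLines(), the file name and the sample names are made up for illustration.

```python
import os
import tempfile

def countLines(strFileName):
    # Counts the items in the file using the readline() loop from this page
    intCount = 0
    myfile = open(strFileName, "r")
    strData = myfile.readline()
    while len(strData) > 0:
        intCount += 1
        strData = myfile.readline()
    myfile.close()
    return intCount

strPath = os.path.join(tempfile.mkdtemp(), "Mode Demo.txt")  # hypothetical file

# Open in 'w' mode twice: the second open wipes what the first one wrote
for strName in ["Tom Thumb", "Mary Contrary"]:
    myfile = open(strPath, "w")
    myfile.write(strName + "\n")
    myfile.close()
intAfterWrite = countLines(strPath)

# Open in 'a' mode twice: each write is added after the existing data
for strName in ["John Smith", "sue smith"]:
    myfile = open(strPath, "a")
    myfile.write(strName + "\n")
    myfile.close()
intAfterAppend = countLines(strPath)

print(intAfterWrite)    # 1
print(intAfterAppend)   # 3
```

Only one name survives the two 'w' writes, whereas the two 'a' writes both add to it.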

File Pointer

The file pointer applies to a text file that is opened by the open() function for either reading or writing.

It is manipulated by the open(), readline() and write() functions.

It determines where to start reading from the file or writing to it.

How the methods manipulate the file pointer

How the function open(filename, mode) influences the file pointer depends on the value of the parameter mode:

  • 'r' read only: pointer is placed at the beginning of the file so that the reading can start from there
  • 'w' write: pointer is placed at the beginning of the file so that the writing can start from there
  • 'a' append: pointer is placed after the last byte in the file so that new data does not overwrite any of the existing data.
  • 'r+' read/write: the pointer starts at the beginning of the file and can be moved anywhere in the file using the seek() method.
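A short sketch of read/write mode follows. Note that Python spells this mode 'r+' and rejects the string 'rw' with a ValueError, which the first part of the code confirms. The file name is made up, the names are those from Figure 2, and the example deliberately overwrites text with new text of the same length to keep the result easy to follow.

```python
import os
import tempfile

strPath = os.path.join(tempfile.mkdtemp(), "Seek Demo.txt")  # hypothetical file
myfile = open(strPath, "w")
myfile.write("John Smith\n")
myfile.write("mike jones\n")
myfile.close()

# Python does not accept the mode string 'rw'
try:
    myfile = open(strPath, "rw")
    myfile.close()
    boolRwAccepted = True
except ValueError:
    boolRwAccepted = False

# 'r+' opens an existing file for both reading and writing
myfile = open(strPath, "r+")
strFirst = myfile.readline()   # pointer is now at the 'm' of 'mike jones'
myfile.seek(0)                 # move the pointer back to the first byte
myfile.write("JOHN")           # overwrites the first four characters
myfile.close()

# Read the file again to see the result
myfile = open(strPath, "r")
strChanged = myfile.readline()
strSecond = myfile.readline()
myfile.close()
print(boolRwAccepted)
print(strChanged.strip('\n'))
print(strSecond.strip('\n'))
```

Only the four characters under the pointer are replaced. The rest of the first line and the whole of the second line are left as they were.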

Functions for reading and writing Data

We have used only two functions here: one for reading data and one for writing data.

  • readline(): this function begins reading at the current position of the file pointer and moves forward until the first newline character is encountered. On completion the file pointer is positioned just beyond the newline character, ready for the next read.
  • write(): this function begins writing data at the current position of the file pointer and moves forward until the end of the data is reached. On completion the file pointer is positioned just beyond the last character written, ready for the next write.

Assignment

Modify the previous assignment so that it writes its data to a text file instead of printing it on the console.

Next create a separate application that both displays the file contents and accumulates the numeric values for each transaction stored in the file.
