Postscript: Creating business documents

Introduction

Many years ago mainframe computers produced business documents, such as invoices and statements, on multi-part preprinted continuous forms using high-speed line printers. These would then be decollated, trimmed and burst into individual pages for insertion into envelopes and posting to customers.

Sample invoice

Today it is more likely that a desktop or free-standing laser page printer will be used. This can be done using pre-printed paper or with a pre-loaded format held in the printer. Alternatively, the form could be created by the program itself, generating all the lines and text elements, specifying the font and text size, and including any images, so that it can be printed onto blank sheets of paper. This could be done using a language specific to a printer range or the more general postscript language.

Programming libraries and modules are available that let a program draw the form and the data items by calling functions which output the appropriate postscript text. It is better, however, to separate the presentation from the data. By starting with a postscript template we can merge data into it and send the result to the printer.

There is also the potential to easily convert the postscript form into a PDF so that this can be sent to the destination using email rather than printing the form, placing these into envelopes and posting them.

This series of articles will cover the drawing of postscript templates, merging data into these to produce the completed form, printing or converting postscript into PDFs and emailing these as attachments.

Vector Drawing

The first part of the process is to create the document templates. This requires a vector drawing program that can output postscript. The program is used to draw the graphics of the document: boxes, headings, logos, text and inserted images. The document will also contain the text areas into which the data will be merged. One major requirement of the drawing program is that it outputs the postscript template in a way that makes each of the text data fields a contiguous text block that can easily be located and replaced with data items by the merge program.

Most vector drawing programs do not handle this in a usable way and will not write out a suitable template. In particular, Inkscape, LibreOffice, Dia and Karbon (Krita) cannot be used. The program that I use and recommend is 'tgif', which is available in many Linux distribution repositories and can be installed with your usual package manager, or it can be downloaded from the author's website at http://bourbon.usc.edu/tgif/.

tgif X windows vector drawing program

Create a form template

Set the page size and orientation using the menu items:

Set the page size: 
  'Layout'
  'Stacked Page' 
  'Specify Paper Size...' 
  enter a4
Set the page orientation: 
  'Layout'
  'Portrait' or 'Landscape'

Draw the fixed items on the page: boxes, lines, shapes, text or logos can be added to create the form. Various sizes and fonts for the text are available. Lines and shading can be coloured with various styles, patterns and thicknesses.

Field tags, into which data is to be merged, should be created as text fields containing a marker that can easily be identified when processing the form. I use <!% at the start and %> at the end with the field name between the % signs. The field tag should be extended to the full number of characters that the data could occupy by adding hyphens between the % and > of the tag end.

For multi-line fields the text field should include all the possible lines, with each subsequent line starting with a marker tag; I use <!>.
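
As a rough illustration of the tag layout described above, the following sketch builds a tag of a given overall width. The helper name make_tag and the constant REPEAT_MARKER are hypothetical and are not part of tgif or of the programs in this series; the tag text itself is typed into tgif by hand.

```python
def make_tag(name, width):
    """Return a field tag such as <!%NAME%----> padded to `width` characters.

    Hypothetical helper: it only shows how the hyphen padding relates
    to the number of characters reserved for the data.
    """
    minimum = len("<!%") + len(name) + len("%>")
    if width < minimum:
        raise ValueError("width %d is too small for field %r" % (width, name))
    # Hyphens go between the closing % and the final >
    return "<!%" + name + "%" + "-" * (width - minimum) + ">"


# Marker that starts each further line of a multi-line field
REPEAT_MARKER = "<!>"

tag = make_tag("POSTAL_ADDRESS", 40)
print(tag)
print(len(tag))
```

The width passed in is the largest number of characters the data may occupy, so a 40 character address line needs a tag that is 40 characters long in total.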

The required font, style and size attributes can be set for each field.

Sample template for an invoice

When the form is complete save it and export it as a postscript file: set the 'File' 'Print/Export Format' to PS (raw postscript) and then export with 'File' 'Print'.

The postscript file created by tgif is a text file with line terminators that can be inspected using a text editor. It will be quite verbose but it should be possible to find the field tags, such as POSTAL_ADDRESS:

% TEXT                                                         
NP                                                             
0 SG
   GS
      1 W
      64 288 M
      GS
            0 SG
            /Helvetica-Bold FF [14 0 0 -14 0 0] MS
            (<!%POSTAL_ADDRESS%--------------------->) SH
      GR
      0 17 RM
      GS
            0 SG
            /Helvetica-Bold FF [14 0 0 -14 0 0] MS
            (<!>) SH
      GR
      0 17 RM
      GS
            0 SG
            /Helvetica-Bold FF [14 0 0 -14 0 0] MS
            (<!>) SH
      GR
      0 17 RM
      GS
            0 SG
            /Helvetica-Bold FF [14 0 0 -14 0 0] MS
            (<!>) SH
      GR
      0 17 RM
      GS
            0 SG
            /Helvetica-Bold FF [14 0 0 -14 0 0] MS
            (<!>) SH
      GR
   GR

If a different drawing program has been used and the tags do not appear as they have been input, then the drawing program may not be suitable as it may not be possible for the merge program to identify these tags reliably.

Testing the Template

Use the program pscheck.py to read the template and extract all the tags, producing a report that shows each tag name with the length available for its data and the number of repeat lines.

This will ensure that the template is suitable for merging as well as producing a check list for producing the merge data.

pscheck.py

#!/usr/bin/env python3
import sys                                                     
import __future__ 
                                             
class TemplateCheck(object):                                   
    def __init__(self):                                        
        self.error = ''                                        
        self.fields = []                                       
        self.buffer = ''                                       
    def check(self, filename):                                 
        try:                                                   
            fd = open(filename, 'r')                           
        except:                                                
            self.error = "Template not found: " + filename     
            return 0                                           
        else:                                                  
            self.buffer = fd.read()                            
            fd.close()                                         
            previous_tag = 0                                   
            # string.find returns -1 when it fails 
            while previous_tag >= 0:                           
                tag_start = self.buffer.find('<!%', previous_tag)
                if ( tag_start >= 0 ):                           
                    tag_end = self.buffer.find('>', tag_start+3)
                    name_end = self.buffer.find('%', tag_start+3)
                    field_name = self.buffer[tag_start+3:name_end]
                    field_length = tag_end-tag_start+1        
                    field_repeats = 1                             
                    repeat_list = []                              
                    start_search = tag_end+1                      
                    # only search for repeats up to next field tag ...
                    repeat_end = self.buffer.find('<!%', tag_end)
                    if ( repeat_end < 0 ):                            
                        # ... or end of buffer
                        repeat_end = len(self.buffer)
                    while start_search < repeat_end:
                        repeat_pos = self.buffer.find('<!>', start_search)
                        if ( repeat_pos > 0 and repeat_pos < repeat_end ):
                            field_repeats += 1
                            repeat_list.append(repeat_pos)
                            start_search = repeat_pos + 3
                        else:
                            start_search = repeat_end

                    self.fields.append((field_name, field_length,
                                    field_repeats, tag_start, repeat_list))
                    previous_tag = tag_end
                else:
                    previous_tag = tag_start
            return len(self.fields)
    def report(self):
        if ( self.error ):
            print(self.error)
        else:
            for (field_name, field_length, field_repeats, tag_start, repeat_list) in self.fields:
                print("%-20s %4d %4d" % (field_name, field_length, field_repeats))

if ( __name__ == "__main__" ):
    if ( len(sys.argv) > 1 ):
        template_filename = sys.argv[1]
        print(template_filename)
        tc = TemplateCheck()
        tc.check(template_filename)
        tc.report()
    else:
        print("pscheck.py: Template filename required.")

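The program starts with a 'shebang' so that, once the file has execute permission, it can be run directly from the command line. Python 3 is used. The 'import __future__' line is included with Python 2 in mind, although a bare import of that module enables nothing by itself; the program also runs under Python 2 because each print call is given a single parenthesised argument. The program is written to accept the filename of the template as a parameter on the command line, so the sys module is imported.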

The body of the code is written as a class to cater for this to be reused by other programs if this should be necessary.

The line 'if ( __name__ == "__main__" ):' introduces the code that is to execute only if the program is run directly as a main program. This will not execute if the program file is imported into another program in order to use the class. This code checks that a filename has been supplied on the command line, creates a TemplateCheck instance, executes the check method on the template to build a list of the data fields and then executes the report method to print this list.

The class TemplateCheck inherits from the Python base object. The __init__ method initialises an error text attribute and a list to hold the field data.

The check method is supplied with the template filename as a parameter. It opens this file and reads it into a buffer, putting a message into the error attribute if the open fails.

The buffer is then searched for each tag start '<!%' in turn. The str.find method is given a second parameter, the position in the buffer at which to start the search (previous_tag). This is initially set to the start of the buffer; after finding and processing a tag it is set to the end of that tag. Note that str.find returns -1 if it fails to find the required string, so when no further tag is found previous_tag becomes -1 and the loop ends.

The '%' that ends the field name is searched for starting just after the tag start, by adding 3 to its position (tag_start). The name of the field is extracted from between these '%' characters.

The tag terminator '>' is also searched for after the tag start position. The difference between the terminator and tag start positions gives the field length; 1 must be added to include the terminator in the length.
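
The position arithmetic can be followed on a single line of text. This is only a worked example using the same str.find calls as pscheck.py; the sample line, with three padding hyphens, is made up for the illustration.

```python
# A made-up line as it might appear in a template
line = "(<!%INVOICENO%--->) SH"

tag_start = line.find('<!%')               # position of the '<'
name_end = line.find('%', tag_start + 3)   # the '%' that closes the name
tag_end = line.find('>', tag_start + 3)    # the tag terminator

field_name = line[tag_start + 3:name_end]
field_length = tag_end - tag_start + 1     # +1 to include the '>'

print(field_name, field_length)
print(line[tag_start:tag_end + 1])

# No further tag on this line, so find reports failure with -1
print(line.find('<!%', tag_end))
```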

In order to count the field repeat tags it is necessary to locate the next field tag, which sets a limit on how far the search may go (repeat_end). Again a position (start_search) is updated after finding each repeat tag, or set to the limit if no more are found before it. The start position of each repeat tag is appended to a list.
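
The bounded search for repeat tags can also be written using the optional end argument of str.find. This is a sketch of the same idea rather than the code used in pscheck.py; the function name repeat_positions and the sample buffer are assumptions made for the example.

```python
def repeat_positions(buffer, tag_end):
    """Return the start positions of the '<!>' markers that belong to
    the field whose tag ends at tag_end (hypothetical helper)."""
    # Only search as far as the next field tag, or the end of the buffer
    limit = buffer.find('<!%', tag_end)
    if limit < 0:
        limit = len(buffer)
    positions = []
    pos = buffer.find('<!>', tag_end, limit)
    while pos >= 0:
        positions.append(pos)
        pos = buffer.find('<!>', pos + 3, limit)
    return positions


buffer = "(<!%ADDRESS%-->) SH (<!>) SH (<!>) SH (<!%TOTAL%>) SH (<!>) SH"

first_end = buffer.find('>')
second_end = buffer.find('>', buffer.find('<!%TOTAL'))

address_repeats = repeat_positions(buffer, first_end)
total_repeats = repeat_positions(buffer, second_end)

# The field repeat count is one more than the number of markers
print(len(address_repeats) + 1, len(total_repeats) + 1)
```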

The field name, length, repeat count, start position and a list of the repeat tag start positions are appended as a tuple to the fields list.

The report method prints the error message if there is one, or the list of fields.

$ ./pscheck.py TestInvoice.ps
TestInvoice.ps
POSTAL_ADDRESS         40    5
DELIVERY_ADDRESS       40    5
INVOICENO              14    1
INVOICEDATE            16    1
ACCOUNT                12    1
ORDERREF               13    1
ORDERDATE              14    1
COPY                    9    1
COMMENTS               60    2
PRODUCT                12   40
DESCRIPTION            32   40
QTYORD                 11   40
QTYDEL                 11   40
PRICE                  10   40
VALUE                  10   40
CONTINUE               13    1
TOTAL                  10    1
TAX                    11    1
INVTOTAL               13    1

Postscript: Merging data

My programs

Many years ago, in the days when MS-DOS was the operating system that was mostly used for desktop computers, I was developing business applications for small businesses that ran on multi-user systems such as DR-Multiuser-DOS or on networked PCs.

When I wanted to develop laser printed forms I purchased a proprietary package that included a drawing program for the form templates, which ran under Windows 2, and a batch merge program that ran under MS-DOS. The merge program used a particular file format for the data to be inserted into the forms. This specified the form to be used and then each of the fields with the text data to be used. When I moved to using Linux I wrote my own merge program that used the same data files but merged into postscript templates.

^form TestInvoice.ps                                           
^field POSTAL_ADDRESS                                          
RETAIL ELECTRONICS LTD                                         
PO BOX 55701                                                   
READING                                                        
BERKSHIRE                                                      
3RG 9PO                                                        
^field DELIVERY_ADDRESS                                        
RETAIL ELECTRONICS LTD                                         
UNIT 4                                                         
777 HIGH ST                                                    
READING                                                        
BERKSHIRE 3RG 9AH                                              
^field INVOICENO                                               
019658                                                         
^field INVOICEDATE                                             
2018-03-02                                                     
^field ACCOUNT                                                 
RETELEC                                                        
^field ORDERREF                                                
PO 17812                                                       
^field ORDERDATE                                               
2018-03-01            
...
^end

Several forms can be included in the file and they may be for different form templates. Each data field is identified by the ^field heading that includes the tag name. Each heading is followed by the data value for that field. Where the field is multiline then there may be multiple lines of values.

The merge data file would be written by the business application as an alternative to printing on, say, multipart forms. If the program cannot be modified to do this then it may be possible to have the application write a print spool file to disk and then process this, extracting the data items to create the merge file, or to activate the merging directly.

Merge Program Design

Originally I wrote the merge program in the C programming language as this was available on all the operating systems that I used at the time. Later, I reimplemented it in Python. This actually proved to run faster due to the efficiency of Python's text handling compared to C's standard string handling routines and the use of a Python dictionary for the data values compared to a C array searched serially.

The program reads the merge file line by line. When it finds a '^form' line it first checks whether a previous form is waiting to be output and then saves the name of the new postscript template. It continues reading until the next ^form or a ^end, storing all the data values in a dictionary indexed by the tag name given on each ^field line.

open merge file
initialize data area
read merge file line
    if ^form
        check if output required
        initialize data area
        save template name
    elif ^end or end of file
        check if output required
    elif ^field
        store field name
        create data list item in dictionary
    else
        add value to current data list

Output requires reading the specified postscript template and merging the data into the tags and writing this to the output file. The pscheck.py program is reused by importing the TemplateCheck class and using this to load the template into its buffer and create a list of the fields that will be replaced by the data items.

The output files will have a filename that is serially numbered to avoid overwriting the previous file.

create instance of TemplateCheck
execute check
open output file
for each field
    write text up to start of tag
    write dataitem
    for each repeat item
        write text up to start of tag
        write dataitem
write remainder of text
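
The output steps can be sketched on a string held in memory. The field tuples have the same layout as those built by TemplateCheck (name, length, repeat count, start position, repeat positions), but the function name merge and the tiny sample buffer are assumptions made for the example, and the escaping of data values is left out here.

```python
def merge(buffer, fields, data):
    """Return buffer with each tag and repeat marker replaced by data."""
    out = []
    last_pos = 0
    for name, length, repeats, start, repeat_list in fields:
        values = iter(data.get(name, []))
        out.append(buffer[last_pos:start])     # text up to start of tag
        out.append(next(values, ''))           # first data item, if any
        last_pos = start + length
        for repeat_pos in repeat_list:
            out.append(buffer[last_pos:repeat_pos])
            out.append(next(values, ''))       # blank when data runs out
            last_pos = repeat_pos + 3          # skip over '<!>'
    out.append(buffer[last_pos:])              # remainder of text
    return ''.join(out)


buffer = "(<!%NAME%--->) SH (<!>) SH"
start = buffer.find('<!%')
length = buffer.find('>', start) - start + 1
fields = [("NAME", length, 2, start, [buffer.find('<!>')])]

merged = merge(buffer, fields, {"NAME": ["ACME", "LEEDS"]})
print(merged)
```

Where a field has fewer data lines than the template allows, the remaining repeat markers are replaced with empty strings so that no marker text is left in the output.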

Executing or Importing

When a Python program is executed at the command line, either by running the Python run time and giving the program as a parameter to that, or by executing the program and having the run time loaded by the shebang, then the program has all the lines in the outer level of the program executed. Thus, in pscheck.py, the program is compiled and the line 'if ( __name__ == "__main__" ):' is run. The run time sets the module-level __name__ variable to "__main__" when this is the 'main' program that has been run from the command line. In this case the condition is true and the following block of code is executed.

When this module is imported by another Python program the same process happens: the code is compiled and the outer level is executed. In this case, however, the __name__ variable is set to the name of the module, here "pscheck", so the block following the if statement is not executed.

This feature can be used to include a test run in a module or, in this case, use the file as a useful executable while reusing it as a module in other programs.
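
The effect of __name__ can be seen without creating two separate programs by using the standard runpy module, which runs a file under a chosen module name. This is only a demonstration; the file demo_module.py is a throwaway written to a temporary directory for the purpose.

```python
import os
import runpy
import tempfile

# One line of source that records how it was run
source = ("result = 'run as main' if __name__ == '__main__' "
          "else 'imported as ' + __name__\n")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_module.py")
    with open(path, "w") as fd:
        fd.write(source)
    # Same file, two different values of __name__
    as_main = runpy.run_path(path, run_name="__main__")["result"]
    as_module = runpy.run_path(path, run_name="demo_module")["result"]

print(as_main)
print(as_module)
```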

Merge Data Values

The tags will be replaced by the data values for the form. In the postscript file these are held between parentheses. This means that unescaped parentheses cannot be allowed in the data value itself. During the merge process the value should have any parentheses replaced by the escaped version. The backslash is the escape character in postscript strings, so any backslash in the data should be doubled first.

dataitem = ....replace('\\', '\\\\').replace('(', '\\(').replace(')', '\\)')
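
The escaping can be wrapped in a small helper so that it is applied in one place. The name ps_escape is hypothetical. The helper also doubles backslashes, since the backslash is the escape character in postscript strings, and it must do so before the parentheses are escaped.

```python
def ps_escape(value):
    """Escape a data value for use inside a postscript ( ) string."""
    return (value.replace('\\', '\\\\')   # backslashes first
                 .replace('(', '\\(')
                 .replace(')', '\\)'))


print(ps_escape("WIDGET (LARGE)"))
print(ps_escape("A\\B"))
```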

Alternative merge file formats

Several other formats for the merge file could be used instead of the one outlined above and used by the program code supplied. XML (eXtensible Markup Language) and JSON (JavaScript Object Notation) would probably be the ones that are most popular currently and both have extensive support from many programming languages, including Python.
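
As an illustration, the same data could be held in JSON and read with the standard json module. The layout below, a list of objects with "form" and "fields" keys, is purely an assumption for this sketch; it is not a format used by the supplied programs.

```python
import json

# Hypothetical JSON layout for a merge file
merge_json = """
[
  {
    "form": "TestInvoice.ps",
    "fields": {
      "POSTAL_ADDRESS": ["RETAIL ELECTRONICS LTD", "PO BOX 55701", "READING"],
      "INVOICENO": ["019658"]
    }
  }
]
"""

# Each entry becomes a template name and a dictionary of value lists,
# the same shape that the ^form / ^field format produces
forms = [(entry["form"], entry["fields"]) for entry in json.loads(merge_json)]

for template, data in forms:
    print(template, sorted(data))
```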

Merge Program

The program has been written as two classes. The first, TemplateMerger, will process the merge file and will build a dictionary for the field data along with other requirements. Each set of data, along with the form template file name, will be passed to an instance of the second class to be output.

#!/usr/bin/env python3
import sys, os                                                 
import __future__                                              
from pscheck import TemplateCheck                              

class TemplateMerger(object):
    def __init__(self):      
        self.formname = ''   
        self.output_files = 0
        self.continuation = False
        self.command = ''        
        self.formdata = {}       
        self.error = ''          
    def process(self, merge_filename, output_base):
        try:                                       
            fd = open(merge_filename, 'r')         
        except:                                    
            self.error = "Mergefile not found: " + merge_filename
            return 0                                             
        else:                                                    
            self.merger = Merger(output_base)                    
            field_name = ''                                      
            for raw_line in fd:                                  
                line = raw_line.rstrip()                         
                if ( line.startswith('^end') ):                  
                    self.check_output()                          
                elif ( line.startswith('^form') ):               
                    self.check_output()                          
                    formspecifier   = line.split()[1].split(',') 
                    self.formname   = formspecifier[0]           
                    if ( len(formspecifier) > 1 and formspecifier[1] == 'c' ):
                        self.continuation = True                              
                    else:                                                     
                        self.continuation = False                             
                    field_name = ''                                           
                elif ( line.startswith('^command') ):                         
                    self.command = line.split(None, 1)[1]                     
                elif ( line.startswith('^field') ):                           
                    field_name = line.split()[1]                              
                    self.formdata[field_name] = []                            
                else:                                                         
                    if ( field_name ):                                        
                        self.formdata[field_name].append(line)                
            self.check_output()                                               
            fd.close()                                                        
            return self.output_files                                          
    def check_output(self):                                                   
        if ( self.formdata ):                                                 
            self.output_files += self.merger.check_output(self.formname, self.formdata, self.continuation)
            if ( self.command ):                                                                          
                self.merger.issue_command(self.command)                                                   
                self.command = ''                                                                         
            self.formdata = {}

The second class does the actual merging of the data into the template.

class Merger(object):
    def __init__(self, output_base):
        self.output_base = output_base
        self.output_seq = 0           
        self.output_name = ''         
    def check_output(self, formname, formdata, continuation=False):            
        self.formname = formname                                               
        self.formdata = formdata                                               
        result = 0                                                             
        if ( self.formname ):                                                  
            tc = TemplateCheck()                                               
            if ( tc.check(self.formname) ):                                    
                if ( continuation ):                                           
                    # append to previous output file                           
                    output_mode = 'a'                                          
                else:                                                          
                    # create new output filename and write new file            
                    self.output_seq += 1                                       
                    output_mode = 'w'                                          
                    result = 1                                                 
                self.output_name = self.output_base + "%04d" % (self.output_seq) + '.ps'
                fd = open(self.output_name, output_mode)                                
                last_pos = 0                                                            
                #outlist = []                                                           
                for (field_name, field_length, repeat_count, start_pos, repeat_list) in tc.fields:
                    self.set_datalist(field_name)                                                 
                    dataitem = self.next_datalist()                                               
                    fd.write(tc.buffer[last_pos:start_pos])                                       
                    fd.write(dataitem)                                                            
                    last_pos = start_pos + field_length                                           
                    for repeat_pos in repeat_list:                                                
                        dataitem = self.next_datalist()                                           
                        fd.write(tc.buffer[last_pos:repeat_pos])                                  
                        fd.write(dataitem)                                                        
                        last_pos = repeat_pos + 3                                                 
                fd.write(tc.buffer[last_pos:])                                                    
                fd.close()
        return result
    def set_datalist(self, field_name):
        self.datalist = self.formdata.get(field_name, [])
        self.datalen = len(self.datalist)
        self.dlindex = 0
    def next_datalist(self):
        if ( self.dlindex < self.datalen ):
            # escape backslashes first, then parentheses, for the postscript string
            dataitem = self.datalist[self.dlindex].replace('\\','\\\\').replace('(','\\(').replace(')','\\)')
            self.dlindex += 1
        else:
            dataitem = ''
        return dataitem
    def issue_command(self, command):
        params = {"filename": self.output_name,
                  "basename": self.output_name.replace('.ps', '')}
        os.system(command % params)

The mainline accepts two parameters from the command line, the first specifies the merge file to be used and the second gives a base name for the output files. These are passed to the process method.

Having the mainline under the control of a conditional allows the file to be imported into another program so that the classes can be reused.

if ( __name__ == "__main__" ):
    if ( len(sys.argv) > 2 ):
        merge_filename = sys.argv[1]
        output_base = sys.argv[2]
        print(merge_filename)
        tm = TemplateMerger()
        result = tm.process(merge_filename, output_base)
        if ( result == 0 ):
            print(tm.error)
        print(str(result) + " files output.")
    else:
        print("psmerger.py: Merge file filename and output base required.")

Postscript: Printing and PDF

CUPS

CUPS, the Common Unix Printing System, is the standards-based, open source printing system developed by Apple Inc. for macOS and Unix-like operating systems, including Linux.

CUPS caters for a large number of printing formats and will convert these into formats suitable for the destination device. Sending a postscript file to a CUPS server will result in the document being correctly formatted even if the printer does not directly support postscript.

Printing of a file using Linux can be done with a command:

lpr -P destination filename

This adds the file to the print queue for that destination. If the file is to be deleted after printing then the -r option should be included.
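
From Python the same command can be issued with the standard subprocess module, passing the arguments as a list so that filenames containing spaces need no quoting. This is a sketch only: the function names and the printer name laser1 are made up, and print_file is defined but not called here because it needs a working print queue.

```python
import subprocess


def lpr_command(destination, filename, remove=False):
    """Build the lpr argument list for one file."""
    command = ["lpr", "-P", destination]
    if remove:
        command.append("-r")      # delete the file once it has been queued
    command.append(filename)
    return command


def print_file(destination, filename, remove=False):
    """Queue the file and return the exit status of lpr."""
    return subprocess.call(lpr_command(destination, filename, remove))


print(lpr_command("laser1", "invoice0001.ps"))
```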

PDF

Portable Document Format encapsulates all the text, fonts, and graphics required to consistently display the contents regardless of the underlying system. There are many programs freely available to display these documents and they can also be displayed in many web browsers. This makes PDF an ideal format to distribute documents.

Converting Postscript to PDF

Most Linux distros provide PDF conversion and utility programs that can be run from the command line or from a script. A common one is ps2pdf. This uses Ghostscript to do the actual conversion, and various options can be entered on the command line to be passed on to it, such as setting the paper size.

ps2pdf -sPAPERSIZE=a4 name.ps

Multipage Postscript and PDF

Business forms may require multiple pages. This may be because, for example, there are more invoice line items than the basic form caters for, or it is because several different, related forms are required.

In the first case there may be a main form and then different continuation forms which may have a similar layout but, perhaps, a smaller header with the space used for more line items, or the same main form may be used with fields indicating that it is continued. In either case page numbers may be required.

If a set of related forms is required, such as a packing slip or a despatch note along with an invoice, then this can be catered for in the merge file by having a ^form line specifying each form and following it with the appropriate field data.

What is required, especially in the case where the forms are to be converted into PDFs and emailed, is that several forms should be able to be output into a single postscript file. This would then be converted into a single PDF containing those related forms. This will make management of the final output, by printing or emailing, much simpler.

The program is to be modified so that the file grouping is indicated by the ^form directive by adding a qualifier ',c' and changing the output file handling allowing several calls to the merge process to output to the same postscript file.
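
The qualifier can be picked out of the ^form line by splitting on the comma, as the merge program does. The function name parse_form_line and the template name PackingSlip.ps are made up for this sketch.

```python
def parse_form_line(line):
    """Return (template_name, continuation) for a ^form line."""
    specifier = line.split()[1].split(',')
    continuation = len(specifier) > 1 and specifier[1] == 'c'
    return specifier[0], continuation


# The first form starts a new output file; the second, marked ',c',
# is to be appended to that same file
print(parse_form_line("^form TestInvoice.ps"))
print(parse_form_line("^form PackingSlip.ps,c"))
```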

Adding a command

While the merge program will produce a series of merged postscript files, it would be useful to have these processed further, either sending them to an appropriate printer or converting them into PDFs, and even emailing them to the intended person.

Printing can be done by adding the postscript file to the print queue by issuing an 'lpr' command using the os module system() function:

os.system("lpr -P printer %(filename)s" % {"filename":filename})

If it is required to email a PDF then the output file can be converted to a PDF using the command:

os.system("ps2pdf -sPAPERSIZE=a4 %(filename)s" % {"filename":filename})

This would create a PDF file of the same basename as the original postscript file with the '.ps' removed and replaced by '.pdf'.
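
If the name of the PDF is needed later, for instance to attach it to an email, it can be worked out with os.path.splitext and given to ps2pdf explicitly as the output file. The function names here are hypothetical, and the command is only built, not run.

```python
import os


def pdf_name(ps_filename):
    """Return the PDF filename that matches a postscript filename."""
    base, _extension = os.path.splitext(ps_filename)
    return base + ".pdf"


def ps2pdf_command(ps_filename, papersize="a4"):
    """Build the ps2pdf argument list with an explicit output file."""
    return ["ps2pdf", "-sPAPERSIZE=" + papersize,
            ps_filename, pdf_name(ps_filename)]


print(ps2pdf_command("invoice0001.ps"))
```

Using splitext only removes the final extension, so a name that happens to contain '.ps' elsewhere is left alone.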

This PDF file could then be sent by email to the appropriate destination by emailing it using the Python smtplib module.
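
A minimal sketch of building such a message with the standard email package follows (Python 3 only). The addresses, subject and placeholder PDF bytes are invented for the example, and the sending step is shown only as a comment because the mail server details depend on the site.

```python
from email.message import EmailMessage


def build_pdf_email(sender, recipient, subject, body, pdf_bytes, pdf_filename):
    """Return an email message with the PDF attached."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(body)
    message.add_attachment(pdf_bytes, maintype="application",
                           subtype="pdf", filename=pdf_filename)
    return message


# In real use pdf_bytes would be read from the converted file:
#     with open("invoice0001.pdf", "rb") as fd: pdf_bytes = fd.read()
message = build_pdf_email("accounts@example.com", "customer@example.com",
                          "Invoice 019658",
                          "Please find your invoice attached.",
                          b"%PDF-1.4 placeholder bytes", "invoice0001.pdf")

# Sending, assuming a mail server is reachable on localhost:
#     import smtplib
#     with smtplib.SMTP("localhost") as server:
#         server.send_message(message)
print(message["Subject"])
```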

There are various ways that the program could be made to execute a command. One would be to add a ^command line in the merge file at the point that the command should be issued, just before the next ^form or the ^end lines. The program would supply the filename and the basename, without the .ps extension, of the postscript file to cater for these being incorporated into the command using the % operator.

^command lpr -Pprinter %(filename)s
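
The substitution itself is ordinary Python string formatting with a dictionary. The output filename below is an example of what the merge program might generate from an output base of 'invoice'; the second command line is only an illustration of using the basename.

```python
output_name = "invoice0001.ps"
params = {"filename": output_name,
          "basename": output_name.replace('.ps', '')}

# Text taken from a ^command line in the merge file
print_command = "lpr -Pprinter %(filename)s" % params
pdf_command = "ps2pdf -sPAPERSIZE=a4 %(filename)s %(basename)s.pdf" % params

print(print_command)
print(pdf_command)
```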

Creating Merge Files

This series has covered using a vector drawing program to create postscript templates for business forms, the merging of data into these to create documents and the printing of these or the conversion of them to PDF format.

The code supplied used a particular format for the merge data. Many other formats could be used, but it will still be necessary to capture this data from the appropriate business system in some usable way.

This is relatively easy to do, given sufficient programming skills, if the source code of the system is available, such as with in-house systems or open-source applications. Without this it may be necessary to intercept data in various ways, perhaps directly from the database, or from documents 'printed' to disk files.

One system that I developed for a client was to control a warehouse. This would be remote from the business's primary computer systems which were used for general business and included sales orders. With no access into their applications, the only way of getting data was that they would print the orders, in the form of packing slips which would normally go onto preprinted continuous stationery, to a serial printer over a remote line to the warehouse. I was able to plug that line into a serial port of the warehouse computer and capture the print lines to disk files, subsequently sending them to an actual printer.

The captured print lines could then be processed to extract the details of the orders and put these into the warehouse system. These orders would then be used to print the various documents, including pick lists, courier and delivery labels, hazardous materials documents and manifests.

Adding a Barcode

Images can be included in Postscript files and tgif will incorporate them in various formats. In order to include a barcode with a particular value, such as an order number on a packing slip, it will be necessary to include a dummy barcode image in the Postscript template and then replace this during the merge procedure.

There are various ways to create barcodes with Python. I originally used the elaphe module that is available in the 'Python Package Index'. A version for Python 3 is elaphe3.

I chose to use code39, as it can be read by most barcode readers, and to write the barcode as encapsulated postscript (EPS).

from elaphe import barcode

number = '1234567'
filename = 'test.eps'
im = barcode('code39', number,
        options=dict(height=0.5, includetext=True),
        margin=10, data_mode='8bits')
im.save(filename, 'EPS')

A similar module that works with later versions of both Python 2 and Python 3 is 'treepoem' (a pun on 'bark ode'). It requires Python 2.7 or Python 3.5 or later.

This module is a wrapper around BWIPP (Barcode Writer in Pure PostScript). It requires Ghostscript, which should already be installed on most Linux distributions.

import treepoem
image = treepoem.generate_barcode(
        barcode_type='code39',
        data='00123456',
        options={"includetext":True, "height":"0.5", "width":"1.5"})
image.save('test.eps')

The generated file will be similar to:

%!PS-Adobe-3.0 EPSF-3.0
%%Creator: PIL 0.1 EpsEncode
%%BoundingBox: 0 0 148 63
%%Pages: 1
%%EndComments
%%Page: 1 1
%ImageData: 148 63 8 3 0 1 1 "false 3 colorimage"
gsave
10 dict begin
/buf 444 string def
148 63 scale
148 63 8
[148 0 0 -63 0 63]
{ currentfile buf readhexstring pop } bind
false 3 colorimage
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
...
ffffffffffffffffff
%%%%EndBinary

After this has been inserted into the template the resulting postscript file will contain:

414.000 635.625 translate 0.562 0.562 scale 0 rotate
[1.432 -0.000 -0.000 1.762 0 0] concat

%%BeginDocument: test.eps
%!PS-Adobe-3.0 EPSF-3.0
%ImageData: 148 63 8 3 0 1 1 "false 3 
[148 0 0 -63 0 63]
...
{ currentfile buf readhexstring pop } bind
false 3 colorimage
ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff
...
ffffffffffffffffff
%%%%EndBinary
grestore end
%%EndDocument

It will be necessary to replace the lines in the template between '%ImageData: ...' and '%%%%EndBinary' with the corresponding lines in the required barcode.

The program code will need to detect the '%%BeginDocument name' line, ensuring that the name is that of the dummy barcode that had been added to the template, and then hand the template and output files over to a function to do the replacement.

This check is needed at two points in the program: where the template text block up to the next tag is output to the file, and after all the tags have been processed, when the remaining template text is written. At each point the program should call a method that checks whether the text contains the trigger %%BeginDocument along with the correct image name. If it does, the method replaces the image lines as described above and returns the new text to be written. If the trigger is not found then the text is returned without change.
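The replacement described above could be sketched as follows. This is an illustration under assumptions rather than the code of the merge program: the function names are hypothetical, the text block is assumed to hold the whole embedded image, and the barcode EPS is assumed to have been read into a string already.

```python
BEGIN_MARK = '%ImageData:'
END_MARK = '%%%%EndBinary'   # four percent signs, as in the listings above


def extract_image_lines(eps_text):
    """Return the lines from '%ImageData:' to '%%%%EndBinary' inclusive."""
    lines = eps_text.splitlines()
    start = next((i for i, line in enumerate(lines)
                  if line.startswith(BEGIN_MARK)), None)
    if start is None:
        raise ValueError('no %ImageData: line in barcode file')
    end = next((i for i in range(start, len(lines))
                if lines[i].startswith(END_MARK)), None)
    if end is None:
        raise ValueError('no %%%%EndBinary line in barcode file')
    return [line + '\n' for line in lines[start:end + 1]]


def replace_barcode(text, image_name, barcode_eps):
    """Swap the dummy image in a template text block for a new barcode."""
    trigger = '%%BeginDocument: ' + image_name
    if trigger not in text:
        return text                  # no trigger: return text unchanged
    new_image = extract_image_lines(barcode_eps)
    out = []
    state = 'copy'    # copy -> armed (trigger seen) -> skip (old image)
    for line in text.splitlines(keepends=True):
        if state == 'copy':
            out.append(line)
            if line.strip() == trigger:
                state = 'armed'
        elif state == 'armed':
            if line.startswith(BEGIN_MARK):
                out.extend(new_image)    # insert the new image lines
                state = 'skip'
            else:
                out.append(line)
        elif line.startswith(END_MARK):  # end of the old image data
            state = 'copy'
    return ''.join(out)
```

The lines around the image, such as the translate and scale commands and the closing 'grestore end', are copied through untouched, so the new barcode takes the position and size of the dummy one.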

Sending an email

Message Format

The email message format is defined by RFC 5322. This requires certain header fields. The header is terminated by a blank line and the body of the message follows.

There are many valid header fields that can be included, and additional ones will be added by email client programs and servers, but a basic set for a simple text message would be:

To:         
From:       
Subject:    
Date:

Sample program

To send a message of this form a Python program can use the email module to assemble the message and the smtplib module to send it to an email server. This code is for Python 3. To run it using Python 2 you would need to import print_function from __future__ and change the input() calls to raw_input().

You will need to set your own email address as the 'send from' and set your mail server.

The program requests, at the terminal, the address to send the message to, a subject and several lines of text. Entering a full stop on a line by itself will terminate the input. The email module's MIMEText class will assemble the message. The various header details are added, including the date in the correct format using the time module. The smtplib module sends it.

#!/usr/bin/python3
import time
from email.mime.text import MIMEText
import smtplib

if ( __name__ == "__main__" ):
    datefmt = "%a, %d %b %Y %H:%M:%S +0000"
    send_from = "your email"
    smtp_server = "your mail server"

    send_to = input("Send email To: ")
    subject = input("Subject: ")
    message_text = ''
    print("Message, end with '.'")
    while True:
        mline = input("> ")
        if mline == '.':
            # a full stop on its own ends the input and is not sent
            break
        message_text += mline + '\n'

    msg = MIMEText(message_text)
    msg['Subject'] = subject
    msg['From'] = send_from
    msg['To'] = send_to
    msg['Date'] = time.strftime(datefmt, \
        time.gmtime())
    s = smtplib.SMTP(smtp_server)
    s.sendmail(send_from, send_to, msg.as_string())
    s.quit()
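As an alternative sketch, the message assembly can be put into a function and the Date header produced by formatdate() from the standard email.utils module instead of a hand-written format string. The names produced by %a and %b in strftime follow the current locale setting, whereas formatdate() always writes the English names that the header requires. The function name build_text_message is my own.

```python
from email.mime.text import MIMEText
from email.utils import formatdate


def build_text_message(send_from, send_to, subject, body):
    """Assemble a simple text message with the basic header fields."""
    msg = MIMEText(body)
    msg['Subject'] = subject
    msg['From'] = send_from
    msg['To'] = send_to
    # Current time in the RFC 5322 date format, expressed as GMT
    msg['Date'] = formatdate(usegmt=True)
    return msg
```

The returned object can be passed to smtplib in the same way as before, using msg.as_string().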

Attaching a file

If a non-ASCII file is to be attached then the message format to use is specified by RFC 2045 through RFC 2049, collectively called Multipurpose Internet Mail Extensions or MIME. The MIME version and a content type declaration should be added to the email header.

The attachment would then be added after the blank line terminating the email header and after any text message. The actual file needs to be encoded such that it consists only of ASCII characters. This can be done using an encoding program such as mimencode or the Python module base64.
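A small sketch of the encoding step using the base64 module follows. The function name is my own, and in practice the email module's encoders will do this job when the message is built.

```python
import base64


def encode_for_mail(data):
    """Encode bytes as base64 text split into short ASCII lines."""
    # encodebytes() inserts a newline after every 76 output characters
    return base64.encodebytes(data).decode('ascii')
```

The result contains only letters, digits, '+', '/', '=' and newlines, so it can safely be placed in the body of a message, and base64.b64decode() will recover the original bytes.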

The encoded data, and a descriptive header, should be added to the email between boundaries. The text being used as a boundary must be specified in the Content-Type header. This can be any character string that will not be confused with any content:

To:         <receiver address>
From:       <sender's address>
Subject:    <subject>
Date:       <date>
MIME-Version: 1.0
Content-Type: Multipart/Mixed;
     boundary="XXXXboundaryXXXX"

Each attachment starts with a boundary line consisting of two hyphens followed by the boundary text given in the email header. The boundary line after the last attachment has two further hyphens appended to mark the end of the message. The attachment header, which is terminated by a blank line, specifies the MIME type, a filename and the encoding used.

--XXXXboundaryXXXX
Content-Type: application/pdf
Content-Disposition: attachment;
    filename="invoice123654.pdf"
Content-Transfer-Encoding: base64

<encoded attachment data>
--XXXXboundaryXXXX--

The MIME type can be specific to the content to direct the receiver to use the correct program to open the attachment once decoded.

The program uses the MIMEMultipart and MIMEBase classes in the email module. To encode the attachment this module has encoding functions.

You will need to set the send_from and smtp_server variables to your own addresses.

The program uses command parameters for the email destination and the filename to be attached. The filename is also used as the email subject.

In this case the base email outer message object is created with the MIMEMultipart class. As before the various headers are set.

The attachment object is created using the MIMEBase class with the maintype and subtype specified. For a PDF file these are 'application' and 'pdf'. The payload is added to the attachment by reading it as bytes with the 'rb' mode used in the open() statement. This payload is then encoded to base64 and the header added specifying the filename for the email receiver to use.

The attachment is attached to the outer message object and the as_string() method is used to convert the object into text suitable for sending with the smtplib module.

#!/usr/bin/env python3
import sys
import time
import smtplib

from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart

datefmt = '%a, %d %b %Y %H:%M:%S +0000'
send_from = 'your email'                 
smtp_server = 'your isp'                        

if ( __name__ == "__main__" ):
    if ( len(sys.argv) < 3 ):
        print("Email address and filename required.")
        sys.exit(1)

    send_to = sys.argv[1]
    filename = sys.argv[2]
    subject = filename

    outer = MIMEMultipart()
    outer['From'] = send_from
    outer['To'] = send_to
    outer['Date'] = time.strftime(datefmt, \
        time.gmtime())
    outer['Subject'] = subject

    fd = open(filename, 'rb')
    buffer = fd.read()
    fd.close()

    msg = MIMEBase('application', 'pdf')
    msg.set_payload(buffer)
    encoders.encode_base64(msg)
    msg.add_header('Content-Disposition', \
        'attachment', filename=filename)

    outer.attach(msg)
    composed = outer.as_string()

    s = smtplib.SMTP(smtp_server)
    s.sendmail(send_from, send_to, composed)
    s.quit()
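The program above always declares the attachment as 'application/pdf'. If other kinds of file were to be sent, the maintype and subtype could be looked up from the filename with the standard mimetypes module. This is a sketch only, and the function name is hypothetical:

```python
import mimetypes
from email.mime.base import MIMEBase


def make_attachment_part(filename):
    """Create an empty attachment part typed from the file extension."""
    ctype, encoding = mimetypes.guess_type(filename)
    if ctype is None or encoding is not None:
        # Unknown or compressed file: fall back to a generic type
        ctype = 'application/octet-stream'
    maintype, subtype = ctype.split('/', 1)
    return MIMEBase(maintype, subtype)
```

The returned part would then have its payload set, encoded and attached in the same way as in the program.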

Azonic Associates 2021

About the Author

Richard Plinston joined International Computers and Tabulators (ICT) in New Zealand in September 1968 as a trainee mainframe operator for the 1901 system. The company changed its name at the end of that month to International Computers Ltd (ICL) as the parent company joined with English Electric Leo Marconi and other UK computer companies to form one large company to compete with the American computer giants.

Richard stayed with ICL for 17 years working through programming, systems design, technical support, project management and many other areas, latterly working with micro computers and multi-user systems.

He left ICL to form his own company, Azonic Associates, to specialise in building bespoke systems to meet individual clients' specialised needs. Most of the systems up to the turn of the century were developed in COBOL on multiuser micro computers running Concurrent-DOS or derivatives and, later, Linux.

In the last 20 years most of the development was done using the Python language on Linux for automated Electronic Data Interchange (EDI) between logistics companies and their clients.

Contact email: ps@Azonic.co.nz