Monday, March 26, 2018

Automating Newsletter Creation With LibreOffice and Python


I am involved in the creation of a publication for a non-profit. It is an organization which provides Christian worship services at 18 area elder care facilities (nursing homes, assisted care facilities, etc.). The publication must be customized with the service time for each facility. There is one publication per week, so it must also carry that week's dates. We hold services over three days (Tuesday-Thursday), originally at five different times. I have been using LibreOffice to create the documents, converting them to PDFs and emailing them to the facilities, which print copies for the people who attend the services. I would produce a month at a time. I used fields in the LibreOffice OpenDocument Text (ODT) documents to make editing the dates and times simpler, but the job was still taking several hours each month. Then the program director came up with two more times. And then two more. This was way too much tediousness for an old guy like me. Too many opportunities for mistakes. Too boring. Let's automate it!

I have done a lot of VBA programming in Microsoft Office tools in the past. I have been poking around the edges of programming in LibreOffice with Uno and Python since the early days, but I always quit because the documentation was too shallow, the complexity too deep and my patience too short. Then, in my latest spurt of motivation, I ran into this post by Philip at PySpoken.com. It suggested a different approach: unpack the ODT document (it is really just a zip file), edit the XML contents and then repack it into a changed ODT document. I had never thought of approaching it this way. It turns out it works beautifully. In addition, Philip recommended the unoconv project for converting the .ODT document to PDF using LibreOffice. Dag Wieers created unoconv. It seems to work well enough, but it can fail on some conversions, so I put in a retry mechanism to ensure the output actually got created. Overall, thanks Philip for giving me the pieces to get this job done!

I used Python 3.6 for this. F-strings are used. ;-)

Dig into the GitHub repo for the full set of files. Also have a look at the Faith, Hope and Peace Ministries website (where you can read these devotions).

I decided to generalize my approach, so that it could handle additional days being added. Very little would be hard-coded; the code would figure out what to do. It does assume it is being run during the month prior to the document's target dates, and not in the first few days of that prior month: the script finds the target month by adding 27 days to today's date, so in a 31-day month it must be run on the 5th or later. It expects to find 4 or 5 documents of the form Devotion-1.odt ... Devotion-5.odt. The number depends on how many weeks our target days hit in that next month. The way to count this is the number of times our starting weekday occurs in the month. As it stands, we start on Tuesday, so if there are 5 Tuesdays in a month, there will be 5 documents for that month. I don't split the week if a month ends on Tuesday or Wednesday, or partway through whatever our date range is. Only the starting weekday matters. Now, I did set parameters based on a couple of data structures declared up top. These could theoretically be loaded from a data file.
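The counting rule can be checked on its own. Here is a minimal sketch, not part of the original script; the function name count_weekday_in_month is made up for illustration, and it uses the same ISO weekday numbering (Monday=1) as the data structures below.

```python
from datetime import date, timedelta

def count_weekday_in_month(year, month, iso_weekday):
    """Count how often an ISO weekday (Mon=1 .. Sun=7) falls in a month."""
    d = date(year, month, 1)
    count = 0
    # walk every day of the month and count the matching weekdays
    while d.month == month:
        if d.isoweekday() == iso_weekday:
            count += 1
        d += timedelta(days=1)
    return count

# Tuesdays (ISO weekday 2) in April and May 2018
print(count_weekday_in_month(2018, 4, 2))
print(count_weekday_in_month(2018, 5, 2))
```

The result is the number of Devotion-N.odt input files needed for that month.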

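# modules used by the code in this post
import glob
import os
import re
import shutil
import stat
import subprocess
import time
import zipfile
from datetime import date, timedelta

# The next two data structures, days and daysTimes, must be sorted by weekday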

# days indicates which weekdays are included, with the ISO weekday number
# (Monday=1 ... Sunday=7) that date.isoweekday() returns
days = (('Tuesday',2),('Wednesday',3),('Thursday',4))

# daysTimes shows required days and times, organized in a hierarchy of 
# weekdays corresponding to days above

daysTimes = (('10:00 AM','10:30 AM','4:00 PM'),  # Tues
             ('10:00 AM','10:30 AM'), # Weds
             ('10:00 AM','10:30 AM','1:00 PM','3:00 PM')) # Thurs
A couple more parameters are covered here: how our input files are named and the directory for unzipping.
inputFiles = [f'Devotion-{n}.odt' for n in range(1,6)]
subdir = 'unpack'
The user fields show up in the document's content.xml as XML tags, and the title shows up in meta.xml. These regular expressions capture them and are used for making replacements.
# regular expression library
reDay = re.compile(r'<text:user-field-decl office:value-type="string" ' + \
                   r'office:string-value="[^"]+" text:name="Day"/>')
reDate = re.compile(r'<text:user-field-decl office:value-type="string" ' + \
                    r'office:string-value="[^"]+" text:name="DateRange"/>')
reTime = re.compile(r'<text:user-field-decl office:value-type="string" ' + \
                    r'office:string-value="[^"]+" text:name="Time"/>')
reTitle = re.compile(r'<dc:title>[^<]+</dc:title>')
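To see one of these patterns at work outside the script, here is a small sketch. The sample XML string is invented for illustration and assumes the attributes appear in the same order the patterns expect; the helper name set_time_field is also hypothetical.

```python
import re

# same pattern as reTime above
re_time = re.compile(r'<text:user-field-decl office:value-type="string" '
                     r'office:string-value="[^"]+" text:name="Time"/>')

def set_time_field(content, new_time):
    """Return content with the Time user field set to new_time."""
    replacement = ('<text:user-field-decl office:value-type="string" '
                   f'office:string-value="{new_time}" text:name="Time"/>')
    # a function replacement is inserted literally, so backslashes in
    # new_time are not interpreted as regex escapes
    return re_time.sub(lambda m: replacement, content)

# made-up fragment standing in for part of content.xml
sample = ('<text:user-field-decls>'
          '<text:user-field-decl office:value-type="string" '
          'office:string-value="10:00 AM" text:name="Time"/>'
          '</text:user-field-decls>')

updated = set_time_field(sample, '4:00 PM')
print(updated)
```

If a pattern does not match (for example, because LibreOffice wrote the attributes in a different order), sub() returns the text unchanged, so it is worth checking that the new value really appears in the result.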
Next, it takes some rather elaborate machinations to find the first of the next month, establish some other calendar properties and create a cycle list. The cycle list holds the number of days to step from each of our weekdays to the next, and from the last one into the following week, so workingDate can be advanced through the whole month.
# build up calendar
firstDay = days[0][1] # numerical first weekday of our schedule
nextMonth=date.today()+timedelta(days=27) # pick a day in the next month, to get the month/year
                                          # correct
firstNextMonth = date(nextMonth.year,nextMonth.month,1) # find the first of next month
workingMonth = firstNextMonth.month
monthName = firstNextMonth.strftime('%B')
# how many days from the first of the next month to our first active weekday
# (zero when the month starts on that weekday, so its first week is not skipped)
n = firstNextMonth.isoweekday()
activeOffset = (firstDay - n) % 7
# this value is initialized, but will be incremented as we do work:
workingDate = firstNextMonth + timedelta(days=activeOffset)
# now come up with the delta times to cycle through dates - so days to go from first weekday
# to second, second to third, ... and last to the first weekday of the next week
cycle = [ days[i+1][1] - days[i][1] for i in range(len(days)-1) ]
cycle.append( 7 + days[0][1] - days[-1][1])
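The same calendar arithmetic can be written as small functions that take today's date as an argument, which makes them easy to try with different dates. This is a sketch under assumptions, not the script's code: the function names are invented, the first of next month is found by a different route (jump 32 days from the 1st of the current month, then snap back to day 1) so it does not depend on which day of the month the script is run, and the offset uses a modulo so a month that starts on the first active weekday gets an offset of zero.

```python
from datetime import date, timedelta

def first_of_next_month(today):
    # the 1st plus 32 days always lands somewhere in the following month
    return (today.replace(day=1) + timedelta(days=32)).replace(day=1)

def first_active_date(today, first_weekday):
    """First date in next month that falls on the ISO weekday first_weekday."""
    start = first_of_next_month(today)
    offset = (first_weekday - start.isoweekday()) % 7
    return start + timedelta(days=offset)

def build_cycle(weekday_numbers):
    """Day steps between consecutive weekdays, then on to next week's first."""
    steps = [b - a for a, b in zip(weekday_numbers, weekday_numbers[1:])]
    steps.append(7 + weekday_numbers[0] - weekday_numbers[-1])
    return steps

print(first_active_date(date(2018, 3, 26), 2))  # run in March, targets April
print(build_cycle([2, 3, 4]))                   # Tuesday, Wednesday, Thursday
```

The steps in the cycle always add up to 7, so each pass through it advances workingDate by exactly one week.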
Change to the place where all the action happens. Create the output directory, named after the next month.
# working directory
os.chdir('c:/Users/buchs/odp/Documents/FHP-Ministries/Materials')
# output directory same as month name
if not os.path.exists(monthName):
  os.mkdir(monthName)
Next come the three nested loops. The outer loop iterates over the weeks. The next loop iterates over the weekdays. Finally, the inner loop iterates over the times for a particular weekday. On each pass, the text of content.xml and meta.xml is updated with the current day, date and time. Then that text is used to overwrite those files, and finally the contents of the unpack directory are zipped up to form a new .ODT document.
# loop over the weeks, stop when we hit the first weekday in the next month
weekIndex = 0
while workingDate.month == workingMonth:

  print("starting with ",inputFiles[weekIndex])
  # unpack our input file.
  zf = zipfile.ZipFile(inputFiles[weekIndex],'r')
  zf.extractall(path = subdir)
  zf.close()

  # work in unpacked dir
  os.chdir(subdir)
  
  # grab content to be ready to edit content
  fp = open('content.xml')
  content = fp.read()
  fp.close()

  # grab meta data to be ready to edit it
  fp = open('meta.xml')
  meta = fp.read()
  fp.close()
  

  # loop over days of week
  for dayIndex in range(len(cycle)):

    print('date is ',workingDate.isoformat())
    # Update the day of the week and date in the content
    daySub = f'<text:user-field-decl office:value-type="string" ' + \
             f'office:string-value="{days[dayIndex][0]}" text:name="Day"/>'
    content = reDay.sub(daySub,content)
    dateString = workingDate.strftime('%B %d, %Y').replace(' 0',' ')
    dateSub = f'<text:user-field-decl office:value-type="string" ' + \
              f'office:string-value="{dateString}" text:name="DateRange"/>'
    content = reDate.sub(dateSub,content)
    
    for timeIndex in range(len(daysTimes[dayIndex])):
      
      thisDayTime = daysTimes[dayIndex][timeIndex]
      # make a simple form of time for naming the files
      timeSimple = '-' + thisDayTime.replace(':','').replace(' AM','').replace(' PM','') + '-'
      
      timeSub = f'<text:user-field-decl office:value-type="string" ' + \
                f'office:string-value="{thisDayTime}" text:name="Time"/>'
      content = reTime.sub(timeSub,content)
      
      # overwrite the content file
      fp = open('content.xml','w')
      fp.write(content)
      fp.close()

      # overwrite the metadata file with document title
      dateStmp = workingDate.strftime('%b-%d-%Y')
      titleSub = f'<dc:title>Devotion {days[dayIndex][0][0:3]} {thisDayTime} ' + \
                 f'{dateStmp}</dc:title>'
      meta = reTitle.sub(titleSub,meta)

      # overwrite the meta file
      fp = open('meta.xml','w')
      fp.write(meta)
      fp.close()

      # Create new output file and open as zipfile
      outputFile = '../' + monthName + '/' + days[dayIndex][0][0:3] +  \
                   timeSimple + dateStmp + '-' + inputFiles[weekIndex]
      # like: ../April/Tue-1000-Apr-03-2018-Devotion-1.odt
      zf = zipfile.ZipFile(outputFile,'w')

      # write files, subdirs and files in subdirs to this zip file
      for f in os.listdir('.'):
        zf.write(f)
        mode = os.stat(f).st_mode
        if stat.S_ISDIR(mode):
          for g in os.listdir(f):
            zf.write(f+'/'+g)

      zf.close()
      print(f'Wrote {outputFile}')

    # We reach the end of times for a given day, now advance the date.
    # This will allow tracking when we bump into next month.
    # This will automatically take care of the week jumps too.
    workingDate += timedelta(days=cycle[dayIndex])

  # and we are on to the next week
  weekIndex += 1
  os.chdir('..')
  shutil.rmtree(subdir) # clean up unpacked files to prepare for next
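The repacking step can also be pulled out into a function. The sketch below is an alternative, not what the script above does: the name repack_odt is invented, it walks the unpacked tree with os.walk so files nested more than one directory deep are included, and it writes the mimetype file first and uncompressed, which is what the OpenDocument packaging convention asks for. Empty directories are not written, which I am assuming is acceptable for these documents.

```python
import os
import zipfile

def repack_odt(src_dir, out_path):
    """Zip the unpacked tree in src_dir into an ODT file at out_path.

    out_path should be outside src_dir so the archive does not include itself.
    """
    with zipfile.ZipFile(out_path, 'w', zipfile.ZIP_DEFLATED) as zf:
        # mimetype goes first and is stored without compression
        mimetype = os.path.join(src_dir, 'mimetype')
        if os.path.exists(mimetype):
            zf.write(mimetype, 'mimetype', compress_type=zipfile.ZIP_STORED)
        for root, dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                # archive names are relative to src_dir, with forward slashes
                arc = os.path.relpath(full, src_dir).replace(os.sep, '/')
                if arc == 'mimetype':
                    continue  # already written above
                zf.write(full, arc)
```

It would be called from the parent directory rather than from inside unpack, for example repack_odt('unpack', 'April/Tue-1000-Apr-03-2018-Devotion-1.odt').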
Now all the ODT output files have been created. Time to convert them to PDF.
# Now, convert the ODT files to PDF files
os.chdir(monthName)
# start the converter server
subprocess.run('python c:/python36/Scripts/unoconv --listener &',shell=True)
time.sleep(20)
# make one pass through everything
for fn in glob.glob('*.odt'):
  subprocess.run(f'python c:/python36/Scripts/unoconv -f pdf {fn}',shell=True)

# now cycle through looking for missing pdf files, because unoconv can fail.
missing = 1
while missing > 0:
  missing = 0
  for fn in glob.glob('*.odt'):
    pdfn = fn.replace('.odt','.pdf')
    if not os.path.exists(pdfn):    
      subprocess.run(f'python c:/python36/Scripts/unoconv -f pdf {fn}',shell=True)
      missing += 1
  print('missing ',missing)
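One thing to watch with the retry loop: if some file can never be converted, it will spin forever. Here is a sketch of a bounded version. The function name convert_with_retries and the max_passes parameter are invented for illustration, and the actual conversion is passed in as a function so the retry logic can be tried without unoconv installed.

```python
import os

def convert_with_retries(odt_files, convert, max_passes=5):
    """Call convert(fn) for each file that has no PDF yet.

    Makes at most max_passes passes and returns the files still missing a PDF.
    """
    def pdf_missing(fn):
        return not os.path.exists(os.path.splitext(fn)[0] + '.pdf')

    missing = list(odt_files)
    for _ in range(max_passes):
        missing = [fn for fn in missing if pdf_missing(fn)]
        if not missing:
            break
        for fn in missing:
            convert(fn)
    # re-check once more after the final pass
    return [fn for fn in missing if pdf_missing(fn)]
```

In this script, convert would be a small function that runs the same unoconv -f pdf command shown above on one file, and anything left in the returned list can be reported for manual attention.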