IT Blog: 2018

Wednesday, May 2, 2018

How to use the aws cli and jq to list instance id and name tag

Using the Amazon Web Services CLI (command line interface), you can get information about virtual machines (EC2 instances). The information comes back in JSON format, but it is quite voluminous. If you just want to see two fields, the Instance Id and the Name tag, it can be a challenge, because the tags are an array of tag names (Key) paired with tag values (Value). So, the magic here is the jq code needed to grab the right tag.


# build up the jq code in two parts
jqProgram1='.Reservations[].Instances[] | (.Tags | '
jqProgram2='from_entries) as $tags | .InstanceId + ", " + $tags.Name'
jqProgram="$jqProgram1 $jqProgram2"

aws --profile myprofile --region us-east-1 ec2 describe-instances \
   --filters Name=tag-key,Values=Name |   \
   jq "$jqProgram"

Example output:

"i-0f5f9271233816c3f, instance-name-1"
"i-0bd45657eefdb0345, instance-name-2"
"i-00fab46ea64a78997, instance-name-3"

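The jq program pipes each instance's Tags array through from_entries, which turns the list of Key/Value pairs into a single object, and captures that object as the variable $tags. $tags then has a .Name field holding the value of the Name tag.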
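For comparison, the same transformation can be sketched in Python, which makes the from_entries step explicit: the Tags list of {"Key": ..., "Value": ...} pairs becomes a dict, and the Name entry is read from it. The function name instance_names and the trimmed sample JSON are made up for illustration; the sample only carries the fields the filter reads.

```python
import json

def instance_names(describe_output):
    """Return 'InstanceId, Name' strings from describe-instances JSON text.

    Walks Reservations[].Instances[] like the jq filter and turns each
    Tags list into a dict, the way jq's from_entries does.
    """
    data = json.loads(describe_output)
    lines = []
    for reservation in data.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            lines.append(instance["InstanceId"] + ", " + tags.get("Name", ""))
    return lines

# Hypothetical, heavily trimmed sample of describe-instances output
sample = '''{"Reservations": [{"Instances": [
  {"InstanceId": "i-0f5f9271233816c3f",
   "Tags": [{"Key": "env", "Value": "test"},
            {"Key": "Name", "Value": "instance-name-1"}]}
]}]}'''

for line in instance_names(sample):
    print(line)  # i-0f5f9271233816c3f, instance-name-1
```

To use it on real data, the aws command's output could be piped into a script that calls instance_names(sys.stdin.read()).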

Sunday, April 29, 2018

Javascript coding challenge - async callback/variable scope

The Problem

My son asked about this ECMAScript/Javascript code, where he was using Google's geocoding and mapping APIs to populate a map with markers.


<script type="text/javascript">
function test() {
    var locations = [
        ['100 N Main St Wheaton, IL 60187', 'Place A'],
        ['200 W Apple Dr Winfield, IL 60190', 'Place B']
    ];
    var map = new google.maps.Map(document.getElementById('map'), {
        zoom: 10,
        center: new google.maps.LatLng(41.876905, -88.101131),
        mapTypeId: google.maps.MapTypeId.ROADMAP
    });
    var infowindow = new google.maps.InfoWindow();
    var marker, i;
    var geocoder = new google.maps.Geocoder();
    for (i = 0; i < locations.length; i++) {
        geocoder.geocode({
            'address': locations[i][0]
        }, function(results, status) {
            marker = new google.maps.Marker({
                position: results[0].geometry.location,
                map: map,
                title: locations[i][1]
            });
        });
    }
}
</script>

It was broken: the callback failed when it tried to read locations[i][1] to assign to title.

My response:

So, what would I do?

Study the documentation further on Markers and Geocoding.

The second one (the Geocoding documentation) tells me my understanding/recollection of the scope of anonymous functions was wrong. In Google's example, the variable resultsMap is passed as a parameter to geocodeAddress, which then calls geocoder.geocode with an anonymous callback function that uses that variable. However, that variable never changes. The same goes for you: locations never changes. So, the real problem is that the value of i changed. The callbacks run asynchronously, only after each geocoding response comes back, and by then the for loop has finished. When the loop ended, i was equal to locations.length, which is outside the range of valid indexes for locations, so locations[i] is undefined and reading locations[i][1] chokes.
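Python closures look up variables when they run, not when they are created, just like the anonymous function here, so the failure can be reproduced without a browser or API key. In this sketch, fake_geocode() and run_pending() are hypothetical stand-ins: one queues the callback the way an asynchronous API would, the other delivers the "responses" after the loop has finished.

```python
# Hypothetical stand-ins: fake_geocode() queues the callback the way an
# asynchronous API does, and run_pending() delivers the "responses" later,
# after the loop has already finished.
locations = [['100 N Main St Wheaton, IL 60187', 'Place A'],
             ['200 W Apple Dr Winfield, IL 60190', 'Place B']]
pending = []

def fake_geocode(address, callback):
    """Remember the callback instead of calling it right away."""
    pending.append(callback)

def run_pending():
    """Call every queued callback, as if the responses just arrived."""
    return [callback() for callback in pending]

i = 0
while i < len(locations):  # same shape as the JavaScript for loop
    # the lambda looks up i when it runs, not when it is created
    fake_geocode(locations[i][0], lambda: locations[i][1])
    i += 1

# i is now 2 == len(locations), so each callback indexes past the end
try:
    run_pending()
except IndexError as err:
    print('callback failed:', err)  # callback failed: list index out of range
```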

Google's examples all place only one marker, so they can hard-code what they want.

So, options: (1) don't initially set the title of the marker, but come back later and do it, (2) specify a title variable that doesn't change, or (3) force waiting on async execution so i doesn't change.

(1) Now, can we count on the callbacks running in the same order as the calls to geocode()? I don't think so, since each callback waits on its own network response. If we could, you could append the markers to an array, assume they are in the order of the locations used in the calls to geocode(), and then come back and add the titles later.

Is there something returned in the results of geocode that would help us index into the locations array? Reading through https://developers.google.com/maps/documentation/javascript/geocoding doesn't reveal anything. I hoped placeId might work as an arbitrary field, because you can pass a value in to the geocode call and get it back out in the results. However, Google has reserved the values for their own meaning. And the address you send in to geocode is not necessarily the same one you get back in results.formatted_address.

(2) I imagine, with some effort, one could use dynamic code generation (the program writes code and then runs it) to define fixed variables for each of the values of locations, such that you have something like location1 = locations[0], location2 = locations[1], etc., and then reference those variables in the callback function. The eval() function is used to dynamically evaluate code. Even better might be to put the constant value directly into the callback function definition. So it would be something like this inside the for loop:


// builds the call as a string with this iteration's address and title baked in
// (assumes neither contains a single quote)
var dynamicCode = 'geocoder.geocode(' +
  '{ \'address\': \'' + locations[i][0] + '\' }, ' +
  'function(results, status) { ' +
  'marker = new google.maps.Marker({ position: results[0].geometry.location, ' +
  'map: map, title: \'' + locations[i][1] + '\' }); });';
alert('about to execute this code:\n' + dynamicCode);
eval(dynamicCode);

You could embed newlines \n into that dynamic code if you want to make it look prettier, but it is not necessary. On the other hand, spaces could be trimmed out too. So, that is one solution.
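The core idea of option (2), fixing the title when the callback is created rather than when it runs, does not strictly require code generation. Here is a Python sketch of the same idea using functools.partial, which stores the current title inside the callback object (in JavaScript, Function.prototype.bind can pre-fill arguments in a similar way). make_marker and the pending list are hypothetical stand-ins for the Maps callback and the outstanding requests.

```python
from functools import partial

locations = [['100 N Main St Wheaton, IL 60187', 'Place A'],
             ['200 W Apple Dr Winfield, IL 60190', 'Place B']]
pending = []  # stands in for requests whose responses arrive later

def make_marker(title, results, status):
    """Hypothetical callback: title was fixed when the callback was created."""
    return f'{title}: {status}'

i = 0
while i < len(locations):
    # partial() copies the current title into the callback now,
    # much like writing the constant into the eval()'d string
    pending.append(partial(make_marker, locations[i][1]))
    i += 1

# the "responses" arrive after the loop, when i == len(locations)
print([callback(None, 'OK') for callback in pending])  # ['Place A: OK', 'Place B: OK']
```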

(3) Another solution is to wait for each callback to complete. The modern way to deal with this is to use JavaScript Promises. However, that would need support from the geocode library, where the callback comes from, and reviewing the reference documentation on geocode does not reveal any. So, a more hacky approach involves a lock. It would look like this:


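// wait() is not built in to JavaScript; this helper returns a Promise
// that resolves after ms milliseconds
function wait(ms) {
    return new Promise(function(resolve) { setTimeout(resolve, ms); });
}

// await only works inside an async function, so test() must be declared
// as: async function test() { ... } with this loop inside it
var lock;
for (i = 0; i < locations.length; i++) {
    lock = 1; // lock set for this loop iteration
    geocoder.geocode({
      'address': locations[i][0]
      }, function(results, status) {
         if (status === 'OK') {
            marker = new google.maps.Marker({
               position: results[0].geometry.location,
               map: map,
               title: locations[i][1]
            });
         }
         lock = 0; // release the lock, even if geocoding failed
    });
    while (lock == 1) {
        await wait(100); // wait 100 milliseconds and check again
    }
}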

So, the lock is set going into the call to geocode() and only in the callback function is the lock unset. Polling for the lock to be unset happens every 100 ms, but a shorter time interval may make sense.
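Python's asyncio also runs callbacks on a single-threaded event loop, so option (3) can be imitated there. In this sketch, fake_geocode() is a made-up stand-in that answers after about 10 ms via loop.call_later(), and asyncio.sleep() plays the part of wait(): each await hands control back to the event loop so the callback can run and release the lock.

```python
import asyncio

locations = [['100 N Main St Wheaton, IL 60187', 'Place A'],
             ['200 W Apple Dr Winfield, IL 60190', 'Place B']]
titles = []  # titles of the "markers" placed, in order

def fake_geocode(address, callback):
    """Hypothetical stand-in for geocoder.geocode: calls back ~10 ms later."""
    loop = asyncio.get_running_loop()
    loop.call_later(0.01, callback, {'address': address}, 'OK')

async def place_markers():
    for i in range(len(locations)):
        lock = 1  # lock set for this loop iteration

        def on_result(results, status):
            nonlocal lock
            titles.append(locations[i][1])  # i has not moved on yet
            lock = 0  # release the lock

        fake_geocode(locations[i][0], on_result)
        while lock == 1:
            await asyncio.sleep(0.005)  # poll, letting the callback run

asyncio.run(place_markers())
print(titles)  # ['Place A', 'Place B']
```

Because each iteration waits for its callback, i cannot advance before the title is read; the price is that the geocoding requests run one at a time instead of overlapping.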

Monday, March 26, 2018

Automating Newsletter Creation With LibreOffice and Python

I am involved in the creation of a publication for a non-profit. It is an organization which provides Christian worship services at 18 area elder care facilities (nursing homes, assisted care facilities, etc.). The publication must be customized to have the service time for each facility. There is one publication per week, so it must be customized to have the week dates. We have services over three days (Tuesday-Thursday) and they were at five different times. I have been using LibreOffice to create the documents and then converting them to PDFs and emailing them to the facilities for printing copies for the people who attend the services. I would produce a month at a time. I used fields in the LibreOffice Open-Document-Text (ODT) documents to make editing the dates and times simpler, but this was still taking several hours to complete the job each month. Then the program director came up with two more times. And then two more. This was way too much tediousness for an old guy like me. Too many opportunities for mistakes. Too boring. Let's automate it!

I have done a lot of VBA programming in Microsoft Office tools in the past. I have been poking around the edges of programming LibreOffice with UNO and Python since the early days, but I always quit because the documentation was too shallow, the complexity too deep and my patience too short. Then, in my latest spurt of motivation, I ran into this post by Philip at PySpoken.com. It suggested a different approach: unpack the ODT document (it is really just a zip file), edit the XML contents and then repack it into a changed ODT document. I had never thought of approaching it this way, and it turns out to work beautifully. In addition, Philip recommended the unoconv project, created by Dag Wieers, for converting the .ODT document to PDF using LibreOffice. It seems to work well enough, but it can be a bit fussy about errors, so I put in a retry mechanism to ensure the output actually got created. Overall, thanks Philip for giving me the pieces to get this job done!

I used Python 3.6 for this. F-strings are used. ;-)

Dig into the Github repo for the full set of files. Also have a look at the Faith, Hope and Peace Ministries website (where you can read these devotions).

I decided to generalize my approach so that it could handle additional days being added. Very little would be hard-coded; the code would figure out what to do. It does assume it is being run during the month prior to the document's target dates, on or after the 5th of that prior month. It expects to find 4 or 5 documents named like Devotion-1.odt, Devotion-2.odt, and so on. The number depends on how many weeks our target days hit in that next month, which is the number of times our starting weekday occurs in the month. As it stands, we start on Tuesday, so if there are 5 Tuesdays in a month, there will be 5 documents for that month. I don't split a week when the month ends partway through our date range (say, on a Tuesday or Wednesday); only the starting weekday matters. I did set parameters based on a couple of data structures declared up top. These could theoretically be loaded from a data file.
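The document count rule above (one document per occurrence of the starting weekday) can be checked with a few lines of Python. This is a sketch; weeks_in_month is a hypothetical helper name, and weekdays use ISO numbering (Monday=1), the same numbering as the days table below.

```python
import calendar
from datetime import date

def weeks_in_month(year, month, first_weekday):
    """Count how often first_weekday (ISO: Monday=1 ... Sunday=7) falls
    in the given month; that is how many documents the month needs."""
    _, ndays = calendar.monthrange(year, month)
    return sum(1 for day in range(1, ndays + 1)
               if date(year, month, day).isoweekday() == first_weekday)

print(weeks_in_month(2018, 5, 2))  # 5 (Tuesdays: May 1, 8, 15, 22, 29)
print(weeks_in_month(2018, 4, 2))  # 4 (Tuesdays: April 3, 10, 17, 24)
```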

import glob, os, re, shutil, stat, subprocess, time, zipfile
from datetime import date, timedelta

# The next two data structures, days and daysTimes, must be sorted by weekdays

# days indicates which weekdays are included, and the numerical equivalent for the datetime module
days = (('Tuesday',2),('Wednesday',3),('Thursday',4)) 

# daysTimes shows required days and times, organized in a hierarchy of 
# weekdays corresponding to days above

daysTimes = (('10:00 AM','10:30 AM','4:00 PM'),  # Tues
             ('10:00 AM','10:30 AM'), # Weds
             ('10:00 AM','10:30 AM','1:00 PM','3:00 PM')) # Thurs

A couple more parameters are covered here: how our input files are named and the directory for unzipping.

inputFiles = [f'Devotion-{n}.odt' for n in range(1,6)]
subdir = 'unpack'

The user fields show up in LibreOffice's content.xml as XML tags. These regular expressions capture them and are used for making replacements.

# regular expression library
reDay = re.compile(r'<text:user-field-decl office:value-type="string" ' + \
                   r'office:string-value="[^"]+" text:name="Day"/>')
reDate = re.compile(r'<text:user-field-decl office:value-type="string" ' + \
                    r'office:string-value="[^"]+" text:name="DateRange"/>')
reTime = re.compile(r'<text:user-field-decl office:value-type="string" ' + \
                    r'office:string-value="[^"]+" text:name="Time"/>')
reTitle = re.compile(r'<dc:title>[^<]+</dc:title>')
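To see what one of these substitutions does, here is a small sketch on a made-up fragment shaped like the declaration the pattern expects. It uses re.subn, which also returns the number of replacements made; if LibreOffice ever wrote the attributes in a different order, the pattern would silently match nothing, and a count of 0 would flag that.

```python
import re

# Same pattern as reDay above
reDay = re.compile(r'<text:user-field-decl office:value-type="string" '
                   r'office:string-value="[^"]+" text:name="Day"/>')

# A made-up fragment shaped like the declaration in content.xml
content = ('<text:user-field-decls>'
           '<text:user-field-decl office:value-type="string" '
           'office:string-value="Tuesday" text:name="Day"/>'
           '</text:user-field-decls>')

daySub = ('<text:user-field-decl office:value-type="string" '
          'office:string-value="Wednesday" text:name="Day"/>')
content, count = reDay.subn(daySub, content)
print(count, 'Wednesday' in content)  # 1 True
```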

Next, some rather elaborate machinations are taken to find the first of the next month, establish some other calendar properties and create a cycle list. The cycle list will allow the workingDate to be advanced over our weekdays and then into the next week.

# build up calendar
firstDay = days[0][1] # numerical first weekday of our schedule
# find the first of next month: from the first of this month, jump 32 days
# (always lands in next month), then back up to day 1
firstNextMonth = (date.today().replace(day=1) + timedelta(days=32)).replace(day=1)
workingMonth = firstNextMonth.month
monthName = firstNextMonth.strftime('%B')
# how many days from the first of the next month to our first active weekday
n = firstNextMonth.isoweekday()
activeOffset = (firstDay - n) % 7  # 0 when the month starts on our first weekday
# this value is initialized, but will be incremented as we do work:
workingDate = firstNextMonth + timedelta(days=activeOffset)
# now come up with the delta times to cycle through dates - so days to go from first weekday
# to second, second to third, ... and last to the first weekday of the next week
cycle = [ days[i+1][1] - days[i][1] for i in range(len(days)-1) ]
cycle.append( 7 + days[0][1] - days[-1][1])
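The calendar setup is easier to check when wrapped in functions that take the date as a parameter. This is a sketch with hypothetical names (first_active_date, day_cycle). The offset uses % 7, so a month that begins on the starting weekday gets an offset of 0 rather than skipping ahead a week.

```python
from datetime import date, timedelta

def first_active_date(today, first_weekday):
    """First date in the month after `today` that falls on first_weekday
    (ISO numbering, Monday=1)."""
    first_next_month = (today.replace(day=1) + timedelta(days=32)).replace(day=1)
    offset = (first_weekday - first_next_month.isoweekday()) % 7
    return first_next_month + timedelta(days=offset)

def day_cycle(weekdays):
    """Days to advance from each scheduled weekday to the next, wrapping
    from the last weekday to the first weekday of the following week."""
    cycle = [weekdays[i + 1] - weekdays[i] for i in range(len(weekdays) - 1)]
    cycle.append(7 + weekdays[0] - weekdays[-1])
    return cycle

print(first_active_date(date(2018, 4, 10), 2))  # 2018-05-01 (May 1 is a Tuesday)
print(first_active_date(date(2018, 5, 10), 2))  # 2018-06-05
print(day_cycle([2, 3, 4]))                     # [1, 1, 5]
```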

Change to the place where all the action happens. Create the output directory, named after the next month.

# working directory
os.chdir('c:/Users/buchs/odp/Documents/FHP-Ministries/Materials')
# output directory same as month name
if not os.path.exists(monthName):
  os.mkdir(monthName)

Next come the three nested loops. The outer loop iterates over the weeks, the middle loop iterates over the weekdays, and the inner loop iterates over the times for a particular weekday. On each pass, the values in the content.xml and meta.xml files are updated accordingly. That content is used to overwrite those files, and finally the contents of the unpack directory are zipped up to form a new .ODT document.

# loop over the weeks, stop when we hit the first weekday in the next month
weekIndex = 0
while workingDate.month == workingMonth:

  print("starting with ",inputFiles[weekIndex])
  # unpack our input file.
  zf = zipfile.ZipFile(inputFiles[weekIndex],'r')
  zf.extractall(path = subdir)
  zf.close()

  # work in unpacked dir
  os.chdir(subdir)
  
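  # grab content to be ready to edit it
  # (ODF XML is UTF-8; the Windows default encoding could garble accented text)
  with open('content.xml', encoding='utf-8') as fp:
    content = fp.read()

  # grab meta data to be ready to edit it
  with open('meta.xml', encoding='utf-8') as fp:
    meta = fp.read()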
  

  # loop over days of week
  for dayIndex in range(len(cycle)):

    print('date is ',workingDate.isoformat())
    # Update the day of the week and date in the content
    daySub = f'<text:user-field-decl office:value-type="string" ' + \
             f'office:string-value="{days[dayIndex][0]}" text:name="Day"/>'
    content = reDay.sub(daySub,content)
    dateString = workingDate.strftime('%B %d, %Y').replace(' 0',' ')
    dateSub = f'<text:user-field-decl office:value-type="string" ' + \
              f'office:string-value="{dateString}" text:name="DateRange"/>'
    content = reDate.sub(dateSub,content)
    
    for timeIndex in range(len(daysTimes[dayIndex])):
      
      thisDayTime = daysTimes[dayIndex][timeIndex]
      # make a simple form of time for naming the files
      timeSimple = '-' + thisDayTime.replace(':','').replace(' AM','').replace(' PM','') + '-'
      
      timeSub = f'<text:user-field-decl office:value-type="string" ' + \
                f'office:string-value="{thisDayTime}" text:name="Time"/>'
      content = reTime.sub(timeSub,content)
      
      # overwrite the content file
      with open('content.xml', 'w', encoding='utf-8') as fp:
        fp.write(content)

      # update the document title in the metadata
      dateStmp = workingDate.strftime('%b-%d-%Y')
      titleSub = f'<dc:title>Devotion {days[dayIndex][0][0:3]} {thisDayTime} ' + \
                 f'{dateStmp}</dc:title>'
      meta = reTitle.sub(titleSub,meta)

      # overwrite the meta file
      with open('meta.xml', 'w', encoding='utf-8') as fp:
        fp.write(meta)

      # Create new output file and open as zipfile
      outputFile = '../' + monthName + '/' + days[dayIndex][0][0:3] +  \
                   timeSimple + dateStmp + '-' + inputFiles[weekIndex]
      # like: ../May/Tue-1000-May-01-2018-Devotion-1.odt
      zf = zipfile.ZipFile(outputFile,'w')

      # write files, subdirs and files in subdirs to this zip file
      for f in os.listdir('.'):
        zf.write(f)
        mode = os.stat(f).st_mode
        if stat.S_ISDIR(mode):
          for g in os.listdir(f):
            zf.write(f+'/'+g)

      zf.close()
      print(f'Wrote {outputFile}')

    # We reach the end of times for a given day, now advance the date.
    # This will allow tracking when we bump into next month.
    # This will automatically take care of the week jumps too.
    workingDate += timedelta(days=cycle[dayIndex])

  # and we are on to the next week
  weekIndex += 1
  os.chdir('..')
  shutil.rmtree(subdir) # clean up unpacked files to prepare for next
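The zip loop above only descends one directory level: zf.write() on a directory records just the directory entry, not its contents. The ODF packaging rules also say the mimetype file should come first and be stored uncompressed. LibreOffice opened the original output fine, but a stricter consumer might not. Here is a sketch of a more defensive repacking step; repack_odt is a hypothetical helper, and the demo builds a throwaway directory shaped like an unpacked document.

```python
import os
import tempfile
import zipfile

def repack_odt(src_dir, output_file):
    """Zip the unpacked document in src_dir into output_file.

    'mimetype' goes first and uncompressed; os.walk then adds every
    directory entry and file, however deeply nested.
    """
    with zipfile.ZipFile(output_file, 'w') as zf:
        zf.write(os.path.join(src_dir, 'mimetype'), 'mimetype',
                 compress_type=zipfile.ZIP_STORED)
        for root, dirs, files in os.walk(src_dir):
            for name in dirs + files:
                path = os.path.join(root, name)
                arcname = os.path.relpath(path, src_dir).replace(os.sep, '/')
                if arcname == 'mimetype':
                    continue  # already written first
                if os.path.isdir(path):
                    zf.write(path, arcname)  # directory entry only
                else:
                    zf.write(path, arcname, compress_type=zipfile.ZIP_DEFLATED)

# Demo on a throwaway directory shaped like an unpacked .odt
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'unpack')
    os.makedirs(os.path.join(src, 'Configurations2', 'accelerator'))
    for rel, text in [('mimetype', 'application/vnd.oasis.opendocument.text'),
                      ('content.xml', '<x/>'),
                      ('Configurations2/accelerator/current.xml', '')]:
        with open(os.path.join(src, rel), 'w') as f:
            f.write(text)
    out = os.path.join(tmp, 'out.odt')
    repack_odt(src, out)
    with zipfile.ZipFile(out) as zf:
        names = zf.namelist()
        stored = zf.getinfo('mimetype').compress_type == zipfile.ZIP_STORED
print(names[0], stored)  # mimetype True
```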

Now all the ODT output files have been created. Time to convert them to PDF.

# Now, convert the ODT files to PDF files
os.chdir(monthName)
# start the converter server in the background (Popen returns without waiting)
listener = subprocess.Popen(['python', 'c:/python36/Scripts/unoconv', '--listener'])
time.sleep(20)
# make one pass through everything
for fn in glob.glob('*.odt'):
  subprocess.run(f'python c:/python36/Scripts/unoconv -f pdf {fn}',shell=True)

# now cycle through looking for missing pdf files, because unoconv can fail.
missing = 1
while missing > 0:
  missing = 0
  for fn in glob.glob('*.odt'):
    pdfn = fn.replace('.odt','.pdf')
    if not os.path.exists(pdfn):    
      subprocess.run(f'python c:/python36/Scripts/unoconv -f pdf {fn}',shell=True)
      missing += 1
  print('missing ',missing)
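The retry loop above keeps going until every PDF exists, so a document that never converts would keep it spinning forever. Here is a sketch of the same retry idea with a cap on the number of passes and the converter passed in as a function, so the logic can be exercised without LibreOffice. convert_missing, unoconv_pdf and the flaky demo converter are hypothetical names; unoconv_pdf reuses the script's own command line.

```python
import os
import subprocess
import tempfile

def convert_missing(odt_files, convert, max_passes=5):
    """Run convert(fn) for each .odt file whose .pdf is not there yet,
    making at most max_passes passes. Returns the files still missing a
    .pdf, so a persistent failure is reported instead of looping forever."""
    def pdf_missing(fn):
        return not os.path.exists(fn[:-len('.odt')] + '.pdf')

    for _ in range(max_passes):
        todo = [fn for fn in odt_files if pdf_missing(fn)]
        if not todo:
            break
        for fn in todo:
            convert(fn)
    return [fn for fn in odt_files if pdf_missing(fn)]

def unoconv_pdf(fn):
    """The real converter, same command line as the script uses."""
    subprocess.run(['python', 'c:/python36/Scripts/unoconv', '-f', 'pdf', fn])

# Demo with a fake converter whose first call "fails"
with tempfile.TemporaryDirectory() as tmp:
    files = [os.path.join(tmp, n) for n in ('Tue-1000-a.odt', 'Wed-1000-b.odt')]
    calls = []

    def flaky(fn):
        calls.append(fn)
        if len(calls) > 1:  # first attempt fails, later ones succeed
            open(fn[:-len('.odt')] + '.pdf', 'w').close()

    print(convert_missing(files, flaky), len(calls))  # [] 3
```

In the script itself, the call would be something like convert_missing(glob.glob('*.odt'), unoconv_pdf), followed by a warning for anything it returns.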