Tuesday, June 28, 2022

Set Up a Ruby Application in a Highly Constrained GitLab Executor

Some IT problems are easy to solve if one has the ability to control parameters and environments. When there are constraints, however, things can get really tough.

I found myself in such a constrained scenario. I needed to develop a GitLab CI/CD pipeline to install and run a Ruby application, kitchen-terraform (GitHub link), on a GitLab CI/CD executor where I could install only a very limited set of OS packages (GNU/Linux, CentOS-based, yum). The Ruby version available was very old. In addition, the standard way of installing Ruby gems (language packages) from https://rubygems.org/gems was unavailable, and the repository hosting the Ruby source was not accessible either.

Add to this that the work environment available to me was a highly constrained Microsoft Windows system. I could not install Ruby or any other software on that machine. I was able to send myself brief emails from outside systems, however. I could also download a Ruby source bundle via the browser, and Ruby gems were available. This Windows system had an install of Python 3.9 with the standard library along with a limited set of additional PyPI packages. Separately, I had an isolated Linux system over which I had full control.

So, let's break this problem down into parts. First, let's build Ruby from source. Via the Windows system I downloaded the Ruby source package ruby-2.7.6.tar.gz and committed it to my GitLab git repo, which started populating the vendor directory. Vendor is a typical directory that has meaning to Ruby and its gem ecosystem. Building Ruby from source is straightforward. A few OS packages need to be installed first, both to build Ruby and for a few other things I needed to do. Once the commands are known to work, squelch their output to keep the transcript of the GitLab pipeline from being noisy. This will do it:

  yum groupinstall 'Development Tools' -y &> /dev/null
  yum install openssl-devel openssl -y &> /dev/null
  tar xf vendor/ruby-2.7.6.tar.gz
  cd ruby-2.7.6
  ./configure --without-rdoc > /dev/null
  make > /dev/null
  # GitLab CI/CD pipelines run with root as the user
  # so no sudo required on this next command
  make install > /dev/null


Building Ruby from source takes a bit of time. My pipeline is going to run regularly, so it would be nice not to have to do that on every run. I cannot permanently change the GitLab executor, though that would be ideal. But GitLab CI/CD has a cache feature, which allows one to save a portion of the directory structure from the prior run of the pipeline and restore it before the next run. So, what if we captured the full Ruby build and set that to be cached? Then we would just run 'make install'. Seems to make sense.

However, the Ruby build process, which uses a lot of makefiles, is a horrendous mess. It actually compiles C code during 'make install'. Yeah, that is badly broken. Besides not building Ruby every time, it would be ideal not to have to install the Development Tools via yum at all, because that takes unnecessary time. So, I began picking apart the makefiles to figure out what 'make install' actually does and which subset of its steps is needed. About 8 hours later, I had the minimal set of commands. However, a helper makefile needs to be added to the suite of other makefiles. Let's call this makefile 'only-install-ext.mak' and place it at the root of the Ruby build directory. These are the contents to put in that file:

  # recipe lines must start with a tab character
  install_ext_special:
  	${INSTRUBY} --make="$(MAKE)" $(INSTRUBY_ARGS) --install=ext-comm
  	${INSTRUBY} --make="$(MAKE)" $(INSTRUBY_ARGS) --install=ext-arch

Here is the minimal script to install a previously built Ruby:

  yum install make -y > /dev/null
  make do-install-bin
  make do-install-lib
  make -f GNUmakefile -f only-install-ext.mak install_ext_special
  make do-install-gem


Ok, one more optimization we can do is to reduce the size of the cache GitLab maintains. This means less time restoring the cache. With some iteration, one can determine which directories of the Ruby build are really not needed by deleting a directory and checking whether the above install process still works. These directories are not needed: basictest benchmark bootstraptest cygwin doc sample spec test win32. So, in the full build process, we remove these directories so that, at the end of the pipeline execution, they are not present to be added to the cache. In the pipeline configuration, we can add this to capture the cache:

  build_job:
    ...
    cache:
      key: ruby
      paths:
        - ruby-2.7.6


So, here is the full shell script (bash) for building and installing Ruby. We start by checking whether the cache has been restored. If it has, we just do the minimal "make install" process.

  if [ -d ruby-2.7.6 ]; then
    cd ruby-2.7.6
    yum install make bind-utils openssl -y > /dev/null
    make do-install-bin
    make do-install-lib
    make -f GNUmakefile -f only-install-ext.mak install_ext_special
    make do-install-gem
    cd ..
  else
    yum groupinstall 'Development Tools' -y &> /dev/null
    yum install openssl-devel bind-utils openssl -y &> /dev/null
    tar xf vendor/ruby-2.7.6.tar.gz
    cd ruby-2.7.6
    ./configure --without-rdoc > /dev/null
    make > /dev/null
    # install the freshly built Ruby (as in the first script)
    make install > /dev/null
    rm -rf basictest benchmark bootstraptest cygwin doc \
       sample spec test win32
    cd ..
    # some additional work done in this else block ...
    ...

  fi


Ok, building and installing Ruby is taken care of in an optimized fashion. Next, to be able to run kitchen-terraform, we need some gems installed. In fact, it is quite a large number, in a very twisted hierarchy of dependencies. It turns out there is an easy way to get these gems (found after I had started down the hard way). The easy way is to use the Linux system over which I have full control to build an exhaustive list of gems with versions. Do this by first installing the very same ruby-2.7.6 version from source (it seems all Linux distributions are in the dark ages with the Ruby versions their OS packages support). Once that is set up, change to a work directory. Create a file called "Gemfile" with this content:

  source "https://rubygems.org"
  gem "kitchen-terraform", "~> 6.1"


This Gemfile can be used for the Ruby "gem" command but also the super-powered "bundle" command. So, run the bundle command and capture the output.

  bundle install > bundle.log


That output file will contain the names and versions of the full set of gems required by kitchen-terraform. Edit this down to a file listing one gem name and version per line. The contents of this file can safely be emailed to the restricted Windows system. Now these gems must be downloaded from rubygems.org. But there are so many that you really don't want to do this via your browser. So, Python to the rescue. This will read a file listing gems and versions.

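  import os
  import sys
  import requests

  def getgem(name, version):
    filename = f'{name}-{version}.gem'
    url = f'https://rubygems.org/downloads/{filename}'
    r = requests.get(url)
    r.raise_for_status()  # don't save an error page as a .gem
    with open(filename, 'wb') as fp:
      fp.write(r.content)

  # usage (script name is arbitrary): python getgems.py gems.txt
  # assumes one "name version" pair per line, separated by whitespace
  with open(sys.argv[1]) as listing:
    for line in listing:
      if line.strip():
        name, version = line.split()
        if not os.path.exists(f'{name}-{version}.gem'):
          getgem(name, version)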
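
Editing bundle.log down by hand works, but that step could also be scripted. Here is a minimal sketch, assuming bundler prints lines of the form "Installing NAME VERSION" or "Using NAME VERSION" (possibly followed by extra words such as "with native extensions"); check this against your own bundle.log before trusting it. The function name parse_bundle_log and the sample text are made up for illustration.

```python
import re

# Assumed shape of the interesting lines in bundler's output, e.g.
#   "Using bundler 2.1.4"
#   "Installing ffi 1.15.5 with native extensions"
LINE_RE = re.compile(r'^(?:Installing|Using)\s+(\S+)\s+(\d\S*)')

def parse_bundle_log(text):
    """Return sorted, de-duplicated (name, version) pairs from bundler output."""
    gems = set()
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            gems.add((match.group(1), match.group(2)))
    return sorted(gems)

# Illustrative sample only; a real bundle.log will be much longer.
sample = """\
Fetching gem metadata from https://rubygems.org/
Using bundler 2.1.4
Fetching rake 13.0.6
Installing rake 13.0.6
Installing ffi 1.15.5 with native extensions
Bundle complete!
"""

for name, version in parse_bundle_log(sample):
    print(name, version)  # one "name version" pair per line
```

Two caveats: bundler itself shows up in the list even though it already ships with Ruby 2.7, and any platform suffix that bundler prints after a version is ignored here, so those entries would still need a look by hand.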



Now, each of these gems is actually a tar archive with a .gem extension. They can be put in the vendor/cache directory in our repo where "bundle" can find them.
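
Since a failed download can leave behind something that is not a gem at all, it may be worth checking the files before committing them to vendor/cache. Here is a minimal sketch, assuming a well-formed gem is a tar archive containing at least the members metadata.gz and data.tar.gz. The function names looks_like_gem and check_cache are hypothetical, as is the idea of passing in the same name/version pairs used for the download.

```python
import os
import tarfile

# Members a well-formed .gem archive is assumed to contain.
REQUIRED_MEMBERS = {'metadata.gz', 'data.tar.gz'}

def looks_like_gem(path):
    """Return True if path is a tar archive with the expected gem members."""
    if not os.path.isfile(path) or not tarfile.is_tarfile(path):
        return False
    with tarfile.open(path) as tar:
        return REQUIRED_MEMBERS.issubset(tar.getnames())

def check_cache(pairs, cache_dir):
    """Return filenames of listed gems that are missing or malformed."""
    problems = []
    for name, version in pairs:
        filename = f'{name}-{version}.gem'
        if not looks_like_gem(os.path.join(cache_dir, filename)):
            problems.append(filename)
    return problems
```

Calling check_cache with the full list of pairs and 'vendor/cache' should return an empty list when every gem is present and looks sane; anything it does return is a candidate for downloading again.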

Next, create the Gemfile in our repo with this content:

  gem "kitchen-terraform", "6.1.0"

Now, about these gems. Most of them are pure Ruby code. Nothing special is required to install these gems besides the downloaded .gem files. However, a small number of these gems require compiling C code. That would mean the yum group 'Development Tools' needs to be installed. We don't want to have that installed routinely because it slows down the pipeline. So, how about building those particular gems at the same time we are building Ruby from source, when 'Development Tools' is installed anyway? Then we can zip them up and save the result as a GitLab pipeline artifact, which can subsequently be downloaded and inserted into our GitLab repo.

To do this, we need to install these compiled gems in a special location because there are lots of files created and we want to separate all those from other gems installed when Ruby was built from source. So, the following shell script code will install the gems in the local directory in the subdirectories of: build_info cache doc extensions gems specifications. The zip will then be unpacked on subsequent runs in the system gem directory.

  # install these compiled gems from vendor directory and
  # capture the zip
  for gem in bcrypt_pbkdf-1.1.0 bson-4.15.0 ed25519-1.3.0 \
      ffi-1.15.5 unf_ext-0.0.8.2 json-2.6.2; do
    gem install -i . -N -V --local "vendor/cache/${gem}.gem" \
       > /dev/null
  done
  # now save results as zip file
  zip -r compiled_gems.zip build_info cache doc extensions \
     gems specifications > /dev/null
  # build it once, set as job artifact and download it
  # and store in vendor/
  # this is used below when not built here.
  compiled_gems="yes"

You will see that compiled_gems is a flag, which if not set, will trigger installing these gems from the zip file later in this code. We change directory to the system gem install directory for unpacking the zip:

  # install pre-built gems so we don't need to install
  # developer tools every run
  if [ -z "$compiled_gems" ]; then
    pushd /usr/local/lib/ruby/gems/2.7.0 > /dev/null
    unzip $CI_PROJECT_DIR/vendor/compiled_gems.zip > /dev/null
    popd > /dev/null
  fi


These installations and the others can be tested by querying with the "gem" command:

  # check the compiled gems are available
  gem list '^(json|unf_ext|bcrypt_pbkdf|bson|ed25519|ffi)' -d
  # and a few of the others
  gem list '^(aws-eventstream|azure_graph_rbac)' -d


By the way, you can put this in the GitLab pipeline to capture compiled_gems.zip.

  build_job:
    ...
    artifacts:
      paths:
        - compiled_gems.zip


So, if everything is put together, this is the full setup shell script:

if [ -d ruby-2.7.6 ]; then
  cd ruby-2.7.6
  yum install make bind-utils openssl -y > /dev/null
  make do-install-bin
  make do-install-lib
  make -f GNUmakefile -f only-install-ext.mak install_ext_special
  make do-install-gem
  cd ..
else
  yum groupinstall 'Development Tools' -y &> /dev/null
  yum install openssl-devel bind-utils openssl -y &> /dev/null
  tar xf vendor/ruby-2.7.6.tar.gz
  cd ruby-2.7.6
  ./configure --without-rdoc > /dev/null
  make > /dev/null
  # install the freshly built Ruby so ruby and gem are available below
  make install > /dev/null
  rm -rf basictest benchmark bootstraptest cygwin doc sample \
       spec test win32
  cd ..

  # install these compiled gems from vendor directory and
  #  capture the zip
  for gem in bcrypt_pbkdf-1.1.0 bson-4.15.0 ed25519-1.3.0 \
        ffi-1.15.5 unf_ext-0.0.8.2 json-2.6.2; do
    gem install -i . -N -V --local "vendor/cache/${gem}.gem" \
       > /dev/null
  done
  # now save results as zip file
  zip -r compiled_gems.zip build_info cache doc extensions gems \
      specifications > /dev/null
  # build it once, set as job artifact and download it and
  # store in vendor/; this is used below when not built here.
  compiled_gems="yes"
fi

# ruby location, /usr/local/bin, is already on path
echo -n 'Ruby version: '
ruby --version

# install pre-built gems so we don't need to install
# developer tools every run
if [ -z "$compiled_gems" ]; then
  pushd /usr/local/lib/ruby/gems/2.7.0 > /dev/null
  unzip $CI_PROJECT_DIR/vendor/compiled_gems.zip > /dev/null
  popd > /dev/null
fi

# install remaining vendored gems; get gems here: https://rubygems.org/gems
bundle install --local &> bundle.log
log_lines=$(wc -l bundle.log|cut -d' ' -f 1)
normal_log_lines=260
if [ $log_lines -ne $normal_log_lines ]; then
  echo "--- bundle.log lines=$log_lines ---"
  cat bundle.log
fi

# check the compiled gems are available
# gem list '^(json|unf_ext|bcrypt_pbkdf|bson|ed25519|ffi)' -d
# and a few of the others
# gem list '^(aws-eventstream|azure_graph_rbac)' -d