It turns out that configure-pages was mostly a bust. I don’t need it at all for building and testing jekyll with all the dependencies and plugins that github-pages uses/allows. So I moved on to actions/jekyll-build-pages. This is where the juicy stuff is, as far as I can tell.
Because I can’t be satisfied with knowing how I’m meant to interact with something, I have a desire to know why and how it works. Again, everything here is open source, so I started poking around the repository. Ruby is largely unfamiliar to me, so it took some adjustment to read everything going on, and understand what’s up with Gemfiles.
I learned that a “gem” is a cutesy name for a ruby module/package which can be handled programmatically by the `gem` ruby-environment package manager and by the `bundle` dependency-resolver. Honestly I think it confuses the matter that a package shares the name of the package manager. To their credit, the `Gemfile` format that `bundle` reads does require that the repository be specified as a source.
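For instance, a minimal `Gemfile` showing that requirement — the only gem named here is the github-pages one discussed later; the rest is boilerplate:

```ruby
# The repository must be declared explicitly as a source
source "https://rubygems.org"

gem "github-pages"
```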
I have opinions about this, TL;DR: that’s silly, and makes adoption more
difficult. Any kind of cutesy clever crap like that which ignores commonly
established naming schemes in favor of one’s own just throws a wrench in the
works. Interpreted languages having their own common package/module/library
repository used from a package manager is nothing new, such as CPAN, npm, pip,
etc. In fooling with callGraph
(see previous entries), I needed to also
fool with CPAN, and I ended up resolving that by using cpanminus for zeroconf
quick setup.
As a sidenote, while I’m thinking of languages and their individual package managers: these are usually platform-independent, and many of their libraries are not found in larger package manager repositories, but only in the language-specific repository. While *nix distro package managers are theoretically capable of handling any kind of file whose installation image can be standardized, the libraries one is likely to find are almost certainly going to be for C and C++. Likewise, then, just as ruby/gem and perl/CPAN and python/pip, etc., deal with environments specifically for their own language, *nix can be read as an environment specifically for C and friends. Certainly not a novel thought, but I think it’s intriguing. This may change as increasingly many binaries for go and rust are shipped, creating demand for a healthy environment for them.
Interestingly to me, there is pretty bad discoverability and availability of a centralized C language repository, by comparison. Any given *nix package manager worth its salt is likely to be able to connect to a repo which lists many dozens of mature standard libraries of the kind shared by many individual programs – that’s a large part of what lets linux programs take up so little space by comparison with Windows or even macOS programs. Even with their own dynamically loaded libraries, it’s rare for any given handful of Windows or macOS programs to share more than the barest dynamic libraries provided by the OS or otherwise redistributable, and then many of those programs will just distribute their own copy of, e.g., the vcredist files.
That being the case, individual distro repositories listing some well-known standard C libraries does not a discoverable language repository make.
This action is basically packaged as a docker image, ready to use. From what I can tell, it has the following functions:

- Take the environment variables handed in by the runner (`act` also does this, which took some digging), and use those which are necessary for whitelisted plugins.
- Use the `github-pages` ruby gem as a wrapper around jekyll to compile from the source to destination. This also, in the docker build process for this image, grabs all the enabled and whitelisted dependencies, and sets the configuration between their defaults, the user-provided inputs, and the configs they overwrite on top of the user’s inputs.

I gathered most of that from `entrypoint.sh`, from inspecting the `Dockerfile`, and the `Gemfile`. It’s build systems all the way down!
Their `Dockerfile` defines an image built from a slim ruby base layer, updated, copying in the `Gemfile`, and installing the required gems and their dependencies. Well, that got me curious: in the `Gemfile` it asks for a gem by the name of github-pages, which is hosted on rubygems.org.
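That structure, as a hedged sketch — the base image tag, build packages, and paths here are my guesses, not lifted from their actual `Dockerfile`:

```dockerfile
# Sketch only: slim ruby base, updated, Gemfile copied in, gems installed
FROM ruby:3.3-slim

# native gem extensions need a toolchain on a slim image
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential git \
    && rm -rf /var/lib/apt/lists/*

COPY Gemfile ./

# resolve and install everything the Gemfile asks for
RUN bundle install
```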
The github-pages gem is interesting. How they set dependencies is a little convoluted, and relies on the fact that a gem will also do some evaluation of actual ruby source when referenced. The vast bulk of the package, by line count, is just defining dependencies and locking their versions. All this information is reflected in some state by their webpage, which similarly lists versions. I could use that information if I really wanted to, and I certainly do not. The point is not to develop a new image; the point is to make their work portable and use it myself.
Ultimately, this gem is designed to be run standalone if desired, and includes a standalone `Dockerfile` for its own image, which is available at github’s package repository, ghcr. It is unclear to me why exactly the actions/jekyll-build team went with building a new docker image, building in a ruby container from their own `Gemfile`. Left hand <-?-> right hand? I don’t care that much, I only care which one I’m using, really. Since the jekyll-build image is doing more legwork on my behalf, and additionally defines dependencies in its `Gemfile` which are missing from the github-pages gem (reason unknown), I’m moving forward with that.
It took some digging to work out what exactly was going on here. When I was examining the log files from `act --verbose >>act.log 2>&1`, I was looking for where and how environment variables were passed to the docker containers. The various action.yml files clearly define parameters which would need to be delivered as environment variables, after all. I was, heretofore, unfamiliar with how exactly docker inserted environment variables. At the command line this is just `docker run -e ENV_VAR ...`, but the docker engine has an API. This, at first a shock to me, is just straight up a REST API accessible with standard HTTP interaction. On further examination, it does make sense, given that the docker daemon and the CLI primarily communicate over a socket, typically at `/var/run/docker.sock`. Anyway, the decision of the `act` team to use go makes more sense now, since the primary library they’re interacting with is the docker engine SDK, for which two official libraries are available: go and python.
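To illustrate, the same HTTP API the SDK wraps can be hit directly with curl over that socket. This is a sketch: the endpoint lists running containers, and the guard just keeps it from erroring out on a machine with no docker daemon.

```shell
# Query the docker engine REST API directly over its unix socket;
# fall back gracefully when no daemon is running on this machine.
list_containers() {
  if [ -S /var/run/docker.sock ]; then
    # same request the SDK's container-list call ultimately makes
    curl -s --unix-socket /var/run/docker.sock http://localhost/containers/json
  else
    echo "no docker socket at /var/run/docker.sock"
  fi
}

list_containers
```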
The go docker SDK defines a type which includes the `Env` property, itself an array of strings. Near as I can tell, `act` uses that in order to actually launch the docker image with all the environment variables, some of which are defined as defaults, and some of which are pulled at runtime based on arguments and environment. I am unfamiliar enough with go, and there are enough layers of indirection, that I actually can’t tell where this type gets used for an API call in the main path of execution. That’s rather beside the point, though. The main thing I was interested in was this block. From what I can tell, very little, if any, of that environment is necessary for me to replicate, but it explains what I was seeing in the log file.
I want to poke around the ruby source and see which of the environment variables are actually required and, moreover, for which jekyll plugins.
I could go through the trouble of running the bundler and so on locally, based on the `Gemfile` that’s provided as part of this docker image. But I don’t need to, nor do I need to futz with mangling or un-mangling my local ruby environment: the container image by definition already has all I need. Moreover, the container should have the most critical environment variables defined, absent of course the ones provided by the action runner when started outside of that context.
The main container I have an interest in at the moment is jekyll-build-pages. Alas, that has an entrypoint which directly calls the scripts, so if I try to `docker run -it -d ghcr.io/actions/jekyll-build-pages`, I get nothing, because the main ruby script errors out for lack of inputs. Again, glory to open source, since I can make a quick edit to the `Dockerfile`, or to the `entrypoint.sh` script.
`act` thankfully pulls all this by way of git clone or some equivalent, so I have the git worktree available to work from here. More to the point, I can munge up whatever I need and reset it with `git reset --hard`.
An ENTRYPOINT will always run, and in this case, I do want `entrypoint.sh` to run, since it sets several environment variables I want to inspect. The final command, on line 37, is what actually calls the ruby script. However, even just removing that would not let me run the script and then interactively futz with the container; the bash script would immediately exit, at which point the container would stop its process, and I’d be out of luck.

Docker’s best-practices section about ENTRYPOINT thankfully provides an example of how to design a helper script to allow it to be interactive:

```shell
exec "$@"
```
Either directive of `CMD` or `ENTRYPOINT` can define an initial executable and its arguments that run when a container starts. Similarly, only one of each can exist in a `Dockerfile`. The main difference is that `CMD`, when used alongside `ENTRYPOINT`, provides parameters to whatever executable is defined by `ENTRYPOINT`. Either of these directives can be overwritten at the command line:

- `CMD` provides a default which is overwritten by the optional parameters following the image when using `docker run [options] IMAGE [command]`.
- `ENTRYPOINT` will always run, and will not be overwritten without a special argument of the form `docker run --entrypoint="" ...`.
So a helper script, as we have here, can have its last line defined as that command to execute all arguments, and be run with an interactive shell by way of:

```shell
docker run -it IMAGE /bin/bash
```
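Putting those pieces together, a helper script in that style looks something like this sketch — the exported variable and its default are my stand-ins, not necessarily what the real `entrypoint.sh` sets:

```shell
# Sketch of an ENTRYPOINT helper script:
# 1. do whatever setup the image needs (illustrative variable here)
export JEKYLL_ENV="${JEKYLL_ENV:-production}"

# 2. replace this shell with whatever command was passed in, whether
#    that's the CMD from the Dockerfile or the arguments after the
#    image name, e.g. /bin/bash for an interactive shell
exec "$@"
```

Run normally, the CMD arrives as `"$@"` and gets exec’d; run with `/bin/bash` appended, you land in a shell after the setup has already happened.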
I happen to know that this works because I was fooling around with it earlier. If we needed to make sure that the command we’re looking for actually exists, we could export the filesystem of the image and read from that. Per this hint on stackoverflow:

```shell
docker export $(docker ps -lq) | tar tf - | less
```
The shell-substitution there, `$(docker ps -lq)`, grabs just the id of the most recently used container. Lo and behold, `/bin/grep` also exists in this image.
Per the various exports in `entrypoint.sh`, then:

```shell
grep -e JEKYLL_ENV \
     -e JEKYLL_GITHUB_TOKEN \
     -e PAGES_REPO_NWO \
     -e JEKYLL_BUILD_REVISION \
     -e etc \
     -r /usr/ \
     --color=always | less -R
```

gives me nicely colored output I can page through.
The jekyll-github-metadata plugin wrangles information accessible through liquid, by way of `site.github`. `PAGES_REPO_NWO` stands for “name with owner”.
The jekyll-gist plugin extends liquid with a gist tag that fetches the content of a github gist.
`JEKYLL_ENV` is probably the only one of those of real interest to me. I don’t know how much use I really have for the github metadata, or for the gist plugin. The only function from the github metadata I might want, which is pulling the most recent public repositories I’ve worked on, is also accessible without authenticating, by just making an API call to api.github.com.
I think that’s as far as I care to dig into this topic today.