Advanced scripting for lib_deps

I'm currently working on setting up CI for a PlatformIO-based project.

Since it is a large, closed-source project, I have a couple of private repos that I am using as libraries. These are pulled into the project using GIT+SSH. This is great for developers, since it respects the user's SSH key access rights and doesn't provide access to repos their GitLab account doesn't have access to.

The issue comes when using CI for builds.

I have set up CI using SSH deploy keys, but this creates a loophole since the SSH key in CI (potentially) has more access rights than the person using it. This can be used to gain access through changing the CI config.

The solution is to use a CI job token. This follows the committer’s access rights and only allows access if the user has it. The issue is, I can't have both the GIT+SSH address and the HTTPS+TOKEN address in lib_deps.

A solution to this problem would be to add the lib_deps values to the env object for scripting. This would allow me to regex-replace these addresses and automate CI without affecting devs.

(Yes, I know there are other ways, like not running CI on CI config changes, or using personal tokens for devs, but these are all compromises and don't work with the security model that I'm trying to build.)

Does this make sense as an addition to the scripting capabilities?

Since the LDF runs after the pre scripts, this should be possible.

The LDF and lib command of PIO Core are obviously separate from the build system. Skimming over the PIO Core code, this seems to be the case.

It would be possible to alter the lib_deps values before passing them to the LDF, but this may not be a good idea. It would allow a discrepancy between the apparent values of lib_deps for the pio lib command and the pio run command.

In my case, this is for CI, so it wouldn’t be an issue. In the general case, this could cause confusion for devs. It may be better for me to regex-replace the addresses in the platformio.ini file using a CI script instead of trying to build it into PIO. This would be more obviously CI-specific, as the replacement step wouldn’t ever be touched by the LDF or PIO in general.

I'm going to do it the CI scripting way, since that makes more overall logical sense. If anyone else has ideas, I'm all ears!

Here to post on my own topic again, but maybe I can help some poor shmuck who has the same issue down the road.


Here is the (stripped down) script I wrote:

gitlab_ci_lib_replacer.py in the folder [project_root]/scripts/CI


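import os
import re

pio_ini = 'platformio.ini'
# Match lines that consist of an SSH-style GitLab URL: git@<host>:<path>
search_regex = r'^(\s*?)git@(.*?gitlab.*?):(.*?)$'
# GitLab sets CI_JOB_TOKEN for every job; outside CI this raises KeyError
ci_token = os.environ['CI_JOB_TOKEN']
# Rebuild the URL as https://gitlab-ci-token:<token>@<host>/<path>
replace_regex = '\\1https://gitlab-ci-token:' + ci_token + '@\\2/\\3'

# Read the whole file; the with-block closes it automatically
with open(pio_ini, 'r') as f:
    content = f.read()

# re.M makes ^ and $ match at every line, not just the whole file
updated_content = re.sub(search_regex, replace_regex, content, flags=re.M)

with open(pio_ini, 'w') as f:
    f.write(updated_content)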

Since I wanted this script to run only for CI builds, I call it from the GitLab CI config:
.gitlab-ci.yml

build:
  stage: build
  image: python:3.9-buster 
  rules:
    - if: $CI_COMMIT_TAG =~ /^v[\d]{1,3}\.[\d]{1,3}\.[\d]{1,3}$/ # Only run on version tags (v1.2.3, etc.)
  before_script:
    - "pip install -U platformio"
    - "python scripts/CI/gitlab_ci_lib_replacer.py"
  script:
    - "pio run -e my_env"
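
As an extra safeguard on top of only calling the script from CI, the token lookup itself can refuse to do anything when no job token is present. This is just a sketch of one way to do it: get_job_token is a hypothetical helper name, not part of the script above, and treating an empty value as "not set" is my own assumption.

```python
import os


def get_job_token(environ):
    """Return the CI job token, or None when no usable token is set."""
    # Hypothetical helper: an empty or whitespace-only value counts as missing.
    token = environ.get('CI_JOB_TOKEN', '').strip()
    return token or None


token = get_job_token(os.environ)
if token is None:
    print('CI_JOB_TOKEN not set; leaving platformio.ini untouched.')
else:
    print('CI job token found; SSH URLs can be rewritten.')
```

The original script reaches a similar result more bluntly: os.environ['CI_JOB_TOKEN'] raises a KeyError before platformio.ini is opened, so a dev running it by accident does not get a modified file.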

How does this actually work?

When using private git repositories as libraries, it is easy to pull them on dev systems using SSH. As soon as you try to pull them in CI, that causes an issue. In order to make the CI work with SSH, you need to have an SSH key for the CI build. This is possible in GitLab using deploy keys, but it comes with some security caveats. Primarily, when the SSH private key is added as a CI variable, you have to be concerned with how other developers can get at it. If they can change the CI config, they could use the SSH key to pull the library and push it to their own upstream (even if they don’t personally have access to that library's repo!). This creates a security loophole, and that’s not what we want. Of course, you can protect those files through careful merge checks, branch protection, etc. But I didn’t want to make it hard to change stuff, I just wanted private repos to stay private.

The way I solved this above is using the CI_JOB_TOKEN. This is essentially a personal access token generated by GitLab that is only valid while that specific CI job is running. It is created under the name of the user who triggered the CI job, so it has access to the repos the triggering user does. We can utilize this behavior by regex-replacing the SSH git repo URLs with token-based HTTPS git repo URLs.
When PIO goes to pull the libraries, it uses the HTTPS+Token URL we just changed in the .ini file to pull the library with the CI_JOB_TOKEN (which GitLab redacts in the CI output).

The regex in the file above replaces library entries that sit on their own (indented) line, like this:
lib_deps =
    git@gitlab.com:timomerschen/Artnet.git# v1.0
with:
lib_deps =
    https://gitlab-ci-token:123456789abcdef@gitlab.com/timomerschen/Artnet.git# v1.0
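
If you want to check the substitution without touching a real platformio.ini, it can be run on a string. Below is a small sketch using the same pattern as the script. rewrite_ssh_urls is a name I made up for illustration, the token is a dummy value, and SomeOwner/SomePublicLib is a placeholder entry. I used a replacement function instead of a replacement string so that the token is never interpreted as a regex template.

```python
import re

# Same search pattern as in the script above.
SEARCH_REGEX = r'^(\s*?)git@(.*?gitlab.*?):(.*?)$'


def rewrite_ssh_urls(ini_text, token):
    """Replace SSH-style GitLab URLs with token-based HTTPS URLs."""
    def to_https(match):
        indent, host, path = match.groups()
        return f'{indent}https://gitlab-ci-token:{token}@{host}/{path}'

    return re.sub(SEARCH_REGEX, to_https, ini_text, flags=re.M)


sample = (
    "[env:my_env]\n"
    "lib_deps =\n"
    "    git@gitlab.com:timomerschen/Artnet.git# v1.0\n"
    "    SomeOwner/SomePublicLib\n"  # placeholder entry, stays unchanged
)

# "123456789abcdef" is a made-up token, for illustration only.
print(rewrite_ssh_urls(sample, "123456789abcdef"))
```

Note that with this pattern an entry written on the same line as `lib_deps =` is left unchanged, because the pattern only allows whitespace before `git@`.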

Of course, if you are using a self-hosted GitLab instance, you'll have to adjust the regex (maybe make the gitlab.com part a variable).
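
One possible way to make the host a variable is sketched below. build_search_regex is a hypothetical helper, not something from my script. It reads CI_SERVER_HOST, one of GitLab's predefined CI variables, and falls back to gitlab.com outside CI (that fallback is my assumption). It also assumes the SSH host and the HTTPS host of your instance are the same, which may not hold for every self-hosted setup.

```python
import os
import re


def build_search_regex(host):
    """Build a pattern matching SSH URLs for one specific GitLab host."""
    # re.escape keeps the dots in the hostname literal.
    return r'^(\s*?)git@(' + re.escape(host) + r'):(.*?)$'


# CI_SERVER_HOST is set by GitLab inside CI jobs; the gitlab.com
# fallback is an assumption for running the script elsewhere.
host = os.environ.get('CI_SERVER_HOST', 'gitlab.com')
pattern = re.compile(build_search_regex(host), flags=re.M)
```

The capture groups are in the same order as in the original regex (indentation, host, path), so the same replacement string should still work with it.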


TLDR

SSH is hard to do securely with CI, so use CI tokens. The script above replaces the SSH URLs with HTTPS+Token URLs.


Don't mind me as I mark my own reply as the solution