Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evaluation-time checking of systemd units #153369

Open
nh2 opened this issue Jan 3, 2022 · 9 comments
Open

Evaluation-time checking of systemd units #153369

nh2 opened this issue Jan 3, 2022 · 9 comments
Labels
2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md 6.topic: systemd

Comments

@nh2
Copy link
Contributor

nh2 commented Jan 3, 2022

This ticket describes a way how we can make writing systemd dependencies much safer, but I need help with making its implementation better.


In my deployments we have many NixOS modules with systemd units that depend on each other.

When expressing dependencies, It is easy to make mistakes that are only noticed at runtime, e.g. in

after = [ "nginx.service" ];

you may forget to enable the nginx module, or make a typo, or have a new NixOS release rename the service so things will break in production.

I have written an evaluation-time check like this:

after = [ (ensureUnitExists config "nginx.service") ];

that will complain if that unit does not exist on the machine.

It is implemented like this:

  # Asserts that a systemd unit (like `nginx.service`) exists; then retuns it.
  ensureUnitExists = config: unitName:
    if builtins.hasAttr unitName config.systemd.units
      then unitName
      else abort "Unit named '${unitName}' does not exist on host '${config.networking.hostName}'";

To do that, I had to modify nixpkgs like this:

niklas@ares ~/src/benaco-review/nix-channel/nixpkgs (git)-[remotes/origin/benaco-nixos-21.11] % git --no-pager show 724ff860e9a2477202efe2c6f7fb8a0d7f5545f8
commit 724ff860e9a2477202efe2c6f7fb8a0d7f5545f8
Author: Niklas Hambüchen <[email protected]>
Date:   Thu Nov 4 02:19:59 2021 +0000

    nixos/systemd: Add `unitName` attribute.
    
    Allows writing expressions like:
    
        systemd.services.nginx.after = [ "${config.systemd.services.redis.unitName}.service" ];
    
    which fail loudly at evaluation time when a unit name is mis-spelled
    or not declared for other reasons.

diff --git a/nixos/modules/system/boot/systemd-unit-options.nix b/nixos/modules/system/boot/systemd-unit-options.nix
index 4154389b2ce..f586215aec8 100644
--- a/nixos/modules/system/boot/systemd-unit-options.nix
+++ b/nixos/modules/system/boot/systemd-unit-options.nix
@@ -96,6 +96,14 @@ in rec {
 
   commonUnitOptions = sharedOptions // {
 
+    unitName = mkOption {
+      type = types.str;
+      default = config._module.args.name;
+      description = ''
+        Name of the unit, without suffix like ".service"
+      '';
+    };
+
     description = mkOption {
       default = "";
       type = types.str;

It works, but the config._module.args.name magic has one drawback:

It generates NixOS manual/options rebuilds, which take long.

Is there a better way we can do this?

@nh2
Copy link
Contributor Author

nh2 commented Jan 3, 2022

On NixOS 20.03, I used this implementation:

    # Example:
    #   serviceUnitOf cfg.systemd.services.myservice == "myservice.service"
    serviceUnitOf = service: "${service._module.args.name}.service";

so I could write:

after = [ (serviceUnitOf config.systemd.services."nginx.service")  ];

and that would also fail at evaluation time if the service wasn't defined.

But that stopped working with PR #82751:

#82751 (comment)

@roberth suggested an improvement in #82751 (comment) that I haven't fully gone through yet.

I opened this issue so that we can discuss the concept here properly, instead of in a closed PR.

@nh2 nh2 mentioned this issue Jan 3, 2022
2 tasks
@nh2
Copy link
Contributor Author

nh2 commented Jan 4, 2022

I have more thoroughly tried the implementation by @roberth in #82751 (comment) now.

Seems to work well so far!

Any points against it?

@nh2
Copy link
Contributor Author

nh2 commented Jan 4, 2022

CC @infinisil @flokli

@flokli
Copy link
Contributor

flokli commented Jan 4, 2022

This will only work for units configured via config.systemd.services, so won't work for units coming via config.systemd.packages, or other units provided by systemd itself.

This means, it can only be used for some units, which might be very confusing.

A "linter" phase while assembling the system closure might work better, and also cover these use cases, without requiring any changes to the module system. What do you think?

@roberth
Copy link
Member

roberth commented Jan 4, 2022

without requiring any changes to the module system

s/module system/the module/

You're making it sound complicated :)

It'd be a good refactor regardless, to use the new option in the systemd module internally, but maybe not for the purpose of checking whether units exist, as you've explained clearly. It should be a build-time check, as eval does not have all the info.

A build-time check may not be as convenient in some cases, but that doesn't seem as important.

@roberth
Copy link
Member

roberth commented Jan 4, 2022

Might want to port system.checks from #148456

@nbp
Copy link
Member

nbp commented Jan 4, 2022

The following is a module, when imported in a submodule, sets the name of the unit to the name which is used to name the attribute set:

{ name, lib, ... }: {
   options.unitName = lib.mkOption {
     type = lib.types.str;
     example = "<name>";
     description = ''
       Name of the unit, without suffix like ".service"
     '';
   };
   config.unitName = lib.mkDefault name;
}

Note, config._module.args.name exists. but using it is not as nice as having a named argument like { name, ... }:.
Setting default = config. ... should be avoided, and mkDefault should be preferred to avoid evaluating the configuration when generating the documentation.

@roberth
Copy link
Member

roberth commented Jan 4, 2022

@nbp the point was to include the suffix. My original suggestion: #82751 (comment)

georgefst added a commit to georgefst/nixos-config that referenced this issue Apr 16, 2022
I _think_ `wants` (or `wantedBy`) is the common link in all the times where I've suffered huge `nixos-rebuild` times. But I'm not certain. It's hard to debug things which take hours and can only seem to be fixed by pulling the power. Which genuinely does seem to "fix" things sometimes - see https://www.reddit.com/r/NixOS/comments/ps4p42/comment/i3t3c4d/?utm_source=share&utm_medium=web2x&context=3.

This issue may be related, but perhaps I'm misreading it: NixOS/nixpkgs#153369.
@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jul 10, 2022
@nh2
Copy link
Contributor Author

nh2 commented Jun 16, 2024

Potentially implemented by

I asked there for a short comparison: #297726 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md 6.topic: systemd
Projects
None yet
Development

No branches or pull requests

5 participants