Discussion:
[gentoo-dev] rfc: Does OpenRC really need mount-ro
William Hubbs
2016-02-16 18:05:33 UTC
All,

I have a bug that points out a significant issue with
/etc/init.d/mount-ro in OpenRC.

Apparently, there are issues that cause it to not work properly for file
systems which happen to be pre-mounted from an initramfs [1].

This service only exists in the Linux world; there is no equivalent in
OpenRC for any other operating systems we support.

The reason it exists is very vague to me; I think it has something to do
with claims of data loss in the past.

I'm asking for more specific information, and if there is none, due to
the bug I included in this message, I am considering removing this
service in 0.21 since I can't find an equivalent anywhere else.

Thanks,

William

[1] https://bugs.gentoo.org/show_bug.cgi?id=573760
Rich Freeman
2016-02-16 18:22:13 UTC
Post by William Hubbs
The reason it exists is very vague to me; I think it has something to do
with claims of data loss in the past.
Is there some other event that will cause all filesystems to be
remounted read-only or unmounted before shutdown?

You definitely will want to either unmount or remount readonly all
filesystems prior to rebooting. I don't think the kernel guarantees
that this will happen (I'd have to look at it). Just doing a sync
before poweroff doesn't seem ideal - if nothing else it will leave
filesystems marked as dirty and likely force fscks on the next boot
(or at least it should - if it doesn't that is another opportunity for
data loss).

There are different ways of accomplishing this of course, but you
really want to have everything read-only in the end.
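To make that concrete, here's a minimal shell sketch of the kind of
shutdown step being discussed. It is purely illustrative -- the function
names and the /proc/mounts parsing are my own assumptions, not OpenRC's
actual mount-ro code:

```shell
#!/bin/sh
# Hypothetical sketch of a mount-ro-style shutdown step, NOT OpenRC's
# actual script; function names and parsing are illustrative assumptions.

# Print mount points currently mounted read-write, deepest-first,
# reading the mounts table passed as $1 (normally /proc/mounts).
list_rw_mounts() {
    awk '$4 ~ /(^|,)rw(,|$)/ { print $2 }' "$1" | sort -r
}

# Flush dirty data, then switch every remaining rw mount read-only.
remount_all_ro() {
    sync
    list_rw_mounts /proc/mounts | while read -r mnt; do
        # Busy filesystems (/ itself, for instance) cannot be
        # unmounted, but can usually still be remounted read-only.
        mount -o remount,ro "$mnt" 2>/dev/null
    done
}
```

Something like remount_all_ro would run at the very end of shutdown,
after services holding files open have stopped.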
--
Rich
William Hubbs
2016-02-16 18:41:29 UTC
Post by Rich Freeman
Post by William Hubbs
The reason it exists is very vague to me; I think it has something to do
with claims of data loss in the past.
Is there some other event that will cause all filesystems to be
remounted read-only or unmounted before shutdown?
When localmount/netmount stop they try to unmount file systems they know
about, but they do not try to remount anything.
Post by Rich Freeman
You definitely will want to either unmount or remount readonly all
filesystems prior to rebooting. I don't think the kernel guarantees
that this will happen (I'd have to look at it). Just doing a sync
before poweroff doesn't seem ideal - if nothing else it will leave
filesystems marked as dirty and likely force fscks on the next boot
(or at least it should - if it doesn't that is another opportunity for
data loss).
There are different ways of accomplishing this of course, but you
really want to have everything read-only in the end.
unmounting is easy enough; we already do that.

What I'm trying to figure out is, what to do about re-mounting file
systems read-only.

How does systemd do this? I didn't find an equivalent of the mount-ro
service there.

William
Duncan
2016-02-17 02:20:04 UTC
Post by William Hubbs
What I'm trying to figure out is, what to do about re-mounting file
systems read-only.
How does systemd do this? I didn't find an equivalent of the mount-ro
service there.
For quite some time now, systemd has had a mechanism whereby the main
systemd process reexecs (after a pivot-root) into the initr*'s copy of
systemd, returning control to it during the shutdown process. This
allows a more controlled shutdown than traditional init systems, because
the final stages actually run from the initr*'s virtual filesystem:
once everything running on the main root is shut down, the main root
itself can actually be unmounted, not just mounted read-only, because
there is literally nothing running on it any longer.

There's still a fallback to read-only mounting if an initr* isn't used or
if reinvoking the initr* version fails for some reason, but with an
initr*, when everything's working properly, while there are still some
bits of userspace running, they're no longer actually running off of the
main root, so main root can actually be unmounted much like any other
filesystem.

The process is explained a bit better in the copious blogposted systemd
documentation. Let's see if I can find a link...

OK, this isn't where I originally read about it, which IIRC was aimed
more at admins, while this is aimed at initr* devs, but that's probably a
good thing as it includes more specific detail...

https://www.freedesktop.org/wiki/Software/systemd/InitrdInterface/

And here's some more, this time in the storage daemon controlled root and
initr* context...

https://www.freedesktop.org/wiki/Software/systemd/RootStorageDaemons/


But... all that doesn't answer the original question directly, does it?
Where there's no return to initr*, how /does/ systemd handle read-only
mounting?

First, the nice ascii-diagram flow charts in the bootup (7) manpage may
be useful, in particular here, the shutdown diagram (tho IDK if you can
find such things useful or not??).

https://www.freedesktop.org/software/systemd/man/bootup.html

Here's the shutdown diagram described in words:

Initial shutdown is via two targets (as opposed to specific services),
shutdown.target, which conflicts with all (normal) system services
thereby shutting them down, and umount.target, which conflicts with file
mounts, swaps, cryptsetup devices, etc. Here, we're obviously interested
in umount.target. Then after those two targets are reached, various
low-level services are run or stopped, in order to reach final.target.
After final.target, the appropriate systemd-(reboot|poweroff|halt|kexec)
service is run, to hit the ultimate (reboot|poweroff|halt|kexec).target,
which of course is never actually evaluated, since the service actually
does the intended action.

The primary takeaway is that you might not be finding a specific systemd
remount-ro service, because it might be a target, defined in terms of
conflicts with mount units, etc, rather than a specific service.

Neither shutdown.target nor umount.target have any wants or requires by
default, but the various normal services and mount units conflict with
them, either via default or specifically, so are shut down before the
target can be reached.

final.target has the After=shutdown.target umount.target setting, so
won't be reached until they are reached.

The respective (reboot|poweroff|halt|kexec).target units Requires= and
After= their respective systemd-*.service units, and reboot and poweroff
(but not halt and kexec) have 30-minute timeouts after which they run
reboot-force or poweroff-force, respectively.

The respective systemd-(reboot|poweroff|halt|kexec).service units
Requires= and After= shutdown.target, umount.target and final.target, all
three, so won't be run until those complete. They simply
ExecStart=/usr/bin/systemctl --force their respective actions.
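For illustration, a unit of that shape looks roughly like this. This is
a hand-written sketch based on the description above, not a verbatim
copy of the shipped systemd-poweroff.service:

```ini
[Unit]
Description=Power-Off
Requires=shutdown.target umount.target final.target
After=shutdown.target umount.target final.target

[Service]
Type=oneshot
ExecStart=/usr/bin/systemctl --force poweroff
```

The Requires=/After= pairing is what guarantees the unit only runs once
the three shutdown-phase targets have actually been reached.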

And here's what the systemd.special (7) manpage says about umount.target:

umount.target
A special target unit that umounts all mount and automount points
on system shutdown.

Mounts that shall be unmounted on system shutdown shall add
Conflicts dependencies to this unit for their mount unit,
which is implicitly done when DefaultDependencies=yes is set
(the default).

But that /still/ doesn't reveal what actually does the remount-ro, as
opposed to umount. I don't see that either, at the unit level, nor do I
see anything related to it in for instance my auto-generated from fstab
/run/systemd/generator/-.mount file or in the systemd-fstab-generator
(8) manpage.

Thus I must conclude that it's actually resolved in the mount-unit
conflicts handling in systemd's source code, itself.

And indeed... in systemd's tarball, we see in src/core/umount.c, in
mount_points_list_umount...

That the function actually remounts /everything/ (well, everything not in
a container) read-only, before actually trying to umount them. Indentation
restandardized on two-space here, to avoid unnecessary wrapping as
posted. This is from systemd-228:

static int mount_points_list_umount(MountPoint **head, bool *changed, bool log_error) {
  MountPoint *m, *n;
  int n_failed = 0;

  assert(head);

  LIST_FOREACH_SAFE(mount_point, m, n, *head) {

    /* If we are in a container, don't attempt to
       read-only mount anything as that brings no real
       benefits, but might confuse the host, as we remount
       the superblock here, not the bind mount. */
    if (detect_container() <= 0) {
      _cleanup_free_ char *options = NULL;
      /* MS_REMOUNT requires that the data parameter
       * should be the same from the original mount
       * except for the desired changes. Since we want
       * to remount read-only, we should filter out
       * rw (and ro too, because it confuses the kernel) */
      (void) fstab_filter_options(m->options, "rw\0ro\0", NULL, NULL,
                                  &options);

      /* We always try to remount directories read-only
       * first, before we go on and umount them.
       *
       * Mount points can be stacked. If a mount
       * point is stacked below / or /usr, we
       * cannot umount or remount it directly,
       * since there is no way to refer to the
       * underlying mount. There's nothing we can do
       * about it for the general case, but we can
       * do something about it if it is aliased
       * somewhere else via a bind mount. If we
       * explicitly remount the super block of that
       * alias read-only we hence should be
       * relatively safe regarding keeping the fs we
       * can otherwise not see dirty. */
      log_info("Remounting '%s' read-only with options '%s'.", m->path,
               options);
      (void) mount(NULL, m->path, NULL, MS_REMOUNT|MS_RDONLY, options);
    }

    /* Skip / and /usr since we cannot unmount that
     * anyway, since we are running from it. They have
     * already been remounted ro. */
    if (path_equal(m->path, "/")
#ifndef HAVE_SPLIT_USR
        || path_equal(m->path, "/usr")
#endif
       )
      continue;

    /* Trying to umount. We don't force here since we rely
     * on busy NFS and FUSE file systems to return EBUSY
     * until we closed everything on top of them. */
    log_info("Unmounting %s.", m->path);
    if (umount2(m->path, 0) == 0) {
      if (changed)
        *changed = true;

      mount_point_free(head, m);
    } else if (log_error) {
      log_warning_errno(errno, "Could not unmount %s: %m", m->path);
      n_failed++;
    }
  }

  return n_failed;
}


So the short answer ultimately is... Systemd has a single umount
function, which first does remount-ro, so it's actually remounting
(nearly) everything read-only, then tries umount.


Meanwhile, (semi-)answering the question implied elsewhere of why only
Linux needs the mount-ro service... I'm no BSD expert, but in my
wanderings I came across a remark that they didn't need it, because their
kernel reboot/halt/poweroff routines have a built-in kernelspace sync-and-
remount-ro routine for anything that can't be unmounted, which Linux
lacks. They obviously consider this a Linux deficiency, but while I've
not come across the Linux reason for not doing it, an educated guess is
that it's considered putting policy into the kernel, and that's
considered a no-no, policy is userspace; the kernel simply enforces it as
directed (which is why kernel 2.4's devfs was removed for 2.6, to be
replaced with the userspace-based udev). Additionally, not kernel-
forcing the remount-ro bit does give developers a way to test results of
an uncontrolled shutdown, say on a specific testing filesystem only,
without exposing the rest of the system, which can still be shut down
normally, to it.

So on Linux userspace must do the final umounts and force-read-onlys,
because unlike the BSDs, the Linux kernel doesn't have builtin routines
that automatically force it, regardless of userspace.

But as others have said, on Linux the remount-ro is _definitely_
required, and "bad things _will_ happen" if it's not done. (Just how bad
depends on the filesystem and its mount options, and hardware, among
other things.)


Finally, one more thing to mention. On systems with magic-sysrq in the
kernel...

echo 0x30 > /proc/sys/kernel/sysrq

... enables the sync (0x10) and remount-readonly (0x20) functions. (Of
course only do this at shutdown/reboot, as you don't want to disturb the
user's configured sysrq defaults in normal runtime.)

You can then force emergency sync (s) and remount-read-only (u) with...

echo s > /proc/sysrq-trigger
echo u > /proc/sysrq-trigger

As that's kernel emergency priority, it should force-sync and force
everything readonly (and quiesce mid-layer block devices such as md
and dm), even if it would normally refuse to do so due to files open for
writing. You might consider something like that as a fallback, if normal
mount-readonly fails. Of course it won't work if magic-sysrq functionality
isn't built into the kernel, but then you're no worse off than before,
and are far better off on kernels where it's supported, so it's certainly
worth considering. =:^)
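Sketched as a shell fallback, that suggestion might look like the
following. This is purely illustrative: the function name is my own, it
needs root and CONFIG_MAGIC_SYSRQ, and it should only ever run during
shutdown, after a normal remount-ro has already failed:

```shell
#!/bin/sh
# Hedged sketch of the SysRq fallback described above; the function
# name is hypothetical. Only call this as a last resort at shutdown.
emergency_remount_ro() {
    # Enable only the sync (0x10) and remount-ro (0x20) functions,
    # leaving the rest of the SysRq mask off. Writes 48 (0x30).
    echo $(( 0x10 | 0x20 )) > /proc/sys/kernel/sysrq
    echo s > /proc/sysrq-trigger   # emergency sync
    echo u > /proc/sysrq-trigger   # emergency remount read-only
}
```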
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Rich Freeman
2016-02-17 13:46:34 UTC
Post by Duncan
Initial shutdown is via two targets (as opposed to specific services),
Since not everybody in this thread may be familiar with systemd, I'll
just add a quick definition.

When systemd says "target" - think "virtual service." The equivalent
in openrc would be an init.d script that has dependencies but which
doesn't actually launch any processes.

Targets also take the place of runlevels in systemd.

Just as with runlevels in openrc they are used to synchronize
milestones during bootup, but the design is much more generic.
Presumably openrc hard-codes runlevels like sysinit, boot, and
default. In systemd I believe it just looks at the config file and
directly launches the desired target, and then the dependency chain
for that pulls in all the targets that precede it. Targets can depend
on other targets, and services can depend on previous targets.

The other dimension is that unit files describe what target they are
typically associated with if it isn't the default. So, you don't run
into the sorts of situations you have with openrc that if you want to
enable mdraid support you need to remember to add it to the boot
runlevel. That might be a relatively-easy thing to add support for in
openrc actually.

Hopefully that makes Duncan's summary easier to read for anybody who
doesn't speak systemdese.

Another bit of background that might be helpful is that systemd also
manages mounts directly. It parses fstab and creates a network of
mount units with appropriate dependencies. Whether you unmount
/var/tmp using "umount /var/tmp" or "systemctl stop var-tmp.mount"
systemd updates the status of the var-tmp.mount unit to a stopped
status. I believe if you add a noauto line to fstab it will create a
mount unit automatically and not start it, and if you made it mount on
boot I think it would actually be mounted as soon as you save the file
in your editor (systemd can monitor config files for changes - all of
this is governed by scripts/software called generators).

So, systemd in general tries to stay aware of the state of mounts. I
suspect that isn't just firing off a script to find/unmount anything
that is mounted. From Duncan's email it sounds like you could create a
mount unit and explicitly not set a conflict with the unmount target
which would cause the filesystem to remain mounted at shutdown. I
have no idea what that would do to unmounting the root.
--
Rich
Duncan
2016-02-18 08:57:11 UTC
Post by Rich Freeman
Post by Duncan
Initial shutdown is via two targets (as opposed to specific services),
Since not everybody in this thread may be familiar with systemd, I'll
just add a quick definition.
Thanks for this followup. =:^)
Post by Rich Freeman
When systemd says "target" - think "virtual service." The equivalent in
openrc would be an init.d script that has dependencies but which doesn't
actually launch any processes.
Targets also take the place of runlevels in systemd.
The official systemd comparison of targets is to runlevels, except much
more flexible, as it's actually possible for multiple targets to be in
the process of being reached at once. And indeed, my immediate internal
reaction at seeing the "virtual services" definition was "no, they're
like runlevels", before I even reached the next paragraph, where you add
that.

Basically, I'd put the runlevel comparison first and primary, as systemd
documentation does, tho now that I've seen the usage, "virtual services"
/does/ add some richness to the definition, helping to accent the fact
that multiple targets can be processed at once. So it's a difference in
emphasis, while agreeing in general.
Post by Rich Freeman
Just as with runlevels in openrc they are used to synchronize milestones
during bootup, but the design is much more generic. Presumably openrc
hard-codes runlevels like sysinit, boot, and default. In systemd I
believe it just looks at the config file and directly launches the
desired target, and then the dependency chain for that pulls in all the
targets that precede it. Targets can depend on other targets, and
services can depend on previous targets.
The other dimension is that unit files describe what target they are
typically associated with if it isn't the default. So, you don't run
into the sorts of situations you have with openrc that if you want to
enable mdraid support you need to remember to add it to the boot
runlevel. That might be a relatively-easy thing to add support for in
openrc actually.
Hopefully that makes Duncan's summary easier to read for anybody who
doesn't speak systemdese.
Again, thanks. =:^)
Post by Rich Freeman
Another bit of background that might be helpful is that systemd also
manages mounts directly.
After actually looking into how systemd processes mounts as I wrote the
earlier post, I believe it's safe to say this is a much bigger difference
than it might seem on the surface, and it was certainly one of the
biggest challenges I faced in attempting to explain systemd's mount
processing in sysvinit/openrc terms.

Even with a reasonable level of personal admin-level systemd
familiarity, I kept expecting and looking for systemd to hand off the
unmounting to a service, which I expected would simply call one of the
executables shipped separately from systemd itself (for example
journald, networkd, etc; I was expecting a mount helper or mountd as
well). But even after seeing the shutdown flowchart and looking into
each individual component, I still couldn't find it, which is why I
ultimately had to go diving directly into the systemd sources
themselves, which is when I found what I was looking for! =:^)

But in hindsight, while there are various non-systemd-binary executables
that ship with systemd, systemd itself directly processes all unit files,
including mount units, so I should have known that it'd handle umounting
directly. But I simply didn't make that connection, until it became
obvious after the fact, and indeed, it didn't actually solidify in my
mind until just now as I wrote about it, tho certainly, after writing the
earlier post, all the pieces were there to be assembled, and this simply
triggered it. =:^)
Post by Rich Freeman
It parses fstab and creates a network of mount
units with appropriate dependencies. Whether you unmount /var/tmp using
"umount /var/tmp" or "systemctl stop var-tmp.mount" systemd updates the
status of the var-tmp.mount unit to a stopped status. I believe if you
add a noauto line to fstab it will create a mount unit automatically and
not start it, and if you made it mount on boot I think it would actually
be mounted as soon as you save the file in your editor (systemd can
monitor config files for changes - all of this is governed by
scripts/software called generators).
I'm not sure of the last -- systemd does let you mount and umount at will
[1], and there are enough cases where an admin might setup the fstab
entry before he's actually prepared for the mount to go ahead, that I
don't believe it would be wise for systemd to try to jump the gun.

OTOH, it's possible to setup corresponding *.automount units as well,
with the purpose there being to trigger mounting (using the kernel's
automount services, which of course would have to be enabled -- they
aren't here) as soon as something tries to stat/open a path under that
mount point. If you setup both the fstab (triggering mount unit
generation) and automount units, and then referenced something under it,
/then/ systemd would do the automount. =:^)

Of course, if someone's sufficiently interested in the outcome, it should
be relatively easy to test, at least for those /running/ systemd. I
don't personally happen to be /that/ interested in the outcome, however,
so...
Post by Rich Freeman
So, systemd in general tries to stay aware of the state of mounts. I
suspect that isn't just firing off a script to find/unmount anything
that is mounted. From Duncan's email it sounds like you could create a
mount unit and explicitly not set a conflict with the unmount target
which would cause the filesystem to remain mounted at shutdown. I have
no idea what that would do to unmounting the root.
You'd have to deliberately jump thru at least two hoops in order to
fail to have that Conflicts=umount.target applied, however, and thus it's
extremely unlikely that it'd happen by accident.

1) You'd have to manually create the mount unit and place it in the
appropriate override location, instead of using fstab and letting systemd-
fstab-generator create the mount unit dynamically at runtime, because the
generator-created versions will all be using default dependencies
(DefaultDependencies=yes being the default), which will add the
conflicts /implicitly/ as part of those default deps (see the quote from
the systemd.special (7) manpage in my previous post).

Note that while the Conflicts= line is not to be found in the mount
units dynamically created from fstab at runtime, and thus isn't
explicit, the fact that DefaultDependencies=no isn't explicitly set
either means those default dependencies apply, and they /implicitly/
include the Conflicts= line.

So it's not explicit, but is nevertheless implicitly there and applies
to any mount units dynamically generated from fstab.

Thus, to avoid it, you'd have to manually create the mount units and use
them in place of fstab entries, and that's explicitly discouraged in the
documentation -- use of fstab is recommended.

(No, I'm not supplying an explicit documentation reference for this as
I'd have to look it up, but I do remember reading it while I was
researching my switch to systemd, and pausing to explicitly make mental
note of it, since it seemed such a contrast to the usual systemd pattern
of uprooting the traditional tools and config methods in favor of its
own, that would seem to be one of the big reasons so many people have
such a strongly negative reaction to systemd. Here, it was actually
doing the opposite, encouraging people to continue to use the traditional
fstab config, and discouraging use of the systemd-native config method,
mount unit files.)

2) In your manually created mount unit, you'd have to explicitly set
DefaultDependencies=no, because otherwise, it too would be subject to the
implicit Conflicts=umount.target.
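Putting both hoops together, such a hand-written unit would look
something like the following. This is a hypothetical example: the
device, mount point, and filesystem type are made up, and the file would
have to be named to match the mount point (here, scratch.mount):

```ini
# /etc/systemd/system/scratch.mount -- hypothetical hand-written unit
[Unit]
# Hoop 2: opt out of the implicit Conflicts=umount.target
DefaultDependencies=no

[Mount]
What=/dev/sdb1
Where=/scratch
Type=ext4
```

With DefaultDependencies=yes (or the line omitted), the implicit
Conflicts=umount.target would apply and the unit would be unmounted at
shutdown like any generated-from-fstab mount.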

Of course, jumping thru both those hoops is effectively impossible to do
accidentally -- even if you decided to use native mount units without
reading the documentation (and thus without noting that fstab usage is
still encouraged), or read it and decided to use them anyway, the fact
that none of the normal mount units you'd likely be using as examples in
creation of your own have an explicit DefaultDependencies= line means
you'd have to learn about it from the documentation in order to even
know it was there to set to no. And then you'd be doing it deliberately.

And if you did it deliberately for root... or for any other non-virtual
filesystem, you would of course get to keep the pieces. =:^)

---
[1] Mount/umount at will: In normal operating mode, anyway. It doesn't
always work that way early in the initr* or in other cases where systemd
is in control but udev hasn't yet been run, tho I suspect the problem is
limited to multi-device btrfs and any other multi-device filesystems that
may be around, generally because the target triggering udev hasn't been
run yet. The reason is that the device units that a particular mount
requires haven't been initiated yet, then.

The *extremely* puzzling to systemd newbies result being that attempting
to mount a filesystem manually can exit success, but you go to actually
use the mount and you'll find it not mounted, and it won't be listed in
/proc/mounts or the like either. What actually happens is that the mount
executable does indeed make the correct kernel calls to do the mount, and
the kernel does indeed do the mount, but systemd immediately umounts
again, because the prerequisite device units haven't been initialized. I
suspect systemd automatically initializes the device fed in either on the
mount commandline or from fstab, but isn't yet prepared to directly
handle multi-device filesystems, and thus doesn't automatically
activate the additional device units that it views as required by the
filesystem, so even tho the kernel knows about them (due to a manual run
of btrfs device scan) and can do the mount, systemd doesn't, and believes
the mount-unit requirements aren't met and thus does the immediate umount.

I actually saw one mention of this on the btrfs list before I had a clue
what was going on, I believe without anyone explaining the situation at
the time, and later had it happen to me. Only quite some time after
having experienced it myself, did I realize why systemd was immediately
umounting it -- because it hadn't seen the prerequisite devices yet. And
only now as I write this footnote, am I making the multi-device
connection, which would explain why the problem isn't more widely seen
among those using both systemd and btrfs, or on most other filesystems,
since I now very strongly suspect a multi-device filesystem is a
prerequisite to seeing the problem, as well.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Raymond Jennings
2016-02-25 23:46:22 UTC
I think this might be one reason that /etc/mtab was deprecated in favor of
a symlink to /proc/mounts :P
4. In the runlevel paradigm you usually think of services running
inside a runlevel (perhaps this isn't strictly true, but most people
think this way, in part because runlevels don't change much). In
systemd this really isn't the case. Services run before targets, or
after them. A target won't be considered running if anything it depends
on isn't running.
Some minor additional notes, with the first one being here.
Systemd target units are analogous to edge-triggered interrupts, which
they resemble in that they are simply "synchronization points" (the term
used in the systemd.target (5) manpage itself). Level-triggered
interrupts can be held on or held off (high or low), but edge-triggered
are simply events that occur and then are passed as time moves on. As
such, targets can be started, but not normally (while the job queue is
idle) stopped, as they de-assert as soon as they are actually reached,
tho many of their requirements generally continue to run until stopped by
some other event, often conflicts= against some other target or general
unit being started, or specific admin systemctl stop.
Tho the systemd FAQ suggests this wasn't always so, as it suggests using
systemctl list-units --type=target in answer to a question about how to
find what "runlevel" you're in. That command seems to return nothing,
here, tho, at least while no target is actively starting, so it would
seem that answer's a bit dated.
It can be noted, however, that certain units, most often specific targets
intended to be specifically invokable by users, can be "isolated", as
they have the AllowIsolate unit setting. Systemctl isolate <unit> will
then cause it to be started exclusively, stopping anything that's not a
dependency of that unit. The systemctl emergency, rescue, reboot,
shutdown, etc, commands, then become effectively shortcuts to the longer
systemctl isolate <named-target-unit> command form.
5. I'd have to check, but I wouldn't be surprised if systemd doesn't
actually require specifying a target at all. Your default "runlevel"
could be apache2.service, which means the system would boot and launch
everything necessary to get apache working, and it probably wouldn't
even spawn a getty. This is NOT analogous to just putting only apache2
in /etc/runlevels/default, because in that example openrc is running the
default runlevel, and it only pulls in apache2. Systemd is purely
dependency driven and when you tell it to make graphical.target the
default runlevel it is like running emerge kde-meta. If all you wanted
was kde-runtime you wouldn't redefine kde-meta to pull in only
kde-runtime, you'd just run emerge kde-runtime. Again, I haven't tested
this, but I'd be shocked if it didn't work. Of course, specifying a
service as a default instead of a target is very limiting, but it would
probably work. Heck, you could probably specify a mount as the default
and the system would just boot and mount something and sit there. Or
you could make it a socket and all it would do is sit and listen for a
connection inetd-style.
As mentioned both in the systemd FAQ and in the systemd.special (7)
manpage, under default.target, this is the default unit started at
bootup. Normally, it'll be a symlink to either multi-user.target
(analogous to sysvinit semi-standard runlevel 3, CLI, no X), or
graphical.target (analogous to sysvinit semi-standard runlevel 5,
launching X and a graphical *DM login).
I don't see specific documentation of whether symlinking to a non-target
unit is allowed, but systemd does have a commandline option --unit=,
which is explicitly documented to take a _unit_, default.target being the
default, but other non-target units being possible as well. Presumably
systemd would examine said unit, looking for DefaultDependencies=no, and
if not specifically set, would start the early "system level" targets,
before starting the named unit in place of the normal default.target.
So it's definitely possible to do via systemd commandline, but I'm not
sure if default.target is followed if it doesn't symlink a target unit,
or not. I'd guess yes, but have neither seen it specifically documented
nor tested it myself, nor read of anyone else actually testing it.
I find it more helpful to think of targets as just units that don't do
anything. We don't use them in openrc but I suspect it would work out
of the box, and maybe we should even consider doing it in at least some
cases. For example, right now /etc/init.d/samba uses some scripting to
launch both nmbd/smbd and fancy config file parsing to let the users
control which ones launch. You could instead break that into three
files - smbd, nmbd, and samba. The first two would launch one daemon
each, and the samba init.d script wouldn't actually launch anything, but
would just depend on the others. That would be the systemd target
approach.
It should work in openrc, yes, because not all functions need to be filled in.
It's quite possible to have openrc initscripts that have only a start or
a stop, not both, for instance, and I remember actually creating custom
initscripts of that nature, back when I was still on openrc. So without
/both/ start and stop, only dependencies, should work too, unless there's
a specific error check for that in openrc, and I can't see why there
would be.
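As a concrete sketch, an OpenRC "virtual service" of the kind described
would be nothing but a depend function. This is a hypothetical example;
smbd and nmbd are assumed to exist as separate init scripts:

```shell
#!/sbin/openrc-run
# Hypothetical /etc/init.d/samba "virtual service": no start() or
# stop() commands, only dependencies on the real daemons' scripts.
description="Samba meta service; pulls in smbd and nmbd"

depend() {
    need smbd nmbd
}
```

Starting this script would then start smbd and nmbd via the dependency
engine, much like a systemd target pulls in its wanted units.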
Apologies if this is a bit off-topic in an openrc discussion, but I
think the concept of virtual services is a potentially powerful one, and
I think that it might be something openrc would actually benefit from
using.
=:^)
It would certainly simplify things like the named chroot stuff, that
being what I'm familiar with from my openrc days, and from the sounds of
it based on other posts, apache, too, as you'd then have a virtual
service pulling in multiple modular dependencies, instead of a seriously
complex hairball of a single service trying to cram in all this
additional functionality that really should be in other modules, making
things less complex for both maintainers and admin-users.
However, what I will say is that until you actually appreciate that
systemd targets are just virtual units, you'll probably find the entire
systemd startup process to be an indecipherable mess. Not grokking this
stuff also makes it easy to incorrectly specify dependencies. I'm sure a
few of us running openrc over the years have accidentally stuck a
service in the wrong runlevel and had something break. Well, in systemd
you might have 47 "runlevels" not actually starting in any particular
order, so it is much easier to get it wrong if you don't realize how they
work. They aren't strictly sequential, so there isn't always one
"runlevel" that always comes last that you can lazily stick
something "in."
Umm... for anyone following systemd documentation, including most non-
early/late-system service units whether shipped by systemd itself or
other system service upstreams or distro maintainers...
multi-user.target is roughly equivalent to the standard sysvinit runlevel
3 (that being CLI operation and default system services).
graphical.target is roughly equivalent to the standard sysvinit runlevel
5 (that being X/XDM graphical login).
And graphical.target specifically Requires=multi-user.target, thereby
pulling in all its dependencies as well. So multi-user.target is the
standard "wanted-by/wants" single "runlevel analog" for CLI services and
where nearly everything that doesn't have specific reason to be somewhere
else ends up, if enabled.
But of course, that's not guaranteed, just documented default and the
standard nearly all shipped service units use, and individual distros/
sites/installations may well be setup entirely differently, if they have
specific reason for it.
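For reference, that relationship is visible directly in the shipped unit; this is roughly what graphical.target contains on a stock systemd install (an approximation from memory, not a verbatim copy, so check /usr/lib/systemd/system/graphical.target on a real system):

```ini
[Unit]
Description=Graphical Interface
Documentation=man:systemd.special(7)
Requires=multi-user.target
Wants=display-manager.service
Conflicts=rescue.target
After=multi-user.target rescue.target display-manager.service
AllowIsolate=yes
```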
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Richard Yao
2016-02-17 14:24:44 UTC
Permalink
Post by Duncan
Post by William Hubbs
What I'm trying to figure out is, what to do about re-mounting file
systems read-only.
How does systemd do this? I didn't find an equivalent of the mount-ro
service there.
For quite some time now, systemd has actually had a mechanism whereby the
main systemd process reexecs (with a pivot-root) the initr* systemd and
returns control to it during the shutdown process, thereby allowing a
more controlled shutdown than traditional init systems because the final
stages are actually running from the virtual-filesystem of the initr*,
such that after everything running on the main root is shutdown, the main
root itself can actually be unmounted, not just mounted read-only,
because there is literally nothing running on it any longer.
There's still a fallback to read-only mounting if an initr* isn't used or
if reinvoking the initr* version fails for some reason, but with an
initr*, when everything's working properly, while there are still some
bits of userspace running, they're no longer actually running off of the
main root, so main root can actually be unmounted much like any other
filesystem.
Systemd installs that go back into the initramfs at shutdown are rare, because there is a hook for the initramfs to tell systemd that it should re-exec it and very few configurations do that. Even fewer of those that do actually need it.

The biggest user of that mechanism of which I am aware is ZFS on EL/Fedora when booted with Dracut. It does not need it; it was only implemented because someone who did not understand how ZFS was designed to integrate with the boot and startup processes thought it was a good idea.

As it turns out, that behavior actually breaks the mechanism intended to make multipath sane: it marks a pool that will be used on next boot as not in use by anyone, so other systems with access to the disks are free to import it. If they do and the system then boots, the pool can be damaged beyond repair.

Thankfully, no one seems to boot EL/Fedora systems off ZFS pools in multipath environments. The code to hook into this special behavior will be removed in the future, but that is a low priority as none of the developers' employers care about it and the almost negligible possibility that the mechanism would save someone from data loss has made it too low of a priority for any of us to spend our free time on it.
Post by Duncan
The process is explained a bit better in the copious blogposted systemd
documentation. Let's see if I can find a link...
OK, this isn't where I originally read about it, which IIRC was aimed
more at admins, while this is aimed at initr* devs, but that's probably a
good thing as it includes more specific detail...
https://www.freedesktop.org/wiki/Software/systemd/InitrdInterface/
And here's some more, this time in the storage daemon controlled root and
initr* context...
https://www.freedesktop.org/wiki/Software/systemd/RootStorageDaemons/
But... all that doesn't answer the original question directly, does it?
Where there's no return to initr*, how /does/ systemd handle read-only
mounting?
First, the nice ascii-diagram flow charts in the bootup (7) manpage may
be useful, in particular here, the shutdown diagram (tho IDK if you can
find such things useful or not??).
https://www.freedesktop.org/software/systemd/man/bootup.html
Initial shutdown is via two targets (as opposed to specific services):
shutdown.target, which conflicts with all (normal) system services,
thereby shutting them down, and umount.target, which conflicts with file
mounts, swaps, cryptsetup devices, etc. Here, we're obviously interested
in umount.target. Then, after those two targets are reached, various
low-level services are run or stopped in order to reach final.target.
After final.target, the appropriate systemd-(reboot|poweroff|halt|kexec)
service is run, to hit the ultimate (reboot|poweroff|halt|kexec).target,
which of course is never actually evaluated, since the service actually
does the intended action.
The primary takeaway is that you might not be finding a specific systemd
remount-ro service, because it might be a target, defined in terms of
conflicts with mount units, etc, rather than a specific service.
Neither shutdown.target nor umount.target have any wants or requires by
default, but the various normal services and mount units conflict with
them, either via default or specifically, so are shut down before the
target can be reached.
final.target has the After=shutdown.target umount.target setting, so
won't be reached until they are reached.
The respective (reboot|poweroff|halt|kexec).target units Requires= and
After= their respective systemd-*.service units, and reboot and poweroff
(but not halt and kexec) have 30-minute timeouts after which they run
reboot-force or poweroff-force, respectively.
The respective systemd-(reboot|poweroff|halt|kexec).service units
Requires= and After= shutdown.target, umount.target and final.target, all
three, so won't be run until those complete. They simply
ExecStart=/usr/bin/systemctl --force their respective actions.
umount.target
A special target unit that umounts all mount and automount points
on system shutdown.
Mounts that shall be unmounted on system shutdown shall add
Conflicts dependencies to this unit for their mount unit,
which is implicitly done when DefaultDependencies=yes is set
(the default).
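To make that concrete, here is roughly what systemd-fstab-generator writes for an ordinary fstab line (paths illustrative; note there is no Conflicts=umount.target line in the file, because with DefaultDependencies=yes that dependency is added implicitly):

```ini
# /run/systemd/generator/home.mount (illustrative)
[Unit]
SourcePath=/etc/fstab
Documentation=man:fstab(5) man:systemd-fstab-generator(8)

[Mount]
What=/dev/sda3
Where=/home
Type=ext4
Options=defaults
```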
But that /still/ doesn't reveal what actually does the remount-ro, as
opposed to umount. I don't see that either, at the unit level, nor do I
see anything related to it in for instance my auto-generated from fstab
/run/systemd/generators/-.mount file or in the systemd-fstab-generator
(8) manpage.
Thus I must conclude that it's actually resolved in the mount-unit
conflicts handling in systemd's source code, itself.
And indeed... in systemd's tarball, we see in src/core/umount.c, in
mount_points_list_umount...
That the function actually remounts /everything/ (well, everything not in
a container) read-only, before actually trying to umount them. Indentation
restandardized on two-space here, to avoid unnecessary wrapping:
static int mount_points_list_umount(MountPoint **head, bool *changed, bool log_error) {
  MountPoint *m, *n;
  int n_failed = 0;

  assert(head);

  LIST_FOREACH_SAFE(mount_point, m, n, *head) {
    /* If we are in a container, don't attempt to
       read-only mount anything as that brings no real
       benefits, but might confuse the host, as we remount
       the superblock here, not the bind mound. */
    if (detect_container() <= 0) {
      _cleanup_free_ char *options = NULL;
      /* MS_REMOUNT requires that the data parameter
       * should be the same from the original mount
       * except for the desired changes. Since we want
       * to remount read-only, we should filter out
       * rw (and ro too, because it confuses the kernel) */
      (void) fstab_filter_options(m->options, "rw\0ro\0", NULL, NULL, &options);

      /* We always try to remount directories read-only
       * first, before we go on and umount them.
       *
       * Mount points can be stacked. If a mount
       * point is stacked below / or /usr, we
       * cannot umount or remount it directly,
       * since there is no way to refer to the
       * underlying mount. There's nothing we can do
       * about it for the general case, but we can
       * do something about it if it is aliased
       * somehwere else via a bind mount. If we
       * explicitly remount the super block of that
       * alias read-only we hence should be
       * relatively safe regarding keeping the fs we
       * can otherwise not see dirty. */
      log_info("Remounting '%s' read-only with options '%s'.", m->path, options);
      (void) mount(NULL, m->path, NULL, MS_REMOUNT|MS_RDONLY, options);
    }

    /* Skip / and /usr since we cannot unmount that
     * anyway, since we are running from it. They have
     * already been remounted ro. */
    if (path_equal(m->path, "/")
#ifndef HAVE_SPLIT_USR
        || path_equal(m->path, "/usr")
#endif
       )
      continue;

    /* Trying to umount. We don't force here since we rely
     * on busy NFS and FUSE file systems to return EBUSY
     * until we closed everything on top of them. */
    log_info("Unmounting %s.", m->path);
    if (umount2(m->path, 0) == 0) {
      if (changed)
        *changed = true;
      mount_point_free(head, m);
    } else if (log_error) {
      log_warning_errno(errno, "Could not unmount %s: %m", m->path);
      n_failed++;
    }
  }

  return n_failed;
}
So the short answer ultimately is... Systemd has a single umount
function, which first does remount-ro, so it's actually remounting
(nearly) everything read-only, then tries umount.
Meanwhile, (semi-)answering the elsewhere implied question of why only
Linux needs the mount-ro service... I'm no BSD expert, but in my
wanderings I came across a remark that they didn't need it, because their
kernel reboot/halt/poweroff routines have a built-in kernelspace sync-and-
remount-ro routine for anything that can't be unmounted, which Linux
lacks. They obviously consider this a Linux deficiency, but while I've
not come across the Linux reason for not doing it, an educated guess is
that it's considered putting policy into the kernel, and that's
considered a no-no: policy belongs in userspace; the kernel simply enforces it as
directed (which is why kernel 2.4's devfs was removed for 2.6, to be
replaced with the userspace-based udev). Additionally, not kernel-
forcing the remount-ro bit does give developers a way to test results of
an uncontrolled shutdown, say on a specific testing filesystem only,
without exposing the rest of the system, which can still be shut down
normally, to it.
So on Linux userspace must do the final umounts and force-read-onlys,
because unlike the BSDs, the Linux kernel doesn't have builtin routines
that automatically force it, regardless of userspace.
But as others have said, on Linux the remount-ro is _definitely_
required, and "bad things _will_ happen" if it's not done. (Just how bad
depends on the filesystem and its mount options, and hardware, among
other things.)
Finally, one more thing to mention. On systems with magic-srq in the
kernel...
echo 0x30 > /proc/sys/kernel/sysrq
... enables the sync (0x10) and remount-readonly (0x20) functions. (Of
course only do this at shutdown/reboot, as you don't want to disturb the
user's configured srq defaults in normal runtime.)
You can then force emergency sync (s) and remount-read-only (u) with...
echo s > /proc/sysrq-trigger
echo u > /proc/sysrq-trigger
As that's kernel emergency priority, it should force-sync and force
everything readonly (and quiesce mid-layer block devices such as md
and dm), even if it would normally refuse to do so due to files open for
writing. You might consider something like that as a fallback, if normal
mount-readonly fails. Of course it won't work if magic-srq functionality
isn't built into the kernel, but then you're no worse off than before,
and are far better off on kernels where it's supported, so it's certainly
worth considering. =:^)
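Putting those pieces together, the fallback suggested above could be sketched like this (the wrapper is illustrative, not OpenRC code; restoring the user's previous sysrq mask afterwards is an addition beyond the text above):

```shell
#!/bin/sh
# Emergency sync + remount-read-only via magic SysRq, as a last resort
# when a normal remount-ro fails.

sysrq_mask() {
    # sync (0x10) | remount-readonly (0x20) = 48
    printf '%d' $(( 0x10 | 0x20 ))
}

emergency_remount_ro() {
    # Needs root and CONFIG_MAGIC_SYSRQ; bail out quietly otherwise.
    [ -w /proc/sysrq-trigger ] || return 1
    saved=$(cat /proc/sys/kernel/sysrq)
    sysrq_mask > /proc/sys/kernel/sysrq     # enable only sync + remount-ro
    echo s > /proc/sysrq-trigger            # emergency sync
    echo u > /proc/sysrq-trigger            # emergency remount read-only
    echo "$saved" > /proc/sys/kernel/sysrq  # put the user's policy back
}
```

If the sysrq knob isn't present, the function simply returns failure and you are no worse off than before.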
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Rich Freeman
2016-02-17 17:19:52 UTC
Permalink
Post by Richard Yao
Systemd installs that go back into the initramfs at shutdown are rare because there is a
hook for the initramfs to tell systemd that it should re-exec it and very few configurations
do that. Even fewer that do it actually need it.
While I won't debate that it probably isn't strictly essential, dracut
handles unmounting root for systemd just fine (well, at least on
non-nfs - the version I'm using with an nfs root struggles in this
regard, though unclean shutdown on nfs with no files open probably
isn't really a problem).

Is dracut still not widely used? I know that it was all the fashion
for a decade or two for every distro to build their own initramfs, but
I don't get why anybody wouldn't just make the switch - it is far more
capable and configurable.
--
Rich
James Le Cuirot
2016-02-17 17:30:09 UTC
Permalink
On Wed, 17 Feb 2016 12:19:52 -0500
Post by Rich Freeman
Is dracut still not widely used? I know that it was all the fashion
for a decade or two for every distro to build their own initramfs, but
I don't get why anybody wouldn't just make the switch - it is far more
capable and configurable.
Does anyone know what most Gentoo users are doing these days? I don't
recall the handbook mentioning initramfs at all back in 2002 because it
wasn't really needed back then. I did without for years until I finally
put / on LVM. I used lvm2create_initrd for a while but that was still
quite a manual process and I couldn't imagine going back to it now.
I've switched to Dracut and it's great but I don't get the impression
that Gentoo really endorses that option over the more laborious ones.
Maybe it should?

https://wiki.gentoo.org/wiki/Initramfs
--
James Le Cuirot (chewi)
Gentoo Linux Developer
Ian Stakenvicius
2016-02-17 18:06:54 UTC
Permalink
On Wed, 17 Feb 2016 12:19:52 -0500 Rich Freeman
Post by Rich Freeman
Is dracut still not widely used? I know that it was all the
fashion for a decade or two for every distro to build their own
initramfs, but I don't get why anybody wouldn't just make the
switch - it is far more capable and configurable.
Does anyone know what most Gentoo users are doing these days? I
don't recall the handbook mentioning initramfs at all back in
2002 because it wasn't really needed back then. I did without for
years until I finally put / on LVM. I used lvm2create_initrd for
a while but that was still quite a manual process and I couldn't
imagine going back to it now. I've switched to Dracut and it's
great but I don't get the impression that Gentoo really endorses
that option over the more laborious ones. Maybe it should?
https://wiki.gentoo.org/wiki/Initramfs
Genkernel's initramfs generation was what we endorsed for the most
part, until dracut came around. It's hard to say what "most" are
doing, but I expect dracut- and genkernel-based initramfses make up
the vast majority in use by Gentoo users, with a small minority
rolling their own through other means.
Rich Freeman
2016-02-17 18:32:18 UTC
Permalink
Post by Ian Stakenvicius
Genkernel's initramfs generation was what we endorsed for the most
part, until dracut came around. it's hard to say what "most" are
doing but i expect dracut and genkernel based initramfs's make up
the vast majority in use by gentoo users, with a small minority
rolling their own through other means.
While I personally endorse dracut over genkernel, the reality is that
only genkernel is actually documented in the handbook. This is due at
least in part to laziness on my part as I've been meaning to add it
since forever.

Likewise I intend to update the handbook to make selection of
openrc/systemd less convoluted as well. The current handbook does
offer systemd as an option but then basically refers you out to
another page that doesn't follow the same flow as the handbook.

In my notes I've found that it is a pretty trivial change to pick one
or the other actually if you do it at the right time, so this could be
added to the handbook with very little disruption to the flow for
non-systemd users. I imagine other service managers would be similar,
or even simpler. I found that switching between the two only requires
two changes - one is to pick a systemd profile relatively early in the
process before doing a world update, and then changing one line in
your grub config at the end. If you emerge world after you do most of
your system configuration systemd will automatically pick up all the
openrc configuration and use it, which as a bonus leaves you with a
system that is easy to boot in either mode.
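The two changes described boil down to something like the following (the profile name and init= path are examples, not exact values; check `eselect profile list` and your own bootloader setup):

```shell
# Early in the install, before the world update: switch to a systemd profile.
eselect profile list
eselect profile set default/linux/amd64/13.0/systemd   # example name
emerge --ask --update --deep --newuse @world

# At the end: one line in the bootloader config, e.g. in /etc/default/grub:
#   GRUB_CMDLINE_LINUX="init=/usr/lib/systemd/systemd"
grub2-mkconfig -o /boot/grub/grub.cfg
```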

Getting back to dracut - it is really just a few lines added as an
alternative to the initramfs section. After you build your kernel it
is really just a one-liner, and grub2-mkconfig picks up on it
automatically (as I imagine it probably does with genkernel as well).
Unless you want to play with the configuration there isn't much fuss.

I think we really should give strong consideration to recommending
dracut as a default, while of course preserving the option of
genkernel. I'm certainly open to feedback if there is some use case
where genkernel is better, but dracut is cross-distro, gives you
options to easily maximize or minimize your config, and is really easy
to tailor with modules.
--
Rich
Richard Yao
2016-02-18 03:11:42 UTC
Permalink
Post by Rich Freeman
Post by Ian Stakenvicius
Genkernel's initramfs generation was what we endorsed for the most
part, until dracut came around. it's hard to say what "most" are
doing but i expect dracut and genkernel based initramfs's make up
the vast majority in use by gentoo users, with a small minority
rolling their own through other means.
While I personally endorse dracut over genkernel, the reality is that
only genkernel is actually documented in the handbook. This is due at
least in part to laziness on my part as I've been meaning to add it
since forever.
Likewise I intend to update the handbook to make selection of
openrc/systemd less convoluted as well. The current handbook does
offer systemd as an option but then basically refers you out to
another page that doesn't follow the same flow as the handbook.
In my notes I've found that it is a pretty trivial change to pick one
or the other actually if you do it at the right time, so this could be
added to the handbook with very little disruption to the flow for
non-systemd users. I imagine other service managers would be similar,
or even simpler. I found that switching between the two only requires
two changes - one is to pick a systemd profile relatively early in the
process before doing a world update, and then changing one line in
your grub config at the end. If you emerge world after you do most of
your system configuration systemd will automatically pick up all the
openrc configuration and use it, which as a bonus leaves you with a
system that is easy to boot in either mode.
Getting back to dracut - it is really just a few lines added as an
alternative to the initramfs section. After you build your kernel it
is really just a one-liner, and grub2-mkconfig picks up on it
automatically (as I imagine it probably does with genkernel as well).
Unless you want to play with the configuration there isn't much fuss.
dracut does not assist those who do not want generic kernel
configurations. Unfortunately, the handbook does not do a good job in
saying that the initramfs generation and generic kernel configurations
are optional.
Post by Rich Freeman
I think we really should give strong consideration to recommending
dracut as a default, while of course preserving the option of
genkernel. I'm certainly open to feedback if there is some use case
where genkernel is better, but dracut is cross-distro, gives you
options to easily maximize or minimize your config, and is really easy
to tailor with modules.
There is no default, and booting without an initramfs not only works
but is advisable for faster boot, unless something fancy is being done
that needs one.

Claiming to pick a default between genkernel and dracut when both are
optional makes no sense, especially since dracut's capabilities
(initramfs generation) are a subset of genkernel's (initramfs generation
and kernel builds). dracut could replace genkernel's initramfs
generation capabilities, but it simply cannot replace genkernel for
building a generic kernel. It was never intended to do that.

By the way, over the course of time, there have been things genkernel
did better and things dracut did better. It is unlikely one will ever be
superior to the other. However, some feedback on what genkernel does
poorly versus dracut and could therefore improve would be helpful.
Daniel Campbell
2016-02-17 21:50:40 UTC
Permalink
Post by James Le Cuirot
Post by Rich Freeman
Is dracut still not widely used? I know that it was all the
fashion for a decade or two for every distro to build their own
initramfs, but I don't get why anybody wouldn't just make the
switch - it is far more capable and configurable.
Does anyone know what most Gentoo users are doing these days? I
don't recall the handbook mentioning initramfs at all back in 2002
because it wasn't really needed back then. I did without for years
until I finally put / on LVM. I used lvm2create_initrd for a while
but that was still quite a manual process and I couldn't imagine
going back to it now. I've switched to Dracut and it's great but I
don't get the impression that Gentoo really endorses that option
over the more laborious ones. Maybe it should?
https://wiki.gentoo.org/wiki/Initramfs
I went without an initrd until I encrypted my / and put LVM inside it.
I used genkernel's initrd functions for that with a
manually-configured kernel.

- --
Daniel Campbell - Gentoo Developer
OpenPGP Key: 0x1EA055D6 @ hkp://keys.gnupg.net
fpr: AE03 9064 AE00 053C 270C 1DE4 6F7A 9091 1EA0 55D6
Richard Yao
2016-02-18 03:02:00 UTC
Permalink
Post by Rich Freeman
Post by Richard Yao
Systemd installs that go back into the initramfs at shutdown are rare because there is a
hook for the initramfs to tell systemd that it should re-exec it and very few configurations
do that. Even fewer that do it actually need it.
While I won't debate that it probably isn't strictly essential, dracut
handles unmounting root for systemd just fine (well, at least on
non-nfs - the version I'm using with an nfs root struggles in this
regard, though unclean shutdown on nfs with no files open probably
isn't really a problem).
Dracut handling it well is not up for dispute. When I checked last year,
dracut simply did not tell systemd to use this functionality, because it
was unnecessary and only served to slow down the shutdown process. It
only enables it when a driver indicates an actual need, which is the way
that it should be.
Post by Rich Freeman
Is dracut still not widely used? I know that it was all the fashion
for a decade or two for every distro to build their own initramfs, but
I don't get why anybody wouldn't just make the switch - it is far more
capable and configurable.
Not many Gentoo users use dracut. It does not handle kernel compilation
or bootloader configuration. It is definitely ahead of genkernel in
networking, though there is not much demand for that among users.
Richard Yao
2016-02-17 14:05:49 UTC
Permalink
Post by William Hubbs
Post by Rich Freeman
Post by William Hubbs
The reason it exists is very vague to me; I think it has something to do
with claims of data loss in the past.
Is there some other event that will cause all filesystems to be
remounted read-only or unmounted before shutdown?
When localmount/netmount stop they try to unmount file systems they know
about, but they do not try to remount anything.
Post by Rich Freeman
You definitely will want to either unmount or remount readonly all
filesystems prior to rebooting. I don't think the kernel guarantees
that this will happen (I'd have to look at it). Just doing a sync
before poweroff doesn't seem ideal - if nothing else it will leave
filesystems marked as dirty and likely force fscks on the next boot
(or at least it should - if it doesn't that is another opportunity for
data loss).
There are different ways of accomplishing this of course, but you
really want to have everything read-only in the end.
unmounting is easy enough; we already do that.
What I'm trying to figure out is, what to do about re-mounting file
systems read-only.
How does systemd do this? I didn't find an equivalent of the mount-ro
service there.
One idea proposed by systemd that is almost never used in production is to fall back to an initramfs environment to undo the boot process by umounting /. It would not surprise me if the normal case were hard coded to remount / as ro, because you risk filesystem corruption otherwise. Journaling filesystems are fairly good at surviving that, but you are still taking a risk due to partial writes, and anyone using ext2 would be taking a much bigger gamble.
Post by William Hubbs
William
Patrick Lauer
2016-02-16 19:31:54 UTC
Permalink
Post by William Hubbs
All,
I have a bug that points out a significant issue with
/etc/init.d/mount-ro in OpenRC.
Apparently, there are issues that cause it to not work properly for file
systems which happen to be pre-mounted from an initramfs [1].
I don't understand how this fails; how does mounting from the initramfs
cause issues?

The failure message comes from rc-mount.sh when the list of PIDs using a
mountpoint includes "$$" which is shell shorthand for self. How can the
current shell claim to be using /usr when it is a shell that only has
dependencies in $LIBDIR ?
As far as I can tell the code at this point calls fuser -k ${list of
pids}, and fuser outputs all PIDs that still use it. I don't see how $$
can end up in there ...
Post by William Hubbs
This service only exists in the Linux world; there is no equivalent in
OpenRC for any other operating systems we support.
The reason it exists is very vague to me; I think it has something to do
with claims of data loss in the past.
Yes, if you just shut down without unmounting file systems -
(1) you may throw away data in the FS cache that hasn't ended up on disk yet
(2) the filesystem has no chance to mark itself cleanly unmounted, so
you will trigger journal replay or fsck or equivalent on boot

That's why sysvinit had a random "sleep(1)" in the halt and "sleep(2)"
in the reboot function, to give computers more of a chance to shutdown
and reboot sanely.

The changes in sysvinit-2.88-r8 and later add the "-n" option:
-n Don't sync before reboot or halt. Note that the kernel and
storage drivers may still sync.

This was added *because* we can guarantee that filesystems are
consistent enough with mount-ro. If you wish to remove it you need to
reconsider all these little details ...
Post by William Hubbs
I'm asking for more specific information, and if there is none, due to
the bug I included in this message, I am considering removing this
service in 0.21 since I can't find an equivalent anywhere else.
Please don't just remove things you don't understand.
Post by William Hubbs
Thanks,
William
[1] https://bugs.gentoo.org/show_bug.cgi?id=573760
Looking at the init script as of openrc-0.20.5:

~line32:
# Bug 381783
local rc_svcdir=$(echo $RC_SVCDIR | sed 's:/lib\(32\|64\)\?/:/lib(32|64)?/:g')
This looks relatively useless with everything migrated to /run and can
most likely be removed

~line35:
local m="/dev|/dev/.*|/proc|/proc.*|/sys|/sys/.*|/run|${rc_svcdir}" x= fs=
Since this is a regexp it can be cut down to something simpler - why
both /dev and /dev/.* when the second is already covered by the first?
Also, rc_svcdir is most likely a subdir of /run ...
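The trimmed-down check could be sketched as a plain case pattern instead of a regexp, assuming everything interesting now lives under /run (is_protected is a hypothetical helper, not actual mount-ro code):

```shell
#!/bin/sh
# Keep-list for mount-ro: paths that must never be unmounted or
# remounted by the shutdown script.
is_protected() {
    case "$1" in
        /dev|/dev/*|/proc|/proc/*|/sys|/sys/*|/run|/run/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

With rc_svcdir under /run, the separate ${rc_svcdir} alternate (and the lib32/lib64 sed dance from bug 381783) would no longer be needed.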
Rich Freeman
2016-02-16 20:18:46 UTC
Permalink
Post by Patrick Lauer
The failure message comes from rc-mount.sh when the list of PIDs using a
mountpoint includes "$$" which is shell shorthand for self. How can the
current shell claim to be using /usr when it is a shell that only has
dependencies in $LIBDIR ?
As far as I can tell the code at this point calls fuser -k ${list of
pids}, and fuser outputs all PIDs that still use it. I don't see how $$
can end up in there ...
What does openrc do when the script fails? Just shut down the system anyway?

If you're going to shut down the system anyway then I'd just force the
read-only mount even if it is in use. That will cause less risk of
data loss than leaving it read-write.

Of course, it would be better still to kill anything that could
potentially be writing to it.
--
Rich
Richard Yao
2016-02-17 14:06:21 UTC
Permalink
Post by Rich Freeman
Post by Patrick Lauer
The failure message comes from rc-mount.sh when the list of PIDs using a
mountpoint includes "$$" which is shell shorthand for self. How can the
current shell claim to be using /usr when it is a shell that only has
dependencies in $LIBDIR ?
As far as I can tell the code at this point calls fuser -k ${list of
pids}, and fuser outputs all PIDs that still use it. I don't see how $$
can end up in there ...
What does openrc do when the script fails? Just shut down the system anyway?
If you're going to shut down the system anyway then I'd just force the
read-only mount even if it is in use. That will cause less risk of
data loss than leaving it read-write.
Of course, it would be better still to kill anything that could
potentially be writing to it.
Agreed.
Post by Rich Freeman
--
Rich
Andrew Savchenko
2016-02-17 19:01:54 UTC
Permalink
Post by Rich Freeman
Post by Patrick Lauer
The failure message comes from rc-mount.sh when the list of PIDs using a
mountpoint includes "$$" which is shell shorthand for self. How can the
current shell claim to be using /usr when it is a shell that only has
dependencies in $LIBDIR ?
As far as I can tell the code at this point calls fuser -k ${list of
pids}, and fuser outputs all PIDs that still use it. I don't see how $$
can end up in there ...
What does openrc do when the script fails? Just shut down the system anyway?
If you're going to shut down the system anyway then I'd just force the
read-only mount even if it is in use. That will cause less risk of
data loss than leaving it read-write.
Of course, it would be better still to kill anything that could
potentially be writing to it.
This is not always possible. Two practical cases from my experience:

1) NFS v4 shares can't be unmounted if the server is unreachable (even
with -f). If a filesystem (e.g. /home or /) contains such stuck
mount points, it can't be unmounted either, because it is still in
use. This happens quite often when both NFS server and client are
running from a UPS on a low-power event (when AC power failed and
the battery is almost empty).

2) A LUKS device is in the frozen state. I use this as a security
precaution if LUKS fails to unmount (or takes too long), e.g.
due to a dead mount point.

As far as I understand, mount-ro may be useful only if unmount
failed, but from my practical experience, openrc just hangs forever
in such cases until the UPS is shut down by battery drain.

Best regards,
Andrew Savchenko
Rich Freeman
2016-02-17 19:26:45 UTC
Permalink
Post by Andrew Savchenko
1) NFS v4 shares can't be unmounted if the server is unreachable (even
with -f). If a filesystem (e.g. /home or /) contains such stuck
mount points, it can't be unmounted either, because it is still in
use. This happens quite often when both the NFS server and client are
running from a UPS during a low-power event (AC power failed and the
battery is almost empty).
Perhaps at least the behavior in this case should be configurable
(timeouts, infinite or otherwise).
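No such knobs exist in OpenRC today; purely as a sketch of what "configurable" might look like, a hypothetical /etc/rc.conf fragment (both variable names are invented, not real OpenRC options):

```shell
# Hypothetical rc.conf knobs (NOT real OpenRC options) for the timeout
# behavior suggested above.
rc_unmount_timeout=30        # seconds to wait for umount before giving up
rc_unmount_on_timeout="ro"   # "ro" = force read-only remount, "skip" = move on
```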

If you can't contact a remote NFS server then I believe it is possible
that unwritten changes are still sitting in buffers/etc. Depending on
circumstances, either pausing until the server comes back or discarding
the changes and powering off could be the appropriate behavior.

Ultimately, anything not on the disk is always at risk, and any
filesystem needs to provide for unclean shutdown to be truly robust.
A kernel panic/etc could cause loss of all data in buffers without
warning. However, barring that, we should of course engineer openrc to
shut down in the cleanest manner possible, and this should include
read-only remounts for anything which can't be unmounted.

And even systemd+dracut struggles to cleanly unmount NFS roots, in the
versions I'm running at least, so that is an edge case that doesn't
get much testing.
--
Rich
Richard Yao
2016-02-18 03:26:32 UTC
Permalink
Post by Andrew Savchenko
Post by Rich Freeman
Post by Patrick Lauer
The failure message comes from rc-mount.sh when the list of PIDs using a
mountpoint includes "$$" which is shell shorthand for self. How can the
current shell claim to be using /usr when it is a shell that only has
dependencies in $LIBDIR ?
As far as I can tell the code at this point calls fuser -k ${list of
pids}, and fuser outputs all PIDs that still use it. I don't see how $$
can end up in there ...
What does openrc do when the script fails? Just shut down the system anyway?
If you're going to shut down the system anyway then I'd just force the
read-only mount even if it is in use. That will cause less risk of
data loss than leaving it read-write.
Of course, it would be better still to kill anything that could
potentially be writing to it.
1) NFS v4 shares can't be unmounted if the server is unreachable (even
with -f). If a filesystem (e.g. /home or /) contains such stuck
mount points, it can't be unmounted either, because it is still in
use. This happens quite often when both the NFS server and client are
running from a UPS during a low-power event (AC power failed and the
battery is almost empty).
Does `umount -l /path/to/mnt` work on those?
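For reference, the lazy detach being asked about looks like the sketch below (the path is the illustrative one from the question). `umount -l` detaches the mountpoint from the tree immediately and defers cleanup until it is no longer busy, which is why it can succeed where `umount -f` hangs on a dead NFSv4 server:

```shell
#!/bin/sh
# Lazy-detach sketch for a stuck NFS mount; the path is illustrative.
# -l: detach from the filesystem tree now, finish cleanup when unbusy.
if umount -l /path/to/mnt 2>/dev/null; then
    echo "detached"
else
    echo "lazy unmount failed"   # e.g. not mounted, or no permission
fi
```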
Post by Andrew Savchenko
2) A LUKS device is in the frozen state. I freeze it as a security
precaution if a LUKS volume fails to unmount (or takes too long), e.g.
due to a dead mount point.
This gives me another reason to favor integrating encryption directly
into a filesystem or using eCryptfs on top of the VFS. My other
reasons were integrity concerns (excessive layering adds opportunities
for bugs, and a frozen state definitely causes problems) and
performance concerns from doing unnecessary calculations on
filesystems that span multiple disks (e.g. each mirror member gets
encrypted independently).
Post by Andrew Savchenko
As far as I understand, mount-ro may be useful only if unmounting
failed, but in my practical experience, openrc just hangs forever
in such cases until the UPS shuts down from battery drain.
It is useful even if unmounting everything else succeeds, because /
itself can never be unmounted. That said, I do not think we can sanely
handle every possible configuration, because someone will always come
up with something new.
Post by Andrew Savchenko
Best regards,
Andrew Savchenko
Daniel Campbell
2016-02-16 20:03:53 UTC
Permalink
Post by William Hubbs
All,
I have a bug that points out a significant issue with
/etc/init.d/mount-ro in OpenRC.
Apparently, there are issues that cause it to not work properly for
file systems which happen to be pre-mounted from an initramfs [1].
This service only exists in the Linux world; there is no equivalent
in OpenRC for any other operating systems we support.
The reason it exists is very vague to me; I think it has something
to do with claims of data loss in the past.
I'm asking for more specific information, and if there is none, due
to the bug I included in this message, I am considering removing
this service in 0.21 since I can't find an equivalent anywhere
else.
Thanks,
William
[1] https://bugs.gentoo.org/show_bug.cgi?id=573760
I have a LUKS-encrypted / with LVM volumes inside it, and they
(apparently?) need to be remounted read-only upon shutdown. I haven't
found an easy way to capture the shutdown output so I can show
*why*, but I've seen it scroll by.

I use an initramfs to decrypt the LUKS partition and discover the LVM
volumes. I can only assume it goes read-only before shutdown/reboot
because it needs to.

Just my two cents.

--
Daniel Campbell - Gentoo Developer
OpenPGP Key: 0x1EA055D6 @ hkp://keys.gnupg.net
fpr: AE03 9064 AE00 053C 270C 1DE4 6F7A 9091 1EA0 55D6
Luca Barbato
2016-02-17 08:24:58 UTC
Permalink
Post by William Hubbs
All,
I have a bug that points out a significant issue with
/etc/init.d/mount-ro in OpenRC.
Apparently, there are issues that cause it to not work properly for file
systems which happen to be pre-mounted from an initramfs [1].
Who is using that file system? Ideally, if "we" are the last user of
the file system, it should be safe to remount it read-only as well.
 
In general this happens when a filesystem "too smart to fit /" is in
use.
 
In general that means the same tooling used to mount /usr should live
in the initrd...
 
In general, deprecating split-/usr moves the problem into supporting
fat initrds to begin with. (I guess needing a boot filesystem that is
FUSE-based and needs rabbitmq or postgresql might be extra fun, btw.)
Post by William Hubbs
This service only exists in the Linux world; there is no equivalent in
OpenRC for any other operating systems we support.
Given that it is a safety feature, I do not know how the other kernels
achieve the same out of the box.
Post by William Hubbs
The reason it exists is very vague to me; I think it has something to do
with claims of data loss in the past.
I think any fuse-supporting system should have it for more or less
obvious reasons (see the evil example above).

lu
Richard Yao
2016-02-17 14:00:20 UTC
Permalink
Post by William Hubbs
All,
I have a bug that points out a significant issue with
/etc/init.d/mount-ro in OpenRC.
Apparently, there are issues that cause it to not work properly for file
systems which happen to be pre-mounted from an initramfs [1].
This service only exists in the Linux world; there is no equivalent in
OpenRC for any other operating systems we support.
The reason it exists is very vague to me; I think it has something to do
with claims of data loss in the past.
I'm asking for more specific information, and if there is none, due to
the bug I included in this message, I am considering removing this
service in 0.21 since I can't find an equivalent anywhere else.
If you shut down the system while ext4 or XFS is mounted read-write, there is no guarantee that dirty data writeout finishes; partial writes to disk, from power being cut at the end of the shutdown process, can kill the filesystem.

That said, ZFS does not need to be remounted read-only because it is an atomic (transactional) filesystem. There might be data loss, but only the last ~5 seconds of unsynced data; the filesystem itself stays consistent, and there is no risk of killing it.
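The ordering described here for ext4/XFS (flush dirty data, then stop new writes before power-off) can be sketched as below. This is illustrative, not the actual mount-ro script; RUN=echo makes it a dry-run preview, and the sketch only handles /:

```shell
#!/bin/sh
# Dry-run sketch of the shutdown ordering described above: flush dirty
# pages, then remount read-only so cutting power cannot interrupt
# in-flight writes. Drop RUN=echo to run for real (as root).
RUN=echo

shutdown_ro() {
    $RUN sync                      # push dirty data out to disk
    $RUN mount -o remount,ro /     # no new writes can dirty the fs after this
}

shutdown_ro
# prints: sync
# prints: mount -o remount,ro /
```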
Post by William Hubbs
Thanks,
William
[1] https://bugs.gentoo.org/show_bug.cgi?id=573760
Robin H. Johnson
2016-02-18 07:02:33 UTC
Permalink
Post by William Hubbs
I have a bug that points out a significant issue with
/etc/init.d/mount-ro in OpenRC.
Apparently, there are issues that cause it to not work properly for file
systems which happen to be pre-mounted from an initramfs [1].
I'll look at why it's failing to change the mount to read-only, rather
than just ignore it.

I have a use case for mount-ro not previously discussed here:
kexec
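The kexec case wants the same guarantee: filesystems quiesced before jumping into the new kernel, since `kexec -e` executes a previously loaded kernel without going through a normal reboot path. A dry-run sketch (RUN=echo preview; assumes a kernel was already staged with `kexec -l`):

```shell
#!/bin/sh
# Dry-run sketch: quiesce filesystems before a kexec reboot, because
# kexec -e jumps into the new kernel without a normal device shutdown.
RUN=echo

kexec_reboot() {
    $RUN sync                      # flush dirty data first
    $RUN mount -o remount,ro /     # stop further writes
    $RUN kexec -e                  # execute the previously loaded kernel
}

kexec_reboot
```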
--
Robin Hugh Johnson
Gentoo Linux: Developer, Infrastructure Lead, Foundation Trustee
E-Mail : ***@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85