Discussion:
[gentoo-dev] [pre-GLEP] Gentoo binary package container format [gentoo@jonesmz.com]
Roy Bamford
2018-11-18 21:10:03 UTC
See attached.

Replying off list because I am not on the whitelist ...
--
Regards,

Roy Bamford
(Neddyseagoon) a member of
elections
gentoo-ops
forum-mods
Rich Freeman
2018-11-18 21:55:19 UTC
Post by Roy Bamford
Replying off list because I am not on the whitelist.
That seems odd.
Post by Roy Bamford
1) append a uuid to each filename. Generated when the bin package file is generated.
2) encode the hostname of the machine that generated the file
3) encode the use flags in the filename.
So, I brought up this same issue in the earlier discussion and it was
considered out of scope, and I think this is fair. The GLEP does not
specify the filename, and IMO the standard for what goes INSIDE the file
will work just fine with any future enhancements that address exactly
this use case.
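A minimal Python sketch of the three filename suggestions quoted above. The `~` separator, the `.tbz2` suffix and the name `roy_style_filename` are illustrative assumptions, not an existing Portage convention. Because the UUID is fresh on every build, two builds with identical configuration never share a name, so this scheme on its own cannot tell a client whether a matching package already exists.

```python
from __future__ import annotations

import socket
import uuid


def roy_style_filename(cpv: str, use_flags: list[str], hostname: str | None = None) -> str:
    """Hypothetical name encoding a per-build UUID, the build host and the enabled USE flags."""
    host = hostname if hostname is not None else socket.gethostname()
    use_part = ",".join(sorted(use_flags)) or "none"  # sorted so flag order does not matter
    build_uuid = uuid.uuid4().hex  # new for every build, so names never collide
    pf = cpv.split("/", 1)[1]  # "app-misc/foo-1.0" -> "foo-1.0"
    return f"{pf}~{host}~{use_part}~{build_uuid}.tbz2"


if __name__ == "__main__":
    print(roy_style_filename("app-misc/foo-1.0", ["ssl", "ipv6"], hostname="node01"))
```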

Besides your case of building for a cluster, another use case is
having a central binary repo that portage could check and utilize when
a user's preferences happen to match what is pre-built.

I suggest we start a different thread for any additional discussion of
this use case. I was thinking, and it probably wouldn't be super-hard
to actually start building something like this. But, I don't want to
derail this GLEP, as I don't see any reason why designing something like
this needs to hold up the binary package format. Both the existing
and proposed binary package formats will encode any metadata needed by
the package manager inside the file, and the only extension we need is
to encode identifying info in the filename.

My idea is to basically have portage generate a tag with all the info
needed to identify the "right" package, take a hash of it, and then
stick that in the filename. Then when portage is looking for a binary
package to use at install time it generates the same tag using the
same algorithm and looks for a matching hash. If a hit is found then
it reads the complete metadata in the file and applies all the sanity
checks it already does. Generating binary packages with the hash
could be made optional, and portage could also be configured to first
look for the matching hash, then fall back to the existing naming
convention, so that it would be compatible with existing generic
names. So, users would get a choice as to whether they want to build
up a library of these packages, or just have each build overwrite the
last.
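The tag-and-hash lookup described above could look roughly like this in Python. It is a sketch under stated assumptions: the JSON serialization, the SHA-256 digest truncated to 16 hex characters, the `.tbz2` suffix and all helper names are hypothetical, not Portage code. A hit only identifies a candidate; the full metadata inside the file would still be checked.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def make_tag(metadata: dict[str, str]) -> str:
    """Serialize the identifying metadata deterministically (sorted keys, no spaces)."""
    return json.dumps(metadata, sort_keys=True, separators=(",", ":"))


def tag_hash(metadata: dict[str, str], length: int = 16) -> str:
    """Hash the canonical tag; truncating to `length` hex chars is an arbitrary choice."""
    return hashlib.sha256(make_tag(metadata).encode("utf-8")).hexdigest()[:length]


def hashed_name(pf: str, metadata: dict[str, str]) -> str:
    """Filename carrying the tag hash, e.g. foo-1.0-<16 hex chars>.tbz2."""
    return f"{pf}-{tag_hash(metadata)}.tbz2"


def find_binpkg(pkgdir: Path, category: str, pf: str, metadata: dict[str, str]) -> Path | None:
    """Try the hashed name first, then fall back to the generic PF name.

    A hit is only a candidate: the caller still reads the full metadata in the
    file and applies the usual sanity checks before installing it.
    """
    catdir = pkgdir / category
    for candidate in (catdir / hashed_name(pf, metadata), catdir / f"{pf}.tbz2"):
        if candidate.is_file():
            return candidate
    return None
```

Because the tag is serialized with sorted keys, the same configuration always yields the same name regardless of the order in which the metadata was collected.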

Then the next step would be to allow these files to be fetched from a
binary repo optionally, and then finally we'd need tools to create the
repo. But, this step isn't needed for your use case. With the proper
optional switches you could utilize as much of this scheme as you
like.

Also, you could optionally choose how much you want portage to encode
in the tag and look for. Are you very fussy and only want a binary
package with matching CFLAGS/USE/whatever? Or is just matching
USE/arch/etc enough? Some of the existing portage options could
potentially be re-used here.
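Letting the user choose how fussy the match is amounts to choosing which keys feed the tag. In this sketch the policy names `strict` and `relaxed`, their key lists and `policy_hash` are hypothetical examples rather than existing portage options.

```python
from __future__ import annotations

import hashlib

# Hypothetical policies: which metadata keys must match before a binary is reused.
POLICIES: dict[str, tuple[str, ...]] = {
    "strict": ("ARCH", "CFLAGS", "CHOST", "USE"),
    "relaxed": ("ARCH", "CHOST", "USE"),
}


def policy_hash(metadata: dict[str, str], policy: str) -> str:
    """Hash only the keys the chosen policy cares about; absent keys hash as empty."""
    tag = "\n".join(f"{key}={metadata.get(key, '')}" for key in POLICIES[policy])
    return hashlib.sha256(tag.encode("utf-8")).hexdigest()[:16]


if __name__ == "__main__":
    built = {"ARCH": "amd64", "CHOST": "x86_64-pc-linux-gnu", "USE": "ssl", "CFLAGS": "-O2"}
    wanted = dict(built, CFLAGS="-O3 -march=native")
    print(policy_hash(built, "strict") == policy_hash(wanted, "strict"))    # False
    print(policy_hash(built, "relaxed") == policy_hash(wanted, "relaxed"))  # True
```

Under `relaxed`, a package built with `-O2` would satisfy a request made with `-O3 -march=native`; under `strict` it would not.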

Please make any replies in a new thread.
--
Rich
Zac Medico
2018-11-18 22:40:59 UTC
Post by Rich Freeman
My idea is to basically have portage generate a tag with all the info
needed to identify the "right" package, take a hash of it, and then
stick that in the filename. Then when portage is looking for a binary
package to use at install time it generates the same tag using the
same algorithm and looks for a matching hash.
We've already had this handled for a couple years now, via
FEATURES=binpkg-multi-instance.
--
Thanks,
Zac
Rich Freeman
2018-11-19 02:51:53 UTC
Post by Zac Medico
Post by Rich Freeman
My idea is to basically have portage generate a tag with all the info
needed to identify the "right" package, take a hash of it, and then
stick that in the filename. Then when portage is looking for a binary
package to use at install time it generates the same tag using the
same algorithm and looks for a matching hash.
We've already had this handled for a couple years now, via
FEATURES=binpkg-multi-instance.
According to the make.conf man page, this simply numbers builds. So, if
you build something twice with the same config, you end up with two
duplicate files (wasteful). Presumably, if you had a large collection
of these packages, portage would have to read the metadata within each
one to figure out which one is appropriate to install. That would be
expensive if I/O is slow, such as when fetching packages online
on-demand.
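The duplication point can be made concrete with a toy model. Here `numbered_store` stands in for build numbering and `hashed_store` for content-hash naming; both function names and the `-N.xpak` / `-<hash>.xpak` patterns are simplifications for illustration, not the exact layout binpkg-multi-instance uses.

```python
from __future__ import annotations

import hashlib


def numbered_store(builds: list[dict[str, str]], pf: str) -> list[str]:
    """Every build gets the next number, even if its config repeats an earlier one."""
    return [f"{pf}-{n}.xpak" for n, _ in enumerate(builds, start=1)]


def hashed_store(builds: list[dict[str, str]], pf: str) -> list[str]:
    """Identical configurations collapse onto one name; first-appearance order is kept."""
    names: dict[str, None] = {}
    for meta in builds:
        tag = "\n".join(f"{k}={v}" for k, v in sorted(meta.items()))
        digest = hashlib.sha256(tag.encode("utf-8")).hexdigest()[:16]
        names.setdefault(f"{pf}-{digest}.xpak", None)
    return list(names)


if __name__ == "__main__":
    builds = [{"USE": "ssl"}, {"USE": "ssl"}, {"USE": "-ssl"}]
    print(len(numbered_store(builds, "foo-1.0")))  # 3: the first two are duplicates
    print(len(hashed_store(builds, "foo-1.0")))    # 2
```

With hashed names, a client that knows its configuration can also compute the exact name it wants instead of reading metadata from every numbered file.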

But, it obviously is somewhat of an improvement for Roy's use case.

IMO using a content-hash of certain metadata would eliminate
duplication, and based on filename alone it would be clear whether the
sought-after binary package exists or not. As with the build numbers
you couldn't tell from filename inspection what packages you have, but
if you know what you want you could immediately find it. IMO trying
to cram all that metadata into a filename to make them more
transparent isn't a good idea, and using hashes lets the user set
their own policy regarding flexibility. Heck, you could auto-gen
symlinks for subsets of metadata (i.e., the same file could be linked
from a file that specifies its USE flags but not its CFLAGS, so it
would be found if either an exact hit on CFLAGS was sought or if
CFLAGS were considered unimportant).
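The symlink idea could be sketched as follows: the package is written once under a hash of all its metadata, and each looser key subset gets a relative symlink named by the hash of just those keys. The function names, the `.tbz2` suffix and the "newest build wins" rule for aliases are assumptions made for this example.

```python
from __future__ import annotations

import hashlib
import os
from pathlib import Path


def subset_hash(metadata: dict[str, str], keys: tuple[str, ...]) -> str:
    """Hash only the given keys, in sorted order so key order is irrelevant."""
    tag = "\n".join(f"{k}={metadata.get(k, '')}" for k in sorted(keys))
    return hashlib.sha256(tag.encode("utf-8")).hexdigest()[:16]


def store_with_aliases(
    catdir: Path,
    pf: str,
    metadata: dict[str, str],
    payload: bytes,
    alias_keysets: list[tuple[str, ...]],
) -> Path:
    """Write the package once under its full-metadata hash and symlink looser keys to it."""
    catdir.mkdir(parents=True, exist_ok=True)
    full = catdir / f"{pf}-{subset_hash(metadata, tuple(metadata))}.tbz2"
    full.write_bytes(payload)
    for keys in alias_keysets:
        link = catdir / f"{pf}-{subset_hash(metadata, keys)}.tbz2"
        if link == full:
            continue  # this subset already covers every key
        if link.is_symlink():
            link.unlink()  # assumption: the newest build wins for a looser key
        elif link.exists():
            continue  # a real package already owns this name; never clobber it
        os.symlink(full.name, link)  # relative target in the same directory
    return full
```

A client that cares about CFLAGS looks up the full hash; one that only cares about USE looks up the USE-only hash and lands on the same file.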

But, I'm certainly not suggesting that you're not allowed to go to bed
until you've built it. :)
--
Rich
Zac Medico
2018-11-19 18:45:19 UTC
Post by Rich Freeman
Post by Zac Medico
We've already had this handled for a couple years now, via
FEATURES=binpkg-multi-instance.
According to the make.conf manpage this simply numbers builds. So, if
you build something twice with the same config you end up with two
duplicate files (wasteful). Presumably if you had a large collection
of these packages portage would have to read the metadata within each
one to figure out which one is appropriate to install. That would be
expensive if IO is slow, such as when fetching packages online
on-demand.
The existing ${PKGDIR}/Packages file optimizes metadata access for both
local and remote access, and performs well for reasonable numbers of
packages.
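To illustrate why a single index avoids opening every package, here is a sketch that parses a Packages-style file and looks an entry up by CPV and USE. It assumes a simplified layout (blank-line separated stanzas of `KEY: value` lines, header first) and is not Portage's own parser; `parse_packages_index` and `lookup` are hypothetical names.

```python
from __future__ import annotations


def parse_packages_index(text: str) -> tuple[dict[str, str], list[dict[str, str]]]:
    """Split a simplified Packages-style index into its header and package stanzas."""
    stanzas: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current stanza.
            if current:
                stanzas.append(current)
                current = {}
            continue
        # Split on the first colon only, so values may themselves contain colons.
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        stanzas.append(current)
    if not stanzas:
        return {}, []
    return stanzas[0], stanzas[1:]


def lookup(entries: list[dict[str, str]], cpv: str, use: set[str]) -> dict[str, str] | None:
    """Pick an entry by CPV and enabled USE flags without opening any package file."""
    for entry in entries:
        if entry.get("CPV") == cpv and set(entry.get("USE", "").split()) == use:
            return entry
    return None
```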

If you insist on mixing binary packages in the same ${PKGDIR} for a
large number of alternative configurations, then it will not scale
unless you create a way to send your local configuration to the server
so that it can select the relevant package list for you.
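The server-side selection described above might look like the filter below: the client sends its architecture and enabled USE flags, and the server returns only entries whose USE equals the client's flags restricted to each package's IUSE. The request payload and `select_for_client` are hypothetical.

```python
from __future__ import annotations


def select_for_client(entries: list[dict[str, str]], client: dict[str, str]) -> list[dict[str, str]]:
    """Return only the index entries that match one client's configuration.

    `client` is a hypothetical request payload such as
    {"ARCH": "amd64", "USE": "ipv6 ssl X"}.
    """
    client_arch = client.get("ARCH")
    client_use = set(client.get("USE", "").split())
    selected = []
    for entry in entries:
        # Entries without their own ARCH (e.g. ARCH kept in the index header) are not filtered here.
        if "ARCH" in entry and entry["ARCH"] != client_arch:
            continue
        # IUSE may carry +/- default markers; strip them to get bare flag names.
        iuse = {flag.lstrip("+-") for flag in entry.get("IUSE", "").split()}
        # The flags this client would actually enable for this particular package.
        wanted = client_use & iuse
        if set(entry.get("USE", "").split()) == wanted:
            selected.append(entry)
    return selected
```

Filtering on the server keeps the list a client downloads proportional to its own configuration rather than to every configuration stored in `${PKGDIR}`.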

However, bear in mind that mixing alternative configurations in the same
${PKGDIR} might lead to undesirable results if there is anything
relevant that is unaccounted for in the package metadata. Possible
unaccounted things may include:

1) glibc version the package was built against
2) symbols and/or sonames not accounted for by slot operator dependencies
3) soname dependencies (--usepkgonly + --ignore-soname-deps=n handles this)
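Item 3 is what soname dependency checking covers. The sketch below shows the kind of comparison involved, using a simplified `abi: soname ...` field loosely modelled on the PROVIDES/REQUIRES metadata that Portage records for binary packages; the parser and function names are hypothetical, and this is not how Portage itself implements the check.

```python
from __future__ import annotations


def parse_soname_field(value: str) -> set[tuple[str, str]]:
    """Parse a simplified "abi: soname soname ... abi: soname ..." field into pairs."""
    result: set[tuple[str, str]] = set()
    abi: str | None = None
    for token in value.split():
        if token.endswith(":"):
            abi = token[:-1]  # start of a new ABI group
        elif abi is not None:
            result.add((abi, token))
    return result


def missing_sonames(requires: str, provided: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Sonames the binary package needs that nothing on the target system provides."""
    return parse_soname_field(requires) - provided


if __name__ == "__main__":
    provided = parse_soname_field("x86_64: libc.so.6 libssl.so.3")
    print(missing_sonames("x86_64: libc.so.6 libssl.so.1.1", provided))
```

A non-empty result means the package was built against a library the target does not have, which is exactly the case that mixing configurations in one `${PKGDIR}` can hide.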
--
Thanks,
Zac
M. J. Everitt
2018-11-19 10:45:57 UTC
Post by Zac Medico
We've already had this handled for a couple years now, via
FEATURES=binpkg-multi-instance.
Working fine for me for catalyst ARM runs ...