checker.go: added error message if 'selectedTrees' is empty
checker_test.go: modified test checking using test.OK() and test.Assert() to
make the checking more compact.
Hand down filtered tree IDs to CheckWithSnapshots which builds 'usedBlobs'.
repo.LookupBlob is used to derive packfiles from 'usedBlobs' and constructs
'snapPacks' and overwrites c.packs.
reworked the code using snapshotFilter.FindAll to find all snapshots and
restic.FindUsedBlobs to retrieve all used blobs.
range repo.LookupBlob (as before) to convert the blobs to their containing packfiles
and c.repo.List(ctx, restic.PackFile, ...) to retrieve the sizes of those packfiles.
Additional documentation and tests are still outstanding.
Added code for selecting multiple snapshots.
Added message how many packfiles and their cumulative size were selected.
In internal/checker/checker.go replaced the datablob / packfile selection from using walker.Walk
to restic.StreamTrees - parallelizing the packfile selection.
resolved conflict in cmd_check: allow check for snapshot filter
This ensures that the pack header is actually read completely.
Previously, for a truncated file it was possible to only read a part of
the header, as backend.Load(...) is not guaranteed to return as many
bytes as requested by the length parameter.
Due to the interface of streamPack, we cannot guarantee that operations
progress fast enough that the underlying connections remains open. This
introduces partial failures which massively complicate the error
handling.
Switch to a simpler approach that retrieves the pack in chunks of 32MB.
If a blob is larger than this limit, then it is downloaded separately.
To avoid multiple copies in memory, an auxiliary interface
`discardReader` is introduced that allows directly accessing the
downloaded byte slices, while still supporting the streaming used by the
`check` command.
To only stream the content of a pack file once, check used StreamPack
with a custom pack load function. This combination was always brittle
and complicates using StreamPack everywhere else. Now that StreamPack
internally uses PackBlobIterator use that primitive instead, which is a
much better fit for what the check command requires.
For now, the guide is only shown if the blob content does not match its
hash. The main intended usage is to handle data corruption errors when
using maximum compression in restic 0.16.0
The ioutil functions are deprecated since Go 1.17 and only wrap another
library function. Thus directly call the underlying function.
This commit only mechanically replaces the function calls.
Repositories with mixed packs are probably quite rare by now. When
loading data blobs from a mixed pack file, this will no longer trigger
caching that file. However, usually tree blobs are accessed first such
that this shouldn't make much of a difference.
The checker gets a simpler replacement.
Sending data through a channel at very high frequency is extremely
inefficient. Thus use simple callbacks instead of channels.
> name old time/op new time/op delta
> MasterIndexEach-16 6.68s ±24% 0.96s ± 2% -85.64% (p=0.008 n=5+5)
Use runtime.GOMAXPROCS(0) as worker count for CPU-bound tasks,
repo.Connections() for IO-bound task and a combination if a task can be
both. Streaming packs is treated as IO-bound as adding more worker
cannot provide a speedup.
Typical IO-bound tasks are download / uploading / deleting files.
Decoding / Encoding / Verifying are usually CPU-bound. Several tasks are
a combination of both, e.g. for combined download and decode functions.
In the latter case add both limits together. As the backends have their
own concurrency limits restic still won't download more than
repo.Connections() files in parallel, but the additional workers can
decode already downloaded data in parallel.
The short ids are not always unique. In addition, recovering from
damages is easier when having the full ids as that makes it easier to
access the corresponding files.