We have long been working on testing desktop wallets, but it's really tricky as there are just so many binaries floating around for what claims to be the same product. Even Bitcoin Core shows 8 download options depending on your operating system or distribution channel preference:
With snapcraft obviously being tricky:
Either way, for desktop wallets, most of the time people have download links and want to verify those downloads, so Chris is working on a binary checker. It's still only a draft merge request and clearly needs a design but what it will enable is actually pretty cool:
WalletScrutiny calculates the hash of the file dropped onto it and if it's an apk, it also determines the appId which allows finding the right product. If the hash is known, the verdict is immediately displayed. If not, the page invites the user to upload the file for analysis.
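The lookup described above can be sketched roughly as follows. This is only an illustration, not the code from the merge request: SHA-256 as the hash function, the `KNOWN_VERDICTS` table and the function names are all assumptions made for the example.

```python
import hashlib
import io

# Hypothetical stand-in for the index of already analyzed binaries
# (hash -> verdict). The real data source is not described in this post.
KNOWN_VERDICTS = {}


def sha256_of_stream(stream, chunk_size=65536):
    """Hash a file-like object in chunks so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def check_binary(stream, known=None):
    """Return (hash, verdict); verdict is None if the hash is not known yet."""
    known = KNOWN_VERDICTS if known is None else known
    file_hash = sha256_of_stream(stream)
    return file_hash, known.get(file_hash)


# Usage: a known hash yields its verdict, an unknown one yields None,
# which is where the page would invite the user to upload the file.
sample = b"pretend this is a wallet binary"
known = {hashlib.sha256(sample).hexdigest(): "reproducible"}
print(check_binary(io.BytesIO(sample), known))
print(check_binary(io.BytesIO(b"some other file"), known))
```

Determining the appId of an apk is left out here as it requires parsing the apk itself.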
The attestations for artifacts will live on nostr as signed events, and nostr will also be used to advertise the existence of new binaries to reviewers.
But running these tests must have been great fun, right?
Most people don't dig that deeply, but when they do, they have this question: Computers are 1s and 0s. They are digital. How can they be non-deterministic?
Software development mostly revolves around performance, both of the end product and of the development process. Very few developers even spend a thought on reproducibility. If something compiles 5 seconds faster, they can test the feature they were working on 5 seconds sooner. This focus on speed, combined with the lack of attention to reproducibility, results in builds being non-reproducible because:
Files are processed in the order they come and that order depends on many factors. For example some file systems sort by date and others by file name.
Compilers optimize their output, so compiling something with one version of the compiler will often give a different result than compiling it with another version.
The compiler might process multiple files in parallel and pack them into the result as they finish compiling.
Other sources of problems are timestamps or file paths that end up in the result.
Some tools deliberately use randomness to generate IDs that are unique to every build.
All of the above issues result in non-reproducibility by our standards. Some lead us to comment that the build looks benign as the diff is only some random number appearing twice. Others might also be benign but result in differences far too big to quickly judge with the tools we are using.
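Two of these sources, file order and timestamps, are easy to demonstrate. The following is a minimal sketch and not part of our tooling: it packs the same two files into a tar archive and shows that the hash changes with the order or the timestamp, and stays stable once both are pinned. The function names and file contents are made up for the example.

```python
import hashlib
import io
import tarfile


def build_archive(files, order, mtime):
    """Pack files (name -> bytes) into a tar in the given order; return its SHA-256."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as archive:
        for name in order:
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = mtime  # the timestamp ends up in the archive header
            archive.addfile(info, io.BytesIO(files[name]))
    return hashlib.sha256(buffer.getvalue()).hexdigest()


def build_reproducible(files):
    """Pin both sources of variation: sort the names and use a fixed timestamp."""
    return build_archive(files, sorted(files), mtime=0)


files = {"b.txt": b"two", "a.txt": b"one"}

# Same content, different order or timestamp: different hashes.
print(build_archive(files, ["a.txt", "b.txt"], mtime=0))
print(build_archive(files, ["b.txt", "a.txt"], mtime=0))
print(build_archive(files, ["a.txt", "b.txt"], mtime=1700000000))

# Sorted names and a fixed timestamp: the same hash on every run.
print(build_reproducible(files) == build_reproducible(files))
```

Real build tools have many more such inputs, but the pattern is the same: every value that is not fixed by the source code has to be pinned or sorted for two builds to match.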
The more developers care about reproducibility and not only about performance, the better it will get, but there are some widely used tools that consistently cause issues and maybe should just be avoided in wallets.