Koitenshin
Image Dimension Test Results
btw - is the deduplication done after upload then, or does Xet somehow compare diffs against 20+ PB of data prior to upload?
Great question - some of this is covered in a section of another blog post. TL;DR: it's handled before upload using a few heuristics. It's also important to note that, for performance reasons, we treat deduplication as an optimization, not the foundational goal of the system. That way we can make a best effort at global dedupe without significantly impacting the speed of file transfers.
Which makes it absolute garbage due to the 100 GB limit. If you hit the limit mid-upload to your private repo, trying to re-upload the file later fails due to invalid shards.
An Improvement, But Q3 30b Still Has Very Little General Knowledge
I read the entire article, and I definitely think your 3x regularization image turned out the best. There's so much fine detail in the image; nothing feels flat.