Abstract
Training datasets for AI models are large but often consist of many small objects or files. The per-object latency of downloading small objects one at a time makes it infeasible to download an entire dataset in a reasonable time. This disclosure describes techniques for fast, parallel download of large numbers of small files or objects. At a server, small objects are listed and grouped into batches until the total size of a batch reaches a certain threshold. The objects in each batch are combined into a temporary larger object, using multiple rounds of composition as necessary. A client downloads the temporary larger object and splits it into its constituent small objects. A clean-up procedure removes the temporary larger object from the server and the client.
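The batching, combining, and splitting steps described above can be illustrated with a minimal in-memory sketch. It is not the disclosed implementation: the function names, the greedy grouping rule, and the manifest of (name, length) pairs used to locate object boundaries are all assumptions made for illustration, and the sketch omits the parallel transfer, the multi-round server-side composition, and the clean-up step.

```python
from typing import Dict, List, Tuple

# Hypothetical manifest format: ordered (object name, byte length) pairs.
Manifest = List[Tuple[str, int]]


def group_into_batches(sizes: Dict[str, int], threshold: int) -> List[List[str]]:
    """Greedily add objects to a batch until its total size reaches the threshold."""
    batches: List[List[str]] = []
    current: List[str] = []
    total = 0
    for name, size in sizes.items():
        current.append(name)
        total += size
        if total >= threshold:
            batches.append(current)
            current, total = [], 0
    if current:  # leftover objects that never reached the threshold
        batches.append(current)
    return batches


def concatenate(objects: Dict[str, bytes], names: List[str]) -> Tuple[bytes, Manifest]:
    """Combine the named objects into one larger blob and record their lengths."""
    manifest = [(name, len(objects[name])) for name in names]
    blob = b"".join(objects[name] for name in names)
    return blob, manifest


def split(blob: bytes, manifest: Manifest) -> Dict[str, bytes]:
    """Recover the constituent objects from a blob using the manifest."""
    result: Dict[str, bytes] = {}
    offset = 0
    for name, length in manifest:
        result[name] = blob[offset:offset + length]
        offset += length
    return result
```

In this sketch, each batch returned by `group_into_batches` would correspond to one temporary larger object. The client would fetch each such object in a single request and call `split` on it, which is where the saving over per-object requests comes from.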
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Deshpande, Mayur, "Fast Parallel Downloads of Small Files Using Concatenation", Technical Disclosure Commons, (March 05, 2024)
https://www.tdcommons.org/dpubs_series/6744