Abstract

Training datasets for AI models are large and typically consist of many small objects or files. Because each small object incurs its own download latency, downloading the objects one at a time makes it infeasible to download the entire dataset in a reasonable time. This disclosure describes techniques for fast, parallel download of large numbers of small files or objects. At a server, the small objects are listed and grouped into batches, each of which grows until its total size reaches a threshold. The objects in each batch are combined into a temporary larger object, using multiple rounds of composition if necessary. A client downloads the temporary larger object and splits it into its constituent small objects. A clean-up procedure then removes the temporary larger object from both the server and the client.
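
The list, batch, compose, download, split, and clean-up steps above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the disclosure's implementation: a Python dict stands in for the object store, `compose` is a hypothetical stand-in for a server-side operation that concatenates several objects into one, `MAX_COMPOSE_SOURCES = 32` is an assumed per-call limit on that operation (which is what forces several rounds of composition), and the `(name, offset, length)` manifest is an assumed way for the client to know where to split. All function and object names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: one server-side compose call accepts at most this many sources,
# so large batches need several rounds of composition.
MAX_COMPOSE_SOURCES = 32

Store = dict[str, bytes]  # in-memory stand-in for an object store


def list_objects(store: Store, prefix: str) -> list[str]:
    """List object names under a prefix in a stable (sorted) order."""
    return sorted(name for name in store if name.startswith(prefix))


def group_into_batches(store: Store, names: list[str], threshold: int) -> list[list[str]]:
    """Add objects to the current batch until its total size reaches threshold."""
    batches, current, size = [], [], 0
    for name in names:
        current.append(name)
        size += len(store[name])
        if size >= threshold:
            batches.append(current)
            current, size = [], 0
    if current:  # leftover objects that never reached the threshold
        batches.append(current)
    return batches


def compose(store: Store, sources: list[str], dest: str) -> None:
    """Hypothetical server-side compose: concatenate sources, in order, into dest."""
    assert len(sources) <= MAX_COMPOSE_SOURCES
    store[dest] = b"".join(store[s] for s in sources)


def compose_batch(store: Store, names: list[str], dest: str) -> list[tuple[str, int, int]]:
    """Combine names into one temporary object; return (name, offset, length) entries."""
    manifest, offset = [], 0
    for name in names:
        manifest.append((name, offset, len(store[name])))
        offset += len(store[name])

    current, round_no, intermediates = list(names), 0, []
    while len(current) > MAX_COMPOSE_SOURCES:
        # One round: compose each run of up to MAX_COMPOSE_SOURCES objects, in order.
        next_level = []
        for i in range(0, len(current), MAX_COMPOSE_SOURCES):
            tmp = f"{dest}.round{round_no}.{i // MAX_COMPOSE_SOURCES}"
            compose(store, current[i:i + MAX_COMPOSE_SOURCES], tmp)
            next_level.append(tmp)
        intermediates += next_level
        current, round_no = next_level, round_no + 1
    compose(store, current, dest)  # final round produces the temporary object
    for tmp in intermediates:  # intermediate objects are no longer needed
        del store[tmp]
    return manifest


def split_blob(blob: bytes, manifest: list[tuple[str, int, int]]) -> dict[str, bytes]:
    """Client side: cut the downloaded object back into its constituent objects."""
    return {name: blob[off:off + length] for name, off, length in manifest}


def fast_download(store: Store, prefix: str, threshold: int, workers: int = 4) -> dict[str, bytes]:
    batches = group_into_batches(store, list_objects(store, prefix), threshold)
    jobs = []
    for i, batch in enumerate(batches):
        dest = f"_tmp/batch-{i}"  # hypothetical name for the temporary object
        jobs.append((dest, compose_batch(store, batch, dest)))

    def fetch_and_split(job):
        dest, manifest = job
        blob = store[dest]  # stand-in for one large GET of the temporary object
        # The client's copy of blob is released when this function returns.
        return split_blob(blob, manifest)

    files: dict[str, bytes] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:  # parallel downloads
        for part in pool.map(fetch_and_split, jobs):
            files.update(part)
    for dest, _ in jobs:  # server-side clean-up of the temporary objects
        del store[dest]
    return files


if __name__ == "__main__":
    store = {f"data/obj-{i:03d}": f"rec{i:03d}".encode() for i in range(100)}
    files = fast_download(store, "data/", threshold=1000)
    print(len(files), sorted(files) == sorted(store))  # prints: 100 True
```

Composing each run of sources in order keeps the byte layout identical to a single concatenation of the whole batch, so the offsets recorded before composition remain valid after any number of rounds. The threshold trades request count against download size: a larger threshold yields fewer, larger temporary objects.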

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Deshpande, Mayur, "Fast Parallel Downloads of Small Files Using Concatenation", Technical Disclosure Commons, (March 05, 2024)
https://www.tdcommons.org/dpubs_series/6744

Technical Disclosure Commons

Defensive Publications Series

Fast Parallel Downloads of Small Files Using Concatenation