Modern computers have many CPU cores, but unless the problem being solved is highly parallel, these cores cannot be used efficiently: some often sit idle while the others perform the bulk of the computation. Even for highly parallel problems, when the procedures are small, the overhead of requesting that a thread receive the processing request and begin execution is comparable to the execution time itself, so parallel execution yields little to no benefit. This disclosure describes techniques that improve the efficiency of communication between threads by reducing this overhead. Per the techniques, the runnable threads of a process, or a defined subset thereof, are scheduled to run on the CPU simultaneously, such that either all of the threads run or none of them runs.
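The all-or-none scheduling rule can be illustrated with a small simulation. The following is a minimal sketch and not the disclosed implementation: the names `Gang` and `schedule_time_slice` are hypothetical, and the first-fit order in which groups are considered is an assumption made only for illustration. It shows a single time slice in which a group of runnable threads is dispatched only if every thread in the group can be given a core.

```python
from dataclasses import dataclass


@dataclass
class Gang:
    """Hypothetical record: the runnable threads of one process (or a defined subset)."""
    name: str
    runnable_threads: int


def schedule_time_slice(gangs, num_cores):
    """Pick the gangs to run in one time slice under an all-or-none rule.

    A gang is dispatched only if all of its runnable threads fit on the
    cores that are still free; otherwise none of its threads run.
    Gangs are considered in list order (first-fit, an illustrative assumption).
    Returns (names of dispatched gangs, number of cores left idle).
    """
    free = num_cores
    dispatched = []
    for gang in gangs:
        if 0 < gang.runnable_threads <= free:
            free -= gang.runnable_threads  # every thread in the gang gets a core
            dispatched.append(gang.name)
        # else: the gang is skipped entirely, never partially scheduled
    return dispatched, free


if __name__ == "__main__":
    gangs = [Gang("A", 3), Gang("B", 2), Gang("C", 1)]
    print(schedule_time_slice(gangs, num_cores=4))
```

With four cores, gang A (three threads) is dispatched, gang B (two threads) is skipped because only one core remains, and gang C (one thread) takes that last core. Because the threads of a dispatched gang are all running at the same time, one thread can hand work to another without waiting for it to be scheduled, which is the overhead the techniques aim to reduce.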

