Oct 22, 2024
Thanks, Sarthak!
With request batching, I got almost the same response times as in the single-instance case. I went up to batches of 8. Probably the tensors are very small, so batching doesn't really increase any GPU load in this case.
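For anyone who wants to check the same thing, here is a rough sketch of the kind of timing loop I mean. It is not my actual setup: `time_batches`, `fake_infer`, and `fake_batch` are made-up names, and the stand-in model just sleeps for a fixed 2 ms to mimic a per-call overhead that dominates when the per-item work is tiny.

```python
import time

def time_batches(infer, make_batch, sizes=(1, 2, 4, 8), repeats=20):
    """Return {batch_size: mean seconds per call} for a given infer() callable."""
    results = {}
    for size in sizes:
        batch = make_batch(size)
        infer(batch)  # warm-up call, excluded from the timing
        start = time.perf_counter()
        for _ in range(repeats):
            infer(batch)
        results[size] = (time.perf_counter() - start) / repeats
    return results

# Hypothetical stand-in for a model: fixed overhead per call, trivial per-item work.
def fake_infer(batch):
    time.sleep(0.002)  # assumed fixed cost (launch, transfer, framework overhead)
    return [x * 2 for x in batch]

# Hypothetical stand-in for building a batch of the requested size.
def fake_batch(size):
    return list(range(size))

if __name__ == "__main__":
    for size, latency in time_batches(fake_infer, fake_batch).items():
        print(f"batch={size}: {latency * 1e3:.2f} ms per call")
```

If the per-call latency stays roughly flat as the batch size grows, the fixed overhead is dominating and the GPU is not being loaded more by the bigger batch. With a real GPU model you would also need to synchronize the device before reading the clock, otherwise asynchronous execution can make the numbers look better than they are.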
Breaking the GIL would be fun!