This turned out to be a bit trickier to implement than first anticipated, because certain drivers (n10) can run out of internal resources if the same buffer is submitted repeatedly. Working around that means changing how we wait for fences, but that seems like too much refactoring to undertake for these performance scenarios right before RTM.