- Deploying ML applications is challenging: training requires a large number of computations, while inference must meet real-time latency requirements with limited resources.
- Approximation can trade off output quality for reduced execution time, or for the same execution time with fewer resources; i.e., it can provide real-time latency for inference tasks and reduce resource use for intensive training tasks.
- Employed approximations to explore the trade-off among accuracy, performance, and resource requirements.
- Currently exploring ways to dynamically sacrifice accuracy in high-load and limited-resource scenarios.
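
As an illustration of the idea of trading accuracy for work under load, here is a minimal sketch. It assumes loop perforation (processing only every `stride`-th element) as the approximation, which is one common technique and not necessarily the one used in this work. The names `perforated_mean` and `choose_stride`, the `load` scale of 0.0 to 1.0, and the `max_stride` default are all hypothetical.

```python
def perforated_mean(values, stride=1):
    """Approximate the mean by reading only every `stride`-th element.

    stride=1 is exact; a larger stride does roughly 1/stride of the work
    at the cost of a less accurate result (loop perforation).
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    if not values:
        raise ValueError("values must be non-empty")
    sampled = values[::stride]
    return sum(sampled) / len(sampled)


def choose_stride(load, max_stride=8):
    """Map a load estimate in [0.0, 1.0] to a stride (hypothetical policy).

    Low load keeps the computation exact; high load skips more elements.
    """
    load = min(max(load, 0.0), 1.0)  # clamp out-of-range estimates
    return 1 + int(load * (max_stride - 1))


if __name__ == "__main__":
    data = list(range(100))
    for load in (0.0, 0.5, 1.0):
        stride = choose_stride(load)
        print(load, stride, perforated_mean(data, stride))
```

A linear load-to-stride mapping is the simplest possible policy; a real system would more likely pick the approximation level from measured latency against a deadline and a measured accuracy budget.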