Building EdgeMind: Experimenting with LLM Inference on Android
Edge MLOps experiments: implementing Phi-3 mini on Android from scratch with NNAPI acceleration, custom tokenization, and KV caching
Thoughts on AI, mobile development, and building things.
Deep dive into the architecture decisions and implementation details of CountIn, a real-time occupancy tracking app