Keen AI open-sources Zap

We’re delighted to announce the public release of Zap, our in-house tool for rapidly training and testing computer vision models.

There’s never been a better time to build an AI pipeline, with a myriad of tools, libraries and frameworks to choose from. But the power of choice does not come for free. The sheer scope of the current ML and AI ecosystem can easily induce choice paralysis. Making a decision is harder still when commercial products and services charge for ease of use, while open-source alternatives require the DevOps know-how to set them up for the rest of your team.

So why another tool?

PyTorch Lightning plus MLflow equals Zap

Zap is an extremely lightweight wrapper around PyTorch Lightning and MLflow. Its aims are simple:

  1. Relieve the developer of writing boilerplate
  2. Reduce the effort of recording results

PyTorch Lightning (we’ll just refer to it as Lightning from here on) is a fantastic framework that already removes a lot of the boilerplate from plain PyTorch. How many times have you written the same DataLoader and Dataset code? Or the same training and evaluation loops? Lightning removes a large chunk of these repetitive parts.

So why Zap? Lightning gives you the tools to build a house, not the house itself; Zap takes things further by bundling in some predefined models for classification, object detection and segmentation, implemented with Lightning. We’re talking about the leviathans of the computer vision world, the ResNets and UNets, as well as some newcomers like DETA.

Zap also turns some of the goodies that Lightning offers into defaults. Mixed Precision Training and Early Stopping are already set up for you. Zap doesn’t prevent you from developing your own Lightning modules or customising the training parameters.

Lightning allows you to specify your model’s hyperparameters and other settings in a YAML configuration file. You no longer need to pollute your Git repository with menial changes to the batch size, train split or loss function; you just change the configuration file.
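A configuration file along these lines keeps experiment tweaks out of version control. The field names below are hypothetical, for illustration only:

```yaml
# config.yaml (illustrative field names, not Zap's actual schema)
model: resnet50
batch_size: 32
train_split: 0.8
loss: cross_entropy
learning_rate: 0.001
max_epochs: 50
```

Changing the batch size is then a one-line edit to this file rather than a commit to your codebase.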

The second pain point we addressed was the logging and sharing of results. Without a centralised repository, sharing data is messy and leads to errors and missing information. This is where the excellent, open-source MLflow comes in. In a nutshell, MLflow records the data of your experiment, making it easy to compare against other experiments, and helps you manage the lifecycle of the model.

Example MLflow dashboard

You just need to tell Zap where to store the data. We recommend a centralised database and bucket so that anyone in your team can interact with the results, but a local database and folder will also work.

Did we talk enough about boilerplate? It’s worth mentioning that Zap’s preconfigured models also come with preconfigured metrics. No more writing the same precision and recall code – results are calculated and saved automatically. 

Zap is currently in alpha, but we’re actively developing it and using it at Keen AI. We plan to add more models and metrics, and eventually to implement DVC or an alternative method of tracking datasets. Speaking of which, we also want to add automatic dataset analysis that reports class distribution, image similarity and more.

If you’re curious and want to give Zap a go, visit our GitHub repo. Contributions welcome!

About Petar Gyurov

Petar is passionate about climate change and conservation, and wants to apply novel machine learning methods to revolutionise these fields.

