Weights & Biases, also known as WandB, is an MLOps tool for performance visualization and experimental tracking of machine learning models. It helps with automation, tracking, training, and improvement of ML models.
Weights & Biases is a cloud-based service that lets you host all your experiments in a single central repository; if you have private infrastructure, Weights & Biases can also be deployed on it.
Weights & Biases provides:
- A central, user-friendly and interactive dashboard where you can view your experimentations and track their performance.
- Tracking every part of the model training process, visualizing models, and comparing experiments.
- Automated hyperparameter tuning via Sweeps, which samples hyperparameter combinations to help you improve and understand model performance.
- Collaborative reports for teams, where you can add visualizations, organize, explain and share your model performance, model versions, and progress.
- End-to-end artifact tracking of the machine learning pipeline, from data preparation to model deployment.
- Easy integration with frameworks like TensorFlow, PyTorch, Keras, Hugging Face, and more.
- Collaborative work in a team with multiple features for sharing, experimenting, etc.
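As a sketch of how this tracking and Sweeps-based tuning look in practice, here is a minimal Python example. It assumes the `wandb` package is installed and a W&B account is configured; the project name, hyperparameter ranges, and loss curve are illustrative, not taken from the article:

```python
import math
import os

# Hypothetical W&B Sweeps configuration: a random search over two
# hyperparameters (names and ranges are illustrative).
sweep_config = {
    "method": "random",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 0.0001, "max": 0.1},
        "batch_size": {"values": [16, 32, 64]},
    },
}

def fake_val_loss(lr: float, epoch: int) -> float:
    # Toy stand-in for a real validation loss.
    return math.exp(-lr * epoch) + 0.1

def main():
    import wandb  # requires `pip install wandb` and a W&B account

    run = wandb.init(project="demo-project", config={"learning_rate": 0.01})
    for epoch in range(5):
        loss = fake_val_loss(run.config["learning_rate"], epoch)
        wandb.log({"epoch": epoch, "val_loss": loss})
    run.finish()
    # A sweep over sweep_config would then be launched with
    # wandb.sweep() and executed with wandb.agent().

if os.environ.get("WANDB_API_KEY"):  # only run when W&B is configured
    main()
```

Each `wandb.log` call appends a step to the run, and the dashboard plots the resulting `val_loss` curve automatically.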
These are all useful features that Weights & Biases provides, which makes it a good tool for research teams looking to discover, learn and gain insights into machine learning experiments.
However, when it comes to delivery, Weights & Biases isn’t always the best option. Here are some features Weights & Biases doesn’t currently provide:
- Notebook hosting: Deploying a machine learning model to production straight from a Jupyter Notebook is every data researcher’s dream because it allows for quick iterations and saves time.
- ML lifecycle management: Managing the complete lifecycle of a model, i.e., from data sourcing to model deployment, is important during research because it lets teams monitor and debug issues at any stage of development.
- Production use cases: For production-focused teams or projects, Weights & Biases is not a good option because it lacks a production engine.
- Model deployment: An important part of research is testing and running real-time inference, which is why model deployment is needed right after building and evaluating models.
Here are some alternative tools you can try out:
- Neptune
- TensorBoard
- Comet
- MLflow
- Kubeflow
- SageMaker Studio
Neptune is a metadata store for MLOps. It allows you to log, store, organize, display, compare, and query all your model-building metadata in a single place. This includes metadata such as model metrics and parameters, model checkpoints, images, videos, audio files, data versions, interactive visualizations, and more.
Neptune was built for research and production teams that run a lot of experiments, want to organize and reproduce them, and want to make sure that the process of moving models to production goes well. The main focus of Neptune features revolves around experiment tracking and model registry, as well as team collaboration.
- Neptune allows you to log and display model metadata in any structure you want. Whether it is a nested parameter structure for your models, different subfolders for training and validation metrics, or a separate space for packaged models or production artifacts. It’s up to you how you organize it.
- The pricing of Neptune is usage-based. It can be more cost-effective for all ML teams, but especially for those that occasionally don’t run experiments at all, or those that have many stakeholders who are not deeply involved in the experimentation process.
- As mentioned before, Neptune was built for teams that run a lot of experiments, so it can handle thousands of runs and doesn’t slow down. It scales with your team and the size of your project.
- Neptune is available in the SaaS version, but it can also be hosted on-premises. If you choose the second option, the installation process is very easy.
- With Neptune, you can create custom dashboards to combine different metadata types in a preferred way.
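A minimal sketch of Neptune's nested metadata logging, assuming the `neptune` Python client (the 1.x API, where series are logged with `.append()`); the project name, parameter names, and model file path are hypothetical:

```python
import os

# Illustrative nested parameter structure; the names are made up.
params = {
    "model": {"arch": "resnet18", "dropout": 0.2},
    "optimizer": {"name": "adam", "lr": 0.001},
}

def main():
    import neptune  # requires `pip install neptune` and an API token

    run = neptune.init_run(project="my-workspace/my-project")  # hypothetical project
    run["parameters"] = params               # logged as a nested structure
    for loss in [0.9, 0.5, 0.3]:
        run["train/loss"].append(loss)       # namespaced metric series
    run["artifacts/model"].upload("model.pt")  # hypothetical file
    run.stop()

if os.environ.get("NEPTUNE_API_TOKEN"):  # only run when Neptune is configured
    main()
```

The slash-separated keys (`train/loss`, `artifacts/model`) are what let you organize metadata into whatever folder-like structure you prefer.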
Neptune can be used as a hosted solution or deployed on your premises.
The SaaS version is available in the following plans:
- Individual Plan: It’s free, with 200 monitoring hours and 100 GB of metadata storage.
- Team Plan: $49/month + usage above free quota (free for research teams)
- Scale: $499/month
When you deploy Neptune on your infrastructure or in your private cloud you can choose from the following plans:
- Team Plan: $499/month
- Enterprise Plans: $1499/month
Check out Neptune’s pricing to learn more.
Weights & Biases vs Neptune
- Both Neptune and Weights & Biases are hosted services and they provide experiment tracking, model management, and data versioning.
- Neptune puts more focus on model registry features, while Weights & Biases also provides tools to automate hyperparameter optimization.
- There are some differences in the out-of-the-box integrations the two tools offer; Neptune supports the R language, DVC, and Optuna, while WandB supports spaCy, Ray Tune, and Kubeflow.
- In general, both tools are quite similar and are equally great solutions, so the main difference can be noticed in the pricing structure (usage-based vs. user-based).
🔎 Check an in-depth comparison between Neptune and WandB
TensorBoard, developed by the TensorFlow team, is an open-source visualization tool for machine learning experiments.
It is used for tracking ML metrics such as loss and accuracy, visualizing the model graph, histograms, projecting embeddings to a lower-dimensional space, and much more.
With TensorBoard you can also share the result of your experiment.
- TensorBoard allows you to track your experiments, including experiments that are not based on TensorFlow.
- TensorBoard allows you to share your experiment results via a shareable link with anyone, for publications, collaboration, etc.
- It provides tracking and visualizing of metrics such as loss and accuracy.
- TensorBoard has the What-If Tool (WIT), an easy-to-use interface for explainability and understanding of black-box classification and regression ML models.
- TensorBoard has a strong and big community of users that provides great support.
It is free to use.
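A minimal sketch of logging scalars for TensorBoard, using the `SummaryWriter` that ships with PyTorch (`torch.utils.tensorboard`); TensorFlow users would use the equivalent `tf.summary` API instead. The metric curve and tag names are illustrative:

```python
import math
import tempfile

def toy_accuracy(step: int) -> float:
    # Toy curve standing in for real training accuracy.
    return 1.0 - math.exp(-0.5 * step)

def main(logdir: str):
    # Requires PyTorch; writes TensorBoard event files to `logdir`.
    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter(logdir)
    for step in range(10):
        writer.add_scalar("train/accuracy", toy_accuracy(step), step)
    writer.close()
    # Inspect the results with: tensorboard --logdir <logdir>

try:
    import torch  # noqa: F401 — only run the demo if PyTorch is available
    main(tempfile.mkdtemp())
except ImportError:
    pass
```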
Weights & Biases vs TensorBoard
- If you need a tool for personal use, don’t plan to pay for it, and don’t require extra features, TensorBoard can be a good option.
- TensorBoard is best suited for visualizations of TensorFlow-based workflows.
🔎 In-depth comparison between Neptune and TensorBoard
Comet is a cloud-based machine learning platform where developers can track, compare, analyze and optimize experiments.
Comet is quick to set up: with just a few lines of code, you can start tracking your ML experiments regardless of the library you use.
- Comet allows you to create custom visualizations for your experiments and data. You can also use community-provided ones on panels.
- Comet provides real-time stats and graphs about your experiments.
- You can compare your experiments easily including code, metrics, predictions, insights, and more.
- With Comet, you can debug model errors, environment-specific errors, etc.
- Comet also allows you to monitor your models and notifies you when issues or bugs occur.
- It allows for collaboration within teams and business stakeholders.
- It integrates easily with TensorFlow, PyTorch, etc.
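A minimal sketch of Comet's tracking API, assuming the `comet_ml` package is installed and an API key is configured; the project name and loss curve are placeholders:

```python
import os

def toy_loss(step: int) -> float:
    # Toy training loss for the sketch.
    return 1.0 / (1.0 + step)

def main():
    # Requires `pip install comet_ml` and a Comet API key.
    from comet_ml import Experiment

    experiment = Experiment(project_name="demo-project")  # hypothetical project
    experiment.log_parameter("learning_rate", 0.01)
    for step in range(5):
        experiment.log_metric("loss", toy_loss(step), step=step)
    experiment.end()

if os.environ.get("COMET_API_KEY"):  # only run when Comet is configured
    main()
```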
Comet offers both free and paid pricing plans; check Comet’s pricing page for details.
Weights & Biases vs Comet
- Both tools offer user management features, hosted and on-premise setup, model management, hyperparameter search, and artifact store.
- If you need a gallery with custom visualizations, Comet has one.
- Comet offers a Java and R SDK for development, which is missing in Weights & Biases.
🔎 In-depth comparison between Neptune and Comet
MLflow is an open-source platform that helps manage the whole machine learning lifecycle. It helps with experimental tracking, reproducibility, deployment, and gives a central model registry.
MLflow comprises four main components:
- MLflow Tracking: an API and UI for logging parameters, code versions, metrics, and artifacts when running machine learning code and for later visualizing and comparing the results.
- MLflow Projects: packaging ML code in a reusable, reproducible form to share with other data scientists or transfer to production.
- MLflow Models: managing and deploying models from different ML libraries to a variety of model serving and inference platforms.
- MLflow Model Registry: a central model store to collaboratively manage the full lifecycle of an MLflow Model, including model versioning, stage transitions, and annotations.
- MLflow Model Registry provides a suite of APIs and intuitive UI for organizations to register and share new versions of models as well as perform lifecycle management on their existing models.
- MLflow Model Registry works with the MLflow tracking component, which allows you to trace back the original run where the model and data artifacts were generated from, as well as the source code version for that run, giving a complete lineage of the lifecycle for all models and data transformation.
- Used together with Delta Lake, it automatically versions the data you store in the data lake as it is written to a Delta table or directory.
- Allows you to get every version of your data using a version number or a timestamp.
- Allows you to audit and/or roll back data in case of accidental bad writes or deletes.
- Reproduce experiments and reports.
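A minimal sketch of MLflow Tracking, which can run fully locally (by default it logs to a `./mlruns` directory, no server needed); the experiment name, parameter, and loss curve are illustrative:

```python
import math

def train_step(lr: float, step: int) -> float:
    # Toy loss curve standing in for real training.
    return math.exp(-lr * step)

def main():
    import mlflow  # requires `pip install mlflow`; logs to ./mlruns by default

    mlflow.set_experiment("demo-experiment")  # hypothetical name
    with mlflow.start_run():
        mlflow.log_param("learning_rate", 0.1)
        for step in range(5):
            mlflow.log_metric("loss", train_step(0.1, step), step=step)

try:
    import mlflow  # noqa: F401 — only run the demo if MLflow is available
    main()
except ImportError:
    pass
```

Results can then be browsed with `mlflow ui`, which serves the local tracking UI.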
To learn more about MLflow, check out the MLflow docs.
It is free.
Weights & Biases vs MLflow
- If you’re working on a low budget, MLflow is a better option because it is free (open-source) for experimental tracking.
- MLflow is language-agnostic, i.e., it can be used with any machine learning library in Python or R, while Weights & Biases only works with Python scripts.
- Weights & Biases offers both hosted and on-premises setup, while MLflow is only available as an open-source solution that requires you to maintain it on your server.
- MLflow offers end-to-end ML lifecycle management, while Weights & Biases only offers features like experiment tracking, model management, and data versioning.
🔎 In-depth comparison between Neptune and MLflow
Kubeflow is a free, open-source machine learning platform for building simple, portable (via containers) and scalable models on Kubernetes. Kubeflow does tracking, data versioning, model versioning, and model deployment.
Kubeflow was designed by Google, for data scientists and ML engineers that prefer to develop, test, and deploy ML pipelines, models, and systems to various environments.
Kubeflow consists of the following components:
- Central Dashboard: This dashboard provides a central view and quick access to all your operations. It houses the jobs and components running in your cluster such as Pipelines, Katib, Notebooks, etc.
- Kubeflow Pipelines: a platform that allows ML engineers to build and deploy end-to-end ML workflows packaged in a Docker image. It consists of a UI for tracking experiments and jobs, an SDK for pipeline operations, a multi-step scheduling engine, and notebooks for building ML models.
- KFServing: The KFServing component is a model deployment and serving toolkit for Kubeflow. It does production model serving by enabling serverless inferencing on Kubernetes and providing an abstraction layer for deployment on frameworks like TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX.
- Katib: Katib is a model agnostic, Kubernetes-native project that provides hyperparameter tuning, early stopping, and neural architecture search for AutoML models. It supports various AutoML algorithms and frameworks such as TensorFlow, MXNet, PyTorch, and others.
- Training Operator: This component provides operators for TensorFlow, PyTorch, MXNet, XGBoost, and MPI model training jobs in Kubernetes.
- Kubeflow Notebooks: This component allows you to run notebooks inside the cluster. You can also create notebooks in the cluster and share them across your organization.
- Kubeflow also stores artifacts data in its artifact store; it uses the artifact to understand how the pipelines of various Kubeflow components work.
- Kubeflow Pipeline can output a simple textual view of the artifact’s data and rich interactive visualizations.
- Kubeflow has a user interface (UI) for managing and tracking experiments, jobs, and runs.
- It provides scheduling for multi-step ML workflows.
- It has an SDK for defining and manipulating pipelines and components.
- Notebooks for interacting with the system using the SDK.
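A minimal sketch of defining and compiling a pipeline with the KFP v2 Python SDK; the component logic, pipeline name, and output file are illustrative, and a real pipeline would be uploaded to a Kubeflow cluster after compilation:

```python
def normalize(values):
    # Plain Python logic of the kind that would run inside a pipeline step.
    total = sum(values)
    return [v / total for v in values]

def build_pipeline():
    # Requires `pip install kfp` (KFP v2 SDK); names are placeholders.
    from kfp import dsl, compiler

    @dsl.component
    def preprocess(text: str) -> str:
        return text.lower()

    @dsl.pipeline(name="demo-pipeline")
    def demo_pipeline(text: str = "Hello"):
        preprocess(text=text)

    # Compiles the pipeline to a YAML spec that Kubeflow can run.
    compiler.Compiler().compile(demo_pipeline, "demo_pipeline.yaml")

try:
    import kfp  # noqa: F401 — only build the demo if the SDK is available
    build_pipeline()
except ImportError:
    pass
```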
It is free.
Weights & Biases vs Kubeflow
- Using the Kubeflow Pipelines or KFServing components, you can deploy machine learning models as Docker containers, something that is missing in Weights & Biases.
- Kubeflow provides end-to-end machine learning orchestration and management; Weights & Biases doesn’t.
- Kubeflow offers experimental tracking and metadata tracking for all model artifacts.
- For use cases where interactive visualization is not necessary, Kubeflow is a better choice.
🔎 See an in-depth comparison between Neptune and Kubeflow
Amazon SageMaker Studio is a web-based integrated development environment (IDE) for building, training, visualizing, debugging, deploying, and monitoring your ML models. You can write code, track experiments, visualize data, and perform debugging and monitoring within a single, integrated visual interface.
- It provides a model artifact store that records the S3 bucket location of the model, along with information on the model type and content.
- SageMaker Studio also stores artifacts for AutoML experiments.
- It allows you to easily create and share Jupyter notebooks.
- It provides and manages the hardware infrastructure of your model’s environment so that you can quickly switch from one hardware configuration to another.
- SageMaker Studio supports frameworks such as TensorFlow, PyTorch, MXNet, etc.
- SageMaker Studio has over 150 pre-packaged open-source models for various use cases.
- SageMaker Studio offers end-to-end data preparation. It allows you to run Spark jobs using the language of your choice (SQL, Python, and Scala) and you can also connect to Apache Spark data processing environments running on Amazon EMR with ease.
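A sketch of launching a training job programmatically via the SageMaker `CreateTrainingJob` API through `boto3`. The image URI, IAM role, bucket, and instance type are placeholders, and a real job would also need an input data configuration and valid AWS credentials:

```python
import os

def build_training_job_request(job_name: str, role_arn: str, bucket: str) -> dict:
    # Minimal request body for the SageMaker CreateTrainingJob API.
    # All values below are placeholders; a real request also typically
    # includes an InputDataConfig pointing at your training data.
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": "<account>.dkr.ecr.<region>.amazonaws.com/my-image:latest",
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "OutputDataConfig": {"S3OutputPath": f"s3://{bucket}/output"},
        "ResourceConfig": {
            "InstanceType": "ml.m5.large",
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }

def main():
    import boto3  # requires AWS credentials to be configured

    request = build_training_job_request(
        "demo-job", "arn:aws:iam::123456789012:role/SageMakerRole", "my-bucket"
    )
    boto3.client("sagemaker").create_training_job(**request)

if os.environ.get("AWS_ACCESS_KEY_ID"):  # only run when AWS is configured
    main()
```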
With Amazon SageMaker you only pay for what you use. It offers two payment choices:
- On-demand pricing that is billed by the second, with no minimum fees and no upfront commitments
- The SageMaker Savings Plans offer a flexible, usage-based pricing model in exchange for a commitment to a consistent amount of usage.
You can use the AWS pricing calculator to plan your billing.
Weights & Biases vs Amazon SageMaker Studio
- SageMaker Studio has an easy setup, unlike Weights & Biases, which requires some level of expertise to set up as a hosted or on-premises service.
- SageMaker Studio provides experiment logs and visualizations during experiment tracking.
- In SageMaker Studio, you can set up a leaderboard that automatically tracks all your experiments and ranks their performance.
- Compared to Weights & Biases, SageMaker Studio rents out computational resources at a relatively low price.
- SageMaker Studio allows you to interactively query, explore, and visualize data. Apart from experiment tracking, it also provides data annotation, heavy data handling, debugging, and model and data drift detection.
Weights & Biases is a great tool for ML teams focused on research because it excels at experiment tracking, but that alone isn’t always enough. The alternative tools listed in this article have unique value propositions that make them fit use cases where Weights & Biases might not be the best choice.
For open-source experiment tracking, TensorBoard, MLflow, and Kubeflow are good alternatives. For visualizations and scalable storage of your metadata and artifacts, paid tools such as Neptune and Comet are better options; they also provide high-level security and support for enterprise teams.
So, depending on your requirement, you may choose any of the aforementioned tools.
At the same time, I’d advise you to always be on the lookout for more tools that suit your needs and tasks and give you enough flexibility to get the most out of your work.
Hypefactors Case Study: Metadata Management for Teams that Don’t Run Experiments All the Time
7 mins read | Updated October 25th, 2021
Hypefactors is a technology company that works in the media intelligence and reputation tracking domain. They provide an ML-based Public Relations (PR) automation platform, including all the tools to power the PR workflow and visualize the results.
We spoke to the CTO of Hypefactors, Viet Yen Nguyen who leads the technical teams and is responsible for the technical side of the company. He explains this business as:
“Apple Inc. is worth approximately 2 trillion but if you go to their warehouses, stores, and sum up everything they own you don’t get to 2 trillion. You may get to maybe 60 billion worth of physical goods.
So where is the 1.94 trillion of value that is reflected in the price of the stock? It is the future prospects, the brand potential, the reputation, the intangible things.
We help teams actually measure and track that.”
What is the project about?
The data pipelines at Hypefactors monitor the whole media landscape, from social media to print media, television, radio, etc., to analyze changes in their customers’ brand reputation. This feat is achieved in two phases:
1. Getting the data from everywhere
2. Enriching the data with ML-based features
In order to analyze every form of data, ranging from images and text to tabular data, they work on a variety of ML problems:
- NLP classification
- computer vision (e.g. segmentation)
- regression for business metrics
As they train and improve many enrichment models using different ML techniques, this naturally involves running many experiments and articulating ways to store the metadata generated by those experiments.
With competent engineers in the data and AI teams, Viet’s team was able to achieve good results with all the components in the pipeline except for Experiment Tracking. Let’s peek into the problems they faced during the process and what measures they took to solve them.