07 Nov 2022
We shift our conversational style between informal and formal speech all the time, often without thinking, such as when talking to friends versus addressing a judge. Computers now have this capability! In this post, I use textual style transfer to convert informal text to formal text. To make this easy to use, we do it in a spreadsheet.
The first step is identifying an informal-to-formal text style transfer model. Next, we deploy the model using Hugging Face Inference Endpoints, a production-grade solution for model deployment.
Let’s incorporate the endpoint into a Google Sheets custom function to make the model easy to use.
I added the code to Google Sheets through the Apps Script extension. Grab it here as a gist. Once the script is saved, you can call the new function as a formula. Now I can do textual style transfer with one simple command!
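Under the hood, the custom function simply sends an HTTP request to the endpoint. Here is a minimal Python sketch of that request; the endpoint URL, token, and function names are placeholders I made up, and the exact response shape depends on the deployed model's task.

```python
import json
from urllib import request

# Placeholder values -- substitute your own endpoint URL and access token.
ENDPOINT_URL = "https://my-endpoint.endpoints.huggingface.cloud"
HF_TOKEN = "hf_xxx"

def build_style_transfer_request(text: str):
    """Build the headers and JSON payload an Inference Endpoint expects."""
    headers = {
        "Authorization": f"Bearer {HF_TOKEN}",
        "Content-Type": "application/json",
    }
    payload = json.dumps({"inputs": text}).encode("utf-8")
    return headers, payload

def formalize(text: str) -> str:
    """POST the informal text and return the model's formal rewrite."""
    headers, payload = build_style_transfer_request(text)
    req = request.Request(ENDPOINT_URL, data=payload, headers=headers)
    with request.urlopen(req) as resp:
        body = json.loads(resp.read())
    # Text-generation style endpoints typically return a list of dicts.
    return body[0]["generated_text"]
```

The Apps Script custom function does the same thing in JavaScript with `UrlFetchApp`.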
I created a YouTube 🎥 video for a more detailed walkthrough.
Go try this out with your favorite model! For another example, check out the positive-style text transfer model in a TikTok video.
27 Oct 2022
Data scientists often do not have large amounts of labeled data. The issue is even graver for problems with tens or hundreds of classes. In reality, very few text classification projects reach the point where adding more labeled data stops improving performance.
SetFit offers a few-shot learning approach for text classification. The paper’s results show that, across many datasets, it’s possible to get better performance with less labeled data. The technique uses contrastive learning to build a larger dataset for fine-tuning a text classification model. This approach was new to me, which is why I made a video explaining how contrastive learning helps with text classification.
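The contrastive step is easy to picture: from a handful of labeled sentences, generate sentence pairs, marking same-class pairs as similar and cross-class pairs as dissimilar. Here is a toy sketch of that pair generation (SetFit does this sampling internally; the function name and churn examples are mine):

```python
from itertools import combinations

def contrastive_pairs(examples):
    """examples: list of (text, label) tuples.
    Returns (text_a, text_b, is_same_class) triples for contrastive training."""
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(examples, 2):
        pairs.append((text_a, text_b, 1 if label_a == label_b else 0))
    return pairs

# A tiny few-shot churn dataset: two labels, two examples each.
few_shot = [
    ("I want to cancel my plan", "churn"),
    ("Please close my account", "churn"),
    ("Love the new features", "happy"),
    ("Great service, thank you", "happy"),
]
pairs = contrastive_pairs(few_shot)
```

Even four labeled sentences yield six training pairs, which is how SetFit stretches a small labeled dataset into enough signal to fine-tune a sentence embedding model.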
I have created a Colab 📓 companion notebook at https://bit.ly/raj_setfit, and a YouTube 🎥 video that provides a detailed explanation. I walk through a simple churn example to give the intuition behind SetFit. The notebook trains on the CR (customer review) dataset highlighted in the SetFit paper.
The SetFit GitHub repository contains the code, and a great deep dive on text classification can be found on Philipp’s blog. For those looking to productionize a SetFit model, Philipp has also documented how to create a Hugging Face Inference Endpoint for a SetFit model.
So grab your favorite text classification dataset and give it a try!
24 Sep 2022
Data scientists often overstate the certainty of their predictions. I have had engineers laugh at my point predictions and point out several types of errors in my model that create uncertainty. Prediction intervals are an excellent counterbalance for communicating the uncertainty of predictions.
Conformal inference offers a model-agnostic technique for prediction intervals. It’s well known within statistics but not as well established in machine learning. This post focuses on a straightforward conformal inference technique, but more sophisticated techniques exist that provide more adaptive prediction intervals.
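To make the straightforward variant (split conformal prediction) concrete: hold out a calibration set, compute the fitted model's absolute residuals on it, take a finite-sample-corrected quantile, and add and subtract it around each new point prediction. A minimal sketch with made-up calibration numbers standing in for a real model:

```python
import math

def conformal_interval(cal_preds, cal_truths, new_pred, alpha=0.1):
    """Split conformal prediction interval for regression.

    cal_preds/cal_truths: model predictions and true values on a held-out
    calibration set. Returns (lo, hi), covering the truth with probability
    ~1 - alpha under exchangeability.
    """
    residuals = sorted(abs(y - p) for p, y in zip(cal_preds, cal_truths))
    n = len(residuals)
    # Finite-sample corrected quantile rank: ceil((n + 1)(1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    q = residuals[min(k, n) - 1]
    return new_pred - q, new_pred + q

# Toy calibration data: this model's errors never exceed 1.5.
cal_preds  = [10.0, 12.0, 15.0, 11.0, 14.0, 13.0, 16.0,  9.0, 17.0, 12.5]
cal_truths = [11.0, 10.5, 16.0, 12.0, 13.0, 14.5, 15.0, 10.0, 18.5, 12.0]
lo, hi = conformal_interval(cal_preds, cal_truths, new_pred=20.0, alpha=0.1)
```

Notice the interval has the same width everywhere; that fixed width is exactly the limitation the more adaptive techniques address.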
I have created a Colab 📓 companion notebook at https://bit.ly/raj_conf, and a YouTube 🎥 video that provides a detailed explanation. The explanation uses a toy example to show how conformal inference works. Typical applications will use a more sophisticated methodology along with the implementations found in the resources below.
For Python folks, a great package for getting started with conformal inference is MAPIE - Model Agnostic Prediction Interval Estimator. It works for tabular and time series problems.
- Quick intro to conformal prediction using MAPIE on Medium
- A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification, paper link
- Awesome Conformal Prediction (lots of resources)
14 Aug 2022
This post covers 3 easy-to-use 📦 packages for getting started with text explanations. You can also check out the Colab 📓 companion notebook at https://bit.ly/raj_explain and the YouTube 🎥 video for a deeper treatment.
Explanations show how the input text influenced a prediction. They are helpful for 🩺 diagnosing model issues, 👀 helping stakeholders understand how a model is working, and 🧑⚖️ meeting regulatory requirements. Here is an explanation 👇 using shap. For more on explanations, check out the explanations in machine learning video.
Let’s review 3 packages you can use to get explanations. All of these work with transformers, provide visualizations, and only require a few lines of code.
- SHAP is a well-known, well-regarded, and robust package for explanations. For text, SHAP typically defaults to the Partition explainer. This method makes the SHAP computation tractable by using hierarchical clustering and Owen values. The image here shows the clustering for a simple phrase. If you want to learn more about Shapley values, I have a video on them, and a deep dive on the Partition explainer is here.
- Transformers Interpret uses Integrated Gradients from Captum to calculate the explanations. This approach is 🐇 quicker than SHAP! Check out this Space to see a demo.
- Ferret is built for benchmarking interpretability techniques and includes multiple explanation methodologies (including Partition SHAP and Integrated Gradients). A Spaces demo for ferret is here, along with a paper that explains the various metrics incorporated in ferret.
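To build intuition for the Shapley values underlying SHAP: a player's (here, a token's) value is its average marginal contribution across all orderings. A brute-force sketch on a tiny made-up cooperative game (exact Shapley is exponential in the number of players, which is exactly why SHAP needs tractable approximations like the Partition explainer):

```python
from itertools import permutations

def shapley_values(players, value_fn):
    """Exact Shapley values: average each player's marginal contribution
    over every possible ordering of the players."""
    phi = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = set()
        for p in order:
            before = value_fn(frozenset(coalition))
            coalition.add(p)
            phi[p] += value_fn(frozenset(coalition)) - before
    return {p: total / len(orderings) for p, total in phi.items()}

# Toy "model score" game over two tokens: "good" contributes 2,
# "movie" contributes 1, and together they earn a bonus of 1.
def v(coalition):
    score = 2.0 * ("good" in coalition) + 1.0 * ("movie" in coalition)
    if {"good", "movie"} <= coalition:
        score += 1.0
    return score

phi = shapley_values(["good", "movie"], v)
```

The interaction bonus gets split evenly between the two tokens, and the attributions always sum to the full coalition's score (the efficiency property).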
You can see below how explanations can differ when using different explanation methods. It is a great reminder that explanations for text are complicated and need to be appropriately caveated.
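The Integrated Gradients method behind Transformers Interpret is also simple at its core: accumulate the gradient along a straight path from a baseline to the input, then scale by the input's distance from the baseline. A one-dimensional numeric sketch for f(x) = x², whose attribution should approach f(x) − f(baseline) by IG's completeness property:

```python
def integrated_gradients_1d(grad_fn, x, baseline=0.0, steps=1000):
    """Approximate Integrated Gradients with a left Riemann sum
    along the straight-line path from baseline to x."""
    total = 0.0
    for k in range(steps):
        t = baseline + (k / steps) * (x - baseline)
        total += grad_fn(t)
    avg_grad = total / steps
    return (x - baseline) * avg_grad

# f(x) = x^2 has gradient 2x; the attribution should approach x^2 - 0 = 9.
ig = integrated_gradients_1d(lambda t: 2.0 * t, x=3.0)
```

Captum does the same thing in high dimensions, with the model's backward pass supplying `grad_fn`; the speed advantage over SHAP comes from needing only a fixed number of gradient evaluations rather than many coalition evaluations.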
Ready to dive in? 🟢
For a longer walkthrough of all the 📦 packages with code snippets, web-based demos, and links to documentation/papers, check out:
👉 Colab notebook: https://bit.ly/raj_explain
11 Aug 2022
Are you looking for better training data for your models? Let me tell you about dynamic adversarial data collection!
I had a large enterprise customer ask me to incorporate this workflow into a Hugging Face private hub demo. Here are some resources I found useful:
Chris Emezue put together a blog post, “How to train your model dynamically using adversarial data,” and a real-life MNIST example using Spaces.
If you want an academic paper that details this process, check out:
Analyzing Dynamic Adversarial Training Data in the Limit. Using this approach, the paper found that models made 26% fewer errors on the expert-curated test set.
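The workflow itself is a loop: humans probe the current model for inputs it gets wrong, those failures go back into the training set, and the model is retrained. A toy sketch of the collection step, with a deliberately naive keyword classifier (my own stand-in, not any model from the resources above):

```python
def naive_sentiment(text):
    """Deliberately weak stand-in model: keyword spotting, no negation handling."""
    return "positive" if "good" in text.lower() else "negative"

def collect_adversarial(model, candidates):
    """Keep the human-written (text, true_label) pairs the current model
    misclassifies; these become new training data for the next round."""
    return [(text, label) for text, label in candidates if model(text) != label]

# Annotators craft examples trying to fool the current model.
attempts = [
    ("this was not good at all", "negative"),  # negation fools keyword spotting
    ("good movie", "positive"),                # model already handles this
    ("terrible pacing", "negative"),           # model already handles this
]
new_training_data = collect_adversarial(naive_sentiment, attempts)
```

Only the example that fooled the model survives the filter, so each round of collection targets exactly the current model's blind spots.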
And if you prefer a video, check out my TikTok: