
snorkel

Building and managing training datasets for machine learning


Spam Tutorial: Baseline models perform better than model with snorkel labels

February 25, 2020 at 11:15pm

I'm just getting started with Snorkel and working through the spam tutorial.
If I set all the proper seeds, I can reproduce most of the results. However, towards the end, when we compare a Keras model trained on the dev set against a Keras model trained on LabelModel labels, the dev-set model performs better when I run the notebook locally (93.6% vs. 90.4%).
Even looking at the notebook as-is on GitHub, the Snorkel-based model has an accuracy of 90% and the dev-set model has an accuracy of 89.6%. That seems like an almost negligible improvement.
Am I missing something? Is there something about this dataset that makes Snorkel ineffective here? If this isn't an ideal use case for Snorkel, why use this dataset in the tutorial? What are the ideal use cases?
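For concreteness, the comparison I'm running looks roughly like this (the tutorial's df_train/df_dev/df_test frames, L_train label matrix, and Y_dev/Y_test gold labels are assumed; a seeded sklearn logistic regression stands in for the tutorial's Keras model to keep the sketch short):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from snorkel.labeling.model import LabelModel
from snorkel.utils import probs_to_preds

# Shared bag-of-words features over the tutorial's "text" column.
vectorizer = CountVectorizer(ngram_range=(1, 2))
X_train = vectorizer.fit_transform(df_train.text)
X_dev = vectorizer.transform(df_dev.text)
X_test = vectorizer.transform(df_test.text)

# Model A: trained directly on the small hand-labeled dev set.
model_dev = LogisticRegression(solver="liblinear", random_state=123)
model_dev.fit(X_dev, Y_dev)

# Model B: trained on hardened probabilistic labels from the LabelModel.
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train=L_train, n_epochs=500, seed=123)
preds_train = probs_to_preds(label_model.predict_proba(L=L_train))
model_lm = LogisticRegression(solver="liblinear", random_state=123)
model_lm.fit(X_train, preds_train)

print("dev-set model accuracy:    ", model_dev.score(X_test, Y_test))
print("LabelModel-label accuracy: ", model_lm.score(X_test, Y_test))
```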

February 25, 2020 at 11:47pm
Hi, thanks for pointing this out! To start: these are certainly not the original performance numbers we had when we put this tutorial together; something clearly regressed or broke in a commit at some point. We're looking into it.
To be explicitly clear on two (obvious) points, though: (A) like any approach, Snorkel is not unambiguously better in all settings; and (B) this tutorial was never meant to optimize for performance improvement using Snorkel (which in general requires (i) large models and (ii) large unlabeled datasets, neither of which are present here), but rather to serve as an introduction to the basics of Snorkel.
That being said... the numbers currently shown would clearly be a silly foot to put forward in our intro tutorial, and I share your confusion here :). We'll let you know when we chase down what's going on, and thank you for catching this for us!
Thanks for the quick reply! Your points are well taken; I wasn't expecting magic out of a relatively small dataset. I'm doing some experiments of my own with a larger dataset, but it might be useful to have a disclaimer about performance in the intro tutorial, and maybe a pointer to the original Snorkel paper for a comprehensive overview of Snorkel's performance?
Completely agreed, although we do think there is a discrete error here that's hurting performance. The gains in this tutorial were never insanely good, but they were there originally :). Either way, I agree with your points.
FYI, it looks like someone filed an issue on GitHub related to this a few weeks ago: https://github.com/snorkel-team/snorkel/issues/1534
I’ve seen this same thing in my own stuff based on master, where majority voting beats the LabelModel, so I think maybe something did get messed up.
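In case it's useful, the check I ran looks roughly like this (L_train/L_dev and Y_dev are tutorial-style label matrices and dev gold labels; this is a sketch, not the tutorial's exact code):

```python
from snorkel.labeling.model import LabelModel, MajorityLabelVoter

# Baseline: simple majority vote over the labeling function outputs.
majority_model = MajorityLabelVoter()
majority_acc = majority_model.score(L=L_dev, Y=Y_dev, tie_break_policy="random")["accuracy"]

# Trained LabelModel on the same label matrix.
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train=L_train, n_epochs=500, log_freq=100, seed=123)
label_model_acc = label_model.score(L=L_dev, Y=Y_dev, tie_break_policy="random")["accuracy"]

print(f"majority vote: {majority_acc:.3f}  label model: {label_model_acc:.3f}")
```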

February 27, 2020 at 10:35pm
Yeah, in general many things could be happening, but as far as the intro tutorial goes: there were a few little things, but the biggest smoking gun ended up being the Keras model we had, which was experiencing swings of up to 10 points in performance (which swamps the smaller deltas on this toy problem). This was due to several factors, including the tiny size of the training set (which is not at all the intended use-case setting for Snorkel anyway, but again, was chosen for simplicity...).
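(As a rough illustration of that kind of run-to-run spread, with a stand-in sklearn model rather than our exact Keras setup, you can just retrain the same small model under a handful of seeds; X_train, preds_train, X_test, and Y_test are as in the sketch earlier in the thread:)

```python
from sklearn.linear_model import SGDClassifier

# Retrain the same small model under different seeds and report the spread.
accs = []
for seed in range(10):
    clf = SGDClassifier(random_state=seed)
    clf.fit(X_train, preds_train)
    accs.append(clf.score(X_test, Y_test))

print(f"min {min(accs):.3f}  max {max(accs):.3f}  spread {max(accs) - min(accs):.3f}")
```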
Anyway, we are about to post a streamlined version of the tutorial with two major changes. First, we put in a properly configured, simple sklearn logistic regression model, to avoid the above.
Second, we made a change that we had been contemplating for a while: we dropped the development and validation sets from the intro tutorial, to emphasize very clearly that you don't need these labeled data splits in order to use Snorkel. (They are still present in the more 'advanced' tutorials, if you're interested!)
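A minimal sketch of what the streamlined recipe looks like (df_train, L_train, and label_model as in the earlier sketches; the exact vectorizer and regularization settings here are illustrative, not necessarily what we shipped):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from snorkel.labeling import filter_unlabeled_dataframe
from snorkel.utils import probs_to_preds

# Probabilistic labels for the (otherwise unlabeled) training set.
probs_train = label_model.predict_proba(L=L_train)

# Drop the rows where every labeling function abstained.
df_train_filtered, probs_train_filtered = filter_unlabeled_dataframe(
    X=df_train, y=probs_train, L=L_train
)

# Simple sklearn end model; note that no dev/valid labels are used anywhere.
vectorizer = CountVectorizer(ngram_range=(1, 5))
X_train = vectorizer.fit_transform(df_train_filtered.text.tolist())
clf = LogisticRegression(C=1e3, solver="liblinear", random_state=123)
clf.fit(X=X_train, y=probs_to_preds(probs=probs_train_filtered))
```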
At the end of the day, this is again a toy dataset that is (we hope) good for a quick walkthrough of Snorkel's basics, but not optimal for showing off Snorkel's best performance. For that, you can see some of the papers we and others have published :) Anyway, thank you so much for bringing this to our attention, and hopefully the above helps!
(And for reference: the final end model gets 94.4% with Snorkel labels, and would get 87.2% with the dev-set labels, though that comparison is no longer in the tutorial.)