snorkel

Building and managing training datasets for machine learning

Understanding buckets key in error_analysis.py

November 15, 2019 at 9:49pm

In error_analysis.py, the documentation for the get_label_buckets method says: "The returned buckets[(i, j)] is a NumPy array of data point indices with predicted label i and true label j." Shouldn't i and j be the true label and the predicted label, respectively?
When I list the bucket keys for error analysis and read them the way the documentation describes, I see -1 as a true/gold label among my dev set samples, even though none of my dev set samples has an abstain label. I just want to check whether this is a typo or whether I'm misunderstanding something.

November 23, 2019 at 10:58pm
Hi — thanks for raising this!
You're right, the docs are wrong: in this case, i is the gold label and j is the predicted label.
More generally, the index order matches the order in which the label arrays were passed in. So buckets = get_label_buckets(Y_gold, Y_pred) means the first index is gold, the second is pred, and so on.
I'll put up a patch to fix this!
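To make the ordering concrete, here is a minimal sketch of the bucketing idea. It is a pure-Python stand-in, not Snorkel's implementation: label_buckets_sketch is a hypothetical helper, it returns plain lists rather than NumPy arrays, and the sample labels are made up for illustration.

```python
from collections import defaultdict

ABSTAIN = -1  # Snorkel's convention for an abstaining prediction


def label_buckets_sketch(*label_lists):
    """Hypothetical stand-in: group data point indices by their label tuple.

    The position of each label inside a key follows the order in which
    the label lists were passed in.
    """
    buckets = defaultdict(list)
    for idx, labels in enumerate(zip(*label_lists)):
        buckets[tuple(labels)].append(idx)
    return dict(buckets)


# Made-up example data: gold labels never abstain, predictions may.
Y_gold = [1, 1, 0, 0, 1]
Y_pred = [1, 0, 0, ABSTAIN, 1]

# Gold is passed first, so every key reads (gold, pred).
buckets = label_buckets_sketch(Y_gold, Y_pred)

print(buckets[(1, 0)])        # indices with gold 1, predicted 0
print(buckets[(0, ABSTAIN)])  # indices with gold 0 where the model abstained
```

With gold passed first, -1 can only show up in the second position of a key, which matches seeing -1 in the buckets even though the dev set has no abstain gold labels.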
Done! This should be in the latest master.
Cool, thanks for the follow-up.