Understanding bucket keys in error_analysis.py
November 15, 2019 at 9:49pm
In the error_analysis.py documentation for the get_label_buckets method, it says: "The returned buckets[(i, j)] is a NumPy array of data point indices with predicted label i and true label j." Shouldn't i and j be the true label and the predicted label, respectively?
When I list the bucket keys for error analysis and read them as the documentation describes, I see -1 as the true (gold) label for some of my dev set samples, even though my dev set contains no abstain labels. I just want to make sure whether this is a typo in the docs or I'm misunderstanding something.
November 23, 2019 at 10:58pm
Hi — thanks for raising this!
You're right, the docs are wrong: in this case, i is the gold label and j is the predicted label.
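With the corrected reading, a -1 in the second position of a key is a predicted abstain, not a gold label. The sketch below works on a hand-written, hypothetical buckets dict shaped like the output of get_label_buckets(Y_gold, Y_pred); the helper names mistakes and predicted_abstains are illustrative only and are not part of the library.

```python
import numpy as np

# Hypothetical buckets for a tiny dev set, written out by hand.
# Keys are (gold_label, predicted_label); values are data point indices.
buckets = {
    (1, 1): np.array([0, 3]),
    (0, 0): np.array([4]),
    (1, 0): np.array([1]),
    (0, -1): np.array([2]),
}

def mistakes(buckets):
    """Buckets where the prediction (key[1]) differs from the gold label (key[0])."""
    return {key: idx for key, idx in buckets.items() if key[0] != key[1]}

def predicted_abstains(buckets, abstain=-1):
    """Indices whose predicted label (second key position) is the abstain value."""
    hits = [idx for (gold, pred), idx in buckets.items() if pred == abstain]
    return np.concatenate(hits) if hits else np.array([], dtype=int)

print(sorted(mistakes(buckets)))      # [(0, -1), (1, 0)]
print(predicted_abstains(buckets))    # [2]
```

If the dev set has no abstains in its gold labels, key[0] should never be -1, which is a quick sanity check that the positions are being read correctly.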
More generally, the positions in each key follow the order in which the label arrays were passed in. So buckets = get_label_buckets(Y_gold, Y_pred) means the first index is the gold label, the second is the predicted label, and so on for any additional arrays.
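To make the ordering rule concrete, here is a minimal sketch of the grouping behavior described above. label_buckets_sketch is a hypothetical stand-in written for illustration, not the library's implementation, and Y_other is an assumed second model's predictions used to show the three-array case.

```python
from collections import defaultdict

import numpy as np

def label_buckets_sketch(*y):
    """Group data point indices by the tuple of labels they receive.

    Hypothetical stand-in for get_label_buckets: position k of every key
    comes from the k-th array passed in.
    """
    arrays = [np.asarray(a) for a in y]
    if len({len(a) for a in arrays}) != 1:
        raise ValueError("Pass at least one array; all arrays must have the same length")
    buckets = defaultdict(list)
    for idx, labels in enumerate(zip(*arrays)):
        buckets[tuple(int(v) for v in labels)].append(idx)
    return {key: np.array(idx) for key, idx in buckets.items()}

Y_gold = np.array([1, 0, 1, 0])
Y_pred = np.array([1, 0, 0, -1])
Y_other = np.array([1, 1, 0, 0])  # hypothetical second model

# Gold first, predictions second: keys are (gold, pred).
buckets = label_buckets_sketch(Y_gold, Y_pred)
print(buckets[(0, -1)])  # [3]: gold 0, predicted abstain

# A third array simply adds a third key position.
buckets3 = label_buckets_sketch(Y_gold, Y_pred, Y_other)
print(buckets3[(1, 0, 0)])  # [2]: gold 1, Y_pred 0, Y_other 0
```

Swapping the argument order swaps the key positions, so get_label_buckets(Y_pred, Y_gold) would produce (pred, gold) keys instead.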
I'll put up a patch to fix this!
Done! This should be in the latest master.