Understanding buckets key in error_analysis.py
November 15, 2019 at 9:49pm
In the error_analysis.py documentation for the get_label_buckets method, it says: "The returned buckets[(i, j)] is a NumPy array of data point indices with predicted label i and true label j." Shouldn't i and j be the true label and the predicted label, respectively?
When I list the bucket keys for error analysis and read them the way the documentation describes, I see -1 as the true/gold label for some of my dev set samples, even though there are no abstain samples in my dev set. I just want to check whether this is a typo or whether I'm misunderstanding something.
November 23, 2019 at 10:58pm
Hi — thanks for raising this!
You're right, the docs are wrong. In this case, i is the gold label and j is the predicted label.
More generally, the order of the indices in each key matches the order in which the label arrays were passed in.
So, buckets = get_label_buckets(Y_gold, Y_pred) implies that the first index will be gold, the second will be pred, etc.
I'll put up a patch to fix this!
Done! This should be in latest master.
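For anyone landing here later, here is a minimal sketch of the key ordering discussed above. It does not call Snorkel; label_buckets_sketch is a hypothetical stand-in written in plain Python (returning lists rather than NumPy arrays), and the Y_gold / Y_pred values are made up for illustration. It only assumes what the thread states: each key follows the order of the arguments, and -1 marks an abstain.

```python
from collections import defaultdict

ABSTAIN = -1  # abstain value, as used in the question above


def label_buckets_sketch(*label_arrays):
    """Group data point indices by their tuple of labels.

    Hypothetical helper for illustration only, not Snorkel's implementation.
    The position of each label in the key matches the argument order.
    """
    buckets = defaultdict(list)
    for idx, labels in enumerate(zip(*label_arrays)):
        buckets[tuple(labels)].append(idx)
    return dict(buckets)


# Made-up example data: the gold labels contain no abstains,
# but the predictions abstain on the last two points.
Y_gold = [0, 1, 1, 0, 1]
Y_pred = [0, 1, 0, ABSTAIN, ABSTAIN]

# Gold passed first, so every key reads (gold, pred).
buckets = label_buckets_sketch(Y_gold, Y_pred)

print(buckets[(1, 0)])        # gold 1, predicted 0 -> [2]
print(buckets[(0, ABSTAIN)])  # gold 0, model abstained -> [3]
```

Read this way, -1 only ever shows up in the second (predicted) position, which is consistent with a dev set that has no abstains in its gold labels. Swapping the arguments to (Y_pred, Y_gold) would swap the key order as well.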