Should all LFs with less than 0.5 accuracy in a binary classification be thrown out?
December 4, 2019 at 4:39pm
I have a question: sometimes I write LFs for a binary classification task and they come out just about even, or slightly worse than random (0.49, for example). While these don't work on their own, I wonder whether they might still contribute to the model if Snorkel understands the relationships between the LFs. Is this the case, or is it always better to toss worse-than-random LFs?
December 4, 2019 at 11:36pm
TL;DR: it's better to drop LFs that are worse than random, given how the current implementation of the label model handles ties.
The current implementation of the label model chooses among multiple valid solutions for the LF accuracies by picking the one that trusts the LFs the most. Say we have an LF with a true accuracy of 20% and the model finds multiple valid solutions; since it picks the solution that trusts the LFs the most, it will assume 80% accuracy, so the LF's mostly incorrect votes get weighted as if they were mostly correct.
There are also other ways of breaking ties, as explained in Training Complex Models with Multi-Task Weak Supervision, that can handle worse-than-random LFs under certain conditions.
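A quick numeric sketch of the failure mode (this is made-up data, not Snorkel's internals: `y_dev`, `lf_votes`, and the 0.5 cutoff are all illustrative) showing why a worse-than-random LF is dangerous under a trust-the-LF tie-break, and the simple guard of checking accuracy on a labeled dev set before fitting the label model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical binary task: gold dev labels in {0, 1}.
y_dev = rng.integers(0, 2, size=1000)

# A worse-than-random LF: it disagrees with the gold label ~80% of the time.
flip = rng.random(1000) < 0.8
lf_votes = np.where(flip, 1 - y_dev, y_dev)

# Empirical dev-set accuracy of this LF (~0.2 here).
acc = (lf_votes == y_dev).mean()

# A tie-break that "trusts" this LF treats it as ~(1 - acc) = ~0.8 accurate,
# so its mostly-wrong votes are weighted as mostly right.
# Simple guard: drop (or at least inspect) LFs below 0.5 dev accuracy.
keep_lf = bool(acc >= 0.5)
```

In practice you would compute the same per-LF accuracy column from a small labeled dev set before training the label model, and drop anything below 0.5 rather than hoping the model resolves the ambiguity in your favor.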