Announcing Snorkel Flow!
We made an exciting announcement on snorkel.org today! We've reposted that message below: The Snorkel team is now focusing their efforts on Snorkel Flow, an end-to-end AI application development platform based on the core ideas behind Snorkel—check it out at snorkel.ai! The…
Scalability issue with the labeling model
I have 150K records and 12 LFs for 12 classes. The labeling model runs forever and then crashes every time (epochs set to only 100). With 8 classes there is no problem. Is there any way to solve this scalability issue? Or are there other reasons that caused…
Can a data point receive more than one label by the Labeling Functions?
Hi, I am wondering about the case where we have multiple labels and the labels are not mutually exclusive. Can LFs give a data point several different values (all true), and can the model then produce all the correct labels? For the moment, I think it's not possible because the LabelModel (a…
Spark tutorial?
Hi, I want to use Snorkel with a PySpark DataFrame. I came across the SparkLFApplier, but I'm not sure what else I need to keep in mind. Is there a tutorial that demonstrates using Snorkel with Spark? It would be really helpful if someone could point me to some good resources…
Strategies for finding the best n-grams for keyword functions?
I'm working like hell to find enough LFs for the GENERAL label (as in general-purpose open source library) as opposed to API (as in API-specific open source library), using Amazon's GitHub repositories as my dataset, to create labels for a discriminative classifier for Amazon GitHub…
Why doesn't Snorkel use ground-truth data to train the generative model?
In Ratner et al. (2020) it says "This step uses no ground-truth data, learning instead from the agreements and disagreements of the labeling functions". I just wonder what the rationale for this is. Wouldn't it help to model the correlations between the weak supervision…