snorkel

Building and managing training datasets for machine learning

Announcing Snorkel Flow!

We made an exciting announcement on snorkel.org today! We've reposted that message below: The Snorkel team is now focusing their efforts on Snorkel Flow, an end-to-end AI application development platform based on the core ideas behind Snorkel—check it out at snorkel.ai! The…

Likes: 5 · Replies: 6

Does LabelModel accept a large number of classes?

The LabelModel can't seem to generate anything with a large number of classes (n=12). Is there any constraint on the number of classes in this algorithm? I can't figure out why.

Likes: 0 · Replies: 0

Scalability issue with the labeling model

I have 150K records and 12 LFs for 12 classes. The labeling model runs forever and then crashes every time (with epochs set to only 100). With 8 classes there is no problem. Is there any way to solve the scalability issue? Or are there other reasons that caused…

Likes: 0 · Replies: 0

Can a data point receive more than one label by the Labeling Functions?

Hi, I am wondering about the case where we have multiple labels and the relation among them is non-exclusive. Can LFs give a data point several different values (all true), and can the model then produce all the correct labels? For the moment, I consider it not possible because the LabelModel (a…

Likes: 0 · Replies: 2

Small to medium datasets

I wonder if Snorkel can be used on small datasets containing 11k rows. Will it be effective?

Likes: 0 · Replies: 1

Fresh installation of snorkel gives errors.

I just installed snorkel with the following python environment. 3.6.0 (v3.6.0:41df79263a11, Dec 22 2016, 17:23:13) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] …

Likes: 0 · Replies: 2

Spark tutorial?

Hi, I want to use Snorkel with a PySpark dataframe. I came across the SparkLFApplier, but I'm not sure what else I need to keep in mind. Is there a tutorial that demonstrates the use of Snorkel with Spark? It would be really helpful if someone could point me to some good resources…

Likes: 0 · Replies: 2

Strategies for finding the best n-grams for keyword functions?

I'm working like hell to find enough LFs for the GENERAL label (as in general purpose open source library) compared to API (as in API-specific open source library) for Amazon's GitHub repositories as my dataset, to create labels for a discriminative classifier for Amazon GitHub…

Likes: 1 · Replies: 2

How to handle unbalanced data?

I found that my labeling results are pretty bad when I apply several labeling functions to unbalanced datasets. For example, one LF fires on only 20+ out of thousands of records. Has anyone had the same issue? What's the best practice?

Likes: 0 · Replies: 10
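
The "20+ out of thousands" figure in the post above is what is usually called an LF's coverage: the fraction of records on which it does not abstain. As a rough illustration only, here is a minimal pure-Python sketch of that calculation. It assumes a label matrix with one row per record and one column per LF, and -1 as the abstain value (Snorkel's convention); the helper name `lf_coverage` and the toy matrix `L_toy` are hypothetical and not taken from the thread.

```python
ABSTAIN = -1  # assumed abstain value, following Snorkel's convention


def lf_coverage(label_matrix):
    """Return, for each LF (column), the fraction of rows where it voted."""
    n_rows = len(label_matrix)
    n_lfs = len(label_matrix[0]) if n_rows else 0
    return [
        sum(1 for row in label_matrix if row[j] != ABSTAIN) / n_rows
        for j in range(n_lfs)
    ]


# Hypothetical toy label matrix: 4 records (rows) x 2 LFs (columns).
L_toy = [
    [0, ABSTAIN],
    [1, ABSTAIN],
    [ABSTAIN, ABSTAIN],
    [0, 1],
]

coverage = lf_coverage(L_toy)
print(coverage)
```

Checking coverage per LF this way can show whether a rare class is served only by LFs that almost never fire, which is one plausible reading of the problem described in the post.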

Why doesn't snorkel use ground-truth data to train the generative model?

In Ratner et al. (2020) it says "This step uses no ground-truth data, learning instead from the agreements and disagreements of the labeling functions". I just wonder what the rationale for this is. Wouldn't it help to model the correlations between the weak supervision…

Likes: 0 · Replies: 2