Announcing Snorkel v0.9!
We’re excited to announce the release of Snorkel v0.9 today! Snorkel v0.9 integrates our recent research advances and Snorkel-based open source projects into one modern Python library for building and managing training datasets. Alongside the release is a new homepage at…
Weighting labeling functions
Is it possible to assign weights to labeling functions? Say we have labeling functions L1 to L10 generating noisy results. If L1 to L9 were written by a novice domain SME and L10 was written by an experienced domain expert, can we use…
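Snorkel's LabelModel learns per-LF accuracies from the label matrix rather than taking manual weights, but as a rough illustration of what hand-weighting LF outputs could look like, here is a minimal NumPy sketch of a weighted vote. The `weighted_vote` helper and the weight values are hypothetical, not a Snorkel API:

```python
import numpy as np

ABSTAIN = -1

def weighted_vote(L, weights, cardinality=2):
    """Combine a label matrix L (n x m, entries in {-1, 0..k-1})
    with per-LF weights; abstentions (-1) contribute nothing."""
    n, m = L.shape
    scores = np.zeros((n, cardinality))
    for j in range(m):
        for c in range(cardinality):
            scores[:, c] += weights[j] * (L[:, j] == c)
    # Abstain on a data point only if every LF abstained on it.
    return np.where(scores.sum(axis=1) > 0, scores.argmax(axis=1), ABSTAIN)

# Two data points, three LFs; the last (expert) LF outvotes the others.
L = np.array([[0, 0, 1],
              [1, 1, 0]])
weights = np.array([1.0, 1.0, 5.0])
preds = weighted_vote(L, weights)
print(preds)  # [1 0]: the weight-5 expert LF wins both ties
```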
Understanding buckets key in error_analysis.py
In the error_analysis.py class documentation for the get_label_buckets method, it's mentioned that "The returned buckets[(i, j)] is a NumPy array of data point indices with predicted label i and true label j." Shouldn't i and j be the true label and the predicted label,…
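Whatever the docstring wording, the bucketing operation itself is easy to reproduce: indices are grouped by their tuple of labels across the input arrays, with keys following argument order. A minimal NumPy sketch (the `label_buckets` helper is illustrative, not Snorkel's implementation):

```python
import numpy as np
from collections import defaultdict

def label_buckets(*ys):
    """Group data point indices by their tuple of labels across the
    given arrays; bucket keys follow the argument order of *ys."""
    buckets = defaultdict(list)
    for i, key in enumerate(zip(*ys)):
        buckets[tuple(int(v) for v in key)].append(i)
    return {k: np.array(v) for k, v in buckets.items()}

y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0, 1, 0, 1])
buckets = label_buckets(y_true, y_pred)
# Since y_true was passed first, buckets[(1, 0)] holds the indices
# with true label 1 and predicted label 0 (a false negative bucket).
print(buckets[(1, 0)])  # [2]
```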
Where can I find the code for older papers?
The LabelModel in the latest version of Snorkel code is based on the new AAAI 2019 paper. I wanted to ask where I can find the code for the following papers: 1. Snorkel: Rapid Training Data Creation with Weak Supervision 2. Learning the structure of Generative Models without…
How to use the `PandasParallelApplier`
Is there any trick to using the PandasParallelLFApplier? The results from the normal PandasLFApplier (commented out above) are great. However, the results from the PandasParallelLFApplier seem a bit shuffled… Am I doing it right? Are the rows in L returned from apply in…
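Independent of the applier internals, one quick way to check whether a parallel apply preserved row order is to write an LF that "labels" each row with its own index and compare the output against the original order. A minimal pandas sketch simulating a partitioned apply (the identity LF is purely illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"idx": np.arange(10)})

def lf_identity(row):
    # An LF that labels each row with its own index, so any
    # reordering introduced by the applier shows up in the output.
    return int(row.idx)

# Simulate a partitioned apply: slice, apply per chunk, re-concatenate.
parts = [df.iloc[i:i + 4] for i in range(0, len(df), 4)]
L = np.concatenate([p.apply(lf_identity, axis=1).to_numpy() for p in parts])

# If row order is preserved, L[i] equals the original row index i.
aligned = bool((L == df["idx"].to_numpy()).all())
print(aligned)  # True
```

Running the same diagnostic LF through the parallel applier would reveal whether its output rows are permuted relative to the input DataFrame.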
Snorkel w/ Dask using Large KBs
I'm using the Snorkel Dask interface w/ a Local Cluster. My heuristic labelers run very fast in parallel; however, when I add my labeler that uses a large (~50MB) knowledge base (dict), the job doesn't even seem to start and just sits w/ 1 CPU at 100%. I suspect it's trying to…