
snorkel

Building and managing training datasets for machine learning

Channels: # announcements · # api · # applications · # help · # projects · # tutorials

Announcing Snorkel v0.9!

We’re excited to announce the release of Snorkel v0.9 today! Snorkel v0.9 integrates our recent research advances and Snorkel-based open source projects into one modern Python library for building and managing training datasets. Alongside the release is a new homepage at…

11 likes · 0 replies

Non-Python LFs

I'm interested in using Snorkel with a label matrix produced by non-Python functions. Is this something the Snorkel team has heard about, and if so, would you mind pointing me in the right direction?

1 like · 1 reply
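
One possible route (a sketch, not an official recommendation): since the `LabelModel` consumes an integer label matrix rather than the LFs themselves, votes produced by non-Python systems can be exported to that matrix and fit directly. The array below is toy data, and the import path may vary slightly across v0.9.x releases.

```python
import numpy as np
from snorkel.labeling.model import LabelModel  # import path may differ by version

# Toy (n_examples x n_lfs) label matrix: values in {0, 1}, -1 means "abstain".
# It could have been produced by any system (SQL rules, Java UDFs, Spark jobs, ...).
L_train = np.array([
    [ 1, -1,  0],
    [ 1,  1, -1],
    [ 0, -1,  0],
    [-1,  0,  0],
])

label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train, n_epochs=500, seed=123)
probs = label_model.predict_proba(L_train)  # probabilistic training labels
```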

Dependency modeling between LFs

I am interested in using Snorkel for a sentence classification task and I would like to understand better how the generative model handles correlated LFs. The documentation for the Label Model at…

0 likes · 5 replies
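
Not an answer about the model internals, but a sketch of how correlated LFs can at least be surfaced with `LFAnalysis` (overlap and conflict rates) before deciding whether to merge or drop them. The label matrix here is toy data.

```python
import numpy as np
from snorkel.labeling import LFAnalysis

# Toy (n_examples x n_lfs) label matrix; -1 means the LF abstained.
L_train = np.array([
    [ 1,  1, -1],
    [ 1,  1,  0],
    [ 0,  0,  0],
    [-1,  1,  1],
])

summary = LFAnalysis(L=L_train).lf_summary()
# High Overlaps with low Conflicts between two LFs suggests they are strongly correlated.
print(summary[["Coverage", "Overlaps", "Conflicts"]])
```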

Has anyone tried doing NER with Snorkel? If so, can you provide an example?

2 likes · 1 reply
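
There isn't a worked example in this listing, but here is a rough, hypothetical sketch of one common framing: treat NER as classifying candidate spans with LFs. The candidate attributes (`tokens`, `span_start`, `span_text`) and the label constants are assumptions for illustration, not a Snorkel API.

```python
from snorkel.labeling import labeling_function

ABSTAIN, NOT_PERSON, PERSON = -1, 0, 1
TITLES = {"mr.", "mrs.", "ms.", "dr.", "prof."}

@labeling_function()
def lf_title_before_span(x):
    # Vote PERSON if the token just before the candidate span is an honorific.
    prev = x.tokens[x.span_start - 1].lower() if x.span_start > 0 else ""
    return PERSON if prev in TITLES else ABSTAIN

@labeling_function()
def lf_lowercase_span(x):
    # Vote NOT_PERSON if the candidate span is entirely lowercase.
    return NOT_PERSON if x.span_text.islower() else ABSTAIN
```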

Hi - newbie here (I'm a software engineer @Google).

I'm having trouble following the Get Started tutorial. Specifically, after 'pip install snorkel', "from utils import load_unlabeled_spam_dataset" still fails. Is there another Python package to install?

0 likes · 6 replies
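
For context, `utils` is not a separate pip package; it appears to be a helper module that ships alongside the tutorials in the snorkel-tutorials repository (github.com/snorkel-team/snorkel-tutorials), so the import resolves when the notebook is run from inside that repo. Below is a hypothetical stand-in loader if you only need your own unlabeled text in a DataFrame; the file name and column are assumptions.

```python
# One fix: run the tutorial from inside the cloned snorkel-tutorials repo:
#   git clone https://github.com/snorkel-team/snorkel-tutorials.git
# Hypothetical stand-in if you just want your own unlabeled text:
import pandas as pd

def load_unlabeled_spam_dataset(path: str = "comments.csv") -> pd.DataFrame:
    """Stand-in loader: expects a CSV with a `text` column of raw comments."""
    return pd.read_csv(path)

df_train = load_unlabeled_spam_dataset()
```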

Guidelines for using Snorkel with a very imbalanced dataset

I'm using Snorkel to label a very imbalanced dataset (~1 positive per 500 negatives). I found this paper on the Snorkel website, which is useful: https://ajratner.github.io/assets/papers/Osprey_DEEM.pdf. They define a balanced 'synthetic' training set by over-sampling certain…

0 likes · 3 replies
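
Separate from over-sampling for the end model, one option on the label-model side is to pass the estimated class prior to `LabelModel.fit` via its `class_balance` argument. A toy sketch with roughly 1 positive per 500 negatives:

```python
import numpy as np
from snorkel.labeling.model import LabelModel

# Toy label matrix; -1 means the LF abstained.
L_train = np.array([
    [ 0, -1,  0],
    [ 0,  0, -1],
    [ 1,  1,  0],
    [ 0, -1,  0],
])

label_model = LabelModel(cardinality=2, verbose=False)
# class_balance = [P(Y=0), P(Y=1)]; ~1 positive per 500 negatives.
label_model.fit(L_train, n_epochs=500, seed=123, class_balance=[500 / 501, 1 / 501])
probs = label_model.predict_proba(L_train)
```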

Guidelines for developing Labeling Functions

I'm experimenting with Snorkel to train a classifier for an NLP task with 6 classes. The classes are imbalanced (45%, 26%, 17%, 10%, 1%, 1%). I wrote 52 regex LFs with ~86% coverage on my unlabeled data. The LFs are distributed as follows: Class A: 14 …

0 likes · 1 reply
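
A minimal sketch of the iteration loop this usually comes down to: write a regex LF with the `@labeling_function` decorator, apply the LFs with `PandasLFApplier`, and inspect per-LF coverage, overlaps, and conflicts with `LFAnalysis`. The DataFrame, regex, and class constants below are made up for illustration.

```python
import re
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier, LFAnalysis

ABSTAIN, CLASS_A = -1, 0  # made-up class constants

@labeling_function()
def lf_refund_means_class_a(x):
    # Vote CLASS_A when the text mentions a refund; otherwise abstain.
    return CLASS_A if re.search(r"\brefund\b", x.text, flags=re.I) else ABSTAIN

lfs = [lf_refund_means_class_a]
df_train = pd.DataFrame({"text": ["Please issue a refund", "Great product, thanks!"]})

L_train = PandasLFApplier(lfs=lfs).apply(df=df_train)
# Per-LF coverage, overlaps, and conflicts; pass Y=... to also get empirical accuracy.
print(LFAnalysis(L=L_train, lfs=lfs).lf_summary())
```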

Some questions about the output of the generative model

While experimenting with Snorkel, I found that the outputs of different labeling functions differ quite a bit. The generative model seems to fit across these differing outputs and produce a compromise result, which is not necessarily better than the one with the…

1 like · 1 reply
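
One sanity check that is often useful here (a sketch on toy arrays): compare the fitted `LabelModel` against a simple `MajorityLabelVoter` on a small hand-labeled dev set. If the learned model does not beat majority vote, the "compromise" concern is probably real for that LF set. Import paths may differ slightly across v0.9.x releases.

```python
import numpy as np
from snorkel.labeling.model import LabelModel, MajorityLabelVoter

# Toy dev-set label matrix and gold labels; -1 means abstain.
L_dev = np.array([
    [ 1,  1,  0],
    [ 0,  0,  1],
    [ 1, -1,  1],
    [ 0,  0, -1],
])
Y_dev = np.array([1, 0, 1, 0])

majority = MajorityLabelVoter(cardinality=2)
label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_dev, n_epochs=100, seed=123)  # normally you would fit on L_train

mv_acc = (majority.predict(L=L_dev) == Y_dev).mean()
lm_acc = label_model.score(L=L_dev, Y=Y_dev, metrics=["accuracy"])["accuracy"]
print(f"majority vote: {mv_acc:.2f}  label model: {lm_acc:.2f}")
```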

Reason for at least three labeling functions in L_train while fitting a…

0 likes · 2 replies

Deploy multiple instances of a trained MTL Classifier, each serving one task

Let's say I've trained an MTL model on N tasks. I plan to deploy this model in a distributed setup such that on each node, I make predictions for data corresponding to only one task. Two scenarios that I'm currently pondering over (not sure whether these are optimal): 1) Say I've…

0 likes · 2 replies
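
This is not Snorkel's `MultitaskClassifier` API, just a generic PyTorch sketch of the underlying idea behind this kind of deployment: a shared trunk plus per-task heads, exported so that each serving node loads the trunk and only its own head. All names and shapes here are assumptions.

```python
import torch
import torch.nn as nn

class MTLModel(nn.Module):
    """Shared trunk with one classification head per task (illustrative only)."""

    def __init__(self, n_tasks: int, in_dim: int = 128, hidden: int = 64, n_classes: int = 2):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden, n_classes) for _ in range(n_tasks))

    def forward(self, x: torch.Tensor, task: int) -> torch.Tensor:
        return self.heads[task](self.trunk(x))

model = MTLModel(n_tasks=3)
# Export the shared trunk once plus one file per head, so node k only loads
# trunk.pt and head_k.pt instead of the full multi-task checkpoint.
torch.save(model.trunk.state_dict(), "trunk.pt")
for k, head in enumerate(model.heads):
    torch.save(head.state_dict(), f"head_{k}.pt")
```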

Should all LFs with less than 0.5 accuracy in a binary classification be thrown…

I have a question. Sometimes I write LFs for a binary classification and they come out just about even, maybe slightly worse than random (0.49, for example). While these don't work alone, I wonder whether they might still contribute to the model if Snorkel understands the…

1 like · 1 reply
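
One way to look at this empirically (a sketch on toy data, assuming a small hand-labeled dev set is available): check each LF's empirical accuracy with `LFAnalysis(...).lf_summary(Y=...)`, then inspect the weights the fitted `LabelModel` assigns via `get_weights()`, so you can see whether near-random LFs are being down-weighted rather than needing to be thrown out by hand.

```python
import numpy as np
from snorkel.labeling import LFAnalysis
from snorkel.labeling.model import LabelModel

# Toy dev-set label matrix (3 LFs) and gold labels; -1 means abstain.
L_dev = np.array([
    [ 1, -1,  0],
    [ 0,  1,  0],
    [ 1,  1,  1],
    [ 0, -1,  1],
])
Y_dev = np.array([1, 0, 1, 0])

# Per-LF empirical accuracy, coverage, and conflicts against the gold labels.
print(LFAnalysis(L=L_dev).lf_summary(Y=Y_dev))

label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_dev, n_epochs=100, seed=123)  # normally fit on L_train
# Learned per-LF weights; check whether the near-random LF ends up with a low weight.
print(label_model.get_weights())
```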