Building and managing training datasets for machine learning


Hi - newbie here (I'm a software engineer @Google).

December 18, 2019 at 8:57pm

I'm having trouble following the Get Started tutorial. Specifically, after 'pip install snorkel', "from utils import load_unlabeled_spam_dataset" still fails. Is there another Python package to install?
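For anyone hitting the same import error: `utils` here appears to be a local helper module that sits next to the tutorial script, not a separately installable package, so Python only finds it when the tutorial directory is on the import path. A minimal sketch of one way to arrange that; the helper name `make_importable` and the checkout location are assumptions, so adjust the path to wherever snorkel-tutorials was cloned.

```python
import sys
from pathlib import Path


def make_importable(directory):
    """Prepend `directory` to sys.path so modules stored in it can be imported."""
    resolved = str(Path(directory).expanduser().resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Assumed usage (the path below is a guess at the checkout location):
#   make_importable("~/Snorkel/snorkel-tutorials/getting_started")
#   from utils import load_unlabeled_spam_dataset
```

Starting Python from inside the tutorial directory has the same effect, because the interactive shell puts the current directory on the import path.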

December 18, 2019 at 11:25pm
I guess there isn't another Python package to install, so I just ran the Python interactive shell from the snorkel-tutorials/getting_started/ directory. Why am I getting a segfault when trying to import snorkel.labeling?
(snorkel) [email protected]:~/Snorkel/snorkel-tutorials/getting_started$ python
Python 3.7.5rc1 (default, Oct 2 2019, 04:19:31)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import snorkel
>>> dir(snorkel)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 'version']
>>> snorkel.__version__
'0.9.3'
>>> snorkel.__package__
'snorkel'
>>> import snorkel.labeling
Segmentation fault
(gdb) run
Starting program: /usr/local/google/home/voyager/snorkel/bin/python
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/".
[New Thread 0x7ffff4126700 (LWP 89752)]
[New Thread 0x7ffff3925700 (LWP 89753)]
[New Thread 0x7ffff1124700 (LWP 89754)]
[New Thread 0x7fffec923700 (LWP 89755)]
[New Thread 0x7fffea122700 (LWP 89756)]
[New Thread 0x7fffe7921700 (LWP 89757)]
[New Thread 0x7fffe5120700 (LWP 89758)]
[New Thread 0x7fffe291f700 (LWP 89759)]
[New Thread 0x7fffe011e700 (LWP 89760)]
[New Thread 0x7fffdd91d700 (LWP 89761)]
[New Thread 0x7fffdd11c700 (LWP 89762)]
[Thread 0x7fffdd11c700 (LWP 89762) exited]
[Thread 0x7fffdd91d700 (LWP 89761) exited]
[Thread 0x7fffe011e700 (LWP 89760) exited]
[Thread 0x7fffe291f700 (LWP 89759) exited]
[Thread 0x7fffe5120700 (LWP 89758) exited]
[Thread 0x7fffe7921700 (LWP 89757) exited]
[Thread 0x7fffea122700 (LWP 89756) exited]
[Thread 0x7fffec923700 (LWP 89755) exited]
[Thread 0x7ffff1124700 (LWP 89754) exited]
[Thread 0x7ffff3925700 (LWP 89753) exited]
[Thread 0x7ffff4126700 (LWP 89752) exited]
[Detaching after fork from child process 89769]
[Detaching after fork from child process 89777]

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007fffc56d1a68 in pybind11::detail::make_new_python_type(pybind11::detail::type_record const&) () from /usr/local/google/home/voyager/snorkel/lib/python3.7/site-packages/torch/lib/
I'm guessing there's a compatibility problem between Python 3.7.5rc1 and snorkel-0.9.3? Could the Snorkel team release a newer package that's compatible with Python 3.7.5?
I also tried installing snorkel in a virtualenv set to Python 3.6.10, and I still get the segfault.

December 21, 2019 at 4:09pm
Hi, thanks for posting! Not sure what's going on here, and I'd guess it's something orthogonal to our code, but if you're still running into this, I'd suggest posting to the issues page to see if anyone can suggest a fix!
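One way to check whether the crash really is orthogonal to Snorkel: the gdb trace earlier in the thread faults inside a library under site-packages/torch/lib/, so importing torch on its own in a fresh interpreter may reproduce it. The following is a minimal sketch, assuming a POSIX system (where a child killed by a signal reports a negative exit code); the helper names `import_exit_code` and `describe` are made up for illustration.

```python
import subprocess
import sys


def import_exit_code(module_name):
    """Import `module_name` in a fresh interpreter and return the child's exit code."""
    result = subprocess.run(
        # -X faulthandler asks the child to dump a Python traceback on a fatal signal.
        [sys.executable, "-X", "faulthandler", "-c", f"import {module_name}"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return result.returncode


def describe(code):
    """Turn a subprocess exit code into a short human-readable status."""
    if code == 0:
        return "imported cleanly"
    if code < 0:
        # On POSIX, a negative code means the child was killed by that signal number.
        return f"killed by signal {-code}"
    return f"failed with exit code {code}"


if __name__ == "__main__":
    # Import each layer separately, so a crash in one does not hide the others.
    for name in ("torch", "snorkel", "snorkel.labeling"):
        print(name, "->", describe(import_exit_code(name)))
```

If torch alone is killed by a signal, the problem most likely lies in the torch build or its environment and not in snorkel, which would be worth stating when filing the issue.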