snorkel

Building and managing training datasets for machine learning

Announcing Snorkel Flow!

July 14, 2020 at 5:24pm

We made an exciting announcement on snorkel.org today! We've reposted that message below:
The Snorkel team is now focusing their efforts on Snorkel Flow, an end-to-end AI application development platform based on the core ideas behind Snorkel—check it out at snorkel.ai!
The Snorkel project started at Stanford in 2016 with a simple technical bet: that it would increasingly be the training data, not the models, algorithms, or infrastructure, that decided whether a machine learning project succeeded or failed. Given this premise, we set out to explore the radical idea that you could bring mathematical and systems structure to the messy and often entirely manual process of training data creation and management, starting by empowering users to programmatically label, build, and manage training data.
To say that the Snorkel project succeeded and expanded beyond what we had ever expected would be an understatement. The basic goals of a research repo like Snorkel are to provide a minimum viable framework for testing and validating hypotheses. Four years later, we’ve been fortunate to do not just this, but to develop and deploy early versions of Snorkel in partnership with some of the world’s leading organizations like Google, Intel, Stanford Medicine, and many more; author over thirty-six peer-reviewed publications on our findings around Snorkel and related innovations in weak supervision modeling, data augmentation, multi-task learning, and more; be included in courses at top-tier universities; support production deployments in systems that you’ve likely used in the last few hours; and work with an amazing community of researchers and practitioners from industry, medicine, government, academia, and beyond.
However, we increasingly realized, from conversations with users in weekly office hours, workshops, and online discussions, and with industry partners, that the Snorkel project was just the very first step. The ideas behind Snorkel change not just how you label training data, but so much of the entire lifecycle and pipeline of building, deploying, and managing ML: how users inject their knowledge; how models are constructed, trained, inspected, versioned, and monitored; how entire pipelines are developed iteratively; and how the full set of stakeholders in any ML deployment, from subject matter experts to ML engineers, is incorporated into the process.
Over the last year, we have been building the platform to support this broader vision: Snorkel Flow, an end-to-end machine learning platform for developing and deploying AI applications. Snorkel Flow incorporates many of the concepts of the Snorkel project with a range of newer techniques around weak supervision modeling, data augmentation, multi-task learning, data slicing and structuring, monitoring and analysis, and more, all of which integrate in a way that is greater than the sum of its parts, and that we believe makes ML truly faster, more flexible, and more practical than ever before.
Moving forward, we will be focusing our efforts on Snorkel Flow. We are extremely grateful for all of you that have contributed to the Snorkel project, and are excited for you to check out our next chapter at snorkel.ai.

July 14, 2020 at 5:30pm
Congratulations! Exciting!
Is this an open-core business model, where the core project stays open and premium features are sold? What are the implications for users, in plain speak?

July 15, 2020 at 3:51am
Hi, sorry for the delayed response here! First, the Snorkel OSS repo will remain as is, available under Apache 2.0.
As for Snorkel Flow: Functionally, it's not an open core model in the traditional sense, but a separate product. While it uses some components of the Snorkel OSS, and obviously integrates many of the core ideas, it's 95% new code and combines ideas from other Snorkel-related research we've done to build a full end-to-end ML development platform around programmatic training data. We'd love you to check it out at snorkel.ai and hope to show you a demo / more detail soon!
Thanks Alex and congrats, looking forward to the demo I signed up for.
Hi, congratulations on this new platform! Is this only for companies? I used your tutorials for some research, and I will probably use it for future studies too, but I am not affiliated with any company, so I was wondering whether it will be open to individuals and students (undergraduate, master's, PhD, etc.)? And will you continue to support the platform for the tutorials? Thank you in advance!

July 15, 2020 at 5:33pm
Thanks!! We always want to support non-corporate users esp. students and researchers (like ourselves!) as much as possible! In the short term: as noted above, the OSS repo will stay where it's at to be accessible to everyone. In the longer term: we are working on ways to make the new end-to-end platform Snorkel Flow specially accessible to non-profits, researchers, and students, once it's ready for this kind of prime time. Stay tuned!