Widely varying results when training multi-task model on same inputs
April 9, 2021 at 12:56am
Hello! I'm writing a paper comparing the efficacy of multi-task models with weak supervision to the performance of their single-task equivalents, and I've been using Snorkel v0.9.6 to explore this. However, I've run into a strange issue: when running my multi-task model (based heavily on the multi-task tutorial here: https://www.snorkel.org/use-cases/multitask-tutorial), the per-task accuracies vary widely, even across repeated runs on the exact same inputs. Sometimes the model performs perfectly well, with high or even perfect accuracy in its predictions; other times it fails to predict even a single positive example correctly (see picture below), and I cannot determine what might be causing this variation.
My data is a set of time-series measurements containing round-trip times from ping measurements, but I've been able to replicate this issue with entirely synthetic data as well. I've posted my "testing" notebook here (https://github.com/j-red/MTL-Demo/blob/master/MTL%20Demo.ipynb), in case it helps shed light on anything. Is there a likely cause for such extreme variability in the model's predictions? This is my first deep foray into machine learning, so it's entirely plausible I've missed something quite obvious to others more experienced in the field. Any insight into what might be causing this would be much appreciated!
[Image: variation in model results across repeated runs]
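One sanity check worth noting here: run-to-run variation on identical inputs can come purely from unseeded random weight initialization. A minimal numpy sketch of the idea follows — the toy function is illustrative, not Snorkel's API; for the real model you would also seed `random`, `numpy`, and call `torch.manual_seed` before constructing it:

```python
import numpy as np

def train_toy_model(seed=None):
    # Stand-in for model training: the "learned" weights are drawn
    # from the RNG, much as a framework draws initial layer weights.
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=5)
    return weights

# Unseeded runs: "same inputs", different results each time.
a = train_toy_model()
b = train_toy_model()
print(np.allclose(a, b))  # almost certainly False

# Seeded runs: identical results on every run.
c = train_toy_model(seed=42)
d = train_toy_model(seed=42)
print(np.allclose(c, d))  # True
```

If seeding everything makes the runs identical, the variability is coming from initialization or data shuffling rather than from the data itself.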
April 9, 2021 at 3:17am
Hi Jared, that sounds like a great topic for a paper! As you know, there are a whole lot of potential sources of bugs in a full ML pipeline like this. A few questions I would have off the bat:
- Have you looked at the predictions, not just the scores? (e.g., is it possible the model is predicting NaNs somehow)
- Have you looked at the loss curves while training? (e.g., your learning rate may be too high, causing instability and divergence)
- How many classes are in your problem?
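On the first bullet, here is a generic sketch of how one might sanity-check the raw predictions, assuming they come back as an `(n_examples, n_classes)` array of class probabilities (the function and variable names are illustrative, not part of the Snorkel API):

```python
import numpy as np

def check_predictions(probs):
    """Flag common failure modes in an (n_examples, n_classes) probability array."""
    probs = np.asarray(probs, dtype=float)
    has_nan = bool(np.isnan(probs).any())
    report = {
        "has_nan": has_nan,
        "has_inf": bool(np.isinf(probs).any()),
        # A degenerate model often predicts a single class for everything,
        # which also produces zero correct positives on an all-negative guess.
        "n_distinct_predictions": (
            int(np.unique(np.argmax(probs, axis=1)).size) if not has_nan else 0
        ),
    }
    return report

healthy = np.array([[0.9, 0.1], [0.2, 0.8]])
broken = np.array([[np.nan, 0.5], [0.3, 0.7]])
print(check_predictions(healthy))
print(check_predictions(broken))
```

`has_nan`/`has_inf` catch numerical blow-ups, and `n_distinct_predictions == 1` catches the "never predicts a positive" failure mode described above.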
Hi Braden, thanks so much for your quick reply! So far I've looked into the predictions, which all seem to be returning either 0 or 1 (as they should), even on the erroneous tasks. All of these separate tasks are binary classification problems, so each should output 0 (False) or 1 (True) depending on whether the property in question appears to be present. I haven't looked at the loss curves yet, though, so I'll investigate that route next to see whether it's the culprit. I was having a bit of trouble identifying the best way to do that with Snorkel's multi-task methodology; any tips or pointers on where I could begin? Thank you!
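I'm not certain of the exact logging hooks in Snorkel v0.9.6, but a framework-agnostic way to start is simply to collect the per-batch losses during training and check whether they trend down or blow up. A minimal sketch (all names here are illustrative):

```python
def loss_trend(losses, window=10):
    """Ratio of the mean loss over the last `window` batches to the first.

    A ratio well above 1.0 suggests the loss is diverging, which often
    points to a learning rate that is too high.
    """
    if len(losses) < 2 * window:
        raise ValueError("not enough batches to compare")
    start = sum(losses[:window]) / window
    end = sum(losses[-window:]) / window
    return end / start

# Converging run: loss steadily decreasing.
converging = [1.0 / (1 + 0.1 * i) for i in range(100)]
# Diverging run: loss blowing up batch over batch.
diverging = [1.05 ** i for i in range(100)]

print(loss_trend(converging) < 1.0)  # True
print(loss_trend(diverging) > 1.0)   # True
```

If the "bad" runs show a diverging trend while the "good" runs converge, lowering the learning rate (or adding gradient clipping) would be the natural next experiment.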