Neptune in shared environment and computational resources

August 4, 2019 at 7:47pm

August 4, 2019 at 7:49pm
Let me quote it here:
I think a lot of people work in an environment where computational resources are shared and accessed through a job system like Slurm. To keep track of experiments in such a system, it is vital to be able to easily continue experiments (say, in case my job was suspended to give way to a higher-priority job). I really like the dashboard on neptune.ml currently, but this could be a deal breaker for me in the future. So better support for continuing experiments (logging stdout, stderr, and system info, and updating the status of resumed experiments) would be a big improvement for me (and I think for many other users as well).
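To make the "continue an experiment" case concrete, here is a minimal sketch of how a training script might detect that it is a requeued Slurm job and derive a stable key for finding its earlier experiment. It assumes the job is requeued rather than suspended in place, and it relies only on the `SLURM_JOB_ID` and `SLURM_RESTART_COUNT` environment variables that Slurm sets. The `resume_info` helper and the `run_key` format are hypothetical and not part of Neptune.

```python
import os


def resume_info(env=None):
    """Describe whether the current Slurm job is a fresh start or a requeue.

    `resume_info` and the `run_key` format are illustrative assumptions,
    not a Neptune or Slurm API.
    """
    env = os.environ if env is None else env
    job_id = env.get("SLURM_JOB_ID")
    # Slurm sets SLURM_RESTART_COUNT only after a job has been restarted or requeued.
    restart_count = int(env.get("SLURM_RESTART_COUNT", "0"))
    return {
        # A requeued job keeps its job ID, so it can serve as a stable lookup key.
        "run_key": f"slurm-{job_id}" if job_id else None,
        "is_resumed": restart_count > 0,
        "restart_count": restart_count,
    }


# Example with an explicit environment, so it also runs outside Slurm.
info = resume_info({"SLURM_JOB_ID": "4242", "SLURM_RESTART_COUNT": "1"})
print(info)
```

A script could store `run_key` with the experiment on first start and use it after a requeue to decide which experiment to continue.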
On a general level, I see that Neptune needs to provide a better experience for users who work on shared resources, in particular by letting them re-open or continue a suspended or stopped experiment.
We already have some support. For example, your stdout, stderr, and git info are tracked automatically. Check this example experiment: https://ui.neptune.ml/o/neptune-ml/org/credit-default-prediction/e/CRED-123/monitoring
Let me pose a question: would you prefer to re-open / continue an experiment, or to link multiple experiments together? The idea behind the latter is that if your experiment runs in multiple chunks, you have one experiment per chunk, but all of them are linked together, so you can analyse them as a single super-experiment.
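To illustrate the "linked chunks" option, here is a rough sketch in plain Python of how per-chunk experiments could be grouped and read as one super-experiment. It is not Neptune code: the record layout and the `run_group`, `chunk_index`, and `loss` fields are made up for the example.

```python
from collections import defaultdict


def merge_chunks(chunks):
    """Concatenate the metric series of linked chunks, in chunk order.

    Each chunk is a dict with hypothetical keys: `run_group` (the shared
    link), `chunk_index` (position within the group) and `loss` (a list
    of logged values).
    """
    merged = defaultdict(list)
    ordered = sorted(chunks, key=lambda c: (c["run_group"], c["chunk_index"]))
    for chunk in ordered:
        merged[chunk["run_group"]].extend(chunk["loss"])
    return dict(merged)


# Three chunk experiments; the first two belong to the same logical run.
chunks = [
    {"id": "EXP-2", "run_group": "resnet-a", "chunk_index": 1, "loss": [0.5, 0.4]},
    {"id": "EXP-1", "run_group": "resnet-a", "chunk_index": 0, "loss": [0.9, 0.7]},
    {"id": "EXP-3", "run_group": "resnet-b", "chunk_index": 0, "loss": [1.1]},
]
super_experiments = merge_chunks(chunks)
print(super_experiments)
```

With linking, each chunk keeps its own stdout, stderr, and hardware metrics, and the combined view is built only when you analyse the group.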
Regarding the system information needed: right now Neptune tracks the hostname and hardware utilization metrics. What additional information do you need to have in Neptune?