Loading remote data?
November 18, 2019 at 5:02pm
I am running Gigantum on a remote machine. However, when I try to add a file in input data, it brings up my local filesystem. Seeing as the data I want to load is way too big to store on my local machine, how can I load data sitting on the remote machine?
November 18, 2019 at 5:33pm
- So this isn't supported super well at the moment, but we have been working on a feature for this problem. For now, the workaround would be to copy your data into place manually, and then trigger a version to be created.
- SSH into your remote machine
- If you want to track and version the files, you'd copy to
- If you DO NOT want to track and version the files (which sounds like your case because they are big), at the moment you should copy to ~/gigantum/<username>/<username>/labbooks/<project name>/output/untracked. This is a bit of an anti-pattern, but at the moment it's the only place that is not tracked, synced, or versioned by default. You can probably also just symlink your data there instead if you want to avoid a copy.
- Start your project container and then stop it. This guarantees everything is clean and versioned properly.
- Finally refresh the page and you should see your data.
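For anyone who prefers to script the copy step, here is a minimal sketch in Python. It only covers placing the data. You still need to start and stop the project container and refresh the page by hand. The path layout comes from the steps above. The function names `untracked_dir` and `place_data` are hypothetical helpers made up for this illustration and are not part of Gigantum. The symlink option is also an assumption: whether a link resolves inside the project container depends on the link target being visible there, so test it on a small file first.

```python
import shutil
from pathlib import Path


def untracked_dir(base: Path, username: str, project: str) -> Path:
    """Build the untracked output folder for a project.

    `base` stands for the ~/gigantum directory on the remote machine;
    the layout below is the one quoted in the steps above.
    """
    return base / username / username / "labbooks" / project / "output" / "untracked"


def place_data(source: Path, dest_dir: Path, symlink: bool = False) -> Path:
    """Copy (or symlink) `source` into `dest_dir` and return the new path."""
    if not dest_dir.is_dir():
        raise FileNotFoundError(f"Destination does not exist: {dest_dir}")
    target = dest_dir / source.name
    if symlink:
        # Assumption: the link target must also be reachable from inside
        # the project container for this to be useful.
        target.symlink_to(source.resolve(), target_is_directory=source.is_dir())
    elif source.is_dir():
        shutil.copytree(source, target)
    else:
        shutil.copy2(source, target)
    return target
```

Run on the remote machine after SSHing in, a call would look like `place_data(Path("/data/experiment1"), untracked_dir(Path.home() / "gigantum", "<username>", "<project name>"))`, with the placeholders filled in for your own account and project.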
If your data is large and you DO want to version it and sync it, you can use a Dataset. If you want to go this route let me know and I can provide a similar workaround for Datasets.
Finally, as I mentioned, we are working on a feature for this use case. It is a Dataset type that maps to local storage only. It tracks data, letting you know if things have changed, but does not fully version contents. It also prevents moving data around (which is good if the data is large or sensitive), but if the data is available locally, it will automatically link and mount files as needed at runtime.
Does this sound like it would meet your needs? How big is "big" in your case? Do you ever want to version and sync your data, or are you happy just having access to it locally?
Also, our data is not that big (think the 10 GB to 100 GB range), but it is usually large enough that we avoid storing it on our desktops. All our data is generally stored on remote machines and backed up properly anyway, so none of us ever go through the trouble of storing it on our personal machines.
November 28, 2019 at 2:54am
Hopefully it's OK if I add to this question. I haven't used Docker before, so I'm a bit fuzzy on the technicalities of what parts, if any, of your local file system are accessible. Basically my question is: can I just provide an absolute path to a file on my desktop system, which is running the Client and the Project, and open that file, or equivalently a file on my company's network file system? Or does it have to be within the Project's Docker file system only (e.g., as per the /input folder)?
November 28, 2019 at 2:19pm
- Always ok to ask questions! It can be a bit confusing where things are because so much is happening automatically behind the scenes. Everything lives on your local file system and is mounted into Docker containers at runtime as needed. So if you go to your user directory, you should see a gigantum directory. In here is all of your data. When the Client starts, this entire directory is mounted into the Client container. When a Project starts, just the individual Project directory is mounted into the Project container.
Right now to access files in a Project they do need to either be IN the Project (so added to the input, code, or output sections) or part of a Dataset that is linked to the Project.
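To make the mounting rule concrete, here is a small sketch that checks whether a given path falls inside a Project directory, which is the only directory described above as mounted into the Project container. The helper name `visible_in_project` and the example paths are made up for illustration. The check deliberately ignores linked Datasets and is only a rough approximation of what the container can see.

```python
from pathlib import Path


def visible_in_project(path: Path, project_dir: Path) -> bool:
    """Return True if `path` lies inside `project_dir`.

    Symlinks and ".." segments are resolved first, so a link that points
    outside the Project directory is reported as not visible.
    """
    try:
        path.resolve().relative_to(project_dir.resolve())
        return True
    except ValueError:
        return False


# Hypothetical example paths, following the layout mentioned earlier in the thread.
project = Path("/home/me/gigantum/me/me/labbooks/demo")
inside = visible_in_project(project / "input" / "data.csv", project)
outside = visible_in_project(Path("/mnt/company-share/data.csv"), project)
```

Under this rule, an absolute path to a file elsewhere on the desktop or on a network share comes back as not visible, which matches the answer above: the file has to be in the Project or in a linked Dataset.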
We have frequently heard this request to access data without full versioning, including data on a networked file system. We are working on a new Dataset type that lets you point to folders that you can access on your local file system. It will track when files change and let you know, but it won't fully version them. This means certain features, such as rollback, won't be supported. We think this will meet the needs of most people in this scenario.
Let me know if that doesn't make sense or if you have any other questions.
November 28, 2019 at 8:45pm
December 9, 2019 at 6:35pm