Export Dataset Data
October 19, 2020 at 9:16pm
I'm working on a dataset with a few .csv files totalling ~30MB. I'd like to review the data and make a couple changes to it. I'm looking for the button to download the data so I can view it, edit it in Excel, and re-upload it. But it doesn't look like this is an option. Even if I export the dataset it only exports the metadata.
I can get by as is, I just need to store the data locally in parallel. But I'd like an option to download/access the data that's stored in the dataset without having to copy files programmatically.
October 19, 2020 at 9:39pm
Thanks for the feedback on an easier interface to access files in datasets.
While not super obvious, right now you can manually access the files in your dataset on your host. Datasets use a git repository to track metadata and a separate file cache to store the data. Files are linked at runtime into the proper naming and directory structure before mounting into a running Project container.
The files will be located in the dataset's file cache directory:
You should see a folder named "objects" and a folder that looks like a long hash (this hash should correspond to the latest version of your dataset git repository). If you open the folder with the long hash you should be able to see your files. You can copy a file somewhere, open it, edit it, and then drag and drop it back into your dataset. It is important to use the Client interface to do the update because that will trigger the versioning process.
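If you would rather script the copy step than do it by hand, here is a rough Python sketch. It is not part of Gigantum; the function names are made up for illustration, and it only assumes the layout described above (an "objects" folder next to one or more hash-named folders). You pass in the cache directory path yourself. It copies the files out instead of editing them in place, since the files in the hash folder are links into the cache.

```python
import shutil
from pathlib import Path


def find_revision_dirs(cache_dir):
    """Return the subfolders of cache_dir other than 'objects' (hypothetical helper)."""
    return sorted(
        p for p in Path(cache_dir).iterdir()
        if p.is_dir() and p.name != "objects"
    )


def copy_revision_files(revision_dir, dest_dir):
    """Copy every file under revision_dir into dest_dir, keeping relative paths."""
    revision_dir, dest_dir = Path(revision_dir), Path(dest_dir)
    copied = []
    for src in sorted(revision_dir.rglob("*")):
        if src.is_file():
            target = dest_dir / src.relative_to(revision_dir)
            target.parent.mkdir(parents=True, exist_ok=True)
            # copy2 follows links, so the result is an independent copy
            shutil.copy2(src, target)
            copied.append(target)
    return copied
```

Usage would look something like `copy_revision_files(find_revision_dirs(cache_dir)[0], "my_local_copy")`, where `cache_dir` is your dataset's file cache directory. If more than one hash folder shows up, pick the one matching the latest version of the dataset. After editing, still re-upload through the Client so the change gets versioned.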
I hope that helps!
Thanks. That's helpful. I also figured out via a previous version of your comment that the files can be downloaded via JupyterLab, which is also helpful. I also appreciate the consideration of a built-in way to download files through the Gigantum Client sometime in the future.