Skip to content

concat dataset#1411

Open
yueyinqiu wants to merge 10 commits into
dotnet:mainfrom
yueyinqiu:dataset
Open

concat dataset#1411
yueyinqiu wants to merge 10 commits into
dotnet:mainfrom
yueyinqiu:dataset

Conversation

@yueyinqiu

@yueyinqiu yueyinqiu commented Nov 15, 2024

Copy link
Copy Markdown
Contributor
  • add support for torch.utils.data.ConcatDataset
    • a new interface IDataset<out T> is added
      • now Dataset<T> implements IDataset<T>
    • more overloads of DataLoader() has been added, to accept IDataset datasets
      • those overloads directly return a DataLoader<T, S>, rather than its subclasses
    • datasets supported by DataLoader<T, S> have been relaxed to IDataset<T>
  • parameter of collate functions in DataLoader<T, S> has been relaxed to IReadOnlyList

@yueyinqiu yueyinqiu changed the title concat dataset (WIP) concat dataset Nov 15, 2024
@yueyinqiu

yueyinqiu commented Nov 15, 2024

Copy link
Copy Markdown
Contributor Author

related to #1348 (comment) #1354 #1358

@yueyinqiu yueyinqiu changed the title (WIP) concat dataset concat dataset Nov 15, 2024
@yueyinqiu

yueyinqiu commented Nov 15, 2024

Copy link
Copy Markdown
Contributor Author

As mentioned here #1357 (comment), the current version has no API changes (except the type of dataset and collate_fn in DataLoader<T, S>).

However we don't actually need Dataset<T>, Dataset and IterableDataset. Do we have any plan to remove them later? They may also occupy the position of some other classes in PyTorch, like #1353.

(But I find it hard to use [Obsolete] on them since some methods are using them as the return type.)

Comment thread src/TorchSharp/Dataset.cs
/// </summary>
/// <param name="index">Index for tensor</param>
/// <returns>Tensors of index. DataLoader will catenate these tensors into batches.</returns>
T this[long index] { get; }

@yueyinqiu yueyinqiu Nov 15, 2024

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are using Int64. However actually most .net containers does not support Int64 indices. And actually it is impractical to have so much data. Should we switch to Int32 instead?

@yueyinqiu yueyinqiu marked this pull request as ready for review November 15, 2024 08:31
@K1T00

K1T00 commented Nov 28, 2025

Copy link
Copy Markdown

What is the status on this one? I waiting for this to be merged.

@yueyinqiu

Copy link
Copy Markdown
Contributor Author

Perhaps you could implement one yourself... This PR is one year old, and I'm not even sure if it conflicts with existing code...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants