====== Purpose of PyTorch Pack ======

[[https://stackoverflow.com/questions/51030782/why-do-we-pack-the-sequences-in-pytorch|Why do we "pack" the sequences in PyTorch?]]

I have stumbled upon this problem too, and below is what I figured out.

When training an RNN (LSTM, GRU, or vanilla RNN), it is difficult to batch variable-length sequences. For example, if the lengths of the sequences in a batch of size 8 are [4, 6, 8, 5, 4, 3, 7, 8], you would pad all the sequences, and that results in 8 sequences of length 8. You would end up doing 64 computations (8x8), but you only needed to do 45. Moreover, if you wanted to do something fancy like use a bidirectional RNN, it would be harder to do batch computations just by padding, and you might end up doing more computations than required.
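
As a quick check of the arithmetic above, in plain Python (an illustration, not part of the original answer):

<code python>
lengths = [4, 6, 8, 5, 4, 3, 7, 8]

padded_steps = len(lengths) * max(lengths)  # 8 * 8 = 64 time steps after padding
needed_steps = sum(lengths)                 # 45 time steps actually needed
wasted = padded_steps - needed_steps        # 19 wasted steps, roughly 30% of the batch
</code>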

Instead, PyTorch allows us to pack the sequence. Internally, a packed sequence is a tuple of two lists. One contains the elements of the sequences, interleaved by time step (see the example below); the other contains the batch size at each time step. This is helpful for recovering the actual sequences, as well as for telling the RNN what the batch size is at each time step (as pointed out by @Aerin). The packed sequence can be passed to an RNN, which will internally optimize the computations.

I might have been unclear at some points, so let me know and I can add more explanations.

Here is a code example:

<code python>
import torch

a = [torch.tensor([1, 2, 3]), torch.tensor([3, 4])]

# Pad the shorter sequence with zeros so the batch is rectangular
b = torch.nn.utils.rnn.pad_sequence(a, batch_first=True)
print(b)
# tensor([[1, 2, 3],
#         [3, 4, 0]])

# Pack the padded batch; lengths holds the true length of each sequence
packed = torch.nn.utils.rnn.pack_padded_sequence(b, batch_first=True, lengths=[3, 2])
print(packed)
# PackedSequence(data=tensor([1, 3, 2, 4, 3]), batch_sizes=tensor([2, 2, 1]))
</code>
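
To complete the picture, here is a minimal sketch (not part of the original answer) of the round trip through an RNN; the ''input_size=1'' and ''hidden_size=2'' values are arbitrary illustrative choices. The RNN consumes the ''PackedSequence'' directly, and ''pad_packed_sequence'' recovers a padded output tensor together with the true lengths:

<code python>
import torch

# Same toy batch as above, but with a feature dimension so the RNN accepts it
a = [torch.tensor([[1.], [2.], [3.]]), torch.tensor([[3.], [4.]])]
b = torch.nn.utils.rnn.pad_sequence(a, batch_first=True)         # shape (2, 3, 1)
packed = torch.nn.utils.rnn.pack_padded_sequence(
    b, lengths=[3, 2], batch_first=True)                         # lengths sorted descending

rnn = torch.nn.RNN(input_size=1, hidden_size=2, batch_first=True)
packed_out, h_n = rnn(packed)   # the RNN only computes the 5 real time steps

# Recover a padded output tensor plus the original lengths
out, out_lengths = torch.nn.utils.rnn.pad_packed_sequence(packed_out, batch_first=True)
print(out.shape)       # torch.Size([2, 3, 2])
print(out_lengths)     # tensor([3, 2])
</code>

Note that ''pack_padded_sequence'' expects the lengths in decreasing order by default; pass ''enforce_sorted=False'' if your batch is not sorted.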