[???] encode_memory() dirtiness #15

@kotee4ko

Hello. Thank you for your kindness in sharing such a good project.

Could you please explain to me why we need
encode_memory

    @staticmethod
    def encode_memory(mems):
        """Encode memory to ids

        <pad>: 0
        <SEP>: 1
        <unk>: 2
        mem_id: mem_offset + 3
        """
        ret = []
        for mem in mems[: VocabEntry.MAX_MEM_LENGTH]:
            if mem == "<SEP>":
                ret.append(1)
            elif mem > VocabEntry.MAX_STACK_SIZE:
                ret.append(2)
            else:
                ret.append(3 + mem)
        return ret

this function, and why it attempts to compare integers with the string '<SEP>'?
I can't make sense of appending an int token to an array of int tokens instead of an int token.
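
To make the question concrete, here is a minimal standalone sketch of the same logic outside the class. The values of MAX_MEM_LENGTH and MAX_STACK_SIZE are assumptions chosen only for illustration, not taken from the repository. It suggests that the input is expected to be a mixed list of integer offsets and the "<SEP>" marker:

```python
# Standalone sketch of the encoding described in the docstring above.
# NOTE: both constant values below are assumed for illustration only.
MAX_MEM_LENGTH = 512   # assumed
MAX_STACK_SIZE = 1024  # assumed


def encode_memory(mems):
    """Map a mixed list of int offsets and "<SEP>" markers to token ids."""
    ret = []
    for mem in mems[:MAX_MEM_LENGTH]:  # truncate overly long inputs
        if mem == "<SEP>":
            ret.append(1)        # separator token
        elif mem > MAX_STACK_SIZE:
            ret.append(2)        # offset out of range -> <unk>
        else:
            ret.append(3 + mem)  # shift past the three reserved ids
    return ret


print(encode_memory([0, 8, "<SEP>", 2000]))  # [3, 11, 1, 2]
```

In Python, comparing an int with a string using == simply evaluates to False, so the first branch only fires for the separator and the integer comparison is never reached with a string.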

My second question is about this one:

            def var_loc_in_func(loc):
                print(" TODO: fix the magic number for computing vocabulary idx")
                if isinstance(loc, Register):
                    return 1030 + self.vocab.regs[loc.name]
                else:
                    from utils.vocab import VocabEntry

                    return (
                        3 + stack_start_pos - loc.offset
                        if stack_start_pos - loc.offset < VocabEntry.MAX_STACK_SIZE
                        else 2
                    )

What does the constant 1030 do, and why is it needed?

And in general, why do we define the tokens like this:

            self.word2id["<pad>"] = PAD_ID
            self.word2id["<s>"] = 1
            self.word2id["</s>"] = 2
            self.word2id["<unk>"] = 3
            self.word2id[SAME_VARIABLE_TOKEN] = 4

but use them like this:

        <pad>: 0
        <SEP>: 1
        <unk>: 2
        mem_id: mem_offset + 3

Sorry if my questions are too much; I specialize in systems programming, and math with ML is a hobby.
Thanks in advance =)

@pcyin
@qibinc
@jlacomis
