For the concept, think of it a bit more like a TV remote control - it sends an address that ties it to that particular make or series of TVs, and data to identify the button that was pressed.
There is no difference in the data stream or "packet" between how the address is sent and how the data part is sent; after decoding at the receiver it's all just a string of binary bits.
It's down to the way the system designer defined how long - how many bits - the address part of the packet is.
The receiver compares those bits to the address it is configured for, and if they match it can use the rest of the packet and decode it as a button press or any other kind of number or data, depending on how the packet format for that particular system is defined.
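As a rough sketch in C (the field widths and addresses here are made up purely to illustrate the idea): the receiver takes the first few bits of the decoded packet as the address, compares them with its own configured address, and only interprets the remaining bits if they match.

```c
#include <stdint.h>
#include <stdio.h>

#define MY_ADDRESS 0xA7u   /* the address this receiver is configured for (made up) */

static void handle_packet(uint16_t packet)
{
    uint8_t address = (uint8_t)(packet >> 8);    /* first 8 bits = address  */
    uint8_t data    = (uint8_t)(packet & 0xFFu); /* remaining 8 bits = data */

    if (address != MY_ADDRESS)
        return;                                  /* not our address: ignore the rest */

    /* decode the data part as the system defines, e.g. a button code */
    printf("data/button code: 0x%02X\n", (unsigned)data);
}

int main(void)
{
    handle_packet(0xA742u);  /* address matches: data 0x42 gets used  */
    handle_packet(0x3342u);  /* wrong address: packet is ignored      */
    return 0;
}
```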
(Whether IR, RF or wired such as Ethernet, there is also some "framing" data before the packet payload that allows the receiving system to distinguish the start of a data packet from random noise - typically a fixed sequence that is searched for as received data is shifted through the receiver; it starts accumulating the bits of the packet content after seeing that sequence).
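Something along these lines (a sketch only; the sync pattern and fixed packet length are invented for the example): each received bit is shifted into a register and compared against the expected sync sequence, and only after a match does the receiver start collecting packet bits.

```c
#include <stdint.h>
#include <stdbool.h>

#define SYNC_WORD   0xD3u   /* made-up fixed sequence marking the packet start */
#define PACKET_BITS 32      /* made-up fixed packet length after the sync      */

static uint8_t  shift_reg;    /* last 8 bits received        */
static bool     in_packet;    /* true once the sync was seen */
static uint32_t packet_bits;  /* accumulated packet content  */
static int      bit_count;

/* Call this for every bit recovered by the IR/RF demodulator. */
void on_bit_received(int bit)
{
    if (!in_packet) {
        /* keep shifting received bits until the last 8 match the sync word */
        shift_reg = (uint8_t)((shift_reg << 1) | (bit & 1));
        if (shift_reg == SYNC_WORD) {
            in_packet   = true;
            packet_bits = 0;
            bit_count   = 0;
        }
    } else {
        /* sync seen: accumulate the packet content that follows */
        packet_bits = (packet_bits << 1) | (uint32_t)(bit & 1);
        if (++bit_count == PACKET_BITS) {
            /* full packet collected - hand off for address check/decoding */
            in_packet = false;
            shift_reg = 0;
        }
    }
}
```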
This is an example of a TPMS packet data format; the first eight bits hold the address and three bit-flags that indicate which data is included in the packet, followed by ten bits of data for each included sensor reading.
Plus a four-bit checksum, which the receiver can use to detect some errors (e.g. from interference) and only use the data if the checksum matches the content.
(Not sure what the frame type bit is used for).
As soon as the receiver has the first eight bits, it knows from which sensor bit-flags are set how many more bits there are in that packet, so it knows how much data to expect and where the checksum will be in the data that follows.
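Putting that together, here is a rough C sketch of parsing such a variable-length packet. The layout is just my reading of the description above (a five-bit address plus three sensor bit-flags in the first eight bits, ten bits per included reading, a four-bit checksum at the end), the checksum calculation is only a placeholder sum rather than the real TPMS algorithm, and the order the flags map to readings is an assumption too.

```c
#include <stdint.h>
#include <stdio.h>

/* Pull 'n' bits (MSB first) from an array of 0/1 values, advancing *pos. */
static unsigned get_bits(const uint8_t *bits, int *pos, int n)
{
    unsigned v = 0;
    while (n--)
        v = (v << 1) | bits[(*pos)++];
    return v;
}

static void parse_packet(const uint8_t *bits)
{
    int pos = 0;
    unsigned address = get_bits(bits, &pos, 5);  /* assumed 5-bit address          */
    unsigned flags   = get_bits(bits, &pos, 3);  /* which readings are included    */

    unsigned sum = address + flags;              /* placeholder checksum, not real */
    int count = 0;

    for (int f = 2; f >= 0; f--) {               /* one 10-bit value per set flag  */
        if (flags & (1u << f)) {
            unsigned reading = get_bits(bits, &pos, 10);
            sum += reading;
            count++;
        }
    }

    unsigned checksum = get_bits(bits, &pos, 4); /* last four bits of the packet   */
    if ((sum & 0xFu) != checksum) {
        printf("checksum mismatch - packet discarded\n");
        return;
    }
    printf("address %u, %d reading(s) accepted\n", address, count);
}

int main(void)
{
    /* Build one invented test packet: address 21, flags 101 (two readings
     * included), readings 600 and 300, then the matching placeholder checksum. */
    unsigned fields[] = { 21, 5,  5, 3,  600, 10,  300, 10,
                          (21 + 5 + 600 + 300) & 0xF, 4 };
    uint8_t bits[32];
    int pos = 0;
    for (int i = 0; i < 10; i += 2)                       /* pack (value, width) pairs */
        for (int b = (int)fields[i + 1] - 1; b >= 0; b--)
            bits[pos++] = (uint8_t)((fields[i] >> b) & 1u);
    parse_packet(bits);
    return 0;
}
```

The point of the sketch is the length calculation: once the flags are decoded, the receiver knows exactly how many ten-bit readings follow, and therefore where the checksum sits.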