Best practice for parsing data of mixed type?

up vote
0
down vote

I am wondering whether there is any known best practice or method for parsing a data packet that contains fields of mixed types.

For instance, let's say the data is 10 bytes, and it consists of:

Byte 0-1: manufacturer ID (int)

Byte 2: type (int)

Byte 3-4: device id (ascii char)

I could simply define each data type size and location as #define, and parse it using those defines. But I am wondering if there is any structure to organise this better.
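
As an illustration of the #define approach described above, a minimal sketch might look like the following. The macro and function names are made up for this example, and little-endian byte order for the manufacturer ID is an assumption, since the layout above does not state one.

```c
#include <stddef.h>
#include <stdint.h>

/* Field offsets and sizes, taken from the layout listed above. */
#define PKT_SIZE          10
#define PKT_MFG_OFFSET     0   /* bytes 0-1: manufacturer ID */
#define PKT_TYPE_OFFSET    2   /* byte 2: type */
#define PKT_DEVID_OFFSET   3   /* bytes 3-4: device id (ASCII) */
#define PKT_DEVID_LEN      2

/* ASSUMPTION: multi-byte integers are little-endian on the wire. */
static uint16_t pkt_read_u16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

static uint16_t pkt_mfg(const uint8_t pkt[PKT_SIZE])
{
    return pkt_read_u16le(pkt + PKT_MFG_OFFSET);
}

static uint8_t pkt_type(const uint8_t pkt[PKT_SIZE])
{
    return pkt[PKT_TYPE_OFFSET];
}

/* Copies the device id into out and NUL-terminates it. */
static void pkt_devid(const uint8_t pkt[PKT_SIZE], char out[PKT_DEVID_LEN + 1])
{
    for (size_t i = 0; i < PKT_DEVID_LEN; i++)
        out[i] = (char)pkt[PKT_DEVID_OFFSET + i];
    out[PKT_DEVID_LEN] = '\0';
}
```

Reading byte by byte like this does not depend on struct padding or on the host's byte order, but it performs no validation of the field values.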

edited Nov 10 at 4:39

melpomene

56.8k54488

asked Nov 10 at 4:34

Jinsuk

346

1

is the data binary or text?
– Swordfish
Nov 10 at 4:42

2

If you know the packet definition ahead of time, why not just define a custom struct with all of the required fields?
– MikeFromCanmore
Nov 10 at 4:44

You have to read the data just like how you wrote it... Since it looks like it is a binary file...
– Ruks
Nov 10 at 4:44

Does data pass through files for other platforms? Post some sample inputs and exported output data packets. Else this is too broad/unclear.
– chux
Nov 10 at 4:44

1

@Jinsuk, "I could simply define each data type size and location as #define, and parse it using those defines" --> I truly wish you had done that, posted that code and then asked about best practices. It would add information and make for a good post. This one is too broad.
– chux
Nov 10 at 5:02

c parsing

2 Answers

up vote
2
down vote

accepted

Best practice is to assume all data from outside the program (e.g. from the user, from a file, from a network, from a different process) is potentially incorrect (and potentially unsafe/malicious).

Then, based on this assumption of "potential incorrectness", define types that distinguish between "unchecked, potentially incorrect data" and "checked, known-correct data". For your example, you could use uint8_t packet[10]; as the data type for unchecked data and a normal structure (with padding and without __attribute__((packed))) for the checked data. This makes it extremely difficult for a programmer to accidentally use unsafe data when they think they're using safe/checked data.

Of course, you will also need code to convert between these data types, which needs to do as many sanity checks as possible (and possibly also worry about things like endianness). For your example, these checks could be:

are any of the bytes that are supposed to be ASCII characters >= 0x80, and are any of them invalid (e.g. maybe control characters like backspace are not permitted).

is the manufacturer ID valid (e.g. maybe there's an enumeration that it needs to match)

is the type valid (e.g. maybe there's an enumeration that it needs to match)

Note that this function should return some kind of status to indicate if the conversion was successful or not, and in most cases this status should also give an indication of what the problem was if the conversion wasn't successful (so that the caller can inform the user or log the problem or handle the problem in the most suitable way for the problem). For example, maybe "unknown manufacturer ID" means that the program needs to be updated to handle a new manufacturer and that the data was correct, and "invalid manufacturer ID" means that the data was definitely wrong.
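
The approach described above could be sketched roughly as follows. This is only an illustration: the type names, the status codes, and especially the validity rules (which manufacturer IDs and types count as valid) are hypothetical, and little-endian byte order is an assumption because the question does not specify one.

```c
#include <stddef.h>
#include <stdint.h>

#define RAW_PACKET_SIZE 10
#define DEVICE_ID_LEN    2

/* Checked data: an ordinary struct, no packing needed. */
struct device_info {
    uint16_t manufacturer_id;
    uint8_t  type;
    char     device_id[DEVICE_ID_LEN + 1];  /* NUL-terminated */
};

enum parse_status {
    PARSE_OK = 0,
    PARSE_BAD_DEVICE_ID,        /* control character or byte outside ASCII */
    PARSE_UNKNOWN_MANUFACTURER, /* data may be fine, program may need updating */
    PARSE_INVALID_TYPE
};

/* HYPOTHETICAL validity rules; a real protocol defines its own. */
static int manufacturer_is_known(uint16_t id) { return id == 0x0001 || id == 0x0002; }
static int type_is_valid(uint8_t type)        { return type <= 3; }

/* Converts unchecked bytes into a checked struct.
 * *out is written only when PARSE_OK is returned. */
enum parse_status parse_packet(const uint8_t raw[RAW_PACKET_SIZE],
                               struct device_info *out)
{
    struct device_info tmp;

    /* ASSUMPTION: little-endian manufacturer ID in bytes 0-1. */
    tmp.manufacturer_id = (uint16_t)(raw[0] | ((uint16_t)raw[1] << 8));
    tmp.type = raw[2];

    for (size_t i = 0; i < DEVICE_ID_LEN; i++) {
        uint8_t c = raw[3 + i];
        if (c < 0x20 || c > 0x7E)   /* rejects control chars, DEL and >= 0x80 */
            return PARSE_BAD_DEVICE_ID;
        tmp.device_id[i] = (char)c;
    }
    tmp.device_id[DEVICE_ID_LEN] = '\0';

    if (!manufacturer_is_known(tmp.manufacturer_id))
        return PARSE_UNKNOWN_MANUFACTURER;
    if (!type_is_valid(tmp.type))
        return PARSE_INVALID_TYPE;

    *out = tmp;
    return PARSE_OK;
}
```

Because the rest of the program only ever accepts a struct device_info, the only way to obtain one from received bytes is to go through parse_packet and check its status.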

edited Nov 10 at 13:12

answered Nov 10 at 7:21

Brendan

11.6k1230

Haha I saw __attribute__((packed)) and instinctively downvoted - well, reversed now.
– Antti Haapala
Nov 10 at 8:17

but perhaps use uint8_t for input.
– Antti Haapala
Nov 10 at 8:18

@AnttiHaapala: Fixed uint8_t :-)
– Brendan
Nov 10 at 13:12


up vote
0
down vote

Like this:

#include <stdint.h>

struct packet {
    uint16_t mfg;
    uint8_t type;
    uint16_t devid;
} __attribute__((packed));

The packed attribute (or your platform's equivalent) is required to avoid implicit padding which doesn't exist in the protocol.

Once you have the above struct, you simply cast (part of) a char array which you received from wherever:

char buf[1000];
const struct packet *p = (const struct packet *)(buf + N);
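
One way to sidestep questions about alignment and the strict aliasing rule is to copy the fields out of the buffer with memcpy instead of casting. A minimal sketch, assuming the same field layout as above; the function name packet_from_bytes is made up here, and multi-byte fields end up in host byte order, so a byte swap may still be needed if the sender's endianness differs:

```c
#include <stdint.h>
#include <string.h>

/* Ordinary struct; padding is irrelevant because fields are filled one by one. */
struct packet {
    uint16_t mfg;
    uint8_t  type;
    uint16_t devid;
};

/* Copies the 5 bytes starting at buf into *out.
 * The caller must guarantee that at least 5 bytes are readable at buf.
 * buf does not need any particular alignment. */
void packet_from_bytes(const unsigned char *buf, struct packet *out)
{
    memcpy(&out->mfg, buf, sizeof out->mfg);          /* bytes 0-1, host byte order */
    out->type = buf[2];                               /* byte 2 */
    memcpy(&out->devid, buf + 3, sizeof out->devid);  /* bytes 3-4, host byte order */
}
```

With this version the struct does not need the packed attribute, and a call such as packet_from_bytes((const unsigned char *)buf + N, &pkt) works for any offset N that leaves 5 readable bytes.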

answered Nov 10 at 4:54

John Zwinck

149k16175286

Usual suspects: A compliant C compiler may not have any packing ability. Should the packet come from another machine, endianness may differ.
– chux
Nov 10 at 4:58

1

Yea. like this :) ...@chux, Presumably if you know the data packet definition, you know what endianness it uses as well. If you need to, you can do your bit/byte-swapping before casting with the struct.
– MikeFromCanmore
Nov 10 at 5:07

Also, packed is a sure way to cause undefined behaviour in even those compilers that support packing. Casting an array of char to a packet is a violation of strict aliasing rule.
– Antti Haapala
Nov 10 at 8:20

An example here: stackoverflow.com/a/46790815/918959
– Antti Haapala
Nov 10 at 8:22

@AnttiHaapala: I did not cast an array of char. I cast buf + N which is a char*. Which to my understanding is allowed to be cast (there's an exception to strict aliasing for char*). What do you think about that?
– John Zwinck
Nov 10 at 10:54
