How to read columns from a text file and save to separate arrays in C?









In a practice exercise to familiarize myself with pointers, I wrote a short program in C, able to read text from a file. I would like to stick to ANSI C.



The program does its job perfectly, however I want to proceed to read columns from a text file and save to separate arrays. Similar questions have been asked, with replies using strtok, or fgets or sscanf, but when should I use one instead of the other?



Here is my commented code:



#include <stdio.h>
#include <stdlib.h>

char *read_file(char *FILE_INPUT); /*function to read file*/

int main(int argc, char **argv)
{
    char *string; // Pointer to a char

    string = read_file("file.txt");
    if (string)
    {
        // Writes the string pointed to by string to the stream pointed to by stdout, and appends a new-line character to the output.
        puts(string);
        // Causes space pointed to by string to be deallocated
        free(string);
    }
    return 0;
}

//Returns a pointer to a char,
char *read_file(char *FILE_INPUT)
{
    char *buffer = NULL;
    int string_size, read_size;
    FILE *input_stream = fopen(FILE_INPUT, "r");

    //Check if file exists
    if (input_stream == NULL)
    {
        perror (FILE_INPUT);
    }
    else if (input_stream)
    {
        // Seek the last byte of the file. Offset is 0 for a text file.
        fseek(input_stream, 0, SEEK_END);
        // Finds out the position of file pointer in the file with respect to starting of the file
        // We get an idea of string_size since ftell returns the last value of the file pos
        string_size = ftell(input_stream);
        // sets the file position indicator for the stream to the start of the file
        rewind(input_stream);

        // Allocate a string that can hold it all
        // malloc returns a pointer to a char, +1 to hold the NULL character
        // (char*) is the cast return type, this is extra, used for humans
        buffer = (char*)malloc(sizeof(char) * (string_size + 1));

        // Read it all in one operation, returns the number of elements successfully read,
        // Reads into buffer, up to string_size whose size is specified by sizeof(char), from the input_stream !
        read_size = fread(buffer, sizeof(char), string_size, input_stream);

        // fread doesn't set it so put a \0 in the last position
        // and buffer is now officially a string
        buffer[string_size] = '\0';

        //string_size determined by ftell should be equal to read_size from fread
        if (string_size != read_size)
        {
            // Something went wrong, throw away the memory and set
            // the buffer to NULL
            free(buffer);
            buffer = NULL;
        }

        // Always remember to close the file.
        fclose(input_stream);
    }

    return buffer;
}



How can I read all columns from a text file of this format, into an array? Number of columns is fixed, but number of rows can vary.



C 08902019 1020 50 Test1
A 08902666 1040 30 Test2
B 08902768 1060 80 Test3
.
B 08902768 1060 800 Test3000
.
.


On further research, I found that fread is used to allow a program to read and write large blocks of data in a single step, so reading columns separately may not be what fread is intended to do. Thus my program implementation for this kind of job is wrong.



Should I use getc, strtok, sscanf or getline to read such a text file? I am trying to stick to good programming principles and allocate memory dynamically.




EDIT:



By correct I mean (but am not limited to) using good C programming techniques and dynamic memory allocation.



My first thought was to replace fread with fgets. Update, I am getting somewhere thanks to your help.



// Allocate a string that can hold it all
// malloc returns a pointer to a char, +1 to hold the NULL character
// (char*) is the cast return type, this is extra, used for humans
buffer = (char*)malloc(sizeof(char) * (string_size + 1));

while (fgets(buffer, sizeof(char) * (string_size + 1), input_stream))
    printf("%s", buffer);



for the above text file prints:



C 08902019 1020 50 Test1

A 08902666 1040 30 Test2

B 08902768 1060 80 Test3

B 08902768 1060 800 Test3000


I also managed to remove the newline character from fgets() input using:



strtok(buffer, "\n");


Similar examples here, here and here.



How can I proceed to save the columns to separate arrays?










  • 2




    "so reading columns separately may not be what fread is intended to do." - correct. Block-storage devices return large buffers of data, it's not worth it to try to read like that from the underlying storage device.
    – Dai
    Nov 10 at 8:31






  • 1




    No, you generally need to read all the data from the file, and then can use the necessary data in your program as required. However, if the no of bytes in file for each column is known and constant you can use fseek to jump to particular location in the file to read some bytes.
    – Rishikesh Raje
    Nov 10 at 8:36










  • Thanks for both comments. Very helpful.. @RishikeshRaje I would like to design my program to be able to read separate columns, irrespective of the number of bytes. Will edit the question.
    – Rrz0
    Nov 10 at 8:38










  • @DavidC.Rankin understood, so the recommended way is to use fgets?
    – Rrz0
    Nov 10 at 8:45










  • If the file is binary, then you are pretty much stuck with the struct approach. If it is just text, then yes, fgets then sscanf (or walk a pair of pointers down the line picking out what you need) Note: you can also use fgets then strtok to separate (tokenize) the fields. You can do the same thing with sscanf using the "%n" specifier to determine the number of characters used with each conversion and then offsetting your buffer by that amount for the next conversion.
    – David C. Rankin
    Nov 10 at 8:46















c file pointers






edited Nov 10 at 10:58

asked Nov 10 at 8:29
Rrz0







4 Answers

Accepted answer (4 votes)










"Best Practices" is somewhat subjective, but "fully validated, logical and readable" should always be the goal.



For reading a fixed number of fields (in your case, cols 1, 2, 5 as string values of unknown length and cols 3, 4 as simple int values), you can read an unknown number of rows from a file simply by allocating storage for some reasonably anticipated number of rows of data, keeping track of how many rows are filled, and then reallocating storage, as required, when you reach the limit of the storage you have allocated.



An efficient way of handling the reallocation is to reallocate by some reasonable number of additional blocks of memory when reallocation is required (rather than making calls to realloc for every additional row of data). You can either add a fixed number of new blocks, multiply what you have by 3/2 or 2 or some other sane scheme that meets your needs. I generally just double the storage each time the allocation limit is reached.
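
To make the difference between growth schemes concrete, here is a minimal sketch that only counts how many realloc calls each scheme would need; it does not allocate anything. The function names reallocs_doubling and reallocs_fixed are hypothetical helpers introduced for illustration, and both assume a non-zero starting size (and a non-zero step for the fixed scheme).

```c
#include <stddef.h>

/* Hypothetical helper: number of realloc calls needed to grow from
 * 'start' slots to at least 'needed' slots when doubling each time.
 * Assumes start > 0; returns 0 if that assumption is violated. */
size_t reallocs_doubling (size_t start, size_t needed)
{
    size_t n = 0;

    if (start == 0)             /* doubling 0 would never terminate */
        return 0;

    while (start < needed) {
        start *= 2;             /* double the number of slots */
        n++;                    /* one realloc per doubling */
    }
    return n;
}

/* Hypothetical helper: same count when adding a fixed 'step' slots
 * each time. Assumes step > 0; returns 0 if that is violated. */
size_t reallocs_fixed (size_t start, size_t needed, size_t step)
{
    size_t n = 0;

    if (step == 0)              /* adding 0 would never terminate */
        return 0;

    while (start < needed) {
        start += step;          /* add a fixed number of slots */
        n++;                    /* one realloc per addition */
    }
    return n;
}
```

Starting from 8 slots, reaching 1000 rows takes 7 reallocations with doubling and 124 when adding 8 slots at a time, which is the reason doubling is a common default.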



Since you have a fixed number of fields of unknown size, you can make things easy by simply separating the five fields with sscanf and validating that 5 conversions took place by checking the sscanf return. If you were reading an unknown number of fields, then you would just use the same reallocation scheme discussed above for reading an unknown number of rows to handle the column-wise expansion.



(there is no requirement that any row have the same number of fields in that case, but you can force a check by setting a variable containing the number of fields read with the first row, and then validating that all subsequent rows have that same number...)
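
The field-count check described above could be sketched as follows. This is an illustration only: count_fields and fields_consistent are hypothetical names, fields are assumed to be separated by spaces, tabs or a trailing newline, and the line passed in is modified because strtok writes into its argument.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count whitespace-separated fields in 'line'.
 * strtok overwrites delimiters with '\0', so pass a buffer you can
 * afford to change (e.g. a copy of the line read by fgets). */
size_t count_fields (char *line)
{
    size_t n = 0;
    char *tok;

    for (tok = strtok (line, " \t\n"); tok; tok = strtok (NULL, " \t\n"))
        n++;                    /* one more field found */

    return n;
}

/* Hypothetical helper: returns 1 if 'line' has the expected number of
 * fields, 0 otherwise. While *expected is 0 (no row seen yet), the
 * count of the current line is stored as the expected value. An empty
 * line is always reported as inconsistent. */
int fields_consistent (char *line, size_t *expected)
{
    size_t n = count_fields (line);

    if (*expected == 0)         /* first row sets the reference count */
        *expected = n;

    return n != 0 && n == *expected;
}
```

A caller would keep one size_t initialised to 0, pass its address for every line, and report an error for any line where the function returns 0.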



As discussed in the comments, reading a line of data with a line-oriented input function like fgets or POSIX getline and then parsing the data by either tokenizing with strtok, or in this case with a fixed number of fields simply parsing with sscanf, is a solid approach. It provides the benefit of allowing independent validation of (1) the read of data from the file; and (2) the parse of data into the needed values. (While less flexible, for some data sets you can do this in a single step with fscanf, but that also brings in the scanf problems seen with user input, where what remains unread in the input buffer depends on the conversion specifiers used.)
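
The comments also mention walking down a line with sscanf and the "%n" specifier. A minimal sketch of that idea, assuming a line that holds only integers, might look like the following; sum_ints is a hypothetical helper written for this illustration and is not part of the program developed below.

```c
#include <stdio.h>

/* Hypothetical helper: add up the integers at the start of 'line',
 * stopping at the first token that is not an integer (or at the end
 * of the string). Stores the total in *sum and returns the number of
 * integers converted. */
int sum_ints (const char *line, long *sum)
{
    int count = 0, used = 0;
    long val;

    *sum = 0;

    /* %n stores the number of characters consumed so far by this call;
     * it is not counted in the sscanf return value */
    while (sscanf (line, "%ld%n", &val, &used) == 1) {
        *sum += val;            /* accumulate the converted value */
        count++;
        line += used;           /* step past the text just converted */
    }
    return count;
}
```

The same pattern works with "%s%n" (with a field width) when the fields are strings: each call converts one field and the offset tells you where the next call should start.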



The easiest way to approach storage of your 5 fields is to declare a simple struct. Since the number of characters for each of the character fields is unknown, the struct members for each of these fields will be a character pointer, and the remaining fields int, e.g.



#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ARRSZ 2 /* use 8 or more, set to 2 here to force realloc */
#define MAXC 1024

typedef struct {
    char *col1, *col2, *col5;
    int col3, col4;
} mydata_t;


Now you can start your allocation for handling an unknown number of these by allocating for some reasonably anticipated amount (I would generally use 8 or 16 with a doubling scheme as that will grow reasonably fast), but we have chosen 2 here with #define ARRSZ 2 to make sure we force one reallocation when handling your 3-line data file. Note also we are setting a maximum number of characters per-line of #define MAXC 1024 for your data (don't skimp on buffer size)



To get started, all we need to do is declare a buffer to hold each line, and a few variables to track the currently allocated number of structs, a line counter (to output accurate error messages) and a counter for the number of rows of data we have filled. Then when (rows_filled == allocated_array_size) you realloc, e.g.



int main (int argc, char **argv)
{
    char buf[MAXC];
    size_t arrsz = ARRSZ, line = 0, row = 0;
    mydata_t *data = NULL;
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    /* allocate an 'arrsz' initial number of struct */
    if (!(data = malloc (arrsz * sizeof *data))) {
        perror ("malloc-data");
        return 1;
    }

    while (fgets (buf, MAXC, fp)) {         /* read each line from file */
        char c1[MAXC], c2[MAXC], c5[MAXC];  /* temp strings for c1,2,5 */
        int c3, c4;                         /* temp ints for c3,4 */
        size_t len = strlen (buf);          /* length for validation */

        line++;     /* increment line count */

        /* validate line fit in buffer */
        if (len && buf[len-1] != '\n' && len == MAXC - 1) {
            fprintf (stderr, "error: line %zu exceeds MAXC chars.\n", line);
            return 1;
        }

        if (row == arrsz) { /* check if all pointers used */
            void *tmp = realloc (data, arrsz * 2 * sizeof *data);
            if (!tmp) {     /* validate realloc succeeded */
                perror ("realloc-data");
                break;      /* break, don't exit, data still valid */
            }
            data = tmp;     /* assign realloc'ed block to data */
            arrsz *= 2;     /* update arrsz to reflect new allocation */
        }



(note: when calling realloc, you never realloc using the pointer itself, e.g. data = realloc (data, new_size); If realloc fails (and it can), it returns NULL, which would overwrite your original pointer, causing a memory leak. Always realloc with a temporary pointer, validate, then assign the new block of memory to your original pointer.)



What remains is just splitting the line into our values, handling any errors in line format, adding our field values to our array of struct, incrementing our row/line counts and repeating until we run out of data to read, e.g.



        /* parse buf into fields, handle error on invalid format of line */
        if (sscanf (buf, "%1023s %1023s %d %d %1023s",
                    c1, c2, &c3, &c4, c5) != 5) {
            fprintf (stderr, "error: invalid format line %zu\n", line);
            continue;   /* get next line */
        }

        /* allocate copy strings, assign allocated blocks to pointers */
        if (!(data[row].col1 = mystrdup (c1))) { /* validate copy of c1 */
            fprintf (stderr, "error: malloc-c1 line %zu\n", line);
            break;      /* same reason to break not exit */
        }
        if (!(data[row].col2 = mystrdup (c2))) { /* validate copy of c2 */
            fprintf (stderr, "error: malloc-c2 line %zu\n", line);
            break;      /* same reason to break not exit */
        }
        data[row].col3 = c3;    /* assign integer values */
        data[row].col4 = c4;
        if (!(data[row].col5 = mystrdup (c5))) { /* validate copy of c5 */
            fprintf (stderr, "error: malloc-c5 line %zu\n", line);
            break;      /* same reason to break not exit */
        }
        row++;      /* increment number of row pointers used */
    }
    if (fp != stdin)    /* close file if not stdin */
        fclose (fp);

    puts ("values stored in struct\n");
    for (size_t i = 0; i < row; i++)
        printf ("%-4s %-10s %4d %4d %s\n", data[i].col1, data[i].col2,
                data[i].col3, data[i].col4, data[i].col5);

    freemydata (data, row);

    return 0;
}



And we are done (except for the memory use/error check)



Note that the code above relies on two helper functions. The first, mystrdup(), allocates storage for each string, copies the string to the allocated block, and returns the starting address of that block so it can be assigned to the pointer in our struct. You can use strdup() if you have it; I simply included the function to show you how to handle the malloc and copy manually. Note how the copy is done with memcpy instead of strcpy. Why? You already scanned forward in the string to find '\0' when you found the length with strlen, so there is no need to have strcpy repeat that process; just use memcpy.



/* simple implementation of strdup - in the event you don't have it */
char *mystrdup (const char *s)
{
    if (!s)     /* validate s not NULL */
        return NULL;

    size_t len = strlen (s);            /* get length */
    char *sdup = malloc (len + 1);      /* allocate length + 1 */

    if (!sdup)  /* validate */
        return NULL;

    return memcpy (sdup, s, len + 1);   /* pointer to copied string */
}



The last helper function is freemydata(), which just calls free() on each allocated block to ensure you free all the memory you have allocated. It also keeps your code tidy. (You can likewise move the realloc block into its own function.)



/* simple function to free all data when done */
void freemydata (mydata_t *data, size_t n)
{
    for (size_t i = 0; i < n; i++) {    /* free allocated strings */
        free (data[i].col1);
        free (data[i].col2);
        free (data[i].col5);
    }
    free (data);    /* free structs */
}



Putting all the pieces together would give you:



#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ARRSZ   2   /* use 8 or more, set to 2 here to force realloc */
#define MAXC 1024

typedef struct {
    char *col1, *col2, *col5;
    int col3, col4;
} mydata_t;

/* simple implementation of strdup - in the event you don't have it */
char *mystrdup (const char *s)
{
    if (!s)     /* validate s not NULL */
        return NULL;

    size_t len = strlen (s);            /* get length */
    char *sdup = malloc (len + 1);      /* allocate length + 1 */

    if (!sdup)  /* validate */
        return NULL;

    return memcpy (sdup, s, len + 1);   /* pointer to copied string */
}

/* simple function to free all data when done */
void freemydata (mydata_t *data, size_t n)
{
    for (size_t i = 0; i < n; i++) {    /* free allocated strings */
        free (data[i].col1);
        free (data[i].col2);
        free (data[i].col5);
    }
    free (data);    /* free structs */
}

int main (int argc, char **argv)
{
    char buf[MAXC];
    size_t arrsz = ARRSZ, line = 0, row = 0;
    mydata_t *data = NULL;
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    /* allocate an 'arrsz' initial number of struct */
    if (!(data = malloc (arrsz * sizeof *data))) {
        perror ("malloc-data");
        return 1;
    }

    while (fgets (buf, MAXC, fp)) {         /* read each line from file */
        char c1[MAXC], c2[MAXC], c5[MAXC];  /* temp strings for c1,2,5 */
        int c3, c4;                         /* temp ints for c3,4 */
        size_t len = strlen (buf);          /* length for validation */

        line++;     /* increment line count */

        /* validate line fit in buffer */
        if (len && buf[len-1] != '\n' && len == MAXC - 1) {
            fprintf (stderr, "error: line %zu exceeds MAXC chars.\n", line);
            return 1;
        }

        if (row == arrsz) { /* check if all pointers used */
            void *tmp = realloc (data, arrsz * 2 * sizeof *data);
            if (!tmp) {     /* validate realloc succeeded */
                perror ("realloc-data");
                break;      /* break, don't exit, data still valid */
            }
            data = tmp;     /* assign realloc'ed block to data */
            arrsz *= 2;     /* update arrsz to reflect new allocation */
        }

        /* parse buf into fields, handle error on invalid format of line */
        if (sscanf (buf, "%1023s %1023s %d %d %1023s",
                    c1, c2, &c3, &c4, c5) != 5) {
            fprintf (stderr, "error: invalid format line %zu\n", line);
            continue;   /* get next line */
        }

        /* allocate copy strings, assign allocated blocks to pointers */
        if (!(data[row].col1 = mystrdup (c1))) { /* validate copy of c1 */
            fprintf (stderr, "error: malloc-c1 line %zu\n", line);
            break;      /* same reason to break not exit */
        }
        if (!(data[row].col2 = mystrdup (c2))) { /* validate copy of c2 */
            fprintf (stderr, "error: malloc-c2 line %zu\n", line);
            free (data[row].col1);  /* row is discarded, don't leak col1 */
            break;      /* same reason to break not exit */
        }
        data[row].col3 = c3;    /* assign integer values */
        data[row].col4 = c4;
        if (!(data[row].col5 = mystrdup (c5))) { /* validate copy of c5 */
            fprintf (stderr, "error: malloc-c5 line %zu\n", line);
            free (data[row].col1);  /* row is discarded, don't leak */
            free (data[row].col2);
            break;      /* same reason to break not exit */
        }
        row++;      /* increment number of row pointers used */
    }
    if (fp != stdin)    /* close file if not stdin */
        fclose (fp);

    puts ("values stored in struct\n");
    for (size_t i = 0; i < row; i++)
        printf ("%-4s %-10s %4d %4d %s\n", data[i].col1, data[i].col2,
                data[i].col3, data[i].col4, data[i].col5);

    freemydata (data, row);

    return 0;
}



Now test.



Example Input File



$ cat dat/fivefields.txt
C 08902019 1020 50 Test1
A 08902666 1040 30 Test2
B 08902768 1060 80 Test3


Example Use/Output



$ ./bin/fgets_fields <dat/fivefields.txt
values stored in struct

C    08902019   1020   50 Test1
A    08902666   1040   30 Test2
B    08902768   1060   80 Test3


Memory Use/Error Check



In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.



It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.



For Linux, valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use; just run your program through the checker.



$ valgrind ./bin/fgets_fields <dat/fivefields.txt
==1721== Memcheck, a memory error detector
==1721== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==1721== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==1721== Command: ./bin/fgets_fields
==1721==
values stored in struct

C    08902019   1020   50 Test1
A    08902666   1040   30 Test2
B    08902768   1060   80 Test3
==1721==
==1721== HEAP SUMMARY:
==1721== in use at exit: 0 bytes in 0 blocks
==1721== total heap usage: 11 allocs, 11 frees, 243 bytes allocated
==1721==
==1721== All heap blocks were freed -- no leaks are possible
==1721==
==1721== For counts of detected and suppressed errors, rerun with: -v
==1721== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)


Always confirm that you have freed all memory you have allocated and that there are no memory errors.



Look things over and let me know if you have further questions.




























  • Are you sure, I line++; before I hit that block :) (didn't I -- I'll check) Right you are! Fixed - thanks - I came back and added the fit-check before the line++; and just added it above by happy-mistake.
    – David C. Rankin
    Nov 10 at 11:17











  • Hardcoding 1023 in the sscanf() calls defeats the purpose of using a define for MAXC. The API is somewhat broken, as we all know, but since you define the target arrays with the proper size, you can omit the 1023 and use %s directly.
    – chqrlie
    Nov 10 at 11:20











  • Yes, but that too was just by chance -- I harp on always using the field-width modifier, so I tend to always include it in the examples, but in this case I also declared 3 temporary buffers of that size, making the field-width modifier redundant, and since they are sized at MAXC the individual content could never exceed that. Good eyes. I am getting too old to code this late without a few ticks :)
    – David C. Rankin
    Nov 10 at 11:23











  • I'm afraid I am just as old, but I have an unfair advantage: it is 12:24pm over here.
    – chqrlie
    Nov 10 at 11:25










  • First of all, thanks @DavidC.Rankin, for the detailed response, and the working code for me to play around with. This is my first time with struct, and I do not understand why you pass col1, col2 and col5 as pointers to chars, while col3 and col4 are integers. As in my sample text file, col4 and col5 may have varying sizes, while the others are fixed. Thanks once again.
    – Rrz0
    Nov 11 at 9:17
































I want to proceed to read only certain columns of this text file.




You can do this with any input function: getc, fgets, sscanf, getline... but you must first define exactly what you mean by certain columns.



  • columns can be defined as separated by a specific character such as ,, ; or TAB, in which case strtok() is definitely not the right choice because it treats all sequences of separating characters as a single separator: hence a,,b would be seen as having only 2 columns.

  • if they are instead separated by whitespace, any sequence of spaces or tabs, strtok, strpbrk or strspn / strcspn might come in handy.
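To make the a,,b case concrete, here is a minimal sketch of a splitter built on strcspn that keeps empty fields. The name split_fields and its interface are assumptions made for illustration only; they do not come from the answers on this page.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split 'line' in place on any single character
 * found in 'delims', storing up to 'max' field pointers in 'fields'.
 * Unlike strtok, two adjacent delimiters produce an empty field.
 * Returns the number of fields stored. */
size_t split_fields (char *line, const char *delims,
                     char **fields, size_t max)
{
    size_t n = 0;
    char *p = line;

    while (n < max) {
        size_t len = strcspn (p, delims);   /* length of this field */

        fields[n++] = p;                    /* field starts at p */
        if (p[len] == '\0')                 /* end of string, last field */
            break;
        p[len] = '\0';                      /* terminate field in place */
        p += len + 1;                       /* step past one delimiter */
    }
    return n;
}
```

With the input "a,,b" and delims "," this stores three fields ("a", an empty string, and "b"), whereas strtok would report only two. A line read with fgets should have its trailing newline removed first, for example with line[strcspn (line, "\n")] = '\0';.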

In any case, you can read the file line by line with fgets but you might have a problem with very long lines. getline is a solution, but it might not be available on all systems.
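If getline is not available, very long lines can still be handled in standard C by calling fgets repeatedly and growing a buffer. The following is a sketch only: read_line is a hypothetical name, and the 128-byte chunk size is an arbitrary assumption.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read one line of any length from fp using only
 * standard C. Returns a malloc'ed string without the trailing newline
 * (the caller frees it), or NULL at end of file or if allocation fails. */
char *read_line (FILE *fp)
{
    char chunk[128];        /* arbitrary chunk size */
    char *line = NULL;
    size_t used = 0;        /* characters stored so far */

    while (fgets (chunk, sizeof chunk, fp)) {
        size_t len = strlen (chunk);
        char *tmp = realloc (line, used + len + 1);

        if (!tmp) {         /* keep realloc result in a temporary */
            free (line);
            return NULL;
        }
        line = tmp;
        memcpy (line + used, chunk, len + 1);   /* copy with '\0' */
        used += len;

        if (used && line[used - 1] == '\n') {   /* whole line read */
            line[used - 1] = '\0';              /* trim the newline */
            break;
        }
    }
    return line;
}
```

Each returned line can then be parsed with sscanf or a splitter of your choice, and freed once its fields have been copied.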




























  • Thanks for your helpful answer. I edited the question. I want to read all columns, and each column into separate arrays. I am open to use either specific characters such as , or else simply white space, depending on which is 'easier' to implement.
    – Rrz0
    Nov 10 at 11:00











  • The answer here uses strtok() for both , and whitespace separated columns. Why is strtok() not a good choice for the first case you mentioned?
    – Rrz0
    Nov 10 at 11:07











  • @Rrz0: I amended the answer to explain why strtok is inappropriate for ,. strtok has other issues such as non-reentrancy that make it a poor candidate for most parsing tasks.
    – chqrlie
    Nov 10 at 11:13






























Depending on the data and daring, you could use scanf or a parser created with yacc/lex.














































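    If you know the column separator and how many columns you have, you can read field by field with getdelim (documented on the same man page as getline), passing the column separator for each field except the last and the line separator for the last one.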



    Here is getline:



    http://man7.org/linux/man-pages/man3/getline.3.html



    It is very convenient because it allocates the space for you, so there is no need to know in advance how many bytes a column or line holds.



    Or you can just use getline, as in the code example at the link, to read a whole line, then parse it and extract the columns as you wish.
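As a rough sketch of that second approach, the function below reads whole lines with POSIX getline and validates each one with sscanf. The name count_valid_rows and the five-field layout (string, string, int, int, string) are assumptions borrowed from the sample data elsewhere on this page, and getline requires a POSIX system.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L     /* expose POSIX getline() */
#endif
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: count the lines in fp that hold five
 * whitespace-separated fields: string string int int string. */
size_t count_valid_rows (FILE *fp)
{
    char *line = NULL;      /* getline allocates and grows this buffer */
    size_t cap = 0;         /* current capacity of 'line' */
    size_t rows = 0;

    while (getline (&line, &cap, fp) != -1) {
        int c3, c4;
        int end = -1;       /* set by %n only if all five fields match */

        /* %*s matches a string field without storing it, so no
         * fixed-size temporary buffers are needed in this sketch */
        if (sscanf (line, "%*s %*s %d %d %*s%n", &c3, &c4, &end) == 2
            && end >= 0)
            rows++;
    }
    free (line);            /* one free for the buffer getline managed */
    return rows;
}
```

In a real program you would copy the fields out of line at the point where this sketch only increments rows, since getline reuses the same buffer on the next call.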



    If you post exactly how you want to run the program, together with the input you showed, I can try to write a quick C program for a better answer. For now this is just a comment-style answer with too many words for a comment :-(



    Or are you somehow unable to use a library?



    While waiting for a clearer question, I will note that you can use awk to read columns from a text file, but that is probably not what you want. What are you really trying to do?




























    • @DavidC.Rankin you are most certainly very correct but I thought the question say: "able to read text from a file" and I see small example on bottom with text in columns. So we sit and wait and OP tells us what it is.... If binary format usually I find in binary format file lengths before content so then of course reading is not as reading from free-form text file. But thank you so much for comment! :-)
      – Mun Dong
      Nov 10 at 8:49










    • Thanks for your answer. File I want to read from is a text file. I don't understand what you mean here: "If you paste exactly how you want to run program with input you show". I want my program to be able to read, all columns from the text file into separate arrays, for further use.
      – Rrz0
      Nov 10 at 8:50











    • @DavidC.Rankin Other option is to read and check byte and check allocated memory and grow memory and many things but then this is much code and who knows what OP really wants? Maybe necessary maybe not but much code to write in a hurry.
      – Mun Dong
      Nov 10 at 8:51










    • @Rrz0 You say now "all columns" but in your question you say "skip columns". So you want to read all columns into array or what length? or should the array know its length? why not read into strings and maybe just save positions of column starts and ends? is that any good?
      – Mun Dong
      Nov 10 at 8:52










    • You are right, will edit question.
      – Rrz0
      Nov 10 at 8:53










    up vote
    4
    down vote



    accepted










    "Best Practices" is somewhat subjective, but "fully validated, logical and readable" should always be the goal.



    For reading a fixed number of fields (in your case choosing cols 1, 2, 5 as string values of unknown length) and cols 3, 4 as simple int values), you can read an unknown number of rows from a file simply by allocating storage for some reasonably anticipated number of rows of data, keeping track of how many rows are filled, and then reallocating storage, as required, when you reach the limit of the storage you have allocated.



    An efficient way of handling the reallocation is to reallocate by some reasonable number of additional blocks of memory when reallocation is required (rather than making calls to realloc for every additional row of data). You can either add a fixed number of new blocks, multiply what you have by 3/2 or 2 or some other sane scheme that meets your needs. I generally just double the storage each time the allocation limit is reached.



    Since you have a fixed number of fields of unknown size, you can make things easy by simply separating the five-fields with sscanf and validating that 5 conversions took place by checking the sscanf return. If you were reading an unknown number of fields, then you would just use the same reallocation scheme to handle the column-wise expansion discussed above for reading an unknown number of rows.



    (there is no requirement that any row have the same number of fields in that case, but you can force a check by setting a variable containing the number of fields read with the first row, and then validating that all subsequent rows have that same number...)
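
    One way to sketch that per-row check, assuming the fields are separated by spaces or tabs: count the fields in each line without modifying it, remember the count from the first row, and compare every later row against it. count_fields is a hypothetical name used only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count whitespace-separated fields in line.
 * The line is only read, never modified. */
size_t count_fields (const char *line)
{
    const char *delim = " \t\r\n";
    size_t n = 0;

    line += strspn (line, delim);           /* skip leading whitespace */
    while (*line) {
        n++;                                /* at the start of a field */
        line += strcspn (line, delim);      /* skip over the field itself */
        line += strspn (line, delim);       /* skip whitespace after it */
    }

    return n;
}
```

    In the read loop you would store count_fields (buf) for the first line in a variable and report an error for any later line whose count differs.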



    As discussed in the comments, reading a line of data with a line-oriented input function like fgets or POSIX getline and then parsing the data, either by tokenizing with strtok or, as in this case with a fixed number of fields, by simply parsing with sscanf, is a solid approach. It provides the benefit of allowing independent validation of (1) the read of data from the file; and (2) the parse of data into the needed values. (For some data sets you can do this in a single, less flexible step with fscanf, but that also brings in the same problem scanf has with user input: what remains unread in the input buffer depends on the conversion specifiers used.)
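
    For comparison, here is a minimal sketch of the strtok route, which is the one to reach for when the number of fields is not known in advance. split_line is a hypothetical helper; it assumes whitespace delimiters and a caller-supplied array of token pointers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split line in place into at most max tokens.
 * strtok overwrites delimiters with '\0', so line must be writable,
 * and the pointers stored in tok point into line (nothing is copied). */
size_t split_line (char *line, char **tok, size_t max)
{
    const char *delim = " \t\r\n";
    size_t n = 0;
    char *p;

    for (p = strtok (line, delim); p && n < max; p = strtok (NULL, delim))
        tok[n++] = p;

    return n;
}
```

    Because the tokens point into the line buffer, they must be copied before the next fgets call overwrites that buffer. With a fixed number of fields of known types, sscanf does the split and the int conversions in one call, which is why it is used below.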



    The easiest way to approach storage of your 5-fields is to declare a simple struct. Since the number of characters for each of the character fields is unknown, the struct members for each of these fields will be a character pointer, and the remaining fields int, e.g.



    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define ARRSZ   2   /* use 8 or more, set to 2 here to force realloc */
    #define MAXC 1024

    typedef struct {
        char *col1, *col2, *col5;
        int col3, col4;
    } mydata_t;


    Now you can start your allocation for handling an unknown number of these by allocating for some reasonably anticipated amount. I would generally use 8 or 16 with a doubling scheme, as that will grow reasonably fast, but we have chosen 2 here with #define ARRSZ 2 to make sure we force one reallocation when handling your 3-line data file. Note also we are setting a maximum number of characters per line of #define MAXC 1024 for your data (don't skimp on buffer size).



    To get started, all we need to do is declare a buffer to hold each line, and a few variables to track the currently allocated number of structs, a line counter (to output accurate error messages) and a counter for the number of rows of data we have filled. Then when (rows_filled == allocated_array_size) you realloc, e.g.



    int main (int argc, char **argv)
    {
        char buf[MAXC];
        size_t arrsz = ARRSZ, line = 0, row = 0;
        mydata_t *data = NULL;
        /* use filename provided as 1st argument (stdin by default) */
        FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

        if (!fp) {  /* validate file open for reading */
            perror ("file open failed");
            return 1;
        }

        /* allocate an 'arrsz' initial number of struct */
        if (!(data = malloc (arrsz * sizeof *data))) {
            perror ("malloc-data");
            return 1;
        }

        while (fgets (buf, MAXC, fp)) {         /* read each line from file */
            char c1[MAXC], c2[MAXC], c5[MAXC];  /* temp strings for c1,2,5 */
            int c3, c4;                         /* temp ints for c3,4 */
            size_t len = strlen (buf);          /* length for validation */

            line++;     /* increment line count */

            /* validate line fit in buffer */
            if (len && buf[len-1] != '\n' && len == MAXC - 1) {
                fprintf (stderr, "error: line %zu exceeds MAXC chars.\n", line);
                return 1;
            }

            if (row == arrsz) { /* check if all pointers used */
                void *tmp = realloc (data, arrsz * 2 * sizeof *data);
                if (!tmp) {     /* validate realloc succeeded */
                    perror ("realloc-data");
                    break;      /* break, don't exit, data still valid */
                }
                data = tmp;     /* assign realloc'ed block to data */
                arrsz *= 2;     /* update arrsz to reflect new allocation */
            }


    (note: when calling realloc, never assign the result directly to the pointer you are reallocating, e.g. data = realloc (data, new_size); If realloc fails (and it can), it returns NULL, which would overwrite your original pointer and leak the block it pointed to. Always realloc with a temporary pointer, validate, then assign the new block of memory to your original pointer.)



    What remains is just splitting the line into our values, handling any errors in line format, adding our field values to our array of struct, incrementing our row count and repeating until we run out of data to read, e.g.



            /* parse buf into fields, handle error on invalid format of line */
            if (sscanf (buf, "%1023s %1023s %d %d %1023s",
                        c1, c2, &c3, &c4, c5) != 5) {
                fprintf (stderr, "error: invalid format line %zu\n", line);
                continue;   /* get next line */
            }

            /* allocate copy strings, assign allocated blocks to pointers */
            if (!(data[row].col1 = mystrdup (c1))) { /* validate copy of c1 */
                fprintf (stderr, "error: malloc-c1 line %zu\n", line);
                break;      /* same reason to break not exit */
            }
            if (!(data[row].col2 = mystrdup (c2))) { /* validate copy of c2 */
                fprintf (stderr, "error: malloc-c2 line %zu\n", line);
                break;      /* same reason to break not exit */
            }
            data[row].col3 = c3;    /* assign integer values */
            data[row].col4 = c4;
            if (!(data[row].col5 = mystrdup (c5))) { /* validate copy of c5 */
                fprintf (stderr, "error: malloc-c5 line %zu\n", line);
                break;      /* same reason to break not exit */
            }
            row++;      /* increment number of row pointers used */
        }
        if (fp != stdin)    /* close file if not stdin */
            fclose (fp);

        puts ("values stored in struct\n");
        for (size_t i = 0; i < row; i++)
            printf ("%-4s %-10s %4d %4d %s\n", data[i].col1, data[i].col2,
                    data[i].col3, data[i].col4, data[i].col5);

        freemydata (data, row);

        return 0;
    }
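
    A side note on the 1023 in the sscanf format string: as raised in the comments, hardcoding it means it can drift out of step with MAXC. One common preprocessor idiom for building the width from a define is sketched here. FLDW, STR and parse_fields are names I made up for the illustration, and the typedef is just an old trick to make the compiler reject a mismatch.

```c
#include <stdio.h>

#define MAXC 1024
#define FLDW 1023                 /* field width, must equal MAXC - 1 */
#define STR_(x) #x
#define STR(x)  STR_(x)           /* expand x first, then stringify it */

/* compile-time check: array size is -1 (an error) if the two disagree */
typedef char fldw_matches_maxc[(FLDW == MAXC - 1) ? 1 : -1];

/* Hypothetical helper: parse three string fields and two ints from buf.
 * c1, c2 and c5 must each point to at least MAXC bytes.
 * Returns 1 when all five conversions succeed, otherwise 0. */
int parse_fields (const char *buf, char *c1, char *c2,
                  int *c3, int *c4, char *c5)
{
    return sscanf (buf, "%" STR(FLDW) "s %" STR(FLDW) "s %d %d %" STR(FLDW) "s",
                   c1, c2, c3, c4, c5) == 5;
}
```

    Note that STR(MAXC - 1) would not work, since it stringifies to "1024 - 1" rather than "1023", which is why the width gets its own define.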



    And we are done (except for the memory use/error check)



    Note that there are two helper functions. The first, mystrdup(), allocates storage for each string, copies the string to the allocated block of memory and returns the starting address of that block so it can be assigned to the pointer in our struct. You can use strdup() if you have it; I simply included the function to show you how to manually handle the malloc and copy. Note how the copy is done with memcpy instead of strcpy. Why? You already scanned forward in the string to find '\0' when you found the length with strlen, so there is no need to have strcpy repeat that process again; just use memcpy.



    /* simple implementation of strdup - in the event you don't have it */
    char *mystrdup (const char *s)
    {
        if (!s)     /* validate s not NULL */
            return NULL;

        size_t len = strlen (s);            /* get length */
        char *sdup = malloc (len + 1);      /* allocate length + 1 */

        if (!sdup)          /* validate */
            return NULL;

        return memcpy (sdup, s, len + 1);   /* pointer to copied string */
    }



    The last helper function is freemydata(), which just calls free() on each allocated block to ensure you free all the memory you have allocated. It also keeps your code tidy. (You can do the same for the realloc block and move that to its own function as well.)



    /* simple function to free all data when done */
    void freemydata (mydata_t *data, size_t n)
    {
        for (size_t i = 0; i < n; i++) {    /* free allocated strings */
            free (data[i].col1);
            free (data[i].col2);
            free (data[i].col5);
        }
        free (data);    /* free structs */
    }
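
    If you do want the realloc block in its own function, it could look something like the sketch below. grow_array is a hypothetical name, it is not used in the full listing, and it assumes the doubled size does not overflow size_t.

```c
#include <stdlib.h>

/* Hypothetical helper: double the capacity of the array at arr, whose
 * elements are elem_size bytes each and whose capacity is *cap.
 * Returns the (possibly moved) block and updates *cap on success.
 * Returns NULL on failure, leaving arr and *cap untouched so the
 * caller's existing data stays valid. */
void *grow_array (void *arr, size_t *cap, size_t elem_size)
{
    size_t newcap = *cap ? *cap * 2 : 8;
    void *tmp = realloc (arr, newcap * elem_size);

    if (!tmp)           /* realloc failed, original block still allocated */
        return NULL;

    *cap = newcap;
    return tmp;
}
```

    The caller still goes through a temporary pointer, e.g. void *tmp = grow_array (data, &arrsz, sizeof *data); and only assigns data = tmp; after checking that tmp is not NULL.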



    Putting all the pieces together would give you:



    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define ARRSZ   2   /* use 8 or more, set to 2 here to force realloc */
    #define MAXC 1024

    typedef struct {
        char *col1, *col2, *col5;
        int col3, col4;
    } mydata_t;

    /* simple implementation of strdup - in the event you don't have it */
    char *mystrdup (const char *s)
    {
        if (!s)     /* validate s not NULL */
            return NULL;

        size_t len = strlen (s);            /* get length */
        char *sdup = malloc (len + 1);      /* allocate length + 1 */

        if (!sdup)          /* validate */
            return NULL;

        return memcpy (sdup, s, len + 1);   /* pointer to copied string */
    }

    /* simple function to free all data when done */
    void freemydata (mydata_t *data, size_t n)
    {
        for (size_t i = 0; i < n; i++) {    /* free allocated strings */
            free (data[i].col1);
            free (data[i].col2);
            free (data[i].col5);
        }
        free (data);    /* free structs */
    }

    int main (int argc, char **argv)
    {
        char buf[MAXC];
        size_t arrsz = ARRSZ, line = 0, row = 0;
        mydata_t *data = NULL;
        /* use filename provided as 1st argument (stdin by default) */
        FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

        if (!fp) {  /* validate file open for reading */
            perror ("file open failed");
            return 1;
        }

        /* allocate an 'arrsz' initial number of struct */
        if (!(data = malloc (arrsz * sizeof *data))) {
            perror ("malloc-data");
            return 1;
        }

        while (fgets (buf, MAXC, fp)) {         /* read each line from file */
            char c1[MAXC], c2[MAXC], c5[MAXC];  /* temp strings for c1,2,5 */
            int c3, c4;                         /* temp ints for c3,4 */
            size_t len = strlen (buf);          /* length for validation */

            line++;     /* increment line count */

            /* validate line fit in buffer */
            if (len && buf[len-1] != '\n' && len == MAXC - 1) {
                fprintf (stderr, "error: line %zu exceeds MAXC chars.\n", line);
                return 1;
            }

            if (row == arrsz) { /* check if all pointers used */
                void *tmp = realloc (data, arrsz * 2 * sizeof *data);
                if (!tmp) {     /* validate realloc succeeded */
                    perror ("realloc-data");
                    break;      /* break, don't exit, data still valid */
                }
                data = tmp;     /* assign realloc'ed block to data */
                arrsz *= 2;     /* update arrsz to reflect new allocation */
            }

            /* parse buf into fields, handle error on invalid format of line */
            if (sscanf (buf, "%1023s %1023s %d %d %1023s",
                        c1, c2, &c3, &c4, c5) != 5) {
                fprintf (stderr, "error: invalid format line %zu\n", line);
                continue;   /* get next line */
            }

            /* allocate copy strings, assign allocated blocks to pointers */
            if (!(data[row].col1 = mystrdup (c1))) { /* validate copy of c1 */
                fprintf (stderr, "error: malloc-c1 line %zu\n", line);
                break;      /* same reason to break not exit */
            }
            if (!(data[row].col2 = mystrdup (c2))) { /* validate copy of c2 */
                fprintf (stderr, "error: malloc-c2 line %zu\n", line);
                break;      /* same reason to break not exit */
            }
            data[row].col3 = c3;    /* assign integer values */
            data[row].col4 = c4;
            if (!(data[row].col5 = mystrdup (c5))) { /* validate copy of c5 */
                fprintf (stderr, "error: malloc-c5 line %zu\n", line);
                break;      /* same reason to break not exit */
            }
            row++;      /* increment number of row pointers used */
        }
        if (fp != stdin)    /* close file if not stdin */
            fclose (fp);

        puts ("values stored in struct\n");
        for (size_t i = 0; i < row; i++)
            printf ("%-4s %-10s %4d %4d %s\n", data[i].col1, data[i].col2,
                    data[i].col3, data[i].col4, data[i].col5);

        freemydata (data, row);

        return 0;
    }



    Now test.



    Example Input File



    $ cat dat/fivefields.txt
    C 08902019 1020 50 Test1
    A 08902666 1040 30 Test2
    B 08902768 1060 80 Test3


    Example Use/Output



    $ ./bin/fgets_fields <dat/fivefields.txt
    values stored in struct

    C    08902019   1020   50 Test1
    A    08902666   1040   30 Test2
    B    08902768   1060   80 Test3


    Memory Use/Error Check



    In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.



    It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.



    For Linux, valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use; just run your program through one.



    $ valgrind ./bin/fgets_fields <dat/fivefields.txt
    ==1721== Memcheck, a memory error detector
    ==1721== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
    ==1721== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
    ==1721== Command: ./bin/fgets_fields
    ==1721==
    values stored in struct

    C    08902019   1020   50 Test1
    A    08902666   1040   30 Test2
    B    08902768   1060   80 Test3
    ==1721==
    ==1721== HEAP SUMMARY:
    ==1721== in use at exit: 0 bytes in 0 blocks
    ==1721== total heap usage: 11 allocs, 11 frees, 243 bytes allocated
    ==1721==
    ==1721== All heap blocks were freed -- no leaks are possible
    ==1721==
    ==1721== For counts of detected and suppressed errors, rerun with: -v
    ==1721== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)


    Always confirm that you have freed all memory you have allocated and that there are no memory errors.



    Look things over and let me know if you have further questions.




























    • Are you sure, I line++; before I hit that block :) (didn't I -- I'll check) Right you are! Fixed - thanks - I came back and added the fit-check before the line++; and just added it above by happy-mistake.
      – David C. Rankin
      Nov 10 at 11:17











    • Hardcoding 1023 in the sscanf() calls defeats the purpose of using a define for MAXC. The API is somewhat broken, as we all know, but since you define the target arrays with the proper size, you can omit the 1023 and use %s directly.
      – chqrlie
      Nov 10 at 11:20











    • Yes, but that too was just by chance -- I harp on always using the field-width modifier so I tend to always include it in the examples, but in this case I also declared 3 temporary buffers of that size, making the field-width modifier redundant, and since they are sized at MAXC the individual content could never exceed that. Good eyes. I am getting too old to code this late without a few ticks :)
      – David C. Rankin
      Nov 10 at 11:23











    • I'm afraid I am just as old, but I have an unfair advantage: it is 12:24pm over here.
      – chqrlie
      Nov 10 at 11:25










    • First of all, thanks @DavidC.Rankin, for the detailed response, and the working code for me to play around with. This is my first time with struct, and I do not understand why you pass col1, col2 and col5 as pointers to chars, while col3 and col4 as integers. As my sample text file, col4 and col5 may have varying sizes, while the others are fixed. Thanks once again.
      – Rrz0
      Nov 11 at 9:17













    up vote
    4
    down vote



    accepted







    up vote
    4
    down vote



    accepted






    "Best Practices" is somewhat subjective, but "fully validated, logical and readable" should always be the goal.



    For reading a fixed number of fields (in your case choosing cols 1, 2, 5 as string values of unknown length) and cols 3, 4 as simple int values), you can read an unknown number of rows from a file simply by allocating storage for some reasonably anticipated number of rows of data, keeping track of how many rows are filled, and then reallocating storage, as required, when you reach the limit of the storage you have allocated.



    An efficient way of handling the reallocation is to reallocate by some reasonable number of additional blocks of memory when reallocation is required (rather than making calls to realloc for every additional row of data). You can either add a fixed number of new blocks, multiply what you have by 3/2 or 2 or some other sane scheme that meets your needs. I generally just double the storage each time the allocation limit is reached.



    Since you have a fixed number of fields of unknown size, you can make things easy by simply separating the five-fields with sscanf and validating that 5 conversions took place by checking the sscanf return. If you were reading an unknown number of fields, then you would just use the same reallocation scheme to handle the column-wise expansion discussed above for reading an unknown number of rows.



    (there is no requirement that any row have the same number of fields in that case, but you can force a check by setting a variable containing the number of fields read with the first row, and then validating that all subsequent rows have that same number...)



    As discussed in the comments, reading a line of data with a line-oriented input function like fgets or POSIX getline and then parsing the data by either tokenizing with strtok, or in this case with a fixed number of fields simply parsing with sscanf is a solid approach. It provides the benefit of allowing independent validation of (1) the read of data from the file; and (2) the parse of data into the needed values. (while less flexible, for some data sets, you can do this in a single step with fscanf, but that also injects the scanf problems for user input of what remains unread in the input buffer depending on the conversion-specifiers used...)



    The easiest way to approach storage of your 5-fields is to declare a simple struct. Since the number of characters for each of the character fields is unknown, the struct members for each of these fields will be a character pointer, and the remaining fields int, e.g.



    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define ARRSZ 2 /* use 8 or more, set to 2 here to force realloc */
    #define MAXC 1024

    typedef struct
    char *col1, *col2, *col5;
    int col3, col4;
    mydata_t;


    Now you can start your allocation for handling an unknown number of these by allocating for some reasonably anticipated amount (I would generally use 8 or 16 with a doubling scheme as that will grow reasonably fast), but we have chosen 2 here with #define ARRSZ 2 to make sure we force one reallocation when handling your 3-line data file. Note also we are setting a maximum number of characters per-line of #define MAXC 1024 for your data (don't skimp on buffer size)



    To get started, all we need to do is declare a buffer to hold each line, and a few variables to track the currently allocated number of structs, a line counter (to output accurate error messages) and a counter for the number of rows of data we have filled. Then when (rows_filled == allocated_array_size) you realloc, e.g.



    int main (int argc, char **argv) 

    char buf[MAXC];
    size_t arrsz = ARRSZ, line = 0, row = 0;
    mydata_t *data = NULL;
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) /* validate file open for reading */
    perror ("file open failed");
    return 1;


    /* allocate an 'arrsz' initial number of struct */
    if (!(data = malloc (arrsz * sizeof *data)))
    perror ("malloc-data");
    return 1;


    while (fgets (buf, MAXC, fp)) /* read each line from file */
    char c1[MAXC], c2[MAXC], c5[MAXC]; /* temp strings for c1,2,5 */
    int c3, c4; /* temp ints for c3,4 */
    size_t len = strlen (buf); /* length for validation */

    line++; /* increment line count */

    /* validate line fit in buffer */
    if (len && buf[len-1] != 'n' && len == MAXC - 1)
    fprintf (stderr, "error: line %zu exceeds MAXC chars.n", line);
    return 1;


    if (row == arrsz) /* check if all pointers used */
    void *tmp = realloc (data, arrsz * 2 * sizeof *data);
    if (!tmp) /* validate realloc succeeded */
    perror ("realloc-data");
    break; /* break, don't exit, data still valid */

    data = tmp; /* assign realloc'ed block to data */
    arrsz *= 2; /* update arrsz to reflect new allocation */



    (note: when calling realloc, you never realloc the pointer itself, e.g. data = realloc (data, new_size); If realloc fails (and it does), it returns NULL which would overwrite your original pointer causing a memory leak. Always realloc with a temporary pointer, validate, then assign the new block of memory to your original pointer)



    What remains is just splitting the line into our values, handling any errors in line format, adding our field values to our array of struct, increment our row/line counts and repeating until we run out of data to read, e.g.



     /* parse buf into fields, handle error on invalid format of line */
    if (sscanf (buf, "%1023s %1023s %d %d %1023s",
    c1, c2, &c3, &c4, c5) != 5)
    fprintf (stderr, "error: invalid format line %zun", line);
    continue; /* get next line */


    /* allocate copy strings, assign allocated blocks to pointers */
    if (!(data[row].col1 = mystrdup (c1))) /* validate copy of c1 */
    fprintf (stderr, "error: malloc-c1 line %zun", line);
    break; /* same reason to break not exit */

    if (!(data[row].col2 = mystrdup (c2))) /* validate copy of c2 */
    fprintf (stderr, "error: malloc-c1 line %zun", line);
    break; /* same reason to break not exit */

    data[row].col3 = c3; /* assign integer values */
    data[row].col4 = c4;
    if (!(data[row].col5 = mystrdup (c5))) /* validate copy of c5 */
    fprintf (stderr, "error: malloc-c1 line %zun", line);
    break; /* same reason to break not exit */

    row++; /* increment number of row pointers used */

    if (fp != stdin) /* close file if not stdin */
    fclose (fp);

    puts ("values stored in structn");
    for (size_t i = 0; i < row; i++)
    printf ("%-4s %-10s %4d %4d %sn", data[i].col1, data[i].col2,
    data[i].col3, data[i].col4, data[i].col5);

    freemydata (data, row);

    return 0;



    And we are done (except for the memory use/error check)



    Note about there are two helper functions to allocate for each string and copy each string to its allocated block of memory and assigning the starting address for that block to our pointer in our struct. mystrdup() You can use strdup() if you have it, I simply included the function to show you how to manually handle the malloc and copy. Note: how the copy is done with memcpy instead of strcpy -- Why? You already scanned forward in the string to find '' when you found the length with strlen -- no need to have strcpy repeat that process again -- just use memcpy.



    /* simple implementation of strdup - in the event you don't have it */
    char *mystrdup (const char *s)

    if (!s) /* validate s not NULL */
    return NULL;

    size_t len = strlen (s); /* get length */
    char *sdup = malloc (len + 1); /* allocate length + 1 */

    if (!sdup) /* validate */
    return NULL;

    return memcpy (sdup, s, len + 1); /* pointer to copied string */



    Last helper function is freemydata() which just calls free() on each allocated block to insure you free all the memory you have allocated. It also keeps you code tidy. (you can do the same for the realloc block to move that to it's own function as well)



    /* simple function to free all data when done */
    void freemydata (mydata_t *data, size_t n)

    for (size_t i = 0; i < n; i++) /* free allocated strings */
    free (data[i].col1);
    free (data[i].col2);
    free (data[i].col5);

    free (data); /* free structs */



    Putting all the pieces together would give you:



    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define ARRSZ 2 /* use 8 or more, set to 2 here to force realloc */
    #define MAXC 1024

    typedef struct
    char *col1, *col2, *col5;
    int col3, col4;
    mydata_t;

    /* simple implementation of strdup - in the event you don't have it */
    char *mystrdup (const char *s)

    if (!s) /* validate s not NULL */
    return NULL;

    size_t len = strlen (s); /* get length */
    char *sdup = malloc (len + 1); /* allocate length + 1 */

    if (!sdup) /* validate */
    return NULL;

    return memcpy (sdup, s, len + 1); /* pointer to copied string */


    /* simple function to free all data when done */
    void freemydata (mydata_t *data, size_t n)

    for (size_t i = 0; i < n; i++) /* free allocated strings */
    free (data[i].col1);
    free (data[i].col2);
    free (data[i].col5);

    free (data); /* free structs */


    int main (int argc, char **argv)

    char buf[MAXC];
    size_t arrsz = ARRSZ, line = 0, row = 0;
    mydata_t *data = NULL;
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) /* validate file open for reading */
    perror ("file open failed");
    return 1;


    /* allocate an 'arrsz' initial number of struct */
    if (!(data = malloc (arrsz * sizeof *data)))
    perror ("malloc-data");
    return 1;


    while (fgets (buf, MAXC, fp)) /* read each line from file */
    char c1[MAXC], c2[MAXC], c5[MAXC]; /* temp strings for c1,2,5 */
    int c3, c4; /* temp ints for c3,4 */
    size_t len = strlen (buf); /* length for validation */

    line++; /* increment line count */

    /* validate line fit in buffer */
    if (len && buf[len-1] != 'n' && len == MAXC - 1)
    fprintf (stderr, "error: line %zu exceeds MAXC chars.n", line);
    return 1;


    if (row == arrsz) /* check if all pointers used */
    void *tmp = realloc (data, arrsz * 2 * sizeof *data);
    if (!tmp) /* validate realloc succeeded */
    perror ("realloc-data");
    break; /* break, don't exit, data still valid */

    data = tmp; /* assign realloc'ed block to data */
    arrsz *= 2; /* update arrsz to reflect new allocation */


    /* parse buf into fields, handle error on invalid format of line */
    if (sscanf (buf, "%1023s %1023s %d %d %1023s",
    c1, c2, &c3, &c4, c5) != 5)
    fprintf (stderr, "error: invalid format line %zun", line);
    continue; /* get next line */


    /* allocate copy strings, assign allocated blocks to pointers */
    if (!(data[row].col1 = mystrdup (c1))) /* validate copy of c1 */
    fprintf (stderr, "error: malloc-c1 line %zun", line);
    break; /* same reason to break not exit */

    if (!(data[row].col2 = mystrdup (c2))) /* validate copy of c2 */
    fprintf (stderr, "error: malloc-c1 line %zun", line);
    break; /* same reason to break not exit */

    data[row].col3 = c3; /* assign integer values */
    data[row].col4 = c4;
    if (!(data[row].col5 = mystrdup (c5))) /* validate copy of c5 */
    fprintf (stderr, "error: malloc-c1 line %zun", line);
    break; /* same reason to break not exit */

    row++; /* increment number of row pointers used */

    if (fp != stdin) /* close file if not stdin */
    fclose (fp);

    puts ("values stored in structn");
    for (size_t i = 0; i < row; i++)
    printf ("%-4s %-10s %4d %4d %sn", data[i].col1, data[i].col2,
    data[i].col3, data[i].col4, data[i].col5);

    freemydata (data, row);

    return 0;



    Now test.



    Example Input File



    $ cat dat/fivefields.txt
    C 08902019 1020 50 Test1
    A 08902666 1040 30 Test2
    B 08902768 1060 80 Test3


    Example Use/Output



    $ ./bin/fgets_fields <dat/fivefields.txt
    values stored in struct

    C 08902019 1020 50 Test1
    A 08902666 1040 30 Test2
    B 08902768 1060 80 Test3


    Memory Use/Error Check



    In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.



    It is imperative that you use a memory error checking program to insure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.



    For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.



    $ valgrind ./bin/fgets_fields <dat/fivefields.txt
    ==1721== Memcheck, a memory error detector
    ==1721== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
    ==1721== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
    ==1721== Command: ./bin/fgets_fields
    ==1721==
    values stored in struct

    C 08902019 1020 50 Test1
    A 08902666 1040 30 Test2
    B 08902768 1060 80 Test3
    ==1721==
    ==1721== HEAP SUMMARY:
    ==1721== in use at exit: 0 bytes in 0 blocks
    ==1721== total heap usage: 11 allocs, 11 frees, 243 bytes allocated
    ==1721==
    ==1721== All heap blocks were freed -- no leaks are possible
    ==1721==
    ==1721== For counts of detected and suppressed errors, rerun with: -v
    ==1721== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)


    Always confirm that you have freed all memory you have allocated and that there are no memory errors.



    Look things over and let me know if you have further questions.






    share|improve this answer














    "Best Practices" is somewhat subjective, but "fully validated, logical and readable" should always be the goal.



    For reading a fixed number of fields (in your case choosing cols 1, 2, 5 as string values of unknown length) and cols 3, 4 as simple int values), you can read an unknown number of rows from a file simply by allocating storage for some reasonably anticipated number of rows of data, keeping track of how many rows are filled, and then reallocating storage, as required, when you reach the limit of the storage you have allocated.



    An efficient way of handling the reallocation is to reallocate by some reasonable number of additional blocks of memory when reallocation is required (rather than making calls to realloc for every additional row of data). You can either add a fixed number of new blocks, multiply what you have by 3/2 or 2 or some other sane scheme that meets your needs. I generally just double the storage each time the allocation limit is reached.



    Since you have a fixed number of fields of unknown size, you can make things easy by simply separating the five-fields with sscanf and validating that 5 conversions took place by checking the sscanf return. If you were reading an unknown number of fields, then you would just use the same reallocation scheme to handle the column-wise expansion discussed above for reading an unknown number of rows.



    (there is no requirement that any row have the same number of fields in that case, but you can force a check by setting a variable containing the number of fields read with the first row, and then validating that all subsequent rows have that same number...)



    As discussed in the comments, reading a line of data with a line-oriented input function like fgets or POSIX getline and then parsing the data by either tokenizing with strtok, or in this case with a fixed number of fields simply parsing with sscanf is a solid approach. It provides the benefit of allowing independent validation of (1) the read of data from the file; and (2) the parse of data into the needed values. (while less flexible, for some data sets, you can do this in a single step with fscanf, but that also injects the scanf problems for user input of what remains unread in the input buffer depending on the conversion-specifiers used...)



    The easiest way to approach storage of your 5-fields is to declare a simple struct. Since the number of characters for each of the character fields is unknown, the struct members for each of these fields will be a character pointer, and the remaining fields int, e.g.



    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define ARRSZ 2 /* use 8 or more, set to 2 here to force realloc */
    #define MAXC 1024

    typedef struct
    char *col1, *col2, *col5;
    int col3, col4;
    mydata_t;


    Now you can start your allocation for handling an unknown number of these by allocating for some reasonably anticipated amount (I would generally use 8 or 16 with a doubling scheme as that will grow reasonably fast), but we have chosen 2 here with #define ARRSZ 2 to make sure we force one reallocation when handling your 3-line data file. Note also we are setting a maximum number of characters per-line of #define MAXC 1024 for your data (don't skimp on buffer size)



    To get started, all we need to do is declare a buffer to hold each line, and a few variables to track the currently allocated number of structs, a line counter (to output accurate error messages) and a counter for the number of rows of data we have filled. Then when (rows_filled == allocated_array_size) you realloc, e.g.



    int main (int argc, char **argv)
    {
        char buf[MAXC];
        size_t arrsz = ARRSZ, line = 0, row = 0;
        mydata_t *data = NULL;
        /* use filename provided as 1st argument (stdin by default) */
        FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

        if (!fp) {  /* validate file open for reading */
            perror ("file open failed");
            return 1;
        }

        /* allocate an 'arrsz' initial number of struct */
        if (!(data = malloc (arrsz * sizeof *data))) {
            perror ("malloc-data");
            return 1;
        }

        while (fgets (buf, MAXC, fp)) {         /* read each line from file */
            char c1[MAXC], c2[MAXC], c5[MAXC];  /* temp strings for c1,2,5 */
            int c3, c4;                         /* temp ints for c3,4 */
            size_t len = strlen (buf);          /* length for validation */

            line++;     /* increment line count */

            /* validate line fit in buffer */
            if (len && buf[len-1] != '\n' && len == MAXC - 1) {
                fprintf (stderr, "error: line %zu exceeds MAXC chars.\n", line);
                return 1;
            }

            if (row == arrsz) { /* check if all pointers used */
                void *tmp = realloc (data, arrsz * 2 * sizeof *data);
                if (!tmp) {     /* validate realloc succeeded */
                    perror ("realloc-data");
                    break;      /* break, don't exit, data still valid */
                }
                data = tmp;     /* assign realloc'ed block to data */
                arrsz *= 2;     /* update arrsz to reflect new allocation */
            }



    (Note: when calling realloc, never assign the result directly to the pointer being reallocated, e.g. data = realloc (data, new_size); If realloc fails (and it can), it returns NULL, which overwrites your original pointer and leaks the block it pointed to. Always realloc with a temporary pointer, validate, then assign the new block of memory to your original pointer.)



    What remains is just splitting the line into our values, handling any errors in line format, adding our field values to our array of struct, incrementing our row count, and repeating until we run out of data to read, e.g.



            /* parse buf into fields, handle error on invalid format of line */
            if (sscanf (buf, "%1023s %1023s %d %d %1023s",
                        c1, c2, &c3, &c4, c5) != 5) {
                fprintf (stderr, "error: invalid format line %zu\n", line);
                continue;   /* get next line */
            }

            /* allocate copy strings, assign allocated blocks to pointers */
            if (!(data[row].col1 = mystrdup (c1))) {    /* validate copy of c1 */
                fprintf (stderr, "error: malloc-c1 line %zu\n", line);
                break;      /* same reason to break not exit */
            }
            if (!(data[row].col2 = mystrdup (c2))) {    /* validate copy of c2 */
                fprintf (stderr, "error: malloc-c2 line %zu\n", line);
                free (data[row].col1);  /* release partial row before break */
                break;      /* same reason to break not exit */
            }
            data[row].col3 = c3;    /* assign integer values */
            data[row].col4 = c4;
            if (!(data[row].col5 = mystrdup (c5))) {    /* validate copy of c5 */
                fprintf (stderr, "error: malloc-c5 line %zu\n", line);
                free (data[row].col1);  /* release partial row before break */
                free (data[row].col2);
                break;      /* same reason to break not exit */
            }
            row++;      /* increment number of row pointers used */
        }
        if (fp != stdin)    /* close file if not stdin */
            fclose (fp);

        puts ("values stored in struct\n");
        for (size_t i = 0; i < row; i++)
            printf ("%-4s %-10s %4d %4d %s\n", data[i].col1, data[i].col2,
                    data[i].col3, data[i].col4, data[i].col5);

        freemydata (data, row);

        return 0;
    }



    And we are done (except for the memory use/error check)



    Note above that there are two helper functions. The first, mystrdup(), allocates storage for each string, copies the string to its allocated block of memory, and returns the starting address of that block so it can be assigned to the pointer in our struct. You can use strdup() if you have it; I simply included the function to show you how to handle the malloc and copy manually. Note how the copy is done with memcpy instead of strcpy. Why? You have already scanned forward in the string to find the terminating '\0' when you found the length with strlen, so there is no need to have strcpy repeat that process. Just use memcpy.



    /* simple implementation of strdup - in the event you don't have it */
    char *mystrdup (const char *s)
    {
        if (!s)     /* validate s not NULL */
            return NULL;

        size_t len = strlen (s);            /* get length */
        char *sdup = malloc (len + 1);      /* allocate length + 1 */

        if (!sdup)  /* validate */
            return NULL;

        return memcpy (sdup, s, len + 1);   /* pointer to copied string */
    }



    The last helper function is freemydata(), which just calls free() on each allocated block to ensure you free all the memory you have allocated. It also keeps your code tidy. (You can do the same for the realloc block and move that to its own function as well.)



    /* simple function to free all data when done */
    void freemydata (mydata_t *data, size_t n)
    {
        for (size_t i = 0; i < n; i++) {    /* free allocated strings */
            free (data[i].col1);
            free (data[i].col2);
            free (data[i].col5);
        }
        free (data);    /* free structs */
    }
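The remark about moving the realloc block to its own function could look something like the sketch below. `grow_mydata` is a hypothetical name, not something from the answer; the struct is repeated only so the fragment stands alone, and the size computation is not checked for overflow.

```c
#include <stdlib.h>

typedef struct {
    char *col1, *col2, *col5;
    int col3, col4;
} mydata_t;

/* Double the capacity of the array pointed to by *data.
 * On success, updates *data and *arrsz and returns 1.
 * On failure, leaves both untouched and returns 0, so the caller's
 * existing block is still valid and can still be freed.
 * grow_mydata is a hypothetical helper name (no overflow check). */
int grow_mydata (mydata_t **data, size_t *arrsz)
{
    void *tmp = realloc (*data, *arrsz * 2 * sizeof **data);

    if (!tmp)               /* original block is unchanged */
        return 0;

    *data = tmp;            /* assign realloc'ed block */
    *arrsz *= 2;            /* reflect the new allocation */
    return 1;
}
```

In the read loop the check would then shrink to something like `if (row == arrsz && !grow_mydata (&data, &arrsz)) { perror ("realloc-data"); break; }`.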



    Putting all the pieces together would give you:



    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define ARRSZ 2 /* use 8 or more, set to 2 here to force realloc */
    #define MAXC 1024

    typedef struct {
        char *col1, *col2, *col5;
        int col3, col4;
    } mydata_t;

    /* simple implementation of strdup - in the event you don't have it */
    char *mystrdup (const char *s)
    {
        if (!s)     /* validate s not NULL */
            return NULL;

        size_t len = strlen (s);            /* get length */
        char *sdup = malloc (len + 1);      /* allocate length + 1 */

        if (!sdup)  /* validate */
            return NULL;

        return memcpy (sdup, s, len + 1);   /* pointer to copied string */
    }

    /* simple function to free all data when done */
    void freemydata (mydata_t *data, size_t n)
    {
        for (size_t i = 0; i < n; i++) {    /* free allocated strings */
            free (data[i].col1);
            free (data[i].col2);
            free (data[i].col5);
        }
        free (data);    /* free structs */
    }

    int main (int argc, char **argv)
    {
        char buf[MAXC];
        size_t arrsz = ARRSZ, line = 0, row = 0;
        mydata_t *data = NULL;
        /* use filename provided as 1st argument (stdin by default) */
        FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

        if (!fp) {  /* validate file open for reading */
            perror ("file open failed");
            return 1;
        }

        /* allocate an 'arrsz' initial number of struct */
        if (!(data = malloc (arrsz * sizeof *data))) {
            perror ("malloc-data");
            return 1;
        }

        while (fgets (buf, MAXC, fp)) {         /* read each line from file */
            char c1[MAXC], c2[MAXC], c5[MAXC];  /* temp strings for c1,2,5 */
            int c3, c4;                         /* temp ints for c3,4 */
            size_t len = strlen (buf);          /* length for validation */

            line++;     /* increment line count */

            /* validate line fit in buffer */
            if (len && buf[len-1] != '\n' && len == MAXC - 1) {
                fprintf (stderr, "error: line %zu exceeds MAXC chars.\n", line);
                return 1;
            }

            if (row == arrsz) { /* check if all pointers used */
                void *tmp = realloc (data, arrsz * 2 * sizeof *data);
                if (!tmp) {     /* validate realloc succeeded */
                    perror ("realloc-data");
                    break;      /* break, don't exit, data still valid */
                }
                data = tmp;     /* assign realloc'ed block to data */
                arrsz *= 2;     /* update arrsz to reflect new allocation */
            }

            /* parse buf into fields, handle error on invalid format of line */
            if (sscanf (buf, "%1023s %1023s %d %d %1023s",
                        c1, c2, &c3, &c4, c5) != 5) {
                fprintf (stderr, "error: invalid format line %zu\n", line);
                continue;   /* get next line */
            }

            /* allocate copy strings, assign allocated blocks to pointers */
            if (!(data[row].col1 = mystrdup (c1))) {    /* validate copy of c1 */
                fprintf (stderr, "error: malloc-c1 line %zu\n", line);
                break;      /* same reason to break not exit */
            }
            if (!(data[row].col2 = mystrdup (c2))) {    /* validate copy of c2 */
                fprintf (stderr, "error: malloc-c2 line %zu\n", line);
                free (data[row].col1);  /* release partial row before break */
                break;      /* same reason to break not exit */
            }
            data[row].col3 = c3;    /* assign integer values */
            data[row].col4 = c4;
            if (!(data[row].col5 = mystrdup (c5))) {    /* validate copy of c5 */
                fprintf (stderr, "error: malloc-c5 line %zu\n", line);
                free (data[row].col1);  /* release partial row before break */
                free (data[row].col2);
                break;      /* same reason to break not exit */
            }
            row++;      /* increment number of row pointers used */
        }
        if (fp != stdin)    /* close file if not stdin */
            fclose (fp);

        puts ("values stored in struct\n");
        for (size_t i = 0; i < row; i++)
            printf ("%-4s %-10s %4d %4d %s\n", data[i].col1, data[i].col2,
                    data[i].col3, data[i].col4, data[i].col5);

        freemydata (data, row);

        return 0;
    }



    Now test.



    Example Input File



    $ cat dat/fivefields.txt
    C 08902019 1020 50 Test1
    A 08902666 1040 30 Test2
    B 08902768 1060 80 Test3


    Example Use/Output



    $ ./bin/fgets_fields <dat/fivefields.txt
    values stored in struct

    C    08902019   1020   50 Test1
    A    08902666   1040   30 Test2
    B    08902768   1060   80 Test3


    Memory Use/Error Check



    In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.



    It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.



    For Linux, valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use; just run your program through one.



    $ valgrind ./bin/fgets_fields <dat/fivefields.txt
    ==1721== Memcheck, a memory error detector
    ==1721== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
    ==1721== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
    ==1721== Command: ./bin/fgets_fields
    ==1721==
    values stored in struct

    C    08902019   1020   50 Test1
    A    08902666   1040   30 Test2
    B    08902768   1060   80 Test3
    ==1721==
    ==1721== HEAP SUMMARY:
    ==1721== in use at exit: 0 bytes in 0 blocks
    ==1721== total heap usage: 11 allocs, 11 frees, 243 bytes allocated
    ==1721==
    ==1721== All heap blocks were freed -- no leaks are possible
    ==1721==
    ==1721== For counts of detected and suppressed errors, rerun with: -v
    ==1721== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)


    Always confirm that you have freed all memory you have allocated and that there are no memory errors.



    Look things over and let me know if you have further questions.









    edited Nov 10 at 12:01

























    answered Nov 10 at 11:09









    David C. Rankin



    • Are you sure, I line++; before I hit that block :) (didn't I -- I'll check) Right you are! Fixed - thanks - I came back and added the fit-check before the line++; and just added it above by happy-mistake.
      – David C. Rankin
      Nov 10 at 11:17











    • Hardcoding 1023 in the sscanf() calls defeats the purpose of using a define for MAXC. The API is somewhat broken, as we all know, but since you define the target arrays with the proper size, you can omit the 1023 and use %s directly.
      – chqrlie
      Nov 10 at 11:20











    • Yes, but that too was just by chance -- I harp on always using the field-width modifier, so I tend to always include it in the examples, but in this case I also declared 3 temporary buffers of that size, making the field-width modifier redundant, and since the line buffer is sized at MAXC the individual content could never exceed that. Good eyes. I am getting too old to code this late without a few ticks :)
      – David C. Rankin
      Nov 10 at 11:23











    • I'm afraid I am just as old, but I have an unfair advantage: it is 12:24pm over here.
      – chqrlie
      Nov 10 at 11:25










    • First of all, thanks @DavidC.Rankin, for the detailed response, and the working code for me to play around with. This is my first time with struct, and I do not understand why you declare col1, col2 and col5 as pointers to char, while col3 and col4 are integers. As in my sample text file, col4 and col5 may have varying sizes, while the others are fixed. Thanks once again.
      – Rrz0
      Nov 11 at 9:17




















    I want to proceed to read only certain columns of this text file.




    You can do this with any input function: getc, fgets, sscanf, getline... but you must first define exactly what you mean by certain columns.



    • columns can be defined as separated by a specific character such as ,, ; or TAB, in which case strtok() is definitely not the right choice because it treats all sequences of separating characters as a single separator: hence a,,b would be seen as having only 2 columns.

    • if they are instead separated by whitespace, any sequence of spaces or tabs, strtok, strpbrk or strspn / strcspn might come in handy.
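To make the first point concrete, here is a minimal sketch of a splitter that keeps empty fields, built on strcspn rather than strtok. `split_keep_empty` is a hypothetical name, and the sketch assumes a single-character separator and that modifying the line in place is acceptable.

```c
#include <stddef.h>
#include <string.h>

/* Split 'line' in place on every occurrence of 'sep', keeping empty
 * fields. Stores up to 'max' pointers in 'fields' and returns the
 * number stored. split_keep_empty is a hypothetical name. */
size_t split_keep_empty (char *line, char sep, char **fields, size_t max)
{
    size_t n = 0;
    char delim[2] = { sep, '\0' };

    line[strcspn (line, "\r\n")] = '\0';    /* trim line ending */

    while (n < max) {
        size_t len = strcspn (line, delim); /* length of this field */
        fields[n++] = line;
        if (line[len] == '\0')              /* last field reached */
            break;
        line[len] = '\0';                   /* terminate the field */
        line += len + 1;                    /* step past separator */
    }
    return n;
}
```

With the input a,,b and ',' as the separator this stores three fields, the middle one an empty string, whereas successive strtok calls on the same input return only a and b.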

    In any case, you can read the file line by line with fgets but you might have a problem with very long lines. getline is a solution, but it might not be available on all systems.








    • Thanks for your helpful answer. I edited the question. I want to read all columns, and each column into separate arrays. I am open to use either specific characters such as , or else simply white space, depending on which is 'easier' to implement.
      – Rrz0
      Nov 10 at 11:00











    • The answer here uses strtok() for both , and whitespace separated columns. Why is strtok() not a good choice for the first case you mentioned?
      – Rrz0
      Nov 10 at 11:07











    • @Rrz0: I amended the answer to explain why strtok is inappropriate for ,. strtok has other issues such as non-reentrancy that make it a poor candidate for most parsing tasks.
      – chqrlie
      Nov 10 at 11:13
















    edited Nov 10 at 11:11

























    answered Nov 10 at 10:52









    chqrlie




    Depending on the data and daring, you could use scanf or a parser created with yacc/lex.








        answered Nov 10 at 9:03









        Igor




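            If you know what the column separator is and how many columns you have, you can use getline with the column separator and then with the line separator (in C, the variant that takes a delimiter is getdelim, documented on the same man page).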



            Here is getline:



            http://man7.org/linux/man-pages/man3/getline.3.html



            It is very good because it allocates space for you; there is no need to know how many bytes long your column or line is.



            Or you can just use getline, as in the code example at the link, to read the whole line, and then parse it and extract the columns as you wish.
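A minimal sketch of that read loop, assuming a POSIX system (getline is not part of ISO C, which matters if you want to stay with ANSI C). `scan_lines` is a hypothetical name; the per-line parsing is left as a comment.

```c
#define _POSIX_C_SOURCE 200809L   /* expose getline (POSIX.1-2008) */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Read every line from fp with getline and return how many lines were
 * read; the longest line length (without '\n') is stored in *longest.
 * scan_lines is a hypothetical name used for illustration. */
size_t scan_lines (FILE *fp, size_t *longest)
{
    char *line = NULL;      /* getline allocates and grows this buffer */
    size_t cap = 0;         /* current capacity of 'line' */
    ssize_t nread;
    size_t count = 0;

    *longest = 0;
    while ((nread = getline (&line, &cap, fp)) != -1) {
        size_t len = (size_t)nread;
        if (len && line[len - 1] == '\n')   /* strip trailing newline */
            line[--len] = '\0';
        if (len > *longest)
            *longest = len;
        count++;            /* parse 'line' here, e.g. with sscanf */
    }
    free (line);            /* one free for the reused buffer */
    return count;
}
```

The same buffer is reused and enlarged across calls, so there is no fixed line-length limit to choose, and a single free after the loop releases it.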



            If you post exactly how you want to run the program, with the input you showed, I can try to write a quick C program for a good answer. For now this is just a comment-style answer with too many words for a comment :-(



            Or is there some reason you cannot use a library?



            While waiting for a better question, I will note that you can use awk to read columns from a text file, but that is probably not what you want. What are you really trying to do?








            • @DavidC.Rankin you are most certainly very correct but I thought the question say: "able to read text from a file" and I see small example on bottom with text in columns. So we sit and wait and OP tells us what it is.... If binary format usually I find in binary format file lengths before content so then of course reading is not as reading from free-form text file. But thank you so much for comment! :-)
              – Mun Dong
              Nov 10 at 8:49










            • Thanks for your answer. File I want to read from is a text file. I don't understand what you mean here: "If you paste exactly how you want to run program with input you show". I want my program to be able to read, all columns from the text file into separate arrays, for further use.
              – Rrz0
              Nov 10 at 8:50











            • @DavidC.Rankin Other option is to read and check byte and check allocated memory and grow memory and many things but then this is much code and who knows what OP really wants? Maybe necessary maybe not but much code to write in a hurry.
              – Mun Dong
              Nov 10 at 8:51










            • @Rrz0 You say now "all columns" but in your question you say "skip columns". So you want to read all columns into array or what length? or should the array know its length? why not read into strings and maybe just save positions of column starts and ends? is that any good?
              – Mun Dong
              Nov 10 at 8:52










            • You are right, will edit question.
              – Rrz0
              Nov 10 at 8:53














            up vote
            0
            down vote













            If you know what is column separator and how many columns you have you use getline with column separator and then with line separator.



            Here is getline:



            http://man7.org/linux/man-pages/man3/getline.3.html



            It is very good because it allocates space for you, no need to know how many bytes is your column or line.



            Or you just use getline as in code example in link to read whole line then you "parse" and extract columns as you wish....



            If you paste exactly how you want to run program with input you show I can try write fast C program for good answer. Now it is just comment-style answer with too many words for comment :-(



            Or is it somehow you cannot use library?



            Although while waiting for better question I will note that you can use awk to read columns from text file but probably this is not what you want? Because what are you trying to do really?






            share|improve this answer






















            • @DavidC.Rankin you are most certainly very correct but I thought the question say: "able to read text from a file" and I see small example on bottom with text in columns. So we sit and wait and OP tells us what it is.... If binary format usually I find in binary format file lengths before content so then of course reading is not as reading from free-form text file. But thank you so much for comment! :-)
              – Mun Dong
              Nov 10 at 8:49










            • Thanks for your answer. File I want to read from is a text file. I don't understand what you mean here: "If you paste exactly how you want to run program with input you show". I want my program to be able to read, all columns from the text file into separate arrays, for further use.
              – Rrz0
              Nov 10 at 8:50











            • @DavidC.Rankin Other option is to read and check byte and check allocated memory and grow memory and many things but then this is much code and who knows what OP really wants? Maybe necessary maybe not but much code to write in a hurry.
              – Mun Dong
              Nov 10 at 8:51










            • @Rrz0 You say now "all columns" but in your question you say "skip columns". So you want to read all columns into array or what length? or should the array know its length? why not read into strings and maybe just save positions of column starts and ends? is that any good?
              – Mun Dong
              Nov 10 at 8:52










            • You are right, will edit question.
              – Rrz0
              Nov 10 at 8:53












            up vote
            0
            down vote










            up vote
            0
            down vote









            If you know what is column separator and how many columns you have you use getline with column separator and then with line separator.



            Here is getline:



            http://man7.org/linux/man-pages/man3/getline.3.html



            It is very good because it allocates space for you, no need to know how many bytes is your column or line.



            Or you can just use getline, as in the code example at the link, to read a whole line, and then parse it and extract the columns as you wish.
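The line-at-a-time approach could look roughly like the sketch below. It assumes the columns are whitespace-separated numbers (the real file layout may differ) and a POSIX system for getline; `read_columns` is a hypothetical helper name. Each line is read whole and then walked with strtod, so a malformed line can be skipped without disturbing the next one.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: read rows of whitespace-separated numbers from
 * `in` and store column c in cols[c], a malloc'd array of doubles.
 * Returns the number of complete rows stored, or -1 if memory runs out.
 * The caller frees every cols[c] in either case. */
long read_columns(FILE *in, int ncols, double **cols)
{
    char *line = NULL;          /* getline allocates and grows this */
    size_t cap = 0;
    size_t alloc = 0;           /* rows currently allocated per column */
    long nrows = 0;
    int c;

    assert(ncols > 0 && cols != NULL);
    for (c = 0; c < ncols; c++)
        cols[c] = NULL;

    while (getline(&line, &cap, in) != -1) {
        char *p = line, *end;

        /* Grow all column arrays together when they are full. */
        if ((size_t)nrows == alloc) {
            size_t want = alloc ? alloc * 2 : 16;
            for (c = 0; c < ncols; c++) {
                double *tmp = realloc(cols[c], want * sizeof *tmp);
                if (tmp == NULL) {
                    free(line);
                    return -1;
                }
                cols[c] = tmp;
            }
            alloc = want;
        }

        /* Parse one number per column from the current line. */
        for (c = 0; c < ncols; c++) {
            double v = strtod(p, &end);
            if (end == p)
                break;          /* no number here: short or malformed row */
            cols[c][nrows] = v;
            p = end;
        }
        if (c == ncols)
            nrows++;            /* keep only complete rows */
    }
    free(line);
    return nrows;
}
```

If the program has to stay within ANSI C, fgets with a fixed-size buffer can stand in for getline at the cost of choosing a maximum line length; the strtod parsing loop stays the same.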



            If you paste exactly how you want to run the program with the input you show, I can try to write a quick C program for a better answer. For now this is just a comment-style answer with too many words for a comment :-(



            Or are you for some reason unable to use library functions beyond ANSI C? Note that getline is POSIX, not ANSI C.



            While waiting for a clearer question, I will note that you can use awk to read columns from a text file, but that is probably not what you want. What are you really trying to do?














            answered Nov 10 at 8:44 (edited Nov 10 at 9:05)
            Mun Dong
            237


































