How to read columns from a text file and save to separate arrays in C?
In a practice exercise to familiarize myself with pointers, I wrote a short program in C, able to read text from a file. I would like to stick to ANSI C.
The program does its job perfectly, however I want to proceed to read columns from a text file and save them to separate arrays. Similar questions have been asked, with replies using strtok, fgets or sscanf, but when should I use one instead of the other?
Here is my commented code:
#include <stdio.h>
#include <stdlib.h>

char *read_file(char *FILE_INPUT); /* function to read file */

int main(int argc, char **argv)
{
    char *string; // Pointer to a char

    string = read_file("file.txt");
    if (string)
    {
        // Writes the string pointed to by string to the stream pointed to by stdout, and appends a new-line character to the output.
        puts(string);
        // Causes space pointed to by string to be deallocated
        free(string);
    }
    return 0;
}

// Returns a pointer to a char, or NULL on failure
char *read_file(char *FILE_INPUT)
{
    char *buffer = NULL;
    int string_size, read_size;
    FILE *input_stream = fopen(FILE_INPUT, "r");

    // Check if file exists
    if (input_stream == NULL)
    {
        perror(FILE_INPUT);
    }
    else if (input_stream)
    {
        // Seek the last byte of the file. Offset is 0 for a text file.
        fseek(input_stream, 0, SEEK_END);
        // Finds out the position of file pointer in the file with respect to starting of the file
        // We get an idea of string_size since ftell returns the last value of the file pos
        string_size = ftell(input_stream);
        // sets the file position indicator for the stream to the start of the file
        rewind(input_stream);

        // Allocate a string that can hold it all
        // malloc returns a pointer to a char, +1 to hold the NULL character
        // (char*) is the cast return type, this is extra, used for humans
        buffer = (char*)malloc(sizeof(char) * (string_size + 1));

        // Read it all in one operation, returns the number of elements successfully read,
        // Reads into buffer, up to string_size elements whose size is specified by sizeof(char), from the input_stream
        read_size = fread(buffer, sizeof(char), string_size, input_stream);

        // fread doesn't set it so put a \0 in the last position
        // and buffer is now officially a string
        buffer[string_size] = '\0';

        // string_size determined by ftell should be equal to read_size from fread
        if (string_size != read_size)
        {
            // Something went wrong, throw away the memory and set
            // the buffer to NULL
            free(buffer);
            buffer = NULL;
        }

        // Always remember to close the file.
        fclose(input_stream);
    }
    return buffer;
}
How can I read all columns from a text file of this format, into an array? Number of columns is fixed, but number of rows can vary.
C 08902019 1020 50 Test1
A 08902666 1040 30 Test2
B 08902768 1060 80 Test3
.
B 08902768 1060 800 Test3000
.
.
On further research, I found that fread is used to allow a program to read and write large blocks of data in a single step, so reading columns separately may not be what fread is intended to do. Thus my program implementation for this kind of job is wrong.
Should I use getc, strtok, sscanf or getline to read such a text file? I am trying to stick to good programming principles and allocate memory dynamically.
EDIT:
By correct I mean (but am not limited to) using good C programming techniques and dynamic memory allocation.
My first thought was to replace fread with fgets. Update: I am getting somewhere thanks to your help.
// Allocate a string that can hold it all
// malloc returns a pointer to a char, +1 to hold the NULL character
// (char*) is the cast return type, this is extra, used for humans
buffer = (char*)malloc(sizeof(char) * (string_size + 1));
while (fgets(buffer, sizeof(char) * (string_size + 1), input_stream))
{
    printf("%s", buffer);
}
for the above text file prints:
C 08902019 1020 50 Test1
A 08902666 1040 30 Test2
B 08902768 1060 80 Test3
B 08902768 1060 800 Test3000
I also managed to remove the newline character from fgets() input using:
strtok(buffer, "\n");
Similar examples here, here and here
How can I proceed to save the columns to separate arrays?
c file pointers
2
"so reading columns separately may not be what fread is intended to do." - correct. Block-storage devices return large buffers of data, it's not worth it to try to read like that from the underlying storage device.
– Dai
Nov 10 at 8:31
1
No, you generally need to read all the data from the file, and then can use the necessary data in your program as required. However, if the number of bytes in the file for each column is known and constant, you can use fseek to jump to a particular location in the file to read some bytes.
– Rishikesh Raje
Nov 10 at 8:36
Thanks for both comments. Very helpful.. @RishikeshRaje I would like to design my program to be able to read separate columns, irrespective of the number of bytes. Will edit the question.
– Rrz0
Nov 10 at 8:38
@DavidC.Rankin understood, so the recommended way is to use fgets?
– Rrz0
Nov 10 at 8:45
If the file is binary, then you are pretty much stuck with the struct approach. If it is just text, then yes, fgets then sscanf (or walk a pair of pointers down the line picking out what you need). Note: you can also use fgets then strtok to separate (tokenize) the fields. You can do the same thing with sscanf using the "%n" specifier to determine the number of characters used with each conversion and then offsetting your buffer by that amount for the next conversion.
– David C. Rankin
Nov 10 at 8:46
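As a rough illustration of the "%n" technique mentioned in the comment above, the sketch below walks one line with sscanf, advancing by the number of characters each conversion consumed. The function name split_with_n and the MAXFIELD limit are assumptions made for this example and are not part of the question or the comments.

```c
#include <stdio.h>
#include <stddef.h>

#define MAXFIELD 64   /* assumed upper bound on one field, incl. the '\0' */

/* Copy up to 'max' whitespace-separated fields from 'line' into 'out',
 * using "%n" to learn how many characters each conversion consumed.
 * Returns the number of fields stored. The width 63 in the format must
 * stay in step with MAXFIELD - 1. */
size_t split_with_n (const char *line, char out[][MAXFIELD], size_t max)
{
    size_t n = 0;
    int used = 0;

    /* sscanf returns 1 while a field was converted; "%n" is not counted */
    while (n < max && sscanf (line, "%63s%n", out[n], &used) == 1) {
        line += used;   /* offset the buffer past the field just read */
        n++;
    }
    return n;
}
```

A field longer than 63 characters would be split across two entries here, so a real program would want a wider limit or an explicit length check.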
edited Nov 10 at 10:58
asked Nov 10 at 8:29
Rrz0
4 Answers
accepted
"Best Practices" is somewhat subjective, but "fully validated, logical and readable" should always be the goal.
For reading a fixed number of fields (in your case, cols 1, 2, 5 as string values of unknown length and cols 3, 4 as simple int values), you can read an unknown number of rows from a file simply by allocating storage for some reasonably anticipated number of rows of data, keeping track of how many rows are filled, and then reallocating storage, as required, when you reach the limit of the storage you have allocated.
An efficient way of handling the reallocation is to reallocate by some reasonable number of additional blocks of memory when reallocation is required (rather than making calls to realloc for every additional row of data). You can either add a fixed number of new blocks, multiply what you have by 3/2 or 2, or use some other sane scheme that meets your needs. I generally just double the storage each time the allocation limit is reached.
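To make the growth schemes concrete, here is a minimal sketch of two capacity calculations, doubling and growing by 3/2. The function names and the starting capacity of 8 are assumptions chosen for illustration. Both return 0 when the new element count would not fit in a size_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Next capacity under a doubling scheme; 0 signals overflow.
 * Starting from an empty array, 8 elements is an assumed default. */
size_t grow_double (size_t cur)
{
    if (cur == 0)
        return 8;
    if (cur > SIZE_MAX / 2)     /* cur * 2 would wrap around */
        return 0;
    return cur * 2;
}

/* Next capacity under a 3/2 scheme; 0 signals overflow. */
size_t grow_three_halves (size_t cur)
{
    size_t add;

    if (cur < 2)                /* cur / 2 would be 0, so add one instead */
        return cur + 1;
    add = cur / 2;
    if (cur > SIZE_MAX - add)   /* cur + add would wrap around */
        return 0;
    return cur + add;
}
```

Doubling needs fewer realloc calls, while 3/2 wastes less memory on large arrays. The caller still has to check that the capacity multiplied by the element size does not overflow before passing it to realloc.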
Since you have a fixed number of fields of unknown size, you can make things easy by simply separating the five fields with sscanf and validating that 5 conversions took place by checking the sscanf return. If you were reading an unknown number of fields, then you would just use the same reallocation scheme discussed above for reading an unknown number of rows to handle the column-wise expansion.
(There is no requirement that every row have the same number of fields in that case, but you can force a check by setting a variable containing the number of fields read with the first row, and then validating that all subsequent rows have that same number.)
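The field-count check described above could be sketched as follows. The helper count_fields is a hypothetical name, and it assumes fields are separated by spaces, tabs or line endings. It walks the line with strspn and strcspn and does not modify it.

```c
#include <string.h>

/* Count whitespace-separated fields in 'line' without modifying it. */
size_t count_fields (const char *line)
{
    static const char ws[] = " \t\r\n";   /* assumed separators */
    size_t n = 0;

    line += strspn (line, ws);            /* skip leading whitespace */
    while (*line) {
        n++;                              /* found the start of a field */
        line += strcspn (line, ws);       /* step over the field */
        line += strspn (line, ws);        /* step over the separator(s) */
    }
    return n;
}
```

A caller would store the count from the first row and report any later row whose count differs.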
As discussed in the comments, reading a line of data with a line-oriented input function like fgets or POSIX getline and then parsing the data, either by tokenizing with strtok or, in this case with a fixed number of fields, simply parsing with sscanf, is a solid approach. It provides the benefit of allowing independent validation of (1) the read of data from the file; and (2) the parse of data into the needed values. (While less flexible, for some data sets you can do this in a single step with fscanf, but that also brings in the usual scanf problem of what remains unread in the input buffer, depending on the conversion specifiers used.)
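For comparison with the sscanf route used below, here is a minimal sketch of the strtok alternative. The helper names tokenize and to_int are assumptions for this example. strtok writes '\0' bytes into the line, so the stored pointers refer to the caller's buffer and are only valid until that buffer is reused.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Split 'line' in place on whitespace, storing up to 'max' field pointers.
 * Returns the number of fields stored. */
size_t tokenize (char *line, char **fields, size_t max)
{
    static const char delim[] = " \t\r\n";
    size_t n = 0;
    char *tok = strtok (line, delim);

    while (tok && n < max) {
        fields[n++] = tok;              /* points into 'line' itself */
        tok = strtok (NULL, delim);
    }
    return n;
}

/* Convert a whole token to int with validation; returns 1 on success. */
int to_int (const char *s, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol (s, &end, 10);
    if (end == s || *end != '\0' || errno == ERANGE ||
        v < INT_MIN || v > INT_MAX)
        return 0;                       /* empty, trailing junk or overflow */
    *out = (int) v;
    return 1;
}
```

Compared with sscanf, this costs an explicit conversion step for the integer columns, but it copes with a variable number of fields per row.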
The easiest way to approach storage of your 5 fields is to declare a simple struct. Since the number of characters for each of the character fields is unknown, the struct members for each of these fields will be a character pointer, and the remaining fields int, e.g.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ARRSZ 2 /* use 8 or more, set to 2 here to force realloc */
#define MAXC 1024

typedef struct {
    char *col1, *col2, *col5;
    int col3, col4;
} mydata_t;
Now you can start your allocation for handling an unknown number of these by allocating for some reasonably anticipated amount (I would generally use 8 or 16 with a doubling scheme as that will grow reasonably fast), but we have chosen 2 here with #define ARRSZ 2 to make sure we force one reallocation when handling your 3-line data file. Note also we are setting a maximum number of characters per line with #define MAXC 1024 for your data (don't skimp on buffer size).
To get started, all we need to do is declare a buffer to hold each line, and a few variables to track the currently allocated number of structs, a line counter (to output accurate error messages) and a counter for the number of rows of data we have filled. Then when (rows_filled == allocated_array_size) you realloc, e.g.
int main (int argc, char **argv)
{
    char buf[MAXC];
    size_t arrsz = ARRSZ, line = 0, row = 0;
    mydata_t *data = NULL;
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    /* allocate an 'arrsz' initial number of struct */
    if (!(data = malloc (arrsz * sizeof *data))) {
        perror ("malloc-data");
        return 1;
    }

    while (fgets (buf, MAXC, fp)) {         /* read each line from file */
        char c1[MAXC], c2[MAXC], c5[MAXC];  /* temp strings for c1,2,5 */
        int c3, c4;                         /* temp ints for c3,4 */
        size_t len = strlen (buf);          /* length for validation */

        line++;     /* increment line count */

        /* validate line fit in buffer */
        if (len && buf[len-1] != '\n' && len == MAXC - 1) {
            fprintf (stderr, "error: line %zu exceeds MAXC chars.\n", line);
            return 1;
        }

        if (row == arrsz) { /* check if all pointers used */
            void *tmp = realloc (data, arrsz * 2 * sizeof *data);
            if (!tmp) {     /* validate realloc succeeded */
                perror ("realloc-data");
                break;      /* break, don't exit, data still valid */
            }
            data = tmp;     /* assign realloc'ed block to data */
            arrsz *= 2;     /* update arrsz to reflect new allocation */
        }
(Note: when calling realloc, you never realloc the pointer itself, e.g. data = realloc (data, new_size); If realloc fails (and it does), it returns NULL, which would overwrite your original pointer and cause a memory leak. Always realloc with a temporary pointer, validate, then assign the new block of memory to your original pointer.)
What remains is just splitting the line into our values, handling any errors in line format, adding our field values to our array of structs, incrementing our row/line counts and repeating until we run out of data to read, e.g.
        /* parse buf into fields, handle error on invalid format of line */
        if (sscanf (buf, "%1023s %1023s %d %d %1023s",
                    c1, c2, &c3, &c4, c5) != 5) {
            fprintf (stderr, "error: invalid format line %zu\n", line);
            continue;   /* get next line */
        }

        /* allocate copy strings, assign allocated blocks to pointers */
        if (!(data[row].col1 = mystrdup (c1))) { /* validate copy of c1 */
            fprintf (stderr, "error: malloc-c1 line %zu\n", line);
            break;      /* same reason to break not exit */
        }
        if (!(data[row].col2 = mystrdup (c2))) { /* validate copy of c2 */
            fprintf (stderr, "error: malloc-c2 line %zu\n", line);
            break;      /* same reason to break not exit */
        }
        data[row].col3 = c3;    /* assign integer values */
        data[row].col4 = c4;
        if (!(data[row].col5 = mystrdup (c5))) { /* validate copy of c5 */
            fprintf (stderr, "error: malloc-c5 line %zu\n", line);
            break;      /* same reason to break not exit */
        }
        row++;      /* increment number of row pointers used */
    }
    if (fp != stdin)    /* close file if not stdin */
        fclose (fp);

    puts ("values stored in struct\n");
    for (size_t i = 0; i < row; i++)
        printf ("%-4s %-10s %4d %4d %s\n", data[i].col1, data[i].col2,
                data[i].col3, data[i].col4, data[i].col5);

    freemydata (data, row);

    return 0;
}
And we are done (except for the memory use/error check)
Note that there are two helper functions above. The first, mystrdup(), allocates storage for each string, copies the string to its allocated block of memory, and returns the starting address of that block so it can be assigned to the pointer in our struct. You can use strdup() if you have it; I simply included the function to show you how to manually handle the malloc and copy. Note how the copy is done with memcpy instead of strcpy. Why? You already scanned forward in the string to find '\0' when you found the length with strlen, so there is no need to have strcpy repeat that process again. Just use memcpy.
/* simple implementation of strdup - in the event you don't have it */
char *mystrdup (const char *s)
{
    if (!s)     /* validate s not NULL */
        return NULL;

    size_t len = strlen (s);            /* get length */
    char *sdup = malloc (len + 1);      /* allocate length + 1 */

    if (!sdup)          /* validate */
        return NULL;

    return memcpy (sdup, s, len + 1);   /* pointer to copied string */
}
The last helper function is freemydata(), which just calls free() on each allocated block to ensure you free all the memory you have allocated. It also keeps your code tidy. (You can do the same for the realloc block and move that to its own function as well.)
/* simple function to free all data when done */
void freemydata (mydata_t *data, size_t n)
{
    for (size_t i = 0; i < n; i++) {    /* free allocated strings */
        free (data[i].col1);
        free (data[i].col2);
        free (data[i].col5);
    }
    free (data);    /* free structs */
}
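The parenthetical suggestion above, moving the realloc block into its own function, might look like the sketch below. The name growmydata and its return convention are assumptions for illustration. The mydata_t typedef is repeated only so the snippet stands alone.

```c
#include <stdlib.h>

typedef struct {
    char *col1, *col2, *col5;
    int col3, col4;
} mydata_t;

/* Double the storage behind *data (or allocate 2 structs if *arrsz is 0).
 * On success, updates *data and *arrsz and returns 1. On failure, leaves
 * both untouched, so the existing rows stay valid, and returns 0. */
int growmydata (mydata_t **data, size_t *arrsz)
{
    size_t limit = (size_t) -1 / sizeof **data; /* max structs that fit */
    size_t newsz;
    void *tmp;

    if (*arrsz > limit / 2)     /* doubling would overflow the byte count */
        return 0;
    newsz = *arrsz ? *arrsz * 2 : 2;

    tmp = realloc (*data, newsz * sizeof **data);   /* temporary pointer */
    if (!tmp)
        return 0;

    *data = tmp;
    *arrsz = newsz;
    return 1;
}
```

In the read loop, the block guarded by if (row == arrsz) would then reduce to a call such as if (!growmydata (&data, &arrsz)) followed by perror and break.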
Putting all the pieces together would give you:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ARRSZ 2 /* use 8 or more, set to 2 here to force realloc */
#define MAXC 1024

typedef struct {
    char *col1, *col2, *col5;
    int col3, col4;
} mydata_t;

/* simple implementation of strdup - in the event you don't have it */
char *mystrdup (const char *s)
{
    if (!s)     /* validate s not NULL */
        return NULL;

    size_t len = strlen (s);            /* get length */
    char *sdup = malloc (len + 1);      /* allocate length + 1 */

    if (!sdup)          /* validate */
        return NULL;

    return memcpy (sdup, s, len + 1);   /* pointer to copied string */
}

/* simple function to free all data when done */
void freemydata (mydata_t *data, size_t n)
{
    for (size_t i = 0; i < n; i++) {    /* free allocated strings */
        free (data[i].col1);
        free (data[i].col2);
        free (data[i].col5);
    }
    free (data);    /* free structs */
}

int main (int argc, char **argv)
{
    char buf[MAXC];
    size_t arrsz = ARRSZ, line = 0, row = 0;
    mydata_t *data = NULL;
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    /* allocate an 'arrsz' initial number of struct */
    if (!(data = malloc (arrsz * sizeof *data))) {
        perror ("malloc-data");
        return 1;
    }

    while (fgets (buf, MAXC, fp)) {         /* read each line from file */
        char c1[MAXC], c2[MAXC], c5[MAXC];  /* temp strings for c1,2,5 */
        int c3, c4;                         /* temp ints for c3,4 */
        size_t len = strlen (buf);          /* length for validation */

        line++;     /* increment line count */

        /* validate line fit in buffer */
        if (len && buf[len-1] != '\n' && len == MAXC - 1) {
            fprintf (stderr, "error: line %zu exceeds MAXC chars.\n", line);
            return 1;
        }

        if (row == arrsz) { /* check if all pointers used */
            void *tmp = realloc (data, arrsz * 2 * sizeof *data);
            if (!tmp) {     /* validate realloc succeeded */
                perror ("realloc-data");
                break;      /* break, don't exit, data still valid */
            }
            data = tmp;     /* assign realloc'ed block to data */
            arrsz *= 2;     /* update arrsz to reflect new allocation */
        }

        /* parse buf into fields, handle error on invalid format of line */
        if (sscanf (buf, "%1023s %1023s %d %d %1023s",
                    c1, c2, &c3, &c4, c5) != 5) {
            fprintf (stderr, "error: invalid format line %zu\n", line);
            continue;   /* get next line */
        }

        /* allocate copy strings, assign allocated blocks to pointers */
        if (!(data[row].col1 = mystrdup (c1))) { /* validate copy of c1 */
            fprintf (stderr, "error: malloc-c1 line %zu\n", line);
            break;      /* same reason to break not exit */
        }
        if (!(data[row].col2 = mystrdup (c2))) { /* validate copy of c2 */
            fprintf (stderr, "error: malloc-c2 line %zu\n", line);
            break;      /* same reason to break not exit */
        }
        data[row].col3 = c3;    /* assign integer values */
        data[row].col4 = c4;
        if (!(data[row].col5 = mystrdup (c5))) { /* validate copy of c5 */
            fprintf (stderr, "error: malloc-c5 line %zu\n", line);
            break;      /* same reason to break not exit */
        }
        row++;      /* increment number of row pointers used */
    }
    if (fp != stdin)    /* close file if not stdin */
fclose (fp);
puts ("values stored in structn");
for (size_t i = 0; i < row; i++)
printf ("%-4s %-10s %4d %4d %sn", data[i].col1, data[i].col2,
data[i].col3, data[i].col4, data[i].col5);
freemydata (data, row);
return 0;
Now test.
Example Input File
$ cat dat/fivefields.txt
C 08902019 1020 50 Test1
A 08902666 1040 30 Test2
B 08902768 1060 80 Test3
Example Use/Output
$ ./bin/fgets_fields <dat/fivefields.txt
values stored in struct

C    08902019   1020   50 Test1
A    08902666   1040   30 Test2
B    08902768   1060   80 Test3
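Since the question title asks for separate arrays per column, here is a minimal sketch of how a single column could be pulled out of the filled array of structs once reading is done. The helper names col3_array and col1_array are hypothetical and not part of the answer; the struct layout is the one used above.

```c
#include <stdlib.h>

/* same layout as the struct used in the answer */
typedef struct {
    char *col1, *col2, *col5;
    int col3, col4;
} mydata_t;

/* Hypothetical helper: copy the col3 value of each of the n rows into a
 * newly allocated int array. Returns NULL if n is 0 or malloc fails.
 * The caller frees the result.
 */
int *col3_array (const mydata_t *data, size_t n)
{
    int *out;
    size_t i;

    if (!data || !n)
        return NULL;
    if (!(out = malloc (n * sizeof *out)))
        return NULL;
    for (i = 0; i < n; i++)
        out[i] = data[i].col3;
    return out;
}

/* Hypothetical helper: build an array of pointers to the col1 strings.
 * The strings are NOT copied, they stay owned by data, so free only the
 * returned pointer array and do so before calling freemydata().
 */
const char **col1_array (const mydata_t *data, size_t n)
{
    const char **out;
    size_t i;

    if (!data || !n)
        return NULL;
    if (!(out = malloc (n * sizeof *out)))
        return NULL;
    for (i = 0; i < n; i++)
        out[i] = data[i].col1;
    return out;
}
```

Whether separate arrays are actually better than one array of structs depends on how the columns are used later; keeping each row together in a struct is usually easier to sort and to free.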
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux, valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use: just run your program through it.
$ valgrind ./bin/fgets_fields <dat/fivefields.txt
==1721== Memcheck, a memory error detector
==1721== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==1721== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==1721== Command: ./bin/fgets_fields
==1721==
values stored in struct

C    08902019   1020   50 Test1
A    08902666   1040   30 Test2
B    08902768   1060   80 Test3
==1721==
==1721== HEAP SUMMARY:
==1721== in use at exit: 0 bytes in 0 blocks
==1721== total heap usage: 11 allocs, 11 frees, 243 bytes allocated
==1721==
==1721== All heap blocks were freed -- no leaks are possible
==1721==
==1721== For counts of detected and suppressed errors, rerun with: -v
==1721== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Look things over and let me know if you have further questions.
Are you sure, I line++; before I hit that block :) (didn't I -- I'll check) Right you are! Fixed - thanks - I came back and added the fit-check before the line++; and just added it above by happy-mistake.
– David C. Rankin
Nov 10 at 11:17
Hardcoding 1023 in the sscanf() calls defeats the purpose of using a define for MAXC. The API is somewhat broken, as we all know, but since you define the target arrays with the proper size, you can omit the 1023 and use %s directly.
– chqrlie
Nov 10 at 11:20
Yes, but that too was just by chance -- I harp on always using the field-width modifier, so I tend to always include it in the examples, but in this case I also declared 3 temporary buffers of that size, making the field-width modifier redundant, and since they are sized at MAXC the individual content could never exceed that. Good eyes. I am getting too old to code this late without a few ticks :)
– David C. Rankin
Nov 10 at 11:23
I'm afraid I am just as old, but I have an unfair advantage: it is 12:24pm over here.
– chqrlie
Nov 10 at 11:25
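On the hardcoded 1023 discussed in the comments above: if you want to keep the field width without repeating the number, one common preprocessor idiom is to stringify a macro into the format string. This is only a sketch. The names MAXW, STR, STR_ and parse_line are my own assumptions, and the trick only works when the width macro is a plain integer literal (an expression such as MAXC - 1 cannot be used this way).

```c
#include <stdio.h>

#define MAXW 1023           /* max chars per string field (assumed name) */
#define MAXC (MAXW + 1)     /* buffer size, one extra for the '\0' */

#define STR_(x) #x          /* turn the argument into a string literal */
#define STR(x)  STR_(x)     /* expand the macro first, then stringify */

/* Hypothetical helper: parse the five fields of one line. c1, c2 and c5
 * must each point to at least MAXC chars. Returns 1 if all five
 * conversions succeeded, 0 otherwise.
 */
int parse_line (const char *buf, char *c1, char *c2,
                int *c3, int *c4, char *c5)
{
    /* the adjacent literals join into "%1023s %1023s %d %d %1023s" */
    return sscanf (buf,
                   "%" STR(MAXW) "s %" STR(MAXW) "s %d %d %" STR(MAXW) "s",
                   c1, c2, c3, c4, c5) == 5;
}
```

The width and the buffer size now come from the same definition, so changing MAXW keeps the two in step.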
First of all, thanks @DavidC.Rankin, for the detailed response, and the working code for me to play around with. This is my first time with struct, and I do not understand why you pass col1, col2 and col5 as pointers to chars, while col3 and col4 are integers. As in my sample text file, col4 and col5 may have varying sizes, while the others are fixed. Thanks once again.
– Rrz0
Nov 11 at 9:17
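To illustrate the point raised in this last comment: an int holds its value no matter how many digits the text had, while a string of unknown length needs storage sized at run time, which is why the answer uses char pointers filled by mystrdup(). If some text columns really are fixed width, arrays inside the struct are an alternative. The sketch below is only an assumption-laden example: the widths (1 and 8 characters) are read off the three sample lines, the 31-character bound for col5 is arbitrary, and fixedrow_t and parse_fixed are hypothetical names.

```c
#include <stdio.h>

/* Assumed widths, taken from the three sample lines only */
typedef struct {
    char col1[2];       /* 1 char + '\0', stored inside the struct */
    char col2[9];       /* 8 chars + '\0', stored inside the struct */
    int  col3, col4;    /* numbers need no separately allocated storage */
    char col5[32];      /* assumed upper bound of 31 chars + '\0' */
} fixedrow_t;

/* Hypothetical parser: fill *r from one line, return 1 on success.
 * Input that violates the assumed widths is split or truncated by the
 * field widths, not rejected, so this suits well-formed files only.
 */
int parse_fixed (const char *line, fixedrow_t *r)
{
    return sscanf (line, "%1s %8s %d %d %31s",
                   r->col1, r->col2, &r->col3, &r->col4, r->col5) == 5;
}
```

With this layout there is nothing to free per row, at the cost of wasting space for short values and having to pick the bounds up front.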
up vote
1
down vote
I want to proceed to read only certain columns of this text file.
You can do this with any input function: getc, fgets, sscanf, getline... but you must first define exactly what you mean by certain columns.
- columns can be defined as separated by a specific character such as , or ; or TAB, in which case strtok() is definitely not the right choice because it treats all sequences of separating characters as a single separator: hence a,,b would be seen as having only 2 columns.
- if they are instead separated by whitespace, any sequence of spaces or tabs, strtok, strpbrk or strspn/strcspn might come in handy.
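To make the first bullet concrete, here is a minimal sketch of splitting on a single separator character with strchr so that empty fields survive. The helper name split_fields and its interface are assumptions of mine, not something from this answer.

```c
#include <string.h>

/* Hypothetical helper: split line in place on every occurrence of delim,
 * keeping empty fields. Stores up to max pointers in fields[] and
 * returns the number of fields found, which may be larger than max.
 * A delim of '\0' is treated as "no separator" (one field).
 */
size_t split_fields (char *line, char delim, char **fields, size_t max)
{
    size_t n = 0;
    char *p = line;

    for (;;) {
        /* next separator, or NULL when this is the last field */
        char *sep = delim ? strchr (p, delim) : NULL;

        if (n < max)
            fields[n] = p;
        n++;
        if (!sep)
            break;
        *sep = '\0';    /* terminate this field */
        p = sep + 1;    /* next field starts right after the separator */
    }
    return n;
}
```

Called on "a,,b" with ',' this reports three fields, the middle one being an empty string, whereas a strtok loop over the same input yields only the two tokens a and b.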
In any case, you can read the file line by line with fgets, but you might have a problem with very long lines. getline is a solution, but it might not be available on all systems.
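If getline is not available and you want to stay within standard C, one possible workaround for very long lines is to call fgets repeatedly into a buffer that grows as needed. The function below is a sketch under that assumption; read_line is a hypothetical name and the starting size of 128 is arbitrary.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read one line of any length from fp using only
 * standard C. Returns a malloc'ed string without the trailing '\n', or
 * NULL at end of file or on allocation failure. The caller frees it.
 */
char *read_line (FILE *fp)
{
    size_t cap = 128, len = 0;
    char *line = malloc (cap);

    if (!line)
        return NULL;

    while (fgets (line + len, (int)(cap - len), fp)) {
        len += strlen (line + len);
        if (len && line[len - 1] == '\n') {     /* whole line read */
            line[--len] = '\0';
            return line;
        }
        if (len + 1 == cap) {                   /* buffer full, grow it */
            char *tmp = realloc (line, cap * 2);
            if (!tmp) {
                free (line);
                return NULL;
            }
            line = tmp;
            cap *= 2;
        }
    }
    if (len)            /* last line of the file had no '\n' */
        return line;
    free (line);        /* nothing was read: end of file or error */
    return NULL;
}
```

Each returned line can then be handed to sscanf or a tokenizer and freed afterwards.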
Thanks for your helpful answer. I edited the question. I want to read all columns, and each column into separate arrays. I am open to using either a specific character such as , or else simply white space, depending on which is 'easier' to implement.
– Rrz0
Nov 10 at 11:00
The answer here uses strtok() for both , and whitespace separated columns. Why is strtok() not a good choice for the first case you mentioned?
– Rrz0
Nov 10 at 11:07
@Rrz0: I amended the answer to explain why strtok is inappropriate for , as a separator. strtok has other issues such as non-reentrancy that make it a poor candidate for most parsing tasks.
– chqrlie
Nov 10 at 11:13
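For the whitespace case, the strspn/strcspn pair mentioned in the answer can replace strtok without its hidden state. A minimal sketch follows; next_token is a hypothetical name and the set of characters treated as whitespace is my assumption.

```c
#include <string.h>

/* Hypothetical helper: find the next whitespace-delimited token in the
 * string at *pos. On success it stores the token length in *len,
 * advances *pos past the token and returns a pointer to the token's
 * first character. Returns NULL when no token is left. The input is
 * never modified and no state is kept between calls, so scans of
 * different strings can be interleaved freely.
 */
const char *next_token (const char **pos, size_t *len)
{
    const char *ws = " \t\r\n";
    const char *start = *pos + strspn (*pos, ws);   /* skip blanks */

    if (*start == '\0')
        return NULL;
    *len = strcspn (start, ws);                     /* token length */
    *pos = start + *len;
    return start;
}
```

Because the tokens are not '\0'-terminated, copy them out with memcpy (or print them with the %.*s format) when a separate string is needed.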
up vote
0
down vote
Depending on the data and daring, you could use scanf or a parser created with yacc/lex.
up vote
0
down vote
If you know what the column separator is and how many columns you have, you can use getline-style reading, first with the column separator and then with the line separator. (Note that getline itself always stops at a newline; the variant that takes a delimiter argument is getdelim, documented on the same page.)
Here is getline:
http://man7.org/linux/man-pages/man3/getline.3.html
It is very good because it allocates the space for you, so there is no need to know how many bytes your column or line holds.
Or you can simply use getline as in the code example at that link to read the whole line, then parse it and extract the columns as you wish.
If you paste exactly how you want to run the program with the input you show, I can try to write a quick C program for a good answer. For now this is just a comment-style answer with too many words for a comment :-(
Or is it that you somehow cannot use the library?
While waiting for a better question, I will note that you can use awk to read columns from a text file, but this is probably not what you want. What are you really trying to do?
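As a sketch of the "read the whole line with getline, then parse" route described above: this assumes a POSIX system (getline is not part of ANSI C), the five-column layout of the sample file from the question, and an arbitrary 31-character limit on the text columns. count_rows is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L     /* ask for getline() on POSIX systems */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>              /* ssize_t */

/* Hypothetical helper: read fp line by line with getline() and count
 * the lines that match the five-column layout of the sample file.
 */
size_t count_rows (FILE *fp)
{
    char *line = NULL;      /* getline allocates and grows this buffer */
    size_t cap = 0, rows = 0;
    ssize_t nread;

    while ((nread = getline (&line, &cap, fp)) != -1) {
        char c1[32], c2[32], c5[32];
        int c3, c4;

        if (sscanf (line, "%31s %31s %d %d %31s",
                    c1, c2, &c3, &c4, c5) == 5)
            rows++;
    }
    free (line);            /* one free for the single reused buffer */
    return rows;
}
```

A real program would store the parsed values instead of only counting them; the point here is that the line buffer is managed by getline and released once at the end.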
@DavidC.Rankin you are most certainly very correct, but I thought the question says "able to read text from a file", and I see a small example at the bottom with text in columns. So we sit and wait and the OP tells us what it is.... In a binary format I usually find the lengths stored in the file before the content, so then of course reading is not like reading from a free-form text file. But thank you so much for the comment! :-)
– Mun Dong
Nov 10 at 8:49
Thanks for your answer. File I want to read from is a text file. I don't understand what you mean here: "If you paste exactly how you want to run program with input you show". I want my program to be able to read, all columns from the text file into separate arrays, for further use.
– Rrz0
Nov 10 at 8:50
@DavidC.Rankin The other option is to read and check each byte, check the allocated memory, grow the memory and many other things, but that is a lot of code, and who knows what the OP really wants? Maybe it is necessary, maybe not, but it is a lot of code to write in a hurry.
– Mun Dong
Nov 10 at 8:51
@Rrz0 You now say "all columns", but in your question you say "skip columns". So do you want to read all columns into arrays, and of what length? Or should the array know its length? Why not read into strings and maybe just save the positions where each column starts and ends? Is that any good?
– Mun Dong
Nov 10 at 8:52
You are right, will edit question.
– Rrz0
Nov 10 at 8:53
up vote
4
down vote
accepted
"Best Practices" is somewhat subjective, but "fully validated, logical and readable" should always be the goal.
For reading a fixed number of fields (in your case choosing cols 1, 2, 5
as string values of unknown length) and cols 3, 4
as simple int
values), you can read an unknown number of rows from a file simply by allocating storage for some reasonably anticipated number of rows of data, keeping track of how many rows are filled, and then reallocating storage, as required, when you reach the limit of the storage you have allocated.
An efficient way of handling the reallocation is to reallocate by some reasonable number of additional blocks of memory when reallocation is required (rather than making calls to realloc
for every additional row of data). You can either add a fixed number of new blocks, multiply what you have by 3/2
or 2
or some other sane scheme that meets your needs. I generally just double the storage each time the allocation limit is reached.
Since you have a fixed number of fields of unknown size, you can make things easy by simply separating the five-fields with sscanf
and validating that 5 conversions took place by checking the sscanf
return. If you were reading an unknown number of fields, then you would just use the same reallocation scheme to handle the column-wise expansion discussed above for reading an unknown number of rows.
(there is no requirement that any row have the same number of fields in that case, but you can force a check by setting a variable containing the number of fields read with the first row, and then validating that all subsequent rows have that same number...)
As discussed in the comments, reading a line of data with a line-oriented input function like fgets
or POSIX getline
and then parsing the data by either tokenizing with strtok
, or in this case with a fixed number of fields simply parsing with sscanf
is a solid approach. It provides the benefit of allowing independent validation of (1) the read of data from the file; and (2) the parse of data into the needed values. (while less flexible, for some data sets, you can do this in a single step with fscanf
, but that also injects the scanf
problems for user input of what remains unread in the input buffer depending on the conversion-specifiers used...)
The easiest way to approach storage of your 5-fields is to declare a simple struct
. Since the number of characters for each of the character fields is unknown, the struct members for each of these fields will be a character pointer, and the remaining fields int
, e.g.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define ARRSZ 2 /* use 8 or more, set to 2 here to force realloc */
#define MAXC 1024
typedef struct
char *col1, *col2, *col5;
int col3, col4;
mydata_t;
Now you can start your allocation for handling an unknown number of these by allocating for some reasonably anticipated amount (I would generally use 8
or 16
with a doubling scheme as that will grow reasonably fast), but we have chosen 2
here with #define ARRSZ 2
to make sure we force one reallocation when handling your 3-line data file. Note also we are setting a maximum number of characters per-line of #define MAXC 1024
for your data (don't skimp on buffer size)
To get started, all we need to do is declare a buffer to hold each line, and a few variables to track the currently allocated number of structs, a line counter (to output accurate error messages) and a counter for the number of rows of data we have filled. Then when (rows_filled == allocated_array_size)
you realloc
, e.g.
int main (int argc, char **argv)
char buf[MAXC];
size_t arrsz = ARRSZ, line = 0, row = 0;
mydata_t *data = NULL;
/* use filename provided as 1st argument (stdin by default) */
FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
if (!fp) /* validate file open for reading */
perror ("file open failed");
return 1;
/* allocate an 'arrsz' initial number of struct */
if (!(data = malloc (arrsz * sizeof *data)))
perror ("malloc-data");
return 1;
while (fgets (buf, MAXC, fp)) /* read each line from file */
char c1[MAXC], c2[MAXC], c5[MAXC]; /* temp strings for c1,2,5 */
int c3, c4; /* temp ints for c3,4 */
size_t len = strlen (buf); /* length for validation */
line++; /* increment line count */
/* validate line fit in buffer */
if (len && buf[len-1] != 'n' && len == MAXC - 1)
fprintf (stderr, "error: line %zu exceeds MAXC chars.n", line);
return 1;
if (row == arrsz) /* check if all pointers used */
void *tmp = realloc (data, arrsz * 2 * sizeof *data);
if (!tmp) /* validate realloc succeeded */
perror ("realloc-data");
break; /* break, don't exit, data still valid */
data = tmp; /* assign realloc'ed block to data */
arrsz *= 2; /* update arrsz to reflect new allocation */
(note: when calling realloc
, you never realloc the pointer itself, e.g. data = realloc (data, new_size);
If realloc
fails (and it does), it returns NULL
which would overwrite your original pointer causing a memory leak. Always realloc
with a temporary pointer, validate, then assign the new block of memory to your original pointer)
What remains is just splitting the line into our values, handling any errors in line format, adding our field values to our array of struct, increment our row/line counts and repeating until we run out of data to read, e.g.
/* parse buf into fields, handle error on invalid format of line */
if (sscanf (buf, "%1023s %1023s %d %d %1023s",
c1, c2, &c3, &c4, c5) != 5)
fprintf (stderr, "error: invalid format line %zun", line);
continue; /* get next line */
/* allocate copy strings, assign allocated blocks to pointers */
if (!(data[row].col1 = mystrdup (c1))) /* validate copy of c1 */
fprintf (stderr, "error: malloc-c1 line %zun", line);
break; /* same reason to break not exit */
if (!(data[row].col2 = mystrdup (c2))) /* validate copy of c2 */
fprintf (stderr, "error: malloc-c1 line %zun", line);
break; /* same reason to break not exit */
data[row].col3 = c3; /* assign integer values */
data[row].col4 = c4;
if (!(data[row].col5 = mystrdup (c5))) /* validate copy of c5 */
fprintf (stderr, "error: malloc-c1 line %zun", line);
break; /* same reason to break not exit */
row++; /* increment number of row pointers used */
if (fp != stdin) /* close file if not stdin */
fclose (fp);
puts ("values stored in structn");
for (size_t i = 0; i < row; i++)
printf ("%-4s %-10s %4d %4d %sn", data[i].col1, data[i].col2,
data[i].col3, data[i].col4, data[i].col5);
freemydata (data, row);
return 0;
And we are done (except for the memory use/error check)
Note about there are two helper functions to allocate for each string and copy each string to its allocated block of memory and assigning the starting address for that block to our pointer in our struct. mystrdup()
You can use strdup()
if you have it, I simply included the function to show you how to manually handle the malloc
and copy. Note: how the copy is done with memcpy
instead of strcpy
-- Why? You already scanned forward in the string to find ''
when you found the length with strlen
-- no need to have strcpy
repeat that process again -- just use memcpy
.
/* simple implementation of strdup - in the event you don't have it */
char *mystrdup (const char *s)
if (!s) /* validate s not NULL */
return NULL;
size_t len = strlen (s); /* get length */
char *sdup = malloc (len + 1); /* allocate length + 1 */
if (!sdup) /* validate */
return NULL;
return memcpy (sdup, s, len + 1); /* pointer to copied string */
Last helper function is freemydata()
which just calls free()
on each allocated block to insure you free all the memory you have allocated. It also keeps you code tidy. (you can do the same for the realloc
block to move that to it's own function as well)
/* simple function to free all data when done */
void freemydata (mydata_t *data, size_t n)
{
    for (size_t i = 0; i < n; i++) {    /* free allocated strings */
        free (data[i].col1);
        free (data[i].col2);
        free (data[i].col5);
    }
    free (data);    /* free structs */
}
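The realloc block can be moved into its own function in the same way. Below is a minimal sketch, assuming a hypothetical helper name grow_rows() (it is not part of the answer's code) and the same mydata_t struct. It keeps the temporary-pointer pattern, so the caller's block stays valid if realloc fails.

```c
#include <assert.h>
#include <stdlib.h>

/* same struct as in the answer */
typedef struct {
    char *col1, *col2, *col5;
    int col3, col4;
} mydata_t;

/* grow_rows() is a hypothetical helper name, not part of the answer.
 * Doubles the array at *data and updates *arrsz only on success.
 * Returns 1 on success, 0 on failure (the original block stays valid).
 */
int grow_rows (mydata_t **data, size_t *arrsz)
{
    assert (data && arrsz && *arrsz > 0);   /* caller passes a live array */

    /* realloc through a temporary pointer, never the original */
    void *tmp = realloc (*data, *arrsz * 2 * sizeof **data);
    if (!tmp)           /* realloc failed, *data is untouched */
        return 0;

    *data = tmp;        /* assign realloc'ed block to caller's pointer */
    *arrsz *= 2;        /* update size to reflect new allocation */
    return 1;
}
```

In the read loop the call would then look something like `if (row == arrsz && !grow_rows (&data, &arrsz)) { perror ("realloc-data"); break; }`, which leaves the rows already stored intact on failure.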
Putting all the pieces together would give you:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ARRSZ   2   /* use 8 or more, set to 2 here to force realloc */
#define MAXC 1024

typedef struct {
    char *col1, *col2, *col5;
    int col3, col4;
} mydata_t;

/* simple implementation of strdup - in the event you don't have it */
char *mystrdup (const char *s)
{
    if (!s)     /* validate s not NULL */
        return NULL;

    size_t len = strlen (s);            /* get length */
    char *sdup = malloc (len + 1);      /* allocate length + 1 */

    if (!sdup)          /* validate */
        return NULL;

    return memcpy (sdup, s, len + 1);   /* pointer to copied string */
}

/* simple function to free all data when done */
void freemydata (mydata_t *data, size_t n)
{
    for (size_t i = 0; i < n; i++) {    /* free allocated strings */
        free (data[i].col1);
        free (data[i].col2);
        free (data[i].col5);
    }
    free (data);    /* free structs */
}

int main (int argc, char **argv)
{
    char buf[MAXC];
    size_t arrsz = ARRSZ, line = 0, row = 0;
    mydata_t *data = NULL;
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    /* allocate an 'arrsz' initial number of struct */
    if (!(data = malloc (arrsz * sizeof *data))) {
        perror ("malloc-data");
        return 1;
    }

    while (fgets (buf, MAXC, fp)) {         /* read each line from file */
        char c1[MAXC], c2[MAXC], c5[MAXC];  /* temp strings for c1,2,5 */
        int c3, c4;                         /* temp ints for c3,4 */
        size_t len = strlen (buf);          /* length for validation */

        line++;     /* increment line count */

        /* validate line fit in buffer */
        if (len && buf[len-1] != '\n' && len == MAXC - 1) {
            fprintf (stderr, "error: line %zu exceeds MAXC chars.\n", line);
            return 1;
        }

        if (row == arrsz) { /* check if all pointers used */
            void *tmp = realloc (data, arrsz * 2 * sizeof *data);
            if (!tmp) {     /* validate realloc succeeded */
                perror ("realloc-data");
                break;      /* break, don't exit, data still valid */
            }
            data = tmp;     /* assign realloc'ed block to data */
            arrsz *= 2;     /* update arrsz to reflect new allocation */
        }

        /* parse buf into fields, handle error on invalid format of line */
        if (sscanf (buf, "%1023s %1023s %d %d %1023s",
                    c1, c2, &c3, &c4, c5) != 5) {
            fprintf (stderr, "error: invalid format line %zu\n", line);
            continue;   /* get next line */
        }

        /* allocate copy strings, assign allocated blocks to pointers */
        if (!(data[row].col1 = mystrdup (c1))) { /* validate copy of c1 */
            fprintf (stderr, "error: malloc-c1 line %zu\n", line);
            break;      /* same reason to break not exit */
        }
        if (!(data[row].col2 = mystrdup (c2))) { /* validate copy of c2 */
            fprintf (stderr, "error: malloc-c2 line %zu\n", line);
            free (data[row].col1);  /* release partially filled row */
            break;      /* same reason to break not exit */
        }
        data[row].col3 = c3;    /* assign integer values */
        data[row].col4 = c4;
        if (!(data[row].col5 = mystrdup (c5))) { /* validate copy of c5 */
            fprintf (stderr, "error: malloc-c5 line %zu\n", line);
            free (data[row].col1);  /* release partially filled row */
            free (data[row].col2);
            break;      /* same reason to break not exit */
        }
        row++;      /* increment number of row pointers used */
    }
    if (fp != stdin)    /* close file if not stdin */
        fclose (fp);

    puts ("values stored in struct\n");
    for (size_t i = 0; i < row; i++)
        printf ("%-4s %-10s %4d %4d %s\n", data[i].col1, data[i].col2,
                data[i].col3, data[i].col4, data[i].col5);

    freemydata (data, row);

    return 0;
}
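The parse-and-copy step in the loop can also be folded into one function, which makes it easier to release a partially filled row when a later copy fails. The following is only a sketch under stated assumptions: store_row() and its return codes are hypothetical and not part of the answer, while the struct, the sscanf format and mystrdup() follow the listing above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXC 1024

/* same struct as in the answer */
typedef struct {
    char *col1, *col2, *col5;
    int col3, col4;
} mydata_t;

/* same approach as mystrdup() in the answer */
static char *mystrdup (const char *s)
{
    if (!s)
        return NULL;
    size_t len = strlen (s);
    char *sdup = malloc (len + 1);
    if (!sdup)
        return NULL;
    return memcpy (sdup, s, len + 1);
}

/* store_row() is a hypothetical helper, not part of the answer.
 * Parses one line from buf into *r.
 * Returns 1 on success, 0 on invalid format, -1 on allocation failure.
 * On any failure nothing stays allocated and r's pointers are NULL.
 */
int store_row (mydata_t *r, const char *buf)
{
    char c1[MAXC], c2[MAXC], c5[MAXC];  /* temp strings for c1,2,5 */
    int c3, c4;                         /* temp ints for c3,4 */

    assert (r && buf);
    r->col1 = r->col2 = r->col5 = NULL;

    if (sscanf (buf, "%1023s %1023s %d %d %1023s",
                c1, c2, &c3, &c4, c5) != 5)
        return 0;                       /* invalid line format */

    r->col1 = mystrdup (c1);
    r->col2 = mystrdup (c2);
    r->col5 = mystrdup (c5);
    if (!r->col1 || !r->col2 || !r->col5) {
        free (r->col1);                 /* free (NULL) is a no-op */
        free (r->col2);
        free (r->col5);
        r->col1 = r->col2 = r->col5 = NULL;
        return -1;                      /* allocation failure */
    }
    r->col3 = c3;                       /* assign integer values */
    r->col4 = c4;
    return 1;
}
```

With a helper like this the loop body would reduce to calling store_row (&data[row], buf), using continue on 0, break on -1, and row++ on 1.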
Now test.
Example Input File
$ cat dat/fivefields.txt
C 08902019 1020 50 Test1
A 08902666 1040 30 Test2
B 08902768 1060 80 Test3
Example Use/Output
$ ./bin/fgets_fields <dat/fivefields.txt
values stored in struct

C    08902019   1020   50 Test1
A    08902666   1040   30 Test2
B    08902768   1060   80 Test3
Memory Use/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.
It is imperative that you use a memory error checking program to ensure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.
For Linux, valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use; just run your program through the checker.
$ valgrind ./bin/fgets_fields <dat/fivefields.txt
==1721== Memcheck, a memory error detector
==1721== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==1721== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==1721== Command: ./bin/fgets_fields
==1721==
values stored in struct

C    08902019   1020   50 Test1
A    08902666   1040   30 Test2
B    08902768   1060   80 Test3
==1721==
==1721== HEAP SUMMARY:
==1721== in use at exit: 0 bytes in 0 blocks
==1721== total heap usage: 11 allocs, 11 frees, 243 bytes allocated
==1721==
==1721== All heap blocks were freed -- no leaks are possible
==1721==
==1721== For counts of detected and suppressed errors, rerun with: -v
==1721== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Always confirm that you have freed all memory you have allocated and that there are no memory errors.
Look things over and let me know if you have further questions.
Are you sure? I line++; before I hit that block :) (didn't I? -- I'll check) Right you are! Fixed, thanks. I came back and added the fit-check before the line++; and just added it above by happy mistake.
– David C. Rankin
Nov 10 at 11:17
Hardcoding 1023 in the sscanf() calls defeats the purpose of using a define for MAXC. The API is somewhat broken, as we all know, but since you define the target arrays with the proper size, you can omit the 1023 and use %s directly.
– chqrlie
Nov 10 at 11:20
Yes, but that too was just by chance -- I harp on always using the field-width modifier, so I tend to always include it in the examples, but in this case I also declared 3 temporary buffers of that size, making the field-width modifier redundant, and since they are sized at MAXC the individual content could never exceed that. Good eyes. I am getting too old to code this late without a few ticks :)
– David C. Rankin
Nov 10 at 11:23
I'm afraid I am just as old, but I have an unfair advantage: it is 12:24pm over here.
– chqrlie
Nov 10 at 11:25
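On the hardcoded 1023 raised in the comments above: one common way to keep the field width and the buffer size in step is to define the width once and build the format string with the preprocessor's stringify operator. This is a sketch only; MAXW, STR(), STR_() and first_word() are names assumed for illustration and do not appear in the answer.

```c
#include <assert.h>
#include <stdio.h>

#define MAXW 1023               /* max characters per field (assumed name) */
#define MAXC (MAXW + 1)         /* buffer size: field width + 1 for '\0' */

#define STR_(x) #x              /* stringify the argument as written */
#define STR(x)  STR_(x)         /* expand macros first, then stringify */

/* first_word() is a hypothetical helper for illustration.
 * Copies the first whitespace-delimited word of line into out, which
 * must hold at least MAXC bytes. Returns 1 if a word was read, else 0.
 */
int first_word (const char *line, char *out)
{
    assert (line && out);
    /* the format below is assembled at compile time as "%1023s" */
    return sscanf (line, "%" STR(MAXW) "s", out) == 1;
}
```

The width has to be the plain integer constant here, because stringifying an expression such as (MAXW + 1) would produce the text of the expression rather than a number, which is why the buffer size is derived from the width and not the other way around.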
First of all, thanks @DavidC.Rankin for the detailed response and the working code for me to play around with. This is my first time with struct, and I do not understand why you pass col1, col2 and col5 as pointers to chars, while col3 and col4 as integers. As in my sample text file, col4 and col5 may have varying sizes, while the others are fixed. Thanks once again.
– Rrz0
Nov 11 at 9:17
edited Nov 10 at 12:01
answered Nov 10 at 11:09
David C. Rankin
39.5k32546
Are you sure, I line++; before I hit that block :) (didn't I -- I'll check) Right you are! Fixed - thanks - I came back and added the fit-check before the line++; and just added it above by happy-mistake.
– David C. Rankin
Nov 10 at 11:17
Hardcoding 1023 in the sscanf() calls defeats the purpose of using a define for MAXC. The API is somewhat broken, as we all know, but since you define the target arrays with the proper size, you can omit the 1023 and use %s directly.
– chqrlie
Nov 10 at 11:20
Yes, but that too was just by chance -- I harp on always using the field-width modifier so I tend to always include it in the examples, but in this case I also declared 3 temporary buffers of that size, making the field-width modifier redundant, and since they are sized at MAXC the individual content could never exceed that. Good eyes. I am getting too old to code this late without a few ticks :)
– David C. Rankin
Nov 10 at 11:23
I'm afraid I am just as old, but I have an unfair advantage: it is 12:24pm over here.
– chqrlie
Nov 10 at 11:25
First of all, thanks @DavidC.Rankin, for the detailed response, and the working code for me to play around with. This is my first time with struct, and I do not understand why you pass col1, col2 and col5 as pointers to chars, while col3 and col4 as integers. As in my sample text file, col4 and col5 may have varying sizes, while the others are fixed. Thanks once again.
– Rrz0
Nov 11 at 9:17
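On the field-width point raised in the comments above: one common way to keep the width without hardcoding it into every format string is preprocessor stringification. The following is only a sketch under assumptions. MAXC is taken to be 1024 (implied by the 1023 mentioned above), and FIELDW, STR_, STR and parse_pair are hypothetical names used for illustration, not part of the answer's code. Note that a width of N for %s needs a buffer of at least N + 1 bytes.

```c
#include <assert.h>
#include <stdio.h>

#define MAXC   1024        /* assumed buffer size (1023 + 1 for the '\0') */
#define FIELDW 1023        /* MAXC - 1; the two must be changed together */

#define STR_(x) #x         /* turn the argument into a string literal */
#define STR(x)  STR_(x)    /* expand the macro first, then stringify */

/* Split "word number" out of line. word must point to at least MAXC bytes.
   Returns 1 if both fields were converted, 0 otherwise. */
int parse_pair (const char *line, char *word, int *num)
{
    assert (line != NULL && word != NULL && num != NULL);
    /* the format string becomes "%1023s %d" at compile time */
    return sscanf (line, "%" STR(FIELDW) "s %d", word, num) == 2;
}
```

A caller would declare char word[MAXC]; int n; and test the return value of parse_pair (line, word, &n) before using either output. The width still has to be a literal number, which is the limitation of the scanf family alluded to in the comments.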
up vote
1
down vote
I want to proceed to read only certain columns of this text file.
You can do this with any input function: getc, fgets, sscanf, getline... but you must first define exactly what you mean by certain columns.
- columns can be defined as separated by a specific character such as a comma (,), a semicolon (;) or TAB, in which case strtok() is definitely not the right choice because it treats all sequences of separating characters as a single separator: hence a,,b would be seen as having only 2 columns.
- if they are instead separated by whitespace, any sequence of spaces or tabs, strtok, strpbrk or strspn/strcspn might come in handy.
In any case, you can read the file line by line with fgets but you might have a problem with very long lines. getline is a solution, but it might not be available on all systems.
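The difference between the two cases can be seen by counting fields in a,,b both ways. This is a minimal sketch; count_fields_strtok and count_fields_strcspn are hypothetical helper names chosen for illustration.

```c
#include <assert.h>
#include <string.h>

/* Count fields with strtok: a run of delimiters is treated as one
   separator, so empty fields disappear. Note that strtok modifies buf. */
size_t count_fields_strtok (char *buf, const char *delims)
{
    size_t n = 0;
    char *tok;

    assert (buf != NULL && delims != NULL);
    for (tok = strtok (buf, delims); tok; tok = strtok (NULL, delims))
        n++;
    return n;
}

/* Count fields with strcspn: every single delimiter ends a field,
   so empty fields are preserved. The input is not modified. */
size_t count_fields_strcspn (const char *s, const char *delims)
{
    size_t n = 0;

    assert (s != NULL && delims != NULL);
    for (;;) {
        size_t len = strcspn (s, delims);   /* length of current field */
        n++;
        if (s[len] == '\0')                 /* end of string: last field */
            break;
        s += len + 1;                       /* step over one delimiter */
    }
    return n;
}
```

With the input a,,b and "," as the delimiter set, the strtok version reports 2 fields and the strcspn version reports 3, the middle one being empty.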
edited Nov 10 at 11:11
answered Nov 10 at 10:52
chqrlie
58.3k745100
Thanks for your helpful answer. I edited the question. I want to read all columns, and each column into separate arrays. I am open to use either specific characters such as a comma (,) or else simply white space, depending on which is 'easier' to implement.
– Rrz0
Nov 10 at 11:00
The answer here uses strtok() for both comma (,) and whitespace separated columns. Why is strtok() not a good choice for the first case you mentioned?
– Rrz0
Nov 10 at 11:07
@Rrz0: I amended the answer to explain why strtok is inappropriate for the comma (,) case. strtok has other issues such as non-reentrancy that make it a poor candidate for most parsing tasks.
– chqrlie
Nov 10 at 11:13
up vote
0
down vote
Depending on the data and daring, you could use scanf or a parser created with yacc/lex.
answered Nov 10 at 9:03
Igor
555614
up vote
0
down vote
If you know the column separator and how many columns you have, you can use getline (or its sibling getdelim, which takes the separator as an argument), first with the column separator and then with the line separator.
Here is getline:
http://man7.org/linux/man-pages/man3/getline.3.html
It is very good because it allocates space for you; there is no need to know how many bytes your column or line takes.
Or you can just use getline as in the code example in the link to read a whole line, and then parse it and extract the columns as you wish.
If you paste exactly how you want to run the program with the input you show, I can try to write a quick C program for a better answer. For now this is just a comment-style answer with too many words for a comment :-(
Or is it that you somehow cannot use the library?
While waiting for a better question, I will note that you can use awk to read columns from a text file, but this is probably not what you want. What are you really trying to do?
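As a rough illustration of the "it allocates space for you" point, here is a sketch that reads every line of a stream with getline and reports the longest one, without ever choosing a buffer size. Keep in mind that getline is POSIX (2008), not ANSI/ISO C, so it does not fit a strict ANSI C requirement. longest_line is a hypothetical name used only for this example.

```c
#define _POSIX_C_SOURCE 200809L  /* expose getline(), which is POSIX, not ISO C */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>           /* ssize_t */

/* Read every line of fp with getline and return the length of the
   longest one, not counting the trailing newline. */
size_t longest_line (FILE *fp)
{
    char *line = NULL;      /* getline allocates and grows this buffer */
    size_t cap = 0;         /* current capacity of line */
    ssize_t len;            /* characters read, or -1 on EOF/error */
    size_t longest = 0;

    assert (fp != NULL);
    while ((len = getline (&line, &cap, fp)) != -1) {
        if (len > 0 && line[len - 1] == '\n')
            len--;                      /* ignore the newline itself */
        if ((size_t) len > longest)
            longest = (size_t) len;
    }
    free (line);            /* a single free covers all the lines read */
    return longest;
}
```

Inside the loop, line holds one complete line however long it is, so that is the place where the columns would be split out with sscanf, strtok or strcspn.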
edited Nov 10 at 9:05
answered Nov 10 at 8:44
Mun Dong
237
@DavidC.Rankin you are most certainly correct, but I thought the question says: "able to read text from a file" and I see a small example at the bottom with text in columns. So we sit and wait for the OP to tell us what it is.... In a binary format I usually find the lengths stored in the file before the content, so then of course reading is not like reading from a free-form text file. But thank you so much for the comment! :-)
– Mun Dong
Nov 10 at 8:49
Thanks for your answer. The file I want to read from is a text file. I don't understand what you mean here: "If you paste exactly how you want to run program with input you show". I want my program to be able to read all columns from the text file into separate arrays, for further use.
– Rrz0
Nov 10 at 8:50
@DavidC.Rankin The other option is to read and check each byte, check the allocated memory, grow the memory and many other things, but that is a lot of code, and who knows what the OP really wants? Maybe it is necessary, maybe not, but it is a lot of code to write in a hurry.
– Mun Dong
Nov 10 at 8:51
@Rrz0 You now say "all columns" but in your question you say "skip columns". So do you want to read all columns into arrays, and of what length? Or should the array know its length? Why not read into strings and maybe just save the positions of the column starts and ends? Is that any good?
– Mun Dong
Nov 10 at 8:52
You are right, will edit question.
– Rrz0
Nov 10 at 8:53
2
"so reading columns separately may not be what fread is intended to do." - correct. Block-storage devices return large buffers of data, it's not worth it to try to read like that from the underlying storage device.
– Dai
Nov 10 at 8:31
1
No, you generally need to read all the data from the file, and then can use the necessary data in your program as required. However, if the number of bytes in the file for each column is known and constant, you can use fseek to jump to a particular location in the file to read some bytes.
– Rishikesh Raje
Nov 10 at 8:36
Thanks for both comments. Very helpful. @RishikeshRaje I would like to design my program to be able to read separate columns, irrespective of the number of bytes. Will edit the question.
– Rrz0
Nov 10 at 8:38
@DavidC.Rankin understood, so the recommended way is to use fgets?
– Rrz0
Nov 10 at 8:45
If the file is binary, then you are pretty much stuck with the struct approach. If it is just text, then yes, fgets then sscanf (or walk a pair of pointers down the line picking out what you need). Note: you can also use fgets then strtok to separate (tokenize) the fields. You can do the same thing with sscanf using the "%n" specifier to determine the number of characters used with each conversion and then offsetting your buffer by that amount for the next conversion.
– David C. Rankin
Nov 10 at 8:46
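The "%n" technique described in the last comment can be sketched as follows, here for a line of whitespace-separated integers stored into an array. read_ints is a hypothetical name, and the all-integer input is an assumption made to keep the example short. "%n" stores the number of characters consumed so far and does not count toward the return value of sscanf.

```c
#include <assert.h>
#include <stdio.h>

/* Convert up to max whitespace-separated integers from line into out.
   Returns how many were stored; stops at the first non-integer field. */
size_t read_ints (const char *line, int *out, size_t max)
{
    size_t n = 0;
    int used = 0;           /* characters consumed by the last conversion */

    assert (line != NULL && out != NULL);
    while (n < max && sscanf (line, "%d%n", &out[n], &used) == 1) {
        line += used;       /* offset the buffer for the next conversion */
        n++;
    }
    return n;
}
```

Called on a buffer filled by fgets, for example read_ints (buf, values, 4), this fills values with the leading integers of the line. Mixed columns work the same way, with one sscanf call and its own "%n" per field.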