Converting CSV to ARFF file in C
ref: <http://slavnik.fe.uni-lj.si/markot/csv2arff/csv2arff.php?do=instructions>
CSV simple format
A CSV file can be created with Excel or any common text editor.
The file extension is ".csv".
figure 1. Creating a CSV file with Excel
An example of the CSV format is shown in figure 2.
Note: the CSV parser accepts either a comma "," or a semicolon ";" as the value delimiter.
figure 2. Example CSV file
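Because either delimiter may appear, counting the columns of a line means counting both characters. The following is a minimal sketch, not the parser referred to above; the function name count_fields is hypothetical, and it assumes simple CSV with no quoted fields that contain a delimiter.

```c
#include <stddef.h>

/* Count the fields in one CSV line, treating ',' and ';' as delimiters.
   Assumption: no quoted fields that contain a delimiter.
   Scanning stops at the end of the string or at the first '\r' or '\n'. */
int count_fields(const char *line)
{
    int count = 1; /* a line without any delimiter still holds one field */
    for (size_t i = 0; line[i] != '\0' && line[i] != '\n' && line[i] != '\r'; i++) {
        if (line[i] == ',' || line[i] == ';') {
            count++;
        }
    }
    return count;
}
```

For a header line such as "a,b;c" the function counts two delimiters and reports three fields.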
ARFF File Format
ARFF stands for
"Attribute-Relation File Format".
It is an input file format used by WEKA (a machine learning tool).
see also: <https://weka.wikispaces.com/ARFF> and <https://weka.wikispaces.com/ARFF+(stable+version)>
figure 3. Example ARFF file
How to convert CSV to ARFF?
A CSV file can be converted to an ARFF file either with online tools or with WEKA itself.
ref: <http://slavnik.fe.uni-lj.si/markot/csv2arff/csv2arff.php?do=instructions>
Can you implement it yourself in C?
The file formats
- The CSV format is simple.
- Do you know the sections of the ARFF format?
- @relation
- @attribute
- @data
The ARFF sections
- @relation <relation-name>
- @attribute <attribute-name> <data-type>
  - data-type: numeric or nominal
- @data
  - consists of the data instances
Step by Step
- Declare the relation name at the top of the file (@relation).
- Count the number of attributes in the CSV file and ask for the type of each one.
- Declare the type of every attribute (@attribute).
- Put the data instances below @data.
@relation <relation-name>
@attribute <attribute-name> <attribute-type>
@attribute <attribute-name> <attribute-type>
@attribute <attribute-name> <attribute-type>
...
@data
... data instances, one record per line, values separated by commas (,) ...
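The layout above can be produced with a few fprintf calls. This is only a sketch under stated assumptions: the function name write_arff_header and its parameters are hypothetical, and each entry of types is written verbatim, so the caller passes either "numeric" or, for a nominal attribute, the list of allowed values in braces such as "{m,f}", which is how the ARFF format declares nominal attributes.

```c
#include <stdio.h>

/* Write the @relation line, one @attribute line per column, and the
   @data marker to an already opened stream.
   names[i] is the attribute name; types[i] is written as-is
   (for example "numeric" or a nominal value list like "{m,f}").
   Returns 0 on success, -1 if any write fails. */
int write_arff_header(FILE *out, const char *relation,
                      const char *names[], const char *types[], int n)
{
    if (fprintf(out, "@relation %s\n\n", relation) < 0) {
        return -1;
    }
    for (int i = 0; i < n; i++) {
        if (fprintf(out, "@attribute %s %s\n", names[i], types[i]) < 0) {
            return -1;
        }
    }
    if (fprintf(out, "\n@data\n") < 0) {
        return -1;
    }
    return 0;
}
```

The data records are then appended below the @data line, one record per line.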
C programming functions
- #include <stdio.h> (file functions) and #include <stdlib.h> (exit)
- FILE *file
- fopen(<file-name>,<open-type>)
  - r : read only
  - w : write; creates a new file or truncates an existing one
  - a : write; appends if the file already exists
- fgetc(<FILE-object>)
- fgets(<char []>,<buffer-size>,<FILE-object>)
- EOF, '\r', '\n'
- fputc('character',<FILE-object>)
- fputs("string",<FILE-object>)
- fprintf(<FILE-object>,<string-format>,[args,...])
- fclose(<FILE-object>)
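As a small illustration of how fopen, fgets and fclose fit together, the sketch below reads the header line of a CSV file. The function name read_first_line is hypothetical; the code assumes the header fits into the caller's buffer and strips a trailing "\n" or "\r\n".

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of the file at `path` into buf (at most size-1
   characters) and cut it at the first '\r' or '\n'.
   Returns 0 on success, -1 if the file cannot be opened or is empty. */
int read_first_line(const char *path, char *buf, int size)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        return -1;
    }
    if (fgets(buf, size, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    buf[strcspn(buf, "\r\n")] = '\0'; /* drop the line ending */
    return 0;
}
```

Checking the return value of fopen before reading avoids passing a NULL stream to the other functions.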
Implementation Step
1. open .csv file 'r'
2. <read> first line >> header
3. count the attributes, then ask whether each column is numeric or nominal
4. <close> .csv file
[Relation / Attribute Section]
5. open .arff file 'w' <create new file>
6. <write> @relation whatever<\n> //fixed name
..newlines..
@attribute [A] [numeric or nominal]<\n>
@attribute [B] numeric<\n>
@attribute ...
@attribute [E] {m1,m2,m3,...}<\n> //nominal: list the allowed values
..newlines..
7. <close> .arff file
Implementation Step (cont.)
[Data Section]
@data
..newline..
8. open .arff file 'a'
9. open .csv file 'r'
10. <read> .csv, skipping the first line
11. <write> each data instance to .arff
in each .csv line, replace comma (,) and semicolon (;)
with space-comma-space (' , ')
12. <close> .arff and .csv files
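Steps 10 and 11 can be sketched as one character-by-character copy. The function name copy_data_rows is hypothetical; the code assumes both streams are already open, that the first line is the header, and that no field is quoted.

```c
#include <stdio.h>

/* Copy every line after the header from `in` to `out`, writing " , "
   in place of each ',' or ';' and dropping any '\r'.
   Assumption: no quoted fields that contain a delimiter. */
void copy_data_rows(FILE *in, FILE *out)
{
    int ch;       /* int, not char, so that EOF can be told apart from data */
    int line = 0; /* 0 while still inside the header line */
    while ((ch = fgetc(in)) != EOF) {
        if (line > 0 && ch != '\r') {
            if (ch == ',' || ch == ';') {
                fputs(" , ", out);
            } else {
                fputc(ch, out);
            }
        }
        if (ch == '\n') {
            line++;
        }
    }
}
```

Storing the result of fgetc in an int matters here: with a plain char, the comparison against EOF can fail or trigger early on some platforms.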
It's done! That's it.
Now try it yourself.
Thank you.
@adaydesign
#include <stdio.h>
#include <stdlib.h>
int main(int argc, const char * argv[]) {
    /*
     concept
     1. open .csv file 'r'
     2. <read> first line >> header
     3. count the attributes, then ask whether each column is numeric or nominal
     4. <close> .csv file
     [Relation / Attribute Section]
     5. open .arff file 'w' <create new file>
     6. <write> @relation whatever<\n> //fixed name
     ..newlines..
     @attribute [A] [numeric or nominal]<\n>
     @attribute [B] numeric<\n>
     @attribute ...
     @attribute [E] {m1,m2,m3,...}<\n>
     ..newlines..
     7. <close>
     [Data Section]
     @data
     ..newline..
     8. open .arff file 'a'
     9. open .csv file 'r'
     10. <read> .csv, skipping the first line
     11. <write> each data instance to .arff
     in each .csv line, change , and ; to ' , '
     12. <close> .arff and .csv files
     */
    FILE *csvfile;
    const char *csvfileName1 = "file_csv_2.csv";
    const char *outarfffileName = "out_arff.arff";
    //1. open the .csv file for reading
    csvfile = fopen(csvfileName1, "r");
    if (csvfile == NULL) {
        fprintf(stderr, "File Not Found!\n");
        exit(EXIT_FAILURE);
    }
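    //2. read the header line
    //first pass: count the attributes
    int ch; // int, not char, so that EOF can be detected
    int numAttrb = 0;
    while ((ch = fgetc(csvfile)) != '\n' && ch != EOF) {
        if (ch == ',' || ch == ';') {
            numAttrb++;
        }
    }
    //number of fields = number of delimiters + 1
    numAttrb++;
    fclose(csvfile);
    //second pass: read the attribute names
    csvfile = fopen(csvfileName1, "r");
    if (csvfile == NULL) {
        fprintf(stderr, "File Not Found!\n");
        exit(EXIT_FAILURE);
    }
    char attrs[numAttrb][50];
    int pNum = 0;
    int pChar = 0;
    while ((ch = fgetc(csvfile)) != '\n' && ch != EOF) {
        if (ch == '\r') {
            continue; // ignore the CR of a Windows line ending
        }
        if (ch != ',' && ch != ';') {
            if (pChar < 49) { // keep room for the terminating '\0'
                attrs[pNum][pChar] = (char)ch;
                pChar++;
            }
        } else {
            attrs[pNum][pChar] = '\0';
            pNum++;
            pChar = 0;
        }
    }
    //terminate the last name
    attrs[pNum][pChar] = '\0';
    fclose(csvfile);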
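    printf("Attributes : ");
    for (int i = 0; i < numAttrb; i++) {
        printf("%s ", attrs[i]);
    }
    //3. ask for the type of each attribute
    printf("\nType of each attribute (%d) -- Numeric[N] or Nominal[!N]?\n", numAttrb);
    char aInput[numAttrb];
    printf("__ is : ");
    for (int i = 0; i < numAttrb; i++) {
        // the leading space skips whitespace, including the newline left by Enter
        if (scanf(" %c", &aInput[i]) != 1) {
            aInput[i] = 'N'; // assumption: default to numeric when input ends early
        }
    }
    printf("\n");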
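    //5. write @relation and @attribute
    FILE *outfile = fopen(outarfffileName, "w");
    if (outfile == NULL) {
        fprintf(stderr, "Cannot create %s\n", outarfffileName);
        exit(EXIT_FAILURE);
    }
    fputs("@relation whatever\n\n", outfile);
    for (int i = 0; i < numAttrb; i++) {
        if (aInput[i] == 'N' || aInput[i] == 'n') {
            fprintf(outfile, "@attribute %s numeric\n", attrs[i]);
        } else {
            // note: ARFF expects the value list here, e.g. {m1,m2,m3}, in place of "nominal"
            fprintf(outfile, "@attribute %s nominal\n", attrs[i]);
        }
    }
    fputs("\n@data\n", outfile);
    fclose(outfile);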
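    //8. write the data rows below @data
    FILE *csvin = fopen(csvfileName1, "r");
    FILE *arffout = fopen(outarfffileName, "a");
    if (csvin == NULL || arffout == NULL) {
        fprintf(stderr, "Cannot open files for the data section\n");
        exit(EXIT_FAILURE);
    }
    int lineIndex = 0;
    int ch1; // int, not char, so that EOF can be detected
    while ((ch1 = fgetc(csvin)) != EOF) {
        if (lineIndex > 0 && ch1 != '\r') { // skip the header line and any CR
            if (ch1 == ',' || ch1 == ';') {
                fputs(" , ", arffout); // ';' is also written as a comma
            } else {
                fputc(ch1, arffout);
            }
        }
        if (ch1 == '\n') {
            lineIndex++;
        }
    }
    fclose(csvin);
    fclose(arffout);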
    //FINISH
    printf("Converting file %s to %s has finished.\n", csvfileName1, outarfffileName);
    printf("open...\n\n");
    arffout = fopen(outarfffileName, "r");
    if (arffout != NULL) {
        char buf[100];
        while (fgets(buf, sizeof buf, arffout) != NULL) {
            printf("%s", buf);
        }
        fclose(arffout);
    }
    printf("\n\nend...");
    return 0;
}
Example
https://github.com/appcodev/cpp_csv2arff
Converting CSV to ARFF file in C Programming
By Chalermchon Samana