Converting CSV to ARFF file in C

ref: <http://slavnik.fe.uni-lj.si/markot/csv2arff/csv2arff.php?do=instructions>

CSV simple format

A CSV file can be created in Excel or in any common text editor.

File extension: ".csv"

figure 1. Creating a CSV file in Excel

An example CSV file for the parser is shown in figure 2.

Note: the csv parser allows the use of comma "," or semicolon ";" to delimit the values.

figure 2. Example CSV file
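
As a rough illustration of the delimiter rule above, the sketch below counts the values in one CSV line by treating both "," and ";" as separators. The helper names is_delimiter and count_fields are hypothetical, and the code assumes there are no quoted fields.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if c is one of the two delimiters
   the parser accepts (comma or semicolon), 0 otherwise. */
static int is_delimiter(int c) {
    return c == ',' || c == ';';
}

/* Counts the values in one CSV line: number of delimiters + 1.
   Assumption: no quoted fields, so a delimiter inside quotes would
   be miscounted. An empty line counts as zero values. */
static int count_fields(const char *line) {
    if (line == NULL || line[0] == '\0' || line[0] == '\n' || line[0] == '\r') {
        return 0;
    }
    int count = 1;
    for (size_t i = 0; line[i] != '\0'; i++) {
        if (is_delimiter((unsigned char)line[i])) {
            count++;
        }
    }
    return count;
}
```

Calling count_fields on the header line gives the number of attributes that the ARFF file will need.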

ARFF File Format

ARFF is short for "Attribute-Relation File Format".

It is an input file format used by WEKA (the machine learning tool).

see also: <https://weka.wikispaces.com/ARFF> and <https://weka.wikispaces.com/ARFF+(stable+version)>

figure 3. Example ARFF file

How to convert CSV to ARFF?

A CSV file can be converted to an ARFF file with online tools or with WEKA itself.

ref: <http://slavnik.fe.uni-lj.si/markot/csv2arff/csv2arff.php?do=instructions>

Can you implement it yourself in C?

The file format

  • The CSV format is simple.
  • Do you know the sections of the ARFF format?
    • @relation
    • @attribute
    • @data

The file format

  • @relation <relation-name>
  • @attribute <attribute-name> <data-type>
    • data-type : numeric or nominal (a nominal type is written as the list of its values, {m1,m2,m3,...})
  • @data
    • consists of the data instances

Step by Step

  1. Declare the relation name at the top of the file. (@relation)
  2. Count the number of attributes in the CSV and ask for the type of each one.
  3. Declare the type of every attribute. (@attribute)
  4. Put the data instances below @data.

@relation <relation-name>

 

@attribute <attribute-name> <attribute-type>

@attribute <attribute-name> <attribute-type>

@attribute <attribute-name> <attribute-type>

...

 

@data

... data instances, one record per line, with values separated by commas (,) ...
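
The template above can be produced with a few fprintf calls. The following is a minimal sketch, not a definitive implementation: write_arff_header is a hypothetical helper, and it assumes the caller already knows every attribute name and type string.

```c
#include <stdio.h>

/* Hypothetical helper: writes the @relation line, one @attribute line
   per name, and the @data marker to out. types[i] is written verbatim,
   e.g. "numeric" or a nominal list such as "{yes,no}".
   Returns 0 on success, -1 if a write fails. */
static int write_arff_header(FILE *out, const char *relation,
                             const char *names[], const char *types[],
                             int count) {
    if (fprintf(out, "@relation %s\n\n", relation) < 0) {
        return -1;
    }
    for (int i = 0; i < count; i++) {
        if (fprintf(out, "@attribute %s %s\n", names[i], types[i]) < 0) {
            return -1;
        }
    }
    if (fputs("\n@data\n", out) == EOF) {
        return -1;
    }
    return 0;
}
```

After this call the file is ready for the data records, which go on the lines that follow @data.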

C programming functions

  • #include <stdio.h>
  • FILE *file
  • fopen(<file-name>,<open-type>)
    • r : read only
    • w : write + create new file
    • a : write + append if the file already exists
  • fgetc(<FILE-object>) 
  • fgets(<char []>,<buffer-size>,<FILE-object>)
  • EOF, '\r', '\n'
  • fputc('character',<FILE-object>)
  • fputs("string",<FILE-object>)
  • fprintf(<FILE-object>,<string-format>,[args,...])
  • fclose(<FILE-object>)
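
To see how the functions listed above fit together, here is a small sketch that copies a text file line by line. The names copy_lines and copy_file are made up for this example, and the 256-byte buffer is an arbitrary choice.

```c
#include <stdio.h>
#include <string.h>

/* Copies in to out with fgets/fputs and returns the number of lines.
   A line longer than the buffer arrives in several chunks, so a line
   is counted only when its first chunk is read. */
static long copy_lines(FILE *in, FILE *out) {
    char buf[256];
    long lines = 0;
    int at_line_start = 1;
    while (fgets(buf, sizeof buf, in) != NULL) {
        fputs(buf, out);
        if (at_line_start) {
            lines++;
        }
        size_t len = strlen(buf);
        at_line_start = (len > 0 && buf[len - 1] == '\n');
    }
    return lines;
}

/* Opens src with "r" and dst with "w", copies the content, and closes
   both files. Returns the line count, or -1 if a file cannot be opened. */
static long copy_file(const char *src, const char *dst) {
    FILE *in = fopen(src, "r");
    if (in == NULL) {
        return -1;
    }
    FILE *out = fopen(dst, "w");
    if (out == NULL) {
        fclose(in);
        return -1;
    }
    long lines = copy_lines(in, out);
    fclose(in);
    fclose(out);
    return lines;
}
```

Checking the result of fopen against NULL before any read or write is the important habit here, because every other call in the list needs a valid FILE object.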

Implementation Step

1. open .csv file 'r'

2. <read> first line >> header

3. ask whether each column's attribute type is numeric or nominal

    count the number of attributes

4. <close file> .csv file

[Relation / Attribute Section]

5. open .arff file 'w' <create new file>

6. <write>@relation whatever<\n>    //fix

        ..newlines..

        @attribute [A] [numeric or nominal]<\n>

        @attribute [B] numeric<\n>

        @attribute ...

        @attribute [E] {m1,m2,m3,...}<\n>    //nominal

        ..newlines..

7. <close file>
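
A nominal attribute is declared by listing its values, so the converter needs the distinct values of that column. The sketch below shows one possible way to collect and format them. NominalSet, nominal_add, nominal_format, and both size limits are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

#define MAX_VALUES 32      /* assumed upper bound on distinct values */
#define MAX_VALUE_LEN 50   /* assumed upper bound on one value's length */

typedef struct {
    char values[MAX_VALUES][MAX_VALUE_LEN];
    int count;
} NominalSet;

/* Adds value to the set unless it is already there. Returns 0 on
   success (duplicates included), -1 if the value is too long or the
   set is full. */
static int nominal_add(NominalSet *set, const char *value) {
    if (strlen(value) >= MAX_VALUE_LEN) {
        return -1;
    }
    for (int i = 0; i < set->count; i++) {
        if (strcmp(set->values[i], value) == 0) {
            return 0;
        }
    }
    if (set->count >= MAX_VALUES) {
        return -1;
    }
    strcpy(set->values[set->count++], value);
    return 0;
}

/* Writes the set as "{v1,v2,...}" into out. Returns 0 on success,
   -1 if out is too small. */
static int nominal_format(const NominalSet *set, char *out, size_t out_size) {
    size_t used = 0;
    if (out_size < 3) {
        return -1;              /* need room for "{}" and '\0' */
    }
    out[used++] = '{';
    for (int i = 0; i < set->count; i++) {
        size_t len = strlen(set->values[i]);
        size_t need = len + (i > 0 ? 1 : 0);
        if (used + need + 2 > out_size) {
            return -1;          /* +2 keeps room for '}' and '\0' */
        }
        if (i > 0) {
            out[used++] = ',';
        }
        memcpy(out + used, set->values[i], len);
        used += len;
    }
    out[used++] = '}';
    out[used] = '\0';
    return 0;
}
```

The formatted string can be used directly as the type in an @attribute line. Values that contain spaces would need quoting, which this sketch does not handle.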

Implementation Step (cont.)

[Data Section]

        @data

        ..newline..

8. open .arff file 'a'

9. open .csv file 'r'

10. <read> .csv skip first line

11. <write> .arff for each data instance

        for each line of data from the .csv, change every

        comma (,) and semicolon (;) to space-comma-space (' , ')

12. <close> .arff and .csv file
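
Step 11 can also be written as a function that works on one record at a time. This is only a sketch: convert_record is a hypothetical helper, it assumes no quoted fields, and it treats every comma as a delimiter, so decimal commas are not supported.

```c
#include <stddef.h>

/* Hypothetical helper: copies one CSV record from in to out, replacing
   each ',' or ';' with " , " and dropping '\r' so that Windows line
   endings do not leak into the ARFF file.
   Returns 0 on success, -1 if out is too small. */
static int convert_record(const char *in, char *out, size_t out_size) {
    size_t o = 0;
    for (size_t i = 0; in[i] != '\0'; i++) {
        char c = in[i];
        if (c == '\r') {
            continue;
        }
        if (c == ',' || c == ';') {
            if (o + 3 >= out_size) {
                return -1;      /* no room for " , " plus '\0' */
            }
            out[o++] = ' ';
            out[o++] = ',';
            out[o++] = ' ';
        } else {
            if (o + 1 >= out_size) {
                return -1;      /* no room for c plus '\0' */
            }
            out[o++] = c;
        }
    }
    if (out_size == 0) {
        return -1;
    }
    out[o] = '\0';
    return 0;
}
```

Each converted record can then be written to the .arff file with fputs.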

It's done! That's it.

Now try it yourself.

Thank you.

@adaydesign

#include <stdio.h>
#include <stdlib.h>

int main(int argc, const char * argv[]) {
    /*
     concept
     1. open .csv file 'r'
     2. <read> first line >> header
     3. ask for attribute type each column is numeric or nominal
        count attributes number
     4. <close> .csv file
     
     [Relation / Attribute Section]
     5. open .arff file 'w' <create new file>
     6. <write>@relation whatever<\n>    //fix
        ..newlines..
        @attribute [A] [numeric or nominal]<\n>
        @attribute [B] numeric<\n>
        @attribute ...
        @attribute [E] {m1,m2,m3,...}<\n>    //nominal
        ..newlines..
     7. <close>
     
     [Data Section]
        @data
        ..newline..
     8. open .arff file 'a'
     9. open .csv file 'r'
     10. <read> .csv skip first line
     11. <write> .arff for each data
        data from .csv change , and ; to ' , '
     12. <close> .arff and .csv file
     */
    
    FILE *csvfile;
    const char *csvfileName1 = "file_csv_2.csv";
    const char *outarfffileName = "out_arff.arff";
    //1.
    csvfile = fopen(csvfileName1, "r");
    if (csvfile == NULL) {
        printf("File Not Found!\n");
        exit(EXIT_FAILURE);
    }
    //2.
    //open : count attribute
    int ch; // int, not char, so that EOF can be told apart from data
    int numAttrb = 0;
    while ((ch = fgetc(csvfile)) != '\n' && ch != EOF) {
        if (ch==',' || ch==';') {
            numAttrb++;
        }
    }
    //end +1
    numAttrb++;
    fclose(csvfile);
    
    //open : type attribute
    csvfile = fopen(csvfileName1, "r");
    if (csvfile == NULL) {
        printf("File Not Found!\n");
        exit(EXIT_FAILURE);
    }
    char attrs[numAttrb][50];
    
    int pNum = 0;
    int pChar = 0;
    while ((ch=fgetc(csvfile)) != '\n' && ch != EOF) {
        if (ch == '\r') {
            continue; // ignore the CR of a Windows line ending
        }
        if (ch!=',' && ch!=';') {
            if (pChar < 49) { // keep room for the final '\0'
                attrs[pNum][pChar] = (char)ch;
                pChar++;
            }
        }else{
            attrs[pNum][pChar] = '\0';
            pNum++;
            pChar = 0;
        }
    }
    //end of array + \0
    attrs[pNum][pChar] = '\0';
    fclose(csvfile);
    
    printf("Attributes : ");
    for (int i=0; i<numAttrb; i++) {
        printf("%s ",attrs[i]);
    }
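    //3. ASK Attribute type ...
    printf("\nType of each attribute (%d) -- Numeric[N] or Nominal[!N]?\n",numAttrb);
    char aInput[numAttrb];
    printf("__ is : ");
    for (int i=0; i<numAttrb; i++) {
        // the leading space skips blanks and newlines between answers
        if (scanf(" %c",&aInput[i]) != 1) {
            aInput[i] = 'N'; // assumption: treat missing input as numeric
        }
    }
    printf("\n");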
    
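    //5. print @relation and @attribute
    FILE *outfile = fopen(outarfffileName, "w");
    if (outfile == NULL) {
        printf("Cannot create %s\n", outarfffileName);
        exit(EXIT_FAILURE);
    }
    fputs("@relation whatever\n\n", outfile);
    
    for (int i=0; i<numAttrb; i++) {
        if (aInput[i]=='N'||aInput[i]=='n') {
            fprintf(outfile,"@attribute %s numeric\n",attrs[i]);
        }else{
            // NOTE: ARFF declares a nominal attribute by listing its values,
            // e.g. {m1,m2,m3}; the bare word "nominal" is only a placeholder
            fprintf(outfile,"@attribute %s nominal\n",attrs[i]);
        }
    }
    fputs("\n@data\n", outfile);
    fclose(outfile);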
    
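    //8. print @data
    FILE *csvin = fopen(csvfileName1, "r");
    FILE *arffout = fopen(outarfffileName, "a");
    if (csvin == NULL || arffout == NULL) {
        printf("Cannot open the files for the data section\n");
        exit(EXIT_FAILURE);
    }
    int lineIndex = 0;
    int ch1; // int so that EOF is detected reliably
    while ((ch1 = fgetc(csvin)) != EOF) {
        // skip the header line and any CR from Windows line endings
        if (lineIndex > 0 && ch1 != '\r') {
            if (ch1 == ',' || ch1==';') {
                fputs(" , ", arffout); // ARFF separates values with commas only
            }else{
                fputc(ch1, arffout);
            }
        }
        
        if (ch1=='\n') {
            lineIndex++;
        }
    }
    
    fclose(csvin);
    fclose(arffout);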
    
    
    //FINISH
    printf("Converting file %s to %s has finished.\n",csvfileName1,outarfffileName);
    printf("open...\n\n");
    
    //show the result, then close the file again
    arffout = fopen(outarfffileName, "r");
    if (arffout != NULL) {
        char buf[100];
        while (fgets(buf, sizeof buf, arffout) != NULL) {
            printf("%s",buf);
        }
        fclose(arffout);
    }
    
    printf("\n\nend...\n");
    return 0;
}

Example

https://github.com/appcodev/cpp_csv2arff

Converting CSV to ARFF file in C Programming

By Chalermchon Samana
