
Process DSV to CSV File using C Assignment Solution


Programming in C for Building a Simple ETL program

Captain America has intercepted a secret Hydra database and wants to select the columns he thinks hold the most vital information. The database is too large for him to do this manually, so the Captain needs a C program that makes the job easier.

A common format for moving data between systems is the DSV file. A delimiter-separated values (DSV) file uses a delimiter (such as ',', '#', ' ', or '\t') to separate values on a line of data. Each line of the file, or "data record", is terminated by a newline ('\n'). Each record consists of one or more data fields (columns) separated by the delimiter. Fields are numbered from left to right starting with column 1. A DSV file stores tabular data (numbers and text) in plain text (ASCII strings). In a proper DSV file, every line has the same number of fields (columns).

[Figure: CSV 1]

In the example above we have a sample of a Call Detail Record (CDR) file in DSV format with a comma as the delimiter. A CDR file describes cell phone voice calls from one phone to another. Each record has 10 fields (columns). The first record of the file holds the label for each column. Any column can be empty. The last column is never followed by a ','; instead, every record ends with a '\n'.

An empty Call Detail Record in the above format would consist of nine (9) commas followed by a newline:

,,,,,,,,,
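The relationship between delimiters and fields (N fields means N-1 delimiters) can be checked with a minimal sketch. The helper name count_fields is hypothetical and not part of the assignment; it ignores escaped delimiters, which are covered later.

```c
#include <stddef.h>

/* Count the fields in one DSV record: one more than the number of
 * delimiters found before the terminating newline (or end of string).
 * count_fields is a hypothetical helper used only for illustration. */
size_t count_fields(const char *record, char delimiter)
{
    size_t fields = 1;
    for (const char *p = record; *p != '\0' && *p != '\n'; p++) {
        if (*p == delimiter) {
            fields++;
        }
    }
    return fields;
}
```

Under these assumptions, a record made of nine commas and a newline counts as 10 fields, all of them empty.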

For the purposes of this assignment, assume the following of the input data:

  1. The DSV file is ASCII text-based and can be edited by text editors.
  2. Every line (including the last line) ends with a '\n'.
  3. All records have the same number of data fields.
  4. A data field can be empty only when there is more than 1 field in each record.
  5. The default delimiter (used when the option -d is not provided) is a comma (',').
  6. The delimiter can be any printable ASCII character except the '\n' (newline), '\' (backslash), and '\0' (NUL) characters.
  7. No empty record will be provided. No extra options will be provided as input.
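Assumption 6 can be expressed as a small check. This is a sketch, not the assignment's required code: the helper name is_valid_delimiter is hypothetical, and accepting tab is an assumption made because the text above lists '\t' as a possible delimiter even though tab is not a printable character.

```c
#include <stdbool.h>

/* is_valid_delimiter is a hypothetical helper. It accepts printable ASCII
 * (0x20..0x7E) except backslash; newline and NUL already fall outside that
 * range. Accepting '\t' is an assumption, see the note above the code. */
bool is_valid_delimiter(char c)
{
    if (c == '\\') {
        return false;
    }
    if (c == '\t') {
        return true;
    }
    return c >= 0x20 && c <= 0x7e;
}
```

A program could call such a check on the argument that follows -d and report an error before reading any input.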

For example, a 3-field DSV record can take any of the following forms:

  • 1#2#3 (delimiter is '#')
  • I am a string_another string_5 (delimiter is '_')
  • @@4 (delimiter is '@')
  • (delimiter is ',')

Spaces (or lack of spaces) in a field are to be preserved in this assignment.

In this assignment, you will write a program that reads DSV data from standard input and writes a modified DSV file to standard output. In the description, record columns (fields) are numbered from 1, the leftmost column, to N, the last column. (N is also the number of columns in a single record.)

Sample Examples:

Example #1

Given an input file with 4 columns containing the following 3 records of data:

10,20,30,40

a,b,c,d

this is input,more input,3,last input

Calling the program as:

./cnvtr -c 4 4 3 2 1 < input_file > output_file

This tells the program to read an input DSV file in which each record has 4 columns and the delimiter is a comma. The output specification is to write a DSV file in which each output record holds columns 4,3,2,1 of the corresponding input record, in that order, with a comma as the delimiter.

The input columns are in the order 1,2,3,4. Column numbers are 1-based, while an array that stores the fields is 0-based. For example, in the record 10,20,30,40, the value 10 (column 1) is entry 0 in the array.

The output will contain three records that look like:

~~Hydra Secret~~

40,30,20,10

d,c,b,a

last input,3,more input,this is input

~~Hydra Secret~~

You do not need to parse "<" and ">" from the command line in your program. These are redirection operators and are parsed by the shell before the arguments even arrive in your program.
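The column reordering in Example #1 can be sketched as follows. The function format_record and its parameters are hypothetical names chosen for illustration. It assumes the fields of one record have already been split into a 0-based array and that the requested column numbers are 1-based, as on the command line.

```c
#include <stddef.h>
#include <string.h>

/* Join the selected fields into out, separated by delimiter.
 * fields[] is 0-based storage; order[] holds 1-based column numbers.
 * Returns 0 on success, -1 if a column number is out of range or out is
 * too small. format_record is a hypothetical helper. */
int format_record(char *out, size_t out_size, const char *const fields[],
                  int num_fields, const int order[], int num_order,
                  char delimiter)
{
    size_t used = 0;
    for (int i = 0; i < num_order; i++) {
        if (order[i] < 1 || order[i] > num_fields) {
            return -1;
        }
        const char *field = fields[order[i] - 1]; /* column k is entry k-1 */
        size_t len = strlen(field);
        size_t need = len + (i > 0 ? 1 : 0);
        if (used + need + 1 > out_size) {         /* +1 for the final NUL */
            return -1;
        }
        if (i > 0) {
            out[used++] = delimiter;
        }
        memcpy(out + used, field, len);
        used += len;
    }
    if (used + 1 > out_size) {
        return -1;
    }
    out[used] = '\0';
    return 0;
}
```

With the fields 10, 20, 30, 40 and the order 4, 3, 2, 1, this sketch produces 40,30,20,10, matching the first output record of Example #1.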

Example #2

Calling cnvtr with the same input, but as:

./cnvtr -c 4 -d "," 3 < input_file > output_file

This tells the program to read a DSV file with 4 columns and write only column 3 of the input file to the output DSV file.

The output will contain three records that look like:

~~Hydra Secret~~

30

c

3

~~Hydra Secret~~

Example #3

Some DSV files allow a field to contain the delimiter itself. To support this, a delimiter inside a field is escaped with a preceding '\' (backslash). (Refer to Part 2 of the assignment for more details on this case.)

An input DSV file whose records have four fields could look like:

1#2#test\#string#4

When used as

./cnvtr -c 4 -d "#" 3 4

Output is

~~Hydra Secret~~

test\#string,4

~~Hydra Secret~~
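The escaped-delimiter handling in Example #3 can be sketched as an in-place split. This is one possible approach under stated assumptions, not necessarily what Part 2 of the assignment requires: the name split_escaped is hypothetical, a backslash escapes the single character that follows it, and the backslash is kept in the field, as in the example output.

```c
#include <stddef.h>

/* Split record in place on unescaped delimiters, storing up to max_fields
 * pointers in fields[]. A character preceded by a backslash is never
 * treated as a delimiter, and the backslash stays in the field. The
 * trailing newline is removed. Returns the number of fields stored.
 * split_escaped is a hypothetical helper. */
size_t split_escaped(char *record, char delimiter, char *fields[],
                     size_t max_fields)
{
    size_t count = 0;
    if (max_fields == 0) {
        return 0;
    }
    fields[count++] = record;
    for (char *p = record; *p != '\0'; p++) {
        if (*p == '\\' && p[1] != '\0' && p[1] != '\n') {
            p++;                    /* skip the escaped character */
        } else if (*p == '\n') {
            *p = '\0';              /* end of record */
            break;
        } else if (*p == delimiter && count < max_fields) {
            *p = '\0';              /* terminate the current field */
            fields[count++] = p + 1;
        }
    }
    return count;
}
```

Applied to the record from Example #3 with '#' as the delimiter, the sketch yields four fields, the third being test\#string.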

Solution:

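#define _POSIX_C_SOURCE 200809L /* exposes getline() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LENGTH 256
#define MAX_TOKENS 256

int main(int argc, char *argv[]) {
  int i, argNum = 0, columns = -1;
  size_t bufSize = MAX_LENGTH;
  char delimiter = ',', prev;
  int *oBuf = malloc(MAX_TOKENS * sizeof(int)); /* output column order */

  /* Part of this section was lost from the source text; the option
     parsing is reconstructed from the assignment description. */
  for (i = 1; i < argc; i++) {
    if (strcmp(argv[i], "-c") == 0 && i + 1 < argc) {
      columns = atoi(argv[++i]);
    } else if (strcmp(argv[i], "-d") == 0 && i + 1 < argc) {
      delimiter = argv[++i][0];
    } else if (argNum < MAX_TOKENS) {
      oBuf[argNum++] = atoi(argv[i]);
    }
  }
  for (i = 0; i < argNum; i++) {
    if (oBuf[i] < 1 || oBuf[i] > columns || oBuf[i] > MAX_TOKENS) {
      printf("ERROR: Illegal column index\n");
      exit(EXIT_FAILURE);
    }
  }

  char **buf = malloc((MAX_TOKENS + 1) * sizeof(char *));
  printf("~~Hydra Secret~~\n");
  buf[0] = malloc(MAX_LENGTH); /* line buffer; getline() may grow it */
  while (getline(&buf[0], &bufSize, stdin) > 0) {
    int fieldNum = 1, len = strlen(buf[0]);
    buf[fieldNum] = buf[0]; /* buf[k] points at field k (1-based) */
    /* Reconstructed as well: cut the line at unescaped delimiters. */
    for (i = 0, prev = 0; i < len; prev = buf[0][i], i++) {
      if (buf[0][i] == '\n') {
        buf[0][i] = '\0';
      } else if (buf[0][i] == delimiter && prev != '\\' &&
                 fieldNum < MAX_TOKENS) {
        buf[0][i] = '\0';
        buf[++fieldNum] = buf[0] + i + 1;
      }
    }
    while (fieldNum < MAX_TOKENS) /* missing fields read as empty */
      buf[++fieldNum] = buf[0] + len;
    for (i = 0; i < argNum; i++) {
      if (i > 0) {
        printf("%c", delimiter);
      }
      printf("%s", buf[oBuf[i]]);
    }
    printf("\n");
  }
  printf("~~Hydra Secret~~\n");
  free(buf[0]);
  free(oBuf);
  free(buf);
  return 0;
}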