CSV with Multiple Headers

Feb 10, 2013 at 12:50 AM
I am new to VB.NET programming but have successfully used KBCsv to read in a CSV file. I was wondering if it could also be used to read a CSV file with multiple headers. The first item (column) of a header line starts with a ":" to mark it as a header.

So it would need to read a line and check whether it's a header. If it is, it sets the header and continues; if the next line is not a header, that line is treated as data.

Here is a sample file

:Template,ParentTemplate
$SampleUserDefined,$UserDefined
:FADisc,Template,AccessMode,Category
StopC,$Motor_Fv,InputOutput,ObjectWriteable

Thanks
Coordinator
Feb 11, 2013 at 9:11 AM
Hi,

You're essentially asking whether KBCsv can parse non-CSV data, which naturally it cannot. However, there's no reason why you cannot provide your own wrapper around your data that uses KBCsv internally to do the heavy lifting. Here's some pseudo code showing one way to achieve this (C#ish, but should be easy to translate to VB.NET):
public class YourDataReader
{
    private readonly TextReader textReader;
    private CsvReader csvReader;
    
    public YourDataReader(TextReader textReader)
    {
        this.textReader = textReader;
    }
    
    public bool HasMoreRecords
    {
        // NOTE: using Peek means it's not as robust as it could be (see my comments below)
        get { return this.textReader.Peek() != -1; }
    }
    
    private bool IsHeaderRecord
    {
        // NOTE: using Peek means it's not as robust as it could be (see my comments below)
        get { return (char)this.textReader.Peek() == ':'; }
    }
    
    public DataRecord ReadDataRecord()
    {
        if (!HasMoreRecords)
        {
            return null;
        }
        
        if (IsHeaderRecord)
        {
            if (this.csvReader != null)
            {
                // dispose of previous reader
                this.csvReader.Dispose();
            }
            
            // false to ensure disposing CsvReader does not dispose TextReader
            this.csvReader = new CsvReader(textReader, false);
            
            // read the header record
            this.csvReader.ReadHeaderRecord();
        }
        
        return this.csvReader.ReadDataRecord();
    }
}
It's not perfectly robust and peeking isn't good with an underlying NetworkStream, for example, because it may return -1 when there is actually more data to come. You could make it more robust with error checking and by removing your dependency on Peek(), but you get the idea.

Best,
Kent
Feb 15, 2013 at 1:20 AM
Edited Feb 15, 2013 at 3:25 PM
Here is my code in VB.

As soon as the line
HeaderRecord = csvReader.ReadHeaderRecord() is executed, the Peek value changes to -1, so it will not loop.

Is there any other way to loop through the file without using Peek and without reading the line and losing the first header?
        Dim objReader As New System.IO.StreamReader(Filename)
        Do While objReader.Peek() <> -1
            If ChrW(objReader.Peek()) = ":" Then

                ' dispose of previous reader
                If csvReader IsNot Nothing Then
                    csvReader.Dispose()
                End If
                Call StatusLog(objReader.Peek().ToString)
                csvReader = New Kent.Boogaart.KBCsv.CsvReader(objReader, False)

                'Set the Header
                HeaderRecord = csvReader.ReadHeaderRecord()

                HeaderRecordItem0 = HeaderRecord.Item(0)
                Call StatusLog(objReader.Peek().ToString)
                Call StatusLog(HeaderRecordItem0)
                objReader.ReadLine()

            Else
                'read Record
                dataRecord = csvReader.ReadDataRecord()
                Call StatusLog(dataRecord.Item(0))
                objReader.ReadLine()
            End If
        Loop
Thanks
Coordinator
Feb 15, 2013 at 5:19 PM
Hi ElroyJ,

My bad. I forgot that KBCsv obviously buffers data, so it will read your entire data set in (because it's less than 4K). The trick is to make sure KBCsv only sees the data in each section. Here's some code in C# that achieves this. I tested it against your data and it works. It could definitely be tidied up and made more robust, however, especially around encodings and error checking. This should more than get you started though:
    class Program
    {
        static void Main(string[] args)
        {
            var sampleData = @":Template,ParentTemplate
$SampleUserDefined,$UserDefined
:FADisc,Template,AccessMode,Category
StopC,$Motor_Fv,InputOutput,ObjectWriteable";

            using (var dataStream = new MemoryStream(Encoding.Default.GetBytes(sampleData)))
            {
                var customDataReader = new CustomDataReader(dataStream);

                while (customDataReader.HasMoreRecords)
                {
                    var record = customDataReader.ReadDataRecord();
                    Console.WriteLine(record);
                }
            }

            Console.WriteLine();
            Console.WriteLine("DONE - ANY KEY TO EXIT");
            Console.ReadKey();
        }
    }

    public class CustomDataReader
    {
        private readonly Stream customDataStream;
        private CsvReader csvReader;
        private CustomTextReader customTextReader;

        public CustomDataReader(Stream customDataStream)
        {
            this.customDataStream = customDataStream;
            this.InitializeCsvReader(true);
        }

        public bool HasMoreRecords
        {
            get
            {
                if (!this.csvReader.HasMoreRecords)
                {
                    this.InitializeCsvReader(false);
                }

                return this.csvReader.HasMoreRecords;
            }
        }

        public DataRecord ReadDataRecord()
        {
            if (!this.csvReader.HasMoreRecords)
            {
                this.InitializeCsvReader(false);
            }

            return this.csvReader.ReadDataRecord();
        }

        private void InitializeCsvReader(bool skipHeaderRecordIndicator)
        {
            if (skipHeaderRecordIndicator)
            {
                // skip the ':'
                for (var i = 0; i < Encoding.Default.GetByteCount(":"); ++i)
                {
                    this.customDataStream.ReadByte();
                }
            }

            this.customTextReader = new CustomTextReader(this.customDataStream);
            this.csvReader = new CsvReader(this.customTextReader);
            this.csvReader.ReadHeaderRecord();
        }

        private sealed class CustomTextReader : StreamReader
        {
            private bool done;

            public CustomTextReader(Stream stream)
                : base(stream)
            {
            }

            public override int Peek()
            {
                throw new NotImplementedException();
            }

            public override int Read()
            {
                throw new NotImplementedException();
            }

            public override int Read(char[] buffer, int index, int count)
            {
                if (this.done)
                {
                    // TextReader.Read(char[], int, int) signals end of input with 0, not -1
                    return 0;
                }

                for (var i = 0; i < count; ++i)
                {
                    var r = this.BaseStream.ReadByte();

                    if (r == -1 || (char)r == ':')
                    {
                        this.done = true;
                        return i;
                    }

                    buffer[i + index] = (char)r;
                }

                return count;
            }
        }
    }
Best,
Kent
Feb 18, 2013 at 11:40 PM
Thanks for helping.

I got your code converted to VB and working.

I love your library. It really made things easier.
Feb 19, 2013 at 7:10 PM
One more question that I hate to ask.

I need it to detect the ":" only at the beginning of a line, so that a ":" inside my data is not mistaken for a new header.

Is there any way to modify this easily?

I tried adding Linefeed or CrLf & ":" but I can't seem to get that to work.



Thanks again
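
One possible way to limit header detection to the start of a line is to split the input line by line before KBCsv sees it, rather than scanning bytes for ":". The sketch below is only an illustration and is not part of KBCsv: SectionSplitter and SplitSections are hypothetical names, and it assumes that no quoted CSV field contains a line break immediately followed by ":".

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, not part of KBCsv.
public static class SectionSplitter
{
    // Splits the input into sections. A new section starts at every line whose
    // first character is ':'; a ':' anywhere else in a line is ordinary data.
    // The leading ':' is removed so each section is plain CSV with a header row.
    public static IEnumerable<string> SplitSections(TextReader reader)
    {
        StringBuilder current = null;
        string line;

        while ((line = reader.ReadLine()) != null)
        {
            if (line.StartsWith(":", StringComparison.Ordinal))
            {
                // A header line closes the previous section, if there is one.
                if (current != null)
                {
                    yield return current.ToString();
                }

                current = new StringBuilder();
                line = line.Substring(1);
            }
            else if (current == null)
            {
                // Data before the first header line is skipped here (an
                // assumption; throwing an exception may suit your data better).
                continue;
            }

            // Line endings are normalized to '\n' when the section is rebuilt.
            current.Append(line).Append('\n');
        }

        if (current != null)
        {
            yield return current.ToString();
        }
    }
}
```

Each returned section could then be wrapped in a StringReader and handed to a CsvReader, followed by ReadHeaderRecord() and ReadDataRecord() as in the code earlier in this thread. Because every section gets its own reader, KBCsv's buffering never reaches into the next section.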