One thing most parsers don't handle correctly, that I've seen, is double
choke on this sort of thing.
Hector Santos wrote:
Goran,
Many times, even with 3rd party libraries, you still have to learn how to
use them. Many times, the attempt to generalize does not cover all bases.
What if there is a bug? Many times with CSV, it might require upfront
field definitions, or it's all viewed as strings. So the "easiest" does not
always mean using a 3rd party solution.
Of course the devil is in the details, and it helps when the OP provides
info, like what language and platform. If he said .NET, as I mentioned,
the MS .NET collection library has a pretty darn good reader class with
the benefit of supporting OOP as well, which allows you to create a data
"class" that you pass to the line reader.
Guess what? There is still a learning curve here to understand the
interface and to use it right, as there would be with any library.
So the easiest? For me, it all depends. A simple text reader and
strtok() parser, with the escaping issues worked in, can be both very easy
and super fast, with no dependency on 3rd party QA issues.
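
To make the "simple parser plus escaping" idea concrete, here is a minimal sketch of a hand-rolled field splitter. It is not strtok() itself, since a plain delimiter tokenizer knows nothing about quotes. The helper name SplitCsvLine and the sample line are made up for illustration, and it assumes the common convention that a doubled quote ("") inside a quoted field stands for one literal quote.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Module CsvSplitSketch
    ' Hypothetical helper: split one CSV line into fields, honoring
    ' double-quoted fields and doubled quotes ("") inside them.
    Function SplitCsvLine(ByVal line As String) As List(Of String)
        Dim fields As New List(Of String)
        Dim cur As New StringBuilder()
        Dim inQuotes As Boolean = False
        Dim i As Integer = 0
        While i < line.Length
            Dim c As Char = line(i)
            If inQuotes Then
                If c = """"c Then
                    If i + 1 < line.Length AndAlso line(i + 1) = """"c Then
                        cur.Append(""""c)   ' doubled quote: one literal quote
                        i += 1
                    Else
                        inQuotes = False    ' closing quote of the field
                    End If
                Else
                    cur.Append(c)           ' commas inside quotes are data
                End If
            ElseIf c = """"c Then
                inQuotes = True             ' opening quote of a field
            ElseIf c = ","c Then
                fields.Add(cur.ToString())  ' delimiter ends the field
                cur.Length = 0
            Else
                cur.Append(c)
            End If
            i += 1
        End While
        fields.Add(cur.ToString())          ' last field has no trailing comma
        Return fields
    End Function

    Sub Main()
        ' Sample line (made up): 1,"hector, santos","say ""hi""",4
        For Each f As String In SplitCsvLine("1,""hector, santos"",""say """"hi"""""",4")
            Console.WriteLine("[{0}]", f)
        Next
    End Sub
End Module
```

This only handles a single line. A quoted field that contains a line break still needs the caller to glue physical lines together first, which is exactly the kind of detail that bites later.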
For me, I have never come across a library or class that could handle
everything, and if it did, it required a data definition interface of some
sort, like the .NET collection class offers. If he is using .NET, then I
recommend using this class as the "easiest."
Case in point.
Even with the excellent .NET text I/O class and a CSV reader wrapper, it
only offers a generalized method to parse fields. This still requires
proper setup for the conditions that might occur. It might require specific
additional logic to handle situations it does not cover, like when
fields span across multiple lines. For example:
1,2,3,4,5,"hector
, santos",6
7,8
9,10
That might be 1 data record with 11 fields.
However, even if the library allows you to do this, in my opinion, only an
experienced implementer knows what to look for and how to use the library
to properly address it.
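
As a rough illustration of what "knowing what to look for" means here, the sketch below glues physical lines back into logical lines when a quoted field is left open. The helper names CountQuotes and JoinLogicalLines are hypothetical, and the approach assumes quotes only appear as field delimiters or as doubled quotes, so an odd quote count means the field continues on the next line.

```vbnet
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

Module LogicalLineSketch
    ' Hypothetical helper: count the double-quote characters in a string.
    Function CountQuotes(ByVal s As String) As Integer
        Dim n As Integer = 0
        For Each c As Char In s
            If c = """"c Then n += 1
        Next
        Return n
    End Function

    ' Merge physical lines into logical lines. An odd number of quotes
    ' means a quoted field is still open, so keep appending lines
    ' until the count becomes even.
    Function JoinLogicalLines(ByVal physical As IEnumerable(Of String)) As List(Of String)
        Dim result As New List(Of String)
        Dim pending As String = Nothing
        For Each line As String In physical
            If pending Is Nothing Then
                pending = line
            Else
                pending = pending & vbLf & line
            End If
            If CountQuotes(pending) Mod 2 = 0 Then
                result.Add(pending)
                pending = Nothing
            End If
        Next
        ' Unterminated quote at end of input: keep what was collected.
        If pending IsNot Nothing Then result.Add(pending)
        Return result
    End Function

    Sub Main()
        ' The four physical lines from the example above.
        Dim lines As String() = {"1,2,3,4,5,""hector", ", santos"",6", "7,8", "9,10"}
        For Each l As String In JoinLogicalLines(lines)
            ' Show the embedded line feed as a visible marker.
            Console.WriteLine("[{0}]", l.Replace(vbLf, "<LF>"))
        Next
    End Sub
End Module
```

This only solves the quoted line break. Deciding that the following short lines belong to the same data record still needs a known field count per record, which no generic reader can guess for you.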
Here is a VB.NET test program I wrote a few years back for a VERY long
thread regarding this topic and how to handle the situation for a fella
that had this need of fields spanning across multiple rows.
------------- CUT HERE -------------------
'--------------------------------------------------------------
' File : D:\Local\wcsdk\wcserver\dotnet\Sandbox\readcsf4.vb
' About:
'--------------------------------------------------------------
Option Strict Off
Option Explicit On
Imports System
Imports System.Diagnostics
Imports System.Console
Imports System.Reflection
Imports System.Collections.Generic
Imports System.Text
Module module1
    '
    ' Dump an object
    '
    Sub dumpObject(ByVal o As Object)
        Dim t As Type = o.GetType()
        WriteLine("Type: {0} Fields: {1}", t, t.GetFields().Length)
        For Each s As FieldInfo In t.GetFields()
            Dim ft As Type = s.FieldType()
            WriteLine("- {0,-10} {1,-15} => {2}", s.Name, ft, s.GetValue(o))
        Next
    End Sub
    '
    ' Data definition "TRecord" class, for this example
    ' 9 fields are expected per data record.
    '
    Public Class TRecord
        Public f1 As String
        Public f2 As String
        Public f3 As String
        Public f4 As String
        Public f5 As String
        Public f6 As String
        Public f7 As String
        Public f8 As String
        Public f9 As String
        ' Copy the parsed strings into the public fields, in declaration order
        Public Sub Convert(ByRef flds As List(Of String))
            Dim fi As FieldInfo() = Me.GetType().GetFields()
            Dim i As Integer = 0
            For Each s As FieldInfo In fi
                Dim tt As Type = s.FieldType()
                If (i < flds.Count) Then
                    If TypeOf (s.GetValue(Me)) Is Integer Then
                        s.SetValue(Me, CInt(flds.Item(i)))
                    Else
                        s.SetValue(Me, flds.Item(i))
                    End If
                End If
                i += 1
            Next
        End Sub
        Public Sub New()
        End Sub
        Public Sub New(ByVal flds As List(Of String))
            Convert(flds)
        End Sub
        ' Allows "tr = flds" where flds is a List(Of String)
        Public Shared Narrowing Operator CType( _
            ByVal flds As List(Of String)) As TRecord
            Return New TRecord(flds)
        End Operator
        ' Allows "tr = flds" where flds is a String array
        Public Shared Narrowing Operator CType( _
            ByVal flds As String()) As TRecord
            Dim sl As New List(Of String)
            For i As Integer = 1 To flds.Length
                sl.Add(flds(i - 1))
            Next
            Return New TRecord(sl)
        End Operator
    End Class
    Public Class ReaderCVS
        Public Shared data As New List(Of TRecord)
        '
        ' Read csv file with max_fields, optional eolfilter
        '
        Public Function ReadCSV( _
            ByVal fn As String, _
            Optional ByVal max_fields As Integer = 0, _
            Optional ByVal eolfilter As Boolean = True) As Boolean
            Try
                Dim tr As New TRecord
                ' Default to the number of fields declared in TRecord
                If max_fields <= 0 Then
                    max_fields = tr.GetType().GetFields().Length()
                End If
                data.Clear()
                Dim rdr As FileIO.TextFieldParser
                rdr = My.Computer.FileSystem.OpenTextFieldParser(fn)
                rdr.SetDelimiters(",")
                Dim flds As New List(Of String)
                ' Keep collecting fields across rows until a record is full
                While Not rdr.EndOfData()
                    Dim lines As String() = rdr.ReadFields()
                    For Each fld As String In lines
                        If eolfilter Then
                            fld = fld.Replace(vbCr, " ").Replace(vbLf, "")
                        End If
                        flds.Add(fld)
                        If flds.Count = max_fields Then
                            tr = flds
                            data.Add(tr)
                            flds = New List(Of String)
                        End If
                    Next
                End While
                ' Leftover fields form a final, partial record
                If flds.Count > 0 Then
                    tr = flds
                    data.Add(tr)
                End If
                rdr.Close()
                Return True
            Catch ex As Exception
                WriteLine(ex.Message)
                WriteLine(ex.StackTrace)
                Return False
            End Try
        End Function
        Public Sub Dump()
            WriteLine("------- DUMP ")
            Debug.WriteLine("Dump")
            For i As Integer = 1 To data.Count
                dumpObject(data(i - 1))
            Next
        End Sub
    End Class
    Sub main(ByVal args() As String)
        Dim csv As New ReaderCVS
        csv.ReadCSV("test1.csf")
        csv.Dump()
    End Sub
End Module
------------- CUT HERE -------------------
Mind you, the above was written 2 years ago while I was still learning the
.NET library, and I was participating in support questions to teach myself
how to do common things in the .NET environment.
Is the above simple for most beginners? I wouldn't say so, but then
again, I tend to be a "tools" writer and try to generalize a tool, hence
why I spent the time to implement a data class and an object dump
function to debug it all. Not everyone needs this. Most of the time, the
field types are known so a reduction can be done, or better yet, you can
take the above, have it read the first line as the field definition line
and generalize the TRecord class to make it all dynamic.
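
A minimal sketch of that last idea, under some assumptions: the rows are already split into fields (for example by a reader like the one above), the first row holds the field names, and every value stays a string. The function name ToRecords and the sample rows are made up for illustration, and a Dictionary stands in for a generalized TRecord.

```vbnet
Imports System
Imports System.Collections.Generic

Module HeaderDrivenSketch
    ' Build name -> value records from rows that are already split into
    ' fields. The first row is treated as the field definition line.
    Function ToRecords(ByVal rows As List(Of String())) As List(Of Dictionary(Of String, String))
        Dim records As New List(Of Dictionary(Of String, String))
        If rows.Count = 0 Then Return records
        Dim names As String() = rows(0)
        For r As Integer = 1 To rows.Count - 1
            Dim rec As New Dictionary(Of String, String)
            For i As Integer = 0 To names.Length - 1
                ' Missing trailing fields become empty strings;
                ' extra fields beyond the header are ignored.
                If i < rows(r).Length Then
                    rec(names(i)) = rows(r)(i)
                Else
                    rec(names(i)) = ""
                End If
            Next
            records.Add(rec)
        Next
        Return records
    End Function

    Sub Main()
        ' Sample rows (made up): a header line plus two data rows.
        Dim rows As New List(Of String())
        rows.Add(New String() {"id", "name", "note"})
        rows.Add(New String() {"1", "hector, santos", "a"})
        rows.Add(New String() {"2", "goran"})
        For Each rec As Dictionary(Of String, String) In ToRecords(rows)
            For Each kv As KeyValuePair(Of String, String) In rec
                Console.WriteLine("{0} = {1}", kv.Key, kv.Value)
            Next
            Console.WriteLine("--")
        Next
    End Sub
End Module
```

The trade-off is that you lose the typed fields and the reflection-based dump, but the record layout no longer has to be compiled in.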
--
HLS