Rollback part of commit 7dbb3fe710466ec2f573f81e19c7130ec1c7a365 #2273

jzabroski · 2024-07-02T21:19:33Z

Describe the bug
I don't know why the code is now throwing an exception when it used to return null - the commit where this occurred just says it was done to remove nullability - but why when that is very useful. There is no clear use case for this behavior, and it causes an exception to be thrown on an empty file, which is highly non-desirable behavior.

In any event, the lack of migration guide since v30 to now is also making upgrading difficult and questionable. We're on v31 but given the other bug I filed, I'm not planning to upgrade further.

To Reproduce
7dbb3fe#diff-e1cbba681f102065fada3e91fecbfbc458195a01974189a3483f75e72137afdaL733-L736

Fails with the following:

Empty.csv in CsvHelperBug.csproj

first,last,dob

SampleMap.cs

    public class SampleMap : ClassMap<SampleLine>
    {
        [SuppressMessage("ReSharper", "VirtualMemberCallInConstructor")]
        public SampleMap()
        {
            Map(x => x.FirstName).Name("first");
            Map(x => x.LastName).Name("last");
            Map(x => x.DateOfBirth).Name("dob");
        }
    }

SampleLine.cs

    public class SampleLine
    {
        public string FirstName { get; set; }

        public string LastName { get; set; }

        public DateTime? DateOfBirth { get; set; }
    }

CsvHelperBug/Test.cs

var resourceName = "CsvHelperBug.Empty.csv"; // Modify name based on what csproj your repro is called
using (var streamReader = new StreamReader(GetType().Assembly.GetManifestResourceStream(resourceName)))
{
    using (var csvReader = new CsvReader(streamReader, new CsvConfiguration(CultureInfo.CurrentCulture) { HasHeaderRecord = true }))
    {
        csvReader.Context.RegisterClassMap<SampleMap>();
        var results= new List<SampleLine>();
        while (await csvReader.ReadAsync())
        {
            var importLine = csvReader.GetRecord<SampleLine>();
            if (importLine == null)
            {
                //No record found
                continue;
            }
            results.Add(importLine);
        }
        return results;
    }
}

Additionally, the following fails with "No header record was found" (wtf?):

var resourceName = "CsvHelperBug.Empty.csv";
using (var streamReader = new StreamReader(GetType().Assembly.GetManifestResourceStream(resourceName)))
{
    using (var csvReader = new CsvReader(streamReader, new CsvConfiguration(CultureInfo.CurrentCulture) { HasHeaderRecord = true }))
    {
        csvReader.Context.RegisterClassMap<SampleMap>();
        csvReader.ReadHeader(); // throws CsvHelper.ReaderException: No header record was found.
    }
}

Expected behavior
Roll this back! It doesn't make any sense in the perfectly valid scenario where there are no rows.

Screenshots
n/a

Additional context
n/a

The text was updated successfully, but these errors were encountered:

JoshClose · 2024-07-02T22:55:04Z

The idea is, you should never have a null record. The issue here is ReadAsync should be returning false.

Does this sound reasonable?

As far as the ReadHeader thing, that code hasn't changed in years. You should be calling Read before ReadHeader, but there could probably be a better exception message.

jzabroski · 2024-07-02T22:56:44Z

That would work for me. Can you push a fix today

JoshClose · 2024-07-02T22:56:57Z

Working on it right now.

JoshClose · 2024-07-02T23:17:46Z

I have a dilemma.

The way things work is row by row. That means you call Read every time you want a new row. You can then act on the given row. If you have a header record, you're supposed to call ReadHeader, which reads the current row as headers. Then you call you Read again to get to the first record. This allows the user to handle each row however they like. When using GetRecords, everything is handled for you.

The issue here is that GetRecord is calling ReadHeader for you if you haven't called it yet. That was a convenience for people, but presents a problem here, because there might not be another row. Returning null is a bad way to handle that. When enabling nullable it becomes a pain to handle null records when there is almost no chance of getting one.

I think the correct solution here is to have GetRecord stop calling ReadHeader and Read for you. This may be a little annoying, but it makes things consistent.

csv.Read();
csv.ReadHeader();
while (csv.Read())
{
    var record = csv.GetRecord<Foo>();
}

The other option is to have Read call ReadHeader and read a second time, returning false when there isn't another record. I don't like this idea because it doesn't allow the user to do any other inspection on the header record and automatically goes past it. If you're using Read instead of GetRecords, you've opted into a more manual way of reading.

What do you think?

jzabroski · 2024-07-02T23:31:58Z

I think if CsvConfiguration specifies there is a header record, it should do it for you.

ReadHeader should add a remark that it should only be used for highly customized or niche scenarios, and is primarily intended for internal use when CsvConfiguration.HasHeaderRecord is true.

Otherwise, what is the point of the CsvConfiguration.HasHeaderRecord property?

JoshClose · 2024-07-02T23:34:45Z

what is the point of the CsvConfiguration.HasHeaderRecord property?

Only used for GetRecords.

jzabroski · 2024-07-02T23:58:37Z

what is the point of the CsvConfiguration.HasHeaderRecord property?

Only used for GetRecords.

Then ReadHeader should add a remarks that it requires CsvConfiguration.HasHeaderRecord to be true, but allows the consumer to directly read.

And yes, your Read fix should work.

In my use case, we read and validate one row at a time with FluentValidation, so that we can parse a batch of records in really large CSV files, validate it, and save to a db only the good rows and report the bad rows. Otherwise I'm not certain we would need to read one row at a time.

A lot of financial data is zipped csv files, so we deflate, which is memory intensive, and to avoid running out of memory, as well as scaling db insert/update/delete operations, we process in batches of 1000.

jzabroski · 2024-09-18T15:49:51Z

@JoshClose This was my generic solution based on your last comment - do you think this is OK? - at a practical level, it doesn't make a ton of sense given how the code for GetRecord reads (it implies it should read the header for you) - but the below code passes 3 different important test scenarios:

CSV with header record and no data
CSV with error in third column of data - e.g., malformed Date value that cannot be read into a DateTime struct
CSV with header record and 3 rows of data - should read in perfectly

using (var streamReader = new StreamReader(request.Stream))
{
    using (var csvReader = new CsvReader(streamReader, request.Configuration))
    {
        csvReader.Context.RegisterClassMap<TMap>();
        
        var stopwatch = Stopwatch.StartNew();
        if (csvReader.Configuration.HasHeaderRecord)
        {
            csvReader.Read();
            csvReader.ReadHeader();
        }
        if (csvReader.Read())
        {
            var importLine = csvReader.GetRecord<TResult>();
            if (importLine != null)
            {
                if (!IsValid(request, importLine, csvReader.Context, out var error))
                {
                    errors.Add(error);
                }
                else
                {
                    results.Add(importLine);
                }
            }
        }

        return new CsvBatchParserResponse<TResult>
        {
            Results = results,
            Errors = errors,
            ParseTime = stopwatch.Elapsed
        };
    }

jzabroski added the bug label Jul 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rollback part of commit 7dbb3fe710466ec2f573f81e19c7130ec1c7a365 #2273

Rollback part of commit 7dbb3fe710466ec2f573f81e19c7130ec1c7a365 #2273

jzabroski commented Jul 2, 2024

JoshClose commented Jul 2, 2024

jzabroski commented Jul 2, 2024

JoshClose commented Jul 2, 2024

JoshClose commented Jul 2, 2024

jzabroski commented Jul 2, 2024 •

edited

Loading

JoshClose commented Jul 2, 2024

jzabroski commented Jul 2, 2024 •

edited

Loading

jzabroski commented Sep 18, 2024 •

edited

Loading

Rollback part of commit 7dbb3fe710466ec2f573f81e19c7130ec1c7a365 #2273

Rollback part of commit 7dbb3fe710466ec2f573f81e19c7130ec1c7a365 #2273

Comments

jzabroski commented Jul 2, 2024

Empty.csv in CsvHelperBug.csproj

SampleMap.cs

SampleLine.cs

CsvHelperBug/Test.cs

JoshClose commented Jul 2, 2024

jzabroski commented Jul 2, 2024

JoshClose commented Jul 2, 2024

JoshClose commented Jul 2, 2024

jzabroski commented Jul 2, 2024 • edited Loading

JoshClose commented Jul 2, 2024

jzabroski commented Jul 2, 2024 • edited Loading

jzabroski commented Sep 18, 2024 • edited Loading

jzabroski commented Jul 2, 2024 •

edited

Loading

jzabroski commented Jul 2, 2024 •

edited

Loading

jzabroski commented Sep 18, 2024 •

edited

Loading