This repository has been archived by the owner on Nov 19, 2020. It is now read-only.

Add an Example for CrossValidating NaiveBayes #807

Closed
1 of 4 tasks
ConductedClever opened this issue Aug 24, 2017 · 10 comments

Comments

@ConductedClever

ConductedClever commented Aug 24, 2017

What would you like to submit? (put an 'x' inside the bracket that applies)

  • question
  • bug report
  • feature request
  • doc request

Issue description

Hi. Please add an example of how to cross validate Naive Bayes learning algorithm. There are some for SVM, Hidden Markov Model, and Decision Tree here. Thanks.

@cesarsouza
Member

Hi @ConductedClever,

While this example has not yet been added to the official documentation, you can find an example of how to use NaiveBayes with cross-validation below:

// Ensure we have reproducible results
Accord.Math.Random.Generator.Seed = 0;

// Let's say we have the following data to be classified
// into three possible classes. Those are the samples:
//
int[][] inputs =
{
    //               input      output
    new int[] { 0, 1, 1, 0 }, //  0 
    new int[] { 0, 1, 0, 0 }, //  0
    new int[] { 0, 0, 1, 0 }, //  0
    new int[] { 0, 1, 1, 0 }, //  0
    new int[] { 0, 1, 0, 0 }, //  0
    new int[] { 1, 0, 0, 0 }, //  1
    new int[] { 1, 0, 0, 0 }, //  1
    new int[] { 1, 0, 0, 1 }, //  1
    new int[] { 0, 0, 0, 1 }, //  1
    new int[] { 0, 0, 0, 1 }, //  1
    new int[] { 1, 1, 1, 1 }, //  2
    new int[] { 1, 0, 1, 1 }, //  2
    new int[] { 1, 1, 0, 1 }, //  2
    new int[] { 0, 1, 1, 1 }, //  2
    new int[] { 1, 1, 1, 1 }, //  2
};

int[] outputs = // those are the class labels
{
    0, 0, 0, 0, 0,
    1, 1, 1, 1, 1,
    2, 2, 2, 2, 2,
};

// Let's say we want to measure the cross-validation 
// performance of Naive Bayes on the above data set:
var cv = CrossValidation.Create(

    k: 10, // We will be using 10-fold cross validation

    learner: (p) => new NaiveBayesLearning(),

    // Now we have to specify how the classifier's performance should be measured:
    loss: (actual, expected, p) => new ZeroOneLoss(expected).Loss(actual),

    // This function can be used to perform any special
    // operations before the actual learning is done, but
    // here we will just leave it as simple as it can be:
    fit: (teacher, x, y, w) => teacher.Learn(x, y, w),

    // Finally, we have to pass the input and output data
    // that will be used in cross-validation. 
    x: inputs, y: outputs
);

// After the cross-validation object has been created,
// we can call its .Learn method with the input and 
// output data that will be partitioned into the folds:
var result = cv.Learn(inputs, outputs);

// We can grab some information about the problem:
int numberOfSamples = result.NumberOfSamples; // should be 15
int numberOfInputs = result.NumberOfInputs;   // should be 4
int numberOfOutputs = result.NumberOfOutputs; // should be 3

double trainingError = result.Training.Mean; // should be 0
double validationError = result.Validation.Mean; // should be 0.05

Hope it helps,
Cesar

@ConductedClever
Author

Thanks a lot for the quick response. I will try this code and report the result here, or close the issue. Thanks again @cesarsouza.

@ConductedClever
Author

Hi @cesarsouza,

I have one more question about cross-validation.

Is there any way to get a confusion matrix from the CrossValidationResult or CrossValidationStatistics? For example, to get the mean accuracy after cross-validating.

@cesarsouza
Member

cesarsouza commented Sep 6, 2017

Yes: instead of creating a ZeroOneLoss, you can create a GeneralConfusionMatrix and return its accuracy, for example:

// Let's say we want to measure the cross-validation 
// performance of Naive Bayes on the above data set:
var cv = CrossValidation.Create(

    k: 10, // We will be using 10-fold cross validation

    // First we define the learning algorithm:
    learner: (p) => new NaiveBayesLearning(),

    // Now we have to specify how the Naive Bayes performance should be measured:
    loss: (actual, expected, p) =>
    {
        var cm = new GeneralConfusionMatrix(expected, actual);
        p.Tag = cm; // (if you want to save it for some purpose)
        return cm.Accuracy;
    },

    // This function can be used to perform any special
    // operations before the actual learning is done, but
    // here we will just leave it as simple as it can be:
    fit: (teacher, x, y, w) => teacher.Learn(x, y, w),

    // Finally, we have to pass the input and output data
    // that will be used in cross-validation. 
    x: inputs, y: outputs
);

Hope it helps,
Cesar

@cesarsouza
Member

cesarsouza commented Sep 6, 2017

Oops, sorry, I misunderstood your question. Do you mean presenting a combined, aggregated confusion matrix representing all cross-validation folds? Or simply returning the confusion matrix of the best model found so far?

@ConductedClever
Author

Hi @cesarsouza. I mean the first one, "a combined, aggregated confusion matrix representing all cross-validation folds". Because I think this could be a good metric to decide if the learning would work or not. Although the worst case also helps (I think).

@cesarsouza
Member

In this case, you can use the example I posted above to save a GeneralConfusionMatrix in each of the .Tag properties of the cross-validation results. Then you can combine all the confusion matrices from the validation folds using the GeneralConfusionMatrix.Combine method. I might be able to post a better example soon.
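As a rough sketch, the two steps could look like the following. This assumes the loss function stores each fold's matrix in p.Tag as in the example above, that the result object exposes the per-fold models with their Tag properties, and that GeneralConfusionMatrix.Combine accepts an array of matrices; the exact property names and signature may differ between framework versions.

```csharp
// Sketch only: collect the per-fold confusion matrices that the
// loss function stored in p.Tag, then combine them into a single
// aggregated matrix covering all validation folds.
var perFold = new List<GeneralConfusionMatrix>();
foreach (var model in result.Models)          // one entry per fold (assumed name)
    perFold.Add((GeneralConfusionMatrix)model.Tag);

// Combine the validation-fold matrices into one aggregated matrix:
GeneralConfusionMatrix combined = GeneralConfusionMatrix.Combine(perFold.ToArray());

// The aggregated accuracy over all validation folds:
double accuracy = combined.Accuracy;
```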

@cesarsouza
Member

I will be adding a new extension method to help with this task. In the next framework release, you should be able to call it using:

// Let's say you have estimated a cross-validation result using:
var result = cv.Learn(inputs, outputs);

// You should be able to obtain a confusion matrix for the cross-validation using:
GeneralConfusionMatrix gcm = result.ToConfusionMatrix(inputs, outputs);

Regards,
Cesar

cesarsouza added a commit that referenced this issue Sep 7, 2017
…matrices from cross-validation results.

 - Updates GH-807: Add an Example for CrossValidating NaiveBayes
@ConductedClever
Author

Hi @cesarsouza,

Thanks, now your solution is complete.

And about your extension method: I think it would be useful to have both ToConfusionMatrix, which returns the aggregated values, and a ToDistinctConfusionMatrixes that returns all of the individual confusion matrices for further investigation. Maybe it could also take the minimum values or some more complex relation between the data. Although this is just a suggestion.

Thanks.

@Jeka17

Jeka17 commented Aug 28, 2018

Hi,

Is it possible to get the predicted class labels for the validation data?
