Questions about tree tuning #20
Hi @luxin-tian
@luxin-tian @keertanavc. Yes, @keertanavc is right about how the parameter option settings work. And yes, the MSEs in 1b and 1c are cross-validated, while the MSE in 1a is not. That is the point: to compare those values.
Dear Keertana @keertanavc, I am curious about your statement that: 'If it is a list, sampling is done without replacement, and if it is a distribution, sampling is done with replacement.' Suppose we have a list of length 101 and a parameter of n_iter = 100. In this case, how can randomized search CV ensure that every value in the list will be tried? Besides, if sampling from a list is done without replacement, I think some combinations will be left untried. How do you resolve this problem? Thanks!
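A small sketch of the behavior under discussion, using scikit-learn's `ParameterSampler` (the helper that `RandomizedSearchCV` uses internally) with an illustrative `max_depth` space; the parameter values here are made up for illustration and are not the problem set's settings:

```python
from scipy.stats import randint as sp_randint
from sklearn.model_selection import ParameterSampler

# Case 1: the search space is a list. Per the scikit-learn docs, when
# every parameter is given as a list, candidates are sampled from the
# finite grid WITHOUT replacement, so the 5 draws below are distinct.
list_space = {"max_depth": list(range(1, 11))}
list_draws = [d["max_depth"]
              for d in ParameterSampler(list_space, n_iter=5, random_state=0)]

# Case 2: the search space is a distribution. Each candidate is an
# independent draw (sampling WITH replacement), so repeats can occur,
# and there is no guarantee that any particular value is ever tried.
dist_space = {"max_depth": sp_randint(1, 11)}
dist_draws = [d["max_depth"]
              for d in ParameterSampler(dist_space, n_iter=5, random_state=0)]

print(list_draws)
print(dist_draws)
```

This also bears on the n_iter question above: if n_iter is smaller than the list's length, some list values are simply never tried, with or without replacement.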
Dear Dr. @rickecon, @keertanavc,

In the problem set, we are required to tune the parameters of a Decision Tree and a Random Forest regression model. As specified, the distributions of the parameters are set as the following.

While `sp_randint` is used for the other parameters, the distributions of `max_depth` and `n_estimators` are specified only by a list, which, according to the documentation of `RandomizedSearchCV` and `GridSearchCV`, means that only two numbers will be tried. I wonder if this is intended, or if a more reasonable specification would be `sp_randint(int, int)`?

In 1.(b) and 1.(c), the problem description writes that
However, in 1.(a) we calculate the test MSE on the testing set, whereas when we tune the trees in (b) and (c), the `best_score_` attribute returns the MSE calculated on the training set. I wonder if it may be the case that we cannot evaluate the performance of the tuning by comparing MSEs calculated on two different subsets of the data?

Thank you!