scikit-learn: grid search over multiple classifiers in Python


I wanted to know if there is a better, more built-in way to run a grid search and test multiple models in a single pipeline. Of course, the parameters of the models are different, which made it complicated for me to figure out. Here is what I did:

    from sklearn.pipeline import Pipeline
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.feature_extraction.text import TfidfTransformer
    # sklearn.grid_search was removed; GridSearchCV now lives in model_selection
    from sklearn.model_selection import GridSearchCV


    def grid_search():
        # X_train and y_train are assumed to be defined elsewhere
        pipeline1 = Pipeline([
            # transformers must come before the final estimator
            ('vec2', TfidfTransformer()),
            ('clf', RandomForestClassifier()),
        ])

        pipeline2 = Pipeline([
            ('clf', KNeighborsClassifier()),
        ])

        pipeline3 = Pipeline([
            ('clf', SVC()),
        ])

        pipeline4 = Pipeline([
            ('clf', MultinomialNB()),
        ])

        parameters1 = {
            'clf__n_estimators': [10, 20, 30],
            'clf__criterion': ['gini', 'entropy'],
            # the original had these two value lists swapped; 'auto' is
            # dropped because recent scikit-learn releases reject it
            'clf__max_features': ['log2', 'sqrt', None],
            'clf__max_depth': [5, 10, 15],
        }

        parameters2 = {
            'clf__n_neighbors': [3, 7, 10],
            'clf__weights': ['uniform', 'distance'],
        }

        parameters3 = {
            'clf__C': [0.01, 0.1, 1.0],  # parameter name is an uppercase C
            'clf__kernel': ['rbf', 'poly'],
            'clf__gamma': [0.01, 0.1, 1.0],
        }

        parameters4 = {
            'clf__alpha': [0.01, 0.1, 1.0],
        }

        pars = [parameters1, parameters2, parameters3, parameters4]
        pips = [pipeline1, pipeline2, pipeline3, pipeline4]

        print("starting gridsearch")
        # one independent search per (pipeline, parameter grid) pair
        for i in range(len(pars)):
            gs = GridSearchCV(pips[i], pars[i], verbose=2, refit=False, n_jobs=-1)
            gs = gs.fit(X_train, y_train)
            print("finished gridsearch")
            print(gs.best_score_)

However, this approach still only gives the best model within each classifier, and does not compare between classifiers.

Although this topic is a bit old, I'm posting an answer in case it helps someone in the future.

Instead of using grid search for hyperparameter selection, you can use the 'hyperopt' library.

Please have a look at section 2.2 of this page. In the above case, you can use an 'hp.choice' expression to select among the various pipelines and then define the parameter expressions for each one separately.

In your objective function, you need a check that depends on the pipeline chosen, and you return the CV score for the selected pipeline and parameters (possibly via cross_val_score).

The trials object, at the end of the execution, will indicate the best pipeline and parameters overall.

