Scikit-learn grid search over multiple classifiers in Python
I wanted to know if there is a better, more built-in way to do a grid search and test multiple models in a single pipeline. Of course, the parameters of the models would be different, which made it complicated for me to figure out. Here is what I did:
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import GridSearchCV  # formerly sklearn.grid_search


def grid_search():
    # One pipeline per classifier; a transformer has to come before the final estimator.
    pipeline1 = Pipeline([
        ('vec2', TfidfTransformer()),
        ('clf', RandomForestClassifier()),
    ])
    pipeline2 = Pipeline([('clf', KNeighborsClassifier())])
    pipeline3 = Pipeline([('clf', SVC())])
    pipeline4 = Pipeline([('clf', MultinomialNB())])

    # max_depth takes integers (or None); max_features takes 'sqrt', 'log2', a number, or None.
    parameters1 = {
        'clf__n_estimators': [10, 20, 30],
        'clf__criterion': ['gini', 'entropy'],
        'clf__max_depth': [5, 10, 15],
        'clf__max_features': ['log2', 'sqrt', None],
    }
    parameters2 = {
        'clf__n_neighbors': [3, 7, 10],
        'clf__weights': ['uniform', 'distance'],
    }
    parameters3 = {
        'clf__C': [0.01, 0.1, 1.0],
        'clf__kernel': ['rbf', 'poly'],
        'clf__gamma': [0.01, 0.1, 1.0],
    }
    parameters4 = {
        'clf__alpha': [0.01, 0.1, 1.0],
    }

    pars = [parameters1, parameters2, parameters3, parameters4]
    pips = [pipeline1, pipeline2, pipeline3, pipeline4]

    # X_train and y_train are assumed to be defined elsewhere.
    print("starting gridsearch")
    for i in range(len(pars)):
        gs = GridSearchCV(pips[i], pars[i], verbose=2, refit=False, n_jobs=-1)
        gs = gs.fit(X_train, y_train)
        print("finished gridsearch")
        print(gs.best_score_)
However, this approach still only gives the best model within each classifier, and does not compare between classifiers.
Although the topic is a bit old, I'm posting an answer in case it helps anyone in the future.
Instead of using grid search for hyperparameter selection, you can use the 'hyperopt' library.
Please have a look at section 2.2 of this page. In the above case, you can use an 'hp.choice' expression to select among the various pipelines and then define the parameter expressions for each one separately.
In your objective function, you need to have a check depending on the pipeline chosen and return the CV score for the selected pipeline and parameters (possibly via cross_val_score).
The trials object at the end of the execution will indicate the best pipeline and parameters overall.