This page describes Hep, one of the optimization planners available in Farrago; it serves as a heuristic alternative to the global-cost-based Volcano planner. For background, see HowToWriteAnOptimizer.
The Volcano planner is very powerful, but using it to implement an optimizer can be difficult for the following reasons:
- Complexity of relational expression definitions. The author of a class implementing a relational expression must be very careful in producing a good rel digest, otherwise Volcano's dynamic programming algorithm may fail due to false positives (resulting in incorrect plans) or false negatives (resulting in search space explosion) during equivalence detection.
- Infinite loops. Sometimes Volcano needs extra help in recognizing equivalences; careless digest construction can also be a cause. Finding the cause of an infinite loop and coming up with a solution can be frustrating.
- Cost sensitivity. Coming up with a good cost model can be tricky, and this can make unit-testing a new rule difficult since it's not always obvious why a rule didn't fire.
- Exhaustive search. Even without infinite loops, Volcano's traversal order of the search space may not be optimal since it only has generic cost information. For example, a specialized rule may be able to use a sort to find an optimal plan, whereas Volcano may try every permutation. See VolcanoExplosionCaseStudy.
- Rule interactions. Volcano is capable of applying rules in any order, which is very powerful because it can lead to unexpected optimizations which humans never thought of before. But the downside is that truly meaningless combinations may force the optimizer writer to recognize and deal with edge cases that aren't really of interest (just to keep plan generation from failing, looping, taking too long, etc.).
We have developed a heuristic planner which can be used for a number of purposes:
- As a standalone functional planner implementation. Some optimizers may not need the power of a full global-cost-based optimizer, and can be constructed by combining the heuristic planner with a ruleset containing a mix of purely heuristic rules and locally cost-based rules.
- As a companion to Volcano. The heuristic optimizer can be used to apply "no-brainer" rules to a tree before starting Volcano, so that the ruleset used by Volcano can be smaller and can apply to a better initial-condition.
- As a mock planner for use in unit-testing specific rules or combinations of rules. The planner can be told to fire a particular sequence of rules, exercising the rules in a controlled, isolated fashion.
- Unlike Volcano, which maintains multiple possible implementations for each relational expression, the heuristic planner maintains only a single working copy of the entire relational expression graph at any time.
- The heuristic planner does maintain a DAG rather than a tree, allowing common subexpressions to be constructed or recognized at the relational level. So rel authors still need to be careful in constructing digests.
In order to use the heuristic planner, the optimizer developer must first create a program which specifies exactly how rules should be fired. The API for creating a program is defined in the Javadoc for class org.eigenbase.relopt.hep.HepProgramBuilder. Here's an example which demonstrates all of the features:
HepProgramBuilder programBuilder = new HepProgramBuilder(); // pull filters up out of joins programBuilder.addRuleInstance(new ExtractJoinFilterRule()); // push all filters into or down below joins programBuilder.addRuleInstance(new PushFilterThroughJoinRule()); // give plugins a chance to do whatever they want programBuilder.addSubprogram(pluginSubprogram); // convert contiguous 2-way joins into n-way joins programBuilder.addRuleInstance(new ConvertMultiJoinRule()); // optimize join order programBuilder.addSubprogram(joinOrderSubprogram); // find physical implementations for joins // try hash joins first programBuilder.addRuleInstance(new LhxHashJoinRule()); // for anything else, there's Cartesian, but first have to // extract join conditions again because FennelCartesianJoinRule // can't deal with those programBuilder.addRuleInstance(new ExtractJoinFilterRule()); programBuilder.addRuleInstance(new FennelCartesianJoinRule()); // convert remaining filters and projections to calcs; // these can be done as a group for efficiency programBuilder.addGroupBegin(); programBuilder.addRuleInstance(new ProjectToCalcRule()); programBuilder.addRuleInstance(new FilterToCalcRule()); programBuilder.addGroupEnd(); // implement calcs with Java iterators; we love Java! programBuilder.addRuleInstance(new MergeCalcRule()); programBuilder.addRuleInstance(new IterCalcRule()); // add any converters as needed programBuilder.addRuleClass(ConverterRule.class); HepProgram program = programBuilder.createProgram();
A subprogram can be used to create sequences which fire repeatedly until fixpoint. In the example above, there's a reference to a join order subprogram. Here's what its creation might look like:
HepProgramBuilder joinProgramBuilder = new HepProgramBuilder(); // change the match order to bottom up so that lower // joins get optimized before higher joins; this way we // can correctly cost the lower joins as factors in the // higher joins joinProgramBuilder.addMatchOrder(HepMatchOrder.BOTTOM_UP); // also change the match limit so that we only // optimize one join at a time, giving us a chance // to post-process the results before moving on to // the next join joinProgramBuilder.addMatchLimit(1); joinProgramBuilder.addRuleInstance(new OptimizeJoinOrderRule()); // revert to default match order joinProgramBuilder.addMatchOrder(HepMatchOrder.ARBITRARY); // likewise for match limit joinProgramBuilder.addMatchLimit(HepProgram.MATCH_UNTIL_FIXPOINT); // now, clean up by removing trivial projects that might have // been introduced by the join joinProgramBuilder.addRuleInstance(new RemoveTrivialProjectRule()); // after this, the subprogram implicitly loops back // to the start, firing OptimizeJoinOrderRule again, until // no more joins are processed // return program to be included in the main sequence return joinProgramBuilder.createProgram();
A subprogram can be used to create reusable program components; it can also be used to implement pluggable optimizers. The reference above to pluginSubprogram assumes that a plugin has been given a chance to create its own subprogram and register it with the optimizer. Here's an example of how the subprogram might have been created for the FTRS row-store data server plugin:
HepProgramBuilder ftrsProgramBuilder = new HepProgramBuilder(); // convert filters to index searches ftrsProgramBuilder.addRuleInstance(new FtrsScanToSearchRule()); // use index join where possible ftrsProgramBuilder.addRuleInstance(new FtrsIndexJoinRule()); // deal with CREATE INDEX and DML ftrsProgramBuilder.addRuleInstance(new FtrsIndexBuilderRule()); ftrsProgramBuilder.addRuleInstance(new FtrsTableModificationRule()); // push projections into scans where possible; do this after // the other rules since they may have introduced new projections ftrsProgramBuilder.addRuleInstance(new PushProjectThroughFilter()); ftrsProgramBuilder.addRuleInstance(new FtrsTableProjectionRule()); // likewise, CREATE INDEX may have introduced a new sorter; see // if we can eliminate sorters now (e.g. creating an index on a prefix of // an existing index) ftrsProgramBuilder.addRuleByDescription("FtrsRemoveRedundantSortRule"); // return program to be included into main sequence return ftrsProgramBuilder.createProgram();
The code above also shows how to add a rule by its description string in place of object reference. This feature should make it easy to write an XML mapping layer so that optimizers can be completely defined via XML configuration files.
More example mini-programs are available in the unit tests for the planner in org.eigenbase.test.HepPlannerTest. Hep is also used for controlled rule application in org.eigenbase.test.RelOptRulesTest and net.sf.farrago.test.FarragoOptRulesTest.
TBD: real program for a full optimizer in the LucidDB personality.
Rule authors should follow the guidelines in HowToWriteNewOptimizerRules. Strictly speaking, some of the steps are unnecessary for rules which are only intended to be used with the heuristic planner. For example, the heuristic planner does not care about traits except during the final phase of inserting converters, so it implements changeTraits as a no-op. However, it is recommended that all of the guidelines be followed in case the decision about which planner to use is changed later.
In addition, the heuristic planner has its own restrictions you should be aware of. Most importantly, unlike Volcano, it does not guarantee that it won't re-fire a rule on exactly the same expression, which can be a cause of infinite loops in rules which rely on Volcano's ability to keep track of rule queues.