Number is a power of 2

#NUMBER IS APOWER OF 2 CODE#

If you run at 100 MHz, the former approach will work best for you. You either agree to live with this, or you build everything from smaller pieces manually. In practice, tools are dumb and you cannot predict what they will do, except that the result is not likely to be optimal.

After all, the whole thing is just a 32x1 LUT, and the task you're giving to the tools is to model it with the available smaller LUTs, minimizing either the number of LUTs or the delay of the longest path. In theory, the tools should produce the optimal solution no matter how you write it.

A bitwise operation uses only two inputs of a LUT (while 6 are available), which is very inefficient. To use fewer LUTs, each LUT should use as many inputs as possible; otherwise resources get wasted. You can just add all the bits and see if the sum is equal to 1. Or you can do it in a tree where each node represents a sum of a number of values coming from child nodes.

One last thing: as in many cases, the best answer to the OP's problem may not even be what they asked for. Actually detecting that a given register holds a power-of-two value may not be required, and maybe some other approaches can yield the result they're looking for. We discussed something similar in the Programming section about algorithms. Oh, and I don't really know the OP either; it could also be some kind of "homework" that we are gracefully doing for them.

Looks like the thread, now that solutions have been proposed, is going to turn into a pissing contest. It can certainly be an interesting optimization problem per se, but I'm not sure it will further help the OP whatsoever. For fun, I tried various alternatives, and still the above is hard to beat overall. Sure, it's relatively easy to implement alternatives so that they take less area, but it's often a lot clunkier, and some have limitations (such as requiring a power-of-two number of bits; the above has no such restriction). If the above doesn't fit the OP's requirements, then alright. Again, if it fits all your requirements AND is simple, then there is no need to find anything else.

One concrete thing to note is that the end result in a given design will depend on a lot of factors. Trying to think of small chunks of logic in isolation doesn't always give you the right idea of how the whole thing is going to be implemented. The "minus one" operation here is kinda "expensive". But if you use (v-1) elsewhere in your code, the tools will or may reuse it, thus making the overall thing actually take up less space than some other alternatives. Unless you're going to instantiate myriads of such detectors, a few LUTs won't make the slightest difference!
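For reference, here is a rough behavioural sketch in Python of the two formulations mentioned above: summing the bits, and the "minus one" trick. It is not HDL and says nothing about LUT usage. The function names are made up for illustration, and reading the "(v-1)" remark as the classic `v & (v - 1)` test is an assumption, since the original code is not shown here.

```python
def is_pow2_popcount(v: int) -> bool:
    """Add up all the bits and check that the sum is exactly 1.

    Assumes v is a non-negative integer (a register value).
    """
    return bin(v).count("1") == 1


def is_pow2_minus_one(v: int) -> bool:
    """Bit trick: clearing the lowest set bit must leave zero.

    v - 1 flips the lowest set bit and every zero below it, so
    v & (v - 1) removes that bit. Zero has no set bit and is excluded.
    """
    return v != 0 and (v & (v - 1)) == 0
```

Both functions should agree on every non-negative input; for example, 32 is accepted while 0 and 33 are rejected.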

#NUMBER IS APOWER OF 2 CODE#

The above "modulo" method is actually one of the fastest once synthesized, at least among non-pipelined approaches (not the fastest, but pretty good). It is simple to understand and simple to read from code, yields pretty compact code (which is always a plus for maintenance and validation), is provably correct, and takes up reasonable area (sure, it's not the least area-hungry, but it's still reasonable). And good engineering is actually all about finding the simplest solution that meets ALL requirements of a given problem.

Don't leave such circuitry to the synthesizer, which would create a glitchy hairball. Note that if your FPGA has a different LUT construction (like 8-input, 4-output or 4-input, 3-output), then other optimisations are possible.

The idea is based on thinking 'in reverse'. Each block produces two signals:

ALO : there is at least one input high (OR gate); we will re-evaluate that in the next block.
MTO : there is more than one input high (AND gate); we feed those into an OR-tree.

All MTO signals propagate through the OR-tree. If any MTO is high, the output will be low.

Simply tell the synthesizer it needs to implement this as-is and not minimize the logic. This guarantees a known fixed propagation delay (the actual gates are LUTs, which have a constant propagation time irrespective of the type of gate created). Note that the last step (two inverters and an OR gate) could have been done differently, but that would have unbalanced the propagation delay, creating very glitchy output. Since a LUT typically has 4 inputs and 2 outputs, you can combine two OR gates into one LUT. You can also combine the three-input OR and the inverter into one LUT; you only need a NAND gate as output then.

Example: let's try a number where A0 is high and A5 is high. In the first column we get ALO from LUT1 high and ALO from LUT3 high; nothing registers on the MTOs. In the second column we get both ALOs high and nothing registers on the MTOs. In the last column both ALO and MTO go high. Since MTO is high, the output will be low.

Signals passing through a column all exit with the same propagation delay. There are no race conditions, as all signals have passed through the same number of LUTs.
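To make the ALO/MTO idea easier to follow, here is a small behavioural model in Python. It is only a sketch under assumptions: the pairwise grouping, the zero-padding of odd widths, and the name `alo_mto_is_pow2` are mine, not taken from the original schematic, and Python cannot model the balanced propagation delays that the real LUT network is built for.

```python
def alo_mto_is_pow2(bits):
    """Behavioural model of the ALO/MTO tree; bits is a list of 0/1 values.

    Hypothetical helper: assumes inputs are combined two at a time.
    """
    if not bits:
        return False
    alo = list(bits)   # at the leaves, "at least one" is just the bit itself
    any_mto = 0        # stands in for the OR-tree over every MTO signal
    while len(alo) > 1:
        nxt = []
        for i in range(0, len(alo), 2):
            a = alo[i]
            b = alo[i + 1] if i + 1 < len(alo) else 0  # pad odd widths with 0
            nxt.append(a | b)   # ALO: at least one input high (OR gate)
            any_mto |= a & b    # MTO: more than one input high (AND gate)
        alo = nxt
    # High only if some bit is set and no block ever saw two highs.
    return bool(alo[0] and not any_mto)
```

With an 8-bit input where only A0 and A5 are high, the first two levels raise no MTO, the last level sees both of its ALO inputs high and raises MTO, and the function returns `False`, which matches the walk-through above.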