Stack based vs Register based Virtual Machine Architecture, and the Dalvik VM

[Kostja Stern has been kind enough to translate this article into Russian, which can be found here]
A virtual machine (VM) is a high level abstraction on top of the native operating system that emulates a physical machine. Here, we are talking about process virtual machines and not system virtual machines. A virtual machine enables the same platform to run on multiple operating systems and hardware architectures. The interpreters for Java and Python are examples, where the source code is compiled into VM specific bytecode. The same can be seen in the Microsoft .NET architecture, where code is compiled into an intermediate language for the CLR (Common Language Runtime).

What should a virtual machine generally implement? It should emulate the operations carried out by a physical CPU and thus should ideally encompass the following concepts:

  • Compilation of source language into VM specific bytecode
  • Data structures to contain instructions and operands (the data the instructions process)
  • A call stack for function call operations
  • An ‘Instruction Pointer’ (IP) pointing to the next instruction to execute
  • A virtual ‘CPU’ – the instruction dispatcher that
    • Fetches the next instruction (addressed by the instruction pointer)
    • Decodes the operands
    • Executes the instruction
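
The pieces listed above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the opcode names (`EMIT`, `CALL`, `RET`, `HALT`) and the tuple based instruction format are made up for this example and do not come from any real VM.

```python
# Toy instruction dispatcher. Opcodes and encoding are hypothetical.
def run(program):
    ip = 0            # instruction pointer: index of the next instruction
    call_stack = []   # return addresses for CALL/RET
    output = []       # values collected by EMIT instructions

    while True:
        opcode, operand = program[ip]   # fetch the instruction and its operand
        ip += 1                         # by default, fall through to the next one
        if opcode == "EMIT":            # decode and execute
            output.append(operand)
        elif opcode == "CALL":
            call_stack.append(ip)       # remember where to resume after RET
            ip = operand
        elif opcode == "RET":
            ip = call_stack.pop()
        elif opcode == "HALT":
            return output
        else:
            raise ValueError(f"unknown opcode: {opcode}")


program = [
    ("EMIT", "main: start"),   # 0
    ("CALL", 4),               # 1: call the function starting at index 4
    ("EMIT", "main: end"),     # 2
    ("HALT", None),            # 3
    ("EMIT", "in function"),   # 4
    ("RET", None),             # 5
]

print(run(program))
```

The `while` loop plays the role of the virtual 'CPU': it fetches the instruction addressed by `ip`, decodes the opcode, and executes it, while `call_stack` keeps track of function calls.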

There are basically two main ways to implement a virtual machine: stack based and register based. Examples of stack based VMs are the Java Virtual Machine and the .NET CLR; this is the more widely used method for implementing virtual machines. Examples of register based virtual machines are the Lua VM and the Dalvik VM (which we will discuss shortly). The difference between the two approaches is in the mechanism used for storing and retrieving operands and their results.

Stack Based Virtual Machines

A stack based virtual machine implements the general features described in the points above, but the memory structure where the operands are stored is a stack. Operations are carried out by popping data from the stack, processing it, and pushing the results back in LIFO (Last In First Out) fashion. In a stack based virtual machine, with 20 and 7 already on the stack, the operation of adding two numbers would usually be carried out in the following manner (where 20, 7, and ‘result’ are the operands):


  1. POP 20
  2. POP 7
  3. ADD 20, 7, result
  4. PUSH result

Because of the PUSH and POP operations, four steps are needed to carry out an addition operation. An advantage of the stack based model is that the operands are addressed implicitly by the stack pointer (SP in the image above). This means that the virtual machine does not need to know the operand addresses explicitly, as popping from the location given by the stack pointer yields the next operand. In stack based VMs, all arithmetic and logic operations are carried out by pushing and popping the operands and results on the stack.
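
As a rough sketch of the idea, here is a minimal stack machine in Python. The opcode names `PUSH` and `ADD` and the tuple encoding are assumptions made for this example, not the instruction set of the JVM or the CLR. Note how the handler for `ADD` performs the two pops, the addition, and the push described above, and how the `ADD` instruction itself names no operands.

```python
# Minimal stack machine sketch. Opcodes and encoding are hypothetical.
def run_stack(program):
    stack = []
    for opcode, *args in program:
        if opcode == "PUSH":
            stack.append(args[0])       # operand goes on top of the stack
        elif opcode == "ADD":
            right = stack.pop()         # pop the top operand
            left = stack.pop()          # pop the next operand
            stack.append(left + right)  # push the result back
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack


stack_program = [
    ("PUSH", 20),
    ("PUSH", 7),
    ("ADD",),     # no operand addresses: they are implied by the stack
]

print(run_stack(stack_program))
```

In this encoding the two numbers first have to be pushed by separate instructions, so the whole addition takes three instructions even though `ADD` carries no operands of its own.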

Register Based Virtual Machines

In the register based implementation of a virtual machine, the data structure where the operands are stored is modeled on the registers of a CPU. There are no PUSH or POP operations here, but the instructions need to contain the addresses (the registers) of the operands. That is, the operands for the instructions are explicitly addressed in the instruction, unlike the stack based model where we had a stack pointer to point to the operand. For example, if an addition operation is to be carried out in a register based virtual machine, the instruction would more or less be as follows:


  1. ADD R1, R2, R3    # Add contents of R1 and R2, store result in R3

As mentioned earlier, there are no POP or PUSH operations, so the instruction for adding is just one line. But unlike the stack, we need to explicitly mention the addresses of the operands as R1, R2, and R3. The advantage here is that the overhead of pushing to and popping from a stack is avoided, so a register based VM typically dispatches fewer instructions for the same work and spends less time in the instruction dispatch loop.
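
A comparable sketch for the register model might look like this. The opcodes `LOAD` and `ADD`, the operand order (two sources, then the destination, as in the `ADD R1, R2, R3` example), and the register count are assumptions for illustration only.

```python
# Minimal register machine sketch. Opcodes and encoding are hypothetical.
def run_register(program, num_registers=4):
    regs = [0] * num_registers
    for opcode, *args in program:
        if opcode == "LOAD":            # LOAD dst, constant
            dst, value = args
            regs[dst] = value
        elif opcode == "ADD":           # ADD src1, src2, dst
            src1, src2, dst = args
            regs[dst] = regs[src1] + regs[src2]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return regs


register_program = [
    ("LOAD", 1, 20),    # R1 = 20
    ("LOAD", 2, 7),     # R2 = 7
    ("ADD", 1, 2, 3),   # R3 = R1 + R2
]

print(run_register(register_program))
```

The addition itself is a single instruction that names all three registers. The values still have to reach R1 and R2 somehow (here through `LOAD`), but once they are in registers they can be reused without being pushed again.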

Another advantage of the register based model is that it allows for some optimizations that are hard to express in the stack based approach. One such instance is common sub expressions in the code: the register model can calculate the sub expression once and keep the result in a register for future use when it comes up again, which avoids the cost of recalculating it.
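
To make the common sub expression case concrete, consider computing x = (a + b) * c and y = (a + b) * d. The sketch below uses a hypothetical three address encoding (opcode, two source registers, destination register) and simply counts how many instructions are executed with and without reusing the register that holds a + b.

```python
# Common sub expression reuse in a register machine. Encoding is hypothetical.
def run(program, regs):
    executed = 0
    for opcode, src1, src2, dst in program:
        if opcode == "ADD":
            regs[dst] = regs[src1] + regs[src2]
        elif opcode == "MUL":
            regs[dst] = regs[src1] * regs[src2]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        executed += 1
    return executed


# Registers 0..3 hold a, b, c, d; register 4 is a temporary; 5 and 6 hold x and y.
initial = [2, 3, 4, 5, 0, 0, 0]

naive = [
    ("ADD", 0, 1, 4),   # t = a + b
    ("MUL", 4, 2, 5),   # x = t * c
    ("ADD", 0, 1, 4),   # t = a + b, computed a second time
    ("MUL", 4, 3, 6),   # y = t * d
]

reused = [
    ("ADD", 0, 1, 4),   # t = a + b, computed once
    ("MUL", 4, 2, 5),   # x = t * c
    ("MUL", 4, 3, 6),   # y = t * d, reusing t from register 4
]

naive_regs, reused_regs = list(initial), list(initial)
naive_count = run(naive, naive_regs)
reused_count = run(reused, reused_regs)
print(naive_count, reused_count, reused_regs[5], reused_regs[6])
```

Both programs leave the same values in the x and y registers, but the second one executes one instruction fewer because the intermediate result stays addressable in register 4.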

The problem with a register based model is that the average register instruction is larger than the average stack instruction, as we need to specify the operand addresses explicitly. Whereas the instructions for a stack machine are short because operands are located implicitly through the stack pointer, the corresponding register machine instructions need to contain operand locations, which results in larger register code compared to stack code.

A great blog article I came across (at this link) contains a simple, explanatory C implementation of a register based virtual machine. If implementing virtual machines and interpreters is your main interest, the book by ANTLR creator Terence Parr titled ‘Language Implementation Patterns: Create Your Own Domain-Specific and General Programming Languages’ might come in very handy.

The DALVIK virtual machine

The Dalvik virtual machine was implemented by Google for the Android OS, and executes the compiled code of Java applications running on Android devices. It is a process virtual machine: every Android application process gets its own Dalvik VM instance, running on top of the underlying Linux kernel. This reduces the chances of multi-application failure if one Dalvik VM crashes. Dalvik implements the register machine model and, unlike standard Java bytecode (which uses 8 bit stack instructions on the stack based JVM), uses an instruction set built from 16 bit units. Register operands are commonly encoded in the instructions as 4 bit fields.
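
To show how a 16 bit unit can carry an opcode and two 4 bit register fields, here is a small Python sketch. The layout (opcode in the low 8 bits, then the first register, then the second register in the top 4 bits) is modeled on the two register instruction form as I understand it from the Dalvik bytecode documentation; treat the exact bit positions and the opcode value used below as assumptions to check against that documentation. The function names are made up for this example.

```python
# Packing an opcode and two 4 bit register numbers into one 16 bit unit.
# Bit layout is an assumption: [reg_b: 4 bits][reg_a: 4 bits][opcode: 8 bits]
def encode_two_reg(opcode, reg_a, reg_b):
    if not (0 <= opcode <= 0xFF):
        raise ValueError("opcode must fit in 8 bits")
    if not (0 <= reg_a <= 0xF and 0 <= reg_b <= 0xF):
        raise ValueError("register numbers must fit in 4 bits")
    return (reg_b << 12) | (reg_a << 8) | opcode


def decode_two_reg(code_unit):
    opcode = code_unit & 0xFF          # low 8 bits
    reg_a = (code_unit >> 8) & 0xF     # next 4 bits
    reg_b = (code_unit >> 12) & 0xF    # top 4 bits
    return opcode, reg_a, reg_b


unit = encode_two_reg(0x01, 1, 2)      # 0x01 is a placeholder opcode value
print(hex(unit), decode_two_reg(unit))
```

A 4 bit field can only name registers 0 to 15, which is why an instruction set of this kind also needs wider instruction forms when more registers have to be addressed.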

If we want to dive a bit deeper into the internals of how each process gets an instance of the Dalvik VM, we have to go back to the beginning… back to where the Linux kernel of the Android OS boots up:


When the system boots up, the boot loader loads the kernel into memory and initializes system parameters. Soon after this,

  • The kernel runs the Init program, which is the parent process for all processes in the system.
  • The Init program starts system daemons and the very important ‘Zygote’ service.
  • The Zygote process creates a Dalvik instance which will be the parent Dalvik process for all Dalvik VM instances in the system.
  • The Zygote process also opens a socket and listens on it for incoming requests.
  • When a request for a new Dalvik VM instance is received, the Zygote process forks itself, and the resulting child process, which already contains an initialized Dalvik VM, becomes the process of the requesting application.
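
The fork step in the list above can be imitated with a few lines of Python on a Unix-like system. This is only an analogy, not Android code: the function name `zygote_fork` and the list standing in for the already initialized VM state are hypothetical, and the socket based request handling is left out. The point is that the child starts with a copy of everything the parent had already set up.

```python
import os


# Analogy for the Zygote fork: the child inherits state prepared by the parent.
def zygote_fork(preloaded):
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: `preloaded` is already in memory, nothing is reloaded.
        try:
            os.close(read_fd)
            message = f"child sees {len(preloaded)} preloaded items"
            os.write(write_fd, message.encode())
            os.close(write_fd)
        finally:
            os._exit(0)                 # leave without running parent code
    # Parent process: read the child's report, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        data = reader.read()
    os.waitpid(pid, 0)
    return data.decode()


# Hypothetical stand-in for the state the parent VM has already initialized.
preloaded_state = ["core library A", "core library B", "framework C"]
print(zygote_fork(preloaded_state))
```

Because the expensive initialization happens once in the parent, each new process only pays for the fork itself, which is the motivation for having a single parent Dalvik process.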

This is, in essence, how the Dalvik virtual machine is created and used in the Android system.

Coming back to the topic of virtual machines, Dalvik differs from the Java virtual machine in that it executes Dalvik bytecode and not traditional Java bytecode. An intermediary step between the Java compiler and the Dalvik VM converts the Java bytecode to Dalvik bytecode; this step is performed by the DEX compiler. The difference between the JVM and Dalvik is depicted in the following diagram (Click here for image source):


The DEX compiler converts the Java .class files into a .dex file, which is smaller and better optimized for the Dalvik VM.

In Ending…

There is no clear consensus as to whether stack based virtual machines are better than register based implementations, or vice versa. This is still a subject of ongoing debate, and an interesting research area. There is an interesting research paper whose authors re-implemented the traditional JVM as a register based VM and recorded some significant performance gains. Hopefully, I have shown the reader the difference between stack and register based virtual machines, and also explained the Dalvik VM in some detail. Please feel free to provide your feedback or any questions you have regarding this article.


56 responses

  1. Hariraju

    Awesome Article………really thank u a lot to share this wonderful article..

    11/09/2012 at 4:59 PM

  2. This is a good article.
    Good for everyone who like to find info about Dalvik VM

    19/09/2012 at 4:16 PM

  3. Nafz

    Very good article..
    Really Helped for my seminar

    25/09/2012 at 12:06 PM

  4. Chucherm

    Very good article, just what I was looking for

    23/10/2012 at 9:33 AM

  5. Like Einstein said “If u cant explain it simply, u didn’t understand it well enough”. Great article

    23/11/2012 at 10:07 AM

    • Thanks 🙂 Indeed, Einstein said it best, and practiced it as well. Sometimes one can wonder how the theory of relativity can be understood so simply, while the world of software is made so complex 😉

      23/11/2012 at 8:13 PM

  6. Very good and detailed info. At a glance, the pictures narrate the whole thing.

    18/12/2012 at 4:57 PM

    • Thanks for the comment 🙂 am glad you found the information useful!

      21/12/2012 at 6:14 PM

  7. Thank you Mark. Was a great read.

    19/01/2013 at 7:47 PM

    • Am glad you liked the post Amit 🙂 Thank you for reading.

      19/01/2013 at 9:23 PM

  8. Thank you for a nice article.
    One mistake: dex compiler converts .class file(s) to .dex file, not the .java file(s).

    20/01/2013 at 2:26 AM

    • Thanks Zenth for the observation… 🙂 Have corrected the specific point in the article.

      21/01/2013 at 9:34 PM

  9. Tom

    Thank you very much..
    Very nice article, and you have cleared up my wrong ideas about VMs.

    01/03/2013 at 5:15 PM

    • Thanks Tom… am glad you found the article interesting and useful.

      01/03/2013 at 5:20 PM

  10. Sandro

    Register-based VMs are well known to be superior in throughput. The paper you linked to was pretty definitive on that issue. Stack VMs are only slightly better in program size, and for simplicity of understanding.

    18/03/2013 at 12:39 AM

    • I rather think that the current VM designers make the mistake of using only a single stack system.

      When I was young I used Forth which had a VM with two stacks. And it was blazingly fast.

      (If you read my posting below: the program counter is a register — one register + one stack is Turing complete as well)

      30/04/2013 at 7:48 PM

  11. Excellent article !!! thank you very much.

    18/03/2013 at 1:44 PM

  12. Brijendra

    Fabulous article……….

    23/04/2013 at 5:37 PM

  13. Interesting. But you overlooked something: a true stack based CPU (virtual or hardware) needs at least TWO stacks, just as a register based CPU needs at least TWO registers (program counter, stack counter and one data register would count as three).

    Without them they cannot be Turing complete.

    An example would be the VMs for the Forth programming language, where separate data and return stacks are used.

    And once you have two stacks, the disadvantages you mentioned disappear.


    Load R1, 20
    Load R2, 7
    Add R1, R2, R3
    Store R3, result

    Are four statements as well.

    30/04/2013 at 7:17 PM

    • minirop

      the “add” opcode of a stack-based VM has to pop both arguments, do the addition and push the result (4 operations) on the stack whilst the register-based doesn’t, and does the addition directly: R3 = R1 + R2

      05/10/2016 at 5:04 AM

      • And how does the data get into the register? And how does the result get out of the register? A register based CPU has a very limited number of registers. Currently somewhere between 8 and 16, so you frequently have to move data in and out.

        A stack, like the one used by the FORTH virtual machine, can grow to the size of main memory. Therefore you can just leave the results on the stack for a far longer time to be used as input for the next operation.

        Most notably, on a stack based CPU you can keep intermediate values on the stack while making a function call. You can’t do that on a register based CPU. Either the caller or the callee needs to store and restore the registers somewhere. Usually on the stack. Because most (but not all) register based CPUs have a stack as well.

        05/10/2016 at 5:03 PM

  14. Abhi

    hi..Great article..
    But, speaking about the bytecode which either machine executes, how are local variables represented? Are they compiled into addresses, as actual machines need, or are they provided as an index into registers? For example, Parrot provides an arbitrary number of registers, which is fixed at compile time per subroutine.

    08/05/2013 at 1:46 PM

  15. Its good article…

    15/06/2013 at 6:11 PM

  16. suresh

    Very good informative article …

    03/07/2013 at 12:44 AM

  17. alok

    Thanx to provide such a good information..:)

    29/07/2013 at 3:38 AM

  18. Pingback: Стековая и регистровая архитектура виртуальной машины и Dalvik VM | MyBlog

  21. Thank you for a nice article.
    I translated it into Russian

    03/11/2013 at 7:51 PM

  22. Eng Sultan

    Thanks for your great effort 😀

    18/12/2013 at 9:01 PM

  23. Baseer

    smoothly swallowed a mountain…! very well explained!

    04/05/2014 at 8:27 PM

  24. Pingback: Learning Android by Marko Gargenta and Masumi Nakamura; O’Reilly Media | Dev @ Work

  25. This sounds incorrect. Pop would involve moving data from the stack to a CPU register. The time for a pop should be equal to the time for copying data by memory address. Further, I suspect that further optimizations can be made using a stack based VM rather than a register based VM, due to the fact that the data the operation is working on resides next to each other when using a stack. Finally, the stack is usually stored in the L1, L2 or L3 cache. There are no such guarantees with arbitrary addresses as operands.

    But I may be incorrect. But it seems weird that Microsoft and Sun chose “inferior” (according to this post it sounds obviously inferior) technology for something so central to their business.

    I think the reason Android uses a register based VM may be that it allowed them to reuse bigger parts of a compiler toolchain.

    Also, Android is noticeably more sluggish on similar hardware than Windows Phones running .NET.

    25/06/2014 at 5:40 PM

  26. As someone noticed above, you forgot about loading values into registers (e.g. mov), which makes a register based VM need the same number of steps to add two numbers.

    26/06/2014 at 4:57 AM

  27. Pingback: Sul Java che batteva il C (compiler strikes back again) | Ok, panico

  28. This is a very complex topic.. But you explained it very simply and made it easy to understand.. Thank you…

    16/12/2015 at 3:35 PM

  29. Sam

    Why do I see an exactly identical article here? That one is later than this post.

    19/05/2016 at 9:59 AM

    • Hi Sam, what you are referring to is the same article, but (automatically) exposed to codeproject via a feed from my blog 🙂

      03/06/2016 at 1:25 AM

  30. Excellent writing. It helps a lot

    p/s just a very small typo in the word “deamons”, for the completeness!

    20/06/2016 at 10:24 PM

  31. Andrew Runner

    Really good article. And still great after 4 years!

    07/08/2016 at 8:10 PM

    • Thanks Andrew… Glad you found it interesting and still relevant 🙂

      07/08/2016 at 10:57 PM

  32. Pingback: Stack based VM vs. Register based VM | Beyond the Geek

  33. Pingback: Dalvik Virtual Machine | Beyond the Geek

  34. Akbar

    Great Article..All clear..

    21/08/2016 at 11:38 PM

  35. the best

    thank very much…

    19/01/2017 at 8:11 PM

  36. Abhijit

    Very good explanation Mark. Thank you very much.

    19/03/2017 at 4:12 PM

  37. Pingback: JDK & JVM – ideveloperWeb

  38. Roh

    Mark you have explained it in a lay man language with appropriate examples. This was really helpful. Thank you!

    24/05/2017 at 8:23 PM

  39. Ashokumar

    Very cool, thanks.

    Can I say that C and C++ code running on Windows is a system based virtual machine, whereas the JVM is process based?

    20/06/2017 at 11:17 PM

  40. Faisal Nadeem

    great work

    27/07/2017 at 9:46 PM

  41. mohammad

    thanks alot
    very helpful

    07/10/2017 at 2:05 PM
