How compilers work
Posted: July 30th, 2012, 5:06 pm
A couple of days ago my friend asked me this question. Basically, he couldn't understand how it happens that English words, commonly used in high-level programming languages, turn into machine code. I got a little interested in the topic and asked a friend of mine who codes, and what I received is this message:
"From a philosophical standpoint, the first thing to be aware of is that everything is Data. Both data (an MP3 file, for example) and a program (the Firefox browser) are Data, stored in the computer's Memory (not to be confused with storage, like a hard drive) and indistinguishable from each other. This is called the Von Neumann architecture, after the genius who came up with a lot of really great stuff. This is also one reason why viruses and other hacks are possible and why programs sometimes crash. The computer is fooled into executing data as a program when it reads, accidentally or on purpose, from the wrong place in Memory.
In essence, what a compiler does is translate a piece of data that somewhat resembles English (a text file with source code in it) into instructions for a computer to do things. A while ago there was a nice Turing machine on Google's front page, which nicely illustrates some aspects of this. The paper is Data and the Computer reads the next icon and executes whatever is there and it has. However, here the memory and code are isolated from each other.
So, taking the Google Turing machine example, the compiler converts English sentences like "Read the next bit. If it's 0 go down" into symbols for the computer. Why this conversion needs to happen is beyond me. Probably because the CPU ultimately works on 0s and 1s, and the compiler works out a good order for those 0s and 1s so that the program as written (but not necessarily as intended) by the author runs well.
The closest programming language to the "metal" is Assembly language, which is literally just instructions to the CPU to read a certain part of memory, jump to a certain point in memory, and write to a certain point in memory. In Assembly language, each instruction from the programmer translates essentially 1:1 to an operating instruction for the CPU.
In more traditional programming languages, I'm quite sure what the compiler does is turn semi-English sentences like (if x>0 then print "hello world") into operating instructions for the CPU. How it happens is way beyond my understanding."
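To make the message above a bit more concrete, here is a minimal sketch in Python of the three steps it describes. Everything in it is an assumption made up for illustration: the instruction names (LOAD, JLE, PRINT, HALT), their numbers, and the three-cells-per-instruction layout belong to an imaginary CPU, not to any real one, and the "compiler" understands only the single sentence from the example. It is not how a real compiler is built, just the same idea in miniature: text becomes assembly, assembly becomes numbers 1:1, and the numbers sit in the same memory as the data.

```python
import re

# Hypothetical instruction set for an imaginary CPU; the numbers are arbitrary.
OPCODES = {"LOAD": 1, "JLE": 2, "PRINT": 3, "HALT": 4}
WIDTH = 3  # every instruction occupies three memory cells: opcode, a, b

def compile_line(source):
    """Compiler: turn one semi-English line into toy assembly plus its text."""
    m = re.fullmatch(r'if x>(\d+) then print "([^"]*)"', source.strip())
    if m is None:
        raise SyntaxError("unsupported statement: " + source)
    limit, text = int(m.group(1)), m.group(2)
    x_addr = 4 * WIDTH          # x lives right after the four instructions
    text_addr = x_addr + 1      # the message follows x
    assembly = [
        ("LOAD", x_addr, 0),        # register = memory[x_addr]
        ("JLE", limit, 3 * WIDTH),  # if register <= limit, jump to HALT
        ("PRINT", text_addr, 0),    # print the text stored at text_addr
        ("HALT", 0, 0),
    ]
    return assembly, text

def assemble(assembly, x, text):
    """Assembler: each mnemonic maps 1:1 to a number; data goes after the code."""
    memory = []
    for name, a, b in assembly:
        memory += [OPCODES[name], a, b]
    memory.append(x)                          # the variable x
    memory += [ord(ch) for ch in text] + [0]  # the message, ending with 0
    return memory

def run(memory):
    """CPU: fetch numbers from memory and act on them; returns what was printed."""
    pc, register, output = 0, 0, []
    while True:
        op, a, b = memory[pc:pc + WIDTH]
        pc += WIDTH
        if op == OPCODES["LOAD"]:
            register = memory[a]
        elif op == OPCODES["JLE"]:
            if register <= a:
                pc = b
        elif op == OPCODES["PRINT"]:
            while memory[a] != 0:
                output.append(chr(memory[a]))
                a += 1
        elif op == OPCODES["HALT"]:
            return "".join(output)
        else:
            raise ValueError("not an instruction: " + str(op))

assembly, text = compile_line('if x>0 then print "hello world"')
print(assemble(assembly, 5, text)[:12])  # [1, 12, 0, 2, 0, 9, 3, 13, 0, 4, 0, 0]
print(run(assemble(assembly, 5, text)))  # hello world
print(repr(run(assemble(assembly, -1, text))))  # '' (x is not > 0, nothing printed)
```

Note that the program, the variable x and the letters of "hello world" all end up in one list of numbers. Nothing marks which cells are code and which are data, which is the Von Neumann point from the start of the message.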
As I said above, I got a little interested in the topic and found myself immersed in the magical world of kernels, hardware abstraction layers and similar stuff, eventually going as far as reading how a teleprinter works. I didn't understand half of what I read, regardless of the language the article was in, but I noticed that even experts don't seem to fully understand the nature of it all. I asked another friend of mine, who also codes, and his explanation was a bit clearer.
What he said is that the first ever compiler was simply coded in machine language (according to Wikipedia, the first ever compiler was the A-0 System). Whether later compilers were also coded in machine language or used the technology of the first compiler I don't know, but I felt interested enough to gather more information from available sources.
Several Wikipedia entries may be useful if you become interested in the topic:
FLOW-MATIC, History of compiler construction, Abstraction layer and many more.