Day 1 Part 2: Introductory Intel x86: Architecture, Assembly, Applications

So we’re going to go over this Example 1. First, the key: anything in blue is the instruction which we just executed, so when you see an instruction listed in blue in the next slides, that means I just executed it. If you see something in red, that means that as a consequence of that instruction we just modified some value. The stuff in green is just some arbitrary starting value which I’ve chosen to fill in registers and so on. I think I gave them roughly realistic values, based on what I saw when I did this in Visual Studio.

To explain what we’re going to be seeing here, take this 4012E8. We’re going to start right with this push in main. We are not pretending that main is the very first thing to run: we’re saying there is something out of scope, some initialization code which gets run before main, and that initialization code has just called main. When that code executes the instruction call main, the call puts a saved return address onto the stack. So this initial value which I’m pointing out here is just the address after the call in whatever initialization code ran before main. It’s outside our scope; I’m just trying to point out that this is what will be at the top of the stack at the very first instruction of main: whoever called main, their saved return address is going to be there on the stack. All right, so now we’re going to start main and execute our first instruction.

Main was just called, and we’re going to execute the instruction push ebp. What is that going to do? By initialization, we said that we had the value 12FFB8: that was ebp when this code started executing, and it was pointing at the frame of the initialization code which called main. So that is some frame which is out of context; on our picture it happened before our code was called. Once main is called, it pushes ebp as the first thing. This is a convention: the compiler automatically generates these first two instructions, push ebp and mov ebp, esp (this reminds me of something I’ll get to at the next instruction). We had 12FFB8 sitting in the ebp register, we did push ebp, and therefore we have now put 12FFB8 onto the stack.

We also see the modification to esp right here. When the push instruction executes, esp always needs to keep pointing at whatever is on the top of the stack, and the thing on the top of the stack now is this 12FFB8 that we just pushed, so esp gets four subtracted from it. Going back to the previous slide, we said that by initialization 12FF6C was the stack pointer, so it was pointing at this green stuff, which was the top of our stack right before we executed push ebp. When we execute push ebp, four is subtracted from it, because we just put something on the stack and the stack grows towards lower addresses: we subtract when we add something to the stack. That was our first instruction. The only things that got modified are that something got put on the stack, and the stack pointer got moved down so that it stays pointing at the top of the stack.

Moving on to the next instruction. The thing which I neglected to put here, which I put in my notes before and still haven’t fixed, is that in the Intel syntax which we’re going to be looking at in the beginning part of the class, with the Windows stuff, the destination is the thing on the left. Over at the whiteboard: for any given instruction, this will be the destination on this side and this will be the source over here. For instructions which take two parameters, like if this was a mov instruction, this would be dest and this would be source, and they could be, say, eax and ebx. What that is saying is: take whatever is in ebx and put it into eax. That’s what we’re seeing right here with mov ebp, esp: take esp and put it into ebp. That’s all we’re doing.

The reason I call this out, and why I should have called it out on the slides, is that later on in the class we get to AT&T syntax, which is used on systems like Linux, and it’s the exact opposite: you switch the operands around. That will get a bit confusing then. For now, it’s just put the thing on the right into the thing on the left. I think of this like algebraic equations: you’re used to writing something like y = x^2 + c, and the destination is over here on the left. When we switch it around later it will be the opposite.

The other thing I want to say about this instruction form concerns something which takes two operands. We haven’t covered add yet, but I think you’ll have a good idea of what add does. If we did something like add eax, ebx, it’s still destination and source, but the destination is also used as one of the operands. So an add instruction like this does eax = eax + ebx: you do this plus this and still put the result into the destination on the left side. That’s all I want to say about instructions which have two operands: the destination is on the left.

Now we get to this mov ebp, esp instruction. For this one you take whatever was in esp, 12FF68, and you put it into ebp, and that’s why you have a modified value right here, 12FF68. This is basically the instruction which sets up the new stack frame for main.
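As a rough sketch of the two prologue instructions just described, the following C models the registers and a small window of stack memory, using the starting values from the slides (ebp = 12FFB8, esp = 12FF6C, saved return address 4012E8). The `Machine` struct, the helper names and the size of the memory window are assumptions made for illustration; they are not part of the class material.

```c
#include <assert.h>
#include <stdint.h>

/* Program being traced, as described in the talk:
 *   int sub(void)  { return 0xBEEF; }
 *   int main(void) { sub(); return 0xF00D; }
 */

/* Assumed window of stack memory for this sketch: 0x12FF30..0x12FF6F. */
#define STACK_TOP  0x12FF70u
#define STACK_SIZE 64u

typedef struct {
    uint32_t eax, ebp, esp;
    uint32_t mem[STACK_SIZE / 4];   /* one entry per dword, lowest address first */
} Machine;

/* Translate a stack address into a pointer to the modelled dword. */
uint32_t *slot(Machine *m, uint32_t addr) {
    assert(addr % 4 == 0);
    assert(addr >= STACK_TOP - STACK_SIZE && addr < STACK_TOP);
    return &m->mem[(addr - (STACK_TOP - STACK_SIZE)) / 4];
}

/* push value: esp moves down by 4, then the value is stored at [esp]. */
void push(Machine *m, uint32_t value) {
    m->esp -= 4;
    *slot(m, m->esp) = value;
}

/* Intel syntax "mov dst, src": copy src into dst. */
void mov(uint32_t *dst, uint32_t src) { *dst = src; }

/* Intel syntax "add dst, src": dst = dst + src. */
void add(uint32_t *dst, uint32_t src) { *dst += src; }

/* State at the first instruction of main, with the values from the slides. */
Machine machine_at_main_entry(void) {
    Machine m = {0};
    m.ebp = 0x12FFB8;                 /* frame of whoever called main */
    m.esp = 0x12FF6C;                 /* points at the saved return address */
    *slot(&m, 0x12FF6C) = 0x4012E8;   /* pushed by the out-of-scope call to main */
    return m;
}

/* The two compiler-generated prologue instructions of main. */
Machine run_main_prologue(void) {
    Machine m = machine_at_main_entry();
    push(&m, m.ebp);       /* push ebp     */
    mov(&m.ebp, m.esp);    /* mov ebp, esp */
    return m;
}
```

After `run_main_prologue()`, both esp and ebp hold 12FF68 and the dword at 12FF68 holds the old frame pointer 12FFB8, which matches the red values on the slides.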

It’s saying: right now I want ebp to point at 12FF68, and therefore everything from 12FF68 and below I’m going to consider part of my stack frame.

The next instruction is our call instruction. It says call sub, and sub is this function up here at the top, so what it’s really saying is call the address 401000, which is the address of the first instruction of sub. It’s saying: I want you to change the instruction pointer and go to the subroutine. What changes as a consequence? I said that call, as a sort of implicit thing in the background, pushes the address of the next instruction. What is the address of the next instruction? It’s 401018, the instruction immediately after the call, and that’s why we see this 401018 sitting here on the stack.

Question: ‘Are these addresses determined at run time or compile time?’

For those on the phone, his question was whether these addresses are determined at compile time or run time. They are run time, and generally speaking they are created by convention on the operating system. Actually, let me put that a different way. They are set at compile time in some sense, because at compile time the linker puts a preferred address into the binary (this is something we cover in Life of Binaries). It says: here is my preferred virtual address, load all of my code starting at this address. That will be somewhere inside the headers of the binary, and all of my code will be relative to it. In this case maybe the binary said start me at 400000, which is typically the convention for Windows executables, and that implies this subroutine is actually hex 1000 into where the image starts. In that sense it’s set at compile time.

In another sense, when you deal with things like libraries, DLLs and so on, those can be moved around in memory. You may also be aware of address space layout randomization in newer operating systems. What those do is say: okay, I know you want to be loaded at that address, but you don’t necessarily get to be, so I’m going to move you to the next available place. In that sense the addresses are set at run time. An executable may have a preferred address where it wants to start, but the OS loader can say, I’m just going to put you somewhere else and you’re going to have to fix yourself up, and these addresses can be completely different.

It’s also run time in the sense that operating systems have different conventions for where they start things. When we get to Linux you’ll see things more like 08-something, and that will be the default for executables, as opposed to 004-something. In some cases these will be predictable, but ideally you should never hard-code addresses like this. You would never want to hard-code something like 401018 into some other code which is trying to jump there, because these things can move around.
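The relationship between a preferred load address and the addresses seen in the debugger can be sketched in a few lines of C. This is only an illustration of the arithmetic: the function names are hypothetical, and the alternative base address used in the usage note is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of an address from the start of the loaded image. */
uint32_t offset_in_image(uint32_t address, uint32_t image_base) {
    assert(address >= image_base);
    return address - image_base;
}

/* Where the same code would land if the loader placed the image at
 * actual_base instead of the preferred_base recorded in its headers. */
uint32_t rebase(uint32_t address, uint32_t preferred_base, uint32_t actual_base) {
    return actual_base + offset_in_image(address, preferred_base);
}
```

With a preferred base of 400000, `offset_in_image(0x401000, 0x400000)` gives hex 1000, the offset of sub mentioned above. If the image were instead loaded at a hypothetical base of 1230000, `rebase(0x401018, 0x400000, 0x1230000)` would give 1231018, which is why a hard-coded 401018 cannot be relied on.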

With executables, at least in Windows XP for instance, the executable is always guaranteed to get its preferred address, but the DLLs are not guaranteed, so those can move all over the place. I don’t believe that’s still the case when we start getting into Vista, where it’s basically all randomization. For things like Linux, if they don’t have address space layout randomization, then you could potentially say that it is predictable for the executable but not for the libraries, so in that sense you could say it’s deterministic.

Anyway, we just executed that call instruction, and what it does is change the instruction pointer. Thus far we’ve just been going blue, move down, blue, move down. The point of the call instruction is that we don’t just go on to that next mov instruction; we go to the sub subroutine at location 401000, so we go to this push instruction next.

Going to the next slide, here we are: we landed at 401000, and when that instruction executes it does a new push ebp. Our ebp was 12FF68, pointing at the stack frame for main, and now we save a copy of that frame pointer onto the stack. That’s why we see 12FF68 right here in red. Simultaneously the stack pointer moved down to match where that data got put: the data got put at 12FF60, and esp got decremented by four. That’s why esp is red, and ebp is copied onto the stack here, which is why the stack location is red.

Now we move down to the next instruction. Again this is just the standard function prologue: it takes esp and moves it into ebp. esp was 12FF60, we moved it into the ebp register, so now ebp is also 12FF60. That is basically sub setting up its own stack frame, which starts at 12FF60.

The next slide talks about stack frames, and this is the extreme minimalist case. Sub has a stack frame that starts right here, and it could keep adding stuff: if it had local variables they would go below this, and if it saved callee-save registers or was going to call some other function, that would go below this too. But for all intents and purposes, sub’s stack frame is only ever going to hold the saved copy of the previous frame pointer, in order to maintain that linked list that we showed before. At this point in the code, right after we executed mov ebp, esp, sub’s stack frame is right here, because ebp is currently pointing at 12FF60. Main’s frame is right here at 12FF68, and we can see that if we look at our current ebp: we take the value out of memory where it points, and that’s a pointer to the previous stack frame, 12FF68. That one points to the previous one as well, 12FFB8, which is somewhere outside of our picture. That’s the sort of linked list we’re dealing with: this one points to that one, and that one points somewhere outside of our picture.

Above each of those stack frames we see the saved instruction pointer that was pushed by a call instruction. Here is the one right above sub’s frame, 401018: a saved address which is immediately after some call instruction. Right above main’s stack frame is this 4012E8, which we’re saying, by assumption, was the address after whoever called main in the first place.

So this is kind of a stack frame time-out. Do we have any questions thus far on what we’ve done to get into the subroutine, why a stack frame looks like this, what the components of the stack frame are, anything like that? Anyone on the phone? Any other questions about stack frames as of where we are right now? (I’ll try to move my mouse a little slower, since I can see it doesn’t track very fast.)

We’re going to continue on, because now we get to this mov of hex BEEF into eax. If you remember the original C code (here it is once again), the only thing the subroutine does is return hex BEEF. If we say that by convention the eax register is where you place your return values, then we would expect the code to take hex BEEF and stick it into eax. Going forward again, that’s exactly what we see: hex BEEF is moved into eax, because by convention that’s where you stick your return value. The only thing that changed here is that eax up here was set to hex BEEF as a thirty-two-bit value. Nothing changed on the stack. This was the only instruction of actual functionality in this function. All the code above it in sub is compiler-generated, automatic set-up of a stack frame; all the code below it tears down the stack frame and returns to the caller. We did our single line of actual work.

Yes, question: ‘Is it possible to return more than one value?’

By convention you put the return value only in this one register. This really goes back to C: in C++ you may be able to return an object, and that’s kind of like returning multiple things, but in your C program you only return one thing. You define that this function returns an int, or this function returns a boolean, or whatever. So it’s more a limitation of C, which then exhibits itself here in the assembly. When you want to return multiple values, you typically do that by passing by reference: you pass in a pointer, the code within modifies the data that the pointer points to, and the function which called you and passed in that pointer can then see the modified data.

Now we’ve done our one line of actual work and we’re going to tear down our stack frame, so we do pop ebp. The stack pointer was pointing at 12FF60, and we popped off the value stored there. When we do a pop we add four to the stack pointer, to get rid of that value on the stack, so now the stack pointer points at 12FF64. The top of the stack is now this value right here, and the slot where the popped value was I’m now marking as undefined. The pop ebp instruction says take whatever is on the top of the stack and put it into the ebp register, and that’s why ebp is now 12FF68: that is what was there on the previous slide, so when we pop it into the register, that is where it goes.
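The pass-by-reference idiom mentioned in the answer about returning more than one value can be sketched in C as follows. The function name and its parameters are hypothetical; the point is only that the single return value carries a status, while the two results travel through pointers supplied by the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper that "returns" two results.
 * The int return value is a success flag; the quotient and remainder
 * are written through the pointers the caller passed in. */
int divide_with_remainder(int numerator, int denominator,
                          int *quotient, int *remainder) {
    /* Passing NULL outputs is treated as a programming error. */
    assert(quotient != NULL && remainder != NULL);

    if (denominator == 0) {
        return 0;   /* failure: the caller's variables are left untouched */
    }
    *quotient  = numerator / denominator;
    *remainder = numerator % denominator;
    return 1;       /* success */
}
```

A caller would declare two ints, pass their addresses, and read them after the call: the callee modified the caller’s data, so both results are visible even though only one value came back as the return value.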

That takes it off the stack. Next we’re going to do the return instruction. There are two forms of the return instruction: either a plain ret, or a ret with some constant after it, and if there’s a constant it means take that much extra off the stack. We’re just looking at a plain ret right now. All a plain ret does is say: whatever is on the top of the stack when this instruction executes, take it off the stack and put it into the instruction pointer, and that’s where we go next. What was on the stack? 401018 was at the top of the stack, with the stack pointer pointing at 12FF64, so that’s where we’re going next. So we just executed the return instruction: which instruction are we going to see next?

(Inaudible answer.) Which mov? We’ve got lots of movs! Okay, yes: mov F00D into eax, because that’s at address 401018, and that’s where we’re going next.

The question was whether there is any difference between ret and pop esp. The difference is that if you do pop esp, you take whatever is at the top of the stack and put it into the esp register. When you do ret, it takes whatever is at the top of the stack and puts it into the eip register. As I said before, we can’t use something like a mov instruction to set eip directly; we can only use these other, implicit methods. The call instruction sets eip and the ret instruction sets eip; those are the ways we can do it. If we use a pop, we only end up setting esp or some other register. When we did the ret instruction it popped the thing off the stack into the eip register, and because it’s implicitly a pop, we again add four to the stack pointer and move it up to 12FF68. That’s why you see that esp is now 12FF68, as expected.

We go down to the mov of hex F00D into eax, and the only thing that changes in this slide is that eax gets set. eax was hex BEEF, and this instruction immediately overwrites it, because our original C code just returns hex F00D. If we wanted the original function to return hex BEEF instead of hex F00D, we could just put nothing after the call to the subroutine; then the eax from sub would implicitly still be set to BEEF when we return from main, since eax is the return register. But in this case we immediately overwrote it, functionally nullifying any effect the subroutine would have had, and we return F00D anyway.

Moving to the next thing, this is again just starting to tear down the stack frame and return to the previous function. With pop ebp we take whatever is at esp. On the previous slide esp pointed at 12FF68, and now we execute pop ebp, which says: whatever is at 12FF68, take it off the stack, put it into the ebp register, and then move the stack pointer up by four so that the value is no longer on the stack. So we see that ebp got changed: it restored the frame pointer of whoever called main, which is somewhere outside of our picture. We also moved the stack pointer up again, so that right before the next return instruction it is again pointing at this assumed return address which was saved by the call to main.

Now we execute the return instruction. Again, ret is like an implicit pop into eip: we take whatever is at the top of the stack, which was 4012E8, put it into eip, and then move the stack pointer up by four. 12FF70 is outside of our picture, but we just moved esp up by four, so now everything here is undefined: we consider everything numerically below the stack pointer to be undefined.

Any questions on this example that we just went through? We’ll be going through one more like this, where we do every instruction one at a time. It will be more complicated, we’ll be passing values and so on. Any questions on the instructions we’ve seen thus far? People on the phone, questions?

So, just some miscellaneous notes about this. The subroutine, like I said, is functionally dead code. It’s not doing anything, because the only thing it does is put hex BEEF into eax, but main always returns hex F00D. Main doesn’t care what the subroutine puts in eax; it always returns its own thing. So if you were to compile this with optimizations, the compiler would just straight-out get rid of all the code for sub. Main would essentially turn into mov eax, F00D followed by ret. In the optimized form it wouldn’t even bother calling the thing; it would say, I don’t even need to call it, it doesn’t do anything important, get rid of it.

In terms of calling conventions there’s no real difference here, because no parameters are passed. There is no way there could be any difference between the cdecl and stdcall calling conventions. You could force this to be one way or the other, but you should still get the same assembly code.

Now we’re actually going to go through this in Visual Studio, so that you get a little bit of experience. We’re going to use Visual Studio Express, which is the free version. The idea is that after this class you can take small snippets of C programs, put them in Visual Studio, compile them, and then walk through them in the debugger to see what sort of assembly is generated, the modifications to the stack and registers, and so on. Let’s do the same in the tool. Your slides have a rough approximation of how to do it, but I’m going to walk through it anyway, so just follow along with what I’m doing on my machine.
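Before repeating the walkthrough in the debugger, the whole Example 1 trace from the slides can be replayed in a small C model. This is a sketch under stated assumptions: the `Machine` struct, the helper names, the `steps` parameter and the size of the memory window are invented for illustration, and eip is only updated by call and ret because the slides give addresses only for those targets (401000, 401018, 4012E8).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed window of stack memory for this sketch: 0x12FF30..0x12FF6F. */
#define STACK_TOP  0x12FF70u
#define STACK_SIZE 64u

typedef struct {
    uint32_t eax, ebp, esp, eip;
    uint32_t mem[STACK_SIZE / 4];   /* one entry per dword, lowest address first */
} Machine;

uint32_t *slot(Machine *m, uint32_t addr) {
    assert(addr % 4 == 0);
    assert(addr >= STACK_TOP - STACK_SIZE && addr < STACK_TOP);
    return &m->mem[(addr - (STACK_TOP - STACK_SIZE)) / 4];
}

/* push: move esp down by 4, then store at [esp]. */
void push(Machine *m, uint32_t value) {
    m->esp -= 4;
    *slot(m, m->esp) = value;
}

/* pop: read [esp], then move esp up by 4. */
uint32_t pop(Machine *m) {
    uint32_t value = *slot(m, m->esp);
    m->esp += 4;
    return value;
}

/* call: implicitly push the address of the next instruction, then jump. */
void call(Machine *m, uint32_t target, uint32_t next_instruction) {
    push(m, next_instruction);
    m->eip = target;
}

/* plain ret: an implicit pop into eip. */
void ret(Machine *m) { m->eip = pop(m); }

/* Execute the first `steps` instructions of the trace (11 in total). */
Machine run_example1(int steps) {
    Machine m = {0};
    m.ebp = 0x12FFB8;                 /* frame of whoever called main */
    m.esp = 0x12FF6C;
    *slot(&m, 0x12FF6C) = 0x4012E8;   /* return address saved by the call to main */

    for (int i = 0; i < steps && i < 11; i++) {
        switch (i) {
        case 0:  push(&m, m.ebp);              break; /* main: push ebp        */
        case 1:  m.ebp = m.esp;                break; /* main: mov ebp, esp    */
        case 2:  call(&m, 0x401000, 0x401018); break; /* main: call sub        */
        case 3:  push(&m, m.ebp);              break; /* sub:  push ebp        */
        case 4:  m.ebp = m.esp;                break; /* sub:  mov ebp, esp    */
        case 5:  m.eax = 0xBEEF;               break; /* sub:  mov eax, 0BEEFh */
        case 6:  m.ebp = pop(&m);              break; /* sub:  pop ebp         */
        case 7:  ret(&m);                      break; /* sub:  ret             */
        case 8:  m.eax = 0xF00D;               break; /* main: mov eax, 0F00Dh */
        case 9:  m.ebp = pop(&m);              break; /* main: pop ebp         */
        case 10: ret(&m);                      break; /* main: ret             */
        }
    }
    return m;
}
```

`run_example1(5)` stops right after sub’s prologue, where the chain of saved frame pointers is visible in memory (12FF60 holds 12FF68, which holds 12FFB8). `run_example1(11)` runs to the end, leaving eax = F00D, ebp = 12FFB8, esp = 12FF70 and eip = 4012E8, as on the last slide.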

On your machine (for the people on the phone, I think you should all have been given some machine in the actual lab that you’ve remote-desktopped into, so there will be something you can use; or you can just install Visual Studio, like I said in the email, and do it yourself on your own machine) there will be a TPL103 intro x86. Does everyone have that? Go into the intro asm code for class, and then open the intro to asm solution file. When you do that, Visual Studio comes up and has to initialize the environment.

I actually don’t have Example 1 in here, so we’re going to create it, just so you can see how to create your own projects when you want to make something (or you can just use the scratch pad). Right-click on the solution and go to Add, New Project. Once you’ve selected New Project, go to General and select Empty Project, and we’re going to call it example one. So: General, Empty Project, example one. Now you’ll get example one up here at the top.

Go ahead and right-click on example one right away and go to Set as StartUp Project. You can have multiple projects in a single solution, and whichever one is in bold is the one that you’re actually compiling at any given time. So set it as the startup project, and now it’s bold. Then right-click on Source Files, go to Add, New Item, click on C++ File, and we’re going to name it example1.c. Calling it .c makes it default to compiling as C instead of C++. It shouldn’t make a difference, but we’ll see. So you’re all adding a new C++ file named example1.c, then hit Add. Now you’ve got example one.

I’m going to cheat and just copy the code in. You’ll have to either open the slides on your desktop and copy it in as well, or I’ll leave this up for a second so that you can type it into your file and save it. (The slides are on your desktop, actually, so you can grab them now or at the break or whenever.)

Now we’re going to change some of the project properties to simplify the assembly which is generated, and bring it back to the kind of assembly that I just showed. The default properties of a project add sanity checks and things like that, and we want to get rid of those so that it’s just the simple assembly that we walked through. Right-click on example one and go to Properties, then expand the C/C++ tab. Under General there’s this Debug Information Format, and we want to change that to Program Database, as opposed to Program Database for Edit and Continue. Edit and Continue makes things a bit more complicated: it’s for when you want to change one line of source code and have it recompiled on the fly and stuck into whatever you’re running. We don’t want Edit and Continue, we just want regular debug information, so set that to Program Database.

Next go down to Code Generation, still under C/C++. Under Enable C++ Exceptions, I don’t think this matters at all, but I change it anyway, so set that to No. It shouldn’t matter; I’m superstitious. Go to Basic Runtime Checks and set that to Default. These are basically sanity checks which look for whether you corrupted your stack, or grabbed data from a variable which has not been initialized; setting it to Default actually turns off both of those run-time checks. Then under Buffer Security Check (this is stack cookies, or stack canaries, if you’re aware of those) set that to No, because it adds code which puts something on the stack and then checks it. So we’ve got to get rid of that as well, but it’s good that it’s on by default.

Okay, I think that’s it for that section. Under Advanced, set Compile As to Compile as C Code. Again, I don’t think that’s going to matter at all for this super simple code. You can also see, right above that, where the default calling convention is set to cdecl. We can override this and say everything is going to be stdcall, or everything is going to be fastcall (which we didn’t actually talk about in this class), so this is another place where you can force the calling convention for functions. That’s it for C/C++.

Go to Linker and then General, and there’s this thing called Enable Incremental Linking. We want to set that to No, because again that’s something which breaks up the code: if we don’t disable it, then when we call the function, the call doesn’t go directly to it; it goes to some jump which then jumps to the function. So just for simplicity’s sake we’re going to turn off Enable Incremental Linking. Then go ahead and hit Apply and hit OK.

Now, when you want to actually compile this code, you can do it a couple of ways. You can right-click on example one, and the first option there is Build; that will just go ahead and compile it, and the output down here at the bottom will say one succeeded. Or, under the Build menu up here, you can choose Build Solution, which builds all the projects, or just build example one. I generally just right-click and hit Build. So that succeeded, and we’ve got the code compiled.

Now we want to start stepping through one assembly instruction at a time, like we were just doing in the slides. First we’re going to set a breakpoint. To do that in Visual Studio, you click on the left-hand side of the line of C code that you want to break at. It creates this little red bubble, which says: when I get to the line int main, stop there. So go ahead and set a breakpoint right next to int main, and then go up to Debug and Start Debugging, or just hit F5. When you start debugging, it pops up a little window in the background; if anything was going to be printed out, it would show up there, but we don’t care about that for this particular code. You get a little arrow in your bubble now, which says I’m stopped right here at main.

This also brings up your debugging interface up here in the top bar. This little play button is Continue, which says just run the code until you hit another breakpoint or until the program exits. This one is Break All, which is like a pause: if you have multiple threads running, it pauses all the threads and says stop wherever you are right now. Stop just stops debugging entirely and exits the program. This is Restart: if you step too far and want to go back to the first breakpoint, it stops the program and debugs it again. This little arrow right here won’t matter for this class, but if you have big code and you’ve been all over the place and don’t know where you are, you press it to get back to wherever you’re currently broken at.

Then these are the three most important things that we need to know; any debugger is going to have some variation on these three capabilities. There’s ‘step into’, which says I want to go to the target of things like call instructions or jump instructions: I want to step into the function which is being called. For instance, if I do ‘step into’ once right here, it brings me to the call to sub, and if I do ‘step into’ again, now I’m in sub. If I restart this and try again, the other option is ‘step over’. That means I don’t want to go into this function or into some jump; I want to execute the entire subroutine and then come back at the next instruction, or the next C statement. So if I do ‘step over’ here, and ‘step over’ again at sub, then instead of the next place I break being in sub, it just went and executed all that code and stopped me after it was done. ‘Step into’ and ‘step over’ are very common; you’ll see those in all debuggers. And then, restarting again:

If i did a ‘step into’ and i stepped into the sub routine for instance then there’s frequently the option to step out of the sub routine so you may go into sub routine and you may decide okay this is boring i don’t think this matters to what i’m looking for and so then you can do step ‘step out’ of the sub routine and that just basically says continue until this would have returned and destroyed the stack frame and executed a return instruction essentially you know now i may be in in sub routine but if i hit the ‘step out’ it’ll go ahead and step out of sub routine and go to the next statement after that subroutine would have been completed so we’re gonna see those again in gdb when we get into linux things but ‘step in’ to ‘step out’ of and step ‘stepped in’ to ‘step over’ and ‘step out’ of are the three most common sort of things which every debugger should have now we’ve seen a little bit about stepping at the c level right but we want to step that the assembly level so what we’re going to do it is we’ll go ahead and debug it starter debugging you’ll be broken at main right now right click and select goto disassembly what that’s going to bring up now is an intermixed of you where you can see this c statements and you can see the corresponding assembly statements so for instance i’m broken at the beginning of main but we know that the compiler automatically generates these two instructions worth of stack frame creation but at the beginning before any of the rest of you know whatever you’re going to do when the debugger breaks actually breaks at the first destruction of main which happens to be you know those automatic stack frame creation instructions and then you know you can see ok well this c-statement call sub routine looks like this actual assembly statement call subroutine right and all that so this is uh…

How you can view the assembly instructions: you can create any new little C program you want here, compile it, break on it, and then when you're already at a breakpoint, hit Go To Disassembly and now you can see all the rest of the assembly. So now what we need to do, for instance, is we want to show you what the stack looks like, we'll show you where the registers are changing, things like that. Actually, I guess before that I should point out we already have this little call stack view right here. This is what I was talking about before, where the debugger will frequently show you a call stack. It's saying main was actually called by __tmainCRTStartup, CRT like C runtime, which was called by mainCRTStartup. So the point is there's some initialization code which happens before main is ever called, uh…

And this actually shows you where that code was. In our previous slide we kind of said somebody somewhere called main, and that's where we care that it starts, but this will show you. And then it'll actually say, okay, kernel32.dll actually executed even before that, and below that we don't even know where. So basically, probably the kernel passed control to kernel32.dll, which is a user space DLL; despite being called kernel32, that's just a user space library. So maybe the kernel passed it to that user space library, which passed it to the initialization code, which passed it to your real thing. But you can't see the call stack back into the kernel, for instance; you only see it for user space. Anyways, there's a couple of things I want to set up in terms of windows. One, I'll just put up right here, there's the Autos window. That's going to say whatever instruction, uh…

Whatever registers are going to be used by this current instruction, it's going to show you them. Right now it automatically knows that ebp is involved in this instruction, since it's a push ebp. Sorry, ebp is not going to be changed; rather, it's telling you the value that's in that thing, so you can see, uh…

You know, 12FFB8 is going to be pushed onto the stack. It'll just tell you the minimal amount of information you need to know to kind of understand what's going to happen in this current instruction. But we want to know sort of everything, all the register values. So one way we can do that is there's this Watch tab, which everyone should have. In the Watch tab we can actually put in all of our registers, so we could just type eax, and ebx, ecx, et cetera, and this will tell us the values of each of those registers right now. And actually what you probably want to do then is right-click and select Hexadecimal Display, because we like looking at things in hex; that's what lets us go from bits to nibbles and back and forth. So when you select Hexadecimal Display you'll see, okay, eax is 00343818. So this is one way that you can see it, but uh…

but I'm going to recommend something else instead. This is plenty fine, but if you go to Debug, Windows, Registers, or if you just hit Alt-5, that'll bring up a separate window that'll show all your registers all at once. So that's this Registers window. As you're going through, anytime a register changes, it'll be highlighted in red. So for instance, I'm going to execute a single instruction, this push ebp. What registers do we expect are going to change based on this instruction? (inaudible question) Did you step through it already? He stepped through it. So what would you say? ebp? No, not ebp. esp. I'm going to 'step over' or 'step into', it doesn't matter for this instruction. I'm going to step over this instruction, and what I expect to be highlighted in red is going to be eip, because it's going to change and point at the next instruction, and then esp, because I put something on the stack and the stack pointer still needs to point at the new top of the stack. So go ahead and hit 'step over' or 'step into', whatever you want, and we see eip changed and esp changed. So now we can at least watch all of the registers change at once. The next thing I'm going to recommend is go to Debug, Windows, Memory, and then Memory 1, or just hit Alt-6. This is going to bring up a memory window where we can basically look at any arbitrary memory address. I'm going to recommend you drag that window over the, uh…

over to the side. I messed up my windows. Take your tab for the memory window and drag the tab until a little highlighting pops up; you want it to be placed basically to the side of these other tabs, like that. So you grab the tab and drag it until you see the little marker that says put it on the right side. You want this little split screen view going on, with the memory window over here and the registers window over here, so that you can see them both at the same time. I'm actually going to expand my window a little bit; make your window a little wider so that the columns show up in your memory window. Okay, memory and registers, that's good enough. Grab that right there; when you've got registers and memory side by side, that's good enough. Now, the slides you have, um…

printed out for you have the instructions for exactly what I'm doing. You can also just pull it out so it's a separate window off to the side, floating somewhere, but the point is we just want to be able to see memory and registers at the same time and see them both changing. Now we're going to change the formatting of the memory window. Right now it'll just display byte-size memory at any address you give it, and that's fine, but we're going to change this to be, uh…

watching our stack change. So we're going to set the address to esp and press return. Then there's this little thing right here with double green arrows in between curly brackets, and if you hover over it, it'll say Reevaluate Automatically. You want to click on that once, and then it'll change the address back to being the symbolic esp. At this point the address should say esp, and this little Reevaluate Automatically double arrow thing should be clicked and should have sort of a box around it. And then we're going to right click on all of this, uh…

Hex display of memory, and we're going to change that to four-byte integers, so display four bytes at a time instead of one byte at a time. Select that, and now we've got four-byte things, and then under Columns I want you to set that just to one. So now what we have is essentially going to be our stack. Any questions? Actually, I'm going to dim the lights all the way; I know with the lights in here it's hard to see, it's washed out, maybe that'll help a bit. esp set here, click Reevaluate Automatically, Columns to one. You do not want me to turn off the lights? It's just kind of washed out. So hopefully everyone else has caught up by now. Now, whatever's displayed at the top of this window will always be the top of the stack. But the problem is, like I said, you have to have mental flexibility here: they're going to display low addresses high and high addresses low, so this is upside-down from how I'm going to be drawing it, but uh…

The point here is this is the top of the stack. Right now I just did a push ebp; it took whatever was in ebp, which right here it says is 12FFB8, and I pushed that onto the top of the stack. So at least now the top of the stack is at the top of the window. Now I'm going to execute the next instruction, move esp, which is 12FF68, into ebp. Down in my little registers window I expect eip to always change, like every single instruction. What I now expect to see changing is ebp: right now it's 12FFB8 and I'm going to put 12FF68 into it. 'Step over', and ebp is 12FF68, changed as expected. Nothing changed on the stack; esp still points at that same stack address. All right, so now we want to step into this call instruction. We want to see that eip gets set to 401000 when we do the call instruction, so we 'step into' that and we go to the first instruction of the subroutine, which is at 401000. See, eip got set to that down there, and now esp changed here. So why is esp in red right here? I just executed a call instruction; what's the side effect of a call instruction? You are putting the stack pointer four lower, but why are you putting it four lower? What sort of thing does that call instruction put onto the stack? It puts the address of the next instruction after the call, so that when you get back from a call there's going to be the address of where you get back to. So if you look at the top of the stack you see 401018, and what that is, if we go in our assembly window and go back down here, we see the call is at address 401013 but the address immediately after it is 401018. So every call instruction says: whatever the next address is going to be, stick that onto the top of the stack, and then go to wherever my target is. In this case the target was 401000. And again, just to remind you, if you scroll down and don't know where you are, hit this little yellow arrow and it'll bring you back up to wherever the break is currently at. So now we're in the subroutine. We're going to do the common frame pointer setup: save the previous frame pointer and then move esp to ebp to make our new frame pointer. The first thing we do is push ebp, so we'll go ahead and do that; 'step into' or 'step over', it doesn't matter for this instruction. You do that and it pushes ebp on top of the stack. Is my stack still correct? I somehow messed up my stack view, but whatever. Did anyone else's stack just get messed up there? That's probably what happened, something with the automatic update. So anyways, we executed push ebp, and what goes on the top of the stack? A copy of ebp. ebp is 12FF68, therefore 12FF68 goes on top of the stack. Now we're going to do…

We saw that esp changed, because we put something on the stack, so you decrement esp by four. It was 12FF64, but now we added something onto the top of the stack and that means you decrement by four, so you get 12FF60, and that's why esp is 12FF60, as we grow towards lower addresses. So now we're just going to move whatever's in the esp register into the ebp register. This is saying: wherever the top of the stack is pointing right now, that's what esp is, I want that to be my ebp, to say this is where I want my new frame pointer to be. Step over that; what changes? ebp changes. So ebp is always pointing at the top of the current stack frame, except for this little interim area where you're doing these two instructions. Those two instructions are basically the transition stage; by the time you get done with the move esp to ebp, you're back to the normal case where ebp always points at the top of the current stack frame. All right, so then we have moving this immediate value hex BEEF. It's only displaying sixteen bits of it, but it's actually moving a 32-bit value into the 32-bit eax register. So when we execute this instruction we expect eax to be updated to hex BEEF. So eax is now red, eip changed as normal, and that's all we did. Now it's time to tear down the stack frame and exit the subroutine. We pop ebp. Because we haven't moved the stack at all, we haven't added any local variables or anything else, the only thing at the top of the stack frame is the pointer to the previous stack frame. Go ahead and pop it off. So 12FF68, we're going to pop that into the ebp register, whatever's on the top. Step over: it took that, put it into the ebp register, 12FF68, and then it moved esp again, because it's a pop instruction. It takes it off the stack, and what that means is you've got to move the stack pointer so that it doesn't look like that's there anymore. So we add four to the stack pointer when we take something off. We subtract from the stack pointer to put things on, and add to the stack pointer to take things off. So now the stack pointer points to 12FF64, which holds what looks like a saved return address to me, uh…
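The register and stack changes in this walkthrough can be replayed on paper, or with a small model. The following is only a sketch in C, not anything from the slides: the `Cpu` struct, the toy stack array, and the helper names `slot`, `push`, `pop`, `call`, `ret`, and `run_example` are all made up for illustration. The addresses are the ones shown in the debugger above, and the final `ret` is the step the walkthrough turns to next. Only call and ret update `eip` here; the per-instruction `eip` changes are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the four registers the walkthrough watches (hypothetical). */
typedef struct {
    uint32_t eax, ebp, esp, eip;
} Cpu;

/* Toy stack memory: 16 dwords covering addresses 0x12FF40..0x12FF7F. */
#define STACK_BASE  0x12FF40u
#define STACK_WORDS 16u
static uint32_t stack_mem[STACK_WORDS];

/* Map a 4-byte-aligned stack address onto the toy array. */
uint32_t *slot(uint32_t addr) {
    assert(addr >= STACK_BASE && addr < STACK_BASE + 4u * STACK_WORDS);
    assert(addr % 4u == 0);
    return &stack_mem[(addr - STACK_BASE) / 4u];
}

/* push: subtract four from esp, then store at the new top of the stack. */
void push(Cpu *c, uint32_t value) {
    c->esp -= 4;
    *slot(c->esp) = value;
}

/* pop: read the top of the stack, then add four to esp. */
uint32_t pop(Cpu *c) {
    uint32_t value = *slot(c->esp);
    c->esp += 4;
    return value;
}

/* call: push the address of the next instruction, then jump to the target. */
void call(Cpu *c, uint32_t target, uint32_t next_instruction) {
    push(c, next_instruction);
    c->eip = target;
}

/* ret: pop the saved return address into eip. */
void ret(Cpu *c) {
    c->eip = pop(c);
}

/* Replay the call into the subroutine and the return back to main. */
Cpu run_example(void) {
    /* State in main just before the call at 0x401013. */
    Cpu c = { .eax = 0, .ebp = 0x12FF68, .esp = 0x12FF68, .eip = 0x401013 };

    call(&c, 0x401000, 0x401018); /* esp becomes 0x12FF64, holding 0x401018 */
    push(&c, c.ebp);              /* esp becomes 0x12FF60, holding 0x12FF68 */
    c.ebp = c.esp;                /* move esp into ebp: new frame pointer   */
    c.eax = 0xBEEF;               /* the subroutine's return value          */
    c.ebp = pop(&c);              /* restore main's frame pointer           */
    ret(&c);                      /* back to 0x401018, esp is 0x12FF68      */
    return c;
}
```

Calling `run_example()` and inspecting the returned struct should match what the Registers window shows after stepping through the same instructions.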

And so, yes, we're here at the return instruction, and what does it do? It says whatever's at the top of the stack, whatever esp is pointing at right now, take that, pop it off, and put it into the eip register. Popping off means we're going to change the stack pointer, and it being a return means we're going to change eip to whatever that value is, and what that is right now is 401018. So the next instruction after I step over this instruction had better be at 401018. Step it, and we are back at 401018, which is the place after the call instruction. The whole point of the call instruction pushing the address of the next instruction is just so you can get back to it when you're done with that subroutine. So we returned from the subroutine and we're sitting here ready to move hex F00D into eax. This is really just more of the same that you've already seen: you move hex F00D into eax, nothing changes on the stack, the only thing that changes is the eax register. Then go ahead and pop the saved stack frame pointer off of the top of the stack, right here, 12FFB8; pop that off, then we're going to return. And so the one thing you can kind of see here is, um…

thanks to the conventions that Windows is using here, you can see that the code is roughly speaking in the range 0040-something and the stack is roughly in the range 0012-something. So when you see things on the stack you can kind of think: this value looks like a 0012-something, maybe it's a stack frame pointer, maybe it's just some sort of stack address. If you see something that starts with 0040-whatever, maybe it's some sort of code address. So as a simplistic notion, you can keep your stack addresses and code addresses kind of distinct, because by convention they start at ranges which are sufficiently separated. So anyways, we're going to tear down the stack frame for main. Pop this saved pointer back into ebp, so execute that, and 12FFB8 goes into ebp. Then we're going to execute the return instruction: it's going to take whatever's on the top of the stack, put that into eip, and add four to the stack pointer so that we get rid of whatever's on the top of the stack; it's like a pop. Execute that, and it'll say, I don't know where the source code is for where you're trying to go, can you please tell me where the source code is? I don't have the source code, so dismiss that, but here you go: we returned out of main and we're now in the assembly for the guy who called main. We could keep going backwards if we wanted to, but you can see that in the function which called main there was literally an instruction call main somewhere in there. As far as we're concerned in this class, we don't care about the initialization and all that, what happens before and after main. We just care that we got to main and some stuff happened; something happened before, something happened after, I don't care about that. But when you're a reverse engineer, this is what you see: just a bunch of assembly. You don't even see something like main, you just have to figure out what this is in the context of everything else. So if we wanted to start reverse engineering the C runtime library, we could start stepping through this, but we're not going to do that for now. As far as we're concerned, that's done: main returned, main had the value F00D in eax, and that value is still there when we return out of main. So now if the guy who called main wants to check what it returned, which it will, it can go ahead and look at eax. Now I'm just going to hit play to let it continue, and you see down here it says the program example one native has exited with code 61453, aka hex F00D in, uh…

Hexadecimal. So that's saying main returned F00D. Any questions about this example, how to step through, the organization inside Windows, things like that? Okay, it's eleven thirty, so I think we can go into example two; I think we have enough time. I'm going to be breaking at like eleven fifty-five, since people are going to want to go to a talk here, but we're going to go into the next example and we'll get as far as we can in twenty minutes or so. So, back to the slides for a second here. You don't have to go back, you can stay in Visual Studio and open example two next. That was our very simple thing: our subroutine had no parameters that it took as input, all main did was call a function which returned something, and we didn't even care what the return value was, we just always returned F00D. We did that in the program just to see the mechanics. So now we're going to do a more complicated example, example two. Now we've got example two, and again there should be no, uh…

super surprises here in terms of C stuff and trying to figure out what it's doing. So we've got some C code, and now we see main takes input; main takes parameters. It takes argc, which is the number of arguments passed into it on the command line. So we're going to think of this as a command line program, and it's going to take in some command line parameters: argc is the number of parameters, and argv is a pointer to an array of pointers, each of which points at a string. I'll show that on the board here in a second. The point is, by convention with these command line programs, the first entry of argv, argv of zero, is going to be the name of the program itself, so in this case example2.exe. argv of one is going to be the first parameter that you pass. So if you do example2.exe, space, one, space, two, space, three, argv of one would be one in that example, because one is my first command line parameter. This main has a single local variable 'a', and what it does is use the function atoi. That takes a string, so the 'a' implies an ASCII string here, to 'i', integer. That's going to take a string and turn it into an integer, a signed integer I believe, yes. And so this 'a' is implicitly a signed integer; I didn't say unsigned 'a', uh…

And so it's going to say whatever the first thing is that you pass on the command line, whatever number you pass on the command line, that argv thing is actually going to be a string. It's not going to be an actual number. It's going to be the string: the ASCII character for one, the ASCII character for zero, the ASCII character for zero, and then a literal zero, the null character. That's how you have the string one zero zero, uh…
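To make the string layout concrete, here is a small sketch in C, assuming an ASCII character set. The helper names `bytes_in_memory` and `digits_to_int` are invented for this illustration; `digits_to_int` only mimics the core of what atoi does for plain decimal digits and is not how any particular C library implements it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Number of bytes the string occupies in memory, counting the
 * terminating null byte that strlen does not count. */
size_t bytes_in_memory(const char *s) {
    assert(s != NULL);
    return strlen(s) + 1;
}

/* Hand-rolled conversion of leading decimal digits to an int.
 * Illustration only: no sign, whitespace, or overflow handling. */
int digits_to_int(const char *s) {
    assert(s != NULL);
    int value = 0;
    while (*s >= '0' && *s <= '9') {
        value = value * 10 + (*s - '0'); /* '1' is 0x31, so '1' - '0' is 1 */
        s++;
    }
    return value;
}
```

For the argument "100", the bytes in memory are 0x31, 0x30, 0x30, 0x00, and both `digits_to_int` and the standard `atoi` turn that string into the number one hundred.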

And so atoi takes that string input and turns it into the actual number one hundred, which fits into a 32-bit value just fine. So it's going to turn the string into a number and store it into 'a', and then it's going to call sub, and whatever sub returns, it's going to return that. So it's implicitly the same thing as if you had taken the return value from sub, put it into a variable, and done return of that variable. You can just write return and the call to sub, and that says whatever sub returns, I'm going to return that as well. And so, what sub is going to take, because I think I'll have enough explaining of this that we won't get into it exactly before we break for lunch: one parameter is going to be argc, that's going to be the count of the number of parameters you passed in, and uh…

Just for clarification, like I said, by convention, since argv of zero is always the name of the program, argc is always at least one. You always have at least one parameter, which is the name of the executable itself. And so if I pass a parameter on the command line, argc would be two and argv of one would be whatever I passed. So the subroutine takes as its first parameter argc, whatever that happens to be from how many parameters you pass in on the command line, and then 'a', which is the numeric form of whatever the first thing is we've passed on the command line. And sub just does some arbitrary math on it. It says, okay, I'm going to take the first thing, which I call 'x', and the second thing, which I call 'y', and I'm going to do 2 * (x+y). So it's going to do some math and return whatever that calculation is, and then when it returns that, main is going to just return as well. So this is a place where we're going to see that value, whatever 2 * (x+y) is, get put into eax. Before main returns, it's not going to modify eax at all; it's going to just leave whatever's there, so that it returns the same thing as the return value of sub. And now, just for clarification purposes, because it'll definitely matter for all the disassembly, uh…

I'll kind of draw a picture of how argv, argc, and stuff like that works. I'll go over to the whiteboard. So I'm going to draw the stack frame for this C here; I'll draw it pretty low so that we have enough room. Again, we're going to say when main starts, whoever called main pushed the parameters to main, right to left, onto the stack. Those parameters are the pointer argv, which is a pointer to a pointer, and argc. So we know on the stack there's going to be some argv pointer and an argc. We'll say main's stack frame will start about here. This picture is going to be before we execute any instructions in main. You may guess that the first instruction of main is push ebp, but we're not even going to say that the push ebp has executed right now. All we've had is that the call main has executed, and because call main has executed, the address of the instruction after that call instruction will be on the stack, as the thing that's on the top of the stack in my picture. So this is going to be the saved eip, the address of the instruction after the call in whoever called main. And for people on the, uh…

Remote vpc thing, I highly recommend you draw all of the stack frames and stuff that we do, and even go back over them and make sure everything makes sense, because this is not in the slides anywhere else. So this is the saved eip of the instruction after call main. Before it calls main, it pushes the parameters onto the stack, so right here we would have argc, and we'll say that this is equal to two, for instance. We're going to pretend I'm executing from the command line example2.exe and then one hundred, or something like that. So this is some command line program which runs example2.exe 100: argc is going to be two, the program name is going to be argv of zero, and the one hundred is going to be argv of one, and we'll show that when we show these things. Right here is going to be the other parameter passed to main, char star star argv, and I'll put in some fake addresses in a second. This thing points somewhere where there's going to be an array of pointers to strings. So what I'm going to say is we don't care about the exact value in here, but this thing is going to hold the address of somewhere up here on the stack, which is where, uh…

Typically, by convention, the stuff from the command line parameters gets placed when it's passed in. So somewhere higher up on the stack, this argv thing is going to be pointing up here, and this is going to be argv of zero and this is going to be argv of one. But each of these is a character pointer, so each of these is a pointer to some string somewhere else. We know that with a string, typically you have a character pointer which points at some sequence of bytes which represents the string. In reality, these are then going to point somewhere else, even farther up on the stack. Somewhere up there I'm going to have the byte sequence: this will be e, x, a, m, and then p, l, e, 2, dot dot dot, eventually terminating in a null. So somewhere there's going to be a series of bytes which terminates in a null character, and this guy points at that one, and that guy points at the second one, somewhere else much farther up on the stack. I'm going to put in real values for each of these when we go over the actual assembly code, because what you'll see is when it accesses argv of zero or argv of one, as in our C code where we see argv of one, you're actually going to see it take this value, read the value where that points to, which will be the address of this array, add four to get to argv of one, and then take that pointer and push it, because that's the actual pointer to the string. So you'll see that later, but uh…

This is the rough layout of the way that the argv, argc kind of stuff works. This is all set up by the operating system; I think it's still the operating system that's responsible for that, and then the C initialization code may get set up later, but this stuff is on the stack already, I think. I'd have to confirm that; I'm pretty sure, though, from working on the Linux stuff, that the OS sets all that up, as well as environment variables way up there and stuff like that, and then calls the initialization code. The initialization code is responsible for pushing these two things before it calls main, but all the rest is already there. So the initialization code just pushes these and then calls main. And like I said, I don't want to get into this before we go on break, because there's some other stuff we have to cover before we get into it.
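Putting the description of example two together, a reconstruction in C might look like the sketch below. This is rebuilt from the spoken description rather than copied from the slides, so details may differ. `example2_main` is a stand-in name for the lecture's main so it can be called directly, and the `assert` guarding `argv[1]` and the `arg_at` helper are additions for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* sub as described in the lecture: some arbitrary math, 2 * (x + y). */
int sub(int x, int y) {
    return 2 * (x + y);
}

/* Stand-in for the lecture's main (renamed so it can be called as a
 * normal function). The assert is an addition, not part of the
 * program as described. */
int example2_main(int argc, char **argv) {
    assert(argc >= 2);       /* argv[1] must exist */
    int a = atoi(argv[1]);   /* "100" becomes the number 100 */
    return sub(argc, a);     /* whatever sub returns, main returns */
}

/* argv[i] spelled out as pointer arithmetic: read the pointer stored
 * i entries past the start of the array. With 4-byte pointers, as on
 * 32-bit x86, entry 1 sits 4 bytes past entry 0. */
char *arg_at(char **argv, int i) {
    return *(argv + i);
}
```

Run as `example2.exe 100`, argc is 2 and 'a' is 100, so under this reconstruction the return value is 2 * (2 + 100) = 204.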

There are a couple more instructions you're going to have to see in order to actually understand everything that's going on here. So now I'm going to talk about what I'm calling the r/m32 form. I get this notion of r/m32 from the Intel manuals; there's a more complicated name for it, but I don't want to use that throughout because it's not as convenient. What the r/m32 form means is there is a sequence of increasingly complex ways that you can specify access to a register or memory. The simplest form can specify just plain register access, so you can access either just plain registers or memory. And in Intel syntax, and this will change later when we talk about, uh…

AT&T syntax, but in Intel syntax what you can think of is, with the exception of one instruction which we're going to talk about next, if you ever see these square brackets enclosing some register, like this ebx, that means you're going to memory. It means you take that value, treat it as a memory address, and go to memory at that address. So anyways, I'm getting ahead of myself. In the r/m32 form, an r/m32 value could just translate to a plain old register. So you could have move ebx to eax, and there you're just taking whatever the number is in ebx and putting that number into the eax register. Sources are on the right, destination on the left. Now, when we get more complex with it and we start seeing square brackets, what that means is, like I said, take ebx, treat it like a memory address, go to memory, read four bytes from memory, and put them in eax. I said we have register to memory and memory to register forms of the move instruction, so the brackets could be on one side or they can be on the other side, but they can't be on both sides, because I said there's no memory to memory. So you'll see square brackets on one side or the other side. In the simple form, it's just treat it like a straight-up address: go there, get the contents of memory, stick it in the register. Now it gets more complex when we add in a form like this. We say ebx plus ecx times some constant, which can be one, two, four, or eight. Treat all of the stuff inside the square brackets as an address, so calculate whatever address that is. So if ebx is one and ecx is, you know, 400000, and let's say X is one here, just do the calculation for whatever is inside the square brackets, put it together, go to memory at that resulting address, and pull the value out into the register. And then it can get even more complex, in that you can have that form plus another constant offset. But there is an actual sane reason why they're doing this. The way that they name the parts of these forms is that you have base plus index times scale plus displacement. So what you can think about here is, let's say you were trying to walk through an array, for instance. You've got an array of values all sequentially, uh…

Put into memory. So the base in this case may be the base of the array: this ebx here, you may set that to be the base of your array. So my array starts here; take the address of the start of the array and stick that into ebx. ecx now could be the index into the array, so you have index zero, index one, index two. But then the reason we have this X constant there is because that could be the size of the elements of your array. It could be an array of bytes, in which case you would want base plus zero times one, which gives you base plus zero, that's the zeroth byte in a byte array. Base plus one times one gives you base plus one, the address of the oneth element in a byte array. Similarly, you could have arrays of four-byte elements, so there's four bytes, four bytes, four bytes. And so again, base plus zero times four is the zeroth entry, base plus, uh…

One times four, so base plus four, that's your oneth entry in the array. And therefore you can see how that form can help you step through the elements of an array. And then this next form, with the displacement, could potentially start getting into multi-dimensional arrays. Remember that multi-dimensional arrays are really just arrays lined up after each other, so that could help you with multi-dimensional arrays. I guess the way I explained it in the first class is more like this: let's say you have C code where you define two arrays, my array one, say ten bytes, and my array two, also ten bytes. The code might start from the base of one of them and index into it sequentially, and then later on it may do base plus displacement to get to the second array, or something like that. So you can use the displacement to just jump up to a second array, or you can use it for a multi-dimensional array, if you have one array, uh…

It's basically any case where you have one array after another array, or any other reason why you'd want to just completely jump where you're sitting in memory [inaudible]. So the question was: does displacement get used, for instance, if the operating system decides to move around where code is going to be? And no, it wouldn't get used in that case.

In those cases the operating system is literally just taking the data and copying it to one place or another. It really is more like if you have contiguous data and you want to maybe offset past one array to start accessing the second array. There's another thing, I think, theoretically, using segmentation, but not really; we talk about segmentation in Intermediate. Think of it mostly in terms of contiguous arrays, where you may want to displace past one. But really you can use it for anything. This doesn't just have to be used for array access and stuff like that; it's just that that's the reason why it sort of makes sense to have base plus index times scale, and then maybe a displacement. Question over there? [inaudible] Yes, that's still just a multi-dimensional array.

So that's the r/m32 form. And okay, this is the exception to the rule that I stated, that where you have square brackets you go to memory at that address. LEA is the Load Effective Address instruction. Whereas MOV will take the thing, go through the address, and move from memory to a register, LEA basically just says: calculate the address that's inside of the square brackets and stick that in a register. Don't go to memory, just stick it in the register. So LEA is frequently used for things like getting addresses for pointers. Let's say instead of grabbing the value in some array, you want to know the address of the fourth element of that array. You want the address; you don't want to go to memory and grab the data at that address. LEA would be used to calculate addresses, and you don't go to memory.
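The difference between the two instructions can be sketched with a toy model in Python. Everything here is hypothetical scaffolding: memory is just a dictionary from address to value, `mov_load` and `lea` are illustrative names rather than real APIs, and the base address is invented. The point is only that the MOV-like function reads memory at the computed address while the LEA-like function returns the computed address itself.

```python
def effective_address(base, index=0, scale=1, disp=0):
    """Address arithmetic only: base + index*scale + disp, wrapped to 32 bits."""
    return (base + index * scale + disp) & 0xFFFFFFFF


def mov_load(memory, base, index=0, scale=1, disp=0):
    """Like MOV reg, [base + index*scale + disp]: compute the address, then read memory."""
    return memory[effective_address(base, index, scale, disp)]


def lea(base, index=0, scale=1, disp=0):
    """Like LEA reg, [base + index*scale + disp]: compute the address, never touch memory."""
    return effective_address(base, index, scale, disp)


BASE = 0x00403000  # hypothetical start of an array of four-byte elements

# Toy memory: five four-byte elements holding 10, 20, 30, 40, 50.
memory = {BASE + 4 * i: value for i, value in enumerate([10, 20, 30, 40, 50])}

# Fourth element is index 3.
value_of_fourth = mov_load(memory, BASE, index=3, scale=4)  # goes to memory
address_of_fourth = lea(BASE, index=3, scale=4)             # stays in "registers"

print(value_of_fourth, hex(address_of_fourth))
```

Because `lea` never indexes into `memory`, it works even for an address that holds nothing, which mirrors why the instruction is handy for pointer arithmetic.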

So this is the exception to the square brackets rule for r/m32 forms. The question, well, the statement was that in this example, if you just want to find the base address of where the string example two is, you might use LEA. But for practical purposes, since you already have that address stored in argv[0], you'd actually just move that address out of argv[0]. But you could potentially use it that way. So that's LEA: it basically calculates an address. And then ADD and SUB, we'll come back and go over these, and I'll let you guys go now; we'll be back at one o'clock. But ADD and SUB are what you think they are: addition and subtraction. Back at one o'clock.

Any questions from anyone on the remote side? You know I'm going to quiz you, those of you at the site which I am at, so it's best that you ask your questions now.
