There is the original Unix operating system (OS) and its direct successors from Bell Labs, which I will call UNIX, in upper case, to distinguish it from the plethora of clones, variations, look-alikes, etc., which I'll call unix, a generic operating system of a type.
The important characteristics of a unix OS that make it the premier OS for the internet are that it is a multiuser, multitasking OS. A PC-type computer running DOS (Disk Operating System) handles one user running one program (some modern DOS versions now allow multitasking). A PC-type computer running unix, on the other hand, can handle many users each running several programs "simultaneously", or at least so it would appear to the users. A single CPU chip (your Intel 80386, 80486 or Pentium, e.g.) can actually perform only one task at any one time. What is actually happening is that unix uses a concept of process time slicing. I'll try to explain how this works in its simplest form; but we'll need an explanation of the distinction between 'program' and 'process' and a few other things.
After you log into a shell account, you are interacting with a process that is a 'shell'. The shell is a program (a set of instructions) which is caused to 'run', or 'become process', when you log in. The shell process is actually your personal ambassador to the real unix operating system, which is called the kernel process. The kernel is a program which begins running when the computer is booted, is locked in memory, and runs always; if it ceases to run, the operating system crashes. The kernel process is the majordomo which handles user requests for CPU time and communications with peripheral devices: hard disks, floppy drives, tape drives, terminals, modems, etc. The kernel, of course, is responsible for time slicing.

Let's say you, as a user, want a listing of the contents of your current directory. The thing that does this for you in unix is a program called 'ls'. This is actually the name of a binary file sitting in a standard directory that contains many other such binary files. These files are generally kept on a hard disk, which houses the unix "hierarchical file system". Sitting on the hard disk, a program does exactly nothing. The user enters the name of the program. The shell process knows the possible locations of program files and attempts to find the file with the given name. If successful, it makes a request of the kernel process to execute (run) the program as process for you; 'for you' meaning attached to the terminal or serial line on which you are logged in, so that if the process has output, it is directed to your screen and not someone else's. The shell also passes on other relevant account information. Contraindications for the process, such as lack of permission, have already been checked by the shell; so the kernel schedules your request.
It is good to picture other people making requests too. There are also certain processes that run continuously "in the background", i.e., not attached to any terminal; certain of these are called 'daemons'. Running a program as process means executing a sequence of instructions, the variety of which is rather complex and not worth dwelling on here. The simplest might be getting a character from the user, or printing a message to the user's screen.
When the kernel runs a program, it copies the instructions in the binary file into the volatile (but fast) RAM memory associated with the CPU. The program has 'become process'. See below for a description of the real-life complications of this.
The kernel, being a good robot, wants to keep all users happy by running their programs in a timely fashion. User A may be compiling a very large program, which may take several minutes. If user B wants to list his directory contents, which requires very little time, the kernel can and does stop doing the compilation, saving its place in the instruction sequence, does the instructions for the listing program, and then returns to the compilation. This is the simplest example of a time slice operation. Basically, when confronted with many requests, the kernel does a few instructions for each request in its schedule and continues passing through its schedule cyclically, eliminating items from the schedule as they terminate.
Needless to say, someone must write this time scheduling into the coding of the kernel so that it knows how to prioritize and how many instructions to run for a given process. What should be done if all requested processes cannot fit into RAM? Suppose the same program is being requested over and over again. All sorts of refinements and tweaks to the time slicing operation have developed, trying to improve speed and efficiency.
A unix system has a place on the hard disk called 'swap space', which is overflow storage space for when RAM is exhausted by requests. Older unixes read in, and swapped, only entire instruction sets; newer unixes work in terms of 'pages' of instructions to reduce the amount of swapping time. There is caching of program instruction sets for programs that are requested often: they are not eliminated from RAM just because the process terminates. There are also shared libraries, where standard library functions in binary form are read in only at run time. This eliminates hundreds of copies of these standard routines being contained in the binary program files that use them, which cuts down on disk usage.
One has to begin somewhere, and rather than start with the ancient Egyptians (this is supposed to be short, and far from encyclopedic), I'll begin about 1968 (CE, that is).
In the beginning there was Multics, a large operating system developed jointly by Bell Labs, General Electric, and M.I.T., and seen by some as overly large and not modular. Modularity is key to change and development. Unix was a response to the discontents of Multics, developed at what was then Bell Labs in New Jersey by Ken Thompson (formerly associated with the Multics project) and Dennis Ritchie.