Lecture 4 Outline / September 16th, 2008

Homework stuff

Questions on blocking vs non-blocking, and how to handle?

GET vs POST

GET puts queries on the URL

POST sends queries in the body of the HTTP request (the URL may still carry a query string as well).

We'll talk about how to deal with that next week.
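
As a rough illustration of the GET/POST difference above, here is a minimal sketch using the standard urllib module from Python 3 (newer than the Python these notes were written for). The URL and field names are made up, and nothing is actually sent over the network; the requests are only built and inspected.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form fields and URL, for illustration only.
fields = {"name": "alice", "lang": "python"}
base_url = "http://localhost:8000/submit"

# GET: the encoded query is appended to the URL; there is no body.
get_request = Request(base_url + "?" + urlencode(fields))

# POST: the same encoded query travels in the request body (as bytes).
post_request = Request(base_url, data=urlencode(fields).encode("ascii"))

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

urllib picks the method from whether a body is present, which is why neither request names GET or POST explicitly.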

Multitasking, concurrency, and pre-emptiveness

How do you do more than one thing at a time on a computer!?

The friendly approach:
  • say, are you done yet? excellent! may I have some CPU? (Windows 3.1)
  • "non-preemptive"
  • doesn't work well with "bad actors" (WANT CPU!) or bad programmers
  • actually more complicated to do right, in some ways
  • e.g. think about your nonblocking code due Thursday...

The less friendly approach: the traffic cop (scheduler)
  • pre-emptive
  • task scheduler coordinates things
  • less up-front work for you, but no guarantees on order of execution
  • how almost everything works nowadays
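
The "friendly" approach can be sketched with Python generators, where each task gives up the CPU voluntarily by yielding. This is only a toy model under that assumption; the task and scheduler names are invented for the example, and a real pre-emptive scheduler lives in the operating system rather than in your code.

```python
from collections import deque

def task(name, steps, log):
    """A cooperative task: do one unit of work, then yield the CPU."""
    for i in range(steps):
        log.append((name, i))
        yield  # voluntarily hand control back to the scheduler

def run_round_robin(tasks):
    """Run each task up to its next yield, in turn, until all finish."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)
        except StopIteration:
            continue  # task finished; do not reschedule it
        ready.append(current)

log = []
run_round_robin([task("A", 2, log), task("B", 3, log)])
print(log)
```

A task that never reaches a yield would starve every other task, which is the "bad actor" problem from the list above.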

The bottom line is that as soon as you start using multitasking, you lose predictability & gain stochasticity (randomness) in your execution.

Atomicity:

'atomic' operations are indivisible operations, i.e. no other task can ever see them half-done; they (look like they) take precisely one CPU tick.

Do basic operations get interrupted? Imagine a simple algorithm for inserting into a list:

get slot number (slot = length)
add one element slot to list (array.add_one_slot() -> length++)
fill slot with data (array[slot] = data)

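This is not guaranteed to perform correctly as written in the presence of threading! If two threads both read the same length before either one adds a slot, they get the same slot number and one overwrites the other's data. Need to use locks: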

grab the lock:
  get slot number (slot = length)
  add one element slot to list (array.add_one_slot() -> length++)
  fill slot with data (array[slot] = data)
release lock
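
The locked version might look like the following in Python, as a minimal sketch. The SlotList class and insert_many helper are invented for the example, and a plain Python list stands in for the array in the pseudocode.

```python
import threading

class SlotList:
    """Toy list that mirrors the three insertion steps in the notes."""

    def __init__(self):
        self.items = []
        self.lock = threading.Lock()

    def insert(self, data):
        with self.lock:                # grab the lock
            slot = len(self.items)     # get slot number
            self.items.append(None)    # add one empty slot
            self.items[slot] = data    # fill slot with data
        # the lock is released on leaving the 'with' block

def insert_many(target, start, count):
    """Insert the integers start .. start+count-1 into target."""
    for value in range(start, start + count):
        target.insert(value)

shared = SlotList()
threads = [
    threading.Thread(target=insert_many, args=(shared, n * 1000, 1000))
    for n in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(shared.items))
```

The order of the items still depends on how the threads were scheduled; the lock only guarantees that no slot is lost or left empty.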

Locking of resources:

Locks guarantee that only one task at a time executes a protected piece of code, so that code behaves atomically with respect to everything else that uses the same lock.

Fine grained vs coarse-grained locks:
  • more locks are better, because they let you subdivide tasks and do more concurrently
  • fewer locks are better because they are easier to manage and understand

Deadlock - two (or more) tasks each hold a lock and wait for the other to release theirs, so none of them can proceed.

(always release the lock; try/finally!)
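
A small sketch of the try/finally advice, assuming a made-up shared balance and withdraw function: the lock is released even when the protected code raises, so a failed call cannot leave later callers waiting forever.

```python
import threading

lock = threading.Lock()
balance = {"amount": 100}  # hypothetical shared state

def withdraw(amount):
    lock.acquire()
    try:
        if amount > balance["amount"]:
            raise ValueError("insufficient funds")
        balance["amount"] -= amount
    finally:
        lock.release()  # runs even when the ValueError propagates

try:
    withdraw(500)  # fails, but must not keep the lock
except ValueError:
    pass

withdraw(30)  # would hang forever if the failed call still held the lock
print(balance["amount"], lock.locked())
```

Writing "with lock:" around the body does the same acquire/release pairing in one line.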

This can get arbitrarily complicated. There be dragons. Avoid at all costs until you know what you're doing (-> never use).

Processes vs Threads

Processes:

  • heavyweight concurrency
  • less shared state between processes (disk, etc., but not memory)

os.fork() on UNIX; also see 'multiprocessing' in Python 2.6.

How do you communicate between processes? Through the OS. But it's slow:

Interprocess Communication (IPC) techniques:
  • shared memory
  • pipes
  • domain or network sockets
  • files
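
As one illustration of the pipe technique, here is a minimal UNIX-only sketch built on os.fork() and os.pipe(). The square_in_child function and the counter variable are invented for the example; the counter is there only to show that the child's memory changes are not visible to the parent.

```python
import os

counter = 0  # each process gets its own copy after the fork

def square_in_child(n):
    """Fork a child that computes n*n and sends it back through a pipe."""
    global counter
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: write the answer, then exit without returning.
        try:
            os.close(read_fd)
            counter += 1  # changes the child's copy only
            os.write(write_fd, str(n * n).encode("ascii"))
            os.close(write_fd)
        finally:
            os._exit(0)
    # Parent process: read until the child closes its end of the pipe.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        data = reader.read()
    os.waitpid(pid, 0)  # reap the child
    return int(data)

print(square_in_child(7), counter)
```

The result has to be turned into bytes and parsed again on the other side, which is part of why IPC costs more than sharing memory between threads.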

Threads:

  • lighter-weight concurrency
  • shared memory space DANGER WILL ROBINSON
  • atomicity suddenly becomes important

'threading' module on Python 2.2+.

create, start, do stuff, 'join'
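
The create/start/join sequence looks roughly like this with the threading module. The count_words function and the sample strings are made up for the example; each thread writes only to its own slot of the results list, so no lock is needed here.

```python
import threading

def count_words(text, results, index):
    """Do stuff: count words and store the answer in this thread's slot."""
    results[index] = len(text.split())

texts = ["the quick brown fox", "jumps over", "the lazy dog"]
results = [None] * len(texts)

# create
threads = [
    threading.Thread(target=count_words, args=(text, results, i))
    for i, text in enumerate(texts)
]
# start
for t in threads:
    t.start()
# join: wait for every thread to finish before reading the results
for t in threads:
    t.join()

print(results)
```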

Avoiding shared state

Concurrency and stochasticity will only bite you when you need to communicate, either explicitly (IPC) or implicitly (shared state).

Any writeable, shared state must be protected: "threadsafe".

Module globals, class globals, shared namespaces... anything that can be accessed outside of the function that creates it is bad. Local variables are isolated and hence safe, until you say otherwise.

(This is why reentrancy came up in iterators: only the calling process could get a handle on the iterator object!)
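
One way to keep threads away from shared state, sketched under the assumption that each worker only needs to hand back a single result: do all the work in local variables and pass the result through a queue.Queue, which does its own locking. The function and variable names are invented for the example.

```python
import queue
import threading

def sum_of_squares(numbers, out_queue):
    total = 0               # local variable: invisible to other threads
    for n in numbers:
        total += n * n
    out_queue.put(total)    # the single, thread-safe hand-off point

chunks = [range(0, 100), range(100, 200), range(200, 300)]
out_queue = queue.Queue()
threads = [
    threading.Thread(target=sum_of_squares, args=(chunk, out_queue))
    for chunk in chunks
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Results arrive in whatever order the threads finished; a sum does not care.
grand_total = sum(out_queue.get() for _ in chunks)
print(grand_total)
```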

Python, concurrency, atomicity, and locking

Python->C calls are automatically atomic, so most "low-level" Python data type operations are atomic.

All other Python calls are not, so e.g. functions in threads may execute in parallel (that's the point!)

...well, not really. It's complicated.

Digression: I/O vs CPU-intensive multitasking

  • blocking or slow I/O calls are one reason to use multitasking
  • executing CPU-intensive tasks in parallel is another

Python multithreading is good for the first, NOT good for the second: in CPython the global interpreter lock (GIL) lets only one thread run Python code at a time, so only specially written C code actually runs in parallel.
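
The I/O case can be sketched with time.sleep() standing in for a blocking network or disk call; that substitution, and the function names, are assumptions made for the example. Threads that are blocked waiting do not hold up one another, so the waits overlap.

```python
import threading
import time

def slow_io(seconds):
    time.sleep(seconds)  # stand-in for a blocking network or disk call

def run_serial(n, seconds):
    """Make n blocking calls one after another; return elapsed time."""
    start = time.perf_counter()
    for _ in range(n):
        slow_io(seconds)
    return time.perf_counter() - start

def run_threaded(n, seconds):
    """Make n blocking calls in n threads; return elapsed time."""
    start = time.perf_counter()
    threads = [threading.Thread(target=slow_io, args=(seconds,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

serial = run_serial(4, 0.1)
threaded = run_threaded(4, 0.1)
print("serial: %.2fs, threaded: %.2fs" % (serial, threaded))
```

Replacing the sleep with a pure-Python number-crunching loop would typically remove the speedup, which is the second half of the point above.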

---

Bottom line: concurrency is not for the faint-hearted and is best done as simply as possible.