Lecture 3 Outline / September 9th, 2008 ======================================= "There is always an easy solution to every problem -- neat, plausible and wrong." -- H.L. Mencken Homework Stuff ============== Office hours switched to Wednesday, 8:30-10pm. Homeworks will still be handed out on Tuesday, but will be due on the following Thursday instead of Tuesday. Let me know if that's a problem. There's a formatting error in HW #2: to end the echo server, it should be ``\n.\n`` instead of ``n.n``. Grading will **not** be on a curve unless it's good for all students. Iterators --------- Consider iterating over stacks vs lists -- one would be re-entrant, one is not. Resetting the count does not count, because you cannot iterate over it multiple times: :: class Iterator(object): def next(self): if self.count > self.max_count: self.count = -1 # or __init__ raise StopIteration ... (Among other things, it doesn't pass the dual for-loop test.) Think of the object returned by __iter__ as the counter information; you can return a *new* counter for every iterator request, *or* you can return the same one. The first makes things re-entrant. Also, minimize unnecessary code if you can: several people did this, :: class ReentrantFibonacciIterator(object): ... def __iter__(self): return FibonacciIterator(self.n) def next(self): ... The 'next' function is now completely useless and never called -- **why is it there**? This sort of thing offends me and makes me grouchy. Generators ---------- For generators, look at this example: :: def fn(): print 'at 0' yield 0 print 'at 1' yield 1 print 'at 2' yield 2 print 'at 3' yield 3 When you call 'fn', you get a **generator object** that holds internal state (basically a pointer to the next line of code to be executed, plus a copy of the local variables). Each successive call to ``next()`` advances the pointer to the next ``yield``. Thoughts on doing homework -------------------------- Stringing together code examples may be easier than reinventing, or not - YMMV. No penalty for swiping code in this class; heck, it's encouraged. Note I did give you most of the necessary code. If you're having trouble with the primes for HW #2, try taking the fib stuff I sent out and adapting that. Networking ========== Blocking vs non-blocking I/0 - blocking I/O **blocks** the thread of execution until input/output is available. This means that you must use multiple threads of execution (threading, multiprocess) OR special monitoring functions (select) to handle multiple connections with blocking I/O in a single thread of execution. - non-blocking I/O doesn't block, but returns immediately with data OR aborts and signals an error ("no data available, boss"). (how do you distinguish between "data available" and error? guesses?) Synchronous vs asynchronous I/O - synchronous is like a conversation: you speak, then I speak. - asynchronous is like an unfriendly argument: everyone speaking at once. Synchronous vs asynchronous **processing** - synchronous processing waits for one "conversation" to be over before starting another one. - synchronous processing is BAD: inefficient, doesn't scale, resource intensive... - asynchronous processing relies on built-in language or OS features to "keep track" of several connections at once. It can be done inefficiently (see HW #3, "busy wait") or efficiently (threading/processes, or select); the latter depends more on language/OS features. HTTP and the Web ================ The Web uses HTTP (hypertext transport protocol) as its primary transport protocol. Note, HTTP can be used to transport any content, not just "hypertext" (e.g. HTML) but also things like large files. However, HTTP servers are usually optimized for many connections sending fairly small loads of data. URLs: http://machine:port/path/to/resource?query_stuff - http is the protocol - machine is a DNS name or an IP address - port is a TCP port - /path/to/resource is the on-server path, can be anything you want, although many Web servers have canonical URL schemes (e.g. CGI scripts). - query_stuff is a way of passing in values The HTTP protocol: - synchronous (query, response transaction pairs); AJAX adds asynchrony. - client/server in design, not peer to peer (although again JavaScript is changing things) - note, all header lines etc. end in (carriage return/line feed), or ``\r\n`` - basic protocol: port 80, requests like :: \r\n [ ] ended by blank line; for example :: GET / HTTP/0.9\r\n \r\n - answer is in format :: \r\n
\r\n \r\n for example :: HTTP/1.1 200 OK\r\n Content-length: 4567\r\n \r\n - GET, retrieval of page content; generally full URL is specified. - content-type specification (MIME types: 'text/html', 'image/jpeg') - data passing via query strings: query_string is set of pairs of name, value data, encoded suitably: name=value2&name2=value2 Obvious limitations for large data (submitting files!)... - POST, retrieval of page content AFTER data passing. data is passed after the header, as one long encoded stream of bytes. - "cookies" are small bits of data that are specific to a URL hierarchy, and are a name,value pair. They are stored on the client side -- really the *only* persistent thing stored on the client. - markup, html, forms - HTML ("HyperText Markup Language") is a bastardized XML-ish language: :: plain text There's a specific document format that we will mostly be ignoring :: ... [ other stuff ] [ actual content ] - note, HTTP is actually a very simple protocol, and HTML is quite simple (and "open source"). This drove the spread of both! Statelessness ------------- HTTP is designed around *statelessness* -- the only persistent state on the client side is in the cookies, and those contain only data relevant to the *server* (so you can e.g. switch IP addresses and clients easily). Connections are designed to be transient (request, receive, end) although there are optimizations on top of that now (HTTP/1.1). Online purchase example. Think of iterator and how annoying it is to hold state... See Wikipedia, `Stateless server `__; each transaction is treated independently, with cookies being the only thing kept across conversations. Thus HTTP is actually not just a synchronous protocol/conversation, it is a very *brief* conversation :). (POST connections, and GET connections with query strings, do retain some additional state - can cause problems in bookmarks & reload. These are **design issues** you will confront in building Web apps.) This design allows **separation in time (and space)**, and also drives you to design your Web app in such a way as to permit client side **loose coupling**, which is a good architectural principle for distributed systems, esp. when issues of intermittent connectivity, low bandwidth, etc. arise. **Strong** or **tight coupling** often seems very convenient (simpler to program!) but is, generally, a bad idea architecturally. (Remember, "easy to design and implement" often means you're painting yourself into a corner when it comes to the larger issues!) The Web has driven a shift in mass-market applications to loose coupling and distributed systems. Canalizing thought patterns in this way is ... good? Well, it's happening, anyway.