Python, MongoDB, and asynchronous web frameworks

Feb 14, 2012

A. Jesse Jiryu Davis
jesse@10gen.com
emptysquare.net
CPU-bound web service
[Diagram: Client <-socket-> Server]

No need for async

Just spawn one process per core
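
A minimal sketch of the "one process per core" idea, using only the standard library. The handler `cpu_bound_work` (repeated SHA-256 hashing) and the helper names `worker_count` and `handle_batch` are hypothetical stand-ins, not part of the original slides.

```python
import hashlib
import os
from multiprocessing import Pool


def cpu_bound_work(payload: bytes, rounds: int = 1000) -> str:
    """Hypothetical CPU-heavy request handler: repeated SHA-256 hashing."""
    digest = payload
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


def worker_count() -> int:
    """One worker process per core; fall back to 1 if the count is unknown."""
    return os.cpu_count() or 1


def handle_batch(payloads):
    """Spread the requests over a pool with one process per core."""
    with Pool(processes=worker_count()) as pool:
        return pool.map(cpu_bound_work, payloads)
```

When run as a script, `handle_batch([b"a", b"b"])` should be called under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.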
Normal web service
[Diagram: Client <-socket-> Server <-socket-> Backend (DB, web service, SAN, …)]

Assume backend is unbounded

Service is bound by:
• Context-switching overhead
• File descriptors
• Memory!
What’s async for?

Minimize resources per connection

Wait for backend as cheaply as possible
CPU- vs. Memory-bound
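[Diagram: a spectrum from CPU-bound to memory-bound. Crypto sits at the CPU-bound end, chat at the memory-bound end, and HTML generation is marked with a question mark.]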
HTTP long-polling (“COMET”)

E.g., chat server

Async’s killer app

Short-polling is CPU-bound: tradeoff
between latency and load

Long-polling is memory bound


“C10K problem”: kegel.com/c10k.html

Tornado was invented for this
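
To illustrate why long-polling is memory-bound, here is a rough sketch in which each idle client costs only one pending future. It uses the standard library's asyncio as a stand-in for Tornado, and `ChatRoom`, `poll`, `post`, and `demo` are hypothetical names chosen for this example.

```python
import asyncio


class ChatRoom:
    """Toy long-poll hub: each idle client costs one pending Future."""

    def __init__(self):
        self._waiters = []

    async def poll(self):
        # A long-poll request parks here until someone posts a message.
        future = asyncio.get_running_loop().create_future()
        self._waiters.append(future)
        return await future

    def post(self, message):
        # Wake every parked client with the new message.
        waiters, self._waiters = self._waiters, []
        for future in waiters:
            future.set_result(message)
        return len(waiters)


async def demo(num_clients=1000):
    room = ChatRoom()
    polls = [asyncio.ensure_future(room.poll()) for _ in range(num_clients)]
    await asyncio.sleep(0)  # let every client reach its wait point
    delivered = room.post("hello")
    replies = await asyncio.gather(*polls)
    return delivered, replies
```

Between `poll()` and `post()` the server does no work for the waiting clients; the only cost is the memory held by each parked request.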
Why is async hard to code?
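[Diagram, time running downward: Client sends a request to Server; Server sends a request to Backend and must store state while it waits; Backend sends its response to Server; Server sends its response to Client.]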
Ways to store state
Method             Example
Blocked threads    Django, WSGI
Greenlets          Gevent
Callbacks          Tornado, Node.js

[Diagram axis: Easy <-> Efficient]

Tradeoff between coding ease and memory efficiency
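
The difference in where state lives can be sketched with a toy example. The blocking handler keeps its state in local variables, while the callback handler must capture it in a closure. The backend functions and the `pending` queue are hypothetical stand-ins for a real backend and event loop.

```python
from collections import deque

pending = deque()  # toy event loop queue: callbacks waiting to run


def fake_backend_blocking(query):
    """Stand-in for a blocking backend call (hypothetical)."""
    return query.upper()


def fake_backend_async(query, callback):
    """Stand-in for a non-blocking backend call: the result arrives later."""
    pending.append(lambda: callback(query.upper()))


def handle_blocking(request):
    # State (request, data) lives in local variables on the thread's stack.
    data = fake_backend_blocking(request)
    return "%s -> %s" % (request, data)


def handle_with_callback(request, respond):
    # State must be captured explicitly; here the closure holds `request`.
    def on_data(data):
        respond("%s -> %s" % (request, data))

    fake_backend_async(request, on_data)


def run_pending():
    """Run queued callbacks until none are left."""
    while pending:
        pending.popleft()()
```

The blocking version reads top to bottom; the callback version splits one logical operation across two functions, which is the coding cost paid for not holding a thread per request.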
What’s a greenlet?

A.K.A. “green threads”

A feature of Stackless Python, packaged
as a module for standard Python

Greenlet stacks are stored on heap,
copied to / from OS stack on resume /
pause

Cooperative

Memory-efficient
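
The "cooperative" property can be illustrated with plain Python generators, used here only as a rough analogy: unlike a greenlet, a generator can pause only in its own frame, not from inside a nested call. The names `worker` and `run_round_robin` are hypothetical.

```python
from collections import deque


def worker(name, steps, log):
    for i in range(steps):
        log.append("%s:%d" % (name, i))
        yield  # cooperative pause: nothing preempts us, we give up control


def run_round_robin(tasks):
    """Resume each task in turn until all of them have finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished; do not reschedule it
        ready.append(task)
```

Each task runs until it chooses to pause, so two workers with two steps each interleave as a:0, b:0, a:1, b:1.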
Threads:
State stored on OS stacks

# pseudo-Python
sock = listen()
request = parse_http(sock.recv())
mongo_data = db.collection.find()
response = format_response(mongo_data)
sock.sendall(response)
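
A runnable sketch of the same thread-per-connection shape, with the MongoDB step left out so that it needs only the standard library. The `parse_http` shown here is a hypothetical, deliberately naive parser, and `serve_one` is an assumed helper name.

```python
import socket
import threading


def parse_http(raw):
    """Hypothetical parser: return the path from the request line."""
    return raw.split(b" ")[1].decode()


def handle(conn):
    # Each blocking call parks this thread; its locals stay on the OS stack.
    with conn:
        path = parse_http(conn.recv(4096))
        body = ("you asked for %s" % path).encode()
        conn.sendall(b"HTTP/1.0 200 OK\r\n\r\n" + body)


def serve_one(conn):
    """Spawn one thread for one connection, as a threaded server would."""
    thread = threading.Thread(target=handle, args=(conn,))
    thread.start()
    return thread
```

A pair from `socket.socketpair()` is enough to try it: pass one end to `serve_one` and write a request line such as `GET /hello HTTP/1.0` to the other.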
Gevent:
State stored on greenlet stacks

# pseudo-Python
from gevent import monkey; monkey.patch_all()

sock = listen()
request = parse_http(sock.recv())
mongo_data = db.collection.find()
response = format_response(mongo_data)
sock.sendall(response)
Tornado:
State stored in RequestHandler

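import tornado.web
from tornado.httpclient import AsyncHTTPClient

class MainHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        # Returns immediately; on_response runs when the fetch completes
        AsyncHTTPClient().fetch(
            "http://example.com", callback=self.on_response)

    def on_response(self, response):
        # format_response: hypothetical helper, as in the pseudo-Python above
        formatted = format_response(response)
        self.write(formatted)
        self.finish()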
Tornado IOStream

class IOStream(object):
    def read_bytes(self, num_bytes, callback):
        # Stored as num_bytes so the read_bytes method is not overwritten
        self.num_bytes = num_bytes
        self.read_callback = callback
        # io_loop and READ stand for the IOLoop instance and its read-event flag
        io_loop.add_handler(
            self.socket.fileno(), self.handle_events, events=READ)

    def handle_events(self, fd, events):
        data = self.socket.recv(self.num_bytes)
        self.read_callback(data)
Tornado IOLoop

class IOLoop(object):
    def add_handler(self, fd, handler, events):
        self._handlers[fd] = handler
        # _impl is epoll or kqueue or ...
        self._impl.register(fd, events)

    def start(self):
        while True:
            event_pairs = self._impl.poll()
            for fd, events in event_pairs:
                self._handlers[fd](fd, events)
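
The IOStream and IOLoop sketches above can be approximated in runnable form with the standard library's `selectors` module. This is a toy reactor in the same spirit, not Tornado's code; `MiniLoop`, `remove_handler`, `stop`, and the free function `read_bytes` are assumptions made for this example.

```python
import selectors
import socket


class MiniLoop:
    """Toy reactor in the spirit of the IOLoop sketch (not Tornado's code)."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()  # epoll, kqueue, ...
        self._running = False

    def add_handler(self, sock, handler):
        self._selector.register(sock, selectors.EVENT_READ, handler)

    def remove_handler(self, sock):
        self._selector.unregister(sock)

    def stop(self):
        self._running = False

    def start(self):
        self._running = True
        while self._running:
            for key, events in self._selector.select(timeout=1.0):
                key.data(key.fileobj, events)  # dispatch to the stored handler


def read_bytes(loop, sock, num_bytes, callback):
    """Register interest in `sock`; call `callback(data)` once it is readable."""

    def on_readable(ready_sock, events):
        loop.remove_handler(ready_sock)
        callback(ready_sock.recv(num_bytes))

    loop.add_handler(sock, on_readable)
```

The loop never blocks on any single socket: it sleeps in `select()` and wakes only to call whichever handler has data ready.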
Python, MongoDB, & concurrency

Threads work great with pymongo

Gevent works great with pymongo
  monkey.patch_socket(); monkey.patch_thread()

Tornado works so-so
  asyncmongo: no replica sets, only first batch, no SON manipulators, no document classes, …
  pymongo: OK if all your queries are fast; use extra Tornado processes
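
Why a synchronous driver is only acceptable when every query is fast can be shown with a small measurement. This sketch uses asyncio's event loop as a stand-in for Tornado's, and `time.sleep` as a hypothetical slow synchronous query; `measure_timer_delay` is an assumed name.

```python
import asyncio
import time


async def measure_timer_delay(blocking_seconds):
    """Return how late a 10 ms timer fires when the loop is blocked."""
    loop = asyncio.get_running_loop()
    fired = loop.create_future()
    start = time.monotonic()
    loop.call_later(0.01, lambda: fired.set_result(time.monotonic() - start))
    # Stand-in for a slow synchronous query (hypothetical): it blocks the
    # only thread, so no other handler or timer can run in the meantime.
    time.sleep(blocking_seconds)
    return await fired
```

Running `asyncio.run(measure_timer_delay(0.2))` reports a delay of at least the blocking time rather than 10 ms, because every other connection waits behind the blocking call.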
Introducing: “Motor”

Mongo + Tornado

Experimental

Might be official in a few months

Uses Tornado IOLoop and IOStream

Presents standard Tornado callback API

Stores state internally with greenlets

github.com/ajdavis/mongo-python-driver/tree/tornado_async
Motor

class MainHandler(tornado.web.RequestHandler):
    def initialize(self):
        # Tornado's per-request setup hook; overriding __init__(self)
        # would break the arguments Tornado passes to the constructor
        self.c = TornadoConnection()

    @tornado.web.asynchronous
    def get(self):
        # No-op if already open
        self.c.open(callback=self.connected)

    def connected(self, c, error):
        self.write('[')
        self.cursor = self.c.collection.find(callback=self.found)

    def found(self, result, error):
        for i in result:
            self.write(json.dumps(i))

        # Same attribute that connected() assigned
        if self.cursor.alive:
            self.cursor.get_more(callback=self.found)
        else:
            self.write(']')
            self.finish()
Motor (with Tornado Tasks!)

class MainHandler(tornado.web.RequestHandler):
    def initialize(self):
        # Tornado's per-request setup hook, used instead of __init__(self)
        self.c = MongoTornadoConnection()

    @tornado.web.asynchronous
    @gen.engine
    def get(self):
        yield gen.Task(self.c.open)
        self.write('[')
        cursor = self.c.db.collection.find(
            callback=(yield gen.Callback('find')))

        while cursor.alive:
            for i in (yield gen.Wait('find')):
                self.write(json.dumps(i))

        self.write(']')
        self.finish()
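
How a decorator in the style of `gen.engine` can turn callbacks back into straight-line code is sketched below with a toy driver. This is not Tornado's implementation: `engine`, `Task`, `fetch`, and the `pending` queue are simplified, hypothetical stand-ins.

```python
import functools
from collections import deque

pending = deque()  # toy event loop: callbacks scheduled for later


class Task:
    """Wraps a callback-style function so a generator can yield it."""

    def __init__(self, func, *args):
        self.func, self.args = func, args

    def start(self, callback):
        self.func(*self.args, callback=callback)


def engine(gen_func):
    """Drive a generator: run each yielded Task, resume with its result."""

    @functools.wraps(gen_func)
    def wrapper(*args, **kwargs):
        gen = gen_func(*args, **kwargs)

        def step(value=None):
            try:
                task = gen.send(value)
            except StopIteration:
                return
            task.start(step)  # resume the generator when the task calls back

        step()

    return wrapper


def fetch(key, callback):
    """Hypothetical async backend call: the result is delivered later."""
    pending.append(lambda: callback("value-of-" + key))


@engine
def handler(out):
    first = yield Task(fetch, "a")
    second = yield Task(fetch, "b")
    out.append((first, second))


def run_pending():
    """Run queued callbacks until none are left."""
    while pending:
        pending.popleft()()
```

The handler's state between the two fetches lives in the suspended generator frame, so no explicit callback methods are needed.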
Motor internals
[Diagram: time on one axis, stack depth on the other, with lanes for Client, RequestHandler, pymongo, and IOLoop. Labeled steps: request; start greenlet; IOStream.send(); switch(); return; schedule callback; IOStream.recv(); switch(); parse Mongo response; callback(); HTTP response.]
Motor internals: wrapper

import greenlet
import pymongo.collection
import tornado.ioloop

class TornadoCollection(pymongo.collection.Collection):
    def find(self, *args, **kwargs):
        # Remove the callback before delegating to pymongo's find()
        callback = kwargs.pop('callback')
        cursor = super(TornadoCollection, self).find(*args, **kwargs)
        tornado_cursor = TornadoCursor(cursor)
        tornado_cursor.get_more(callback)
        return tornado_cursor

class TornadoCursor(object):
    def __init__(self, cursor):
        self.__sync_cursor = cursor

    def get_more(self, callback):
        def _get_more():
            # Runs in a child greenlet, so pymongo's blocking-style
            # code can be paused while it waits on the socket
            result = self.__sync_cursor._refresh()
            tornado.ioloop.IOLoop.instance().add_callback(
                lambda: callback(result)
            )

        greenlet.greenlet(_get_more).switch()
        return None
Motor internals: fake socket

class TornadoSocket(object):
    @property
    def stream(self):
        if not self._stream:
            # Tornado's IOStream sets the socket to
            # be non-blocking
            self._stream = tornado.iostream.IOStream(
                self.socket)

        return self._stream

    def recv(self, num_bytes):
        child_gr = greenlet.getcurrent()
        def recv_callback(data):
            # Resume the paused greenlet; data becomes recv()'s return value
            child_gr.switch(data)

        self.stream.read_bytes(
            num_bytes, callback=recv_callback)

        # Pause this greenlet and hand control to its parent
        # until recv_callback switches back with the data
        return child_gr.parent.switch()
Motor

Shows a general method for
asynchronizing synchronous network
APIs in Python

Who wants to try it with MySQL? Thrift?
Questions?

A. Jesse Jiryu Davis

jesse@10gen.com

emptysquare.net