Python, MongoDB, and asynchronous web frameworks
A. Jesse Jiryu Davis
jesse@10gen.com
emptysquare.net
CPU-bound web service
[Diagram: Client ↔ socket ↔ Server]
• No need for async
• Just spawn one process per core
Normal web service
[Diagram: Client ↔ socket ↔ Server ↔ socket ↔ Backend (DB, web service, SAN, …)]
• Assume backend is unbounded
• Service is bound by:
  – Context-switching overhead
  – File descriptors
  – Memory!
What’s async for?
• Minimize resources per connection
• Wait for backend as cheaply as possible
CPU- vs. Memory-bound
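[Diagram: spectrum from CPU-bound to memory-bound. Crypto is toward the CPU-bound end, chat toward the memory-bound end; the position of HTML is marked as uncertain ("HTML?").]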
HTTP long-polling (“COMET”)
• E.g., chat server
• Async’s killer app
• Short-polling is CPU-bound: tradeoff between latency and load
• Long-polling is memory-bound
• “C10K problem”: kegel.com/c10k.html
• Tornado was invented for this
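To make the memory-bound point concrete, here is a minimal sketch of long-polling, assuming the standard library's asyncio in place of Tornado. `ChatRoom`, `poll`, `post`, and `demo` are hypothetical names for illustration only. Each parked request costs one Future rather than a thread or process.

```python
import asyncio


class ChatRoom:
    """Holds idle long-poll requests until a message arrives (hypothetical)."""

    def __init__(self):
        self._waiters = []  # one Future per parked client

    async def poll(self):
        # Park the request until someone posts a message.
        waiter = asyncio.get_running_loop().create_future()
        self._waiters.append(waiter)
        return await waiter

    def post(self, message):
        # Wake every parked client with the new message.
        waiters, self._waiters = self._waiters, []
        for waiter in waiters:
            if not waiter.done():
                waiter.set_result(message)
        return len(waiters)


async def demo(num_clients=1000):
    room = ChatRoom()
    polls = [asyncio.ensure_future(room.poll()) for _ in range(num_clients)]
    await asyncio.sleep(0)  # let every client reach its await
    delivered = room.post("hello")
    replies = await asyncio.gather(*polls)
    return delivered, replies
```

Running `asyncio.run(demo())` parks all the clients in a single thread and answers them with one `post`.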
Why is async hard to code?
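[Sequence diagram over time: Client, Server, Backend. The client sends a request to the server; the server sends a request to the backend and must store state until the backend's response arrives; the server then sends its response to the client.]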
Ways to store state
[Axis: Easy ↔ Efficient]
Method:            Example:
Blocked threads    Django, WSGI
Callbacks          Tornado, Node.js
Greenlets          Gevent
Tradeoff between coding ease and memory efficiency
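A minimal sketch of the callback approach, assuming a hypothetical non-blocking `backend_query` and a plain deque standing in for an event loop. None of these names come from Tornado. Because `start()` returns before the backend answers, anything needed later has to be stored on the handler object instead of on the stack.

```python
from collections import deque

pending = deque()  # stands in for an event loop's queue of ready callbacks


def backend_query(query, callback):
    # Hypothetical non-blocking backend call: returns at once and
    # delivers the result to `callback` later.
    pending.append(lambda: callback("result of " + query))


class Handler:
    """Callback style: state that must survive the wait lives on self."""

    def __init__(self, request):
        self.request = request  # stored explicitly, not on the stack
        self.response = None

    def start(self):
        backend_query(self.request, self.on_result)
        # start() returns here; its local variables are gone.

    def on_result(self, data):
        self.response = "%s -> %s" % (self.request, data)


def run_pending():
    # Run queued callbacks until none are left.
    while pending:
        pending.popleft()()
```

Several handlers can be in flight at once: call `start()` on each, then `run_pending()` completes them all in one thread.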
What’s a greenlet?
• A.K.A. “green threads”
• A feature of Stackless Python, packaged as a module for standard Python
• Greenlet stacks are stored on heap, copied to / from OS stack on resume / pause
• Cooperative
• Memory-efficient
Threads:
State stored on OS stacks
    # pseudo-Python
    sock = listen()
    request = parse_http(sock.recv())
    mongo_data = db.collection.find()
    response = format_response(mongo_data)
    sock.sendall(response)
Gevent:
State stored on greenlet stacks
    # pseudo-Python
    from gevent import monkey; monkey.patch_all()
    sock = listen()
    request = parse_http(sock.recv())
    mongo_data = db.collection.find()
    response = format_response(mongo_data)
    sock.sendall(response)
Tornado:
State stored in RequestHandler
    import tornado.web
    from tornado.httpclient import AsyncHTTPClient

    class MainHandler(tornado.web.RequestHandler):
        @tornado.web.asynchronous
        def get(self):
            AsyncHTTPClient().fetch(
                "http://example.com",
                callback=self.on_response)

        def on_response(self, response):
            formatted = format_response(response)
            self.write(formatted)
            self.finish()
Tornado IOStream
    class IOStream(object):
        def read_bytes(self, num_bytes, callback):
            # Stored under a private name so the attribute does not
            # shadow this method.
            self._read_bytes = num_bytes
            self.read_callback = callback
            io_loop.add_handler(
                self.socket.fileno(),
                self.handle_events,
                events=READ)

        def handle_events(self, fd, events):
            data = self.socket.recv(self._read_bytes)
            self.read_callback(data)
Tornado IOLoop
    class IOLoop(object):
        def add_handler(self, fd, handler, events):
            self._handlers[fd] = handler
            # _impl is epoll or kqueue or ...
            self._impl.register(fd, events)

        def start(self):
            while True:
                event_pairs = self._impl.poll()
                for fd, events in event_pairs:
                    self._handlers[fd](fd, events)
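The loop above can be approximated with the standard library alone. This is a minimal runnable sketch, assuming Python's `selectors` module as the stand-in for `_impl`. `MiniLoop` and `demo` are hypothetical names, and this is not Tornado's actual implementation.

```python
import selectors
import socket


class MiniLoop:
    """Tiny event loop: maps readable sockets to handler callbacks."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()  # epoll, kqueue, ...
        self._running = False

    def add_handler(self, sock, handler):
        # The handler rides along as the registration's `data`.
        self._selector.register(sock, selectors.EVENT_READ, handler)

    def remove_handler(self, sock):
        self._selector.unregister(sock)

    def stop(self):
        self._running = False

    def start(self):
        self._running = True
        while self._running:
            for key, events in self._selector.select(timeout=1.0):
                key.data(key.fileobj, events)
        self._selector.close()


def demo():
    loop = MiniLoop()
    reader, writer = socket.socketpair()
    received = []

    def on_readable(sock, events):
        received.append(sock.recv(1024))
        loop.remove_handler(sock)
        loop.stop()

    loop.add_handler(reader, on_readable)
    writer.sendall(b"ping")
    loop.start()  # returns once on_readable has called stop()
    reader.close()
    writer.close()
    return received
```

The one thread spends its waiting time inside `select()`, so an idle connection costs a registration and a callback rather than a blocked thread.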
Python, MongoDB, & concurrency
• Threads work great with pymongo
• Gevent works great with pymongo
  – monkey.patch_socket(); monkey.patch_thread()
• Tornado works so-so
  – asyncmongo
    • No replica sets, only first batch, no SON manipulators, no document classes, …
  – pymongo
    • OK if all your queries are fast
    • Use extra Tornado processes
Introducing: “Motor”
• Mongo + Tornado
• Experimental
• Might be official in a few months
• Uses Tornado IOLoop and IOStream
• Presents standard Tornado callback API
• Stores state internally with greenlets
• github.com/ajdavis/mongo-python-driver/tree/tornado_async
Motor
    import json
    import tornado.web

    class MainHandler(tornado.web.RequestHandler):
        def __init__(self):
            self.c = TornadoConnection()

        @tornado.web.asynchronous
        def get(self):
            # No-op if already open
            self.c.open(callback=self.connected)

        def connected(self, c, error):
            self.write('[')
            self.cursor = self.c.collection.find(callback=self.found)

        def found(self, result, error):
            for i in result:
                self.write(json.dumps(i))

            # Use the cursor saved in connected()
            if self.cursor.alive:
                self.cursor.get_more(callback=self.found)
            else:
                self.write(']')
                self.finish()
Motor (with Tornado Tasks!)
    class MainHandler(tornado.web.RequestHandler):
        def __init__(self):
            self.c = MongoTornadoConnection()

        @tornado.web.asynchronous
        @gen.engine
        def get(self):
            yield gen.Task(self.c.open)
            self.write('[')
            cursor = self.c.db.collection.find(
                callback=(yield gen.Callback('find')))

            while cursor.alive:
                for i in (yield gen.Wait('find')):
                    self.write(json.dumps(i))

            self.write(']')
            self.finish()
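The yield-based style can be illustrated without Tornado. The sketch below is a simplified analogy, not Tornado's `gen.engine`: a hypothetical `engine` decorator drives a generator, resuming it with each result as the matching callback fires. `fake_find`, `ready`, and `run_ready` are made-up stand-ins for an async call and the IOLoop's callback queue.

```python
from collections import deque

ready = deque()  # stands in for the IOLoop's callback queue


def fake_find(batch, callback):
    # Hypothetical async call: schedules callback(result) for later.
    ready.append(lambda: callback(list(batch)))


def engine(generator_function):
    """Drive a generator; each yielded value is a function taking a callback."""

    def wrapper(*args, **kwargs):
        gen = generator_function(*args, **kwargs)

        def step(value=None):
            try:
                start_async_call = gen.send(value)
            except StopIteration:
                return
            # Resume the generator with the result once the call completes.
            start_async_call(step)

        step()

    return wrapper


output = []


@engine
def handler():
    # Local variables survive each wait because the generator is suspended.
    first = yield lambda cb: fake_find([1, 2], cb)
    second = yield lambda cb: fake_find([3], cb)
    output.extend(first + second)


def run_ready():
    # Run queued callbacks until none are left.
    while ready:
        ready.popleft()()
```

Calling `handler()` only starts the first call; `run_ready()` then completes both steps in order. The handler reads top to bottom like blocking code, while its state lives in the suspended generator rather than on a thread's stack.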
Motor internals
[Sequence diagram with axes "time" and "stack depth". Participants: Client, RequestHandler, greenlet, pymongo, IOLoop. Labels: request; start; IOStream.send(); switch(); return; schedule callback; IOStream.recv(); switch(); parse Mongo response; callback(); HTTP response.]
Motor internals: wrapper
    class TornadoCollection(pymongo.collection.Collection):
        def find(self, *args, **kwargs):
            callback = kwargs.get('callback')
            del kwargs['callback']
            cursor = super(TornadoCollection, self).find(*args, **kwargs)
            tornado_cursor = TornadoCursor(cursor)
            tornado_cursor.get_more(callback)
            return tornado_cursor

    class TornadoCursor(object):
        def __init__(self, cursor):
            self.__sync_cursor = cursor

        def get_more(self, callback):
            def _get_more():
                result = self.__sync_cursor._refresh()
                tornado.ioloop.IOLoop.instance().add_callback(
                    lambda: callback(result)
                )

            greenlet.greenlet(_get_more).switch()
            return None
Motor internals: fake socket
    class TornadoSocket(object):
        @property
        def stream(self):
            if not self._stream:
                # Tornado's IOStream sets the socket to
                # be non-blocking
                self._stream = tornado.iostream.IOStream(
                    self.socket)

            return self._stream

        def recv(self, num_bytes):
            child_gr = greenlet.getcurrent()
            def recv_callback(data):
                child_gr.switch(data)

            self.stream.read_bytes(
                num_bytes, callback=recv_callback)
            return child_gr.parent.switch()
Motor
• Shows a general method for asynchronizing synchronous network APIs in Python
• Who wants to try it with MySQL? Thrift?
Questions?
• A. Jesse Jiryu Davis
• jesse@10gen.com
• emptysquare.net