A Tale of Two Leaks: An Introduction to Memsee
This is the first of a 3-part series of posts detailing my investigations into two separate memory leaks in edx-platform. In this post, I’ll describe memsee, a tool that Ned and I built during a previous memory investigation, which can provide insight while diagnosing a leak.
Design
Memsee is a tool for interactively investigating memory usage in a snapshot taken from a Python process. It loads a memory dump into a SQLite database and then provides a REPL for querying that database, naming objects for future investigation, and running graph-based queries to determine how objects are related.
The basic SQLite schema is quite simple: it consists of a table for objects (named obj) and a table of references between objects (named ref). The schema for the two tables is reproduced below:
CREATE TABLE obj (
address int primary key, -- the memory address of the object
type text, -- the type of the python object
name text, -- the name of the python object (for things like functions and classes)
value text, -- the value of the object (for strings and ints)
size int, -- the amount of memory allocated to the object
len int, -- the length of the object (for lists, etc)
mark int, -- whether the object is reachable from
-- the root (used during memsee garbage collection)
repr text -- a rough approximation of the python repr of the object
-- (limited by the information that is dumped by meliae)
);
CREATE TABLE ref (
parent int, -- the memory address of the object holding the reference
child int -- the memory address of the referenced object
);
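To make the schema concrete, here is a small sketch that builds it in an in-memory database with Python's standard sqlite3 module and runs one query that joins the two tables. The four objects and their references are made up for illustration; a real memsee database is filled from a meliae dump instead.

```python
import sqlite3

# The obj/ref schema described above, with the column comments trimmed.
SCHEMA = """
CREATE TABLE obj (
    address int primary key, type text, name text, value text,
    size int, len int, mark int, repr text
);
CREATE TABLE ref (parent int, child int);
"""

def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database holding a tiny, hypothetical object graph."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Made-up objects: a dict holding a list, which holds two strings.
    conn.executemany(
        "INSERT INTO obj (address, type, value, size, len, repr) VALUES (?, ?, ?, ?, ?, ?)",
        [
            (100, "dict", None, 280, 1, "dict"),
            (200, "list", None, 88, 2, "list"),
            (300, "str", "foo", 52, 3, "'foo'"),
            (400, "str", "bar", 52, 3, "'bar'"),
        ],
    )
    # Each ref row is one edge: parent holds a reference to child.
    conn.executemany(
        "INSERT INTO ref (parent, child) VALUES (?, ?)",
        [(100, 200), (200, 300), (200, 400)],
    )
    return conn

conn = make_demo_db()
# Total size of everything the list at address 200 directly references.
total = conn.execute(
    "SELECT sum(size) FROM obj WHERE address IN "
    "(SELECT child FROM ref WHERE parent = 200)"
).fetchone()[0]
print(total)  # 104
```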
Commands
Starting Memsee
> python memsee.py [DATABASE]
Start memsee. If DATABASE is supplied, connect to that database on startup. The database will not be created if it doesn’t exist.
Setting Up A Database
create
::> create DATABASE
Create a new memsee database to work in, and connect to it.
read
::> read FILE
Read a meliae memory dump into the active database as a new generation.
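A rough sketch of what reading a dump involves, assuming the meliae dump is a file of JSON lines in which each record carries address, type, size and refs keys, plus optional name, value and len keys. The load_dump helper is hypothetical, not memsee's actual loader: it ignores generations entirely and fills repr with a crude placeholder.

```python
import json
import sqlite3
from typing import Iterable

# Trimmed copy of the obj/ref schema shown earlier.
SCHEMA = """
CREATE TABLE obj (address int primary key, type text, name text, value text,
                  size int, len int, mark int, repr text);
CREATE TABLE ref (parent int, child int);
"""

def load_dump(conn: sqlite3.Connection, lines: Iterable[str]) -> int:
    """Insert one obj row per JSON record and one ref row per entry in "refs"."""
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rec = json.loads(line)
        addr = rec["address"]
        # Crude stand-in for repr: the name or value if present, else the type.
        rough_repr = str(rec.get("name") or rec.get("value") or rec["type"])
        conn.execute(
            "INSERT INTO obj (address, type, name, value, size, len, repr) "
            "VALUES (?, ?, ?, ?, ?, ?, ?)",
            (addr, rec["type"], rec.get("name"), rec.get("value"),
             rec.get("size"), rec.get("len"), rough_repr),
        )
        conn.executemany(
            "INSERT INTO ref (parent, child) VALUES (?, ?)",
            [(addr, child) for child in rec.get("refs", [])],
        )
        count += 1
    conn.commit()
    return count

# Usage with three hand-written records in the assumed format.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
sample = [
    '{"address": 1, "type": "dict", "size": 280, "len": 1, "refs": [2, 3]}',
    '{"address": 2, "type": "str", "size": 52, "len": 3, "value": "key", "refs": []}',
    '{"address": 3, "type": "int", "size": 24, "value": 7, "refs": []}',
]
print(load_dump(conn, sample))  # 3
```

In practice the lines would come from iterating over an open dump file rather than a hand-written list.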
open
::> open FILE
Connect to a previously created memsee database.
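The difference between create and open (and the start-up rule that memsee.py DATABASE never creates a missing database) maps naturally onto SQLite's connection modes. A minimal sketch, assuming plain sqlite3 underneath; create_db and open_existing_db are hypothetical names, not memsee's code:

```python
import sqlite3
from pathlib import Path

def create_db(path: str) -> sqlite3.Connection:
    """create: connect, creating the database file if it does not exist yet."""
    return sqlite3.connect(path)

def open_existing_db(path: str) -> sqlite3.Connection:
    """open / start-up: connect only if the database file already exists.

    mode=rw in an SQLite URI opens read-write but never creates the file,
    so a missing path raises sqlite3.OperationalError instead of silently
    producing an empty database.
    """
    uri = Path(path).resolve().as_uri() + "?mode=rw"
    return sqlite3.connect(uri, uri=True)
```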
Inspecting objects
select
::> select * from obj where type = 'dict' limit 10
# address type name value size len mark repr
-------- --------- ---- ---- ----- ------ ----- ------ ----------
#0.0 39981760 dict ∘ ∘ 1048 13 ∘ dict
#0.1 118896112 dict ∘ ∘ 1048 10 ∘ dict
#0.2 31427024 dict ∘ ∘ 1048 20 ∘ dict
#0.3 31172288 dict ∘ ∘ 3352 29 ∘ dict
#0.4 31463552 dict ∘ ∘ 664 8 ∘ dict
#0.5 16117648 dict ∘ ∘ 3352 71 ∘ dict
#0.6 20850512 dict ∘ ∘ 1048 10 ∘ dict
#0.7 16117360 dict ∘ ∘ 12568 144 ∘ dict
#0.8 16503472 dict ∘ ∘ 1048 7 ∘ dict
#0.9 16281728 dict ∘ ∘ 3352 41 ∘ dict
#0.10 16246816 dict ∘ ∘ 12568 235 ∘ dict
Execute a SQL select query against the connected memsee database.
Substitutions
Memsee will perform a number of substitutions in select commands that make traversing the object graph easier.
Children
&, when appended to a memory address, means “the addresses of all child objects”, and can be understood as
1234& <==> (select child from ref where parent = 1234)
This suffix can be repeated to traverse multiple levels of the object hierarchy.
Parents
Similar to &, ^ selects the memory addresses of the parents of the target address.
1234^ <==> (select parent from ref where child = 1234)
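The & and ^ substitutions amount to a textual rewrite of the query before it is handed to SQLite. The sketch below is one hypothetical way to implement that rewrite with a regular expression (expand_suffixes is an illustrative name, not memsee's code). Repeated suffixes nest, so every level after the first compares with in rather than =.

```python
import re

# A bare integer address followed by one or more '&' / '^' suffixes.
_SUFFIX_RE = re.compile(r"\b(\d+)([&^]+)")

def _expand(match: re.Match) -> str:
    expr = match.group(1)
    first = True
    for op in match.group(2):
        # '&' selects children of the current set; '^' selects parents.
        col, other = ("child", "parent") if op == "&" else ("parent", "child")
        cmp = f"= {expr}" if first else f"in {expr}"
        expr = f"(select {col} from ref where {other} {cmp})"
        first = False
    return expr

def expand_suffixes(query: str) -> str:
    """Replace every ADDRESS&... / ADDRESS^... token with nested subqueries."""
    return _SUFFIX_RE.sub(_expand, query)
```

For example, expand_suffixes("select * from obj where address in 1234&&") yields a query over the grandchildren of 1234, with the single-level subquery above nested inside a second one.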
Tree Traversal
path
The path command searches from one set of objects to another, following references outward from the starting set until it finds a path to the destination set. It prints the objects in the first path that is found.
::> path from "QUERY" to "QUERY" [reversed]
The reversed argument causes the traversal to follow references backwards (from child to parent).
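The search behind path can be thought of as a breadth-first traversal of the ref table. The sketch below is an illustrative BFS, not memsee's implementation: find_path and its reversed_ flag are hypothetical names. Because it is breadth-first, the first target it reaches lies on a shortest path, measured in references followed.

```python
import sqlite3
from collections import deque
from typing import Dict, Iterable, List, Optional

def find_path(conn: sqlite3.Connection, sources: Iterable[int],
              targets: Iterable[int], reversed_: bool = False) -> Optional[List[int]]:
    """Return the addresses on the first path found from a source to a target."""
    # Forward search follows parent -> child; reversed follows child -> parent.
    step = ("SELECT parent FROM ref WHERE child = ?" if reversed_
            else "SELECT child FROM ref WHERE parent = ?")
    goal = set(targets)
    came_from: Dict[int, Optional[int]] = {}
    queue = deque()
    for s in sources:
        if s not in came_from:
            came_from[s] = None  # sources have no predecessor
            queue.append(s)
    while queue:
        node = queue.popleft()
        if node in goal:
            # Walk the came_from links back to a source, then reverse.
            path = []
            cur: Optional[int] = node
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        for (nxt,) in conn.execute(step, (node,)).fetchall():
            if nxt not in came_from:
                came_from[nxt] = node
                queue.append(nxt)
    return None  # no target is reachable

# Tiny hypothetical graph: 100 -> 200 -> 300 and 200 -> 400.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ref (parent int, child int)")
conn.executemany("INSERT INTO ref VALUES (?, ?)", [(100, 200), (200, 300), (200, 400)])
print(find_path(conn, [100], [400]))                  # [100, 200, 400]
print(find_path(conn, [400], [100], reversed_=True))  # [400, 200, 100]
```

Querying one node at a time keeps the sketch short but would be slow on a dump with millions of references; the per-level "Added N paths to newly discovered nodes" lines in the example that follows hint that memsee instead expands a whole frontier per step.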
Example
::> path from "address = 110810832" to "address in 0&" reversed
Added 162 paths to newly discovered nodes
Added 224 paths to newly discovered nodes
Added 73884 paths to newly discovered nodes
Added 73698 paths to newly discovered nodes
Added 114334 paths to newly discovered nodes
Added 40842 paths to newly discovered nodes
Added 38 paths to newly discovered nodes
Added 38 paths to newly discovered nodes
Added 76 paths to newly discovered nodes
Added 304 paths to newly discovered nodes
Added 304 paths to newly discovered nodes
Added 1634 paths to newly discovered nodes
Added 1520 paths to newly discovered nodes
Added 18278 paths to newly discovered nodes
Added 567302 paths to newly discovered nodes
# address type name value size len mark repr
-------- ---------------- -------------------- -------------------- ------- ------ ----- ----- ----------
#0.0 110810832 XMLModuleStore ∘ ∘ 64 ∘ ∘ XMLModuleS
#0.1 109985944 list ∘ ∘ 104 3 ∘ list
#0.2 111043728 dict ∘ ∘ 1048 12 ∘ dict
#0.3 110780112 MixedModuleStore ∘ ∘ 64 ∘ ∘ MixedModul
#0.4 111033776 dict ∘ ∘ 280 1 ∘ dict
#0.5 110809360 LibraryToolsService ∘ ∘ 64 ∘ ∘ LibraryToo
#0.6 111031056 dict ∘ ∘ 280 5 ∘ dict
#0.7 111034352 dict ∘ ∘ 3352 41 ∘ dict
#0.8 110779280 LmsModuleSystem ∘ ∘ 64 ∘ ∘ LmsModuleS
#0.9 111050032 dict ∘ ∘ 3352 26 ∘ dict
#0.10 110783296 module open_ended_grading.u ∘ 56 ∘ ∘ open_ended
#0.11 12611920 dict ∘ ∘ 786712 6069 ∘ dict
#0.12 12619664 dict ∘ ∘ 3352 73 ∘ dict
#0.13 140244885039992 module sys ∘ 56 ∘ ∘ sys
#0.14 13270096 dict ∘ ∘ 3352 54 ∘ dict
#0.15 1294669280 frame ∘ wait 512 ∘ ∘ frame
This series continues in Part 1, where I describe tracking down a single large memory leak.