Some of the issues that a private Armory database would face:
1). Character information
The information stored for each character used to be in a single XML sheet, but since the Armory changes in August 2007 it has been split across 5 XML sheets (summary, skills, reputation, talents, and arena teams). Retrieving all of it therefore takes 5 separate HTTP requests per character, although a project that doesn't need the information on the other pages naturally doesn't fetch them. For example, WowJutsu only needs to see the gear guild members are wearing and probably retrieves just a single sheet per character.
Storing 5 separate sheets per character is also an issue; for my own crawling operation, I chose to amalgamate the separate XML sheets into a single sheet per character.
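As a rough illustration of amalgamating the sheets, the Python sketch below nests each sheet's root element under one `<character>` element using the standard-library `xml.etree.ElementTree`. It is a minimal sketch that assumes the sheets have already been downloaded as XML text; the wrapper element names (`character`, `sheet`) and the sheet keys are hypothetical choices for this example, not the Armory's own format.

```python
import xml.etree.ElementTree as ET

# Hypothetical labels for the five per-character sheets listed above;
# they are illustrative keys, not the Armory's actual URLs or file names.
SHEET_NAMES = ("summary", "skills", "reputation", "talents", "arenateams")


def merge_character_sheets(sheets: dict[str, str]) -> bytes:
    """Combine several XML documents into a single <character> document.

    `sheets` maps a sheet label to that sheet's raw XML text. Each sheet's
    root element is wrapped in <sheet name="..."> so its origin is preserved.
    """
    root = ET.Element("character")
    for name in SHEET_NAMES:
        if name not in sheets:
            continue  # skip sheets the crawler did not fetch
        wrapper = ET.SubElement(root, "sheet", name=name)
        wrapper.append(ET.fromstring(sheets[name]))
    return ET.tostring(root, encoding="utf-8")
```

Keeping each original root intact under a labelled wrapper means code written against a single Armory sheet can still find its elements, just one level deeper.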
2). Storage
XML as a storage format is very verbose; thankfully, it compresses incredibly well, so I used zlib compression to store each character sheet. Even so, one million characters equates to roughly 10 GB of storage space.
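A minimal sketch of the compression step using Python's built-in `zlib` module. The compression level of 9 (zlib's smallest-output setting) is an assumption, since the level actually used isn't recorded here, and the helper names are hypothetical.

```python
import zlib


def compress_sheet(xml_bytes: bytes, level: int = 9) -> bytes:
    """Compress one XML character sheet; zlib levels run from 0 (none) to 9 (smallest output)."""
    return zlib.compress(xml_bytes, level)


def decompress_sheet(blob: bytes) -> bytes:
    """Recover the original XML bytes from a compressed sheet."""
    return zlib.decompress(blob)


if __name__ == "__main__":
    # Made-up, repetitive XML standing in for a real character sheet.
    sample = b"<items>" + b'<item slot="head" name="Example Helm"/>' * 200 + b"</items>"
    packed = compress_sheet(sample)
    assert decompress_sheet(packed) == sample
    print(f"{len(sample)} bytes -> {len(packed)} bytes")
```

The sample here is artificially repetitive, so its ratio will be better than a real sheet's; the point is only that the repeated tag and attribute names which make XML verbose are exactly what zlib's dictionary-based compression removes.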
3). Filesystem
The choice of filesystem is important if you are going to store millions of files, and the directory structure matters just as much.
a. Some filesystems store directory contents as a ‘linked list’, so when a single directory holds thousands of files, looking up a file can be slow. In my own testing I experimented with the ext3 and ReiserFS filesystems on my Linux servers and found ReiserFS to be far faster.
b. For the directory structure I used “(Region: US, EU, TW) / (Battlegroup Name) / (Realm Name) / (Character Name).xml.z”; with ReiserFS this proved fast enough for random access to any character.
c. Filesystems also limit the maximum number of files they can store. On Linux filesystems such as ext3, every file needs an i-node, and the number of i-nodes is fixed when the filesystem is created.
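The layout from point b, and a check against the i-node limit from point c, can be sketched in Python as follows. `character_path` and `free_inodes` are hypothetical helper names, replacing `/` in names is an assumed sanitizing rule, and `os.statvfs` is only available on Unix-like systems. Some filesystems allocate i-nodes dynamically, so the free count is mainly meaningful on filesystems with a fixed i-node table such as ext3.

```python
import os
from pathlib import Path


def _component(part: str) -> str:
    # Hypothetical sanitizing rule: a "/" in a name would add an extra directory level.
    return part.replace("/", "_")


def character_path(base: Path, region: str, battlegroup: str,
                   realm: str, name: str) -> Path:
    """Build <base>/<region>/<battlegroup>/<realm>/<name>.xml.z."""
    return (base / _component(region) / _component(battlegroup)
            / _component(realm) / f"{_component(name)}.xml.z")


def free_inodes(path: str) -> int:
    """Free i-nodes available to unprivileged users on the filesystem holding `path` (Unix only)."""
    return os.statvfs(path).f_favail
```

Before writing a sheet, a crawler would call `target.parent.mkdir(parents=True, exist_ok=True)` on the returned path, and could stop early if `free_inodes` drops close to the number of characters still to be stored.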
4). Database
Storing all of the information in a database is infeasible, and ‘digesting’ the XML into a database-friendly structure is hugely complicated. Using the database as an index to the character information stored on the filesystem is far more efficient. Storing some metadata about each character (level, class, race, faction) in the database allows a data set to be selected quickly without parsing every stored character.
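Here is a minimal sketch of that index using Python's built-in `sqlite3`; the table layout, column names and helper functions are assumptions for illustration, and any relational database would serve the same role. Only the searchable metadata and the path to the compressed sheet live in the database, while the XML itself stays on the filesystem.

```python
import sqlite3

# Hypothetical schema: searchable metadata plus the on-disk path of each sheet.
SCHEMA = """
CREATE TABLE IF NOT EXISTS characters (
    region     TEXT NOT NULL,
    realm      TEXT NOT NULL,
    name       TEXT NOT NULL,
    level      INTEGER,
    char_class TEXT,
    race       TEXT,
    faction    TEXT,
    path       TEXT NOT NULL,  -- compressed XML sheet on the filesystem
    PRIMARY KEY (region, realm, name)
);
CREATE INDEX IF NOT EXISTS idx_level_class ON characters (level, char_class);
"""


def open_index(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the index database and make sure the schema exists."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def add_character(conn: sqlite3.Connection, region: str, realm: str, name: str,
                  level: int, char_class: str, race: str, faction: str,
                  path: str) -> None:
    """Insert or update one character's metadata row."""
    conn.execute(
        "INSERT OR REPLACE INTO characters VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (region, realm, name, level, char_class, race, faction, path),
    )
    conn.commit()


def paths_for(conn: sqlite3.Connection, level: int, char_class: str) -> list[str]:
    """Return the sheet paths of every indexed character with the given level and class."""
    rows = conn.execute(
        "SELECT path FROM characters WHERE level = ? AND char_class = ? ORDER BY path",
        (level, char_class),
    )
    return [row[0] for row in rows]
```

A query such as `paths_for(conn, 70, "Mage")` returns only the file paths to open, so just the matching sheets ever need to be decompressed and parsed.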