01/28/08, 7:17 AM   #17
Okoloth
Glass Joe
 
Night Elf Hunter
 
Kilrogg (EU)
I have been trawling the Armory periodically since May last year, and I have evolved my data collection and storage techniques along the way. As Kalroth pointed out, a lot of the delay in accessing the Armory comes from the huge amount of processing pushed onto the browser. When you are just fetching XML sheets this is not an issue.

Believe it or not, the Armory as a server is pretty good at sending back the raw XML sheets. My crawlers time every request, and a typical character sheet request takes less than 80 ms on average. There are noticeable busy spells at certain times of the day when responses take longer, and occasional oddities that can stall a single request for 60+ seconds until it times out, but that's normal for most heavily used websites.
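For anyone wanting to measure this themselves, here is a rough sketch of per-request timing in Python. It is not the code my crawlers run; `fetch_xml`, `timed_fetch` and `average_ms` are names made up for illustration, and the 60 second timeout is only an assumption based on the stalls mentioned above.

```python
import time
import urllib.request
from statistics import mean


def fetch_xml(url, timeout=60.0):
    """Fetch the raw response body; raises on a timeout or HTTP error."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def timed_fetch(url, fetch=fetch_xml):
    """Run one request and return (body, elapsed time in milliseconds)."""
    start = time.perf_counter()
    body = fetch(url)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return body, elapsed_ms


def average_ms(timings):
    """Mean response time over a list of per-request timings."""
    return mean(timings)
```

Passing the fetch function in as a parameter means you can swap in a stub and try the timing logic without touching the real servers.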

Back in the early days I could easily fetch hundreds of character sheets per second on a single crawler, but mostly I kept them throttled back to keep a low profile. Recently I noticed that the Armory servers have wised up to the requests they are receiving: above a certain threshold of requests per second they stop playing ball and reply with HTTP response headers and zero-length content. Throttling your request rate back resolves this in a matter of seconds. To achieve higher request rates you just need to have requests coming from more / different IP addresses, i.e. through several proxies.
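The throttling side can be sketched roughly like this: treat a zero-length body as the signal to slow down, and creep back up once real content returns. The class name, the delay values and the doubling / 0.9 factors are all assumptions picked for illustration; the actual threshold on the Armory side is unknown to me.

```python
import time


class AdaptiveThrottle:
    """Tracks the pause between requests, backing off on empty replies."""

    def __init__(self, min_delay=0.01, max_delay=5.0):
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.delay = min_delay

    def record(self, body):
        """Adjust the delay for one response; return True if it is usable."""
        if len(body) == 0:
            # Headers with zero-length content: we are over the threshold.
            self.delay = min(self.delay * 2, self.max_delay)
            return False
        # Real content came back, so ease the delay down again.
        self.delay = max(self.delay * 0.9, self.min_delay)
        return True


def crawl(urls, fetch, throttle, sleep=time.sleep, max_retries=5):
    """Fetch each URL, retrying a bounded number of times on empty bodies."""
    results = {}
    for url in urls:
        for _ in range(max_retries):
            body = fetch(url)
            usable = throttle.record(body)
            sleep(throttle.delay)  # pause before the next request
            if usable:
                results[url] = body
                break
    return results
```

A URL that still returns nothing after `max_retries` attempts is simply left out of the results, so one stubborn sheet cannot stall the whole crawl.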

As everyone else has said, having a bulk-data API onto the Armory data would be ideal, whether it's done by Blizzard or someone else.
 