We’ve previously posted about measuring what’s happening on Redbubble, but with NewRelic’s new plugin system our lives just got a bit easier. We’ve been using NewRelic for a while now and frequently rely on its application and server monitoring when diagnosing problems. The timescales it keeps data for are generally sane, integration with other tools is good, and the UI is easy to use. There is already a plethora of plugins waiting to be installed, although NewRelic’s definition of a plugin is not what we expected. Plugins are small applications (agents) that poll your apps and report back to NewRelic, identifying themselves with your licence key and an agent_guid. So installation isn’t all that straightforward, depending on what you want to monitor.
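For the curious, a plugin agent doesn’t do much more than gather a number each polling cycle and POST it. The sketch below shows roughly what that report looks like against the Platform API as we understand it; the component name, guid and metric name are invented for illustration, so check NewRelic’s plugin docs for the exact payload format.

```ruby
require 'net/https'
require 'socket'
require 'json'
require 'uri'

# Roughly what a plugin agent does each polling cycle: gather a value,
# then POST it to NewRelic, identified by licence key and agent_guid.
# The component name, guid and metric name below are made up for this
# example; consult NewRelic's plugin docs for the current payload format.
def report_to_newrelic(licence_key, query_time_ms)
  uri = URI.parse('https://platform-api.newrelic.com/platform/v1/metrics')

  payload = {
    agent: { host: Socket.gethostname, pid: Process.pid, version: '1.0.0' },
    components: [{
      name:     'Redbubble Solr',        # label shown in the NewRelic UI
      guid:     'com.example.solr',      # hypothetical agent_guid
      duration: 60,                      # seconds covered by this report
      metrics:  { 'Component/Solr/QueryTime[ms]' => query_time_ms }
    }]
  }

  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true

  request = Net::HTTP::Post.new(uri.path,
    'X-License-Key' => licence_key,
    'Content-Type'  => 'application/json',
    'Accept'        => 'application/json')
  request.body = payload.to_json

  http.request(request)
end
```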
A little while ago, we realised that having information about what was happening on Redbubble right now would be useful for many things, including tracking and alerting us to the sorts of issues that wouldn’t necessarily be caught by more traditional means (such as Airbrake alerts). Having come across this post by Etsy, we looked at statsd as a way to collect this information quickly without adding much processing overhead.
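Part of statsd’s appeal is that the client side is trivial: a metric is just a short string fired over UDP, so the app never waits for a response and a dropped packet only costs a data point. A minimal sketch in Ruby (the bucket names are made up; localhost:8125 is statsd’s default address):

```ruby
require 'socket'

# statsd metrics are plain strings pushed over UDP ("bucket:value|type"),
# so recording one is effectively free for the application.
STATSD = UDPSocket.new

def increment(bucket)
  STATSD.send("#{bucket}:1|c", 0, 'localhost', 8125)      # counter
end

def timing(bucket, ms)
  STATSD.send("#{bucket}:#{ms}|ms", 0, 'localhost', 8125) # timer
end

# Hypothetical usage: count completed orders and record checkout time.
increment('redbubble.orders.completed')
timing('redbubble.checkout.duration', 42)
```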
The task: Here at Redbubble we’ve been running Ruby on Rails since day one. We’re a small development team, so keeping up with even the latest stable release has been a struggle. Earlier this year we had a gap in product development and took the opportunity to move our stack forward. We’d been on Ruby 1.8.7 and Rails 2.3 for a year or two. After some investigation we decided we’d first move to Rails 3.0, which would make Ruby 1.9 an easier option. After significant library updates and code-compatibility changes, we were ready to release Rails 3.0 on Ruby 1.8.7.
Recently we noticed that one of our Solr slave processes (a piece of software we use to power search at Redbubble) was taking longer to respond to search queries than usual – enough to trigger a warning alert on our monitoring systems. On first inspection we saw that the machine was using a lot of memory (enough to go into swap space), but this was sitting at around the same level as our other server, which was not experiencing problems. A little more digging and we noticed, thanks to the JVM monitoring provided by NewRelic, that the Solr process was spending a lot of time (around 15% on average, but up to 50%) doing garbage collection…