Bringing Business Awareness to Your Operation Team

at Nagios World Conference, Saint Paul, MN

Managing an infrastructure with a global footprint can be a challenging task for a centralized operations team. Bringing business awareness to your operations is key to success. In this talk you will learn how the TubeMogul Operations Team manages its on-call rotation, how we centralize monitoring information from multiple datacenters for efficient on-call response, and how we define our business priorities with a business-focused dashboard.

Scaling on EC2 in a Fast-Paced Environment

at USENIX LISA'11, Boston, MA

Managing a server infrastructure in a fast-paced environment like a start-up is challenging. You have little time for provisioning, testing, and planning, yet you still need to prepare for scaling when your product reaches the tipping point. Amazon EC2 is one of the cloud providers that we experimented with while growing our infrastructure from 20 servers to 500 servers. In this paper we will go over the pros and cons of managing EC2 instances with a mix of BIND, LDAP, SimpleDB, and Python scripts; how we kept a smooth working process by using NFS, auto-mount, and shell scripting; why we switched from managing our instances with tailor-made AMIs and shell scripts to the official Ubuntu AMI, cloud-init, and Puppet; and finally, some rules we had to follow carefully to be able to handle billions of daily non-static HTTP requests across multiple Amazon EC2 regions.