Nicolas Brousse, a Cloud Technology Leader, became Director of Operations Engineering at Adobe (NASDAQ: ADBE) after the acquisition of TubeMogul (NASDAQ: TUBE). As TubeMogul's sixth employee and first operations hire, Nicolas has built and grown Adobe/TubeMogul's infrastructure over the past nine years from a handful of machines to over five thousand servers handling roughly 200 billion requests per day for clients such as Allstate, Chrysler, Heineken, and Hotels.com.
Adept at adapting quickly to evolving business needs and constraints, Nicolas leads a global team of site reliability engineers, cloud engineers, security engineers, and database architects who build, manage, and monitor Adobe Advertising Cloud's infrastructure 24/7, following a DevOps methodology. Nicolas is a frequent speaker at top U.S. technology conferences and regularly advises other operations engineers. Prior to relocating to the U.S. to join TubeMogul, Nicolas worked in technology for over 15 years, managing high-traffic systems and large user databases for companies like MultiMania, Lycos, and Kewego. Nicolas lives in Richmond, CA and is an avid fisherman and aspiring cowboy.
Technologies: Linux, Puppet, Python, Ruby, PHP, Java, Go, Jenkins, Graphite, Ganglia, Grafana, Nagios, Sensu, AWS, HAProxy, OpenStack, ZooKeeper, Kafka, Couchbase, MySQL, Elasticsearch, HBase, Hadoop, Ubuntu, Debian, Docker, containers, Kubernetes, KVM, TCP/IP, Open vSwitch, etc.
The success of the Public Cloud is beyond question. It enables companies to accelerate product development velocity and time to market with low operational friction. Many vendors have tried to tap this market with various Private Cloud solutions, with mixed success. In this case study, we will cover the true story of a small start-up, TubeMogul, growing large enough to build the foundations of the Adobe Advertising Cloud. Throughout its cloud journey, the operations engineering team's focus remained consistent: deliver a cost-effective and stable infrastructure. The challenge of scaling through hyper-growth is real: serving hundreds of billions of HTTP requests a day, with large volumes of data flowing and strict latency requirements. Beyond the multi-cloud discussion, our team approaches the challenge as part of its global infrastructure automation effort. After many TCO analyses and research and development efforts, the team delivered a final, but always evolving, implementation of a multi-cloud solution on top of a mix of Public Cloud services and Private Cloud solutions, based on OpenStack, with Public Cloud bursting capabilities. This talk will challenge your cloud strategy by exposing how TubeMogul, now the Adobe Advertising Cloud, moved part of its critical workload back from the Public Cloud to an in-house, opinionated framework, based on a hybrid of bare metal and OpenStack, with a large touch of automation.
After successfully moving a large workload from a Public Cloud to an OpenStack Private Cloud, the former TubeMogul Operations Engineering team tackled its next important step: Cloud Bursting. While experiencing hyper-growth on the Adobe Advertising Cloud, the team had to figure out a simple way to quickly provision new compute resources. Our latency-critical workloads need our core private resources, while other workloads can safely run in the public cloud. Cloud Bursting helped ensure rapid support of the business and provided a more flexible capacity planning strategy. Being able to burst some workloads back to the Public Cloud allowed the team to leverage the best of both public and private cloud.
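The placement logic behind this kind of bursting can be illustrated with a minimal sketch. This is not TubeMogul's actual scheduler; the function, names, and unit-based capacity model are hypothetical, but they capture the stated policy: latency-critical work is pinned to private capacity, and only the flexible overflow bursts to the public cloud.

```python
def plan_burst(required_units, private_capacity, latency_critical_units):
    """Hypothetical capacity-placement sketch for cloud bursting.

    Latency-critical work must run on private capacity; flexible work
    fills the remaining private capacity first, and anything left over
    bursts to the public cloud.
    """
    if latency_critical_units > private_capacity:
        raise ValueError("private cloud cannot hold all latency-critical work")

    spare_private = private_capacity - latency_critical_units
    flexible = required_units - latency_critical_units

    # Flexible work prefers spare private capacity (already paid for).
    flexible_on_private = min(flexible, spare_private)
    public_burst = flexible - flexible_on_private

    return {
        "private": latency_critical_units + flexible_on_private,
        "public_burst": public_burst,
    }


# Demand exceeds private capacity: 20 units burst to the public cloud.
print(plan_burst(required_units=120, private_capacity=100,
                 latency_critical_units=60))
# Demand fits entirely in the private cloud: nothing bursts.
print(plan_burst(required_units=80, private_capacity=100,
                 latency_critical_units=60))
```

Under this policy the public cloud acts as elastic overflow capacity, which is what makes capacity planning more flexible: the private fleet is sized for the steady latency-critical baseline rather than for peak demand.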
It can be easy to come up with a TCO analysis that would challenge any public cloud and make you think, "let's go in-house!" But what are the challenges, and is it really worth it? The TubeMogul Operations team worked through the technical challenges of building a private cloud. In this presentation you will learn how the team went from R&D to automated deployment of bare-metal servers, and finally migrated a large workload from a Public Cloud to its own Private Cloud infrastructure. We will detail how the team dealt with unexpected issues, and also how we chose the hardware, estimated capacity, stayed cost-effective, improved overall performance of the system, and gained better control and visibility.
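The kind of back-of-the-envelope TCO comparison alluded to above can be sketched in a few lines. All figures and function names here are hypothetical illustrations, not TubeMogul's actual numbers; a real analysis would also account for staffing, networking, data-center space, discounts, and utilization.

```python
def in_house_monthly(server_capex, amortization_months,
                     monthly_opex_per_server, servers):
    """Amortized monthly cost of an in-house fleet (hypothetical model)."""
    amortized_capex = server_capex / amortization_months
    return servers * (amortized_capex + monthly_opex_per_server)


def public_cloud_monthly(hourly_rate, instances, hours_per_month=730):
    """Monthly cost of on-demand public cloud instances (hypothetical model)."""
    return hourly_rate * instances * hours_per_month


# Illustrative numbers only: $7,200 servers amortized over 3 years,
# $100/month power/colo/support each, vs. $0.50/hour on-demand instances.
print(in_house_monthly(7200, 36, 100, servers=10))        # 3000.0
print(public_cloud_monthly(0.50, instances=10))           # 3650.0
```

A naive comparison like this often favors going in-house, which is exactly why the talk asks whether it is "really worth it": the spreadsheet rarely captures the engineering effort, lead times, and loss of elasticity that come with owning the hardware.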