Public Talks & Papers

Adobe Advertising Cloud: A Lean Puppet Workflow to Support a Multi-Cloud and Cloud-Bursting Infrastructure

October 12th, 2017 at PuppetConf 2017, San Francisco, CA

Building and scaling a multi-cloud solution that's enabled for cloud bursting is not a trivial task, and it requires a lot of automation. While experiencing hyper-growth on the Adobe Advertising Cloud, our operations engineering team had to frequently update and improve its workflow in order to stay nimble and allow fast delivery of new infrastructure. At TubeMogul/Adobe Advertising Cloud, we implemented a lean Puppet workflow that enables the operations engineering team to deploy and support a broad range of services in a complex environment that handles hundreds of billions of requests a day. With over 150 changes released per day on its production infrastructure, the team had to adjust and tune its processes to enforce quality and standards, review changes, and prevent systems from breaking. In this talk, you will learn how we implemented our infrastructure as code by leveraging tools like Puppet, Gerrit, Terraform, and Jenkins, which together enable our private and public cloud infrastructures across 12 locations on four continents.

Abstract | Slides

The Five Steps to Building a Successful Private Cloud

September 25th, 2017 on InfoQ

Increased competition among public cloud vendors, territorial regulations, and business demands have all contributed to a rise in multi-cloud strategies. In this article, Nicolas Brousse from Adobe explains five key components of a successful private cloud implementation.

Full Paper

From Start-Up To Fortune 500: Building A Lean Multi-Cloud Solution

June 9th, 2017 at Global Software Architecture Conference, Santa Clara, CA

The success of the Public Cloud is unquestionable. It enables companies to accelerate their product development velocity and their time to market with low operational friction. Many vendors have tried to tap this market with various Private Cloud solutions, with questionable success. In this case study, we will cover the true story of a small start-up, TubeMogul, growing big and building the foundations of the Adobe Advertising Cloud. Throughout its cloud journey, the operations engineering team's focus remained consistent: deliver a cost-effective and stable infrastructure. The challenge of scaling through hyper-growth is real: serving hundreds of billions of HTTP requests a day, with large volumes of data flowing and low latency required. Beyond the multi-cloud discussion, our team approached the challenge as part of its global infrastructure automation effort. After many TCO analyses and research and development efforts, the team delivered a final, but always evolving, implementation of a multi-cloud solution on top of a mix of Public Cloud services and OpenStack-based Private Cloud solutions, with Public Cloud bursting capabilities. This talk will challenge your cloud strategy by showing how TubeMogul, now the Adobe Advertising Cloud, moved part of its critical workload back from the Public Cloud to an in-house, opinionated framework based on a hybrid of bare metal and OpenStack, with a large dose of automation.

Abstract | Lanyrd

Adobe Advertising Cloud: The Reality of Cloud Bursting with OpenStack

May 8th, 2017 at OpenStack Summit, Boston, MA

After successfully moving a large workload from a Public Cloud to an OpenStack Private Cloud, the former TubeMogul Operations Engineering team tackled its next important step: Cloud Bursting. While experiencing hyper-growth on the Adobe Advertising Cloud, the team had to figure out a simple way to quickly provision new compute resources. Our latency-critical workloads need our core private resources, while other workloads can safely leverage the public cloud. Cloud Bursting helped ensure rapid support of the business and provided a more flexible capacity planning strategy. Being able to burst some workloads back to the Public Cloud allowed the team to leverage the best of both public and private cloud.

Abstract | Slides | Recording | Lanyrd | #theCUBE (YouTube)

Moving Large Workloads from a Public Cloud to an OpenStack Private Cloud: Is It Really Worth It?

July 12th, 2016 at USENIX SREcon16 Europe, Dublin, Ireland

It can be easy to come up with a TCO analysis that would challenge any public cloud and make you think, "let's go in-house!" What are the challenges, and is it really worth it? The TubeMogul Operations team went through the technical challenges of building a private cloud. In this presentation, you will learn how the team went from R&D to automated deployment of bare-metal servers, and finally migrated a large workload from a Public Cloud to its own Private Cloud infrastructure. We will detail how the team dealt with unexpected issues, and how we chose the hardware, estimated capacity, stayed cost-effective, improved the overall performance of the system, and gained better control and visibility.

Abstract | Slides | Recording

Look back, look now, see forward

June 22nd, 2016 at Velocity Santa Clara 2016, Santa Clara, CA

The Internet is increasingly complex and routinely experiences outages, instabilities, and attacks. While cloud providers, CDNs, and acceleration services may claim to be always available, that doesn’t mean that they’re “always reachable.” In fact, they are almost certainly experiencing a constant rate of low-level failure that is largely outside of your control yet is still impacting your users. But what Internet performance management tools are available to help you obtain visibility and shorten the mean time to innocence? What’s going on right now in the CDN, cloud, and Internet transit performance space?

Abstract | Recording

Mega Volume: How TubeMogul Leverages NetSuite

May 18th, 2016 at SuiteWorld16, San Jose, CA

TubeMogul handles over one trillion ad auctions per month. This session will showcase the successful integration that helped TubeMogul improve the order, fulfillment, and payment workflow for every one of those auctions. You'll be given a complete breakdown of the end-to-end workflow, as well as the technical and organizational implementation challenges overcome in handling this tremendous volume.

Abstract | Slides

Moving Large Workloads from a Public Cloud to an OpenStack Private Cloud: Is It Really Worth It?

April 7th, 2016 at USENIX SREcon16, Santa Clara, CA

It can be easy to come up with a TCO analysis that would challenge any public cloud and make you think, "let's go in-house!" What are the challenges, and is it really worth it? The TubeMogul Operations team went through the technical challenges of building a private cloud. In this presentation, you will learn how the team went from R&D to automated deployment of bare-metal servers, and finally migrated a large workload from a Public Cloud to its own Private Cloud infrastructure. We will detail how the team dealt with unexpected issues, and how we chose the hardware, estimated capacity, stayed cost-effective, improved the overall performance of the system, and gained better control and visibility.

Abstract | Slides

How TubeMogul Handles over One Trillion HTTP Requests a Month

November 12th, 2015 at USENIX LISA15, Washington, D.C.

TubeMogul grew from a few servers to over two thousand servers, handling over one trillion HTTP requests a month, each processed in less than 50 ms. To keep up with this fast growth, the SRE team had to implement an efficient Continuous Delivery infrastructure that allowed it to perform over 10,000 Puppet deployments and 8,500 application deployments in 2014. In this presentation, we will cover the nuts and bolts of the TubeMogul operations engineering team and how it overcame its challenges.

Abstract | Slides | Lanyrd

How TubeMogul Reached 10,000 Puppet Deployments in One Year

May 26th, 2015 at Puppet Camp, Silicon Valley, USA

TubeMogul grew from a few servers to over two thousand servers, handling over one trillion HTTP requests a month, each processed in less than 50 ms. To keep up with this fast growth, the SRE team had to implement an efficient Continuous Delivery infrastructure that allowed it to perform over 10,000 Puppet deployments and 8,500 application deployments in 2014. In this presentation, we will cover the nuts and bolts of the TubeMogul operations engineering team and how it overcame its challenges.

Slides | Recording | Lanyrd

Improving Operations Efficiency with Puppet

April 17th, 2015 at Puppet Camp, Paris, France

Talk from Puppet Camp Paris 2015 by Nicolas Brousse and Julien Fabre, presenting a Continuous Delivery workflow used by the Operations team that allowed it to deploy over 10,000 Puppet changes in 2014.

Slides | Recording | Lanyrd

Scaling Bleeding Edge Technology in a Fast-paced Environment

October 9th, 2014 at IDCEE 2014, Kiev, Ukraine

Over the past seven years, TubeMogul has dealt with varying loads and technologies, ending up with a platform that manages billions of requests a day. During this talk, I will introduce you to TubeMogul's business and some key points that helped us grow our product and infrastructure through this unique journey.

Slides | Recording | Lanyrd

Bringing Business Awareness to Your Operation Team

October 4th, 2013 at Nagios World Conference 2013, Saint Paul, MN

Managing an infrastructure with a global footprint can be a challenging task for a centralized operations team. Bringing business awareness to your operations is key to success. In this talk, you will learn how the TubeMogul Operations team easily manages its on-call rotation, how we centralize monitoring information from multiple datacenters for efficient on-call, and how we define our business priorities with a business-focused dashboard.

Abstract | Slides | Recording

Optimizing your Monitoring and Trending tools for the Cloud

September 28th, 2012 at Nagios World Conference 2012, Saint Paul, MN

Nowadays most start-ups use cloud solutions; while some will move from the public cloud to a hybrid solution, all have to deal with fast-growing infrastructure. In this presentation, I will go over a few solutions that we implemented at TubeMogul while growing from 20 servers to over 700 servers in four years and dealing with over 10 billion HTTP requests a day. With so much information and data every day, it's hard to get a good read on what really matters and to alert the right person. You will learn how we integrated Nagios with Google Calendar for easy on-call rotation management, how we centralized our Nagios information from five different data centers into a common dashboard, and how we produce daily maintenance reports for proactive action.

Abstract | Slides | Recording

Demo of a Web Application development workflow with Git and Gerrit

February 16th, 2012 at Yahoo Surf Cafe for #lspe Meetup

Scaling on EC2 in a Fast-Paced Environment

December 8th, 2011 at LISA11, Boston, MA

Managing a server infrastructure in a fast-paced environment like a start-up is challenging. You have little time for provisioning, testing, and planning, but you still need to prepare for scaling when your product reaches the tipping point. Amazon EC2 is one of the cloud providers that we experimented with while growing our infrastructure from 20 servers to 500 servers. In this paper, we will go over the pros and cons of managing EC2 instances with a mix of BIND, LDAP, SimpleDB, and Python scripts; how we kept a smooth working process by using NFS, auto-mount, and shell scripting; why we switched from managing our instances based on tailor-made AMIs and shell scripting to the official Ubuntu AMI, cloud-init, and Puppet; and finally, some rules we had to follow carefully to be able to handle billions of daily non-static HTTP requests across multiple Amazon EC2 regions.

Abstract | Full Paper | Slides | DBLP

Monitoring a Cloud Infrastructure in a Multi-Region Topology

September 29th, 2011 at Nagios World Conference 2011, Saint Paul, MN

Managing a server infrastructure in a fast-paced environment like a start-up is challenging. You have little time for provisioning, testing, and planning, but you still need to get ready to scale when your product reaches the tipping point. Amazon EC2 is one of the cloud providers that we use at TubeMogul. Within a growing infrastructure, monitoring over 500 servers across multiple regions can be challenging. In this presentation, we will go over the solutions we put in place to monitor over 6,000 services and 500 hosts in four different regions.

Abstract | Slides | Recording