Career Profile

Nicolas Brousse, a Cloud Technology Leader, became Director of Operations Engineering at Adobe (NASDAQ: ADBE) after the acquisition of TubeMogul (NASDAQ: TUBE). As TubeMogul's sixth employee and first operations hire, Nicolas has built and grown Adobe/TubeMogul's infrastructure over the past ten years from several machines to over eight thousand servers that handle roughly 350 billion requests per day for clients like Allstate, Chrysler, Heineken and Hotels.com.

Adept at adapting quickly to ongoing business needs and constraints, Nicolas leads a global team of site reliability engineers, cloud engineers, software engineers, security engineers, and database architects who build, manage, and monitor Adobe Advertising Cloud's infrastructure 24/7 and follow a DevOps methodology. Nicolas is a frequent speaker at top U.S. technology conferences and regularly gives advice to other operations engineers. Prior to relocating to the U.S. to join TubeMogul, Nicolas worked in technology for two decades, managing heavy traffic and large user databases for companies like MultiMania, Lycos and Kewego. Nicolas lives in Danville, CA and is an avid fisherman and aspiring cowboy.

Highlights:

  • Built from the ground up and lead a global team of 60 operations engineers (FTEs, vendor workers, contingent workers)
  • Global team with staff in 4 different time zones (Ukraine, China, India, US) to ensure 24/7 support (follow-the-sun)
  • Support a global product and engineering team of roughly 250 people
  • Built and support an infrastructure of roughly 8,000 assets across 6 data center locations in the US, Europe, and APAC
  • Built a multi-cloud solution with cloud bursting capabilities to support product scale and latency requirements
  • Designed and deployed a solution to deliver services in Mainland China with a POP in Beijing and direct connectivity to the HKG data center
  • Responsible for infrastructure P&L, with a goal on TI cost as a percentage of gross profit
  • Define the strategy and tactical plan to ensure SOC2/ISO/SOX compliance

Technologies: Linux, Puppet, Python, Ruby, PHP, Java, Go, Jenkins, Graphite, Ganglia, Grafana, Nagios, Sensu, AWS, HAProxy, OpenStack, ZooKeeper, Kafka, Couchbase, MySQL, Elasticsearch/ELK, Splunk, HBase, Hadoop, Ubuntu, Debian, Docker, Containers, Kubernetes, KVM, TCP/IP, Open vSwitch, etc.

Public Talks & Papers

Use of Self-Healing Techniques to Improve the Reliability of a Dynamic and Geo-Distributed Ad Delivery Service. (Won Best Disruptive Idea Award)

October 15-18, 2018 at the 29th IEEE International Symposium on Software Reliability Engineering (ISSRE 2018), Memphis, TN

The advertising industry faces numerous challenges in achieving its goal of targeting a given audience dynamically and accurately in order to deliver a meaningful brand message. Near real-time, low latency delivery of dynamic content, the sheer volume of information processed, and the sparse geographic distribution of the intended eyeball traffic all drive the complexity of building a successful experience for the end user and the brand. Additionally, the competitiveness of the industry makes it critical to preserve low operational expenses while delivering reliably at scale. In attempting to address the above, we have found that a distributed infrastructure that leverages public cloud providers and a private cloud with open infrastructure technologies can deliver dynamic advertising content with low latency while preserving its high availability. But network or physical utility infrastructures can’t be relied on to ensure the service dependability. We show that the complexity of the networks, the sparse geographic distribution of eyeballs, the risk of data center failures, and the increase of encrypted transactions call for thoughtful architectures. The introduction of modern practices, failure injections, and self-healing mechanisms allowed us to improve the service fault tolerance while optimizing for latency and significantly improve our service reliability.

Abstract | Award | Teaser

Adobe Advertising Cloud: A Lean Puppet Workflow to Support a Multi-Cloud and Cloud-Bursting Infrastructure

October 12th, 2017 at PuppetConf 2017, San Francisco, CA

Building and scaling a multi-cloud solution that's enabled for cloud bursting is not a trivial task, and requires a lot of automation. While experiencing hyper-growth on the Adobe Advertising Cloud, our operations engineering team had to frequently update and improve its workflow in order to stay nimble and allow fast delivery of new infrastructure. At TubeMogul/Adobe Advertising Cloud, we implemented a lean Puppet workflow that enables the operations engineering team to deploy and support a broad range of services in a complex environment that supports hundreds of billions of requests a day. With over 150 changes released per day on its production infrastructure, the team had to adjust and tune its processes to enforce quality and standards, to review changes, and to prevent systems from breaking. In this talk, you will learn how we implemented our infrastructure as code by leveraging tools like Puppet, Gerrit, Terraform, and Jenkins, which together enable our private and public cloud infrastructures across 12 locations and four continents.

Abstract | Slides

The Five Steps to Building a Successful Private Cloud

September 25th, 2017 on InfoQ

Increased competition among public cloud vendors, territorial regulations, and business demands have all contributed to a rise in multi-cloud strategies. In this article, Nicolas Brousse from Adobe explains five key components of successful private cloud implementation.

Full Paper
