A short while ago, an article in Wired described how physicists are about to rule Silicon Valley. The opening of the article resonated strongly with me, as I also used to tackle difficult research questions at the world’s most renowned laboratories for particle physics: CERN in Geneva, Switzerland and the Fermi National Accelerator Laboratory near Chicago, IL. For more than 10 years I tried to uncover the origins of our universe before transitioning to the private sector and joining Blue Yonder, a cloud-based company that delivers automated machine learning solutions in the retail space.
Why did I make this move? Has physics become boring? Well, yes and no. The LHC has yet to discover anything new – even the Higgs boson, discovered in 2012, turned out to have about the mass it had been expected to have. In fact, if it had been about 10% lighter, it might already have been discovered about 10 years ago at LEP, the predecessor of the LHC at CERN. My former PhD student Dr Harry Cliff, now curator at the British Science Museum and researcher at the University of Cambridge, has summarized it quite nicely in his TED talk: “Have we reached the end of physics?”
Admittedly, the end of physics has been touted many times – but if the LHC does not discover anything beyond the Higgs boson, it may well be that many open questions about our universe remain unanswered. Not because we haven’t yet found the right equation to describe the properties of the universe, but because fundamentally there is no answer, no underlying reason. Put bluntly, our very existence may just be a statistical fluke. Time will tell.
Apart from such fundamental deliberations, I found that the field of High Energy Physics has become strangely stagnant. Instead of pursuing the latest ideas and developing revolutionary new research methods, physicists have become stuck in their own world – while the world outside has started to move at an ever faster pace.
… to Data Science and Machine Learning Applications
Hence many physicists have turned away from the academic world to seek new challenges. In a way, the emerging field of “Data Science” gives us a new home. There isn’t a universal definition of a Data Scientist yet, and many people interpret what this field entails differently. For me, the key aspect is to create value from data, either academically by publishing papers and advancing knowledge, or in a business setting by helping customers achieve their ambitions or creating these ambitions in the first place. In this sense, Data Science is a very applied science and each project or research question addresses a specific point: Data Science is all about understanding data that has been captured from real-life processes. This includes working with data and developing new algorithms, machine learning techniques and models to generate predictions or inferences from data, which finally makes it possible to create value from data.
Let’s look at the different aspects:
- Working with Data
Data Science is all about data – and this data needs to be stored and processed in order to extract any value from it. How do physicists come into play in this environment? In most cases, solutions that can handle data processing at the volume, velocity or variety required by data scientists do not exist commercially – or are prohibitively expensive. For example, about half of my own PhD thesis was focused on the technical aspects of working with data: 15 years ago I was part of the central design and development team of FermiGrid. Our goal was to develop a globally connected computing infrastructure where researchers would sit at their desktops (laptops weren’t very common back then) and submit their requests – and “The Grid” would make sure that thousands of machine learning jobs would run across TB to PB of data, matching available compute resources with demand and moving the data across the globe automatically as required. Today, we would call it “The Cloud” or “Data Science Platforms”. Later, at the LHC, I designed a much more efficient file format for the LHCb experiment, as it was clear even before the LHC started to produce PB of data that we would not be able to handle it with the tools available at the time. This new method focused on using only relevant data, reducing the amount of data that has to be processed for the actual analysis to about 0.2% of the original data. Hence Big Data becomes Smart Data and much more manageable – although, of course, the buzzword “Smart Data” didn’t yet exist at the time…
At Blue Yonder, several members of the Data Science and software engineering teams are very active in the open source community as core committers to projects such as Apache Parquet or Apache Aurora, and have gained international recognition for these and many other contributions.
These anecdotes illustrate an important part of a physicist’s mindset: instead of detailing requirements and petitioning other departments to develop new software and infrastructure, we roll up our sleeves, acquire the necessary skills and get to work.
- Understanding Data
Understanding data is at the core of being an (experimental) physicist. The first part of creating value from data is understanding what they mean, what likely (and unlikely!) sources of error exist in the data, how errors should be dealt with and how the data should be cleansed. Understanding the setup and context is crucial to getting the most out of the data. Even the best machine learning algorithms assume that the data are a fair representation of the physical system, be it a physics experiment or, for example, a supermarket; this is a natural prerequisite. To put data analysis and assessment into the proper perspective, relevant domain knowledge must be built up. And while retail domain knowledge, for example, may not be part of a physicist’s training, one of the core skills of a physicist is to learn about new areas and dive deep into new fields. Therefore, a physicist applying their skill set to the retail environment, while certainly challenging, is no different from focusing on a new research area within particle physics.
- Machine Learning
Developing new machine learning algorithms and applying them is where physicists excel – the whole idea behind Blue Yonder was built on the fact that we can take all our skills and our own machine learning approach, NeuroBayes, and bring them to a new field. Prof Dr Michael Feindt, the founder of the company, and I spent many years at the most famous laboratories in the world developing a novel machine learning suite. Finding the Higgs boson, understanding the fundamental building blocks of our universe, optimizing the replenishment of a supermarket or determining the best price for a specific product at every moment differ only in context, not in method or skill.
- Bringing value to customers
A successful company is not a scientific experiment – this is the first and probably hardest lesson for a physicist to learn. This mentality is not in the repertoire we’re taught at university and, more often than not, it’s not our “natural” way of thinking. However, getting the right mix of scientists, engineers and business experts together is one of the crucial pieces of the puzzle in making companies successful. It’s not a one-way street or a matter of attaching sales and marketing departments to a scientific core – the journey of Blue Yonder shows that this only works if everybody is dedicated to the same mission: all teams must work together and pull in the same direction. Data Scientists need to learn how businesses work, how customers tick, what their needs are and how to communicate with them – and business and engineering experts need to understand how Data Scientists think and work, what is feasible in the short and long term, and what may sound great but isn’t a good idea in practice.
In my next blog article I will illustrate how physicists are disrupting retail with a real-world example.