We were very fortunate to host three talented students for a Data Science internship at MediaGamma. They interacted directly with our engineering and data science teams to gain real-world experience working as data scientists.
Tak Loon Ng (University College London, MSc. Web Science and Big Data Analytics) was kind enough to share his experience.
Describe the problem you're working on?
I am working on fraud detection in mobile online advertising. Publishers (content producers such as newspapers, game developers, etc.) can earn money by displaying ads to interested users. Advertisers are entities that pay to have their ads shown to a target audience, with a view to finding new customers for their products and services.
On the web, it is possible to create fake users that consume ads. This problem costs advertisers billions of dollars each year, because fake users will never buy their products and services. My research project at MediaGamma consists of detecting ad fraud on mobile devices.
What's your output at the end of the internship?
I use unsupervised network techniques (co-visit network) to detect ad fraud.
Describe a typical day?
At MediaGamma, I have the freedom to work remotely or at their office. I usually go to the office because it makes it easier to interact with my supervisor.
My supervisor helps keep me on track and focused on my research topic. He constantly challenges my assumptions and presses me to find evidence for my claims. His practical suggestions are invaluable. Nothing beats experience.
Can you describe any challenges you've had to overcome?
Even with a strong programming background, it is challenging to work with big data. It is worthwhile to learn Spark with a real-world dataset. Spark is a distributed computing engine especially suited for processing very large volumes of data. MediaGamma provides me with a processing environment that would otherwise be prohibitively expensive.
The lesson I have learned is to test and fail fast. When doing research on new problems, it is important to test ideas cheaply and move on fast.
What additional skills have you gained from this internship?
This internship gives me real-world experience of a data science project. A mathematical understanding of models and algorithms is essential to interpret their results.
What do you plan to do after the internship? Has this internship changed your mind about your career trajectory?
My plan is to apply what I have learned in this internship and in my MSc. to problems in other domains, either at work or in data science competitions. It is especially important to read research papers and test new ideas.
Anything else you'd like to add?
The main advantage of working in a startup like MediaGamma is the opportunity to learn the whole process of a data science project. In large companies, there is a division of work, and each team might concentrate on a specific part of the project. I also get fast responses from my supervisor and co-workers. Time spent on bureaucracy is minimal.