Photonics often deals with large data: this is nothing new. Digital images are an obvious example, especially in 3D imaging. A 3D image with 2000 x 2000 x 500 pixels, two color channels and 16-bit resolution per channel is 8 GB (gigabytes) in size. If such an image is captured 100 times, e.g. as part of a dynamic analysis, we are already talking about 800 GB of data. This is hardly a challenge for modern data storage devices, but this volume could be the result of just a single biological analysis. Performing this sort of experiment every hour using modern imaging systems, such as the ZEISS Lightsheet Z.1 microscope, yields about 19 TB of data within 24 hours.
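The arithmetic behind these numbers is simple enough to check in a few lines of Python (using decimal units, i.e. 1 GB = 10⁹ bytes):

```python
# Back-of-the-envelope data volumes for the 3D imaging example above
# (decimal units: 1 GB = 1e9 bytes, 1 TB = 1e12 bytes).

voxels = 2000 * 2000 * 500       # 3D image dimensions
channels = 2                     # color channels
bytes_per_sample = 2             # 16-bit resolution per channel

image_bytes = voxels * channels * bytes_per_sample
series_bytes = image_bytes * 100          # 100 time points
daily_bytes = series_bytes * 24           # one experiment per hour, 24 hours

print(f"single 3D image: {image_bytes / 1e9:.1f} GB")   # 8.0 GB
print(f"one time series: {series_bytes / 1e9:.0f} GB")  # 800 GB
print(f"24 hours:        {daily_bytes / 1e12:.1f} TB")  # 19.2 TB
```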
Everyone's talking about big data these days, but what does the term actually mean? It is used in the context of very different data sets, e.g. people's purchasing behavior or movement patterns, the operating or usage parameters of machines, financial transactions, or health data. It's easy to appreciate how analyzing this data could affect our day-to-day lives, e.g. for constructing models and making predictions. Consequently, the term is closely linked with the social transition to a (digital) information society. It is possible, for example, to trace the spread of flu epidemics based on the search behavior of internet users. However, big data also raises ethical and social questions ("Big Brother"): it can be disconcerting that personality profiles are created from large data quantities.
On an abstract level, big data refers to data quantities that are too large or too complex, change too quickly, or are too weakly structured to be evaluated using manual or traditional methods of data processing. The term often also encompasses the technologies for collecting, processing and evaluating such data. In addition to raising ethical and social questions, big data presents us with technical challenges, including (a) working with unstructured data and (b) scaling the processing to handle ever-increasing data quantities. Traditional data processing is often based on the relational model, in which fixed attributes are assigned to the data sets. This approach requires temporally stable, structured data – i.e. precisely what big data is not. This is why concepts such as machine learning and cloud-based infrastructures have become increasingly important for handling big data.
The large data quantities in photonics mentioned above (especially in optical imaging) tend to be well structured, so strictly speaking this is (traditional) large data rather than big data. That does not mean there are no challenges in dealing with this data, whose volume continues to grow and which can be created more and more quickly. Even with modern storage media, saving and transferring such huge volumes is an issue. Modern interfaces can transfer up to 2 GB/s to SSDs (solid-state drives), while write speeds of up to 0.5 GB/s and storage capacities of up to 16 TB per SSD are possible (both speed and capacity can be increased further using RAID technology). Fiber optic connections also enable non-local storage via the internet, with transfer rates of approx. 120 MB/s. Even if the data is structured, the question remains how it can be suitably represented on the storage medium so that it can be accessed efficiently for processing and visualization. These enormous data quantities are not only saved – things get exciting when they are processed and analyzed, which is the very reason they are acquired in the first place. Efficient access and processing are therefore vital, and the I/O (input/output) operations are often the bottleneck in data processing.
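To get a feel for why I/O becomes the bottleneck, one can compute how long it takes to move the 800 GB time series from the earlier example at the throughput figures quoted above:

```python
# Transfer times for the 800 GB time series from the imaging example,
# using the throughput figures quoted in the text (decimal units).

data_gb = 800.0

ssd_write_gbps = 0.5         # SSD write speed, GB/s
net_gbps = 120.0 / 1000      # ~120 MB/s internet transfer, in GB/s

ssd_seconds = data_gb / ssd_write_gbps
net_seconds = data_gb / net_gbps

print(f"SSD write:        {ssd_seconds / 60:.0f} min")  # ~27 min
print(f"network transfer: {net_seconds / 3600:.1f} h")  # ~1.9 h
```

Even under these idealized assumptions (sustained peak throughput, no protocol overhead), a single experiment keeps the fastest single SSD busy for roughly half an hour.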
No matter whether it's large data or big data: when faced with similar challenges, it helps to look for similar solutions. One of them is scalability through parallel architectures, which make parallel data processing and parallel data storage possible – albeit only with suitable parallel algorithms and interfaces. Processing techniques often do not require the entire data set at once (which would call for shared memory); a partial data set is frequently sufficient, and this is where distributed memory comes into play. The latter enables an extremely high degree of parallelism with minimal communication between the parallel threads. Opportunities for data preprocessing and compression (e.g. by eliminating redundant information) should be exploited already during data acquisition. This is where embedded systems – system-integrated platforms based on FPGA, GPU or ARM processors – are particularly useful. They continue to become more powerful and make smart systems possible.
Does big data also have its place in photonics alongside large data? How will the technical developments which accompany big data transform how we deal with traditional large data? And does photonics play an important role for big data?
Given that there hardly seems to be an area where big data has not left its mark (and where it hasn't yet, it soon will), it would be rather odd if photonics did not feature in the discussion of big data. Organizations such as SPIE (the International Society for Optics and Photonics) and the OSA (Optical Society of America) have recognized this and initiated community interactions, e.g. by launching new conferences such as “High-Speed Biomedical Imaging and Spectroscopy: Toward Big Data Instrumentation and Management” as part of Photonics West, or the annual Big Data Photonics Workshop organized by the OSA.
Many applications of optical technologies will be influenced by some form of big data, which will in turn impact the optical technologies themselves. Health care, for example, might change substantially in the future: diagnostics and the resulting therapy decisions will benefit greatly from big patient data. This will change optical diagnostic devices, because the role of optical methods will be redefined and new applications will become possible (computer-supported diagnosis, home care, etc.). In many cases, optical methods will depend on large analyzed data sets in order to provide clinically relevant information. Or to cite a different example: Industry 4.0 will hardly be able to handle complex processes without big data. Optical sensors will have new jobs to perform, and optical measuring systems will take on a new role in manufacturing processes.
And light is central to big data in many different ways. Despite all the achievements in digital signal transmission, optical systems still offer an excellent transmission bandwidth by comparison: an optical lithography system, for example, can currently transfer data at approx. 1 TB per second – and all microelectronic components of the digital revolution are manufactured using optical lithography. Optical technologies such as DWDM (dense wavelength-division multiplexing) also enable the fast, virtually loss-free transfer of data across large distances (optical telecommunications). And let's not forget: the various applications of photonics are an important source of data, from the photos continuously generated by almost 5 billion cellphone users all the way to the images and 3D point clouds created in production processes by optical measuring systems.
The entire situation can be summed up as follows: photonics continues to generate ever larger, well-structured data (e.g. images). In the application context, this large data will often become part of big data, effectively changing the applications and possibilities of photonics in the coming years. The answer to the question posed in the title of this blog post can thus be obtained by changing one word: there is big data and large data in photonics.