Data Warfare: How Big Data and Data Engineering are Revolutionizing Modern Combat.

Apr 30, 2023

As modern warfare continues to evolve, so do the technologies and strategies used on the battlefield. Today, warfare has become highly technical with a need for quick data-driven decisions. Soldiers and combat vehicles have now become an integrated part of a complex web of data systems.

One of the most significant developments in recent years has been the rise of big data and data engineering, which are revolutionizing the way militaries collect, store, process, and analyze data. With the ability to gather and process vast amounts of information in real-time, big data and data engineering have the potential to provide critical insights and decision-making capabilities on the battlefield, ultimately giving military leaders a significant advantage over their adversaries. In this article, we will explore the role of big data and data engineering in modern combat and how they are changing the face of warfare.

Military Comms and network in combat scenarios. Source: codancomms.com/sectors/military

In recent years, big data technologies have found wide use outside the military. Big tech companies, financial institutions, and government agencies use them to power data engineering pipelines and support massive data-intensive applications. Facebook, Netflix, and Amazon leverage these technologies to gain insights, enhance efficiency, and drive growth. Financial institutions use them for real-time fraud detection and risk management, while government agencies use them for cybersecurity and intelligence gathering. These data engineering use cases involve ingesting, processing, and analyzing massive volumes of data in real time, enabling organizations to make data-driven decisions that can drive growth, improve efficiency, and enhance security.

Cutting-edge computing and routing capability. The SAVOX Warrior Core System offers system modularity offering simple and easy integration of new systems. Source: info.savox.com/savox-tactical-information-system

Several data engineering workflows can be useful for combat control in militaries, namely:

Real-time data processing: With the help of streaming data processing tools such as Apache Kafka and Apache Spark, militaries can collect, process, and analyze data in real-time. This can be useful for monitoring troop movements, detecting enemy activity, and identifying potential threats on the battlefield.

Data integration: Data integration tools such as Apache NiFi and Talend can help militaries bring together data from various sources, including sensors, drones, satellites, and other military systems. This can help create a unified view of the battlefield, enabling military leaders to make more informed decisions.

Data cleansing and normalization: Military data can be messy and come in various formats. Data engineering workflows can help clean and normalize this data, making it easier to work with and analyze. Tools such as OpenRefine and Trifacta can be useful for this purpose.

Machine learning and AI: Military data can be used to train machine learning models that can help predict enemy movements, identify potential threats, and optimize military operations. Tools such as TensorFlow and PyTorch can be used for this purpose.

Data storage and retrieval: With the help of distributed storage systems such as Hadoop Distributed File System (HDFS) and Amazon S3, militaries can store and retrieve large amounts of data quickly and efficiently. This can be useful for storing intelligence reports, drone footage, and other critical military data.
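To make the integration and cleansing steps above more concrete, here is a minimal sketch in plain Python that maps records from differently shaped feeds onto one shared schema. The field names (unit, callsign, latitude, and so on) and the target schema are assumptions made up for illustration; they do not come from any real military system or from the tools named above.

```python
from datetime import datetime, timezone


def normalize_record(raw: dict) -> dict:
    """Map one source-specific record onto a shared schema (hypothetical fields)."""
    # Different feeds may name the same field differently (assumed aliases).
    unit = raw.get("unit_id") or raw.get("unit") or raw.get("callsign")
    lat = raw.get("lat", raw.get("latitude"))
    lon = raw.get("lon", raw.get("longitude"))
    ts = raw.get("ts", raw.get("timestamp"))
    if unit is None or lat is None or lon is None or ts is None:
        raise ValueError(f"record is missing a required field: {raw!r}")

    # Timestamps are assumed to arrive as epoch seconds or ISO 8601 strings.
    if isinstance(ts, (int, float)):
        when = datetime.fromtimestamp(ts, tz=timezone.utc)
    else:
        when = datetime.fromisoformat(ts)
        if when.tzinfo is None:
            # Treat naive timestamps as UTC (an assumption of this sketch).
            when = when.replace(tzinfo=timezone.utc)

    return {
        "unit_id": str(unit).strip().upper(),
        "lat": round(float(lat), 6),
        "lon": round(float(lon), 6),
        "ts": when.astimezone(timezone.utc).isoformat(),
    }
```

Once every feed passes through a function like this, downstream storage and analysis only need to understand one record shape, which is the point of the unified view described above.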

Special Series — AI and National Security — warontherocks.com

Eric Feige wrote an article in 2020 arguing that the US Army is in need of full-stack data scientists and analytics translators. Feige claims that the US Army must prioritize the training and deployment of full-stack data scientists and analytics translators in order to gain an advantage over rival nations on the modern battlefield. The ability to rapidly process and analyze large volumes of data is crucial in making quick data-driven decisions and gaining an edge in future conflicts. Full-stack data scientists and analytics translators can assist in various domains including social media analysis, computer vision algorithms for terrain and enemy locations, and predicting drivers of poor soldier performance. He argued that it is critical that the Army invest in training uniformed soldiers in this skill and develop formal courses to ensure a long-term solution. Failure to do so would cede the advantage to China, Russia, and other nations that continue to invest heavily in AI and analytics.

Kafka and Spark are two powerful data engineering tools that enable organizations to process large volumes of data in real time. Kafka is a distributed event streaming platform that is capable of handling millions of events per second, while Spark is a fast data processing engine that enables the execution of complex data transformations. Together, these technologies provide organizations with the ability to ingest, process, and analyze large volumes of data at scale, and transform that data into meaningful insights quickly. By leveraging Kafka and Spark, organizations can unlock the value of their data and gain a competitive advantage in today’s fast-paced business environment.

Apache Kafka’s continuous ingestion capabilities make it well suited to streaming massive volumes of data in real time, and when combined with Spark’s Structured Streaming, that data can be processed as it arrives. With SSL/TLS encryption, message compression options, and other features, Kafka enables organizations to ingest and stream data securely at very high rates, making it a strong candidate for combat scenarios that require real-time insights from vast amounts of data. With Spark’s ability to handle petabytes of data and thousands of nodes within a cluster, it is capable of scaling to meet the demands of even the largest data-intensive applications. This combination can allow military commands to gain a competitive edge, enabling them to make data-driven decisions and stay ahead of the curve.

Spark doing real-time processing of sensor-ingested data from Kafka brokers. Source: Author of the article.
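The Kafka-to-Spark path described above can be sketched with PySpark's Structured Streaming API. This is only an illustration under stated assumptions: the broker address, the topic name, and the JSON message fields (unit_id, heart_rate, ts) are hypothetical, and running it requires a Spark installation with the spark-sql-kafka-0-10 connector package available. It is not the pipeline the author used for the screenshots.

```python
def kafka_source_options(bootstrap_servers: str, topic: str) -> dict:
    """Build the options for Spark's built-in Kafka source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "latest",
    }


def start_heart_rate_query(spark, options: dict):
    """Start a query that averages heart rate per unit over 30-second windows."""
    # Imported here so these helpers can be loaded without Spark installed.
    from pyspark.sql.functions import avg, col, from_json, window
    from pyspark.sql.types import (
        DoubleType, StringType, StructField, StructType, TimestampType,
    )

    # Assumed shape of each JSON message on the topic (hypothetical fields).
    schema = StructType([
        StructField("unit_id", StringType()),
        StructField("heart_rate", DoubleType()),
        StructField("ts", TimestampType()),
    ])

    # Kafka delivers the payload as bytes in the "value" column.
    raw = spark.readStream.format("kafka").options(**options).load()
    readings = raw.select(
        from_json(col("value").cast("string"), schema).alias("r")
    ).select("r.*")

    # The watermark bounds how long Spark waits for late-arriving events.
    averaged = (
        readings.withWatermark("ts", "1 minute")
        .groupBy(window(col("ts"), "30 seconds"), col("unit_id"))
        .agg(avg("heart_rate").alias("avg_heart_rate"))
    )

    # Print updated window averages to the console as micro-batches complete.
    return averaged.writeStream.outputMode("update").format("console").start()
```

With a SparkSession named spark, a call such as start_heart_rate_query(spark, kafka_source_options("localhost:9092", "sensor-readings")).awaitTermination() would keep the query running; both arguments there are placeholders.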

Apache Spark is a powerful distributed computing framework that offers high performance and fault tolerance for large-scale data processing. It provides low latency and high throughput by caching data in memory and processing it in parallel across the nodes in a cluster. Spark’s fault tolerance is achieved through its resilient distributed datasets (RDDs), which record the lineage of transformations used to build them so that lost data partitions can be recomputed automatically. Its scalability is demonstrated through its ability to handle petabytes of data and thousands of nodes in a cluster. Much of Spark’s speed comes from this in-memory computation, and the same engine handles batch processing, stream processing, and machine learning tasks. Resource usage is managed through Spark’s dynamic allocation feature, which adds and removes executors as the workload changes. Integration with various data sources, including Hadoop, Cassandra, and Amazon S3, is facilitated through Spark’s connectors. Overall, Spark is a versatile and robust platform for data processing and analysis, offering advanced capabilities for handling large and complex datasets.

Spark in action. Console displays for live processing of data. Source: Author of the article.
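As a rough illustration of the dynamic allocation feature mentioned above, the sketch below collects the relevant Spark configuration keys and turns them into spark-submit arguments. The configuration keys are real Spark settings; the executor bounds are placeholder values, and the helper function is a hypothetical convenience, not part of Spark.

```python
# Spark settings that enable dynamic allocation. The executor bounds are
# placeholder values and would need tuning for a real cluster.
DYNAMIC_ALLOCATION_CONF = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "2",
    "spark.dynamicAllocation.maxExecutors": "20",
    # Lets Spark 3.0+ release executors without an external shuffle service.
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
}


def to_submit_args(conf: dict) -> list:
    """Render a config mapping as a flat list of --conf key=value arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args
```

The resulting list can be appended to a spark-submit command line, so that the number of executors grows when tasks are waiting and shrinks when executors sit idle.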

There are various ways in which data can be collected from troops and companies in a battle and sent to Kafka/Spark for analysis. One common method is through sensors and IoT devices that can be attached to soldiers, vehicles, and other equipment to collect real-time data on their location, movement, and vital signs. This data can be sent wirelessly to a central hub or data center, where it can be ingested by Kafka and processed by Spark for real-time analysis.

Live monitoring of processed data — Source: Author of this article.
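A sensor reading has to be serialized before it can be published to a Kafka topic. The sketch below shows one possible message shape using only the Python standard library. The Telemetry fields and the choice of JSON are assumptions for illustration; the actual send call depends on whichever Kafka client library is in use and is left out.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Telemetry:
    """One reading from a soldier-worn device (hypothetical fields)."""
    unit_id: str
    ts: str          # ISO 8601 timestamp in UTC
    lat: float
    lon: float
    heart_rate: int  # beats per minute


def to_kafka_record(reading: Telemetry) -> tuple:
    """Return the (key, value) byte pair a Kafka producer would send."""
    # Keying by unit keeps each unit's readings in one partition, in order,
    # when the producer's default key-hash partitioner is used.
    key = reading.unit_id.encode("utf-8")
    value = json.dumps(
        asdict(reading), sort_keys=True, separators=(",", ":")
    ).encode("utf-8")
    return key, value
```

Using the unit identifier as the message key is a design choice: it preserves per-unit ordering, at the cost of uneven partitions if a few units send far more data than the rest.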

Another method is through manual data entry by soldiers and personnel on the battlefield. This data can be entered into a mobile app or other software application that can then send the data to Kafka for ingestion and processing by Spark.

In addition, data can also be collected from other sources, such as satellite imagery, weather data, and social media feeds, which can be ingested into Kafka and analyzed by Spark to provide situational awareness and insights into battlefield conditions.

Savox Mounted Soldier System, Soldier Interface for Mission-Critical Information. Source: info.savox.com/savox-tactical-information-system

Savox Dismounted Soldier System, Soldier as an Integrated Part of Data Systems. Source: info.savox.com/savox-tactical-information-system

Central commands of armies can use these technologies in a variety of ways, including:

Real-time intelligence gathering: Spark’s stream processing capabilities and Kafka’s continuous ingestion allow for real-time analysis of intelligence data, enabling command centers to stay ahead of threats and make timely decisions.
Cybersecurity: Kafka’s ingestion capabilities can be used to ingest log data from various sources, and Spark can analyze that data in real-time to identify potential security threats and respond to them quickly.
Logistics optimization: Spark’s machine learning capabilities can be used to optimize logistics operations by analyzing supply chain data and predicting demand, enabling command centers to make better decisions and improve efficiency.
Sensor data analysis: Kafka’s ingestion capabilities can be used to ingest data from sensors on the battlefield, and Spark can analyze that data in real-time to provide insights into troop movements, weather conditions, and other factors that can impact military operations.
Simulation and training: Spark’s ability to handle massive amounts of data and perform complex computations can be used to power simulations and training exercises, enabling soldiers to train in realistic scenarios and improve their readiness.
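To show what the real-time log analysis in the cybersecurity use case above might compute, here is a plain-Python stand-in for a windowed aggregation: it counts failed logins per source in fixed time windows and flags the noisy ones. In practice Spark would perform this over a stream; the event format, window size, and threshold here are illustrative assumptions.

```python
from collections import Counter


def flag_noisy_sources(events, window_seconds=60, threshold=5):
    """Count failed logins per (window, source) and keep counts >= threshold.

    events is an iterable of (epoch_seconds, source, outcome) tuples, where
    outcome is "ok" or "fail" (an assumed log format).
    """
    counts = Counter()
    for ts, source, outcome in events:
        if outcome != "fail":
            continue
        # Align each event to the start of its fixed (tumbling) window.
        window_start = int(ts) - int(ts) % window_seconds
        counts[(window_start, source)] += 1
    return {key: n for key, n in counts.items() if n >= threshold}
```

For example, five failures from one address inside the same minute would be reported as a single entry keyed by the window start and that address, while isolated failures elsewhere would be ignored.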

Predictive Analytics in the Military — Current Applications — Image source: emerj.com

The data required for these processes can be collected from sensors that provide real-time awareness, improve decision-making, and enhance the safety and effectiveness of soldiers in the field.

The range of technical considerations for a real time physiological status monitoring system includes much more than the algorithms and physiological measurements discussed in this paper. Close partnerships with the developer and user communities are necessary to the actual implementation of a soldier useable system. Source: Reed Hoyt (USARIEM) and Jeffrey Palmer (MIT Lincoln Labs), unpublished.

Concepts for real time physiological status monitoring (RT-PSM) include a common sensors and communications architecture for a system that supports soldier readiness status and performance, and will also be able to support medical needs. Source: Friedl

Some of these sensors could be:

GPS sensors: provide real-time location information about soldiers in the field.
Biometric sensors: measure vital signs such as heart rate, respiration rate, and body temperature.
Acoustic sensors: pick up audio signals and transmit them back to the central command.
Environmental sensors: detect environmental factors such as temperature, humidity, and air quality.
Imaging sensors: capture images and video of the soldier’s surroundings, which can be useful for situational awareness and intelligence gathering.
Chemical sensors: detect the presence of hazardous materials or chemicals in the environment.
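Readings from biometric sensors like those listed above are usually smoothed before anyone acts on them, since a single sample can be noisy. The following sketch keeps a rolling window of recent samples and raises a flag when their mean crosses a limit. The class name, window size, and limit are hypothetical, and the default limit is a placeholder rather than a physiological recommendation.

```python
from collections import deque


class RollingVitalMonitor:
    """Track the rolling mean of one vital sign (illustrative sketch)."""

    def __init__(self, window: int = 5, upper_limit: float = 180.0):
        # deque with maxlen drops the oldest sample automatically.
        self.samples = deque(maxlen=window)
        self.upper_limit = upper_limit  # placeholder threshold

    def add(self, value: float) -> bool:
        """Record a sample; return True if the full window's mean is too high."""
        self.samples.append(value)
        window_full = len(self.samples) == self.samples.maxlen
        mean = sum(self.samples) / len(self.samples)
        return window_full and mean > self.upper_limit
```

Waiting for a full window before alerting trades a short delay for fewer false alarms from isolated spikes.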

Raytheon’s FoxTen Open Intelligence Platform for the U.S. Army

In summary, the integration of technologies like Kafka and Spark into military operations has transformed the way warfare is conducted. By collecting and analyzing real-time data from troops, vehicles, and other equipment on the battlefield, commanders can make informed decisions and adjust their strategies in real-time to respond to changing conditions.

With Kafka, data can be ingested and processed in near real-time, allowing for immediate situational awareness and faster decision-making. This enables troops to respond more quickly to threats and take advantage of new opportunities on the battlefield.

Moreover, Spark’s advanced analytics capabilities allow for the processing of massive amounts of data, enabling commanders to identify patterns and insights that would be impossible to discern with traditional methods. This can provide a significant advantage on the battlefield, allowing for more effective use of resources and personnel.

Finally, the integration of Kafka and Spark into military operations has transformed warfare, and will continue to do so, by providing unprecedented situational awareness and analytical capabilities. By leveraging these technologies, commanders can make faster, more informed decisions and gain a decisive edge in battle.


Written by Chuka J. Uzo
