Incorporating Sustainability into Data Architectures and Software

April 25, 2023

Data Architecture Choices for Lower Environmental Impact

In the quest for lower emissions and reduced energy consumption, companies can turn to data, or, more specifically, to data architecture. Data architecture refers to how data is managed throughout its lifecycle, from collection through consumption.

By applying a “sustainability” lens to decision-making regarding how data is handled, organizations can make more environmentally conscious choices. The following provides a few examples.

1. Smarter Data Storage

It takes energy to run data storage, whether on-premises or in the cloud. Reduce the amount of data you have, and you reduce the amount of storage required, as well as the energy to power that storage. Deduplication software and similar tools can help identify unnecessary data.
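
Deduplication products differ in how they work (many operate on blocks inside the storage system itself), but the core idea can be illustrated at the file level. The Python sketch below is a minimal example, assuming a local directory tree: it groups files by a SHA-256 hash of their contents so that byte-identical copies surface together. The function name `find_duplicate_files` is hypothetical.

```python
import hashlib
from pathlib import Path


def find_duplicate_files(root: str, chunk_size: int = 1 << 20) -> dict[str, list[Path]]:
    """Group files under `root` by SHA-256 digest of their contents.

    Returns only digests shared by two or more files (i.e., duplicates).
    """
    by_digest: dict[str, list[Path]] = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        digest = hashlib.sha256()
        with path.open("rb") as f:
            # Hash in chunks so large files never need to fit in memory.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        by_digest.setdefault(digest.hexdigest(), []).append(path)
    # Keep only digests that more than one file shares.
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}
```

Calling `find_duplicate_files(".")` reports candidate duplicates only; deciding which copy to keep, and whether to delete, archive, or link the rest, remains a policy decision.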

Implementing data classification and retention policies can also help by defining what data to store, where, and for how long. The idea is to minimize data movement, avoid overprovisioning storage, and retain data no longer than necessary.
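
A retention policy ultimately becomes a rule that a script or a storage lifecycle service can evaluate. As a sketch only, the Python below assumes a hypothetical mapping from classification labels to retention periods and reports which records have outlived theirs. The labels, periods, and `Record` type are illustrative assumptions, not recommendations.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical classification -> retention period mapping. Real periods
# come from legal, regulatory, and business requirements.
RETENTION = {
    "transient": timedelta(days=30),
    "operational": timedelta(days=365),
    "regulatory": timedelta(days=7 * 365),
}


@dataclass
class Record:
    key: str
    classification: str
    created: date


def expired(records: list[Record], today: date) -> list[str]:
    """Return keys of records held longer than their class's retention period."""
    out = []
    for r in records:
        limit = RETENTION.get(r.classification)
        if limit is None:
            # Unknown classifications are skipped, never auto-deleted.
            continue
        if today - r.created > limit:
            out.append(r.key)
    return out
```

Unclassified records are deliberately skipped rather than deleted, reflecting that classification has to come before retention.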

2. Modern Application Design

Applications use and process data, in many cases large volumes of it, and consume considerable energy-powered computing resources in doing so. Many legacy monolithic applications, built as a single unified unit, can be modernized into more resource-efficient microservices.

Microservices architecture breaks down monolithic applications into a collection of smaller independent units. Microservices are independently scalable and can be individually configured, so fewer resources are wasted. In addition, they can be auto-scaled to handle traffic peaks, so fewer resources are sitting around idle the rest of the time. Individual VM sizes can also be reduced to maximize utilization and lessen energy consumption.
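
Auto-scaling is normally handled by the platform, but the arithmetic behind it shows why fewer resources sit idle. The sketch below is modeled on the formula documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current replicas × current metric / target metric), including its default 10% tolerance band. The function name, the replica bounds, and the choice of metric (for example, average CPU utilization per replica) are assumptions for illustration.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Replica count for one service, modeled on the Kubernetes HPA formula:
    desired = ceil(current_replicas * current_metric / target_metric)."""
    ratio = current_metric / target_metric
    # Inside the tolerance band, keep the current count to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, four replicas averaging 90% CPU against a 60% target scale out to six, while the same four at 15% scale in to one. Because each microservice scales on its own metric, only the busy service grows.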

The catch is that decomposing a monolith into microservices replaces a handful of in-process calls with many microservices talking to each other over the network, which increases network traffic. To keep this in check, use the most appropriate transfer protocol for the traffic: consider implementing services in gRPC rather than REST, and compress large payloads before sending them over the wire.
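
gRPC and most HTTP stacks offer built-in compression settings, which are usually the better choice in practice. To show the trade-off itself, the Python sketch below serializes a payload to compact JSON and gzip-compresses it only above a size threshold, signaling this with an HTTP-style `Content-Encoding` header. The 1 KB threshold and the function names are assumptions.

```python
import gzip
import json

# Hypothetical threshold: compressing tiny payloads saves little and still
# costs CPU on both the sending and receiving side.
COMPRESS_THRESHOLD_BYTES = 1024


def encode_payload(obj) -> tuple[bytes, dict[str, str]]:
    """Serialize obj to compact JSON; gzip it if it is large enough to matter."""
    body = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if len(body) >= COMPRESS_THRESHOLD_BYTES:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    return body, headers


def decode_payload(body: bytes, headers: dict[str, str]):
    """Reverse encode_payload on the receiving side."""
    if headers.get("Content-Encoding") == "gzip":
        body = gzip.decompress(body)
    return json.loads(body)
```

Repetitive, structured payloads such as lists of records typically compress well, which is where the network savings come from.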

How you deploy apps makes a difference, too. For example, deploying applications in containers generally leads to a lower total energy cost than deploying them in separate VMs, because containers share the host operating system’s kernel instead of each running a full guest OS on a hypervisor.

3. Green Coding

When it comes to the code itself, some programming languages are more energy efficient than others. One research project that ranked programming languages by energy use, memory use, and run time found C, C++, and Rust to be the greenest. Meanwhile, according to an analysis using the Computer Language Benchmarks Game, Java is one of the fastest and most energy-efficient object-oriented programming languages. (Note: Comparing programming languages is complex, and a program written in any language can become faster through improvements to its source code, its libraries, or the compiler.)

But “green” coding isn’t just about the programming language. To reduce the energy needed to run their code, developers can build lean coding principles into their DevOps lifecycle. This means using the minimum amount of processing needed to deliver an application.

Lean coding also means eliminating unnecessarily long or slow code that uses resources inefficiently. Open-source code can contribute to software bloat because it’s designed to serve a broad range of applications; as a result, much of an imported library’s code may go unused by any one application.
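
As a small, hypothetical illustration of code that “uses resources inefficiently,” the two Python functions below return the same result. The first rescans a list for every lookup; the second builds a set once and then does constant-time lookups on average. The function names are illustrative only.

```python
def find_known_ids_slow(events: list[str], known: list[str]) -> list[str]:
    # Each `in` on a list scans it: about len(events) * len(known) comparisons.
    return [e for e in events if e in known]


def find_known_ids_lean(events: list[str], known: list[str]) -> list[str]:
    # Build a hash set once; each membership test is then O(1) on average.
    known_set = set(known)
    return [e for e in events if e in known_set]
```

For large inputs, the lean version does far less work, and therefore burns less CPU time, for identical output.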

4. Communications Cutbacks

Communication between an application’s internal and external services, and between the application and its end users, can account for the majority of an application’s energy usage. Cutting down on the volume of data requests and eliminating redundant information can generate significant energy savings.

One approach is to reduce data transfer by specifying on the server side exactly which data each payload includes, and by removing redundant communication. This also works for internal services, such as communication between microservices: the volume can be reduced by merging microservices that solve similar problems or by limiting the number of services involved in each task.
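
One way to let the server trim each payload is to accept a list of requested fields, similar in spirit to GraphQL field selection or JSON:API sparse fieldsets. The Python sketch below assumes a hypothetical `fields` query parameter (for example, `?fields=id,status`) and an illustrative order record; only the requested top-level fields are returned.

```python
from typing import Any, Optional

# Hypothetical full record as stored on the server.
ORDER: dict[str, Any] = {
    "id": 1042,
    "status": "shipped",
    "customer": {"id": 7, "name": "Ada", "email": "ada@example.com"},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B9", "qty": 1}],
    "audit_log": ["created", "paid", "packed", "shipped"],
}


def project(record: dict[str, Any], fields_param: Optional[str]) -> dict[str, Any]:
    """Return only the requested top-level fields.

    fields_param is a comma-separated list such as "id,status"; None or an
    empty string means "all fields". Unknown field names are ignored.
    """
    if not fields_param:
        return record
    wanted = [f.strip() for f in fields_param.split(",") if f.strip()]
    return {f: record[f] for f in wanted if f in record}
```

A list view that needs only `id` and `status` then never downloads the customer details, line items, or audit log.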

5. Serverless Computing

Serverless computing is a cloud execution model in which the cloud service provider (CSP) allocates resources on demand and manages the servers on the customer’s behalf. Servers still execute the code, but developers don’t have to configure, manage, or maintain the underlying infrastructure.

Computing happens in short bursts, with the results persisted to storage. When an app isn’t in use, no computing resources are allocated to it. Because it relies on shared infrastructure and reduces idle resources, serverless computing can be a more environmentally friendly option than traditional server-based architectures. (Note: containers have sustainability benefits, too.)

6. Optimized Data Ingestion

Data ingestion is the process of importing large, assorted data files from multiple sources into a data warehouse, database, or other single, cloud-based storage medium where they can be accessed and analyzed. It can be energy-intensive, particularly when large volumes of data are involved. However, there are ways to make the process more efficient.

Start by avoiding unnecessary data ingestion. Look at your business needs and determine what datasets you need. If you can use existing publicly available datasets that are already cleaned and curated, you can avoid duplicating the computing and storage resources required to ingest that data.

You can also shrink data before ingestion by using strategies such as compression, filtering, and aggregation. Smaller data then travels over the network and occupies less space in the data lake.
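
The Python sketch below shows the three strategies together on hypothetical energy-meter readings: it filters out rows older than the window the analysis needs, aggregates the rest to one row per day and site, and gzip-compresses the resulting CSV before upload. The field names and the daily granularity are assumptions; the right aggregation level depends on the questions the data must answer.

```python
import csv
import gzip
import io
from collections import defaultdict


def shrink_for_ingestion(rows, min_date: str) -> bytes:
    """Filter, aggregate, then compress raw readings before upload.

    rows: iterable of dicts with hypothetical keys "date" (ISO YYYY-MM-DD),
    "site", and "kwh" (numeric string). Returns gzip-compressed CSV bytes.
    """
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for r in rows:
        # Filter: ISO dates compare correctly as strings.
        if r["date"] < min_date:
            continue
        # Aggregate: one total per (date, site) instead of every raw reading.
        totals[(r["date"], r["site"])] += float(r["kwh"])

    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["date", "site", "kwh"])
    for (day, site), kwh in sorted(totals.items()):
        writer.writerow([day, site, f"{kwh:.3f}"])
    # Compress: fewer bytes over the network and at rest in the data lake.
    return gzip.compress(buf.getvalue().encode("utf-8"))
```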

To extract and ingest data from sources such as databases, use change data capture (CDC) or date-range (incremental) strategies instead of full-extract ingestion. You can also use an event-driven serverless architecture for data ingestion, so that resources are provisioned only when there is work to do.
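
Log-based CDC typically reads the database’s transaction log (tools such as Debezium work this way). A simpler date-range approach, sketched below in Python with the standard `sqlite3` module, keeps a high-water mark and pulls only rows whose timestamp is newer than the last run. The `orders` table and `updated_at` column are hypothetical.

```python
import sqlite3


def extract_incremental(conn: sqlite3.Connection, last_watermark: str):
    """Pull only rows changed since the previous run instead of the full table.

    Assumes a hypothetical `orders` table with an ISO-8601 `updated_at` column.
    Returns (rows, new_watermark).
    """
    rows = conn.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # Advance the watermark to the newest change seen; keep it if nothing changed.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark
```

Unlike log-based CDC, this watermark approach does not see deleted rows and relies on every write updating `updated_at`; because of the strict `>` comparison, a row written later with a timestamp equal to the stored watermark would also be skipped.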

More Sustainability Tips to Come

US Signal is committed to helping our customers achieve their sustainability goals and reap the rewards of energy efficiency. We’ve got many more suggestions, so watch for additional tips in future blog posts.