Large Scale Software Development in Java: Issues and Solutions

As developers are increasingly using Java for advanced applications, they've become dependent on the availability of scalable technologies and tools to support their development, including quality assurance (QA), testing, maintenance, release and customer support requirements. The technologies available today have been inherited largely from those available for languages such as C and C++, including visual IDEs and a host of other tools that offer a solution to a particular problem. A few tools have been tailored specifically for Java and enhance the strengths of the language (like the incremental IDE VisualAge and InstallShield's installer for Java that allows Java applications to be installed onto any Java-compliant platform).

This article discusses specific issues related to large-scale software development in Java, suggests ways to address them and concludes with an overview of the Metamata (derived from meta-automata) toolsuite's answer to some of these problems.

While large-scale software development in Java faces some of the same issues as other languages, some are unique to Java. Some C and C++ problems, such as leaks caused by forgetting to free memory, don't arise in Java because of garbage collection. On the other hand, Java introduces different issues such as thread analysis and memory debugging.

Standard IDEs don't address most of the problems that arise during large-scale software development. In fact, they aren't designed to be a complete solution for large-scale development. Hence their functionality must be augmented with specialized tools.

Organization and Maintenance of Software Components
This is one of the big tasks of large-scale software development. The system must be arranged into a set of small, manageable components that interact with each other. The interaction should take place through well-defined, organized interfaces, simplifying the task of managing and maintaining the components.

Typically, time constraints and insufficient experience combine to introduce defects in the way software systems are architected, leading to decreased quality and larger overheads in managing and maintaining the system.

As an answer to this problem, a number of studies have measured software systems for complexity, which has led to a standardized set of software quality metrics. While there's no substitute for experienced project managers, the metrics do offer insight into assessing software complexity and quality.
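As a rough illustration of what such a metric measures, the sketch below estimates a cyclomatic-complexity-style number for a method by counting decision points in its source text and adding one. The article doesn't name a specific metric, so the choice of this one, the class name ComplexityEstimate and the keyword list are all assumptions; a real metrics tool would work on a parsed program rather than raw text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a crude, text-based complexity estimate.
class ComplexityEstimate {
    // Branching keywords and short-circuit/conditional operators.
    // Being text-based, this also counts matches inside comments,
    // string literals and generic wildcards such as List<?>.
    private static final Pattern DECISION =
        Pattern.compile("\\b(if|for|while|case|catch)\\b|&&|\\|\\||\\?");

    // Returns the number of decision points found, plus one.
    static int estimate(String methodSource) {
        Matcher m = DECISION.matcher(methodSource);
        int decisions = 0;
        while (m.find()) {
            decisions++;
        }
        return decisions + 1;
    }

    public static void main(String[] args) {
        String body = "if (a && b) { f(); } else { while (c) { g(); } }";
        System.out.println(estimate(body)); // prints 4
    }
}
```

A number like this is only a prompt for review: a method that scores far higher than its neighbors is a candidate for splitting, not proof of a defect.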

It's also important to be able to detect inconsistencies in the program as it changes. Typically, a program is changed in one place, but the effect of the change elsewhere is overlooked. For example, changing a type can make an existing type cast in a different module unnecessary, and that leftover cast is easily overlooked.
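The type-cast case can be made concrete with a small sketch. The classes Inventory and Report are hypothetical and stand in for two separate modules; the assumption is that firstItem() once returned Object and was later narrowed to String.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical "module A": the return type was Object and is now String.
class Inventory {
    static String firstItem(List<String> items) {
        return items.get(0);
    }
}

// Hypothetical "module B": written against the old signature.
class Report {
    static String header(List<String> items) {
        // The cast was required when firstItem() returned Object.
        // It is now redundant, yet the code still compiles and runs,
        // so nothing forces anyone to revisit it.
        return "First: " + (String) Inventory.firstItem(items);
    }

    public static void main(String[] args) {
        List<String> items = new ArrayList<>();
        items.add("widget");
        System.out.println(header(items)); // prints "First: widget"
    }
}
```

Compiling with javac's -Xlint:cast option reports redundant casts of this kind, which is one inexpensive way to catch the leftover.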

Time Constraints
A problem faced by large software systems development (in any language) is waiting for the system to be rebuilt after every small change. The time required to rebuild after each change increases with the size of the system, adding up to expensive overhead costs. After a certain point, the necessary, endless rebuilds significantly reduce productivity.

Organizing a system into well-architected components and reusable libraries goes a long way toward solving this problem. Yet developer tools still need to be smart about how much rebuilding they have to do for each small change.

The best solution to this problem is incremental development environments such as VisualAge. They recompile only the minimum amount necessary when a system is changed. This concept of incrementality can also be extended to other activities beyond the standard development steps to include QA, testing and so forth.
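To show the idea behind incrementality, here is a minimal sketch that works out which classes need recompiling after a change by following reverse dependencies. It is not how VisualAge or any particular tool works; the class IncrementalPlanner and its methods are hypothetical, and the approach is deliberately conservative (it recompiles every transitive dependent, even if the changed class kept the same interface).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical planner: names are illustrative only.
class IncrementalPlanner {
    // Maps a class to the classes that use it (reverse dependencies).
    private final Map<String, List<String>> dependents = new HashMap<>();

    // Records that `user` depends on `used`.
    void addDependency(String user, String used) {
        dependents.computeIfAbsent(used, k -> new ArrayList<>()).add(user);
    }

    // Returns the changed classes plus everything that transitively
    // depends on them, in sorted order.
    Set<String> toRecompile(Set<String> changed) {
        Set<String> result = new TreeSet<>(changed);
        Deque<String> work = new ArrayDeque<>(changed);
        while (!work.isEmpty()) {
            String current = work.pop();
            for (String d : dependents.getOrDefault(current, Collections.emptyList())) {
                if (result.add(d)) {
                    work.push(d);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        IncrementalPlanner p = new IncrementalPlanner();
        p.addDependency("Parser", "Lexer");
        p.addDependency("Compiler", "Parser");
        p.addDependency("Logger", "Util");
        System.out.println(p.toRecompile(Collections.singleton("Lexer")));
        // prints [Compiler, Lexer, Parser]
    }
}
```

The same dependency information can drive other activities: only the tests that touch a class in the returned set need to be rerun.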

Memory Management
Large systems tend to use a lot of memory, and unless it's managed carefully the capacity of the underlying hardware can quickly be exhausted. In systems written in C and C++, the developer has complete responsibility for making sure that unused memory is recycled for future use rather than retained indefinitely. Java addresses this problem with automatic garbage collection, i.e., the Java Virtual Machine periodically searches for memory that is no longer in use and recycles it for future use.

Unfortunately, garbage collection can take a significant amount of time when systems use a lot of memory, noticeably degrading performance. Most Java programmers today assume that they have to live with this in large Java programs.

The solution is to actively manage memory, and simply let the garbage collector kick in for the smaller chunks of memory as well as what slips through the cracks of the explicit memory management routines. Hopefully, better garbage collection algorithms will become available shortly in Java Virtual Machines and the problems related to garbage collection will soon be a memory; the next six months will reveal this possibility.
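One common form of explicit memory management is an object pool that recycles large buffers instead of leaving each one to the garbage collector. The sketch below is a minimal version under that assumption; the class BufferPool and its methods are hypothetical, and whether pooling actually helps depends on the Java Virtual Machine in use, so it's worth measuring before adopting it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical pool of fixed-size byte buffers.
class BufferPool {
    private final Deque<byte[]> free = new ArrayDeque<>();
    private final int bufferSize;

    BufferPool(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    // Hands out a recycled buffer when one is available, otherwise
    // allocates a new one. Recycled buffers are not cleared.
    synchronized byte[] acquire() {
        byte[] b = free.poll();
        return (b != null) ? b : new byte[bufferSize];
    }

    // Returns a buffer to the pool. The caller must not use it afterwards.
    // Buffers of the wrong size are ignored and left to the collector.
    synchronized void release(byte[] buffer) {
        if (buffer.length == bufferSize) {
            free.push(buffer);
        }
    }

    public static void main(String[] args) {
        BufferPool pool = new BufferPool(64 * 1024);
        byte[] first = pool.acquire();
        pool.release(first);
        byte[] second = pool.acquire();
        System.out.println(first == second); // prints true
    }
}
```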

Another memory-related problem specific to Java is leaks caused by failing to relinquish memory that's no longer needed. In Java the garbage collector can recycle only memory that the user program no longer references. Memory retained erroneously by the user program will never be collected, even if it's never used again.
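A familiar instance of this kind of leak is a collection that keeps a reference to an object after the program has logically discarded it. The sketch below uses a hypothetical array-backed stack; the rawSlot() method exists only so the example can show that the slot was cleared.

```java
import java.util.Arrays;

// Hypothetical stack used to illustrate an unintentionally retained reference.
class SimpleStack {
    private Object[] elements = new Object[16];
    private int size = 0;

    void push(Object o) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, size * 2);
        }
        elements[size++] = o;
    }

    Object pop() {
        if (size == 0) {
            throw new IllegalStateException("stack is empty");
        }
        Object top = elements[--size];
        // Without this line the array would still reference the popped
        // object, so the garbage collector could never reclaim it.
        elements[size] = null;
        return top;
    }

    // For demonstration only: exposes the backing array slot.
    Object rawSlot(int index) {
        return elements[index];
    }

    public static void main(String[] args) {
        SimpleStack s = new SimpleStack();
        s.push("payload");
        s.pop();
        System.out.println(s.rawSlot(0)); // prints null
    }
}
```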

To solve these problems, debuggers and profilers need to provide specialized features for Java. Debuggers should provide capabilities to determine whether a memory leak is occurring, and profilers should provide insight into the details of memory allocation.

Performance
Clearly, performance is the biggest issue for Java programmers. Compared with C and C++ programs, Java programs suffer in performance for the following reasons:

  • Java is an interpreted language, and by nature interpreted languages run slowly. A lot of work is being done to improve performance while remaining in an interpreted environment (e.g., Just In Time [JIT] compilers). However, performance will never catch up with compiled languages.
  • Garbage collection contributes to performance degradation, especially in large programs.
  • Java is a richer and more secure language than C and C++. It offers features such as serialization and reflection that are inherently inefficient although they significantly enrich the language. Also, Java performs checks at runtime, such as a bounds check on every array reference, which degrade performance.

    Good profilers are important with Java. Information provided by profilers can help developers modify their programs to run faster. Furthermore, a certain amount of code optimization during the compilation process can also improve performance. For example, field access can be inlined, and certain classes and methods can be made final during final packaging of a system.

    Threads
    Given that threads are an integral part of the Java language, there has been a significant increase in thread use to solve problems more elegantly than within a sequential framework. However, programming with threads is inherently more complex than sequential programming because there are more ways in which multiple threads of control can interact with each other and as many ways for things to go wrong. Furthermore, it's usually difficult to reproduce a problem caused by the interaction between threads since multithreaded programs are nondeterministic in their execution.

    A lot of research has been devoted to understanding the issues of multithreaded systems over the past 20 years. The Java language design offers state-of-the-art features based on this research, which does contribute to simplifying thread-based systems. However, a good language alone is not enough - there's also the need for good debugging and analysis tools to facilitate a better understanding of how a multithreaded system works.

    I believe the best way to deal with threads is to have a diagnostic capability in which probes are permanently inserted within the Java program. These probes save information pertaining to program execution, which can then be used later to analyze the program's behavior. This analysis can be performed to ensure that certain properties always hold (e.g., no two threads simultaneously execute a certain portion of code).
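A minimal sketch of such a probe follows, assuming the property to check is that no two threads are inside a given region at once. The classes RegionProbe and ProbeDemo are hypothetical; the probe only counts violations so they can be inspected after the run, where a fuller diagnostic facility would also record which threads were involved and when.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical probe: records how often two threads overlapped in a region.
class RegionProbe {
    private final AtomicInteger inside = new AtomicInteger();
    private final AtomicInteger violations = new AtomicInteger();

    void enter() {
        if (inside.incrementAndGet() > 1) {
            violations.incrementAndGet();
        }
    }

    void exit() {
        inside.decrementAndGet();
    }

    int violations() {
        return violations.get();
    }
}

class ProbeDemo {
    private static final RegionProbe probe = new RegionProbe();
    private static final Object lock = new Object();
    static int counter = 0;

    // The probe sits inside the lock, so it should never see an overlap.
    static void guardedIncrement() {
        synchronized (lock) {
            probe.enter();
            try {
                counter++;
            } finally {
                probe.exit();
            }
        }
    }

    // Runs the region from several threads and returns the violation count.
    static int run(int threads, int iterations) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < iterations; j++) {
                    guardedIncrement();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return probe.violations();
    }
}
```

Calling ProbeDemo.run(4, 1000) should report zero violations because the region is guarded; removing the synchronized block would allow the probe to observe overlaps, though not necessarily on every run.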

    Safety
    When building large systems a lot of assumptions are made regarding how the system works. If the system is correct, these assumptions are met by the system execution. However, since bugs in software are always expected, these assumptions may not always hold. The debugging process essentially means running the system in a controlled manner to determine if these assumptions are met, and looking for ways to make corrections when they aren't.

    In mission-critical systems some assumptions are important and require enforcement. Similarly, in multithreaded systems where it's often impossible to reproduce a problem, the violation of any assumption must be reported.

    Providing diagnostic APIs solves this problem, allowing assumptions to be built into the program as constraints that must hold during the program's execution. Tools to help manage them are needed to encourage users to write such constraints. One important capability necessary to encourage using diagnostic constructs is an easy way to strip out these constructs when it's time to package the system for final shipping.
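One way to sketch such a diagnostic API is a check method guarded by a compile-time constant, so that flipping the constant removes the checks from the shipped build. The classes Diagnostics and Account below are hypothetical and are not the API of any particular tool.

```java
// Hypothetical diagnostic API.
class Diagnostics {
    // Set to false for the shipping build. Because this is a compile-time
    // constant, javac can omit code guarded by `if (Diagnostics.ENABLED)`.
    static final boolean ENABLED = true;

    // Throws if a constraint that must hold during execution is violated.
    static void require(boolean condition, String message) {
        if (ENABLED && !condition) {
            throw new IllegalStateException("Constraint violated: " + message);
        }
    }
}

class Account {
    private long balance;

    void deposit(long amount) {
        balance += amount;
    }

    void withdraw(long amount) {
        // The assumption "never overdraw" is written into the program.
        if (Diagnostics.ENABLED) {
            Diagnostics.require(amount <= balance, "withdrawal exceeds balance");
        }
        balance -= amount;
    }

    long balance() {
        return balance;
    }

    public static void main(String[] args) {
        Account a = new Account();
        a.deposit(10);
        a.withdraw(4);
        System.out.println(a.balance()); // prints 6
    }
}
```

Later versions of Java added the assert statement, which serves the same purpose: assertions are disabled by default at runtime and switched on with the -ea option, so no separate stripping step is needed.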

    Portability
    Compiled Java files can be moved to different platforms and executed using different Java Virtual Machines with no recompilation required. To facilitate portability, the Java language definition has gone to great lengths to specify exactly how a Java program must run. Very little ambiguity remains.

    Only a few problems exist in writing portable Java applications. The most important:

  • Thread scheduling can vary from platform to platform. For example, one platform may time-slice between threads while another may not. This can cause system liveness to differ across platforms.
  • Use of platform-specific notation - the most obvious example is referring to a file by a raw string (such as "C:\METAMATA\Test.java"). Clearly such a string will not name the intended file on a UNIX platform.
  • Bugs in Java compilers and Virtual Machines can cause an otherwise correct program to behave differently in different environments.
  • Use of nonstandard APIs: certain APIs are available on only a few platforms. Making your program depend on such APIs will (obviously) cause porting problems.
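The platform-specific notation problem in the list above can usually be avoided by building paths from their components and letting java.io.File supply the separator. A minimal sketch, in which the class PortablePaths and the directory names are hypothetical:

```java
import java.io.File;

// Hypothetical helper for building file paths portably.
class PortablePaths {
    // Joins the components using the platform's own separator
    // instead of a hard-coded "\" or "/".
    static File sourceFile(String baseDir, String... parts) {
        File f = new File(baseDir);
        for (String part : parts) {
            f = new File(f, part);
        }
        return f;
    }

    public static void main(String[] args) {
        // user.home is a standard system property on every platform.
        File f = sourceFile(System.getProperty("user.home"), "metamata", "Test.java");
        System.out.println(f.getPath());
    }
}
```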

    While these problems are really quite trivial when compared to the issues involved in porting programs written in other languages, the promise of "write once, run everywhere" exacerbates these problems when developers expect their Java program to run smoothly everywhere.

    The only real way to solve this problem is to test Java systems on as many platforms as possible. In addition, several heuristics for writing portable Java programs have been developed over the past couple of years. Facilitating portable testing and checking Java programs for violations of these portability heuristics can help developers write portable Java programs.

    Developing Multiplatform Applications
    There's no better approach to facilitate this development than to perform development on multiple platforms. Ideally, individual developers should already work on different platforms. It must also be possible for the same developer to move between platforms. The message here is that development must be performed using tools that port to multiple platforms. An IDE that runs on only one platform can severely constrain a development team building multiplatform software.

    Platform Accessibility
    If a Java application is developed, taking care to ensure that it's portable, and then shipped for use by customers on a wide variety of platforms, you can be sure that customers will run the application on platforms you don't have access to. Furthermore, there are bound to be problems reported by these customers. Special care needs to be taken to ensure that it's possible to support them.

    One thing to do is to ask the customer to run a general probing tool that provides full information on the customer's Java environment. It's also useful to have a version of the software that's heavily instrumented and then ask the customer to attempt to reproduce the problem using this instrumented version. Usually, it should simply be the shipped application running with a special environment setting. Then it's possible to study why the application runs differently on the customer's machine.
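A probing tool of this kind can be very small. The sketch below collects a few standard system properties and runtime figures into a report the customer can send back; the class name EnvironmentProbe and the choice of properties are assumptions, and a real support tool would likely gather more.

```java
// Hypothetical probe that reports on the customer's Java environment.
class EnvironmentProbe {
    // Standard system property keys defined by the Java platform.
    private static final String[] KEYS = {
        "java.version", "java.vendor", "java.home",
        "os.name", "os.arch", "os.version",
        "file.separator", "path.separator"
    };

    // Builds a plain-text report, one "key = value" pair per line.
    static String report() {
        StringBuilder sb = new StringBuilder();
        for (String key : KEYS) {
            sb.append(key).append(" = ").append(System.getProperty(key)).append('\n');
        }
        Runtime rt = Runtime.getRuntime();
        sb.append("available processors = ").append(rt.availableProcessors()).append('\n');
        sb.append("max memory (bytes) = ").append(rt.maxMemory()).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```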

    Build Reusable Libraries
    Java encourages better organization and maintenance by making it much easier (compared to other languages) to build reusable libraries of software components. Widespread reuse of third-party components is common and developers tend to build ones that are as general and reusable as possible. This leads to software bloat, which occurs when the system contains a large amount of useful functionality that's never used by the system itself but exists for possible future use. In many cases it's difficult to identify the "system" from a set of reusable libraries.

    As a result, it's necessary to trim these libraries down to only the essential pieces of code to enable systems to be packaged for release.
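Trimming can be thought of as a reachability problem: start from the application's entry point and keep only the classes it can reach. The sketch below assumes the "uses" relation between classes has already been extracted; the class LibraryTrimmer and the class names in the example are hypothetical. Note that classes loaded by name at runtime (for example through reflection) don't appear in such a relation and would have to be listed explicitly.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical trimmer: finds the classes worth packaging.
class LibraryTrimmer {
    // Returns every class reachable from the entry point by following
    // the "uses" relation, in sorted order.
    static Set<String> reachable(String entryPoint, Map<String, List<String>> uses) {
        Set<String> seen = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>();
        seen.add(entryPoint);
        work.add(entryPoint);
        while (!work.isEmpty()) {
            String current = work.remove();
            for (String next : uses.getOrDefault(current, Collections.emptyList())) {
                if (seen.add(next)) {
                    work.add(next);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> uses = new HashMap<>();
        uses.put("App", Arrays.asList("Parser", "Util"));
        uses.put("Parser", Arrays.asList("Lexer"));
        uses.put("XmlExport", Arrays.asList("Util")); // never reached from App
        System.out.println(reachable("App", uses));
        // prints [App, Lexer, Parser, Util]
    }
}
```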

    Obfuscation
    Compiled Java code is rather high-level. Hence it's possible for someone to (illegally) reverse-engineer compiled applications. Therefore, care needs to be taken to properly obfuscate the compiled code so that it still runs in the same manner but looks different. There are a variety of ways to obfuscate Java code, from schemes as simple as changing the names of variables to sophisticated schemes where the compiled Java code is encrypted. However, regardless of the scheme used, it must be possible to interpret error messages, stack traces, etc., for customer support purposes.
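For the simple renaming schemes, supportability comes down to keeping the rename table and applying it in reverse to whatever the customer sends back. The sketch below translates one stack-trace line; the class NameMap, the trace-line format it expects and the example names are assumptions rather than the behavior of any specific obfuscator.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical rename table kept alongside each obfuscated release.
class NameMap {
    private final Map<String, String> obfuscatedToOriginal = new HashMap<>();

    // Records that `original` was renamed to `obfuscated`.
    void record(String original, String obfuscated) {
        obfuscatedToOriginal.put(obfuscated, original);
    }

    // Translates a line such as "\tat a.b(Unknown Source)" back to the
    // original name. Lines that don't match are returned unchanged.
    String translate(String traceLine) {
        int start = traceLine.indexOf("at ");
        if (start < 0) {
            return traceLine;
        }
        int nameStart = start + 3;
        int paren = traceLine.indexOf('(', nameStart);
        if (paren < 0) {
            return traceLine;
        }
        String name = traceLine.substring(nameStart, paren);
        String original = obfuscatedToOriginal.get(name);
        if (original == null) {
            return traceLine;
        }
        return traceLine.substring(0, nameStart) + original + traceLine.substring(paren);
    }

    public static void main(String[] args) {
        NameMap map = new NameMap();
        map.record("com.example.Parser.parse", "a.b");
        System.out.println(map.translate("\tat a.b(Unknown Source)"));
    }
}
```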

    Conclusion
    Java has been adopted rapidly by both industry and academia, and software developed in Java is growing in complexity. The challenge to Java-tool developers so far has been simply to keep up with the pace of growth. Now they face a greater challenge of building tools designed to solve specific issues related to software development in Java, rather than simply to retrofit C and C++ technology for Java. Over the next few months we should see many new and exciting tools to solve problems related to, for example, garbage collection, performance and portability.

    One year ago a 100,000-line Java program was large and there were only a handful of them. If a system could handle tens of thousands of lines of code, it was good enough. Today many Java programs exceed 100,000 lines, and soon we should start seeing a few programs reach a million lines of code. Tool builders will therefore be required to scale up their tools to handle such large systems efficiently.

    We should also see new Java Virtual Machines capable of running much faster than current ones and significantly closing the gap between native code and interpreted execution. There'll also be native Java compilers that can compile code to execute natively on a platform-by-platform basis. The performance issues will essentially disappear once this happens.

    These are very exciting times indeed for the Java community and I look forward to a more mature set of Java developer tools at the next JavaOne!

    About Sriram Sankar
    Sriram Sankar is vice president of engineering at WebGain, a Java developer products company. Sriram moved to WebGain through the acquisition of Metamata, where he was founder and CEO. Prior to Metamata, Sriram worked at Sun Microsystems and Stanford University. His technical areas of interest include software development environments, language design, compiler technology, formal methods, and software testing. Sriram holds a Ph.D. in computer science from Stanford University.

    Your Feedback
    Krishna K wrote: While I agree with some of the items that were discussed, such as better profiling, logging, debugging, build process etc., writing a program that exceeds 1000's of lines of code isn't a good practice. It is important to design the components to allow for simplicity by using well-known design patterns and common techniques that will solve this problem.