Does generative AI dream of building apps? The disruptive potential of AI-enabled software engineering

Terry Room

23 May 2023

Like many, I have been keenly watching the recent developments in AI, with the advent of next-generation large language models like ChatGPT. I recently attended a CTO roundtable event where we discussed the transformative power of AI.

One domain we centered on was software engineering and the disruptive potential of AI assistive tools such as GitHub Copilot and Amazon CodeWhisperer.

Some of the questions which drove the debate include:

How will these tools make the creation and support of software more efficient?
Will we need fewer developers in the future?
Will the lines between no-code, low-code and custom-code apps continue to blur?
What governance should be in play, and what risks must we manage?

Here are some reflections and follow-on thoughts.

The magic of software

Arthur C. Clarke, author and futurist, famously wrote:

“Any sufficiently advanced technology is indistinguishable from magic.”

Initial interactions with large language models (LLMs) such as ChatGPT can feel magical – a far cry from Googling and trawling through links to attempt to stitch together the required information to solve a particular problem. The “magic” taps deeply into our consciousness, into the natural language processing areas of our brain, the core of our cognition and how we communicate with the outside world. Interaction with “talking machines” which seem indistinguishable from a human being has simultaneously and rapidly captured our collective imagination and concern.

As a sense check of the state of the art, I think back to the rise of natural language interfaces and bots a few years ago. In some cases, these were a useful and valuable means of enabling channel shift and accessibility – in others, an added source of digital frustration. The new generation of language models is on a whole new level of credibility by comparison.

For software engineering, seeing a machine generate code can feel even more magical. Especially when compared to the practices of the founders and pioneers of electronic computing (when bugs were actual bugs!). Prompt-based engineering approaches stand in stark contrast to punching holes in cards and then having to wait a day to be scheduled to see if your program ran without errors.

But the hype around generative AI when applied to app creation requires closer inspection. And a good place to start is to look at some of the current offerings available.

GitHub Copilot

Copilot provides code suggestions based on prompts and code context. Recent updates filter out insecure recommendations such as hard-coded credentials and SQL injection vulnerabilities. GitHub’s research claims that the recommended code acceptance rate has increased from 27% to 35% with these changes (with variance by language). The brand positioning is also worth taking note of – i.e., it is a “copilot” and not an “autopilot” (more on this later).
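To make the SQL injection point concrete, here is a minimal sketch of the kind of pattern such a filter would presumably want to catch, next to the safer alternative. It uses Python's standard sqlite3 module; the table, the function names, and the demo data are assumptions made up for illustration, and nothing here describes how Copilot itself detects the problem.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn, username):
    # Vulnerable: the input is pasted into the SQL text, so a crafted
    # value can change the meaning of the query.
    query = "SELECT id, name FROM users WHERE name = '" + username + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, username):
    # Safer: the ? placeholder binds the input as data, never as SQL.
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (username,)).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    payload = "x' OR '1'='1"
    print(find_user_unsafe(conn, payload))  # leaks every row
    print(find_user_safe(conn, payload))    # matches nothing
```

With the crafted input, the concatenated query returns every user, while the parameterized one returns none. A suggestion resembling the first function is exactly what a reviewer should reject, whether a human or a tool wrote it.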

CodeGPT

CodeGPT is an interesting Visual Studio Code add-in, which has a rich set of generative features such as:

Get Code (generates code based on natural language)
Ask Code (ask any generative question – e.g., “write me C# which validates an email address using a regex”)
AskCodeSelected (ask generative question of any selected code – e.g., “generate .md file with markdown” or “convert this to python”)
Explain
Refactor (refactors code, with recommendations to reduce dependencies, improve readability, etc.)
Document
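As an illustration of the email validation prompt quoted above, the sketch below shows roughly what an assistant might return, written here in Python rather than C#. The pattern and the function name are my own assumptions, not actual CodeGPT output, and the regex is deliberately simple: it does not attempt full RFC 5322 coverage.

```python
import re

# Deliberately simple pattern: a local part, one "@", then a domain with
# at least one dot. It will reject some valid addresses and accept some
# invalid ones, so treat it as a first-pass check only.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")


def is_valid_email(address: str) -> bool:
    """Return True if the whole string matches the simple pattern."""
    return EMAIL_PATTERN.fullmatch(address) is not None


if __name__ == "__main__":
    for candidate in ["terry@example.com", "a@b", "user@@example.com"]:
        print(candidate, is_valid_email(candidate))
```

Generated snippets like this tend to look plausible at first glance, which is precisely why the edge cases (international domains, quoted local parts, length limits) still need a developer's judgment.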

Interestingly, it plugs into a number of LLMs (OpenAI, Cohere, AI21, Anthropic).

Visual Studio IntelliCode

This is the AI-enabled next generation of IntelliSense. IntelliCode suggestions are based on the context of the code you are writing, and not just an indexed list of overloads. It also has features for keeping style consistent across your code. And it will even flag potential typos, like mistakes in variable use. Also of note is that it runs locally on your dev stack and does not call out to cloud-hosted APIs, a key requirement for highly secure code creation processes in regulated industries such as financial services and health.

Amazon CodeWhisperer

Similar in intent to Copilot, providing contextual code completion suggestions.

CodePal

https://codepal.ai/

An interesting service which offers various code generators, by language, as well as translators, unit test writers, query writers, schema resolvers, and security analyzers.

We’ve also seen generative AI being included in no-code/low-code apps. Microsoft Power Apps, for example, provides an OpenAI-powered natural language interface (“create a workflow…”, “build a table…”) to take even more of the toil away from building this type of app, further blurring the boundaries between no-code, low-code, and custom.

Evolution vs Revolution

A little perspective can be helpful sometimes.

Programming has evolved considerably from the first days of electronic computing to harness increasingly powerful hardware and has been applied to create an increasingly complex array of applications. We’ve gone from punch card mainframe systems to highly complex distributed systems.

We evolved to client server, to cloud, and from monoliths to distributed API-enabled microservice architectures.

The evolution of the tools, skills, processes, and practices for the successful creation of these systems has had to keep pace. In fact, it can be argued it has set the pace. Innovations in tooling and practices have come hand in hand with, and provided the ability to exploit, increasing computing power.

The modern developer is massively enabled compared to the early pioneers – frameworks and languages, IntelliSense tools, static analysis and linting tools, managed runtimes, increasingly safe compilers, increasingly powerful code collaboration platforms, security analysis, automated build and release tools, and branch management tools. Yet the demands on the modern developer have increased proportionally – faster, higher quality with fewer bugs, and more secure against increasing cyber threats. And as an overall trend, in the last five years we have seen the increase in that demand accelerate, fueled by mass adoption of cloud computing and the digital transformation of many industries.

So is generative AI a gamechanger, or a next natural step in the computing tradition of the last 50 years?

Productivity

It is also necessary to ask: so, where (and how) does “generative software development” play in the end-to-end value chain? I would start with – if you are looking for productivity gains you should start elsewhere! There is still a lot of productivity upside opportunity in many of today’s enterprises through a rightsized approach to standardization of architecture patterns, tools, platforms, templates, libraries, and methodologies. Furthermore, it is important to look holistically and to frame the problem that actually needs solving – for example:

“How can we deliver digital products which underpin our business strategy faster and at higher quality, whilst being robustly secure?”

Copilot, not autopilot

In his best-selling book Outliers, Malcolm Gladwell cited the safety transformation story of Korean Air. Central to this transformation was not technology, but dealing with a cultural legacy – in particular, an inbuilt deference to one’s superiors, such as first officers being deferential to the captain, even when it was obvious that the captain was about to make a catastrophic error. Transforming from one of the worst safety records to one of the best was achieved by dealing with this cultural legacy – by training flight staff in clear and concise communication, and empowering staff (and making it an imperative) to validate and challenge each other.

Similarly, when we consider driverless cars, there are many legal and ethical issues to address before vehicles ever become totally driverless. While assistive driving technology can improve safety and has potential to improve road safety on aggregate, there remains the gnarly question of who is responsible in the event of an accident. The driver, the other road user, the software, the model, or the hardware? Are we going to outsource this to our AI lawyers and AI insurance underwriters? Of course not – ultimately the driver must still be accountable.

As a side note, on the state of the art in driverless tech, Ford’s system has just been approved for use on UK motorways. It monitors the driver whilst providing a “hands-off” driving experience. Think “next-gen cruise control” rather than fully driverless.

Ford launches hands-free driving on UK motorways – BBC News

The key point here is that AI should be considered assistive, and never in full control. Even when the AI takes increasing levels of control, the system should still have effective fail safes built in.

The same mindset should apply to the code you create. As the creator you own the app – intent, features, architecture, technology building blocks, and non-functional characteristics – a point made clear on the landing page of GitHub Copilot. And any AI-enabled app architecture must fail safe.

All apps != All code

Another perspective on assistive AI software generation needs consideration. In the technology industry, we have a tendency to talk about code in a somewhat generic sense. To the lay person, code is code. But all apps are not equal. And all code is not equal.

For illustration, consider a payment processing engine, or a software control system for a power station. Both have non-functional characteristics such as extreme availability, robust resiliency features such as idempotency and transaction management, and stringent security controls to protect processes and data from many threat vectors. The consequences of failure of such systems can be severe.
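Idempotency is a good example of a characteristic that is easy to name and easy to get wrong. The sketch below is a toy, in-memory illustration of one common approach, an idempotency key supplied by the caller. The class, method names, and result format are assumptions for illustration only; a real payment engine would need durable storage, transactions, and concurrency control.

```python
from dataclasses import dataclass, field


@dataclass
class PaymentProcessor:
    """Toy processor: balances are held in memory, amounts are whole units."""
    balances: dict
    _processed: dict = field(default_factory=dict)

    def charge(self, idempotency_key: str, account: str, amount: int) -> dict:
        # A retry with the same key returns the stored result instead of
        # charging the account a second time.
        if idempotency_key in self._processed:
            return self._processed[idempotency_key]
        if amount <= 0:
            raise ValueError("amount must be positive")

        available = self.balances.get(account, 0)
        if available < amount:
            result = {"status": "declined", "balance": available}
        else:
            self.balances[account] = available - amount
            result = {"status": "charged", "balance": self.balances[account]}

        # Record the outcome so that later retries are harmless.
        self._processed[idempotency_key] = result
        return result


if __name__ == "__main__":
    processor = PaymentProcessor({"acct-1": 100})
    print(processor.charge("key-1", "acct-1", 30))
    print(processor.charge("key-1", "acct-1", 30))  # retry, no double charge
```

The second call with the same key returns the recorded result and leaves the balance untouched. In a production system this check and the debit would have to happen atomically, which is where much of the real engineering effort goes.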

By comparison, consider a field service app – maybe one enabled by a no-code, low-code app platform, where forms, processes, and data storage are generated, and appropriate security and data controls are built in.

The end-to-end process for the construction of these apps is vastly different because they have vastly different required characteristics. Like comparing the creation of a 12-bar blues to a symphony. Yes, they are both still music, in the same way that the software in all systems is still code. The implication is that the mileage of generative software development will and should therefore vary based on the type of app or system that you are developing.

Maintaining the craft of building apps

But there is more to it than this. The actual creation of code is just one part of the process of constructing digital products, albeit an important one. But what about architecture (enterprise, solution, app, system, service, data), what about a security model based on threat models and regulatory compliance needs? More fundamentally, generative AI will not identify the need for an app or platform – what needs does it service, what value does it create, and what investment is required? Nor will it manage the complexity of delivery and execution – which processes should we use, what does the team structure look like, what quality and assurance controls should we apply, how should it be operated, and (most importantly) how will we manage people (strengths, weaknesses, communication, culture, aspirations, hopes, fears)?

It is safe to assume that generative AI will take away some of the toil of the code creation process, allowing the developer to spend their time on higher order and higher value tasks. But we must still endeavor to maintain the craft of building software and not outsource it all to the machines. This is because we still need to know what good looks like. We need to know what secure code looks like. We need to know whether the architecture of the system being created is appropriate and fit for purpose (with appropriate failsafe mechanisms built in). And, of course, we need to guard against “AI hallucinations.” Fundamentally, we should not sleepwalk into the enablement of our developer androids without the right controls being firmly in place – software that generates software (that generates software!?). We must continue to own the craft of creation. Anything else would be an abdication of responsibility.

AI-enabled development futures

It is clear that generative AI has high potential to offer efficiency gains and increase developer productivity, and to improve the developer experience.

Whether we will need fewer (or more) developers is impossible to predict accurately (like most predictions about complex questions!), but it is clear that the developer experience will continue to evolve at pace. The opportunity is there to harness these productivity gains to create better software, and at greater speed. Will we see the rise of the Chief Prompt Engineer? Maybe. But code generation is just part of the story. If our “AI assistant developers” take away some of the toil from repetitive coding, we can focus on the creation of new classes of distributed systems and apps to help solve the pressing issues of our time (sustainability, the environment, food poverty, the health of an increasing and aging population), backed by emergent capabilities such as machine learning and quantum (which your LLM deep learning model will probably not be able to help you with btw!). We can build apps more effectively, where cost and value are more in line, and where risks of delivery and operation are significantly reduced, even against an increased landscape of cyber threats.

But there are many issues to address, and sooner rather than later. It took many (many!) years for the various policy and regulatory frameworks that govern the web to come into place (some would argue even this is by no means done yet). This time, governance needs to move faster, and to keep pace with the innovation in technology.

“Copilot not autopilot” practices should be mandated. They need to be more than a safe habit: they should be backed practically, such as with “generative transparency” tools (where generated parts of a code base are clearly tagged, and therefore validated and tested as such), as well as by appropriate policy controls and regulation. What are the (Hippocratic) responsibilities of the developer (and architect) here? Intellectual property and copyright issues also need greater clarity.
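As a thought experiment on what “generative transparency” could look like in practice, here is a minimal sketch that finds tagged regions in a source file so they can be routed to extra review or testing. The marker comments and the function name are a convention invented for this example. I am not aware of an established standard for this, so treat it as a sketch under that assumption.

```python
# Hypothetical marker comments that a team might agree to place around
# generated code. These are invented for illustration.
BEGIN_MARKER = "# ai-generated: begin"
END_MARKER = "# ai-generated: end"


def find_generated_regions(source: str) -> list:
    """Return (first_line, last_line) pairs, 1-indexed, markers included."""
    regions = []
    start = None
    for number, line in enumerate(source.splitlines(), start=1):
        text = line.strip()
        if text == BEGIN_MARKER:
            if start is not None:
                raise ValueError(f"nested begin marker at line {number}")
            start = number
        elif text == END_MARKER:
            if start is None:
                raise ValueError(f"end marker without begin at line {number}")
            regions.append((start, number))
            start = None
    if start is not None:
        raise ValueError(f"begin marker at line {start} is never closed")
    return regions


if __name__ == "__main__":
    sample = (
        "import os\n"
        "# ai-generated: begin\n"
        "def f():\n"
        "    return 1\n"
        "# ai-generated: end\n"
        "print(f())\n"
    )
    print(find_generated_regions(sample))
```

A check like this could run in a build pipeline and fail when markers are unbalanced, or feed the tagged line ranges into a review checklist. It only works if developers tag honestly, which is why the tooling needs policy behind it.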

Sustainability needs to be addressed (training LLMs uses a lot of compute power, and Moore’s Law may be on the wane!).

Models will need to be diversified, to align with the compliance needs of different industries. And the compliance needs of different industries will need to be redefined.

Does the current generation of generative AI technology dream of building apps? Maybe, but the extent to which we allow that is entirely up to us. We need to maintain control.

Terry Room is a Distinguished Architect with over 20 years of technology industry experience, including the delivery of several mission critical platforms to production. He currently works as a Global Cloud CTO, supporting Capgemini customers in the inception and delivery of digital and cloud transformation programmes of significant value and scale.