Data management and security remain unfinished projects for many companies, preventing AI models from being trained on reliable, accessible data. To move generative AI out of labs and proofs of concept, these challenges will have to be overcome…
The year 2023 was marked by the arrival of ChatGPT and generative artificial intelligence. The prospect of profit has led tech giants to invest heavily and accelerate their own efforts, creating a tidal wave of innovation that promises to reshape the way companies and users harness technology to increase productivity.
But what we have seen so far is just the beginning! AI will only be fully realized once it can meet the challenges of data reliability, development costs and compliance. The task today is to move beyond the laboratory phase and achieve large-scale implementation.
Reliable data for reliable results
Beyond the debates about AI as a threat to the survival of the human species, its use for geopolitical and military ends, its misuse by hackers, or its incredible potential… it is worth approaching the topic from the angle of the limits to AI’s realization. More prosaically, the question is whether we can move from science fiction to science, which, fortunately, cannot do everything.
Without claiming to exhaust the topic right away, let’s state it plainly: no organization can hope to create value or use data as a decision-making tool if its AI models are trained on inappropriate data.
In the flood of literature and commentary on artificial intelligence, we forget, consciously or not, that data management and security are still works in progress for many organizations. How can we talk about artificial intelligence without addressing data availability and quality? It is almost as if we saw AI as a beautiful object while taking no interest in the processes and conditions of its production. It is true that many companies hold large reserves of critical data. But many of these sources are locked in silos, and the cost of integrating them is hard to bear.
What needs to be understood here is that the data needed to train artificial intelligence certainly exists, but it is often inaccessible and often lacks context… Let’s add that, depending on the geographical area, this data is subject to a legislative framework that limits or regulates its processing (GDPR, DSP2 or even NIS2). Should we turn a blind eye and use sources of questionable quality? We know that “poor quality” data leads AI to produce hallucinated answers that appear correct but are unacceptable.
Let’s also remember that the race for innovation currently under way has a cost that few organizations can bear without significant long-term investors. As proof: a state-of-the-art graphics processing unit (GPU), designed specifically to run large language models (LLMs), costs around $30,000. An organization that wants to train a model with, say, 175 billion parameters might need 2,000 GPUs… The solution? Companies might turn to outsourcing to reduce development costs or to compensate for knowledge and experience they lack. But outsourcing has its own limits: confidentiality of data shared with third parties, regulatory compliance, cybersecurity risk, and so on.
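To give these figures an order of magnitude, here is a minimal back-of-the-envelope sketch in Python. It uses only the two numbers quoted above (the unit price and the GPU count); everything else (networking, power, storage, staffing, depreciation) is deliberately left out, so this is an illustration, not a budget.

```python
# Back-of-the-envelope hardware cost for training a ~175-billion-parameter model,
# using only the figures quoted above; all other costs (power, networking,
# storage, staffing, depreciation) are ignored in this sketch.
GPU_UNIT_PRICE_USD = 30_000   # state-of-the-art GPU price cited in the article
GPU_COUNT = 2_000             # GPUs needed for ~175B parameters, per the article

hardware_capex = GPU_UNIT_PRICE_USD * GPU_COUNT
print(f"GPU hardware alone: ${hardware_capex:,.0f}")  # -> $60,000,000
```

Sixty million dollars for the GPUs alone, before a single engineer is hired, is the kind of figure that explains why so few organizations can go it alone.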
From lab to production, via the cloud
At this stage, we understand that the potential of artificial intelligence needs to be separated from its feasibility.
What if we assumed that the cloud could make artificial intelligence reliable and cheaper? Today, cloud providers have the GPU resources to let companies scale their GenAI projects and pay only for what they use. This makes it possible to experiment with a model and “switch it off” once the experiments are done, rather than having to provision GPUs in local environments.
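To illustrate this pay-only-for-what-you-use logic, here is a hedged sketch comparing an outright GPU purchase with renting the same capacity by the hour for a short experiment. The hourly rate, GPU count and duration are illustrative assumptions, not figures from this article; only the $30,000 purchase price comes from the previous section.

```python
# Illustrative comparison of buying GPUs vs. renting them in the cloud for a
# short GenAI experiment. The hourly rate, experiment duration and GPU count
# below are assumptions chosen purely for illustration.
GPU_PURCHASE_PRICE_USD = 30_000   # unit price cited earlier in the article
ASSUMED_HOURLY_RATE_USD = 4.0     # hypothetical on-demand price per GPU-hour
EXPERIMENT_GPUS = 64              # hypothetical small-scale experiment
EXPERIMENT_HOURS = 200            # hypothetical duration, then "switched off"

buy_cost = EXPERIMENT_GPUS * GPU_PURCHASE_PRICE_USD
rent_cost = EXPERIMENT_GPUS * ASSUMED_HOURLY_RATE_USD * EXPERIMENT_HOURS

print(f"Buying {EXPERIMENT_GPUS} GPUs up front: ${buy_cost:,.0f}")    # $1,920,000
print(f"Renting them for {EXPERIMENT_HOURS} h:  ${rent_cost:,.0f}")   # $51,200
```

The exact numbers matter less than the shape of the trade-off: capacity that is only needed for a time-boxed experiment does not have to be owned, and can simply be switched off afterwards.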
The advantages are immediately visible: lower research and development costs, the ability to run tests, and internal adoption of the process if the results are conclusive. Cloud adoption can be summed up by the ideas of “test and learn” or the sandbox: the company takes on minimal risk while retaining the ability to test its data and evaluate possible uses before exploiting them in the market.
“Put your work back on the loom twenty times: polish it endlessly and polish it again; add sometimes, and often erase,” suggests the writer and poet Boileau in The Art of Poetry (L’Art poétique), a didactic poem in which he shares what we would today call his best practices for approaching perfection in writing. This principle could just as easily be applied to AI if we follow these five rules:
Shape: first, create a modern data architecture and a universal enterprise data network. The benefits are twofold: whether hosted on-premises or in the cloud, it gives the organization visibility and control over its data, and it helps establish a single ontology for mapping, securing and ensuring compliance across data silos (a minimal sketch of such an ontology entry follows this list). Look for tools that not only meet current business needs but can also adapt as the market evolves; open-source solutions often offer more flexibility.
Refine: then refine and optimize the data according to existing business needs. At this stage, it is important to anticipate future needs as accurately as possible; this reduces the risk of migrating too much unnecessary data, which adds no value but can significantly increase costs.
Identify: identify opportunities to use the cloud for specific tasks. A workload analysis helps determine where the value is likely to be greatest. The aim is to connect data across locations, whether on-premises or in multiple clouds, to get the most out of a project.
Experiment: try pre-built and third-party GenAI frameworks to find the one that best suits your business needs. There are several, including Bedrock from AWS (Hugging Face), OpenAI on Azure (ChatGPT) and Google’s AI platform (Vertex). It is important not to rush the choice; the model must be tightly integrated with existing business data to have a real chance of success. A hedged sketch of how such a side-by-side comparison might be structured also follows this list.
Scale and optimize: finally, once a suitable platform has been chosen, select one or two use cases to scale into production. Optimize the process continuously, but keep an eye on GPU costs; as an organization’s GenAI capabilities grow, look for ways to optimize how they are used. A flexible AI platform is critical to long-term success.
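As announced in the “Shape” rule, here is a minimal sketch of what a single-ontology entry for a data asset might look like. The field names (owner, location, sensitivity, applicable regulations) are hypothetical choices made for illustration; a real implementation would depend on the catalog or governance tooling the organization already has.

```python
# Hypothetical sketch of a single-ontology catalog entry describing one data
# asset, so that mapping, security and compliance can be reasoned about the
# same way across silos. Field names and values are illustrative only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class DataAsset:
    name: str                     # business name of the dataset
    location: str                 # "on_premises" or a cloud region, for example
    owner: str                    # accountable team or person
    sensitivity: str              # e.g. "public", "internal", "personal_data"
    regulations: List[str] = field(default_factory=list)  # e.g. GDPR, NIS2
    usable_for_training: bool = False  # a governance decision, not a default yes

# Example entries from two different silos, described with the same ontology.
catalog = [
    DataAsset("customer_support_tickets", "on_premises", "CRM team",
              "personal_data", ["GDPR"], usable_for_training=False),
    DataAsset("product_telemetry", "eu-west-1", "Platform team",
              "internal", ["NIS2"], usable_for_training=True),
]

# A governance question becomes a simple, uniform query across silos.
trainable = [a.name for a in catalog if a.usable_for_training]
print(trainable)  # ['product_telemetry']
```

The point is that once every silo is described with the same vocabulary, a question such as “which data may feed a model?” becomes a uniform query rather than a one-off investigation.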
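For the “Experiment” rule, the sketch below shows one hedged way of putting several GenAI services behind a common interface so they can be compared on the same prompts. The provider names echo those mentioned above, but the invoke functions are deliberate stand-ins: in a real evaluation they would wrap each vendor’s SDK, and the prompts and success criteria would come from the organization’s own data.

```python
# Hypothetical harness for trying several GenAI services behind one interface.
# The stub functions below only simulate responses; in a real evaluation each
# would wrap the corresponding vendor SDK call.
from typing import Callable, Dict
import time

def bedrock_stub(prompt: str) -> str:       # stand-in for an AWS Bedrock call
    return f"[bedrock] answer to: {prompt}"

def azure_openai_stub(prompt: str) -> str:  # stand-in for an Azure OpenAI call
    return f"[azure-openai] answer to: {prompt}"

def vertex_stub(prompt: str) -> str:        # stand-in for a Google Vertex AI call
    return f"[vertex] answer to: {prompt}"

providers: Dict[str, Callable[[str], str]] = {
    "aws_bedrock": bedrock_stub,
    "azure_openai": azure_openai_stub,
    "google_vertex": vertex_stub,
}

def compare(prompt: str) -> None:
    """Send the same business prompt to every candidate and record latency."""
    for name, invoke in providers.items():
        start = time.perf_counter()
        answer = invoke(prompt)
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"{name:14s} {elapsed_ms:6.2f} ms  {answer}")

compare("Summarise yesterday's unresolved support tickets.")
```

Keeping the interface identical for every candidate makes it easier to judge them on what matters here: how well each one integrates with existing business data, not how polished the demo looks.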
We understand why the business and technology worlds are excited about the potential of AI (improved customer service, supply chain optimization, accelerated DevOps, etc.). But all this enthusiasm must not make us lose our sense of reality by underestimating the road ahead. AI has certainly already enabled significant paradigm shifts in key sectors of our economies. Yet many organizations still need to go back to the drawing board and put in place a sustainable cloud data roadmap.
___________________
By Sophie Papillon, Regional Vice President, France and Maghreb, Cloudera