Bigpicture hits milestone with first clinical dataset submission
Bigpicture aims for a repository of 3 million whole slide images (WSIs) for the development of AI algorithms and the acceleration of computational pathology. On the 1st of June, the combined efforts of several work packages, task forces, and nodes led to an important milestone for the consortium: the first clinical dataset was successfully submitted to the platform! “This milestone represents a significant step forward for one of the main goals of the project,” says Joel Pettersson, Senior Information System Specialist at Region Östergötland, Sweden.
Joel is involved in Bigpicture’s work package 3, the skin node, and the metadata task force, so he has been heavily involved in reaching this milestone. Joel: “This is the first time we tested the system all the way through, from the extraction of data at the slide contributors’ side to the validation of data in Bigpicture’s repository. Reaching this milestone successfully proves that we have the tools, the know-how and the competence to realize Bigpicture’s platform: on a regulatory level, being ethically and legally compliant, but also technically.”
The road to this milestone was not easy. Although Joel mainly focusses on the technical aspects of the project, e.g. the development of tools for data extraction and validation, the legal side proved to be just as challenging. Joel laughs: “I’m a bit of an IT nerd, so naturally I’d say we had to solve some of the biggest hurdles in the technical area. But a huge part of the legal groundwork was done before I joined the team in September 2021, and without the legal issues being solved I wouldn’t be able to do anything with the data. Both aspects are equally essential to the project.”
It starts with... legal approvals
The dataset that was uploaded to Bigpicture’s repository includes 80 whole slide images (WSIs) from 8 melanoma (skin cancer) patient cases. The first step was to obtain ethical approval. The national ethical board granted approval for the creation and submission of anonymous datasets for use in the Bigpicture data repository. Joel: “We extracted anonymous data, meaning both the WSIs and the connected metadata. Defining data from a personal record as anonymous is delicate work, and there has been a lot of collaboration and dialogue with the internal body of lawyers to reach a mutual understanding of the goals and objectives for the dataset, both to ensure that we as an organization adhere to regulatory guidelines and to deliver maximum value in the datasets. We see these datasets as cornerstones for the development and implementation of AI in clinical digital pathology, with the hope of increased quality of diagnostics to the benefit of the patients.”
Extracting and converting the data
One of the first steps that Joel had to take in order to reach the milestone was to map all the existing data in the local database. “It was an old system and we had to do many things manually, so this was really time-consuming.” And then there is the extraction and conversion of data: “Within Bigpicture we agreed on using a standard DICOM format. All image data had to be converted to this format, and in the beginning it took me 30 minutes per WSI. Luckily one of the project partners, Sectra, developed a tool that does the conversion 20 times faster, and this proved to be an essential part of us reaching the milestone.”
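To put those numbers in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes, purely for illustration, that the 20-fold speedup applies to the original 30 minutes per WSI and that all 80 WSIs in the first dataset are converted one after another; the article does not state the actual total conversion time.

```python
# Figures quoted in the article.
WSI_COUNT = 80                 # slides in the first clinical dataset
MINUTES_PER_WSI_BEFORE = 30    # initial conversion time per slide
SPEEDUP = 20                   # "20 times faster" with the partner's tool

def total_hours(slide_count: int, minutes_per_slide: float) -> float:
    """Total sequential conversion time in hours (illustrative assumption)."""
    return slide_count * minutes_per_slide / 60

# Assumption: the speedup is measured against the 30-minute baseline.
minutes_per_wsi_after = MINUTES_PER_WSI_BEFORE / SPEEDUP

hours_before = total_hours(WSI_COUNT, MINUTES_PER_WSI_BEFORE)
hours_after = total_hours(WSI_COUNT, minutes_per_wsi_after)

print(f"Per slide: {MINUTES_PER_WSI_BEFORE} min -> {minutes_per_wsi_after} min")
print(f"Whole dataset: {hours_before} h -> {hours_after} h")
```

Under these assumptions, converting the dataset drops from roughly a full working week to a couple of hours.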
WSIs and the metadata arena
With the WSIs extracted and converted, the teams ran into their next challenge: the metadata. Metadata tells you everything you want to know about the WSIs, e.g. contributor, type of cancer, clinical/non-clinical, etc. Joel: “One of the difficulties in this area is that the same attribute might have a different name in each database. For instance, I may have the data in Swedish, and another will have the data in French or English. And it is not only at the language level: the same attribute can mean something completely different in different institutes. For instance, a cancer in the right shoulder can be coded one way, but a local variant of this type can be coded as something entirely different. We need to map all these meanings.”
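The kind of mapping Joel describes can be sketched in a few lines of Python. This is a minimal illustration, not Bigpicture's actual tooling: the attribute names, institute identifiers, local codes and the shared vocabulary below are all hypothetical.

```python
# Hypothetical local attribute names per language, mapped to shared names.
ATTRIBUTE_MAP = {
    "sv": {"diagnos": "diagnosis", "lokalisation": "anatomical_site"},
    "fr": {"diagnostic": "diagnosis", "localisation": "anatomical_site"},
    "en": {"diagnosis": "diagnosis", "site": "anatomical_site"},
}

# Hypothetical local codes per institute, mapped to shared values.
# Key: (institute, shared attribute name, local code).
VALUE_MAP = {
    ("institute_se", "diagnosis", "MM"): "melanoma",
    ("institute_se", "anatomical_site", "AX-H"): "right_shoulder",
    ("institute_fr", "diagnosis", "MEL"): "melanoma",
    ("institute_fr", "anatomical_site", "EPD"): "right_shoulder",
}

def harmonize(record: dict, language: str, institute: str):
    """Translate one local record into the shared vocabulary.

    Returns the harmonized record plus a list of fields that could not be
    mapped, so that they can be reviewed by a person instead of guessed.
    """
    harmonized, unmapped = {}, []
    names = ATTRIBUTE_MAP.get(language, {})
    for local_name, local_value in record.items():
        shared_name = names.get(local_name.lower())
        if shared_name is None:
            unmapped.append(local_name)
            continue
        shared_value = VALUE_MAP.get((institute, shared_name, local_value))
        if shared_value is None:
            unmapped.append(f"{local_name}={local_value}")
            continue
        harmonized[shared_name] = shared_value
    return harmonized, unmapped

swedish, swedish_unmapped = harmonize(
    {"Diagnos": "MM", "Lokalisation": "AX-H", "Kommentar": "..."},
    language="sv", institute="institute_se",
)
french, french_unmapped = harmonize(
    {"Diagnostic": "MEL", "Localisation": "EPD"},
    language="fr", institute="institute_fr",
)
print(swedish, swedish_unmapped)
print(french, french_unmapped)
```

Both records end up with the same shared attribute names and values, while the unknown field is reported rather than silently dropped. The lookup itself is trivial; agreeing on what goes into the tables across institutes is the hard part.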
Foreseeing how complicated it would be to define metadata on such a scale, the consortium founded a task force to tackle the difficulties. “Within the project we need to make certain we have the same system for coding. And to map all these bits of metadata, taking into account the different ontologies at all levels, that’s a huge task that hasn’t been done before.”
Bridging the gap between experts
This milestone could not have been reached without the joint efforts of the Bigpicture community. And working with so many people with different backgrounds is a great learning experience. Joel: “Before I started in Bigpicture, I didn’t know much about pathology. I have a background in computer science. Something that is logical for me might not be logical for a pathologist, and vice versa. To learn from each other and really try to understand each other is an opportunity that will help us build a big collection of datasets that can be used in the decades to come. Bridging the gaps between the different backgrounds lays the foundation for future scientists and researchers!”
What can we do for other slide contributors?
Having the first dataset extracted, converted, and successfully submitted to the Bigpicture repository gives other slide contributors a path to follow. But how can we make their workflow as smooth as possible? Joel: “I met with different slide contributors the other day, and I think it would make sense to collaborate from the contributors’ side. We have done the upload and should share our experiences with others as much as we can. To reach this milestone, close collaboration between competent people with diverse backgrounds proved to be essential. We should continue in that spirit.”
What’s next?
Although the first dataset has been submitted, the milestone also highlights the work ahead. Joel: “You can see this as a version 0.1, so we continue working on the next version of the system. You can think of further mapping the metadata, and improving the tools for extraction, conversion and uploading.”
Furthermore, the legal puzzle is yet to be finalized. Joel: “We need consensus on GDPR-compliant workflows: what does joint controllership mean, who is the controller, who is a data processor, etc. We need a common understanding of these roles and what responsibilities come with them.”
Check out the team and main objectives of work package 3 here.