13 Dec
High performance in data syndication with CatalogExpress
When it comes to the processes surrounding your data syndication, you should definitely take performance into consideration. There is nothing more annoying for users than long waiting or response times during product data preparation. Poor performance also means longer workflows and, as a result, slower updates for data recipients (retailers, customers, wholesalers, marketplaces). If data is not updated in time, the consequences can be lower visibility, poorer listings or even incorrect orders.
That's why it is important to focus on the performance of your data syndication tool, ideally before rolling it out.
In this article, you will learn how to tell whether a data syndication solution performs well. We also share some facts and figures about the performance of our CatalogExpress.
What are the key factors for high performance in product data management?
First of all, it is difficult to make a general statement about performance in data syndication or product data management. It depends heavily on the complexity of your product data, on the performance of your source systems (ERP, PIM) and target systems (marketplaces, customer portals, etc.), and even on your Internet connection.
A few thousand products can, of course, be prepared more quickly than 80,000 products that have to be mapped into a complex BMEcat structure. If print-ready assets have to be supplied, the transfer itself will also take longer.
Nevertheless, there are indicators that give you important clues about performance in data syndication:
The technology used gives you initial clues about computing power, resource consumption and even downtime during updates. True scaling is only possible with a cloud-capable architecture, which lets you handle several million data records.
It is also important that the data syndication process runs smoothly and that local systems (workstation, server, source system) are not “paralyzed” by it.
Availability is another very important key figure. After all, you don't want to invest in a solution that keeps crashing or is unavailable for several hours just when you need to update data.
Duration of data deployment (end-to-end)
A complete data deployment process involves various process phases. Typical steps include:
- Data import from source systems
- Data transformation
- Data visualization
- Asset retrieval
- Asset transformation
- Generating exports or product feeds
- Distribution of the data
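To make the phase list above more tangible, here is a minimal Python sketch of a pipeline that runs each phase in order and records its wall-clock time. It assumes that a phase can be modeled as a plain function taking and returning product records. The names `run_pipeline`, `identity` and `trim_names` are hypothetical placeholders, not part of CatalogExpress.

```python
import time
from typing import Callable

# One pipeline phase: a display name plus a function that takes the current
# product records and returns the processed records (hypothetical model).
Phase = tuple[str, Callable[[list[dict]], list[dict]]]


def run_pipeline(data: list[dict], phases: list[Phase]) -> tuple[list[dict], dict[str, float]]:
    """Run the phases in order and record each phase's wall-clock time in seconds."""
    timings: dict[str, float] = {}
    for name, step in phases:
        start = time.perf_counter()
        data = step(data)
        timings[name] = time.perf_counter() - start
    return data, timings


def identity(records: list[dict]) -> list[dict]:
    # Placeholder for a phase whose real work (e.g. an ERP query) is out of scope here.
    return records


def trim_names(records: list[dict]) -> list[dict]:
    # Toy transformation: strip stray whitespace from product names.
    return [{**r, "name": r["name"].strip()} for r in records]


phases: list[Phase] = [
    ("data import", identity),
    ("data transformation", trim_names),
    ("data visualization", identity),
    ("asset retrieval", identity),
    ("asset transformation", identity),
    ("export generation", identity),
    ("distribution", identity),
]

products = [{"id": 1, "name": " Drill "}, {"id": 2, "name": "Saw"}]
result, timings = run_pipeline(products, phases)
print(result)  # [{'id': 1, 'name': 'Drill'}, {'id': 2, 'name': 'Saw'}]
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f} s")
```

Timing each phase separately shows where the time goes. For example, it reveals whether a slow run is caused by the import from a source system or by the transformation itself.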
As mentioned at the beginning, the duration of data provisioning is hard to standardize because the framework conditions differ. For example, how quickly can your source systems deliver data?
In our experience, some systems do not support multiple parallel data queries, or at least become very slow when handling them. Therefore, the only way to make meaningful statements about performance is to compare different data syndication solutions on similar or, better still, identical use cases, and to measure performance independently of external influences.
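The comparison described above could be set up roughly as in the following Python sketch. It assumes that every candidate tool can be invoked as a function on the same pre-loaded product list. `tool_a`, `tool_b` and `benchmark` are hypothetical stand-ins, not real products or APIs. Because the input is loaded once in advance, slow source systems cannot distort the result. Taking the median of several runs damps outliers caused by other external influences.

```python
import statistics
import time
from typing import Callable


def benchmark(generate: Callable[[list[dict]], object], products: list[dict], runs: int = 5) -> float:
    """Return the median wall-clock time (seconds) of `runs` executions on the same input."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        generate(products)
        durations.append(time.perf_counter() - start)
    # The median is less sensitive to a single slow run than the mean.
    return statistics.median(durations)


# Hypothetical stand-ins for two syndication tools under test.
def tool_a(products: list[dict]) -> list[int]:
    return [p["id"] for p in products]


def tool_b(products: list[dict]) -> list[int]:
    return sorted(p["id"] for p in products)


# Identical use case for every candidate: the same pre-loaded product list.
products = [{"id": i} for i in range(10_000)]
results = {name: benchmark(fn, products) for name, fn in [("tool_a", tool_a), ("tool_b", tool_b)]}
print(results)
```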
Facts and figures about the performance of CatalogExpress in data syndication and BMEcat generation
You are now familiar with a few general facts that can be used to better evaluate the performance of data syndication solutions. Now you are probably asking yourself, what can CatalogExpress actually do? Here are some exciting insights into the world of numbers and performance data from our CatalogExpress product data generator.
With its cloud-ready system, CatalogExpress offers high performance while saving your local computing power, because cloud resources are only used when they are needed. While your electronic catalog is being created in the background, you can work on the next catalog or do other work on your computer without any problems. In short: CatalogExpress does not “paralyze” your PC, and it also works highly efficiently on the server side without tying up excessive technical resources.
CatalogExpress was unavailable for less than 5 minutes last year. That’s more than 99.99% availability.
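The availability figure follows from simple arithmetic. Assuming a non-leap year of 525,600 minutes, the Python sketch below converts downtime into availability and back. The helper names are illustrative only. It shows that less than 5 minutes of downtime actually corresponds to roughly 99.999% availability. A 99.99% target would still allow about 52.56 minutes of downtime per year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def availability_percent(downtime_minutes: float, period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Share of the period during which the service was available, in percent."""
    return 100 * (1 - downtime_minutes / period_minutes)


def allowed_downtime_minutes(target_percent: float, period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Maximum downtime per period that still meets an availability target."""
    return period_minutes * (1 - target_percent / 100)


print(round(availability_percent(5), 4))          # 99.999
print(round(allowed_downtime_minutes(99.99), 2))  # 52.56
```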
Thanks to our CI/CD strategy, updates cause virtually no downtime. Among other things, updates are only applied when no actions are running on the systems, so your users do not notice them at all. In 2023, we will go one step further and eliminate even these brief “non-availabilities” with rolling updates in our Kubernetes cluster.
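The rule "apply updates only when no actions are running" can be illustrated with a small Python sketch. This is not how CatalogExpress implements it. It is a hypothetical `UpdateGate` class that assumes every user action (such as a catalog export) is registered with the gate while it runs. An update that arrives during an action is simply postponed.

```python
import threading
from contextlib import contextmanager


class UpdateGate:
    """Hypothetical gate: an update may only be applied while no user action is running."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active_actions = 0
        self.version = "1.0"

    @contextmanager
    def action(self):
        # Wrap every user action (e.g. a catalog export) so the gate knows it is running.
        with self._lock:
            self._active_actions += 1
        try:
            yield
        finally:
            with self._lock:
                self._active_actions -= 1

    def try_apply_update(self, new_version: str) -> bool:
        # Holding the lock means no new action can start while the update is applied.
        with self._lock:
            if self._active_actions > 0:
                return False  # postpone: an action is still running
            self.version = new_version
            return True


gate = UpdateGate()
with gate.action():
    print(gate.try_apply_update("1.1"))  # False, an export is still running
print(gate.try_apply_update("1.1"))      # True, the system is idle
print(gate.version)                      # 1.1
```

Rolling updates in Kubernetes address the same goal differently: instances are replaced one after another, so some instances keep serving requests throughout the update.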
Duration of product data provisioning: speed is not magic
We illustrate the duration of data provisioning, i.e. catalog creation, with two use cases from practice.
Scenario 1: How long does it take to generate a BMEcat from over 1 million products?
Specifically, this scenario involves generating 1,074,595 products from several XML files as a BMEcat 2005 catalog with ECLASS classification and text features, at the push of a button and without parallel processes. It is one of the larger catalogs generated with CatalogExpress. Extrapolated, the performance figures are as follows:
- Approx. 364 products processed per second
- Approx. 21,820 products generated per minute
- Total time for BMEcat creation: 49:15 minutes
The complete BMEcat generation in CatalogExpress takes 49:15 minutes. This refers to the entire end-to-end process: from loading the source data (here, several XML sources, approx. 12 minutes), through transforming the data into a deeply nested BMEcat structure (approx. 27 minutes), to deploying the data via a distribution (approx. 10 minutes).
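The throughput figures above can be checked with a few lines of Python. The only inputs are the product count, the total time of 49:15 minutes and the approximate phase durations stated for this scenario. The exact per-minute value comes out at about 21,819, which the list rounds to 21,820. The phase durations add up to the roughly 49 minutes of the end-to-end run.

```python
TOTAL_PRODUCTS = 1_074_595
TOTAL_SECONDS = 49 * 60 + 15  # 49:15 minutes end-to-end

# Approximate phase durations in minutes, as stated for this scenario.
phases = {"load XML sources": 12, "transform to BMEcat": 27, "distribute": 10}

per_second = TOTAL_PRODUCTS / TOTAL_SECONDS
per_minute = per_second * 60

print(round(per_second))     # 364
print(round(per_minute))     # 21819
print(sum(phases.values()))  # 49
```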
Of course, raw numbers are hard to interpret without points of comparison. A comparison with a Word document may help you grasp the performance of CatalogExpress:
The fully generated BMEcat from this scenario takes up over 5 GB, all of it text in XML format. For comparison: 200 pages of plain text in a Word document take up about 1 MB (source: https://www.gutefrage.net/frage/wie-viel-text-auf-1gb-). Extrapolated to 5 GB, this corresponds to about 1 million pages of text in Word. That is a lot of data.
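The extrapolation is a simple multiplication, sketched here in Python. It assumes decimal units (1 GB = 1,000 MB) and the figure of roughly 200 pages per MB quoted above. With binary units (1,024 MB per GB), the result would be about 2% higher.

```python
PAGES_PER_MB = 200  # about 200 pages of plain Word text per MB (figure quoted above)
MB_PER_GB = 1000    # decimal units assumed


def word_pages(size_gb: float) -> int:
    """Approximate number of plain-text Word pages for a given file size in GB."""
    return round(size_gb * MB_PER_GB * PAGES_PER_MB)


print(word_pages(5))  # 1000000
```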
Scenario 2: Creating product data catalogs with over 300,000 products within 15 minutes
CatalogExpress took part in a tender in the field of data syndication. The task was to read in 4 catalogs with 90,000 products each, transform them and make them available again within 15 minutes. CatalogExpress completed this in under 20 minutes, the best performance of all participating providers.
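To put the tender figures into perspective, the Python sketch below computes the total product count and the throughput implied by the 15-minute target. Because the actual run time is only known to be under 20 minutes, the 20-minute value is a lower bound for the achieved throughput, not an exact measurement.

```python
TOTAL_PRODUCTS = 4 * 90_000  # four catalogs of 90,000 products each


def products_per_minute(products: int, minutes: float) -> float:
    """Average throughput needed to process `products` within `minutes`."""
    return products / minutes


required = products_per_minute(TOTAL_PRODUCTS, 15)      # tender target
achieved_min = products_per_minute(TOTAL_PRODUCTS, 20)  # lower bound: run took under 20 minutes

print(TOTAL_PRODUCTS)  # 360000
print(required)        # 24000.0
print(achieved_min)    # 18000.0
```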
Fast BMEcat creation and high performance data syndication with CatalogExpress
In fact, CatalogExpress can handle much larger volumes of data than those described in our scenarios. We paid close attention to this when designing the architecture.
With the Guru variant of CatalogExpress, you can increase performance even further. Parallel processes let you create a wide variety of electronic product catalogs simultaneously, which further accelerates your data syndication processes and BMEcat generation.
For those who want even more, we can make further individual optimizations. Contact us to arrange a non-binding consultation.
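The following Python sketch shows the general idea of generating several catalogs in parallel instead of one after another. It is not the Guru variant's implementation. `generate_catalog` and the catalog names are hypothetical stand-ins. A `ThreadPoolExecutor` is used here for simplicity. For CPU-heavy generation in Python, a `ProcessPoolExecutor` would usually be the better choice.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_catalog(name: str, product_ids: list[int]) -> tuple[str, int]:
    """Hypothetical stand-in for generating one electronic catalog; returns its name and size."""
    entries = [f"<PRODUCT id='{pid}'/>" for pid in product_ids]
    return name, len(entries)


# Hypothetical catalog jobs for different data recipients.
catalog_jobs = {
    "retailer_a": list(range(1_000)),
    "wholesaler_b": list(range(2_500)),
    "marketplace_c": list(range(500)),
}

# Submit all catalogs at once; map() returns results in the order of the inputs.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(generate_catalog, catalog_jobs.keys(), catalog_jobs.values()))

print(results)  # [('retailer_a', 1000), ('wholesaler_b', 2500), ('marketplace_c', 500)]
```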