Releases: digitalmethodsinitiative/4cat

v1.54 Database status hotfix

21 Apr 08:33

⚠️ Please follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT. ⚠️

While deploying the previous 4CAT release (1.53), an issue surfaced where new 4CAT installs did not set up the database correctly, leading to failures and crashes when performing certain dataset-related actions.

This only affects new 4CAT installs; older installs updated to 1.53 are not affected by this issue. Upgrading is safe and recommended in either case.

  • Update default database definition to include new status_type column (see #589) (2f1c28f).

Full Changelog: v1.53...v1.54

v1.53 4CAT in bloom

16 Apr 15:17

This 4CAT update mostly comprises updates and bug fixes for processors and data sources, as well as improvements to the Web UI (particularly the Explorer, which can now display downloaded dataset media). The LLM-based processors can now also handle media as part of prompts and have been updated to allow using the latest third-party models.

⚠️ Docker users are advised to rebuild their containers to benefit from some of the speed boosts implemented in previous updates, as well as to use the latest available versions of 4CAT’s dependencies (without which some processors, particularly those processing videos, will fail more readily). This will become mandatory in a future release. For now rebuilding is optional and 4CAT will otherwise function normally, but sometimes slower and less effectively than it could.

⚠️ Please also follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT.

Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:

New and expanded processors and data sources, and other features

  • Add an AGENTS.md file to the repository to facilitate the use of 4CAT by AI agents (b5bf1c4)
  • Add an ‘Upload annotations’ processor to allow adding annotations to an existing dataset via CSV upload or text input (#578)
  • 4CAT extensions are now enabled automatically after installation. Note that data sources still need to be enabled separately. (d8a5a48)

New processor features and other processor updates

  • Update the ‘Import CSV file’ data source to try harder to detect the CSV dialect used (26da7d6)
  • Update Zeeschuimer-based data sources to uniformly add a collected_from_url field to datasets containing the URL the data was collected from (82289fa)
  • Update various processors to now (not) share a queue with other related processors (e.g. all workers using the GPU) (#567)
  • Update the ‘Download videos’ processor to optionally skip URLs that are not videos (9737fc3)
  • Update 4CAT’s proxy handler to fall back to non-proxied requests if all proxies seem to be failing (4178f3f, 97be9c5)
  • Update the ‘Count items’ processor to allow users to choose which column to read timestamps from (eaf8706)
  • Update the ‘Download videos’ processor to use proxies and parallel requests, if configured (#551)
  • Update the ‘Count values’ processor to count the top 25 items by default (up from 15) (ce857bd)
  • Update the ‘Count values’ processor to have an option to negate the provided item filter, and give more fine-grained control over URL extraction (fd392c6)
  • Update the ‘Count values’ processor to include the column being counted in the result dataset’s label (12996c0)
  • Update the ‘Co-link network’ processor to stop and warn the user when not enough memory is available to run the analysis (9c4fe8e)
  • Update the ‘Custom network’ processor to include the names of the networked columns in the result dataset’s label (834f14d)
  • Update the ‘Pinterest’ data source to correctly parse imported items again (4960a5a)
  • Update the ‘Instagram’ data source to correctly parse imported items again (f5389b7, 8b5ebd2)
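These notes do not describe how the ‘Import CSV file’ data source detects dialects. As a rough illustration only, and not 4CAT’s actual implementation, the sketch below uses `csv.Sniffer` from Python’s standard library and falls back to the default Excel-style dialect when sniffing fails. The function name `detect_dialect` and the list of candidate delimiters are assumptions made for this example.

```python
import csv
import io


def detect_dialect(sample: str, delimiters: str = ",;\t|"):
    """Guess the CSV dialect of a text sample, falling back to Excel-style CSV."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=delimiters)
    except csv.Error:
        # Sniffing failed (e.g. a single-column file); use the default dialect
        return csv.excel


# A semicolon-separated sample, as often produced by European spreadsheet software
sample = "id;body;timestamp\n1;hello;2025-04-16\n2;world;2025-04-17\n"
dialect = detect_dialect(sample)
rows = list(csv.DictReader(io.StringIO(sample), dialect=dialect))
```

Passing only the first few kilobytes of a file as the sample is usually enough and avoids reading large uploads twice.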

Processor bug fixes

  • Fix an issue where a processor would not be able to finish if created annotations could not be saved (e6618f7)
  • Fix an issue with the ‘PromptCompass’ processor where it would crash if the list of available models was empty (0cff586)
  • Fix an issue with the ‘Import media files’ data source where zip files would not be properly recognised when running 4CAT on Windows (28c8306)
  • Fix an issue with the Zeeschuimer based ‘TikTok’ data source where files would not be imported if items do not contain a source URL (17db903)
  • Fix an issue with the ‘Text from image’ processor where it would in some cases not assign a valid ID to result items (e11048a)

GenAI-related features and fixes

  • Update the ‘LLM Prompter’ processor to warn the user when providing invalid input or settings (4888734)
  • Update the list of third-party models available in the LLM Prompter to support models released since the previous 4CAT release (e82bc9b)
  • Update the LLM Prompter to allow passing downloaded media files as part of the prompt (for audio and video, only when using third-party models) (#580 🤖)
  • Update the ‘Whisper’ processor to have a setting to use the CPU for running Whisper (e8acb21)
  • Update the ‘Whisper’ processor to have new options for diarization, as well as more robust error handling (2895bc2)
  • Fix an issue with the ‘LLM Prompter’ processor and related ones where the configured system prompt would not be properly used (4f83ea7)

New Web UI features and other general 4CAT updates

  • Update 4CAT to more clearly signal to users that a dataset has finished, but with issues (instead of only showing ‘failure’ or ‘success’ as status options) (#570)
  • Update the Explorer to display media in-line (instead of only a link), if images/videos have been downloaded for a dataset (#573)
  • Update the ‘migrate.py’ migration script with a --no-pip command-line argument to skip running PIP as part of migration (b4f29c7)
  • Update the flag icons used in the interface (1f68f58)
  • Update the ‘API Access’ page in the interface to allow for renewing expired tokens (6acf76f)
  • Update the 4CAT item expiration worker to optionally send users an e-mail warning 7 days prior to dataset expiration (0784f14)
  • Update the 4CAT notifications fetcher to retry fetching notifications instead of immediately logging a warning (52a5c52)
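For readers unfamiliar with opt-out command-line switches such as the `--no-pip` argument mentioned above, the following is a minimal sketch of how a flag like this can be parsed with Python’s `argparse`. It is not the code of migrate.py; the function name `parse_args` and the help text are assumptions for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line of a hypothetical migration script."""
    parser = argparse.ArgumentParser(description="Illustrative migration script")
    # store_true makes the flag optional: absent means False, present means True
    parser.add_argument("--no-pip", action="store_true",
                        help="skip running pip as part of migration")
    return parser.parse_args(argv)


args = parse_args(["--no-pip"])
if args.no_pip:
    print("Skipping dependency installation via pip")
```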

Web UI and general 4CAT bug fixes

  • Fix an issue where changing global 4CAT settings would not actually change the setting globally (#577 🤖)
  • Fix an issue where processor options where columns could be configured would not always show the correct list of columns (#582 🤖)
  • Fix an issue where dataset status messages in the web UI would not correctly escape HTML (145958e)
  • Fix an issue where the ‘preview’ button or ‘done’ icon for datasets would be shown even for filter datasets that had no content of their own (d84f83a, b9f2a8d)
  • Fix an issue where a ‘Preset’ processor would not correctly be shown as currently running in some cases (8e7aa03)
  • Fix an issue where the ‘Delete’ button for a processor would reload the page on clicking or not ask the user to confirm before deleting a dataset (cf71e9f, 8361827)
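The HTML-escaping fix listed above addresses a common class of bug: text that originates from data or error messages is placed into a page without being escaped. The sketch below shows the general technique with `html.escape` from the standard library. It is an assumption-laden illustration, not 4CAT’s template code; `render_status` and the `status` class name are hypothetical.

```python
from html import escape


def render_status(message: str) -> str:
    """Return a status message that is safe to embed in an HTML page."""
    # escape() replaces &, < and >, and by default also both quote characters
    return f'<span class="status">{escape(message)}</span>'


print(render_status('Could not parse <script>alert("x")</script>'))
```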

Removals and deprecations

  • Removed the ‘Download 4chan images’ processor (ac0fdde)

Full Changelog: v1.52...v1.53

v1.52 Winter Wonderland

19 Dec 11:14

This 4CAT update mostly comprises bug fixes for processors and data sources, as well as a couple of new processors for statistical analysis of datasets and an implementation of PromptCompass as a 4CAT processor. The update also adds support for calling the Deepseek and Gemini 3 APIs from LLM-based processors.

⚠️ Docker users are advised to rebuild their containers to benefit from some of the speed boosts implemented in previous updates, as well as to use the latest available versions of 4CAT’s dependencies (without which some processors, particularly those processing videos, will fail more readily). This will become mandatory in a future release. For now rebuilding is optional and 4CAT will otherwise function normally, but sometimes slower and less effectively than it could.

⚠️ Please also follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT.

Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:

New and expanded processors and data sources

  • New ‘Regression evaluation’ processor to calculate regression metrics between two numerical columns in a dataset (84a56dd)
  • New ‘Descriptive statistics’ processor to calculate various descriptive statistics (mean, median, std dev, etc.) for numerical columns (01ed21c)
  • New ‘PromptCompass: Test task-specific prompts’ processor that allows choosing from a pre-defined list of prompts from other LLM-based work to annotate the datasets. Implementation of the standalone tool PromptCompass by Erik Borra (#562)
  • Update various network processors to allow disabling the automated community detection; this is now always disabled if the network contains 50,000 or more edges (b3864d9)
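The edge limit for community detection described above can be read as a simple guard. The sketch below is one possible reading under stated assumptions: the names `MAX_EDGES_FOR_COMMUNITIES` and `should_detect_communities` are hypothetical, and only the 50,000 figure comes from the release note.

```python
MAX_EDGES_FOR_COMMUNITIES = 50_000  # threshold named in the release note


def should_detect_communities(edge_count: int, user_enabled: bool = True) -> bool:
    """Decide whether community detection should run for a network."""
    if edge_count >= MAX_EDGES_FOR_COMMUNITIES:
        # Networks at or above the limit always skip detection
        return False
    # Below the limit, the user's choice decides
    return user_enabled
```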

Other new features

  • New ‘Statistics’ processor category containing processors exclusively focused on calculating statistics from existing columns (25c518a)
  • Update the background worker that deletes expired datasets to be more efficient (91f79fd)
  • Update the ‘Top Images’ processor to optionally save the top images as annotations (3ff0d8f)
  • Update the ‘Confusion matrix’ processor to halt processing when more than 500 categories are found in the parent dataset (916394c)
  • 4CAT will now periodically log information about its running workers and threads, including a call stack and process ID, when run with --log-level=DEBUG (74968b5)

LLM-related features and fixes

  • Update the ‘LLM Prompter’ processor to allow image analysis with LLM APIs, by sending image URLs as prompts (a3ee966)
  • Update the local LLM API cache and add Deepseek and Gemini 3 as options for processors that can talk to external LLM APIs (f52e180)
  • Add initial support for vLLM as a local LLM provider (a3ee966)

Fixes to processors

  • Fix an issue with the ‘Import 4CAT dataset’ data source where it would crash if certain metadata was missing from the uploaded dataset (7cabbf5)
  • Fix an issue with the BlueSky data source where it could crash if no query was provided (4ee74b6)
  • Fix an issue with the Instagram data source where items would not be parsed if their ‘owner’ was not the same as their ‘author’ (c563911)
  • Fix an issue with the RedNote/Xiaohongshu data source where items could incorrectly be reported to be missing a timestamp (f9e455b, #557)
  • Fix an issue with the ‘View media metadata’ processor where it would crash if certain metadata was missing (29041e2)
  • Fix an issue with the ‘Toxicity scores’ processor where it would keep processing the data even if the API returned an error (de2184f)
  • Fix an issue with the ‘Classification evaluation’ processor where it could crash if a label was not a string (17900b9)
  • Fix an issue with the ‘Audio to text’ processor where it could crash if the API returned an unexpected response (6fe2a2d)
  • Fix an issue with the ‘Audio to text’ processor where it would not process data if the dataset contained only a single file (dba049a)
  • Fix an issue with the ‘URL co-occurrence network’ where it could crash if the source dataset did not contain a ‘thread_id’ column (ddb38b8)
  • Fix an issue with the ‘Hash images’ processor where it could crash if the dataset contained non-image files (9a2fb82)

Other fixes

  • Fix an issue with the Explorer where it would not display the correct post texts for Telegram datasets (4ca46b6)
  • Fix an issue with datasets containing annotations where a crash could occur when annotated item IDs were not strings (d8d5108)
  • Fix an issue with 4CAT’s proxy manager where requests could get ‘stuck’ in limbo when the processor that made them crashed or was interrupted (b7378f6)
  • Fix an issue with processors fetching URLs via 4CAT’s proxy manager where it could crash if a request did not complete successfully (6374eb8)
  • Fix an issue where memcached connections would not get cleaned up properly when using memcached and keeping 4CAT running for long periods of time (#546, #547)
  • Fix an issue where annotations of items that were filtered out would be copied too when copying filtered datasets with annotations (#545)
  • Fix an issue where interrupting processors calling external commands (such as video processors calling ffmpeg) would not terminate the called commands properly (#559)

Docker-related changes

  • The first time a 4CAT Docker container is run, the logic for notifying the user about 4CAT’s URL and other useful information is now more robust (b8f9b14)
  • The 4CAT front-end now no longer uses ‘4cat.local:5000’ as a default domain name, but uses ‘localhost’ instead (4c187cb)

Full Changelog: v1.51...v1.52

v1.51 The R Is in the month

03 Nov 15:58
e090d33

This update adds several new processors to help work with image datasets and LLM-based annotations; many bug fixes and small quality-of-life updates to processors; various bug fixes and updates to mapping code for datasets imported via Zeeschuimer; and other assorted upgrades and tweaks.

⚠️ Docker users are advised to rebuild their containers to benefit from some of the speed boosts implemented in this update, as well as to use the latest available versions of 4CAT’s dependencies (without which some processors, particularly those processing videos, will fail more readily). This will become mandatory in a future release. For now rebuilding is optional and 4CAT will otherwise function normally, but sometimes slower and less effectively than it could.

⚠️ Please also follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT.

Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:

New and expanded processors and data sources

  • New ‘Hash images’ processor that can calculate hashes (of multiple types, with some resistant to cropping or resizing) of images, to aid the detection of duplicates (d71ea83)
  • New ‘Group similar hashes’ processor that can group images by their hash, to aid the detection of duplicates (d71ea83)
  • New ‘Classification evaluation’ processor to calculate various accuracy-related statistics for LLM-annotated datasets (ca1752a)
  • New ‘Convert items to annotations’ processor to make annotations from the original dataset items’ attributes (129c4cf)
  • The ‘Telegram’ data source can now also use Telegram’s global hashtag search feature; simply add a hashtag as a query when creating a dataset. (0b87c3b)
  • Update the various Network processors to automatically annotate nodes with their community according to the Louvain and Greedy Modularity algorithms (716f47e)
  • Update the various Network processors to force community IDs to be strings, for compatibility with other network tools (a489117)
  • Update the ‘Video Wall’ processor to provide clearer error logs when encountering an issue with ffmpeg (66c2306)
  • Update the ‘Monthly Histogram’ processor to use a more relevant default title if none is given by the user (17834ad)
  • Update the ‘Download videos’ processor to have an extra option (configurable via the Control panel) to only allow indirect download for YouTube links (a8fd481)
  • Update the ‘Rank flow’ processor to have an option to only draw connections between adjacent periods (7251207)
  • Update the Douyin data source to include a video thumbnail URL and a more relevant video URL in its output (cfaa113)
  • Update the Xiaohongshu data source to include a column with hashtags in its output (465a13d)
  • Update the ‘TikTok (via URLs)’ data source to only download each video once, even if it appears in the input multiple times (7500718)

Other new features

  • 4CAT now periodically checks for notifications with information about upgrades or security issues. These are loaded from the 4CAT developers’ server, which can be disabled or changed in the 4CAT settings (#520)
  • Working with datasets that have been annotated by processors or via the Explorer is now faster (#525)
  • New datasets for some processors (such as Filter processors) are now labeled more descriptively on the Dataset page (12f81d8)
  • Update login dialog to ignore the case of usernames and leading or trailing whitespace of usernames and passwords (c472c98, 2ed2436)
  • Update the preview of CSV files to properly show new lines in items (ddcf337)
  • Update the preview of CSV files to have a ‘sticky’ header row while scrolling through it (be74158)
  • Update the preview of GEXF files to contain a link to open the network in Retina (ce16526)
  • Update the preview of GEXF files to use more sensible default values for edge colour, node size, and label size (51172a5)
  • Update Explorer to not show internal dataset parameters when viewing a dataset (5b8dcfd)
  • Update logger to output debug-level logs to stdout when running 4CAT interactively; move some high-volume log messages to a new DEBUG2 log level (b60b18a)
  • Update logger to send Slack alerts with more relevant information and clickable traceback stack elements (119693b)
  • Update logger to use the same default log locations regardless of whether 4CAT is running in Docker or not (566a48f)
  • Update front-end to serve favicon.ico properly in non-standard hosting setups (49cd8cd)
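The login change above treats usernames and passwords differently: case is ignored only for usernames, while surrounding whitespace is ignored for both. A minimal sketch of that normalisation follows, assuming it happens before the values are compared with stored credentials; the function name `normalise_credentials` is hypothetical and this is not 4CAT’s login code.

```python
def normalise_credentials(username: str, password: str) -> tuple[str, str]:
    """Normalise login input before it is compared with stored credentials."""
    # Usernames: ignore surrounding whitespace and letter case
    clean_username = username.strip().casefold()
    # Passwords: ignore surrounding whitespace only; case still matters
    clean_password = password.strip()
    return clean_username, clean_password
```

For this to work, stored usernames would need to be normalised in the same way when accounts are created.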

LLM-related features and fixes

  • GPT-5 models are now available for selection in the ‘LLM Prompting’ processor (e83a971)
  • The Gemma 3 family of models is no longer available for selection in the ‘LLM Prompting’ processor (7dc6ca2)
  • New ‘Hide reasoning’ option for the ‘LLM Prompting’ processor to remove the reasoning preamble from responses by models that use it (f306903)
  • Update the ‘LLM Prompting’ processor to automatically include separate relevant annotations when output is structured as an array (fb31076)
  • Update the ‘LLM Prompting’ processor to allow setting a limit to the amount of data that is inserted into the prompt per item attribute (3c278c6)
  • Fix an issue with the ‘LLM Prompting’ processor where the default token limit was too low to get output for some reasoning-based models (51bfba7)
  • Fix an issue with the ‘LLM Prompting’ processor where items would be skipped erroneously (cbe5b33)
  • Fix an issue with the ‘LLM Prompting’ processor where structured output would be empty when not using batching (5caf379)
  • Fix an issue with the ‘LLM Prompting’ processor where it would refuse to annotate when input value was empty (2182ad4)
  • Fix an issue with the ‘LLM Prompting’ processor where the ‘input_values’ column in the mapped result file would not contain the right values if batching was not used (6743fc3)
  • Fix a crash in the ‘LLM Prompting’ processor when including the same column in the prompt multiple times, or when encountering an empty batch (e7d5622)

Extensions

  • Extensions are now located in the “config” folder in the 4CAT root for Docker-related reasons; if you already have extensions installed elsewhere, they will be moved (52609ea)
  • Update extension settings so that they can no longer be disabled/enabled on a per-tag level, but only globally (7bdfda6)
  • Fix an issue where admin users would not always be able to manage extensions (539ddd9)
  • Fix an issue where internal Python files and folders starting with ‘__’ would erroneously be recognized as extensions (7ea726a)

Fixes to processors

  • Fix a crash in the ‘Image’/‘Video Wall’ processors when making a collage with 0 items (d44fe64)
  • Fix a crash in the ‘Image’/‘Video Wall’ processors when the dimensions could not be determined for any of the items (cc4afea)
  • Fix a crash in the ‘Image’/‘Video Wall’ processors when a file is determined to have a width or height of 0 (2d2978d)
  • Fix an issue with the ‘Image Wall’ processor where the “Dominant K-means” sort mode would be ignored (a4f7990)
  • Fix an issue with the ‘Video Wall’ processor where it would not be able to get a proper result from ffmpeg if the source videos used non-standard pixel formats (432bf50)
  • Fix a crash in the ‘Download images’ processor when encountering exotic characters in the file name (b932cac)
  • Fix an issue with the ‘Download images’ processor where it would not correctly handle “0” as the value for the amount of images to download (b3aa7f2)
  • Fix an issue with the ‘Download Images’ processor where it could occasionally crash the front-end due to other image downloading processors using the same class ID (19db0b5)
  • Fix an issue with the ‘Download Images’ processor where the included metadata file would not refer to the correct filename in some cases (2296915)
  • Fix an issue with the ‘Download TikTok videos’ processor where it would repeatedly fail to download a video (624dcab)
  • Fix an issue with the ‘Export dataset’ processor where it could in some ...

v1.50 We're all going on a summer holiday

18 Jul 11:36
4e02902

This update comprises various under-the-hood changes to make 4CAT faster and more robust. It also includes more features for annotating datasets with LLMs, processors, or manual input; easier extension management via the web interface; a number of new and updated processors; several new features for the web interface; and various bug fixes and updates to mapping code for datasets imported via Zeeschuimer.

⚠️ Docker users will need to rebuild their containers to benefit from some of the speed boosts implemented in this update. This may become required in a future release. For now rebuilding is optional and 4CAT will otherwise function normally (but sometimes slower than it could be).

⚠️ Please also follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT.

Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:

New processors

  • New processor (Video Wall) to make video collages of video datasets (5af02b3)
  • New processor (LLM Prompting) that replaces the 'OpenAI prompting' processor and can interface with various LLM providers for LLM annotation of datasets, and allows for batched prompts and requesting structured output (#509, #515)
  • New processor (Confusion matrix) to generate a confusion matrix of the values of two columns of a dataset (e7a9a8e)
  • New processor (Replace text) to replace text within a dataset, resulting in a new dataset with the changed values (#512)

Updated processors

  • Updated various processors to now have an option to save their results as annotations, meaning they will be available as an additional column on CSV exports of the parent dataset and can be edited manually in the Explorer (#507)
  • Updated filter processors to copy the original dataset's annotations when making a new filtered dataset (#512)
  • Updated the 'Download images' and 'Fetch URL metadata' processors to more intelligently utilise HTTP proxies. Requests are now limited per host name even if multiple processors run at the same time. Proxy settings can be configured in the 4CAT Control Panel (#487)
  • Updated the 'Download images' processor to be available for more types of datasets (580e512)
  • Updated the English (Infochimps) and Dutch (Onzetaal) word lists used by, among others, the 'Tokenisation' processor (8886507, e44f676)
  • Updated the 'Image wall' processor to now also run directly on video datasets and offer more options for sorting and sizing (#508)
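The per-host request limiting mentioned above can be pictured as a shared record of when each host name was last contacted. The sketch below illustrates that idea for threads within a single process; it is not 4CAT's proxy handling code, and `HostRateLimiter`, `try_acquire`, and `min_interval` are names assumed for this example.

```python
import threading
import time
from urllib.parse import urlsplit


class HostRateLimiter:
    """Allow at most one request per host every `min_interval` seconds."""

    def __init__(self, min_interval: float, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock  # injectable so the limiter can be tested without waiting
        self.last_request = {}  # host name -> time of the last allowed request
        self.lock = threading.Lock()  # shared by all worker threads

    def try_acquire(self, url: str) -> bool:
        """Return True if a request to this URL's host may be made now."""
        host = urlsplit(url).hostname or ""
        with self.lock:
            now = self.clock()
            last = self.last_request.get(host)
            if last is not None and now - last < self.min_interval:
                return False
            self.last_request[host] = now
            return True
```

Because the state is keyed by host name rather than by processor, two downloads running at once share the same budget for a given site. Coordinating across separate processes would need shared storage instead of an in-memory dictionary.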

Web interface

  • More control over extensions, including the ability to install and uninstall them via the web interface and enable and disable them while installed (#463)
  • A new 'Jobs' page in the Control Panel shows an overview of workers running in the backend and allows admins to stop specific workers (#501)
  • 4CAT can now use memcached to speed up the loading of pages in the web interface (#393, #492, 35a0242)
  • Optimisations for dataset pages to make loading faster for datasets on which a lot of processors have been run (35a0242, c6caa0c, 05fe139)
  • Dataset parameters on dataset pages can now be clicked to copy their value to the clipboard (cf2dd84)
  • The Control Panel's 'Logs' page now also shows the web interface's own log, if it is in a separate file (c0d0091)
  • Added various new endpoints to 4CAT's API for automated use (ee3d2d0)

Bug fixes

Data sources and mapping

  • Fixed an issue with the Telegram data source where spaces before or after the API key would mean the API key would not be recognised (c3c57a8)
  • Fixed an issue with the Instagram data source where mapped items would not always have the same columns (da4e29d)
  • Fixed an issue with the X/Twitter data source where imported items with quoted posts that could not be loaded would not map properly (dfab94e)
  • Fixed an issue with the Telegram data source where it would not map an item if the Markdown version of its contents was not included (580f43e)

Processors

  • Fixed an issue with the 'Download images' processor where a limit on the amount of images that could be downloaded was not parsed properly (860625c)
  • Fixed an issue with network processors that would make them crash when trying to create a time-slice network from a dataset with invalid or empty timestamps (09aa847)
  • Fixed an issue with the 'Word Tree' processor where it would not properly include all selected columns in its output (24c6391)
  • Fixed an issue with the 'Datasource metrics' worker where it could crash if files it was scanning were deleted during the worker's operation (4644220)
  • Fixed an issue where 4CAT could crash when workers would automatically add jobs with bad code (#511)

Web interface

  • Fixed an issue with the 'Logs' page in the Control Panel where logs would not properly load if they contained certain UTF-8 sequences (694c0df)
  • Fixed an issue where the 'Preview' button would sometimes be visible for unfinished or empty datasets on dataset pages (4a888a0)
  • Fixed an issue where dataset pages could take extremely long to load for very large datasets that could not be properly mapped with map_item() (#393)

Operation

  • Fixed an issue with the 'create_user.py' helper script that would make it not run properly (f671374)
  • Fixed an issue where the front-end would sometimes not properly load user-specific settings (#473, #455, #503)
  • Fixed an issue where 4CAT would become unresponsive after a failed database query. Queries are now retried upon failure to allow for short database outages without crashing 4CAT (#466)
  • Fixed an issue when running 4CAT outside of Docker on Windows where a POSIX-only dependency could not be loaded (1719d45, 9cec441)
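Retrying failed database queries, as described above, is typically done with a small wrapper around the query call. The following is a generic sketch of that pattern and not 4CAT's database code: `run_with_retries` and its parameters are assumptions, and `ConnectionError` stands in for whatever exception the database driver actually raises.

```python
import time


def run_with_retries(query, attempts: int = 3, delay: float = 1.0,
                     retry_on=(ConnectionError,), sleep=time.sleep):
    """Call `query()` and retry it a few times if it raises a transient error."""
    for attempt in range(1, attempts + 1):
        try:
            return query()
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: let the caller handle the failure
            sleep(delay * attempt)  # wait a little longer after each failure
```

Limiting the number of attempts matters: a short outage is absorbed, while a persistent failure still surfaces as an error instead of blocking the worker indefinitely.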

Full Changelog: v1.49...v1.50

v1.49 A smörgåsbord of fixes and updates

04 Jun 14:34

What's Changed

This update comprises a revamp of the Explorer that makes annotating data via its interface easier and more efficient, and facilitates the use of LLMs to create annotations.

There are also many fixes and updates to item mapping code for datasets imported via Zeeschuimer, as well as a number of other fixes.

⚠️ Please follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT. ⚠️

We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up, or see error messages in the log file when upgrading via the web interface.

Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:

Explorer

  • The Explorer has been rebuilt to allow for more flexible annotation, in particular annotation with the help of large language models, either via the DMI Service Manager, a locally running LM Studio, or the OpenAI API (#428)
  • Sorting items in the Explorer is now more flexible for datasets that support it (i.e. aren't too large for sorting)
  • More data sources now have Explorer interfaces reflecting what items would have looked like in their original context
  • Annotations are now saved separately from the original data files, and added on-the-fly to files and interfaces where necessary, so that they can be manipulated more easily

Updates to interface, data sources, and processors

  • Update Threads data source to use and refer to threads.com as the main domain instead of threads.net (5ae076d)
  • Update Threads data source to properly extract image URLs again (0a8e2cf)
  • Update TikTok data source to put the 'stickers' column right after the 'body' column in CSV files (8347cac)
  • Update Bluesky data source to properly manage login sessions while collecting data (591b170, ecb2147)
  • Update Douyin data source to properly handle a number of fields in the source data (dedac8b)
  • Update the Twitter data source to cope with changes to the Twitter object structure (e2a9511, da02d60, ad69ffd)
  • Update 4chan scrapers to cope with Janitor-removed posts once more (a5eb1ac)
  • Update the PixPlot interface to organise information more usefully when inspecting specific images (ec93e38)
  • Update the OCR processor to more flexibly handle timeouts when doing image-to-text for large datasets (ecfa522)
  • Update the 'CSV import' data source to accept CSV files containing items with no timestamp or an empty timestamp (6320f30, a60032c)
  • Update the 'about this service' field on the 4CAT home page to allow for Markdown formatting (2dc73ac)

Bug fixes

  • Fix an issue with the reading of the restart log file when running on Windows (4bdf59f)
  • Fix an issue with Twitter data mapping when reading hashtags from TCAT-sourced datasets (19d2446)
  • Fix an issue with Twitter data mapping where it would crash on posts without an avatar URL (could occur for users with an NFT avatar) (f4995fd)
  • Fix an issue with Twitter data import where it would crash when encountering tweets with a missing timestamp (dcbcd84)
  • Fix an issue where TCAT data imports could crash when encountering an erroneous item; these are now skipped and logged (ed6b5a8)
  • Fix an issue with the TikTok (URLs) data source where it could run with an empty list of URLs (6948fe0)
  • Fix a crash in the /api/get-standalone-processors/ endpoint (2b0d750)
  • Fix a front-end issue where the page could not be rendered when notifications had a specific format of expiration time (e225d55)
  • Fix a front-end issue when creating a dataset imported from TCAT without selecting a query bin (b5c7658)
  • Fix a front-end issue when trying to load a result page containing datasets with invalid references to parent datasets

Deployment

  • Change Docker default configuration to bind front-end to localhost only rather than listening on the public interface (6a90ac0)

Full Changelog: v1.48...v1.49

v1.48 Spring cleaning 2025

18 Apr 11:25

What's Changed

This small update focuses on bug fixes that could not make it into the previous release, as well as some fixes for issues introduced in v1.47. There are also a few smaller feature updates.

⚠️ Please follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT. ⚠️

We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up, or see error messages in the log file when upgrading via the web interface.

Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:

New data sources and processors

  • Add a data source for RedNote comments, to be imported with Zeeschuimer (352668b)
  • Add a processor to convert GEXF files to CSV files (c6a0326)
  • Add an option to the control panel to configure a rate limit for account requests (15e27a6)

Updates to interface, data sources, and processors

  • Updated the Bluesky data source to hide passwords when entered in the 'Create dataset' form (05d13f4)
  • Updated the 'Histogram' processor to allow plotting over intervals other than months (c0ab728)
  • Updated the web interface's logging code to also log HTTP errors (638b552)
  • Updated internal logic to speed up dataset page loading (#484)
  • Update the co-tag network processor to recognise more columns as containing tags (1cf0120)
  • Update the 'Top hashtags' processor to allow choosing which column to read tags from (7a8c446)
  • Updated the 'Download TikTok videos' processor to be compatible with user-uploaded datasets (db302b9)
  • Update the 'Video hasher' processor with an option to generate a CSV file with results (9bbf304)
  • Updated the data source information page with an example of what a dataset for that data source might look like, if available (#485)

Bug fixes

  • Fix an issue with processor presets where datasets would not always be assigned to the correct owner (47b8c67)
  • Fix an issue with the RedNote, Threads, and Pinterest data sources that made datasets lack a 'thread_id' column (536a7e1, 1af775d, 2eccc09)
  • Fix an issue with the Bluesky data source when using it to create pseudonymised datasets (7e1f7fb)
  • Fix an issue with the RedNote data source when parsing items with no 'video media' (cb74df6)
  • Fix an issue with the RedNote data source when parsing items with nested images (7997d39)
  • Fix an issue with the LinkedIn data source when parsing items with no author description (adb174a)
  • Fix an issue with the Threads data source where it would not always recognise embedded links (e4361cb)
  • Fix an issue with the Truth.social data source where it would not map items correctly (723719d)
  • Fix an issue with the PixPlot processor that would make it crash when items had no associated timestamp (207e361)
  • Fix an issue with the Image Wall with Labels processor that would make it crash when encountering invalid images (3282bae)
  • Fix an issue with the 'Nuke dataset' button that prevented it from working (2510cbd)
  • Fix an issue with the web interface where occasionally an 'undefined' error would show when trying to start a processor (5bab21d)
  • Fix an issue where Zeeschuimer-based data sources would not show up in the filter on the datasets overview page (a727a19)
  • Fix an issue with the web interface where the page would not load correctly after the first run when trying to 'phone home' (dd5eb01)
  • Fix an issue with the restart log in the control panel where it would emit some log messages twice (8e02449)
  • Fix an issue with the control panel where it would in some situations not display the correct checked out git branch (30af6a9)

Full Changelog: v1.47...v1.48

v1.47 Spring is in the air, 2025 edition

18 Feb 16:39

This update introduces three new data sources. Two are for data imported via Zeeschuimer, from Pinterest and RedNote/Xiaohongshu. A third allows for direct data capture from Bluesky, if provided with a Bluesky login.

There are also several new processors, focused on image analysis as well as using LLMs. In the latter category, a new processor prompts the OpenAI API to generate text based on a dataset's content. This enables, for example, LLM-based coding or categorisation of a dataset collected with 4CAT.

Additionally, this release includes many bug fixes to processors, data sources and the 4CAT web interface.

⚠️ Please follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT. ⚠️

We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up, or see error messages in the log file when upgrading via the web interface.

Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:

New data sources and processors

  • Added data source: Bluesky, allowing for the capture of Bluesky posts for a given query - requires a Bluesky login (115a3c1)
  • Added data source: Pinterest, for importing data collected from the Pinterest website with Zeeschuimer (#478)
  • Added data source: Xiaohongshu/RedNote, for importing data collected from the RedNote website with Zeeschuimer (34b8409)
  • Added processor: Deduplicate images, filtering an image dataset for duplicates using a range of comparison methods (aad7d57)
  • Added processor: Bipartite image-item network, which can be used with e.g. Gephi's "Image Preview" plugin to create visual networks (f98addc)
  • Added processor: Vectorise by category, allowing for vectors of tokens grouped by some column from the parent dataset (aeb01f7)
  • Added processor: OpenAI prompting, to interface with the OpenAI API and generate text based on the combination of the prompt and a value from the parent dataset for each item. Requires an OpenAI API key. (c405213)

Updates to interface, data sources, and processors

  • Updated the 'import CSV' data source to better handle files whose CSV dialect cannot be detected automatically (46b2805)
  • Updated the 'Media upload' data source to warn a user when trying to upload SVG files (which most processors will not handle) (d119225, 2987fd4)
  • Updated TikTok dataset import to include is_sensitive and is_photosensitive columns in the CSV mapping (2f42113)
  • Updated TikTok image downloader processor to allow downloading author avatars (6881cba)
  • Updated co-tag network processor to allow ignoring certain tags (e53b73f)
  • Updated the video downloader processor to better handle download errors and rate limits (d1d9347, d4c43a7)
  • Updated the 'Count values' and 'Thread metadata' processors to better report their progress while processing large datasets (59a1546)
  • Update the 4CAT back-end to log a message when a dataset cannot be deleted due to file permission issues (638413a)
  • Update the Instagram imported data source to no longer consider a lack of geo-tags 'missing data' (3c62f37)
  • Update the Instagram imported data source with a new column 'likes_hidden' that indicates if the number of likes is hidden by the post author; the 'num_likes' column will be empty if this is the case (79cb297)
  • Update the 'Image wall' processor to use the 'fit height' sizing option by default, instead of 'square' (a43c9aa)
  • Update the dataset status message after importing data from Zeeschuimer to provide clearer information about data fields missing from the imported file (711c8b4)
  • Update the front-end to hide some processors from the list for a dataset if they are technically compatible but do not make sense to run in the given context (#472)
  • Update datasets to keep track of when they finish being created; existing datasets take their 'finished at' date from the dataset log's last update (#462)
  • Update the default 4CAT configuration to enable new data sources (4376b33)
  • Update Twitter-related processors and data sources to reflect the platform's name change to X (1871019)
  • Update Bluesky widget on 'About' page to show smaller link previews (da8328e)
  • Update interface footer to only show the 4CAT version when logged in (0792ef4)
  • Update the look of CSV preview of datasets to be more readable and indicate missing data (cb2ef69, 8da18b3, dd2ab72)
  • Update the dataset overview page to now show empty/unfinished datasets by default (8261b25)
  • Update the list of available processors when creating a follow-up dataset to always show the processor description (68db315)
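
When analysing imported Instagram data, the 'likes_hidden' and 'num_likes' columns mentioned above interact: an empty 'num_likes' value is best read as unknown rather than zero. A minimal sketch, assuming a CSV export containing those two columns. The helper name and the string values accepted as "hidden" are assumptions for illustration, so check them against an actual export.

```python
import csv
import io


def parse_num_likes(row):
    """Return the like count as an int, or None when it is hidden or empty.

    Hypothetical helper, not part of 4CAT. It assumes 'likes_hidden' is
    exported as a yes/true/1-style string, which is a guess.
    """
    hidden = str(row.get("likes_hidden", "")).strip().lower() in {"yes", "true", "1"}
    raw = (row.get("num_likes") or "").strip()
    if hidden or not raw:
        return None  # unknown, not zero
    return int(raw)


# Made-up sample rows, for illustration only
sample_csv = "id,num_likes,likes_hidden\na,12,no\nb,,yes\n"
rows = list(csv.DictReader(io.StringIO(sample_csv)))
likes = [parse_num_likes(row) for row in rows]
```

Keeping hidden counts as `None` keeps them out of averages and sums, where a zero would skew the result.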

Docker-related changes

  • Updated the Docker version of 4CAT to use Python 3.11 (2600e55)

Removals and deprecations

  • Removed the 'FAQ' page from the web interface (d5c873a)
  • Removed the 'Convert to Excel-compatible CSV file' processor - use Excel's CSV import wizard instead (6367500)

Bug fixes

  • Fix a crash when importing NDJSON files with invalid entries; 4CAT will now skip the item and warn about it instead (6aa7177)
  • Fix an issue where data sources that could be imported via Zeeschuimer would show as available even when disabled (e09e875)
  • Fix an issue with the Telegram image downloader processor that would stall when hitting a rate limit (5d5a0e3)
  • Fix an issue with the Telegram image downloader where it would crash on a 'bad request' response (3df74c9)
  • Fix an issue with the Telegram image downloader where it could end up in an infinite loop when encountering a deleted image (ac543cc)
  • Fix an issue with image downloader processors where they would attempt to download all available images if that was set to be allowed, even when asking for fewer (99e8fd0, 0638ec2)
  • Fix an issue with the TikTok image downloader processor where it would crash when encountering unexpected errors (b60e8cf)
  • Fix an issue where 4CAT log messages would be logged twice in some cases (ded8d3d)
  • Fix an issue with the video scene detection processor where it would crash when a video in the parent dataset had not been downloaded (9453b76)
  • Fix an issue with the video frame extraction processor where it would crash when no frames could be extracted (176905a)
  • Fix an issue in the TikTok comments data source where comments without information on whether they had been pinned would be skipped (bfe3075)
  • Fix TikTok data import to properly map the post author thumbnail URL (8e660a4)
  • Fix an issue with the word trees processor where it would crash when trying to make word trees of numeric data (5021e85)
  • Fix an issue with the 'group by sentence' setting of the tokeniser where it would crash when choosing certain languages (3f06845)
  • Fix an issue in the video downloading processor where it would crash when the connection broke before the downloading finished (1765e80)
  • Fix an issue when trying to export unfinished or incomplete datasets (a296ff0)
  • Fix an issue when trying to work with tokenised datasets from older 4CAT versions or with duplicate source data (4906887)
  • Fix an issue where importing the same dataset into 4CAT twice would lead to strange side-effects (ffd5c46)
  • Fix an issue where the front-end interface would crash when trying to display datasets made with processors that were removed from 4CAT (817b4ee)
  • Fix the Generate images with BLIP2 processor to better handle images with no metadata (e.g. when uploaded via the 'Media upload' data source) (d69a0c3)
  • Fix an issue with the image classification processors where they would crash when encountering SVG files; these are now skipped instead (033b716, 4912ef4)
  • Fix an issue with the image categorisation processor where it would not properly skip empty categories (9465cc2)
  • Fix an issue with the Clas...
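
The skip-and-warn behaviour described above for NDJSON imports with invalid entries can be sketched as follows. This is an illustration of the general technique only, not 4CAT's actual import code; the function name `read_ndjson_lenient`, the logger name, and the warning text are all hypothetical.

```python
import json
import logging

log = logging.getLogger("ndjson-sketch")  # hypothetical logger name


def read_ndjson_lenient(lines):
    """Yield parsed items from NDJSON lines, skipping invalid entries.

    Hypothetical helper for illustration; not part of 4CAT's API.
    """
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # blank lines carry no item
        try:
            yield json.loads(line)
        except json.JSONDecodeError as e:
            # Warn and move on instead of aborting the whole import
            log.warning("Skipping invalid NDJSON entry on line %d: %s", line_number, e)


# Made-up sample input: the second entry is truncated and therefore invalid
sample = ['{"id": 1}', '{"id": 2', '', '{"id": 3}']
items = list(read_ndjson_lenient(sample))
```

Handling the error per line means one malformed entry costs a single item rather than the whole file.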

v1.46 Autumn Additions

14 Oct 10:51

⚠️ Please follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT. ⚠️

We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up, or see error messages in the log file when upgrading via the web interface.

Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:

  • Added support for extensions, modular additions to 4CAT that can be put in the /extensions/ folder in the 4CAT root (#451)
  • Added a processor to download 4CAT datasets as a Zip file, and updated the 'Import dataset' data source to allow loading these zip files as new datasets (#452)
  • Added a data source for Threads, to allow importing Threads data via Zeeschuimer (a68f5d6)
  • Added a processor for LLM-powered text coding via the DMI Service Manager (693960f)
  • Added an option to the Telegram data source to crawl based on mentions and links in addition to forwarded messages (8f2193c)
  • Added razdel as a tokeniser to the Tokenise processor for tokenising Russian text (0b74569)
  • Added an option to the 'Word trees' processor to allow selecting which column(s) to read text from (e4c0099)
  • Added more stopwords corpora to the Tokeniser and allow using multiple at the same time - by default the one for the chosen text language is used (b9a327a)
  • Added more 'auto-fill' options when importing CSV files (empty values, or the current date and time) (9bd9da5)
  • Added a warning to the 'Media upload' data source when trying to upload too many files at once (ffcb6a4, e4f982b, e304649)
  • Added more indicative dataset status updates when running DMI Service Manager-powered processors (eb76937)
  • Added support for previewing HTML datasets in the web interface (203314e)
  • Added configuration settings to toggle display of Anonymisation controls on the 'Create dataset' page (0945d8c)
  • Added configuration setting to toggle display of the 'you can install 4CAT yourself' message in the login form (cd356f7)
  • Added a feed of the official 4CAT Bluesky account to the 4CAT 'Home' page (3d94b66)
  • Added a delay to the worker that cleans up expired and orphaned datasets to wait 7 days before actually deleting an orphaned dataset (bfaf23b)
  • Fix a crash in the 'Image category wall' processor (ebf39d8)
  • Fix a crash in the 'Google Vision API' processor when running it on an empty dataset (fb09162)
  • Fix a crash in the 'Video hashes' processor when running it on a dataset with no .metadata.json file (d41fa34)
  • Fix a crash in the 'Download images' processor when trying to download images from a malformed URL (579ff64)
  • Fix a crash in the 'Download videos' processor when trying to extract video URLs from a non-text data field (e9b5232)
  • Fix a crash in the 'Hatebase' processor (4ba872b)
  • Fix a rare race condition when running 4CAT via Docker (#396)
  • Fix an issue in the front-end where an incomplete list of available processors was shown in some situations (4323946)
  • Fix an issue in the Telegram data source where it would indicate that the 'app' needs updating to log in (d2a787e, 346150b)
  • Fix an issue in the Telegram data source where crawl depth parameters would not be interpreted correctly (1c0bf5e, #444)
  • Fix an issue in the Telegram data source where some post attributes were not read correctly (2c8c860, 959710a, c67a046)
  • Fixed an issue where the link to a newly created dataset on the 'Create dataset' page would not always work (b542ded)
  • Fixed an issue where configuration tags with no associated users could get deleted (d6064be)
  • Fix an issue in the LinkedIn data source where image URLs would not always be parsed correctly (c27fbbe)
  • Fix an issue in the Douyin data source where stream URLs would not always be parsed correctly (d769be4)
  • Remove Spacy-powered text analysis processors (48c20c2)
  • Remove the Parler data source (ee7f434)
  • Update dependencies (#450, a269f96, d2a787e)
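
The 7-day delay before orphaned datasets are deleted, listed above, amounts to a grace-period check. A minimal sketch of such a check, assuming the delay is counted from the moment a dataset was first flagged as orphaned. That starting point, the function name, and the constant are assumptions for illustration and may differ from how the 4CAT worker tracks this.

```python
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(days=7)  # taken from the release note above


def may_delete_orphan(first_seen_orphaned, now=None):
    """Return True once the grace period has fully elapsed.

    Hypothetical helper; not 4CAT's actual clean-up logic.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - first_seen_orphaned >= GRACE_PERIOD


# Fixed reference time so the example is reproducible
reference = datetime(2024, 10, 14, tzinfo=timezone.utc)
old_enough = may_delete_orphan(reference - timedelta(days=8), now=reference)
too_recent = may_delete_orphan(reference - timedelta(days=6), now=reference)
```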

Full Changelog: v1.45...v1.46

v1.45 Summer 2024 Special Edition

04 Jul 13:26

⚠️ Please follow these instructions for upgrading if you have trouble upgrading to the latest version of 4CAT. ⚠️

We also recommend reading the instructions above if you are running 4CAT via Docker and have in any way deviated from the default set-up, or see error messages in the log file when upgrading via the web interface.

Otherwise, you can upgrade 4CAT via the 'Restart or upgrade' button in the Control Panel. This release of 4CAT incorporates the following fixes and improvements:

  • Added a 'media upload' data source that allows uploading media for processing with various image/sound/video processors (#419)
  • Added a 'Visualise images with text captions' processor that generates an image wall including captions for each image (e7e636b)
  • Updated dependencies for video hash processor (aad94f3)
  • Updated 'Help' link in footer and the page it links to to give better information on how to get help with 4CAT (acf5de0)
  • Updated the in-page preview for datasets to more accurately make hyperlinks clickable (8d4f99b)
  • Updated the Telegram data source to optionally allow one to crawl channels (e8714b6)
  • Updated the 'Count values' processor with an option to differentiate between missing and blank values (f2145bd)
  • Updated the item mapping for X/Twitter data to include URLs for the author profile picture and banner in the CSV output (bcb9140)
  • Fixed a crash in the 'Download images' processor when setting the number of images to download to 0 (e0c55a8)
  • Fixed an issue with upgrading a 4CAT running in a Docker container where pip could not properly run to update Python dependencies (2aaa972)
  • Fixed various bugs with the 'Visualise images by category' processor
  • Fixed a bug in the 4chan data import helper script when processing posts from threads whose OP had been deleted (d67cf44)
  • Fixed a bug where the wrong worker would be used when converting Google Vision or Clarifai output to CSV (fd3ac23)
  • Fixed a bug in the tokeniser where it could crash when selecting 'other' as a language (f4f8e66)
  • Fixed a bug where a job for the orphaned file cleanup worker would not always be properly added to the queue (1b9965d)
  • Fixed a bug in the 'visualise images by category' processor where setting the max images to 0 would not properly remove the image limit (3580fc9)

Full Changelog: v1.44...v1.45