Plugin information
Version | 1.0.1 |
---|---|
Author | Dataiku (Niklas MUENNIGHOFF, Mehdi HAMOUMI) |
Released | 2021-08 |
Last updated | 2023-05 |
License | Apache Software License |
Source code | GitHub |
Reporting issues | GitHub |
This plugin lets you translate text to another language using the DeepL Engine.
Check DeepL’s pricing page for information about their pricing tiers.
How to set up
If you are a Dataiku admin user, follow these configuration steps right after you install the plugin. If you are not an admin, you can forward this to your admin and scroll down to the How to use section.
1. Create a DeepL account & get your DeepL API Key
To use the DeepL Translation API, you need to set up an account on their website.
Once your account is set up, DeepL will provide you with an API key. You will need this key to use DeepL in Dataiku DSS.
2. Create an API configuration preset – in Dataiku DSS
In Dataiku DSS, navigate to the Plugin page > Settings > API configuration and create your first preset.
3. Configure the preset – in Dataiku DSS
- Fill in the AUTHENTICATION settings
  - Copy-paste your DeepL API key into the corresponding field.
  - Depending on your subscription, select either the Free or Premium URL to which requests will be routed. Note that you cannot select the Free URL if you have a Premium subscription.
- (Optional) Review the PARALLELIZATION and ATTEMPTS settings
  - The default Concurrency parameter means that 4 threads will call the API in parallel.
    - We do not recommend changing this default parameter unless your server has a much higher number of CPU cores.
  - The Maximum Attempts parameter means that a failed API request will be reattempted (default: 3 attempts).
    - A failed request is retried regardless of the cause, whether it is an access error with your DeepL account or a throttling exception caused by too many concurrent requests.
  - The Waiting Interval parameter specifies how long to wait before retrying a failed attempt (default: 5 seconds).
    - If you hit throttling exceptions caused by too many requests, increasing the Waiting Interval may help. However, we recommend decreasing the Concurrency setting first.
- Set the Permissions of your preset
  - You can declare yourself as the Owner of this preset and make it available to everybody, or to a specific group of users.
  - Any user belonging to one of those groups on your Dataiku DSS instance will be able to see and use this preset.
Voilà! Your preset is ready to be used.
Configuring additional presets can be useful to segment plugin usage by user group. For instance, you can create a “Default” preset for everyone and a “High performance” one for your Marketing team, with separate billing for each team.
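To make the PARALLELIZATION and ATTEMPTS settings more concrete, here is a minimal Python sketch of how Concurrency, Maximum Attempts and Waiting Interval could interact. It is an illustration only, not the plugin's actual implementation: the function names `call_with_retries` and `translate_all` are hypothetical, and the sketch assumes that Maximum Attempts counts the total number of tries, including the first one.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_with_retries(func, arg, max_attempts=3, wait_seconds=5.0, sleep=time.sleep):
    """Call func(arg), retrying on any exception (hypothetical helper).

    Assumption: max_attempts is the total number of tries, first one included.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func(arg)
        except Exception:
            # Every failure is retried, whatever its cause (access error,
            # throttling, ...), until the attempts are exhausted.
            if attempt == max_attempts:
                raise
            sleep(wait_seconds)  # Waiting Interval between two attempts


def translate_all(texts, translate_one, concurrency=4, max_attempts=3,
                  wait_seconds=5.0, sleep=time.sleep):
    """Translate texts with `concurrency` worker threads, keeping input order."""
    def task(text):
        return call_with_retries(translate_one, text, max_attempts, wait_seconds, sleep)

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(task, texts))
```

Here `translate_one` stands for any function that sends one text to the API. With this structure, lowering `concurrency` reduces the number of simultaneous requests, which is why it is the first setting to adjust when throttling occurs.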
How to use
Let’s assume that you have installed this plugin and that you have a Dataiku DSS project with a dataset containing a column of text to translate.
Input
- Dataset with a text column to translate
DeepL Translation recipe
To create your first recipe, navigate to the Flow, click on the + RECIPE button and access the Natural Language Processing menu. If your dataset is selected, you can directly find the plugin in the right panel.
Settings
- Review INPUT parameters
  - The Text column parameter is the column in the input dataset that you wish to translate.
  - The Source language parameter is the original language of the Text column. If you would like the translation API to infer the original language, select the Auto-detect option.
  - The Target language parameter is the language you would like to translate to.
  - The Split Sentences parameter sets whether the translation engine should first split the input into sentences. This is enabled by default. For applications that send one sentence per row, it is advisable to set it to Splits: None in order to prevent the engine from splitting the sentence unintentionally.
  - The Preserve Formatting parameter sets whether the translation engine should respect the formatting of the input text, even if it would usually correct some aspects.
  - The Formality parameter can be used to increase or decrease how formal the output translation should be. It only shows up if formality is available for the chosen target language.
- Review CONFIGURATION parameters
  - The Preset parameter is automatically filled with the default preset made available by your Dataiku admin. You may select another one if multiple presets have been created.
  - The Fail on error parameter lets you choose whether the recipe should abort execution if any issues are raised. If unchecked, any errors will be logged in two additional columns in the output.
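As a rough illustration of what these settings correspond to, the sketch below builds the pieces of a single request to DeepL's REST API. It is not the plugin's code. The helper `build_request` is hypothetical, and the endpoint URLs, the `DeepL-Auth-Key` header and the parameter names (`source_lang`, `target_lang`, `split_sentences`, `preserve_formatting`, `formality`) reflect DeepL's public API documentation as understood at the time of writing; check DeepL's documentation before relying on them.

```python
# Assumed endpoints for the Free and Premium URLs of the preset.
FREE_URL = "https://api-free.deepl.com/v2/translate"
PREMIUM_URL = "https://api.deepl.com/v2/translate"


def build_request(api_key, text, target_lang, source_lang=None, premium=False,
                  split_sentences="1", preserve_formatting=False, formality=None):
    """Return (url, headers, payload) for one translation request (hypothetical helper)."""
    url = PREMIUM_URL if premium else FREE_URL
    headers = {"Authorization": f"DeepL-Auth-Key {api_key}"}
    payload = {
        "text": text,
        "target_lang": target_lang,
        # "1" splits the input into sentences, "0" disables splitting
        # (the "Splits: None" choice in the recipe).
        "split_sentences": split_sentences,
        "preserve_formatting": "1" if preserve_formatting else "0",
    }
    if source_lang is not None:
        # Leaving source_lang out asks the engine to auto-detect the language.
        payload["source_lang"] = source_lang
    if formality is not None:
        # Only meaningful for target languages that support formality.
        payload["formality"] = formality
    return url, headers, payload
```

For example, `build_request(key, "Hello", "DE")` targets the Free URL and omits `source_lang`, which mirrors selecting Auto-detect in the recipe.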
Output
- Dataset with text translated to another language
The columns of the output dataset are as follows:
- [Input dataset columns]
  - All columns from the input dataset will be preserved
- [selected column]_language
  - The detected language of the selected column
  - Only present if Auto-detect has been selected as the source language
- [selected column]_{target iso code}
  - The selected column in its translated version
- translation_api_response
  - Raw API response in JSON form
- translation_api_error_message
  - The error message in case an error occurred
  - Only present if Fail on error is not selected during configuration
- translation_api_error_type
  - The error type in case an error occurred
  - Only present if Fail on error is not selected during configuration
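The rules above can be summarized in a short Python sketch that derives the expected output column names from the recipe settings. The function `output_columns` is hypothetical and is not part of the plugin. The column order and the exact casing of the target ISO code are assumptions based on the list above.

```python
def output_columns(input_columns, text_column, target_iso, auto_detect, fail_on_error):
    """Return the expected output column names (hypothetical helper)."""
    columns = list(input_columns)  # all input columns are preserved
    if auto_detect:
        # Detected language, only when Auto-detect is the source language.
        columns.append(f"{text_column}_language")
    columns.append(f"{text_column}_{target_iso}")  # translated text
    columns.append("translation_api_response")     # raw JSON response
    if not fail_on_error:
        # Errors are logged in two extra columns instead of aborting the recipe.
        columns.append("translation_api_error_message")
        columns.append("translation_api_error_type")
    return columns
```

For instance, translating a `review` column to German with Auto-detect enabled and Fail on error unchecked would add `review_language`, `review_de`, `translation_api_response` and the two error columns, assuming a lowercase ISO code.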
Happy natural language processing!