diff --git a/api-reference/comments-reply.mdx b/api-reference/comments-reply.mdx
index d6321d7..37b5f5a 100644
--- a/api-reference/comments-reply.mdx
+++ b/api-reference/comments-reply.mdx
@@ -7,6 +7,8 @@ api: "POST https://api.trynawa.com/v1/comments/{id}/reply"
 
 Generate an AI-powered reply that matches the commenter's language and cultural context. For Arabic comments, replies match the detected dialect (Gulf, Egyptian, Levantine, MSA). For English comments, replies are natural and platform-appropriate. Language is auto-detected unless overridden.
 
+When a comment is written in **Arabizi** (Latin-script Arabic with digit substitutions like `7abibi`, `3ashan`, `9ba7`), NAWA automatically detects the script and replies in the same Arabizi style. The reply mirrors the commenter's dialect register and uses the same Latin-letter and digit conventions. Arabizi mirroring covers Gulf, Egyptian, Levantine, and Iraqi dialects.
+
 
 Cost: **$0.008** per request (8 credits). Semantic cache hits are free (`X-NAWA-Cache: HIT`).
 
@@ -89,3 +91,71 @@ result = nawa.comments.reply(
 | `tone` | string | The tone used for the reply |
 | `original_intent` | string | Detected intent of the original comment |
 | `original_dialect` | string | Detected dialect of the original comment |
+
+## Arabizi script mirroring
+
+When the original comment is written in Arabizi, the reply is generated in the same Latin-script format. NAWA detects the digit-letter conventions (e.g. `7` for ح, `3` for ع, `5` for خ) and the dialect register, then instructs the model to reply conversationally in Arabizi.
+
+### Arabizi example request
+
+
+
+```bash cURL
+curl -X POST https://api.trynawa.com/v1/comments/cmt_xyz789/reply \
+  -H "Authorization: Bearer nawa_test_sk_xxx" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "tone": "friendly",
+    "context": "Cooking channel focused on Gulf recipes"
+  }'
+```
+
+```typescript TypeScript
+const { data, error } = await nawa.comments.reply('cmt_xyz789', {
+  tone: 'friendly',
+  context: 'Cooking channel focused on Gulf recipes'
+})
+```
+
+```python Python
+result = nawa.comments.reply(
+    comment_id="cmt_xyz789",
+    tone="friendly",
+    context="Cooking channel focused on Gulf recipes"
+)
+```
+
+
+
+If the original comment was `"9ba7ooo ya 7abibi el video wayid 7elw"` (Gulf Arabizi), the response looks like this:
+
+```json
+{
+  "success": true,
+  "result": {
+    "comment_id": "cmt_xyz789",
+    "reply_text": "teslam ya 7abibi! wayid yesaadni asma3 chithii, el jay a7san inshallah 🔥",
+    "reply_dialect": "gulf",
+    "tone": "friendly",
+    "original_intent": "praise",
+    "original_dialect": "gulf"
+  },
+  "errors": [],
+  "request_id": "req_rep456abc789"
+}
+```
+
+
+  Arabizi replies are routed to Claude, which handles Latin-script Arabic output. The API shape and pricing are identical to standard Arabic replies.
+
+
+### Supported Arabizi conventions
+
+| Digit | Arabic letter | Example |
+|-------|---------------|---------|
+| `7` | ح (ḥa) | `7abibi` = حبيبي |
+| `3` | ع (ʿain) | `3ashan` = عشان |
+| `5` | خ (kha) | `5alas` = خلاص |
+| `6` | ط (ṭa) | `6ayeb` = طيب |
+| `9` | ص (ṣad) | `9ba7` = صباح |
+| `2` | ء (hamza) | `2ana` = أنا |
diff --git a/api-reference/detect.mdx b/api-reference/detect.mdx
index 2169b2c..26868c7 100644
--- a/api-reference/detect.mdx
+++ b/api-reference/detect.mdx
@@ -7,6 +7,8 @@ api: "POST https://api.trynawa.com/v1/detect"
 
 Detect language, dialect, script, and text direction from any text input. Uses local NAGL modules only -- no external AI call, so responses return in under 100ms.
 
+Arabizi (Latin-script Arabic with number substitutions like `7abibi`, `3ashan`) is also detected. The dialect is identified from vocabulary markers even when the text is written entirely in Latin characters.
+
 
 Cost: **$0.002** per request (2 credits). Semantic cache hits are free (`X-NAWA-Cache: HIT`).
 
@@ -139,6 +141,19 @@ curl -X POST https://api.trynawa.com/v1/detect \
   -H "Authorization: Bearer nawa_test_sk_xxx" \
   -H "Content-Type: application/json" \
   -d '{"text": "awesome محتوى"}'
+```
+
+
+
+  **Input:** "9ba7ooo ya 7abibi el video wayid 7elw"
+  **Result:** language: `ar`, dialect: `gulf`, script: `latin`, direction: `ltr`
+
+  Arabizi text uses only Latin characters and digits, so `script` returns `latin` and `direction` returns `ltr`. The dialect is still detected from the vocabulary markers (`wayid`, `7elw` indicate Gulf).
+```bash
+curl -X POST https://api.trynawa.com/v1/detect \
+  -H "Authorization: Bearer nawa_test_sk_xxx" \
+  -H "Content-Type: application/json" \
+  -d '{"text": "9ba7ooo ya 7abibi el video wayid 7elw"}'
 ```
 
 
diff --git a/changelog.mdx b/changelog.mdx
index 43e9d75..0f28d32 100644
--- a/changelog.mdx
+++ b/changelog.mdx
@@ -4,6 +4,17 @@ description: "NAWA API platform updates and releases"
 rss: true
 ---
 
+
+## Improved Arabizi reply mirroring
+
+Replies to Arabizi (Latin-script Arabic) comments now more reliably mirror the commenter's script. Previously, replies could fall back to structured Arabic-script output. NAWA now prioritizes the Arabizi script-mirroring instruction, producing conversational Latin-script replies that match the commenter's dialect and number-letter conventions (`7`, `3`, `5`, etc.).
+
+- **Reply endpoint** (`/v1/comments/:id/reply`) -- Arabizi input consistently produces Arabizi output across Gulf, Egyptian, Levantine, and Iraqi dialects
+- **No API changes** -- request and response shapes are unchanged. This is a quality improvement to reply generation.
+- [Arabizi script mirroring docs](/api-reference/comments-reply#arabizi-script-mirroring)
+- [Arabizi dialect detection](/guides/arabic-dialect-detection#arabizi-latin-script-arabic)
+
+
 
 ## Intelligence Report API + English support
 
diff --git a/guides/arabic-dialect-detection.mdx b/guides/arabic-dialect-detection.mdx
index 3c63a24..9294dd8 100644
--- a/guides/arabic-dialect-detection.mdx
+++ b/guides/arabic-dialect-detection.mdx
@@ -129,3 +129,30 @@ curl -X POST https://api.trynawa.com/v1/feedback \
 ```
 
 RLHF feedback is incorporated into model fine-tuning cycles, continuously improving accuracy across dialects.
+
+## Arabizi (Latin-script Arabic)
+
+NAWA also detects **Arabizi**, the informal Latin-script writing system common on social media, in which Arabic speakers substitute digits for Arabic letters that have no close Latin equivalent. Common conventions include `7` for ح, `3` for ع, `5` for خ, and `9` for ص.
+
+### How Arabizi detection works
+
+The NAGL pipeline identifies Arabizi by scanning for Latin-letter tokens that contain digit substitutions (`7abibi`, `3ashan`, `9ba7`) and matching them against a built-in Arabizi dictionary. When Arabizi is detected:
+
+1. **Classification endpoints** (`/v1/classify`, `/v1/rubric/classify`) transliterate the text internally before sending it to the LLM, so dialect and intent classification work correctly.
+2. **Reply endpoint** (`/v1/comments/:id/reply`) mirrors the Arabizi script in the response. If a commenter writes in Gulf Arabizi, the generated reply comes back in the same Latin-script format.
+3. **Detect endpoint** (`/v1/detect`) returns the detected language and dialect. Text written entirely in Arabizi returns `script: "latin"`; Arabizi mixed with Arabic-script text returns `script: "mixed"`.
+
+### Supported Arabizi dialects
+
+Arabizi detection covers four dialect groups, each with distinct vocabulary markers:
+
+| Dialect | Example Arabizi | Markers |
+|---------|-----------------|---------|
+| **Gulf** | `shlonak, wayid 7elw, ya7lailak` | `shlon`, `wayid`, `7elw` |
+| **Egyptian** | `ezayak, gamed, 3ashan, ba2a` | `ezayak`, `gamed`, `3ashan` |
+| **Levantine** | `kifak, shu, wen, ktir` | `kifak`, `shu`, `wen` |
+| **Iraqi** | `shaku maku, shlon, aku` | `shaku`, `maku`, `shlon` |
+
+
+  Arabizi is most common in YouTube comments, Twitter replies, and chat-style platforms. If your audience uses these platforms, expect a mix of Arabic-script and Arabizi comments. NAWA handles both transparently with no changes to your API calls.
+
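To make the token scan described under "How Arabizi detection works" concrete, here is a minimal Python sketch of a digit-based Arabizi check. It is not NAWA's NAGL implementation, whose internals these docs do not show: the helper names (`is_arabizi_token`, `arabizi_ratio`, `looks_like_arabizi`) and the 0.2 threshold are hypothetical, and it only uses the six digits from the conventions table, with no dictionary lookup or dialect markers. NAWA performs detection server-side, so nothing like this is required to call the API.

```python
import re

# Digit conventions from the "Supported Arabizi conventions" table.
# Hypothetical client-side helper; only the keys are used by the check,
# the Arabic letters are kept for reference.
ARABIZI_DIGITS = {
    "2": "ء",  # hamza
    "3": "ع",  # ain
    "5": "خ",  # kha
    "6": "ط",  # ta
    "7": "ح",  # ha
    "9": "ص",  # sad
}

# Runs of ASCII letters and digits; spaces, punctuation and Arabic script split tokens.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+")


def is_arabizi_token(token: str) -> bool:
    """True if the token mixes Latin letters with at least one convention digit."""
    has_letter = any(c.isalpha() for c in token)
    has_digit = any(c in ARABIZI_DIGITS for c in token)
    return has_letter and has_digit


def arabizi_ratio(text: str) -> float:
    """Share of Latin tokens that look like Arabizi (0.0 when there are none)."""
    tokens = TOKEN_RE.findall(text)
    if not tokens:
        return 0.0
    return sum(is_arabizi_token(t) for t in tokens) / len(tokens)


def looks_like_arabizi(text: str, threshold: float = 0.2) -> bool:
    """Hypothetical pre-filter; the 0.2 threshold is an arbitrary assumption."""
    return arabizi_ratio(text) >= threshold


if __name__ == "__main__":
    print(looks_like_arabizi("9ba7ooo ya 7abibi el video wayid 7elw"))  # True
    print(looks_like_arabizi("awesome video, posted in 2024"))  # False
```

Pure numbers such as `2024` are not flagged because they contain no letters. Tokens like `shlonak` or `wayid` contain no digits, so this digit-only heuristic misses them; covering those is what the dictionary and vocabulary-marker matching in the guide is for.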