A Scala user-agent string parser based on ua-parser/uap-core. It extracts browser, OS and device information.
To use this library in your own project, add the following dependency to build.sbt:
libraryDependencies += "org.uaparser" %% "uap-scala" % "0.16.0"
Instantiating Parser.default also instantiates secondary classes and reads in YAML files, which is slow. If performance is critical or you handle user agents in real time, do not do this on the critical path for processing requests; create the parser once and reuse it.
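Building the parser once and reusing it can be sketched as follows, under stated assumptions. ExpensiveParser is a hypothetical stand-in for Parser.default (it only counts how often it is built), SharedParser holds a single lazily initialized instance, and LruMemo is a hypothetical bounded LRU cache built on java.util.LinkedHashMap in access order. It illustrates the caching idea only; it is not how uap-scala's CachingParser is implemented.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

// Hypothetical stand-in for an expensive-to-build parser. In uap-scala the costly
// step is Parser.default, which reads the YAML rules; here we only count builds.
final class ExpensiveParser {
  ExpensiveParser.builds += 1
  def parse(ua: String): String = ua.takeWhile(_ != ' ') // placeholder "parsing"
}
object ExpensiveParser {
  var builds = 0
}

// Build once, on first use, and share the instance instead of rebuilding per request.
object SharedParser {
  lazy val parser: ExpensiveParser = new ExpensiveParser
}

// Hypothetical bounded LRU memo: an access-ordered LinkedHashMap that drops its
// least recently used entry once it grows past maxEntries.
final class LruMemo[K, V](maxEntries: Int)(compute: K => V) {
  private val cache: JLinkedHashMap[K, V] = new JLinkedHashMap[K, V](16, 0.75f, true) {
    override protected def removeEldestEntry(eldest: JMap.Entry[K, V]): Boolean =
      this.size() > maxEntries
  }

  def apply(key: K): V = cache.synchronized {
    if (cache.containsKey(key)) cache.get(key) // hit: get also marks the key as most recent
    else {
      val value = compute(key)
      cache.put(key, value) // may evict the least recently used entry
      value
    }
  }

  def entries: Int = cache.synchronized(cache.size())
}
```

In a real service, ExpensiveParser would be replaced by the parser returned from Parser.default, and its parse method wrapped in the memo the same way, so that repeated user-agent strings skip the regex work.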
import org.uaparser.scala.Parser
val ua = "Mozilla/5.0 (iPhone; CPU iPhone OS 5_1_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9B206 Safari/7534.48.3"
val client = Parser.default.parse(ua) // you can also use CachingParser
println(client) // Client(UserAgent(Mobile Safari,Some(5),Some(1),None),OS(iOS,Some(5),Some(1),Some(1),None),Device(iPhone,Some(Apple),Some(iPhone)))
Parsing every field can be costly. If you only need some of them, you can call the individual parsers directly:
import org.uaparser.scala.Parser
val raw = "Mozilla/5.0 (iPhone; CPU iPhone OS 5_1_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9B206 Safari/7534.48.3"
val parser = Parser.default
val os = parser.osParser.parse(raw)
println(os) // OS(iOS,Some(5),Some(1),Some(1),None)
val userAgent = parser.userAgentParser.parse(raw)
println(userAgent) // UserAgent(Mobile Safari,Some(5),Some(1),None)
val device = parser.deviceParser.parse(raw)
println(device) // Device(iPhone,Some(Apple),Some(iPhone))
The code for this repository can be checked out normally. It uses a git submodule to include the files it needs from uap-core, so make sure the core directory is properly checked out and initialized.
Checking out the repo for the first time
git clone --recursive https://github.com/ua-parser/uap-scala.git
If uap-scala was already checked out but core was not properly initialized, run:
cd uap-scala
git submodule update --init --recursive
To build and publish locally for the default Scala (currently 2.13.11):
sbt publishLocal
To cross-build for different Scala versions:
sbt +publishLocal
This project uses the 'regexes.yaml' file from the ua-parser/uap-core repository to
perform user-agent string parsing according to the documented specification.
The file is included as a git submodule in the core directory.
A summary of that specification follows.
This implementation (and others) works by applying three independent ordered rule lists to the same input user‑agent string:
- User agent parser ('user_agent_parsers' definitions): provides the "browser" name and version.
- OS parser ('os_parsers' definitions): provides the operating system name and version.
- Device parser ('device_parsers' definitions): provides the device family and optional brand and model.
Each list is evaluated top‑to‑bottom. The first matching regex wins, and parsing for that list stops immediately.
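The first-match-wins rule can be sketched in a few lines of Scala. Rule and firstMatch are hypothetical names, and a real rule would carry replacement fields rather than a label; the point is that each regex is searched for anywhere in the string (an unanchored find) and that evaluation stops at the first hit, so rule order matters.

```scala
import scala.util.matching.Regex

// Hypothetical rule: a regex plus a label standing in for the rule's replacement fields.
final case class Rule(regex: Regex, label: String)

// Walk the list top to bottom and return the first rule whose regex is found anywhere
// in the input, together with its match. The iterator is lazy, so rules after the
// first hit are never evaluated.
def firstMatch(rules: Seq[Rule], ua: String): Option[(Rule, Regex.Match)] =
  rules.iterator
    .map(rule => rule.regex.findFirstMatchIn(ua).map(m => (rule, m)))
    .collectFirst { case Some(hit) => hit }

// A specific rule placed before a generic one that would also match Firefox strings.
val userAgentRules = Seq(
  Rule("""(Firefox)/(\d+)""".r, "firefox"),
  Rule("""(Mozilla)/(\d+)""".r, "mozilla"))
```

Running firstMatch on the same list in reverse order returns the generic rule instead, which is why specific rules are generally placed before generic ones.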
At a high level, 'regexes.yaml' is a YAML map with top-level keys like:
user_agent_parsers:
os_parsers:
device_parsers:
Each value is a YAML list. Each list item is a small map that always contains a regex and may contain *_replacement
fields.
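One way to model a list item in Scala is a case class with a required regex and optional replacement fields. This is a hypothetical sketch: UserAgentRule and userAgentRule are not part of uap-scala's API, the input is assumed to be the plain Map[String, String] a YAML loader might produce for one item, and the v1_replacement to v3_replacement key names are assumed from uap-core's naming scheme (only family_replacement appears in the user agent example below).

```scala
import scala.util.matching.Regex

// Hypothetical model of one `user_agent_parsers` entry: `regex` is required,
// every *_replacement field is optional.
final case class UserAgentRule(
    regex: Regex,
    familyReplacement: Option[String] = None,
    v1Replacement: Option[String] = None,
    v2Replacement: Option[String] = None,
    v3Replacement: Option[String] = None)

// Build a rule from the key/value pairs of one list item.
// Returns Left if the mandatory `regex` key is missing.
def userAgentRule(entry: Map[String, String]): Either[String, UserAgentRule] =
  entry.get("regex") match {
    case None => Left(s"entry without regex: $entry")
    case Some(pattern) =>
      Right(UserAgentRule(
        regex = pattern.r,
        familyReplacement = entry.get("family_replacement"),
        v1Replacement = entry.get("v1_replacement"),
        v2Replacement = entry.get("v2_replacement"),
        v3Replacement = entry.get("v3_replacement")))
  }
```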
User agent parser example:
user_agent_parsers:
- regex: '(Namoroka|Shiretoko|Minefield)/(\d+)\.(\d+)\.(\d+(?:pre|))'
  family_replacement: 'Firefox ($1)'
OS parser example:
os_parsers:
- regex: 'CFNetwork/.{0,100} Darwin/22\.([0-5])\.\d+'
  os_replacement: 'iOS'
  os_v1_replacement: '16'
  os_v2_replacement: '$1'
Device parser example:
device_parsers:
- regex: '; *(PEDI)_(PLUS)_(W) Build'
  device_replacement: 'Odys $1 $2 $3'
  brand_replacement: 'Odys'
  model_replacement: '$1 $2 $3'
The core idea of the specification is to put capturing groups (...) in each regex to extract parts of the UA string.
Replacement fields can reference those groups as $1, $2, and so on. If you don't supply replacements, fields map by group order.
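Expanding $N placeholders is a small regex substitution. The sketch below assumes single-digit placeholders and that a group which is absent or did not participate in the match expands to an empty string; captured and expand are hypothetical helper names. Regex.quoteReplacement keeps any $ or \ inside a captured value from being read as a group reference.

```scala
import scala.util.matching.Regex

// Capture group n of a match, or None if the regex has fewer groups
// or the group did not participate in the match.
def captured(m: Regex.Match, n: Int): Option[String] =
  if (n <= m.groupCount) Option(m.group(n)) else None

// Expand single-digit $N placeholders in a replacement template using the match's groups.
// Assumption: a missing or non-participating group expands to "".
def expand(template: String, m: Regex.Match): String =
  """\$(\d)""".r.replaceAllIn(template, (p: Regex.Match) => {
    val n = p.group(1).toInt
    Regex.quoteReplacement(captured(m, n).getOrElse(""))
  })

// The user agent rule from the example above.
val firefoxRegex = """(Namoroka|Shiretoko|Minefield)/(\d+)\.(\d+)\.(\d+(?:pre|))""".r
val firefoxMatch = firefoxRegex.findFirstMatchIn("Mozilla/5.0 Namoroka/3.6.0pre").get
val family = expand("Firefox ($1)", firefoxMatch) // "Firefox (Namoroka)"
```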
If a user agent rule matches and it provides no replacements:
- group 1: family
- group 2: major
- group 3: minor
- group 4: patch
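Combining the default mapping with optional replacements gives a per-field rule: use the replacement (after $N expansion) when the rule has one, otherwise the capture group at that position. The sketch below uses hypothetical UaRule and parseWith names; trimming values, treating blank values as undefined, and falling back to "Other" for an empty family are assumptions, not quotes from the specification.

```scala
import scala.util.matching.Regex

// Result shape mirroring the UserAgent(family, major, minor, patch) output shown earlier.
final case class UserAgent(family: String, major: Option[String], minor: Option[String], patch: Option[String])

// Hypothetical rule model: a regex plus optional replacements for the four fields.
final case class UaRule(
    regex: Regex,
    familyReplacement: Option[String] = None,
    v1Replacement: Option[String] = None,
    v2Replacement: Option[String] = None,
    v3Replacement: Option[String] = None)

// Capture group n, or None if the regex has fewer groups or the group did not match.
def captured(m: Regex.Match, n: Int): Option[String] =
  if (n <= m.groupCount) Option(m.group(n)) else None

// Expand single-digit $N placeholders; missing groups become "".
def expand(template: String, m: Regex.Match): String =
  """\$(\d)""".r.replaceAllIn(template, (p: Regex.Match) => {
    Regex.quoteReplacement(captured(m, p.group(1).toInt).getOrElse(""))
  })

// Replacement if present, otherwise group n; trimmed, and blank counts as undefined.
def resolve(m: Regex.Match, replacement: Option[String], n: Int): Option[String] =
  replacement.map(expand(_, m)).orElse(captured(m, n)).map(_.trim).filter(_.nonEmpty)

// Groups 1-4 map to family, major, minor and patch unless a replacement overrides them.
def parseWith(rule: UaRule, ua: String): Option[UserAgent] =
  rule.regex.findFirstMatchIn(ua).map { m =>
    UserAgent(
      family = resolve(m, rule.familyReplacement, 1).getOrElse("Other"),
      major = resolve(m, rule.v1Replacement, 2),
      minor = resolve(m, rule.v2Replacement, 3),
      patch = resolve(m, rule.v3Replacement, 4))
  }
```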
Similarly, OS rules map:
- group 1: family
- group 2: major
- group 3: minor
- group 4: patch
- group 5: patchMinor
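The same approach extends to the five OS fields. With the CFNetwork rule from the OS example above, os_replacement and os_v1_replacement are literals, os_v2_replacement pulls in group 1, and patch and patchMinor fall back to groups 4 and 5, which this regex does not have. OsRule, parseOs, and the v3Replacement/v4Replacement fields are hypothetical; the replacement keys for patch and patchMinor are not shown in the example.

```scala
import scala.util.matching.Regex

final case class OS(family: String, major: Option[String], minor: Option[String], patch: Option[String], patchMinor: Option[String])

// Hypothetical rule model; v3Replacement/v4Replacement stand in for patch/patchMinor replacements.
final case class OsRule(
    regex: Regex,
    osReplacement: Option[String] = None,
    v1Replacement: Option[String] = None,
    v2Replacement: Option[String] = None,
    v3Replacement: Option[String] = None,
    v4Replacement: Option[String] = None)

def captured(m: Regex.Match, n: Int): Option[String] =
  if (n <= m.groupCount) Option(m.group(n)) else None

def expand(template: String, m: Regex.Match): String =
  """\$(\d)""".r.replaceAllIn(template, (p: Regex.Match) => {
    Regex.quoteReplacement(captured(m, p.group(1).toInt).getOrElse(""))
  })

// Replacement (with $N expanded) if present, else capture group n; blank counts as undefined.
def resolve(m: Regex.Match, replacement: Option[String], n: Int): Option[String] =
  replacement.map(expand(_, m)).orElse(captured(m, n)).map(_.trim).filter(_.nonEmpty)

// Groups 1-5 map to family, major, minor, patch and patchMinor by default.
def parseOs(rule: OsRule, ua: String): Option[OS] =
  rule.regex.findFirstMatchIn(ua).map { m =>
    OS(
      family = resolve(m, rule.osReplacement, 1).getOrElse("Other"),
      major = resolve(m, rule.v1Replacement, 2),
      minor = resolve(m, rule.v2Replacement, 3),
      patch = resolve(m, rule.v3Replacement, 4),
      patchMinor = resolve(m, rule.v4Replacement, 5))
  }

// The CFNetwork rule from the OS example above.
val cfNetworkRule = OsRule(
  """CFNetwork/.{0,100} Darwin/22\.([0-5])\.\d+""".r,
  osReplacement = Some("iOS"),
  v1Replacement = Some("16"),
  v2Replacement = Some("$1"))
```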
Devices are slightly different: if no replacements are given, the first capture group defines both the device family and the model. Depending on the rule and the implementation, brand and model may be left undefined.
If no regex matches, the value for family shall be "Other", and brand and model shall not be defined. Leading and trailing whitespace shall be trimmed from the result.
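For devices, the sketch below follows the description above: family and model default to the first capture group, the first matching rule wins, whitespace is trimmed from each value, and when nothing matches the result is family "Other" with no brand or model. DeviceRule, parseDevice, and the choice to leave brand undefined when a rule has no brand_replacement are assumptions for illustration, not uap-scala's implementation.

```scala
import scala.util.matching.Regex

final case class Device(family: String, brand: Option[String] = None, model: Option[String] = None)

// Hypothetical rule model with the three device replacement fields.
final case class DeviceRule(
    regex: Regex,
    deviceReplacement: Option[String] = None,
    brandReplacement: Option[String] = None,
    modelReplacement: Option[String] = None)

def captured(m: Regex.Match, n: Int): Option[String] =
  if (n <= m.groupCount) Option(m.group(n)) else None

def expand(template: String, m: Regex.Match): String =
  """\$(\d)""".r.replaceAllIn(template, (p: Regex.Match) => {
    Regex.quoteReplacement(captured(m, p.group(1).toInt).getOrElse(""))
  })

// Trim leading/trailing whitespace; an empty result counts as undefined (assumption).
def clean(s: String): Option[String] = Option(s.trim).filter(_.nonEmpty)

def fromRule(rule: DeviceRule, m: Regex.Match): Device = {
  // Family and model default to group 1; brand has no default here (assumption).
  val family = rule.deviceReplacement.map(expand(_, m)).orElse(captured(m, 1)).flatMap(clean)
  val brand = rule.brandReplacement.map(expand(_, m)).flatMap(clean)
  val model = rule.modelReplacement.map(expand(_, m)).orElse(captured(m, 1)).flatMap(clean)
  Device(family.getOrElse("Other"), brand, model)
}

// First matching rule wins; if none matches, family is "Other" and brand/model are undefined.
def parseDevice(rules: Seq[DeviceRule], ua: String): Device =
  rules.iterator
    .flatMap(r => r.regex.findFirstMatchIn(ua).map(m => fromRule(r, m)))
    .nextOption()
    .getOrElse(Device("Other"))
```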
- Piotr Adamski (@mcveat) (Author. Based on the Java implementation by Steve Jiang @sjiang and using agent data from BrowserScope)
- Ahmed Sobhi (@humanzz)
- Travis Brown (@travisbrown)
- Nguyen Hong Phuc (@phuc89)