4 changes: 3 additions & 1 deletion .gitlab-ci.yml
@@ -29,7 +29,6 @@ build_docker_image:
after_script:
- docker logout ${CI_REGISTRY}


deploy_primary_dev:
variables:
ENVIRONMENT_NAME: primary_dev
@@ -40,6 +39,7 @@ deploy_primary_dev:
only:
- dev
- main
- chore/working-es-search-and-faceting
when: manual
extends: .kube_deploy_script

@@ -53,6 +53,7 @@ deploy_primary_prod:
only:
- dev
- main
- chore/working-es-search-and-faceting
when: manual
extends: .kube_deploy_script

@@ -66,6 +67,7 @@ deploy_fallback_prod:
only:
- dev
- main
- chore/working-es-search-and-faceting
when: manual
extends: .kube_deploy_script

6 changes: 4 additions & 2 deletions Dockerfile
@@ -2,10 +2,12 @@ FROM gradle:8.14-jdk24 AS builder

WORKDIR /app

# Copy module build files only, to warm up and pre-resolve dependencies
# Copy minimal Gradle build files first, to warm up and pre-resolve dependencies
COPY settings.gradle.kts .
COPY build.gradle.kts .
COPY proto/build.gradle.kts proto/
COPY server/build.gradle.kts server/
RUN gradle --no-daemon clean build -x test || return 0
RUN gradle --no-daemon dependencies || true

COPY . .
RUN gradle :server:bootJar --no-daemon -x test
38 changes: 20 additions & 18 deletions docker-compose.yaml
@@ -1,19 +1,26 @@
services:
# biosamples-search:
# build: .
# image: biosamples-search:latest
# ports:
# - "8080:8080"
# - "9090:9090"
# environment:
# - spring.elasticsearch.username=elastic
# - spring.elasticsearch.password=elastic
# - spring.elasticsearch.uris=http://elastic:9200
# links:
# - elastic
biosamples-search:
build: .
image: biosamples-search:latest
ports:
- "8080:8080"
- "9090:9090"
depends_on:
elastic:
condition: service_healthy
rabbitmq:
condition: service_started
environment:
- spring.elasticsearch.username=elastic
- spring.elasticsearch.password=elastic
- spring.elasticsearch.uris=http://elastic:9200
- spring.rabbitmq.host=rabbitmq
setup:
image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
user: "0"
depends_on:
elastic:
condition: service_healthy
command: >
bash -c '
if [ x${ELASTIC_PASSWORD} == x ]; then
@@ -29,11 +36,6 @@ services:
until curl -s -X POST -u "elastic:${ELASTIC_PASSWORD}" -H "Content-Type: application/json" http://elastic:9200/_security/user/kibana_system/_password -d "{\"password\":\"${KIBANA_PASSWORD}\"}" | grep -q "^{}"; do sleep 10; done;
echo "All done!";
'
healthcheck:
test: [ "CMD-SHELL", "[ -f config/certs/es01/es01.crt ]" ]
interval: 1s
timeout: 5s
retries: 120
elastic:
image: docker.elastic.co/elasticsearch/elasticsearch:${STACK_VERSION}
volumes:
@@ -101,4 +103,4 @@ volumes:
kibana_data:
driver: local
rabbitmq_data:
driver: local
driver: local
49 changes: 49 additions & 0 deletions docs/cursor/README_search_service_query.md
@@ -0,0 +1,49 @@
# Querying the search service directly

The **biosamples-search** service exposes a REST API for development and testing; the core app talks to it over gRPC instead. Use the REST API to see what the search service returns without going through the core app.

- **Search service** usually runs on **port 8080** (or set `SEARCH_PORT`).
- **Core app** (biosamples-v4) runs on **port 8081** and calls the search service via gRPC.

## Endpoint

- **POST** `http://localhost:8080/search`
Body: JSON `SearchQuery` (see examples below).

## Quick test (PowerShell)

From the repo root:

```powershell
# Exact accession
Invoke-RestMethod -Uri "http://localhost:8080/search" -Method Post -ContentType "application/json" -Body '{"filters":[{"type":"pub","webinId":""},{"type":"acc","accession":"SAMEA26"}],"page":0,"size":20}'

# Wildcard accession (asterisk in JSON is not URL-encoded)
Invoke-RestMethod -Uri "http://localhost:8080/search" -Method Post -ContentType "application/json" -Body '{"filters":[{"type":"pub","webinId":""},{"type":"acc","accession":"SAME*"}],"page":0,"size":20}'
```

## Scripts

- **Bash:** `./docs/cursor/query_search_service.sh` (from project root; needs `jq` for pretty output).
- **PowerShell:** `.\docs\cursor\query_search_service.ps1`

Override port: `$env:SEARCH_PORT=9090; .\docs\cursor\query_search_service.ps1`

## Request JSON format

Filters use a `type` discriminator and type-specific fields:

| type | Example |
|------|--------|
| `pub` | `{"type":"pub","webinId":""}` |
| `acc` | `{"type":"acc","accession":"SAME*"}` or `{"type":"acc","accession":"SAMEA26"}` |

Full query: `{"text":null,"filters":[...],"facets":null,"page":0,"size":20,"sort":null,"searchAfter":null}`

Pre-made bodies: `search_service_query_acc_exact.json`, `search_service_query_acc_wildcard.json`.
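
As an illustration of the format above, the request body can also be assembled programmatically. This is a minimal sketch: the helper names (`public_filter`, `accession_filter`, `build_search_query`) are hypothetical and not part of the service, and the field names are copied from the "Full query" line above rather than from the service's schema.

```python
import json
from typing import Any, Dict, List, Optional


def public_filter() -> Dict[str, Any]:
    # Mirrors the `pub` row of the table above (empty webinId).
    return {"type": "pub", "webinId": ""}


def accession_filter(accession: str) -> Dict[str, Any]:
    # `accession` may be exact ("SAMEA26") or contain a wildcard ("SAME*").
    return {"type": "acc", "accession": accession}


def build_search_query(
    filters: List[Dict[str, Any]],
    page: int = 0,
    size: int = 20,
    text: Optional[str] = None,
) -> Dict[str, Any]:
    # Field set taken from the "Full query" example; unused fields stay null.
    return {
        "text": text,
        "filters": list(filters),
        "facets": None,
        "page": page,
        "size": size,
        "sort": None,
        "searchAfter": None,
    }


body = build_search_query([public_filter(), accession_filter("SAME*")])
payload = json.dumps(body)  # string to send as the POST body
```

The resulting `payload` can be sent to `POST http://localhost:8080/search` with `Content-Type: application/json`, in the same way as the PowerShell examples.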

## How to interpret results

- If **exact** returns hits and **wildcard** returns hits → search service and ES are fine; the core app or URL handling is likely dropping `*` for `filter=acc:SAME*`.
- If **exact** returns hits and **wildcard** returns 0 → issue is inside the search service (e.g. wildcard query building).
- If both return 0 → check that the search service is pointing at the same ES index and that data exists (e.g. public, not suppressed).
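
The three cases above can be written down as a small decision function. This is only a sketch of the reasoning in the list: the function name `diagnose` is hypothetical, the hit counts are assumed to have been read from the two responses by hand or by a script, and the fourth case (exact returns 0, wildcard returns hits) is not covered by the list, so it is reported as such.

```python
def diagnose(exact_hits: int, wildcard_hits: int) -> str:
    """Map the hit counts of the exact and wildcard queries to a likely cause."""
    if exact_hits < 0 or wildcard_hits < 0:
        raise ValueError("hit counts must be non-negative")
    if exact_hits > 0 and wildcard_hits > 0:
        # Search service and ES behave; suspect the core app or URL handling.
        return "core app or URL handling is likely dropping '*'"
    if exact_hits > 0 and wildcard_hits == 0:
        # Only the wildcard path fails; suspect wildcard query building.
        return "issue is inside the search service"
    if exact_hits == 0 and wildcard_hits == 0:
        # Nothing matches at all; suspect the index or the data.
        return "check the ES index and that data exists"
    # exact == 0 and wildcard > 0 is not described in the list above.
    return "not covered by the cases above"
```

For example, `diagnose(3, 0)` points at the search service, which matches the second bullet.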
116 changes: 116 additions & 0 deletions docs/cursor/check_characteristics_field.md
@@ -0,0 +1,116 @@
# Checking for Documents Without Characteristics Field

## Quick Check Commands

### 1. Count documents WITHOUT characteristics field

```bash
curl -u elastic:elastic -X POST "http://localhost:9200/samples/_search?pretty" \
-H 'Content-Type: application/json' \
-d '{
"size": 0,
"query": {
"bool": {
"must_not": [
{
"exists": {
"field": "characteristics"
}
}
]
}
},
"aggs": {
"total_without_characteristics": {
"value_count": {
"field": "_id"
}
}
}
}'
```

### 2. Count documents WITH characteristics field

```bash
curl -u elastic:elastic -X POST "http://localhost:9200/samples/_search?pretty" \
-H 'Content-Type: application/json' \
-d '{
"size": 0,
"query": {
"exists": {
"field": "characteristics"
}
},
"aggs": {
"total_with_characteristics": {
"value_count": {
"field": "_id"
}
}
}
}'
```

### 3. Get sample documents without characteristics (first 5)

```bash
curl -u elastic:elastic -X POST "http://localhost:9200/samples/_search?pretty" \
-H 'Content-Type: application/json' \
-d '{
"size": 5,
"_source": ["accession", "name"],
"query": {
"bool": {
"must_not": [
{
"exists": {
"field": "characteristics"
}
}
]
}
}
}'
```

### 4. Check if characteristics field is empty array

```bash
curl -u elastic:elastic -X POST "http://localhost:9200/samples/_search?pretty" \
-H 'Content-Type: application/json' \
-d '{
"size": 5,
"_source": ["accession", "characteristics"],
"query": {
"script": {
"script": {
"source": "doc[\"characteristics\"].size() == 0",
"lang": "painless"
}
}
}
}'
```

### 5. Check mapping for characteristics field

```bash
curl -u elastic:elastic -X GET "http://localhost:9200/samples/_mapping/field/characteristics?pretty"
```

## Understanding the Results

- **Documents without characteristics**: These can cause problems for nested queries if the query does not handle missing fields properly.
- **Empty characteristics arrays**: Some documents may have `characteristics: []`, which is not the same as the field being absent from the document.
- **Nested query behavior**: Nested queries in `must_not` should handle missing fields, but edge cases may exist.

## Potential Issues

If you find documents without `characteristics`:
1. The nested query in `must_not` should still work (it will match documents without the field)
2. However, if there's a query syntax issue, it might fail
3. Check Elasticsearch version compatibility with nested queries in `must_not`
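
One way to sanity-check the numbers from the commands above is to compare them against the index's total document count. The sketch below assumes you have collected three integers (total, with the field, without the field) from the responses. Both function names are hypothetical. `missing_field_count_body` builds the same `must_not` / `exists` query used in command 1, in a form that could be posted to the `_count` endpoint, which accepts a `query` object.

```python
from typing import Any, Dict


def missing_field_count_body(field: str) -> Dict[str, Any]:
    """Query body matching documents for which `field` does not exist."""
    return {"query": {"bool": {"must_not": [{"exists": {"field": field}}]}}}


def summarize_counts(total: int, with_field: int, without_field: int) -> Dict[str, Any]:
    """Check that the two partial counts add up to the total."""
    return {
        # If this is False, the two queries do not partition the index,
        # which suggests the queries do not mean what they appear to.
        "consistent": with_field + without_field == total,
        # Share of documents lacking the field (0.0 for an empty index).
        "missing_fraction": (without_field / total) if total else 0.0,
    }
```

If `consistent` comes back `False`, the `exists` query is probably not evaluating the field the way the commands assume, and the mapping returned by command 5 is the first thing to inspect.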



96 changes: 96 additions & 0 deletions docs/cursor/check_documents_without_characteristics.sh
@@ -0,0 +1,96 @@
#!/bin/bash

# Check if there are documents without the characteristics field
# Replace with your Elasticsearch credentials and index name

ELASTICSEARCH_URL="http://localhost:9200"
INDEX_NAME="samples"
USERNAME="elastic"
PASSWORD="elastic"

echo "=== Checking for documents WITHOUT characteristics field ==="
echo ""

# Query 1: Count documents that don't have characteristics field
curl -u "${USERNAME}:${PASSWORD}" -X POST "${ELASTICSEARCH_URL}/${INDEX_NAME}/_search?pretty" \
-H 'Content-Type: application/json' \
-d '{
"size": 0,
"query": {
"bool": {
"must_not": [
{
"exists": {
"field": "characteristics"
}
}
]
}
},
"aggs": {
"total_without_characteristics": {
"value_count": {
"field": "_id"
}
}
}
}'

echo ""
echo ""
echo "=== Checking for documents WITH characteristics field ==="
echo ""

# Query 2: Count documents that have characteristics field
curl -u "${USERNAME}:${PASSWORD}" -X POST "${ELASTICSEARCH_URL}/${INDEX_NAME}/_search?pretty" \
-H 'Content-Type: application/json' \
-d '{
"size": 0,
"query": {
"exists": {
"field": "characteristics"
}
},
"aggs": {
"total_with_characteristics": {
"value_count": {
"field": "_id"
}
}
}
}'

echo ""
echo ""
echo "=== Sample documents without characteristics (first 5) ==="
echo ""

# Query 3: Get sample documents without characteristics
curl -u "${USERNAME}:${PASSWORD}" -X POST "${ELASTICSEARCH_URL}/${INDEX_NAME}/_search?pretty" \
-H 'Content-Type: application/json' \
-d '{
"size": 5,
"_source": ["accession", "name"],
"query": {
"bool": {
"must_not": [
{
"exists": {
"field": "characteristics"
}
}
]
}
}
}'

echo ""
echo ""
echo "=== Total document count ==="
echo ""

# Query 4: Total document count
curl -u "${USERNAME}:${PASSWORD}" -X GET "${ELASTICSEARCH_URL}/${INDEX_NAME}/_count?pretty"


