Commit c7b4e4a

Merge pull request #26 from farzai/feature/memory-optimization
Replace Guzzle with farzai/transport and optimize memory
2 parents 77bef5c + 73758d4 commit c7b4e4a

22 files changed

Lines changed: 2076 additions & 184 deletions

README.md

Lines changed: 35 additions & 24 deletions
@@ -14,7 +14,17 @@ A PHP library for downloading and converting Geonames data. This library provide
 - Import data directly to MongoDB with proper indexing
 - Memory-efficient processing for large datasets
 - Progress bars for all operations
-- Support for filtering by feature types (for Gazetteer data)
+
+## How It Works
+
+### Admin Code Resolution
+When downloading gazetteer data, the library automatically downloads `admin1CodesASCII.txt` and `admin2Codes.txt` from GeoNames to resolve administrative division names (e.g., converting admin code "40" to "Bangkok").
+
+### Memory-Efficient Processing
+Data is processed using streaming to handle large datasets without exhausting memory. For MongoDB imports, records are inserted in batches of 1000 for optimal performance.
+
+### Automatic Cleanup
+Temporary files (downloaded ZIP files, extracted data files, and admin code files) are automatically cleaned up after processing completes.
 
 ## Installation
 
@@ -70,22 +80,10 @@ Download geographical data for all countries:
 Options:
 - `--output (-o)`: Output directory (default: ./data)
 - `--format (-f)`: Output format (default: json, options: json, mongodb)
-- `--feature-class (-c)`: Filter by feature class (default: P)
 - `--mongodb-uri`: MongoDB connection URI (default: mongodb://localhost:27017)
 - `--mongodb-db`: MongoDB database name (default: geonames)
 - `--mongodb-collection`: MongoDB collection name (default: gazetteer)
 
-Available feature classes:
-- `A`: Country, state, region
-- `H`: Stream, lake
-- `L`: Parks, area
-- `P`: City, village
-- `R`: Road, railroad
-- `S`: Spot, building, farm
-- `T`: Mountain, hill, rock
-- `U`: Undersea
-- `V`: Forest, heath
-
 The Gazetteer data includes:
 - Geoname ID
 - Name (with ASCII and alternate names)
@@ -184,16 +182,14 @@ The MongoDB collection is indexed for efficient queries:
 
 #### MongoDB Format
 
-In MongoDB, the gazetteer data has the same structure as JSON but includes an additional `location` field for geospatial queries:
+In MongoDB, the gazetteer data uses slightly different field names and includes a `location` field for geospatial queries:
 
 ```json
 {
-  "geoname_id": 1609350,
+  "geonameid": 1609350,
   "name": "Bangkok",
-  "ascii_name": "Bangkok",
-  "alternate_names": ["Krung Thep", "กรุงเทพมหานคร"],
-  "latitude": 13.75,
-  "longitude": 100.51667,
+  "asciiname": "Bangkok",
+  "alternatenames": ["Krung Thep", "กรุงเทพมหานคร"],
   "location": {
     "type": "Point",
     "coordinates": [100.51667, 13.75]
@@ -217,11 +213,6 @@ In MongoDB, the gazetteer data has the same structure as JSON but includes an ad
 ```
 
 The MongoDB collection is indexed for efficient queries:
-- Unique index on `geoname_id`
-- Index on `country_code`
-- Index on `feature_class`
-- Index on `feature_code`
-- Text index on `name` and `ascii_name`
 - Geospatial index on `location`
 
 ## MongoDB Usage Examples
@@ -267,6 +258,26 @@ foreach ($result as $postalCode) {
 }
 ```
 
+## Error Handling
+
+The library throws `Farzai\Geonames\Exceptions\GeonamesException` for all error conditions:
+
+```php
+use Farzai\Geonames\Exceptions\GeonamesException;
+
+try {
+    // Download or convert operations
+} catch (GeonamesException $e) {
+    echo "Error: " . $e->getMessage();
+}
+```
+
+Common error scenarios:
+- **File operation failures**: Unable to read, write, or open files
+- **ZIP extraction failures**: Corrupted or invalid ZIP archives
+- **Data not found**: Missing expected data files in downloaded archives
+- **Missing dependencies**: MongoDB extension not installed when using MongoDB format
+
 ## License
 
 This package is open-sourced software licensed under the MIT license.
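
The README change above states that MongoDB imports insert records in batches of 1000. The converter that does this is not part of the diff shown here, so the following is only a minimal sketch of the batching idea, not the library's implementation. The names `batchRecords` and `fakeRecords` are hypothetical, and the database call is deliberately left as a comment.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: group a stream of records into fixed-size batches.
 * Only one batch is held in memory at a time.
 *
 * @param iterable<array<string, mixed>> $records
 * @return Generator<int, list<array<string, mixed>>>
 */
function batchRecords(iterable $records, int $batchSize = 1000): Generator
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }

    $batch = [];
    foreach ($records as $record) {
        $batch[] = $record;
        if (count($batch) === $batchSize) {
            yield $batch;
            $batch = [];
        }
    }

    // Flush the final, possibly smaller, batch.
    if ($batch !== []) {
        yield $batch;
    }
}

/** Stand-in record source (assumption): yields $count dummy records. */
function fakeRecords(int $count): Generator
{
    for ($i = 1; $i <= $count; $i++) {
        yield ['geonameid' => $i];
    }
}

$batchSizes = [];
foreach (batchRecords(fakeRecords(2500)) as $batch) {
    // A real importer would hand $batch to the database driver here.
    $batchSizes[] = count($batch);
}

echo implode(', ', $batchSizes), PHP_EOL; // prints "1000, 1000, 500"
```

Because both the source and the batcher are generators, peak memory is bounded by the batch size rather than by the size of the input file.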

bin/geonames

Lines changed: 7 additions & 4 deletions
@@ -5,11 +5,14 @@ declare(strict_types=1);
 
 require __DIR__ . '/../vendor/autoload.php';
 
-use Symfony\Component\Console\Application;
-use Farzai\Geonames\Console\Commands\DownloadPostalCodesCommand;
+use Composer\InstalledVersions;
 use Farzai\Geonames\Console\Commands\DownloadGazetteerCommand;
+use Farzai\Geonames\Console\Commands\DownloadPostalCodesCommand;
+use Symfony\Component\Console\Application;
+
+$version = InstalledVersions::getPrettyVersion('farzai/geonames') ?? '1.0.0';
 
-$application = new Application('Geonames CLI', '1.0.0');
+$application = new Application('Geonames CLI', $version);
 $application->add(new DownloadPostalCodesCommand());
 $application->add(new DownloadGazetteerCommand());
-$application->run();
+$application->run();
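
The `?? '1.0.0'` fallback above covers the case where Composer reports no pretty version. To my understanding, `InstalledVersions::getPrettyVersion()` throws an `OutOfBoundsException` when the package is not listed as installed at all, so a more defensive variant might look like the sketch below. `resolveVersion` is a hypothetical helper, not something this commit adds.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: resolve a package's version, falling back to a default.
 */
function resolveVersion(string $package, string $fallback = '1.0.0'): string
{
    // The class is generated by Composer 2; without a Composer autoloader it is absent.
    if (! class_exists(\Composer\InstalledVersions::class)) {
        return $fallback;
    }

    try {
        return \Composer\InstalledVersions::getPrettyVersion($package) ?? $fallback;
    } catch (\OutOfBoundsException $e) {
        // Assumed behaviour: raised when the package is not known to Composer.
        return $fallback;
    }
}

echo resolveVersion('farzai/geonames'), PHP_EOL;
```

Whether the extra guard is worth it depends on how the binary is distributed; when it only ever runs from a Composer install of the package itself, the one-line version in the commit is probably sufficient.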

composer.json

Lines changed: 5 additions & 5 deletions
@@ -7,15 +7,16 @@
     "require": {
         "php": "^8.1",
         "symfony/console": "^6.0 || ^7.0",
-        "guzzlehttp/guzzle": "^7.0"
+        "farzai/transport": "^2.1"
     },
     "require-dev": {
-        "pestphp/pest": "^2.34",
-        "spatie/ray": "^1.28",
         "laravel/pint": "^1.2",
+        "pestphp/pest": "^2.34",
         "phpstan/extension-installer": "^1.1",
         "phpstan/phpstan-deprecation-rules": "^1.0",
-        "phpstan/phpstan-phpunit": "^1.0"
+        "phpstan/phpstan-phpunit": "^1.0",
+        "spatie/ray": "^1.28",
+        "symfony/http-client": "^6.4 || ^7.0"
     },
     "autoload": {
         "psr-4": {
@@ -48,7 +49,6 @@
         "analyse": "vendor/bin/phpstan analyse"
     },
     "suggest": {
-        "guzzlehttp/guzzle": "Required to download the data from Geonames.",
         "ext-zip": "Required to extract the downloaded data.",
         "mongodb/mongodb": "Required for MongoDB output format support."
     },

src/Console/Commands/DownloadGazetteerCommand.php

Lines changed: 5 additions & 14 deletions
@@ -7,6 +7,7 @@
 use Farzai\Geonames\Converter\GazetteerConverter;
 use Farzai\Geonames\Converter\MongoDBGazetteerConverter;
 use Farzai\Geonames\Downloader\GazetteerDownloader;
+use Symfony\Component\Console\Attribute\AsCommand;
 use Symfony\Component\Console\Command\Command;
 use Symfony\Component\Console\Input\InputArgument;
 use Symfony\Component\Console\Input\InputInterface;
@@ -35,6 +36,10 @@
  * geonames:gazetteer:download all -c P # All populated places
  * geonames:gazetteer:download US -f mongodb # Import US data to MongoDB
  */
+#[AsCommand(
+    name: 'geonames:gazetteer:download',
+    description: 'Download and convert Geonames Gazetteer data'
+)]
 class DownloadGazetteerCommand extends Command
 {
     /**
@@ -45,20 +50,6 @@ class DownloadGazetteerCommand extends Command
         'admin2Codes.txt',
     ];
 
-    /**
-     * The default command name.
-     *
-     * @var string
-     */
-    protected static $defaultName = 'geonames:gazetteer:download';
-
-    /**
-     * The default command description.
-     *
-     * @var string
-     */
-    protected static $defaultDescription = 'Download and convert Geonames Gazetteer data';
-
     /**
      * The gazetteer downloader instance.
      */

src/Console/Commands/DownloadPostalCodesCommand.php

Lines changed: 5 additions & 14 deletions
@@ -7,6 +7,7 @@
 use Farzai\Geonames\Converter\MongoDBPostalCodeConverter;
 use Farzai\Geonames\Converter\PostalCodeConverter;
 use Farzai\Geonames\Downloader\GeonamesDownloader;
+use Symfony\Component\Console\Attribute\AsCommand;
 use Symfony\Component\Console\Command\Command;
 use Symfony\Component\Console\Input\InputArgument;
 use Symfony\Component\Console\Input\InputInterface;
@@ -24,22 +25,12 @@
  * geonames:download all # Download all countries
  * geonames:download US -f mongodb # Import US data to MongoDB
  */
+#[AsCommand(
+    name: 'geonames:download',
+    description: 'Download and convert postal codes data from Geonames'
+)]
 class DownloadPostalCodesCommand extends Command
 {
-    /**
-     * The default command name.
-     *
-     * @var string
-     */
-    protected static $defaultName = 'geonames:download';
-
-    /**
-     * The default command description.
-     *
-     * @var string
-     */
-    protected static $defaultDescription = 'Download and convert postal codes data from Geonames';
-
     /**
      * The postal code downloader instance.
      */
*/

src/Converter/AbstractConverter.php

Lines changed: 1 addition & 1 deletion
@@ -234,7 +234,7 @@ protected function streamPostalCodeRecords(string $txtFile): Generator
      */
     protected function createProgressBar(int $totalLines): ?ProgressBar
     {
-        if ($this->output === null) {
+        if ($this->output === null || $totalLines <= 0) {
             return null;
         }
 

src/Converter/GazetteerConverter.php

Lines changed: 79 additions & 17 deletions
@@ -5,51 +5,113 @@
 namespace Farzai\Geonames\Converter;
 
 use Farzai\Geonames\Exceptions\GeonamesException;
+use Generator;
 
 /**
  * Converts GeoNames gazetteer data from ZIP files to JSON format.
  *
  * This converter extracts geographical feature data from GeoNames ZIP archives
  * and outputs it as a JSON file with administrative code name resolution.
+ * Uses streaming to handle large files with minimal memory usage.
  */
 class GazetteerConverter extends AbstractGazetteerConverter
 {
     /**
      * Process the gazetteer data file and write to JSON output.
      *
+     * Uses streaming to process large files with O(1) memory complexity.
+     *
      * @param string $txtFile Path to the source TXT file containing gazetteer data
      * @param string $outputFile Path to the output JSON file
      *
      * @throws GeonamesException When processing fails
      */
     protected function processFile(string $txtFile, string $outputFile): void
     {
-        $data = [];
-        $lines = file($txtFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
+        $totalLines = $this->countLines($txtFile);
+        $progressBar = $this->createProgressBar($totalLines);
 
-        if ($lines === false) {
-            throw GeonamesException::fileOperationFailed('read', $txtFile);
+        $handle = fopen($outputFile, 'wb');
+        if ($handle === false) {
+            throw GeonamesException::fileOperationFailed('open for writing', $outputFile);
         }
 
-        foreach ($lines as $line) {
-            if (empty(trim($line))) {
-                continue;
-            }
+        try {
+            $this->writeToHandle($handle, '[', $outputFile);
+            $first = true;
+
+            foreach ($this->streamGazetteerRecords($txtFile) as $record) {
+                if (! $first) {
+                    $this->writeToHandle($handle, ',', $outputFile);
+                }
+
+                $json = json_encode($record, JSON_UNESCAPED_UNICODE);
+                if ($json === false) {
+                    throw GeonamesException::fileOperationFailed('encode JSON', $outputFile);
+                }
 
-            $record = $this->parseGazetteerLine($line);
-            if ($record !== null) {
-                $data[] = $record;
+                $this->writeToHandle($handle, $json, $outputFile);
+                $first = false;
+
+                $progressBar?->advance();
             }
-        }
 
-        $jsonContent = json_encode($data, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE);
-        if ($jsonContent === false) {
-            throw GeonamesException::fileOperationFailed('encode JSON', $outputFile);
+            $this->writeToHandle($handle, ']', $outputFile);
+        } finally {
+            fclose($handle);
+            $this->finishProgressBar($progressBar);
         }
+    }
 
-        $result = file_put_contents($outputFile, $jsonContent);
-        if ($result === false) {
+    /**
+     * Write content to a file handle with error checking.
+     *
+     * @param resource $handle The file handle to write to
+     * @param string $content The content to write
+     * @param string $outputFile The output file path (for error messages)
+     *
+     * @throws GeonamesException When the write operation fails
+     */
+    private function writeToHandle($handle, string $content, string $outputFile): void
+    {
+        if (fwrite($handle, $content) === false) {
             throw GeonamesException::fileOperationFailed('write', $outputFile);
         }
     }
+
+    /**
+     * Stream gazetteer records from a TXT file.
+     *
+     * Uses a generator to yield records one at a time, enabling memory-efficient
+     * processing of large files.
+     *
+     * @param string $txtFile Path to the TXT file containing gazetteer data
+     * @return Generator<int, array<string, mixed>> Yields gazetteer records
+     *
+     * @throws GeonamesException When the file cannot be opened
+     */
+    protected function streamGazetteerRecords(string $txtFile): Generator
+    {
+        $handle = fopen($txtFile, 'r');
+
+        if ($handle === false) {
+            throw GeonamesException::fileOperationFailed('open', $txtFile);
+        }
+
+        try {
+            while (($line = fgets($handle)) !== false) {
+                $trimmedLine = trim($line);
+                if (empty($trimmedLine)) {
+                    continue;
+                }
+
+                $record = $this->parseGazetteerLine($trimmedLine);
+                if ($record !== null) {
+                    yield $record;
+                }
+            }
+        } finally {
+            fclose($handle);
+        }
+    }
 }
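
The rewritten `processFile()` replaces "build an array, encode once" with "encode and write one record at a time". The pattern can be tried in isolation with the standalone sketch below. It assumes nothing from the library: `writeJsonArray` and the sample records are hypothetical, and for brevity it omits the per-write error checks that the commit's `writeToHandle()` performs.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: write records to $path as a single JSON array,
 * encoding one record at a time so the full data set is never held in memory.
 *
 * @param iterable<array<string, mixed>> $records
 * @return int Number of records written
 */
function writeJsonArray(iterable $records, string $path): int
{
    $handle = fopen($path, 'wb');
    if ($handle === false) {
        throw new RuntimeException("Unable to open {$path} for writing.");
    }

    $count = 0;
    try {
        fwrite($handle, '[');
        foreach ($records as $record) {
            // JSON_THROW_ON_ERROR raises JsonException instead of returning false.
            $json = json_encode($record, JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR);
            // Prefix every record except the first with a comma.
            fwrite($handle, ($count > 0 ? ',' : '') . $json);
            $count++;
        }
        fwrite($handle, ']');
    } finally {
        // Runs on success and on failure, so the handle is never leaked.
        fclose($handle);
    }

    return $count;
}

// Illustrative records only; the second entry is made-up sample data.
$records = (function (): Generator {
    yield ['geonameid' => 1609350, 'name' => 'Bangkok'];
    yield ['geonameid' => 2, 'name' => 'กรุงเทพมหานคร'];
})();

$path = (string) tempnam(sys_get_temp_dir(), 'geo');
$written = writeJsonArray($records, $path);
$decoded = json_decode((string) file_get_contents($path), true);
unlink($path);

echo $written, ' records written', PHP_EOL; // prints "2 records written"
```

One trade-off worth noting: the streamed output is compact rather than pretty-printed, which matches the commit dropping `JSON_PRETTY_PRINT` from the encode call.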
