Import contacts from CSV by rhelder · Pull Request #350 · lucc/khard

rhelder · 2026-01-20T04:39:37Z

The big-box address-book providers (Outlook, Google, Apple) all support importing contacts from CSV files. Khard does not support this, nor to my knowledge does any console address-book software. Because contact data is often distributed in CSV form, it is useful to be able to import contacts from CSV, and it's practically indispensable if you need to manage a large number of contacts. For example, I'm a teacher at the university level, and my students' contact info (most importantly, their emails, to which I frequently need to send group messages) is given to me by the University in CSV format. Because I usually have fifty students at a time, it is not feasible to import that many contacts into khard without some kind of scripting solution. Since I was using khard's API anyway, I decided to have a go at implementing the feature. The feature was non-trivial to implement, so apologies in advance for the length of this PR. Thanks for reviewing it, and for your work on khard!

CLI

The new feature is implemented via a new subcommand, import (csv is also provided as an alias). khard import is designed to be as consistent with khard new as possible. khard import takes the same options as khard new:

-a, --addressbook specifies the address book into which the contacts should be imported. The user is asked to specify an address book if this option is not supplied.
-i, --input-file specifies the CSV file from which the contacts should be imported (stdin by default).
--open-editor, --edit gives the user the option to review/edit the contact after the successful creation of each contact (not unlike Apple Contacts, which asks you to review each imported contact).
--vcard-version is the same as for khard new.

khard import takes one additional option, -d or --delimiter, which allows you to specify what field delimiter is used in your CSV file (',' by default).
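As a rough illustration of what the delimiter option controls, here is a minimal sketch using Python's standard `csv` module. The helper name `read_rows` is hypothetical and is not part of this PR; it only shows how a non-default delimiter changes the way a line is split into fields.

```python
import csv
import io


def read_rows(text: str, delimiter: str = ",") -> list[list[str]]:
    """Split CSV text into rows of fields using the given delimiter.

    Hypothetical helper for illustration only, not code from the PR.
    """
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# A semicolon-separated file needs the delimiter spelled out explicitly.
rows = read_rows("First name;Last name\nBruce;Wayne\n", delimiter=";")
print(rows)
```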

Like khard new, if no input is supplied to khard import, the user's text editor is opened to edit a temporary file containing a template -- in this case, a CSV template. I don't think there's really a use case for this, since CSV files are not very user-friendly to edit in a text editor. But I thought it was important both for consistency with khard new and also with UNIX tools in general -- if you don't provide any input to cat, for example, it will just hang, because anything you type after executing the command still counts as stdin. So even though opening a CSV template when the user fails to supply any input is not very useful, it at least will be unsurprising to the user.

I modified khard template to be able to show the CSV template if -c or --csv is passed to it. It still shows the YAML template by default (or, superfluously, with the -y or --yaml option), so this change doesn't break anything. (This functionality could be assigned to a subcommand other than template, but I thought it was neater to fold it into template.)

Implementation

There are two main obstacles to implementing the ability to import contacts from CSV. The first is that CSV is a (very) simple data format, and that contacts are complex (evidenced by the fact that khard models them with YAML, which is a complex data format). The solution to this is to specify a clear standard for validly formatting CSV files. Fortunately, Google (https://support.google.com/contacts/answer/15147365?hl=en-GB&co=GENIE.Platform%3DDesktop#zippy=%2Cuse-a-template-spreadsheet-to-create-a-csv-file-to-import) and Outlook (https://support.microsoft.com/en-us/office/create-or-edit-csv-files-to-import-into-outlook-4518d70d-8fe9-46ad-94fa-1494247193c7) provide some clues as to how to specify a standard in a way that might be more or less familiar to people.

The details of this are discussed in a compressed and dry way in the API documentation, and will need to be spelled out in a more friendly way in user-facing documentation (which I am happy to write if you are interested in merging this PR). Here's a quick overview of how column headers need to be specified in order to get certain data structures:

To get something equivalent to the YAML structure First name: Bruce, the column header should be 'First name' (where 'Bruce' is a value, in that column, in one of the subsequent rows of the CSV file).
To get something equivalent to the YAML structure
```
Organisation:
    - Justice League
    - Wayne Enterprises
```
'Justice League' should be in a column named 'Organisation 1' and 'Wayne Enterprises' should be in a column named 'Organisation 2'.
To get something equivalent to the YAML structure
```
Email:
    work: thebat@justice.org
    home: bruce@gmail.com
```
'work' should be in a column named 'Email 1 - type', and 'thebat@justice.org' should be in a column named 'Email 1 - value'. In the same row, 'home' should be in a column named 'Email 2 - type', and 'bruce@gmail.com' should be in a column named 'Email 2 - value'.
The same idea as for emails applies to addresses. To get something equivalent to the YAML structure
```
Address:
    home:
        Street: 1007 Mountain Drive
        City: Gotham City
        Country: USA
```
'home' should be in a column named 'Address 1 - type', '1007 Mountain Drive' should be in a column named 'Address 1 - Street', 'Gotham City' should be in a column named 'Address 1 - City', and 'USA' should be in a column named 'Address 1 - Country'.
Finally, in structures like the email and address examples above, lists are supported. To get something equivalent to the YAML structure
```
Email:
    work:
        - thebat@justice.org
        - bruce@wayne.com
```
'work' should be in a column named 'Email 1 - type' and in a column named 'Email 2 - type'. 'thebat@justice.org' should be in a column named 'Email 1 - value', and 'bruce@wayne.com' should be in a column named 'Email 2 - value'.

Note that the numbers ('Email 1', 'Email 2') are more or less arbitrary; they are just meant to say that different values are associated with each other (which CSV is not capable of conveying on its own). But the numbers need to start with 1, and they need to be in a sequence. For example, if there's an 'Email 3', there must also be an 'Email 2' and 'Email 1'. (The order that they're presented in the CSV file doesn't matter, though.)
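To make the header convention concrete, here is a minimal sketch of how one CSV row could be grouped into a nested dict shaped like the YAML examples above. This is not the parser from this PR: the regular expression, the names `HEADER`, `_entry_value` and `row_to_dict`, and the decision to skip empty cells are all assumptions made for illustration, and the sketch does not enforce the rule that indices start at 1 and are consecutive.

```python
import re

# Hypothetical pattern for the three header shapes described above:
# "First name", "Organisation 1", and "Email 1 - type".
HEADER = re.compile(r"^(?P<key>.+?)(?: (?P<idx>\d+))?(?: - (?P<subkey>.+))?$")


def _entry_value(fields: dict[str, str]) -> object:
    """Collapse {"value": x} to x; keep multi-field entries as a dict."""
    return fields["value"] if set(fields) == {"value"} else fields


def row_to_dict(headers: list[str], row: list[str]) -> dict[str, object]:
    """Group one CSV row into a nested dict (illustrative, no validation)."""
    result: dict[str, object] = {}
    grouped: dict[str, dict[int, dict[str, str]]] = {}
    for header, cell in zip(headers, row):
        match = HEADER.match(header.strip())
        if not cell or match is None:
            continue  # skip empty cells and empty headers
        key, idx, subkey = match["key"], match["idx"], match["subkey"]
        if idx is None:
            result[key] = cell  # plain scalar field such as "First name"
        else:
            entry = grouped.setdefault(key, {}).setdefault(int(idx), {})
            entry[subkey or "value"] = cell

    for key, by_index in grouped.items():
        entries = [by_index[i] for i in sorted(by_index)]
        if not any("type" in entry for entry in entries):
            # Untyped numbered columns become a plain list.
            result[key] = [_entry_value(entry) for entry in entries]
            continue
        typed: dict[str, list[object]] = {}
        for entry in entries:
            fields = {k: v for k, v in entry.items() if k != "type"}
            values = typed.setdefault(entry.get("type", ""), [])
            values.append(_entry_value(fields))
        # A type that occurs once maps to a single value, otherwise to a list.
        result[key] = {t: v[0] if len(v) == 1 else v for t, v in typed.items()}
    return result


headers = ["First name", "Organisation 1", "Organisation 2",
           "Email 1 - type", "Email 1 - value"]
row = ["Bruce", "Justice League", "Wayne Enterprises",
       "work", "thebat@justice.org"]
print(row_to_dict(headers, row))
```

Because the cells are grouped by the index in the header rather than by column position, the order of the columns in the file does not affect the result in this sketch.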

Although the above standard is straightforward enough, processing the data into a form that khard.YAMLEditable.update() can read (to avoid any duplication of the contact-creation logic that's already performed by that method) can get messy fast. To keep things organized, I ended up implementing the CSV parsing code as a separate module. (The specifics are documented, in hopefully a clear way, within the module.)

The second obstacle is that khard's API (both public and private) doesn't readily support creating Contacts out of anything but YAML input. So, although in general I aimed for my PR to only add things, and not change things, there were a couple of places where I had to pry the API open a bit to allow Contacts to be created from the data returned by the CSV parser. The alternative would have been to convert the CSV parser data into YAML, just to be converted back into the same dict -- which seemed pretty silly. However, none of the changes I made were breaking, in the sense that no one who calls into the API will need to make any changes to their code on account of the changes I made.

In particular:

I factored the validation of the data out of the khard.YAMLEditable._parse_yaml() method, so that I could use the same validation logic when creating a new Contact from a dict.
I allowed the argument to khard.YAMLEditable.update() to be either a string or a dictionary. If it's a string, it's parsed as YAML input just like it was before. If it's a dictionary, the dictionary is validated, and then Contact creation proceeds as normal.
I wrapped these changes up in a new public API method, khard.Contact.from_dict().
Finally, I allowed (but did not require) the specification of a suffix other than '.yml' in khard.helpers.interactive.Editor.write_temp_file().

Again, these changes should not be noticed.
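The shape of the change can be sketched roughly as follows. This is a simplified stand-in and not khard's actual code: the class `EditableSketch`, the `REQUIRED` rule, and the toy `_parse_text` function (which only reads flat `key: value` lines in place of a real YAML parser) are all hypothetical. The point is only that string and dict input take different paths at first and then share the same validation.

```python
from __future__ import annotations

from typing import Any


class EditableSketch:
    """Hypothetical stand-in for a class with a khard-like update() method."""

    REQUIRED = ("First name",)  # illustrative validation rule only

    def __init__(self) -> None:
        self.data: dict[str, Any] = {}

    @staticmethod
    def _parse_text(text: str) -> dict[str, Any]:
        # Stand-in for the YAML parser: reads flat "key: value" lines only.
        pairs = (line.split(":", 1) for line in text.splitlines() if ":" in line)
        return {key.strip(): value.strip() for key, value in pairs}

    @classmethod
    def _validate(cls, data: dict[str, Any]) -> dict[str, Any]:
        # Shared validation step, used for both kinds of input.
        missing = [key for key in cls.REQUIRED if not data.get(key)]
        if missing:
            raise ValueError(f"missing required fields: {missing}")
        return data

    def update(self, source: str | dict[str, Any]) -> None:
        # Strings are parsed first; dicts go straight to validation.
        parsed = self._parse_text(source) if isinstance(source, str) else source
        self.data = self._validate(parsed)

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> EditableSketch:
        contact = cls()
        contact.update(data)
        return contact


from_text = EditableSketch()
from_text.update("First name: Bruce\nLast name: Wayne")
from_mapping = EditableSketch.from_dict({"First name": "Bruce", "Last name": "Wayne"})
print(from_text.data == from_mapping.data)
```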

Tests

The PR includes tests verifying that YAML files and a CSV file containing the same data produce equivalent contacts. The tests also verify that the order of columns in the CSV file does not change the validity of the final result.
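The column-order property can be illustrated on its own with the standard library. The sketch below uses `csv.DictReader` rather than the parser from this PR, and the helper name `records` is hypothetical; it only shows why keying cells by header makes the column order irrelevant.

```python
import csv
import io


def records(text: str) -> list[dict[str, str]]:
    """Read CSV text into one dict per row, keyed by column header."""
    return list(csv.DictReader(io.StringIO(text)))


ordered = "First name,Last name\nBruce,Wayne\n"
jumbled = "Last name,First name\nWayne,Bruce\n"

# Dict equality ignores key order, so both layouts give the same records.
print(records(ordered) == records(jumbled))
```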

To-do

If you are interested in merging this, the remaining tasks I can think of are

Write user-facing documentation,
And implement command-line completion for the new import subcommand.

I'm happy to do all of these, of course.

I hope this PR can be helpful. Thanks again for your work on khard!

lucc · 2026-02-02T23:44:42Z

@rhelder thanks for your work. I am interested in merging this and would like to support you with your work on this. I have some initial comments:

CLI

There has been a very short discussion in #346 (comment). Is there a reason why you chose khard import instead of khard new? You explicitly say that you model the args for import after new and I see some code duplication for that.

I would prefer not to use csv as an alias and maybe not even import. Just add a --format option to new (see the --format for other subcommands). If you have strong feelings for an import subcommand we might discuss if it should be an alias for new. That should also reduce the code duplication in cli.py.

This --format option could later be extended to also accept vcard. I think that is the cleaner design and more flexible. (csv for example would not give a hint if it reads or writes csv data)
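A minimal `argparse` sketch of this design, for illustration only. It is not khard's real `cli.py`: the program name, the help strings, and the use of `import` as an alias are assumptions, and the option set is reduced to the ones mentioned in this thread.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser with a 'new' subcommand that takes --format."""
    parser = argparse.ArgumentParser(prog="khard-sketch")
    subparsers = parser.add_subparsers(dest="action", required=True)
    # 'import' as an alias is only one of the options discussed above.
    new = subparsers.add_parser(
        "new", aliases=["import"], help="create new contact(s)")
    new.add_argument(
        "--format", choices=("yaml", "csv"), default="yaml",
        help="format of the input data")
    new.add_argument("-i", "--input-file", default="-")
    new.add_argument("-d", "--delimiter", default=",")
    return parser


args = build_parser().parse_args(["new", "--format", "csv", "-d", ";"])
print(args.format, args.delimiter)
```

Extending `choices` later (for example with `vcard`) would not require a new subcommand, which is the flexibility argued for above.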

Implementation

I have no strong feelings about the csv format and the mapping to yaml/dict/lists. As long as we can express all the data formats that we need it is fine, I think.

I am also fine with extending the Contact.parse/create_from/update/... functions for new data types, or adding similar functions. I like the idea that I saw in the code that updating via dict or yaml uses different code paths in the beginning but then converges after the yaml is parsed. (This represents the idea that I have with the new --format option, where the code path converges after parsing, and also with the list --format option, where the code path diverges after selecting contacts in order to print them.)

Code

I will look at the code more thoroughly at some later time. Please remember to run the tests and type checker to ensure that the code fulfills some baseline.

If something is unclear just ask :)

lucc · 2026-02-04T07:00:43Z

I have dropped support for python 3.9 in main and updated the type hints to use the "new" union operator. So you can rebase your branch and do not have to fix these type related ci failures.

rhelder · 2026-02-07T01:42:34Z

Thanks for your comments @lucc! I'm glad to be able to keep working on this.

> There has been a very short discussion in #346 (comment). Is there a reason why you chose khard import instead of khard new? You explicitly say that you model the args for import after new and I see some code duplication for that.

No, not really. Out of caution, I erred on the side of adding things rather than changing things, at least where that was possible – but that was just for the sake of getting you a first draft, not because I thought it was necessarily good design. (On the contrary, I didn't want to be overly opinionated about the design at this stage.) I like your suggestion to add a --format option to new a lot. I will start working on consolidating the duplicated code this weekend.

The one part of the duplicated code that isn't strictly duplicated is the help text – new uses the singular ('create a new contact'), and the experimental import uses the plural ('create new contacts'). The same goes for their options (e.g., --addressbook is 'Specify address book in which to create the new contacts' for import, and 'Specify address book in which to create the new contact' for new). I'll try to get the language in the help text right as I make the other changes.

> Please remember to run the tests and type checker to ensure that the code fulfills some baseline.

I apologize that I overlooked the type checks – I did run the unit tests, but I didn't notice the type checks (I haven't worked with the typing module or mypy before). I think there are still some errors even after rebasing, so I will work on fixing those as well.

Thanks again for reviewing the PR!

rhelder · 2026-02-08T03:30:55Z

Both unit tests and type checks should be passing now. Continuing to work on the other things.


'--format' specifies the output format, either 'yaml' or 'csv'. YAML is still the default output format for 'khard template'. The CSV template will be used when importing contacts from CSV. Just as a YAML template is opened in the user's editor when 'khard new' is invoked without any input, a CSV template will be opened in the user's editor when 'khard new --format csv' is invoked without any input. (This will not be terribly useful, since CSV is not very pleasant to edit in a text editor, but it will be consistent with the current behavior of 'khard new').


rhelder · 2026-02-21T00:33:38Z

I've folded all of the functionality from khard import into khard new --format csv. The --format option can also be passed to khard template (earlier I had two separate switches, --yaml and --csv, which was much more awkward).

Type and unit tests are passing, but obviously doc tests are failing. They are also failing on main though, so I'm not sure what to make of that for now.

I'll look forward to your code review, when you have the time. Sorry for my own slow pace – busy time at work. Thanks!

lucc

Sorry for the delay. I had a look at the code and left some comments.

lucc · 2026-03-26T06:52:17Z

khard/helpers/interactive.py

+        """
+        with contextlib.ExitStack() as stack:
+            filename = stack.enter_context(
+                    self.write_temp_file(template, ".csv")


If you only have one file you do not need a stack here. One simple with statement should be enough.
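For comparison, here is a small sketch of the two forms. The `write_temp_file` function below is a hypothetical stand-in written for this example and is not khard's `Editor.write_temp_file()`; it only assumes, as the diff suggests, that the method is a context manager yielding a file name.

```python
import contextlib
import os
import tempfile
from collections.abc import Iterator


@contextlib.contextmanager
def write_temp_file(text: str, suffix: str = ".yml") -> Iterator[str]:
    """Hypothetical stand-in: write text to a temp file, yield its name."""
    with tempfile.NamedTemporaryFile("w", suffix=suffix, delete=False) as tmp:
        tmp.write(text)
    try:
        yield tmp.name
    finally:
        os.remove(tmp.name)


# ExitStack form: useful when the number of context managers is dynamic.
with contextlib.ExitStack() as stack:
    stacked_name = stack.enter_context(write_temp_file("First name\n", ".csv"))
    stacked_exists = os.path.exists(stacked_name)

# Plain with statement: equivalent when there is exactly one file.
with write_temp_file("First name\n", ".csv") as filename:
    with open(filename) as handle:
        content = handle.read()

print(content)
```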

lucc · 2026-03-26T06:55:55Z

khard/cli.py

+        description="print an empty yaml (default) or CSV template",
+        help="print an empty yaml (default) or CSV template")
+    template_parser.add_argument(
+        "-O", "--format", choices=("yaml", "csv"), default="yaml",


I am confused by the -O, I would have expected -f as a short option. What is the reason?

lucc · 2026-03-26T06:58:20Z

khard/csv.py

+
+
+class Parser:
+    """An iterator over rows in a CSV file that returns contact data."""


I do not see any contact data related code in here. I would say this parser can parse csv with nested fields.

lucc · 2026-03-26T06:59:08Z

khard/csv.py

+    """An iterator over rows in a CSV file that returns contact data."""
+
+    def __init__(self, input_from_stdin_or_file: str, delimiter: str) -> None:
+        """Parse first row to determine structure of contact data.


-contact data +nested field structure

lucc · 2026-03-26T07:01:56Z

khard/csv.py

+        first_row = next(self.reader)
+        self.template, self.columns = self._parse_headers(first_row)
+
+    def __iter__(self) -> Iterator[dict]:


Can we constrain the dicts in the type signatures more? I think all keys will be strings. And the values might be a union of string, list and dict. When you have to repeat this complex type a lot you can add an alias at the top of this file or to helpers/typing.py. Compare the aliases there.

lucc · 2026-03-26T07:06:36Z

khard/csv.py

+                if subkey:
+                    template[key].setdefault(idx, {})
+                    template[key][idx].update({subkey: None})
+                    columns.append(


Can you put these on one line? The line breaks seem unnecessary.

lucc · 2026-04-01T20:10:07Z

khard/csv.py

+
+        return template, columns
+
+    def _get_data(self, row: list[str]) -> None:


It seems to me that this function stores its "output" on self via the local variable data_structure. The comment also tries to explain this somehow.

It seems to me that this data is only read by _process_data and the function _get_data and _process_data are only called right after each other in parse(). Can this function be refactored to return the data it would store on self.template and then the instance variable can be removed?
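A generic sketch of the kind of refactoring meant here. The class below is hypothetical and much simpler than the parser in this PR; in particular, the body of `_process_data` is a placeholder. It only shows the helper returning its result so that no intermediate state is kept on the instance between the two calls.

```python
class ParserSketch:
    """Hypothetical parser whose helper returns data instead of storing it."""

    def __init__(self, columns: list[str]) -> None:
        self.columns = columns

    def _get_data(self, row: list[str]) -> dict[str, str]:
        # Return the per-row data rather than writing it to an attribute.
        return {col: cell for col, cell in zip(self.columns, row) if cell}

    def _process_data(self, data: dict[str, str]) -> dict[str, str]:
        # Placeholder post-processing step: trim surrounding whitespace.
        return {col: cell.strip() for col, cell in data.items()}

    def parse(self, row: list[str]) -> dict[str, str]:
        # The intermediate result is passed along directly.
        return self._process_data(self._get_data(row))


parser = ParserSketch(["First name", "Last name"])
print(parser.parse([" Bruce ", "Wayne"]))
```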

lucc · 2026-04-01T20:16:26Z

khard/csv.py

+
+        :param first_row: First row of the CSV file, which must contain column
+            headers.
+        :returns: The "template" dict and the "columns" list. The structure of


I am losing track as soon as I try to understand what this should look like and what it should do. Maybe a very simple example would be good, or a simple test just for this function. The example in the doc comment above is not for the return value of this function, or is it?

lucc · 2026-04-01T20:23:55Z

test/fixture/csv/batman.yaml

@@ -0,0 +1,161 @@
+# Contact template for khard version 0.1.dev1192+g40c9de648


The test file does not need all these comments. I think you can remove them.

lucc · 2026-04-01T20:31:29Z

test/test_csv.py

+
+class TestCSVParser(unittest.TestCase):
+    """Tests the csv module and khard.contacts.Contact.from_dict()."""
+    def test_yaml_and_csv_produce_equivalent_contacts(self):


This is one big test that probably tests all code in the csv parser. There are two problems with this test:

It tests everything in one test case, so if any part of the code changes the whole test fails and the developers have little indication why.

I have to open several other files (the test fixtures) to see what is happening here.
Can you please add some small tests for the parser that each parse a very small piece of CSV data (no files needed) and assert that the dict that is returned is the right one?

E.g:

    def test_foo(self):
        csv = "foo,bar,baz\nx,y,z"
        expected = [{"foo": "x", "bar": "y", "baz": "z"}]
        actual = list(Parser(csv, ","))
        self.assertEqual(actual, expected)

Like this you could test many corner cases of the parser and the tests would be easy to read and understand.

lucc added the enhancement label Feb 3, 2026

rhelder force-pushed the import branch from af19b09 to 227d693 Compare February 7, 2026 00:07

rhelder force-pushed the import branch from 227d693 to 6744b3a Compare February 8, 2026 03:28

rhelder force-pushed the import branch from 6744b3a to 1d799bc Compare February 21, 2026 00:19

rhelder added 6 commits February 20, 2026 16:25

Allow creating contacts from dict, not just YAML

5225f1e

This will make it possible to create contacts from data formats other than YAML without rendering the data as a YAML string.

Initial commit of CSV parsing module

cee7a84

This module reads CSV files and returns data that can be read by the 'khard.contacts' module. The module will be used to create new contacts from CSV.

Add function to create contacts by editing CSV template

1dabd11

This is not particularly useful on its own, but it will be used as a fallback when importing contacts from CSV, in the event that the user doesn't supply any input (i.e., like 'khard new' does).

Import contacts from CSV with khard new --format csv

3fc56ae

Initial commit of unit test for CSV parsing submodule

043a476

Check whether or not YAML files and CSV files containing the same data produce equivalent Contact objects. Make one of the CSV files 'jumbled' to show that column order doesn't matter to getting the right result.

rhelder force-pushed the import branch from 1d799bc to 043a476 Compare February 21, 2026 00:27

lucc requested changes Apr 1, 2026
