Tagging System Documentation
Overview
The godocs tagging system organizes documents through three mechanisms:
- Free-form tags - Flexible categorization
- Dimensions - Structured metadata with predefined values
- JSON sidecar files - Portable tag storage alongside documents
Quick Start
1. Apply Migration
The tagging system requires database migration 000005:
# Migration will be applied automatically on next startup
# Or manually trigger it through your migration process
2. Create a Sample Document with Tags
Create a file example.tags.json alongside your document:
{
"tags": ["invoice", "personal", "important"],
"dimensions": {
"person": "husband",
"location": "home",
"year": "2024",
"importance": "high",
"retention": "keep_7_years"
}
}
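If sidecar files are generated by a script rather than written by hand, the step above could be sketched as follows. The helper name `write_sidecar` is hypothetical and not part of godocs; the sketch only assumes the file naming and JSON layout shown in this document.

```python
import json
from pathlib import Path


def write_sidecar(document: Path, tags: list, dimensions: dict) -> Path:
    """Write <document name>.tags.json next to the document and return its path.

    Hypothetical helper for illustration; godocs itself does not ship it.
    """
    # example.pdf -> example.tags.json
    sidecar = document.with_suffix(".tags.json")
    payload = {"tags": tags, "dimensions": dimensions}
    # Sidecar files use UTF-8; ensure_ascii=False keeps non-ASCII tags readable.
    sidecar.write_text(
        json.dumps(payload, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return sidecar
```

Calling `write_sidecar(Path("example.pdf"), ["invoice"], {"person": "husband"})` would produce `example.tags.json` in the same folder as the document.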
3. Ingest the Document
Place both example.pdf and example.tags.json in the ingress folder. The tags will be automatically imported.
Predefined Dimensions
Person
husband, wife, child1, child2, child3, family, business, other
Location
home, office, bank, medical, legal, insurance, tax, education, other
Importance
low, medium, high, critical
Retention (for future archival)
- temporary - Keep for less than 1 year
- keep_1_year - Keep for 1 year
- keep_3_years - Keep for 3 years
- keep_7_years - Keep for 7 years (tax records)
- keep_10_years - Keep for 10 years
- keep_permanent - Keep forever (birth certificates, etc.)
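Because archival is not implemented yet, the following is only a sketch of how the retention values might map to a year count. The names `RETENTION_YEARS` and `retention_end_year` are hypothetical, and treating `temporary` as a 1-year upper bound is an assumption.

```python
from typing import Optional

# Years to keep a document per retention value.
# Assumption: "temporary" (less than 1 year) is treated as an upper bound of 1.
# None means the document is never eligible for archival.
RETENTION_YEARS = {
    "temporary": 1,
    "keep_1_year": 1,
    "keep_3_years": 3,
    "keep_7_years": 7,
    "keep_10_years": 10,
    "keep_permanent": None,
}


def retention_end_year(document_year: int, retention: str) -> Optional[int]:
    """Return the year in which the retention period ends, or None for permanent."""
    years = RETENTION_YEARS[retention]  # raises KeyError for unknown values
    return None if years is None else document_year + years
```

With this mapping, a 2024 document tagged `keep_7_years` would reach the end of its retention period in 2031.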
Year
- Values can be any year string (e.g., "2024", "2023")
- Used for archival organization
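To check values before writing a sidecar file, the predefined dimensions can be held in a lookup table. This is a minimal sketch: `PREDEFINED` and `is_valid_dimension` are hypothetical names, and requiring a four-digit year is an assumption, since the document only says any year string is accepted.

```python
# Allowed values copied from the lists in this section; "year" is free-form.
PREDEFINED = {
    "person": {"husband", "wife", "child1", "child2", "child3",
               "family", "business", "other"},
    "location": {"home", "office", "bank", "medical", "legal",
                 "insurance", "tax", "education", "other"},
    "importance": {"low", "medium", "high", "critical"},
    "retention": {"temporary", "keep_1_year", "keep_3_years",
                  "keep_7_years", "keep_10_years", "keep_permanent"},
}


def is_valid_dimension(name: str, value: str) -> bool:
    """Return True if value is acceptable for the named dimension."""
    if name == "year":
        # Assumption: a year string is four ASCII digits, e.g. "2024".
        return value.isascii() and value.isdigit() and len(value) == 4
    # Exact, case-sensitive match against the predefined values.
    return value in PREDEFINED.get(name, set())
```

The comparison is exact, so `is_valid_dimension("person", "Husband")` is rejected while `"husband"` is accepted.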
API Endpoints
Tags
Get all tags:
GET /api/tags
Create tag:
POST /api/tags
Body: {"name": "invoice", "color": "#3498db", "description": "Invoice documents"}
Update tag:
PUT /api/tags/{id}
Body: {"name": "invoice", "color": "#e74c3c"}
Delete tag:
DELETE /api/tags/{id}
Document Tags
Get document tags:
GET /api/documents/{ulid}/tags
Add tag to document:
POST /api/documents/{ulid}/tags
Body: {"tag_id": 1}
Remove tag from document:
DELETE /api/documents/{ulid}/tags/{tagId}
Dimensions
Get all dimensions with values:
GET /api/dimensions
Get document dimensions:
GET /api/documents/{ulid}/dimensions
Set document dimension:
POST /api/documents/{ulid}/dimensions
Body: {"dimension_name": "person", "value": "husband"}
Remove document dimension:
DELETE /api/documents/{ulid}/dimensions/{dimensionName}
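The endpoints above can be called from a script. The sketch below only builds the requests using Python's standard library; the function names are hypothetical, the base URL is taken from the curl examples in this document, and the response format is not assumed.

```python
import json
import urllib.request
from typing import Optional

# Host and port as used in this document's curl examples; adjust for your setup.
BASE_URL = "http://localhost:8000"


def build_request(method: str, path: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build a request for the tagging API, JSON-encoding the body if present."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def add_tag_request(ulid: str, tag_id: int) -> urllib.request.Request:
    """POST /api/documents/{ulid}/tags"""
    return build_request("POST", f"/api/documents/{ulid}/tags", {"tag_id": tag_id})


def set_dimension_request(ulid: str, name: str, value: str) -> urllib.request.Request:
    """POST /api/documents/{ulid}/dimensions"""
    return build_request(
        "POST",
        f"/api/documents/{ulid}/dimensions",
        {"dimension_name": name, "value": value},
    )
```

A request built this way can be sent with `urllib.request.urlopen(req)` once the server is running.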
JSON Sidecar File Format
Sidecar files are named {document_name}.tags.json, where {document_name} is the document's file name without its extension:
Example: invoice-2024.pdf → invoice-2024.tags.json
{
"tags": [
"invoice",
"utilities",
"electricity"
],
"dimensions": {
"person": "family",
"location": "home",
"year": "2024",
"importance": "medium",
"retention": "keep_7_years"
}
}
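The naming rule and format above can be sketched as a small reader. The helpers `sidecar_path` and `load_sidecar` are hypothetical. Replacing only the last extension is an assumption, and it means a name such as archive.tar.gz would map to archive.tar.tags.json.

```python
import json
from pathlib import Path


def sidecar_path(document: Path) -> Path:
    """invoice-2024.pdf -> invoice-2024.tags.json (only the last extension is replaced)."""
    return document.with_suffix(".tags.json")


def load_sidecar(document: Path) -> dict:
    """Read the sidecar for a document; a missing file yields empty tags and dimensions."""
    path = sidecar_path(document)
    if not path.exists():
        return {"tags": [], "dimensions": {}}
    data = json.loads(path.read_text(encoding="utf-8"))
    # Both keys are treated as optional in this sketch.
    return {
        "tags": data.get("tags", []),
        "dimensions": data.get("dimensions", {}),
    }
```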
Automatic Sidecar Updates
- During ingestion: Tags and dimensions from sidecar files are imported into the database
- After API changes: Any tag/dimension changes via API automatically update the sidecar file
- Portability: Sidecar files travel with documents for backup/migration
Usage Examples
Example 1: Medical Document
File: medical-record.pdf
Sidecar: medical-record.tags.json
{
"tags": ["medical", "test-results", "annual-checkup"],
"dimensions": {
"person": "wife",
"location": "medical",
"year": "2024",
"importance": "high",
"retention": "keep_permanent"
}
}
Example 2: Tax Document
File: tax-return-2024.pdf
Sidecar: tax-return-2024.tags.json
{
"tags": ["tax", "irs", "federal"],
"dimensions": {
"person": "family",
"location": "tax",
"year": "2024",
"importance": "critical",
"retention": "keep_7_years"
}
}
Example 3: Child's School Document
File: report-card.pdf
Sidecar: report-card.tags.json
{
"tags": ["school", "grades", "semester1"],
"dimensions": {
"person": "child1",
"location": "education",
"year": "2024",
"importance": "medium",
"retention": "keep_3_years"
}
}
Testing the System
1. Test via API
# Get all dimensions
curl http://localhost:8000/api/dimensions
# Get all tags
curl http://localhost:8000/api/tags
# Create a new tag
curl -X POST http://localhost:8000/api/tags \
-H "Content-Type: application/json" \
-d '{"name": "test-tag", "color": "#ff0000"}'
2. Test via Ingestion
# Create test document and sidecar
cd /path/to/ingress
# Create a simple text document
echo "Test document" > test.txt
# Create tags sidecar
cat > test.tags.json << 'EOF'
{
"tags": ["test", "sample"],
"dimensions": {
"person": "husband",
"importance": "low",
"retention": "temporary"
}
}
EOF
# Trigger ingestion
curl -X POST http://localhost:8000/api/ingest
3. Verify Tags Were Applied
# Get the document ULID from the ingestion response or the database,
# then substitute it for {ulid} in the URLs below
curl http://localhost:8000/api/documents/{ulid}/tags
curl http://localhost:8000/api/documents/{ulid}/dimensions
Future Enhancements
The tagging system is designed to support:
- Search by tags - Find documents by tag combinations
- Filter by dimensions - Filter by person, location, importance, etc.
- Auto-archival - Automatic cleanup based on retention periods
- Tag suggestions - ML-based tag suggestions during upload
- Custom dimensions - User-defined dimension types
- Tag hierarchies - Parent/child tag relationships
Database Schema
See 000005_add_tagging_system.up.sql for complete schema.
Key tables:
- tags - Tag definitions
- document_tags - Document-tag associations
- dimensions - Dimension type definitions
- dimension_values - Allowed values per dimension
- document_dimensions - Document dimension assignments
Notes
- Tags are case-sensitive
- Dimension values must match predefined values exactly (year is the exception and accepts any year string)
- Sidecar files use UTF-8 encoding
- Invalid tags/dimensions in sidecar files are logged as warnings but don't fail ingestion
- All API operations that change tags or dimensions automatically sync with sidecar files
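The lenient handling of invalid sidecar entries described in these notes can be sketched as follows. This is an illustration, not the godocs ingestion code: `filter_dimensions` is a hypothetical name, and the `allowed` argument (dimension name mapped to a set of values, or None for free-form) is an assumed shape.

```python
import logging

logger = logging.getLogger("sidecar")


def filter_dimensions(dimensions: dict, allowed: dict) -> dict:
    """Keep valid dimension assignments and log a warning for each rejected one.

    allowed maps a dimension name to its set of values, or to None if free-form.
    Invalid entries never raise, so ingestion can continue.
    """
    accepted = {}
    for name, value in dimensions.items():
        if name not in allowed:
            logger.warning("unknown dimension %r, skipping", name)
            continue
        values = allowed[name]
        # Exact, case-sensitive comparison against the predefined values.
        if values is not None and value not in values:
            logger.warning("invalid value %r for dimension %r, skipping", value, name)
            continue
        accepted[name] = value
    return accepted
```

For example, with `allowed = {"person": {"husband", "wife"}, "year": None}`, the input `{"person": "Husband", "year": "2024"}` keeps only the year and logs one warning for the miscapitalized person value.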