Language Corpus and Language Politics: The Case of the Standardization of Romani

Ian Hancock1


The debate over the standardization of Romani is above all a political one.  It is the language of a people who, lacking a national, governmental structure and economy, are still very much dependent upon the non-Romani world for those things necessary to participate in the global community.  This means that issues of language planning (among other things) have been and continue to be dealt with mainly by non-Roma specialists—an anomaly especially significant for a people desiring to be in control of its own affairs.  And even if the necessary means were available the problem would still exist, because the Romani language is very unevenly represented, in terms of variety of its dialects, in terms of the numbers of speakers of those dialects, and in terms of what the governments of the countries in which those dialects are spoken are prepared to offer by way of support.  The standardization of the Romani language is, more than for most other languages in a similar position, in the hands of outsiders.

In the first essay ever to address the question, Bernard Gilliat-Smith maintained that “Basic [i.e. common standard] Romani is in my opinion theoretically possible, but in the present state of development of the Gypsies of Europe it stands but little chance of being accepted, or even generally understood, by those for whom it is primarily intended” (1960: 34).  More recently, Donald Kenrick has observed that “with no standard written language, and between fifty and a hundred dialects, Romani dialects are not mutually comprehensible except at very basic levels” (2000: 7).  Together, these statements—the earliest and the most recent—pinpoint two fundamental problems facing the creation of a standardized language: the diversity which exists among Romani dialects2, and the general acceptance of such a formalized variety should it ever become established. They also underscore the fact that the creation of a general standard has not been successfully achieved in the forty years which separate them.

In a monograph which appeared in 1975, I examined factors bearing upon

. . . the problems attending the standardization of [Romani], especially orthographic standardization.  This of necessity takes into consideration several related issues: the prevailing non-Roma attitudes towards Roma and Romani, and the consequent effects upon the attitudes of the speakers themselves towards their language.  It is also necessary to decide whether, because of the diversity of dialects a composite union variety should be created, or just one existing dialect selected for the international standard.  A problem also exists for Romani groups no longer speaking Romani per se, but restructured forms of the language (as do, for example, sections of the Romani populations in The United States, Britain, Spain, Finland, etc.), and for whom Romani morphology and syntax are quite foreign.  Hinging upon these considerations is the question of literacy, and of Romani attitudes towards it, and to ‘formal’ education generally (1975:8-9).

The issues listed there remain the same a quarter-century later, and are examined in new detail in the present paper. They are: 1. attitudes towards Romani identity and language, among both Roma and non-Roma; 2. competence and performance; and 3. the nature of the proposed standard dialect.

1. Attitudes
In order to feel good about one’s language, one must feel good about oneself.  The negative attitudes that Roma encounter in the outside world do nothing to inspire a good sense of self; indeed, antigypsyism is all too often internalized and the resulting anger and frustration can in extreme cases manifest themselves in destructive behaviour within the home.  It can make parents instil a sense of ethnic shame in their children, and tell them not to reveal their identity as Roma to the outside world; it is often the case that they will even withhold the ethnic language from them, believing it to be a liability rather than an asset. The same children can readily provide examples of antigypsyism they have encountered outside the home, particularly at school, either overtly from teachers and classmates, or unintentionally in the reading materials provided, which almost uniformly present “gypsies” in a negative light (Hancock 1988a; 1998).   In schools, multicultural curricula which incorporate sections of Romani history and culture and Romani contributions to society are only now becoming a reality, though still in disappointingly few places; it is still much more commonly the case that Roma children are not even in the same classes as their non-Roma schoolmates, but are kept in separate groups for the “disadvantaged.”  In Bulgaria, “the Gypsy schools were officially dubbed ‘schools for children with inferior lifestyle and culture’” (Tagliabue, 2001:A3).  A Czech study “found that a Gypsy child was 23 times more likely to be placed in a school for the mentally retarded than a white Czech child, even when of normal intelligence” (Ledgard, 2001:29).

Ilona Lacková (1999: 64) writes of being ridiculed by the non-Roma in Slovakia for speaking her own language: 

I wrote in Slovak, and it never occurred to me that you could write in Romani as well.  The peasants made fun of us for speaking cant, gibberish, Pharaoh talk; Romani had no place in school, I had never seen a printed word in Romani in my life, and I thought that Slovak was the only language appropriate for literature (1999: 64).

Academics, too, diminish the worth of Romani, most often claiming that it cannot express abstract or philosophical notions but only the most basic of ideas.  Linguist Jules Bloch, in his book on Roma, referred to the language as merely an “argot” (1969: 113), while another linguist, Paul Wexler, believes it to be a composite of the linguistic registers of various marginalised and disenfranchised populations, and furthermore that “Romani is not of Indic origin and did not acquire its Asian component by direct contact with, or by inheritance from, Indic languages” (1997: 16).  Sociologist Judith Okely, whom Wexler cites, had already supported a similar position some years earlier:

The Sanskrit linguistic link may also have been over-constructed . . . The [Roma’s] Indian connection has been used as what Malinowski calls a ‘mythical charter’ to give cultural respectability (1990: 7-8).

The prevalent notion that Romani lacks the refinements which characterize more civilized languages, and which therefore reflect the same characteristics in its speakers, is found in Isabel Fonseca’s widely influential book Bury Me Standing, where she says that it

. . . has a small basic vocabulary . . . a store of mostly ‘domestic’ words, those relating to home and hearth and mostly of Indian origin . . . more pervasive is the spirit of the language, or that which it seems especially well suited to express—hyperbolic, gregarious, typically expressive of extreme emotion . . . With the simple addition of the ancient Indic suffix pen, like ‘hood’ or ‘ness,’ one can create abstract nouns, such as Romipen, ‘Gypsiness’ . . . But among Romani speakers, these big-concept, encompassing words are not much needed (1995: 56-58; emphasis added).

To prove the “inadequacy” of Romani in her book, Fonseca used a portion of Shakespeare’s Romeo and Juliet, which had previously been translated into Romani by a non-Roma lacking native fluency in the language, and then had that same person translate it literally back into English (op. cit., 56-57), in order to compare it with the original and demonstrate its shortcomings.  One would be hard put to find a contemporary British or American author writing in English who had the skills to match William Shakespeare’s, let alone one writing in Romani. She would certainly have been able to provide a better translation had she asked native-speaking poets such as Rajko Djurić or the now deceased Leksa Manush to provide one—writers known for the beauty and power of their Romani verse—and for whom “Gypsiness” would have certainly been a “big-concept.”

The Swedish Lutheran minister Christfrid Ganander went one better, believing that Romani by itself was so inadequate a means of communication, that it required supplementing with hand gestures before it could be understood.  In words reminiscent of those of Isabel Fonseca, who calls the language a “highly aspirated, raucously gutteral vernacular” (op. cit., 58), Ganander wrote in 1780 that the Roma’s mouth and lips were

. . . big, wide and thick, convenient for the pronunciation of their language, which is rather aspirated and full of ‘schz’ or ‘Sclawonska’ words, calling for strong aspiration and a lot of spittle before they can be pronounced.  Their pronunciation or sounds and voices are peculiar, loud, sharp, rough and harsh, and also demand twitches of the body and gestures with the hands, before they can be articulated.3

Contemporary prejudice at the popular level, exemplified by Ilona Lacková’s real-life experience and Isabel Fonseca’s naive journalistic observations, originates in long-standing stereotypes about Romani language and identity which gain additional support at the academic, and hence the administrative, level.  The works of Okely and Fonseca are very frequently listed as sources of information in governmental and other official documentation dealing with Roma—reports which sometimes lead on to policy decisions affecting them.

A fundamental requirement for instituting a positive attitude towards Romani amongst its speakers, and thereby a positive native-speaker attitude towards the possibility of creating a written literary standard and the literature it would support, must be an overall increased awareness of Romani identity and the Romani experience.

2. Competence and Performance
Fluency in Romani varies greatly from place to place, and the capabilities of the dialect likewise vary greatly, reflecting the vitality with which the mother tongue has been preserved in the community.  In parts of the Balkans, for example, it survives with vigour and some of the dialects spoken there are rich in lexical and grammatical resources.  But in other communities it has lost so much of its vocabulary and grammar that it has become a register of the surrounding non-Romani language.

Romani speakers are aware of which dialects are rich and which are poor, and almost universally favour their own over any other:

Most Gypsies look down on speakers of dialects other than their own, and their prejudices are often taken over vigorously by any gaujos [non-Roma] who learn one dialect.  At Epsom in 1970 I heard the visiting speaker of an East European dialect attack all British Gypsies for letting Romani fall into disuse.  Even the best ‘Welsh’ Romani speakers, he assured me, though they might be able to take a fish out of the river in Romani, couldn’t use it to take an engine out of a motor[-car] (Acton 1974: 55).

Interdialectal bias is one manifestation of the discrimination that exists among different Roma populations, where individual groups generally regard their own members as “real” Romanies and all others as “less real.”  This has an historical basis, and is reflected in the different self-ascriptions: a Mačvano Vlax would never call himself a Sinto or a Romanichal, and vice versa, nor would he call a Sinto or a Romanichal a Rom.  Greater commonality is evident among Roma groups in saying what they are not: a Sinto may not be a Mačvano, but neither is he regarded by other Romani groups as a gadžo, a non-Romani.

This multiplicity of names and self-perceptions underlies the problems which journalists and others encounter in selecting an all-encompassing name to cover all peoples of Romani descent.  While Rom (plural Roma) is being increasingly used by Romani activists, and Roma as both a singular and a plural noun and even as an adjective by those writing in English, the label is by no means everywhere favoured, and use of “gypsy” or “Gypsy” remains common, despite growing objection to it. Still, a trend is in place; in November 2000, the US Government declared that it was abandoning “gypsy” as a Library of Congress subject heading because it was “viewed as offensive by some Romani people,” and has replaced it with Romani (plural Romanies); other exonyms such as Zigeuner or Cikan or Yiftos have likewise been declared politically incorrect in different European countries, and are being replaced in print by Roma or Romi.

It is here that education and sense of self come together in the context of standard language.  Most Roma neither know nor care much about the communities outside their own, let alone know about the distant origins in India or what Indian legacy survives in daily life and language. While the first Roma to arrive in Europe were able to say that they had come from India, that history has become lost over time, and now has to be learnt anew.  Fearing what this regained knowledge might mean politically, some administrators, scholars and journalists have sought to dismiss it as either being false history (cf. Wexler) or over-glorified history (cf. Okely); still others argue that, while the roots may be in India, a lot has happened in the millennium since leaving that part of the world, and Roma are permanent inhabitants of the West now, and much mixed with Europeans, so the point is a moot one and not worth pursuing.  But what option does this leave?

a. Integration vs. Assimilation
A people with a strong sense of identity and worth may desire to remain separate, or they may want to integrate into a larger society, becoming a part of it yet retaining its distinctiveness.  A people with a poor sense of identity and worth on the other hand, may want to abandon its identity altogether, and assimilate completely into another society, becoming one with it or, distressingly, taking on a completely new identity, like the “Egjiptani” in Macedonia and Kosovo.  In some countries, ethnic minorities are encouraged to participate in the larger society, while at the same time being allowed to foster their individuality; in others, efforts are made to eradicate ethnic minorities through policies which might, for example, forbid use of the mother tongue, traditional dress or ethnic family name.  Maria Teresa’s and Archduke Joszef’s efforts in Hungary over two centuries ago did irreparable harm to the Romani population there, for which their descendants are still paying.  If a minority population remains distinctive because of a combination of factors such as complexion, clothing, occupation and the area in which it lives, it continues to be visible and different, even though its language and culture and name have been taken from it.  The tragedy is that while such deracinated populations are still discriminated against by the greater population despite forcible efforts to assimilate them, they now also lack the linguistic and cultural wherewithal to enable them fully to function in their own original community.  They are caught between two worlds, not fully a part of either.

A population can only be a part of a group if that group wants to let it in and, for Romanies, this has so far not been much in evidence in Europe.  While a majority of Roma would like to live in the same neighbourhoods as the non-Roma and share with them the same educational, employment and health-care facilities, poll after public opinion poll makes it clear that most non-Roma do not want this.  Integration is a much-desired goal, but nowhere in Europe is it likely to happen soon, leaving a vast Roma population—numbering in the millions—marginalised and having to find a way to explain to their children why things are the way they are, and why people hate Roma so much.  This is a very difficult and painful thing to have to explain to a child.

Hopelessness and despair destroy the will to strive for better things, and to take pride in one’s home or appearance.  They destroy one’s very sense of worth.  As long as families in Kali Oropa—Romani Europe, where the language is most widely spoken and the Romani population the densest—must deal on a daily basis with problems of racism, unemployment, housing and health-care, then abstract issues such as language standardization and the details of the Asian origins rank absolutely nowhere on their list of priorities.  Interest in language comes with improved schooling and social conditions, and both can only come with improved civil and social rights (Hancock 1992)—but these must originate with the non-Romani governments in whose lands Romanies live.  Romani input into governmental programmes is essential, but remains under-represented because of the lack of enough Roma sufficiently qualified to participate, and because of a reluctance driven by prejudice to employ and promote even those Roma who are qualified.  And so the wheel of under-achievement and under-representation and under-empowerment continues to turn in a self-perpetuating cycle.

b. Moving Ahead
Knowing who Roma are and where they came from is fundamental to knowing where the population is going and what its place is in the global family. Until now, Romani history has not been made a part of the educational system anywhere in the world, and while European nations glorify their own histories, popular western culture perceives “gypsies” to be a people without a history, whose church was made of cheese, and who steal (or wander) because they stole (or made) Christ’s nails.  The widespread insistence on spelling “Gypsy” in English without a proper noun’s capital initial letter also denies us some humanity: as Kaplan and Bernays say, “it’s interesting how much weight a large initial letter carries. A noun or adjective is a frog until you give it a capital first letter, at which point it becomes a prince, that is, a proper name” (1997: 71).

It is not fanciful to promote the Indian origins of the Romani people, which contemporary genetic studies (most recently Bernasovský and Bernasovská 1999 and Gresham et al., 2001) confirm; indeed, the positive aspects of gaining a global identity are immeasurable.  The Human Genome Project team at the Center for Human Genetics in Perth, after comparing genetic material from large numbers of both Roma and Indian groups, concluded that

The Roma are genetically closer to Asians than to surrounding Europeans.  This conclusion can hardly be described as exciting news; it has taken genetics 70 years and several thousand blood samples to confirm what has been known to linguists for the last 200 years (Kalaydjieva et al., 1999: 13).

Mastana and Papiha’s serological research determined that “gypsy populations of eastern Europe still have greater genetic affinity with Indian nomadic groups” than with the white population (1992: 50), while Siváková found that “the lowest genetic distance value” was between Roma and Indians, “suggesting a relatively low degree of genetic assimilation of Gypsies with their surrounding [European] populations” (1983: 98).  Bhalla found that “[t]he Rajputs occupy the position nearest the gypsies . . . the gene pool of East European gypsies is markedly different from that of the surrounding non-gypsy population [while . . .] measures of divergence reveal least distance between East European gypsies and the stock of people in India represented by the Jat-Sikh-Punjabi Hindu-Rajput complex” (1992: 331-332).

If a permanent module on Roma origins, history and culture were to become a part of the educational offerings throughout central and eastern Europe and elsewhere, both Roma and non-Roma would gain a better sense of Romani ethnic identity.  If it were demonstrated that all populations of Romani descent speak (or spoke) varieties of the same original language, and retain more or less of a common culture, and if it were shown that where Romani groups throughout Europe differ from each other it is because of influences from outside, both Roma and non-Roma would gain a better sense of Romani historical unity and legitimacy.  And if it were taught that Roma left India as warriors in a unified group (Hancock et al. 1998; Hancock 2000) and arrived in Europe in the same way (Marushiakova and Popov 2001), and subsequently fought in different national armies, and introduced new musical styles and instruments, and produced such notable individuals as Charlie Chaplin, Sonja Kovalevskaja, Matéo Maximoff, Papuša (Bronisława Wajs), Carmen Amaya, Philomena Franz, Cinka Panna, Carlos Montoya, Django Reinhardt, Bob Hoskins, Ceferino Giménez Malla, Birelli Lagrene, Manitas de Plata, Miroslav Holomek, Rita Hayworth and her grandfather Antonio Cansino, Lafcadio Hearne, Vita Sackville-West and others, and maintain a language and culture which are remarkably intact despite slavery, despite deportations and pogroms, despite the Holocaust, despite repeated attempts to eradicate us as a people, then a sense of pride, and even wonder, must be engendered within the Romani population, and perhaps the seeds of respect and understanding planted among the gadžé.  But until this begins to happen, questions of language standardization will not receive much attention from the overwhelming majority of Roma throughout the world.

c. Objections to standardization
We can learn a great deal by comparing the Romani situation with the situations of other speech communities which have had to tackle the problems of standardization. A common charge, for example, is that any standardized dialect is in one sense the property of a small elite, and serves only to distance those without access to it from the spheres of influence even further.  This has been proven to be so in many instances, but if it is true in the Romani case, then the same must be said for competence in the non-Romani national tongue.  Roma with a good command of Czech, say, or Bulgarian or Hungarian are at a distinct advantage over those lacking such fluency.  And whether or not the charge of elitism is true, the fact remains that on practical grounds alone, a written standard is necessary and has been for all emerging nations. 

There are those who say that creating a standardized dialect would rob Romani of its spontaneity and “soul,” and even that it should not be written at all in any form.  This is a subjective argument, and one already lost, given the abundance of publications now being produced in different Romani dialects.  The expense of producing the same materials in many dialects would be considerable, and creating literacy programmes for each of them would be just as much a problem. And instead of having to expand the lexicon for just one new standard dialect, the other dialects would also have to address this need, not necessarily creating the same new words for, say, “hard drive” or “digital” or “phylum”; the problems of miscommunication would thus simply be perpetuated.

d. Which dialect to learn?

In countries where Romani has disappeared, a dilemma faces those who seek to bring about its revival.  In England, for example, where a Romani-influenced register of English is now the ethnolect, the moribund British Romani described by Sampson would be the most obvious choice for reintroduction on both emotional and historical grounds.  In Spain, it is Caló, a Para-Romani variety of Spanish which has become the ethnolect.  Here, lacking a complete description of any earlier inflected Iberian Romani, a dialect has been artificially reconstructed piecemeal, not unlike Cornish in Cornwall, and its propagation has already met with some success; in 2000 the European Commission held workshops in Perpignan, Lisbon and Valladolid entitled Recuperación del Romano-Kaló: Un Idioma Gitana Universal.  But the question arises: if a Romani population is to learn its lost ethnic language for the first time, should this in fact be the ancestral dialect (or, as in Spain, a newly-created dialect) limited to the area in which it was once spoken or, since the language is being approached for the first time anyway, should a variety be learnt which will allow much wider communication, such as Kalderash Vlax, or a yet-to-be constructed standard?  Ideally perhaps both the lost ancestral dialect and a new common dialect should be taught, for while practicality argues against it, even most natural dialects still require orthographies.  One of the first things acknowledged in the grant proposal drawn up for a Roma school in the town of Tacoma, Washington, in the United States, was that “a strong basic linguistic research component . . . will lead to the creation of an alphabet and grammar for Romani, as the first step in the creation of a bilingual education package” (Anon. 1975: 5; Hancock 1998).  Already-existing local dialects are not after all meant to be replaced by a new standardized Romani, merely supplemented with it.

3. Nature of the Standard

There are two choices in the selection of a standardized Romani: to elaborate an existing natural dialect and bring it into general use, or else to create a completely new, levelled (or koïnéized)3 dialect out of several. The answer would seem to be a combination of each: Roma linguist Vania de Gila Kochanowski (1995) believes that a standardized variety, which he has referred to (1983) as Khetani Romani, should be based on his own Baltic dialect, which belongs to the Northern group, arguing that it retains more of the original grammar and lexicon, unlike dialects in the Central, Vlax and Balkan groups, which have been more heavily influenced in all areas by the surrounding European languages. He further believes Romani to have between fifteen and twenty million speakers (1971: 76), and he has advocated its adoption as an international auxiliary language by the world community because it is not associated with any existing government. The Occitan linguist Marcel Cortiade’s proposed standardized dialect, the only other real attempt besides Kochanowski’s and a far more extensive one, is more eclectic, but is based especially on Balkan and Vlax models.

Determining what group of dialects should constitute the basis of a new standard is the initial stage of language planning according to Einar Haugen’s approach (1973: 3), and for national, territorial languages this rests primarily upon political and social considerations.  In the case of Romani, however, these are outweighed by linguistic arguments.

Romani dialects all share a high proportion of common grammar and lexicon; where they differ from each other it is because of both internal and external factors.  In a number of cases, these have been so far-reaching as to produce completely restructured and rephonologized languages, the so-called Para-Romani varieties (Hancock 1975a; Boretzky and Igla 1994; Bakker 1995 and 1998; Bakker and Cortiade 1991).  Since these are typologically no longer Romani per se, they do not bear upon the present discussion, except where they might provide retrievable lexical material. 

External factors are the result of interference from the various non-Romani languages with which all Romani speakers are in ongoing (and usually overwhelming) contact.  These include modification of native phonology, such as the loss of the phonemic aspirate/non-aspirate contrast4 in the speech of some speakers of French and Italian Sinti, or the neutralization of voiced and voiceless stops5 in Finnish Romani, as well as extensive syntactic and idiomatic calquing6.  It is this latter which Gilliat-Smith focussed upon in his 1960 article, and which he believed to be the greatest obstacle, along with Romanies’ “present state of development” as a people, for the success of a standardized dialect.

In the absence of information on the pre-European character of Romani, external factors are not always easily identifiable: did, for instance, the definite articles emerge during the tradipe, the period between India and Europe, being traceable to Old Indo-Aryan pronominal forms?  Or are they later accretions from Mediaeval Greek?  Does the fact that they are absent in Baltic Romani reflect an earlier, article-less Romani, or have they disappeared during the European period under influence from Slavic and Baltic languages?

Many Romani nouns have extended meanings that may or may not be inherited.  We know that the Romani word for “lungs” is simply a translation of the southern Slavic “white liver” (parno buko in Romani); but we cannot be so sure about e.g. mačho “fish,” which also means “calf (of the leg)” or “bicep.”  Does the word for “fish” also mean these things in any of the coexisting European languages? Does the word for “walnut” also mean “stye” anywhere else, as akhor does in Romani?

Internal factors are those that have occurred within Romani, distinguishing it from other Indian languages but not resulting from identifiable external influences.  They also occur within individual Romani dialects, distinguishing one from the other.  In Romani, besh- is the verb meaning “reside,” as well as having its primary meaning of “sit,” but in no Indian language do its related forms share the secondary meaning.  Sometimes changes in Romani are matched by distinctions in India itself: thus the /s/ ~ /h/ (“Sind-Hind”) split, exemplified by Sinti har, hom, hi, for sar, som, si “how,” “am,” “is” in other non-Sinti dialects, or the collapse of /s/ and /š/ to /s/ in some varieties of Vlax (manus, saj, for manuš, šaj “man,” “able”), a widespread and low-prestige alternation also found in languages such as Kumauni and Panjabi.

a. Orthography

Haugen’s second stage he calls codification, which he defines as “the activity of preparing a normative orthography, grammar and dictionary for the guidance of writers and speakers in a non-homogeneous speech community” (1973).  Just one concerted effort has been made to create a standard orthography, and this was the system introduced at the Fourth World Romani Congress held in Serock, Poland in 1990, and which is summarized in Hancock (1995).  Created mostly by Cortiade, its implementation won a majority vote with the proviso that its success or failure would be evaluated at the end of ten years.  While a number of publications have appeared in this orthography, including several children’s books, a newsletter, a computer programme and some technical linguistic material, so far it has not been generally successful.  Members of the Romani community who correspond with each other regularly in the language, including the past and present presidents of the Roma National Congress and the International Romani Union, have simply not availed themselves of it.  Its weaknesses have been most directly addressed by Kochanowski (1995).

Does this reflect a general lukewarm attitude towards orthographic conformity, or is it this specific spelling system which has failed?  Perhaps both; there is no denying that different dialect speakers, either as a group or as individuals, can have strong feelings about formalization, particularly if it is based upon the speech of an outside group.  In the Romani-language preface to a report prepared recently by one human rights organization, the grapheme /rr/, representing a particular phoneme in Romani and one of the orthographic recommendations of the 1990 commission, was used in the preliminary version submitted to the publisher.  It was removed from the final version on a whim, simply because the editor “didn’t like it.”  The phonemic distinction, in that case originally represented by /r/ and /rr/, was not incorporated into the published text.

This particular distinction in fact serves very well as a test case.  Some dialects distinguish between two /r/ phonemes, usually a dental7 [ɾ] and a uvular8 [ʁ], which have developed historically from different underlying sounds in Old Indo-Aryan. Many dialects have collapsed these to a single, undifferentiated phoneme, but the two sounds nevertheless reflect two separate origins in Old Indo-Aryan and continue to be distinguished in some Romani dialects today, including Kalderash Vlax, the dialect with more speakers than any other. Thus in that dialect [tʃɔɾimɔs] and [tʃɔʁimɔs], or [ɾaj] and [ʁaj], mean quite different things (“theft,” “poverty;” “gentleman,” “twig”).  The distinction may not always be specifically dental vs. uvular, but this historical distinction is nevertheless maintained in many dialects, and should therefore be acknowledged orthographically.

How it is represented is another matter. Cortiade recommended /rr/, no doubt because the same graphemic distinction (/r/ and /rr/) exists in Albanian, by which he has to some extent been influenced.  Spanish is another language which has it.  The /rr/ spelling seems to have been the only orthographic recommendation to have gained a measure of wider acceptance, perhaps because it requires no special font or diacritic.  Cortiade’s other symbols include, inter alia, /θ/, /ç/, /ǎ/ and /ʒ/, which are not present on most typewriters; and while they are accessible easily enough on a modern computer, most Roma own neither a computer nor a typewriter.

Since 1990 there has been a proliferation of Romani-language periodicals from all over Europe.  A very useful overview of these has been provided by Dragoljub Acković (1997), who reproduces sample pages of several of them.  Here, even a cursory examination reveals that there is still a tendency to use orthographies which draw upon the conventions of the coexisting non-Romani language, but where most do employ a common diacritic, it is the wedge accent or haček (called a čhiriklorro in Romani).  This is used with <c>, <s> and <z> to represent [tʃ], [ʃ] and [ʒ], thus <č>, <š> and <ž>.  In some systems it is even placed over <r> to represent [ʀ], thus <ř>. The first three graphemes are found in Serbian, Slovenian, Czech and Slovak, while <ř> is found in Czech, though with a different value9.

Aspiration in Romani is most commonly represented in the samples by <h>, thus <čh>, <kh>, <ph>, <th>, but sometimes by an <x> or an apostrophe.  Palatalization10 is most commonly represented by a <j>, though publications in Czech Romani inconsistently use either an apostrophe, or a wedge accent with /n/: <d’>, <l’>, <ň>, following Czech convention.  Taking a consensus from majority use, a workable orthography would employ the wedge accent, represent aspirated stops with an <h>, and palatalization with a <j>, and not represent differently allophonic variants such as the centralized vowels which reflect, for example, interference from Romanian in the Vlax dialects.  However, since the use of electronic mail to communicate in Romani has increased, even the use of one diacritic has given way to using no accents at all, English graphemes taking their place.  Thus <č>, <š> and <ž> are represented by <ch>, <sh> and <zh>, the letter <h> in these positions not conflicting with its occurrence in the aspirated stops and elsewhere.  While this spelling is not ideal from a narrowly phonetic perspective, for example not distinguishing between the retroflex vs. non-retroflex11 or the palatalized vs. affricated12 sounds in some Vlax dialects, it is functional, and has already been adopted in some Romani-language publications in Hungary. Constructed, non-Latin alphabets for Romani, such as the one devised by Andrzej Mirga on an Indian model, or actual Devanagari (Indian)-based systems, which have occasionally been used, are of academic interest only, but do reflect an awareness of Romani’s historical Asian connection.
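
The e-mail convention just described amounts to a simple letter-for-letter substitution, and can be sketched in a few lines of Python. The function name and the mapping table below are illustrative assumptions rather than part of any published proposal; only the three correspondences (<č> to <ch>, <š> to <sh>, <ž> to <zh>) come from the discussion above.

```python
import unicodedata

# Hypothetical mapping table: wedge-accent letters to the accent-free
# digraphs described in the text. Upper-case forms are added for convenience.
ASCII_DIGRAPHS = {
    "č": "ch", "š": "sh", "ž": "zh",
    "Č": "Ch", "Š": "Sh", "Ž": "Zh",
}

def to_ascii_spelling(text: str) -> str:
    """Replace wedge-accent letters with their accent-free digraphs."""
    # Compose any "letter + combining caron" sequences into single characters
    # first, so that both encodings of the same letter are handled alike.
    composed = unicodedata.normalize("NFC", text)
    return "".join(ASCII_DIGRAPHS.get(ch, ch) for ch in composed)

# Words taken from examples elsewhere in this paper.
print(to_ascii_spelling("kašt"))         # kasht
print(to_ascii_spelling("čhiriklorro"))  # chhiriklorro
```

Because the <h> of an aspirate is left untouched, an aspirated <čh> comes out as <chh> and so remains distinct from plain <ch>. Conversion in the opposite direction is not attempted here.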

b. Lexization

Haugen calls the development of a standardized lexicon lexization.  Both Kochanowski and Cortiade agree that this should incorporate new items by drawing upon Hindi, which both have done in their respective work (e.g. Cortiade’s lekh- and pustik for “write” and “book”).  Kochanowski justifies this because, he says, “the basic vocabulary of Romani and Hindi-Rajasthani is 60% the same” (1971: 76-77), and he supplies a substantial list of such recommendations in his Parlons Tsigane (1994).  Petrovski and Veličkovski have similarly introduced some Hindi words in their Macedonian-Romani dictionary, and Šaip Jusuf and George Sărau have both done the same in their respective Romani handbooks, though in each case without identifying them as being lexical adoptions rather than legitimate retentions.  This is disconcerting for those working on natural dialects, and sometimes proves to be incorrect; Kochanowski, for example, lists recommendations for “les mots sanskrits déjà introduit en hindi” (“the Sanskrit words already introduced in Hindi”) to be brought into his own standardized dialect, but includes (1994: 191) the new Romani feminine noun almar, “cupboard,” from Hindi ālmārī.  Hindi, however, adopted this from Portuguese “almario” some five centuries after the teljaripe, the time when the ancestors of the Roma were already leaving India. Drawing directly upon Sanskrit to supplement the Romani lexicon has also been suggested, but the question then arises: would such words remain in their Old Indo-Aryan form, or be modified to incorporate the phonological changes which have taken place in Romani, a New Indo-Aryan language?13 Cortiade also agrees with Kochanowski (loc. cit.), who wants “to replace the technical words by the international vocabulary,” e.g. televizija, komputeri, etc., and while Kochanowski says this should be “mainly by words common to the French and English languages, of course adapting all these words to Romani phonology,” things have changed in Europe in the thirty years since he wrote that; items common in eastern European languages would now have to be considered too.

The role of Greek was fundamental to the emergence of modern Romani—its contribution to the core lexicon is second only to words of Indic origin, and it has also contributed significantly to Romani grammar; indeed, Romani itself seems only to have finally taken shape in the Byzantine Greek-speaking environment. It can therefore be seen as a legitimate part of Common Romani (i.e. Romani as it existed at the time of the aresipe, or arrival in Europe).  With the exception of Istriani Romani which has oddly just a handful, all dialects contain Greek-derived items, and thus words from that language might also be brought in to supplement a standardized, expanded Romani vocabulary. 

The first task, however, might be to scour all the recorded varieties of Romani in order to find legitimate words lost in other dialects.  For example Vlax no longer has the original word for “tree,” and uses the word for “stick” (kašt) to substitute for it, or an adoption from Slavic or Romanian; but other dialects still have the Indic word (rukh), which can therefore be brought into a constructed standard.  Likewise, while most dialects retain words for “hot” and “cold” (tato-šudro/šileno), only a few retain the original word for “warm” (tablo), which could be legitimately introduced.  Some dialects have generalized the verb phen- to mean both “say” and “tell,” where others still have phuker- for the latter.

Resources for lexical augmentation already exist in the language, but their use depends upon the richness of the dialect in question, and the fluency in it which its speakers command.  One of the first grammatical distinctions to be lost in the process of attrition is that of derived verb forms14.

c. Grammization

The creation of a standard grammar, for which Haugen, following Ferguson (1968), suggests the term grammization, should make maximum use of the original resources, retrieving them from the dialects in which they survive and bringing them together into the new standard.  Thus Kochanowski argues that, more than any other, his own Baltic dialect retains all the morphological oppositions: those between the optative and the subjunctive15; between the transitive, intransitive and transitional16 verbs; between the aorist and the perfect17; between the present and the future; between the gerund and the absolutive18; and between the infinitive19 and the optative (1995: 98).

These are in fact found in other dialects too, but no single dialect has retained all of them.  The same is true of other grammatical features such as the native comparative ending {-der} (baro, bareder, “big,” “bigger”), which many dialects have replaced with a construction calqued on European languages and using an adopted comparative particle (maj baro in Vlax, meg baro in some Central dialects).  Only a very few dialects have kept the earlier first and second person emphatic pronominal forms (maja, tuja, amaja, tumaja), which would enrich a constructed grammar.

Before a core grammar and lexicon can be abstracted and all of these scattered grammatical features codified, complete linguistic descriptions need to be made of all retrievable Romani dialects.  We still lack such a massive study, though work in this direction is in progress.  Once these data have been collected, the following suggested procedure might be followed to begin the task of codification:

1. All foreign material (lexical, phonological, syntactic, etc.) be removed from the natural dialect.
2. A comparison be made of the remaining corpus of material for each dialect.
3. Their differences be identified and extracted.
4. Selection be made of the most suitable non-shared native grammatical and lexical features, to be retained.
5. Lexical augmentation be made to supplement the existing reconstructed vocabulary.
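
The mechanical parts of this procedure (removing adopted material, comparing what remains, extracting the differences, and listing the gaps that lexical augmentation would have to fill) can be sketched in code; the selection of the most suitable features remains a matter of judgement. The following Python sketch is only an illustration under stated assumptions: the dialect labels X and Y, the tagging of each word as native or adopted, and the data layout are all hypothetical, and the handful of forms are taken from examples cited elsewhere in this paper.

```python
NATIVE, ADOPTED = "native", "adopted"

# Hypothetical toy lexicons: meaning -> (form, origin tag).
lexicons = {
    "X": {
        "hot": ("tato", NATIVE),
        "tree": ("kašt", NATIVE),          # "stick" extended to mean "tree"
        "television": ("televizija", ADOPTED),
    },
    "Y": {
        "hot": ("tato", NATIVE),
        "tree": ("rukh", NATIVE),
        "warm": ("tablo", NATIVE),
        "television": ("televizija", ADOPTED),
    },
}

def strip_adopted(lexicon):
    """Remove every entry tagged as adopted, keeping only native forms."""
    return {m: form for m, (form, origin) in lexicon.items() if origin == NATIVE}

def compare(lexicons):
    """Return (shared core, differing features, meanings left without a native form)."""
    native = {name: strip_adopted(lex) for name, lex in lexicons.items()}
    all_meanings = set().union(*(lex.keys() for lex in lexicons.values()))
    shared, differing, gaps = {}, {}, []
    for meaning in sorted(all_meanings):
        forms = {name: lex[meaning] for name, lex in native.items() if meaning in lex}
        if not forms:
            gaps.append(meaning)             # candidates for lexical augmentation
        elif len(forms) == len(native) and len(set(forms.values())) == 1:
            shared[meaning] = next(iter(forms.values()))  # identical in every dialect
        else:
            differing[meaning] = forms       # left for human selection
    return shared, differing, gaps

shared, differing, gaps = compare(lexicons)
print(shared)     # {'hot': 'tato'}
print(differing)  # {'tree': {'X': 'kašt', 'Y': 'rukh'}, 'warm': {'Y': 'tablo'}}
print(gaps)       # ['television']
```

A real comparison would have to cover phonology, morphology and syntax as well as vocabulary, and would depend on the complete dialect descriptions that are still lacking.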


For political reasons, most of the work on the standardization of Romani should be undertaken by Roma, ideally native speakers of Romani, rather than by non-Roma; of the twenty-two different people who have published on Romani standardization over the past forty years, only five have been Roma, and this has kept the management of the language in mostly non-Romani hands.  The creation of a mixed, but Roma-dominant, team of specialists would redress a long-standing imbalance of representation, but until educational and social conditions improve drastically, achieving the standardization and the practical implementation of the language will continue to be the concern of just a handful of individuals, and remain largely in the academic domain.


Acković, Dragoljub (1997). Čitajte Ljudi - Ginavnen Romalen - Read People.  Belgrade: Rrominterpress.
Acton, Thomas (ed.) (1971).  Current Changes amongst British Gypsies and their Place in International Patterns of Development.  Oxford: Romanestan Publications.
Acton, Thomas (1974).  Gypsy Politics and Social Change.  London: Routledge & Kegan Paul.
Acton, Thomas (ed.) (2000).  Scholarship and the Gypsy Struggle: Commitment in Romani Studies. Hatfield: The University of Hertfordshire Press.
Anon. (1975).  Gypsy Education and Development Program: Grant Proposal.  Tacoma, Washington: Metropolitan Development Council.
Bakker, Peter (1995). “Notes on the genesis of Caló and other Iberian Para-Romani varieties.” In: Matras 1995, pp. 125-150.
Bakker, Peter (1998). “Para-Romani languages versus secret languages: Differences in origin, structure and use.” In: Matras 1998, pp. 69-96.
Bakker, Peter, and Marcel Cortiade (eds.) (1991).  In the Margin of Romani: Gypsy Languages in Contact.  University of Amsterdam: Institute for General Linguistics Publication No. 58.
Bakker, Peter, and Martin Mous (eds.) (1994).  Mixed Languages: 15 Case Studies in Language Intertwining.  Amsterdam: Institute for Functional Research into Language and Language Use.
Bernasovský, Ivan, and Jarmila Bernasovská (1999).  Anthropology of Romanies (Gypsies): Auxological and Anthropogenetical Study.  Prešov: University of Prešov Minority Research Centre.
Bhalla, V. (1992).  “Ethnicity and Indian origins of gypsies of Eastern Europe and the USSR: a bio-anthropological perspective.” In: Singh 1992, pp. 323-346.
Bloch, Jules (1969).  Les Tsiganes.  Paris: Presses Universitaires de France.
Boretzky, N., and Birgit Igla (1994).  “Romani mixed dialects.” In: Bakker and Mous 1994, pp. 35-68.
Ferguson, Charles A. (1968).  “Language development.” In: Fishman, Ferguson and Das Gupta, pp. 27-35.
Fishman, Joshua, Charles Ferguson and J. Das Gupta (eds.) (1968).  Language Problems of Developing Nations.  New York: Wiley.
Fonseca, Isabel (1995).  Bury Me Standing: The Gypsies and their Journey.  New York: Random House.
Ganander, Christfrid (1780).  Undersökning om de så kallade Tattare eller Zigeuner. Stockholm: Kongl. Svenska Vitterhetsakademien.
Gilliat-Smith, Bernie (1960). “Basic Romani?” Journal of the Gypsy Lore Society, 3rd series, 39(1): 30-34.
Gresham, David, et al. (2001).  “Origins and divergence of the Roma (Gypsies),” submitted to Nature.
Hancock, Ian F. (1975). Problems in the Creation of a Standard Dialect of Romanés ‑ I (Orthography). Working Papers in Sociolinguistics 25. Austin, p. 65.
Hancock, Ian F. (1977). Problems in the Creation of a Standard Dialect of Romanés ‑ II (Grammar and Lexicon). Presentation to the American Council of Teachers of Uncommonly‑taught Asian Languages, San Francisco, p. 34.
Hancock, Ian (1988a).  “Gypsies in our libraries.”  Collection Building, 8(4): 31-36.
Hancock, Ian (1988b).  “The development of Romani linguistics,” in Winter 1988, pp.183-223.
Hancock, Ian (1992).  “The roots of inequity: Romani cultural rights in their historical and social context,” In: Mayall 1992, pp. 2-17.
Hancock, Ian F. (1995).  A Handbook of Vlax Romani.  Columbus: Slavica.
Hancock, Ian (1998).  “The schooling of Romani Americans: An overview.” The Patrin Web Journal,
Hancock, Ian (2000).  “On the emergence of Romani as a koïné outside of India.” In: Acton 2000, pp. 1-13.
Hancock, Ian, Siobhan Dowd and Rajko Djurić (eds.) (1998).  Roads of the Roma. Hatfield: University of Hertfordshire Press.
Haugen, Einar (1973).  “Language planning: Commentary,” Language Planning Session of the Eighth World Congress of Sociology, Toronto, 1974. 
Kalaydjieva, Luba, David Gresham and Francesc Calafell (1999).  Genetics of the Roma (Gypsies).  Human Genome Project Special Report.  Perth: Cowan University Centre for Human Genetics.
Kaplan, Justin, and Anne Bernays (1997).  The Language of Names.  New York: Simon & Schuster.
Kenrick, Donald (2000).  “Inflections in flux,” Slavophilia: Slavic and East European Resources, April 5th issue.
Kochanowski, Jan (1971).  “The future of Romani.” In: Acton 1971, pp. 76-77.
Kochanowski, Vania de G. (1994).  Parlons Tsigane.  Paris: L’Harmattan.
Kochanowski, Vania de G. (1995).  “Romani language standardization.” Journal of the Gypsy Lore Society, 5(2): 97-108.
Lacková, Ilona (1999).  A False Dawn: My Life as a Gypsy Woman in Slovakia.  Hatfield: University of Hertfordshire Press.
Ledgard, Jonathan (2001).  “Europe’s spectral nation,” The Economist, May 12th, pp. 29-32.
Marushiakova, Elena, and Vesselin Popov (2001).  Gypsies in the Ottoman Empire. Hatfield: University of Hertfordshire Press.
Mastana, Sarabjit, and Surinder Papiha (1992).  “Origin of the Romany gypsies: genetic evidence.” Zeitschrift für Morphologische Anthropologie, 79(1): 43-51.
Matras, Y. (ed.) (1995).  Romani in Contact.  Amsterdam: Benjamins.
Matras, Y. (ed.) (1998).  The Romani Element in Non-Standard Speech.  Wiesbaden: Harrassowitz.
Mayall, David (ed.) (1992).  Gypsies: The Forming of Identities and Official Responses. Special issue of Immigrants and Minorities, 11(2).
Okely, Judith (1990).  “The invention and inventiveness of Gypsy culture.” Paper presented at the Leiden University Fund Congress conference entitled The Social Construction of Minorities and their Cultural Rights in Western Europe, Leiden.
Singh, K.S. (1992).  Ethnicity, Caste and People.  Manohar & Moscow: Institute of Ethnography.
Siváková, D. (1983).  “Estimation of the degree of assimilation of the Gypsy population based on genetic distance calculations,” Anthropologia, 28/29: 95-102.
Wexler, Paul (1997).  “Could there be a Rotwelsch origin for the Romani lexicon?” Paper presented at the Third International Conference on Romani Linguistics, Prague.
Winter, Werner, ed., 19

1. The author is himself of Romani descent, and has regularly taught the Romani language at The University of Texas since 1976, where he is Director of The Romani Archives and Documentation Center.

2. Romani dialects fall into four main groups, Northern, Central, Vlax and Balkan.  Each is further divided into smaller dialects, altogether about sixty in number.  These groups are broadly geographical, but speakers from each are found widely dispersed outside of the original areas.   For various classifications of these see Hancock, 1988b.

3. Aspiration, or the accompaniment of a speech sound by a puff of air, can be phonemic, which is to say essential to the actual meaning of a word. Thus in Romani, per, without a puff of air following the /p/, means “fall,” while pher, which does have it, means “fill”.

4. Voicing is the quality in speech production achieved by allowing the vocal cords to vibrate during the articulation of a sound. This can be heard by making a prolonged hissing “ssss” sound (which doesn’t have voice) and comparing it with a prolonged buzzing “zzzz” sound (which does).

5. Aspiration, or the accompaniment of a speech sound by a puff of air, can be phonemic, which is to say essential to the actual meaning of a word.  Thus in Romani, per, without a puff of air following the /p/, means “fall,” while pher, which does have it, means “fill”. 

6. A calque is the direct translation of an idiom or of the meaning of a word from one language into another language.  An example would be the use of papel (“paper”) in Spanish to mean “newspaper” because “paper” can mean “newspaper” in English.  In Spanish, papel is only the material to write on, while a newspaper is a diario or a gaceta.  American Romanies will sometimes say sar de phuro san?, literally “how old are you?” copying the English phrase, instead of sodengo san (“of how many [years] are you”) in the Romani way.   Calquing is common among bilinguals and second-language learners.

7. Dental sounds are made by bringing the front of the tongue against the back of the top teeth in their articulation, as in the French pronunciation of tu or de.

8. Uvular sounds are made by vibrating the uvula—the piece of flesh that hangs down in the back of the throat.  The common French pronunciation of restaurant contains this sound.

9. In that language, it represents a kind of /r/ made with the tongue curled back (an “apico-postalveolar median fricative” in phonetic terms).

10. Palatalization is made by bringing the blade of the tongue up towards the roof of the mouth in the articulation of a sound, so that (for example) /k/ will sound like /ky/ (‘cute’, as opposed to ‘coot’).

11. Retroflex sounds are made by curling the tongue back towards the roof of the mouth while articulating the consonant.  They are common in Indian languages, although the Indian retroflex sounds are not shared by Romani.

12. An affricated sound consists of a stop consonant (i.e. one in which the air-stream from the lungs is completely stopped at some point, such as /t/ or /d/), followed immediately by the correspondingly-placed fricative sound (i.e. one where the air-stream is constricted but not stopped, made in the same place in the mouth, such as “sh” [ʃ]), giving a double sound; thus “t” + “sh” ([t] + [ʃ]) makes the affricate [tʃ], “ch”.

13. Thus would a neologism, say for “fine hair, down”, keep the shape bhãva as in Sanskrit, or be rephonologized to a form such as *phum in Romani in accordance with the corresponding sound changes?

14. Causatives (verb forms expressing causation, e.g. “to cause to fall” = “drop”), inchoatives (verb forms expressing “becoming”, e.g. “to become red” = “redden”) and passives can produce such extended lexical entries as puterdjol “it becomes undone,” from putrel “it opens (something),” ankerdjol “it calms down,” from ankerel “it holds,” bisterdjol “it becomes obsolete,” from bisterel “it forgets,” dikhjol “it seems, it is regarded,” from dikhel “it sees,” hamisavel “it is involved,” from hamol “it mixes,” prindžardjol “it introduces,” from prindžarel “it recognizes,” and so on.  Thematic constructions can be used instead of adopted foreign forms: dav anglal instead of atvetiv for “I answer,” sosko or savestar instead of če fjalo for “what kind of,” sa gado plus the noun instead of adapat, mizmo or isto to express “same,” maškar koleste for “meanwhile” instead of dotle or intratimpu.  Meanings can be extended using productive suffixes, such as vazdari “elevator” (from vazd- “lift”), cirdari “drawer” (from cird- “pull”), avrjaluno “outer” (from avri “out”), or avutno “next” (from av- “come”).  Semantic distinctions between words sharing the same root could similarly be made using different productive suffixes: vučipe “elevation,” vučimos “altitude,” or by lexical selection: shajipe “possibility,” dastipe “ability.”  The augmentative possibilities of metaphor are limitless: e.g. drakhin (“grapevine”) for “internet,” or čhiriklorro (“little bird”) for the wedge accent.

15. Optative is a verbal mood expressing the intention or desire to perform an action; the subjunctive mood indicates that the action is hypothetical, or dependent upon a previous action.

16. Transitive verbs are verbs that can govern a noun object, such as “lay” or “set”, while an intransitive verb cannot, e.g. “lie” or “sit”.  Transitional verbs can do either, e.g. “smell” or “wash.”

17. The aorist is a past tense in which the time of the action is not specific; the perfect is also a past tense, but expresses a single, completed action.

18. A gerund is a noun formed from a verb, such as “they’re doing the washing”.  An absolutive is the basic, underived form of a noun or a verb: “they’re doing the wash”.

19. The infinitive is the form of the verb which has no subject: “to run” as opposed to the indicative “I run”.